The genetic relatedness between individual parasite haplotypes and among parasite populations has several practical uses in the study of malaria. For example, relatedness information can help determine the geographic origin of imported infections, define the extent to which parasites are dispersing or are contained within landscapes, and identify whether specific strains are being selected for over time. Relatedness information is also very helpful in understanding longitudinal (within-individual) infection dynamics. In the case of P. vivax, for example, it can distinguish whether infection represents newly acquired parasites, recrudescence after treatment, or relapse from longer-lasting hypnozoite reservoirs. Relatedness information can also help resolve polyclonality signals, i.e., clarify the number of different haplotypes co-infecting individual patients.

Relatedness is defined as the probability that, at any locus in the genome, the alleles sampled from two different individuals are identical by descent (IBD). Genetic markers used for this purpose include SNPs, microsatellites, and (increasingly) amplicon microhaplotypes (MHAP). Relatedness can be estimated with a hidden Markov model (HMM) implemented in the R package paneljudge (see the mathematical framework in AR Taylor et al. 2019). In this package, relatedness (\(r\)) is estimated as a function of the haplotypes of the two sampled parasites (\(Y^{(i)}\) and \(Y^{(j)}\), where \(i\) and \(j\) index two different parasites sampled from the population), the population allele frequencies (\(f_t(g)\), where \(t\) denotes the locus and \(g\) the allele), the physical distance (\(d_t\), in base pairs) between consecutive loci (\(t-1\) and \(t\)), the recombination rate (\(\rho\)), the switching rate of the Markov chain (\(k\)), and a constant genotyping error rate (\(\varepsilon\)).
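To make the roles of \(r\), \(k\), \(\rho\) and \(d_t\) more concrete, the sketch below builds a two-state (not IBD / IBD) transition matrix between consecutive loci. The parameterization is an assumption chosen for illustration: it is one common way to write a chain whose long-run probability of being IBD equals \(r\), and it is not copied from the paneljudge source. The function name `ibd_transition_matrix` and the numeric values are hypothetical.

```r
# Illustrative sketch only: a two-state Markov chain over IBD status.
# ASSUMPTION: the probability of "switching" between loci is
# 1 - exp(-k * rho * d), and after a switch the new state is IBD with
# probability r. This is not taken from the paneljudge source code.
ibd_transition_matrix <- function(r, k, rho, d) {
  stopifnot(r >= 0, r <= 1, k > 0, rho >= 0, d >= 0)
  a <- 1 - exp(-k * rho * d)  # probability that the chain "resets" between loci
  matrix(
    c(1 - r * a,   r * a,             # from not IBD: stay, or move to IBD
      (1 - r) * a, 1 - (1 - r) * a),  # from IBD: move to not IBD, or stay
    nrow = 2, byrow = TRUE,
    dimnames = list(from = c("notIBD", "IBD"), to = c("notIBD", "IBD"))
  )
}

# Hypothetical values: loci 10 kb apart, r = 0.5
A <- ibd_transition_matrix(r = 0.5, k = 5, rho = 7.4e-7, d = 10000)
print(A)
```

Under this assumed form, loci that are very close together (small \(d_t\)) almost always share the same IBD state, while distant loci behave nearly independently, each being IBD with probability \(r\).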

Pairwise relatedness comparisons between categories

For this report, all possible pairwise IBD comparisons between samples from different categories of Variable1 and Variable2 are computed, and the results are shown in the following table. First, load the required libraries and functions:

source('~/Documents/Github/intro_to_genomic_surveillance/docs/functions_and_libraries/amplseq_required_libraries.R')
source('~/Documents/Github/intro_to_genomic_surveillance/docs/functions_and_libraries/amplseq_functions.R')
#sourceCpp('~/Documents/Github/intro_to_genomic_surveillance/docs/functions_and_libraries/hmmloglikelihood.cpp')

Read the ampseq_object in csv format:

ampseq_object = read_ampseq(file = '~/Documents/Github/intro_to_genomic_surveillance/docs/data/Pfal_example/Pfal_ampseq_filtered', 
                   format = 'csv')

Run the function pairwise_hmmIBD:

pairwise_relatedness_table = '~/Documents/Github/intro_to_genomic_surveillance/docs/data/Pfal_example/pairwise_relatedness.csv'

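# Compute pairwise relatedness only if no cached table exists on disk
if(!file.exists(pairwise_relatedness_table)){
      pairwise_relatedness = NULL
      
      # Number of chunks into which the pairwise comparisons are split
      nChunks = 500
      
      # Process every chunk (1 to nChunks) and append its results
      for(w in seq_len(nChunks)){
        start = Sys.time()
        pairwise_relatedness = rbind(pairwise_relatedness,
                                     pairwise_hmmIBD(ampseq_object, parallel = TRUE, w = w, n = nChunks))
        # Force seconds so the printed unit is always correct
        time_diff = difftime(Sys.time(), start, units = 'secs')
        
        print(paste0('step ', w, ' done in ', round(as.numeric(time_diff), 2), ' secs'))
        
      }
      
      # Cache the full table so later runs can skip the computation
      write.csv(pairwise_relatedness,
                pairwise_relatedness_table,
                quote = FALSE,
                row.names = FALSE)
      
    }else{
      
      # Reuse the cached table
      pairwise_relatedness = read.csv(pairwise_relatedness_table)
      
    }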
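The arguments `w` and `n` suggest that the full set of pairwise comparisons is divided into `n` chunks and that each call handles chunk `w`. The internals of pairwise_hmmIBD are not shown here, so the following is only a sketch of how such a split could work, assuming `w` runs from 1 to `n`. The helper `get_pair_chunk` and the sample IDs are hypothetical.

```r
# Hypothetical helper (not part of the tutorial's function library):
# return the unordered sample pairs that fall in chunk w out of n.
get_pair_chunk <- function(sample_ids, w, n) {
  all_pairs <- t(combn(sample_ids, 2))  # one row per unordered pair
  n_pairs <- nrow(all_pairs)
  # Assign each pair to a contiguous chunk numbered 1..n
  chunk_id <- ceiling(seq_len(n_pairs) * n / n_pairs)
  all_pairs[chunk_id == w, , drop = FALSE]
}

# Toy example: 5 samples give choose(5, 2) = 10 pairs, split into 3 chunks
ids <- paste0("S", 1:5)
chunks <- lapply(seq_len(3), function(w) get_pair_chunk(ids, w, 3))
print(sapply(chunks, nrow))  # number of pairs handled by each chunk
```

Splitting the work this way keeps the memory needed per call small and lets progress be reported chunk by chunk, at the cost of calling the function `n` times.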

Plot the distribution of relatedness between sites using the function plot_relatedness_distribution:

plot_relatedness_distribution_between = plot_relatedness_distribution(
      pairwise_relatedness = pairwise_relatedness,
      metadata = ampseq_object@metadata,
      Population = 'Subnational_level2',
      # One fill color per between-population pair: choose(n, 2) = n * (n - 1) / 2
      fill_color = rep('gray50', choose(length(unique(ampseq_object@metadata[['Subnational_level2']])), 2)),
      type_pop_comparison = 'between',
      ncol = 3,
      pop_levels = NULL
    )
View(plot_relatedness_distribution_between$relatedness)
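As a rough illustration of what a "between" comparison involves, the sketch below labels each sample pair in a toy table as within- or between-population, counts the between-population panels, and computes the overall median. All names here (`metadata_toy`, `pairs_toy`, and the columns `Yi`, `Yj`, `rhat`) and all values are made up for the example; the real output of pairwise_hmmIBD and the internals of plot_relatedness_distribution may differ.

```r
# Toy metadata: hypothetical sample IDs and population labels
metadata_toy <- data.frame(
  Sample_id = c("S1", "S2", "S3", "S4"),
  Population = c("A", "A", "B", "C"),
  stringsAsFactors = FALSE
)

# Toy pairwise table: one row per unordered pair; rhat values are invented
pairs_toy <- as.data.frame(t(combn(metadata_toy$Sample_id, 2)),
                           stringsAsFactors = FALSE)
names(pairs_toy) <- c("Yi", "Yj")
pairs_toy$rhat <- c(0.90, 0.10, 0.05, 0.20, 0.00, 0.15)

# Attach the population of each sample in the pair
pop_of <- setNames(metadata_toy$Population, metadata_toy$Sample_id)
pairs_toy$Pop_i <- unname(pop_of[pairs_toy$Yi])
pairs_toy$Pop_j <- unname(pop_of[pairs_toy$Yj])

# Classify each comparison and build an order-independent population-pair label
pairs_toy$type <- ifelse(pairs_toy$Pop_i == pairs_toy$Pop_j, "within", "between")
pairs_toy$pop_pair <- paste(pmin(pairs_toy$Pop_i, pairs_toy$Pop_j),
                            pmax(pairs_toy$Pop_i, pairs_toy$Pop_j), sep = "-")

between_toy <- pairs_toy[pairs_toy$type == "between", ]
n_panels <- length(unique(between_toy$pop_pair))  # one panel per population pair
overall_median <- median(pairs_toy$rhat)          # within and between combined
print(between_toy)
```

With three populations there are choose(3, 2) = 3 between-population panels, which is the same count used to size the `fill_color` vector in the call above.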

The distribution of pairwise genetic relatedness values is presented using histograms as follows:

plot_relatedness_distribution_between$plot
**Figure 1:** Pairwise IBD distribution between categories of Variable1 (panels). The x-axis shows genetic relatedness values, ranging from 0 (unrelated) to 1 (clonal). The y-axis shows the number of pairwise comparisons corresponding to each of these relatedness values. The dotted vertical line represents the median genetic relatedness in the total dataset (including both within and between-population comparisons).
