The MHap-Analysis pipeline is based on a series of functions
organized in an R script, allowing flexible execution of three tasks
using parameters from a .json file. These tasks are executed from the
terminal via a bash script. The user only needs to modify the .json file
to customize their analysis parameters.
The json file
The json file with all parameters should looks as follow:
{
//UGER parameters
"vmem": 32,
"cores": 8,
"h_rt1": 00:30:00,
"h_rt2": 02:00:00,
"h_rt3": 01:00:00,
"nTasks": 1,
//PATHs to references and functions
"wd": "/Users/pam3650/Documents/Github/MHap-Analysis/docs/data/Pfal_example3/",
"fd": "/Users/pam3650/Documents/Github/MHap-Analysis/docs/functions_and_libraries/",
"rd": "/Users/pam3650/Documents/Github/MHap-Analysis/docs/reference/Pfal_3D7/",
//Upload of raw data
"cigar_paths": "Seq_runs", // Path to output folders of the denoising pipeline, e.i.: "Seq_runs", null
"cigar_files": null,//"cigar_tables", null
"asv_table_files": null,//"asv_tables", null
"asv2cigar_files": null,//"asv2cigar", null
"asv_seq_files": null,//"asv_seqs", null
"zero_read_sample_list": null,//"zeroReadSamples", null
"ampseq_jsonfile": null,
"ampseq_excelfile": null,
"ampseq_csvfolder": null,
"sample_id_pattern": "^SP",
"markers": "markers.csv",
//Parameters to export data and results
"output": "MHap_test_excel",
"ampseq_export_format": "excel",//"csv", "excel", "json" null
//Removing or masking ASVs in the genotype table
"min_abd": 10,
"min_ratio": 0.1,
"off_target_formula": "dVSITES_ij>=0.3", //nVSITES>40, nSNVs>30, nINDELs>10
"flanking_INDEL_formula": "flanking_INDEL==TRUE&h_ij>=0.66",
"homopolymer_length": 5,
"SNV_in_homopolymer_formula": "SNV_in_homopolymer==TRUE&h_ij>=0.66",
"INDEL_in_homopolymer_formula": "INDEL_in_homopolymer==TRUE&h_ij>=0.66",
"bimera_formula": "bimera==TRUE&h_ij>=0.66",
"PCR_errors_formula": "h_ij>=0.66&h_ijminor>=0.66&p_ij>=0.05",
//Removing samples or loci with low performance
"sample_ampl_rate": 0.75,
"locus_ampl_rate": 0.75,
//Adding metadata
"metadata": "colombia_metadata_cleaned.csv",
"join_by": "Sample_id",
"Variable1": "Subnational_level2",
"Variable2": "Quarter_of_Collection",
"Longitude": null,
"Latitude": null,
//Filtering desired or undesired populations or samples
"var_filter": "Subnational_level2;Keep;Buenaventura,Quibdó,Guapi/Quarter_of_Collection;Keep;2021-Q1,2021-Q2,2021-Q3,2021-Q4,2022-Q1,2022-Q2",
//Reports
"PerformanceReport": true,
"Drug_Surveillance_Report": true,
"Variants_of_Interest_Report": false,
"ibd_thres": null, // e.i. 0.99, null
"poly_formula": "NHetLoci>1", //e.i. "NHetLoci>1", "NHetLoci>1&Fws<0.97", null
// Parameters for DSR
"ref_gff": "PlasmoDB-59_Pfalciparum3D7.gff",
"ref_fasta": "PlasmoDB-59_Pfalciparum3D7_Genome.fasta",
"amplicon_fasta": "pf3d7_ref_updated_v3.fasta",
"reference_alleles": "drugR_alleles.csv",
"hap_color_palette": "random",
"gene_names": "PfDHFR,PfMDR1,PfDHPS,PfKelch13,PF3D7_1447900",
"gene_ids": "PF3D7_0417200,PF3D7_0523000,PF3D7_0810800,PF3D7_1343700,PF3D7_1447900",
"drugs": "Artemisinin,Chloroquine,Pyrimethamine,Sulfadoxine,Lumefantrine,Mefloquine",
"include_all_drug_markers": true,
"na_var_rm": true,
"na_hap_rm": true,
//Parameters for IBD and Conectivity report
"pairwise_relatedness_table": "Pfal_pairwise_relatedness.csv",
"nchunks": 500,
"parallel": true,
"ibd_ncol": 4,
"pop_levels": null,
//Parameters for COI report
"poly_quantile": 0.75
}
However, as most of them have default values only the parameters to
upload the data are mandatory in the json file (wd,
fd, rd, and at least one of the following:
cigar_paths, cigar_files,
asv_table_files, asv2cigar_files,
asv_seq_files, zero_read_sample_list,
ampseq_jsonfile, ampseq_excelfile,
ampseq_csvfolder). The json file showed in the previous
chunk of code can be downloaded from here.