nf-core/viralintegration
Analysis pipeline for the identification of viral integration events in genomes using a chimeric read approach.
chimeric-alignment, ctat, viral-integration, virus, virusintegrationfinder
Version history
v0.1.1 Caladrius - [2023-07-19]
Added
- Pipeline summary to README #63
- Update to nf-core/tools v2.9 #63
[2023-03-29]
Initial release of nf-core/viralintegration, created with the nf-core template. This pipeline is a re-implementation of CTAT-VirusIntegrationFinder v1.5.0. Main Contributors: (@alyssa-ab) (@Emiller88)
Pipeline Summary
- Input Check
- Input path to sample FASTQs in samplesheet.csv
- Check that sample meets requirements (samplesheet_check)
- Read QC (FastQC)
- Align reads to human genome
- Generate index and perform alignment (STAR)
- Quality trimming for unaligned reads
- Quality and adaptor trimming (Trimmomatic)
- Remove polyAs from reads (PolyAStripper)
- Identify chimeric reads
- Combine human and virus FASTAs (cat_fasta)
- Generate index and perform alignment to combined human + viral reference (STAR)
- Sort and index alignments (SAMtools)
- Determine potential insertion site candidates and optimize file (insertion_site_candidates, abridged_TSV)
- Virus Report outputs:
- Viral read counts as a TSV table and a PNG plot
- Preliminary genome-wide abundance plot
- BAM and BAI files for reads detected at potential viral insertion sites
- Web-based interactive genome viewer for virus infection evidence (VirusDetect.igvjs.html)
- Verify chimeric reads
- Create chimeric FASTA and GTF extracts (extract_chimeric_genomic_targets)
- Generate index and perform alignment to verify chimeric reads (STAR)
- Sort and index validated alignments (SAMtools)
- Remove duplicate alignments (remove_duplicates)
- Generate evidence counts for chimeric reads (chimeric_contig_evidence_analyzer)
- Summary Report outputs:
- Refined genome-wide abundance plot (PNG)
- Insertion site candidates in tab-delimited format with gene annotations (vif.refined.wRefGeneAnnots.tsv)
- Web-based interactive genome viewer for virus insertion sites (vif.html)
- Summarize quality checks and visualizations for raw reads, adaptor trimming, and STAR alignments (MultiQC)
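The Input Check step above can be illustrated with a minimal Python sketch. It is not the pipeline's own samplesheet_check code. It assumes the common nf-core samplesheet layout with the columns `sample`, `fastq_1`, and `fastq_2`, and that reads are supplied as gzipped FASTQ files; neither assumption is confirmed on this page. The function name `check_samplesheet` and the individual rules are hypothetical.

```python
import csv
import io

# Assumed column layout (common nf-core convention, not confirmed by this page).
REQUIRED_COLUMNS = ("sample", "fastq_1", "fastq_2")
# Assumed accepted read-file suffixes.
FASTQ_SUFFIXES = (".fastq.gz", ".fq.gz")


def check_samplesheet(text):
    """Parse samplesheet CSV text and return a list of
    (sample, fastq_1, fastq_2) tuples, or raise ValueError."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    if missing:
        raise ValueError("missing columns: " + ", ".join(missing))

    rows = []
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        sample = (row["sample"] or "").strip()
        fastq_1 = (row["fastq_1"] or "").strip()
        fastq_2 = (row["fastq_2"] or "").strip()  # empty for single-end data

        if not sample:
            raise ValueError(f"line {line_no}: empty sample name")
        if not fastq_1:
            raise ValueError(f"line {line_no}: fastq_1 is required")
        for path in (fastq_1, fastq_2):
            if path and not path.endswith(FASTQ_SUFFIXES):
                raise ValueError(f"line {line_no}: unexpected extension: {path}")
        rows.append((sample, fastq_1, fastq_2))

    if not rows:
        raise ValueError("samplesheet contains no samples")
    return rows
```

A call such as `check_samplesheet(open("samplesheet.csv").read())` would then either return the parsed rows or fail early with a message naming the offending line, before any alignment work starts.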
Added
- Add CTAT-VIF virus_db.fasta #1 (@alyssa-ab) (@Emiller88)
- Add small human test data set (chromosomes 6, 11, and 18 FASTA and GTF) #31 (@alyssa-ab)
- Write nf-test for full workflow #39 (@alyssa-ab) (@Emiller88)
- Add local module labels for resource management #35 (@alyssa-ab)
- Write documentation #50 (@alyssa-ab) (@Emiller88)