Friday, October 24, 2014

BAM Analysis Kit

(Version 1.4 released - 24-Oct-2014)

BAM Analysis Kit is a bundle of genome tools that analyses a raw .BAM data file and produces output in file formats similar to those used by genetic genealogy companies. The goal of this kit is to enable end users to analyse their own genome on their personal computer.

The tool produces the following output files:
  • Complete_Autosomal.csv - Autosomal SNPs in RSID / Position / Genotype format.
  • Complete_mtDNA.csv - mtDNA with RSIDs.
  • mtDNA.fasta - mtDNA in FASTA format.
  • Complete_X.csv - X-Chromosome SNPs in RSID / Position / Genotype format.
  • Complete_Y.csv - Y-Chromosome SNPs in RSID / Position / Genotype format.
  • Complete_SNPs_y.csv - Y-SNPs in ISOGG Nomenclature.
  • RSRS_mtDNA.txt - mtDNA mutations in RSRS format.
  • Variants_Y.csv - Y-Chromosome Variants in Position / Reference / Genotype format.
  • Y_SNPs.txt - Y-SNPs in ISOGG Nomenclature separated by comma.
  • Y-STR_Markers.txt - Y-STR Markers
  • CODIS_Markers.txt - CODIS Markers
  • lobSTR_Y-STR.out - Y-STR output from lobSTR. Provided for advanced use.
  • lobSTR_CODIS.out - CODIS output from lobSTR. Provided for advanced use.
  • telomere.txt - Telomere Length
  • telseq.out - Raw output format from telseq project. Provided for advanced use.
  • bam_out.vcf - VCF file
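
As a rough illustration of how the genotype CSV files listed above might be consumed in a script, the sketch below reads one with Python's csv module. The three-column order (RSID, position, genotype), the handling of a possible header row, and the function name `read_genotypes` are assumptions made for this example and are not taken from the kit; inspect the real file before relying on them.

```python
import csv
import io

def read_genotypes(handle):
    """Return {rsid: (position, genotype)} from a 3-column CSV stream.

    Assumed layout (not confirmed by the kit): RSID, position, genotype.
    Rows whose position is not an integer (e.g. a header) are skipped.
    """
    genotypes = {}
    for row in csv.reader(handle):
        if len(row) < 3:
            continue
        rsid, position, genotype = (field.strip() for field in row[:3])
        if not position.isdigit():
            continue
        genotypes[rsid] = (int(position), genotype)
    return genotypes

# Made-up sample rows, for illustration only.
sample = io.StringIO(
    "RSID,Position,Genotype\n"
    "rs0000001,12345,AG\n"
    "rs0000002,67890,CC\n"
)
snps = read_genotypes(sample)
print(len(snps), "SNPs read")
```

For a real run, the same function could be given an open file handle, for example one opened on Complete_Autosomal.csv in the 'out' subfolder.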
Prerequisites: 
Usage:

Extract the download and run 'BAM Analysis Kit.exe'. Select the .BAM file and click 'Start Analysis'. A command prompt will then open automatically and start executing a series of commands.

Screenshot of UI
After a few minutes to several hours (or even days, depending on the size of your BAM file and the speed of your computer), the output will be available in a subfolder called 'out'.

Download:  BAM_Analysis_Kit.zip (4.4 GB)

Source Code: GitHub

Human Genome Reference: The kit uses ucsc.hg19.fasta as the reference and snp138 to annotate the output genotype files.

License: The download bundles the following software for easy usage. So, if you are using this tool for non-commercial and/or personal use, you should be alright.

References:
  • Li H.*, Handsaker B.*, Wysoker A., Fennell T., Ruan J., Homer N., Marth G., Abecasis G., Durbin R. and 1000 Genome Project Data Processing Subgroup (2009) The Sequence alignment/map (SAM) format and SAMtools. Bioinformatics, 25, 2078-9. [PMID: 19505943]
  • McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, DePristo MA (2010). The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20:1297-303. [Pubmed]
  • Ding, Zhihao, Massimo Mangino, Abraham Aviv, Tim Spector, and Richard Durbin. "Estimating telomere length from whole genome sequence data." Nucleic acids research (2014): gku181.
  • Gymrek M, Golan D, Rosset S, & Erlich Y. lobSTR: A short tandem repeat profiler for personal genomes. Genome Research. 2012 April 22.

Change Log: 1.4
  • Upgraded lobSTR to v3.0.2, GATK to v3.2.2
  • Output includes more accurate Y-STR values.
  • Includes CODIS output.
  • Separated Y-STR and CODIS as optional.
  • Displays the versions of the software used, for convenience and advanced use.
  • Adds Read Group tags for BAM files without them.
Change Log: 1.3
  • Updated lobSTR from v2.0.4 to v2.0.8 (beta).
  • Does not delete the VCF file after processing completes.
Change Log: 1.2
  • UCSC reference positions are zero-based, which caused an offset of one position in the output - bug fixed.
Change Log: 1.1
  • Some files weren't created if mtDNA was not selected - bug fixed.
Change Log: 1.0
  • Works on all BAM files with build 37 positions
  • Extracts SNPs from Autosomal DNA, X-DNA, Y-DNA and mtDNA.
  • Provides mtDNA FASTA.
  • Auto-converts Yoruba references in mtDNA and provides RSRS values.
  • Provides Y-SNPs in ISOGG Nomenclature.
  • Provides Y-STR markers.
  • Calculates Telomere Length.
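
The off-by-one bug fixed in version 1.2 comes from mixing coordinate conventions: a zero-based position counts the first base as 0, while the positions reported in genealogy-style files count it as 1. A minimal Python sketch of the conversion follows; the helper name `to_one_based` is hypothetical and not taken from the kit's source.

```python
def to_one_based(zero_based_position):
    """Convert a 0-based coordinate to the equivalent 1-based position."""
    if zero_based_position < 0:
        raise ValueError("position must be non-negative")
    return zero_based_position + 1

# Python strings are 0-indexed, so index 2 is the third base.
sequence = "ACGT"
zero_based = 2
print(sequence[zero_based], "is at 1-based position", to_one_based(zero_based))
```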

Wednesday, October 22, 2014

Ancient Hungarian Neolithic genome - NE3

The Great Hungarian Plain was a crossroads of cultural transformations that have shaped European prehistory. The authors analysed a 5,000-year transect of human genomes, sampled from petrous bones giving consistently excellent endogenous DNA yields, from 13 Hungarian Neolithic, Copper, Bronze and Iron Age burials, including two sequenced to high (~22×) and seven to ~1× coverage, to investigate the impact of these transformations on Europe’s genetic landscape. I converted the raw data of NE3, from the Garadna site in Hungary, into formats familiar to genetic genealogists. I also filtered the data against the SNPs tested by DNA testing companies such as FTDNA, 23andMe and Ancestry in order to upload it to GEDMatch, but found that this ancient DNA has few SNPs in common with them, so I did not upload it. The complete SNPs are, however, available for download.
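
The filtering step described above amounts to intersecting the sample's SNPs with the set of RSIDs a testing company reports. The following Python sketch shows the idea under stated assumptions: the function name `filter_to_tested_snps`, the dictionary layout and all identifiers are made up for illustration and are not the procedure actually used for this sample.

```python
def filter_to_tested_snps(genotypes, tested_rsids):
    """Keep only SNPs whose RSID appears in the chip's tested set."""
    tested = set(tested_rsids)
    return {rsid: call for rsid, call in genotypes.items() if rsid in tested}

# Made-up identifiers and calls, for illustration only.
ancient = {"rs0000001": "AG", "rs0000002": "CC", "rs0000003": "TT"}
chip = ["rs0000002", "rs0000004", "rs0000005"]

shared = filter_to_tested_snps(ancient, chip)
overlap = len(shared) / len(chip)
print(f"{len(shared)} shared SNPs, {overlap:.0%} of the chip")
```

A low overlap fraction is the kind of signal that would argue against uploading the filtered file to a matching service.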

Download: 
Reference:
Cristina Gamba, Eppie R. Jones, Matthew D. Teasdale, Russell L. McLaughlin, Gloria Gonzalez-Fortes, Valeria Mattiangeli, László Domboróczki, Ivett Kővári, Ildikó Pap, Alexandra Anders, Alasdair Whittle, János Dani, Pál Raczky, Thomas F. G. Higham, Michael Hofreiter, Daniel G. Bradley & Ron Pinhasi "Genome flux and stasis in a five millennium transect of European prehistory" doi:10.1038/ncomms6257.

Data Used

Ancient Hungarian Neolithic genome - NE4

The Great Hungarian Plain was a crossroads of cultural transformations that have shaped European prehistory. The authors analysed a 5,000-year transect of human genomes, sampled from petrous bones giving consistently excellent endogenous DNA yields, from 13 Hungarian Neolithic, Copper, Bronze and Iron Age burials, including two sequenced to high (~22×) and seven to ~1× coverage, to investigate the impact of these transformations on Europe’s genetic landscape. I converted the raw data of NE4, from the Middle Neolithic Tiszadob-Bükk culture and found at the Polgár-Ferenci-hát site in Hungary, into formats familiar to genetic genealogists. I also filtered the data against the SNPs tested by DNA testing companies such as FTDNA, 23andMe and Ancestry in order to upload it to GEDMatch, but found that this ancient DNA has few SNPs in common with them, so I did not upload it. The complete SNPs are, however, available for download.

Download: 
Reference:
Cristina Gamba, Eppie R. Jones, Matthew D. Teasdale, Russell L. McLaughlin, Gloria Gonzalez-Fortes, Valeria Mattiangeli, László Domboróczki, Ivett Kővári, Ildikó Pap, Alexandra Anders, Alasdair Whittle, János Dani, Pál Raczky, Thomas F. G. Higham, Michael Hofreiter, Daniel G. Bradley & Ron Pinhasi "Genome flux and stasis in a five millennium transect of European prehistory" doi:10.1038/ncomms6257.

Data Used

Monday, October 20, 2014

Ajvide70 DNA

The authors generated between 0.01- and 2.2-fold genome-wide coverage for six Neolithic hunter-gatherers from the Pitted Ware Culture, four Neolithic farmers from the Funnel Beaker culture and one late Mesolithic hunter-gatherer. I converted the raw data of Ajvide70, from the Pitted Ware Culture and excavated in Sweden, into formats familiar to genetic genealogists. I also filtered the data against the SNPs tested by DNA testing companies such as FTDNA, 23andMe and Ancestry in order to upload it to GEDMatch, but found that this ancient DNA has few SNPs in common with them, so I did not upload it. The complete SNPs are, however, available for download.
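
To give some intuition for why low-coverage genomes share few SNPs with commercial chips, the Python sketch below estimates the fraction of sites hit by at least one read at a given mean coverage. It assumes reads land uniformly at random, so that depth per site is Poisson-distributed; this is an idealisation that real ancient DNA does not strictly follow, and the function name is hypothetical. The two coverage values are the ends of the range quoted for the study, not figures for this particular sample.

```python
import math

def expected_fraction_covered(mean_coverage):
    """Fraction of sites with at least one read, assuming Poisson depth."""
    return 1.0 - math.exp(-mean_coverage)

# Lower and upper ends of the coverage range reported for the study.
for coverage in (0.01, 2.2):
    fraction = expected_fraction_covered(coverage)
    print(f"{coverage}x coverage: about {fraction:.1%} of sites covered")
```

Under this model, a site being covered by a single read still does not guarantee a reliable genotype call, so the usable overlap is likely lower than these figures suggest.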

Download: 
Reference:
Skoglund, Pontus, Helena Malmström, Ayça Omrak, Maanasa Raghavan, Cristina Valdiosera, Torsten Günther, Per Hall et al. "Genomic Diversity and Admixture Differs for Stone-Age Scandinavian Foragers and Farmers." Science 344, no. 6185 (2014): 747-750.

Data Used

Sunday, October 19, 2014

Ajvide52 DNA

The authors generated between 0.01- and 2.2-fold genome-wide coverage for six Neolithic hunter-gatherers from the Pitted Ware Culture, four Neolithic farmers from the Funnel Beaker culture and one late Mesolithic hunter-gatherer. I converted the raw data of Ajvide52, from the Pitted Ware Culture and excavated in Sweden, into formats familiar to genetic genealogists. I also filtered the data against the SNPs tested by DNA testing companies such as FTDNA, 23andMe and Ancestry in order to upload it to GEDMatch, but found that this ancient DNA has few SNPs in common with them, so I did not upload it. The complete SNPs are, however, available for download.

Download: 
Reference:
Skoglund, Pontus, Helena Malmström, Ayça Omrak, Maanasa Raghavan, Cristina Valdiosera, Torsten Günther, Per Hall et al. "Genomic Diversity and Admixture Differs for Stone-Age Scandinavian Foragers and Farmers." Science 344, no. 6185 (2014): 747-750.

Data Used