Wednesday, August 26, 2015

SRA / FASTQ to BAM Kit

Most ancient DNA data sets are published as SRA or FASTQ files. This kit lets anyone download SRA / FASTQ files and convert them to BAM files. Once converted, a BAM file can be processed with BAM Analysis Kit, whose output can be used for genetic genealogy.

Usage: Make sure the file name ends with .sra or .fastq
sra2bam.bat <sra-file>.sra
(or)
fq2bam.bat <fastq-file>.fastq
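
As a rough illustration of the extension rule above, the sketch below picks the script for a given file. The two script names come from the usage lines; the helper names `pick_converter` and `build_command` are hypothetical and are not part of the kit.

```python
from pathlib import Path

# Script names are taken from the usage lines above; this mapping is only an illustration.
CONVERTERS = {".sra": "sra2bam.bat", ".fastq": "fq2bam.bat"}


def pick_converter(filename: str) -> str:
    """Return the kit script that matches the file extension (case-insensitive)."""
    suffix = Path(filename).suffix.lower()
    if suffix not in CONVERTERS:
        raise ValueError(f"expected a .sra or .fastq file, got {filename!r}")
    return CONVERTERS[suffix]


def build_command(filename: str) -> list[str]:
    """Build the command line as a list, e.g. for subprocess.run on Windows."""
    return [pick_converter(filename), filename]
```

A compressed file such as `reads.fastq.gz` is rejected by this sketch, since its final extension is `.gz`.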

Prerequisites: 64-bit Windows

Download : SRA_FASTQ to BAM Kit.zip (3.2 GB)

Change Log
    Version 1.0
    • Initial Release.

    Friday, July 31, 2015

    8300 year old Ancient DNA of Kennewick Man

    The authors sequenced DNA from an 8358-year-old man from Kennewick, Washington state, USA. I converted the raw data of this sample into formats familiar to genetic genealogists. The complete SNPs are available for download. I also filtered them against the SNPs tested by DNA testing companies like FTDNA, 23andMe and Ancestry, and uploaded the common SNPs to GEDmatch as Kit# F999970. Haplogroups, site location, age of the sample, etc., as reported by the authors, can be found on the Ancient DNA page.

    Download: 
    Reference:
    Rasmussen, Morten, Martin Sikora, Anders Albrechtsen, Thorfinn Sand Korneliussen, J. Víctor Moreno-Mayar, G. David Poznik, Christoph PE Zollikofer et al. "The ancestry and affiliations of Kennewick Man." Nature (2015).

    Data Used

    Sunday, July 5, 2015

    Y-STR Kit

    Y-STR Kit analyses a .BAM raw data file or a VCF file and writes an HTML report with all Y-STR values. It supports build 37 (hg19). If you select a VCF file, it must contain SNPs, indels and all confident sites (not just the variants). It currently supports the FTDNA 111 Y-STR markers.

    The tool produces the following output:
    • Y-STR_Report.html - Output HTML Report
    • bam_chrY.vcf.gz - VCF output with Indels, SNPs and all confident sites.
    Prerequisites: 
    Usage:

    Extract the download and run 'Y-STR Kit UI.exe'. Select the .BAM or VCF file and click 'Analysis'. After you click 'Execute', a command prompt will open automatically and start executing a series of commands.

    User Interface

    Y-STR Report

    After a few minutes to several hours, the output will be available inside a subfolder called 'out'.

    Download:  Y-STR Kit.zip (76 MB)

    Configuration Guide: Y-STR Kit Guide.pdf

    Source Code:
    Located in the 'src' folder and/or uploaded to GitHub.

    License: The download bundles the following software for pre-processing BAM and VCF files. If you use this tool for non-commercial and/or personal purposes, you should be fine.
    For my own binary and source code, the MIT License applies.

    References:
    • Li H.*, Handsaker B.*, Wysoker A., Fennell T., Ruan J., Homer N., Marth G., Abecasis G., Durbin R. and 1000 Genome Project Data Processing Subgroup (2009) The Sequence alignment/map (SAM) format and SAMtools. Bioinformatics, 25, 2078-9. [PMID: 19505943]
    • McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, DePristo MA (2010). The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20:1297-303. [Pubmed]
    Change Log :1.1
    • Bug Fix - Unable to load BAM from folders with spaces fixed.
    Change Log :1.0
    • FTDNA 111 Y-STR Markers

    Friday, July 3, 2015

    Ancient DNA from Peştera cu Oase, Romania

    The authors analyzed DNA from a 37,000-42,000-year-old modern human from Peştera cu Oase, Romania. They found that on the order of six to nine percent of the genome of the Oase individual is derived from Neanderthals, more than any other modern human sequenced to date. Three chromosomal segments of Neanderthal ancestry are over 50 cM in size, indicating that this individual had a Neanderthal ancestor as recently as four to six generations back. The Oase individual does not share more alleles with later Europeans than with East Asians, suggesting that the Oase population did not contribute substantially to later humans in Europe. I converted the raw data of this sample into formats familiar to genetic genealogists. The complete SNPs are available for download. However, few SNPs remain in common after filtering against the SNPs tested by DNA testing companies like FTDNA, 23andMe and Ancestry, so the sample was not uploaded to GEDmatch. Haplogroups, site location, age of the sample, etc., as reported by the authors, can be found on the Ancient DNA page.

    Download: 
    Reference:
    Trinkaus, Erik, Oana Moldovan, Adrian Bîlgăr, Laurenţiu Sarcina, Sheela Athreya, Shara E. Bailey, Ricardo Rodrigo et al. "An early modern human from the Peştera cu Oase, Romania." Proceedings of the National Academy of Sciences 100, no. 20 (2003): 11231-11236.

    Data Used

    Monday, June 29, 2015

    Ancient DNA of Black Death Victim #8291

    Ancient DNA was sequenced from a tooth of sample #8291, a Black Death victim from c. 1348 AD, East Smithfield Cemetery, London, UK. I converted the raw data of this sample into formats familiar to genetic genealogists. The complete SNPs are available for download. However, few SNPs remain in common after filtering against the SNPs tested by DNA testing companies like FTDNA, 23andMe and Ancestry, so the sample was not uploaded to GEDmatch. Haplogroups, site location, age of the sample, etc., as reported by the authors, can be found on the Ancient DNA page.

    Download: 
    Reference:
    Schuenemann, Verena J., et al. "Targeted enrichment of ancient pathogens yielding the pPCP1 plasmid of Yersinia pestis from victims of the Black Death." Proceedings of the National Academy of Sciences 108.38 (2011): E746-E752.

    Data Used

    Tuesday, June 23, 2015

    Two Ancient DNA from indigenous Botocudos of Brazil

    Understanding the peopling of the Americas remains an important and challenging question. The authors present 14C dates, and morphological, isotopic and genomic sequence data from two human skulls from the state of Minas Gerais, Brazil, part of one of the indigenous groups known as ‘Botocudos’. I converted the raw data of these samples into formats familiar to genetic genealogists. The complete SNPs are available for download. I also filtered them against the SNPs tested by DNA testing companies like FTDNA, 23andMe and Ancestry, and uploaded the result to GEDmatch. Haplogroups, site location, GEDmatch ID, age of the sample, etc., as reported by the authors, can be found on the Ancient DNA page.

    Download: 
    Reference:
    Malaspinas, Anna-Sapfo, Oscar Lao, Hannes Schroeder, Morten Rasmussen, Maanasa Raghavan, Ida Moltke, Paula F. Campos et al. "Two ancient human genomes reveal Polynesian ancestry among the indigenous Botocudos of Brazil." Current Biology 24, no. 21 (2014): R1035-R1037.

    Data Used

    Ancient DNA from 10 Petrous Bones

    The authors carried out intra-petrous comparisons for 10 petrous bones from specimens from Holocene archaeological contexts across Eurasia dated between 10,000-1,800 calibrated years before present (cal. BP). The methodology they used may potentially enable ancient DNA analyses of samples from hot regions that are otherwise not amenable to ancient DNA analyses. The authors provided the ancient DNA from the 10 petrous bones with their paper. I converted the raw data of these samples into formats familiar to genetic genealogists. The complete SNPs are available for download. I also filtered them against the SNPs tested by DNA testing companies like FTDNA, 23andMe and Ancestry, but found significantly fewer common SNPs. Haplogroups, site location, age of the sample, etc., as reported by the authors, can be found on the Ancient DNA page.

    Download: 
    Reference:
    Ron Pinhasi, Daniel Fernandes, Kendra Sirak, Mario Novak, Sarah Connell, Songül Alpaslan-Roodenberg, Fokke Gerritsen, Vyacheslav Moiseyev, Andrey Gromov, Pál Raczky, Alexandra Anders, Michael Pietrusewsky, Gary Rollefson, Marija Jovanovic, Hiep Trinhhoang, Guy Bar-Oz, Marc Oxenham, Hirofumi Matsumura, Michael Hofreiter et al. "Optimal Ancient DNA Yields from the Inner Ear Part of the Human Petrous Bone" doi:10.1371/journal.pone.0129102

    Data Used

    Saturday, June 13, 2015

    101 Ancient Eurasian DNA

    The Bronze Age (BA) of Eurasia (c. 3,000-1,000 years BC, 3-1 ka BC) was a period of major cultural changes. The authors provided 101 ancient genomes with their publication. Alex of the Russian forum.molgen.org community, Sergey (ethnocalc (at) mail.ru) and I converted the raw data of these samples into formats familiar to genetic genealogists. The complete SNPs are available for download. We also filtered them against the SNPs tested by DNA testing companies like FTDNA, 23andMe and Ancestry. Where a sample shared more SNPs with them, we uploaded it to GEDmatch. The GEDmatch IDs, haplogroups, site location, age of the sample, etc., as reported by the authors, can be found on the Ancient DNA page.

    Download: 
    Reference:
    Morten E. Allentoft, Martin Sikora, Karl-Göran Sjögren, Simon Rasmussen, Morten Rasmussen, Jesper Stenderup, Peter B. Damgaard, Hannes Schroeder, Torbjörn Ahlström, Lasse Vinner, Anna-Sapfo Malaspinas, Ashot Margaryan, Tom Higham, David Chivall, Niels Lynnerup, Lise Harvig, Justyna Baron, Philippe Della Casa, Paweł Dąbrowski, Paul R. Duffy, Alexander V. Ebel, Andrey Epimakhov, Karin Frei, Mirosław Furmanek, Tomasz Gralak et al. "Population genomics of Bronze Age Eurasia" doi:10.1038/nature14507

    Data Used

    Monday, February 16, 2015

    Health Variant Report


    The tool is not intended to provide any health interpretations. If a result appears to need medical attention, please see a physician and have proper tests done to confirm it.


    Variant Report reads risk information from a configurable file, 'app.conf', and generates a report of health-related information from autosomal files. Currently, 23andMe, FTDNA and Ancestry files are supported.

    Prerequisites: Microsoft .Net Framework 4.0

    Usage: Edit app.conf in Notepad (or any text editor), add the required risk alleles and generate the report. app.conf is a comma-separated file with the following fields: Category, Gene, RSID, Risk Allele, Notes. Any line starting with # is a comment.
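
To make the file layout concrete, here is a minimal Python sketch of a reader for such a file. It is not the tool's own parser (the tool is a .NET application). The names `RiskEntry` and `parse_app_conf` are hypothetical, and two behaviours are assumptions: commas inside Notes are kept as part of the note, and a missing Notes field is treated as empty.

```python
from typing import NamedTuple


class RiskEntry(NamedTuple):
    category: str
    gene: str
    rsid: str
    risk_allele: str
    notes: str


def parse_app_conf(text: str) -> list[RiskEntry]:
    """Parse 'Category, Gene, RSID, Risk Allele, Notes' lines, skipping # comments."""
    entries = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank line or comment
        # Split at most 4 times so that commas inside Notes are preserved (assumption).
        parts = [p.strip() for p in stripped.split(",", 4)]
        if len(parts) < 4:
            raise ValueError(f"too few fields: {line!r}")
        if len(parts) == 4:
            parts.append("")  # Notes left out (assumption: treated as empty)
        entries.append(RiskEntry(*parts))
    return entries
```

For example, a placeholder line such as `ExampleCategory,GENE1,rs0000001,A,placeholder note` yields one entry whose `rsid` is `rs0000001`.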

    Screenshot:





    Download : Variant Report.zip (111 KB)

    Source Code at GitHub.

    Change Log
    Version 1.0
    • Initial Release.

    Wednesday, January 28, 2015

    The Imputer

    If you have had your autosomal DNA tested and want to know the genotypes of all possible untested SNPs, this is the tool. The Imputer accurately predicts all genotypes of untested SNPs. The tool supports FTDNA, 23andMe and Ancestry autosomal build 37 files.

    Usage: Select the autosomal DNA input file, enter the output filename and select how unidentified alleles should be handled, then click 'Impute'.

    Prerequisites: Microsoft .Net Framework 4.0

    Screenshot:

    Download : The Imputer.zip (8.31 GB)

    To avoid download failures on large files from Google Drive, make sure you use a download manager. Please refer to the post 'Downloading large files from Google Drive using Download Manager' for a quick tutorial.

    Source Code at GitHub

    Change Log
      Version 1.0
      • Initial Release.

      Tuesday, January 27, 2015

      Assembly Converter

      Converts human genome coordinates from one assembly to another on raw autosomal DNA files. The tool supports FTDNA, Ancestry and 23andMe autosomal files. This tool replaces the obsolete build converter.

      Usage: Select the autosomal DNA input file and the appropriate LiftOver chain file, enter the output filename, then click 'Convert'. The output will be in exactly the same format as the input, with only the coordinates changed.

      Prerequisites: Microsoft .Net Framework 4.0

      Screenshot:

      Download : Assembly Converter.exe (385 KB)

      Conversion LiftOver Chain Files
      Hg38/GRCh38 to Hg19/GRCh37: hg38ToHg19.over.chain.gz (1.2 MB)
      Hg19/GRCh37 to Hg38/GRCh38: hg19ToHg38.over.chain.gz (222 KB)
      Hg19/GRCh37 to Hg18: hg19ToHg18.over.chain.gz (221 KB)
      Hg18 to Hg38/GRCh38: hg18ToHg38.over.chain.gz (336 KB)
      Hg18 to Hg19/GRCh37: hg18ToHg19.over.chain.gz (137 KB)
      The complete list of chain files for all human genome assemblies can be downloaded from here.
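
For readers curious how a chain file drives the conversion, the sketch below maps a single position through chain text that has already been decompressed. It is a simplified illustration and not the Assembly Converter's code: the function name `lift_position` is hypothetical, positions are 0-based as in the chain format (subtract 1 from the 1-based positions found in raw data files), and reverse-strand chains are skipped for brevity.

```python
from typing import Optional, Tuple


def lift_position(chain_text: str, chrom: str, pos: int) -> Optional[Tuple[str, int]]:
    """Map a 0-based position to the new assembly, or return None if it is unmapped."""
    usable = False
    t_pos = q_pos = 0
    q_name = ""
    for line in chain_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "chain":
            # Header: chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id
            t_name, t_strand, t_pos = fields[2], fields[4], int(fields[5])
            q_name, q_strand, q_pos = fields[7], fields[9], int(fields[10])
            # Simplification: only forward-strand chains on the requested chromosome.
            usable = t_name == chrom and t_strand == "+" and q_strand == "+"
            continue
        if not usable:
            continue
        size = int(fields[0])  # length of an ungapped aligned block
        if t_pos <= pos < t_pos + size:
            return q_name, q_pos + (pos - t_pos)
        if len(fields) == 3:
            # "size dt dq": skip the block plus the gap on each assembly.
            t_pos += size + int(fields[1])
            q_pos += size + int(fields[2])
    return None
```

A position that falls in a gap between aligned blocks has no counterpart in the target assembly, which is why the function can return `None`.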

      Source Code at GitHub

      Change Log
        Version 1.0
        • Initial Release.

        Downloading large files from Google Drive using Download Manager

        Downloading large files is always a pain. It takes a long time, and it is worse when the connection drops or the download fails just before it completes. I rely mostly on Google Drive and sometimes upload large files, so I thought it would be appropriate to write a post explaining how to download from Google Drive without issues using a download manager. This post assumes you are using Google Chrome, and the screenshots are from it. As an example, I will show how to download SNP Prophet.zip, which is 13 GB, a really massive file.

        Download the file normally

        Click the link (or follow it), opening it in an incognito window (or make sure you are not signed in to Google/Gmail in that browser), and download the file normally from the browser.




        Cancel the download

        Now, cancel the download from the browser. You can cancel it from the small context menu on the download display. Once it is cancelled, click 'Show all downloads' to go to the downloads page.



        Copy the link

        On the downloads page, right-click the link and select 'Copy link address'.


        Paste the link on Download Manager

        You can paste the copied link URL into any download manager. Here, I used Free Download Manager. Make sure 'Save As' is changed to the required filename.




        Download using Download Manager

        Now, the file is being downloaded by a download manager.



        The main advantage of using a download manager is that if a network interruption happens, the download resumes where it stopped instead of starting again from the beginning. This helps avoid download failures and saves bandwidth when downloading large files.
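
Resuming generally relies on the HTTP `Range` request header. The Python sketch below shows the idea using only the standard library. It is an illustration, not how any particular download manager is implemented: the function names are hypothetical, and it assumes the server honours range requests (status 206).

```python
import os
import urllib.request


def resume_headers(partial_path: str) -> dict[str, str]:
    """Ask only for the bytes that are not on disk yet."""
    done = os.path.getsize(partial_path) if os.path.exists(partial_path) else 0
    return {"Range": f"bytes={done}-"} if done else {}


def download(url: str, path: str, chunk: int = 1 << 20) -> None:
    """Download url to path, continuing a partial file when the server allows it."""
    request = urllib.request.Request(url, headers=resume_headers(path))
    with urllib.request.urlopen(request) as response:
        # 206 means the server honoured the Range header; anything else restarts from byte 0.
        mode = "ab" if response.status == 206 else "wb"
        with open(path, mode) as out:
            while True:
                block = response.read(chunk)
                if not block:
                    break
                out.write(block)
```

Calling `download` again after an interruption sends the size of the partial file as the starting offset, so only the missing bytes are transferred.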

        Wednesday, January 21, 2015

        SNP Prophet

        If you have had your autosomal DNA tested and want to know the genotype of an untested SNP, this is the tool. SNP Prophet predicts your genotype for an untested SNP using your autosomal DNA. It has an offline version and an online version. The offline version is huge but does not require an internet connection, nor does it depend on 3rd party services being available, since everything it requires is present on your computer. The online version is much smaller in size but depends entirely on 3rd party APIs (OpenSNP's JSON and DAS) and their service availability. The tool supports FTDNA, Ancestry and 23andMe files.

        Usage: Select the autosomal DNA file, enter the SNP whose genotype you want to know and click 'Find GenoType'. The whole process, both online and offline, can sometimes take 5 to 10 minutes.

        Prerequisites: Microsoft .Net Framework 4.0

        Screenshot:

        Download :
        To avoid download failures on large files from Google Drive, make sure you use a download manager. Please refer to the post 'Downloading large files from Google Drive using Download Manager' for a quick tutorial.

        Source Code:

        Change Log
          Version 1.0
          • Initial Release.

          Tyrolean Ancient DNA

          The Tyrolean Iceman, a 5300-year-old Copper Age individual, was discovered in 1991 on the Tisenjoch Pass in the Italian part of the Oetztal Alps. The authors sequenced the complete genome of the Iceman. They mention in the paper that they were able to extract 125,729 SNPs from all samples. I was able to convert only 2 samples, ERR107308 and ERR107309, due to technical limitations.

          Download: 
          Reference:
          Keller, Andreas, et al. "New insights into the Tyrolean Iceman's origin and phenotype as inferred by whole-genome sequencing." Nature communications 3 (2012): 698.

          Data Used

          Monday, January 19, 2015

          GEDmatch Plus for Google Chrome


          Note: The Chrome extension accesses the GEDmatch website (gedmatch.com) to get details, but GEDmatch neither supports nor promotes its use in any way.


          GEDmatch Plus is a Chrome browser extension that adds functionality to the GEDmatch website (www.gedmatch.com), such as themes and caching.

          The extension adds a number of features to GEDmatch:

          • Several Themes, Styles etc
          • Caching of 1-to-Many, 1-to-1 Autosomal and 1-to-1 X-DNA results, to reduce server load and boost performance. Cached entries expire automatically after 7 days.
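
The 7-day expiry can be pictured with a small time-to-live cache. The sketch below is written in Python for illustration only; the extension itself runs in the browser, and the class name `ExpiringCache` and its methods are hypothetical rather than taken from the extension's source.

```python
import time

SEVEN_DAYS = 7 * 24 * 60 * 60  # expiry period in seconds


class ExpiringCache:
    """Keep values together with their storage time and drop them once they are too old."""

    def __init__(self, ttl_seconds=SEVEN_DAYS, clock=time.time):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the expiry can be tested without waiting
        self._store = {}

    def put(self, key, value):
        self._store[key] = (self._clock(), value)

    def get(self, key):
        item = self._store.get(key)
        if item is None:
            return None
        stored_at, value = item
        if self._clock() - stored_at >= self._ttl:
            del self._store[key]  # expired: behave as if it was never cached
            return None
        return value
```

A lookup that returns `None` would then trigger a fresh request to the website, and the new result would be stored again with `put`.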


          Prerequisites: Google Chrome

          Screenshots:


           

           

           


          Usage: Install the add-on, go to the Options page and select the theme you want. Caching is enabled by default.

          Install: GEDmatch Plus Chrome Extension

          Source Code at GitHub.

          Change Log :1.0.0
          • Initial Release.

          Sunday, January 11, 2015

          Segment Compatibility

          If you want to check the segment compatibility of different kit versions from DNA testing companies, or if you match someone on a particular segment at lower thresholds and want to check whether that result is due to different kit versions, then this is the tool. It supports FTDNA's Affymetrix and Illumina, 23andMe's V2, V3 and V4, and Ancestry files.

          Usage: Select your kit version and your match's kit version, and enter the segment details: chromosome, start position and end position. The positions should be in build 37/hg19. Then click the 'Verify' button. A message box will pop up to report pass or fail for the segment, based on the requirement of at least 100 SNPs per Mb for the specified segment. The grid below displays the available SNPs in each kit version, and further below that you will find the overlapping SNP count. You can also download detailed information on the specified segment.
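
The pass/fail rule can be sketched in a few lines of Python. This is an illustration rather than the tool's code: the function names are hypothetical, and it assumes the 100 SNPs per Mb threshold is applied to the SNPs that both kit versions have in common within the segment.

```python
def snps_per_mb(positions, start, end):
    """SNP density of the positions that fall inside [start, end], per million bases."""
    if end <= start:
        raise ValueError("end must be greater than start")
    in_segment = sum(1 for p in positions if start <= p <= end)
    return in_segment / ((end - start) / 1_000_000)


def segment_passes(kit_a_positions, kit_b_positions, start, end, min_per_mb=100):
    """Pass when the SNPs shared by both kits are dense enough on the segment."""
    shared = set(kit_a_positions) & set(kit_b_positions)
    return snps_per_mb(shared, start, end) >= min_per_mb
```

The inputs are the build 37 positions tested by each kit version on the chosen chromosome.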

          Prerequisites: Microsoft .Net Framework 4.0

          Screenshot:


          Download : Segment Compatibility.exe (10.2 MB)

          Source Code at GitHub.

          Change Log
            Version 1.0
            • Initial Release.

            Thursday, January 1, 2015

            Sub Project: Autosomal Tree Visualizer

            This is a sub-project for Autosomal Pedigree Creator.

            This tool lets you visualize triangulated segments on autosomal pedigree trees. It is very similar to Ancient Ancestry and has all of its features, except that instead of triangulated segments from ancient kits, it contains triangulated segments from the kits you select.

            Prerequisites:
            Usage:

            1. Copy the downloaded executable into the Autosomal Pedigree Creator root folder.
            2. Run the file; it will prompt you to select the kits folder. This is the same folder used to generate the pedigree with Autosomal Pedigree Creator.
            3. Wait until the kits are loaded. You can see the progress in the status bar.
            4. Once the kits are loaded, you can open an entirely different kit (a potential match) and see how its segments match the kits in the pedigree.

            Screenshot:

             

            Download Autosomal Tree Visualizer.exe (686 KB)

            Source Code at GitHub.

            License: MIT License.