Wednesday, July 2, 2014

Genetic Genealogy Kit

GGK is a kit management, analysis and matching tool for Autosomal, X, Y and Mitochondrial DNA. It supports build 37 autosomal files.

Prerequisites:
Usage / Tips: On the first run, 'Checking Integrity of DB..' will take a while with the large download that includes reference populations. Subsequent launches will not take as long. Open the application, click 'new' and add a kit by drag and drop. To add autosomal and X data (e.g., for FTDNA), select the two files, then drag both and drop them into the grid view. After adding the autosomal data, add the ySNPs and mtDNA details. Once all kits are added, make sure to process them. Autosomal processing takes a long time, so run it overnight. When a kit is phased, double-click a segment in the one-to-one or one-to-many matches to open the phased-segment-analyzer. Deleting a kit can sometimes be very slow because it also deletes all comparison data with every other kit. So it is best to disable a kit rather than delete it, unless you want to reclaim space.

Screenshot:







Download GGK v1.2.zip (390 MB). If you don't require admixture using reference populations, you can download GGK v1.2 Reduced.zip (10.7 MB).

Version Upgrades: Just copy the ggk.db into the new version. You don't need to download the larger version every time. Download the reduced version and overwrite its ggk.db with the one from your larger version. The binary executables are exactly the same for both versions. The only difference is ggk.db, which contains the reference populations.
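
The upgrade step above can also be scripted. The following is only a sketch, assuming both versions are extracted into ordinary folders; the function name, the folder arguments and the '.bak' backup are my own illustration and not part of GGK.

```python
import shutil
from pathlib import Path


def carry_over_database(old_dir, new_dir, db_name="ggk.db"):
    """Copy the existing database into a freshly extracted version folder."""
    source = Path(old_dir) / db_name
    target = Path(new_dir) / db_name
    if not source.is_file():
        raise FileNotFoundError(f"No {db_name} found in {old_dir}")
    if target.exists():
        # Keep the database shipped with the new version as ggk.db.bak.
        shutil.copy2(target, target.with_name(target.name + ".bak"))
    # Overwrite the new version's database with the old (larger) one.
    shutil.copy2(source, target)
    return target
```

Close GGK before copying so the database file is not in use.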

Documentation: Genetic Genealogy Kit.pdf (2.46 MB)

Source Code at GitHub.

License: MIT License.

References:
Change Log
Version 1.2
  • Bug fixes. Better support for AncestryDNA.
Version 1.1
  • Import/Export bug solved. A few bug fixes.
Version 1.0
  • Initial release.

Thursday, June 19, 2014

Autosomal Segment Analyzer

Autosomal Segment Analyzer lets you analyze the individual matching segments of a comparison between two autosomal DNA files and drill down to each SNP.

Prerequisites: Microsoft .Net Framework 4.0

Usage: Just execute it, click Open Files, select two files (FTDNA and/or 23andMe format) and click OK. If you wish to change preferences, click settings and preferences.
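
For readers curious about what a matching segment is, here is a rough sketch of the idea in Python. It assumes the common "half-identical" rule (two kits match at a SNP when they share at least one allele) and a made-up minimum run length. It is not the code used by Autosomal Segment Analyzer; the function names and the threshold are hypothetical.

```python
def shares_allele(genotype_a, genotype_b):
    """True when two genotypes such as 'AG' and 'GG' have an allele in common."""
    return bool(set(genotype_a) & set(genotype_b))


def matching_segments(kit_a, kit_b, min_snps=3):
    """Return (start, end, snp_count) for runs of consecutive matching SNPs.

    kit_a and kit_b map a position on one chromosome to a genotype string.
    min_snps is an illustrative threshold, not a value taken from the tool.
    """
    segments = []
    run = []
    for position in sorted(set(kit_a) & set(kit_b)):
        a, b = kit_a[position], kit_b[position]
        if "-" in a or "-" in b:
            continue  # no-call: skip it without breaking the current run
        if shares_allele(a, b):
            run.append(position)
            continue
        # A mismatch ends the current run.
        if len(run) >= min_snps:
            segments.append((run[0], run[-1], len(run)))
        run = []
    if len(run) >= min_snps:
        segments.append((run[0], run[-1], len(run)))
    return segments


kit_a = {100: "AA", 200: "AG", 300: "CC", 400: "TT", 500: "GG", 600: "AC"}
kit_b = {100: "AG", 200: "GG", 300: "CT", 400: "CC", 500: "GG", 600: "CC"}
print(matching_segments(kit_a, kit_b))  # [(100, 300, 3)]
```

The segment length in Mb would then be (end - start) / 1,000,000.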

Screenshot:


Download Autosomal Segment Analyzer.exe (257 KB)

Source Code at GitHub.

Change Log
Version 2.0
  • Several bug fixes. Removed cM and switched to Mb only so that it works with new SNP positions.
Version 1.1
  • Unhandled Exception: System.IndexOutOfRangeException - Fixed.
Version 1.0
  • Initial release.

Friday, May 2, 2014

BAM Analysis Kit

BAM Analysis Kit is a bundle of genome tools that analyses a .BAM raw data file and produces output in file formats similar to those of genetic genealogy companies. The goal of this kit is to enable end users to analyse their own genome on their personal computer.

The tool provides the following output:
  • Complete_Autosomal.csv - Autosomal SNPs in RSID / Position / Genotype format.
  • Complete_mtDNA.csv - mtDNA with RSIDs.
  • mtDNA.fasta - mtDNA in FASTA format.
  • Complete_X.csv - X-Chromosome SNPs in RSID / Position / Genotype format.
  • Complete_Y.csv - Y-Chromosome SNPs in RSID / Position / Genotype format.
  • Complete_SNPs_y.csv - Y-SNPs in ISOGG Nomenclature with Position / Derived / Reference / Genotype.
  • RSRS_mtDNA.txt - mtDNA mutations in RSRS format.
  • Variants_Y.csv - Y-Chromosome Variants in Position / Reference / Genotype
  • Y_SNPs.txt - Y-SNPs in ISOGG Nomenclature separated by comma.
  • y_str.csv - Y-STR Markers
  • telomere.txt - Telomere Length
  • telseq.out - Raw output format from telseq project. Provided for advanced use.
Prerequisites: 
Usage:

Extract the download and click 'BAM Analysis Kit.exe'. Select the .BAM file and click 'Start Analysis'. A command prompt will then open automatically and start executing a series of commands.



After a few minutes to several hours (or even days depending on your BAM file input and computer speed), the output will be available inside a subfolder called 'out'.

Download:  BAM Analysis Kit (64 bit).zip (1.42 GB)

Source Code: GitHub

License: The download bundles the following software for easy usage. So, if you are using this tool for non-commercial and/or personal use, you should be alright.
References:
  • Li H.*, Handsaker B.*, Wysoker A., Fennell T., Ruan J., Homer N., Marth G., Abecasis G., Durbin R. and 1000 Genome Project Data Processing Subgroup (2009) The Sequence alignment/map (SAM) format and SAMtools. Bioinformatics, 25, 2078-9. [PMID: 19505943]
  • McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, DePristo MA (2010). The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20:1297-303. [Pubmed]
  • Ding, Zhihao, Massimo Mangino, Abraham Aviv, Tim Spector, and Richard Durbin. "Estimating telomere length from whole genome sequence data." Nucleic acids research (2014): gku181.
  • Gymrek M, Golan D, Rosset S, & Erlich Y. lobSTR: A short tandem repeat profiler for personal genomes. Genome Research. 2012 April 22.
Change Log :1.2
  • UCSC reference positions are zero based. This caused an offset of 1 position in output result - bug fixed.
Change Log :1.1
  • Some files weren't created if mtDNA is not selected - bug fixed.
Change Log :1.0
  • Works on all BAM files with build 37 positions
  • Extracts SNPs from Autosomal DNA, X-DNA, Y-DNA and mtDNA.
  • Provides mtDNA FASTA.
  • Auto-converts Yoruba references in mtDNA and provides RSRS values.
  • Provides Y-SNPs in ISOGG Nomenclature.
  • Provides Y-STR markers.
  • Calculates Telomere Length.
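
The off-by-one fix in version 1.2 comes down to coordinate conventions: UCSC tables store zero-based start positions, while SNP positions in genealogy files are normally one-based. The sketch below only illustrates the conversion. The function names are hypothetical and are not taken from the kit's source.

```python
def to_one_based(zero_based_position):
    """Convert a zero-based position to the one-based position used in reports."""
    return zero_based_position + 1


def ucsc_interval_to_positions(start, end):
    """Expand a zero-based, half-open interval into one-based positions.

    In this convention the interval (99, 100) covers the single base that a
    one-based file would call position 100.
    """
    return list(range(start + 1, end + 1))


print(to_one_based(99))                     # 100
print(ucsc_interval_to_positions(99, 100))  # [100]
print(ucsc_interval_to_positions(0, 3))     # [1, 2, 3]
```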

Monday, April 28, 2014

Big-Y Telomere


This tool is replaced by BAM Analysis Kit with more advanced features.


A telomere is a region of repetitive nucleotide sequences at each end of a chromatid, which protects the end of the chromosome from deterioration or from fusion with neighbouring chromosomes. Longer telomeres are generally associated with a longer life.

I did a small experiment to see if I could extract the telomere length information from a BigY BAM file, and indeed I was able to. So I made a small tool using telseq on Windows via Cygwin so that anyone can use it.

The tool provides the following output,
  • telomere.txt - Information on telomere length. 
Supported BAM files:
  • Big-Y BAM
  • Any BAM file with UCSC convention (hg1x) ordering for human reference genome.
Please let me know if any other BAM files are supported and/or if any of the above is not supported.

Prerequisites: 
Usage:

Extract the download and click 'BigY Telomere UI'. Select the .BAM file and click 'Start Analysis'.



After clicking 'Start Analysis', a command prompt will automatically open and start executing a few commands.



After a few minutes (depending on your computer speed), the output will be available inside a subfolder called 'out', and the result file will automatically open in Notepad. The estimated telomere length is in kb.

Download:  BigY Telomere (64 bit).zip (20 MB)

License: The download bundles the following software for easy usage.
References:
  • Ding, Zhihao, Massimo Mangino, Abraham Aviv, Tim Spector, and Richard Durbin. "Estimating telomere length from whole genome sequence data." Nucleic acids research (2014): gku181.
Change Log :1.1
  • Some modifications for compatibility.
Change Log :1.0
  • Initial Release.

Saturday, April 26, 2014

Big-Y BAM STR Analysis Tool


This tool is replaced by BAM Analysis Kit with more advanced features.


Similar to Big-Y BAM SNP Analysis Tool where you were able to analyse SNPs, this tool is for STRs.

The tool provides the following output,
  • y_str.csv - contains all identified STRs in BigY BAM file. 
Supported BAM files:
  • Big-Y BAM
  • Any BAM file with UCSC convention (hg1x) ordering for human reference genome.
Please let me know if any other BAM files are supported and/or if any of the above is not supported.

Prerequisites: 
Usage:

Extract the download and click 'Big-Y BAM STR Analysis UI.exe'. Select the .BAM file and click 'Start Analysis'.



After clicking 'Start Analysis', a command prompt will automatically open and start executing a series of commands.

After nearly an hour (depending on your computer speed), the output will be available inside a subfolder called 'out'.

Download:  Big-Y BAM STR Analysis (64 bit).zip (97.3 MB)

Source Code: (Inside src folder)

License: The download bundles the following software for easy usage. So, if you are using this tool for non-commercial and/or personal use, you should be alright.
References:
  • Li H.*, Handsaker B.*, Wysoker A., Fennell T., Ruan J., Homer N., Marth G., Abecasis G., Durbin R. and 1000 Genome Project Data Processing Subgroup (2009) The Sequence alignment/map (SAM) format and SAMtools. Bioinformatics, 25, 2078-9. [PMID: 19505943]
  • Gymrek M, Golan D, Rosset S, & Erlich Y. lobSTR: A short tandem repeat profiler for personal genomes. Genome Research. 2012 April 22.
Change Log :1.0
  • Initial Release.

Wednesday, April 23, 2014

23andMe to FASTA

If you have a 23andMe raw data file which contains mt-DNA data with refSNPs/RSID but not in FASTA file format, this tool will help you.

Prerequisites: Microsoft .Net Framework 4.0

Usage: Open the 23andMe raw data and save the mtDNA FASTA file. Once saved, you can use FASTA to RSRS (With Visualizer) to get RSRS markers and visualize the mutations from RSRS. You can also use James Lick's mtDNA Haplogroup analysis.

Screenshot:


Download : 23andMe to FASTA.exe (298 KB)

Source Code at GitHub.

Assumption: 23andMe mtDNA raw data uses rCRS as reference (and positions for v2 and v3) and covers all variations from rCRS. If my assumption is wrong, please do alert me and I can fix the tool.
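
Under that assumption, the conversion can be sketched roughly as follows. This is an illustration and not the tool's actual source. It relies on the 23andMe raw data layout (tab-separated rsid, chromosome, position and genotype columns, with '#' comment lines), uses a short made-up string in place of the real rCRS sequence, and handles substitutions only. The function names and the sample rows are hypothetical.

```python
def read_mt_calls(lines):
    """Collect {position: base} from the mtDNA rows of a 23andMe raw data file."""
    calls = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue
        chromosome, position, genotype = fields[1], fields[2], fields[3]
        # Keep only single-base mtDNA calls; no-calls ('--') are skipped.
        if chromosome == "MT" and genotype in {"A", "C", "G", "T"}:
            calls[int(position)] = genotype
    return calls


def apply_calls(reference, calls):
    """Substitute each called base into a copy of the reference (1-based)."""
    bases = list(reference)
    for position, base in calls.items():
        if 1 <= position <= len(bases):
            bases[position - 1] = base
    return "".join(bases)


def to_fasta(name, sequence, width=60):
    """Format a sequence as a FASTA record with fixed-width lines."""
    rows = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + name + "\n" + "\n".join(rows) + "\n"


toy_reference = "ACGTACGTAC"  # stand-in for rCRS, for illustration only
raw = [
    "# rsid\tchromosome\tposition\tgenotype",
    "rs1\t1\t12345\tAG",
    "i1\tMT\t3\tT",
    "i2\tMT\t7\t--",
    "i3\tMT\t10\tG",
]
sequence = apply_calls(toy_reference, read_mt_calls(raw))
print(to_fasta("kit", sequence, width=4), end="")
```

With the toy data above, positions 3 and 10 are replaced and the no-call at position 7 keeps the reference base.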

Change Log
Version 1.0
  • Initial Release.

Wednesday, April 16, 2014

Big-Y BAM SNP Analysis Tool


This tool is replaced by BAM Analysis Kit with more advanced features.


FamilyTreeDNA provides Big-Y raw data as BAM files to advanced users on request. But many people are not advanced users. Since the raw data is around 500 MB to 1 GB in size and takes considerable effort to analyse, it has certainly been for advanced users only. But not any more! With this tool, anyone can extract all the information from a Big-Y BAM file and interpret it themselves using the tools available on this website or from any third party. Also note that, although the tool's name suggests it is Big-Y specific, it works with any .BAM file. If you are looking for a tool for non-BigY BAM files, please try BAM SNP Analysis Kit.

The tool provides the following output,
  • complete_autosomal.csv - contains all identified SNPs in BigY BAM file.
  • complete_mtdna.fasta - mtDNA found in BigY in FASTA format.
  • complete_x.csv - contains all identified X-DNA SNPs in BigY BAM file.
  • complete_y.csv - contains all identified SNPs in 23andMe format with RSIDs.
  • ftdna_autosomal.csv - contains only SNPs tested by FTDNA that are found in BigY BAM file.
  • ftdna_mtdna.fasta - mtDNA found in BigY in FASTA format (duplicate output - same as complete_mtdna.fasta).
  • ftdna_x.csv - contains only X-DNA SNPs tested by FTDNA that are found in BigY BAM file.
  • ftdna_y.csv - contains all identified Y-SNPs in FTDNA table format.
  • ftdna_ysnps.txt - contains all identified Y-SNPs separated by comma.
  • variants_y.csv - Y-DNA variants specific to you. 
Supported BAM files:
  • Big-Y BAM
  • Any BAM file with UCSC convention (hg1x) ordering for human reference genome.
Please let me know if any other BAM files are supported and/or if any of the above is not supported.

Prerequisites: 
Usage:

Extract the download and click 'Big-Y BAM SNP Analysis UI.exe'. Select the .BAM file and click 'Start Analysis'.


After clicking 'Start Analysis', a command prompt will automatically open and start executing a series of commands.



After around 4-8 hours (depending on your computer speed), the output will be available inside a subfolder called 'out'. Also ignore any error with file-not-found, esp. on the last completion screen.

Download:  Big-Y BAM SNP Analysis (64 bit).zip (1.4 GB)

Patch: patch.zip (125 KB) - upgrades all previous versions to the latest. Just extract it and overwrite the older version.

Source Code at GitHub.

License: The download bundles the following software for easy usage. So, if you are using this tool for non-commercial and/or personal use, you should be alright.
Human Genome Reference: ucsc.hg19.fasta (Build 37)

References:
  • Li H.*, Handsaker B.*, Wysoker A., Fennell T., Ruan J., Homer N., Marth G., Abecasis G., Durbin R. and 1000 Genome Project Data Processing Subgroup (2009) The Sequence alignment/map (SAM) format and SAMtools. Bioinformatics, 25, 2078-9. [PMID: 19505943]
  • McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, DePristo MA (2010). The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20:1297-303. [Pubmed]
Change Log :1.5
  • Bug-fix in Y-DNA Variants.
Change Log :1.4
  • Includes Y-DNA Variants.
Change Log :1.3
  • Genotype in Y-DNA gets populated properly.
Change Log :1.2
  • Forms UI error handling.
  • A fallback console batch file included.
  • Source for UI included.
Change Log :1.1
  • Reduced the size of download by removing unnecessary files.
  • Streamlined everything to be 64-bit.
  • Changed the user interface from console to windows forms.
  • Optimized to use 90% of available memory in the system.
Change Log :1.0
  • Initial Release.

Saturday, April 12, 2014

ISOGG Y-Tree AddOn for Google Chrome

ISOGG Y-Tree AddOn is a Chrome browser extension that adds the ability to plot your Y-SNP results on the ISOGG Y-Tree webpage (isogg.org/tree). Please note that this AddOn replaces the ISOGG tree functionality of Big-Y AddOn.

The extension adds a number of features to ISOGG Y-Tree based on your Y-DNA results.
  • Allows up to 10 kits.
  • Highlights Positive and Negative SNPs in ISOGG Y-Tree.
Note: To use this add-on, you must have purchased a Y-DNA test from one of the DNA testing companies for genealogy purposes and received the results as Y-SNPs. The ISOGG Tree is from the International Society of Genetic Genealogy (www.isogg.org). Once the AddOn is installed, go to Options and enter your Y-SNPs.

Prerequisites: Google Chrome

Note: Previously, FTDNA listed all SNPs below the tree, which made it easy to copy and paste them into this AddOn, but the list now seems to include only a few. However, the CSV download does contain all of them. If you want the SNPs from the BigY CSV download in the format supported by this AddOn, you can use Merge-Y to add the file and export the SNPs. The exported SNPs can then be pasted into the AddOn.

Usage: Install the AddOn, go to the Options page and enter your Y-SNPs. Then go to isogg.org/tree to see the entered SNPs plotted.



Install: ISOGG AddOn Chrome AddOn

Source Code at GitHub.

Misc Info: Fast mode is an important feature that speeds up plotting for a better user experience. It works by giving the AddOn advance knowledge of which SNPs are in the tree. E.g., Big-Y may report 25000+ SNPs, but only about a quarter of them are actually found in the Y-Tree. Instead of searching the ISOGG Y-Tree for all 25000+ SNPs, which is very inefficient, the AddOn ignores every SNP in the Big-Y results that isn't in the Y-Tree. So only ~5000+ SNPs are searched against the SNPs in the Y-Tree, which improves the overall user experience. If you are not sure what to do, just leave it ticked.
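
The filtering idea behind fast mode can be sketched in a few lines. The AddOn itself is a Chrome extension, so the Python below is only an illustration of the logic. It assumes SNPs are entered as a comma-separated list with a trailing '+' or '-' for positive or negative; the function names and that input format are my assumptions here.

```python
def parse_snps(text):
    """Split a list such as 'M207+, P25-, L21+' into {name: '+' or '-'}."""
    results = {}
    for token in text.split(","):
        token = token.strip()
        if token and token[-1] in "+-":
            results[token[:-1]] = token[-1]
    return results


def snps_to_plot(kit_snps, tree_snps, fast_mode=True):
    """Return the SNPs that will be searched for in the tree.

    In fast mode, SNPs that are known not to be in the tree are dropped up
    front, so far fewer names have to be looked up on the page.
    """
    if not fast_mode:
        return dict(kit_snps)
    return {name: call for name, call in kit_snps.items() if name in tree_snps}


kit = parse_snps("M207+, P25-, Z9999+")
tree = {"M207", "P25", "M269"}  # names known to be in the tree
print(snps_to_plot(kit, tree))  # {'M207': '+', 'P25': '-'}
```

Keeping the tree names in a set makes each membership check cheap, which is where the saving comes from when the kit has tens of thousands of SNPs.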


If fast mode is unchecked, the plot interval is used. This lets you tune the experience to your needs. The plot interval is simply the time interval between one plot and the next. If fast mode is enabled, the plot interval is 0, which means the browser literally hangs until the plot is complete. If fast mode is not enabled, you have two options: give preference to plotting while still being able to watch the SNPs being plotted (by selecting 1 ms), or give preference to browsing the site without any inconvenience, regardless of whether plotting has finished (by selecting 600 ms).

Change Log :1.0.3
  • Kit selection disabled even after scanning had completed when fast mode is unchecked - bug fixed.
Change Log :1.0.2
  • Kit selection bug fixed.
Change Log :1.0.1
  • Icon changes.
Change Log :1.0.0
  • Initial Release.

Wednesday, April 9, 2014

YSNP Novel Variants

If you have downloaded the Novel Variants using Big-Y AddOn for Google Chrome, the download is exactly as shown in the table. However, it would be nice to see a mapping to the Y-SNPs and know whether each is positive or not. This tool does exactly that.

Prerequisites: Microsoft .Net Framework 4.0

Usage: Open the Novel Variants download and save the displayed table or Y-SNPs. After saving the Y-SNPs, you may want to look at them in ISOGG Y-Tree 2014.

Screenshot:

Download : YSNP Novel Variants.exe (782 KB)

Source Code at GitHub.

Change Log
Version 1.1
Version 1.0
  • Initial Release.
Note: Y-SNP data is taken from ISOGG, Dr Jim Wilson and ScotlandsDNA.

Tuesday, April 8, 2014

23andMe To YSNPs

If you have a 23andMe raw data file which contains Y-DNA data with refSNPs/RSID but not the names of Y-SNPs in ISOGG format, this tool will help you. Please note that only positions of build 37 are supported.

Prerequisites: Microsoft .Net Framework 4.0

Usage: Open the 23andMe raw data and save the Y-SNPs. After saving the Y-SNPs, you may want to look at ISOGG Y-Tree 2014.
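
As a rough idea of what the conversion involves, the sketch below looks up each Y-chromosome row of the raw data in a table of known SNP positions. It is not the tool's actual code. The lookup table, the SNP names and positions in it, and the rule that any non-derived call counts as negative are all made up for illustration.

```python
def ysnps_from_raw(lines, snp_index):
    """Turn 23andMe Y rows into a comma-separated list such as 'EX1+, EX2-'.

    snp_index maps a build 37 position to (snp_name, derived_allele).
    """
    calls = []
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4 or fields[1] != "Y":
            continue
        position, genotype = int(fields[2]), fields[3]
        entry = snp_index.get(position)
        # Skip positions without a known name, and no-calls.
        if entry is None or genotype not in {"A", "C", "G", "T"}:
            continue
        name, derived = entry
        calls.append(name + ("+" if genotype == derived else "-"))
    return ", ".join(calls)


# Hypothetical lookup table: these names and positions are not real SNPs.
example_index = {1000: ("EX1", "T"), 2000: ("EX2", "G")}
raw_rows = [
    "# rsid\tchromosome\tposition\tgenotype",
    "rs10\t1\t1000\tAA",
    "rs11\tY\t1000\tT",
    "rs12\tY\t2000\tA",
    "rs13\tY\t3000\tC",
]
print(ysnps_from_raw(raw_rows, example_index))  # EX1+, EX2-
```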

Screenshot:

Download : 23andMe To YSNPs.exe (782 KB)

Source Code at GitHub.

Change Log
Version 1.1
Version 1.0
  • Initial Release.
Note: Y-SNP data is taken from ISOGG, Dr Jim Wilson and ScotlandsDNA.