Wednesday, April 16, 2014

Big-Y BAM Analysis Tool

FamilyTreeDNA provides the raw data for Big-Y results as BAM files, on request, for advanced users. The raw data is around 500 MB to 1 GB in size and takes considerable effort to analyse, so until now it has been of little use to anyone else. Not any more! With this tool, anyone can extract all the information from a Big-Y BAM file and interpret it using the tools available on this website or from any third party. Note that although the name suggests it is Big-Y specific, the tool works with any .BAM file.

The tool produces the following output files:
  • complete_bigy_autosomal.csv - contains all identified SNPs in BigY BAM file.
  • complete_bigy_mtdna.fasta - mtDNA found in BigY in FASTA format.
  • complete_bigy_x.csv - contains all identified X-DNA SNPs in BigY BAM file.
  • complete_bigy_y.csv - contains all identified SNPs in 23andMe format with RSIDs.
  • ftdna_bigy_autosomal.csv - contains only SNPs tested by FTDNA that are found in BigY BAM file.
  • ftdna_bigy_mtdna.fasta - mtDNA found in BigY in FASTA format (duplicate output - same as complete_bigy_mtdna.fasta).
  • ftdna_bigy_x.csv - contains only X-DNA SNPs tested by FTDNA that are found in BigY BAM file.
  • ftdna_bigy_y.csv - contains all identified Y-SNPs in FTDNA table format.
  • ftdna_bigy_ysnps.txt - contains all identified Y-SNPs separated by comma.

Prerequisites: 
Usage:

Extract the download and click 'Big-Y BAM Analysis UI.exe'. Select the .BAM file and click 'Start Analysis'.


After clicking 'Start Analysis', a command prompt will open automatically and start executing a series of commands.



After around 4-8 hours (depending on your computer's speed), the output will be available in a subfolder called 'out'. You can ignore any file-not-found errors, especially on the final completion screen.

Download:  Big-Y BAM Analysis (64 bit).zip (1.4 GB)

Source Code at GitHub.

License: The download bundles the following software for ease of use. If you are using this tool for non-commercial and/or personal use, you should be all right.

References:
  • Li H.*, Handsaker B.*, Wysoker A., Fennell T., Ruan J., Homer N., Marth G., Abecasis G., Durbin R. and 1000 Genome Project Data Processing Subgroup (2009) The Sequence alignment/map (SAM) format and SAMtools. Bioinformatics, 25, 2078-9. [PMID: 19505943]
  • McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, DePristo MA (2010). The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20:1297-303. [Pubmed]
Change Log :1.2
  • Forms UI error handling.
  • A fallback console batch file is included.
  • Source for UI included.
Change Log :1.1
  • Reduced the size of download by removing unnecessary files.
  • Streamlined everything to be 64-bit.
  • Changed the user interface from console to windows forms.
  • Optimized to use 90% of available memory in the system.
Change Log :1.0
  • Initial Release.

Saturday, April 12, 2014

ISOGG Y-Tree AddOn for Google Chrome

ISOGG Y-Tree AddOn is a Chrome browser extension that plots your Y-SNP results on the ISOGG Y-Tree webpage (isogg.org/tree). Please note that this AddOn replaces the ISOGG tree functionality of the Big-Y AddOn.

The extension adds a number of features to ISOGG Y-Tree based on your Y-DNA results.
  • Allows up to 10 kits.
  • Highlights Positive and Negative SNPs in ISOGG Y-Tree.
Note: To use this add-on, you must have purchased a Y-DNA test from one of the DNA testing companies for genealogy purposes and have received the results as Y-SNPs. The ISOGG Tree is from the International Society of Genetic Genealogy (www.isogg.org). Once the AddOn is installed, go to the Options page and enter your Y-SNPs.

Prerequisites: Google Chrome

Usage: Install the addon, go to the Options page and enter your Y-SNPs. Then go to isogg.org/tree to see the entered SNPs plotted.



Install: ISOGG AddOn Chrome AddOn

Source Code at GitHub.

Misc Info: Fast mode is an important feature that accelerates plotting for a better user experience. It works by giving the AddOn prior knowledge of which SNPs are in the Tree. E.g., Big-Y may report 25000+ SNPs, but only about a quarter of them are actually found in the Y-Tree. Instead of searching the ISOGG Y-Tree for all 25000+ SNPs, which is very inefficient, the AddOn ignores all the SNPs from the Big-Y results that aren't in the Y-Tree. Only ~5000+ SNPs are then searched against the SNPs in the Y-Tree, which improves the overall user experience. If you are not sure what to do, just leave it ticked.
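The idea behind fast mode can be sketched in a few lines of Python. This is only an illustration and not the AddOn's actual code: the function name filter_to_tree, the trailing +/- notation and the names EX1 and EX2 are assumptions made up for the example.

```python
def filter_to_tree(result_snps, tree_snps):
    """Keep only the result SNPs that also occur in the tree (fast-mode idea)."""
    tree = set(tree_snps)  # set lookups avoid scanning the whole tree per SNP
    # Strip the trailing + or - before checking whether the name is in the tree.
    return [snp for snp in result_snps if snp.rstrip("+-") in tree]


# Made-up sample data: EX1 and EX2 stand for SNPs that are not in the tree.
results = ["M269+", "L21+", "EX1+", "M222-", "EX2-"]
tree_names = ["M269", "L21", "M222", "U106"]

plottable = filter_to_tree(results, tree_names)
print(plottable)
```

Only the SNPs that survive this filter need to be searched for and highlighted, which is where the time saving comes from.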


If fast mode is unchecked, the plot interval is used. This also lets you adjust the overall experience to your needs. The plot interval is simply the time interval between one plot and the next. If fast mode is enabled, the plot interval is 0, which means the browser literally hangs until the plot is complete. If fast mode is not enabled, you have two options: give preference to plotting while still being able to watch the SNPs being marked (select 1 ms), or give preference to browsing the site without any inconvenience, regardless of how far the plotting has progressed (select 600 ms).
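To get a feel for the trade-off, here is a rough Python sketch of plotting with a pause between SNPs. It is not the AddOn's code: plot_all and estimate_pause_seconds are hypothetical names, the sketch assumes one SNP per plot step, and time.sleep merely stands in for the timer a browser would use to stay responsive.

```python
import time


def estimate_pause_seconds(snp_count, interval_ms):
    """Total time spent waiting between plots, ignoring the plotting itself."""
    return max(snp_count - 1, 0) * interval_ms / 1000.0


def plot_all(snps, plot_one, interval_ms=0):
    """Call plot_one for each SNP, pausing interval_ms between calls."""
    for index, snp in enumerate(snps):
        if index > 0 and interval_ms > 0:
            time.sleep(interval_ms / 1000.0)  # stand-in for a browser timer
        plot_one(snp)


plotted = []
plot_all(["M269", "L21", "M222"], plotted.append, interval_ms=1)
print(plotted)
print(estimate_pause_seconds(25000, 1), estimate_pause_seconds(25000, 600))
```

Under these assumptions, 25000 SNPs at 1 ms add about 25 seconds of pauses in total, while 600 ms adds a little over 4 hours, so the larger interval only makes sense if you care more about browsing than about the plot finishing.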

Change Log :1.0.3
  • Bug fixed: kit selection stayed disabled even after scanning had completed when fast mode was unchecked.
Change Log :1.0.2
  • Kit selection bug fixed.
Change Log :1.0.1
  • Icon changes.
Change Log :1.0.0
  • Initial Release.

Wednesday, April 9, 2014

YSNP Novel Variants

If you have downloaded the Novel Variants using the Big-Y AddOn for Google Chrome, they are downloaded exactly as shown in the table. However, it would be nice to see how they map to named Y-SNPs and to know whether each one is positive or not. This tool does exactly that.

Prerequisites: Microsoft .Net Framework 4.0

Usage: Open the Novel Variants download and save the displayed table or the Y-SNPs. After saving the Y-SNPs, you may want to look at them in ISOGG Y-Tree 2014.

Screenshot:

Download : YSNP Novel Variants.exe (782 Kb)

Source Code at GitHub.

Change Log
Version 1.1
Version 1.0
  • Initial Release.
Note: Y-SNP data is taken from ISOGG, and from Dr Jim Wilson and ScotlandsDNA.

Tuesday, April 8, 2014

23andMe To YSNPs

If you have a 23andMe raw data file that contains Y-DNA data with refSNP IDs (RSIDs) but not the Y-SNP names in ISOGG format, this tool will help you. Please note that only build 37 positions are supported.
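As a rough illustration of the idea (not the tool's actual source), the Python sketch below reads 23andMe-style rows (tab-separated rsid, chromosome, position, genotype), keeps the Y rows and looks up each build 37 position in a table of SNP names. The table contents, the names EX1 and EX2, the positions and the function name are all made up for the example; a real table would come from a source such as ISOGG.

```python
import io

# Hypothetical lookup table: build 37 Y position -> (SNP name, derived allele).
# These rows are made up purely for illustration.
POSITION_TO_SNP = {
    1000001: ("EX1", "C"),
    1000002: ("EX2", "T"),
}


def ysnps_from_23andme(handle, table=POSITION_TO_SNP):
    """Return names like 'EX1+' or 'EX2-' for Y rows found in the table."""
    calls = []
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue  # skip the comment header and blank lines
        rsid, chrom, pos, genotype = line.strip().split("\t")[:4]
        if chrom != "Y" or genotype in ("-", "--"):
            continue  # not Y-DNA, or a no-call
        hit = table.get(int(pos))
        if hit:
            name, derived = hit
            # Positive if the called allele equals the derived allele.
            calls.append(name + ("+" if genotype[0] == derived else "-"))
    return calls


sample = io.StringIO(
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs0000001\t1\t1000001\tAG\n"
    "rs0000002\tY\t1000001\tC\n"
    "rs0000003\tY\t1000002\tG\n"
    "rs0000004\tY\t1000003\t--\n"
)
print(", ".join(ysnps_from_23andme(sample)))
```

Matching on position rather than on RSID is a design choice of this sketch; it is also why the genome build matters, since the same SNP sits at a different position in a different build.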

Prerequisites: Microsoft .Net Framework 4.0

Usage: Open the 23andMe raw data and save the Y-SNPs. After saving the Y-SNPs, you may want to look at them in ISOGG Y-Tree 2014.

Screenshot:

Download : 23andMe To YSNPs.exe (782 Kb)

Source Code at GitHub.

Change Log
Version 1.1
Version 1.0
  • Initial Release.
Note: Y-SNP data is taken from ISOGG, and from Dr Jim Wilson and ScotlandsDNA.

Tuesday, April 1, 2014

ISOGG Y-Tree Plotter



Note: This tool is replaced by ISOGG Y-Tree AddOn for Google Chrome. With ever changing Y-Tree, the best solution is to directly plot on the website itself. Hence, this tool is obsolete.

ISOGG Y-Tree 2014 is a desktop application for the ISOGG Y-Tree that allows you to mark SNPs and identify the haplogroup, and is optimized for Big Y results. This application replaces the earlier My Y-SNP Tree.

Prerequisites: Microsoft .Net Framework 4.0

Usage: Just double-click on it, paste your Y-SNPs into the text box provided and click 'Mark on Tree'.

Screenshot:

Download from Google Drive.

Source Code at GitHub.

Citation: International Society of Genetic Genealogy (2014). Y-DNA Haplogroup Tree 2014, Version:  9.35, Date: 10 March 2014, http://www.isogg.org/tree/ [Date of access: 10, Mar, 2014].

Change Log
Version 1.0
  • Y-SNP tree based on ISOGG's latest 2014 Y-Tree and optimized for the use with Big-Y. Initial release.

Merge Y

If you have taken different tests for Y-DNA, you can merge the Y-DNA results from different companies or different products. The tool currently supports Big-Y AddOn output, Geno 2.0 and 23andMe.

Prerequisites: Microsoft .Net Framework 4.0

Usage: Add the files and save the merged result. You can also save the SNPs alone.

If you are using the Big-Y AddOn, you might have downloaded two files, one for Known SNPs and one for Novel Variants. This tool helps to identify the new SNPs not mentioned in the Novel Variants download and merges them into one download. Along with these two, you can also include other DNA files such as Geno 2.0 and 23andMe to merge them and remove duplicates.
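The merging and de-duplication can be pictured with the small Python sketch below. It is a simplified illustration and not the tool's code: it assumes every input file has already been parsed into (position, name, call) rows, and the function name, positions and SNP names are made up.

```python
def merge_calls(*sources):
    """Merge (position, name, call) rows from several files, one row per position."""
    merged = {}
    for rows in sources:
        for position, name, call in rows:
            current = merged.get(position)
            # Keep the first row seen, but prefer a named SNP over an unnamed one.
            if current is None or (not current[1] and name):
                merged[position] = (position, name, call)
    return sorted(merged.values())


# Made-up rows: 'known' carries SNP names, 'novel' has positions only.
known = [(1000002, "EX2", "T"), (1000001, "EX1", "C")]
novel = [(1000001, "", "C"), (1000003, "", "G")]

merged = merge_calls(known, novel)
for row in merged:
    print(row)
```

In this sketch a position that appears in both files is kept once, with the named row winning, so the order in which the files are added does not change the result for these rows.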

Screenshot:
Merge-Y screenshot

Download : Merge Y.exe (480 Kb)

Source Code from GitHub.

Change Log
Version 1.2
  • Fixes the alignment when position doesn't exist in merged output.
Version 1.1
  • Fixed the hang during view merged or export. Now includes a progress bar to show the progress.
Version 1.0
  • Initial Release.
Note: I don't have sufficient raw files in different formats for the same person to test it effectively. Testing is purely done using simulated data and my own test results. Please let me know if you find any bugs.

Thursday, March 13, 2014

Big Y AddOn for Google Chrome

Note: The ISOGG functionality of this Big-Y AddOn has been moved to a separate ISOGG Y-Tree AddOn, which also lets you enter SNPs manually.

Enhances FTDNA's Big Y Results with many options! Big Y AddOn is a Chrome browser extension that adds functionality to Family Tree DNA's Big Y results and plots the results on up-to-date Y-DNA Trees.

The extension adds a number of features to BigY Results
  • Download Big Y SNPs.
  • Download the Known SNPs Table as CSV file which can be opened in Excel.
  • Download the Novel Variants Table as CSV file which can be opened in Excel.
  • Auto-Populates SNPs into MorleyDNA Y-Tree for easy analysis.
  • Highlights Positive and Negative SNPs in ISOGG Y-Tree.

Note: To use this add-on, you must have purchased Big-Y and received the results. Big-Y is a product from Family Tree DNA (familytreedna.com) that traces deep paternal ancestry using DNA tests. The ISOGG Tree is from the International Society of Genetic Genealogy (www.isogg.org). The Morley Y-DNA Project is a surname project maintained by Chris Morley (ytree.morleydna.com). Also note that the markers downloaded and plotted include only Medium and High confidence calls and do not include Unknown calls or no-calls.

Prerequisites: Google Chrome

Usage: Install the addon and go to the Big-Y results page. Additional links will become visible. Use the reset button on the FTDNA Big Y results page to clear all data stored by the extension.



Install: Big Y AddOn Chrome Extension

Source Code at GitHub.

Change Log: 1.0.11
  • ISOGG functionality moved into a separate AddOn.
Change Log: 1.0.10
  • Now works for isogg.org/tree without the 'www' prefix. (Please note this will prompt a new permission request).
Change Log: 1.0.9
  • Minor bug-fix (Doesn't mark/populate properly in tree if Known_SNPs was downloaded last)
Change Log: 1.0.8
  • Minor bug-fix (Bolded individual SNPs in ISOGG isn't marked - fixed)
Change Log: 1.0.7
  • Minor bug-fix (Novel Variants downloads only High - fixed. Now it includes all)
Change Log: 1.0.6
  • Minor bug-fix (List populated with an extra 'values' along with kits in Y-Tree page fixed)
Change Log :1.0.5
  • Select kits directly from Y-Trees. Group Administrator friendly.
Change Log :1.0.4
  • Caching improved esp. for Group Administrators who browse different kits.
Change Log :1.0.3
  • Bug-fix (Novel Variants tab now works)
  • Ability to download Novel Variants table.
  • Downloads are prefixed with kit numbers.
Change Log :1.0.2
  • Minor bugfix (some SNPs not highlighted in ISOGG tree fixed).
Change Log :1.0.1
  • Supports Big Y results through GAP.
  • Ability to clear caches using Reset button on FTDNA page. This helps the Group Administrator to reset for each account.
Change Log :1.0.0
  • Supports downloading table, markers. Supports ISOGG 2014 tree, MorleyDNA tree.

Thursday, December 26, 2013

Palaeo-Eskimo 2000 BC DNA

The Saqqaq Genome Project generated 20x sequence coverage over the genome of an individual from the extinct Palaeo-Eskimo Saqqaq culture. The project was a large collaboration between many centres across the world, coordinated by Professor Eske Willerslev from the Centre for GeoGenetics at the University of Copenhagen, Denmark. Full details of the authors and the cooperation can be found in the Feb 2010 article referenced below.

This project aims to convert the raw data of the extinct Palaeo-Eskimo genome into a raw data download file, as if FTDNA or 23andMe had done the test. Basically, I am just extracting the SNPs from the genome and constructing the autosomal raw data file. The project is meant to be something like the factoids provided by FTDNA, done from a hobbyist research perspective (though it may produce scientific results). The source files are taken from Data for the Saqqaq genome project.
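The construction step can be pictured roughly as follows. This Python sketch is only an illustration under stated assumptions, not the script I actually used: it assumes the genome calls are already available as a dictionary keyed by (chromosome, position), that the list of SNPs tested by a company is known, and that the output is a simple CSV with RSID, CHROMOSOME, POSITION and RESULT columns. The function name and all sample values are made up.

```python
import csv
import io


def build_raw_file(chip_snps, genome_calls):
    """Write one row per tested SNP, using '--' where the genome has no call."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["RSID", "CHROMOSOME", "POSITION", "RESULT"])
    for rsid, chrom, pos in chip_snps:
        result = genome_calls.get((chrom, pos), "--")
        writer.writerow([rsid, chrom, pos, result])
    return out.getvalue()


# Made-up sample data: two tested SNPs, only one of them called in the genome.
chip = [("rs0000001", "1", 1000), ("rs0000002", "2", 2000)]
calls = {("1", 1000): "AG"}

raw_text = build_raw_file(chip, calls)
print(raw_text)
```

Driving the loop from the list of tested SNPs, rather than from the genome, keeps the output limited to positions that the matching and analysis sites expect to see.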


Download: 
  • GEDMatch# F999906 (FTDNA and 23andMe SNPs)
  • Download from Google Drive.
Reference: Ancient Human Genome Sequence of an Extinct Palaeo-Eskimo. Rasmussen M, Li Y, Lindgreen S, Pedersen JS, Albrechtsen A, Moltke I, Metspalu M, Metspalu E, Kivisild T, Gupta R, Bertalan M, Nielsen K, Gilbert MTP, Wang Y, Raghavan M, Campos PF, Kamp HM, Wilson AS, Gledhill A, Tridico S, Bunce M, Lorenzen ED, Binladen J, Guo X, Zhao J, Zhang X, Zhang H, Li Z, Chen M, Orlando L, Kristiansen K, Bak M, Tommerup N, Bendixen C, Pierre TL, Grønnow B, Meldgaard M, Andreasen C, Fedorova SA, Osipova LP, Higham TFG, Ramsey CB, Hansen TV, Nielsen FC, Crawford MH, Brunak S, Sicheritz-Ponten T, Villems R, Nielsen R, Krogh A, Wang J, Willerslev E. Nature 463, 757-762 (11 February 2010).

Change Log Version 1.0
  • Initial release.
Data Used

Thursday, September 12, 2013

Pirate DNA

Extracts DNA from matches to reconstruct (reverse engineer) a DNA profile. With this, you can artificially reconstruct a subject's DNA, which is what makes it a pirate. However, it does have its limits. Since it is based purely on matches, it can only reveal anything if you have access to the DNA of the subject's close matches. Hence, there is absolutely no risk to privacy, or of piracy, in using the software. I named it Pirate DNA just for fun. I designed it to create artificial ancestor profiles through matches, to explore ancestry beyond the genealogical time-frame. It supports the Family Finder matches export file (with only 2 people, i.e., select just 1 match in the chromosome browser and export) and GEDMatch's one-to-one match (save the output as an HTML file from the browser). The more matches and autosomal DNA files you have that match the subject, the better your accuracy.
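One way to picture reconstruction from matches is sketched below in Python. It is a simplified illustration and not necessarily how Pirate DNA works internally: it assumes each matching segment is truly shared, so that wherever the match is homozygous inside the segment, the subject must carry that allele too. The function name and the sample data are made up.

```python
def infer_alleles(segments, match_genotypes):
    """Return {(chrom, pos): set of alleles the subject is inferred to carry}."""
    inferred = {}
    for chrom, start, end in segments:
        for (g_chrom, pos), genotype in match_genotypes.items():
            inside = g_chrom == chrom and start <= pos <= end
            homozygous = len(genotype) == 2 and genotype[0] == genotype[1]
            # Only homozygous calls pin down the allele on the shared segment.
            if inside and homozygous and genotype[0] in "ACGT":
                inferred.setdefault((chrom, pos), set()).add(genotype[0])
    return inferred


# Made-up data: one matching segment on chromosome 1 and a match's genotypes.
segments = [("1", 100, 200)]
match_genotypes = {
    ("1", 150): "AA",  # inside the segment and homozygous: subject carries A
    ("1", 160): "AG",  # heterozygous: the shared allele is ambiguous
    ("1", 170): "--",  # no-call
    ("1", 300): "CC",  # outside the segment
    ("2", 150): "TT",  # different chromosome
}

profile = infer_alleles(segments, match_genotypes)
print(profile)
```

Running the same function over several matches and combining the sets shows why more matches help: a position where two different alleles have been inferred is fully determined.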

Prerequisites: Microsoft .Net Framework 4.0

Usage: It is wizard-based, so the steps in the user interface will guide you.

Screenshot:

Download : PirateDNA.exe (164 Kb)

Source Code at GitHub.

Change Log
Version 1.0
  • Initial Release.

Friday, September 6, 2013

FASTA to RSRS (With Visualizer)

A simple tool to get the mtDNA RSRS markers from a FASTA file and visualize them in an mtDNA Map.

Prerequisites: Microsoft .Net Framework 4.0

Usage: Just drag and drop the FASTA file into the text box.

Screenshot:



Download : FASTA_to_RSRS.exe (437 Kb)

Source Code at GitHub.

Change Log
Version 1.1
  • Known differences fixed. mtDNA Map - Visualizer Added.
Version 1.0
  • Initial Release.