Sunday, September 28, 2014

Afontova Gora-2 DNA

As with the ancient Mal’ta boy's DNA, the same paper also provided raw data in BAM format for another ancient sample, Afontova Gora-2, from a site on the western bank of the Enisei River in south-central Siberia. I converted the raw data supplied with this scientific paper to formats familiar to genetic genealogists. I also filtered it to the SNPs tested by DNA testing companies such as FTDNA, 23andMe and Ancestry so that it could be uploaded to GEDMatch, but found that this ancient DNA has only about 47,000 SNPs in common with them. Hence, I am not uploading this to GEDMatch. However, I will reprocess the data from the sequence read run files and update this page once that is complete.
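The filtering step mentioned above can be illustrated roughly as follows. This is only a sketch, not the exact procedure used for this download: the function name filter_to_snp_list and the idea of a reference list with one rsid per line are assumptions made for illustration, and the genotype file is assumed to be in tab-separated 23andMe format (rsid, chromosome, position, genotype).

```shell
# Hypothetical helper: keep only the SNPs of a 23andMe-format file whose rsid
# appears in a reference list (one rsid per line), and report how many remain.
filter_to_snp_list() {
  # $1 = reference rsid list, $2 = 23andMe-format genotype file
  awk -F'\t' '
    NR == FNR { wanted[$1] = 1; next }   # first file: load the reference rsids
    /^#/ { print; next }                 # keep comment/header lines unchanged
    ($1 in wanted) { print; kept++ }     # keep SNPs present in the reference list
    END { printf "%d SNPs in common\n", kept > "/dev/stderr" }
  ' "$1" "$2"
}
```

With placeholder file names, filter_to_snp_list company_snps.txt ancient.txt > filtered.txt would write the filtered genotypes to filtered.txt and print the overlap count to standard error.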

Download: 
Reference:
Upper Palaeolithic Siberian genome reveals dual ancestry of Native Americans.
Raghavan, Maanasa, Pontus Skoglund, Kelly E. Graf, Mait Metspalu, Anders Albrechtsen, Ida Moltke, Simon Rasmussen et al. "Upper Palaeolithic Siberian genome reveals dual ancestry of Native Americans." Nature (2013).

Data Used

La Braña-Arintero DNA

An approximately 7,000-year-old Mesolithic skeleton discovered at the La Braña-Arintero site in León, Spain, was sequenced to retrieve a complete pre-agricultural European human genome, and the authors made the sequence reads publicly available. I converted the raw sequence reads supplied with the scientific paper to formats familiar to genetic genealogists.

Download: 
Reference:
Derived immune and ancestral pigmentation alleles in a 7,000-year-old Mesolithic European.
Olalde, Iñigo, Morten E. Allentoft, Federico Sánchez-Quinto, Gabriel Santpere, Charleston WK Chiang, Michael DeGiorgio, Javier Prado-Martinez et al. "Derived immune and ancestral pigmentation alleles in a 7,000-year-old Mesolithic European." Nature 507, no. 7491 (2014): 225-228.

Data Used

Friday, September 26, 2014

Tianyuan DNA

Ancient DNA of an early modern human from Tianyuan Cave, outside Beijing, China. I converted the raw data supplied with the scientific paper to formats familiar to genetic genealogists. Please note that this download contains data for chromosome 21 only.

Download: 
Reference:
DNA analysis of an early modern human from Tianyuan Cave, China.
Fu, Qiaomei, Matthias Meyer, Xing Gao, Udo Stenzel, Hernán A. Burbano, Janet Kelso, and Svante Pääbo. "DNA analysis of an early modern human from Tianyuan Cave, China." Proceedings of the National Academy of Sciences 110, no. 6 (2013): 2223-2227.

Data Used

Mal’ta MA-1 DNA

The origins of the First Americans remain contentious. Although Native Americans seem to be genetically most closely related to east Asians, there is no consensus with regard to which specific Old World populations they are closest to. Here the authors sequence the genome of an ancient individual (MA-1) from Mal’ta in south-central Siberia to an average depth of 1x. To the authors' knowledge, MA-1 is the oldest anatomically modern human genome reported to date. I converted the raw sequence reads supplied with this scientific paper to formats familiar to genetic genealogists.

Download: 
Reference:
Upper Palaeolithic Siberian genome reveals dual ancestry of Native Americans.
Raghavan, Maanasa, Pontus Skoglund, Kelly E. Graf, Mait Metspalu, Anders Albrechtsen, Ida Moltke, Simon Rasmussen et al. "Upper Palaeolithic Siberian genome reveals dual ancestry of Native Americans." Nature (2013).

Data Used
Related Blogs

Thursday, September 25, 2014

Autosomal Compare (*nix)

For comparing two autosomal files on Windows, there is a tool called Autosomal Segment Analyzer. On Unix-based systems, you can compare them with the command below, which assumes that the autosomal files are in 23andMe format. If your autosomal files are in any other format, you should be able to convert them using the commands provided on the Autosomal Converter for *nix page.

Prerequisites: Any Unix-based system with awk and GNU join (the --header and --nocheck-order options used below are GNU extensions)

$ join --nocheck-order -e EMPTY --header file1.txt file2.txt | awk '
BEGIN {
  # Tunable thresholds
  snp_threshold = 700; mb_threshold = 7; error_radius = 350;
  largest_mb = 0; total_mb = 0; count = 0; seg_start = 0; seg_end = 0;
  chr = 0; pchr = 0; error_pos = 0;
  print "\nChr\tStart Position\tEnd Position\tLen(Mb)\tSNPs";
}
{
  # Joined fields: $1 rsid, $2 chromosome, $3 position, $4 genotype (file1), $7 genotype (file2)
  chr = $2;
  seg_len = (seg_end - seg_start) / 1000000;
  # Mismatch: the two genotypes share no allele, or the chromosome changed
  if (!($4 == $7 || substr($4,1,1) == substr($7,1,1) || substr($4,2,1) == substr($7,2,1) || substr($4,1,1) == substr($7,2,1) || substr($4,2,1) == substr($7,1,1)) || pchr != chr) {
    if (seg_end - error_pos > error_radius) {
      # The last mismatch lies more than error_radius behind the segment end: tolerate this one
      count++; seg_end = $3;
    } else {
      # Close the segment and report it if it passes both thresholds
      if (count > snp_threshold && seg_len > mb_threshold) {
        total_mb = total_mb + seg_len;
        if (largest_mb < seg_len) largest_mb = seg_len;
        print chr "\t" seg_start "\t" seg_end "\t" seg_len "\t" count;
      }
      count = 0; seg_start = $3;
    }
    error_pos = $3;
  } else {
    # Genotypes share an allele: extend the current segment
    count++; seg_end = $3;
  }
  pchr = chr;
}
END {
  print "\nLargest Segment: " largest_mb " Mb";
  print "Total Shared: " total_mb " Mb\n";
}'

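Note: file1.txt and file2.txt are the two files being compared. The SNP threshold (700), the Mb threshold (7) and the error radius (350) can be modified; all three are set at the start of the awk BEGIN block. The command compares the error radius against a difference between positions (column 3), not against a count of SNPs.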
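The allele-sharing test at the heart of the command above can also be tried on its own. The following is a simplified sketch rather than a replacement for that command: it only counts SNPs where the two genotypes share at least one allele and does not detect segments. The function name count_half_identical is hypothetical, both inputs are assumed to be tab-separated 23andMe files, and matching is done by rsid through an awk array instead of join. As in the command above, no-calls such as "--" are not treated specially.

```shell
# Hypothetical helper: count SNPs where two 23andMe-format files share
# at least one allele (half-identical) versus none.
# Assumes tab-separated columns: rsid, chromosome, position, genotype.
count_half_identical() {
  awk -F'\t' '
    /^#/ { next }                          # skip 23andMe comment lines
    NR == FNR { gt[$1] = $4; next }        # first file: remember genotype per rsid
    ($1 in gt) {
      a = gt[$1]; b = $4
      # Same test as the join/awk command: any allele of a equals any allele of b
      if (a == b ||
          substr(a,1,1) == substr(b,1,1) || substr(a,2,1) == substr(b,2,1) ||
          substr(a,1,1) == substr(b,2,1) || substr(a,2,1) == substr(b,1,1))
        match_n++
      else
        mismatch_n++
    }
    END { printf "half-identical: %d\nno shared allele: %d\n", match_n, mismatch_n }
  ' "$1" "$2"
}
```

Running count_half_identical file1.txt file2.txt prints the two counts, which gives a quick sanity check on the inputs before looking for shared segments.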
