Thursday, September 25, 2014

FASTA to RSRS (*nix)

You may already know FASTA to RSRS (With Visualizer), a Windows tool that reports the RSRS (Reconstructed Sapiens Reference Sequence) markers for an mtDNA sequence in FASTA format. On Unix-like platforms, however, you can extract the markers directly from the console. The following commands list the RSRS markers for an mtDNA FASTA file.

Prerequisites:

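  • A Unix-like system with GNU sed, grep and diff (the commands use GNU-specific features such as \n in a sed replacement and diff's line-format options)
  • An internet connection, or a copy of RSRS.fasta already downloaded into the current directory (in that case, skip the wget command)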
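If you keep RSRS.fasta around between runs, a small guard can decide for you whether the download is needed. The sketch below wraps the wget command from this post in a hypothetical fetch_rsrs function that downloads only when RSRS.fasta is missing or empty in the current directory. The function name is an illustration, not part of the original commands; the URL is the one used in this post.

```shell
#!/usr/bin/env bash
# Sketch: reuse an existing RSRS.fasta and download it only when it is absent or empty.
# fetch_rsrs is a hypothetical helper name; the URL is the one used in this post.
RSRS_URL="http://www.phylotree.org/resources/RSRS.fasta"

fetch_rsrs() {
  if [ -s RSRS.fasta ]; then
    # -s is true only for a file that exists and is non-empty
    echo "RSRS.fasta already present, skipping download"
  else
    # -O writes to a fixed file name; a failed download may leave an empty
    # file behind, which the -s test above treats as missing on the next run
    wget -O RSRS.fasta "$RSRS_URL"
  fi
}
```

Usage, in place of the wget command:
$ fetch_rsrs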


Commands:
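# Download the RSRS reference sequence
$ wget http://www.phylotree.org/resources/RSRS.fasta
# Drop the FASTA header line, join the sequence lines, then write one base per line
$ tail -n +2 RSRS.fasta | sed ':a;N;$!ba;s/\n//g' | sed -E 's/([ATGCN])/\1\n/g' > RSRS.seq
$ tail -n +2 input.fasta | sed ':a;N;$!ba;s/\n//g' | sed -E 's/([ATGCN])/\1\n/g' > input.seq
# Compare the files line by line, print each differing position as <RSRS base><position><your base>,
# and skip positions where RSRS itself has N
$ diff -B --old-line-format=' %l%dn' --new-line-format='%L' --suppress-common-lines RSRS.seq input.seq | grep '[0-9]' | sed 's/\s/\n/g' | sed '/^\s*$/d' | grep -v 'N[0-9]'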
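The diff step lets diff align the two files, so it relies on diff pairing each changed base with its replacement. When two or more adjacent positions differ, diff groups them into a single hunk and prints all the old bases before all the new ones, so the pipeline above pairs alleles with the wrong positions. A minimal alternative sketch (not the method used in this post) is to compare the files position by position with paste and awk. The helper names fasta_to_seq and rsrs_markers are hypothetical. The sketch assumes your sequence is already aligned to RSRS (same length, no insertions or deletions), and, unlike the original pipeline, it uppercases the sequence and gives every character its own line, so IUPAC codes such as R or Y stay separate.

```shell
#!/usr/bin/env bash
# Minimal sketch, not the original method. Assumes a single-record FASTA whose
# sequence is aligned to RSRS (same length, no insertions or deletions).

# Hypothetical helper: FASTA file -> one base per line.
fasta_to_seq() {
  local in="$1" out="$2"
  tail -n +2 "$in" |               # drop the ">header" line
    tr -d '\r\n' |                 # join sequence lines, also dropping Windows CRs
    tr '[:lower:]' '[:upper:]' |   # assumption: treat lowercase bases as uppercase
    sed 's/./&\n/g' > "$out"       # GNU sed: newline after every character
}

# Hypothetical helper: print markers as <RSRS base><position><sample base>.
rsrs_markers() {
  local ref="$1" sample="$2"
  if [ "$(wc -l < "$ref")" -ne "$(wc -l < "$sample")" ]; then
    echo "warning: $ref and $sample differ in length; positions may be shifted" >&2
  fi
  # paste joins line N of both files as "RSRS<TAB>sample"; awk's NR is the position.
  # Positions where RSRS has N are skipped, as in the original pipeline.
  paste "$ref" "$sample" |
    awk -F '\t' '$1 != $2 && $1 != "N" { print $1 NR $2 }'
}
```

Usage, mirroring the commands above:
$ fasta_to_seq RSRS.fasta RSRS.seq
$ fasta_to_seq input.fasta input.seq
$ rsrs_markers RSRS.seq input.seq

Because the position is simply awk's line counter, it matches RSRS numbering only when no bases have been inserted or deleted, which is why the sketch warns when the two files differ in length.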

Note: input.fasta is your own mtDNA FASTA file, containing a single sequence whose header is on the first line.
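Because the conversion removes only the first line (tail -n +2), the commands assume the FASTA file holds exactly one record with its header on line 1; in a multi-record file, the later header lines would be treated as sequence. A quick check, sketched here with a hypothetical check_single_record function, can catch this before you run the pipeline.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the file starts with a ">" header
# and contains exactly one header line (i.e. a single FASTA record).
check_single_record() {
  local file="$1" n
  if [ ! -r "$file" ]; then
    echo "error: cannot read $file" >&2
    return 1
  fi
  # tail -n +2 would silently drop real sequence if line 1 were not a header
  if ! head -n 1 "$file" | grep -q '^>'; then
    echo "error: first line of $file is not a '>' header" >&2
    return 1
  fi
  # grep -c counts the lines that start with ">"
  n=$(grep -c '^>' "$file")
  if [ "$n" -ne 1 ]; then
    echo "error: $file has $n header lines, expected exactly 1" >&2
    return 1
  fi
}
```

Usage:
$ check_single_record input.fasta && echo OK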

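Screenshot: example output showing how the RSRS markers are displayed. Each marker is printed on its own line as the RSRS base, the position, and the base in your sequence (for example, A263G).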