I want to do the following with awk, sed, or some other tool.
For example:
The first file is named File1.txt.
Its contents (a tab-delimited table):
ID Match Length
100 OK 1000
200 OK 1000
300 OK 2000
400 OK 2000
500 OK 3000
The second file is named File2.fasta.
It contains the following:
>100
ACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTG
>200
CTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGA
>300
TGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGAC
>400
GACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACT
>500
ACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTG
I want to add one more column to File1.txt, taken from File2.fasta, so that the final result looks like this:
ID Match Length Sequence
100 OK 1000 ACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTG
200 OK 1000 CTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGA
300 OK 2000 TGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGAC
400 OK 2000 GACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACT
500 OK 3000 ACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTG
Does anyone have a good idea of how to merge these two files?
I believe you are looking for join.
First, both files need to be sorted and put into a common format (the same delimiter).
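# Flatten the FASTA: end each line with a tab, join everything into one line,
# then start a new line at each ">" (the \t and \n escapes assume GNU sed)
sed 's/$/\t/' File2.fasta | tr -d '\n' | sed 's/>/\n/g' | sort > File2.fasta.sorted
sort File1.txt > File1.txt.sorted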
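If GNU sed is not available, or if the sequences are wrapped over several lines, the same flattening step can be sketched in awk. This is only an illustration, not part of the original answer: the function name fasta_to_tsv is made up here, and it assumes each header line is just ">" followed by the ID.

```shell
# Sketch: flatten a FASTA file into "ID<TAB>sequence" lines, sorted by ID.
# fasta_to_tsv is a hypothetical helper name; it assumes header lines look
# like ">100" with nothing after the ID.
fasta_to_tsv() {
    awk '
        /^>/ {                        # header line: flush the previous record
            if (id != "") print id "\t" seq
            id = substr($0, 2)        # drop the leading ">"
            seq = ""
            next
        }
        { seq = seq $0 }              # sequence line: append (handles wrapping)
        END { if (id != "") print id "\t" seq }
    ' "$1" | sort
}
```

It would be used as fasta_to_tsv File2.fasta > File2.fasta.sorted. Unlike the sed pipeline, it does not leave an empty first line or a trailing tab after each sequence.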
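Then you just need to join them like this:
TAB=$(printf '\t')
join -a1 -t "$TAB" File1.txt.sorted File2.fasta.sorted
Note that $TAB here holds a tab character. It has to be in double quotes, because inside single quotes the shell would pass the literal text $TAB to join.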
This will produce something like the following:
100 OK 1000 ACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTG
200 OK 1000 CTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGA
300 OK 2000 TGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGAC
400 OK 2000 GACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACT
500 OK 3000 ACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTGACTG
ID Match Length
This is what you wanted, except for the column names and the position of the header row.
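Since the question also mentions awk, here is a rough sketch of a single awk pass that keeps the header row on top and adds the Sequence column name. It is an alternative to join, not what the answer above does. The function name merge_fasta is made up for illustration, and it assumes File1.txt really is tab-delimited with the ID in the first column.

```shell
# Sketch: append the sequence from a FASTA file as a last column of a
# tab-delimited table, keeping the table's row order and header row.
# merge_fasta is a hypothetical helper name; "Sequence" is the column
# name from the desired output in the question.
merge_fasta() {   # usage: merge_fasta FASTA TABLE
    awk -F '\t' -v OFS='\t' '
        NR == FNR {                        # first file: the FASTA
            if (/^>/) id = substr($0, 2)   # header line: remember the ID
            else seq[id] = seq[id] $0      # sequence line: append to that ID
            next
        }
        FNR == 1 { print $0, "Sequence"; next }   # table header row
        { print $0, seq[$1] }              # data row: look up by first column
    ' "$1" "$2"
}
```

It would be called as merge_fasta File2.fasta File1.txt. No sorting is needed, and an ID with no FASTA entry simply gets an empty last column, much like join -a1.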