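I have two FASTQ files, F1.fastq and F2.fastq. F2.fastq is a smaller file containing a subset of the reads in F1.fastq. I want to get the reads that are in F1.fastq but not in F2.fastq. The following Python code does not seem to work. Can you suggest changes?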
from Bio import SeqIO

needed_reads = []
reads_array = []
chosen_array = []
for x in SeqIO.parse("F1.fastq", "fastq"):
    reads_array.append(x)
for y in SeqIO.parse("F2.fastq", "fastq"):
    chosen_array.append(y)
for y in chosen_array:
    for x in reads_array:
        if str(x.seq) != str(y.seq):
            needed_reads.append(x)
output_handle = open("DIFF.fastq", "w")
SeqIO.write(needed_reads, output_handle, "fastq")
output_handle.close()
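The nested loop in the question appends each F1 read once for every F2 read it differs from. The output is therefore full of duplicates, and whenever F2 has more than one read it still contains the reads you wanted to drop.

You can use sets instead: convert list1 to a set, convert list2 to a set, and compute set(list1) - set(list2), which gives the items in list1 that are not in list2. Because Biopython SeqRecord objects are not compared by their content, build the sets from a hashable key rather than from the records themselves. Use the read ID (record.id), or str(record.seq) if you want to compare by sequence as the question does.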
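Here is a small sketch of why the set has to hold keys rather than records. It uses a hypothetical Read class in place of SeqRecord and assumes the objects fall back to Python's default identity-based equality, so two reads parsed separately from identical text never match. The IDs r1 to r3 are made up for illustration.

```python
# Hypothetical stand-in for a parsed read. It defines no __eq__/__hash__,
# so Python compares instances by identity, not by their id/seq content.
class Read:
    def __init__(self, read_id, seq):
        self.id = read_id
        self.seq = seq


f1 = [Read("r1", "ACGT"), Read("r2", "GGCC"), Read("r3", "TTAA")]
# F2 is a subset of F1, but it is parsed separately, so these are new objects
f2 = [Read("r2", "GGCC")]

# Set difference on the objects: nothing matches, so every F1 read survives
by_object = set(f1) - set(f2)

# Set difference on a hashable key (the read ID) works as intended
needed_ids = {r.id for r in f1} - {r.id for r in f2}
# Filter the original list to keep F1's order
needed = [r for r in f1 if r.id in needed_ids]

print(len(by_object))          # 3
print([r.id for r in needed])  # ['r1', 'r3']
```

The final list comprehension is there because a set does not preserve the order of the reads in F1.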
Sample code:
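from Bio import SeqIO

# Load all reads from both files
reads_array = list(SeqIO.parse("F1.fastq", "fastq"))
chosen_array = list(SeqIO.parse("F2.fastq", "fastq"))

# SeqRecord objects are not compared by content, so take the set
# difference of the read IDs (use str(r.seq) to compare by sequence)
needed_ids = set(r.id for r in reads_array) - set(r.id for r in chosen_array)

# Keep the F1 records whose IDs survived, preserving their original order
needed_reads = [r for r in reads_array if r.id in needed_ids]

with open("DIFF.fastq", "w") as output_handle:
    SeqIO.write(needed_reads, output_handle, "fastq")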
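If F1.fastq is too large to load into memory, only F2's IDs need to be held in a set while F1 is streamed. The following is a standard-library-only sketch under two assumptions. First, every record spans exactly four lines (header, sequence, "+", quality) with no wrapped lines. Second, the read ID is the first word of the header after "@", which matches how Bio.SeqIO normally derives record.id. The helper names fastq_records, read_id and filter_fastq are hypothetical.

```python
import io
from itertools import islice


def fastq_records(handle):
    """Yield each FASTQ record as a list of its lines.

    Assumes the unwrapped 4-line layout: header, sequence, '+', quality.
    """
    while True:
        record = list(islice(handle, 4))
        if not record:
            return
        yield record


def read_id(record):
    """Return the read ID: the first word of the header, without the '@'."""
    return record[0][1:].split()[0]


def filter_fastq(f1_handle, f2_handle, out_handle):
    """Write F1 records whose ID is absent from F2; return how many were kept."""
    # Only F2's IDs are kept in memory; F1 is processed one record at a time
    chosen_ids = {read_id(rec) for rec in fastq_records(f2_handle)}
    kept = 0
    for rec in fastq_records(f1_handle):
        if read_id(rec) not in chosen_ids:
            out_handle.writelines(rec)
            kept += 1
    return kept


# Tiny in-memory demo with made-up reads r1..r3, where r2 is also in F2
f1 = io.StringIO("@r1\nACGT\n+\nIIII\n@r2 extra\nGGCC\n+\nIIII\n@r3\nTTAA\n+\nIIII\n")
f2 = io.StringIO("@r2 extra\nGGCC\n+\nIIII\n")
out = io.StringIO()
kept = filter_fastq(f1, f2, out)
print(kept)  # 2

# With the files from the question:
# with open("F1.fastq") as f1, open("F2.fastq") as f2, \
#         open("DIFF.fastq", "w") as out:
#     filter_fastq(f1, f2, out)
```

Records are copied line for line, so the output keeps the original headers and quality strings unchanged. The trade-off is that multi-line (wrapped) FASTQ records, which Bio.SeqIO handles, are not supported by this sketch.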