Merge two files based on a partial match

鸽子11

I have two files:

FileA.txt

ID
479432_Sros_4274
330214_NIDE2792
517722_CJLT1_010100003977
257310_BB0482
...

FileB.txt (the ** markers are only there to help you spot the matches)

members   category
6085.XP_002168109,**479432_Sros_4274**,4956.XP_002495993.1,457425.SSHG_03214,51511.ENSCSAVP000  P
7159.AAEL006372-PA,**257310_BB0482** J
**517722_CJLT1_010100003977**,701176.VIBRN418_17773,9785.ENSLAFP00000010769,28377.ENSACAP00000014901,4081.Solyc03g120250.2.1,3847.GLYMA18G02240.1 U
500485.XP_002561312.1,1042876.PPS_0730,222929.XP_003071446.1,**330214_NIDE2792**  S
...

Expected output

Output.txt

ID  category
479432_Sros_4274  P
330214_NIDE2792  S
517722_CJLT1_010100003977  U
257310_BB0482  J
...

I have tried some code in awk and R based on answers to other questions, but I could not get the desired output.

RavinderSingh13

Could you please try the following?

awk '
BEGIN{
  print "ID  category"
}
FNR==NR{
  if(FNR>1){   ##Skip the ID header line so it is not used as a pattern.
    a[$0]
  }
  next
}
{
  for(i in a){
    if(match($0,i)){
      print i,$NF
    }
  }
}
'  Input_filea   Input_fileb

Explanation: a line-by-line explanation of the above code.

awk '                               ##Start the awk program.
BEGIN{                              ##Start the BEGIN section.
  print "ID  category"              ##Print the header line: ID and category.
}                                   ##Close the BEGIN section.
FNR==NR{                            ##FNR==NR is TRUE only while the 1st Input_file is being read.
  if(FNR>1){                        ##Skip the header line (ID) of the 1st Input_file.
    a[$0]                           ##Create an entry in array a whose index is $0, the current line.
  }
  next                              ##Skip all further statements for this line.
}
{                                   ##This block runs for every line of the 2nd Input_file.
  for(i in a){                      ##Loop over every ID stored in array a.
    if(match($0,i)){                ##match() treats i as a regex; TRUE when the ID occurs anywhere in the current line.
      print i,$NF                   ##Print the ID and the last field (the category) of the current line.
    }
  }
}
'  Input_filea   Input_fileb        ##Input_file names go here.
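
As an alternative, here is a rough sketch that matches IDs exactly by splitting the members column on commas, rather than regex-matching each ID against the whole line. It assumes the members in FileB.txt are comma-separated with no embedded spaces, that the category is the last whitespace-separated field, and that both files have a single header line. The temporary directory and the trimmed sample data (with the ** markers dropped) are only for illustration.

```shell
#!/bin/sh
# Hypothetical sample files, trimmed from the question (** markers removed).
dir=$(mktemp -d)
cat > "$dir/FileA.txt" <<'EOF'
ID
479432_Sros_4274
330214_NIDE2792
257310_BB0482
EOF
cat > "$dir/FileB.txt" <<'EOF'
members   category
6085.XP_002168109,479432_Sros_4274,4956.XP_002495993.1  P
7159.AAEL006372-PA,257310_BB0482 J
500485.XP_002561312.1,330214_NIDE2792  S
EOF

awk '
BEGIN { print "ID  category" }
# First file on the command line (FileB.txt): map every member to its category.
FNR == NR {
  if (FNR > 1) {                      # skip the "members category" header
    n = split($1, m, ",")             # $1 holds the comma-separated member list
    for (j = 1; j <= n; j++) cat[m[j]] = $NF
  }
  next
}
# Second file (FileA.txt): print each ID that has a known category, in FileA order.
FNR > 1 && ($1 in cat) { print $1 "  " cat[$1] }
' "$dir/FileB.txt" "$dir/FileA.txt" > "$dir/Output.txt"

cat "$dir/Output.txt"
```

Because FileB.txt is read first, the output follows the order of FileA.txt, at the cost of holding every member of FileB.txt in memory. If FileB.txt is very large, reading FileA.txt first and testing each split member with `in` may be the better trade-off, though the output then follows FileB.txt order.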
