I have two files:
file1.txt
dn_id101_400_CT_TC string1
dn_id111_60_TT_AA string2
file2.txt
dn_id101_400_XX_XX diffstring1
dn_id400_40_XY_YX diffstring2
dn_id111_60_GG_CC diffstring3
I want to print the lines from file2.txt if the first three elements separated by _
from file1.txt are present in the line in file2.txt. Here is my desired output:
dn_id101_400_XX_XX diffstring1
dn_id111_60_GG_CC diffstring3
Is there a way to do this? Maybe by changing the delimiter of an awk command? I'm not sure how to handle multiple delimiters in an awk command. Here's an example of what I'd like to use:
awk -F"\t" 'FNR==NR {a[$1]; next}; $1 in a' file1.txt file2.txt
You can do:
$ awk -F"\t" '
{s=$1; sub(/_[[:upper:]]+_[[:upper:]]+$/, "", s)}
FNR==NR { arr[s]++}
FNR<NR && (s in arr)' f1 f2
dn_id101_400_XX_XX diffstring1
dn_id111_60_GG_CC diffstring3
That assumes that /_[[:upper:]]+_[[:upper:]]+$/
correctly describes the part you need to remove to make the data keys overlap between the two files.
If you want to go left to right (irrespective of the number of _ after the first three), use split instead:
$ awk -F"\t" '
{ split($1, a, /_/); s=a[1]"_"a[2]"_"a[3]}
FNR==NR { arr[s]++}
FNR<NR && (s in arr)' f1 f2
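To try the split version end to end, here is a minimal sketch that recreates the sample data and runs the same logic. It assumes the two columns are tab-separated (as the -F"\t" in the question suggests); the scratch directory and the result variable are only there for illustration. The first block ends with next so that lines from f1 are never printed, which has the same effect as the FNR<NR test above.

```shell
# Hypothetical scratch directory holding copies of the sample files.
tmpdir=$(mktemp -d)

# Sample rows from the question, assumed to be tab-separated.
printf 'dn_id101_400_CT_TC\tstring1\ndn_id111_60_TT_AA\tstring2\n' > "$tmpdir/f1"
printf 'dn_id101_400_XX_XX\tdiffstring1\ndn_id400_40_XY_YX\tdiffstring2\ndn_id111_60_GG_CC\tdiffstring3\n' > "$tmpdir/f2"

result=$(awk -F'\t' '
  # Build the key from the first three "_"-separated parts of column 1.
  { split($1, a, /_/); s = a[1] "_" a[2] "_" a[3] }
  # First file: remember the key and skip to the next line.
  FNR == NR { arr[s]++; next }
  # Second file: print the line if its key was seen in the first file.
  s in arr
' "$tmpdir/f1" "$tmpdir/f2")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

With the sample data this should print the dn_id101_400 and dn_id111_60 lines from f2 and skip the dn_id400_40 line.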