print lines that have similar columns with multiple delimiters

cosmictypist

I have two files:

file1.txt

dn_id101_400_CT_TC    string1
dn_id111_60_TT_AA    string2

file2.txt

dn_id101_400_XX_XX    diffstring1
dn_id400_40_XY_YX    diffstring2
dn_id111_60_GG_CC    diffstring3

I want to print each line of file2.txt whose first three _-separated elements match the first three _-separated elements of some line in file1.txt. Here is my desired output:

dn_id101_400_XX_XX    diffstring1
dn_id111_60_GG_CC    diffstring3

Is there a way to do this, maybe by changing the delimiter in awk? I'm not sure how to handle multiple delimiters in an awk command. Here's an example of what I'd like to use:

awk -F"\t" 'FNR==NR {a[$1]; next}; $1 in a' file1.txt file2.txt

dawg

You can do:

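$ awk -F"\t" '
            # Key: first column with the trailing _UPPER_UPPER suffix removed
            { s = $1; sub(/_[[:upper:]]+_[[:upper:]]+$/, "", s) }
    FNR==NR { arr[s]++ }            # first file: record each key
    FNR<NR && (s in arr)            # second file: print lines whose key was seen
' f1 f2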
dn_id101_400_XX_XX  diffstring1
dn_id111_60_GG_CC   diffstring3

That assumes that /_[[:upper:]]+_[[:upper:]]+$/ correctly describes the part you need to remove to make the data keys overlap between the two files.
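To see what that regular expression keeps and removes, here is a minimal sketch that applies the same sub() call to a few keys. The first two keys come from the question's files; the third, with a lowercase suffix, is a hypothetical value added only to show a key the pattern would leave untouched. It assumes an awk that supports POSIX bracket classes such as [[:upper:]].

```shell
# Strip the trailing _UPPER_UPPER suffix from each key, as the answer does.
# dn_id111_60_gg_cc is a made-up key (lowercase suffix) that will not match.
keys=$(printf '%s\n' dn_id101_400_CT_TC dn_id400_40_XY_YX dn_id111_60_gg_cc |
  awk '{ s = $0; sub(/_[[:upper:]]+_[[:upper:]]+$/, "", s); print s }')

printf '%s\n' "$keys"
```

The first two keys are reduced to dn_id101_400 and dn_id400_40, while the lowercase one is printed unchanged, so it would never line up with a key from the other file.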

If you would rather build the key from the left, taking the first three _-separated elements regardless of how many _ follow, use split instead:

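$ awk -F"\t" '
            # Key: first three _-separated elements of the first column
            { split($1, a, /_/); s = a[1] "_" a[2] "_" a[3] }
    FNR==NR { arr[s]++ }            # first file: record each key
    FNR<NR && (s in arr)            # second file: print lines whose key was seen
' f1 f2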
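Since the question asks about multiple delimiters, one more variant may be worth sketching: awk accepts a regular expression as the field separator, so a bracket expression can make both _ and tab split fields. This is an alternative to the commands above, not part of the original answer. The scratch directory and the variable names (tmp, result, key, seen) are illustrative, and the sample data is recreated from the question on the assumption that the columns are tab-separated.

```shell
# Work in a scratch directory so no existing files are overwritten.
tmp=$(mktemp -d)
printf 'dn_id101_400_CT_TC\tstring1\ndn_id111_60_TT_AA\tstring2\n' > "$tmp/file1.txt"
printf 'dn_id101_400_XX_XX\tdiffstring1\ndn_id400_40_XY_YX\tdiffstring2\ndn_id111_60_GG_CC\tdiffstring3\n' > "$tmp/file2.txt"

# -F'[_\t]' treats both "_" and tab as field separators, so $1, $2 and $3
# are the first three "_"-separated elements. $0 is never modified, so the
# matching lines are printed exactly as they appear in file2.txt.
result=$(awk -F'[_\t]' '
  { key = $1 "_" $2 "_" $3 }
  FNR == NR { seen[key]; next }
  key in seen
' "$tmp/file1.txt" "$tmp/file2.txt")

printf '%s\n' "$result"
rm -rf "$tmp"
```

With the sample data this prints the two lines from the desired output. The trade-off is that every _ in the line becomes a separator, including any in the second column, which is harmless here because only the first three fields are used.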
