I have a tab-delimited file like this:
col1 col2 6 29 61 63 67 70 133 134 150 159 166 208 220 260 261 262 303 312 316 327 330 349 378 387 396 408 415 454 465 V 260 135 49 159
and so on, for up to a thousand rows, divided into five columns. I have converted the third and fifth columns into arrays with split (space delimiter) in order to compare them and print the matching numbers. However, I have tried different approaches without success, for example the following code:
awk 'BEGIN {FS=OFS="\t"} { allpos=split($3,arr1," "); posSNP=split($5,arr2," "); { for (j in arr2) {for (i in arr1) { if ( arr2[j] == arr1[i]) {printf "%s ", i arr1[i]}} printf "\n"}}}' "input" > "output";
and similar variants.
My desired output would be something like:
col1 col2 V: 159 - 260
How can I get this in a Unix environment? Thanks in advance.
A hash lookup will be faster than the nested loops. You can optimize further by using the two lengths to decide which list to hash.
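awk 'BEGIN {FS=OFS="\t"}
{n=split($3,a3," ");
m=split($5,a5," ");
split("",a); both=""; SEP="";    # reset per-row state so earlier rows do not leak in
for(i=1;i<=m;i++) a[a5[i]];      # hash the numbers of the fifth column
for(i=1;i<=n;i++) if(a3[i] in a) {both=both SEP a3[i]; SEP="-"}
print $1,$2,$4 ":" both }' file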
col1 col2 V:159-260
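The length-based optimization mentioned above could look roughly like the sketch below. It is only an illustration under some assumptions: the file names sample.tsv and matches.tsv are placeholders, the first sample row is an abbreviated version of the row from the question, and the second row is made up to exercise the other branch. Whichever list is shorter gets hashed and the longer one is scanned, so the matches come out in the order of the scanned list.

```shell
# Two sample rows (tab-separated, five columns); file names are placeholders.
printf 'col1\tcol2\t6 29 159 260 465\tV\t260 135 49 159\n' >  sample.tsv
printf 'x\ty\t5 7\tW\t1 7 9 5\n'                           >> sample.tsv

awk 'BEGIN {FS=OFS="\t"}
{
  n=split($3,a3," "); m=split($5,a5," ")
  split("",seen); both=""; sep=""          # reset per-row state
  if (m<=n) {                              # column 5 is not longer: hash it, scan column 3
    for(i=1;i<=m;i++) seen[a5[i]]
    for(i=1;i<=n;i++) if(a3[i] in seen) {both=both sep a3[i]; sep="-"}
  } else {                                 # column 3 is shorter: hash it, scan column 5
    for(i=1;i<=n;i++) seen[a3[i]]
    for(i=1;i<=m;i++) if(a5[i] in seen) {both=both sep a5[i]; sep="-"}
  }
  print $1,$2,$4 ":" both
}' sample.tsv > matches.tsv
```

If the output must always follow the order of the third column, the simpler version that always hashes the fifth column is probably the safer choice.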