I have a dataset, test-file.csv, with three columns:
node,contact,mail
AAAA,Peter,[email protected]
BBBB,Hans,[email protected]
CCCC,Dieter,[email protected]
ABABA,Peter,[email protected]
CCDDA,Hans,[email protected]
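I would like to extend the header with a count column and rename node to nodes. In addition, all entries should be sorted by the second column of the output (mail). In the count column I would like the number of occurrences of each value in the mail column, and in nodes all node entries that share the same mail value should be printed (space-separated and sorted alphabetically).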
This is what I am trying to achieve:
contact,mail,count,nodes
Dieter,dieter@anything.com,1,CCCC
Hans,[email protected],2,BBBB CCDDA
Peter,peter@anything.com,2,AAAA ABABA
I have this awk command:
awk -F"," '
BEGIN{
FS=OFS=",";
printf "%s,%s,%s,%s\n", "contact","mail","count","nodes"
}
NR>1{
counts[$3]++; # Increment count of lines.
contact[$2]; # contact
}
END {
# Iterate over all third-column values.
for (x in counts) {
printf "%s,%s,%s,%s\n", contact[x],x,counts[x],"nodes"
}
}
' test-file.csv | sort --field-separator="," --key=2 -n
However, this is my result :-( Nothing works except the occurrence count.
,[email protected],1,nodes
,[email protected],2,nodes
,[email protected],2,nodes
contact,mail,count,nodes
Any help is appreciated!
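A note on why the attempt prints an empty first field: the statement contact[$2]; only creates an array element indexed by the contact name and stores nothing in it, while the END block looks the array up by mail address, so contact[x] is always empty. The header, in turn, is printed in BEGIN and then passes through sort together with the data rows, so it is not guaranteed to stay on the first line. Below is a minimal sketch of the first point, using a made-up one-line sample (the example.com address is a placeholder, not taken from the question):

```shell
# Hypothetical single data row; the address is a placeholder.
line='AAAA,Peter,peter@example.com'

# As in the attempt: contact is indexed by name and never assigned a value,
# then looked up by mail address in END, which yields an empty string.
broken=$(printf '%s\n' "$line" |
  awk -F, '{ counts[$3]++; contact[$2] }
           END { for (x in counts) print contact[x] "," x }')

# Indexing by the mail address and storing the name makes the lookup work.
fixed=$(printf '%s\n' "$line" |
  awk -F, '{ counts[$3]++; contact[$3] = $2 }
           END { for (x in counts) print contact[x] "," x }')

echo "broken: $broken"
echo "fixed:  $fixed"
```

The only difference between the two commands is the array index and the assignment; everything else mirrors the attempt.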
You can use this GNU awk solution:
awk '
BEGIN {
    FS = OFS = ","
    printf "%s,%s,%s,%s\n", "contact","mail","count","nodes"
}
NR > 1 {
    ++counts[$3]                                 # occurrences of each mail value
    name[$3] = $2                                # contact name belonging to that mail
    map[$3] = ($3 in map ? map[$3] " " : "") $1  # space-separated nodes, in input order
}
END {
    # Iterate over all third-column values in ascending string order (gawk only).
    PROCINFO["sorted_in"] = "@ind_str_asc"
    for (k in counts)
        print name[k], k, counts[k], map[k]
}
' test-file.csv
Output:
contact,mail,count,nodes
Dieter,[email protected],1,CCCC
Hans,[email protected],2,BBBB CCDDA
Peter,[email protected],2,AAAA ABABA
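Two caveats about the command above: PROCINFO["sorted_in"] is specific to GNU awk, and the node lists are built in input order, so they come out alphabetical only because the sample file already lists them that way. A possible alternative, sketched here under the assumption that sorting the rows first is acceptable, lets sort order the data by mail and then by node, so that any POSIX awk only has to group consecutive rows. The sample data is written to a temporary file so the sketch runs standalone; the example.com addresses are placeholders.

```shell
# Sample input in a temporary file; the addresses are placeholders.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
node,contact,mail
AAAA,Peter,peter@example.com
BBBB,Hans,hans@example.com
CCCC,Dieter,dieter@example.com
ABABA,Peter,peter@example.com
CCDDA,Hans,hans@example.com
EOF

result=$(
  # Print the new header ourselves so it never goes through sort.
  echo 'contact,mail,count,nodes'
  # Skip the old header, sort by mail (field 3), then by node (field 1).
  tail -n +2 "$tmp" |
    LC_ALL=C sort -t, -k3,3 -k1,1 |
    awk 'BEGIN { FS = OFS = "," }
      # A new mail value starts a new group: flush the previous one first.
      $3 != mail {
        if (NR > 1) print name, mail, count, nodes
        mail = $3; name = $2; count = 0; nodes = ""
      }
      # Count the row and append its node to the space-separated list.
      { count++; nodes = (nodes == "" ? "" : nodes " ") $1 }
      # Flush the last group, if there was any input at all.
      END { if (NR > 0) print name, mail, count, nodes }'
)
printf '%s\n' "$result"
rm -f "$tmp"
```

Because the rows reach awk already sorted, no arrays are needed and the node lists are alphabetical regardless of the order in the input file. LC_ALL=C is used here to get plain byte-order sorting; drop it if locale-aware ordering is preferred.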