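I have multiple .csv files and want to extract a specific column from each of them, say the 5th column. I want to write that column to a CSV file, and then append the same column from each subsequent file as a new column next to it. I can do this with the following code, which I got from someone else: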
awk '{_[FNR]=(_[FNR] OFS $1)}END{for (i=1; i<=FNR; i++) {sub(/^ /,"",_[i]); print _[i]}}' input*.csv > output.csv
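Looking at the output file, I noticed that the columns were not added in sequential file order. So I would like to modify the code so that each column's header is the name of the file the column came from. How can I do that?
For example, input1.csv could be: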
1,2,3,4,5
6,7,8,9,10
input2.csv could be:
11,12,13,14,15
16,17,18,19,20
I would like output.csv to be:
input1.csv, input2.csv
5,15
10,20
I hope this makes sense, and thanks in advance.
Could you please try the following? It was written and tested in GNU awk with the samples shown.
awk '
BEGIN{ FS=OFS="," }
FNR==1{
fileName=(fileName?fileName", ":"")FILENAME
}
{
max=(max>FNR?max:FNR)
val[FNR]=(val[FNR] == "" ? "" : val[FNR] OFS) $NF
}
END{
print fileName
for(i=1;i<=max;i++){
print val[i]
}
}
' *.csv > output.csv
With the samples shown, the output file output.csv will contain the following.
input1.csv, input2.csv
5,15
10,20
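Note that the command above takes the last field ($NF), which only happens to be the 5th column in the samples. If the files could have more than five columns, or different numbers of lines, a variant along the following lines might be closer to what was asked. This is only a sketch: the names col, header and nfiles are my own choices and not part of the answer above, and the sample files are recreated in a temporary directory purely for illustration.

```shell
# Recreate the sample files from the question in a scratch directory.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
printf '1,2,3,4,5\n6,7,8,9,10\n' > input1.csv
printf '11,12,13,14,15\n16,17,18,19,20\n' > input2.csv

# col (an assumed variable name) is the 1-based column to extract.
awk -v col=5 '
BEGIN { FS = OFS = "," }
# First line of each file: count the file and add its name to the header.
FNR == 1 { header = (nfiles++ ? header ", " : "") FILENAME }
{
  val[FNR, nfiles] = $col        # store the value by (line number, file number)
  if (FNR > max) max = FNR       # remember the longest file
}
END {
  print header
  for (i = 1; i <= max; i++) {
    row = ""
    # A file with fewer lines contributes an empty field, so columns stay aligned.
    for (f = 1; f <= nfiles; f++)
      row = row (f > 1 ? OFS : "") val[i, f]
    print row
  }
}
' input1.csv input2.csv > output.csv
cat output.csv
```

For the two sample files this gives the same three lines as shown above. Storing values per file, instead of concatenating as each line is read, is what lets short files be padded with empty fields.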
Explanation: a detailed, commented version of the above.
awk ' ##Start the awk program.
BEGIN{ FS=OFS="," } ##Set the input and output field separators to a comma.
FNR==1{ ##On the first line of each input file:
fileName=(fileName?fileName", ":"")FILENAME ##append the current file name to the header line.
}
{ ##For every line of every file:
max=(max>FNR?max:FNR) ##keep track of the largest line count seen in any file,
val[FNR]=(val[FNR] == "" ? "" : val[FNR] OFS) $NF ##and append the last field to the row stored at index FNR.
}
END{ ##After all files have been read:
print fileName ##print the header line of file names,
for(i=1;i<=max;i++){ ##then loop from 1 to max
print val[i] ##and print each accumulated row.
}
}
' *.csv > output.csv ##Read every .csv file in the current directory and write the result to output.csv.
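On the column order mentioned in the question: the shell expands a glob such as input*.csv in lexicographic order, so input10.csv is passed before input2.csv. One possible workaround, sketched below, is to sort the names numerically before handing them to awk. It assumes that sort supports the -V (version sort) option, as GNU coreutils does, and that the file names contain no whitespace. The three one-line sample files are made up for the demonstration.

```shell
# Hypothetical sample files: input1.csv, input2.csv, input10.csv.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
for n in 1 2 10; do
  printf 'a,b,c,d,%s\n' "$n" > "input$n.csv"
done

# Version sort puts input2.csv before input10.csv.
files=$(printf '%s\n' input*.csv | sort -V)

# $files is deliberately left unquoted so that a POSIX-style shell
# splits it into one argument per file name.
awk '
BEGIN { FS = OFS = "," }
FNR == 1 { fileName = (fileName ? fileName ", " : "") FILENAME }
{
  max = (max > FNR ? max : FNR)
  val[FNR] = (val[FNR] == "" ? "" : val[FNR] OFS) $NF
}
END {
  print fileName
  for (i = 1; i <= max; i++) print val[i]
}
' $files > output.csv
cat output.csv
```

With these files the header should read input1.csv, input2.csv, input10.csv, followed by the row 1,2,10. Using input*.csv rather than *.csv also keeps an output.csv left over from an earlier run from being read as input.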