How to merge two files column-wise using awk?

Caio Rocha

I have the following two text files:

File 1

-7.7
-7.4
-7.3
-7.3
-7.3

File 2

4.823
5.472
5.856
4.770
4.425

I want to merge them side by side, separated by a comma:

File 3

-7.7,4.823
-7.4,5.472
-7.3,5.856
-7.3,4.770
-7.3,4.425

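I know this can be done easily with paste -d ',' file1 file2 > file3, but I want a solution that gives me control over each iteration, because my dataset is large and I also need to add other columns to the output file. For example: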

A,-7.7,4.823,3
A,-7.4,5.472,2
B,-7.3,5.856,3
A,-7.3,4.770,1
B,-7.3,4.425,1

This is what I have so far:

awk 'NR==FNR {a[$count]=$1; count+=1; next} {print a[$count] "," $1; count+=1;}' file1 file2 > file3

Output:

-7.3,4.823
-7.3,5.472
-7.3,5.856
-7.3,4.770
-7.3,4.425

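I am new to bash and awk, so a detailed reply would be much appreciated :)

Edit:
Suppose I have a directory of file pairs ending in two extensions: .ext1 and .ext2. The file names contain parameters; for example, file_0_par1_par2.ext1 has a pair, file_0_par1_par2.ext2. Each file contains 5 values. I have a function that extracts the serial number and parameters from the name. My goal is to write the values from the files, together with the parameters extracted from the file names, to a single csv file (file_out.csv).
Code: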

for file1 in *.ext1 ; do
    for file2 in *.ext2 ; do
        # for each file ending with .ext2, verify if it is file1's corresponding pair
        # I know this is extremely time inefficient, since it's a O(n^2) operation, but I couldn't find another alternative
        if [[ "${file1%.*}" == "${file2%.*}" ]] ; then
            # extract file_number, and par1, par2 based on some conditions, then append to the csv file
            paste -d ',' "$file1" "$file2" | while IFS="," read -r var1 var2;
            do
                echo "$par1,$par2,$var1,$var2,$file_number" >> "file_out.csv" 
            done
        fi
    done
done

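Ed Morton

The efficient way to do what your updated question describes:

Suppose I have a directory of file pairs ending in two extensions: .ext1 and .ext2. The file names contain parameters; for example, file_0_par1_par2.ext1 has a pair, file_0_par1_par2.ext2. Each file contains 5 values. I have a function that extracts the serial number and parameters from the name. My goal is to write the values from the files, together with the parameters extracted from the file names, to a single csv file (file_out.csv).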

for file1 in *.ext1 ; do
    for file2 in *.ext2 ; do
        # for each file ending with .ext2, verify if it is file1's corresponding pair
        # I know this is extremely time inefficient, since it's a O(n^2) operation, but I couldn't find another alternative
        if [[ "${file1%.*}" == "${file2%.*}" ]] ; then
            # extract file_number, and par1, par2 based on some conditions, then append to the csv file
            paste -d ',' "$file1" "$file2" | while IFS="," read -r var1 var2;
            do
                echo "$par1,$par2,$var1,$var2,$file_number" >> "file_out.csv" 
            done
        fi
    done
done

would be (untested):

for file1 in *.ext1; do
    base="${file1%.*}"
    file2="${base}.ext2"
    paste -d ',' "$file1" "$file2" |
    awk -v base="$base" '
        BEGIN { split(base,b,/_/); FS=OFS="," }
        { print b[3], b[4], $1, $2, b[2] }
    '
done > 'file_out.csv'
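
To see why the script prints b[3], b[4], and b[2], here is a minimal sketch that runs its two building blocks on the example file name from the question. It assumes every name follows the file_<number>_<par1>_<par2> pattern, with the parameters in those positions. The loop that prints each piece is only for illustration and is not part of the script above.

```shell
file1='file_0_par1_par2.ext1'    # example name taken from the question
base="${file1%.*}"               # strip the shortest trailing ".suffix" -> file_0_par1_par2
file2="${base}.ext2"             # build the name of the matching pair directly
echo "$file2"

# split() breaks the base name on "_" and returns the number of pieces:
# b[1]="file", b[2]="0", b[3]="par1", b[4]="par2"
awk -v base="$base" 'BEGIN {
    n = split(base, b, /_/)
    for (i = 1; i <= n; i++) print i, b[i]
}'
```

This prints file_0_par1_par2.ext2 followed by the four numbered pieces, so b[2] is the file number and b[3] and b[4] are the two parameters.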

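Computing the pair name directly with base="${file1%.*}"; file2="${base}.ext2" takes N steps for N pairs of files, instead of the N^2 comparisons made by for file2 in *.ext2 ; do if [[ "${file1%.*}" == "${file2%.*}" ]] ; then, and piping to | awk '...' is an order of magnitude more efficient than | while IFS="," read -r var1 var2; do echo ...; done (see "Why is using a shell loop to process text considered bad practice?"), so you can expect a massive performance improvement over your existing script.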
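
For the original two-file case, the attempt in the question fails because $count in awk is a field reference (field number count), not the value of the variable count, so most lines end up stored under the same empty key. Below is a minimal awk-only sketch that indexes by line number (FNR) instead. The contents of file2 are an assumption taken from the second column of the expected output in the question.

```shell
# Recreate the sample inputs (file2 values inferred from the expected output).
printf '%s\n' -7.7 -7.4 -7.3 -7.3 -7.3 > file1
printf '%s\n' 4.823 5.472 5.856 4.770 4.425 > file2

awk -v OFS=',' '
    NR == FNR { a[FNR] = $1; next }   # first file: remember each value by its line number
    { print a[FNR], $1 }              # second file: pair it with the value from the same line
' file1 file2 > file3

cat file3
```

Further columns can be added by listing more expressions in the print statement. Note that this keeps all of file1 in memory, which paste avoids.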


Edited on 2021-01-23
