Rearranging columns separated by : in Linux

BNRINBOX

The question is how to rearrange the columns and their values in a required order.

Input

"a":"val1","c":"val2","b":"val3","d":"val4"
"a":"val1","b":[],"c":"val3","d":"val4"
"a":"val1","d":["val2","val32],"c":"val3","b":"val4"
"d":"val1","a":"val2","c":"val3","b":"val4"

The expected output should be a, b, c, d with their corresponding values.

"a":"val1"|"b":"val3"|"c":"val2"|"d":"val4"
"a":"val1"|"b":[]|"c":"val3"|"d":"val4"
"a":"val1"|"b":"val4"|"c":"val3"|"d":["val2","val32]
"a":"val2"|"b":"val4"|"c":"val3"|"d":"val1"
弗桑

Since your question has changed significantly over time, I will try to address three distinct issues.

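Your attempt¹

Your awk command is trying to split lines on occurrences of admin:. Even if that made sense, you could only refer to $1 and $2, because admin: appears only once in each line.

You were probably looking for something like: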

printf '%s\n' '"_id":"asc" ,"name":"enygren" ,"admin":[] ,"creat":"date3"' |
  sed 's/"//g' |
  awk -F' ,' -v OFS='|' '{if ($2~/name:/){print $1,$3,$4,$2} else {$1=$1; print $0}}'

Of course, this is probably not a good idea: /name:/ matches everything that contains name:, not just the exact label name:.
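To illustrate the difference, here is a minimal sketch that compares the loose pattern with one anchored to the start of the field. The function name match_label and the nickname: sample are assumptions made up for the example; they do not come from the question.

```shell
# Hypothetical helper: report, for each input line, whether the loose
# pattern /name:/ and the anchored pattern /^name:/ match it.
match_label() {
  awk '{ print $0, ($0 ~ /name:/ ? "loose:yes" : "loose:no"), ($0 ~ /^name:/ ? "exact:yes" : "exact:no") }'
}

printf '%s\n' 'name:enygren' 'nickname:enygren' | match_label
# name:enygren loose:yes exact:yes
# nickname:enygren loose:yes exact:no
```

Anchoring with ^ only helps once the quotes have been stripped and the label sits at the very beginning of the field, as it does in the pipeline above.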

Anyway, this looks like an XY problem.


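Rearranging columns

You can tailor the following awk solution to select and rearrange columns (assuming they come from a delimited text file).

It assumes that fields in the input data cannot contain any ", which sounds reasonable based on the code you posted¹, but in reality is not. You should rather use tools that are designed to handle structured data (see below), for example csvkit for CSV or jq for JSON (thanks to Kiwi for the hint).

Given the script prog_file: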

BEGIN {
                        # Create an array of labels for the fields you want
                        # to keep, in the order you want to print them
    labels[1] = "\"_id\""
    labels[2] = "\"admin\""
    labels[3] = "\"creat\""
    labels[4] = "\"name\""
}
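{
                        # Forget the fields collected from the previous
                        # record, so that a label missing from this one
                        # does not reuse a stale value
    split("", fields)
                        # Split any field on ":" and make an array of
                        # full fields indexed by their label.
                        # This assumes labels DO NOT CONTAIN any ":"
    for ( i=1; i<=NF; i++ ) {
        split($i, chunks, ":")
        fields[chunks[1]] = $i
    }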
                        # Reset the record
    $0 = ""
                        # Re-build the record with only the fields
                        # whose labels are in the array we defined in
                        # the BEGIN block.
                        # Explicitly use "4" as the upper bound because
                        # POSIX does not specify the order in which
                        # "for (var in array)" assigns indexes to var
    for ( i=1; i<=4; i++ ) {
        $i = fields[labels[i]]
    }
                        # Strip any double quote
    gsub("\"","")
    print $0
}

and the input²

"_id":"123" ,"admin":[src] ,"creat":"date1" ,"name":"dedu"
"_id":"2w3" ,"admin":[analise] ,"creat":"date2" ,"name":"csv"
"_id":"asc" ,"name":"enygren" ,"admin":[] ,"creat":"date3"
"_id":"scd" ,"admin":[] ,"creat":"date4" ,"name":"tzpi"

invoking:

awk -v FS=' ,' -v OFS='|' -f prog_file input_file

gives³

_id:123|admin:[src]|creat:date1|name:dedu
_id:2w3|admin:[analise]|creat:date2|name:csv
_id:asc|admin:[]|creat:date3|name:enygren
_id:scd|admin:[]|creat:date4|name:tzpi
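The label list in the BEGIN block can also be supplied from the command line instead of being hard-coded. The following is a minimal sketch of that idea, not part of the original script: the shell function reorder and the awk variable order are hypothetical names, labels are passed without their double quotes, and the same " ," field separator is assumed.

```shell
# Hypothetical wrapper: $1 is a comma-separated list of unquoted labels,
# in the order in which the fields should be printed.
reorder() {
  awk -v FS=' ,' -v OFS='|' -v order="$1" '
    BEGIN { n = split(order, labels, ",") }
    {
      split("", fields)                 # forget the previous record
      for (i = 1; i <= NF; i++) {
        split($i, chunks, ":")
        key = chunks[1]
        gsub(/"/, "", key)              # index by the unquoted label
        fields[key] = $i
      }
      $0 = ""                           # reset the record
      for (i = 1; i <= n; i++) $i = fields[labels[i]]
      gsub(/"/, "")                     # strip any double quote
      print
    }'
}

printf '%s\n' '"_id":"asc" ,"name":"enygren" ,"admin":[] ,"creat":"date3"' |
  reorder 'name,_id'
# name:enygren|_id:asc
```

The same assumptions as in prog_file apply: labels must not contain ":" and values must not contain double quotes that matter.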

Dealing with data formats

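The last sample of input data you edited into your question does not appear to come from a delimited text file. It looks like a list of JSON objects.
Although JSON is human-readable, it is a data format and requires a different approach; indeed, the awk solution above does not work for that input.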
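One reason a plain field split breaks down can be sketched as follows: a comma inside an array value is indistinguishable from a comma between fields. The helper name count_fields is made up for this illustration, and a bare "," separator is assumed because the last sample has no space before its commas.

```shell
# Hypothetical helper: count the fields awk sees when splitting on ","
count_fields() {
  awk -F',' '{ print NF }'
}

# The record holds four key/value pairs, but the array value is cut in two
printf '%s\n' '"a":"val1","d":["val2","val32"],"c":"val3","b":"val4"' | count_fields
# 5
```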

By adding a bit of structure, your sample can be converted (back?) into valid JSON:

$ cat file
"a":"val1","c":"val2","b":"val3","d":"val4"
"a":"val1","b":[],"c":"val3","d":"val4"
"a":"val1","d":["val2","val32"],"c":"val3","b":"val4"
"d":"val1","a":"val2","c":"val3","b":"val4"

(Note that I assumed the missing " in "d":["val2","val32] is a typo, and used "d":["val2","val32"] instead.)

$ sed 's/^/{/; s/$/},/; 1 s/^/[/; $ s/,$/]/' file >tmpfile
$ cat tmpfile 
[{"a":"val1","c":"val2","b":"val3","d":"val4"},
{"a":"val1","b":[],"c":"val3","d":"val4"},
{"a":"val1","d":["val2","val32"],"c":"val3","b":"val4"},
{"d":"val1","a":"val2","c":"val3","b":"val4"}]

Then, the safe way is to use a JSON processor, jq, to filter and reorder the data:

$ jq -r '.[] | {a: .a, b: .b, c: .c, d: .d} | @text' tmpfile
{"a":"val1","b":"val3","c":"val2","d":"val4"}
{"a":"val1","b":[],"c":"val3","d":"val4"}
{"a":"val1","b":"val4","c":"val3","d":["val2","val32"]}
{"a":"val2","b":"val4","c":"val3","d":"val1"}

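Removing the leftover opening and closing curly braces is simple and safe, whereas blindly removing the double quotes (") or replacing commas with pipes (|) to exactly match your sample output would be unsafe.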
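The brace removal could look like the sketch below. The function name strip_braces is hypothetical, and the input is assumed to be one compact JSON object per line, as printed by the jq command above.

```shell
# Hypothetical helper: drop only the first "{" and the last "}" of each
# line, leaving any brace that appears inside a value untouched.
strip_braces() {
  sed 's/^{//; s/}$//'
}

printf '%s\n' '{"a":"val1","b":[],"c":"val3","d":"val4"}' | strip_braces
# "a":"val1","b":[],"c":"val3","d":"val4"
```

Because the patterns are anchored to the start and end of the line, a value such as "x{y}" in the middle of a record is not altered.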


¹ From revisions n°4, n°7 of the question
² Inferred from the last part of revision n°6 of the question
³ From revision n°6 of the question
