How can I get the file extension from the data shown below? The CSV file has millions of rows.
col1 ,col2 ,col3 ,col4 , col5, col6, col7
aaaaa/ ,0 ,2018-03-16T09:31:42.000Z, xx-daily.......
aaaaa/201802/ ,0 ,2019-01-17T06:16:34.000Z, xx-daily
aaaaa/201802/Feb2018000000_0.gzip,32602738,2018-09-11T04:05:38.000Z, xx-daily
aaaaa/201802/Feb2018000001_0.gzip,32602738,2018-09-11T04:05:38.000Z, xx-daily
aaaaa/201802/Feb2018000002_0.gzip,32602738,2018-09-11T04:05:38.000Z, xx-daily
aaaaa/201802/Feb2018000003_0.gzip,32602187,2018-09-11T04:05:38.000Z, xx-daily
aaaaa/201802/Feb2018000004_0.gzip,32602187,2018-09-11T04:05:39.000Z, xx-daily
aaaaa/201802/Feb2018000005_0.gzip,32602187,2018-09-11T04:05:39.000Z, xx-daily
aaaaa/201802/Feb2018000006_0.gzip,32578449,2018-09-11T04:05:39.000Z, xx-daily
I need to extract the file extension and add a new column containing it to the same CSV file.
The output should look like this:
col1 ,col2 ,col3 ,col4 , col5, col6, col7
aaaaa/ ,0 ,2018-03-16T09:31:42.000Z, xx-daily.......
aaaaa/201802/ ,0 ,2019-01-17T06:16:34.000Z, xx-daily
aaaaa/201802/Feb2018000000_0.gzip, gzip ,32602738,2018-09-11T04:05:38.000Z, xx-daily
aaaaa/201802/Feb2018000001_0.gzip, gzip ,32602738,2018-09-11T04:05:38.000Z, xx-daily
aaaaa/201802/Feb2018000002_0.gzip, gzip ,32602738,2018-09-11T04:05:38.000Z, xx-daily
This is a bit clunky: it does not add the spaces that you seem to want, and it introduces an empty column in rows that have no file extension (I believe that is the correct behavior, and it is easy enough to change if you prefer otherwise). However, under no circumstances would I condone writing back into the same file from which you are reading. Some implementations of awk provide a feature for doing so, but using it is misguided. Use a filter and write your output to a different file; if you need to, you can then overwrite the original file with it.
awk '{c=split($1,a,"."); ext=c>1?a[c]:""; $2=ext OFS $2}1' FS=, OFS=, input-file
You can get better spacing with:
awk '{c=split($1,a,"."); ext=c>1?a[c]:""; $2=ext OFS $2}1' FS=, OFS=',\t' input
and you can avoid the empty column (but you really don't want to do this) with:
awk '{c=split($1,a,"."); if( c > 1) $2=a[c] OFS $2}1' FS=, OFS=',\t' input
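The advice above about writing to a different file and only then replacing the original could be sketched roughly as follows. The file name data.csv and the three sample rows are made up for illustration, and the sample drops the stray spaces around the commas for simplicity; substitute your own input.

```shell
# Hypothetical file name; replace with your own CSV.
input=data.csv
tmp=$(mktemp)

# Small sample so the sketch runs on its own; skip this step with real data.
cat > "$input" <<'EOF'
col1,col2,col3,col4
aaaaa/201802/,0,2019-01-17T06:16:34.000Z,xx-daily
aaaaa/201802/Feb2018000000_0.gzip,32602738,2018-09-11T04:05:38.000Z,xx-daily
EOF

# Filter into a separate file, and replace the original only if awk succeeded.
awk '{c=split($1,a,"."); ext=c>1?a[c]:""; $2=ext OFS $2}1' FS=, OFS=, "$input" > "$tmp" &&
mv "$tmp" "$input"
```

Because mv runs only when awk exits successfully, a failed run leaves the original file untouched. Note that mktemp typically creates its file with restrictive permissions, so the replaced file may end up with different permissions than the original had.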