I'm looking for a way to remove one specific line from a bunch of files, but only if it occurs more than once in that file. Other lines should be kept, even if they are duplicates.
For example, given a file like this, I would like to remove the duplicates of AAA:
AAA
BBB
AAA
BBB
CCC
should become
AAA
BBB
BBB
CCC
I guess I should use sed, but I have no idea how to write the command.
With GNU sed:
sed '0,/^AAA$/b;//d'
That is, let everything through (b branches off, like a continue) up to the first AAA, meaning from the 0th line (that is, even before the first line) to the first line matching /^AAA$/ (which could be the first line itself). Then, for the remaining lines, delete every occurrence of AAA (an empty // pattern reuses the last pattern).
GNU sed is needed for the 0 address (and for the ability to have other commands after the b one in the same expression, though that could easily be worked around in other implementations by using two -e expressions).
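To check the behaviour, here is a minimal sketch that runs the command on the sample from the question. It assumes GNU sed is the `sed` on your PATH; the scratch directory and the `sample.txt` name are only illustrative.

```shell
# Build the sample file from the question in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' AAA BBB AAA BBB CCC > "$tmpdir/sample.txt"

# Lines up to and including the first AAA are passed through (b);
# any later line matching the same pattern (//) is deleted.
result=$(sed '0,/^AAA$/b;//d' "$tmpdir/sample.txt")
printf '%s\n' "$result"

rm -r "$tmpdir"
```

For a bunch of files, GNU sed's -i option edits in place and treats each file separately, so something like sed -i '0,/^AAA$/b;//d' ./*.txt should apply the rule per file. Try it on copies first.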
With awk:
awk '$0 != "AAA" || !n++'
(or for a regexp pattern: awk '!/^AAA$/ || !n++')
a shorthand for:
awk '!($0 == "AAA" && count++ > 0) {print}'
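Since the question is about several files, note that awk keeps n across all its input files, so the counter has to be reset for each one. A possible sketch, assuming a POSIX awk and using hypothetical file names a.txt and b.txt:

```shell
# Two sample files in a scratch directory (names are illustrative).
tmpdir=$(mktemp -d)
printf '%s\n' AAA BBB AAA BBB CCC > "$tmpdir/a.txt"
printf '%s\n' BBB AAA AAA > "$tmpdir/b.txt"

# FNR restarts at 1 for each input file, so n is reset per file;
# the second pattern prints non-AAA lines and the first AAA only.
result=$(awk 'FNR == 1 { n = 0 } $0 != "AAA" || !n++' "$tmpdir/a.txt" "$tmpdir/b.txt")
printf '%s\n' "$result"

rm -r "$tmpdir"
```

This only prints the filtered lines of both files one after the other. Writing each result back to its own file would need a loop with a temporary file, or an in-place extension if your awk provides one.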