Trying to turn
a: 1, 2, 3
a: a, b, v
b: 5, 6, 7
b: 10, 1543, 1345
b: e, fe, sdf
cd: asdf, asdfas dfasdfa,asdfasdfa,afdsfa sdf
e1: asdfas, dafasd, adsf, asdfasd
e1: 1, 3, 2
e1: 9, 8, 7, 6
into
a: 1, 2, 3
a, b, v
b: 5, 6, 7
10, 1543, 1345
e, fe, sdf
cd: asdf, asdfas dfasdfa,asdfasdfa,afdsfa sdf
e1: asdfas, dafasd, adsf, asdfasd
1, 3, 2
9, 8, 7, 6
So, the lines are sorted. If consecutive lines start with the same sequence of characters up to and including some separator (here the colon and the blank following it), that prefix should be kept only on the first of those lines, while the remainder of every line is preserved. There can be up to about a dozen and a half lines starting with the identical prefix. The input holds about 4,500 lines…
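To pin down the intended transformation independently of any regex engine, here is a minimal Python sketch. It is not a TextWrangler solution, just a plain loop. The function name collapse_prefixes and its sep and align parameters are made up for illustration, and it assumes the separator is exactly a colon followed by one blank.

```python
def collapse_prefixes(lines, sep=": ", align=False):
    """Drop a line's prefix when the previous line carried the same prefix.

    `sep` and `align` are illustrative parameters, not part of the question.
    """
    out = []
    previous = None  # prefix of the preceding line, if it had one
    for line in lines:
        prefix, found, rest = line.partition(sep)
        if not found:
            # No separator on this line: keep it untouched and end the run.
            out.append(line)
            previous = None
            continue
        if prefix == previous:
            # Repeated prefix: keep only the remainder, optionally padded
            # with blanks so the remainders stay aligned.
            pad = " " * (len(prefix) + len(sep)) if align else ""
            out.append(pad + rest)
        else:
            # First line of a new run: keep it as it is.
            out.append(line)
            previous = prefix
    return out


if __name__ == "__main__":
    sample = [
        "a: 1, 2, 3",
        "a: a, b, v",
        "b: 5, 6, 7",
        "b: 10, 1543, 1345",
        "e1: 1, 3, 2",
        "e1: 9, 8, 7, 6",
    ]
    print("\n".join(collapse_prefixes(sample, align=True)))
```

With align=True the continuation lines are padded by the width of the prefix plus separator, which covers the alignment wish for prefixes of differing length.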
Tried in TextWrangler.
Whilst the search pattern
^([[:alnum:]]+): (.+)\r((\1:) (.+)\r)*
matches correctly, neither the replacement
\1:\t\2\r\t\3\r
nor
\1:\t\2\r\t\4\r
gets me anywhere close to what I'm looking for.
The search pattern
^(.+): (.+)\r((?<=\1:) (.+)\r)*
is rejected because the lookbehind is not fixed length. I'm not sure it's heading in the right direction anyway, though.
Looking at "How to merge lines that start with the same items in a text file", I wonder whether there is an elegant (say: one search pattern, one replacement, run once) solution at all.
On the other hand, I might just not be able to come up with the right question to search the net for. If you know better, please point me in the right direction.
Keeping the remainder of the rows aligned is, of course, icing on the cake…
Thank you for your time.
As a workaround for variable length lookbehind: PCRE allows alternatives of variable length
PCRE is not fully Perl-compatible when it comes to lookbehind. While Perl requires alternatives inside lookbehind to have the same length, PCRE allows alternatives of variable length.
One idea, which requires adding one more alternative (one more pipe) for each possible prefix length up to the maximum:
(?<=(\w\w:)|(\w:)) (.*\n?)\1?\2?
And replace with \t\3. See the test at regex101. Capturing inside the lookbehind is important so that the prefix is not consumed and no match is skipped. The same pattern with a variable-length lookbehind, e.g. in .NET: (?<=(\w+:)) (.*\n?)\1?
(?<=(\w\w:)|(\w:))
The first two capture groups, inside the lookbehind, capture the prefix: two or one word characters followed by a colon. \w is a shorthand for [A-Za-z0-9_].
(.*\n?)
The third capture group, for the content between prefixes. The newline is optional so that the last line still matches.
\1?\2?
Optionally matches, and thereby removes, the same prefix if it starts the following line. Only one of the two groups can be set: \1 xor \2. The space after the colon is always matched, regardless of the prefix.
Summary: the space after each prefix is converted to a tab. The prefix of the following line is removed only if it matches the current one.
To match and replace multiple spaces and tabs: (?<=(\w\w:)|(\w:))[ \t]+(.*\n?)\1?\2?
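For readers without a PCRE-based editor, the same idea can be tried in Python. This is a sketch under assumptions: Python's built-in re module rejects alternatives of different width inside a single lookbehind, so the port below swaps in two separate fixed-width lookbehinds wrapped in a non-capturing group. The function name collapse_with_regex is hypothetical, and like the pattern above it only handles prefixes of one or two word characters.

```python
import re

# Port of the PCRE pattern above. Each lookbehind is fixed width on its own
# (three and two characters), which is what Python's `re` module requires.
# Group 1 / group 2 capture the prefix, group 3 captures the rest of the line,
# and \1?\2? swallows the same prefix if it starts the following line.
PATTERN = re.compile(r"(?:(?<=(\w\w:))|(?<=(\w:))) (.*\n?)\1?\2?")


def collapse_with_regex(text):
    """Replace the blank after each prefix by a tab and drop repeated prefixes."""
    return PATTERN.sub(r"\t\3", text)


if __name__ == "__main__":
    sample = (
        "a: 1, 2, 3\n"
        "a: a, b, v\n"
        "b: 5, 6, 7\n"
        "cd: asdf, asdfas dfasdfa\n"
        "e1: 1, 3, 2\n"
        "e1: 9, 8, 7, 6\n"
    )
    print(collapse_with_regex(sample))
```

As with the original pattern, longer prefixes would need one more lookbehind alternative and one more optional backreference per extra character.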