RegEx to remove repeated start of line using TextWrangler

Abecee

Trying to turn

a: 1, 2, 3
a: a, b, v
b: 5, 6, 7
b: 10, 1543, 1345
b: e, fe, sdf
cd: asdf, asdfas dfasdfa,asdfasdfa,afdsfa sdf
e1: asdfas, dafasd, adsf, asdfasd
e1: 1, 3, 2
e1: 9, 8, 7, 6

into

a: 1, 2, 3
   a, b, v
b: 5, 6, 7
   10, 1543, 1345
   e, fe, sdf
cd: asdf, asdfas dfasdfa,asdfasdfa,afdsfa sdf
e1: asdfas, dafasd, adsf, asdfasd
    1, 3, 2
    9, 8, 7, 6

The lines are sorted. When consecutive lines start with the same sequence of characters up to and including some separator (here the colon and the space following it), only the first occurrence of that prefix should be kept; the remainder of every line should be preserved. There could be up to about a dozen and a half consecutive lines starting with the identical prefix. The input holds about 4,500 lines…
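Outside a search-and-replace dialog, the rule described above can be sketched as a plain line-by-line pass. This is only an illustration in Python, not something TextWrangler runs; the function name collapse_prefixes and the sep parameter are made up for this example, and the separator is assumed to be a colon followed by one space.

```python
def collapse_prefixes(text, sep=": "):
    """Blank out a line's prefix when the previous line had the same prefix.

    Hypothetical helper: the name and the `sep` parameter are assumptions
    made for this sketch, not part of the original question.
    """
    out = []
    previous = None
    for line in text.splitlines():
        head, found, rest = line.partition(sep)
        if not found:
            # No separator on this line: keep it unchanged and end the run.
            out.append(line)
            previous = None
            continue
        prefix = head + sep
        if prefix == previous:
            # Same prefix as the line before: pad with spaces so the
            # remainder stays aligned with the first line of the run.
            out.append(" " * len(prefix) + rest)
        else:
            out.append(line)
            previous = prefix
    return "\n".join(out)
```

Padding with as many spaces as the prefix is long is what keeps the remainders aligned, as in the desired output shown above.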

Tried in TextWrangler.

Whilst the search pattern

^([[:alnum:]]+): (.+)\r((\1:) (.+)\r)*

matches correctly, neither the replacement

\1:\t\2\r\t\3\r

nor

\1:\t\2\r\t\4\r

gets me anywhere close to what I'm looking for.
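A likely reason the replacements fall short: a capture group inside a repeated group keeps only the text of its last iteration, so the lines in the middle of a run are never available to \3, \4 or \5. The sketch below shows this with Python's re module rather than TextWrangler, under the assumption that both engines behave the same way here. The pattern is adapted: \r becomes \n and [[:alnum:]] becomes [A-Za-z0-9], because Python's re does not support POSIX classes. The function name groups_after_match is made up for the example.

```python
import re

# The question's first pattern, adapted for Python (see the assumptions above).
PATTERN = re.compile(r"^([A-Za-z0-9]+): (.+)\n((\1:) (.+)\n)*", re.MULTILINE)


def groups_after_match(text):
    """Return the capture groups 1..5 after matching one run of lines."""
    match = PATTERN.match(text)
    return match.groups() if match else None


run = "b: 5, 6, 7\nb: 10, 1543, 1345\nb: e, fe, sdf\n"
prefix, first, last_line, last_prefix, last_rest = groups_after_match(run)
# `first` holds "5, 6, 7" and `last_rest` holds "e, fe, sdf";
# the middle line "10, 1543, 1345" is in no numbered group.
```

With only the first and the last remainder captured, no single replacement string can reproduce a run of three or more lines.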

The search pattern

^(.+): (.+)\r((?<=\1:) (.+)\r)*

is rejected because the lookbehind is not fixed length. I am not sure it is heading in the right direction anyway.

Looking at "How to merge lines that start with the same items in a text file", I wonder whether there is an elegant (say: one search pattern, one replacement, run once) solution at all.

On the other hand, I might just not be able to come up with the right question to search the net for. If you know better, please point me in the right direction.

Keeping the remainder of the rows aligned is, of course, icing on the cake…

Thank you for your time.

Jonny 5

As a workaround for the fixed-length lookbehind restriction, PCRE allows a lookbehind to contain alternatives of different lengths:

PCRE is not fully Perl-compatible when it comes to lookbehind. While Perl requires alternatives inside lookbehind to have the same length, PCRE allows alternatives of variable length.

One idea, which requires adding one alternative (separated by a pipe) for each possible prefix length up to the maximum:

(?<=(\w\w:)|(\w:)) (.*\n?)\1?\2?

Replace with \t\3. See the test at regex101. Capturing inside the lookbehind is important so that the prefix is not consumed and the next match is not skipped. In an engine that supports variable-length lookbehind, e.g. .NET, the same pattern can be written as: (?<=(\w+:)) (.*\n?)\1?

  • (?<=(\w\w:)|(\w:)) The first two capture groups sit inside the lookbehind and capture the prefix: two or one word characters followed by a colon. \w is a shorthand for [A-Za-z0-9_]

  • (.*\n?) The third capture group takes everything between prefixes. The newline is optional so that the last line is matched as well.

  • \1?\2? optionally matches the same prefix at the start of the following line, so that the replacement drops it. Only one of the two groups can be set: \1 xor \2. The space after a colon is always matched, regardless of the prefix.


Summary: The space after each prefix is converted to a tab. The prefix of the following line is removed only if it matches the current one.
       To match and replace multiple spaces and tabs: (?<=(\w\w:)|(\w:))[ \t]+(.*\n?)\1?\2?
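For comparison, the same result (tab after the first prefix, repeated prefixes dropped) can be sketched in Python. Python's built-in re module rejects lookbehinds whose alternatives differ in length, so this swaps in a different technique: one match per run of lines plus a replacement callback. The names RUN and tab_and_strip are made up for this example, and prefixes are assumed to consist of \w characters followed by a colon.

```python
import re

# One match covers a whole run: the first line, then every directly
# following line that starts with the same prefix (backreference \1).
RUN = re.compile(r"^(\w+:)[ \t]+(.*\n?)((?:\1[ \t]+.*\n?)*)", re.MULTILINE)


def tab_and_strip(text):
    """Put a tab after each first prefix and drop repeated prefixes."""

    def rewrite(match):
        prefix, first, followers = match.group(1), match.group(2), match.group(3)
        # In the follower lines, replace "prefix + blanks" with a single tab.
        stripped = re.sub(
            r"^" + re.escape(prefix) + r"[ \t]+",
            "\t",
            followers,
            flags=re.MULTILINE,
        )
        return prefix + "\t" + first + stripped

    return RUN.sub(rewrite, text)
```

The callback is needed because the follower lines arrive as one captured block; a plain replacement string could not rewrite each of them separately.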
