RegEx to remove repeated start of line using TextWrangler

Abecee Published at Dev

Abecee

Trying to turn

a: 1, 2, 3
a: a, b, v
b: 5, 6, 7
b: 10, 1543, 1345
b: e, fe, sdf
cd: asdf, asdfas dfasdfa,asdfasdfa,afdsfa sdf
e1: asdfas, dafasd, adsf, asdfasd
e1: 1, 3, 2
e1: 9, 8, 7, 6

into

a: 1, 2, 3
   a, b, v
b: 5, 6, 7
   10, 1543, 1345
   e, fe, sdf
cd: asdf, asdfas dfasdfa,asdfasdfa,afdsfa sdf
e1: asdfas, dafasd, adsf, asdfasd
    1, 3, 2
    9, 8, 7, 6

So, the lines are sorted. If consecutive lines start with the same sequence of characters up to / including some separator (here the colon (and the blank following it)), only the first instance should be preserved - as should be the remainder of all lines. There could be up to about a dozen (and a half) lines starting with the identical sequence of characters. The input holds about 4,500 lines…

Tried in TextWrangler.

Whilst the search pattern

^([[:alnum:]]+): (.+)\r((\1:) (.+)\r)*

matches correctly, neither the replacement

\1:\t\2\r\t\3\r

nor

\1:\t\2\r\t\4\r

gets me anywhere close to what I'm looking for.

The search pattern

^(.+): (.+)\r((?<=\1:) (.+)\r)*

is rejected for the lookbehind not being fixed length. - Not sure, it's going into the right direction anyway, though.

Looking at How to merge lines that start with the same items in a text file I wonder, whether there is an elegant (say: one search pattern, one replacement, run once) solution at all.

On the other hand, I might just not be able to come up with the right question to search the net for. If you know better, please, point me into the right direction.

Keeping the remainder of the rows aligned is, of course, sugar on the cake…

Thank you for your time.

Jonny 5

As a workaround for variable length lookbehind: PCRE allows alternatives of variable length

PCRE is not fully Perl-compatible when it comes to lookbehind. While Perl requires alternatives inside lookbehind to have the same length, PCRE allows alternatives of variable length.

An idea that requires to add a pipe for each character of max prefix length:

(?<=(\w\w:)|(\w:)) (.*\n?)\1?\2?

And replace with \t\3. See test at regex101. Capturing inside the lookbehind is important for not consuming / not skipping a match. Same pattern variable eg .NET: (?<=(\w+:)) (.*\n?)\1?

(?<=(\w\w:)|(\w:)) first two capture groups inside lookbehind for capturing prefix: Two or one word characters followed by a colon. \w is a shorthand for [A-Za-z0-9_]
(.*\n?) third capture group for stuff between prefixes. Optional newline to get the last match.
\1?\2? will optionally replace the same prefix if in the following line. Only one of both can be set: \1 xor \2. Also space after colon would always be matched - regardless prefix.

Summary: Space after each prefix is converted to tab. Prefix of following line only if matches current.
To match and replace multiple spaces and tabs: (?<=(\w\w:)|(\w:))[ \t]+(.*\n?)\1?\2?

Collected from the Internet

Please contact [email protected] to delete if infringement.

edited at2020-10-21

Comments

0 comments

TOP Ranking

Article

RegEx to remove repeated start of line using TextWrangler

RegEx to remove repeated start of line using TextWrangler

pump.io port in URL

Loopback Error: connect ECONNREFUSED 127.0.0.1:3306 (MAMP)

Can't pre-populate phone number and message body in SMS link on iPhones when SMS app is not running in the background

How to import an asset in swift using Bundle.main.path() in a react-native native module

Failed to listen on localhost:8000 (reason: Cannot assign requested address)

Spring Boot JPA PostgreSQL Web App - Internal Authentication Error

ngClass error (Can't bind ngClass since it isn't a known property of div) in Angular 11.0.3

Using Response.Redirect with Friendly URLS in ASP.NET

Can a 32-bit antivirus program protect you from 64-bit threats

Double spacing in rmarkdown pdf

How to fix "pickle_module.load(f, **pickle_load_args) _pickle.UnpicklingError: invalid load key, '<'" using YOLOv3?

3D Touch Peek Swipe Like Mail

Bootstrap 5 Static Modal Still Closes when I Click Outside

Assembly definition can't resolve namespaces from external packages

Vector input in shiny R and then use it

Emulator wrong screen resolution in Android Studio 1.3

Svchost high CPU from Microsoft.BingWeather app errors

Graphics Context misaligned on first paint

Python connect to firebird docker database

Is this docker-for-mac password dialog legit?

How to save models trained locally in Amazon SageMaker?