Regex for CSV split including multiple double quotes

Gruber

I have a CSV column containing text. Each row is enclosed in double quotes "

Sample text in a row looks like this (note: the new lines and the spaces before each line are intentional)

"Lorem ipsum dolor sit amet, 
 consectetur adipisicing elit, sed do eiusmod
 tempor incididunt ut labore et dolore magna 
 aliqua. Ut ""enim ad"" minim veniam,
 quis nostrud exercitation ullamco laboris nisi 
 ut aliquip ex ea commodo
 consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse
 cillum dolore eu fugiat ""nulla pariatu"""
"ex ea commodo
 consequat. Duis aute irure ""dolor in"" reprehenderit 
 in voluptate velit esse
 cillum dolore eu fugiat nulla pariatur. 
 Excepteur sint occaecat cupidatat non
 proident, sunt in culpa qui officia deserunt 
 mollit anim id est laborum."

The above represents 2 consecutive rows.

I want to select, as separate groups, all the text contained between every first double quote " (starting a line) and every LAST double quote "

As you can see, though, there are line breaks in the text, along with escaped double quotes "" which are part of the text that I need to select.

I came up with something like this

(?s)(?!")[^\s](.+?)(?=")

but the multiple double quotes are breaking my desired match

I'm a real novice with regex, so I think maybe I'm missing something very basic. Dunno if relevant but I'm using Sublime Text 3 so should be python I think.

What can I do to achieve what I need?

Wiktor Stribiżew

You can use the following regex:

"[^"]*(?:""[^"]*)*"


This regex matches a double-quoted string whose content is made up of non-quote characters and pairs of consecutive double quotes.

How does it work? In a debuggex.com diagram of the pattern, the nodes are numbered 1 to 6; the same numbers appear in the breakdown below.

With the regex, we match:

  • " - (1) - a literal quote
  • [^"]* - (2, 3) - 0 or more characters other than a quote (yes, including a newline, because this is a negated character class); if there are none, the regex moves on towards the final literal quote (6)
  • (?:""[^"]*)* - (4,5) - 0 or more sequences of:
    • "" - (4) - two consecutive double quotation marks (an escaped quote)
    • [^"]* - (5) - 0 or more characters other than a quote
  • " - (6) - the final literal quote.
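
The breakdown above can be tried out with a minimal sketch like the one below. It assumes Python's re module (the pattern itself uses no engine-specific syntax), and the sample string is a shortened, made-up stand-in for the two rows in the question. The names FIELD, sample, rows and cells are illustrative only.

```python
import re

# Pattern from the answer: opening quote, a run of non-quotes, then any
# number of ("" + run of non-quotes) groups, then the closing quote.
FIELD = re.compile(r'"[^"]*(?:""[^"]*)*"')

# Shortened stand-in for the two rows shown in the question (assumption).
sample = (
    '"Lorem ipsum,\n'
    ' Ut ""enim ad"" minim veniam,\n'
    ' fugiat ""nulla pariatu"""\n'
    '"ex ea commodo\n'
    ' aute irure ""dolor in"" reprehenderit\n'
    ' est laborum."\n'
)

# Each match is one whole row, outer quotes included.
rows = FIELD.findall(sample)

# Optional step, not part of the answer: drop the outer quotes and
# collapse "" to " to recover the cell text.
cells = [row[1:-1].replace('""', '"') for row in rows]

for cell in cells:
    print(repr(cell))
```

Note that the match includes the enclosing quotes; if only the inner text is wanted, a capturing group around the middle part, or the slicing shown above, would be one way to get it.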

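This works faster than "(?:[^"]|"")*" (while yielding the same results) because the first pattern is written in unrolled form: the engine consumes each run of non-quote characters in one step instead of re-entering an alternation for every character, so processing is linear and involves much less backtracking.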
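
As a rough check of the "same results" claim, the two patterns can be compared on a small input. This is only a sketch under the assumption that Python's re module is used; the test string is made up, and agreement on one input is not a proof of equivalence.

```python
import re

# Unrolled pattern from the answer.
unrolled = re.compile(r'"[^"]*(?:""[^"]*)*"')
# Alternation-based pattern the answer compares it against.
alternation = re.compile(r'"(?:[^"]|"")*"')

# Made-up input: escaped quotes in the middle, none at all, and at the start.
text = '"a ""b"" c"\n"plain"\n"""quoted"" start"\n'

matches_unrolled = unrolled.findall(text)
matches_alternation = alternation.findall(text)

# True when both patterns return the same list of matches.
same = matches_unrolled == matches_alternation
print(same, len(matches_unrolled))
```

To compare speed as well, the same two compiled patterns could be passed to timeit on a longer input; the outcome will depend on the engine and the data.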
