PHP/MySQL: keep only words and punctuation, without tabs, new lines, etc.

user2986055

I have a database full of titles and descriptions from RSS feed items that come from different sources and are written in different languages.

This question is not about white spaces, but about keeping words and punctuation.

I'm trying to keep ONLY words WITH punctuation such as ' " , . ; ( ) ! ? and to remove tabs, double spaces, new lines, etc.

I have a partially working solution, but in my database I still see paragraph breaks and empty lines. I also strip tags because I want to keep only the text.

$onlywords = strip_tags(html_entity_decode($insUrlsOk['rss_summary'])); // html_entity_decode because sometimes it's &lt; instead of <
$onlywords = trim($onlywords); // works only partially: I still have paragraph breaks and empty lines
$onlywords = preg_replace('/[^\w\s]+/u',' ',$onlywords); // keeps ONLY words from any language, but also removes punctuation
$onlywords = str_replace('  ',' ',$onlywords); // replaces double spaces with a single space

I think my preg pattern '/[^\w\s]+/u' needs to be a bit more refined...

I'm also open to other solutions, as long as they are short, stay within a few lines of code, and need no extra plugins installed on the server.

Thanks.

Barmar

trim() only removes whitespace at the beginning and end of the string, so it won't get rid of paragraphs.
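
To illustrate the point about trim(), here is a minimal sketch; the sample string is made up for the demonstration and is not taken from the question's data:

```php
<?php
// Hypothetical sample text with an empty line between two paragraphs.
$text = "\n  First paragraph.\n\nSecond paragraph.\t \n";

$trimmed = trim($text);

// Leading and trailing whitespace is gone...
var_dump($trimmed === "First paragraph.\n\nSecond paragraph."); // bool(true)

// ...but the empty line in the middle is still there.
var_dump(strpos($trimmed, "\n\n") !== false); // bool(true)
```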

Newlines and tabs are included in \s, so the preg_replace() keeps them. Use preg_replace instead of str_replace to turn all sequences of whitespace into a single space:

$onlywords = preg_replace('/\s+/u', ' ', $onlywords); // \s+ rather than \s{2,}, so a lone tab or newline is replaced too
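
Putting the pieces together, one possible way to keep punctuation as well is sketched below. This goes beyond what the answer states: clean_text() is a hypothetical helper name, and the choice of Unicode classes (\p{L}, \p{M}, \p{N}, \p{P}) is an assumption about what "words with punctuation" should cover. It uses \s+ so that a single tab or newline is also turned into a space.

```php
<?php
// clean_text() is a hypothetical helper, not part of PHP or of the original answer.
function clean_text(string $raw): string
{
    // Decode entities first so that &lt;p&gt; becomes <p> and can be stripped.
    $text = strip_tags(html_entity_decode($raw, ENT_QUOTES | ENT_HTML5, 'UTF-8'));

    // Assumption: keep Unicode letters (\p{L}), combining marks (\p{M}),
    // digits (\p{N}), punctuation (\p{P}) and whitespace; replace the rest with a space.
    $text = preg_replace('/[^\p{L}\p{M}\p{N}\p{P}\s]+/u', ' ', $text);

    // Collapse every run of whitespace (tabs, newlines, repeated spaces) into one space.
    $text = preg_replace('/\s+/u', ' ', $text);

    // Remove whatever whitespace is left at the two ends.
    return trim($text);
}

echo clean_text("<p>Hello,\n\n\tworld!</p>   It&#39;s (almost) done; right?");
// prints: Hello, world! It's (almost) done; right?
```

Two caveats with this sketch: characters such as $ + = < > are Unicode symbols (\p{S}), not punctuation, so they are removed unless \p{S} is added to the class; and with the /u modifier preg_replace() returns null when the input is not valid UTF-8, so feeds in other encodings would need converting first.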
