Concatenate the column values based on specific patterns in the column in Postgres

rshar

I have following columns in Postgres table

row_id  seq         symbol
37697   QQQQQQQQQQ  p.Q296[65]
37698   QQQQQQQ     p.Q296[65]
37760   HNHH        p.Q296[65]
37761   HNHH        p.Q296[65]
47791   QQQ         p.Q296[65]
47792   QQQ         p.Q296[65]
47793   HNHH        p.Q296[65]
47794   HNHH        p.Q296[65]
47795   NHRFCDS     p.Q296

I want to concatenate extracted and generated strings (p.Q296, Q297, ins) from 'symbol' and 'seq' if the seq starts with 'Q' and symbol has a pattern matching the value in column. Subsequently, drop the rows where the above pattern is missing (not able to figure out how to implement that).

The below query is unable to catch the pattern and not concatenating the strings.

The desired output is:

row_id  seq         new_col
37697   QQQQQQQQQQ  p.Q296_Q297insQQQQQQQQQQ
37698   QQQQQQQ     p.Q296_Q297insQQQQQQQ
47791   QQQ         p.Q296_Q297insQQQ
47792   QQQ         p.Q296_Q297insQQQ
47795   NHRFCDS     p.Q296                                                      

I am using following query for this.


SELECT              
   row_id,
   seq,                 

           CASE 

                WHEN symbol ~ '(^p..\d+)\[\d+]$'  AND  seq ~ '^Q.*')   
                THEN (substring(symbol, '(^p..\d+)\[\d+]$')) || '_' || ((substring(symbol, '^p..(\d+)\[\d+]$'))::INT)+1 ||  'ins' || seq

                WHEN symbol ~ '(^p..\d+)\[\d+]$'  AND  seq !~ '^Q.*')   
                THEN DROP ....

                ELSE symbol

                END AS new_col



                FROM table;

Explanation of regexp:

'(^p..\d+)\[\d+]$') --> extract  p.Q296
_  --> used for concatenation
^p..(\d+)\[\d+]$'))::INT)+1  --> extract the value after Q and add 1 to it
ins --> concatenate string 'ins'
seq --> concatenate value from column 'seq' 

Thanks

Hambone

Try this:

select
  row_id, seq,
  case
    when seq like 'Q%' and symbol ~ '^p..\d+\[\d+\]$' then
      concat(
        regexp_replace(symbol, '\[.+', ''),
        '_',
        substring (symbol, 3, 1),
        cast (substring (symbol from 'p..(\d+)') as integer) + 1,
        'ins',
        seq
      )
    else symbol
  end as new_col
from foo
where
  seq like 'Q%' or symbol !~ '^p..\d+\[\d+\]$'

The "drop rows" can simply be invoked by putting your conditions in the where clause.

I ran it against your sample data and got this:

37697   QQQQQQQQQQ  p.Q296_Q297insQQQQQQQQQQ
37698   QQQQQQQ     p.Q296_Q297insQQQQQQQ
47791   QQQ         p.Q296_Q297insQQQ
47792   QQQ         p.Q296_Q297insQQQ
47795   NHRFCDS     p.Q296

Collected from the Internet

Please contact [email protected] to delete if infringement.

edited at
0

Comments

0 comments
Login to comment

Related

Concatenate special characters to the column values based on pattern matching in Postgres

Concatenate rows of a column based on values of another column

Concatenate column values based on type result

Concatenate pandas column values based on common index

Concatenate rows based on multiple column values

Concatenate values in column 2 based on values in column 1 using awk

Concatenate column values of multiple rows based on another column value

concatenate values in dataframe if a column has specific values and None or Null values

Concatenate data in specific column

concatenate column values in a loop

Concatenate column values

Display missing values of specific column based on another specific column

Updating values in a specific column based on values in another column

Split column values based on delimiter and matches with another column in postgres

Lexicographical sorting of a Postgres table column based on values of another column

Concatenate data frames together based on similar column values

Concatenate column names based on row-wise values

How to concatenate multiple column values based on a matrix in R

How to concatenate row values to a new column based on other columns?

merge or concatenate map/array values to a new column based on condition

Add specific column values based on other Dataframe

Changing specific column values based on conditions in R

dplyr filter for a specific values based on column names

Sum of all rows based on specific column values

splitting file based on values in specific column

Remove duplicates based on specific column values

extract multi columns based on values of specific column

Create column based on specific values in other columns

select rows based on specific column values

TOP Ranking

HotTag

Archive