concatenate values in dataframe if a column has specific values and None or Null values

hSin

I have a dataframe with name+address/email information based on the type. Based on a type I want to concat name+address or name+email into a new column (concat_name) within the dataframe. Some of the types are null and are causing ambiguity errors. Identifying the nulls correctly in place is where I'm having trouble.

NULL = None
data = {
    'Type': [NULL, 'MasterCard', 'Visa','Amex'],
    'Name': ['Chris','John','Jill','Mary'],
    'City': ['Tustin','Cleveland',NULL,NULL ],
    'Email': [NULL,NULL,'[email protected]','[email protected]']
}

df_data = pd.DataFrame(data)
#Expected resulting df column:
df_data['concat_name'] = ['ChrisTustin', 'JohnCleveland','[email protected],'[email protected]']

Attempt one using booleans

if df_data['Type'].isnull() | df_data[df_data['Type'] == 'Mastercard':
   df_data['concat_name'] = df_data['Name']+df_data['City']
if df_data[df_data['Type'] == 'Visa' | df_data[df_data['Type'] == 'Amex':
   df_data['concat_name'] = df_data['Name']+df_data['Email']
else:
   df_data['concat_name'] = 'Error'

Error

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Attempt two using np.where

df_data['concat_name'] = np.where((df_data['Type'].isna()|(df_data['Type']=='MasterCard'),df_data['Name']+df_data['City'],
np.where((df_data['Type']=="Visa")|(df_data['Type]=="Amex"),df_data['Name']+df_data['Email'], 'Error'

Error

ValueError: Length of values(2) does not match length of index(12000)
Ingwersen_erik

Does the following code solve your use case?

# == Imports needed ===========================
import pandas as pd
import numpy as np


# == Example Dataframe =========================
df_data = pd.DataFrame(
    {
        "Type": [None, "MasterCard", "Visa", "Amex"],
        "Name": ["Chris", "John", "Jill", "Mary"],
        "City": ["Tustin", "Cleveland", None, None],
        "Email": [None, None, "[email protected]", "[email protected]"],
        # Expected output:
        "concat_name": [
            "ChrisTustin",
            "JohnCleveland",
            "[email protected]",
            "[email protected]",
        ],
    }
)

# == Solution Implementation ====================
df_data["concat_name2"] = np.where(
    (df_data["Type"].isin(["MasterCard", pd.NA, None])),
    df_data["Name"].astype(str).replace("None", "")
    + df_data["City"].astype(str).replace("None", ""),
    np.where(
        (df_data["Type"].isin(["Visa", "Amex"])),
        df_data["Name"].astype(str).replace("None", "")
        + df_data["Email"].astype(str).replace("None", ""),
        "Error",
    ),
)
# == Expected Output ============================
print(df_data)
# Prints:
#          Type   Name       City           Email         concat_name          concat_name2
# 0        None  Chris     Tustin            None         ChrisTustin           ChrisTustin
# 1  MasterCard   John  Cleveland            None       JohnCleveland         JohnCleveland
# 2        Visa   Jill       None  [email protected]  [email protected]    [email protected]
# 3        Amex   Mary       None    [email protected]    [email protected]      [email protected]

Notes

You might also consider simplifying the problem, by replacing the first condition (Type == 'MasterCard' or None) with the opposite of your second condition (Type == 'Visa' or 'Amex'):

df_data["concat_name2"] = np.where(
    (~df_data["Type"].isin(["Visa", "Amex"])),
    df_data["Name"].astype(str).replace("None", "")
    + df_data["City"].astype(str).replace("None", ""),
    df_data["Name"].astype(str).replace("None", "")
    + df_data["Email"].astype(str).replace("None", "")
)

Additionally, if you are dealing with messy data, you can also improve the implementation by converting the Type column to lowercase, or uppercase. This makes your code also account for cases where you have values like "mastercard", or "Mastercard", etc.:

df_data["concat_name2"] = np.where(
    (df_data["Type"].astype(str).str.lower().isin(["mastercard", pd.NA, None, "none"])),
    df_data["Name"].astype(str).replace("None", "")
    + df_data["City"].astype(str).replace("None", ""),
    np.where(
        (df_data["Type"].astype(str).str.lower().isin(["visa", "amex"])),
        df_data["Name"].astype(str).replace("None", "")
        + df_data["Email"].astype(str).replace("None", ""),
        "Error",
    ),
)

Collected from the Internet

Please contact [email protected] to delete if infringement.

edited at
0

Comments

0 comments
Login to comment

Related

Concatenate column values in Pandas DataFrame with "NaN" values

Concatenate two columns of spark dataframe with null values

Concatenate column values in Pandas DataFrame replacing “NaN” values with comma

Pandas dataframe compare values == none / nothing / null

Convert dataframe values into str where is not null (None)

Delete rows if there are null values in a specific column in Pandas dataframe

How to normalise a column of a dataframe, ignoring None values in it?

aggregate column values into list ignoring the None values in pandas dataframe

Filling up "None" values in pandas dataframe with values from previous column

concatenate column values in a loop

Concatenate column values

Replacing values in DataFrame with None

Concatenate the column values based on specific patterns in the column in Postgres

How do you concatenate multiple columns in a DataFrame into a another column when some values are null?

Copy an array if a specific column has a numeric values

Add specific column values based on other Dataframe

Seperate values in specific dataframe column to some row

Repeating values in a specific pattern in pandas dataframe column

Replace specific values in a dataframe column using Pandas

Replace specific column values in pandas dataframe

How to change specific values of a column in a dataframe?

pandas dataframe filter by sequence of values in a specific column

Count values in dataframe column that include specific parameters

Convert values to dummies on threshold in dataframe or specific column

Count rows in a dataframe column if not some specific values

Divide all columns in a dataframe with values of specific column

Add column to dataframe depending on specific row values

duplicate specific rows of a dataframe based on column values

Replace specific values in a dataframe by column mean in pandas