Create a new DataFrame column based on values from multiple other columns using the apply function

Varun Vishnoi

I am using the apply function to create a new column, ERROR_TV_TIC, in a DataFrame based on the values of the existing columns TV_TIC and ERRORS. I am not sure what I am doing wrong: with some conditions it works, and with others it throws an error.

DataFrame:

ERRORS|TV_TIC
|2.02101E+41
['Length of Underlying Symbol for Option Contract is exceeding allowed limits(10 chars)']|nan
['Future Option Indicator is missing']|nan
['Trade Id is missing', 'Future Option Indicator is missing']|nan
['Trade Id is missing', 'Future Option Indicator is missing']|nan

Code when it works:

import numpy as np
import pandas as pd

def validate_tv_tic(trades):
    tv_tiv_errors = list()
    if pd.isnull(trades['TV_TIC']):
        tv_tiv_errors.append("Initial validations passed still TV_TIC missing")
    if pd.notnull(trades['TV_TIC']) and len(trades['TV_TIC']) != 42:
        tv_tiv_errors.append("Initial validations passed and TV_TIC is also generated but length is != 42 chars")
    return tv_tiv_errors if len(tv_tiv_errors) > 0 else np.nan

trades['ERROR_TV_TIC'] = trades.apply(validate_tv_tic, axis=1)

Code when it doesn't work. The condition now uses two columns of the row, and I made sure to use "&" rather than "and":

def validate_tv_tic(trades):
    tv_tiv_errors = list()
    if pd.isnull(trades['ERRORS']) & pd.isnull(trades['TV_TIC']):
        tv_tiv_errors.append("Initial validations passed still TV_TIC missing")
    if pd.isnull(trades['ERRORS']) & pd.notnull(trades['TV_TIC']) & len(trades['TV_TIC']) != 42:
        tv_tiv_errors.append("Initial validations passed and TV_TIC is also generated but length is != 42 chars")
    return tv_tiv_errors if len(tv_tiv_errors) > 0 else np.nan

trades['ERROR_TV_TIC'] = trades.apply(validate_tv_tic, axis=1)

Error I am getting: ('The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()', 'occurred at index 3')

My gut feeling is that pd.isnull is causing the problem somewhere, but I am not sure.

Varun Vishnoi

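There was no problem with the code. The problem was with the data inside the DataFrame.

The ERRORS column held lists of strings, and the error was thrown whenever a list contained more than one item. Applied to a list, pd.isnull checks each element and returns an array, and an array with more than one element has no single truth value. That is why I was getting the error for rows 3 and 4 below: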
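To make the root cause concrete, here is a minimal sketch of what pd.isnull returns for list values like the ones in ERRORS. The helper truth_value_is_ambiguous is a hypothetical name introduced only for this illustration; it is not part of pandas.

```python
import numpy as np
import pandas as pd

single = ['Future Option Indicator is missing']
double = ['Trade Id is missing', 'Future Option Indicator is missing']

# On a list, pd.isnull works element by element and returns a NumPy array
# instead of a single True/False.
single_mask = pd.isnull(single)  # array with one element
double_mask = pd.isnull(double)  # array with two elements


def truth_value_is_ambiguous(mask):
    """Hypothetical helper: True if using `mask` in an `if` raises ValueError."""
    try:
        bool(mask)
    except ValueError:
        return True
    return False


print(type(double_mask), double_mask.shape)
print(truth_value_is_ambiguous(double_mask))
```

A one-element array can still be reduced to a single boolean, which would explain why the rows with a single-item list did not fail while the two-item rows did.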

ERRORS

['Length of Underlying Symbol for Option Contract is exceeding allowed limits(10 chars)']
['Future Option Indicator is missing']
['Trade Id is missing', 'Future Option Indicator is missing']
['Trade Id is missing', 'Future Option Indicator is missing']

After finding the root cause, I changed each list to a single string whose elements are joined by a separator other than a comma, and that works for me.

I changed the return statement of the function validate_tv_tic from

return tv_tiv_errors if len(tv_tiv_errors) > 0 else np.nan

to

return ' & '.join(tv_tiv_errors) if len(tv_tiv_errors) > 0 else np.nan

and this created my dataframe column ERRORS as below:

ERRORS

Length of Underlying Symbol for Option Contract is exceeding allowed limits(10 chars)
Future Option Indicator is missing
Trade Id is missing & Future Option Indicator is missing
Trade Id is missing & Future Option Indicator is missing
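Putting it together, the following is a rough sketch rather than my exact production code. It assumes TV_TIC is stored as a string, uses made-up sample rows, and introduces a hypothetical helper join_errors to convert an existing list-valued ERRORS column into joined strings before the row-wise check runs.

```python
import numpy as np
import pandas as pd


def join_errors(value, sep=' & '):
    """Hypothetical helper: turn a list of messages into one string."""
    if isinstance(value, list):
        return sep.join(value) if value else np.nan
    return value  # already a string or a missing value


def validate_tv_tic(row):
    tv_tic_errors = []
    # Each cell is now a scalar (string or NaN), so pd.isnull returns a
    # plain boolean and the ordinary `and` operator is safe to use.
    no_initial_errors = pd.isnull(row['ERRORS'])
    if no_initial_errors and pd.isnull(row['TV_TIC']):
        tv_tic_errors.append("Initial validations passed still TV_TIC missing")
    # Assumption: TV_TIC is a string, so len() is valid when it is present.
    if (no_initial_errors and pd.notnull(row['TV_TIC'])
            and len(row['TV_TIC']) != 42):
        tv_tic_errors.append("Initial validations passed and TV_TIC is also "
                             "generated but length is != 42 chars")
    return ' & '.join(tv_tic_errors) if tv_tic_errors else np.nan


# Made-up sample data for illustration only.
trades = pd.DataFrame({
    'ERRORS': [np.nan,
               ['Trade Id is missing', 'Future Option Indicator is missing'],
               np.nan,
               np.nan],
    'TV_TIC': ['2' * 42, np.nan, np.nan, 'ABC'],
})

trades['ERRORS'] = trades['ERRORS'].apply(join_errors)
trades['ERROR_TV_TIC'] = trades.apply(validate_tv_tic, axis=1)
print(trades)
```

Note that if "&" is kept instead of "and", the length comparison needs its own parentheses, because "&" binds more tightly than "!=" in Python.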
