I am trying to apply a function on multiple columns and in turn create multiple columns to count the length of each entry.
Basically I have 5 columns with indexes 5,7,9,13 and 15 and each entry in those columns is a string of the form 'WrappedArray(|2008-11-12, |2008-11-12)'
and in my function I try to strip the wrappedArray part and split the two values and count the (length - 1)
using the following;
def updates(row,num_col):
strp = row[num_col.strip('WrappedAway')
lis = list(strp.split(','))
return len(lis) - 1
where num_col is the index of the column and cal take the value 5,7,9,13,15. I have done this but only for 1 column:
fn = lambda row: updates(row,5)
col = df.apply(fn, axis=1)
df = df.assign(**{'count1':col.values})
I basically want to apply this function to ALL the columns (not just 5 as above) with the indexes mentioned and then create a separate column associated with columns 5,7,9,13 and 15 all in short code instead of doing that separately for each value.
I hope I made sense.
In regards to finding the amount of elements in the list, looks like you could simply use str.count()
to find the amount of ','
in the strings. And in order to apply a defined function to a set of columns you could do something like:
cols = [5,7,9,13,15]
for col in cols:
col_counts = {'{}_count'.format(col): df.iloc[:,col].apply(lambda x: x.count(','))}
df = df.assign(**col_counts)
Alternatively you can also usestrip('WrappedAway').split(',')
as you where using:
def count_elements(x):
return len(x.strip('WrappedAway').split(',')) - 1
for col in cols:
col_counts = {'{}_count'.format(col):
df.iloc[:,col].apply(count_elements)}
df = df.assign(**col_counts)
So for example with the following dataframe:
df = pd.DataFrame({'A': ['WrappedArray(|2008-11-12, |2008-11-12, |2008-10-11)', 'WrappedArray(|2008-11-12, |2008-11-12)'],
'B': ['WrappedArray(|2008-11-12,|2008-11-12)', 'WrappedArray(|2008-11-12, |2008-11-12)'],
'C': ['WrappedArray(|2008-11-12|2008-11-12)', 'WrappedArray(|2008-11-12|2008-11-12)']})
Redefining the set of columns on which we want to count the amount of elements:
for col in [0,1,2]:
col_counts = {'{}_count'.format(col):
df.iloc[:,col].apply(count_elements)}
df = df.assign(**col_counts)
Would yield:
A \
0 WrappedArray(|2008-11-12, |2008-11-12, |2008-1...
1 WrappedArray(|2008-11-12, |2008-11-12)
B \
0 WrappedArray(|2008-11-12,|2008-11-12)
1 WrappedArray(|2008-11-12, |2008-11-12)
C 0_count 1_count 2_count
0 WrappedArray(|2008-11-12|2008-11-12) 2 1 0
1 WrappedArray(|2008-11-12|2008-11-12) 1 1 0
Collected from the Internet
Please contact [email protected] to delete if infringement.
Comments