Optimise processing of for loop?

datanewbie96

I have this basic dataframe:

   dur type   src  dst
0    0  new   543    1
1    0  new    21    1
2    1  old  4828    2
3    0  new   321    1
...
(total 450000 rows)

My aim is to replace the values in src with 1, 2 or 3, depending on which range each value falls in. I wrote the for loop with if/else below:

for i in df['src']:
    if i <= 1000:
        df['src'].replace(to_replace = [i], value = [1], inplace = True)
    elif i <= 2500:
        df['src'].replace(to_replace = [i], value = [2], inplace = True)
    elif i <= 5000:
        df['src'].replace(to_replace = [i], value = [3], inplace = True)
    else:
        print('End!')

The above works as intended, but it is awfully slow on the full dataframe of 450000 rows (it takes more than 30 minutes!).

Is there a more Pythonic way to speed up this algorithm?

sammywemmy

Try numpy.select, for multiple conditions:

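import numpy as np

# Conditions are checked in order and the first match wins,
# so list the thresholds from smallest to largest.
cond1 = df.src.le(1000)
cond2 = df.src.le(2500)
cond3 = df.src.le(5000)

condlist = [cond1, cond2, cond3]
choicelist = [1, 2, 3]
# Rows matching no condition (src > 5000) get np.select's default of 0.
# assign returns a new DataFrame; reassign it to df to keep the result.
df.assign(src=np.select(condlist, choicelist))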

   dur type  src  dst
0    0  new    1    1
1    0  new    1    1
2    1  old    3    2
3    0  new    1    1
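
Since the thresholds here are plain consecutive ranges, pandas.cut might be another option worth trying. The following is only a minimal sketch: the sample frame is rebuilt from the four rows in the question, bin_src is a hypothetical helper name, and mapping values above 5000 to 0 is an assumption chosen to match the numpy.select default.

```python
import numpy as np
import pandas as pd


def bin_src(src: pd.Series) -> pd.Series:
    """Map src to 1, 2 or 3 by range (hypothetical helper, not from the question)."""
    # Bins are right-inclusive by default:
    # (-inf, 1000] -> 0, (1000, 2500] -> 1, (2500, 5000] -> 2
    edges = [-np.inf, 1000, 2500, 5000]
    codes = pd.cut(src, bins=edges, labels=False)
    # Shift the bin indices to 1..3; values above 5000 fall outside
    # every bin and come back as NaN, which is mapped to 0 here (assumption).
    return (codes + 1).fillna(0).astype(int)


# Sample frame rebuilt from the rows shown in the question.
df = pd.DataFrame({
    "dur": [0, 0, 1, 0],
    "type": ["new", "new", "old", "new"],
    "src": [543, 21, 4828, 321],
    "dst": [1, 1, 2, 1],
})

df["src"] = bin_src(df["src"])
print(df)
```

Like numpy.select, this works on the whole column at once instead of calling replace once per row, which is where the original loop spends its time.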
