How to properly apply a lambda function to a pandas data frame column

Amani :

I have a pandas data frame, sample, with a column called PR, to which I am applying a lambda function as follows:

sample['PR'] = sample['PR'].apply(lambda x: NaN if x < 90)

I then get the following syntax error message:

sample['PR'] = sample['PR'].apply(lambda x: NaN if x < 90)
                                                         ^
SyntaxError: invalid syntax

What am I doing wrong?

jezrael :

You need mask:

sample['PR'] = sample['PR'].mask(sample['PR'] < 90, np.nan)

Another solution with loc and boolean indexing:

sample.loc[sample['PR'] < 90, 'PR'] = np.nan

Sample:

import pandas as pd
import numpy as np

sample = pd.DataFrame({'PR':[10,100,40] })
print (sample)
    PR
0   10
1  100
2   40

sample['PR'] = sample['PR'].mask(sample['PR'] < 90, np.nan)
print (sample)
      PR
0    NaN
1  100.0
2    NaN

sample.loc[sample['PR'] < 90, 'PR'] = np.nan
print (sample)
      PR
0    NaN
1  100.0
2    NaN

EDIT:

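Solution with apply (a conditional expression inside a lambda needs an else branch, which is why the original line raised a SyntaxError):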

sample['PR'] = sample['PR'].apply(lambda x: np.nan if x < 90 else x)
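As a quick check, here is a minimal sketch that runs the apply version next to the mask version on the same three-row sample used above. The names via_apply and via_mask are just illustrative; they are not part of the original question or answer.

```python
import numpy as np
import pandas as pd

sample = pd.DataFrame({'PR': [10, 100, 40]})

# A conditional expression always has both branches: `A if cond else B`.
# Leaving out the `else x` part is what triggers the SyntaxError.
via_apply = sample['PR'].apply(lambda x: np.nan if x < 90 else x)

# Vectorized equivalent: replace values where the condition is True.
via_mask = sample['PR'].mask(sample['PR'] < 90, np.nan)

print(via_apply)
print(via_mask)
```

Both Series should have NaN in the rows where PR was below 90 and keep the value 100 in the remaining row.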

Timings for len(sample) = 300k:

sample = pd.concat([sample]*100000).reset_index(drop=True)

In [853]: %timeit sample['PR'].apply(lambda x: np.nan if x < 90 else x)
10 loops, best of 3: 102 ms per loop

In [854]: %timeit sample['PR'].mask(sample['PR'] < 90, np.nan)
The slowest run took 4.28 times longer than the fastest. This could mean that an intermediate result is being cached.
100 loops, best of 3: 3.71 ms per loop
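If you are not in IPython, the comparison can be roughly reproduced with the standard timeit module. This is only a sketch: the helper names with_apply and with_mask are made up for illustration, and the absolute numbers depend on your machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Same 300k-row frame as in the timings above.
sample = pd.DataFrame({'PR': [10, 100, 40]})
sample = pd.concat([sample] * 100000).reset_index(drop=True)


def with_apply():
    # Calls the Python lambda once per element.
    return sample['PR'].apply(lambda x: np.nan if x < 90 else x)


def with_mask():
    # Evaluates the condition and the replacement on whole arrays.
    return sample['PR'].mask(sample['PR'] < 90, np.nan)


# Best of 3 single runs, reported in milliseconds.
t_apply = min(timeit.repeat(with_apply, number=1, repeat=3))
t_mask = min(timeit.repeat(with_mask, number=1, repeat=3))
print(f"apply: {t_apply * 1e3:.1f} ms, mask: {t_mask * 1e3:.1f} ms")
```

The gap comes from apply invoking a Python function for every row, while mask works on the underlying arrays in one pass.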
