Removing rows that match given criteria


I am a beginner with both Python and pandas, and I came across an issue I can't handle on my own. What I am trying to do is: 1) remove all the columns except the three I am interested in, and 2) remove all rows that contain any of several strings in the column "asset number". Here is the difficult part: I removed all the blank rows, but I can't remove the other ones because nothing happens (for example, with the string "TECHNOLOGIES" I tried both part of the word and the whole word, and neither works).

Here is the code:

import modin.pandas as pd

File1 = 'abi.xlsx'

df = pd.read_excel(File1, sheet_name = 'US JERL Dec-19')

df = df[['asset number','Cost','accumulated depr']] #removing other columns

df = df.dropna(how='any') #removing rows with any missing value

df = df[~df['asset number'].str.contains("TECHNOLOGIES, INC", na=False)]


Besides that, the file has 600k rows and it loads very slowly before I can see the output. Do you have any advice for that?

Thank you!

@Kenan - thank you for your answer. Now the code looks like the one below, but it still doesn't remove the rows that contain the specified strings in the chosen column. I also attached a screenshot of the output to show you that the rows still exist. Any thoughts?

import modin.pandas as pd

File1 = 'abi.xlsx'

df = pd.read_excel(File1, sheet_name = 'US JERL Dec-19', usecols=['asset number','Cost','accumulated depr'])

several_strings = ['', 'TECHNOLOGIES', 'COST CENTER', 'Account', '/16']

df = df[~df['asset number'].isin(several_strings)]


rows still are not deleted

@Andy - I am attaching a sample of the input file. I only changed the numbers in two columns because they are confidential, and removed the columns that are not needed (removing them with code wasn't a problem).

Here is the link. Let me know if it is not working properly.


You can combine your first two steps with:

df = pd.read_excel(File1, sheet_name = 'US JERL Dec-19', usecols=['asset number','Cost','accumulated depr'])

I assume this is what you're trying to remove:

several_strings = ['TECHNOLOGIES, INC','blah','blah']

df = df[~df['asset number'].isin(several_strings)]
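One thing worth keeping in mind with this approach: isin() compares whole cell values, so a cell such as "ABC TECHNOLOGIES, INC" is not matched by the fragment "TECHNOLOGIES". Below is a minimal sketch of substring-based filtering with str.contains instead. The sample DataFrame and the name unwanted are made up for illustration (the real data comes from the Excel sheet), and plain pandas is used here rather than modin.

```python
import re
import pandas as pd

# Hypothetical sample data standing in for the real Excel sheet.
df = pd.DataFrame({
    "asset number": ["1000001", "ABC TECHNOLOGIES, INC", "COST CENTER 42", "1000002", None],
    "Cost": [10.0, None, None, 20.0, None],
})

# Fragments to look for (taken from the question, minus the empty string).
unwanted = ["TECHNOLOGIES", "COST CENTER", "Account", "/16"]

# isin() only matches cells that are exactly equal to one of the values,
# so none of the rows above are removed.
exact = df[~df["asset number"].isin(unwanted)]

# Join the fragments into one regex alternation; re.escape keeps any
# special characters in the fragments literal.
pattern = "|".join(re.escape(s) for s in unwanted)
mask = df["asset number"].str.contains(pattern, na=False)

# Drop the matching rows, then drop rows where the asset number is missing.
substring = df[~mask].dropna(subset=["asset number"])

print(len(exact))
print(substring["asset number"].tolist())
```

The empty string is left out of the pattern on purpose: every string contains the empty string, so including it would remove every row. Blank cells are handled by dropna instead.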

Update: Based on the link you provided, this might be a better approach. It keeps only the rows whose asset number is exactly 7 characters long:

df = df[df['asset number'].str.len().eq(7)]
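To illustrate how the length filter behaves, here is a small sketch on made-up data (the values below are hypothetical, not taken from the linked file). It also shows one caveat: any text that happens to be 7 characters long, such as "Account", passes the length check. The stricter variant assumes that real asset numbers consist of exactly 7 digits, which is an assumption about the data and not something confirmed in the question.

```python
import pandas as pd

# Hypothetical sample data standing in for the real Excel sheet.
df = pd.DataFrame({
    "asset number": ["1000001", "ABC TECHNOLOGIES, INC", "Account", "12/16", "1000002", None],
})

# str.len() gives NaN for missing values, and NaN.eq(7) is False,
# so blank rows are dropped along with the long header-like rows.
lengths = df["asset number"].str.len()
kept = df[lengths.eq(7)]

# Stricter variant (assumes asset numbers are exactly 7 digits):
# fullmatch requires the whole cell to match the pattern.
is_asset = df["asset number"].str.fullmatch(r"\d{7}", na=False)
strict = df[is_asset]

print(kept["asset number"].tolist())
print(strict["asset number"].tolist())
```

If the column was read from Excel as numbers rather than text, the .str accessor will not work directly; converting the column to strings first is one possible workaround, depending on how the cells are formatted.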

