Removing rows with given criteria

gervith

I am a beginner with both Python and pandas and I came across an issue I can't handle on my own. What I am trying to do is: 1) remove all the columns except the three that I am interested in; 2) remove all rows which contain any of several strings in the column "asset number". And here is the difficult part: I removed all the blanks, but I can't remove the other ones because nothing happens (for example with the string "TECHNOLOGIES": I tried both part of the word and the whole word, and neither works).

Here is the code:

import modin.pandas as pd


File1 = 'abi.xlsx'

df = pd.read_excel(File1, sheet_name = 'US JERL Dec-19')


df = df[['asset number','Cost','accumulated depr']] #removing other columns


df = df.dropna(how='any') #removing rows with any missing value

df = df[~df['asset number'].str.contains("TECHNOLOGIES, INC", na=False)]

df.to_excel("abi_output.xlsx")

Besides that, the file has 600k rows and it loads very slowly when I want to see the output. Do you have any advice for that?

Thank you!

@Kenan - thank you for your answer. Now the code looks like below, but it still doesn't remove the rows that contain the specified strings in the chosen column. I also attached a screenshot of the output to show you that the rows still exist. Any thoughts?

import modin.pandas as pd


File1 = 'abi.xlsx'

df = pd.read_excel(File1, sheet_name = 'US JERL Dec-19', usecols=['asset number','Cost','accumulated depr'])


several_strings = ['', 'TECHNOLOGIES', 'COST CENTER', 'Account', '/16']

df = df[~df['asset number'].isin(several_strings)]


df.to_excel("abi_output.xlsx")

The rows are still not deleted.

@Andy I attach a sample of the input file. I just changed the numbers in two columns because these are confidential, and removed the columns that are not needed (removing them with code wasn't a problem).

Here is the link. Let me know if it is not working properly.

Kenan

You can combine your first two steps with:

df = pd.read_excel(File1, sheet_name = 'US JERL Dec-19', usecols=['asset number','Cost','accumulated depr'])

I assume this is what you're trying to remove:

several_strings = ['TECHNOLOGIES, INC','blah','blah']

df = df[~df['asset number'].isin(several_strings)]
df.to_excel("abi_output.xlsx")
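One thing worth keeping in mind: isin() compares whole cell values, so it only drops a row when the cell is exactly equal to one of the listed strings. If the strings are only part of the cell text, a substring match may be closer to what is wanted. Below is a minimal sketch of that idea, assuming plain pandas instead of modin and using made-up sample rows (the data and the names pattern, mask and filtered are hypothetical, not taken from the original file). The empty string is left out of the list on purpose, because an empty alternative in a regex matches every row; blanks are handled with notna() instead.

```python
import re

import pandas as pd

# Hypothetical sample data standing in for the 'asset number' column.
df = pd.DataFrame({
    "asset number": ["1000001", "ACME TECHNOLOGIES, INC", "COST CENTER 42",
                     "1000002", "Account total", "12/16", None],
    "Cost": [10.0, 0.0, 0.0, 20.0, 0.0, 0.0, 0.0],
})

# No '' here: an empty pattern would match every row.
several_strings = ["TECHNOLOGIES", "COST CENTER", "Account", "/16"]

# isin() needs the whole cell to equal a listed string, so nothing is removed.
exact = df[~df["asset number"].isin(several_strings)]

# Join the strings into one regex alternation; re.escape keeps them literal.
pattern = "|".join(re.escape(s) for s in several_strings)

col = df["asset number"]
# Keep rows that are not blank and do not contain any of the strings.
mask = col.notna() & ~col.str.contains(pattern, regex=True, na=False)
filtered = df[mask]

print(filtered)
```

With this sample only the two rows holding plain asset numbers remain in filtered, while exact still has all seven rows.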

Update: based on the link you provided, this might be a better approach:

df = df[df['asset number'].str.len().eq(7)]
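To illustrate what the length filter does, here is a small sketch on made-up data, assuming plain pandas and assuming that valid asset numbers are always 7 characters long (that is inferred from the line above, not confirmed by the file). The stricter digits-only variant is an extra assumption of mine and only applies if asset numbers are purely numeric.

```python
import pandas as pd

# Hypothetical sample rows; real asset numbers are assumed to be 7 characters.
df = pd.DataFrame({
    "asset number": ["1000001", "ACME TECHNOLOGIES, INC", "", "1000002", None],
})

# str.len() returns NaN for missing cells, and NaN == 7 is False,
# so blank and missing rows are dropped together with the text rows.
kept = df[df["asset number"].str.len().eq(7)]

# Stricter variant (assumption: asset numbers consist of exactly 7 digits).
kept_digits = df[df["asset number"].str.fullmatch(r"\d{7}", na=False)]

print(kept)
```

The length check also removes blanks, so a separate dropna() on this column is not needed when it is used.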
