How to select values with a condition on multiple columns and multiple rows in pandas (best practice)

guitarokh

I want to select (unique) values from one column in a pandas data frame based on conditions on multiple columns and multiple rows. Consider the following example data frame:

import pandas as pd

df = pd.DataFrame({'Developer': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
                   'Language': ['Java', 'Python', 'Python', 'Java', 'Python', 'Python', 'Java', 'Python', 'C++'],
                   'Skill_Level': [1, 3, 3, 3, 2, 3, 3, 1, 3],
                   'Version': ["x.x", "2.x", "3.x", "x.x", "2.x", "3.x", "x.x", "3.x", "x.x"]
                   })

    Developer    Language    Skill_Level    Version
0           A        Java              1        x.x
1           A      Python              3        2.x
2           A      Python              3        3.x
3           B        Java              3        x.x
4           B      Python              2        2.x
5           B      Python              3        3.x
6           C        Java              3        x.x
7           C      Python              1        3.x
8           C         C++              3        x.x

Now I want to find all developers who know Java with a skill level of at least 3 and also know Python (no matter the version) with a skill level of at least 2.

The way I solved it for now was by selecting one set based on the Java condition, another set based on the Python condition and then doing an inner merge to get the set of developers matching all conditions:

result_java_df = df[(df["Language"] == "Java") & (df["Skill_Level"] >= 3)][["Developer"]]
result_python_df = df[(df["Language"] == "Python") & (df["Skill_Level"] >= 2)][["Developer"]]
result_df = result_java_df.merge(result_python_df, on="Developer")
result_df = result_df.drop_duplicates()

    Developer
0   B

Is there a more "elegant" way to do this? I feel like I am overlooking something. With more row-based conditions (e.g. selecting developers who know 4 languages at certain skill levels), this approach becomes quite convoluted and would justify writing a function to handle such selections. Hence I am wondering whether pandas supports this directly and I just didn't find the feature.

Acccumulation

When I ran

    qualified = df.groupby("Developer").apply(
        lambda x: any((x.Language == "Java") & (x.Skill_Level >= 3))
        & any((x.Language == "Python") & (x.Skill_Level >= 2))
    )

I got

Developer
A    False
B     True
C    False
dtype: bool

You can then subset with various methods, such as

[developer for developer, status in qualified.items() if status]

(returns a list)

qualified[qualified]

(returns a Series)
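If you only need the names, a minimal sketch is to take the index of the filtered Series. The `qualified` Series is rebuilt by hand here (an assumption, so the snippet runs on its own); in practice it would come from the `groupby` call above.

```python
import pandas as pd

# Boolean Series as produced by the groupby above, rebuilt by hand
# so that this snippet runs standalone.
qualified = pd.Series(
    [False, True, False],
    index=pd.Index(["A", "B", "C"], name="Developer"),
)

# Keep only the True entries, then read the developer names off the index.
names = qualified[qualified].index.tolist()
print(names)  # ['B']
```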

If you want to make it more general, you could do something like:

minimum_skill_levels = {"Java": 3, "Python": 2}

# A developer qualifies if, for every required language,
# at least one of their rows meets the minimum skill level.
qualified = df.groupby("Developer").apply(
    lambda x: all(
        any((x.Language == language) & (x.Skill_Level >= level))
        for language, level in minimum_skill_levels.items()
    )
)
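As an alternative to `apply` with a lambda, the same selection can probably be done with vectorized operations. This is only a sketch of a different technique (mapping each row's language to its required level, then counting the distinct required languages each developer satisfies), not the approach above; the names `required`, `meets`, `covered` and `developers` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Developer": ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
    "Language": ["Java", "Python", "Python", "Java", "Python", "Python",
                 "Java", "Python", "C++"],
    "Skill_Level": [1, 3, 3, 3, 2, 3, 3, 1, 3],
    "Version": ["x.x", "2.x", "3.x", "x.x", "2.x", "3.x", "x.x", "3.x", "x.x"],
})

minimum_skill_levels = {"Java": 3, "Python": 2}

# Required level for each row; NaN for languages that are not required.
required = df["Language"].map(minimum_skill_levels)

# Rows that satisfy a requirement (a comparison against NaN is False).
meets = df[df["Skill_Level"] >= required]

# A developer qualifies when the satisfied rows cover every required language.
covered = meets.groupby("Developer")["Language"].nunique()
developers = covered[covered == len(minimum_skill_levels)].index.tolist()
print(developers)  # ['B']
```

Adding another requirement then only means adding an entry to `minimum_skill_levels`.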

edited at 2020-12-8