ANOVA test in Python with a very large number of groups

artemis Published at Dev

Artemis

I have a relatively big dataset (about 273,744 records) containing, among other things, people's names and the dioptric power they use:

Name   | Dioptric | Gender | Town |
-----------------------------------
'John' |  0.25    |   M    |  A   |
'Jack' |  0.5     |   M    |  C   |
'John' |  25      |   M    |  A   |
'Mary' |  0.25    |   F    |  C   |
........

I need to find out whether there is a correlation between name and dioptric power. I decided to use the ANOVA test since there is one categorical and one quantitative variable. My problem is that the dataset contains a large number of name-dioptric groups (around 21,000), so I am not really sure how to implement the call:

stats.f_oneway( Name_Dioptrics_GroupA, Name_Dioptrics_GroupB,....)

What I have done so far is:

imported the data from the CSV as a pandas DataFrame
attempted to group it by name and dioptric power


import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import scipy.stats as stats

# read data
data = pd.read_csv("dioptrics-to-name.csv")

# prepare data
dioptrics = data['value']
name = data['firstName']

"""
group based on name-dioptrics power
"""
name_dioptric_frame = pd.DataFrame({"Name":name,"dioptrics":dioptrics})
name_dioptrics_groups = name_dioptric_frame.groupby("Name").groups

## break into name-dioptrics groups
## name_dioptrics_GroupA = dioptrics[name_dioptrics_groups["John"]]
## name_dioptrics_GroupB = dioptrics[name_dioptrics_groups["Jamie"]] 
## and so on ....

print(stats.f_oneway( dioptrics[name_dioptrics_groups[ name_dioptrics_groups.keys()] ]) ) 
print(stats.f_oneway( dioptrics[name_dioptrics_groups[ [ name for x in name_dioptrics_groups() ] ] ]) )

It doesn't work of course... Am I taking a correct approach here?

vurmux

The pandas groupby method can group your DataFrame by several columns at once. To do so, pass a list of column names instead of a single column name:

df = pd.DataFrame([
    ['WAKA', 2, '1'],
    ['WAKA-WAKA', 3, '7'],
    ['WAKKA', 1, '0'],
    ['WAKA', 2, '1'],
    ['WAKA-WAKA', 1, '7'],
    ['WAKKA', 1, '1'],
    ['WAKA', 5, '1'],
    ['WAKA-WAKA', 3, '7'],
    ['WAKKA', 1, '2'],
])
df.columns = ['name', 'd', 'info']

df.groupby(['name', 'd']).groups

This returns the following (on pandas 2.0 and later, Int64Index is printed as Index):

{('WAKA', 2): Int64Index([0, 3], dtype='int64'),
 ('WAKA', 5): Int64Index([6], dtype='int64'),
 ('WAKA-WAKA', 1): Int64Index([4], dtype='int64'),
 ('WAKA-WAKA', 3): Int64Index([1, 7], dtype='int64'),
 ('WAKKA', 1): Int64Index([2, 5, 8], dtype='int64')}
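
The same grouping can also be walked directly, without going through the .groups dictionary. The sketch below is only an illustration on the toy frame above; the name rows_per_group is made up for this example.

```python
import pandas as pd

df = pd.DataFrame(
    [
        ['WAKA', 2, '1'],
        ['WAKA-WAKA', 3, '7'],
        ['WAKKA', 1, '0'],
        ['WAKA', 2, '1'],
        ['WAKA-WAKA', 1, '7'],
        ['WAKKA', 1, '1'],
        ['WAKA', 5, '1'],
        ['WAKA-WAKA', 3, '7'],
        ['WAKKA', 1, '2'],
    ],
    columns=['name', 'd', 'info'],
)

# Iterating over a groupby yields (key, sub-frame) pairs; with two
# grouping columns the key is a (name, d) tuple.
rows_per_group = {
    key: list(sub.index) for key, sub in df.groupby(['name', 'd'])
}

print(rows_per_group[('WAKA', 2)])  # [0, 3]
```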

Your code groups by name only, without dioptrics.
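
Whichever grouping is used, stats.f_oneway still expects each group's values as a separate positional argument, which is impractical to type out for thousands of groups. A minimal sketch of one way to do it with argument unpacking follows. It assumes one sample per name (a one-way ANOVA compares a numeric variable across the levels of a single factor) and uses a small made-up frame in place of the real CSV, with the column names taken from the question.

```python
import pandas as pd
import scipy.stats as stats

# Hypothetical stand-in for dioptrics-to-name.csv (values are invented).
data = pd.DataFrame({
    "firstName": ["John", "John", "John", "Jack", "Jack", "Mary", "Mary"],
    "value":     [0.25,   0.5,    0.25,   0.5,    0.75,   0.25,   1.0],
})

# Build one array of dioptric values per name.
samples = [
    values.to_numpy()
    for _, values in data.groupby("firstName")["value"]
]

# Unpack the list so that every group becomes its own argument.
f_stat, p_value = stats.f_oneway(*samples)
print(f_stat, p_value)
```

The list comprehension scales to any number of groups, so the 21,000 names never have to be written out by hand.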

Edited at 2020-12-5
