How to print correlated features using Python pandas?

Posted by Aly in Dev

Aly

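I am trying to get information about the correlations among my independent variables.

My dataset has a lot of variables, so a heatmap is not a solution: it is very hard to read.

So far I have written a function that returns only the highly correlated variables. I would like to change it so that it indicates the pairs of correlated features.

Further details below: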

import numpy as np

def find_correlated_features(df, threshold, target_variable):

    # Drop the target column (columns= is required; a bare drop() looks for a row label)
    df_1 = df.drop(columns=target_variable)

    # corr_matrix has the variable names as both index and columns
    corr_matrix = df_1.corr().abs()

    # Take only the upper half of the matrix to avoid duplicate results
    # (builtin bool replaces np.bool, which was removed in NumPy 1.24)
    half_of_matrix = corr_matrix.where(np.triu(np.ones(corr_matrix.shape), k=1).astype(bool))

    # Collect the columns that are correlated with at least one other column
    to_drop = [column for column in half_of_matrix.columns if any(half_of_matrix[column] > threshold)]

    return to_drop

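Ideally it would return a pandas DataFrame with the columns column_1, column_2, and corr_coef, containing only the variable pairs above the threshold.

Something like this: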

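# Inside a loop over the correlated (column_name, index) pairs:
output = {'feature name 1': column_name,
          'feature name 2': index,
          'corr_coef': corr_coef}

output_list.append(output)

# After the loop (the key must match the name used in sort_values):
return pd.DataFrame(output_list).sort_values('corr_coef', ascending=False)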

CainãMax Couto-Silva

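Edit:

After the OP's comment and @user6386471's answer, I read the question again, and I think a simple restructuring of the correlation matrix would do the job, with no loops needed. Something like half_of_matrix.stack().reset_index() plus a filter. See: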

import numpy as np

def find_correlated_features(df, threshold, target_variable):
    # Remove the target column
    df = df.drop(columns=target_variable).copy()
    # Get the absolute correlation matrix
    corr_matrix = df.corr().abs()
    # Take the upper half of the matrix to avoid duplicate results
    # (builtin bool replaces np.bool, which was removed in NumPy 1.24)
    corr_matrix = corr_matrix.where(np.triu(np.ones(corr_matrix.shape), k=1).astype(bool))
    # Restructure the correlation matrix into a long DataFrame
    df = corr_matrix.stack().reset_index()
    df.columns = ['feature1', 'feature2', 'corr_coef']
    # Apply the filter and sort by coefficient
    df = df[df.corr_coef >= threshold].sort_values('corr_coef', ascending=False)
    return df
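
As a quick check of the approach above, here is a minimal sketch on a small made-up DataFrame. The column names (a, b, c, target) and their values are illustrative assumptions, not data from the question. The function is repeated in compact form so the snippet runs on its own.

```python
import numpy as np
import pandas as pd


def find_correlated_features(df, threshold, target_variable):
    # Remove the target column, then take absolute pairwise correlations
    corr_matrix = df.drop(columns=target_variable).corr().abs()
    # Keep only the strict upper triangle so each pair appears once
    mask = np.triu(np.ones(corr_matrix.shape), k=1).astype(bool)
    pairs = corr_matrix.where(mask).stack().reset_index()
    pairs.columns = ['feature1', 'feature2', 'corr_coef']
    # Keep pairs at or above the threshold, strongest first
    pairs = pairs[pairs.corr_coef >= threshold]
    return pairs.sort_values('corr_coef', ascending=False)


# Hypothetical toy data: b is exactly 2 * a, c is only weakly related to both
toy = pd.DataFrame({
    'a': [1, 2, 3, 4, 5],
    'b': [2, 4, 6, 8, 10],
    'c': [5, 1, 4, 2, 3],
    'target': [0, 1, 0, 1, 0],
})

result = find_correlated_features(toy, 0.9, 'target')
print(result)
```

With this toy data only the (a, b) pair should survive a 0.9 threshold, since c correlates with a and b at an absolute value of about 0.3.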

Original answer:

You can easily create a Series of the coefficients whose absolute value is at or above the threshold, like this:

s = df.corr().loc[target_col]
s[s.abs() >= threshold]

Here df is your DataFrame, target_col is your target column, and threshold is, well, the threshold.

Example:

import pandas as pd
import seaborn as sns

df = sns.load_dataset('iris')

print(df.shape)
# -> (150, 5)

print(df.head())

   sepal_length  sepal_width  petal_length  petal_width species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
3           4.6          3.1           1.5          0.2  setosa
4           5.0          3.6           1.4          0.2  setosa

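def find_correlated_features(df, threshold, target_variable):
    # numeric_only=True (pandas 1.5+) skips the non-numeric species column;
    # pandas 2.0+ raises an error on string columns without it
    s = df.corr(numeric_only=True).loc[target_variable].drop(target_variable)
    return s[s.abs() >= threshold]

find_correlated_features(df, .7, 'sepal_length')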

Output:

petal_length    0.871754
petal_width     0.817941
Name: sepal_length, dtype: float64

You can call .to_frame() followed by .T on the output to get a pandas DataFrame.
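
A small sketch of that conversion, assuming a Series shaped like the output above. The Series is rebuilt by hand from the printed values so the snippet does not need to download the dataset, and the names as_column and as_row are just illustrative.

```python
import pandas as pd

# Series rebuilt by hand from the output shown above
s = pd.Series(
    {'petal_length': 0.871754, 'petal_width': 0.817941},
    name='sepal_length',
)

# One column named after the target, one row per feature
as_column = s.to_frame()

# Transposed: one row named after the target, one column per feature
as_row = s.to_frame().T

print(as_column)
print(as_row)
```

Whether the tall or the wide form is more convenient probably depends on what you do next: the tall form is easier to sort and filter, while the wide form reads as a single record.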

Edited on 2020-11-29