Scikit-learn - What am I predicting?

user14928608

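My goal is to predict five or six numbers in an array based on six columns of CSV data. The script below is supposed to predict only one number from an array of 5. I assumed I could work my way up from there to the full 5 or 6, but I may be wrong.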

Code:

import csv
import numpy as np 
import pandas as pd
from math import sqrt
from sklearn.model_selection import train_test_split
from sklearn import preprocessing
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.preprocessing import StandardScaler

df = pd.read_csv('subdata.csv')

ft = [9,8,15,4,6]

fintest = np.array(ft)

def train():

    df.astype(np.float64)
    df.drop(['One'], axis = 1)
    X = df
    y = X['One']

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=123)

    scaler = StandardScaler()
    train_scaled = scaler.fit_transform(X_train)
    test_scaled = scaler.transform(X_test)

    tree_model = DecisionTreeRegressor()
    rf_model = RandomForestRegressor()

    tree_model.fit(train_scaled, y_train)
    rf_model.fit(train_scaled, y_train)

    rfp = rf_model.predict(fintest.reshape(1, -1))
    tmp = tree_model.predict(fintest.reshape(1, -1))

    print(rfp)
    print(tmp)

train()

Could you clarify what I am asking this script to predict in the final rfp and tmp lines?

My data looks like this:

The script currently gives an error:

Traceback (most recent call last):
  File "C:\Users\conra\Desktop\Code\lotto\pie.py", line 43, in <module>
    train()
  File "C:\Users\conra\Desktop\Code\lotto\pie.py", line 37, in train
    rfp = rf_model.predict(fintest.reshape(1, -1))
  File "C:\Users\conra\AppData\Local\Programs\Python\Python39\lib\site-packages\sklearn\ensemble\_forest.py", line 784, in predict
    X = self._validate_X_predict(X)
  File "C:\Users\conra\AppData\Local\Programs\Python\Python39\lib\site-packages\sklearn\ensemble\_forest.py", line 422, in _validate_X_predict
    return self.estimators_[0]._validate_X_predict(X, check_input=True)
  File "C:\Users\conra\AppData\Local\Programs\Python\Python39\lib\site-packages\sklearn\tree\_classes.py", line 402, in _validate_X_predict
    X = self._validate_data(X, dtype=DTYPE, accept_sparse="csr",
  File "C:\Users\conra\AppData\Local\Programs\Python\Python39\lib\site-packages\sklearn\base.py", line 437, in _validate_data
    self._check_n_features(X, reset=reset)
  File "C:\Users\conra\AppData\Local\Programs\Python\Python39\lib\site-packages\sklearn\base.py", line 365, in _check_n_features
    raise ValueError(
ValueError: X has 5 features, but DecisionTreeRegressor is expecting 6 features as input.
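A minimal sketch of why this error occurs, assuming a randomly generated stand-in for subdata.csv (the real data is not shown) and the column names used in the answer below: a fitted scikit-learn estimator records how many columns it was trained on in n_features_in_, and predict rejects input with a different column count. Because X = df keeps all six columns, the model expects six values per row.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# Hypothetical stand-in for subdata.csv: random integers, NOT the asker's data.
rng = np.random.default_rng(0)
cols = ['One', 'Two', 'Three', 'Four', 'Five', 'Zero']
df = pd.DataFrame(rng.integers(1, 50, size=(100, 6)), columns=cols)

# As in the question, X is the whole frame (6 columns, including 'One').
X = df.to_numpy(dtype=np.float64)
y = df['One'].to_numpy()
model = DecisionTreeRegressor(random_state=0).fit(X, y)
print(model.n_features_in_)  # 6

# Predicting from a 5-value row fails the same way as in the traceback.
try:
    model.predict(np.array([[9, 8, 15, 4, 6]]))
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

The printed message should match the last line of the traceback above.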

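I can get around this error by adding a sixth number to the ft array, but the outputs are then very inaccurate and do not seem to correlate with the data at all. For example, if I set ft to [9,8,15,4,6,2], which is the first row of the CSV file, and set X and y to use the 'Four' label, I get outputs of [37.22] and [37.].

My other questions may be answered by the answer to my first one, but here they are:

Could you also clarify why I need to pass an array of 6?

Why are my predictions so similar (all around 35), no matter what array I pass in for prediction?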
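One likely contributor, sketched here on hypothetical synthetic data in which the target simply equals the first feature: in the script the models are fit on StandardScaler output, but fintest is passed to predict unscaled. Scaled training values fall roughly between -2 and 2, so raw numbers such as 9 or 15 lie above nearly every split threshold and end up in the same "largest value" leaves whatever the query is. Passing the query through the same fitted scaler.transform avoids this.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor

# Hypothetical synthetic data: 200 rows of five integers in 1..49.
# The target is just the first feature, so a correct model should track it.
rng = np.random.default_rng(0)
X = rng.integers(1, 50, size=(200, 5)).astype(float)
y = X[:, 0]

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
tree = DecisionTreeRegressor(random_state=0).fit(X_scaled, y)

# Two queries that differ only in the first value (9 vs 40).
queries = np.array([[9, 8, 15, 4, 6],
                    [40, 8, 15, 4, 6]], dtype=float)

# Mismatch: raw values are far outside the scaled range (roughly -1.7..1.7),
# so both queries follow the same branches down the tree.
raw_preds = tree.predict(queries)

# Consistent: apply the scaler fitted on the training data first.
scaled_preds = tree.predict(scaler.transform(queries))

print("unscaled queries:", raw_preds)
print("scaled queries:  ", scaled_preds)
```

With the unscaled queries both rows get the same (high) prediction even though their first values differ; after scaling, the predictions follow the first value.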

LCD

The way you define X is wrong. It contains 6 features.

The way you defined it, your y is contained in your X:

X = df #6 features
y = X['One'] #1 feature
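A small illustration of why 'One' ends up in X, using a hypothetical two-row frame: DataFrame.drop (like astype) returns a new DataFrame rather than modifying df in place, so the drop and astype calls at the top of train() have no effect unless their results are assigned.

```python
import pandas as pd

# Hypothetical tiny frame with two of the question's column names.
df = pd.DataFrame({'One': [1, 2], 'Two': [3, 4]})

df.drop(['One'], axis=1)       # result discarded, df is unchanged
print('One' in df.columns)     # True

X = df
y = X['One']
print('One' in X.columns)      # True: the target is also an input feature

X = df.drop(columns=['One'])   # assign the result to actually remove the column
print(list(X.columns))         # ['Two']
```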

I think what you want to do is this:

X = df[['Two', 'Three', 'Four', 'Five', 'Zero']]
y = df['One']
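Combining that fix with consistent scaling, a minimal sketch might look like the following. It assumes the answer's column names and uses a random stand-in for pd.read_csv('subdata.csv'); make_pipeline is swapped in for the separate scaler so that the query row is scaled exactly like the training data. predict then returns a one-element array: the model's estimate of the 'One' column for the five given values.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for pd.read_csv('subdata.csv'), using the answer's column names.
rng = np.random.default_rng(123)
cols = ['One', 'Two', 'Three', 'Four', 'Five', 'Zero']
df = pd.DataFrame(rng.integers(1, 50, size=(300, 6)), columns=cols).astype(np.float64)

# drop() returns a new frame; assign it instead of discarding it.
X = df.drop(columns=['One'])  # 5 feature columns: Two, Three, Four, Five, Zero
y = df['One']                 # target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=123)

# The pipeline re-applies the fitted scaler automatically at predict time.
model = make_pipeline(StandardScaler(), RandomForestRegressor(random_state=123))
model.fit(X_train, y_train)

# Query with 5 values, wrapped in a DataFrame so the column names match.
query = pd.DataFrame([[9, 8, 15, 4, 6]], columns=X.columns)
prediction = model.predict(query)
print(prediction)                   # one predicted value for 'One'
print(model.score(X_test, y_test))  # R^2 on the held-out rows
```

On this random stand-in the held-out R² should be close to or below zero, since the columns are independent by construction; the sketch only shows the mechanics.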

Edited 2021-08-20