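My goal is to predict five to six numbers in an array from six-column CSV data. The script below should predict just one number from an array of 5. I think I can work up to the full 5 or 6 from there, but I may be wrong.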
Script:
import csv
import numpy as np
import pandas as pd
from math import sqrt
from sklearn.model_selection import train_test_split
from sklearn import preprocessing
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.preprocessing import StandardScaler

df = pd.read_csv('subdata.csv')
ft = [9,8,15,4,6]
fintest = np.array(ft)

def train():
    df.astype(np.float64)
    df.drop(['One'], axis = 1)
    X = df
    y = X['One']
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=123)
    scaler = StandardScaler()
    train_scaled = scaler.fit_transform(X_train)
    test_scaled = scaler.transform(X_test)
    tree_model = DecisionTreeRegressor()
    rf_model = RandomForestRegressor()
    tree_model.fit(train_scaled, y_train)
    rf_model.fit(train_scaled, y_train)
    rfp = rf_model.predict(fintest.reshape(1, -1))
    tmp = tree_model.predict(fintest.reshape(1, -1))
    print(rfp)
    print(tmp)

train()
Can you clarify what I am asking this script to predict in the final rfp and tmp lines?
My data looks like this:
Currently the script gives an error:
Traceback (most recent call last):
File "C:\Users\conra\Desktop\Code\lotto\pie.py", line 43, in <module>
train()
File "C:\Users\conra\Desktop\Code\lotto\pie.py", line 37, in train
rfp = rf_model.predict(fintest.reshape(1, -1))
File "C:\Users\conra\AppData\Local\Programs\Python\Python39\lib\site-packages\sklearn\ensemble\_forest.py", line 784, in predict
X = self._validate_X_predict(X)
File "C:\Users\conra\AppData\Local\Programs\Python\Python39\lib\site-packages\sklearn\ensemble\_forest.py", line 422, in _validate_X_predict
return self.estimators_[0]._validate_X_predict(X, check_input=True)
File "C:\Users\conra\AppData\Local\Programs\Python\Python39\lib\site-packages\sklearn\tree\_classes.py", line 402, in _validate_X_predict
X = self._validate_data(X, dtype=DTYPE, accept_sparse="csr",
File "C:\Users\conra\AppData\Local\Programs\Python\Python39\lib\site-packages\sklearn\base.py", line 437, in _validate_data
self._check_n_features(X, reset=reset)
File "C:\Users\conra\AppData\Local\Programs\Python\Python39\lib\site-packages\sklearn\base.py", line 365, in _check_n_features
raise ValueError(
ValueError: X has 5 features, but DecisionTreeRegressor is expecting 6 features as input.
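By adding a sixth number to the ft array, I can get around this error, but I then receive very inaccurate output that seems to have no relation to the data. For example, setting ft to [9,8,15,4,6,2], which is the first row of the CSV file, and defining X and y using the 'Four' label, I get the outputs [37.22] and [37.].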
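My other questions may be answered by the first one, but here they are:
Could you also clarify why I need to pass an array of 6?
Why are my predictions so close to each other (all around 35), no matter what array I pass for prediction?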
The way you define X is wrong. It contains 6 features.
With the way you have defined it, your y is included in your X:
X = df #6 features
y = X['One'] #1 feature
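One reason the 'One' column is still in X: df.drop returns a new DataFrame and leaves df itself untouched unless the result is assigned (the same holds for df.astype). A small sketch, using made-up random data in place of subdata.csv, which is not shown in the question:

```python
import numpy as np
import pandas as pd

# Made-up stand-in for subdata.csv: six columns, named as in this answer.
rng = np.random.default_rng(0)
cols = ['Zero', 'One', 'Two', 'Three', 'Four', 'Five']
df = pd.DataFrame(rng.integers(1, 50, size=(20, 6)), columns=cols)

df.drop(['One'], axis=1)           # result is discarded, df is unchanged
n_cols_unassigned = df.shape[1]    # 'One' is still a column of df

X = df.drop(['One'], axis=1)       # keep the returned frame instead
n_cols_assigned = X.shape[1]       # 'One' is gone from X

print(n_cols_unassigned, n_cols_assigned)
```

So in the question's script the model is fitted on all six columns, target included, which is why predict then insists on 6 features.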
I think what you want to do is something like this:
X = df[['Two', 'Three', 'Four', 'Five', 'Zero']]
y = df['One']
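Putting that together, here is a minimal sketch of the whole flow under some assumptions: the data is made-up random numbers (the real CSV is not shown), the five input values are assumed to be in the same column order as the feature list, and new_row / feature_cols are names introduced only for this example. It also scales the new row with the same fitted scaler before predicting. The question's script fits the models on scaled data but passes the raw array to predict, which is a likely reason every prediction lands near the same value: raw numbers such as 15 look like extreme outliers next to standardized features, so the trees send almost any input down the same branches.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Made-up stand-in for subdata.csv (assumption: six numeric columns).
rng = np.random.default_rng(123)
cols = ['Zero', 'One', 'Two', 'Three', 'Four', 'Five']
df = pd.DataFrame(rng.integers(1, 50, size=(200, 6)), columns=cols).astype(np.float64)

# Five feature columns; the target 'One' is kept out of X.
feature_cols = ['Two', 'Three', 'Four', 'Five', 'Zero']
X = df[feature_cols]
y = df['One']

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=123
)

# Fit the scaler on the training split only, then reuse it everywhere.
scaler = StandardScaler()
train_scaled = scaler.fit_transform(X_train)
test_scaled = scaler.transform(X_test)

rf_model = RandomForestRegressor(random_state=123)
rf_model.fit(train_scaled, y_train)

# A new row of five values, in the same order as feature_cols (assumption).
new_row = pd.DataFrame([[9, 8, 15, 4, 6]], columns=feature_cols, dtype=np.float64)
new_scaled = scaler.transform(new_row)   # same scaling as the training data

rfp = rf_model.predict(new_scaled)       # predicted value of 'One' for this row
print(rfp)
```

With this setup, rfp is the model's estimate of the 'One' column given the other five, which is also what the rfp and tmp lines in the question are asking for. On data without real structure, such as the random numbers used here, a regressor will still tend to return values close to the mean of the target.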