I have a small dataset, and for some reason the statsmodels output does not match Excel's.
Here is what I did. I have two columns:
Miles Traveled | Travel Time |
---|---|
89 | 7.0 |
66 | 5.4 |
78 | 6.6 |
111 | 7.4 |
44 | 4.8 |
77 | 6.4 |
80 | 7.0 |
66 | 5.6 |
109 | 7.3 |
76 | 6.4 |
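This is the output I get in Google Sheets:
 | Slope | Intercept |
---|---|---|
Coefficient | 0.04025678079 | 3.185560249 |
Standard error | 0.005706415564 | 0.4669507938 |
R-squared / standard error of the estimate | 0.8615153295 | 0.3423088398 |
F statistic / residual degrees of freedom | 49.76812677 | 8 |
Regression SS / residual SS | 5.831597265 | 0.9374027345 |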
This output also matches the Excel output.
However, when I do the following in statsmodels:
import statsmodels.api as sm
milesTraveled = [89.0, 66.0, 78.0, 111.0, 44.0, 77.0, 80.0, 66.0, 109.0, 76.0]
travelTime = [7.0, 5.4, 6.6, 7.4, 4.8, 6.4, 7.0, 5.6, 7.3, 6.4]
model = sm.OLS(travelTime, milesTraveled).fit()
print(model.summary())
I get the following:
OLS Regression Results
=======================================================================================
Dep. Variable: Travel Time R-squared (uncentered): 0.985
Model: OLS Adj. R-squared (uncentered): 0.983
Method: Least Squares F-statistic: 575.6
Date: Mon, 01 Feb 2021 Prob (F-statistic): 1.82e-09
Time: 10:18:44 Log-Likelihood: -11.951
No. Observations: 10 AIC: 25.90
Df Residuals: 9 BIC: 26.20
Df Model: 1
Covariance Type: nonrobust
==================================================================================
coef std err t P>|t| [0.025 0.975]
----------------------------------------------------------------------------------
Miles Traveled 0.0781 0.003 23.991 0.000 0.071 0.085
==============================================================================
Omnibus: 2.179 Durbin-Watson: 2.654
Prob(Omnibus): 0.336 Jarque-Bera (JB): 1.033
Skew: -0.777 Prob(JB): 0.597
Kurtosis: 2.741 Cond. No. 1.00
==============================================================================
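As you can see, the values for the standard error, R-squared, and so on do not match Google Sheets / Excel at all. What am I doing wrong? How can I get the same result summary as Google Sheets / Excel?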
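By default, the OLS class does not include a constant term in the linear model. That is why the summary in the question reports "R-squared (uncentered)" and 9 residual degrees of freedom: it fitted a line through the origin. You can use sm.add_constant to create the appropriate exog argument for OLS: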
In [36]: milesTraveled = [89.0, 66.0, 78.0, 111.0, 44.0, 77.0, 80.0, 66.0, 109.0, 76.0]
In [37]: travelTime = [7.0, 5.4, 6.6, 7.4, 4.8, 6.4, 7.0, 5.6, 7.3, 6.4]
In [38]: X = sm.add_constant(milesTraveled)
In [39]: model = sm.OLS(travelTime, X).fit()
In [40]: print(model.summary())
/Users/warren/a2020.11/lib/python3.8/site-packages/scipy/stats/stats.py:1603: UserWarning: kurtosistest only valid for n>=20 ... continuing anyway, n=10
warnings.warn("kurtosistest only valid for n>=20 ... continuing "
OLS Regression Results
==============================================================================
Dep. Variable: y R-squared: 0.862
Model: OLS Adj. R-squared: 0.844
Method: Least Squares F-statistic: 49.77
Date: Mon, 01 Feb 2021 Prob (F-statistic): 0.000107
Time: 13:04:53 Log-Likelihood: -2.3532
No. Observations: 10 AIC: 8.706
Df Residuals: 8 BIC: 9.312
Df Model: 1
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const 3.1856 0.467 6.822 0.000 2.109 4.262
x1 0.0403 0.006 7.055 0.000 0.027 0.053
==============================================================================
Omnibus: 0.542 Durbin-Watson: 2.608
Prob(Omnibus): 0.763 Jarque-Bera (JB): 0.554
Skew: 0.370 Prob(JB): 0.758
Kurtosis: 2.115 Cond. No. 353.
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
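To make the difference between the two fits concrete, here is a minimal sketch in plain Python (no statsmodels) that computes both models from the closed-form formulas for simple linear regression. The function names `fit_with_intercept` and `fit_through_origin` are made up for this illustration and are not part of any library. With the data from the question, the first fit should reproduce the Google Sheets / Excel table, and the second should reproduce the slope and uncentered R-squared from the original statsmodels call.

```python
from math import sqrt

miles = [89.0, 66.0, 78.0, 111.0, 44.0, 77.0, 80.0, 66.0, 109.0, 76.0]
hours = [7.0, 5.4, 6.6, 7.4, 4.8, 6.4, 7.0, 5.6, 7.3, 6.4]


def fit_with_intercept(x, y):
    """Ordinary least squares for y = intercept + slope * x (hypothetical helper)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    ss_resid = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    ss_total = sum((yi - mean_y) ** 2 for yi in y)  # centered total sum of squares
    ss_reg = ss_total - ss_resid
    df_resid = n - 2  # two estimated parameters
    se_estimate = sqrt(ss_resid / df_resid)
    return {
        "slope": slope,
        "intercept": intercept,
        "se_slope": se_estimate / sqrt(sxx),
        "se_intercept": se_estimate * sqrt(sum(xi * xi for xi in x) / (n * sxx)),
        "r_squared": ss_reg / ss_total,
        "se_estimate": se_estimate,
        "f_stat": ss_reg / (ss_resid / df_resid),
        "df_resid": df_resid,
        "ss_reg": ss_reg,
        "ss_resid": ss_resid,
    }


def fit_through_origin(x, y):
    """Least squares for y = slope * x, i.e. no constant term (hypothetical helper)."""
    slope = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    ss_resid = sum((yi - slope * xi) ** 2 for xi, yi in zip(x, y))
    # Uncentered R-squared compares against sum(y^2) rather than sum((y - mean)^2).
    r_squared_uncentered = 1 - ss_resid / sum(yi * yi for yi in y)
    return {"slope": slope, "r_squared_uncentered": r_squared_uncentered}


with_const = fit_with_intercept(miles, hours)
no_const = fit_through_origin(miles, hours)

for name, value in with_const.items():
    print(f"{name:>13}: {value:.10g}")
print("through origin:", no_const)
```

If you prefer to stay inside statsmodels, the fitted results object from the answer exposes the same quantities as attributes, for example `model.params`, `model.bse`, `model.rsquared`, `model.fvalue`, `model.df_resid`, `model.ess` (regression SS), and `model.ssr` (residual SS).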