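I have a data set with the following columns: Deviation from Partisanship, Democrat, Disagreement with Party on Social Issues, and gss year for this respondent. The first three columns have the object dtype, so I have to convert them to strings in order to encode them as numeric data.
The "gss year for this respondent" column contains years between 1970 and 2000 and has the int dtype, so I do not convert it to a string before encoding it. Here is the code I am using: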
# Import libraries
import pandas as pd
from sklearn import preprocessing  # the code below calls preprocessing.LabelEncoder()
# Load the data set
df = pd.read_excel('sec3_data.xlsx')
df.fillna(0, inplace=True)
# Convert categorical data to numeric data
df['Deviation from Partisanship'] = df['Deviation from Partisanship'].astype('str')
le = preprocessing.LabelEncoder()
df['Deviation from Partisanship'] = le.fit_transform(df['Deviation from Partisanship'])
df['Democrat'] = df['Democrat'].astype('str')
le = preprocessing.LabelEncoder()
df['Democrat'] = le.fit_transform(df['Democrat'])
df['Disagreement with Party on Social Issues'] = df['Disagreement with Party on Social Issues'].astype('str')
le = preprocessing.LabelEncoder()
df['Disagreement with Party on Social Issues'] = le.fit_transform(df['Disagreement with Party on Social Issues'])
le = preprocessing.LabelEncoder()
df['gss year for this respondent'] = le.fit_transform(df['gss year for this respondent'])
pd.set_option('display.max_rows', 164)
df
When I run this code, it gives me the following error:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2896 try:
-> 2897 return self._engine.get_loc(key)
2898 except KeyError:
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'gss year for this respondent'
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
<ipython-input-44-fbfcad1e7a05> in <module>
13
14 le = preprocessing.LabelEncoder()
---> 15 df['gss year for this respondent'] = le.fit_transform(df['gss year for this respondent'])
16
17 pd.set_option('display.max_rows', 164)
~\Anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
2978 if self.columns.nlevels > 1:
2979 return self._getitem_multilevel(key)
-> 2980 indexer = self.columns.get_loc(key)
2981 if is_integer(indexer):
2982 indexer = [indexer]
~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2897 return self._engine.get_loc(key)
2898 except KeyError:
-> 2899 return self._engine.get_loc(self._maybe_cast_indexer(key))
2900 indexer = self.get_indexer([key], method=method, tolerance=tolerance)
2901 if indexer.ndim > 1 or indexer.size > 1:
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'gss year for this respondent'
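Any idea why I am getting this error?
Reposting my own comment as an answer so that others can benefit from it quickly.
This type of error is consistent with referencing a DataFrame column that does not exist. You can quickly check whether the column you are referencing exists in your DataFrame by evaluating '<COLUMN_NAME>' in df.columns. If the column exists, the expression should return True.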
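If the membership check returns False, a common cause is a header that differs from the expected name only by stray whitespace or letter case, which is easy to miss in a spreadsheet. The sketch below is one possible way to look for such near misses. The helper name diagnose_column and the trailing space in the sample header are assumptions made for illustration; the question does not show the real headers of sec3_data.xlsx.

```python
import difflib


def diagnose_column(columns, wanted):
    """Return (exists, suggestions) for a column name.

    `columns` is any iterable of column labels, for example df.columns.
    This is a hypothetical helper, not part of pandas.
    """
    names = [str(c) for c in columns]
    if wanted in names:
        return True, []

    # Look for names that match once whitespace and case are ignored.
    target = wanted.strip().lower()
    same_after_cleanup = [n for n in names if n.strip().lower() == target]
    if same_after_cleanup:
        return False, same_after_cleanup

    # Otherwise fall back to fuzzy matching on the raw names.
    return False, difflib.get_close_matches(wanted, names, n=3, cutoff=0.6)


# Assumed headers: the last one carries a trailing space for illustration.
columns = [
    "Deviation from Partisanship",
    "Democrat",
    "Disagreement with Party on Social Issues",
    "gss year for this respondent ",
]

found, suggestions = diagnose_column(columns, "gss year for this respondent")
print(found, suggestions)
```

With a real DataFrame you would pass df.columns as the first argument. If the suggestions reveal a whitespace problem, renaming the column or cleaning the headers with df.columns = df.columns.str.strip() before encoding is one way to resolve it.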