Loading a dataset in a Jupyter notebook

Posted by Obay in Dev

Obay

I am trying to download and load a dataset in a Jupyter notebook, but I am running into a problem. Here is the code:

import os
import tarfile
from six.moves import urllib

DOWNLOAD_ROOT = "https://raw.githubusercontent.com/ageron/handson-ml/master/"
HOUSING_PATH = os.path.join("datasets", "housing")
HOUSING_URL = DOWNLOAD_ROOT + "datasets/housing/housing.tgz"

def fetch_housing_data(housing_url=HOUSING_URL, housing_path=HOUSING_PATH):
    if not os.path.isdir(housing_path):
        os.makedirs(housing_path)
    tgz_path = os.path.join(housing_path, "housing.tgz")
    urllib.request.urlretrieve(housing_url, tgz_path)
    housing_tgz = tarfile.open(tgz_path)
    housing_tgz.extractall(path=housing_path)
    housing_tgz.close()

import pandas as pd
def load_housing_data(housing_path=HOUSING_PATH):
    csv_path = os.path.join(housing_path, "housing.csv")
    return pd.read_csv(csv_path)

housing = load_housing_data()
housing.head()

After running the code above, I get this error:

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-5-6a9011700846> in <module>
----> 1 housing = load_housing_data()
      2 housing.head()

<ipython-input-4-4d0bff7b3608> in load_housing_data(housing_path)
      2 def load_housing_data(housing_path=HOUSING_PATH):
      3     csv_path = os.path.join(housing_path, "housing.csv")
----> 4     return pd.read_csv(csv_path)

~/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, skipfooter, doublequote, delim_whitespace, low_memory, memory_map, float_precision)
    676                     skip_blank_lines=skip_blank_lines)
    677 
--> 678         return _read(filepath_or_buffer, kwds)
    679 
    680     parser_f.__name__ = name

~/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py in _read(filepath_or_buffer, kwds)
    438 
    439     # Create the parser.
--> 440     parser = TextFileReader(filepath_or_buffer, **kwds)
    441 
    442     if chunksize or iterator:

~/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py in __init__(self, f, engine, **kwds)
    785             self.options['has_index_names'] = kwds['has_index_names']
    786 
--> 787         self._make_engine(self.engine)
    788 
    789     def close(self):

~/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py in _make_engine(self, engine)
   1012     def _make_engine(self, engine='c'):
   1013         if engine == 'c':
-> 1014             self._engine = CParserWrapper(self.f, **self.options)
   1015         else:
   1016             if engine == 'python':

~/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py in __init__(self, src, **kwds)
   1706         kwds['usecols'] = self.usecols
   1707 
-> 1708         self._reader = parsers.TextReader(src, **kwds)
   1709 
   1710         passed_names = self.names is None

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.__cinit__()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._setup_parser_source()

FileNotFoundError: File b'datasets/housing/housing.csv' does not exist

I tried downloading the data manually and putting the .csv file in the same folder as the notebook, and the following code works fine:

import pandas as pd
import numpy as np

pd.read_csv('housing.csv', delimiter = ',')

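My question is: what is wrong with the first piece of code? I would appreciate it if someone could explain. By the way, I am using macOS 10.14.

Note: the code is an example from the book "Hands on Machine Learning with Scikit Learn and Tensorflow".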

奥马尔阿尔基西

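fetch_housing_data() is defined but never called, so the datasets/housing directory is never created and nothing is downloaded. That is why pd.read_csv cannot find datasets/housing/housing.csv. You need to call fetch_housing_data() in the body of load_housing_data.

Like this: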

def load_housing_data(housing_path=HOUSING_PATH):
    # missing function call to fetch the data
    fetch_housing_data()
    csv_path = os.path.join(housing_path, "housing.csv")
    return pd.read_csv(csv_path)
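
Calling fetch_housing_data() unconditionally downloads the archive again on every load. A possible variant, only a sketch and not the book's code, downloads only when housing.csv is missing. The `fetch` parameter is my own addition for illustration (it makes the function easy to try without a network connection), and I have assumed Python 3, so the standard-library urllib.request is used in place of six.moves. The fetch call also passes housing_path through, so a non-default path is downloaded to the same place it is read from.

```python
import os
import tarfile
import urllib.request

import pandas as pd

DOWNLOAD_ROOT = "https://raw.githubusercontent.com/ageron/handson-ml/master/"
HOUSING_PATH = os.path.join("datasets", "housing")
HOUSING_URL = DOWNLOAD_ROOT + "datasets/housing/housing.tgz"


def fetch_housing_data(housing_url=HOUSING_URL, housing_path=HOUSING_PATH):
    # Create the target directory if needed, then download and unpack the archive.
    os.makedirs(housing_path, exist_ok=True)
    tgz_path = os.path.join(housing_path, "housing.tgz")
    urllib.request.urlretrieve(housing_url, tgz_path)
    with tarfile.open(tgz_path) as housing_tgz:
        housing_tgz.extractall(path=housing_path)


def load_housing_data(housing_path=HOUSING_PATH, fetch=fetch_housing_data):
    # `fetch` is a hypothetical parameter added for this sketch.
    csv_path = os.path.join(housing_path, "housing.csv")
    # Only download when the CSV is not already on disk.
    if not os.path.isfile(csv_path):
        fetch(housing_path=housing_path)
    return pd.read_csv(csv_path)
```

In the notebook you would then run housing = load_housing_data() followed by housing.head(), exactly as before. The first call downloads and extracts the archive; later calls read the existing file.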


Edited on 2021-07-15
