Reading the contents of a .gz file with Python

Adam, posted in Dev

Adam

I'm new to Python and I'm having trouble reading the contents of .gz files:

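I have a folder full of .gz files that were pulled programmatically through a private API. Each .gz file contains a single .xml file, so I need to walk the directory and extract them.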
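For reference, a minimal Python 3 sketch of that directory walk might look like this. It assumes every `.gz` file holds exactly one XML document and that the result should sit next to the original with an `.xml` extension; the function name `extract_gz_dir` is hypothetical.

```python
import gzip
import shutil
from pathlib import Path


def extract_gz_dir(directory, pattern="*.gz"):
    """Decompress every file matching `pattern` in `directory` to a sibling .xml file.

    Returns the list of .xml paths that were written.
    """
    written = []
    for gz_path in sorted(Path(directory).glob(pattern)):
        # Replace the last suffix: "site_res_1.gz" -> "site_res_1.xml"
        xml_path = gz_path.with_suffix(".xml")
        # gzip.open in "rb" mode yields the decompressed bytes; copy them
        # verbatim in binary mode so nothing is re-encoded or re-compressed.
        with gzip.open(gz_path, "rb") as src, open(xml_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        written.append(xml_path)
    return written
```

Calling `extract_gz_dir("downloads")` would, for example, turn `downloads/site_res_1.gz` into `downloads/site_res_1.xml`. `shutil.copyfileobj` streams the data in chunks, so large files are never read into memory all at once. Like the original snippet, it leaves the `.gz` files in place.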

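The problem is that when I programmatically extract these .gz files into their respective .xml versions, the files are created without errors, and when I open one of them (with TextWrangler) it looks like a normal .xml file, but not when I look at it in a hex editor. Also, when I open the .xml file programmatically and print its contents, it shows up as a bunch of scrambled (binary?) text.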

By contrast, if I extract one of the files manually (i.e., with OS X rather than Python), the file looks exactly as I would expect in a hex editor.
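One quick way to tell what that "binary" text is: every gzip stream starts with the two magic bytes `0x1f 0x8b` (RFC 1952), which is also what a hex editor shows at offset 0. A small check along these lines (the helper names `looks_gzipped` and `describe_file` are hypothetical) reports whether a file you expect to be XML is in fact still gzip data:

```python
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip member (RFC 1952)


def looks_gzipped(data: bytes) -> bool:
    """Return True if `data` starts with the gzip magic number."""
    return data[:2] == GZIP_MAGIC


def describe_file(path: str) -> str:
    """Read the first bytes of a file and report whether they look like gzip."""
    with open(path, "rb") as f:
        head = f.read(16)
    kind = "gzip data" if looks_gzipped(head) else "not gzip"
    # bytes.hex(" ") needs Python 3.8+; it mimics a hex editor's view of the header
    return f"{path}: {kind} ({head.hex(' ')})"
```

A real XML file begins with `<` (`0x3c`) or a byte-order mark, so seeing `1f 8b` at the start of an "extracted" `.xml` file means it still needs another round of decompression.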

Here is my code snippet (it uses the glob and gzip modules):

    import glob
    import gzip
    import os

    # siteid, resource and workingDir are defined earlier in the program
    searchpattern = siteid + "_" + resource + "_*.gz"
    for infile in glob.glob(workingDir + searchpattern):
        print(infile)

        # Read the decompressed contents (https://docs.python.org/2/library/gzip.html)
        with gzip.open(infile, 'rb') as f:
            file_content = f.read()  # raw bytes; wrapping them in str() was an attempt to fix that did not help
        print(file_content)  # This shows a bunch of mumbo jumbo

        # Write the contents we just read to a new (uncompressed) file
        newfilename = infile[:-3] + ".xml"  # drop ".gz", add ".xml"
        with open(newfilename, 'wb') as fnew:
            fnew.write(file_content)

        # Delete the .gz version of the file
        # os.remove(infile)

Adam

So this turned out to be a silly mistake on my part, but I'll post it as a follow-up for anyone else who makes the same mistake.

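The problem was that, earlier in my program, I was gzip-compressing content that was already gzip-compressed, so every .gz file on disk was compressed twice. Decompressing it once therefore only gave back the inner gzip stream, which is the binary-looking text I was seeing. With that in mind, there is nothing wrong with the code snippet I posted in this thread, and (technically) nothing wrong with the code that creates the .gz files either, as you can see below. Writing the downloaded content with a plain open(), instead of through the gzip library earlier in the program, does the trick.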
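The effect is easy to reproduce in memory. In this Python 3 sketch (the XML payload is made up), compressing bytes that are already gzip data adds a second gzip layer, so a single decompression only gets back to the first layer:

```python
import gzip

xml = b"<?xml version='1.0'?><data>hello</data>"  # made-up payload

once = gzip.compress(xml)     # what the API actually returned
twice = gzip.compress(once)   # what writing through gzip.open(..., 'wb') produced

first_pass = gzip.decompress(twice)
print(first_pass[:2])                      # b'\x1f\x8b': still gzip, the "mumbo jumbo"
print(gzip.decompress(first_pass) == xml)  # True: a second pass recovers the XML
```

This is why the manual extraction and the Python extraction disagreed: the data was fine, it just had one more layer than the code expected.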

    # Download each response and write its contents to a .gz file
    if limitCounter < int(limit) or int(limit) == 0:  # limit may arrive as a string, so convert it before comparing
        print(_name + "  " + scopeStartDate + " through " + scopeEndDate + " at " + href)
        response = api.get(href)
        gz_file_content = response.content  # the API already returns gzip-compressed bytes
        # gz_file = gzip.open(workingDir + _name, "wb")  # This breaks the program later: it gzips the data a second time
        with open(workingDir + _name, 'wb') as gz_file:  # This works: the bytes are written unchanged
            gz_file.write(gz_file_content)
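If some files were already saved with the extra layer, they may not need to be downloaded again. A hedged Python 3 sketch (the helpers `gunzip_all_layers` and `repair_file` are hypothetical) keeps decompressing while the data still starts with the gzip magic bytes. It would also unwrap a payload that is legitimately gzip data itself, which should not matter when the payload is XML.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip member (RFC 1952)


def gunzip_all_layers(data: bytes, max_layers: int = 10) -> bytes:
    """Strip gzip layers until `data` no longer starts with the gzip magic bytes."""
    layers = 0
    while data[:2] == GZIP_MAGIC:
        if layers == max_layers:
            raise ValueError(f"more than {max_layers} gzip layers; refusing to continue")
        data = gzip.decompress(data)
        layers += 1
    return data


def repair_file(gz_path: str, xml_path: str) -> None:
    """Read a possibly double-compressed .gz file and write the plain XML."""
    with open(gz_path, "rb") as f:
        payload = gunzip_all_layers(f.read())
    with open(xml_path, "wb") as f:
        f.write(payload)
```

Plain (already decompressed) data passes through unchanged, so the helper is safe to run on a mix of correctly and doubly compressed files.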


Edited on 2021-03-27
