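I have a folder containing thousands of files. I am trying to parse the XML markup in them with beautifulsoup4.
I can do this for each file individually, but I can't get the script to work with a for loop.
Here is my code so far: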
import bs4 as bs
import glob
path = r"~/Desktop/pythontest/*.txt"
files = glob.glob(path)
# ------------------------READ AND PARSE TEXT-----------------------------------------
for f in files:
    # open file in read mode
    source = open(f, "rt")
    # parse xml as soup
    soup = bs.BeautifulSoup(source, "lxml")
    soupText = soup.get_text()
    text = soupText.replace(r"\n", " ")
    # close file
    source.close()
# --------------------------OVERWRITE FILE---------------------------------------------
for f in files:
    # open file in write mode
    source = open(f, "wt")
    # overwrite the file with the soup
    source.write((text))
    # # close file
    source.close()
print(text)
When I run it, the console shows the following:
Traceback (most recent call last):
  File "./camltest.py", line 34, in <module>
    print(text)
NameError: name 'text' is not defined
I suspect this is a scope issue, but I can't figure it out. Any suggestions? Thanks.
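On the scope suspicion: a Python `for` loop does not create a new scope, so a name assigned in the loop body is still visible after the loop, but only if the body ran at least once. A `NameError` at the top-level `print(text)` therefore means neither loop body executed, i.e. `files` was empty. A likely cause worth checking is that `glob.glob` performs no tilde expansion, so `"~/Desktop/..."` is searched under a literal directory named `~`. The minimal sketch below reproduces both points using only the standard library; `parse_all` is a hypothetical stand-in for the parsing loop, not code from the question.

```python
import glob
import os


def parse_all(paths):
    # Mirrors the question's first loop: `text` is bound only if the body runs.
    for p in paths:
        text = p.upper()  # hypothetical stand-in for the BeautifulSoup parsing step
    return text  # raises UnboundLocalError (a NameError subclass) if paths is empty


pattern = "~/Desktop/pythontest/*.txt"

# glob does no tilde expansion, so "~" is treated as a literal directory name
# and this pattern normally matches nothing.
literal_matches = glob.glob(pattern)

# os.path.expanduser replaces the leading "~" with the user's home directory,
# giving glob a real path to search.
expanded = os.path.expanduser(pattern)
expanded_matches = glob.glob(expanded)

try:
    parse_all([])
except NameError as exc:
    print("empty file list ->", type(exc).__name__)
```

Printing `files` right after the `glob.glob` call is a quick way to confirm whether the pattern matched anything before suspecting the loops themselves.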
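You can simply read and then write each file within the same loop. In your version, the second loop reuses whatever `text` the first loop left behind, so every file would be overwritten with the text of the last file parsed.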
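import glob
import os
import bs4 as bs
files = glob.glob(os.path.expanduser("~/Desktop/pythontest/*.txt"))  # glob does not expand "~"
for f in files:
    # read and parse first; opening with "w+" would truncate the file before it is read
    with open(f, "rt") as source:
        soup = bs.BeautifulSoup(source, "lxml")
    # "\n" is a real newline; r"\n" would only match a literal backslash followed by n
    text = soup.get_text().replace("\n", " ")
    # then reopen in write mode and overwrite the file with the extracted text
    with open(f, "wt") as target:
        target.write(text)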
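Two pitfalls apply to this read-then-write pattern. Opening a file with mode `"w+"` truncates it to zero length immediately, so anything that reads it afterwards (BeautifulSoup included) sees an empty file. And `r"\n"`, as used in the question's code, is a two-character string (backslash, `n`), not a newline, so `replace(r"\n", " ")` leaves real line breaks in place. The stdlib-only sketch below demonstrates both on a temporary file; `read_with_w_plus`, `read_then_overwrite`, and `flatten_newlines` are hypothetical helper names, and a plain string transform stands in for the BeautifulSoup step.

```python
import os
import tempfile


def read_with_w_plus(path):
    # "w+" truncates the file the moment it is opened,
    # so the subsequent read returns an empty string.
    with open(path, "w+") as fh:
        return fh.read()


def read_then_overwrite(path, transform):
    # Read the full contents first, then reopen in "w" mode to overwrite.
    with open(path, "r") as fh:
        data = fh.read()
    with open(path, "w") as fh:
        fh.write(transform(data))


def flatten_newlines(s):
    # "\n" is one newline character; r"\n" would be two characters (backslash, n).
    return s.replace("\n", " ")


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    with open(path, "w") as fh:
        fh.write("line one\nline two")

    read_then_overwrite(path, flatten_newlines)
    with open(path) as fh:
        overwritten = fh.read()

    # Opening with "w+" wipes the file on disk as well as returning "".
    lost = read_with_w_plus(path)

print(repr(overwritten))
print(repr(lost))
```

Reading fully before reopening for writing costs one extra `open` per file but guarantees the parser never sees a truncated file.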