Why can't I use ".text" to strip unwanted HTML when scraping table headers with BeautifulSoup?

Julian Amrine

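When I run this code, I can see that the headers list is populated with the results I want, but they are surrounded by some HTML that I don't want to keep.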

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from bs4 import BeautifulSoup

# barchart.com uses javascript, so for now I need selenium to get full html
url = 'https://www.barchart.com/stocks/quotes/qqq/constituents'
chrome_options = Options()
chrome_options.add_argument("--headless")
chrome_options.add_argument("--disable-gpu")
browser = webdriver.Chrome(options=chrome_options)
browser.get(url)
page = browser.page_source

#  BeautifulSoup find table
soup = BeautifulSoup(page, 'lxml')
table = soup.find("table")
browser.quit()

# create list headers, then populate with th tagged cells
headers = []

for i in table.find_all('th'):
    title = i()
    headers.append(title)

So I tried:

for i in table.find_all('th'):
    title = i.text()
    headers.append(title)

Which returned "TypeError: 'str' object is not callable"

This seems to work in some example documentation, but the Wikipedia tables used there appear to be simpler than the one on Barchart. Any ideas?

Hubert Grzeskowiak

As @MendelG pointed out, the error is in i.text(): text is a property, not a function, so it must be read without parentheses.
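To make the error concrete, here is a minimal sketch that reproduces it without Selenium. The inline HTML is a made-up stand-in for one header cell (the real Barchart markup is not shown in the post), and the built-in "html.parser" is used only so the snippet has no extra dependency beyond bs4.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for one header cell of the real table.
html = "<table><tr><th><span> Symbol </span></th></tr></table>"
soup = BeautifulSoup(html, "html.parser")
th = soup.find("th")

# .text is a property that already holds a str.
plain = th.text

# Adding parentheses tries to call that str, which raises TypeError.
try:
    th.text()
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(repr(plain))
print(error_message)
```

The second print shows the same "not callable" message as in the question, because the parentheses are applied to the string that the property returns.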

Alternatively, you can use the get_text() function.

I would also suggest adding .strip() to remove surplus whitespace around the text. Or, if you use get_text(), this is built in:

title = i.get_text(strip=True)
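The two options agree for a cell that contains a single piece of text, but they can differ when a cell has nested tags. The sketch below uses invented markup (not taken from Barchart) to show the difference; passing a separator as the first argument of get_text() is one way to keep words apart.

```python
from bs4 import BeautifulSoup

# Invented header cell with a nested tag and stray whitespace.
html = "<table><tr><th> % <b> Holding </b> </th></tr></table>"
th = BeautifulSoup(html, "html.parser").find("th")

# Strips only the outer whitespace; inner spacing is left untouched.
outer_stripped = th.text.strip()

# Strips every text fragment, then joins them with no separator.
each_stripped = th.get_text(strip=True)

# Same, but joins the stripped fragments with a single space.
spaced = th.get_text(" ", strip=True)

print(repr(outer_stripped))
print(repr(each_stripped))
print(repr(spaced))
```

For simple headers such as those in the question either form should be fine; the separator variant is worth considering only if header cells turn out to contain nested markup.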

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from bs4 import BeautifulSoup

# barchart.com uses javascript, so for now I need selenium to get full html
url = 'https://www.barchart.com/stocks/quotes/qqq/constituents'
chrome_options = Options()
chrome_options.add_argument("--headless")
chrome_options.add_argument("--disable-gpu")
browser = webdriver.Chrome(options=chrome_options)
browser.get(url)
page = browser.page_source

#  BeautifulSoup find table
soup = BeautifulSoup(page, 'lxml')
table = soup.find("table")
browser.quit()

# create list headers, then populate with th tagged cells
headers = []

for i in table.find_all('th'):
    title = i.text.strip()
    # Or alternatively:
    #title = i.get_text(strip=True)
    headers.append(title)

print(headers)

Prints:

['Symbol', 'Name', '% Holding', 'Shares', 'Links']
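As a side note on the original symptom: in BeautifulSoup, calling a tag object (as in `title = i()`) is shorthand for `find_all()`, so it returns the tag's child tags rather than its text. That would explain why the first version of the loop collected HTML. A small sketch, again with hypothetical markup:

```python
from bs4 import BeautifulSoup

# Hypothetical header cell; the real Barchart markup is not shown in the post.
html = "<table><tr><th><span>Symbol</span></th></tr></table>"
soup = BeautifulSoup(html, "html.parser")
th = soup.find("th")

# Calling a tag is shorthand for find_all(), so this returns child tags.
children = th()
same_as_find_all = children == th.find_all()

# The text itself comes from get_text() or the .text property.
label = th.get_text(strip=True)

print(children)
print(label)
```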


Edited on 2021-03-04
