BeautifulSoup从锚标记中的脚本获取文本

Minial 发表于 Dev

极小

所以我有一个<tr>带有多个<td>作为其子字符串的标签。

<tr>
    <td align='center' class="row2">
        <a href="javascript:who_posted(4713426);">10</a>    
    </td>
    <td align="center" class="row2">
        <a href='https://forum.net/index.php?;showuser=17311'>xxboxx</a>
    </td>
    <td align="center" class="row2"> 
            <!--script type="text/javascript">
            s = "236".replace(/,/g,'');
            document.write(abbrNum(s,1));
            </script-->
            236
    </td>
</tr>

这是我当前的代码；我没有问题，但是我想尝试退出脚本，但是我尝试了其他类似的关于stackoverflow的问题所提供的方法。但我没有成功。

def extractDataFromRow2(_url, 'td', 'row2', 'align' , 'center'):
    try:
        for container in _url.find_all('td', {'class': 'row2','align': 'center''}):
            # get data from topic title in table cell
            replies_numb = container.select_one(
                'a[href^="javascript:]"').text
            print('there are ' + replies_numb + ' replies')
            topic_starter = container.next_sibling.text
            print('the owner of this topic is ' + topic_starter)
            for total_view in container.find('a', href=True, style=True):
                #total_view = container.select_one(style="background-color:").text
                #total_view = container.find(("td")["style"])
                #total_view = container.next_sibling.find_next_sibling/next_sibling
                #but they're not able to access the last one within <tr> tag
                print(total_view )
            if replies_numb and topic_starter is not None:
                dict_replies = {'Replies' : replies_numb}
                dict_topic_S = {'Topic_Starter' : topic_starter}
                list_1.append(dict_replies)
                list_2.append(dict_topic_S)
            else:
                print('no data')
    except Exception as e:
        print('Error.extractDataFromRow2:', e)
        return None

我试图从中获取数据的页面链接。

是否有更清洁的方法；我很高兴从给出的反馈中学习。

塞尔丘克

您共享的html代码可能不足以回答问题，因此我签出了您共享的url。这是刮桌子的方法。

from bs4 import BeautifulSoup
import requests

r = requests.get("https://forum.lowyat.net/ReviewsandGuides")

soup = BeautifulSoup(r.text, 'lxml')

index = 0
#First two rows of table is not data so we skip it. Last row of table is for searching we also skip it. Table contains 30 rows of data. That is why we are slicing list
for row in soup.select('table[cellspacing="1"] > tr')[2:32]:   
    replies = row.select_one('td:nth-of-type(4)').text.strip()
    topic_started = row.select_one('td:nth-of-type(5)').text.strip()
    total_views = row.select_one('td:nth-of-type(6)').text.strip()
    index +=1

    print(index,replies, topic_started, total_views)

结果是

1 148 blurjoey 9,992
2 10 xxboxx 263
3 18 JayceOoi 1,636
4 373 idoblu 54,589
5 237 blurjoey 16,101
6 526 JayceOoi 57,577
7 131 JayceOoi 34,354
8 24 blurjoey 4,261
9 2 JayceOoi 249
10 72 KeyMochi 26,622
11 7 champu 331
12 0 asunakirito 210
13 0 asunakirito 172
14 0 asunakirito 199
15 17 blurjoey 3,351
16 860 blurjoey 112,556
17 0 chennegan 174
18 0 goldfries 185
19 4 JayceOoi 601
20 2 JayceOoi 309
21 10 blurjoey 1,826
22 3 JayceOoi 398
23 4 squallz05 310
24 0 asunakirito 265
25 25 asunakirito 12,326
26 0 blurjoey 279
27 14 JayceOoi 2,092
28 0 chennegan 305
29 8 Pharamain 732
30 19 idoblu 1,273

本文收集自互联网，转载请注明来源。

如有侵权，请联系 [email protected] 删除。

编辑于 2020-12-21

我来说两句

0 条评论

登录后参与评论

上一篇：有什么办法解决错误HTTP ERROR 503？

如何从锚标记获取文本？

BeautifulSoup：从锚标记中提取文本

BeautifulSoup从锚标记中的脚本获取文本

BeautifulSoup从锚标记中的脚本获取文本

隐藏发件人没有短信PHP

材质UI垂直滑块。如何改变在垂直材料UI滑块导轨的厚度（反应）

在Windows 7中无法删除文件（2）

HttpClient中的角度变化检测

Azure VM启动/停止日志

如何在 Vb.net 中使用函数返回多个值

Powerpoint-条形长度错误的堆积条形图

最新歌剧断断续续的快速拨号和渲染错误

Mac OS X更新后的GRUB 2问题

需要公式以vlookup逗号分隔单个单元格中的值

Hashchange事件侦听器在将事件处理程序附加到事件之前进行侦听

ggplot：对齐多个分面图-所有大小不同的分面

OS X-为什么我需要打开WiFi才能确定最近的位置

用日期数据透视表和日期顺序查询

Java Eclipse中的错误13，如何解决？

如何在Django中使用UUID

加载Microsoft Visual菜单时出现问题

具有if条件的SQL UPDATE

从JSON到JSONL的Python转换

如何在Kod中更改字体？

共享图像将路径放入地址