Scraping an embedded Google Sheet from HTML in Python

爱国者_25

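This one is a bit tricky for me. I'm trying to extract an embedded table from a Google Sheet in Python.

Here is the link.

I don't own the sheet, but it is publicly available.

Here is my code so far. When I print the headers, they all come out as "" (empty strings). Any help would be greatly appreciated. The end goal is to turn this table into a pandas DataFrame. Thanks, guys!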

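import lxml.html as lh
import pandas as pd
import requests  # was missing: requests.get() is used below

url = 'https://docs.google.com/spreadsheets/u/0/d/e/2PACX-1vQ--HR_GTaiv2dxaVwIwWYzY2fXTSJJN0dugyQe_QJnZEpKm7bu5o7eh6javLIk2zj0qtnvjJPOyvu2/pubhtml/sheet?headers=false&gid=1503072727'

# Download the published sheet and parse it into an lxml element tree
page = requests.get(url)
doc = lh.fromstring(page.content)

# Every table row in the published HTML
tr_elements = doc.xpath('//tr')

col = []

# Print each cell of the first row with its 1-based position, and store
# its text as a column name paired with an empty list for that column's values
for i, t in enumerate(tr_elements[0], start=1):
    name = t.text_content()
    print('%d:"%s"' % (i, name))
    col.append((name, []))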
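The empty strings are consistent with the first `<tr>` of the published page holding no text; the answer below uses `header=1`, i.e. it treats the *second* row as the header, and drops a row-number column named "1". The sketch below illustrates that row layout offline. It swaps in the standard library's `html.parser` instead of lxml, and the `sample` HTML is a hypothetical, heavily simplified stand-in for Google's real markup (only the "Target"/"Ticker"/"ACIA" values come from the output shown further down).

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []    # finished rows, each a list of cell strings
        self._row = None  # row currently being built
        self._cell = None  # text fragments of the cell currently being built

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Ignore whitespace between tags; only keep text inside a cell
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def table_rows(html):
    """Return all table rows in `html` as lists of cell text."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical, simplified markup: an empty first row, then the real header
# row, with a row-number cell at the start of each row
sample = """
<table>
  <tr><th></th><th></th><th></th></tr>
  <tr><th>1</th><td>Target</td><td>Ticker</td></tr>
  <tr><th>2</th><td>Acacia Communications Inc Com</td><td>ACIA</td></tr>
</table>
"""

rows = table_rows(sample)
header = rows[1][1:]              # second row, minus the row-number cell
data = [r[1:] for r in rows[2:]]  # remaining rows, minus the row-number cell
```

Under these assumptions, `rows[0]` is all empty strings (what the loop above prints), while `rows[1]` carries the real column names; `pd.DataFrame(data, columns=header)` would then build the DataFrame.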
Python的Sherpa

Well, if you want the data in a DataFrame, you can load it directly:

import pandas as pd

url = 'https://docs.google.com/spreadsheets/u/0/d/e/2PACX-1vQ--HR_GTaiv2dxaVwIwWYzY2fXTSJJN0dugyQe_QJnZEpKm7bu5o7eh6javLIk2zj0qtnvjJPOyvu2/pubhtml/sheet?headers=false&gid=1503072727'

# header=1 uses the second row of the table as the column names
df = pd.read_html(url, header=1)[0]
df.drop(columns='1', inplace=True)  # remove the unnecessary index column called "1"

This gives you:

                               Target Ticker                   Acquirer  \
0       Acacia Communications Inc Com   ACIA      Cisco Systems Inc Com   
1  Advanced Disposal Services Inc Com   ADSW   Waste Management Inc Com   
2                    Allergan Plc Com    AGN             Abbvie Inc Com   
3           Ak Steel Holding Corp Com    AKS   Cleveland Cliffs Inc Com   
4      Td Ameritrade Holding Corp Com   AMTD  Schwab (Charles) Corp Com   

  Ticker.1 Current Price Take Over Price Price Diff % Diff Date Announced  \
0     CSCO        $68.79          $70.00      $1.21  1.76%       7/9/2019   
1       WM        $32.93          $33.15      $0.22  0.67%      4/15/2019   
2     ABBV       $197.05         $200.22      $3.17  1.61%      6/25/2019   
3      CLF         $2.98           $3.02      $0.04  1.34%      12/3/2019   
4     SCHW        $49.31          $51.27      $1.96  3.97%     11/25/2019   

  Deal Type  
0      Cash  
1      Cash  
2       C&S  
3     Stock  
4     Stock  

Note that read_html returns a list of DataFrames. In this case there is only one, so we can take the first and only element with [0].
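As the output shows, the price, percentage and date columns come back as text such as "$68.79", "1.76%" and "7/9/2019". If you need them as numbers and dates, a minimal sketch follows. The helper name `clean_deal_columns` is hypothetical, and it assumes the column names and value formats printed above (e.g. no parenthesised negatives like "($0.50)").

```python
import pandas as pd


def clean_deal_columns(df):
    """Hypothetical helper: convert the text columns shown above to numeric/date dtypes.

    Assumes the column names printed in the answer's output; adjust them
    if the sheet's layout changes.
    """
    out = df.copy()
    for col in ["Current Price", "Take Over Price", "Price Diff"]:
        # "$1,234.56" -> 1234.56 (strip the dollar sign and thousands separators)
        out[col] = (
            out[col]
            .str.replace("$", "", regex=False)
            .str.replace(",", "", regex=False)
            .astype(float)
        )
    # "1.76%" -> 1.76 (still a percentage, not a fraction)
    out["% Diff"] = out["% Diff"].str.rstrip("%").astype(float)
    # "7/9/2019" -> Timestamp('2019-07-09'), assuming month/day/year order
    out["Date Announced"] = pd.to_datetime(out["Date Announced"], format="%m/%d/%Y")
    return out


# Two rows copied from the output above, as read_html returns them (strings)
sample = pd.DataFrame({
    "Ticker": ["ACIA", "ADSW"],
    "Current Price": ["$68.79", "$32.93"],
    "Take Over Price": ["$70.00", "$33.15"],
    "Price Diff": ["$1.21", "$0.22"],
    "% Diff": ["1.76%", "0.67%"],
    "Date Announced": ["7/9/2019", "4/15/2019"],
})
clean = clean_deal_columns(sample)
```

On the real data you would call `clean_deal_columns(df)` after the `drop`. Working on a copy leaves the raw strings in `df` untouched in case a value does not parse.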
