从具有多个页面的网站抓取数据

猫王

https://finviz.com/screener.ashx?v=152&f=cap_midover&c=1,16,17,18,65

我想使用VBA从上面的网站中抓取数据,以便获得5个我想要的列(股票行情,EPS,EPS这个Y,EPS下一个Y,价格)。有99个页面需要循环浏览,每个页面有20个行情自动收录器,这意味着我需要抓取将近2000行的数据。我可以使用PowerQuery做到这一点,但是如果我使用PowerQuery,似乎需要3分钟左右的时间来刷新数据。

我不确定是否使用VBA抓取数据是否可以加快数据刷新的时间,否则是否希望获得帮助。我是VBA的新手,下面是我的代码,该代码为我提供整个网站页面的输出(不是我想要的),并且该代码不会在1-99的不同页面之间循环。

Sub GetFinvizData()
 
Dim str As String
 
'Delete existing data
Sheets("Data").Activate 'Name of sheet the data will be downloaded into. Change as required.
Range("A1").Select
Range(Selection, Selection.End(xlToRight)).Select
Range(Selection, Selection.End(xlDown)).Select
Selection.ClearContents
 
'Download stock quotes. Be patient - takes a few seconds.
str = "https://finviz.com/screener.ashx?v=152&f=cap_midover&r=1&c=1,16,17,18,65"
QueryQuote:
            With Sheets("Data").QueryTables.Add(Connection:="URL;" & str, Destination:=Sheets("Data").Range("a1"))
                .BackgroundQuery = True
                .TablesOnlyFromHTML = False
                .refresh BackgroundQuery:=False
                .SaveData = True
            End With
 
Sheets("Data").Range("a1").CurrentRegion.TextToColumns Destination:=Sheets("Data").Range("a1"), DataType:=xlDelimited, _
TextQualifier:=xlDoubleQuote, ConsecutiveDelimiter:=False, Tab:=False, _
Semicolon:=False, Comma:=True, Space:=False, other:=True, OtherChar:=",", FieldInfo:=Array(1, 2)
 
Sheets("Data").Columns("A:B").ColumnWidth = 12
Range("A1").Select
 
End Sub

希望得到一些建议,谢谢!

SIM卡

尝试以下操作以获取该站点所有页面上的上述字段:

Option Explicit
Sub FetchTabularData()
    Const base$ = "https://finviz.com/"
    Dim elem As Object, S$, R&, oPage As Object, nextPage$
    Dim Http As Object, Html As Object, ws As Worksheet, Url$
    
    Set ws = ThisWorkbook.Worksheets("Data")
    Set Http = CreateObject("MSXML2.XMLHTTP")
    Set Html = CreateObject("HTMLFile")
    
    Url = "https://finviz.com/screener.ashx?v=152&f=cap_midover&r=1&c=1,16,17,18,65"
    
    ws.Range("A1:E1") = Array("Ticker", "EPS", "EPS This Y", "EPS Next Y", "Price")
    
    R = 1
    
    While Url <> ""
        With Http
            .Open "GET", Url, False
            .setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36"
            .send
            S = .responseText
        End With
    
        With Html
            .body.innerHTML = S
            For Each elem In .getElementById("screener-content").getElementsByTagName("tr")
                If InStr(elem.className, "table-dark-row-cp") > 0 Or InStr(elem.className, "table-light-row-cp") > 0 Then
                    R = R + 1: ws.Cells(R, 1) = elem.Children(0).innerText
                    ws.Cells(R, 2) = elem.Children(1).innerText
                    ws.Cells(R, 3) = elem.Children(2).innerText
                    ws.Cells(R, 4) = elem.Children(3).innerText
                    ws.Cells(R, 5) = elem.Children(4).innerText
                End If
            Next elem
            
            Url = vbNullString
            
            For Each oPage In .getElementsByTagName("a")
                If InStr(oPage.className, "tab-link") And InStr(oPage.innerText, "next") > 0 Then
                    nextPage = oPage.getAttribute("href")
                    Url = base & Replace(nextPage, "about:", "")
                End If
            Next oPage
        End With
    Wend
End Sub

您无需向参考库添加任何内容即可执行上述脚本。

本文收集自互联网,转载请注明来源。

如有侵权,请联系 [email protected] 删除。

编辑于
0

我来说两句

0 条评论
登录 后参与评论

相关文章