Scraping innerHTML from a website using VBA

Jeremy

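I'm trying to declare an array of nodes (that part isn't a problem) and then scrape the innerHTML of two child nodes inside each element of that array. Taking SE as an example (using the IE object method), suppose I want to scrape the titles and question excerpts on the front page: there is an array of nodes (class name: "question-summary").

Each of those nodes then has two child nodes (the title, class name: "question-hyperlink", and the excerpt, class name: "excerpt"). The code I'm using is as follows: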

Sub Scraper()
Dim ie As Object
Dim doc As Object, oQuestionShells As Object, oQuestionTitle As Object, oQuestion As Object, oElement As Object
Dim QuestionShell As String, QuestionTitle As String, Question As String, sURL As String

Set ie = CreateObject("internetexplorer.application")
sURL = "https://stackoverflow.com/questions/tagged/excel-formula"

QuestionShell = "question-summary"
QuestionTitle = "question-hyperlink"
Question = "excerpt"

With ie
    .Visible = False
    .Navigate sURL
End With

Set doc = ie.Document 'Stepping through so doc is getting assigned (READY_STATE = 4)

Set oQuestionShells = doc.getElementsByClassName(QuestionShell)

For Each oElement In oQuestionShells
    Set oQuestionTitle = oElement.getElementByClassName(QuestionTitle) 'Assigning this object causes an "Object doesn't support this property or method"
    Set oQuestion = oElement.getElementByClassName(Question) 'Assigning this object causes an "Object doesn't support this property or method"
    Debug.Print oQuestionTitle.innerHTML
    Debug.Print oQuestion.innerHTML
Next

End Sub
Robin Mackenzie

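getElementByClassName is not a method.

You can only use getElementsByClassName (note the plural in the method name), which returns an IHTMLElementCollection.

Declaring the variable as Object instead of IHTMLElementCollection is fine, but you still have to access a specific element of the collection by supplying an index.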
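
To illustrate what supplying an index looks like, here is a minimal late-bound sketch. The helper name PrintAllMatches and its parameters are hypothetical and not part of the original code; it assumes oParent is an element or document obtained through IE automation, as in the question.

```vba
' Hypothetical helper, not part of the original code: prints the innerHTML
' of every element under oParent that carries the given class name.
Sub PrintAllMatches(ByVal oParent As Object, ByVal sClassName As String)
    Dim oMatches As Object
    Dim i As Long

    ' getElementsByClassName returns a collection, even for a single match
    Set oMatches = oParent.getElementsByClassName(sClassName)

    ' Individual elements are reached through a zero-based index
    For i = 0 To oMatches.Length - 1
        Debug.Print oMatches(i).innerHTML
    Next i
End Sub
```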

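Assuming that each oElement (a question-summary node) contains only one instance of the question-hyperlink class and one instance of the excerpt class, you can call getElementsByClassName and append (0) to pull out the first element of the returned collection.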

So your code is corrected to:

Set oQuestionTitle = oElement.getElementsByClassName(QuestionTitle)(0)
Set oQuestion = oElement.getElementsByClassName(Question)(0)
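
The (0) index relies on at least one match being present. As a hedged sketch of how that assumption could be guarded, the hypothetical function FirstByClass below (its name and parameters are assumptions, not part of the original answer) checks the collection's Length before indexing.

```vba
' Hypothetical helper: returns the first descendant of oParent with the given
' class name, or Nothing when the collection is empty.
Function FirstByClass(ByVal oParent As Object, ByVal sClassName As String) As Object
    Dim oMatches As Object

    Set oMatches = oParent.getElementsByClassName(sClassName)

    If oMatches.Length > 0 Then
        Set FirstByClass = oMatches(0)
    Else
        Set FirstByClass = Nothing
    End If
End Function
```

Inside the loop, the caller could then test the result with Is Nothing before reading innerHTML, so a question block that lacks a title or excerpt is skipped rather than raising an error.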

Full working code (with a few updates, namely using Option Explicit and waiting for the page to load):

Option Explicit

Sub Scraper()

    Dim ie As Object
    Dim doc As Object, oQuestionShells As Object, oQuestionTitle As Object, oQuestion As Object, oElement As Object
    Dim QuestionShell As String, QuestionTitle As String, Question As String, sURL As String

    Set ie = CreateObject("internetexplorer.application")
    sURL = "https://stackoverflow.com/questions/tagged/excel-formula"

    QuestionShell = "question-summary"
    QuestionTitle = "question-hyperlink"
    Question = "excerpt"

    With ie
        .Visible = True
        .Navigate sURL
        Do
            DoEvents
        Loop While .ReadyState < 4 Or .Busy
    End With

    Set doc = ie.Document

    Set oQuestionShells = doc.getElementsByClassName(QuestionShell)

    For Each oElement In oQuestionShells
        'Debug.Print TypeName(oElement)

        Set oQuestionTitle = oElement.getElementsByClassName(QuestionTitle)(0)
        Set oQuestion = oElement.getElementsByClassName(Question)(0)

        Debug.Print oQuestionTitle.innerHTML
        Debug.Print oQuestion.innerHTML
    Next

    ie.Quit

End Sub
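
The wait loop in the code above runs until the browser reports that it is ready, which could spin indefinitely if the page never finishes loading. One possible variation, sketched here under the assumption that a fixed timeout is acceptable, wraps the same ReadyState and Busy check in a hypothetical WaitForPage function (the name and the timeoutSeconds parameter are illustrative).

```vba
' Hypothetical helper: waits until the browser reports the page as loaded,
' or until timeoutSeconds have elapsed. Returns True if loading completed.
Function WaitForPage(ByVal ie As Object, ByVal timeoutSeconds As Long) As Boolean
    Dim deadline As Date

    ' Using Now with DateAdd avoids the midnight rollover of the Timer function
    deadline = DateAdd("s", timeoutSeconds, Now)

    Do While ie.ReadyState < 4 Or ie.Busy
        If Now > deadline Then
            WaitForPage = False
            Exit Function
        End If
        DoEvents  ' keep the host application responsive while waiting
    Loop

    WaitForPage = True
End Function
```

It would be called right after .Navigate sURL, for example with a 30-second limit, and the scraping would proceed only when the function returns True.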
