我编写了一个简短的VBA脚本,该脚本可生成URL并下载页面内容并放入新的工作表中。但是,数据始终显示在两页上,从而产生以下类型的URL:
对于结果的第一页:
resultat_annuaire.php?loc=01&item=hopital&session=clear (with 01 being the region)
对于第二页:
resultat_annuaire.php?loc=01&item=hopital&page=2 (session=clear is gone, replaced by page=2)
当我的VBA脚本生成和擦伤网址的第一页,它工作正常(即我得到95个不同的网页下载到我的Excel)
但是,当我运行相同的VBA脚本(仅更改生成第二页的URL的方式)时,它下载的内容是第一个URL第2页内容的95倍。
现在,我尝试通过执行以下操作在Web浏览器中简单地摆弄URL:
输入第二个页面URL:
resultat_annuaire.php?loc=01&item=hopital&page=2
然后像这样将01更改为05:
resultat_annuaire.php?loc=05&item=hopital&page=2
再说一次,什么都没有发生,页面保持不变,即好像我没有将01切换到05。
这是VBA脚本:
Sub Data_scraping()
For x = 1 To 9
ActiveWorkbook.Worksheets.Add
With ActiveSheet.QueryTables.Add(Connection:= _
"URL;" _
& "http://etablissements.hopital.fr/resultat_annuaire.php?loc=" _
& "0" _
& x _
& "&item=hopital&session=clear" _
, Destination:=Range("$A$1"))
'.CommandType = 0
.Name = "resultat_annuaire.php?loc=01&item=hopital&session=clear"
.FieldNames = True
.RowNumbers = False
.FillAdjacentFormulas = False
.PreserveFormatting = True
.RefreshOnFileOpen = False
.BackgroundQuery = True
.RefreshStyle = xlInsertDeleteCells
.SavePassword = False
.SaveData = True
.AdjustColumnWidth = True
.RefreshPeriod = 0
.WebSelectionType = xlEntirePage
.WebFormatting = xlWebFormattingNone
.WebPreFormattedTextToColumns = True
.WebConsecutiveDelimitersAsOne = True
.WebSingleBlockTextImport = False
.WebDisableDateRecognition = False
.WebDisableRedirections = False
.Refresh BackgroundQuery:=False
End With
ActiveWindow.SmallScroll Down:=18
Rows("1:31").Select
Selection.Delete Shift:=xlUp
Range("A5").Select
Next x
End Sub
有人可以提供解释或帮助吗?
您的宏非常适合page = 2,
Sub sof20287920Data_scrapping()
Dim x, strLoc, strUrl
Dim wkb
Set wkb = Workbooks.Add()
wkb.Activate
For x = 1 To 9
ActiveWorkbook.Worksheets.Add After:=ActiveWorkbook.Sheets(ActiveWorkbook.Sheets.Count)
strLoc = "resultat_annuaire.php?loc=" & "0" & x & "&item=hopital&session=clear&page=2"
strUrl = "http://etablissements.hopital.fr/" & strLoc
With ActiveSheet.QueryTables.Add(Connection:= _
"URL;" & strUrl _
, Destination:=Range("$A$1"))
'.CommandType = 0
.Name = strLoc
.FieldNames = True
.RowNumbers = False
.FillAdjacentFormulas = False
.PreserveFormatting = True
.RefreshOnFileOpen = False
.BackgroundQuery = True
.RefreshStyle = xlInsertDeleteCells
.SavePassword = False
.SaveData = True
.AdjustColumnWidth = True
.RefreshPeriod = 0
.WebSelectionType = xlEntirePage
.WebFormatting = xlWebFormattingNone
.WebPreFormattedTextToColumns = True
.WebConsecutiveDelimitersAsOne = True
.WebSingleBlockTextImport = False
.WebDisableDateRecognition = False
.WebDisableRedirections = False
.Refresh BackgroundQuery:=False
End With
ActiveWindow.SmallScroll Down:=18
Rows("1:31").Select
Selection.Delete Shift:=xlUp
Range("A5").Select
Next x
End Sub
即使对于page = 2,session = clear似乎也是必需的,如下所示:
http://etablissements.hopital.fr/resultat_annuaire.php?loc=01&item=hopital&session=clear&page=2
本文收集自互联网,转载请注明来源。
如有侵权,请联系 [email protected] 删除。
我来说两句