How can I use Python to scrape data from a page with multiple dynamic select fields?

乙草胺

I'm new to web scraping and am trying to use Python to scrape specific content from this page: TRAC - Personnel and Staffing

The data I need is the number of attorneys (I don't even need the names) for each district in the list, both historical and recent.

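I searched around and have imported BeautifulSoup and dryscrape. I've read several tutorials, but I don't know JavaScript, so I'm struggling to work out which package is the best fit and how to adapt the example code I've found.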

Edit: I don't have a subscription to this database, but when I view the page source I can see all the data I need:

[screenshot]

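This shows the list of names for one selected district and one time period. Essentially, what I need is a way to iterate over these selections automatically and store the output, sorted by district.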

格夫拉索夫

In most cases, you don't need to know JavaScript to do web scraping.

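Open the "Network" tab of your browser's developer tools and try changing the district in the form. You'll see your browser make HTTP requests to a few addresses, and one of them contains the data you need: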

[screenshot: developer tools]

The browser performs a GET request to an address like http://tracfed.syr.edu/express/fedstaf/initials/hidden_ausa/menu_ausa_Arizona.html and gets a response that looks like this:

<SCRIPT LANGUAGE="JavaScript">
<!--
var group= new Array() ;
   group[0] = new Array("Aguilar, Arturo","14006");
   group[1] = new Array("Albert, Jerry R.","00113");
   group[2] = new Array("Alexander, Bret S.","13950");
   group[3] = new Array("Anderson, Beverly K.","10220");
   group[4] = new Array("Arellano, Raquel","00114");
   group[5] = new Array("Bachus, Alison S.","13936");
   group[6] = new Array("Barry, Patrick T.","13937");
   group[7] = new Array("Battista, Frederick A.","00119");
   group[8] = new Array("Bibles, Camilla","00121");
   group[9] = new Array("Bostwick, Reese V.","00123");
   group[10] = new Array("Boyle, John Z.","10223");
   group[11] = new Array("Brown, Christopher A.","13982");
   group[12] = new Array("Bullis, Paul","14005");
   group[13] = new Array("Cabanillas, Cristina M.","00126");
   group[14] = new Array("Cassell, Matthew Colson","14017");
   group[15] = new Array("Cerow, Darcy A.","00127");
   group[16] = new Array("Christiansen, Callie","14010");
   group[17] = new Array("Clausen, Monte C.","00132");
   group[18] = new Array("Clemens, Shelley K. G.","13931");
   group[19] = new Array("Cocio, Fredrick A.","13945");
   group[20] = new Array("Corbin, Carmen","13996");
   group[21] = new Array("Coughlin, Robert E.","14022");
   group[22] = new Array("Davenport, Gordon","13985");
   group[23] = new Array("DeJong, Sarah Elizabeth","14014");
   group[24] = new Array("Decosta, Sabrina","13952");
   group[25] = new Array("Dejoe, Ryan","14001");
   group[26] = new Array("Demarais, Ann L.","13962");
   group[27] = new Array("Dokken, Roger W.","00135");
   group[28] = new Array("Duryee, Carin","00138");
   group[29] = new Array("Eltringham, Matthew G.","14019");
   group[30] = new Array("Evans, John Reynolds","05657");
   group[31] = new Array("Feldmeier, Mary Sue","00142");
   group[32] = new Array("Fellrath, Robert A.","13981");
   group[33] = new Array("Ferg, Bruce M.","10234");
   group[34] = new Array("Ferraro, D. Thomas","10236");
   group[35] = new Array("Figueroa, Jesse J.","00143");
   group[36] = new Array("Fisher, Joshua Ivan","14012");
   group[37] = new Array("Flannigan, David P","10237");
   group[38] = new Array("Fugitive","00141");
   group[39] = new Array("Galati, Frank T.","13927");
   group[40] = new Array("Galbraith, Charles","13998");
   group[41] = new Array("Granoff, Jonathan B.","10241");
   group[42] = new Array("Granoff, Liza M.","13941");
   group[43] = new Array("Green, Jennifer E.","13954");
   group[44] = new Array("Greer, Dyanne C.","10242");
   group[45] = new Array("Hanley, Joseph","13997");
   group[46] = new Array("Hansen, Sandra M.","00153");
   group[47] = new Array("Healey, Kyle","13994");
   group[48] = new Array("Hernandez, Rachel","00155");
   group[49] = new Array("Hodahkwen, Marnie","14002");
   group[50] = new Array("Hollon, Leta","13971");
   group[51] = new Array("Hopkins, Kimberly E.","13989");
   group[52] = new Array("Howe, Randall","13969");
   group[53] = new Array("Hurley, Emory T.","10248");
   group[54] = new Array("Hyder, Charles F.","00159");
   group[55] = new Array("Jennis, Lisa","00161");
   group[56] = new Array("Kasprzyk, Brian","13976");
   group[57] = new Array("Kelly, Kristen","14003");
   group[58] = new Array("Kern, David A.","00166");
   group[59] = new Array("Kim, Wendy","13991");
   group[60] = new Array("Kimmins, Lynnette","00167");
   group[61] = new Array("Kirby, Vincent Q.","00168");
   group[62] = new Array("Kleindienst, Wallace H.","00170");
   group[63] = new Array("Kleiner, Albert L.","10255");
   group[64] = new Array("Knapp, James Richard","13940");
   group[65] = new Array("Koehler, Joseph E.","00172");
   group[66] = new Array("Kokanovich, Mark Samuel","14021");
   group[67] = new Array("Lacey, James T.","00174");
   group[68] = new Array("Landis, Brent H.","13968");
   group[69] = new Array("Langhofer, Kory","13992");
   group[70] = new Array("Lanham, Krissa","14009");
   group[71] = new Array("Lanza, Dominc","13984");
   group[72] = new Array("Laramore, Stephen W.","00175");
   group[73] = new Array("Larson, Brian","00176");
   group[74] = new Array("Lee, Lawrence C.","13987");
   group[75] = new Array("Lee, Michael A.","10258");
   group[76] = new Array("Lefkowitz, Claire K.","00177");
   group[77] = new Array("Lemke, Kathy","13951");
   group[78] = new Array("Levinson, Jennifer F.","13964");
   group[79] = new Array("Lewis, Christopher J.","13963");
   group[80] = new Array("Lodge, Joseph J.","00180");
   group[81] = new Array("Logalbo, Michael D.","13932");
   group[82] = new Array("Logan, Steven P.","00181");
   group[83] = new Array("Lopez, John R., IV","10261");
   group[84] = new Array("Lucca, Jonell L.","13965");
   group[85] = new Array("Maingot, Anthony Edward","13939");
   group[86] = new Array("Markovick, Erick","00183");
   group[87] = new Array("Marlowe, Joelyn D.","00184");
   group[88] = new Array("McCallum, Erica","13973");
   group[89] = new Array("McCormick, Glenn B.","10267");
   group[90] = new Array("McDonald, Karen S.","00190");
   group[91] = new Array("McGhee, James","14007");
   group[92] = new Array("McGhee, Melanie A.","14018");
   group[93] = new Array("McLaughlin, Jane E.","13960");
   group[94] = new Array("McMurray, Molly","14008");
   group[95] = new Array("Meister, Melissa","14004");
   group[96] = new Array("Melissa, Karlen","13978");
   group[97] = new Array("Mellor, Joshua C.","13988");
   group[98] = new Array("Miskell, Robert L.","00193");
   group[99] = new Array("Morrissey, Michael T.","00194");
   group[100] = new Array("Morse, James B.","13953");
   group[101] = new Array("Mosher, Anne E.","00195");
   group[102] = new Array("Nelson, Heather H.","13957");
   group[103] = new Array("Novitsky, Sharon K.","00198");
   group[104] = new Array("Parecki, Josh P.","13958");
   group[105] = new Array("Passos, Raynette","00201");
   group[106] = new Array("Peery, Amy Elizabeth","14020");
   group[107] = new Array("Perkel, Walter P","13977");
   group[108] = new Array("Petermann, David P.","10272");
   group[109] = new Array("Phillips, Sheila","13979");
   group[110] = new Array("Picton, Cory","13972");
   group[111] = new Array("Pimsner, David","00203");
   group[112] = new Array("Pop, Anca","13999");
   group[113] = new Array("Rapp, Kevin M.","00206");
   group[114] = new Array("Rassas, Theresa","13970");
   group[115] = new Array("Reid-Moore, Christina","13990");
   group[116] = new Array("Restaino, Gary M.","10274");
   group[117] = new Array("Rocker, Wilbert L., Jr.","13959");
   group[118] = new Array("Roetzel, Danny N.","10276");
   group[119] = new Array("Rolley, Karen","14000");
   group[120] = new Array("Rood, Paul V.","00209");
   group[121] = new Array("Russell, Craig Howard","14016");
   group[122] = new Array("Sampson, Dimitra H.","13967");
   group[123] = new Array("Sardelli, Brian G.","14015");
   group[124] = new Array("Savel, Nicole P.","10277");
   group[125] = new Array("Scheel, Ann B.","00212");
   group[126] = new Array("Schmit, Gerard Micah","13956");
   group[127] = new Array("Schneider, Patrick J.","00213");
   group[128] = new Array("Sexton, Peter S.","00214");
   group[129] = new Array("Sharda, Munish","10278");
   group[130] = new Array("Silver, James Anthony","13949");
   group[131] = new Array("Simon, Thomas C.","00216");
   group[132] = new Array("Spaven, Michelle","13995");
   group[133] = new Array("Strange, Elizabeth A.","13961");
   group[134] = new Array("Sukenic, Howard D.","10283");
   group[135] = new Array("Timm, Craig M.","13983");
   group[136] = new Array("Tsethlikai, Serra","00220");
   group[137] = new Array("Uhl, Louis","13974");
   group[138] = new Array("Van Buskirk, Tracy","13986");
   group[139] = new Array("Vercauteren, Keith E.","10285");
   group[140] = new Array("Walsh, Janet M.","10286");
   group[141] = new Array("Wang, Rui","13993");
   group[142] = new Array("Wiles, Jennifer Marie","14011");
   group[143] = new Array("Woo, Cassie","13975");
   group[144] = new Array("Woo, Raymond K.","13943");
   group[145] = new Array("Woolridge, Angela W.","13942");
   group[146] = new Array("Zipps, David R.","13980");
parent.frames['form'].changeAUSAMenu(group); 
delete group ; 
//-->
</SCRIPT>
<HTML><BODY BGCOLOR="#F5F6C8"></BODY></HTML>

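Now, granted, this is a weird way for the developers to respond to an AJAX request (client-server interaction on web pages is usually done via JSON or XML), but we can still parse the data with a not-too-complicated regular expression. I'll leave it to you to work out how to write a method that extracts the attorney names from the response text.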
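As a starting point, here is a minimal sketch of that regular-expression approach. It assumes the response always uses the `group[N] = new Array("name","id");` line format shown above, and that every district follows the same URL pattern as the Arizona example; only the Arizona URL is confirmed. The names `parse_attorneys`, `build_url`, `fetch_attorneys`, and `count_by_district` are made up for illustration, the `latin-1` decoding is a guess at the page encoding, and the sketch does not cover selecting a time period, since the request used for that is not shown here.

```python
import re
import urllib.request

# Assumption: every district page follows the same URL pattern as the
# Arizona example; only that one URL appears in the answer.
BASE_URL = (
    "http://tracfed.syr.edu/express/fedstaf/initials/"
    "hidden_ausa/menu_ausa_{}.html"
)

# Matches lines such as: group[0] = new Array("Aguilar, Arturo","14006");
ENTRY_RE = re.compile(r'group\[\d+\]\s*=\s*new Array\("([^"]*)","([^"]*)"\);')


def parse_attorneys(response_text):
    """Return a list of (name, id) tuples found in the JavaScript response."""
    return ENTRY_RE.findall(response_text)


def build_url(district):
    """Build the menu URL for a district name (hypothetical naming scheme)."""
    return BASE_URL.format(district)


def fetch_attorneys(district, timeout=30):
    """Download one district's response and parse it."""
    with urllib.request.urlopen(build_url(district), timeout=timeout) as resp:
        # Assumption: latin-1 is used here only because it never fails to decode.
        text = resp.read().decode("latin-1")
    return parse_attorneys(text)


def count_by_district(districts):
    """Map each district name to the number of entries listed for it."""
    return {d: len(fetch_attorneys(d)) for d in districts}


# Offline check against a fragment of the response shown above.
sample = '''
var group= new Array() ;
   group[0] = new Array("Aguilar, Arturo","14006");
   group[1] = new Array("Albert, Jerry R.","00113");
parent.frames['form'].changeAUSAMenu(group);
'''
print(parse_attorneys(sample))
# prints [('Aguilar, Arturo', '14006'), ('Albert, Jerry R.', '00113')]
```

Calling `count_by_district(["Arizona"])` would then give the number of entries for that district, provided the URL assumption holds. Note that the listing contains at least one entry that does not look like a person's name ("Fugitive"), so you may want to filter such rows before counting.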
