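I'm trying to get data from a web page ( http://steamcommunity.com/id/Winning117/games/?tab=all ) using a specific tag, but I keep getting null. The result I want is the "hours played" for a specific game, in this case Cluckles' Adventure.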
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class TestScrape {
    public static void main(String[] args) throws Exception {
        String url = "http://steamcommunity.com/id/Winning117/games/?tab=all";
        Document document = Jsoup.connect(url).get();
        Element playTime = document.select("div#game_605250").first();
        System.out.println(playTime);
    }
}
Edit: How can I tell whether a web page uses JavaScript and therefore can't be parsed by Jsoup?
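One rough way to check is to look at the raw HTML the server sends, which is all Jsoup sees because it does not execute scripts. If the element you see in the browser's inspector is missing from that raw markup, it was most likely created by JavaScript. The sketch below is only a heuristic using plain string matching; the class name `StaticHtmlCheck`, the sample markup, and the id `games_list_rows` are hypothetical and chosen for illustration.

```java
import java.util.regex.Pattern;

final class StaticHtmlCheck {

    /** Returns true if the raw markup contains an element with the given id attribute. */
    static boolean hasElementId(String rawHtml, String id) {
        // Match id="..." or id='...', but not attributes such as data-id.
        Pattern p = Pattern.compile(
                "(?<![\\w-])id\\s*=\\s*([\"'])" + Pattern.quote(id) + "\\1");
        return p.matcher(rawHtml).find();
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the server response: an empty container
        // plus a script that would fill it in once it runs in a browser.
        String rawHtml = "<html><body><div id=\"games_list_rows\"></div>"
                + "<script>var rgGames = [];</script></body></html>";

        System.out.println(hasElementId(rawHtml, "games_list_rows")); // true
        System.out.println(hasElementId(rawHtml, "game_605250"));     // false
    }
}
```

If the second check is false for the real response while the element is visible in the browser, the element is probably being built by a script, and a selector against the static document will return null.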
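To execute JavaScript from Java code, there is Selenium:
Selenium-WebDriver makes direct calls to the browser using each browser's native support for automation.
To include it in a Maven project, use this dependency: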
<dependency>
    <groupId>org.seleniumhq.selenium</groupId>
    <artifactId>selenium-server</artifactId>
    <version>3.4.0</version>
</dependency>
Below is a simple JUnit test that creates a WebDriver instance, navigates to the given URL, and runs a short script to read rgGames. You need to download the chromedriver executable from https://sites.google.com/a/chromium.org/chromedriver/downloads.
package SeleniumProject.selenium;

import java.io.File;
import java.io.IOException;
import java.util.List;
import java.util.Map;

import org.junit.After;
import org.junit.AfterClass;
import org.junit.Before;
import org.junit.BeforeClass;
import org.junit.Test;
import org.openqa.selenium.JavascriptExecutor;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriverService;
import org.openqa.selenium.chrome.ChromeOptions;
import org.openqa.selenium.remote.DesiredCapabilities;
import org.openqa.selenium.remote.RemoteWebDriver;

public class ChromeTest {
    private static ChromeDriverService service;
    private WebDriver driver;

    @BeforeClass
    public static void createAndStartService() throws IOException {
        // Point this at the chromedriver executable you downloaded.
        service = new ChromeDriverService.Builder()
                .usingDriverExecutable(new File("D:\\Downloads\\chromedriver_win32\\chromedriver.exe"))
                .withVerbose(false).usingAnyFreePort().build();
        // Let a startup failure fail the test run instead of being swallowed.
        service.start();
    }

    @AfterClass
    public static void stopService() {
        service.stop();
    }

    @Before
    public void createDriver() {
        ChromeOptions chromeOptions = new ChromeOptions();
        DesiredCapabilities capabilities = DesiredCapabilities.chrome();
        capabilities.setCapability(ChromeOptions.CAPABILITY, chromeOptions);
        driver = new RemoteWebDriver(service.getUrl(), capabilities);
    }

    @After
    public void quitDriver() {
        driver.quit();
    }

    @Test
    @SuppressWarnings("unchecked")
    public void testJS() {
        // Load the page in the current browser window.
        driver.get("http://steamcommunity.com/id/Winning117/games/?tab=all");
        // Run JavaScript in the context of the currently selected frame or window.
        // Selenium converts a JavaScript array of objects into a List of Maps.
        JavascriptExecutor js = (JavascriptExecutor) driver;
        List<Map<String, Object>> games =
                (List<Map<String, Object>>) js.executeScript("return rgGames;");
        // Each Map holds the properties of one game.
        for (Map<String, Object> game : games) {
            // Compare the "name" property to Cluckles' Adventure.
            if ("Cluckles' Adventure".equals(game.get("name"))) {
                // Print all properties of Cluckles' Adventure.
                game.forEach((key, value) -> System.out.println(key + " : " + value));
            }
        }
    }
}
As you can see, Selenium loads the page with
driver.get("http://steamcommunity.com/id/Winning117/games/?tab=all");
and, to get the data for all of Winning117's games, it returns the rgGames variable:
ArrayList<Map> list = (ArrayList<Map>) js.executeScript("return rgGames;");
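Once the list is back in Java, picking out one game is ordinary collection code. The following is a minimal sketch that runs without a browser: the class name `GameLookup` is made up, and the `hours_forever` key and its value are assumptions used only to illustrate reading a play-time field. Print the real map first to see which keys the page actually provides.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

final class GameLookup {

    /** Returns the first game whose "name" property equals the given name. */
    static Optional<Map<String, Object>> findByName(
            List<Map<String, Object>> games, String name) {
        return games.stream()
                .filter(game -> name.equals(game.get("name")))
                .findFirst();
    }

    public static void main(String[] args) {
        // Hypothetical sample standing in for the result of "return rgGames;".
        Map<String, Object> game = new HashMap<>();
        game.put("name", "Cluckles' Adventure");
        game.put("hours_forever", "1.5"); // assumed key and made-up value

        List<Map<String, Object>> games = new ArrayList<>();
        games.add(game);

        findByName(games, "Cluckles' Adventure")
                .map(g -> g.get("hours_forever"))
                .ifPresent(hours -> System.out.println("Hours played: " + hours));
    }
}
```

Returning an `Optional` keeps the "game not found" case explicit, so a missing entry does not surface later as a null pointer.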