Web scraping with rvest - return NA when the node is not found?

JC

I'm a bit stuck here. I want to scrape data from a website and extract things like user ratings, comments, and so on. I'm trying to add the data to a data frame.

Below is the code I have so far:

# Read html and select the URLs for each game review. 

library(rvest)
# Load plyr before dplyr so that plyr does not mask dplyr's verbs.
library(plyr)
library(dplyr)

# Read the webpage and the number of ratings.

getGame <- function(metacritic_game) {

total_ratings<- metacritic_game %>%
  html_nodes("strong") %>%
  html_text()

total_ratings <- ifelse(length(total_ratings) == 0, NA, 
as.numeric(strsplit(total_ratings, " ") [[1]][1]))

# Get the game title and the platform.

game_title <- metacritic_game %>%
  html_nodes("h1") %>%
  html_text()

game_platform <- metacritic_game %>%
  html_nodes(".platform a") %>%
  html_text()

game_platform <- strsplit(game_platform," ")[[1]][57:58]
game_platform <- gsub("\n","", game_platform)
game_platform<- paste(game_platform[1], game_platform[2], sep = " ")

game_publisher <- metacritic_game %>%
  html_nodes(".publisher a:nth-child(1)") %>%
  html_attr("href") %>%
  strsplit("/company/")%>%
  unlist() 

game_publisher <- gsub("\\W", " ", game_publisher)
game_publisher <- strsplit(game_publisher,"\\t")[[2]][1]

release_date <- metacritic_game %>%
  html_nodes(".release_data .data") %>%
  html_text()


user_ratings <- metacritic_game %>%
  html_nodes("#main .indiv") %>%
  html_text() %>%
  as.numeric()


user_name <- metacritic_game %>%
  html_nodes(".name a") %>%
  html_text()



review_date <- metacritic_game %>%
  html_nodes("#main .date") %>%
  html_text()


user_comment <- metacritic_game %>%
  html_nodes("#main .review_section .review_body") %>%
  html_text()



record_game <- data.frame(game_title = game_title,
                      game_platform = game_platform,
                      game_publisher = game_publisher,
                      username = user_name,
                      ratings =  user_ratings,
                      date = review_date,
                      comments = user_comment)

# Return the assembled data frame explicitly.
record_game
}

metacritic_home <-read_html("https://www.metacritic.com/browse/games/score/metascore/90day/all/filtered")

game_urls <- metacritic_home %>%
  html_nodes("#main .product_title a") %>%
  html_attr("href")

get100games <- function(game_urls) {
  data <- data.frame()
  # seq_along() is safe even if game_urls is empty.
  for (i in seq_along(game_urls)) {
    metacritic_game <- read_html(paste0("https://www.metacritic.com", 
game_urls[i], "/user-reviews"))
    record_game <- getGame(metacritic_game)
    data <-rbind.fill(data, record_game)
    print(i)
  }
  data
}

df100games <- get100games(game_urls)

However, some of the links have no user reviews, so rvest cannot find the node and I get the following error: Error in data.frame(game_title = game_title, game_platform = game_platform, : arguments imply differing number of rows: 1, 0.

I tried using ifelse statements such as:
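
The error is not specific to rvest. When no node matches, html_nodes() returns an empty node set and html_text() yields a zero-length character vector, and data.frame() cannot combine that with a length-1 column. A minimal sketch that reproduces this with plain vectors (the values are made-up stand-ins, not real Metacritic data):

```r
# Stand-ins for the scraped fields (hypothetical values).
game_title <- "Some Game"      # one <h1> was found
user_name  <- character(0)     # no review nodes matched, so the vector is empty

# data.frame() can recycle a length-1 column, but not against a length-0 one.
msg <- tryCatch(
  {
    data.frame(game_title = game_title, username = user_name)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```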

username = ifelse(length(user_name) !=0 , user_name, NA),
                      ratings =  ifelse(length(user_ratings) != 0, 
user_ratings, NA),
                      date = ifelse(length(review_date) != 0, 
review_date, NA),
                      comments = ifelse(length(user_comment) != 0, 
user_comment, NA))

However, the data frame then contains only one review per game instead of all of them. Any thoughts on this?

Thanks!
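
A note on the ifelse() attempt: ifelse() returns a value with the same length as its test, and length(x) != 0 is a single TRUE or FALSE, so only the first element survives. A plain if/else returns the whole vector. The sketch below shows the difference; the helper name na_if_empty and all values are made up for illustration and are not part of rvest or dplyr.

```r
# Hypothetical scraped values for a game with three reviews.
user_name <- c("alice", "bob", "carol")

# ifelse() is vectorised over its test; a length-1 test gives a length-1 result.
one_only <- ifelse(length(user_name) != 0, user_name, NA)

# Illustrative helper (not a library function): keep the vector, or return a single NA.
na_if_empty <- function(x) if (length(x) == 0) NA else x

all_kept <- na_if_empty(user_name)      # all three names
missing  <- na_if_empty(character(0))   # a single NA

# With NA in place of the empty vectors, data.frame() can build a one-row record.
record <- data.frame(game_title = "Some Game",
                     username   = na_if_empty(character(0)),
                     ratings    = na_if_empty(numeric(0)))
print(one_only)
print(all_kept)
print(record)
```

Wrapping user_name, user_ratings, review_date and user_comment this way inside getGame() should give a single row of NA values for a game without reviews, while games with reviews keep every row.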

DiceboyT

You can use the function operator possibly() from the purrr package:

df100games <- purrr::map(game_urls, purrr::possibly(get100games, NULL)) %>%
  purrr::compact() %>% 
  dplyr::bind_rows()

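I believe this will return your desired output: possibly() makes a failing call return NULL instead of raising an error, compact() drops those NULLs, and bind_rows() stacks the remaining data frames.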
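
To see what possibly() does without hitting the network, here is a small sketch. scrape_one is a made-up stand-in for a scraper that fails on one input; it is not part of the code above, and the ratings are invented.

```r
library(purrr)

# Hypothetical scraper: fails for id 2, returns two review rows otherwise.
scrape_one <- function(id) {
  if (id == 2) stop("no reviews found")
  data.frame(id = id, rating = c(8, 9))
}

# possibly() wraps the function so that an error yields `otherwise` instead.
safe_scrape <- possibly(scrape_one, otherwise = NULL)

results <- map(1:3, safe_scrape)   # list of length 3; the second element is NULL
kept    <- compact(results)        # drops the NULL
reviews <- dplyr::bind_rows(kept)  # one data frame with the rows for ids 1 and 3
print(reviews)
```

Note that this approach drops the failing games entirely rather than keeping them with NA. If a placeholder row is wanted, one option would be to pass a one-row data frame of NA values as otherwise instead of NULL.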
