Problem connecting to a specific URL with Jsoup's connect method

Matthew S. :

First off, Jsoup's connect() method may not be at fault; my concern may come from a misunderstanding of Document's html() method, which is inherited from Element.

My goal is to extract information from a specific URL. However, given the String that Document's html() method currently returns, I fear that Jsoup's connect() method is connecting to the website's generic URL rather than to the URL I specified.

This is the specific URL I would like my program to connect to: http://redditsearch.io/?term=&dataviz=false&aggs=false&subreddits=&searchtype=posts&search=true&start=1587355200&end=1587441600&size=100

but instead I think it's only connecting to the generic URL of that website: http://redditsearch.io/

The reason I believe this is because of the String that is returned by Document's html() method:

Document doc = Jsoup.connect("http://redditsearch.io/?term=&dataviz=false&aggs=false&subreddits=&searchtype=posts&search=true&start=1587355200&end=1587441600&size=100").get();
String html = doc.html();
System.out.println(html);

This prints a lot of HTML, so I will share only the relevant part (bear in mind that the following text is what Document's html() method returns):

<div id="results-container" class="data-display"> 
 <div id="posts" class="results"></div> 
 <div id="comments" class="results"></div> 
</div>

In my browser's inspector (Firefox), the same part of the HTML for the specific URL looks like this (bear in mind that the following text is NOT returned by Document's html() method; it is what the inspector displays):

<div id="results-container" class="data-display">
 <div id="posts" class="results">
  <div class="submission"...> </div> (first entry under "posts")
  ...
  <div class="submission"...> </div> (Nth entry under "posts")
 </div>
 <div id="comments" class="results"></div>
</div>

In other words, when my browser loads the specific URL, there are multiple elements under the div with id="posts". When my browser loads the generic URL "redditsearch.io", the inspector shows no elements under that tag, so that part of the HTML looks just like the first HTML example above. This is why I believe my program is connecting to the generic URL even though I pass the specific URL as the argument.

dankito :

Another possibility is that the "submission" divs are added by JavaScript in your browser. Jsoup does not execute JavaScript, so html() would only show the markup as the server first sent it.

To check this, either turn off JavaScript in your browser (e.g. with the NoScript plugin) or look at the first HTML file returned in the network tab of the developer console.
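
The same check can also be done from code. Below is a minimal sketch that uses the JDK's built-in java.net.http client (Java 11+) instead of Jsoup, so it is a swapped-in way of looking at the raw response, not Jsoup's own behaviour. The class name RawHtmlCheck and the helpers fetchRaw and hasSubmissionMarkup are hypothetical names made up for this illustration, and the substring test is a rough heuristic rather than real HTML parsing. It prints the URI the response finally came from and whether any "submission" markup is already present before JavaScript runs.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper class for illustration only.
class RawHtmlCheck {

    // Rough heuristic: true if the raw markup contains an element whose
    // class attribute starts with "submission". Not a real HTML parser.
    static boolean hasSubmissionMarkup(String html) {
        return html.contains("class=\"submission");
    }

    // Fetches the page as the server sends it, without running any JavaScript.
    static HttpResponse<String> fetchRaw(String url)
            throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString());
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("usage: java RawHtmlCheck.java <url>");
            return;
        }
        HttpResponse<String> response = fetchRaw(args[0]);
        // The URI of the final response, after any redirects were followed.
        System.out.println("Fetched from: " + response.uri());
        System.out.println("Submissions in raw HTML: "
                + hasSubmissionMarkup(response.body()));
    }
}
```

Pass the specific URL as the first argument, quoted so the shell does not interpret the & characters. If the printed URI still contains your query string but no submission markup is found, the request itself went to the right place and the results are most likely filled in by a script afterwards.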
