I would like to dump all the names on this page and all the remaining 146 pages.
The red/orange previous/next buttons seem to use JavaScript and fetch the names via AJAX.
Question
Is it possible to write a script to crawl the 146 pages and dump the names?
Are there Perl modules for this kind of thing?
You can use WWW::Mechanize or another crawler for this. Web::Scraper might also be a good idea.
use strict;
use warnings;
use Web::Scraper;
use URI;
use Data::Dump;
# First, create your scraper block
my $scraper = scraper {
    # Grab the text of every element with class type_firstname
    # (that way you could also classify the names by type)
    process ".type_firstname", "list[]" => 'TEXT';
};
my @names;
foreach my $page ( 1 .. 146 ) {
    # Fetch the page (add the page number parameter)
    my $res = $scraper->scrape( URI->new("http://www.familiestyrelsen.dk/samliv/navne/soeginavnelister/godkendtefornavne/drengenavne/?tx_lfnamelists_pi2[gotopage]=" . $page) );
    # Add the names to our list (a page with no matches yields no list)
    push @names, @{ $res->{list} || [] };
}
dd \@names;
This gives you a very long list with all the names. Running it may take some time, so try it with 1 .. 1 first.
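If you want a plain dump rather than the dd output, something like the following might help. It is only a sketch using core Perl: clean_names is a hypothetical helper (not part of Web::Scraper) that trims whitespace, drops empty and duplicate entries, and sorts the result. The sample list stands in for the scraped @names.

```perl
use strict;
use warnings;

# Hypothetical helper: trim whitespace, drop empty strings and
# duplicates, and return the names sorted.
sub clean_names {
    my @raw = @_;
    my %seen;
    my @clean;
    for my $name (@raw) {
        next unless defined $name;
        (my $trimmed = $name) =~ s/^\s+|\s+$//g;
        next if $trimmed eq '';
        next if $seen{$trimmed}++;    # skip names we have already kept
        push @clean, $trimmed;
    }
    my @sorted = sort @clean;
    return @sorted;
}

# Sample data standing in for the scraped @names
my @names = clean_names(' Anders', 'Bo', 'Anders ', '', 'Carl');

# One name per line; redirect STDOUT to a file to save the dump
binmode STDOUT, ':encoding(UTF-8)';
print "$_\n" for @names;
```

Note that a plain sort orders by code point, so names starting with Æ, Ø or Å will not follow Danish alphabetical order.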