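I'm trying to get the top 3 countries from an Alexa report, but I can't access the site using curl. When I try, I get an error message from Alexa telling me to sign up with Amazon. I know curl is supposed to be unblockable, but they seem to have managed it.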
$url="http://www.alexa.com/siteinfo/google.com";
$agent= 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)';
$ch = curl_init();
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_VERBOSE, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_URL,$url);
$result=curl_exec($ch);
echo('<textarea>'.$result.'</textarea>');
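This should work. Note that I'm using the standard set of curl options I like to use; feel free to adjust them to your actual needs. The reason is that, although you set $agent, you never actually pass it to curl in any way. My options set CURLOPT_USERAGENT, along with a few other things, correctly.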
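$url = "http://www.alexa.com/siteinfo/google.com";
$agent = 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
// Certificate checks are disabled here only for quick testing; keep them enabled in production.
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
// CURLOPT_SSLVERSION = 3 (forcing SSLv3) is left out: SSLv3 is insecure and refused by most servers.
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
// The key fix: actually hand the user agent to curl.
curl_setopt($ch, CURLOPT_USERAGENT, $agent);
$result = curl_exec($ch);
if ($result === false) {
    // curl_error() must be read before curl_close().
    echo 'curl error: ' . curl_error($ch);
} else {
    // Escape the page so a "</textarea>" inside it cannot break the output.
    echo('<textarea>' . htmlspecialchars($result) . '</textarea>');
}
curl_close($ch);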
This is the result I get in my local test environment: PHP 5.4 running under MAMP on a Mac.
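If you want to confirm that the user agent really goes out with the request, curl can keep a copy of the request headers it sent when CURLINFO_HEADER_OUT is enabled. Below is a minimal sketch; the function names `request_headers_sent` and `has_user_agent` are hypothetical. As far as I know, libcurl (which PHP's curl functions wrap) sends no User-Agent header at all unless one is set, so calling it with `null` and then with `$agent` should show the difference.

```php
<?php
// Hypothetical helper: perform the request and return the raw request
// headers that curl actually sent (empty string if nothing was sent).
function request_headers_sent(string $url, ?string $agent): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLINFO_HEADER_OUT, true); // ask curl to record outgoing headers
    if ($agent !== null) {
        curl_setopt($ch, CURLOPT_USERAGENT, $agent);
    }
    curl_exec($ch);
    $headers = (string) curl_getinfo($ch, CURLINFO_HEADER_OUT);
    curl_close($ch);
    return $headers;
}

// Hypothetical helper: true if the header block has a non-empty User-Agent line.
function has_user_agent(string $rawHeaders): bool
{
    return preg_match('/^User-Agent:[ \t]*\S/mi', $rawHeaders) === 1;
}

// Usage (performs a real request):
// $sent = request_headers_sent('http://www.alexa.com/siteinfo/google.com', $agent);
// echo has_user_agent($sent) ? "User-Agent sent\n" : "No User-Agent sent\n";
```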
EDIT: According to the original poster, this script works on one host but not on another, where he gets a “403: Forbidden” error. That points to some kind of blocking on the Alexa server. I would recommend debugging with curl -I from the command line, like this:
curl -I http://www.alexa.com/siteinfo/google.com
And on my local Mac OS X 10.9.4 setup, I get this in response to the request:
HTTP/1.1 200 OK
Content-Type: text/html; charset=UTF-8
Date: Thu, 10 Jul 2014 01:24:51 GMT
Server: Apache
Set-Cookie: rpt=%21; expires=Fri, 11-Jul-2014 02:24:51 GMT; domain=alexa.com
Set-Cookie: lv=1404955491; expires=Fri, 10-Jul-2015 07:24:51 GMT; path=/; domain=alexa.com
Vary: Accept-Encoding
X-Frame-Options: SAMEORIGIN
Connection: keep-alive
The HTTP/1.1 200 OK means all is good. If you run the same command from the command line and get anything other than that, you are very likely being blocked. It could be a block based solely on an IP range, or something like ModSecurity, which performs heuristic analysis of traffic to catch and block non-standard web requests. Either way, if the block is on the server side, there is not much you can do to unblock yourself.
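If you would rather run the same check from inside PHP, here is a minimal sketch of a `curl -I` equivalent: it sends a HEAD request and reports the status code. The names `head_status` and `status_from_headers` are made up for this illustration. The optional `$agent` argument lets you compare the result with and without a browser-like user agent. `curl_getinfo($ch, CURLINFO_HTTP_CODE)` would also give you the code directly; parsing the header text just keeps the status logic separate and easy to check.

```php
<?php
// Hypothetical helper: a PHP analogue of `curl -I URL`. Returns the HTTP
// status code, or null if no status line was received.
function head_status(string $url, ?string $agent = null): ?int
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);          // HEAD request, like curl -I
    curl_setopt($ch, CURLOPT_HEADER, true);          // keep response headers in the result
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
    if ($agent !== null) {
        curl_setopt($ch, CURLOPT_USERAGENT, $agent); // optional browser-like user agent
    }
    $raw = curl_exec($ch);
    curl_close($ch);
    return is_string($raw) ? status_from_headers($raw) : null;
}

// Hypothetical helper: code from the last status line in a raw header block
// (the last one, in case several header blocks are present, e.g. after redirects).
function status_from_headers(string $raw): ?int
{
    if (!preg_match_all('#^HTTP/[\d.]+\s+(\d{3})#m', $raw, $m)) {
        return null;
    }
    return (int) end($m[1]);
}

// Usage (performs a real request):
// $code = head_status('http://www.alexa.com/siteinfo/google.com', $agent);
// echo $code === 200 ? "OK\n" : "Got " . var_export($code, true) . ", possibly blocked\n";
```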
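That said, notice how my version of the script sets $agent properly, while yours does not? It may be that during testing you ran so many curl requests without a proper user agent that the IP you are testing from is now temporarily banned. So wait a day or two and try again, but use my version of the script so that a proper user agent is set. I bet that will work fine.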