How do I log in to a website using the Python Requests module?

狼 :

I have been reading about the Requests module and have tried several different approaches.

However, I am having trouble with web authentication.

Testing site: http://testing-ground.scraping.pro/login
Username: admin
Password: 12345

Here is the sample code:

>>> import requests, re
>>> url = 'http://testing-ground.scraping.pro/login'
>>> username = 'admin'
>>> password = '12345'
>>> requests.get(url)
<Response [200]>

Without authentication:

>>> print(requests.get(url).text)
<!DOCTYPE html>
<!--[if lt IE 7]>      <html class="no-js lt-ie9 lt-ie8 lt-ie7"> <![endif]-->
<!--[if IE 7]>         <html class="no-js lt-ie9 lt-ie8"> <![endif]-->
<!--[if IE 8]>         <html class="no-js lt-ie9"> <![endif]-->
<!--[if gt IE 8]><!--> <html class="no-js"> <!--<![endif]-->
    <head>
        <meta charset="utf-8">
        <meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
        <title>Web Scraper Testing Ground</title>
        <meta name="description" content="">
        <meta name="viewport" content="width=device-width">
        <link rel="stylesheet" href="/css/normalize.css">
        <link rel="stylesheet" href="/css/main.css">
        <script src="/js/vendor/modernizr-2.6.1.min.js"></script>
        <script src="/js/vendor/jquery-1.9.1.min.js"></script>
        <script src="/js/vendor/jquery-ui-1.10.2.min.js"></script>
        <script src="/js/plugins.js"></script>
        <script src="/js/main.js"></script>

        <link rel="stylesheet" href="/css/QapTcha.jquery.css" />
        <script src="/js/QapTcha.jquery.js"></script>
        
        <link rel="stylesheet" href="/fancy-captcha/captcha.css" />
        <script src="/fancy-captcha/jquery.captcha.js"></script>

    </head>
    <body>
        <script type="text/javascript">
        
          var _gaq = _gaq || [];
          _gaq.push(['_setAccount', 'UA-4436411-8']);
          _gaq.push(['_setDomainName', 'extract-web-data.com']);
          _gaq.push(['_trackPageview']);
        
          (function() {
            var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
            ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
            var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
          })();
        
        </script>
        <!--[if lt IE 7]>
            <p class="chromeframe">You are using an outdated browser. <a href="http://browsehappy.com/">Upgrade your browser today</a> or <a href="http://www.google.com/chromeframe/?redirect=true">install Google Chrome Frame</a> to better experience this site.</p>
        <![endif]-->
        <div id="topbar"></div>
        <a href="/" style="text-decoration: none">
            <div id="title">WEB SCRAPER TESTING GROUND</div>
            <div id="logo"></div>
        </a>
        <div id="content">
<h1>LOGIN</h1>
<div id="caseinfo">Often in order to reach the desired information you need to be logged in to the website. Most of today's websites use so-called form-based authentication which implies sending user credentials using POST method, authenticating it on the server and storing user's session in a cookie.</p>
<p>This simple test shows scraper's ability to:</p>
    <ol>
        <li>Send user credentials via POST method</li>
        <li>Receive, Keep and Return a session cookie</li>
        <li>Process HTTP redirect (302)</li>
    </ol>
<p>How to test:</p>
    <ol>
        <li>Enter <b>admin</b> and <b>12345</b> in the form below and press <b>Login</b></li>
        <li>If you see <span class="success">WELCOME :)</span> then the user credentials were sent, the cookie was passed and HTTP redirect was processed</li>
        <li>If you see <span class="error">ACCESS DENIED!</span> then either you entered wrong credentials or they were not sent to the server properly</li>
        <li>If you see <span class="error">THE SESSION COOKIE IS MISSING OR HAS A WRONG VALUE!</span> then the user credentials were properly sent but the session cookie was not properly stored or passed</li>
        <li>If you see <span class="success">REDIRECTING...</span> then the user credentials were properly sent but HTTP redirection was not processed</li>
        <li>Click <b>GO BACK</b> to start again</li>
    </ol>
</div>

<hr/>

<div id="case_login">
<h3>Please, login:</h3>
    <form action="login?mode=login" method="POST">
        <label for="usr">User name:</label>
        <input id="usr" name="usr" type="text" placeholder="enter 'admin' here">
        <label for="pwd">Password:</label>
        <input id="pwd" name="pwd" type="text" placeholder="enter '12345' here">
        <input type="submit" value="Login">
    </form>
</div>
<br/><br/><br/>
        </div>
    </body>
</html>
>>> 

With authentication:

>>> print(requests.get(url, auth=(username, password)).text)
<!DOCTYPE html>
<!--[if lt IE 7]>      <html class="no-js lt-ie9 lt-ie8 lt-ie7"> <![endif]-->
<!--[if IE 7]>         <html class="no-js lt-ie9 lt-ie8"> <![endif]-->
<!--[if IE 8]>         <html class="no-js lt-ie9"> <![endif]-->
<!--[if gt IE 8]><!--> <html class="no-js"> <!--<![endif]-->
    <head>
        <meta charset="utf-8">
        <meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
        <title>Web Scraper Testing Ground</title>
        <meta name="description" content="">
        <meta name="viewport" content="width=device-width">
        <link rel="stylesheet" href="/css/normalize.css">
        <link rel="stylesheet" href="/css/main.css">
        <script src="/js/vendor/modernizr-2.6.1.min.js"></script>
        <script src="/js/vendor/jquery-1.9.1.min.js"></script>
        <script src="/js/vendor/jquery-ui-1.10.2.min.js"></script>
        <script src="/js/plugins.js"></script>
        <script src="/js/main.js"></script>

        <link rel="stylesheet" href="/css/QapTcha.jquery.css" />
        <script src="/js/QapTcha.jquery.js"></script>
        
        <link rel="stylesheet" href="/fancy-captcha/captcha.css" />
        <script src="/fancy-captcha/jquery.captcha.js"></script>

    </head>
    <body>
        <script type="text/javascript">
        
          var _gaq = _gaq || [];
          _gaq.push(['_setAccount', 'UA-4436411-8']);
          _gaq.push(['_setDomainName', 'extract-web-data.com']);
          _gaq.push(['_trackPageview']);
        
          (function() {
            var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
            ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
            var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
          })();
        
        </script>
        <!--[if lt IE 7]>
            <p class="chromeframe">You are using an outdated browser. <a href="http://browsehappy.com/">Upgrade your browser today</a> or <a href="http://www.google.com/chromeframe/?redirect=true">install Google Chrome Frame</a> to better experience this site.</p>
        <![endif]-->
        <div id="topbar"></div>
        <a href="/" style="text-decoration: none">
            <div id="title">WEB SCRAPER TESTING GROUND</div>
            <div id="logo"></div>
        </a>
        <div id="content">
<h1>LOGIN</h1>
<div id="caseinfo">Often in order to reach the desired information you need to be logged in to the website. Most of today's websites use so-called form-based authentication which implies sending user credentials using POST method, authenticating it on the server and storing user's session in a cookie.</p>
<p>This simple test shows scraper's ability to:</p>
    <ol>
        <li>Send user credentials via POST method</li>
        <li>Receive, Keep and Return a session cookie</li>
        <li>Process HTTP redirect (302)</li>
    </ol>
<p>How to test:</p>
    <ol>
        <li>Enter <b>admin</b> and <b>12345</b> in the form below and press <b>Login</b></li>
        <li>If you see <span class="success">WELCOME :)</span> then the user credentials were sent, the cookie was passed and HTTP redirect was processed</li>
        <li>If you see <span class="error">ACCESS DENIED!</span> then either you entered wrong credentials or they were not sent to the server properly</li>
        <li>If you see <span class="error">THE SESSION COOKIE IS MISSING OR HAS A WRONG VALUE!</span> then the user credentials were properly sent but the session cookie was not properly stored or passed</li>
        <li>If you see <span class="success">REDIRECTING...</span> then the user credentials were properly sent but HTTP redirection was not processed</li>
        <li>Click <b>GO BACK</b> to start again</li>
    </ol>
</div>

<hr/>

<div id="case_login">
<h3>Please, login:</h3>
    <form action="login?mode=login" method="POST">
        <label for="usr">User name:</label>
        <input id="usr" name="usr" type="text" placeholder="enter 'admin' here">
        <label for="pwd">Password:</label>
        <input id="pwd" name="pwd" type="text" placeholder="enter '12345' here">
        <input type="submit" value="Login">
    </form>
</div>
<br/><br/><br/>
        </div>
    </body>
</html>
>>> 

Since the output still contains the web login form, I assume the authentication did not work as expected.

<h3>Please, login:</h3>
    <form action="login?mode=login" method="POST">
        <label for="usr">User name:</label>
        <input id="usr" name="usr" type="text" placeholder="enter 'admin' here">
        <label for="pwd">Password:</label>
        <input id="pwd" name="pwd" type="text" placeholder="enter '12345' here">
        <input type="submit" value="Login">
    </form>

What is going wrong here, and what should I do to fix it?

排水:

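The auth= argument sends HTTP Basic credentials in an Authorization header, but this page uses an HTML form. You should POST the form fields (usr and pwd) to the URL that the form's action attribute points to: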

>>> import requests, re
>>> url = 'http://testing-ground.scraping.pro/login?mode=login'
>>> username = 'admin'
>>> password = '12345'
>>> requests.post(url, data={'usr':username, 'pwd':password})
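
To make the difference visible without touching the network, here is a minimal sketch that builds both requests and inspects what would be sent. The helper names (build_basic_auth_request, build_form_login_request, login) are my own, not part of Requests, and I have not run login() against the live site; it assumes the form fields and action URL shown in the question are still current.

```python
import requests

# Form action URL taken from the HTML in the question.
LOGIN_URL = "http://testing-ground.scraping.pro/login?mode=login"


def build_basic_auth_request(url, username, password):
    """Prepare what requests.get(url, auth=(...)) would send: HTTP Basic auth."""
    return requests.Request("GET", url, auth=(username, password)).prepare()


def build_form_login_request(url, username, password):
    """Prepare what the HTML form sends: a URL-encoded POST body (usr, pwd)."""
    form_fields = {"usr": username, "pwd": password}
    return requests.Request("POST", url, data=form_fields).prepare()


def login(session, username, password, url=LOGIN_URL):
    """POST the form through a Session so cookies set by the server are kept
    and sent again on the redirect and on any later request."""
    response = session.post(url, data={"usr": username, "pwd": password})
    response.raise_for_status()
    return response


basic = build_basic_auth_request(
    "http://testing-ground.scraping.pro/login", "admin", "12345"
)
form = build_form_login_request(LOGIN_URL, "admin", "12345")

# Basic auth puts the credentials in a header and sends no body at all,
# so the server-side form handler never sees usr or pwd.
print(basic.headers["Authorization"])
print(basic.body)

# The form login carries the credentials in the request body instead.
print(form.method, form.url)
print(form.body)

# Usage against the live site (not executed here because it needs network access):
#   with requests.Session() as session:
#       page = login(session, "admin", "12345")
#       print("WELCOME" in page.text)
```

A single requests.post call already follows the redirect with the cookie it received. A Session is only needed if you want to stay logged in for further requests. According to the page's own instructions, seeing WELCOME :) in the response means the credentials, cookie, and redirect were all handled.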
