HTML clean-up code not working

gdogg371

I am running Scrapy.org version 2.7 64-bit on Windows Vista 64-bit. I have the following code, which is meant to pull data from the Guardian Open Platform API and clean it up using some Scrapy modules:

import requests
from scrapy.utils.markup import remove_tags
from scrapy.selector import Selector


def get_content():
    api_url = 'http://beta.content.guardianapis.com/football/premierleague'
    payload = {
        'api-key':              '',
        'page-size':            10,
        'show-editors-picks':   'true',
        'show-elements':        'image',
        'show-fields':          'all'


    }
    response = requests.get(api_url, params=payload)

    def parse(self, response):
        titles = response.selector.xpath("normalize-space(//title)")
        for titles in titles:
            body = response.xpath("//p").extract()
            body2 = "".join(body)
            print remove_tags(body2).encode('utf-8')
            return titles

get_content()

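The code runs without errors, but nothing is printed to Python IDLE. I suspect this is because I have not indented something correctly. I have tried changing the indentation, but got nowhere. Is that the problem, or is there something fundamentally wrong with my code?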

Thanks
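A likely explanation for the silent run: `parse` is defined inside `get_content` but never called, so its body (including the `print`) never executes, and `get_content` itself returns `None`. Even if it were called, `selector` and `xpath` belong to Scrapy's `Response`, not to the object `requests.get` returns. The sketch below is written for Python 3 (the question uses Python 2.7) and swaps Scrapy's XPath and `remove_tags` for the standard library's `html.parser`; the class name `ParagraphText` and the sample HTML are hypothetical, and fetching is left out so it runs offline.

```python
from html.parser import HTMLParser


class ParagraphText(HTMLParser):
    """Collect the plain text of each <p> element, dropping inner tags.

    Hypothetical stand-in for the question's //p XPath plus remove_tags.
    """

    def __init__(self):
        super().__init__()
        self.in_p = False       # True while inside a <p> element
        self.parts = []         # text fragments of the current paragraph
        self.paragraphs = []    # finished paragraph strings

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_p = True
            self.parts = []

    def handle_endtag(self, tag):
        if tag == "p" and self.in_p:
            self.in_p = False
            self.paragraphs.append("".join(self.parts))

    def handle_data(self, data):
        # Only keep text that appears inside a <p> element
        if self.in_p:
            self.parts.append(data)


def parse(html):
    """Return the text of every <p> element in html."""
    parser = ParagraphText()
    parser.feed(html)
    parser.close()
    return parser.paragraphs


def get_content(html):
    # Unlike the nested parse() in the question, this one is actually
    # called, and its result is returned to the caller.
    return parse(html)


if __name__ == "__main__":
    sample = "<title>T</title><p>Hello <b>world</b></p><p>Second</p>"
    for paragraph in get_content(sample):
        print(paragraph)
```

Run as a script, this prints `Hello world` and `Second` on separate lines; the `<title>` text is skipped because it is not inside a `<p>`.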

Padraic Cunningham

Try parsing with BeautifulSoup:

import requests
from bs4 import BeautifulSoup

api_url = 'http://beta.content.guardianapis.com/football/premierleague'
payload = {
    'api-key':              '',
    'page-size':            10,
    'show-editors-picks':   'true',
    'show-elements':        'image',
    'show-fields':          'all'
}
response = requests.get(api_url, params=payload).content
# Name the parser explicitly; bs4 warns when none is given
soup = BeautifulSoup(response, 'html.parser')

# Join all text nodes inside each <p> element into one string per paragraph
text = [''.join(s.findAll(text=True)) for s in soup.findAll('p')]

OK, this code should do exactly what you want:

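import requests
from bs4 import BeautifulSoup

# api_url and payload as defined above
response = requests.get(api_url, params=payload).content
soup = BeautifulSoup(response, 'html.parser')

# Python 2: encode each paragraph to UTF-8 bytes before printing
text = [''.join(s.findAll(text=True)).encode("utf-8") for s in soup.findAll('p')]
for x in text:
    print x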

*Plenty of sides tried free-flowing, pacey Latin football this summer – even England had their moments. A moment. But Argentina stayed functional. They haven’t conceded once in the knockouts, they’ve not been behind in any game, and they don’t mind a lack of respect. Coach Alejandro “The Sloth” Sabella says his side are “sore, beaten and tired after the war [with Holland]. But with work, humility and seriousness, we’ll get there”; Pablo Zabaleta says their strengths are spoiling, staying “compact and tight”, “closing down” and feeding on negativity. “Sometimes, if you have all the people against you, you feel even stronger.”
A series of heroic performances were undone by moments of cold quality – Switzerland, Mexico and Nigeria among those losing to cruel late strikes; and the USA stopped in extra time. But raw passion was at the heart of all the summer’s enduring images: Brazil’s maelstrom; Ivory Coast’s Serey Die in tears during his anthem; Suárez against England; Suárez against Chiellini; and the best squad meltdown for years – Ghana’s trip featuring a fist fight, suspensions, a plane load of cash and an inquiry. FA president Kwesi Nyantakyi: “We will unravel this farce.”
Van Gaal’s goalkeeper subbing move went down well: widely taken as evidence of brave, unsentimental, original thinking (even if Martin O’Neill did it first, in Leicester City’s 1996 play-off final) – and not as evidence of daft, look-at-me risk-taking, which it could have been if Tim Krul had gaffed. But the wider signs for Manchester United were good: a readiness to be flexible on tactics, to switch his back-line formation mid-game, to make space for flair, and to treat the press in a no-nonsense “je lot zijn idioten” way that’ll bring back warm pre-Moyes memories. He had no interest in the third-place play-off, and wasn’t shy to say so.
It’s a biennial revelation. The fundamentals of Germany’s 2002 football reboot are well-known - new academies with German quotas, leading to more German Bundesliga first-teamers at clubs where “50+1” ownership rules stop single entities from taking over. Joachim Löw was installed with a long-term brief, and will lead his team out in the final. England, in the same period, tried four different managers, giving each a smaller talent pool to pick from as the Premier League filled out with foreign owners and foreign players, gorged on its £5.5bn income, and grassroots facilities festered. Still, a Premier B League should fix it.
Bryan Ruiz, not good enough for Fulham’s relegation campaign and shipped out to make way for Kostas Mitroglou, captained Costa Rica into the knockout stage, scoring twice. He starred alongside Joel Campbell, who faces another season on loan from Arsenal. Also making points: Swiss Arsenal reject Johan Djourou; Colombia’s Pablo Armero, a loan flop at West Ham; Algeria pair Rafik Halliche (ex-Fulham) and Carl Medjani (ex-Liverpool); Mexico’s Spurs reject Giovani dos Santos; Germany’s Shkodran Mustafi, given a free by Everton in 2012; and former West Brom and Forest defender Gonzalo Jara, a star for Chile, despite a brutal own-goal/penalty miss double. Even Gervinho looked good.
The surprise on 2014’s top player lists so far: the number of keepers. There’s Tim Howard, whose old high school yearbook photo motto, “It will take a nation of millions to hold me back”, went viral; Costa Rica’s Keylor Navas, now in talks with Bayern Munich; Mexico’s free agent Guillermo Ochoa, whose Gordon Banks moment against Brazil put him in a good bargaining position; Nigeria’s Vincent Enyeama; Germany’s Manuel Neuer; Argentina’s Sergio Romero; and potentially Van Gaal’s strutting mind-gamer Tim Krul, who revelled in his cameo chance. Being a keeper is cool again. Even the ones who play for backwater minnows have their own Head & Shoulders ads.
Pre-tournament, DeAndre Yedlin was a Seattle Sounders homegrown full-back – low on European scouting lists, a known unknown. He’s now the answer to everyone’s full-back needs – his USMNTMVP game against Belgium drawing Roma, Liverpool, Inter, Genoa, Anderlecht and others. Club owner Adrian Hanauer says he doesn’t fancy selling Yedlin, but, on the other hand, “there’s always a number”. Among other talents who weren’t so well-known in inward-looking Premier League circles, where even Monaco’s €45m James Rodríguez counts as a breakthrough act: PSV’s winger Memphis Depay, Lille’s Divock Origi, about to join Liverpool, and Atlético Madrid’s José Giménez, whose buyout clause is on the rise.
Holland’s kicks were clinical against Costa Rica. Then, four days later, two players refused to take one and they lost 4-2. But science says it’s not a lottery. Among the historical World Cup data from analyst Robert O’Connor: the side kicking first win 60% of the time; players aged under 22 score 85% of their kicks, over-22s score 78%; keepers dive low and away from the centre of their net 94% of the time. Overall the ideal taker is young, left-footed, with a “well-established pre-shot routine” and wearing a red shirt. Today’s tailored facts: Argentina have won four out of five of their shootouts, Germany four out of four. One was against Argentina, in 2006.
For all the bad press, when something really bad happened – something disgusting – Fifa didn’t hold back. They fined Argentina £200,000 for breaching press conference regulations – failing to provide a player to give quotes “on three consecutive occasions”. Luis Suárez, meanwhile, was fined £96,000. But Suárez’s four-month ban did represent an unexpectedly heavy hit - a bit “fascist”, reckoned the Uruguay president José Mujica, who called it “an assault on the poor” driven by “Fifa’s bunch of old sons of bitches”. Meanwhile, Pepe was fined £10,000 for a headbutt, Alex Song £13,000 for an elbow chop, and Algeria £32,000 for fans using lasers, while none of the 12 complaints made about racist, homophobic or far-right chants or banners led to any Fifa action.


16 August Arsenal v Crystal Palace
16 August Burnley v Chelsea
16 August Leicester v Everton
16 August Liverpool v Southampton
16 August Man Utd v Swansea
etc...............*
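The answer runs BeautifulSoup over the raw API response. If, as seems likely with `show-fields` set to `all`, the API returns JSON in which each article's HTML sits in a string under `response` → `results` → `fields` → `body` (an assumption about the response shape, as is the `webTitle` key; check both against a real response), decoding the JSON first keeps JSON punctuation and escapes out of the extracted text. The helper names `TextOnly`, `strip_tags` and `article_texts`, and the `SAMPLE` payload, are hypothetical and invented for illustration; the sketch is Python 3 and standard library only.

```python
import json
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Accumulate text nodes, dropping tags; end each </p> with a newline."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

    def handle_endtag(self, tag):
        if tag == "p":
            self.parts.append("\n")


def strip_tags(html):
    """Hypothetical helper: plain text of an HTML fragment."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()


def article_texts(payload_text):
    """Yield (title, plain-text body) pairs from a JSON payload.

    ASSUMPTION: results live under response -> results, each with a
    webTitle string and a fields.body HTML string.
    """
    data = json.loads(payload_text)
    for result in data["response"]["results"]:
        body = result.get("fields", {}).get("body", "")
        yield result.get("webTitle", ""), strip_tags(body)


# Hypothetical sample payload, invented for illustration only.
SAMPLE = json.dumps({
    "response": {
        "results": [
            {"webTitle": "Match report",
             "fields": {"body": "<p>First <em>para</em>.</p><p>Second.</p>"}}
        ]
    }
})

if __name__ == "__main__":
    # With the real API (assumption): requests.get(api_url, params=payload).text
    for title, text in article_texts(SAMPLE):
        print(title)
        print(text)
```

Keeping the JSON decoding separate from the tag stripping means each article's title and body stay paired, instead of every `<p>` in the response being flattened into one list.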
