I am searching a number of web pages for keywords. Thanks again to @Abdou for helping me with the silent error handling! Here is an example:
# this is basically what I do
import pandas as pd
import requests

data = [{"URLs" : "https://www.mercedes-benz.de", "electric" : 1},
        {"URLs" : "https://www.audi.de", "electric" : 0},
        {"URLs" : "https://ww.audo.de", "electric" : 0},
        {"URLs" : "NaN", "electric" : 0}]

def contains_keywords(link, keywords):
    try:
        output = requests.get(link).text
        return int(any(x in output for x in keywords))
    except:
        return "Wrong/Missing URL"

df = pd.DataFrame(data)
mykeywords = ('car', 'vehicle', 'automobile')
df['extra_column'] = df.URLs.apply(lambda l: contains_keywords(l, mykeywords))
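As you can see, I request the URLs stored in df.URLs, search each page for the keywords in mykeywords, and store the binary result in extra_column. The script basically produces the following: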
#                            URLs  electric       extra_column
# 0  https://www.mercedes-benz.de         1                  1
# 1           https://www.audi.de         0                  1
# 2            https://ww.audo.de         0                  0
# 3                           NaN         0  Wrong/Missing URL
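So far I only know whether any keyword was found. What I want to find out is which keywords were found, without running contains_keywords() separately for each keyword in mykeywords. Is there a way to create a new column for each keyword and store the result (1 = keyword found) in the DataFrame? In other words: I need df to gain one additional column per keyword.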
import pandas as pd
import requests

data = [{"URLs" : "https://www.mercedes-benz.de", "electric" : 1},
        {"URLs" : "https://www.audi.de", "electric" : 0},
        {"URLs" : "https://ww.audo.de", "electric" : 0},
        {"URLs" : "NaN", "electric" : 0}]

# check a single keyword against the page behind link
def contains_keywords(link, keyword):
    try:
        output = requests.get(link).text
        return int(keyword in output)
    except:
        return "Wrong/Missing URL"

df = pd.DataFrame(data)
mykeywords = ('car', 'vehicle', 'automobile')

# add one column per keyword; note that every URL is requested once per keyword
for keyword in mykeywords:
    df[keyword] = df.URLs.apply(lambda l: contains_keywords(l, keyword))
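
The loop above requests every page once per keyword. If the aim is to download each page only once, one possible variation is sketched below. It is not part of the original answer: the helper names fetch_text, keyword_flags and add_keyword_columns, the timeout value, the example.com URLs and the stub page texts are all assumptions made for illustration. The demo uses an in-memory stub instead of the network, so it runs offline.

```python
import pandas as pd
import requests


def fetch_text(url, timeout=10):
    """Return the page body, or None if the request fails (hypothetical helper)."""
    try:
        return requests.get(url, timeout=timeout).text
    except requests.exceptions.RequestException:
        return None


def keyword_flags(text, keywords):
    """Return one value per keyword: 1/0, or a marker string if there is no page."""
    if text is None:
        return pd.Series({kw: "Wrong/Missing URL" for kw in keywords})
    return pd.Series({kw: int(kw in text) for kw in keywords})


def add_keyword_columns(df, keywords, fetch=fetch_text):
    """Fetch each URL once and append one column per keyword."""
    # apply() with a function that returns a Series yields a DataFrame
    flags = df["URLs"].apply(lambda url: keyword_flags(fetch(url), keywords))
    return df.join(flags)


# Offline demo: a dict lookup stands in for the network (made-up page text).
pages = {"https://example.com/a": "a car and a vehicle",
         "https://example.com/b": "nothing relevant"}
demo = pd.DataFrame({"URLs": ["https://example.com/a",
                              "https://example.com/b",
                              "NaN"]})
result = add_keyword_columns(demo, ("car", "vehicle", "automobile"), fetch=pages.get)
print(result)
```

With the real data, add_keyword_columns(df, mykeywords) would use fetch_text and make one request per row. Because failed rows hold a string marker, the keyword columns end up with object dtype, the same as in the loop version.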