the goal for this project is to webscrape data from a table and put the results into an sqllite database however i don't know if the current way i'm trying to go about it is possible. currently the data is stored in a list seperated by each row on the table, the only issue is trying to insert it into the database. The error I find with this code is an incomplete input for the sql insert. i've tried searching online for a solution but nothing so far has helped, it will either cause this issue or a list index out of range.
from bs4 import BeautifulSoup
import requests
import sqlite3
headers = {'user-agent': "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:74.0) Gecko/20100101 Firefox/74.0"}
url = "https://en.wikipedia.org/wiki/Comparison_of_computer_viruses"
r = requests.get(url,headers=headers)
soup = BeautifulSoup(r.content, "html.parser")
table = soup.find_all('table')[1]
rows = table.find_all('tr')
row_list= list()
for tr in rows:
td = tr.find_all('td')
row = [i.text for i in td]
row_list.append(row)
print(row_list)
print(row_list[1][1])
maldb = sqlite3.connect("maldb")
cursor = maldb.cursor()
cursor.execute('''drop table if exists mal''')
cursor.execute('''create table mal
(virus text primary key,
alias text,
typeof text,
subtype text,
isolation_date text,
isolation text,
origin text,
author text,
notes text)
''')
for z in range(1,95):
cursor.execute('''INSERT into mal ('?','?','?','?','?','?','?','?','?')''',(row_list[z][0],row_list[z][1],row_list[z][2],row_list[z][3],row_list[z][4],row_list[z][5],row_list[z][6],row_list[z][7],row_list[z][8]))
maldb.commit()
maldb.close()
There are different problems here. The first one is that your syntax is incorrect: you shall not quote the ?
characters, so at least your query should be:
cursor.execute('''INSERT into mal (?,?,?,?,?,?,?,?,?)''',(row_list[z][0],row_list[z][1],row_list[z][2],row_list[z][3],row_list[z][4],row_list[z][5],row_list[z][6],row_list[z][7],row_list[z][8]))
Worse, some rows have not 9 items (first has none, and another one has only 8), so you should check for that. Finally, it is better to use executemany
that looping over execute
, because the query is only compiled once. So I would suggest:
cursor.executemany('''INSERT into mal values(?,?,?,?,?,?,?,?,?)''',
[row for row in row_list if len(row) == 9])
Finally, you should not use a PRIMARY KEY
attribute for the virus
column, because the list actually contains a duplicate for 'Jerusalem\n'
while a primary key has to be unique.
Collected from the Internet
Please contact [email protected] to delete if infringement.
Comments