I have an input file that I am trying to build a database from.
Each line looks like this:
Amy Shchumer, Trainwreck, I Feel Pretty, Snatched, Inside Amy Shchumer
Bill Hader,Inside Out, Trainwreck, Tropic Thunder
And so on.
The first string is an actor/actress, followed by the movies they played in.
The data isn't sorted, and there is some stray whitespace around the values.
I would like to create a dictionary that would look like this:
{'Trainwreck': {'Amy Shchumer', 'Bill Hader'}}
The key would be the movie, the values should be the actors in it, unified in a set data type.
def create_db():
    my_dict = {}
    raw_data = open('database.txt','r+')
    for line in raw_data:
        lst1 = line.split(",")  # to split by the commas
        len_row = len(lst1)
        lst2 = list(lst1)
        for j in range(1,len_row):
            my_dict[lst2[j]] = set([lst2[0]])
    print(my_dict)
It doesn't work: it doesn't handle the case where a key already exists, in which case the actor should be added to the same set as the previous actor.
Instead I end up with:
'Trainwreck': {'Amy Shchumer'}, 'Inside Out': {'Bill Hader'}
def create_db():
    db = {}
    with open("database.txt") as data:
        for line in data.readlines():
            person, *movies = line.split(",")
            for m in movies:
                m = m.strip()
                db[m] = db.get(m, []) + [person]
    return db
Output:
{'Trainwreck': ['Amy Shchumer', 'Bill Hader'],
'I Feel Pretty': ['Amy Shchumer'],
'Snatched': ['Amy Shchumer'],
'Inside Amy Shchumer': ['Amy Shchumer'],
'Inside Out': ['Bill Hader'],
'Tropic Thunder': ['Bill Hader']}
This loops through the data and assigns the first value of each line to person and the rest to movies (the * in the assignment collects all remaining items into a list). Then, for each movie, it uses .get to check whether the movie is already in the database, returning the existing list if it is and an empty list if it isn't, and adds the new actor to that list.
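As a small illustration of those two pieces, the snippet below runs the starred assignment and .get on a single hard-coded line instead of the file. The sample line and the variable names are only for demonstration.

```python
# A single sample line, hard-coded here instead of being read from database.txt.
line = "Bill Hader,Inside Out, Trainwreck, Tropic Thunder\n"

# The starred target collects everything after the first item into a list.
person, *movies = line.split(",")
print(person)   # Bill Hader
print(movies)   # ['Inside Out', ' Trainwreck', ' Tropic Thunder\n']

db = {}
for m in movies:
    m = m.strip()  # drop the surrounding spaces and the trailing newline
    # .get returns the existing list, or the [] default for a new key.
    db[m] = db.get(m, []) + [person]

print(db)
```

Note that the raw split leaves a leading space and the newline on the titles, which is why the .strip() call is needed before using them as keys.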
Another way to do this would be to use a defaultdict:
from collections import defaultdict

def create_db():
    db = defaultdict(lambda: [])
    with open("database.txt") as data:
        for line in data.readlines():
            person, *movies = line.split(",")
            for m in movies:
                db[m.strip()].append(person)
    return db
which automatically assigns [] if the key does not exist.
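Since the question asks for sets rather than lists, the same idea can be sketched with defaultdict(set). This is a minimal sketch and not part of the answer above: build_db and its lines parameter are hypothetical names, and the function takes any iterable of lines (an open file works too) so it can be tried without database.txt.

```python
from collections import defaultdict

def build_db(lines):
    """Map each movie title to the set of actors appearing in it."""
    db = defaultdict(set)  # missing keys start out as an empty set
    for line in lines:
        # Strip every field so stray spaces and newlines don't end up in keys.
        person, *movies = (field.strip() for field in line.split(","))
        for m in movies:
            if m:  # skip empty fields, e.g. from a trailing comma
                db[m].add(person)
    return dict(db)  # hand back a plain dict

# Sample lines taken from the question, used in place of the real file.
sample = [
    "Amy Shchumer, Trainwreck, I Feel Pretty, Snatched, Inside Amy Shchumer\n",
    "Bill Hader,Inside Out, Trainwreck, Tropic Thunder\n",
]
db = build_db(sample)
print(db["Trainwreck"])
```

With a real file, this could be called as build_db(data) inside the with open("database.txt") as data: block. Because the values are sets, an actor listed twice for the same movie is stored only once.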