I have a pandas DataFrame like the one shown below. The real frame has many more columns, but they are not relevant to this task.
id  pos  value  sente
1   a    I      21
2   b    have   21
3   b    a      21
4   a    cat    21
5   d    !      21
1   a    My     22
2   a    cat    22
3   b    is     22
4   a    cute   22
5   d    .      22
I would like to build a list from certain columns so that the first sentence (sente=21), and every other sentence, looks something like the following. In other words, every sentence should get its own separate entry.
`[('I', 'a', '1'), ..., ('!','d','5')]`
I already have a function that does this for one sentence, but I cannot figure out how to do it for all sentences in the frame (a sentence being the rows that share the same sente value).
`class SentenceGetter(object):
    def __init__(self, data):
        self.n_sent = 1
        self.data = data
        self.empty = False

    def get_next(self):
        try:
            # Select the rows of one sentence (hard-coded to sente == 21 for now)
            s = self.data[self.data["sente"] == 21]
            self.n_sent += 1
            # Parentheses keep the three lists in a single return value
            return (
                s["id"].values.tolist(),
                s["pos"].values.tolist(),
                s["value"].values.tolist(),
            )
        except KeyError:
            self.empty = True
            return None, None, None

foo = SentenceGetter(df)
sent, pos, token = foo.get_next()
# "in" is a reserved word in Python, so the result needs another name
triples = list(zip(token, pos, sent))
`
Because my frame is very large, I cannot use constructions like this:
df.loc[((df["sente"] == df["sente"].shift(-1)) & (df["sente"] == df["sente"].shift(+1))), ["pos","value","id"]]
Any ideas?
If you are open to using the standard library, collections.defaultdict offers an O(n) solution:
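from collections import defaultdict

d = defaultdict(list)
# itertuples() yields (index, sente, value, pos, id) for each row
for _, num, *data in df[['sente', 'value', 'pos', 'id']].itertuples():
    # data is [value, pos, id]; store it as a tuple
    d[num].append(tuple(data))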
Result:
defaultdict(list,
{21: [('I', 'a', 1),
('have', 'b', 2),
('a', 'b', 3),
('cat', 'a', 4),
('!', 'd', 5)],
22: [('My', 'a', 1),
('cat', 'a', 2),
('is', 'b', 3),
('cute', 'a', 4),
('.', 'd', 5)]})
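For comparison, the same grouping could probably also be done with pandas' own groupby. The following is only a sketch: the small frame is rebuilt from the table in the question, the name `sentences` is made up for illustration, and how well this performs on a very large frame has not been measured here.

```python
import pandas as pd

# Sample frame rebuilt from the question's table; the real frame has more columns.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 1, 2, 3, 4, 5],
    "pos": ["a", "b", "b", "a", "d", "a", "a", "b", "a", "d"],
    "value": ["I", "have", "a", "cat", "!", "My", "cat", "is", "cute", "."],
    "sente": [21, 21, 21, 21, 21, 22, 22, 22, 22, 22],
})

# One list of (value, pos, id) tuples per sentence, keyed by the sente value.
# sort=False keeps the sentences in order of first appearance.
sentences = {
    sente: list(zip(group["value"], group["pos"], group["id"]))
    for sente, group in df.groupby("sente", sort=False)
}

print(sentences[21])
```

If the sente keys are not needed, `list(sentences.values())` gives one list of tuples per sentence.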