I have this DataFrame:
import pandas as pd

df = [{"username": "last",
"time_data": "{\"hours\":[{\"hour\":\"00:00\",\"postCount\":\"5\",\"topicCount\":\"3\",\"totalCount\":80},{\"postCount\":\"20\",\"topicCount\":\"11\",\"name\":\"Marketplace\",\"url\",\"totalCount\":31},{\"postCount\":\"26\",\"topicCount\":\"1\",\"name\":\"Atari 5200\",\"url\",\"totalCount\":27},{\"postCount\":\"9\",\"topicCount\":0,\"name\":\"Atari 8\",\"url\"\"totalCount\":9}"
},
{"username": "truk",
"time_data": "{\"hours\":[{\"hour\":\"00:00\",\"postCount\":\"11\",\"topicCount\":\"6\",\"totalCount\":362},{\"postCount\":\"333\",\"topicCount\":\"22\",\"name\":\"Hardware\",\"url\",\"totalCount\":355},{\"postCount\":\"194\",\"topicCount\":\"8\",\"name\":\"Marketplace\",\"url\",\"totalCount\":202}"
}]
df = pd.DataFrame(df)
df
I have run this code:
df_h0 = df.copy()
df_h0['hour']='00:00'
df_h0['totalCount'] = df['time_data'].str.split('"00:00","postCount":"').str[1].str.split('","topic').str[0]
df_h0 = df_h0.fillna(0)
df_h0.head()
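But what I actually need is the number that follows "totalCount". I don't know how to get it, because the string contains several other "totalCount" entries and I need the one that comes after "00:00".
This is the expected output: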
hour totalCount username
0 00:00 80 last
1 00:00 362 truk
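In your place, I would first investigate where these strings that try to mimic a JSON representation come from, and make sure the underlying dictionaries really cannot be retrieved or extracted directly. If that is not an option, you can use Series.str.extract, which returns the first match of the pattern in each string: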
In [230]: df_h0['totalCount'] = df['time_data'].str.extract(r'totalCount\":(\d+)')
In [231]: df_h0
Out[231]:
username hour totalCount
0 last 00:00 80
1 truk 00:00 362
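The pattern above works because the "00:00" entry happens to come first in each string, so its "totalCount" is the first match. If that ordering is not guaranteed, one possible variant is to anchor the pattern on the "00:00" hour itself. The following is only a sketch under assumptions: the `time_data` strings are shortened stand-ins for the ones in the question, and the `pattern` and `extracted` names are introduced here for illustration.

```python
import pandas as pd

# Shortened stand-ins for the question's time_data strings (assumed shape,
# and like the originals they are not valid JSON).
df = pd.DataFrame({
    "username": ["last", "truk"],
    "time_data": [
        '{"hours":[{"hour":"00:00","postCount":"5","topicCount":"3","totalCount":80},'
        '{"postCount":"20","topicCount":"11","name":"Marketplace","totalCount":31}',
        '{"hours":[{"hour":"00:00","postCount":"11","topicCount":"6","totalCount":362},'
        '{"postCount":"333","topicCount":"22","name":"Hardware","totalCount":355}',
    ],
})

# Anchor on the "00:00" entry; [^}]* cannot cross a closing brace, so the
# match stays inside that one object instead of running into the next one.
pattern = r'"hour":"00:00"[^}]*"totalCount":(\d+)'

df_h0 = df[["username"]].copy()
df_h0["hour"] = "00:00"

# expand=False returns a Series of captured strings (NaN where nothing matched).
extracted = df["time_data"].str.extract(pattern, expand=False)

# Convert to integers, using 0 for rows without a match (mirrors fillna(0) in the question).
df_h0["totalCount"] = pd.to_numeric(extracted, errors="coerce").fillna(0).astype(int)

print(df_h0)
```

Rows with no "00:00" entry end up with 0 here, which matches the `fillna(0)` step in the question; drop that step if a missing value should stay missing.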