从ElasticSearch结果创建DataFrame

贾斯汀S：

我正在尝试使用对ElasticSearch的非常基本的查询结果在pandas中构建DataFrame。我正在获取所需的数据，但这只是将结果切片成一种方式来构建适当的数据框架。我真的只在乎获取每个结果的时间戳和路径。我尝试了几种不同的es.search模式。

码：

from datetime import datetime
from elasticsearch import Elasticsearch
from pandas import DataFrame, Series
import pandas as pd
import matplotlib.pyplot as plt
es = Elasticsearch(host="192.168.121.252")
res = es.search(index="_all", doc_type='logs', body={"query": {"match_all": {}}}, size=2, fields=('path','@timestamp'))

这给出了4个数据块。[u'hits'，u'_shards'，u'took'，u'timed_out']。我的结果在命中。

res['hits']['hits']
Out[47]: 
[{u'_id': u'a1XHMhdHQB2uV7oq6dUldg',
  u'_index': u'logstash-2014.08.07',
  u'_score': 1.0,
  u'_type': u'logs',
  u'fields': {u'@timestamp': u'2014-08-07T12:36:00.086Z',
   u'path': u'app2.log'}},
 {u'_id': u'TcBvro_1QMqF4ORC-XlAPQ',
  u'_index': u'logstash-2014.08.07',
  u'_score': 1.0,
  u'_type': u'logs',
  u'fields': {u'@timestamp': u'2014-08-07T12:36:00.200Z',
   u'path': u'app1.log'}}]

我唯一关心的是获取时间戳和每次点击的路径。

res['hits']['hits'][0]['fields']
Out[48]: 
{u'@timestamp': u'2014-08-07T12:36:00.086Z',
 u'path': u'app1.log'}

我无法终生想出谁能将这个结果放入熊猫的数据框中。因此，对于我返回的2个结果，我希望有一个类似的数据框。

   timestamp                   path
0  2014-08-07T12:36:00.086Z    app1.log
1  2014-08-07T12:36:00.200Z    app2.log

CT Zhu :

有一个很好的玩具pd.DataFrame.from_dict，可以在以下情况下使用：

In [34]:

Data = [{u'_id': u'a1XHMhdHQB2uV7oq6dUldg',
      u'_index': u'logstash-2014.08.07',
      u'_score': 1.0,
      u'_type': u'logs',
      u'fields': {u'@timestamp': u'2014-08-07T12:36:00.086Z',
       u'path': u'app2.log'}},
     {u'_id': u'TcBvro_1QMqF4ORC-XlAPQ',
      u'_index': u'logstash-2014.08.07',
      u'_score': 1.0,
      u'_type': u'logs',
      u'fields': {u'@timestamp': u'2014-08-07T12:36:00.200Z',
       u'path': u'app1.log'}}]
In [35]:

df = pd.concat(map(pd.DataFrame.from_dict, Data), axis=1)['fields'].T
In [36]:

print df.reset_index(drop=True)
                 @timestamp      path
0  2014-08-07T12:36:00.086Z  app2.log
1  2014-08-07T12:36:00.200Z  app1.log

分为四个步骤显示：

1，将列表中的每个项目（是dictionary）读入DataFrame

2，我们可以将列表中的所有项目DataFrame按concat行顺序放大，因为我们将对每个项目执行步骤1，因此可以使用map它。

3，然后我们访问标记为 'fields'

4，我们可能想旋转DataFrame90度（转置），并且reset_index如果我们希望索引为默认int序列。

在此处输入图片说明

本文收集自互联网，转载请注明来源。

如有侵权，请联系 [email protected] 删除。

编辑于 2020-08-8

我来说两句

0 条评论

登录后参与评论

从ElasticSearch结果创建DataFrame

从ElasticSearch结果创建DataFrame

Linux的官方Adobe Flash存储库是否已过时？

用日期数据透视表和日期顺序查询

应用发明者仅从列表中选择一个随机项一次

Java Eclipse中的错误13，如何解决？

在Windows 7中无法删除文件（2）

在 Python 2.7 中。如何从文件中读取特定文本并分配给变量

套接字无法检测到断开连接

带有错误“ where”条件的查询如何返回结果？

有什么解决方案可以将android设备用作Cast Receiver？

Mac OS X更新后的GRUB 2问题

ggplot：对齐多个分面图-所有大小不同的分面

验证REST API参数

如何从视图一次更新多行（ASP.NET - Core）

尝试反复更改屏幕上按钮的位置 - kotlin android studio

计算数据帧中每行的NA

检索角度选择div的当前值

离子动态工具栏背景色

UITableView的项目向下滚动后更改颜色，然后快速备份

VB.net将2条特定行导出到DataGridView

蓝屏死机没有修复解决方案

通过 Git 在运行 Jenkins 作业时获取 ClassNotFoundException