Python: create a DataFrame from Elasticsearch results

岛间

I have query results from Elasticsearch in the following format:

[

{
    "_index": "product",
    "_type": "_doc",
    "_id": "23234sdf",
    "_score": 2.2295187,
    "_source": {
        "SERP_KEY": "",
        "r_variant_info": "",
        "s_asin": "",
        "pid": "394",
        "r_gtin": "00838128000547",        
        "additional_attributes_remarks": "publisher:0|size:0",            
        "s_gtin": "",            
        "r_category": "",
        "confidence_score": "2.4545",      
        "title_match": "45.45"
    }
},
{
    "_index": "product",
    "_type": "_doc",
    "_id": "23234sdf",
    "_score": 2.2295187,
    "_source": {
        "SERP_KEY": "",
        "r_variant_info": "",
        "s_asin": "",
        "pid": "394",
        "r_gtin": "00838128000547",        
        "additional_attributes_remarks": "publisher:0|size:0",            
        "s_gtin": "",            
        "r_category": "",
        "confidence_score": "2.4545",      
        "title_match": "45.45"
    }
},

]

I am trying to load the _source fields, together with _id, into a DataFrame.

I tried this:

def fetch_records_from_elasticsearch_index(index, filter_json):
    search_param = prepare_es_body(filter_json_dict=filter_json)
    response = settings.ES.search(index=index, body=search_param, size=10)

    if len(response['hits']['hits']) > 0:
        import pandas as pd

        all_hits = response['hits']['hits']
        # return all_hits
        # export es hits to pandas dataframe
        df = pd.concat(map(pd.DataFrame.from_dict, all_hits), axis=1)['_source'].T

        return df
    else:
        return 0

df contains the _source fields, but I also want to add the _id field to it.

This is the df output format:

{

"AdminEdit": [
    "False",
    "False",
    "False",
    "False",        
],
"Group": [
    "Grp2",
    "Grp2",
    "Grp2",
    "Grp2"       
],

}

How can I add _id?

Bishwo Adhikari

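There are two ways to solve this:

  1. Flatten the hits directly

    import pandas as pd
    # Flattens every hit: nested fields become columns such as "_source.pid", next to "_id"
    df = pd.json_normalize(all_hits)
    
  2. Extend your existing code

    import pandas as pd
    df = pd.concat(map(pd.DataFrame.from_dict, all_hits), axis=1)['_source'].T
    # Add the _id of each hit as a new column, in the same order as the rows
    df["_id"] = [i["_id"] for i in all_hits]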
    

The JSON used is:

all_hits = [

{
    "_index": "product",
    "_type": "_doc",
    "_id": "23234sdg",
    "_score": 2.2295187,
    "_source": {
        "SERP_KEY": "",
        "r_variant_info": "",
        "s_asin": "",
        "pid": "394",
        "r_gtin": "00838128000547",        
        "additional_attributes_remarks": "publisher:0|size:0",            
        "s_gtin": "",            
        "r_category": "",
        "confidence_score": "2.4545",      
        "title_match": "45.45"
    }
},
{
    "_index": "product",
    "_type": "_doc",
    "_id": "23234sdf",
    "_score": 2.2295187,
    "_source": {
        "SERP_KEY": "",
        "r_variant_info": "",
        "s_asin": "",
        "pid": "394",
        "r_gtin": "00838128000547",        
        "additional_attributes_remarks": "publisher:0|size:0",            
        "s_gtin": "",            
        "r_category": "",
        "confidence_score": "2.4545",      
        "title_match": "45.45"
    }
},

]
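
As a rough sketch of how the first approach could be narrowed down to just _id plus the _source fields, something like the following might work. The helper name hits_to_frame and the trimmed sample hits are assumptions made for illustration; they are not part of the original question or answer.

```python
import pandas as pd

# Two sample hits shaped like the ones above, trimmed to a few fields for brevity.
all_hits = [
    {"_index": "product", "_id": "23234sdg", "_score": 2.2295187,
     "_source": {"pid": "394", "r_gtin": "00838128000547", "title_match": "45.45"}},
    {"_index": "product", "_id": "23234sdf", "_score": 2.2295187,
     "_source": {"pid": "394", "r_gtin": "00838128000547", "title_match": "45.45"}},
]


def hits_to_frame(hits):
    """Hypothetical helper: one row per hit, holding _id plus every _source field."""
    if not hits:
        # No hits: return an empty frame rather than a non-DataFrame value.
        return pd.DataFrame()
    flat = pd.json_normalize(hits)
    # Keep _id and the flattened "_source.*" columns, dropping _index, _score, etc.
    keep = ["_id"] + [c for c in flat.columns if c.startswith("_source.")]
    df = flat[keep]
    # Strip the "_source." prefix so column names match the original field names.
    return df.rename(
        columns=lambda c: c.split(".", 1)[1] if c.startswith("_source.") else c
    )


df = hits_to_frame(all_hits)
print(df)
```

Returning an empty DataFrame when there are no hits keeps the return type consistent, which may be easier for callers to handle than the `return 0` used in the question.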
