假设我有这些数据样本:
{
"date": "2019-06-16",
"rank": 150
"name": "doc 1"
}
{
"date": "2019-07-16",
"rank": 100
"name": "doc 1"
}
{
"date": "2019-06-16",
"rank": 50
"name": "doc 2"
}
{
"date": "2019-07-16",
"rank": 80
"name": "doc 2"
}
预期结果是从两个相同名称的不同日期(旧日期 - 新日期)的文档中减去 rank 字段:
{
"name": "doc 1",
"diff_rank": 50
}
{
"name": "doc 2",
"diff_rank": -30
}
并diff_rank
尽可能排序,否则我将在获得结果后手动排序。
我尝试过的是使用date_histogram
,serial_diff
但有些结果diff_rank
以某种方式丢失了我确信数据存在的值:
{
"aggs" : {
"group_by_name": {
"terms": {
"field": "name"
},
"aggs": {
"days": {
"date_histogram": {
"field": "date",
"interval": "day"
},
"aggs": {
"the_rank": {
"sum": {
"field": "rank"
}
},
"diff_rank": {
"serial_diff": {
"buckets_path": "the_rank",
"lag" : 30 // 1 month or 30 days in this case
}
}
}
}
}
}
}
}
非常感谢帮助解决我上面的问题!
最后,我从官方文档中找到了一种使用Filter、Bucket Script Aggregation 和Bucket Sort对结果进行排序的方法。这是最终的片段代码:
{
"size": 0,
"aggs" : {
"group_by_name": {
"terms": {
"field": "name",
"size": 50,
"shard_size": 10000
},
"aggs": {
"last_month_rank": {
"filter": {
"term": {"date": "2019-06-17"}
},
"aggs": {
"rank": {
"sum": {
"field": "rank"
}
}
}
},
"latest_rank": {
"filter": {
"term": {"date": "2019-07-17"}
},
"aggs": {
"rank": {
"sum": {
"field": "rank"
}
}
}
},
"diff_rank": {
"bucket_script": {
"buckets_path": {
"lastMonthRank": "last_month_rank>rank",
"latestRank": "latest_rank>rank"
},
"script": "params.lastMonthRank - params.latestRank"
}
},
"rank_bucket_sort": {
"bucket_sort": {
"sort": [
{"diff_rank": {"order": "desc"}}
],
"size": 50
}
}
}
}
}
}
本文收集自互联网,转载请注明来源。
如有侵权,请联系 [email protected] 删除。
我来说两句