Getting the total count of all the documents in buckets


When I search with the following aggregation:

"aggregations": {
  "codes": {
    "terms": {
      "field": "code"
    },
    "aggs": {
      "dates": {
        "date_range": {
          "field": "created_time",
          "ranges": [
            {
              "from": "2017-12-06T00:00:00.000",
              "to": "2017-12-06T16:00:00.000"
            },
            {
              "from": "2017-12-07T00:00:00.000",
              "to": "2017-12-07T23:59:59.999"
            }
          ]
        }
      }
    }
  }
}

I get the following result:

"aggregations": {
  "codes": {
    "buckets": [
      {
        "key": "123456",
        "doc_count": 104005499,
        "dates": {
          "buckets": [
            {
              "key": "2017-12-05T20:00:00.000Z-2017-12-06T12:00:00.000Z",
              "from_as_string": "2017-12-05T20:00:00.000Z",
              "to_as_string": "2017-12-06T12:00:00.000Z",
              "doc_count": 156643
            },
            {
              "key": "2017-12-06T20:00:00.000Z-2017-12-07T19:59:59.999Z",
              "from_as_string": "2017-12-06T20:00:00.000Z",
              "to_as_string": "2017-12-07T19:59:59.999Z",
              "doc_count": 11874
            }
          ]
        }
      },
      ...
    ]
  }
}

So now I have a list of buckets of buckets. I need a total count for each outer bucket, which is the sum of the doc_count values of the buckets inside it. For example, the total count for my first bucket should be 156643 + 11874 = 168517. I've tried the sum_bucket pipeline aggregation:

"totalcount": {
  "sum_bucket": {
    "buckets_path": "dates"
  }
}

but this does not work; the request fails with: "buckets_path must reference either a number value or a single value numeric metric aggregation, got: org.elasticsearch.search.aggregations.bucket.range.date.InternalDateRange.Bucket". Any ideas how I should do this?

Nikolay Vasiliev

Looks like this is a known problem. There's a discussion on the Elastic forum where I found a workaround (thanks to Ruslan_Didyk, its author, btw):

POST my_aggs/my_doc/_search
{
  "size": 0,
  "aggregations": {
    "codes": {
      "terms": {
        "field": "code"
      },
      "aggs": {
        "dates": {
          "date_range": {
            "field": "created_time",
            "ranges": [
              {
                "from": "2017-12-06T00:00:00.000",
                "to": "2017-12-06T16:00:00.000"
              },
              {
                "from": "2017-12-07T00:00:00.000",
                "to": "2017-12-07T23:59:59.999"
              }
            ]
          },
          "aggs": {
            "my_cnt": {
              "value_count": {
                "field": "created_time"
              }
            }
          }
        },
        "totalcount": {
          "stats_bucket": {
            "buckets_path": "dates>my_cnt"
          }
        }
      }
    }
  }
}

The reason you can't get totalcount directly is that date_range implicitly creates sub-buckets, and pipeline aggregations cannot handle a buckets_path that points at a multi-bucket aggregation like this (I would say it is a bug in Elasticsearch).

So the workaround is to add another sub-aggregation, my_cnt, under dates: it simply counts the documents in each bucket. (Note that I used a value_count aggregation on the created_time field, assuming it is present in every document and has exactly one value.)

Given a set of documents like this:

{"code":"1234","created_time":"2017-12-06T01:00:00"}
{"code":"1234","created_time":"2017-12-06T17:00:00"}
{"code":"1234","created_time":"2017-12-07T01:00:00"}
{"code":"1234","created_time":"2017-12-06T02:00:00"}
{"code":"1235","created_time":"2017-12-07T18:00:00"}
{"code":"1234","created_time":"2017-12-07T18:00:00"}

The result of the aggregation will be:

  "aggregations": {
    "codes": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "1234",
          "doc_count": 5,
          "dates": {
            "buckets": [
              {
                "key": "2017-12-06T00:00:00.000Z-2017-12-06T16:00:00.000Z",
                "from": 1512518400000,
                "from_as_string": "2017-12-06T00:00:00.000Z",
                "to": 1512576000000,
                "to_as_string": "2017-12-06T16:00:00.000Z",
                "doc_count": 2,
                "my_cnt": {
                  "value": 2
                }
              },
              {
                "key": "2017-12-07T00:00:00.000Z-2017-12-07T23:59:59.999Z",
                "from": 1512604800000,
                "from_as_string": "2017-12-07T00:00:00.000Z",
                "to": 1512691199999,
                "to_as_string": "2017-12-07T23:59:59.999Z",
                "doc_count": 2,
                "my_cnt": {
                  "value": 2
                }
              }
            ]
          },
          "totalcount": {
            "count": 2,
            "min": 2,
            "max": 2,
            "avg": 2,
            "sum": 4
          }
        },
        {
          "key": "1235",
          "doc_count": 1,
          "dates": {
            "buckets": [
              {
                "key": "2017-12-06T00:00:00.000Z-2017-12-06T16:00:00.000Z",
                "from": 1512518400000,
                "from_as_string": "2017-12-06T00:00:00.000Z",
                "to": 1512576000000,
                "to_as_string": "2017-12-06T16:00:00.000Z",
                "doc_count": 0,
                "my_cnt": {
                  "value": 0
                }
              },
              {
                "key": "2017-12-07T00:00:00.000Z-2017-12-07T23:59:59.999Z",
                "from": 1512604800000,
                "from_as_string": "2017-12-07T00:00:00.000Z",
                "to": 1512691199999,
                "to_as_string": "2017-12-07T23:59:59.999Z",
                "doc_count": 1,
                "my_cnt": {
                  "value": 1
                }
              }
            ]
          },
          "totalcount": {
            "count": 1,
            "min": 1,
            "max": 1,
            "avg": 1,
            "sum": 1
          }
        }
      ]
    }
  }

The desired value is under totalcount.sum.
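If only the total is needed rather than the full stats, a sum_bucket pipeline aggregation over the same path should also work, since my_cnt is a single-value numeric metric (which is exactly what the error message in the question demands). A minimal sketch, placed as a sibling of dates:

```json
"totalcount": {
  "sum_bucket": {
    "buckets_path": "dates>my_cnt"
  }
}
```

This returns just a single value per codes bucket instead of the count/min/max/avg/sum object that stats_bucket produces.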

Some considerations

As I already said, this is valid only as long as the assumption holds that created_time is always present and has exactly one value. If the field used in the date_range aggregation can have several values (e.g. an update_time recording all updates of a document), then the sum will no longer equal the actual number of matched documents whenever those dates fall into overlapping ranges.

In that case you can always fall back to a filter aggregation with a range query inside.
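For example, a filter sub-aggregation covering both time windows from the question might look like the sketch below (the aggregation name totalcount_filtered is made up for illustration; the ranges mirror the ones above). Because a filter aggregation counts each document once no matter how many ranges it matches, it avoids the double-counting problem with multi-valued date fields:

```json
"totalcount_filtered": {
  "filter": {
    "bool": {
      "should": [
        { "range": { "created_time": { "gte": "2017-12-06T00:00:00.000", "lt": "2017-12-06T16:00:00.000" } } },
        { "range": { "created_time": { "gte": "2017-12-07T00:00:00.000", "lt": "2017-12-07T23:59:59.999" } } }
      ]
    }
  }
}
```

The doc_count of this aggregation is then the total number of documents in the combined window, per codes bucket.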

Hope that helps!

