Module tantivy::aggregation
source · [−]Expand description
Aggregations
An aggregation summarizes your data as statistics on buckets or metrics.
Aggregations can provide answer to questions like:
- What is the average price of all sold articles?
- How many errors with status code 500 do we have per day?
- What is the average listing price of cars grouped by color?
There are two categories: Metrics and Buckets.
Prerequisite
Currently aggregations work only on fast fields. Single value fast fields
of type u64
, f64
, i64
, date
and fast fields on text fields.
Usage
To use aggregations, build an aggregation request by constructing
Aggregations
.
Create an AggregationCollector
from this request. AggregationCollector
implements the
Collector
trait and can be passed as collector into
Searcher::search()
.
JSON Format
Aggregations request and result structures de/serialize into elasticsearch compatible JSON.
let agg_req: Aggregations = serde_json::from_str(json_request_string).unwrap();
let collector = AggregationCollector::from_aggs(agg_req, None);
let searcher = reader.searcher();
let agg_res = searcher.search(&term_query, &collector).unwrap_err();
let json_response_string: String = &serde_json::to_string(&agg_res)?;
Supported Aggregations
Example
Compute the average metric, by building agg_req::Aggregations
, which is built from an
(String, agg_req::Aggregation)
iterator.
use tantivy::aggregation::agg_req::{Aggregations, Aggregation, MetricAggregation};
use tantivy::aggregation::AggregationCollector;
use tantivy::aggregation::metric::AverageAggregation;
use tantivy::query::AllQuery;
use tantivy::aggregation::agg_result::AggregationResults;
use tantivy::IndexReader;
use tantivy::schema::Schema;
fn aggregate_on_index(reader: &IndexReader, schema: Schema) {
let agg_req: Aggregations = vec![
(
"average".to_string(),
Aggregation::Metric(MetricAggregation::Average(
AverageAggregation::from_field_name("score".to_string()),
)),
),
]
.into_iter()
.collect();
let collector = AggregationCollector::from_aggs(agg_req, None, schema);
let searcher = reader.searcher();
let agg_res: AggregationResults = searcher.search(&AllQuery, &collector).unwrap();
}
Example JSON
Requests are compatible with the elasticsearch json request format.
use tantivy::aggregation::agg_req::Aggregations;
let elasticsearch_compatible_json_req = r#"
{
"average": {
"avg": { "field": "score" }
},
"range": {
"range": {
"field": "score",
"ranges": [
{ "to": 3.0 },
{ "from": 3.0, "to": 7.0 },
{ "from": 7.0, "to": 20.0 },
{ "from": 20.0 }
]
},
"aggs": {
"average_in_range": { "avg": { "field": "score" } }
}
}
}
"#;
let agg_req: Aggregations =
serde_json::from_str(elasticsearch_compatible_json_req).unwrap();
Code Organization
Check the README on github to see how the code is organized.
Nested Aggregation
Buckets can contain sub-aggregations. In this example we create buckets with the range aggregation and then calculate the average on each bucket.
use tantivy::aggregation::agg_req::*;
use tantivy::aggregation::metric::AverageAggregation;
use tantivy::aggregation::bucket::RangeAggregation;
let sub_agg_req_1: Aggregations = vec![(
"average_in_range".to_string(),
Aggregation::Metric(MetricAggregation::Average(
AverageAggregation::from_field_name("score".to_string()),
)),
)]
.into_iter()
.collect();
let agg_req_1: Aggregations = vec![
(
"range".to_string(),
Aggregation::Bucket(BucketAggregation {
bucket_agg: BucketAggregationType::Range(RangeAggregation{
field: "score".to_string(),
ranges: vec![(3f64..7f64).into(), (7f64..20f64).into()],
keyed: false,
}),
sub_aggregation: sub_agg_req_1.clone(),
}),
),
]
.into_iter()
.collect();
Distributed Aggregation
When the data is distributed on different Index
instances, the
DistributedAggregationCollector
provides functionality to merge data between independent
search calls by returning
IntermediateAggregationResults
.
IntermediateAggregationResults
provides the
merge_fruits
method
to merge multiple results. The merged result can then be converted into
AggregationResults
via the
into_final_bucket_result
method.
Modules
AggregationCollector
.into()
method from IntermediateAggregationResults
.
This conversion computes the final result. For example: The intermediate result contains
intermediate average results, which is the sum and the number of values. The actual average is
calculated on the step from intermediate to final aggregation result tree.Structs
AggregationSegmentCollector
does the aggregation collection on a segment.Enums
Constants
Type Definitions
HashMap
.