Grouping
If a connector supports aggregates, it may also support grouping data and then aggregating data in those groups. This ability is tracked by the query.aggregates.group_by
capability.
Grouping is requested in the query API alongside fields and aggregates, in the groups
field of the Query
object.
A grouping operation specifies one or more dimensions along which to partition the row set. Each dimension selects a column from which to draw values (see Dimension::Column
). For each group, every row should have equal values in each of those dimension columns.
If the dimension's column's schema defines arguments, then the arguments
property is used to provide values for those arguments.
In addition, a grouping operation specifies aggregates which should be computed and returned for each group separately.
Dimensions
Dimension columns can be:
- A column
- A object-nested column
- A column across an object relationship
- A column across an object-nested object relationship
A key property is that nested arrays or nested relationships cannot be traversed from the rows being grouped over when selecting a dimension column. Only nested objects or object relationships can be traversed.
Extraction Functions and Complex Dimensions
We can also group by components of scalar types using extraction functions.
In order to apply an extraction function to the value of a dimension, the Dimension
should specify an extraction
property, which is the name of the extraction function to apply.
For example, this query groups articles by the year component of their published date:
{
"collection": "articles",
"arguments": {},
"query": {
"groups": {
"aggregates": {
"count": {
"type": "star_count"
}
},
"dimensions": [
{
"type": "column",
"column_name": "published_date",
"path": [],
"extraction": "year"
}
]
}
},
"collection_relationships": {}
}
Filtering
Grouping operations have two types of filtering:
- The initial row set can be filtered before the grouping operation, using the
predicate
field of theQuery
object as usual, and - The groups themselves can be filtered after the grouping operation, using the
predicate
field of theGrouping
object. This is controlled by thequery.aggregates.group_by.filter
capability.
Unlike regular predicates on rows, group predicates are not allowed to compare columns, but must instead compare values of aggregates over the group. For example, we can filter groups by comparing a count of rows in the group, but not by comparing values in individual rows.
Ordering
As with filtering, group operations support two types of ordering:
- The initial row set can be ordered before the grouping operation, using the
order_by
field of theQuery
object as usual, and - The groups themselves can be ordered after the grouping operation, using the
order_by
field of theGrouping
object. This is controlled by thequery.aggregates.group_by.order
capability.
Group sort orders are restricted to comparing aggregate values, similar to filtering. For example, we can order groups by a count, but not by the value of individual rows. However, we can also choose to sort by the selected grouping dimensions.
Pagination
Pagination can also be applied both before and after grouping:
- The initial row set can be paginated before the grouping operation, using the
limit
andoffset
fields of theQuery
object as usual, and - The groups themselves can be paginated after the grouping operation, using the
limit
andoffset
fields of theGrouping
object. This is controlled by thequery.aggregates.group_by.paginate
capability.
Examples
This example partitions the articles
collection by author_id
, and then returns the row count for each group. That is, it computes the number of articles written by each author:
{
"collection": "articles",
"arguments": {},
"query": {
"groups": {
"aggregates": {
"article_count": {
"type": "star_count"
}
},
"dimensions": [
{
"type": "column",
"column_name": "author_id",
"path": []
}
]
}
},
"collection_relationships": {}
}
Filtering examples
This example applies a predicate to the rows before grouping:
{
"collection": "articles",
"arguments": {},
"query": {
"groups": {
"aggregates": {
"min_id": {
"type": "single_column",
"column": "id",
"function": "min"
},
"max_id": {
"type": "single_column",
"column": "id",
"function": "max"
}
},
"dimensions": [
{
"type": "column",
"column_name": "author_id",
"path": []
}
]
},
"predicate": {
"type": "binary_comparison_operator",
"column": {
"type": "column",
"name": "author_id",
"path": []
},
"operator": "eq",
"value": {
"type": "scalar",
"value": 1
}
}
},
"collection_relationships": {}
}
This example applies a predicate to the groups themselves, after grouping. It computes some aggregates for author groups which have exactly two articles:
{
"collection": "articles",
"arguments": {},
"query": {
"groups": {
"aggregates": {
"min_id": {
"type": "single_column",
"column": "id",
"function": "min"
},
"max_id": {
"type": "single_column",
"column": "id",
"function": "max"
}
},
"dimensions": [
{
"type": "column",
"column_name": "author_id",
"path": []
}
],
"predicate": {
"type": "binary_comparison_operator",
"target": {
"type": "aggregate",
"aggregate": {
"type": "star_count"
}
},
"operator": "eq",
"value": {
"type": "scalar",
"value": 2
}
}
}
},
"collection_relationships": {}
}
Ordering and pagination
This example computes the article count for the author with the most articles, by ordering the groups by article count, and then using pagination to select the first group:
{
"collection": "articles",
"arguments": {},
"query": {
"groups": {
"aggregates": {
"article_count": {
"type": "star_count"
}
},
"dimensions": [
{
"type": "column",
"column_name": "author_id",
"path": []
}
],
"limit": 1,
"offset": 0,
"order_by": {
"elements": [
{
"order_direction": "desc",
"target": {
"type": "aggregate",
"aggregate": {
"type": "star_count"
},
"path": []
}
}
]
}
}
},
"collection_relationships": {}
}
This example sorts the groups by the values of their dimensions. It groups articles by their author_id
, and then sorts the groups by that author_id
dimension, descending:
{
"collection": "articles",
"arguments": {},
"query": {
"groups": {
"aggregates": {
"article_count": {
"type": "star_count"
}
},
"dimensions": [
{
"type": "column",
"column_name": "author_id",
"path": []
}
],
"order_by": {
"elements": [
{
"order_direction": "desc",
"target": {
"type": "dimension",
"index": 0,
"path": []
}
}
]
}
}
},
"collection_relationships": {}
}
Requirements
- If the
Query
object specifies thegroups
field, then each correpondingRowSet
object must contain a non-nullgroups
field. - Each returned
Group
object must contain values for each requested dimension, in the order in which they were requested: - Each returned
Group
object must contain values for each requested aggregate, using the same key as used to request it:- Aggregates should be computed over the rows in each group in turn.