Filtering

A Query can specify a predicate expression which should be used to filter rows considered during field selection for returning rows. The predicate expression also filters the rows that are aggregated across and grouped over (ie. it filters the input rows to the aggregation/grouping operation).

A predicate expression can be one of

  • An application of a comparison operator to a column and a value, or
  • An EXISTS expression, or
  • A conjunction of other expressions, or
  • A disjunction of other expressions, or
  • A negation of another expression

The predicate expression is specified in the predicate field of the Query object.

Comparison Operators

Unary Operators

Unary comparison operators are denoted by expressions with a type field of unary_comparison_operator.

The only supported unary operator currently is is_null, which return true when a column value is null:

{
  "type": "unary_comparison_operator",
  "operator": "is_null",
  "column": {
    "name": "title"
  }
}

Binary Operators

Binary comparison operators are denoted by expressions with a type field of binary_comparison_operator.

The set of available operators depends on the type of the column involved in the expression. The operator property should specify the name of one of the binary operators from the field's scalar type definition.

The type ComparisonValue describes the valid inhabitants of the value field. The value field should be an expression which evaluates to a value whose type is compatible with the definition of the comparison operator.

Equality Operators

This example makes use of an eq operator, which is defined using the equal semantics, to test a single column for equality with a scalar value:

{
  "collection": "articles",
  "arguments": {},
  "query": {
    "fields": {
      "id": {
        "type": "column",
        "column": "id"
      },
      "title": {
        "type": "column",
        "column": "title"
      }
    },
    "predicate": {
      "type": "binary_comparison_operator",
      "column": {
        "type": "column",
        "name": "id"
      },
      "operator": "eq",
      "value": {
        "type": "scalar",
        "value": 1
      }
    }
  },
  "collection_relationships": {}
}

Set Membership Operators

This example uses an in operator, which is defined using the in semantics, to test a single column for membership in a set of values:

{
  "collection": "articles",
  "arguments": {},
  "query": {
    "fields": {
      "id": {
        "type": "column",
        "column": "id"
      },
      "title": {
        "type": "column",
        "column": "title"
      }
    },
    "predicate": {
      "type": "binary_comparison_operator",
      "column": {
        "type": "column",
        "name": "author_id"
      },
      "operator": "in",
      "value": {
        "type": "scalar",
        "value": [1, 2]
      }
    }
  },
  "collection_relationships": {}
}

Custom Operators

This example uses a custom like operator:

{
  "collection": "articles",
  "arguments": {},
  "query": {
    "fields": {
      "id": {
        "type": "column",
        "column": "id"
      },
      "title": {
        "type": "column",
        "column": "title"
      }
    },
    "predicate": {
      "type": "binary_comparison_operator",
      "column": {
        "type": "column",
        "name": "title"
      },
      "operator": "like",
      "value": {
        "type": "scalar",
        "value": "Functional"
      }
    }
  },
  "collection_relationships": {}
}

Nested Array Comparison Operators

If the connector declares support for the query.nested_fields.filter_by.nested_arrays capability, it can receive expressions of type array_comparison. These expressions allow scalar array-specific comparisons against columns that contain an array of scalar values.

There are two supported comparison operators that connectors can declare support for:

  • contains: Whether or not the array contains the specified scalar value. This must be supported for all types that can be contained in an array that implement an 'eq' comparison operator.
    • Capability: query.nested_fields.filter_by.nested_arrays.contains
  • is_empty: Whether or not the array is empty. This must be supported no matter what type is contained in the array.
    • Capability: query.nested_fields.filter_by.nested_arrays.is_empty

This example finds institutions where the nested location.campuses array contains the Lindholmen value:

{
  "collection": "institutions",
  "arguments": {},
  "query": {
    "fields": {
      "id": {
        "type": "column",
        "column": "id"
      },
      "name": {
        "type": "column",
        "column": "name"
      },
      "location": {
        "type": "column",
        "column": "location",
        "fields": {
          "type": "object",
          "fields": {
            "campuses": {
              "type": "column",
              "column": "campuses",
              "arguments": {
                "limit": {
                  "type": "literal",
                  "value": null
                }
              }
            }
          }
        }
      }
    },
    "predicate": {
      "type": "array_comparison",
      "column": {
        "type": "column",
        "name": "location",
        "field_path": ["campuses"]
      },
      "comparison": {
        "type": "contains",
        "value": {
          "type": "scalar",
          "value": "Lindholmen"
        }
      }
    }
  },
  "collection_relationships": {}
}

This example finds countries which have an empty cities array:

{
  "collection": "countries",
  "arguments": {},
  "query": {
    "fields": {
      "id": {
        "type": "column",
        "column": "id"
      },
      "name": {
        "type": "column",
        "column": "name"
      },
      "cities": {
        "type": "column",
        "column": "cities",
        "arguments": {
          "limit": {
            "type": "literal",
            "value": null
          }
        }
      }
    },
    "predicate": {
      "type": "array_comparison",
      "column": {
        "type": "column",
        "name": "cities",
        "arguments": {
          "limit": {
            "type": "literal",
            "value": null
          }
        }
      },
      "comparison": {
        "type": "is_empty"
      }
    }
  },
  "collection_relationships": {}
}

Columns in Operators

Comparison operators compare values. The value on the left hand side of any operator is described by a ComparisonTarget, and the various cases will be explained next.

Referencing a column from the same collection

If the ComparisonTarget has type column, then the name property refers to a column in the current collection. The arguments property allows clients to submit argument values for columns that require arguments.

Referencing nested fields within columns

If the field_path property is empty or not present then the target is the value of the named column.

If field_path is non-empty then it refers to a path to a nested field within the named column

Note: a ComparisonTarget may only have a non-empty field_path if the connector supports capability query.nested_fields.filter_by.

Computing an aggregate

If the ComparisonTarget has type aggregate, then the target is an aggregate computed over a related collection. The relationship is described by the (non-empty) path field, and the aggregate to compute is specified in the aggregate field.

For example, this query finds authors who have written exactly 2 articles:

{
  "collection": "authors",
  "arguments": {},
  "query": {
    "fields": {
      "first_name": {
        "type": "column",
        "column": "first_name"
      },
      "last_name": {
        "type": "column",
        "column": "last_name"
      }
    },
    "predicate": {
      "type": "binary_comparison_operator",
      "column": {
        "type": "aggregate",
        "aggregate": {
          "type": "star_count"
        },
        "path": [
          {
            "arguments": {},
            "relationship": "author_articles"
          }
        ]
      },
      "operator": "eq",
      "value": {
        "type": "scalar",
        "value": 2
      }
    }
  },
  "collection_relationships": {
    "author_articles": {
      "arguments": {},
      "column_mapping": {
        "id": ["author_id"]
      },
      "relationship_type": "array",
      "source_collection_or_type": "author",
      "target_collection": "articles"
    }
  }
}

Note: type aggregate will only be sent if the query.aggregates.filter_by capability is turned on. If that capability is turned on, then the schema response should also contain the capabilities.query.aggregates object. That object should indicate the scalar type used for the result type of count aggregates (star_count and column_count), so that clients can know what comparison operators are valid.

Values in Binary Operators

Binary (including array-valued) operators compare columns to values, but there are several types of valid values:

  • Scalar values, as seen in the examples above, compare the column to a specific value,
  • Variable values compare the column to the current value of a variable,
  • Column values compare the column to another column. The column may be on the same row, or it may be on a related row. Comparing against columns on related rows requires the connector to indicate support via the relationships.relation_comparisons capability.

Referencing a column from a collection in scope

When an expression appears inside one or more exists expressions, there are multiple collections in scope.

If the query.exists.named_scopes capability is enabled then these scopes can be named explicitly when referencing a column in an outer scope. The scope field of the ComparisonValue type can be used to specify the scope of a column reference.

Scopes are named by integers in the following manner:

  • The scope named 0 refers to the current collection,
  • The scope named 1 refers to the collection under consideration outside the immediately-enclosing exists expression.
  • Scopes 2, 3, and so on, refer to the collections considered during the evaluation of expressions outside subsequently enclosing exists expressions.

Therefore, the largest valid scope is the maximum nesting depth of exists expressions, up to the nearest enclosing Query object.

Put another way, we can consider a stack of scopes which grows as we descend into each nested exists expression. Each stack frame contains the collection currently under consideration. The named scopes are then the top-down indices of elements of this stack.

For example, we can express an equality between an author_id column and the id column of the enclosing author object (in scope 1):

{
  "collection": "authors",
  "arguments": {},
  "query": {
    "fields": {
      "first_name": {
        "type": "column",
        "column": "first_name"
      },
      "last_name": {
        "type": "column",
        "column": "last_name"
      }
    },
    "predicate": {
      "type": "exists",
      "in_collection": {
        "type": "unrelated",
        "arguments": {},
        "collection": "articles"
      },
      "predicate": {
        "type": "and",
        "expressions": [
          {
            "type": "binary_comparison_operator",
            "column": {
              "type": "column",
              "name": "author_id"
            },
            "operator": "eq",
            "value": {
              "type": "column",
              "path": [],
              "name": "id",
              "scope": 1
            }
          },
          {
            "type": "binary_comparison_operator",
            "column": {
              "type": "column",
              "name": "title"
            },
            "operator": "like",
            "value": {
              "type": "scalar",
              "value": "Functional"
            }
          }
        ]
      }
    }
  },
  "collection_relationships": {}
}

EXISTS expressions

An EXISTS expression tests whether a row exists in some possibly-related collection, and is denoted by an expression with a type field of exists.

EXISTS expressions can query related or unrelated collections.

Related collections are related to the original collection by a relationship in the collection_relationships field of the top-level QueryRequest.

For example, this query fetches authors who have written articles whose titles contain the string "Functional":

{
  "collection": "authors",
  "arguments": {},
  "query": {
    "fields": {
      "first_name": {
        "type": "column",
        "column": "first_name"
      },
      "last_name": {
        "type": "column",
        "column": "last_name"
      },
      "articles": {
        "type": "relationship",
        "arguments": {},
        "relationship": "author_articles",
        "query": {
          "fields": {
            "id": {
              "type": "column",
              "column": "id"
            },
            "title": {
              "type": "column",
              "column": "title"
            }
          }
        }
      }
    },
    "predicate": {
      "type": "exists",
      "in_collection": {
        "type": "related",
        "arguments": {},
        "relationship": "author_articles"
      },
      "predicate": {
        "type": "binary_comparison_operator",
        "column": {
          "type": "column",
          "name": "title"
        },
        "operator": "like",
        "value": {
          "type": "scalar",
          "value": "Functional"
        }
      }
    }
  },
  "collection_relationships": {
    "author_articles": {
      "arguments": {},
      "column_mapping": {
        "id": ["author_id"]
      },
      "relationship_type": "array",
      "source_collection_or_type": "author",
      "target_collection": "articles"
    }
  }
}

Nested relationships

If the related collection is related from a field inside a nested object, then the field path to the nested object can be first descended through using field_path before the relationship is navigated.

Only connectors that enable the relationships.nested.filtering capability will receive these sorts of queries.

In this example, the relationship joins from the nested location.country_id across to the id column on the countries collection.

{
  "collection": "institutions",
  "arguments": {},
  "query": {
    "fields": {
      "name": {
        "type": "column",
        "column": "name"
      },
      "location": {
        "type": "column",
        "column": "location",
        "fields": {
          "type": "object",
          "fields": {
            "country_id": {
              "type": "column",
              "column": "country_id"
            },
            "country": {
              "type": "relationship",
              "relationship": "location_country",
              "arguments": {},
              "query": {
                "fields": {
                  "id": {
                    "type": "column",
                    "column": "id"
                  },
                  "name": {
                    "type": "column",
                    "column": "name"
                  },
                  "area_km2": {
                    "type": "column",
                    "column": "area_km2"
                  }
                }
              }
            }
          }
        }
      }
    },
    "predicate": {
      "type": "exists",
      "in_collection": {
        "type": "related",
        "field_path": ["location"],
        "relationship": "location_country",
        "arguments": {}
      },
      "predicate": {
        "type": "binary_comparison_operator",
        "column": {
          "type": "column",
          "name": "area_km2"
        },
        "operator": "gt",
        "value": {
          "type": "scalar",
          "value": 300000
        }
      }
    }
  },
  "collection_relationships": {
    "location_country": {
      "arguments": {},
      "column_mapping": {
        "country_id": ["id"]
      },
      "relationship_type": "object",
      "target_collection": "countries"
    }
  }
}

Unrelated Collections

If the query.exists.unrelated capability is enabled, then exists expressions can reference unrelated collections.

Unrelated exists expressions can be useful when using collections with arguments. For example, this query uses the unrelated author_articles collection, providing its arguments via the source row's columns:

{
  "collection": "authors",
  "arguments": {},
  "query": {
    "fields": {
      "id": {
        "type": "column",
        "column": "id"
      }
    },
    "predicate": {
      "type": "exists",
      "in_collection": {
        "type": "unrelated",
        "arguments": {
          "author_id": {
            "type": "column",
            "name": "id"
          }
        },
        "collection": "articles_by_author"
      },
      "predicate": {
        "type": "and",
        "expressions": [
          {
            "type": "binary_comparison_operator",
            "column": {
              "type": "column",
              "name": "title"
            },
            "operator": "like",
            "value": {
              "type": "scalar",
              "value": "Functional"
            }
          }
        ]
      }
    }
  },
  "collection_relationships": {}
}

It can also be useful to reference a column in another scope when using unrelated exists expressions.

Nested Collections

If the query.exists.nested_collections capability is enabled, then exists expressions can reference nested collections.

For example, this query finds institutions which employ at least one staff member whose last name contains the letter s:

{
  "collection": "institutions",
  "arguments": {},
  "query": {
    "fields": {
      "id": {
        "type": "column",
        "column": "id"
      },
      "name": {
        "type": "column",
        "column": "name"
      },
      "staff": {
        "type": "column",
        "column": "staff",
        "arguments": {
          "limit": {
            "type": "literal",
            "value": null
          }
        }
      }
    },
    "predicate": {
      "type": "exists",
      "in_collection": {
        "type": "nested_collection",
        "arguments": {
          "limit": {
            "type": "literal",
            "value": null
          }
        },
        "column_name": "staff"
      },
      "predicate": {
        "type": "binary_comparison_operator",
        "column": {
          "type": "column",
          "name": "last_name"
        },
        "operator": "like",
        "value": {
          "type": "scalar",
          "value": "s"
        }
      }
    }
  },
  "collection_relationships": {}
}

References to columns in another scope may be useful when using these sorts of expressions, in order to refer to columns from the outer (unnested) row.

Nested Scalar Collections

If the query.exists.nested_scalar_collections capability is enabled, then exists expressions can reference columns that contain nested arrays of scalar values. In this case, each element of the nested array is lifted into a virtual row with the element value in a field called __value. This allows predicate applied to the exists to reference the __value column to compare against the scalar element.

For example, if there was a nested array such as [1,2,3], it would be converted into a virtual rows [{"__value": 1}, {"_value": 2}, {"_value": 3}].

For example, this query finds institutions that have at least one campus whose name contains the letter d (campuses are a string array nested inside location):

{
  "collection": "institutions",
  "arguments": {},
  "query": {
    "fields": {
      "id": {
        "type": "column",
        "column": "id"
      },
      "name": {
        "type": "column",
        "column": "name"
      },
      "location": {
        "type": "column",
        "column": "location",
        "fields": {
          "type": "object",
          "fields": {
            "campuses": {
              "type": "column",
              "column": "campuses",
              "arguments": {
                "limit": {
                  "type": "literal",
                  "value": null
                }
              }
            }
          }
        }
      }
    },
    "predicate": {
      "type": "exists",
      "in_collection": {
        "type": "nested_scalar_collection",
        "column_name": "location",
        "field_path": ["campuses"],
        "arguments": {}
      },
      "predicate": {
        "type": "binary_comparison_operator",
        "column": {
          "type": "column",
          "name": "__value"
        },
        "operator": "like",
        "value": {
          "type": "scalar",
          "value": "d"
        }
      }
    }
  },
  "collection_relationships": {}
}

Conjunction of expressions

To express the conjunction of multiple expressions, specify a type field of and, and provide the expressions in the expressions field.

For example, to test if the first_name column is null and the last_name column is also null:

{
  "type": "and",
  "expressions": [
    {
      "type": "unary_comparison_operator",
      "operator": "is_null",
      "column": {
        "name": "first_name"
      }
    },
    {
      "type": "unary_comparison_operator",
      "operator": "is_null",
      "column": {
        "name": "last_name"
      }
    }
  ]
}

Disjunction of expressions

To express the disjunction of multiple expressions, specify a type field of or, and provide the expressions in the expressions field.

For example, to test if the first_name column is null or the last_name column is also null:

{
  "type": "or",
  "expressions": [
    {
      "type": "unary_comparison_operator",
      "operator": "is_null",
      "column": {
        "name": "first_name"
      }
    },
    {
      "type": "unary_comparison_operator",
      "operator": "is_null",
      "column": {
        "name": "last_name"
      }
    }
  ]
}

Negation

To express the negation of an expressions, specify a type field of not, and provide that expression in the expression field.

For example, to test if the first_name column is not null:

{
  "type": "not",
  "expression": {
    "type": "unary_comparison_operator",
    "operator": "is_null",
    "column": {
      "name": "first_name"
    }
  }
}

See also