graphql-engine-1.0.0: GraphQL API over Postgres
Safe Haskell: None
Language: Haskell2010

Hasura.GraphQL.Execute.Subscription.Plan

Description

Reasonably efficient PostgreSQL live queries

This module implements query multiplexing, which is our implementation strategy for live queries (i.e. GraphQL subscriptions) made against Postgres. Fundamentally, our implementation is built around polling, which is never ideal, but it’s a lot easier to implement than trying to do something event-based. To minimize the resource cost of polling, we use multiplexing, which is essentially a two-tier batching strategy.

The high-level idea

The objective is to minimize the number of concurrent polling workers to reduce database load as much as possible. A very naïve strategy would be to group identical queries together so we only have one poller per unique active subscription. That’s a good start, but of course, in practice, most queries differ slightly. However, it happens that they very frequently /only differ in their variables/ (that is, GraphQL query variables and session variables), and in those cases, we try to generate parameterized SQL. This means that the same prepared SQL query can be reused, just with a different set of variables.

To give a concrete example, consider the following query:

subscription vote_count($post_id: Int!) {
  vote_count(where: {post_id: {_eq: $post_id}}) {
    votes
  }
}

No matter what the client provides for $post_id, we will always generate the same SQL:

SELECT votes FROM vote_count WHERE post_id = $1

If multiple clients subscribe to vote_count, we can certainly reuse the same prepared query. For example, imagine we had 10 concurrent subscribers, each listening on a distinct $post_id:

let postIds = [3, 11, 32, 56, 13, 97, 24, 43, 109, 48]

We could iterate over postIds in Haskell, executing the same prepared query 10 times:

for postIds $ \postId ->
  Q.listQE defaultTxErrorHandler preparedQuery (Identity postId) True

Sadly, that on its own isn’t good enough. The overhead of running each query is large enough that Postgres becomes overwhelmed if we have to serve lots of concurrent subscribers. Therefore, what we want to be able to do is somehow make one query instead of ten.

Multiplexing

This is where multiplexing comes in. By taking advantage of Postgres lateral joins, we can do the iteration in Postgres rather than in Haskell, allowing us to pay the query overhead just once for all ten subscribers. Essentially, lateral joins add map-like functionality to SQL, so we can run our query once per $post_id:

SELECT results.votes
FROM unnest($1::integer[]) query_variables (post_id)
LEFT JOIN LATERAL (
  SELECT coalesce(json_agg(votes), '[]') AS votes
  FROM vote_count WHERE vote_count.post_id = query_variables.post_id
) results ON true

If we generalize this approach just a little bit more, we can apply this transformation to arbitrary queries parameterized over arbitrary session and query variables!
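
As a rough illustration of what the lateral join computes, the following pure Haskell sketch models the multiplexed query over an in-memory table. The names Row, votesFor and multiplexed are hypothetical and do not exist in this module, and pairing each result with its post_id is an assumption made here so that results can be routed back to the right subscribers.

```haskell
-- A toy stand-in for the vote_count table: (post_id, votes) rows.
type Row = (Int, Int)

-- The per-subscriber query: every votes value for one post_id.
votesFor :: [Row] -> Int -> [Int]
votesFor table postId = [votes | (pid, votes) <- table, pid == postId]

-- A model of the multiplexed query: one call produces one result per input
-- variable. Like LEFT JOIN LATERAL combined with coalesce(..., '[]'), an
-- input with no matching rows still yields an (empty) result.
multiplexed :: [Row] -> [Int] -> [(Int, [Int])]
multiplexed table postIds =
  [(postId, votesFor table postId) | postId <- postIds]
```

Here the list of post IDs plays the role of the $1::integer[] parameter, and votesFor plays the role of the subquery inside the lateral join.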

Implementation overview

To support query multiplexing, we maintain a tree of the following types, where > should be read as “contains”:

SubscriptionsState > Poller > Cohort > Subscriber

Here’s a brief summary of each type’s role:

  • A Subscriber is an actual client with an open websocket connection.
  • A Cohort is a set of Subscribers that are all subscribed to the same query /with the exact same variables/. (By batching these together, we can do better than multiplexing, since we can just query the data once.)
  • A Poller is a worker thread for a single, multiplexed query. It fetches data for a set of Cohorts that all use the same parameterized query, but have different sets of variables.
  • Finally, the SubscriptionsState is the top-level container that holds all the active Pollers.
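
The containment hierarchy above can be sketched with plain maps. This is a simplified model under stated assumptions, not the module's actual definitions: the real types are presumably concurrent structures, and ParameterizedQuery, Variables, SubscriberId and addSubscriber are hypothetical names introduced only for illustration.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical stand-ins for the real key types.
type ParameterizedQuery = String
type Variables = [(String, String)]

newtype SubscriberId = SubscriberId Int
  deriving (Eq, Ord, Show)

-- Subscribers that share both the query and the exact same variables.
newtype Cohort = Cohort {cohortSubscribers :: [SubscriberId]}
  deriving (Eq, Show)

-- One multiplexed query; its cohorts are keyed by their variables.
newtype Poller = Poller {pollerCohorts :: Map.Map Variables Cohort}
  deriving (Eq, Show)

-- Top-level container; pollers are keyed by their parameterized query.
newtype SubscriptionsState = SubscriptionsState
  {statePollers :: Map.Map ParameterizedQuery Poller}
  deriving (Eq, Show)

emptyState :: SubscriptionsState
emptyState = SubscriptionsState Map.empty

-- Add a subscriber, reusing an existing poller and cohort when possible
-- and creating them otherwise.
addSubscriber ::
  ParameterizedQuery -> Variables -> SubscriberId -> SubscriptionsState -> SubscriptionsState
addSubscriber query vars sub (SubscriptionsState pollers) =
  SubscriptionsState (Map.alter (Just . addToPoller) query pollers)
  where
    addToPoller Nothing = Poller (Map.singleton vars (Cohort [sub]))
    addToPoller (Just (Poller cohorts)) =
      Poller (Map.insertWith merge vars (Cohort [sub]) cohorts)
    -- insertWith passes the new value first and the existing value second.
    merge (Cohort new) (Cohort old) = Cohort (old ++ new)
```

Two subscribers with the same query and variables land in one Cohort, so their data is fetched once. Subscribers with the same query but different variables share a Poller, so they are served by one multiplexed query.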

Additional details are provided by the documentation for individual bindings.

Documentation

newtype ValidatedVariables f

When running multiplexed queries, we have to be especially careful about user input, since invalid values will cause the query to fail, causing collateral damage for anyone else multiplexed into the same query. Therefore, we pre-validate variables against Postgres by executing a no-op query of the shape

SELECT 'v1'::t1, 'v2'::t2, ..., 'vn'::tn

so if any variable values are invalid, the error will be caught early.
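
A minimal sketch of how such a validation query could be rendered is shown below. The helper names quoteLiteral and mkValidationSql are hypothetical, the quoting assumes standard-conforming strings, and the type names are assumed to come from a trusted source. The actual implementation may well build this query differently.

```haskell
import Data.List (intercalate)

-- Quote a text-encoded value as a SQL string literal, doubling any
-- embedded single quotes.
quoteLiteral :: String -> String
quoteLiteral v = "'" ++ concatMap escape v ++ "'"
  where
    escape '\'' = "''"
    escape c = [c]

-- Render the no-op validation query from (text-encoded value, type name)
-- pairs. The type name is interpolated as-is, so it must be trusted.
mkValidationSql :: [(String, String)] -> String
mkValidationSql vars =
  "SELECT " ++ intercalate ", " [quoteLiteral v ++ "::" ++ ty | (v, ty) <- vars]
```

For example, mkValidationSql [("3", "integer"), ("o'brien", "text")] produces SELECT '3'::integer, 'o''brien'::text. If Postgres rejects any cast, only the offending subscriber sees the error, before its variables are multiplexed with anyone else's.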

Instances

Eq (f TxtEncodedVal) => Eq (ValidatedVariables f)

Defined in Hasura.GraphQL.Execute.Subscription.Plan

Show (f TxtEncodedVal) => Show (ValidatedVariables f)

Semigroup (f TxtEncodedVal) => Semigroup (ValidatedVariables f)

Monoid (f TxtEncodedVal) => Monoid (ValidatedVariables f)

Hashable (f TxtEncodedVal) => Hashable (ValidatedVariables f)

ToJSON (f TxtEncodedVal) => ToJSON (ValidatedVariables f)

Methods

toJSON :: ValidatedVariables f -> Value

toEncoding :: ValidatedVariables f -> Encoding

toJSONList :: [ValidatedVariables f] -> Value

toEncodingList :: [ValidatedVariables f] -> Encoding

newtype CohortId

Constructors

CohortId 

Fields

Instances

Eq CohortId

Show CohortId

Hashable CohortId

FromJSON CohortId

Methods

parseJSON :: Value -> Parser CohortId

parseJSONList :: Value -> Parser [CohortId]

ToJSON CohortId

Methods

toJSON :: CohortId -> Value

toEncoding :: CohortId -> Encoding

toJSONList :: [CohortId] -> Value

toEncodingList :: [CohortId] -> Encoding

FromCol CohortId

data CohortVariables

Constructors

CohortVariables 

Fields

  • _cvSessionVariables :: !SessionVariables
     
  • _cvQueryVariables :: !ValidatedQueryVariables
     
  • _cvSyntheticVariables :: !ValidatedSyntheticVariables

    To allow more queries to be multiplexed together, we introduce “synthetic” variables for all SQL literals in a query, even if they don’t correspond to any GraphQL variable. For example, the query

    subscription latest_tracks($condition: tracks_bool_exp!) {
      tracks(where: $condition) {
        id
        title
      }
    }

    might be executed with similar values for $condition, such as {"album_id": {"_eq": "1"}} and {"album_id": {"_eq": "2"}}.

    Normally, we wouldn’t bother parameterizing over the 1 and 2 literals in the resulting query because we can’t cache that query plan (since different $condition values could lead to different SQL). However, for live queries, we can still take advantage of the similarity between the two queries by multiplexing them together, so we replace them with references to synthetic variables.

  • _cvCursorVariables :: !ValidatedCursorVariables

    Cursor variables contain the latest value of the cursor. After every poll, the cursor variables are updated if their values have changed; see [Streaming subscription polling]. Cursor variables are only used for streaming subscriptions; for live queries, they are empty.
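
The synthetic-variable idea described for _cvSyntheticVariables can be illustrated with a toy expression type. This is only a sketch under simplifying assumptions: Operand, Cond and parameterize are hypothetical names, and the real query representation is far richer than a list of equality conditions.

```haskell
import Data.List (mapAccumL)

-- A toy operand: either an inline SQL literal or a reference to a
-- numbered synthetic variable.
data Operand = Literal String | SyntheticVar Int
  deriving (Eq, Show)

-- A toy condition: a column compared for equality with an operand.
data Cond = ColumnEq String Operand
  deriving (Eq, Show)

-- Replace every literal with a fresh synthetic variable. Returns the
-- extracted values (in variable order) and the parameterized conditions.
parameterize :: [Cond] -> ([String], [Cond])
parameterize conds = (reverse collected, conds')
  where
    (collected, conds') = mapAccumL step [] conds
    step acc (ColumnEq col (Literal v)) =
      (v : acc, ColumnEq col (SyntheticVar (length acc + 1)))
    step acc cond = (acc, cond)
```

Applied to the two $condition values above, parameterize yields the same parameterized condition for both and differs only in the extracted values, which is what allows the two subscriptions to share one multiplexed query.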

Instances

Eq CohortVariables

Show CohortVariables

Generic CohortVariables

Associated Types

type Rep CohortVariables :: Type -> Type

Hashable CohortVariables

ToJSON CohortVariables

Methods

toJSON :: CohortVariables -> Value

toEncoding :: CohortVariables -> Encoding

toJSONList :: [CohortVariables] -> Value

toEncodingList :: [CohortVariables] -> Encoding

type Rep CohortVariables

type Rep CohortVariables = D1 ('MetaData "CohortVariables" "Hasura.GraphQL.Execute.Subscription.Plan" "graphql-engine-1.0.0-inplace" 'False) (C1 ('MetaCons "CohortVariables" 'PrefixI 'True) ((S1 ('MetaSel ('Just "_cvSessionVariables") 'NoSourceUnpackedness 'SourceStrict 'DecidedStrict) (Rec0 SessionVariables) :*: S1 ('MetaSel ('Just "_cvQueryVariables") 'NoSourceUnpackedness 'SourceStrict 'DecidedStrict) (Rec0 ValidatedQueryVariables)) :*: (S1 ('MetaSel ('Just "_cvSyntheticVariables") 'NoSourceUnpackedness 'SourceStrict 'DecidedStrict) (Rec0 ValidatedSyntheticVariables) :*: S1 ('MetaSel ('Just "_cvCursorVariables") 'NoSourceUnpackedness 'SourceStrict 'DecidedStrict) (Rec0 ValidatedCursorVariables))))

mkCohortVariables :: HashSet SessionVariable -> SessionVariables -> ValidatedQueryVariables -> ValidatedSyntheticVariables -> ValidatedCursorVariables -> CohortVariables

Builds a cohort's variables, keeping only the session variables that the subscription actually requires.
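
The filtering step can be sketched as follows. The types here are simplified stand-ins (the real SessionVariables type is not a plain Map), and filterRequired is a hypothetical helper rather than a function exported by this module.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- Simplified, hypothetical stand-ins for the real types.
type SessionVariable = String
type SessionVariables = Map.Map SessionVariable String

-- Keep only the session variables that the subscription requires.
filterRequired :: Set.Set SessionVariable -> SessionVariables -> SessionVariables
filterRequired required session = Map.restrictKeys session required
```

Presumably the point of filtering is that two subscribers who differ only in a session variable the query never reads end up with equal cohort variables, so they can share a single Cohort.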

data SubscriptionQueryPlan (b :: BackendType) q

A self-contained, ready-to-execute subscription plan. Contains enough information to find an existing poller that this can be added to or to create a new poller if necessary.

Constructors

SubscriptionQueryPlan 

Fields