Implementing Local-First with GraphQL and RxDB
Scaling ProWorkflow with local-first and reactive databases
For the past couple of years we have been working on the new version of ProWorkflow, codenamed Nexus, which has led to building lots of features and moving fast. Our initial setup with GraphQL and Apollo Client worked well in the beginning, but as the app grew more complex, managing data became a headache, especially once we started introducing real-time updates.
For those unfamiliar, ProWorkflow is an all-in-one project management app that helps companies track projects, manage time and project financials, and forecast workloads. Since it’s often kept open throughout the workday, speed and real-time updates are essential. App data remains fairly consistent: users are usually dealing with active data, and the amount of active data tends to trend downwards over time.
Our main challenge has been retrieving the data we need quickly and efficiently while ensuring it remains up to date in real-time. The current frontend uses React with Apollo Client for state management and querying, backed by a GraphQL API and GraphQL Subscriptions.
How Data was Queried and State Managed
If you’ve used Apollo Client before, our original approach will look familiar: write a GraphQL query, pass it to the useQuery() hook, and get back data, a loading state, and automatic updates from the Apollo Cache.
Here is a very simple example of fetching an Item in ProWorkflow.
// This 'graphql' method types the query for us
const FETCH_ITEM = graphql(`
  query fetchItem($input: ItemFilter) {
    items(input: $input) {
      id
      name
      # ...other fields
    }
  }
`)
const { data, loading } = useQuery(FETCH_ITEM, {
  variables: { input: { where: { id: { EQ: itemId } } } },
  fetchPolicy: 'cache-and-network',
});
This returns the data for a single item with ID itemId, including only the fields requested in the query. The fetchPolicy: 'cache-and-network' option tells Apollo to return cached results instantly, while also re-running the query in the background and merging the result into the cache. This gives us instant access and ensures up-to-date data.
You can also use fetchPolicy: 'cache-first', which tells Apollo Client to rely solely on the cache after the first fetch.
The process is similar for fetching a list of objects, for example a list of Projects:
const FETCH_PROJECTS = graphql(`
  query fetchProjects($input: ProjectFilter) {
    projects(input: $input) {
      id
      title
      # ...other fields
    }
  }
`)
const { data, loading } = useQuery(FETCH_PROJECTS, {
  variables: { input: { where: { activeworkstate: { EQ: 'active' } } } },
  fetchPolicy: 'cache-and-network',
});
This query returns a list of all active projects, ready to be rendered in the UI.
At a high level, that’s how data is queried throughout the app. The approach is simple and works well — at least until you have lots of queries, complex data relationships, and the added challenge of real-time updates.
Apollo Client is a great library for querying GraphQL, but it doesn’t scale that well for us: the more queries the app makes, the harder the data becomes to manage. In apps where data constantly grows (a social media feed, say), network-based querying is the only real solution. In ProWorkflow, data often stays the same, so constant network querying is wasteful and we want to lean on caching as much as possible. The Apollo Cache simply wasn’t meeting our demands as Nexus got more complex.
Pain Points
Overfetching / Underfetching
GraphQL lets you request exactly the fields you need, but different parts of the app often need overlapping sets of fields. One component might request name and description for an item, while another requests description and category, meaning description is fetched twice. Multiply this across many fields, objects, and queries, and you’re making more requests than necessary, increasing load times and network traffic.
Because we work fast, data requirements change frequently. Components may start requesting fields they no longer need, or expect fields that aren’t queried anymore. Managing which fields are queried, and ensuring they’re consistent, quickly becomes a headache.
Real-time Updates
Real-time updates are a fundamental part of our app; they are the main way data flows back to the client to keep app state up to date. For this we use GraphQL subscriptions over websockets.
Websocket updates arrive as a list of changes, and the client is responsible for updating its state. Single-object updates (e.g., changing an item’s name) are simple: the Apollo Cache stores objects by ID, so you can easily look up an object and update it, and Apollo Client is good at propagating that change to every query that asked for the item.
The tricky part is updating lists, because for each update you have to go through every list and answer the following questions:
- If this update was a create, should this item be added to this list, and also in what position?
- If the item was updated, should it still be in the list, and should its position change?
- If the item was updated, should it now be added to a different list?
Apollo makes this harder by treating each query as an isolated bucket of cached results. An “active projects” query and a “deleted projects” query have separate caches — an update to one doesn’t affect the other. Inserting an item also requires including every field that the original query requested; if a field is missing, Apollo may invalidate the cache and trigger an unwanted refetch (we don’t like refetches).
What this led to was ad-hoc updates to the important list queries, manually deciding whether an update should add or remove an item from a list. While this works alright for the main list queries, it quickly gets complicated the more list queries you try to keep up to date by hand.
For a better idea of the kind of things we had to do, read the docs on updating the Apollo Cache (https://www.apollographql.com/docs/react/caching/cache-interaction#writequery).
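To make that bookkeeping concrete, here is a simplified sketch of the per-list decision logic we had to hand-write. The types and the matchesList predicate are our own illustration, not Apollo APIs; the real code worked against the Apollo Cache.

```typescript
// Simplified sketch of the per-list bookkeeping described above.
// `matchesList` is a hypothetical stand-in for the predicate each
// cached list query implied (here: the "active projects" list).

interface Project {
  id: number;
  title: string;
  activeworkstate: 'active' | 'deleted';
}

// Does this project belong in the "active projects" list?
const matchesList = (p: Project) => p.activeworkstate === 'active';

// Apply one websocket update to one cached list, answering the three
// questions above: should it be added, should it stay, and where?
function applyUpdate(list: Project[], update: Project): Project[] {
  const without = list.filter((p) => p.id !== update.id);
  if (!matchesList(update)) return without; // no longer belongs here
  const next = [...without, update]; // add new, or replace existing
  next.sort((a, b) => a.title.localeCompare(b.title)); // re-derive position
  return next;
}
```

Multiply this by every list query in the app and every update shape, and the maintenance burden becomes clear.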
Nested Updates
One thing we relied on heavily when querying with Apollo Client was nested queries: we would fetch all of an object’s relations at once using a query like this.
items {
  id
  name
  contacts {
    id
    firstname
    lastname
  }
  tags {
    id
    name
    color
  }
}
This is convenient, as you query the backend in the shape you need and the data comes back ready in a nice object. We could then pass this items array to components, which simply access each item’s tags and render away.
This caused problems with real-time updates. Say you update an item’s tags: we only want to send through the new list of tagids, but now every query that selected tags needs to reassign that field to the new array of objects. And what if we have never queried those tags before? Now we need to re-query the tag objects to make sure we have the full data. You can see how this becomes tricky…
We partially solved this by implementing “lookup by id”: we fetch all tags when the app loads, then query tagids instead of tags in individual queries. At render time, we resolve tag objects by mapping the returned tagids against the tags fetched on app load.
Developer Experience
While it was easy at first to write queries in your components with just the data you needed, as more and more queries were written the developer experience started to suffer. As mentioned under overfetching, suddenly you were asking for data you had already asked for elsewhere.
Supporting many ad-hoc network queries becomes a maintenance nightmare, especially as new fields are added and others removed. You don’t want to have to think about whether data is being fetched optimally in 200+ places.
In an ideal world, we’d specify only the collection (e.g., items) and filters (e.g., “active items in project 123”), and get reactive results by default. List queries would update automatically from either websockets or optimistic updates, without manual intervention.
Performance
Since a network request is made for data most of the time, we have to display a loading spinner, and anywhere there are spinners the app can feel “slow”. When some queries end up refetching, that is even more network requests just to keep data up to date. This leads to a “chatty” app that requests data far too often, when it often already has some of that data in its local state.
Finding a Solution
In looking for a better approach, we discovered the concept of local-first. At its core, this means storing application data directly on the client and treating it as the primary source of truth. A sync engine keeps the local database in sync with the server — and, by extension, with other clients — by pulling in changes and pushing out updates.
This seemed like a good solution to our data problem, as having a local copy of the database not only meant faster queries as no network requests had to be made, but also that since we have the full set of data at all times, every list query could be kept up to date.
Because the nature of our app data is “consistent”, in that it mostly remains the same with gradual changes, local-first was a good candidate: it would be viable to store the app state on the client. Users mostly interact with active data, meaning we wouldn’t have to store all historical data on the client anyway, and could still fall back to network requests for it.
Considerations
While we wanted to start implementing local-first right away, we knew it would be a big task with lots of development work. Building something from the ground up could easily take months before anything shipped. Ideally we would use an existing library that we could incorporate into our existing code base gradually and without much effort.
RxDB
We came across RxDB (https://rxdb.info/), a NoSQL document database library that makes local-first/offline-first apps easy. It handles the heavy lifting of data persistence and reactive querying, and it is an abstraction layer over many different storage backends (IndexedDB, SQLite, etc.), meaning we could reuse this solution in our mobile app built on React Native.
Another plus for RxDB is that it hooks into existing backends with minimal changes. Instead of building a new API just for RxDB, we could hook it into our existing GraphQL backend with a few small changes, saving us lots of implementation time.
Implementing the Solution
We knew that implementing local-first in the app would not be an easy task, as the app is complex and queries data in lots of places. It is also quite a paradigm shift from GraphQL queries to RxDB, which works on document-based data (think MongoDB).
Through prototyping, we decided to start where it would have the biggest impact — the core pages most users interact with (e.g., Projects List, Project Details). Reducing network requests here would give us immediate performance gains. Many of these pages pass data down from a base component, making it easy to swap out the data source without having to change too many individual components. That said, there are plenty of smaller components with their own queries that we’ll need to address over time.
By design, RxDB stores all fields for each object, so we no longer have to worry about missing or redundant fields in queries.
Persistence
For persisting data on the client, RxDB has many Storage Adapters (https://rxdb.info/rx-storage.html) allowing you to persist data in many different ways. For ProWorkflow we found the best storage option was OPFS (https://rxdb.info/rx-storage-opfs.html) as RxDB’s implementation is the most performant and can support our storage needs. RxDB also has lots of Storage Wrappers which can optimize the storage for different scenarios, such as high-read low-write, or high write throughput scenarios.
Before we can store any data on the client, we need to define a database and schemas for RxDB to work with. We create a database namespaced by the logged-in user and their workspace, and build up the storage it will use. For each of our collections (items, projects, etc.) we create the collection against that database, and RxDB sets up the underlying storage ready for data to be inserted and queried.
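As an illustration, a collection definition centers on a JSON schema like the one below. This is a simplified sketch in RxDB's JSON-schema style using field names from the examples in this post; our real schemas have more fields and stricter types.

```typescript
// Simplified RxDB-style JSON schema for the items collection.
// Field names follow the examples in this post; the real schema
// is larger and more precise.
const itemSchema = {
  version: 0,
  primaryKey: 'id',
  type: 'object',
  properties: {
    id: { type: 'string', maxLength: 100 }, // primary keys are strings
    name: { type: 'string' },
    projectid: { type: 'string' },
    updatedAt: { type: 'number' }, // used as the replication checkpoint
  },
  required: ['id', 'name'],
} as const;
```

The schema is what lets RxDB index and validate documents, and because every document stores all of its fields, queries never have to declare field selections.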
Replication
For the initial implementation, we decided to implement only pull replication (receiving data from the server) and not push replication (sending local database updates to the server). This is mainly because data mutations happen in many places in the app, and some are quite complex. Only worrying about pulling data from the server and querying it on the client was far more achievable and would be the biggest win.
Bootstrap
The “Bootstrap” is the first phase when you load the app for the first time. The client will pull all data for all collections (that they have permissions for) and insert it into the local database. While this increases the initial first load time for the app, we display a nice loading screen and this initial bootstrap only happens once.
Every update comes with an updatedAt field, a timestamp of when the object was last updated. This is important because it lets the client track the last update it actually processed. When revisiting the app (or reconnecting after the network dropped out), we want to fetch all the updates we missed.
The query used to pull the initial data for a collection looks something like this:
pullItems {
  checkpoint {
    id
    updatedAt
  }
  documents {
    id
    name
    # ...etc
  }
}
We receive the list of documents as well as the updatedAt needed to keep track of updates. Next time we load the app, we send our local updatedAt with this query, which tells the backend to return only events with a greater updatedAt: that is, only the events we missed.
After the initial pull, we rely on real-time updates to continue being up to date.
Real-time Updates
Integrating real-time updates with RxDB was straightforward, as RxDB is designed to be real-time. All that was needed was to take our existing GraphQL Subscription architecture and pipe the updates into the RxDB replication. The code looks something like this:
REPLICATIONS[collection].emitEvent({
  checkpoint, // Checkpoint including updatedAt
  documents,  // Array of document updates, e.g. { id: 1, name: 'Item Name' }
});
Now we could be sure that the local RxDB storage was always up-to-date at all times.
Querying Data
Instead of Apollo Client’s useQuery() hook, we now use RxDB’s query methods (https://rxdb.info/rx-query.html) against local storage. These support granular filtering, as well as reactivity whenever the local database updates (from optimistic updates or real-time updates).
We wrapped these methods in easy-to-use hooks so that we could easily query RxDB from React. For this we made two hooks: useCollection() and useData().
useCollection() takes the collection we wish to query, as well as the query to run (filters and sorting). Under the hood it uses RxDB to query the data and keeps the hook reactive to any updates matching its query. You can then query a collection easily, like so:
// Get all items for project id 123
const { data, loading } = useCollection('items', (col) => col.find({
  selector: {
    projectid: 123,
  },
}));
No worrying about which fields are returned, and no worrying about whether it will update correctly from websocket events. It just works. data is an array of the found items, which we can then iterate over to display to the user.
The useData() hook simply wraps useCollection() and takes much simpler args. It is used to get a single object; so, to get the project with id 123…
const { data: project, loading } = useData('projects', 123);
Nested Data
To keep the same nested-data paradigm, which is convenient for components, we needed a way to build this data on the frontend, since RxDB by design doesn’t have “relations” and is just a set of documents. Because all the data now lives on the frontend, we can query the collections to join from RxDB, build the result up into an object, and return it from the hook. For this we pass a config of “links” to the useCollection() hook, which builds up the data while also ensuring the returned data is reactive to all updates, including those deep within it (e.g. if joining items to a project and a joined item changes, the returned data must update to include that change).
Here is an example of how we pass the links config and what it returns:
const { data: project, loading } = useData(
  'projects',
  123,
  {
    links: [
      {
        // We want to join itemcollections to the project
        collection: 'itemcollections',
        // Put the joined collections on an `itemcollections` field on project
        field: 'itemcollections',
        // Join all item collections where itemcollection.projectid == project.id
        linkingId: 'projectid',
        // Joining based on foreign ids
        isForeign: true,
        // Then, on the joined itemcollections, perform another join
        links: [
          {
            collection: 'items',
            field: 'items',
            linkingId: 'itemcollectionid',
            isForeign: true,
          },
        ],
      },
    ],
  },
);
This then returns data that looks like this:
{
  id: 123,
  title: 'My Project',
  itemcollections: [
    {
      id: 1,
      title: 'Phase 1',
      items: [
        {
          id: 1,
          name: 'Item 1'
        },
        {
          id: 2,
          name: 'Item 2'
        },
      ]
    }
  ]
}
And most importantly, if an update comes through that renames the item with ID 1 to “Item 1 Rename”, the data object returned from the hook updates to reflect that change.
Mutations
As mentioned earlier, we chose not to handle mutations directly in RxDB for now. Mutations occur in many places across the app, and some are quite complex. Instead, we continue sending mutation requests through Apollo Client as before. The resulting updates flow back via our GraphQL subscriptions and are applied to RxDB automatically, keeping the local database in sync.
This approach lets us preserve our existing mutation architecture while gaining the benefits of RxDB for querying.
Downsides
Of course every solution isn’t without its caveats.
Because of the effort required to implement a new architecture like this, it isn’t realistic to overhaul everything in the app at once. Our goal was to start with the most critical parts of the app, the pages that are hit the most. This means living with two paradigms as we gradually shift more of the app to local-first. Realistically, this two-paradigm approach may never fully go away; some parts of the app might always have to rely on network requests for data.
Another consideration is that we have now moved most of the data work to the client. Instead of the GraphQL server doing the heavy lifting in terms of filtering and building data, now it’s the client’s (RxDB) job to query and manage all data. This can be somewhat mitigated by using Web Workers to offload work from the main thread so the perceived performance is not affected.
But with these considerations, this approach leaves us in a better place in terms of our data architecture and delivering a performant, real-time application.
How’s it Going
While implementing local-first was not easy, it has solved most of the pain points we initially had with Apollo Client.
It fully solved our real-time updates problem: we no longer have to ad-hoc update individual queries and caches as updates come in. Every list query from RxDB automatically responds to data updates, whether that’s creates, deletes, or reorders, and it’s very efficient too.
Because the app is no longer making nearly as many network requests, performance has vastly improved. The client simply queries data straight off the local database, no longer relying on the network (which can sometimes be slow) or doing constant refetches for data consistency. Another benefit is reduced load on our servers.
Because data access became a lot quicker, our focus quickly shifted to optimizing our data rendering. Now that fetching thousands of rows was fast, we needed to make sure our data tables could keep up.
It also improved our developer experience: gone are the days of manually defining complex queries and making sure we weren’t fetching too much data. Now it is easy to get the objects of a collection and be confident the data will always be up to date.
With all that said, it lets us move faster, since we no longer have to think as much about where data comes from, and we like to move fast.
Next Steps
Ultimately the goal is to have as much of the app as possible querying its data local-first, and getting the initial solution over the line lets us work towards that goal incrementally.
From here we hope to make more improvements to the app’s data approach, better incorporating optimistic updates on the frontend and making the “bootstrap” process as performant as possible.