Technological background

Basic usage

On the client side, you can subscribe to some logical collection of data.

var subscription = subscriber.subscribe("users", {"older": 24});
var users = subscription.collection;

On the server side, you can publish collections to clients.

publisher.publish("users", (args) {
  return mongoDatabase.collection("users").find({"age": {r"$gt": args["older"]}});
});

Now, whenever the data changes on the server, the client receives a notification with the list of changes to apply.

This page does not discuss the API or the source-level implementation of this behavior in depth (the implementation is still incomplete). It covers the principles behind it.

How is data stored?

The data is stored on the server side in MongoDB collections. To keep track of changes, and to be able to send clients a short list of changes instead of the full data set, we keep a history and a version for each logical collection. To store all this information, we use three physical MongoDB collections.

Data collection

This collection contains the actual data, with an additional __clean_version field holding the version at which the document was last changed.

History collection

Each document in the history collection describes one change in the data collection and has the following structure:

{
  "before": "State of the document before the change.",
  "after": "State of the document after the change.",
  "change": "Data of the change.",
  "action": "One of add/remove/change.",
  "author": "Unique id of the client that initiated this change.",
  "version": "Version field, same as __clean_version."
}

The fields "before" and "after" are vital for computing changesets of filtered collections, as you will see later.
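As an illustration of the structure above, here is a minimal sketch of a helper that builds one history document. The function name `historyRecord` is hypothetical, and storing `null` for a missing "before" or "after" state (for example, "before" of an add) is an assumption of this sketch, not something the page specifies.

```dart
/// Hypothetical helper: builds one history document with the six fields
/// listed above. A missing state (e.g. "before" of an add) is stored as
/// null, which is an assumption of this sketch.
Map<String, dynamic> historyRecord({
  required String action, // one of "add", "remove", "change"
  required String author, // unique id of the client that made the change
  required int version, // same value as __clean_version
  Map<String, dynamic>? before,
  Map<String, dynamic>? after,
  Map<String, dynamic>? change,
}) {
  const allowed = {'add', 'remove', 'change'};
  if (!allowed.contains(action)) {
    throw ArgumentError.value(action, 'action', 'must be add/remove/change');
  }
  return {
    'before': before,
    'after': after,
    'change': change,
    'action': action,
    'author': author,
    'version': version,
  };
}
```

A change of john.age from 25 to 37 would then be recorded with both states filled in and `change` set to `{'age': 37}`.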

Lock collection

Currently, it is not safe to make more than one write to a logical collection at a time. Therefore, there is a special collection that maintains write locks. Reads can always be concurrent. In the future, we plan to relax this restriction to per-document write locks instead of per-collection ones.

Version

To keep track of changes, each change increments the version of the collection by 1. Whenever the client requests an update of its data, it sends the version it currently has, and the server selects all changes with a higher version from the history.

The version is stored both in the history collection, as the version of each change that occurred, and in the data collection, where each document carries the version of the last change applied to it.
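The selection step can be sketched over an in-memory list standing in for the history collection. The function name `changesSince` is hypothetical. A real server would run the equivalent query against MongoDB.

```dart
/// Returns the history entries the client has not seen yet, oldest first.
/// [history] is an in-memory stand-in for the history collection.
List<Map<String, dynamic>> changesSince(
    List<Map<String, dynamic>> history, int clientVersion) {
  final newer = history
      .where((entry) => (entry['version'] as int) > clientVersion)
      .toList();
  // Sort by version so the client applies changes in the order they happened.
  newer.sort((a, b) => (a['version'] as int).compareTo(b['version'] as int));
  return newer;
}
```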

How to create list of changes?

Initial sync

On the first subscription, the client receives the data from the data collection together with the version of that data. The version is computed as the maximum value of the "__clean_version" field among the selected documents.
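A minimal sketch of that computation follows. The function name `initialVersion` is hypothetical, and returning 0 when no documents were selected is an assumption of this sketch.

```dart
/// Version reported to a client on initial sync: the highest
/// "__clean_version" among the selected documents.
/// Returns 0 for an empty selection (an assumption of this sketch).
int initialVersion(List<Map<String, dynamic>> documents) {
  var version = 0;
  for (final doc in documents) {
    final docVersion = doc['__clean_version'] as int;
    if (docVersion > version) version = docVersion;
  }
  return version;
}
```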

Request diff

When the client requests a diff, it sends its current version. The server then performs the following steps.

If the published collection is created using a find query, this query is applied to both the "before" and "after" fields of the history collection. We therefore issue the following queries, in this order:

1. Changes whose "after" or "before" field matches the query.
2. Changes whose "after" field matches the query.
3. Changes whose "before" field matches the query.

The first query is used to ensure consistency of the data. Because these queries do not run inside a transaction, we cannot be sure that no new changes are added while they execute. Therefore, we use the first query to determine the last change we are going to include, and we strip from the second and third queries everything that is not contained in the first one.

The elements present in both the second and the third query are documents that simply changed. They were in the published collection before the change, and they remain there after it.

The elements present only in the second query are the ones that were added to the published collection. Before the change they did not pass the filter, and now they do.

The elements present only in the third query are the ones that were removed from the published collection. Before the change they passed the filter, and now they do not.
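The classification above can be sketched over an in-memory history list. This is an illustration only: `classifyChanges` and the `Doc` alias are hypothetical names, the filter is an ordinary Dart predicate rather than a MongoDB query, and a missing "before" or "after" state is assumed to be stored as null. Because the list does not change between the three passes, the cutoff has no visible effect here. On a real server it guards against entries written between the queries.

```dart
typedef Doc = Map<String, dynamic>;

/// Splits history entries newer than [clientVersion] into the three kinds
/// of change a filtered subscription can see.
Map<String, List<Doc>> classifyChanges(
    List<Doc> history, int clientVersion, bool Function(Doc) matches) {
  bool hit(Doc entry, String field) =>
      entry[field] != null && matches(entry[field] as Doc);
  int version(Doc entry) => entry['version'] as int;

  // Query 1: "after" or "before" matches. Its newest entry is the cutoff.
  var cutoff = clientVersion;
  for (final e in history) {
    if (version(e) > cutoff && (hit(e, 'after') || hit(e, 'before'))) {
      cutoff = version(e);
    }
  }
  bool inRange(Doc e) => version(e) > clientVersion && version(e) <= cutoff;

  // Queries 2 and 3, stripped of anything newer than the cutoff.
  final matchAfter = history.where((e) => inRange(e) && hit(e, 'after'));
  final matchBefore = history.where((e) => inRange(e) && hit(e, 'before'));

  return {
    // In both queries: stayed in the published collection, but changed.
    'changed': matchAfter.where(matchBefore.contains).toList(),
    // Only "after" matches: entered the published collection.
    'added': matchAfter.where((e) => !matchBefore.contains(e)).toList(),
    // Only "before" matches: left the published collection.
    'removed': matchBefore.where((e) => !matchAfter.contains(e)).toList(),
  };
}
```

With the filter from the basic usage example (age greater than 24), a change of age from 30 to 20 lands in `removed`, and a change from 20 to 40 lands in `added`.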

Gotchas with versions

Imagine the following set of events:

Add user john. (__clean_version = 1)
Add user paul. (__clean_version = 2)
Change user john. (__clean_version = 3)
Remove user john. (__clean_version = 4)

The data collection contains only one element, user paul, with version 2. Therefore, the client believes that 2 is the current version. After the initial sync, the client holds only the paul element, at version 2.

However, when the client requests a diff, it will receive the following changes:

Change user john. (__clean_version = 3)
Remove user john. (__clean_version = 4)

And here comes the problem: we cannot change the nonexistent user john, and we cannot remove him either!

The solution to this problem is to simply ignore, on the client side, changes to documents the client does not have.

This is currently not implemented. The code will fail with an exception when it encounters such a situation.
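Since the page states that this is not implemented yet, the following is only a sketch of what the proposed behavior could look like. The function name `applyChange` is hypothetical, and the sketch assumes that "before" and "after" hold full document states identified by an "_id" field.

```dart
/// Applies one history entry to the client's local copy (keyed by "_id").
/// Returns false when the entry was ignored because it targets a document
/// the client does not hold.
bool applyChange(
    Map<String, Map<String, dynamic>> local, Map<String, dynamic> entry) {
  final action = entry['action'] as String;
  if (action == 'remove') {
    final removedId = (entry['before'] as Map)['_id'] as String;
    // remove() returns null when the document was never there.
    return local.remove(removedId) != null;
  }
  final doc = Map<String, dynamic>.from(entry['after'] as Map);
  final id = doc['_id'] as String;
  if (action == 'change' && !local.containsKey(id)) {
    return false; // unknown document: ignore instead of throwing
  }
  local[id] = doc;
  return true;
}
```

In the john and paul scenario, both the change (version 3) and the removal (version 4) of john return false and leave the local copy untouched.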

How to insert changes from client?

Id generation

Who is responsible for generating unique ids? We do not want to generate them on the server, because we want zero latency. But how can we generate unique ids on the client?

We can split the id into two parts, prefix and client_id. The server generates a unique prefix for each client and rejects any changes from that client whose ids do not match the prefix. It is the client's responsibility to keep its client_ids unique.
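One possible shape of this scheme is sketched below. The names `IdGenerator` and `hasIssuedPrefix`, the "-" separator, and the use of a simple counter for the client part are all assumptions of this sketch. It also assumes that prefixes themselves never contain the separator.

```dart
/// Client side: combines the server-issued prefix with a local counter,
/// so ids can be created without a round trip to the server.
class IdGenerator {
  IdGenerator(this.prefix);

  /// Unique prefix issued to this client by the server.
  final String prefix;
  int _next = 0;

  String nextId() => '$prefix-${_next++}';
}

/// Server side: accept a change only if its id carries the prefix that
/// was issued to the submitting client.
bool hasIssuedPrefix(String id, String issuedPrefix) =>
    id.startsWith('$issuedPrefix-');
```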

Order consistency

To ensure that data is inserted and accessed in order, we need a server–client communication layer that preserves the order of operations. Currently, we use the clean_ajax layer for this purpose.

Processing updates on server

When the server receives an update from a client, it processes it in the following steps:

1. Lock the collection (both data and history) containing the affected documents.
2. Get the maximum version number from the history collection and increase it by 1.
3. Try to insert the record into the data collection.
4. Insert the record into the history collection.
5. Remove the lock.

If the third step fails, we just release the lock.
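The five steps can be sketched with in-memory stand-ins for the data, history, and lock collections. The class `LogicalCollection` and its `add` method are hypothetical, the lock is a plain boolean rather than a MongoDB document, and a duplicate "_id" is used as the example of a failing insert.

```dart
/// In-memory stand-in for one logical collection.
class LogicalCollection {
  final Map<String, Map<String, dynamic>> data = {};
  final List<Map<String, dynamic>> history = [];
  bool _locked = false;

  /// Adds [doc] on behalf of [author]. Returns the new version, or null
  /// if the insert into the data collection failed.
  int? add(Map<String, dynamic> doc, String author) {
    if (_locked) throw StateError('collection is locked');
    _locked = true; // Step 1: lock data and history together.
    try {
      // Step 2: next version = highest version in history + 1.
      var version = 1;
      for (final entry in history) {
        final v = entry['version'] as int;
        if (v >= version) version = v + 1;
      }
      // Step 3: try to insert into the data collection.
      final id = doc['_id'] as String;
      if (data.containsKey(id)) return null; // failed; lock released below
      final stored = {...doc, '__clean_version': version};
      data[id] = stored;
      // Step 4: record the change in the history collection.
      history.add({
        'before': null,
        'after': stored,
        'change': doc,
        'action': 'add',
        'author': author,
        'version': version,
      });
      return version;
    } finally {
      _locked = false; // Step 5: release the lock, on success or failure.
    }
  }
}
```

Releasing the lock in `finally` covers the failure case of the third step without a separate code path.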

Concurrency issues on client

Author field

Client changes the record john.age from 25 to 37.
Client receives the diff from server containing change in john.age from 25 to 37.

We have received a diff for a change that we made ourselves and have already applied. We do not want to apply it twice, and we do not want to be notified about a change that did not happen. Moreover, we may have meanwhile applied several other changes that have not yet propagated to the server, and we do not want to lose them.

Currently, we solve this issue by using the author field as a unique identifier of the client that created the change. This way, the client can reject the changes it has just submitted itself.
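A minimal sketch of this filtering on the client is shown below. The function name `withoutOwnChanges` is hypothetical.

```dart
/// Drops the entries of a received diff that this client authored itself,
/// since they have already been applied locally.
List<Map<String, dynamic>> withoutOwnChanges(
        List<Map<String, dynamic>> diff, String clientId) =>
    diff.where((entry) => entry['author'] != clientId).toList();
```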

Merging conflicts

Client requests diff
Client changes the record john.age from 25 to 37
Client receives the diff containing change of john.age from 25 to 18
Client receives the diff containing change of john.age from 25 to 37, but ignores it due to author field.

How do we deal with this situation? Let's look at what happened on the server side:

Someone changed the record john.age from 25 to 18.
Client requested diff.
Client requested change of john.age to 37.

On the server side, the value of john.age is 37.

To achieve data consistency between the server and the client, we have to keep track of changes that are not yet committed.

1. The client requests a change and stores it in the list of uncommitted changes.
2. When the client receives a new diff, it applies all received changes.
3. The client re-applies the uncommitted changes and clears the list of uncommitted changes.
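The procedure can be sketched for a single document as follows. The class `ClientDocument` and its method names are hypothetical, and changes are represented simply as maps of field values.

```dart
/// Client-side copy of one document plus its not-yet-committed changes.
class ClientDocument {
  ClientDocument(Map<String, dynamic> initial) : fields = Map.of(initial);

  final Map<String, dynamic> fields;
  final List<Map<String, dynamic>> _uncommitted = [];

  /// Step 1: apply a local change immediately and remember it.
  void changeLocally(Map<String, dynamic> change) {
    fields.addAll(change);
    _uncommitted.add(change);
  }

  /// Steps 2 and 3: apply the server's changes, then re-apply the local
  /// ones on top and clear the list.
  void receiveDiff(List<Map<String, dynamic>> serverChanges) {
    for (final change in serverChanges) {
      fields.addAll(change);
    }
    for (final change in _uncommitted) {
      fields.addAll(change);
    }
    _uncommitted.clear();
  }
}
```

In the scenario above, the client sets john.age to 37, then receives the diff that sets it to 18. After re-applying the uncommitted change, the client holds 37, which matches the server.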
