Technological background
On the client side, you can subscribe to some logical collection of data.
var subscription = subscriber.subscribe("users", {"older": 24});
var users = subscription.collection;
On the server side, you can publish collections to the client.
publisher.publish("users", (args) {
  return mongoDatabase.collection("users").find({"age": {r"$gt": args["older"]}});
});
Now, whenever the data changes on the server, the client receives a notification with the list of changes to be applied.
This page does not discuss the API or the source-level implementation of this behavior in depth (the implementation is not yet complete); it describes the principles behind it.
The data is stored on the server in MongoDB collections. To keep track of changes, and to be able to send clients a short list of changes instead of the full data set, we keep a history and a version for each logical collection. To store all this information, each logical collection is backed by three physical MongoDB collections: one for the data, one for the history, and one for write locks.
The data collection contains the actual documents, each with an additional __clean_version field holding the version at which the document was last changed.
Each document of the history collection describes one change in the data collection and has the following structure:
{
  "before": "State of the document before the change.",
  "after": "State of the document after the change.",
  "change": "Data of the change.",
  "action": "One of add/remove/change.",
  "author": "Unique id of the client who has initiated this change.",
  "version": "Version field, same as __clean_version."
}
The fields "before" and "after" are vital for computing changesets of filtered collections, as you will see later.
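As an illustration, a history document of this shape could be assembled as in the minimal sketch below. The helper name `historyEntry`, its parameters, and the choice of `null` for a missing "before" or "after" state are assumptions made for this example, not part of the library's API.

```dart
/// The three actions a history document may describe.
const allowedActions = {'add', 'remove', 'change'};

/// Builds one history document in the shape described above.
/// `historyEntry` is a hypothetical helper used only for illustration.
Map<String, dynamic> historyEntry({
  required String action,
  Map<String, dynamic>? before, // assumed null when the document is added
  Map<String, dynamic>? after, // assumed null when the document is removed
  Map<String, dynamic>? change,
  required String author,
  required int version,
}) {
  if (!allowedActions.contains(action)) {
    throw ArgumentError.value(action, 'action', 'must be add/remove/change');
  }
  return {
    'before': before,
    'after': after,
    'change': change,
    'action': action,
    'author': author,
    'version': version,
  };
}
```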
Currently, it is not safe to perform more than one write to a logical collection at a time. Therefore, a special collection maintains write locks. Reads can always run concurrently. In the future, the plan is to relax this restriction from one write lock per collection to one per document.
To keep track of changes, each change increments the version of the collection by 1. Whenever the client requests an update of its data, it sends the version it currently has, and the server selects all changes with a higher version from the history.
The version is stored in two places: in the history collection, as the version of each change, and in the data collection, where each document carries the version of the last change applied to it.
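The version-based selection can be sketched over an in-memory list of history documents. This is only an illustration: the function name `changesSince` is hypothetical, and a real server would issue the equivalent query against the MongoDB history collection.

```dart
/// Returns the history entries newer than [clientVersion], oldest first.
/// `changesSince` is a hypothetical name; the history is a plain list here.
List<Map<String, dynamic>> changesSince(
    List<Map<String, dynamic>> history, int clientVersion) {
  final newer = history
      .where((entry) => (entry['version'] as int) > clientVersion)
      .toList();
  // Changes must be replayed in the order in which they happened.
  newer.sort((a, b) => (a['version'] as int).compareTo(b['version'] as int));
  return newer;
}
```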
On the first subscription, the client receives the data from the data collection together with the version of that data. The version is computed as the maximum value of the "__clean_version" field across the selected documents.
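A minimal sketch of that computation, assuming documents are plain maps. The name `initialVersion` and the fallback value 0 for an empty selection are assumptions of this example.

```dart
/// Version sent along with the initial data: the highest `__clean_version`
/// among the selected documents. Returning 0 for an empty selection is an
/// assumption of this sketch, not documented behavior.
int initialVersion(List<Map<String, dynamic>> documents) {
  var version = 0;
  for (final doc in documents) {
    final docVersion = doc['__clean_version'] as int;
    if (docVersion > version) version = docVersion;
  }
  return version;
}
```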
When the client requests a diff, it sends its current version. The server then performs the following steps.
If the published collection is created using a find query, this query is applied to both the "before" and the "after" field of the history collection. We run the following queries, in this order:
- Changes that match the query with the "after" or the "before" field.
- Changes that match the query with the "after" field.
- Changes that match the query with the "before" field.
The first query ensures consistency of the data. Because MongoDB has no transactions, we cannot be sure that no new changes are added while our queries run. Therefore, we use the first query to determine the last change we are going to include, and we strip from the second and third query everything that is not contained in the first one.
The elements present in both the second and the third query are documents that simply changed. They were in the published collection before the change, and they remain there after it.
The elements present only in the second query are documents that were added to the published collection. Before the change they did not pass the filter, and now they do.
The elements present only in the third query are documents that were removed from the published collection. Before the change they passed the filter, and now they do not.
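The three cases above amount to evaluating the filter on the "before" and the "after" state of each change. The sketch below shows this for a single history document. It is an in-memory illustration: `Effect`, `classify`, and the predicate parameter standing in for the published find query are hypothetical names, and a missing state is assumed to be `null`.

```dart
/// What a history entry means for a client of a filtered collection.
enum Effect { added, changed, removed, unrelated }

/// Classifies one history entry. [matches] is a hypothetical stand-in for
/// the published find query, evaluated on a single document state.
Effect classify(Map<String, dynamic> entry,
    bool Function(Map<String, dynamic> doc) matches) {
  final before = entry['before'] as Map<String, dynamic>?;
  final after = entry['after'] as Map<String, dynamic>?;
  final wasIn = before != null && matches(before);
  final isIn = after != null && matches(after);
  if (wasIn && isIn) return Effect.changed; // matched by both queries
  if (isIn) return Effect.added; // matched only through "after"
  if (wasIn) return Effect.removed; // matched only through "before"
  return Effect.unrelated; // never part of the published collection
}
```

With a filter such as `(doc) => (doc['age'] as int) > 24`, a change of age from 20 to 30 is reported to the client as an addition, and a change from 30 to 20 as a removal, even though the server recorded both as "change" actions.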
Imagine the following set of events:
- Add user john. (__clean_version = 1)
- Add user paul. (__clean_version = 2)
- Change user john. (__clean_version = 3)
- Remove user john. (__clean_version = 4)
The data collection now contains only one element, user paul with version 2. Therefore, after the initial sync, the client holds only paul and considers 2 to be the current version.
However, when the client requests a diff, it receives the following changes:
- Change user john. (__clean_version = 3)
- Remove user john. (__clean_version = 4)
And here comes the problem: we cannot change the nonexistent user john, and we cannot remove him either.
The solution is simply to ignore, on the client side, changes to documents the client does not have.
This is currently not implemented; the code fails with an exception when it encounters such a situation.
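A minimal sketch of what the proposed client-side behavior could look like. It is not the library's code: `applyChange` is a hypothetical function, the local copy is a plain map keyed by `_id`, and the sketch assumes that the "before" state of a change carries the `_id` of the affected document.

```dart
/// Applies one history entry to the client's local copy, keyed by `_id`.
/// Returns false when the entry was skipped because the client does not
/// hold the document it refers to.
bool applyChange(
    Map<String, Map<String, dynamic>> local, Map<String, dynamic> entry) {
  final action = entry['action'] as String;
  if (action == 'add') {
    final after = Map<String, dynamic>.from(entry['after'] as Map);
    local[after['_id'] as String] = after;
    return true;
  }
  final id = (entry['before'] as Map)['_id'] as String;
  if (!local.containsKey(id)) {
    // The document was removed before our initial sync (user john in the
    // example above), so there is nothing to change or remove.
    return false;
  }
  if (action == 'change') {
    local[id]!.addAll(Map<String, dynamic>.from(entry['change'] as Map));
  } else if (action == 'remove') {
    local.remove(id);
  } else {
    throw ArgumentError.value(action, 'action', 'unknown action');
  }
  return true;
}
```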
Who is responsible for generating unique ids? We do not want to generate them on the server, because we want zero latency. But how do we generate unique ids on the client?
We can split the id into two parts, prefix and client_id. The server generates a unique prefix for each client and rejects any change from that client whose ids do not match the prefix. It is the responsibility of the client to keep its client_ids unique.
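One way to realize this split is sketched below. The class and function names, the `-` separator, and the use of a simple counter for the client_id part are assumptions of this example; the text above only fixes the idea of a server-issued prefix combined with a client-maintained part.

```dart
/// Client side: combines the server-issued prefix with a local counter.
class IdGenerator {
  /// Issued once by the server; assumed unique per client and free of '-'.
  final String prefix;

  /// The client_id part; the client alone keeps it unique.
  int _counter = 0;

  IdGenerator(this.prefix);

  /// Produces a new id without any round trip to the server.
  String next() => '$prefix-${_counter++}';
}

/// Server side: accept an id only if it carries the prefix that was issued
/// to the submitting client.
bool hasClientPrefix(String id, String prefix) => id.startsWith('$prefix-');
```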
To ensure that data is inserted and accessed in order, we need a server–client communication layer that preserves the order of operations. Currently, we use the clean_ajax layer for this purpose.
When the server receives an update from a client, it processes it in the following steps:
- Lock the collection (both data and history) containing the affected documents.
- Get the maximum version number from the history collection and increase it by 1.
- Try to insert the record into the data collection.
- Insert the record into the history collection.
- Remove the lock.
If the third step fails, we skip the fourth step and just release the lock.
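The five steps can be sketched with an in-memory stand-in for one logical collection. Everything here is hypothetical: the class `LogicalCollection`, the boolean flag playing the role of the lock collection, and the duplicate-id check used as the reason an insert may fail. A real server would perform each step against MongoDB.

```dart
import 'dart:math' as math;

/// In-memory stand-in for one logical collection (data, history and lock).
class LogicalCollection {
  final Map<String, Map<String, dynamic>> data = {};
  final List<Map<String, dynamic>> history = [];
  bool _locked = false;

  /// Handles an "add" request; returns the new version, or null on failure.
  int? add(Map<String, dynamic> doc, String author) {
    if (_locked) return null; // step 1: another write holds the lock
    _locked = true;
    try {
      // Step 2: next version = highest version in the history + 1.
      final version = history.fold<int>(
              0, (highest, e) => math.max(highest, e['version'] as int)) +
          1;
      // Step 3: try to insert into the data collection.
      final id = doc['_id'] as String;
      if (data.containsKey(id)) return null; // failed; history is untouched
      final stored = <String, dynamic>{...doc, '__clean_version': version};
      data[id] = stored;
      // Step 4: record the change in the history collection.
      history.add({
        'before': null,
        'after': Map<String, dynamic>.of(stored),
        'change': Map<String, dynamic>.of(stored),
        'action': 'add',
        'author': author,
        'version': version,
      });
      return version;
    } finally {
      _locked = false; // step 5: the lock is released in every case
    }
  }
}
```

Releasing the lock in a `finally` block covers both the normal path and the failed insert, which matches the rule that a failure in the third step only releases the lock.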
Consider the following sequence of events:
- The client changes the record john.age from 25 to 37.
- The client receives a diff from the server containing the change of john.age from 25 to 37.
We have received a diff containing a change that we made ourselves and have already applied. We do not want to apply it twice, and we do not want to be notified about a change that did not happen. Moreover, we may meanwhile have applied other changes that have not yet propagated to the server, and we do not want to lose them.
Currently, we solve this issue using the author field as a unique identifier of the client that created the change. This way, we can reject the changes we have just submitted.
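Filtering on the author field can be as small as the sketch below. The function name `foreignChanges` is hypothetical, and the diff is assumed to arrive as a list of history documents.

```dart
/// Drops the diff entries that this client submitted itself, so they are
/// neither applied a second time nor reported to listeners.
List<Map<String, dynamic>> foreignChanges(
    List<Map<String, dynamic>> diff, String ownAuthorId) {
  return diff.where((entry) => entry['author'] != ownAuthorId).toList();
}
```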
Now consider this sequence of events on the client:
- The client requests a diff.
- The client changes the record john.age from 25 to 37.
- The client receives a diff containing the change of john.age from 25 to 18.
- The client receives a diff containing the change of john.age from 25 to 37, but ignores it because of the author field.
How do we deal with this situation? Let's look at what happened on the server side:
- Someone changed the record john.age from 25 to 18.
- The client requested a diff.
- The client requested the change of john.age to 37.
On the server side, the john.age value is 37, while the client ends up with 18.
To achieve data consistency between the server and the client, we have to keep track of uncommitted changes.
- The client requests a change and stores it in the list of uncommitted changes.
- When the client receives a new diff, it applies all received changes.
- The client re-applies the uncommitted changes and clears the list of uncommitted changes.
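The three steps above can be sketched as a small client-side class. This is an illustration under several assumptions: `ClientCollection` and its members are hypothetical names, only "change" actions are handled, the "before" state is assumed to carry the document's `_id`, and the author check from the previous section is folded into the diff handling.

```dart
/// Client-side copy of one collection with optimistic local writes.
class ClientCollection {
  final String authorId;
  final Map<String, Map<String, dynamic>> docs = {};
  final List<MapEntry<String, Map<String, dynamic>>> _uncommitted = [];

  ClientCollection(this.authorId);

  /// Step 1: apply the change locally and remember it as uncommitted.
  /// Sending the request to the server is omitted in this sketch.
  void change(String id, Map<String, dynamic> fields) {
    docs[id]?.addAll(fields);
    _uncommitted.add(MapEntry(id, fields));
  }

  /// Steps 2 and 3: apply the received diff, then replay local changes.
  void receiveDiff(List<Map<String, dynamic>> diff) {
    for (final entry in diff) {
      if (entry['author'] == authorId) continue; // our own change, skip it
      final id = (entry['before'] as Map)['_id'] as String;
      docs[id]?.addAll(Map<String, dynamic>.from(entry['change'] as Map));
    }
    for (final pending in _uncommitted) {
      docs[pending.key]?.addAll(pending.value);
    }
    _uncommitted.clear();
  }
}
```

Replaying the scenario above: after the local change to 37, the diff carrying 18 is applied and the uncommitted change is re-applied on top of it, so the client ends at 37, the same value the server holds.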