Avoiding Pitfalls Part 1: Don't try to filter the event stream
Sometimes developers do things they know will be hard. Maybe they're bored at work and just enjoy the extra challenge. Maybe they're after some higher pursuit of technical beauty. Then there are things they think will be easy that turn out to be hard. This is one of those things.
What is the event stream?
When Edlink syncs information from districts and universities, it generates an "event" every time a piece of data changes. An event could be something like a new person, an updated class, or an incident between two fourth graders involving a tennis ball and a cheese grater. In fact, any time an entity is added to, updated in, or deleted from the dataset, Edlink faithfully stores an event recording that change and keeps it for 30 days.
To make life easier for developers, we store these events in a very specific order so they can be used to stay in sync. This list of ordered events is called the "event stream".
How you should utilize the event stream
There are guides written about this elsewhere, but to summarize the key points:
- First, save the ID of the latest event in the stream.
- Then, perform a full sync of the dataset.
- Finally, every hour or so, check for events after the one you saved in the first step. Replay these events onto your own dataset to apply the changes, and save the ID of the last event you applied so that the next check picks up where this one left off.
It is critical that events are replayed in order and that you do not skip any. Skipping events can have disastrous consequences for your view of the dataset, and recovering from a bad state can be quite challenging.
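The three steps above can be sketched as a small sync loop. This is a minimal, illustrative sketch under stated assumptions, not Edlink's API: the Event shape, the in-memory EventStream class, and the initial_sync and incremental_sync helpers are hypothetical stand-ins for your own API client and storage. What it does show is the part that matters: a saved cursor, and every later event replayed in order with none skipped.

```python
import copy
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Event:
    # Hypothetical, simplified event shape; Edlink's real payload may differ.
    id: str
    type: str        # "created", "updated", or "deleted"
    target_id: str   # ID of the entity that changed
    data: dict[str, Any] = field(default_factory=dict)


class EventStream:
    """Hypothetical in-memory stand-in for the ordered event stream."""

    def __init__(self) -> None:
        self.events: list[Event] = []

    def append(self, event: Event) -> None:
        self.events.append(event)

    def last_id(self) -> Optional[str]:
        return self.events[-1].id if self.events else None

    def after(self, event_id: Optional[str]) -> list[Event]:
        # Every event that follows event_id, in stream order.
        if event_id is None:
            return list(self.events)
        ids = [e.id for e in self.events]
        return self.events[ids.index(event_id) + 1:]


def apply_event(dataset: dict[str, dict[str, Any]], event: Event) -> None:
    if event.type == "created":
        dataset[event.target_id] = dict(event.data)
    elif event.type == "updated":
        # Merge only the properties carried by the event.
        dataset.setdefault(event.target_id, {}).update(event.data)
    elif event.type == "deleted":
        dataset.pop(event.target_id, None)
    else:
        raise ValueError(f"unknown event type: {event.type}")


def initial_sync(stream: EventStream, snapshot: dict[str, dict[str, Any]]):
    cursor = stream.last_id()          # Step 1: remember the latest event ID.
    local = copy.deepcopy(snapshot)    # Step 2: full sync of the dataset.
    return local, cursor


def incremental_sync(stream: EventStream, local: dict[str, dict[str, Any]], cursor: Optional[str]):
    # Step 3: replay every newer event, in order, without skipping any,
    # advancing the cursor after each event is applied.
    for event in stream.after(cursor):
        apply_event(local, event)
        cursor = event.id
    return cursor


stream = EventStream()
stream.append(Event("e1", "created", "math", {"name": "Math", "school_id": "alice"}))
local, cursor = initial_sync(stream, {"math": {"name": "Math", "school_id": "alice"}})

# Later, the district moves the class; the next hourly check picks it up.
stream.append(Event("e2", "updated", "math", {"school_id": "bob"}))
cursor = incremental_sync(stream, local, cursor)
```

After the second call, the local copy of "Math" points at the new school and the cursor sits on the last applied event, so the next hourly check starts from there.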
How you shouldn't utilize the event stream
Do not try to filter or skip events for any reason. There are no natural places to shard the event stream and process it piecemeal. Yes, this includes filtering by school ID. Why? Consider a basic example:
- Suppose you have a class called "Math" in a school called "Alice High School".
- You sync with Edlink and all is well with your dataset.
- One day, the district transfers "Math" to a different school called "Bob High School" (why they did this is not important; let's call it a clerical error).
- If you were to filter the event stream by school ID (i.e. first applying changes at Alice High School and then applying changes at Bob High School), you'd have a pretty big problem.
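- There is no event called "transfer", nor is there an event that removes the class from one school and creates it in another. Instead, you'll just see a single event that updates the school_id property on the class. That event carries Bob High School's ID, so when you process Alice High School it gets filtered out and "Math" never leaves Alice. When you process Bob High School, you receive an update for a class you never created there.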
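To make the failure concrete, here is a minimal sketch that replays the example both ways. The event shape, the apply_event helper, and the rule "a school's partition only sees events whose school_id matches it" are hypothetical simplifications chosen for illustration, not Edlink's actual payload or anyone's real filtering code.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Event:
    # Hypothetical, simplified event shape (not Edlink's actual payload).
    id: str
    type: str         # "created", "updated", or "deleted"
    target_id: str    # ID of the class being changed
    data: dict[str, Any] = field(default_factory=dict)


def apply_event(dataset: dict[str, dict[str, Any]], event: Event) -> None:
    if event.type == "deleted":
        dataset.pop(event.target_id, None)
    elif event.type == "created":
        dataset[event.target_id] = dict(event.data)
    else:  # "updated": merge only the changed properties
        dataset.setdefault(event.target_id, {}).update(event.data)


# State after the initial full sync: one class at Alice High School.
synced = {"math": {"name": "Math", "school_id": "alice"}}

# The district moves the class. There is no "transfer" event,
# only an update carrying the new school_id.
events = [Event("e2", "updated", "math", {"school_id": "bob"})]

# Correct: replay the whole stream, in order, into one dataset.
correct = {k: dict(v) for k, v in synced.items()}
for e in events:
    apply_event(correct, e)

# Incorrect: keep one partition per school and feed each partition
# only the events whose school_id matches that school.
partitions = {
    "alice": {"math": dict(synced["math"])},
    "bob": {},
}
for school, partition in partitions.items():
    for e in events:
        if e.data.get("school_id") == school:
            apply_event(partition, e)
```

With the full stream, "Math" ends up at Bob High School. With the filtered approach, Alice's partition keeps a stale copy of "Math", and Bob's partition holds a partial record containing only school_id, so the class now appears at both schools. Under this filtering rule, a later update that doesn't touch school_id at all (a rename, say) would be dropped by every partition.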
This example is relatively simple, and you could likely special-case it in your code (i.e. "simulate" a transfer event), but the number of such modifications you'd have to make is huge. Many types of entities reference schools. In some places the school ID is required, in others it's optional. In some places, entities can have more than one school ID (e.g. people).
All in all, trying to filter the event stream will be a dramatic waste of engineering time. Instead, we recommend that you focus your efforts on making your sync process handle a single, unfiltered stream of events, even if you ultimately group entities by school in your platform.
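One way to reconcile the two is to apply the whole stream first and derive any per-school grouping afterwards from the resulting state. The sketch below assumes a hypothetical local dataset where classes carry a single school_id and people carry a list named school_ids; those property names are illustrative assumptions, not Edlink's schema.

```python
from collections import defaultdict
from typing import Any


def group_by_school(dataset: dict[str, dict[str, Any]]) -> dict[str, set[str]]:
    """Build a school ID -> entity IDs index from an already-synced dataset."""
    index: dict[str, set[str]] = defaultdict(set)
    for entity_id, props in dataset.items():
        # Hypothetical property names: classes have one "school_id",
        # people may have several in "school_ids", others may have neither.
        school_ids = props.get("school_ids")
        if school_ids is None:
            school_ids = [props["school_id"]] if props.get("school_id") else []
        for school_id in school_ids:
            index[school_id].add(entity_id)
    return dict(index)


# A dataset that has already had the full event stream applied to it.
dataset = {
    "class:math": {"name": "Math", "school_id": "bob"},
    "person:jane": {"name": "Jane", "school_ids": ["alice", "bob"]},
    "district:1": {"name": "Example District"},
}
by_school = group_by_school(dataset)
```

Because the index is derived from the current state rather than from filtered events, a transfer like the one above needs no special handling: "Math" simply shows up under Bob High School the next time the index is built. For a large dataset you might maintain the index incrementally instead, but the grouping still happens after the stream is applied, not while reading it.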
If you have any questions, or would just like some engineering assistance, don't hesitate to drop us a line.
Until next time, ✌️.