Edlink Data Pipeline
The Edlink data pipeline is the backbone of our platform. It makes sure you get the right data, right when you need it. This guide will walk you through the basics of how the Edlink data pipeline works and what you can expect when you're working with our platform.
What is a Data Pipeline Anyway?
A data pipeline is a system that moves data from one place to another. Along the way, it may clean, organize, or process the data so it's ready to be used. Think of it like a factory conveyor belt for data: raw data goes in, gets refined, and comes out ready for analysis or storage.
On one end, Edlink ingests data from systems like an LMS or SIS. On the other end, it sends that data to your application via the Edlink API. In between, the data pipeline processes the data to make sure it's in the right format and structure for your application to use.
The major types of data processing that happen in the Edlink data pipeline are:
- Data Syncing - Pulling data from a source system, such as an LMS or SIS.
- Normalization - Ensuring that data is available in a consistent format.
- Enrichment - Adding additional data to make it more useful.
- Filtering - Removing data that isn't needed by the application.
- Transformation - Adjusting data for the needs of a particular integration.
- Validation - Confirming that data meets certain criteria.
- Change Tracking - Keeping track of changes to data over time.
Edlink organizes these actions into three distinct "phases" of the data pipeline: Sync, Enrich, and Materialize. Each phase has a specific purpose and set of actions that it performs on the data.
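To make the three phases concrete, here is a minimal sketch of how they might be chained together. It is an illustration only: the function names, record fields, and filtering rule are hypothetical and are not part of the Edlink API or its actual implementation.

```python
# Hypothetical illustration only: none of these names come from the Edlink API.

def sync(source_records):
    """Sync phase: pull records and normalize them into one consistent shape."""
    return [
        {
            "id": str(r["id"]),
            "name": r["name"].strip(),
            "role": r.get("role", "student").lower(),
        }
        for r in source_records
    ]


def enrich(records, extra_by_id):
    """Enrich phase: merge in additional data, keyed by record id."""
    return [{**r, **extra_by_id.get(r["id"], {})} for r in records]


def materialize(records, allowed_roles):
    """Materialize phase: keep only the records one application needs."""
    return [r for r in records if r["role"] in allowed_roles]


# Raw data as it might arrive from a source system (made-up sample).
raw = [{"id": 1, "name": " Ada ", "role": "Teacher"}, {"id": 2, "name": "Lin"}]

synced = sync(raw)
enriched = enrich(synced, {"1": {"timezone": "UTC"}})
for_app = materialize(enriched, allowed_roles={"teacher"})
```

Each phase takes the output of the previous one, which is why a pause in one phase does not have to stop the others.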
Data Sync
The data syncing process is responsible for pulling data from a source system, such as an LMS or SIS, and normalizing it against our standard data model. This process ensures that the data is in a consistent format and structure, making it easier to work with later on.
The process of syncing is (unsurprisingly) called a sync, and you can learn more about how syncs work under the hood in our guide to syncs. Syncs run at least once per day (the exact frequency varies by data provider). Through this process, Edlink forms a complete picture of the data in the source system, as well as what has been added, updated, or deleted.
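One simple way to picture change tracking is as a comparison between two snapshots of the source system. The sketch below assumes records are stored in dictionaries keyed by a record id. The function name and the sample data are hypothetical, and this is not a description of how Edlink computes changes internally.

```python
def diff_snapshots(previous, current):
    """Compare two snapshots (dicts keyed by record id) and list the changes."""
    added = sorted(current.keys() - previous.keys())
    deleted = sorted(previous.keys() - current.keys())
    updated = sorted(
        key
        for key in current.keys() & previous.keys()
        if current[key] != previous[key]
    )
    return {"added": added, "updated": updated, "deleted": deleted}


# Made-up snapshots from two consecutive syncs.
yesterday = {"u1": {"name": "Ada"}, "u2": {"name": "Lin"}, "u3": {"name": "Sam"}}
today = {"u1": {"name": "Ada"}, "u2": {"name": "Lin Chen"}, "u4": {"name": "Noor"}}

changes = diff_snapshots(yesterday, today)
```

Here `u4` appears only in the newer snapshot, `u3` only in the older one, and `u2` exists in both with different contents, so each falls into a different bucket.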
Typically, syncs are initiated automatically, but they can also be triggered manually by an IT administrator or an Edlink support team member. Automatic syncs can be paused by an IT administrator if needed. This is done by setting your source status to disabled in the Edlink dashboard. Pausing syncs will not stop the rest of the data pipeline, but it will prevent new data from being pulled from the source system.
Once a sync is complete, the data is ready to be enriched and materialized.
Data Enrichment
Data enrichment is the process of combining the synced data with additional sources of data to make it more complete (and hopefully, more useful). This can include things like adding metadata or joining data from multiple sources. It is also possible for Edlink to transform data during this step (e.g., by applying a data override), but this cannot currently be done through the Edlink dashboard, so you'll need to reach out to our support team for help. Transformations performed during the enrichment process affect all current and future applications that are connected to your school via Edlink.
Once an enrichment is complete, the data is ready to be materialized. If your school is connected to many different applications via Edlink, there will be a unique materialization created for each application. For example, if you connect to 10 applications via Edlink, there will be 10 materializations kicked off after the enrichment process is complete.
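The fan-out described above can be sketched as one job per connected application. The function, the job shape, and the identifiers below are all hypothetical and only illustrate the one-to-many relationship between an enrichment and its materializations.

```python
def plan_materializations(enriched_dataset_id, application_ids):
    """Return one materialization job per connected application."""
    return [
        {"dataset": enriched_dataset_id, "application": app_id}
        for app_id in application_ids
    ]


# Ten made-up application ids, mirroring the example in the text.
apps = [f"app-{n}" for n in range(1, 11)]
jobs = plan_materializations("enriched-dataset", apps)
```

All ten jobs read from the same enriched dataset, but each produces its own application-specific result.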
Data Materialization
Materialization is the process of taking the enriched data and making it available to your application. Edlink creates a unique version of the school's dataset for each application that is connected. This ensures that each application only receives the data it needs, in the exact way that it needs it.
Materializations can be paused by an IT administrator or a company if needed. This is done by setting your integration status to paused in the Edlink dashboard. Pausing materializations will not stop the rest of the data pipeline, but it will prevent new data from being sent to the application. That is, the school may continue to sync data to Edlink, but the application will not receive any new data until the materialization is unpaused. This can be useful at certain times of year (e.g., over the summer) when you may want to continue to allow teachers to access their classes from the prior school year.
Please note, even when you pause an integration, you may still see materializations running from time to time. However, these materializations will not result in changes to roster data while the integration is paused. Edlink runs other (internal) functions during materialization that need to continue to run even when roster changes are paused.
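The behavior of a paused integration can be sketched as follows. This is a simplified model under stated assumptions: the dictionary fields, the "active" status value, and the run counter (a stand-in for the internal functions mentioned above) are hypothetical. Only the "paused" status comes from this guide.

```python
def run_materialization(integration, enriched_roster):
    """Run one materialization; apply roster changes only when not paused."""
    # Stand-in for internal work that runs regardless of the pause.
    integration["runs"] += 1
    if integration["status"] != "paused":
        integration["roster"] = list(enriched_roster)
    return integration


integration = {"status": "paused", "roster": ["class-2023"], "runs": 0}

# While paused, the run still happens but the roster is left untouched.
run_materialization(integration, ["class-2024"])
roster_while_paused = list(integration["roster"])

# After unpausing, the next run delivers the new roster data.
integration["status"] = "active"
run_materialization(integration, ["class-2024"])
```

In this model, a run being recorded does not imply that roster data changed, which matches what you may observe for a paused integration.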