Exporting from Edlink to SFTP
This cookbook will guide you through creating a system that pulls data from the Edlink API, transforms it, and sends it to an SFTP server. By the end, you'll have a robust integration that can sync educational data from our system to yours in a containerized environment. This project is fully open source and you can find all the code on GitHub, released under the MIT License. Feel free to fork, modify, and adapt it to your needs.
Project Overview
The API-to-SFTP integration project solves a common problem in educational technology: getting data into systems that only support SFTP. Here's what the integration does:
- Connects to the Edlink API to fetch data from your integrations.
- Retrieves data about districts, schools, courses, classes, and people.
- Transforms the data into domain-specific CSV formats.
- Uploads these files to an SFTP server for consumption by other systems.
This pattern is useful for integrating with legacy systems that require file-based imports, or for creating periodic data snapshots for analytics, backups, or compliance requirements.
Setting Up the Project
Initialize the Project
First, create a new directory for your project and initialize it:
mkdir api-to-sftp
cd api-to-sftp
npm init -y
Install Dependencies
This project requires several key dependencies:
npm install @edlink/typescript axios papaparse promise-queue ssh2-sftp-client ts-node typescript
npm install --save-dev @types/node @types/papaparse @types/promise-queue @types/ssh2-sftp-client
| Package | Description |
| --- | --- |
| @edlink/typescript | SDK for interacting with the Edlink API |
| axios | HTTP client for making API requests |
| papaparse | CSV parser/generator for transforming data |
| promise-queue | Job queue for handling parallel operations |
| ssh2-sftp-client | SFTP client for uploading files |
| ts-node | TypeScript execution environment |
| typescript | TypeScript compiler |
Configure TypeScript
Create a `tsconfig.json` file with the following configuration:
{
"compilerOptions": {
"module": "Node16",
"target": "es2020",
"moduleResolution": "Node16",
"esModuleInterop": true,
"composite": true,
"incremental": true,
"declaration": true,
"sourceMap": true,
"emitDecoratorMetadata": true,
"experimentalDecorators": true,
"skipLibCheck": true,
"skipDefaultLibCheck": true,
"downlevelIteration": true,
"types": ["node"],
"emitDeclarationOnly": true,
"strict": false,
"outDir": "dist"
},
"exclude": ["**/node_modules", "**/dist", ".github", ".husky", ".vscode", ".yarn", "**/*.spec.ts", "**/*.test.ts", "**/*._test.ts"]
}
Update package.json Scripts
Modify your `package.json` to include a start script:
{
"scripts": {
"start": "node -r ts-node/register/transpile-only --max-old-space-size=4096 main.ts"
}
}
This script uses `ts-node` to run the TypeScript code directly and allocates extra memory for handling large data volumes.
Connecting to the Edlink API
Environment Variables
The first step is to set up environment variables for configuration. Create a new file called `main.ts` and start with the configuration:
import { Edlink, IntegrationState, Integration, TokenSetType } from '@edlink/typescript';
import axios from 'axios';
// Configuration from environment variables
const EDLINK_CLIENT_ID = process.env.EDLINK_CLIENT_ID;
const EDLINK_CLIENT_SECRET = process.env.EDLINK_CLIENT_SECRET;
// Initialize the Edlink API client
const edlink = new Edlink({
version: 2,
client_id: EDLINK_CLIENT_ID,
client_secret: EDLINK_CLIENT_SECRET
});
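If either credential is missing, the SDK will fail later with a less obvious error, so it can help to fail fast at startup. A small optional helper (not part of the original code) might look like:

```typescript
// Fail fast at startup if a required environment variable is absent
// (optional hardening; the helper name is our own).
function requireEnv(name: string): string {
    const value = process.env[name];
    if (!value) {
        throw new Error(`Missing required environment variable: ${name}`);
    }
    return value;
}
```

With this helper in place, the configuration lines become `const EDLINK_CLIENT_ID = requireEnv('EDLINK_CLIENT_ID');` and so on, turning a confusing downstream API error into an immediate, descriptive crash.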
Listing Integrations
Next, add code to fetch the list of integrations from the Edlink API:
async function sync() {
// List all available integrations
const { data } = await axios.get('https://ed.link/api/v1/integrations', {
headers: {
Authorization: `Bearer ${EDLINK_CLIENT_SECRET}`
}
});
// Filter to only the active integrations
const integrations: Integration[] = data.$data.filter((integration: any) => integration.status === IntegrationState.Active);
console.log(`Found ${integrations.length} active integrations.`);
}
This code uses axios to directly call the Edlink API to list all integrations, then filters to only include those with an "Active" status. This design decision ensures that we only process data for integrations that are currently operational, saving resources.
Fetching and Transforming Data
Create the Ingest Function
Now, add a function to fetch data for a specific integration:
async function ingest(integration: Integration) {
console.log('Starting sync for integration:', integration.id);
// Set the tokens for this integration so we can use the SDK to make requests
const context = edlink.use({
access_token: integration.access_token,
type: TokenSetType.Integration
});
// Storage for our API data
const result: Map<string, any[]> = new Map([
['districts', []],
['schools', []],
['sessions', []],
['subjects', []],
['courses', []],
['classes', []],
['people', []],
['enrollments', []],
['agents', []]
]);
// Fetch the data from the API
for await (const district of context.districts.list()) {
result.get('districts')!.push(district);
}
for await (const school of context.schools.list()) {
result.get('schools')!.push(school);
}
// Fetch the rest of the data...
for await (const session of context.sessions.list()) {
result.get('sessions')!.push(session);
}
for await (const course of context.courses.list({ expand: ['subject', 'session'] })) {
result.get('courses')!.push(course);
}
for await (const cls of context.classes.list()) {
result.get('classes')!.push(cls);
}
for await (const person of context.people.list()) {
result.get('people')!.push(person);
}
for await (const enrollment of context.enrollments.list()) {
result.get('enrollments')!.push(enrollment);
}
for await (const agent of context.agents.list()) {
result.get('agents')!.push(agent);
}
// Apply transformations (we'll define this next)
const formatted = format(result);
console.log('Data transformation complete for integration:', integration.id);
// Return formatted data for further processing
return formatted;
}
Creating Data Transformations
Next, create a function to transform the raw API data into the desired output formats:
import { Person, Role, IdentifierType, Gender, Agent, Class, Course, SessionType, GradeLevel, Enrollment, ClassState } from '@edlink/typescript';
function format(data: Map<string, any[]>): Map<string, any[]> {
// Define output file structure
const formatted: Map<string, any[]> = new Map([
['student', []],
['staff', []],
['parents', []],
['course_grades', []],
['course_catalog', []],
['case_load', []],
['case_load_members', []],
['case_load_import', []],
['gpa_import', []]
]);
// Indices for relationship lookups
const people: Map<string, Person> = new Map();
const classes: Map<string, Class> = new Map();
const courses: Map<string, Course> = new Map();
// Process people data first
for (const person of data.get('people')! as Iterable<Person>) {
// Save the person into the index
people.set(person.id, person);
// Process students
if (person.roles.includes(Role.Student)) {
const iso = person.demographics.birthday?.toString();
const school = person.school_ids.sort().pop();
formatted.get('student')!.push({
SchoolID: school,
StudentID: person.id,
Email: person.email,
FirstName: person.first_name,
MiddleName: person.middle_name,
LastName: person.last_name,
// Format all required student fields...
DateOfBirth: iso ? iso.slice(5, 7) + '/' + iso.slice(8, 10) + '/' + iso.slice(0, 4) : null,
Gender: person.demographics.gender === Gender.Other ? 'X' : person.demographics.gender
// Additional fields...
});
// Add to GPA import
formatted.get('gpa_import')!.push({
SchoolID: school,
StudentID: person.id,
CGPA: person.gpa
});
}
// Process teachers
if (person.roles.includes(Role.Teacher)) {
formatted.get('staff')!.push({
SchoolID: person.school_ids.sort().pop(),
StaffID: person.id,
FirstName: person.first_name,
LastName: person.last_name,
Email: person.email,
Role: 'Teacher'
});
}
// Process administrators
if (person.roles.includes(Role.Administrator)) {
formatted.get('staff')!.push({
SchoolID: person.school_ids.sort().pop(),
StaffID: person.id,
FirstName: person.first_name,
LastName: person.last_name,
Email: person.email,
Role: 'Schooladmin'
});
}
}
// Process parent/guardian relationships
for (const agent of data.get('agents')! as Iterable<Agent>) {
const target = people.get(agent.target_id);
const observer = people.get(agent.observer_id);
if (!target || !observer) {
continue;
}
formatted.get('parents')!.push({
SchoolID: target.school_ids.sort().pop(),
StudentID: target.id,
FirstName: observer.first_name,
LastName: observer.last_name,
Email: observer.email,
Phone: observer.phone
});
}
// Build course catalog
for (const course of data.get('courses')! as Iterable<Course>) {
courses.set(course.id, course);
formatted.get('course_catalog')!.push({
SubjectName: course.subject?.name,
CourseTitle: course.name,
CourseNumber: course.code,
// Include course catalog fields...
TermType: (() => {
switch (course.session?.type) {
case SessionType.SchoolYear:
return 'Year';
case SessionType.Semester:
return 'Semester';
default:
return 'Quarter';
}
})()
// Additional fields...
});
}
// Index classes by ID
for (const cls of data.get('classes')! as Iterable<Class>) {
if (!cls.course_id) {
continue;
}
classes.set(cls.id, cls);
}
// Process enrollments and create course grades
for (const enrollment of data.get('enrollments')! as Iterable<Enrollment>) {
if (enrollment.role !== Role.Student) {
continue;
}
const student = people.get(enrollment.person_id);
if (!student) {
continue;
}
const cls = classes.get(enrollment.class_id);
if (!cls?.course_id) {
continue;
}
const course = courses.get(cls.course_id);
if (!course) {
continue;
}
formatted.get('course_grades')!.push({
SchoolID: cls.school_id,
StudentID: student.id,
SchoolYear: course.session?.end_date?.slice(0, 4),
// Include course grade fields...
WorkInProgress: [ClassState.Active, ClassState.Upcoming].includes(cls.state) ? 'Y' : 'N',
CourseGrade: enrollment.final_numeric_grade
});
}
return formatted;
}
This transformation function is where the "business logic" lives. It takes raw API data and structures it into domain-specific formats required by the target system. Key design decisions in this function include:
- Creating indexed mappings for efficient lookups (people, classes, courses)
- Handling role-based transformations (students, teachers, administrators)
- Formatting data fields to match destination system requirements
- Filtering records to include only relevant data
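The date handling in the student branch above is easy to get wrong, so it can be worth pulling it into a small, testable helper. A sketch (the function name is ours, not part of the original code):

```typescript
// Convert an ISO date string (YYYY-MM-DD...) into the MM/DD/YYYY form
// used for the DateOfBirth column; returns null for missing input.
function isoToUsDate(iso: string | null | undefined): string | null {
    if (!iso) {
        return null;
    }
    // Slice positions in "YYYY-MM-DD": year = 0-4, month = 5-7, day = 8-10
    return iso.slice(5, 7) + '/' + iso.slice(8, 10) + '/' + iso.slice(0, 4);
}

// isoToUsDate('2010-04-09') → '04/09/2010'
```

Isolating the slicing logic like this keeps the transformation loop readable and makes the format conversion trivially unit-testable.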
Sending Data to an SFTP Server
Configure SFTP Connection
Now add the SFTP configuration to your main.ts file:
import SFTPClient from 'ssh2-sftp-client';
import Papa from 'papaparse';
// SFTP Configuration
const SFTP_USERNAME = process.env.SFTP_USERNAME;
const SFTP_PASSWORD = process.env.SFTP_PASSWORD;
const SFTP_PORT = process.env.SFTP_PORT ? parseInt(process.env.SFTP_PORT, 10) : 22; // ssh2-sftp-client expects a numeric port
const SFTP_HOST = process.env.SFTP_HOST;
const SFTP_CONFIG = {
host: SFTP_HOST,
port: SFTP_PORT,
username: SFTP_USERNAME,
password: SFTP_PASSWORD
};
// Initialize SFTP client
const sftp = new SFTPClient();
Update the Ingest Function to Upload Files
Modify the ingest function to upload the formatted data to the SFTP server:
async function ingest(integration: Integration) {
// ... existing code to fetch and format data
try {
console.log('Opening SFTP connection.');
// Connect to the SFTP server
await sftp.connect(SFTP_CONFIG);
console.log('Connected to SFTP.');
// Upload each formatted file
for (const [filename, data] of formatted.entries()) {
if (data.length === 0) {
continue;
}
// Convert data to CSV and upload
await sftp.put(Buffer.from(Papa.unparse(data)), `/remote/path/${integration.id}/${filename}.csv`);
}
} catch (err) {
console.error('SFTP Upload Error:', err);
} finally {
await sftp.end();
}
console.log('Sync complete for integration:', integration.id);
}
This code:
- Connects to the SFTP server
- Converts each data array to CSV format using PapaParse
- Uploads each file to a dedicated folder on the SFTP server
- Properly closes the connection regardless of success or failure
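Papa.unparse takes care of CSV quoting for you; conceptually it does something like the following (a simplified sketch that covers only the basic quoting rule, not the full feature set of the real library):

```typescript
// Minimal illustration of CSV serialization: quote fields that contain
// delimiters, quotes, or newlines, and double any embedded quotes.
function toCsv(rows: Record<string, string | number | null>[]): string {
    if (rows.length === 0) {
        return '';
    }
    const escape = (value: string | number | null): string => {
        const text = value === null ? '' : String(value);
        return /[",\r\n]/.test(text) ? '"' + text.replace(/"/g, '""') + '"' : text;
    };
    // Header row is derived from the keys of the first record
    const headers = Object.keys(rows[0]);
    const lines = [headers.map(escape).join(',')];
    for (const row of rows) {
        lines.push(headers.map(h => escape(row[h])).join(','));
    }
    return lines.join('\r\n');
}
```

In the actual project you should keep using Papa.unparse, which also handles delimiter detection, custom newlines, and other edge cases; the sketch is only meant to demystify what the CSV conversion step produces.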
Running Multiple Integrations in Parallel
Configure the Job Queue
To handle multiple integrations efficiently, add a job queue with parallelization:
import Queue from 'promise-queue';
// Configurable parallelization
const SYNC_INTERVAL = process.env.SYNC_INTERVAL ? parseInt(process.env.SYNC_INTERVAL) : 1000 * 60 * 60 * 24; // 24 hours
const MAX_PARALLEL_JOBS = process.env.MAX_PARALLEL_JOBS ? parseInt(process.env.MAX_PARALLEL_JOBS) : 1;
// Initialize queue with configurable concurrency
const queue = new Queue(MAX_PARALLEL_JOBS, Infinity);
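Note that parseInt silently returns NaN for malformed values, which would break the interval. A defensive parsing helper (an optional refinement; the name is ours) might look like:

```typescript
// Parse a positive integer from an environment variable, falling back
// to a default when the value is missing or malformed.
function intFromEnv(raw: string | undefined, fallback: number): number {
    const parsed = raw ? parseInt(raw, 10) : NaN;
    return Number.isInteger(parsed) && parsed > 0 ? parsed : fallback;
}
```

With it, the configuration becomes `const MAX_PARALLEL_JOBS = intFromEnv(process.env.MAX_PARALLEL_JOBS, 1);`, and a typo in the environment variable degrades gracefully to the default instead of producing a NaN interval.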
Update the Sync Function
Now modify the sync function to use the queue:
async function sync() {
// ... existing code to fetch integrations
// Process each integration using the queue
for (const integration of integrations) {
queue.add(() => ingest(integration));
}
}
// Run sync on startup and then on interval
setInterval(sync, SYNC_INTERVAL);
sync();
This design allows:
- Configurable concurrency through environment variables
- Automatic throttling to prevent overloading the API or SFTP server
- Periodic syncing on a customizable interval
Containerizing with Docker
Create a Dockerfile
Create a `Dockerfile` to containerize the application:
FROM node:18-alpine
WORKDIR /usr/src/edlink/api-to-sftp
ENV NODE_ENV=production
COPY . .
RUN apk add --no-cache --virtual .gyp \
python3 \
make \
g++ \
&& yarn \
&& apk del .gyp
ENTRYPOINT [ "yarn", "start" ]
This Dockerfile:
- Uses Node.js 18 on Alpine Linux for a small image footprint
- Installs build dependencies temporarily to compile native modules
- Installs dependencies using Yarn
- Removes build dependencies to keep the image small
- Configures the container to run the application on startup
Building and Running the Container
Build and run the Docker container:
# Build the container
docker build -t api-to-sftp .
# Run the container with environment variables
docker run -d \
--name api-to-sftp \
-e EDLINK_CLIENT_ID=your_client_id \
-e EDLINK_CLIENT_SECRET=your_client_secret \
-e SFTP_HOST=your_sftp_host \
-e SFTP_PORT=your_sftp_port \
-e SFTP_USERNAME=your_sftp_username \
-e SFTP_PASSWORD=your_sftp_password \
-e SYNC_INTERVAL=86400000 \
-e MAX_PARALLEL_JOBS=2 \
api-to-sftp
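If you prefer to keep the configuration in a file, the same setup can be expressed with Docker Compose (a sketch; the service name and placeholder values are assumptions, not part of the original project):

```yaml
# docker-compose.yml -- equivalent of the docker run command above
services:
  api-to-sftp:
    build: .
    environment:
      EDLINK_CLIENT_ID: your_client_id
      EDLINK_CLIENT_SECRET: your_client_secret
      SFTP_HOST: your_sftp_host
      SFTP_PORT: your_sftp_port
      SFTP_USERNAME: your_sftp_username
      SFTP_PASSWORD: your_sftp_password
      SYNC_INTERVAL: "86400000"
      MAX_PARALLEL_JOBS: "2"
    restart: unless-stopped
```

Run it with `docker compose up -d`; the `restart: unless-stopped` policy also restarts the sync process automatically if it crashes.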
Monitoring and Maintenance
Logging
The application logs important events to the console. When running in Docker, you can view these logs with:
docker logs -f api-to-sftp
Updating CSV Formats
To modify the CSV format, you can update the `format` function. For example, to add a new field to the student records:
formatted.get('student')!.push({
// Existing fields...
SchoolID: school,
StudentID: person.id,
// Add a new field
EnrollmentDate: person.created.slice(0, 10)
});
Adding New Output Files
To add a new output file, update the `formatted` map in the `format` function:
const formatted: Map<string, any[]> = new Map([
// Existing files...
['student', []],
['staff', []],
// Add a new file
['attendance', []]
]);
// Later in the function, populate the new file (this assumes the ingest
// function has also been updated to fetch attendance data into the result map)
for (const attendance of data.get('attendance')! as Iterable<any>) {
formatted.get('attendance')!.push({
// Format attendance data
});
}
Design Decisions and Lessons Learned
API Access Pattern
The project uses the official Edlink SDK for most operations, but directly calls the API for listing integrations. This hybrid approach:
- Takes advantage of the SDK's convenience for detailed data operations
- Provides flexibility when the SDK doesn't fully cover necessary functionality
Filtering Active Integrations
The code filters for active integrations only, which:
- Reduces API load by avoiding processing inactive integrations
- Prevents uploading potentially outdated data from deprecated integrations
- Makes the process more efficient overall
Using Promise-Queue for Parallelization
The promise-queue library provides:
- Configurable concurrency to optimize resource usage
- Automatic throttling to prevent overwhelming external systems
- Sequential execution within each integration to maintain data consistency
Map-based Data Structures
The code uses JavaScript Map objects extensively to:
- Create indexed lookups for efficient relationship resolution
- Organize formatted data by file type
- Handle large datasets efficiently
Error Handling and Connection Management
The SFTP connection is wrapped in a try/catch/finally block to ensure:
- Connection errors are caught and logged
- The SFTP connection is always properly closed
- Failed syncs don't affect subsequent operations
Container-based Deployment
Containerizing the application provides:
- Consistent execution environment across different hosting platforms
- Simple configuration through environment variables
- Built-in isolation for security and resource management
- Straightforward deployment and scaling options
By following this cookbook, you've created a robust integration that can reliably move data from the Edlink API to an SFTP server, with configurable parallelization, containerization, and maintainable data transformations.