Exporting from Edlink to SFTP
This cookbook will guide you through creating a system that pulls data from the Edlink API, transforms it, and sends it to an SFTP server. By the end, you'll have a robust integration that can sync educational data from our system to yours in a containerized environment. This project is fully open source and you can find all the code on GitHub, released under the MIT License. Feel free to fork, modify, and adapt it to your needs.
Project Overview
The API-to-SFTP integration project solves a common problem in educational technology: getting data into systems that only support SFTP. Here's what the integration does:
- Connects to the Edlink API to fetch data from your integrations.
- Retrieves data about districts, schools, courses, classes, and people.
- Transforms the data into domain-specific CSV formats.
- Uploads these files to an SFTP server for consumption by other systems.
This pattern is useful for integrating with legacy systems that require file-based imports, or for creating periodic data snapshots for analytics, backups, or compliance requirements.
Setting Up the Project
Initialize the Project
First, create a new directory for your project and initialize it:
mkdir api-to-sftp
cd api-to-sftp
npm init -y
Install Dependencies
This project requires several key dependencies:
npm install @edlink/typescript axios papaparse promise-queue ssh2-sftp-client ts-node typescript
npm install --save-dev @types/node @types/papaparse @types/promise-queue @types/ssh2-sftp-client
| Package | Description |
| --- | --- |
| @edlink/typescript | SDK for interacting with the Edlink API |
| axios | HTTP client for making API requests |
| papaparse | CSV parser/generator for transforming data |
| promise-queue | Job queue for handling parallel operations |
| ssh2-sftp-client | SFTP client for uploading files |
| ts-node | TypeScript execution environment |
| typescript | TypeScript compiler |
Configure TypeScript
Create a `tsconfig.json` file with the following configuration:
{
"compilerOptions": {
"module": "Node16",
"target": "es2020",
"moduleResolution": "Node16",
"esModuleInterop": true,
"composite": true,
"incremental": true,
"declaration": true,
"sourceMap": true,
"emitDecoratorMetadata": true,
"experimentalDecorators": true,
"skipLibCheck": true,
"skipDefaultLibCheck": true,
"downlevelIteration": true,
"types": ["node"],
"emitDeclarationOnly": true,
"strict": false,
"outDir": "dist"
},
"exclude": ["**/node_modules", "**/dist", ".github", ".husky", ".vscode", ".yarn", "**/*.spec.ts", "**/*.test.ts", "**/*._test.ts"]
}
Update package.json Scripts
Modify your `package.json` to include a start script:
{
"scripts": {
"start": "node -r ts-node/register/transpile-only --max-old-space-size=4096 main.ts"
}
}
This script uses `ts-node` to run the TypeScript code directly and allocates extra memory for handling large data volumes.
Connecting to the Edlink API
Environment Variables
The first step is to set up environment variables for configuration. Create a new file called `main.ts` and start with the configuration:
import { Edlink, IntegrationState, Integration, TokenSetType } from '@edlink/typescript';
import axios from 'axios';
// Configuration from environment variables
const EDLINK_CLIENT_ID = process.env.EDLINK_CLIENT_ID;
const EDLINK_CLIENT_SECRET = process.env.EDLINK_CLIENT_SECRET;
// Initialize the Edlink API client
const edlink = new Edlink({
version: 2,
client_id: EDLINK_CLIENT_ID,
client_secret: EDLINK_CLIENT_SECRET
});
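If either credential is missing, the SDK will fail later with a less obvious error, so it can help to fail fast at startup. A small optional helper (not part of the original code) might look like:

```typescript
// Fail fast at startup if a required environment variable is absent
// (optional hardening; the helper name is our own).
function requireEnv(name: string): string {
    const value = process.env[name];
    if (!value) {
        throw new Error(`Missing required environment variable: ${name}`);
    }
    return value;
}
```

With this helper in place, the configuration lines become `const EDLINK_CLIENT_ID = requireEnv('EDLINK_CLIENT_ID');` and so on, turning a confusing downstream API error into an immediate, descriptive crash.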
Listing Integrations
Next, add code to fetch the list of integrations from the Edlink API:
async function sync() {
// List all available integrations
const { data } = await axios.get('https://ed.link/api/v1/integrations', {
headers: {
Authorization: `Bearer ${EDLINK_CLIENT_SECRET}`
}
});
// Filter to only the active integrations
const integrations: Integration[] = data.$data.filter((integration: any) => integration.status === IntegrationState.Active);
console.log(`Found ${integrations.length} active integrations.`);
}
This code uses axios to directly call the Edlink API to list all integrations, then filters to only include those with an "Active" status. This design decision ensures that we only process data for integrations that are currently operational, saving resources.
Fetching and Transforming Data
Create the Ingest Function
Now, add a function to fetch data for a specific integration:
async function ingest(integration: Integration) {
console.log('Starting sync for integration:', integration.id);
// Set the tokens for this integration so we can use the SDK to make requests
const context = edlink.use({
access_token: integration.access_token,
type: TokenSetType.Integration
});
// Storage for our API data
const result: Map<string, any[]> = new Map([
['districts', []],
['schools', []],
['sessions', []],
['subjects', []],
['courses', []],
['classes', []],
['people', []],
['enrollments', []],
['agents', []]
]);
// Fetch the data from the API
for await (const district of context.districts.list()) {
result.get('districts')!.push(district);
}
for await (const school of context.schools.list()) {
result.get('schools')!.push(school);
}
// Fetch the rest of the data...
for await (const session of context.sessions.list()) {
result.get('sessions')!.push(session);
}
for await (const course of context.courses.list({ expand: ['subject', 'session'] })) {
result.get('courses')!.push(course);
}
for await (const cls of context.classes.list()) {
result.get('classes')!.push(cls);
}
for await (const person of context.people.list()) {
result.get('people')!.push(person);
}
for await (const enrollment of context.enrollments.list()) {
result.get('enrollments')!.push(enrollment);
}
for await (const agent of context.agents.list()) {
result.get('agents')!.push(agent);
}
// Apply transformations (we'll define this next)
const formatted = format(result);
console.log('Data transformation complete for integration:', integration.id);
// Return formatted data for further processing
return formatted;
}
Creating Data Transformations
Next, create a function to transform the raw API data into the desired output formats:
import { Person, Role, IdentifierType, Gender, Agent, Class, Course, SessionType, GradeLevel, Enrollment, ClassState } from '@edlink/typescript';
function format(data: Map<string, any[]>): Map<string, any[]> {
// Define output file structure
const formatted: Map<string, any[]> = new Map([
['student', []],
['staff', []],
['parents', []],
['course_grades', []],
['course_catalog', []],
['case_load', []],
['case_load_members', []],
['case_load_import', []],
['gpa_import', []]
]);
// Indices for relationship lookups
const people: Map<string, Person> = new Map();
const classes: Map<string, Class> = new Map();
const courses: Map<string, Course> = new Map();
// Process people data first
for (const person of data.get('people')! as Iterable<Person>) {
// Save the person into the index
people.set(person.id, person);
// Process students
if (person.roles.includes(Role.Student)) {
const iso = person.demographics.birthday?.toString();
const school = person.school_ids.sort().pop();
formatted.get('student')!.push({
SchoolID: school,
StudentID: person.id,
Email: person.email,
FirstName: person.first_name,
MiddleName: person.middle_name,
LastName: person.last_name,
// Format all required student fields...
DateOfBirth: iso ? iso.slice(5, 7) + '/' + iso.slice(8, 10) + '/' + iso.slice(0, 4) : null,
Gender: person.demographics.gender === Gender.Other ? 'X' : person.demographics.gender
// Additional fields...
});
// Add to GPA import
formatted.get('gpa_import')!.push({
SchoolID: school,
StudentID: person.id,
CGPA: person.gpa
});
}
// Process teachers
if (person.roles.includes(Role.Teacher)) {
formatted.get('staff')!.push({
SchoolID: person.school_ids.sort().pop(),
StaffID: person.id,
FirstName: person.first_name,
LastName: person.last_name,
Email: person.email,
Role: 'Teacher'
});
}
// Process administrators
if (person.roles.includes(Role.Administrator)) {
formatted.get('staff')!.push({
SchoolID: person.school_ids.sort().pop(),
StaffID: person.id,
FirstName: person.first_name,
LastName: person.last_name,
Email: person.email,
Role: 'Schooladmin'
});
}
}
// Process parent/guardian relationships
for (const agent of data.get('agents')! as Iterable<Agent>) {
const target = people.get(agent.target_id);
const observer = people.get(agent.observer_id);
if (!target || !observer) {
continue;
}
formatted.get('parents')!.push({
SchoolID: target.school_ids.sort().pop(),
StudentID: target.id,
FirstName: observer.first_name,
LastName: observer.last_name,
Email: observer.email,
Phone: observer.phone
});
}
// Build course catalog
for (const course of data.get('courses')! as Iterable<Course>) {
courses.set(course.id, course);
formatted.get('course_catalog')!.push({
SubjectName: course.subject?.name,
CourseTitle: course.name,
CourseNumber: course.code,
// Include course catalog fields...
TermType: (() => {
switch (course.session?.type) {
case SessionType.SchoolYear:
return 'Year';
case SessionType.Semester:
return 'Semester';
default:
return 'Quarter';
}
})()
// Additional fields...
});
}
// Index classes by ID
for (const cls of data.get('classes')! as Iterable<Class>) {
if (!cls.course_id) {
continue;
}
classes.set(cls.id, cls);
}
// Process enrollments and create course grades
for (const enrollment of data.get('enrollments')! as Iterable<Enrollment>) {
if (enrollment.role !== Role.Student) {
continue;
}
const student = people.get(enrollment.person_id);
if (!student) {
continue;
}
const cls = classes.get(enrollment.class_id);
if (!cls?.course_id) {
continue;
}
const course = courses.get(cls.course_id);
if (!course) {
continue;
}
formatted.get('course_grades')!.push({
SchoolID: cls.school_id,
StudentID: student.id,
SchoolYear: course.session?.end_date?.slice(0, 4),
// Include course grade fields...
WorkInProgress: [ClassState.Active, ClassState.Upcoming].includes(cls.state) ? 'Y' : 'N',
CourseGrade: enrollment.final_numeric_grade
});
}
return formatted;
}
This transformation function is where the "business logic" lives. It takes raw API data and structures it into domain-specific formats required by the target system. Key design decisions in this function include:
- Creating indexed mappings for efficient lookups (people, classes, courses)
- Handling role-based transformations (students, teachers, administrators)
- Formatting data fields to match destination system requirements
- Filtering records to include only relevant data
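The date handling in the student branch above is easy to get wrong, so it can be worth pulling it into a small, testable helper. A sketch (the function name is ours, not part of the original code):

```typescript
// Convert an ISO date string (YYYY-MM-DD...) into the MM/DD/YYYY form
// used for the DateOfBirth column; returns null for missing input.
function isoToUsDate(iso: string | null | undefined): string | null {
    if (!iso) {
        return null;
    }
    // Slice positions in "YYYY-MM-DD": year = 0-4, month = 5-7, day = 8-10
    return iso.slice(5, 7) + '/' + iso.slice(8, 10) + '/' + iso.slice(0, 4);
}

// isoToUsDate('2010-04-09') → '04/09/2010'
```

Isolating the slicing logic like this keeps the transformation loop readable and makes the format conversion trivially unit-testable.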
Sending Data to an SFTP Server
Configure SFTP Connection
Now add the SFTP configuration to your main.ts file:
import SFTPClient from 'ssh2-sftp-client';
import Papa from 'papaparse';
// SFTP Configuration
const SFTP_USERNAME = process.env.SFTP_USERNAME;
const SFTP_PASSWORD = process.env.SFTP_PASSWORD;
const SFTP_PORT = process.env.SFTP_PORT ? parseInt(process.env.SFTP_PORT, 10) : 22; // ssh2-sftp-client expects a numeric port
const SFTP_HOST = process.env.SFTP_HOST;
const SFTP_CONFIG = {
host: SFTP_HOST,
port: SFTP_PORT,
username: SFTP_USERNAME,
password: SFTP_PASSWORD
};
// Initialize SFTP client
const sftp = new SFTPClient();
Update the Ingest Function to Upload Files
Modify the ingest function to upload the formatted data to the SFTP server:
async function ingest(integration: Integration) {
// ... existing code to fetch and format data
try {
console.log('Opening SFTP connection.');
// Connect to the SFTP server
await sftp.connect(SFTP_CONFIG);
console.log('Connected to SFTP.');
// Upload each formatted file
for (const [filename, data] of formatted.entries()) {
if (data.length === 0) {
continue;
}
// Convert data to CSV and upload
await sftp.put(Buffer.from(Papa.unparse(data)), `/remote/path/${integration.id}/${filename}.csv`);
}
} catch (err) {
console.error('SFTP Upload Error:', err);
} finally {
await sftp.end();
}
console.log('Sync complete for integration:', integration.id);
}
This code:
- Connects to the SFTP server
- Converts each data array to CSV format using PapaParse
- Uploads each file to a dedicated folder on the SFTP server
- Properly closes the connection regardless of success or failure
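Papa.unparse takes care of CSV quoting for you; conceptually it does something like the following (a simplified sketch that covers only the basic quoting rule, not the full feature set of the real library):

```typescript
// Minimal illustration of CSV serialization: quote fields that contain
// delimiters, quotes, or newlines, and double any embedded quotes.
function toCsv(rows: Record<string, string | number | null>[]): string {
    if (rows.length === 0) {
        return '';
    }
    const escape = (value: string | number | null): string => {
        const text = value === null ? '' : String(value);
        return /[",\r\n]/.test(text) ? '"' + text.replace(/"/g, '""') + '"' : text;
    };
    // Header row is derived from the keys of the first record
    const headers = Object.keys(rows[0]);
    const lines = [headers.map(escape).join(',')];
    for (const row of rows) {
        lines.push(headers.map(h => escape(row[h])).join(','));
    }
    return lines.join('\r\n');
}
```

In the actual project you should keep using Papa.unparse, which also handles delimiter detection, custom newlines, and other edge cases; the sketch is only meant to demystify what the CSV conversion step produces.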
Running Multiple Integrations in Parallel
Configure the Job Queue
To handle multiple integrations efficiently, add a job queue with parallelization:
import Queue from 'promise-queue';
// Configurable parallelization
const SYNC_INTERVAL = process.env.SYNC_INTERVAL ? parseInt(process.env.SYNC_INTERVAL) : 1000 * 60 * 60 * 24; // 24 hours
const MAX_PARALLEL_JOBS = process.env.MAX_PARALLEL_JOBS ? parseInt(process.env.MAX_PARALLEL_JOBS) : 1;
// Initialize queue with configurable concurrency
const queue = new Queue(MAX_PARALLEL_JOBS, Infinity);
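Note that parseInt silently returns NaN for malformed values, which would break the interval. A defensive parsing helper (an optional refinement; the name is ours) might look like:

```typescript
// Parse a positive integer from an environment variable, falling back
// to a default when the value is missing or malformed.
function intFromEnv(raw: string | undefined, fallback: number): number {
    const parsed = raw ? parseInt(raw, 10) : NaN;
    return Number.isInteger(parsed) && parsed > 0 ? parsed : fallback;
}
```

With it, the configuration becomes `const MAX_PARALLEL_JOBS = intFromEnv(process.env.MAX_PARALLEL_JOBS, 1);`, and a typo in the environment variable degrades gracefully to the default instead of producing a NaN interval.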
Update the Sync Function
Now modify the sync function to use the queue:
async function sync() {
// ... existing code to fetch integrations
// Process each integration using the queue
for (const integration of integrations) {
queue.add(() => ingest(integration));
}
}
// Run sync on startup and then on interval
setInterval(sync, SYNC_INTERVAL);
sync();
This design allows:
- Configurable concurrency through environment variables
- Automatic throttling to prevent overloading the API or SFTP server
- Periodic syncing on a customizable interval
Containerizing with Docker
Create a Dockerfile
Create a `Dockerfile` to containerize the application:
FROM node:18-alpine
WORKDIR /usr/src/edlink/api-to-sftp
ENV NODE_ENV=production
COPY . .
RUN apk add --no-cache --virtual .gyp \
python3 \
make \
g++ \
&& yarn \
&& apk del .gyp
ENTRYPOINT [ "yarn", "start" ]
This Dockerfile:
- Uses Node.js 18 on Alpine Linux for a small image footprint
- Installs build dependencies temporarily to compile native modules
- Installs dependencies using Yarn
- Removes build dependencies to keep the image small
- Configures the container to run the application on startup
Building and Running the Container
Build and run the Docker container:
# Build the container
docker build -t api-to-sftp .
# Run the container with environment variables
docker run -d \
--name api-to-sftp \
-e EDLINK_CLIENT_ID=your_client_id \
-e EDLINK_CLIENT_SECRET=your_client_secret \
-e SFTP_HOST=your_sftp_host \
-e SFTP_PORT=your_sftp_port \
-e SFTP_USERNAME=your_sftp_username \
-e SFTP_PASSWORD=your_sftp_password \
-e SYNC_INTERVAL=86400000 \
-e MAX_PARALLEL_JOBS=2 \
api-to-sftp
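If you prefer to keep the configuration in a file, the same setup can be expressed with Docker Compose (a sketch; the service name and placeholder values are assumptions, not part of the original project):

```yaml
# docker-compose.yml -- equivalent of the docker run command above
services:
  api-to-sftp:
    build: .
    environment:
      EDLINK_CLIENT_ID: your_client_id
      EDLINK_CLIENT_SECRET: your_client_secret
      SFTP_HOST: your_sftp_host
      SFTP_PORT: your_sftp_port
      SFTP_USERNAME: your_sftp_username
      SFTP_PASSWORD: your_sftp_password
      SYNC_INTERVAL: "86400000"
      MAX_PARALLEL_JOBS: "2"
    restart: unless-stopped
```

Run it with `docker compose up -d`; the `restart: unless-stopped` policy also restarts the sync process automatically if it crashes.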
Monitoring and Maintenance
Logging
The application logs important events to the console. When running in Docker, you can view these logs with:
docker logs -f api-to-sftp
Updating CSV Formats
To modify the CSV format, you can update the `format` function. For example, to add a new field to the student records:
formatted.get('student')!.push({
// Existing fields...
SchoolID: school,
StudentID: person.id,
// Add a new field
EnrollmentDate: person.created.slice(0, 10)
});
Adding New Output Files
To add a new output file, update the `formatted` map in the `format` function:
const formatted: Map<string, any[]> = new Map([
// Existing files...
['student', []],
['staff', []],
// Add a new file
['attendance', []]
]);
// Later in the function, populate the new file (this assumes the ingest
// function has also been updated to fetch attendance data into the result map)
for (const attendance of data.get('attendance')! as Iterable<any>) {
formatted.get('attendance')!.push({
// Format attendance data
});
}
Design Decisions and Lessons Learned
API Access Pattern
The project uses the official Edlink SDK for most operations, but directly calls the API for listing integrations. This hybrid approach:
- Takes advantage of the SDK's convenience for detailed data operations
- Provides flexibility when the SDK doesn't fully cover necessary functionality
Filtering Active Integrations
The code filters for active integrations only, which:
- Reduces API load by avoiding processing inactive integrations
- Prevents uploading potentially outdated data from deprecated integrations
- Makes the process more efficient overall
Using Promise-Queue for Parallelization
The promise-queue library provides:
- Configurable concurrency to optimize resource usage
- Automatic throttling to prevent overwhelming external systems
- Sequential execution within each integration to maintain data consistency
Map-based Data Structures
The code uses JavaScript Map objects extensively to:
- Create indexed lookups for efficient relationship resolution
- Organize formatted data by file type
- Handle large datasets efficiently
Error Handling and Connection Management
The SFTP connection is wrapped in a try/catch/finally block to ensure:
- Connection errors are caught and logged
- The SFTP connection is always properly closed
- Failed syncs don't affect subsequent operations
Container-based Deployment
Containerizing the application provides:
- Consistent execution environment across different hosting platforms
- Simple configuration through environment variables
- Built-in isolation for security and resource management
- Straightforward deployment and scaling options
By following this cookbook, you've created a robust integration that can reliably move data from the Edlink API to an SFTP server, with configurable parallelization, containerization, and maintainable data transformations.