Exporting from Edlink to SFTP

This cookbook will guide you through creating a system that pulls data from the Edlink API, transforms it, and sends it to an SFTP server. By the end, you'll have a robust, containerized integration that can sync educational data from our system to yours. This project is fully open source; all of the code is available on GitHub under the MIT License. Feel free to fork, modify, and adapt it to your needs.

Project Overview

The API-to-SFTP integration project solves a common problem in educational technology: getting data into systems that only support SFTP. Here's what the integration does:

  1. Connects to the Edlink API to fetch data from your integrations.
  2. Retrieves data about districts, schools, courses, classes, and people.
  3. Transforms the data into domain-specific CSV formats.
  4. Uploads these files to an SFTP server for consumption by other systems.

This pattern is useful for integrating with legacy systems that require file-based imports, or for creating periodic data snapshots for analytics, backups, or compliance requirements.

Setting Up the Project

Initialize the Project

First, create a new directory for your project and initialize it:

mkdir api-to-sftp
cd api-to-sftp
npm init -y

Install Dependencies

This project requires several key dependencies:

npm install @edlink/typescript axios papaparse promise-queue ssh2-sftp-client ts-node typescript
npm install --save-dev @types/node @types/papaparse @types/promise-queue @types/ssh2-sftp-client
Package              Description
@edlink/typescript   SDK for interacting with the Edlink API
axios                HTTP client for making API requests
papaparse            CSV parser/generator for transforming data
promise-queue        Job queue for handling parallel operations
ssh2-sftp-client     SFTP client for uploading files
ts-node              TypeScript execution environment
typescript           TypeScript compiler

Configure TypeScript

Create a tsconfig.json file with the following configuration:

{
    "compilerOptions": {
        "module": "Node16",
        "target": "es2020",
        "moduleResolution": "Node16",
        "esModuleInterop": true,
        "composite": true,
        "incremental": true,
        "declaration": true,
        "sourceMap": true,
        "emitDecoratorMetadata": true,
        "experimentalDecorators": true,
        "skipLibCheck": true,
        "skipDefaultLibCheck": true,
        "downlevelIteration": true,
        "types": ["node"],
        "emitDeclarationOnly": true,
        "strict": false,
        "outDir": "dist"
    },
    "exclude": ["**/node_modules", "**/dist", ".github", ".husky", ".vscode", ".yarn", "**/*.spec.ts", "**/*.test.ts", "**/*._test.ts"]
}

Update package.json Scripts

Modify your package.json to include a start script:

{
    "scripts": {
        "start": "node -r ts-node/register/transpile-only --max-old-space-size=4096 main.ts"
    }
}

This script uses ts-node to run the TypeScript code directly and allocates extra memory for handling large data volumes.

Environment Variables

The first step is to set up environment variables for configuration. Create a new file called main.ts and start with the configuration:

import { Edlink, IntegrationState, Integration, TokenSetType } from '@edlink/typescript';
import axios from 'axios';

// Configuration from environment variables
const EDLINK_CLIENT_ID = process.env.EDLINK_CLIENT_ID;
const EDLINK_CLIENT_SECRET = process.env.EDLINK_CLIENT_SECRET;

// Initialize the Edlink API client
const edlink = new Edlink({
    version: 2,
    client_id: EDLINK_CLIENT_ID,
    client_secret: EDLINK_CLIENT_SECRET
});
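
Before wiring up anything else, it's worth failing fast when configuration is missing; otherwise the client is constructed with undefined credentials and every request fails later with a less obvious error. A minimal guard (this check is ours, not part of the original project):

// Hypothetical startup guard: exit early if required configuration is
// missing, rather than letting API requests fail later.
for (const name of ['EDLINK_CLIENT_ID', 'EDLINK_CLIENT_SECRET']) {
    if (!process.env[name]) {
        console.error(`Missing required environment variable: ${name}`);
        process.exit(1);
    }
}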

Listing Integrations

Next, add code to fetch the list of integrations from the Edlink API:

async function sync() {
    // List all available integrations
    const { data } = await axios.get('https://ed.link/api/v1/integrations', {
        headers: {
            Authorization: `Bearer ${EDLINK_CLIENT_SECRET}`
        }
    });

    // Filter to only the active integrations
    const integrations: Integration[] = data.$data.filter((integration: any) => integration.status === IntegrationState.Active);

    console.log(`Found ${integrations.length} active integrations.`);
}

This code uses axios to directly call the Edlink API to list all integrations, then filters to only include those with an "Active" status. This design decision ensures that we only process data for integrations that are currently operational, saving resources.
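
Because this process runs unattended on an interval, you may also want the listing step to tolerate transient API failures rather than crash. A sketch of a defensive variant (the helper name is ours):

// Hypothetical wrapper around the integrations request: on failure,
// log the error and skip this cycle instead of crashing the process.
async function listActiveIntegrations(): Promise<Integration[]> {
    try {
        const { data } = await axios.get('https://ed.link/api/v1/integrations', {
            headers: {
                Authorization: `Bearer ${EDLINK_CLIENT_SECRET}`
            }
        });

        // Keep only integrations that are currently active.
        return data.$data.filter((integration: any) => integration.status === IntegrationState.Active);
    } catch (err) {
        console.error('Failed to list integrations:', err);
        return [];
    }
}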

Fetching and Transforming Data

Create the Ingest Function

Now, add a function to fetch data for a specific integration:

async function ingest(integration: Integration) {
    console.log('Starting sync for integration:', integration.id);

    // Set the tokens for this integration so we can use the SDK to make requests
    const context = edlink.use({
        access_token: integration.access_token,
        type: TokenSetType.Integration
    });

    // Storage for our API data
    const result: Map<string, any[]> = new Map([
        ['districts', []],
        ['schools', []],
        ['sessions', []],
        ['subjects', []],
        ['courses', []],
        ['classes', []],
        ['people', []],
        ['enrollments', []],
        ['agents', []]
    ]);

    // Fetch the data from the API
    for await (const district of context.districts.list()) {
        result.get('districts')!.push(district);
    }

    for await (const school of context.schools.list()) {
        result.get('schools')!.push(school);
    }

    // Fetch the rest of the data...
    for await (const session of context.sessions.list()) {
        result.get('sessions')!.push(session);
    }

    for await (const course of context.courses.list({ expand: ['subject', 'session'] })) {
        result.get('courses')!.push(course);
    }

    for await (const cls of context.classes.list()) {
        result.get('classes')!.push(cls);
    }

    for await (const person of context.people.list()) {
        result.get('people')!.push(person);
    }

    for await (const enrollment of context.enrollments.list()) {
        result.get('enrollments')!.push(enrollment);
    }

    for await (const agent of context.agents.list()) {
        result.get('agents')!.push(agent);
    }

    // Apply transformations (we'll define this next)
    const formatted = format(result);

    console.log('Data transformation complete for integration:', integration.id);

    // Return formatted data for further processing
    return formatted;
}
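
The eight fetch loops are mechanical, so if you prefer less repetition they can be collapsed into a small helper. This is a refactoring sketch (not the original code); it relies only on the fact that each SDK collection's list() returns an async iterable:

// Drain any async-iterable listing into the named bucket of the result map.
async function collect(result: Map<string, any[]>, key: string, listing: AsyncIterable<any>): Promise<void> {
    for await (const record of listing) {
        result.get(key)!.push(record);
    }
}

// Usage inside ingest, replacing the individual loops:
// await collect(result, 'districts', context.districts.list());
// await collect(result, 'courses', context.courses.list({ expand: ['subject', 'session'] }));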

Creating Data Transformations

Next, create a function to transform the raw API data into the desired output formats:

import { Person, Role, IdentifierType, Gender, Agent, Class, Course, SessionType, GradeLevel, Enrollment, ClassState } from '@edlink/typescript';

function format(data: Map<string, any[]>): Map<string, any[]> {
    // Define output file structure
    const formatted: Map<string, any[]> = new Map([
        ['student', []],
        ['staff', []],
        ['parents', []],
        ['course_grades', []],
        ['course_catalog', []],
        ['case_load', []],
        ['case_load_members', []],
        ['case_load_import', []],
        ['gpa_import', []]
    ]);

    // Indices for relationship lookups
    const people: Map<string, Person> = new Map();
    const classes: Map<string, Class> = new Map();
    const courses: Map<string, Course> = new Map();

    // Process people data first
    for (const person of data.get('people')! as Iterable<Person>) {
        // Save the person into the index
        people.set(person.id, person);

        // Process students
        if (person.roles.includes(Role.Student)) {
            const iso = person.demographics.birthday?.toString();
            const school = person.school_ids.sort().pop();

            formatted.get('student')!.push({
                SchoolID: school,
                StudentID: person.id,
                Email: person.email,
                FirstName: person.first_name,
                MiddleName: person.middle_name,
                LastName: person.last_name,
                // Format all required student fields...
                DateOfBirth: iso ? iso.slice(5, 7) + '/' + iso.slice(8, 10) + '/' + iso.slice(0, 4) : null,
                Gender: person.demographics.gender === Gender.Other ? 'X' : person.demographics.gender
                // Additional fields...
            });

            // Add to GPA import
            formatted.get('gpa_import')!.push({
                SchoolID: school,
                StudentID: person.id,
                CGPA: person.gpa
            });
        }

        // Process teachers
        if (person.roles.includes(Role.Teacher)) {
            formatted.get('staff')!.push({
                SchoolID: person.school_ids.sort().pop(),
                StaffID: person.id,
                FirstName: person.first_name,
                LastName: person.last_name,
                Email: person.email,
                Role: 'Teacher'
            });
        }

        // Process administrators
        if (person.roles.includes(Role.Administrator)) {
            formatted.get('staff')!.push({
                SchoolID: person.school_ids.sort().pop(),
                StaffID: person.id,
                FirstName: person.first_name,
                LastName: person.last_name,
                Email: person.email,
                Role: 'Schooladmin'
            });
        }
    }

    // Process parent/guardian relationships
    for (const agent of data.get('agents')! as Iterable<Agent>) {
        const target = people.get(agent.target_id);
        const observer = people.get(agent.observer_id);

        if (!target || !observer) {
            continue;
        }

        formatted.get('parents')!.push({
            SchoolID: target.school_ids.sort().pop(),
            StudentID: target.id,
            FirstName: observer.first_name,
            LastName: observer.last_name,
            Email: observer.email,
            Phone: observer.phone
        });
    }

    // Build course catalog
    for (const course of data.get('courses')! as Iterable<Course>) {
        courses.set(course.id, course);

        formatted.get('course_catalog')!.push({
            SubjectName: course.subject?.name,
            CourseTitle: course.name,
            CourseNumber: course.code,
            // Include course catalog fields...
            TermType: (() => {
                switch (course.session?.type) {
                    case SessionType.SchoolYear:
                        return 'Year';
                    case SessionType.Semester:
                        return 'Semester';
                    default:
                        return 'Quarter';
                }
            })()
            // Additional fields...
        });
    }

    // Index classes by ID
    for (const cls of data.get('classes')! as Iterable<Class>) {
        if (!cls.course_id) {
            continue;
        }
        classes.set(cls.id, cls);
    }

    // Process enrollments and create course grades
    for (const enrollment of data.get('enrollments')! as Iterable<Enrollment>) {
        if (enrollment.role !== Role.Student) {
            continue;
        }

        const student = people.get(enrollment.person_id);
        if (!student) {
            continue;
        }

        const cls = classes.get(enrollment.class_id);
        if (!cls?.course_id) {
            continue;
        }

        const course = courses.get(cls.course_id);
        if (!course) {
            continue;
        }

        formatted.get('course_grades')!.push({
            SchoolID: cls.school_id,
            StudentID: student.id,
            SchoolYear: course.session?.end_date?.slice(0, 4),
            // Include course grade fields...
            WorkInProgress: [ClassState.Active, ClassState.Upcoming].includes(cls.state) ? 'Y' : 'N',
            CourseGrade: enrollment.final_numeric_grade
        });
    }

    return formatted;
}

This transformation function is where the "business logic" lives. It takes raw API data and structures it into domain-specific formats required by the target system. Key design decisions in this function include:

  1. Creating indexed mappings for efficient lookups (people, classes, courses)
  2. Handling role-based transformations (students, teachers, administrators)
  3. Formatting data fields to match destination system requirements (see the date-formatting sketch below)
  4. Filtering records to include only relevant data
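
As an example of the third point, the inline MM/DD/YYYY conversion in the student records is the kind of field formatting that's easy to get subtly wrong. A small helper (ours, not in the original code) keeps that logic in one place:

// Convert an ISO date string (YYYY-MM-DD...) to MM/DD/YYYY, returning
// null when the input is missing. Extracted from the inline logic above.
function toUsDate(iso?: string): string | null {
    if (!iso) {
        return null;
    }
    return `${iso.slice(5, 7)}/${iso.slice(8, 10)}/${iso.slice(0, 4)}`;
}

// DateOfBirth: toUsDate(person.demographics.birthday?.toString())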

Sending Data to an SFTP Server

Configure SFTP Connection

Now add the SFTP configuration to your main.ts file:

import SFTPClient from 'ssh2-sftp-client';
import Papa from 'papaparse';

// SFTP Configuration
const SFTP_USERNAME = process.env.SFTP_USERNAME;
const SFTP_PASSWORD = process.env.SFTP_PASSWORD;
const SFTP_PORT = process.env.SFTP_PORT ? parseInt(process.env.SFTP_PORT) : 22; // ssh2 expects a numeric port
const SFTP_HOST = process.env.SFTP_HOST;

const SFTP_CONFIG = {
    host: SFTP_HOST,
    port: SFTP_PORT,
    username: SFTP_USERNAME,
    password: SFTP_PASSWORD
};

// Initialize SFTP client
const sftp = new SFTPClient();

Update the Ingest Function to Upload Files

Modify the ingest function to upload the formatted data to the SFTP server:

async function ingest(integration: Integration) {
    // ... existing code to fetch and format data

    try {
        console.log('Opening SFTP connection.');

        // Connect to the SFTP server
        await sftp.connect(SFTP_CONFIG);

        console.log('Connected to SFTP.');

        // Upload each formatted file
        for (const [filename, data] of formatted.entries()) {
            if (data.length === 0) {
                continue;
            }

            // Convert data to CSV and upload
            await sftp.put(Buffer.from(Papa.unparse(data)), `/remote/path/${integration.id}/${filename}.csv`);
        }
    } catch (err) {
        console.error('SFTP Upload Error:', err);
    } finally {
        await sftp.end();
    }

    console.log('Sync complete for integration:', integration.id);
}

This code:

  1. Connects to the SFTP server
  2. Converts each data array to CSV format using PapaParse
  3. Uploads each file to a dedicated folder on the SFTP server
  4. Properly closes the connection regardless of success or failure
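
In practice, SFTP connections fail transiently more often than HTTP APIs do. If you see intermittent upload errors, retrying the initial connection is a cheap improvement; here is a sketch (this helper is ours, not part of the original project):

// Retry the SFTP connection a few times with a growing delay before
// giving up. Tune the attempt count and backoff to your environment.
async function connectWithRetry(
    client: SFTPClient,
    config: Parameters<SFTPClient['connect']>[0],
    attempts = 3
): Promise<void> {
    for (let attempt = 1; attempt <= attempts; attempt++) {
        try {
            await client.connect(config);
            return;
        } catch (err) {
            console.warn(`SFTP connect attempt ${attempt}/${attempts} failed:`, err);
            if (attempt === attempts) {
                throw err;
            }
            await new Promise(resolve => setTimeout(resolve, 1000 * attempt));
        }
    }
}

// Usage in ingest: await connectWithRetry(sftp, SFTP_CONFIG);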

Running Multiple Integrations in Parallel

Configure the Job Queue

To handle multiple integrations efficiently, add a job queue with parallelization:

import Queue from 'promise-queue';

// Configurable parallelization
const SYNC_INTERVAL = process.env.SYNC_INTERVAL ? parseInt(process.env.SYNC_INTERVAL) : 1000 * 60 * 60 * 24; // 24 hours
const MAX_PARALLEL_JOBS = process.env.MAX_PARALLEL_JOBS ? parseInt(process.env.MAX_PARALLEL_JOBS) : 1;

// Initialize queue with configurable concurrency
const queue = new Queue(MAX_PARALLEL_JOBS, Infinity);

Update the Sync Function

Now modify the sync function to use the queue:

async function sync() {
    // ... existing code to fetch integrations

    // Process each integration using the queue
    for (const integration of integrations) {
        queue.add(() => ingest(integration));
    }
}

// Run sync on startup and then on interval
setInterval(sync, SYNC_INTERVAL);
sync();

This design allows:

  1. Configurable concurrency through environment variables
  2. Automatic throttling to prevent overloading the API or SFTP server
  3. Periodic syncing on a customizable interval
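
One caveat with fire-and-forget queueing: queue.add returns a promise for the job's completion, so a failing ingest surfaces as an unhandled rejection. A variant of sync (a sketch) that logs failures per integration and waits for the whole cycle to finish:

async function sync() {
    // ... existing code to fetch active integrations

    // Collect the job promises so errors can be handled per integration
    // and the end of a full cycle can be observed.
    const jobs = integrations.map(integration =>
        queue.add(() => ingest(integration)).catch(err => {
            console.error('Sync failed for integration:', integration.id, err);
        })
    );

    await Promise.all(jobs);
    console.log('Sync cycle complete.');
}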

Containerizing with Docker

Create a Dockerfile

Create a Dockerfile to containerize the application:

FROM node:18-alpine
WORKDIR /usr/src/edlink/api-to-sftp
ENV NODE_ENV production

COPY . .
RUN apk add --no-cache --virtual .gyp \
    python3 \
    make \
    g++ \
    && yarn \
    && apk del .gyp

ENTRYPOINT [ "yarn", "start" ]

This Dockerfile:

  1. Uses Node.js 18 on Alpine Linux for a small image footprint
  2. Installs build dependencies temporarily to compile native modules
  3. Installs dependencies using Yarn
  4. Removes build dependencies to keep the image small
  5. Configures the container to run the application on startup

Building and Running the Container

Build and run the Docker container:

# Build the container
docker build -t api-to-sftp .

# Run the container with environment variables
docker run -d \
  --name api-to-sftp \
  -e EDLINK_CLIENT_ID=your_client_id \
  -e EDLINK_CLIENT_SECRET=your_client_secret \
  -e SFTP_HOST=your_sftp_host \
  -e SFTP_PORT=your_sftp_port \
  -e SFTP_USERNAME=your_sftp_username \
  -e SFTP_PASSWORD=your_sftp_password \
  -e SYNC_INTERVAL=86400000 \
  -e MAX_PARALLEL_JOBS=2 \
  api-to-sftp

Monitoring and Maintenance

Logging

The application logs important events to the console. When running in Docker, you can view these logs with:

docker logs -f api-to-sftp
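
The console output has no timestamps of its own; you can ask Docker to add them with docker logs -t, or prefix them yourself with a tiny helper (ours, not in the original code):

// Prefix every log line with an ISO timestamp so container logs are
// easier to correlate across integrations.
function log(...args: unknown[]): void {
    console.log(new Date().toISOString(), ...args);
}

// log('Starting sync for integration:', integration.id);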

Updating CSV Formats

To modify the CSV format, you can update the format function. For example, to add a new field to the student records:

formatted.get('student')!.push({
    // Existing fields...
    SchoolID: school,
    StudentID: person.id,

    // Add a new field
    EnrollmentDate: person.created.slice(0, 10)
});

Adding New Output Files

To add a new output file, update the formatted map in the format function:

const formatted: Map<string, any[]> = new Map([
    // Existing files...
    ['student', []],
    ['staff', []],

    // Add a new file
    ['attendance', []]
]);

// Later in the function, populate the new file. Note that ingest must
// also fetch attendance records into the `data` map first, or
// data.get('attendance') will be undefined.
for (const attendance of data.get('attendance')! as Iterable<any>) {
    formatted.get('attendance')!.push({
        // Format attendance data
    });
}

Design Decisions and Lessons Learned

API Access Pattern

The project uses the official Edlink SDK for most operations, but directly calls the API for listing integrations. This hybrid approach:

  • Takes advantage of the SDK's convenience for detailed data operations
  • Provides flexibility when the SDK doesn't fully cover necessary functionality

Filtering Active Integrations

The code filters for active integrations only, which:

  • Reduces API load by avoiding processing inactive integrations
  • Prevents uploading potentially outdated data from deprecated integrations
  • Makes the process more efficient overall

Using Promise-Queue for Parallelization

The promise-queue library provides:

  • Configurable concurrency to optimize resource usage
  • Automatic throttling to prevent overwhelming external systems
  • Sequential execution within each integration to maintain data consistency

Map-based Data Structures

The code uses JavaScript Map objects extensively to:

  • Create indexed lookups for efficient relationship resolution
  • Organize formatted data by file type
  • Handle large datasets efficiently

Error Handling and Connection Management

The SFTP connection is wrapped in a try/catch/finally block to ensure:

  • Connection errors are caught and logged
  • The SFTP connection is always properly closed
  • Failed syncs don't affect subsequent operations

Container-based Deployment

Containerizing the application provides:

  • Consistent execution environment across different hosting platforms
  • Simple configuration through environment variables
  • Built-in isolation for security and resource management
  • Straightforward deployment and scaling options

By following this cookbook, you've created a robust integration that can reliably move data from the Edlink API to an SFTP server, with configurable parallelization, containerization, and maintainable data transformations.