Understanding how to properly use bunqueue #12
-
Replies: 4 comments
-
I found more info about the jobs, but I still have no idea why it's not running.
-
Hey Arthur, thanks for the detailed writeup! I can see a few things going on here, let me walk through them.

The main issue: mode mismatch

You're running bunqueue as a systemd service (TCP server mode), which is correct. But the worker setup in your app seems to follow the embedded-mode examples, so that worker never talks to the server that actually holds your jobs. The same applies to events: completed/failed listeners only fire on a Worker that is connected to the server processing the jobs, which is why you see nothing.

How to fix it

Since you already have the bunqueue server running via systemd, your app should use the regular Queue and Worker with a connection config:

import { Queue, Worker } from 'bunqueue/client';
// Both connect to your bunqueue systemd server
const queue = new Queue('ftp-pipeline', {
connection: { host: 'localhost', port: 6789 }
});
const worker = new Worker('ftp-pipeline', async (job) => {
console.log(`Processing: ${job.name}`, job.data);
await job.updateProgress(10, 'Starting...');
// your logic here
const result = await doWork(job.data);
await job.updateProgress(100, 'Done');
return result;
}, {
connection: { host: 'localhost', port: 6789 },
concurrency: 1,
});
// Now events work
worker.on('completed', (job, result) => {
console.log(`Job ${job.id} completed`, result);
});
worker.on('failed', (job, error) => {
console.error(`Job ${job.id} failed`, error.message);
});

The key rule is: if the bunqueue server runs as its own process, every Queue and Worker in your app needs a connection config pointing at it.

About your large data (200-400MB)

This is important: do not put 200-400MB of raw data inside the job payload. Job data gets serialized via msgpack and stored in SQLite, so large payloads will cause memory spikes, slow serialization, and database bloat. Instead, store only references in the job data (FTP paths, file IDs, database query parameters) and let the worker fetch the actual data during processing:

// Good: small job payload with references
await queue.add('fetch-ftp', {
ftpHost: 'ftp.example.com',
remotePath: '/data/export-2026-03-01.csv',
localTempPath: '/tmp/ftp-downloads/'
});
// Inside the worker, download and process directly
const worker = new Worker('ftp-pipeline', async (job) => {
if (job.name === 'fetch-ftp') {
const data = await downloadFromFTP(job.data.ftpHost, job.data.remotePath);
await saveToTempFile(job.data.localTempPath, data);
return { filePath: job.data.localTempPath, recordCount: data.length };
}
// ...
});

For your pipeline: FlowProducer

For the multi-step workflow (fetch from FTP, filter, create/update, cleanup old data), you can use a FlowProducer chain so each step only runs after the previous one completes:

import { FlowProducer, Worker } from 'bunqueue/client';
const flow = new FlowProducer({
connection: { host: 'localhost', port: 6789 }
});
// Create the pipeline: each step waits for the previous one to complete
const { jobIds } = await flow.addChain([
{
name: 'fetch-ftp',
queueName: 'ftp-pipeline',
data: { ftpHost: 'ftp.example.com', remotePath: '/data/export.csv' }
},
{
name: 'filter-data',
queueName: 'ftp-pipeline',
data: { criteria: 'active-only' }
},
{
name: 'create-update-records',
queueName: 'ftp-pipeline',
data: { table: 'products' }
},
{
name: 'cleanup-old',
queueName: 'ftp-pipeline',
data: { olderThanDays: 30 }
},
]);
console.log('Pipeline started, job IDs:', jobIds);

Then a single worker handles all steps based on the job name:

const worker = new Worker('ftp-pipeline', async (job) => {
switch (job.name) {
case 'fetch-ftp': {
await job.updateProgress(10, 'Connecting to FTP...');
const records = await fetchFromFTP(job.data);
await saveToDisk('/tmp/pipeline/raw.json', records);
await job.updateProgress(100, 'Downloaded');
return { recordCount: records.length, path: '/tmp/pipeline/raw.json' };
}
case 'filter-data': {
const raw = await readFromDisk('/tmp/pipeline/raw.json');
const filtered = raw.filter(r => r.status === 'active');
await saveToDisk('/tmp/pipeline/filtered.json', filtered);
return { filteredCount: filtered.length };
}
case 'create-update-records': {
const filtered = await readFromDisk('/tmp/pipeline/filtered.json');
const result = await upsertRecords(filtered);
return { created: result.created, updated: result.updated };
}
case 'cleanup-old': {
const deleted = await removeOldRecords(job.data.olderThanDays);
return { deleted };
}
}
}, {
connection: { host: 'localhost', port: 6789 },
concurrency: 1,
});

Each step automatically waits for the previous one to complete before starting. The execution order is guaranteed: fetch → filter → create/update → cleanup.

Your systemd setup

Your systemd configuration for the bunqueue server looks fine. Just make sure your app code (the Queue and Worker) uses the connection config pointing at that server rather than spinning up its own embedded instance; there's a rough unit sketch at the end of this reply for reference.

Quick summary

The worker file you created should become a regular Worker connected to your systemd server, job payloads should carry only references to the data (never the data itself), and a FlowProducer chain gives you the ordered pipeline. Let me know if you run into anything else!
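For reference, the server side can be as small as a unit file that keeps bunqueue start running. Everything below (paths, user, the bunx invocation) is an assumption about your environment rather than a known-good config, so adapt it to your install:

[Unit]
Description=bunqueue server (TCP mode)
After=network.target

[Service]
Type=simple
# Assumed user and binary location - point ExecStart at wherever bun/bunx lives on your machine
User=appuser
ExecStart=/usr/local/bin/bunx bunqueue start
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target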
-
Thanks for the detailed explanation. So I should switch to the regular Worker. The main reason I want to use the … But what do you think is the best approach? Should I start using the embedded worker, or keep the structure as you described it to me? I do like this FlowProducer, I will look into that. For example: fetching the data from the FTP and then using that data in the second child process. Or is it best to just write a file and read that back out?
-
Great questions!

Embedded vs TCP: what to choose

Since you already have a production API running in the same process and you want to offload heavy work, embedded mode with SandboxedWorker is actually the right choice for your use case. Each SandboxedWorker runs in an isolated Bun Worker thread with its own heap, so it won't block your API's event loop or compete for the same memory. That's exactly the separation you're looking for.

The TCP (server mode) alternative would be: run bunqueue start as a separate systemd service, then connect your app via new Worker(queue, processor, { connection: { port: 6789 } }). This gives you process-level isolation (separate PID) and lets you scale the queue independently of your API.

TL;DR: Stay with embedded + SandboxedWorker. Switch to TCP/server mode only if you need to run workers on separate machines or want independent scaling.

Passing data between chain steps

Yes! FlowProducer works in embedded mode too, and each step can read the previous step's result:

import { FlowProducer } from 'bunqueue/client';
const flow = new FlowProducer({ embedded: true });
await flow.addChain([
{ name: 'fetch-ftp', queueName: 'pipeline', data: { ftpPath: '/export.csv' } },
{ name: 'filter', queueName: 'pipeline', data: {} },
{ name: 'upsert', queueName: 'pipeline', data: {} },
]);

In your SandboxedWorker processor, each step can get the previous result:

// In the processor file
import { FlowProducer } from 'bunqueue/client';

// Assumption: the processor creates its own embedded FlowProducer handle so it can
// call getParentResult below - double-check against the docs in case bunqueue
// exposes the parent result on the job instead.
const flow = new FlowProducer({ embedded: true });

export default async (job) => {
if (job.name === 'fetch-ftp') {
const records = await downloadFTP(job.data.ftpPath);
await Bun.write('/tmp/pipeline/raw.json', JSON.stringify(records));
return { path: '/tmp/pipeline/raw.json', count: records.length };
}
if (job.name === 'filter') {
// Access previous step's result via __flowParentId in job.data
const parentResult = await flow.getParentResult(job.data.__flowParentId);
// parentResult = { path: '/tmp/pipeline/raw.json', count: 1234 }
const raw = JSON.parse(await Bun.file(parentResult.path).text());
const filtered = raw.filter(r => r.active);
await Bun.write('/tmp/pipeline/filtered.json', JSON.stringify(filtered));
return { path: '/tmp/pipeline/filtered.json', count: filtered.length };
}
};

Important: large data (200-400MB)

Never pass large data through job results. Results are stored in an LRU cache (max 10,000 entries) and SQLite. Serializing 200MB+ objects will cause memory spikes and slow down the queue. The pattern you should use:

1. In the fetch step, write the raw data to a temp file on disk.
2. Return only a small reference (file path, record count) as the job result.
3. Have the next step read that file from disk, process it, and write its own temp file.
This way the job result is just a tiny object like { path, count } rather than the data itself. Let me know if you need help setting up the chain!
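And since you asked about actually starting the embedded worker: roughly, your API process would create the FlowProducer plus a SandboxedWorker pointed at the processor file above. Treat the SandboxedWorker import path, constructor shape, and options below as my assumptions (mirrored from the Worker examples earlier) and check them against the docs:

// app/queue.ts - runs inside your API process (sketch, signatures assumed)
import { FlowProducer, SandboxedWorker } from 'bunqueue/client';

// Embedded mode: no TCP server, the queue lives in this process,
// but the processor runs in its own Bun Worker thread.
export const flow = new FlowProducer({ embedded: true });

// Assumed constructor: (queue name, path to the processor file, options)
export const worker = new SandboxedWorker(
  'pipeline',
  new URL('./pipeline.processor.ts', import.meta.url),
  { embedded: true, concurrency: 1 },
);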