Distributed Systems With Node.js: Part 1 Why Distributed?

Matthew MacFarquhar
5 min read · Apr 19, 2024

Introduction

In this series, I will be working through this book on distributed systems using Node.js. The book is quite large and in-depth, but in these articles I will distill the big-picture pieces of the book into bite-size tutorials and walkthroughs.

In this section, we will explore the Node.js runtime and then set up an extremely simple “distributed” system with a data producer service and a consumer service running as two separate Node.js processes.

The code for this demo can be found in this repo at the following commit: https://github.com/mattmacf98/distributed-systems-node-js/tree/0fc0a8a30aa62fae168118a29cc4a54b66929f52

Single Threaded Javascript

JavaScript and Node.js were created around a single-threaded execution model. There has been recent work to change that and introduce multi-threaded capabilities to JavaScript; however, at this time, those projects have not been fully embraced by the JavaScript community. Everything in this book will run JavaScript as a single-threaded process.

The Event Loop

You might be thinking, “Wait, I have seen multi-threaded JS with promises, async/await and setTimeout.” JavaScript is able to give the illusion of being multi-threaded by using a very clever system called the event loop.

A JavaScript process starts by executing all synchronous actions in the code, maintaining a stack of function calls as it does so. Async operations are added to specific queues, which the JS event loop processes in the following phases:

  1. Poll: JavaScript checks the poll phase for things like I/O callbacks from file operations or network requests
  2. Check: JavaScript checks for any callbacks in the queue spawned by setImmediate()
  3. Close: JavaScript checks for callbacks on close events (e.g. a TCP connection closing)
  4. Timers: JavaScript checks for setTimeout() and setInterval() callbacks which are ready to be executed
  5. Pending: JavaScript checks and runs some special system callbacks, like the error raised when a TCP connection is refused

JavaScript also has a microtask queue which takes priority over the phase queues above and is checked and emptied between each of the above steps.

  1. Next Tick: JavaScript checks and runs any callbacks created using process.nextTick()
  2. Promises: JavaScript checks for any promises that have resolve or reject callbacks ready to execute (async/await is just syntactic sugar for promises)

The book gave a few cool event loop exercises to work through; below is one.

const fs = require('fs');

setImmediate(() => console.log(1));
Promise.resolve().then(() => console.log(2));
process.nextTick(() => console.log(3));
fs.readFile(__filename, () => {
  console.log(4);
  setTimeout(() => console.log(5));
  setImmediate(() => console.log(6));
  process.nextTick(() => console.log(7));
});
console.log(8);

Tracing through this code, we can see that:

  1. console.log(8) is the only synchronous log, so it is logged first
  2. console.log(3) is a nextTick callback, which sits at the head of the microtask queue
  3. console.log(2) is a promise resolution callback, also on the microtask queue (but with a lower priority than nextTick callbacks)
  4. console.log(1) is done next as our JS process moves from the poll phase to the check phase
  5. console.log(4) runs next: we wait in the poll phase until the file is read and its callback executes, and 4 is the only synchronous log in that callback
  6. console.log(7) runs since it is a nextTick callback
  7. console.log(6) runs since we enter our check phase and run the setImmediate callback
  8. console.log(5) runs once we transition into the timers phase

Some Event Loop Gotchas

The book then goes into a few gotchas when coding in Node.js.

Starving the Event Loop

Since JavaScript really is not multi-threaded, we can delay or block important work by running a lot of code synchronously and never letting our event loop check for other tasks. This is called starving the event loop.

To avoid it, break up that expensive 10,000-record CPU processing task into ten 1,000-record processing tasks and spin them off one after another using setImmediate.
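A minimal sketch of that pattern (my own illustration; processInChunks and its parameters are not from the book):

```javascript
// Process `records` in fixed-size chunks, scheduling each subsequent
// chunk with setImmediate so pending I/O callbacks can run in between.
function processInChunks(records, chunkSize, processOne, done) {
  let index = 0;
  (function processChunk() {
    const end = Math.min(index + chunkSize, records.length);
    for (; index < end; index++) {
      processOne(records[index]);
    }
    if (index < records.length) {
      setImmediate(processChunk); // yield back to the event loop
    } else {
      done();
    }
  })();
}
```

Calling something like processInChunks(records, 1000, work, done) turns one long synchronous burst into many short ones, letting the event loop breathe between chunks.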

If True Async, Else Sync

The problem arises when code that is supposed to be executed asynchronously ends up executing synchronously or vice versa. This inconsistency can lead to hard-to-debug issues.

Imagine a function that accepts a callback to be executed asynchronously. If that function sometimes executes the callback synchronously and sometimes asynchronously, it can lead to unpredictable behavior in the code that uses it.

So make sure you execute passed-in callbacks consistently: either always synchronously or always asynchronously, across every branch you may enter.
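Here is a sketch of the gotcha and its fix (a hypothetical cache example of my own, not the book's code):

```javascript
const cache = new Map();

// Bad: the callback fires synchronously on a cache hit but
// asynchronously on a miss, so callers cannot rely on the ordering.
function getBad(key, cb) {
  if (cache.has(key)) {
    cb(null, cache.get(key)); // synchronous!
  } else {
    setImmediate(() => cb(null, undefined)); // asynchronous
  }
}

// Good: defer the hit branch with process.nextTick so the callback
// is invoked asynchronously in every branch.
function getGood(key, cb) {
  if (cache.has(key)) {
    process.nextTick(() => cb(null, cache.get(key)));
  } else {
    setImmediate(() => cb(null, undefined));
  }
}
```

With getGood, code that runs right after the call site is guaranteed to execute before the callback, whether the key was cached or not.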

Summary

  • JavaScript is not (currently) multi-threaded
  • JavaScript uses an event loop that checks a set of prioritized queues, giving the illusion of concurrency for async operations

Distributed

After our deep dive into JavaScript’s runtime, the book jumps into creating a very simple “distributed” system that runs as two Node.js processes on our machine.

Producer

#!/usr/bin/env node

const server = require('fastify')();
const HOST = process.env.HOST || '127.0.0.1';
const PORT = process.env.PORT || 4000;

console.log(`worker pid=${process.pid}`);

server.get('/recipes/:id', async (req, res) => {
  console.log(`worker request pid=${process.pid}`);
  const id = Number(req.params.id);

  if (id !== 42) {
    res.statusCode = 404;
    return { error: 'not_found' };
  }

  return {
    producer_pid: process.pid,
    recipe: {
      id: id,
      name: "Chicken Tikka Masala",
      steps: "Throw it in a pot...",
      ingredients: [
        {id: 1, name: "Chicken", quantity: "1 lb"},
        {id: 2, name: "Sauce", quantity: "2 cups"},
      ]
    }
  };
});

server.listen(PORT, HOST, () => {
  console.log(`Producer running at http://${HOST}:${PORT}`);
});

Our recipe service is incredibly simple: when we get a request to /recipes/42, we send back some hard-coded recipe data. Our producer service listens on localhost:4000.

Consumer

#!/usr/bin/env node

const server = require('fastify')();
const fetch = require('node-fetch');
const HOST = process.env.HOST || '127.0.0.1';
const PORT = process.env.PORT || 3000;
const TARGET = process.env.TARGET || 'localhost:4000';

server.get('/', async () => {
  const res = await fetch(`http://${TARGET}/recipes/42`);
  const producer_data = await res.json();

  return {
    consumer_pid: process.pid,
    producer_data: producer_data
  };
});

server.listen(PORT, HOST, () => {
  console.log(`Consumer running at http://${HOST}:${PORT}`);
});

Our consumer is even simpler! When we get a request, we simply call our producer service and return all the recipe data, plus metadata about our consumer and producer, to the caller.

We can test our consumer-to-producer hookup by calling the consumer while both services are running.

curl http://127.0.0.1:3000

Summary

  • We can spin up separate Node JS servers on separate processes on the same machine
  • We can allow these processes to talk to each other over HTTP

Conclusion

In the first chapter of this book, we got a very cool look into the internals of Node.js. We also set up a bare-bones consumer-producer system to build on down the road. In the next chapter, we will go over protocols to explore how these services communicate with each other and how we, as outside entities, communicate with them.


Matthew MacFarquhar

I am a software engineer working for Amazon living in SF/NYC.