Socket.io and multiple node instances
Socket.io is the most popular framework for WebSocket communication between a Node server and a client. It creates a session between server and client that allows two-way communication in an event-based manner familiar to JS developers. However, this creates a problem for a typical Node.js deployment, namely running multiple instances. Because each instance holds its own sessions and only its own, a message sent by client A connected to server instance A will not be seen by server instance B or by any client connected to it.
In the case shown above, messages sent by Alice will never reach Charlie. Moreover, Bob, who is logged in on two different devices, will receive messages from Alice only in the browser and those from Charlie only in his mobile app.
This might not be an issue if you only need communication between the backend and the frontend. However, if you want information to flow between multiple clients that might be connected to different instances, as is the case in chat rooms, for example, a solution must be found.
Adapters to the rescue
While there might be use-case-specific solutions, like saving data in the DB first and observing change streams on all instances, we usually want something faster and less error-prone. The Socket.io ecosystem includes special tools for this case: adapters. They usually work using the pub/sub pattern, where each server instance publishes the messages it emits to the others. The subscribers, that is all other instances, re-emit the published message to their own clients if necessary.
With an adapter connecting all of the instances, the issue described earlier is resolved.
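The pub/sub idea can be sketched in a few lines. This is only an illustration under loud assumptions: the broker is replaced by an in-process EventEmitter, and the ServerInstance class, its methods, and the "broadcast" channel name are made up for this example. It is not how socket.io adapters are implemented internally.

```javascript
const { EventEmitter } = require("events");

// Assumption: an in-process bus standing in for a real broker (Redis, RabbitMQ, ...).
const broker = new EventEmitter();

// Hypothetical model of one server instance that only knows its own clients.
class ServerInstance {
  constructor(name) {
    this.name = name;
    this.clients = new Map(); // client name -> inbox (array of received messages)

    // Subscribe: re-emit messages published by *other* instances to local clients.
    broker.on("broadcast", ({ origin, message }) => {
      if (origin !== this.name) this.deliverLocally(message);
    });
  }

  connect(clientName) {
    const inbox = [];
    this.clients.set(clientName, inbox);
    return inbox;
  }

  deliverLocally(message) {
    for (const inbox of this.clients.values()) inbox.push(message);
  }

  // Emit to local clients, then publish so the other instances can re-emit.
  broadcast(message) {
    this.deliverLocally(message);
    broker.emit("broadcast", { origin: this.name, message });
  }
}

const instanceA = new ServerInstance("A");
const instanceB = new ServerInstance("B");
const aliceInbox = instanceA.connect("alice");
const charlieInbox = instanceB.connect("charlie");

instanceA.broadcast("hello from Alice");
console.log(charlieInbox); // Charlie, connected to instance B, now has the message
```

Without the publish step in broadcast, Charlie's inbox would stay empty, which is exactly the multi-instance problem described earlier.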
Default Redis adapter problem
The commonly advised choice, including in the official socket.io tutorials, is the Redis-based adapter socket.io-redis or, for folks using the ioredis client, socket.io-ioredis. These adapters are great tools that use Redis as a message broker and should not prove problematic in typical usage. However, you might see your server struggling in certain cases.
At Flip, we sometimes want to notify all followers of a certain user. For some of the more popular people, that might mean sending a socket message to over 4000 users. In our testing with socket.io-redis, we noticed visible event loop blocking when we tried to do so. We checked different calls that could be synchronous, like message encoding and decoding, but to no avail. After diagnosing the whole flow, we found the culprit in Redis's publish command. As we were going to switch to ioredis anyway, we checked how the socket.io-ioredis adapter behaves with a different client. Unfortunately, we did not notice any significant improvement.
We realized that a different broker had to be used. Luckily for us, we already had RabbitMQ in our stack, used for communication between services, so we decided to try that. There is an AMQP-based adapter built by the community, so we benchmarked it against the Redis-based ones. The results exceeded our expectations, and we decided to use the AMQP-based adapter as our production solution. Due to the specifics of our system, we opted to bake the adapter into our message bus, but for most setups socket.io-amqp will work out of the box. Depending on your stack, you might also try other adapters or write your own based on the broker of your choice.
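One simple way to make this kind of blocking visible is to check how late a timer fires while the suspect code runs. The sketch below is an illustration only, not the diagnostics described above: measureLag and busyWait are hypothetical helpers, and the busy loop merely stands in for an expensive synchronous call.

```javascript
// Hypothetical helper: schedules a 0 ms timer, runs synchronous work,
// and resolves with how long the timer actually took to fire (in ms).
function measureLag(synchronousWork) {
  return new Promise((resolve) => {
    const start = process.hrtime.bigint();
    setTimeout(() => {
      const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
      resolve(elapsedMs);
    }, 0);
    synchronousWork(); // blocks the loop, so the timer callback has to wait
  });
}

// Illustrative stand-in for an expensive synchronous call: busy-wait for `ms`.
function busyWait(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // spin on purpose
  }
}

const lagPromise = measureLag(() => busyWait(50));
lagPromise.then((ms) => console.log(`timer fired after ${ms.toFixed(1)} ms`));
```

On an idle event loop the timer fires within a few milliseconds, so a much larger value suggests that something synchronous held the loop.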
For a quick comparison, here are the results of using different adapters when sending 1000000 messages, each a 350-character string.
Performing test of 1000000 emits for no transport …
Performing test of 1000000 emits for amqp …
Performing test of 1000000 emits for redis …
Performing test of 1000000 emits for ioredis …
You can easily perform this benchmark yourself on your own setup. First, set up socket.io to use the adapter of your choice.
Then wait for the adapter to connect to the message broker, emit a large number of messages to one room, and measure the time it took. How you wait for the ready state depends on the broker client you are using, but for test purposes you can emulate it with a simple timeout. A related useful one-liner:
const sleep = require("util").promisify(setTimeout);
which can later be used as simply as await sleep(ms) inside an async function, where ms is the delay in milliseconds.
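The benchmark steps described here could be sketched roughly as follows. This is not the original benchmark script: runBenchmark, its option names, and the stub emitter are assumptions made for illustration, and the timeout is only a crude stand-in for waiting on the broker's ready state. It also times only the synchronous cost of calling emit, not message delivery.

```javascript
const sleep = require("util").promisify(setTimeout);

// Hypothetical harness: `emit` is whatever sends one message through your setup.
async function runBenchmark({ label, emit, count, payloadLength, readyDelayMs }) {
  // Crude stand-in for waiting until the adapter has connected to the broker.
  await sleep(readyDelayMs);

  const payload = "x".repeat(payloadLength);
  console.log(`Performing test of ${count} emits for ${label} ...`);

  const start = process.hrtime.bigint();
  for (let i = 0; i < count; i++) {
    emit(payload);
  }
  const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;

  console.log(`${label}: ${elapsedMs.toFixed(1)} ms`);
  return { label, count, elapsedMs };
}

// Stub emitter so the sketch runs without a server or a broker.
let emitted = 0;
const benchmarkDone = runBenchmark({
  label: "stub",
  emit: () => {
    emitted += 1;
  },
  count: 100000,
  payloadLength: 350,
  readyDelayMs: 100,
});
```

With a real server, emit could be something like (msg) => io.to("benchmark-room").emit("message", msg), where the room and event names are made up for this example.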
Importance of testing
In my humble opinion, the most important lesson from this experience is the importance of load and stress testing. If we had not run this test, we might have spent hours trying to debug why some instances randomly take over a minute to respond to a basic request. There is value in running tests that check both internal and external components that might become bottlenecks in your system, because it is much easier to prevent a problem than to debug and fix it on a live server.
Sidenote: this is my first post here, so please feel free to provide as much feedback as possible. Finally, you might want to look into how adapters work and try to create your own to understand what is going on underneath the facade. A tutorial on this is something I can write in the next post if anyone finds it interesting.