Is Node.js faster than Java? It's all about IO

In this post I will talk about the kernel, IO and all that stuff, but let's start with a simple question: is Node.js faster than Java? I've heard this claim a lot from my friends and on Internet forums. Let me make one thing clear: I am a big fan of Java, specifically of the Java runtime, although I don't like annotation-driven development. Let me explain what it means for Java to be slower. A little spoiler: it's all about IO.

Small Tom
Tom & Jerry by William Hanna and Joseph Barbera

Why is it slower?

Let's understand why Java is said to be slower than Node.js. First of all, what do we mean when we say that one language is slower than another? Well, a language itself can't be fast; speed depends on the runtime. In this post I will talk about HotSpot, the OpenJDK runtime, which includes an interpreter and two JIT compilers. Usually people talk about how fast a language can perform a CPU-intensive task, something like matrix multiplication or finding the n-th number in the Fibonacci sequence. In this case Java (HotSpot) beats Node.js and sometimes even C++. Why? Because of the JIT. Java is a compiled language, but it is compiled into an intermediate form called bytecode. The JVM interprets this bytecode instruction by instruction, which is of course slower than executing compiled machine code. However, the JVM has two JIT compilers that compile hot methods into machine-specific code at run time. For this to work, HotSpot collects statistics (in the simplest case just counters) for each method and loop. Using these statistics, the compilers decide what to compile and how to optimize the compiled code (eliminating unnecessary memory allocations, unrolling loops and so on). I don't want to go deep into this topic because I am mainly focusing on tasks that are not CPU-bound, but if you are interested in JIT internals you should watch this excellent video by Douglas Hawkins.
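If you want to observe the warm-up effect described above yourself, here is a minimal sketch in Java. The class name JitWarmupSketch and the round count are my own choices for illustration, and this is not a rigorous benchmark: it only times the same hot method several times so that you can compare early rounds with later ones. Timings depend entirely on your machine and JVM version.

```java
class JitWarmupSketch {
    // Keeps results alive so the JIT cannot discard the computation as dead code.
    static long sink;

    // Deliberately naive recursive Fibonacci: a small, hot, CPU-bound method.
    static long fib(int n) {
        return n < 2 ? n : fib(n - 1) + fib(n - 2);
    }

    // Runs fib(n) `rounds` times and returns the duration of each round in microseconds.
    static long[] timeRounds(int rounds, int n) {
        long[] micros = new long[rounds];
        for (int r = 0; r < rounds; r++) {
            long start = System.nanoTime();
            sink += fib(n);
            micros[r] = (System.nanoTime() - start) / 1_000;
        }
        return micros;
    }

    public static void main(String[] args) {
        long[] micros = timeRounds(10, 25);
        for (int r = 0; r < micros.length; r++) {
            System.out.println("round " + r + ": " + micros[r] + " us");
        }
    }
}
```

Running it with the HotSpot flag -XX:+PrintCompilation additionally prints the methods as they get compiled, so you can see when fib stops being interpreted.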

Okay, now we know that Java is great at CPU-intensive tasks, so why are there so many articles about big companies moving to Node? I am not the first one to tell you this, but Node.js is faster because it doesn't need a lot of user-level threads (the ones you create with new Thread(runnable)) in order to provide a high-performance backend server. You have probably heard that Node.js is single-threaded and uses a technique called the event loop to handle all incoming requests. If you are a Java developer then you probably think "What the hell, one thread?". Yes, this question bothered me for a long time. As I said before, I am a Java developer, and I got used to thinking about concurrency in terms of thread pools. When someone told me that a backend server could have only one thread and still perform better than thread pools, I just laughed. But really, one thread? How can one thread handle all the traffic? Let's go deeper and understand what an event loop is.

Event Loop

Let's look at this picture that I honestly borrowed from the Internet:

Event loop

We have incoming requests and one thread that handles them. If a task is heavy (takes a lot of time), we hand it to some sort of thread pool and register a callback; otherwise the event loop would be blocked and we wouldn't be able to handle new connections. When the task is finished, the thread pool puts its result back on the event loop's queue, and the event loop eventually executes the callback and returns the result to the client. Please check out this wonderful talk by Philip Roberts.
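The flow just described can be sketched in a few lines of Java. This is only an illustration under my own assumptions, not how Node.js or libuv is implemented: the event loop is modelled as a single-threaded executor, and the names EventLoopSketch, offload and slowSquare are hypothetical.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

class EventLoopSketch {
    // The "event loop": one thread that runs short tasks and callbacks in order.
    private final ExecutorService loop = Executors.newSingleThreadExecutor();
    // Pool for heavy work that would otherwise block the loop.
    private final ExecutorService workers = Executors.newFixedThreadPool(4);

    // Runs the heavy task on a worker, then schedules the callback back on the loop thread.
    <T> void offload(Supplier<T> heavyTask, Consumer<T> callback) {
        workers.execute(() -> {
            T result = heavyTask.get();
            loop.execute(() -> callback.accept(result));
        });
    }

    // Stand-in for a slow operation (hypothetical): sleeps, then returns n squared.
    static int slowSquare(int n) {
        try {
            Thread.sleep(100);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return n * n;
    }

    // Handles one "request" and returns the order in which things happened.
    static List<String> run() {
        EventLoopSketch s = new EventLoopSketch();
        List<String> log = new CopyOnWriteArrayList<>();
        CountDownLatch done = new CountDownLatch(1);
        s.loop.execute(() -> {
            log.add("request received");
            s.offload(() -> slowSquare(7), result -> {
                log.add("callback: " + result);
                done.countDown();
            });
            // The loop thread gets here immediately; it does not wait for slowSquare.
            log.add("loop is free again");
        });
        try {
            done.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            s.loop.shutdown();
            s.workers.shutdown();
        }
        return log;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

The point to notice is the ordering: the loop thread logs that it is free again before the callback runs, because the callback is queued behind whatever the loop is currently doing.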

Now the tricky question: OK, event loop, but can't we use it in Java? We can indeed; moreover, servlet containers such as Tomcat use an event loop by default. What? Yes, the default connector for Tomcat is based on Java NIO, which has its own event loop under the hood. OK, if even Tomcat can do it, then what is the purpose of Node? Let's now switch to Tomcat (I chose Tomcat, but other servlet containers work on the same principles).

Tomcat

How does the Tomcat NIO connector work? First of all, we can specify the number of event loop threads that will handle incoming requests (in my experience, one thread is enough). After that, when a connection with a client has been established, the event loop puts a task into a queue that is monitored by a pool of worker threads. The thread pool takes a task from the queue and assigns a worker thread to it. All application logic (the methods inside your controllers in Spring) is executed on these threads. The problem here is that application logic usually needs to do some IO, such as calling a cache service, sending requests to a database, making HTTP calls to other services and so on. In most cases these requests block the worker thread. While a worker thread is waiting for the database to respond it doesn't do anything; as a result, if all worker threads are waiting on IO, Tomcat can't process new tasks from the event loop. The only solution for servlet containers prior to the Servlet 3.1 specification was to increase the number of worker threads, but again, most of them won't do anything but wait for IO. (BTW, I want to write another small post about async servlets, because I have seen how developers, including me, blindly move all application logic from worker threads to async servlets without noticing that it doesn't solve the problem of blocking.)
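To make the worker starvation concrete, here is a small sketch in Java. It does not use Tomcat at all: a fixed thread pool stands in for the worker threads, and blockingCall is a hypothetical placeholder for a database or HTTP call that simply sleeps. The class and method names are my own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class BlockedWorkersSketch {
    // Stand-in for a blocking database or HTTP call (hypothetical): the thread just sleeps.
    static String blockingCall(int requestId, long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "response " + requestId;
    }

    // Handles `requests` requests on `workerThreads` workers and returns the elapsed milliseconds.
    static long handle(int workerThreads, int requests, long ioMillis) {
        ExecutorService workers = Executors.newFixedThreadPool(workerThreads);
        long start = System.nanoTime();
        try {
            List<Future<String>> responses = new ArrayList<>();
            for (int i = 0; i < requests; i++) {
                final int id = i;
                responses.add(workers.submit(() -> blockingCall(id, ioMillis)));
            }
            for (Future<String> response : responses) {
                response.get(); // wait until every request has been answered
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            workers.shutdown();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        System.out.println("2 workers, 4 requests: " + handle(2, 4, 200) + " ms");
        System.out.println("4 workers, 4 requests: " + handle(4, 4, 200) + " ms");
    }
}
```

With 2 workers and 4 requests that each block for 200 ms, the requests need at least two rounds, so the batch takes roughly twice as long as with 4 workers, even though the CPU is idle the whole time. Adding threads helps here only because each extra thread is one more place to wait.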

OK, finally the question that I asked myself so many times: how the hell does Node.js use threads without blocking them? And, if I may brag a little, I found the answer.

Kernel is the only source of truth

You see, Node.js and Java are just languages and their runtimes; they don't know how to write data to disk or how to read it. They don't even know how to create new threads. They don't, but the kernel does. What is the kernel? First of all, I will explain how it works in Linux, because I have no clue what Mac and Windows do (honestly, I am just not interested). The kernel is the heart of Linux. It's a layer between user processes and hardware. User processes are not allowed to work with hardware directly. Then how can we read from or write to a file? Fairly simple: the kernel provides a public API called system calls. User processes invoke these system calls, and the kernel does the actual work. When we write a program in any programming language (unless we are writing our own OS), we create a user process; consequently, our program uses system calls as well. How can we check it? Let's write a small Java program that reads a file called input.txt. This file contains only two words: Hello world

                            
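  import java.io.IOException;
  import java.nio.file.Files;
  import java.nio.file.Paths;

  public class Main {
    public static void main(String[] args) throws IOException {
        Files.readAllBytes(Paths.get("input.txt"));
    }
  }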
                        
                        

To see the system calls we can use a CLI program called strace. Here is an example (I use Java 11, so there is no need to compile the Java file first): strace java Main.java (add -f if you also want strace to follow the threads that the JVM starts). The output will be huge, but we are interested in the last lines. One of them will look like this

                    
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0p>\0\0\0\0\0\0"..., 832) = 832
                

Now we know that reading a file in Java ends up in a system call named read. (The particular line above starts with an ELF header, so it shows a shared library being loaded; the read of input.txt has the same shape, with the file's contents as the buffer.) However, read takes a file descriptor (in my case it was number 3) created by openat, so let's check the manual for openat: man openat. You will notice that openat creates a blocking file descriptor by default. It means that a user thread that uses a file descriptor created by openat has to wait for a response from the kernel; in other words, the thread is blocked. The IO-backed subclasses of Java's InputStream and OutputStream, such as the file and socket streams, use blocking system calls like this one.

File Descriptors

Remember when I told you about the event loop handling incoming requests? What really happens is that a file descriptor is created for each socket connection. In Unix systems everything is a file, including sockets. The OS creates a number called a file descriptor, unique within the process, and uses it as a file identifier (system calls such as read and write take a file descriptor in order to know which file to read from or write to). The number of open file descriptors is capped only by configurable limits (for example ulimit -n), so you can have as many as 10K file descriptors open in parallel; the real obstacle is RAM, because each file descriptor refers to an internal kernel structure that occupies some memory. (Now you can understand how nginx is able to handle 10K parallel requests: it uses file descriptors.)

Epoll

The way Tomcat's NIO connector works is approximately the same as the event loop in Node.js. Then why is Node faster? The tricky part is that Node.js doesn't use blocking calls. Of course you can make them, but it's considered an antipattern. Threads in Node.js don't have to wait for IO and therefore use the CPU more efficiently than the blocked threads of a servlet container. To implement its event loop, Node.js uses a C library called libuv. This library is cross-platform, but on Linux it uses a mechanism known as epoll. Again, you can check the manual page for epoll, but here I will give you a brief explanation of how it works.
Epoll is a set of system calls that allows us to monitor sockets for new data. When we use epoll, our program doesn't have to wait on each socket separately until new data appears. As soon as data from some sockets is ready, the kernel returns a list of file descriptors associated with these sockets. Here is a brief example in C


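  #include <stdio.h>
  #include <sys/epoll.h>

  #define MAX_EVENTS 5 //arbitrary size for this example

  int main(void)
  {
    struct epoll_event events[MAX_EVENTS];
    int running = 1, event_count, i;
    //create epoll
    int epoll_fd = epoll_create1(0);
    //descriptors to watch are registered with epoll_ctl (omitted here)
    //event loop
    while(running)
    {
      printf("\nPolling for input...\n");
      //ask kernel which file descriptors have new data
      //The last argument is the amount of milliseconds to wait
      event_count = epoll_wait(epoll_fd, events, MAX_EVENTS, 30000);
      printf("%d ready events\n", event_count);
      for(i = 0; i < event_count; i++)
      {
        //work with available data
        ...
      }
    }
    return 0;
  }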
                

The event loop asks the kernel whether any sockets have new data. If so, we can read the data using the file descriptor numbers; if no socket becomes ready before the timeout expires, the kernel returns 0, so the main thread never waits on one particular socket. We also have a for loop that works with the data; in this loop we can hand heavy tasks to some sort of thread pool, because we don't want to block the main thread. When a task is completed, we can send a response to the socket using its file descriptor (remember, it's just an id). Under the hood, all event-loop-based servers use these system calls to provide non-blocking behavior. Just for comparison with the Java program that reads a file, let's write the same program in Node.js and run it under strace

                     
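const fs = require('fs');
// asynchronous read: the callback runs once the file contents are available
fs.readFile('read.txt', 'utf8', function (err, data) {
    if (err) throw err;
    console.log(data);
});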
                     
                 

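Again, the output from strace is huge. Keep in mind that libuv hands the file read itself to its thread pool, because regular files can't be polled with epoll; the event loop only waits for the result. As in the C code, we are looking for the following system calls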

  • epoll_create, which creates an epoll instance and returns a file descriptor for it. In my case it looks like this: epoll_create1(EPOLL_CLOEXEC) = 13. As you can see, file descriptor 13 was created.
  • epoll_wait, which waits for events from the kernel: epoll_wait(13, [{EPOLLIN, {u32=16, u64=16}}], 1024, 8100) = 1. The last argument is the maximum time to wait, in milliseconds.

Java NIO and Netty

Starting from Java 1.4, we have a package named NIO (java.nio). Java NIO gives us an API for non-blocking system calls. So, we have the NIO package; could we use it to fix the servlet problem described above, where most worker threads are waiting on blocking system calls? The answer is: it depends. Java is an old language and, as I said before, if you use the InputStream class then your threads will be blocked. And I assure you, you do: the majority of Java libraries use this old class from Java 1.0. Moreover, the JDBC API uses blocking sockets, so all interactions with databases through JDBC are blocking. There are some projects that aim to rebuild database access on top of non-blocking sockets. The one that I contribute to and am most interested in is R2DBC. With this library we can send non-blocking requests to a database and get the response in the form of reactive types (Mono or Flux).
However, I want to mention that the Java ecosystem is huge, and there are a lot of frameworks that don't implement the servlet specification. My favorite one is Vert.x. It resembles Node.js because it's based on exactly the same ideas, but it runs on the Java runtime (in fact, you can use Vert.x with JavaScript; if you are interested, check the docs). Its creator, Tim Fox, wanted to bring a powerful reactive framework to the Java world, and I assure you he did.
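To give a feeling for the raw NIO API that such frameworks build on, here is a minimal sketch of a single-threaded echo server written with java.nio's Selector (on Linux, the JDK's default Selector implementation is backed by epoll). It is not how Tomcat, Vert.x or Netty are written: the class name SelectorEchoSketch and its methods are my own, and for simplicity the loop stops after the first client disconnects and retries writes in a tight loop, which is only acceptable for tiny messages.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

class SelectorEchoSketch {
    // Event loop on the calling thread; returns once the first client disconnects.
    static void serve(ServerSocketChannel server) throws IOException {
        try (Selector selector = Selector.open()) {
            server.configureBlocking(false);
            server.register(selector, SelectionKey.OP_ACCEPT);
            ByteBuffer buffer = ByteBuffer.allocate(1024);
            boolean running = true;
            while (running) {
                selector.select(); // blocks until at least one registered channel is ready
                Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
                while (keys.hasNext()) {
                    SelectionKey key = keys.next();
                    keys.remove();
                    if (key.isAcceptable()) {
                        SocketChannel client = server.accept();
                        if (client == null) continue;
                        client.configureBlocking(false);
                        client.register(selector, SelectionKey.OP_READ);
                    } else if (key.isReadable()) {
                        SocketChannel client = (SocketChannel) key.channel();
                        buffer.clear();
                        if (client.read(buffer) < 0) { // client closed the connection
                            client.close();
                            running = false;
                            continue;
                        }
                        buffer.flip();
                        while (buffer.hasRemaining()) client.write(buffer); // echo back
                    }
                }
            }
        }
    }

    // Starts the loop on a background thread, sends one message as a client, returns the echo.
    static String echoOnce(String message) {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress("127.0.0.1", 0)); // port 0: any free port
            Thread loop = new Thread(() -> {
                try {
                    serve(server);
                } catch (IOException e) {
                    e.printStackTrace();
                }
            });
            loop.start();
            byte[] bytes = message.getBytes(StandardCharsets.UTF_8);
            ByteBuffer reply = ByteBuffer.allocate(bytes.length);
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress())) {
                client.write(ByteBuffer.wrap(bytes));
                while (reply.hasRemaining() && client.read(reply) >= 0) { }
            }
            loop.join(5000);
            return new String(reply.array(), StandardCharsets.UTF_8);
        } catch (IOException | InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(echoOnce("Hello world"));
    }
}
```

The only place where the server thread sleeps is selector.select(), and there it waits for all registered sockets at once, which is the same idea as the epoll_wait loop shown earlier.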

Some words about project Reactor

One of the newest trends in Java backend development is reactive Spring, which is built on Project Reactor. Now that you know the main disadvantage of servlets and blocking streams, it's easy to understand what this project tries to solve. It provides a functional API for writing a non-blocking backend (of course there is more to it, such as back pressure, but the core idea is that code must be non-blocking).

Conclusion

I hope that my explanation was clear enough. I spent an enormous amount of time figuring out how it really works. I want to thank Eli Bendersky for his wonderful blog with its detailed explanation of concurrent servers, and I want to recommend a wonderful book called The Linux Programming Interface. If you are interested in the kernel and its system calls, you will find all the answers in this book.