Project Loom. How are virtual threads different?
I want to start this year with a blog post about Project Loom, a new JEP from Oracle that will bring virtual threads (or coroutines) to the JVM. That's huge, and people have been waiting for it for ages. Right now the Java ecosystem provides a few frameworks for working with sockets in a non-blocking way, such as Project Reactor. I have nothing against Reactor; its authors built a good developer experience that lets regular devs use a non-blocking API. However, there is one big drawback in Reactor that I see: once you start writing a backend in a reactive style, you can't stop. All blocking calls have to be rewritten using the non-blocking API, which is a completely new coding style with its own hidden surprises. Of course, all blocking calls can be delegated to a dedicated thread pool, but in that case we are back to the standard thread-per-request model. But this blog post is not about Reactor, it's about Loom. I spent quite a lot of time trying to understand what Oracle means by "Virtual Threads" and how they can solve the problems we face using the Servlet API (Spring MVC). After endless hours of reading mailing lists and trying Loom on my local machine, I think I finally got it. Before I start: all opinions here are my own and can be (and probably are) wrong, so always double-check on your own, and if you see any mistakes please let me know so this blog can become better. So let's start.
The current state of blocking IO
Let's talk about a regular Java backend. You want to write a service that solves a particular business problem. You generate a new Spring Boot project that uses Tomcat under the hood, then you create a database schema and use Hibernate as the ORM solution, but instead of working with the EntityManager directly you delegate the heavy lifting to Spring Data. Lastly, there are probably some other networking services involved, so your backend makes HTTP requests to send emails, make payments and so on. That's a typical Java-based backend, and we are all familiar with how it works.
Under the hood, Tomcat uses a non-blocking, event-loop-based thread that accepts new connections from users and then delegates them to a fixed-size thread pool that executes the business logic. Business logic almost always involves IO interaction with remote services such as a cache (Redis, Memcached), a database (Postgres), or third-party services (Stripe). Most of the time the networking code is written in a blocking way, whether it's RestTemplate sending an HTTP request to a service or JdbcTemplate fetching some data from the database. Blocking calls will block a thread from Tomcat's thread pool, and the only way to increase the throughput of the backend is to increase the number of threads in the pool. More threads means more context switching (which is actually fast enough in recent Linux kernels, which use a fair scheduler) and more memory usage, because each thread needs memory for its stack (this value varies, but on my Ubuntu 20.04 the default is 8 MB per thread; you can check the stack size limit by running ulimit -s in your shell). As you can see, increasing the number of threads doesn't scale, so to solve the throughput problem we can use an asynchronous style to work with sockets.
Blocking IO in Java is represented by two main classes, InputStream and OutputStream, and their subclasses. Let's say we want to read some data from a database using JDBC: the call stack of the program will eventually end up in the read method of InputStream. Here is the interface for this method.
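For reference, the relevant declarations, excerpted (signatures only) from java.io.InputStream, look like this:

```java
// Excerpted from java.io.InputStream (signatures only):

// Returns the next byte of data, or -1 at end of stream.
// Note that the contract says nothing about HOW the byte is obtained.
public abstract int read() throws IOException;

// Bulk variant: reads up to len bytes into b and returns how many were actually read.
public int read(byte[] b, int off, int len) throws IOException;
```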
Remember, read doesn't specify how it should be implemented; it just tells you that it will return data from the stream. That didn't satisfy my curiosity, so I used strace (a CLI tool that records the system calls your program makes) to see what my Linux machine is actually doing when read is called. So let's say we have a simple program that reads a file's content, sketched below.
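Something along these lines would do (a minimal sketch; the file name is just a placeholder):

```java
import java.io.FileInputStream;
import java.io.InputStream;

public class Main {
    public static void main(String[] args) throws Exception {
        // Read the whole file through a plain, blocking InputStream
        try (InputStream in = new FileInputStream("test.txt")) {
            byte[] content = in.readAllBytes();
            System.out.println(new String(content));
        }
    }
}
```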
readAllBytes uses InputStream#read() under the hood. The next step is to compile this Java file with javac Main.java, and finally to run this command in your shell: strace -o output.txt -f ./bin/java Main.
All system calls that Main.java makes were saved into the output.txt file. If you open the file, the output will be huge because strace also tracks all the syscalls the JVM makes on startup, but I want you to focus on these few lines.
Let's go over the syscalls line by line:
- First we call openat to open the file. It returns a file descriptor (number 4 in this case), a number that can be used to refer to this particular file in the filesystem.
- Next we read the content of the file via that file descriptor.
- Lastly we release the file descriptor back to the OS. The number of file descriptors is limited, so you must give them back once you are done working with a file (or socket); that's basically why we have the try/finally block in Java, to reliably release file descriptors and other resources we no longer need.
The main syscall here is read, declared in C as ssize_t read(int fd, void *buf, size_t count), which accepts 3 parameters:
- fd - the file descriptor
- buf - the buffer where the content will be stored
- count - the number of bytes we want to read
Reading the manual page for the read call, I didn't find any references to blocking the calling thread, so I started googling and found this article from LWN.net, which states:
So a call to read() on a normal file descriptor can always block; most of the time this blocking causes no difficulties, but it can be problematic for programs that need to always be responsive
Now we know the problem: InputStream uses blocking system calls to read data through a file descriptor. How is this relevant to our backend service? Well, when you make a networking call to the database or to another service, Linux uses the socket syscall, which creates a file descriptor for the socket; the read method then reads the incoming traffic through that file descriptor in the same blocking manner, as the sketch below shows.
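To make that concrete, here is a minimal sketch of a blocking socket read; the host and the hand-written HTTP request are only placeholders for whatever RestTemplate or a JDBC driver would do:

```java
import java.io.InputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class BlockingSocketRead {
    public static void main(String[] args) throws Exception {
        // socket() + connect() under the hood: the kernel hands back a file descriptor
        try (Socket socket = new Socket("example.com", 80)) {
            socket.getOutputStream().write(
                    "GET / HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n"
                            .getBytes(StandardCharsets.US_ASCII));

            // read() on that file descriptor: the calling thread is blocked
            // until the peer actually sends some bytes
            InputStream in = socket.getInputStream();
            byte[] buffer = new byte[8192];
            int n = in.read(buffer);
            System.out.println(new String(buffer, 0, Math.max(n, 0), StandardCharsets.UTF_8));
        }
    }
}
```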
I think you get the idea: to make non-blocking reads we need another syscall.
Meet epoll
I already talked about epoll in my previous blog, where I compared Java with Node.js, but I didn't show how it can be used from Java code. Let's see an example by writing a small program that downloads a video from YouTube in a non-blocking way using async-http-client. Here is the code.
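A minimal sketch of such a program might look like this (the URL is a placeholder, and I assume the org.asynchttpclient artifact is on the classpath):

```java
import java.util.concurrent.CompletableFuture;

import org.asynchttpclient.AsyncHttpClient;
import org.asynchttpclient.Dsl;
import org.asynchttpclient.Response;

public class Download {
    public static void main(String[] args) throws Exception {
        try (AsyncHttpClient client = Dsl.asyncHttpClient()) {
            // execute() is non-blocking; toCompletableFuture() bridges the result
            // to a java.util.concurrent.CompletableFuture
            CompletableFuture<Response> future = client
                    .prepareGet("https://example.com/video.mp4") // placeholder URL
                    .execute()
                    .toCompletableFuture();

            // Block the main thread until the response has been fully received
            Response response = future.join();
            System.out.println(response.getStatusCode() + ", "
                    + response.getResponseBodyAsBytes().length + " bytes");
        }
    }
}
```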
In essence, we are making an HTTP call that returns a CompletableFuture (Java's way of defining a Promise), then we block the main thread by calling join. Let's see what strace can tell us about this code (you can find the complete code sample here). Here is the output I got.
A lot of new syscalls; let's walk through each of them:
- epoll_create1 - creates an epoll instance; in this case the kernel created one with fd 20
- eventfd2 - creates a file descriptor for event notification; in this case the fd is 23
- epoll_ctl - tells the kernel which file descriptors you are interested in; here we register fd 23
- epoll_wait - waits for updates on the file descriptors you registered
Just a reminder: the Linux kernel is written in C, so the C equivalent of this program would look like this.
As you can see, the C version has a single thread that fetches available file descriptors in an infinite loop (an event loop) without blocking the underlying thread; the kernel does all the heavy work for us. async-http-client does a great job of giving us an abstraction over epoll, but still, most developers are not familiar with this programming style. Moreover, writing code in a sequential manner is way easier than working with callbacks and Futures. Here comes the first problem that Project Loom tries to solve: allowing Java developers to write asynchronous code in a sequential way.
Legacy software can't take advantage of Async API
OK, let's say you know how to write efficient code in an asynchronous style, and you are willing to start your new startup with something like Project Reactor. Great, but keep in mind that Java is old and most Java-based backends were written in a blocking way: the Servlet API uses the thread-per-request model, and Hibernate uses JDBC under the hood, while JDBC uses a blocking InputStream to read data from the wire. Wouldn't it be nice if the JVM runtime could detect all blocking calls and replace them with async epoll-based equivalents? This is exactly what Golang did with goroutines. Here is an example of an HTTP request written in Go.
It's similar to the HTTP request you would make with RestTemplate (a blocking sketch is shown below); the only difference is that RestTemplate uses the read syscall while Go uses epoll.
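For comparison, a minimal blocking sketch with Spring's RestTemplate (the URL is a placeholder):

```java
import org.springframework.web.client.RestTemplate;

public class BlockingClient {
    public static void main(String[] args) {
        RestTemplate restTemplate = new RestTemplate();

        // getForObject blocks the calling thread: under the hood the response body
        // is consumed through a blocking InputStream, i.e. the read syscall
        String body = restTemplate.getForObject("https://example.com", String.class);
        System.out.println(body == null ? 0 : body.length());
    }
}
```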
Go doesn't force you to write your code with async callbacks in mind; the runtime is smart enough to detect (when possible) syscalls that would block your thread and replace them with async equivalents.
Meet Project Loom
Project Loom is an attempt from Oracle to make the JVM runtime smarter by introducing what are called virtual threads. I personally found this expression a little vague. In reality, with Project Loom, the JVM runtime will replace blocking calls with non-blocking equivalents, much the same way Golang does. You don't have to replace your Servlet API code with Project Reactor to use async sockets if the JVM runtime can do it for you. Here is a small example (the source code is available here) that shows how Project Loom is going to work.
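A minimal sketch in that spirit, a LoomTest.java that does the same kind of blocking socket read as earlier but on a virtual thread (host and request are placeholders):

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class LoomTest {
    public static void main(String[] args) throws Exception {
        // Thread.startVirtualThread is a preview API, hence --enable-preview below
        Thread vt = Thread.startVirtualThread(() -> {
            try (Socket socket = new Socket("example.com", 80)) {
                OutputStream out = socket.getOutputStream();
                out.write("GET / HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n"
                        .getBytes(StandardCharsets.US_ASCII));
                out.flush();

                // A "blocking" read: on a virtual thread the JDK parks the virtual thread
                // and waits for readiness via its NIO poller instead of blocking an OS thread
                InputStream in = socket.getInputStream();
                byte[] buffer = new byte[8192];
                int n;
                while ((n = in.read(buffer)) != -1) {
                    System.out.write(buffer, 0, n);
                }
                System.out.flush();
            } catch (Exception e) {
                e.printStackTrace();
            }
        });
        vt.join();
    }
}
```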
Next we need to compile it with a Loom build of OpenJDK (you can download it from here) with preview features enabled: javac --enable-preview --release 19 LoomTest.java. Finally, let's run it in the background and save thread dumps.
Some things to notice:
- The JVM didn't use the read method of InputStream.
- The JVM replaced the actual socket implementation with an asynchronous one from the java.nio package.
- The EPoll.wait method was used, which just issues the epoll_wait syscall.
This is what Project Loom is all about. Instead of rewriting Java programs in a new programming paradigm, developers can just use virtual threads and the JVM runtime will replace all blocking calls with non-blocking ones. There are also changes in the JDK standard library itself; for example, many of the Input/Output stream implementations were rewritten to replace the synchronized keyword with explicit locks, because a virtual thread that blocks inside a synchronized block stays pinned to its carrier (OS) thread and cannot be unmounted, which defeats the purpose of virtual threads.
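As a hypothetical illustration of that pinning behaviour (not from the original post): a virtual thread that blocks while holding a monitor keeps its carrier thread busy.

```java
public class PinningDemo {
    private static final Object LOCK = new Object();

    public static void main(String[] args) throws Exception {
        Thread vt = Thread.startVirtualThread(() -> {
            synchronized (LOCK) {
                try {
                    // Blocking while holding a monitor "pins" the virtual thread:
                    // its carrier (OS) thread cannot be released for the whole second
                    Thread.sleep(1000);
                } catch (InterruptedException ignored) {
                }
            }
        });
        vt.join();
    }
}
```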
What about Servlet Containers?
Most Java servers are built on the ServerSocket class, which simply listens for new connections on a given port. Here is a small example of how you can implement an HTTP server using the thread-per-request model.
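A minimal sketch of such a server (the port, pool size and canned response are arbitrary):

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class Server {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(10);
        try (ServerSocket serverSocket = new ServerSocket(8080)) {
            while (true) {
                Socket socket = serverSocket.accept(); // wait for a new connection
                pool.submit(() -> handle(socket));     // one task (thread) per request
            }
        }
    }

    private static void handle(Socket socket) {
        try (socket) {
            // Blocking read of whatever the client sends first
            InputStream in = socket.getInputStream();
            byte[] buffer = new byte[8192];
            int read = in.read(buffer);
            System.out.println(new String(buffer, 0, Math.max(read, 0), StandardCharsets.UTF_8));

            OutputStream out = socket.getOutputStream();
            out.write("HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok"
                    .getBytes(StandardCharsets.US_ASCII));
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
```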
A user creates a connection and the server submits the task to a dedicated pool. As you can see, we are using InputStream to read the data from the user. The read method is blocking, as I said before, so a thread from the pool can't do anything else while waiting for the user to finish the input. But what if we change the pool into a virtual one (here is the source code)? var pool = Executors.newVirtualThreadPerTaskExecutor()
In this case, the JVM runtime will switch to non-blocking sockets without you changing any other lines of code (except for the Executor implementation). If you run this server and load test it with a bunch of telnet clients, there will be only one thread handling the IO part. Here is the view from JVisualVM after opening 3 telnet sessions to the server.
The read and write pollers are epoll-related threads; all the logic runs within the main thread without using any additional threads.
Finale
I hope this was an interesting read for you, and that you now have a better understanding of what virtual threads are actually meant to be and how they work internally at the syscall level. Cheers!