D Parsin D Parsin - 2 years ago 134
Scala Question

akka jvm threads vs os threads when performing io

I've searched the site a bit for help understanding this, but haven't found anything super clear, so I thought I'd post my use case and see if anybody could shed some light.

I have a question about the scaling of jvm threads vs os threads when used in akka for io operations. From the akka site:

Akka supports dispatchers for both event-driven lightweight threads, allowing creation of millions threads on a single workstation, and thread-based Actors, where each dispatcher is bound to a dedicated OS thread.

The event-based Actors currently consume ~600 bytes per Actor which means that you can create more than 6.5 million Actors on 4 G RAM.

In this context, can you all help me understand how that matters on a workstation with only 1 processor (for simplicity). So, for my example use case, I want to take a list of say 1000 'Users' and then go query a database (or several) for various information about each user. So if I were to dispatch each of these 'get' tasks to an actor, and that actor is going to do IO, wouldn't that actor block based on the os thread limit for the workstation?

How does the akka actor model give me lift in a scenario like this? I know that I am probably missing something as I am not wildly knowledgeable on the interworkings of vm threads vs os threads, so if one of the smart folks here could spell it out for me, that would be great.

If I use Futures, don't I need to use await() or get() to block and wait for the reply?

In my use case, regardless of actors, would it end up just 'feeling' like I'm making 1000 sequential database requests?

If code snips are useful in helping me understand this, Java would be preferred as I am still coming up to speed on scala syntax - but a nice clear textual explanation of how these millions of threads can interoperate on a single processor machine while doing database IO would be fine too.

Answer Source

It is really hard to figure out what you are actually asking here, but here are some pointers:

  • If you are running on a modern JVM, there is typically a one-to-one relationship between Java threads and OS threads. (IIRC, Solaris allows you to do this differently ... but that's the exception.)

  • The amount of real parallelism you will get using threads, or anything built on top of threads is limited by the number of processors / cores that are available to the application. Beyond that, you will find that not all threads are actually executing at any given instant.

  • If you have 1000 Actors all trying to access the database "at the same time", then most of them will actually be waiting on the database itself, or on the thread scheduler. Whether this amounts to making 1000 sequential requests (i.e. strict serialization) will depend on the database and the queries / updates that the actors are doing.

The bottom line is that a computer system has hard limits on the resources available for doing stuff; e.g. number of processors, speed of processors, memory bandwidth, disc access times, network bandwidth, etc. You can design an application to be smart about the way it uses available resources, but you can't get it to use more resources than there actually are.

On reading the text that you quoted, it seems to me that it is talking about two different kinds of actors:

Thread-based actors have a 1 to 1 relationship with threads. There's no way you could have millions of this kind of actor in 4Gb memory.

Event-based actors work differently. Instead of having a thread at all times, they would mostly be sitting in a queue waiting for an event to happen. When that happened, an event processing thread would grab the actor from the queue and execute the "action" associated with the event. When the action finished, the thread moves onto another actor / event pair.

The quoted text is saying that the memory overhead of an event-based actor is ~600 bytes. They don't include the event thread ... because the event thread is shared by multiple actors.

Now I'm not an expert on Scala / Actors, but it is pretty obvious that there are certain things that you should avoid when using event-based actors. For instance, you should probably avoid talking directly to an external database because that is liable to block the event processing thread.

Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download