My JavaScript is Faster than Your Rust

GC your dealloc and let your threads down, let’s talk performance.

I go over some of the performance tradeoffs that are made with different approaches, but the moral of the story is that performance is complicated and very use-case dependent. One of the primary tradeoff considerations is CPU vs memory, but the memory side of that equation can get very complicated.

One of the most enjoyable (at least from my point of view) parts of being a software architect is mentoring developers and helping expose them to new concepts and the larger implications of technical decisions. It's also fun to foster a learning environment by occasionally letting a brash dev fall on their face, a bit of 'paying it forward' from when I was a young, brash developer.

A perfect example is when a green dev challenges your recommendations (the reality of being an architect: you always make the wrong choice in someone else's eyes) and goes on to bet that their approach is the best one. I know far from everything, but I have been around long enough to spot a sucker. How could I resist? I'll take that bet. And then years later I'll write a post about it.

I honestly don’t remember the specifics (it’s been a few years), but I do recall that I recommended using Node.js primarily based on the knowledge set of the existing team, available libraries and other technical debt. A pretty junior dev wanted to show off their fancy new bachelor's in computer science and their ‘mad’ skills. Maybe they knew I only minored in computer science and assumed I was simply unaware of how computers actually worked (to be fair, after ~20 years, I have come to believe they are just magic).

The claim was something along the lines of the standard ‘C++ is faster than JavaScript’, countered by my (stereotypically architect) response of: it depends. Or probably more specifically, ‘optimized C++ will perform better than optimized JavaScript’ as there is inescapable overhead to running JavaScript (well, you could probably compile it down to a static program and get similar performance if you really, really tried). Needless to say, I like a good challenge.

The ‘surprise’ was that the JavaScript solution was a bit faster than the C++ program and (more importantly from an architectural point of view) had the benefit that it was fully maintainable by the existing team. Let the choir hold their tongues and scratch their heads. TBH, I wasn’t 100% sure that it would win, but based on this specific use case’s likely dependence on dynamically sized memory objects and the developer’s inexperience, I took an educated guess.

If you can’t guess why, don’t worry. In my experience, most devs wouldn’t know why either. The result flies in the face of the common rule of thumb that ‘compiled’ languages are faster than ‘interpreted’ ones, and ‘static’ programs faster than ‘VM’ programs. But that is just a rule of thumb.

‘Optimized’ is the key word in my above retort, as a naive C++ program can quickly go off the rails. On the other hand, Node.js (leveraging the C++/C based V8 & libuv libraries) has made a lot of strides with optimizing dumb JS to run fast, meaning there are cases where naive JS can beat naive C++. But it’s obviously more complicated than that.

Most developers should be familiar with the ideas of stacks and heaps, but many don’t get deeper than surface level characteristics like a stack is linear and a heap is a pile with pointers (or something like that). They also probably missed that these are just concepts (and there are other approaches) with multiple implementations. Low-level hardware typically doesn’t know what the hell a ‘heap’ is as software defines how memory is managed*, and the choices made can have massive impacts on the performance characteristics of the final program.
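To make the stack/heap distinction concrete, here is a minimal Rust sketch (my own illustration, not code from the bet) showing the same 64 bytes held two ways:

```rust
// Illustrative sketch: the same data on the stack vs. on the heap.
fn stack_vs_heap() -> (usize, usize) {
    // Stack: size known at compile time, freed automatically when the
    // function's frame is popped; no allocator is involved at all.
    let on_stack: [u8; 64] = [7; 64];

    // Heap: dynamically sized; the Vec's buffer is requested from the
    // allocator, while only its (pointer, length, capacity) triple
    // lives on the stack.
    let mut on_heap: Vec<u8> = Vec::with_capacity(64);
    on_heap.extend_from_slice(&on_stack);

    (on_stack.len(), on_heap.len())
}

fn main() {
    let (s, h) = stack_vs_heap();
    println!("stack: {} bytes, heap: {} bytes", s, h);
}
```

The stack array costs essentially nothing to 'allocate'; the Vec is a round trip through whatever allocator the program links in, which is exactly where the performance story below begins.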

*There is a whole rabbit hole that you can (and possibly should) climb down. Kernels can get complicated and modern hardware is far from dumb and often can include a number of special purpose optimizations that may leverage high-level memory layouts in their optimizations. This can mean that software can (or be forced to) delegate to memory management features provided by the hardware. And this doesn’t even start to cover virtualization…

Sure, the Node.js solution takes longer to start as it has to load and run the script through its JIT compiler, but once it’s loaded it has a secret edge. It’s garbage collected.

In the C++ program, on the other hand, the app routinely created dynamically sized objects in the heap and then deleted them. This meant that the program’s allocator had to allocate and deallocate memory in the heap over and over again. This is generally not a fast operation, and depends heavily on what algorithm is used in the allocator. In many cases, dealloc is particularly slow and sanitized allocs aren’t the cheapest either.
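The churn pattern described above looks roughly like this (sketched in Rust rather than C++, sizes made up for illustration): every iteration requests a differently sized buffer and frees it immediately, hammering the allocator on both ends.

```rust
// Naive per-iteration heap churn: each loop allocates a fresh,
// dynamically sized buffer and deallocates it at the end of the
// iteration, so the allocator does a full alloc/dealloc round trip
// every single time.
fn churn(iterations: usize) -> u64 {
    let mut total = 0u64;
    for i in 0..iterations {
        // Allocation size varies per iteration, stressing the allocator.
        let buf: Vec<u8> = vec![1u8; (i % 1024) + 1];
        total += buf.iter().map(|&b| b as u64).sum::<u64>();
        // `buf` is dropped (deallocated) here, every iteration.
    }
    total
}

fn main() {
    println!("{}", churn(10_000));
}
```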

For the Node.js program, the cheat occurs because the program runs once and exits. Node.js runs the script, allocates all the memory needed but actual removals are postponed for the garbage collector to take care of (preferably during idle time). Now garbage collection isn’t inherently better or worse than other memory management strategies (tradeoffs, tradeoffs, tradeoffs…), but in the case of this particular program it proved beneficial as it never actually ran. We threw a bunch of objects into memory, then just dumped the whole lot all at once when we left.

This does come at a cost: the Node.js process uses significantly more memory than the C++ program. It’s the classic ‘less CPU = more memory vs. less memory = more CPU’ tradeoff, but it was a good trade in order to win a bet.

And the bet only worked because the developer chose a naive strategy and implemented it correctly. A quick way to win would have been to add a memory leak, purposely keeping all allocations in memory. It would likely still use less memory than Node.js and be significantly faster. Or you could use things like stack-allocated buffers to boost performance even more and still be ‘production-ready’.
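The 'leak on purpose' trick can be sketched in a few lines of Rust (again my own illustration): `Box::leak` hands back a `'static` reference, so the allocator never sees a matching dealloc, allocation stays cheap, and the OS reclaims everything at process exit.

```rust
// A deliberately leaky strategy: never free, just keep allocating.
// Box::leak converts a heap allocation into a &'static mut reference,
// so no dealloc is ever issued; the OS cleans up at process exit.
fn leak_everything(n: usize) -> usize {
    let mut total = 0;
    for i in 0..n {
        let buf: &'static mut Vec<u8> = Box::leak(Box::new(vec![0u8; i + 1]));
        total += buf.len();
    }
    total
}

fn main() {
    println!("leaked {} bytes total", leak_everything(100));
}
```

This is essentially what the Node.js program got for free: allocations with no matching frees, settled in one lump sum at exit.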

This also brings up an issue with benchmarks: they often rely on a single metric like ops/s. This JS vs C++ story is a perfect example of why knowing the total cost of performance is important before making choices. In software architecture you are concerned with the ‘total cost of ownership’ of all the choices made.

Rust is one of my go-to languages nowadays. It’s got a lot of great modern features, is fast and has a great memory model that leads to generally safe code. It has drawbacks for sure, compile time is still an issue and it’s got some weird semantics here and there, but in general I highly recommend it. You can have a lot of control over how memory is managed in Rust, and its ownership model is what creates its trademark safety.

One of the projects I’m currently working on is a FaaS (Function-as-a-Service) host written in Rust that executes WASM (WebAssembly) functions. It’s designed to securely execute isolated functions very quickly, minimizing the overhead of using FaaS. And it is pretty fast, able to get 90k clean requests per second per core. Better yet, it can do that with a total reference memory footprint of ~20MB.

What does this have to do with Node.js and C++? Well, I use Node.js as my benchmark for ‘reasonable’ performance (Go is the ‘dream’ target; it’s hard to compete with a language designed for web services while adding the overhead of FaaS), and early versions of the program weren’t promising (even though they used less than 10% of the memory of Node.js). While it’s common to focus on ‘getting something working’ before optimizing, it’s not a great feeling to put in a ton of work using a ‘fast’ language only to get beat by novice JavaScript.

The bottleneck was pretty clear from early on, however: it was the memory management. Each guest function was allocated an array of memory, but there was a lot of overhead both in allocating within the function and in copying data between the function’s memory and the host’s. Because of the dynamic data being thrown around, the allocator was being hammered from all directions. The solution: cheat (sort of).

Fundamentally, a heap is just a region of memory whose mapping an allocator manages. The program requests N units of memory and the allocator finds them in its available memory pool (or requests more memory from the host), records that those units are in use and then returns the location pointer of that memory. When the program is done with that memory, it tells the allocator, and the allocator updates its mapping to mark those units as available again. Simple, right?
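A toy version of that bookkeeping, a first-fit allocator over a fixed pool, fits in a few dozen lines of Rust (a sketch for intuition only; real allocators are far more sophisticated):

```rust
use std::collections::BTreeMap;

// Toy first-fit allocator over a fixed pool of `pool_size` units.
// Used ranges are tracked as offset -> length in a sorted map.
struct ToyAllocator {
    pool_size: usize,
    used: BTreeMap<usize, usize>,
}

impl ToyAllocator {
    fn new(pool_size: usize) -> Self {
        Self { pool_size, used: BTreeMap::new() }
    }

    // First-fit: walk the used ranges in order, looking for a gap of
    // at least `n` units; fall back to the tail of the pool.
    fn alloc(&mut self, n: usize) -> Option<usize> {
        let mut cursor = 0;
        for (&off, &len) in &self.used {
            if off - cursor >= n {
                break; // found a gap big enough before this range
            }
            cursor = off + len;
        }
        if self.pool_size - cursor >= n {
            self.used.insert(cursor, n);
            Some(cursor)
        } else {
            None // pool exhausted (or too fragmented past the cursor)
        }
    }

    // Freeing is just forgetting the range; the gap becomes reusable.
    fn dealloc(&mut self, offset: usize) {
        self.used.remove(&offset);
    }
}

fn main() {
    let mut heap = ToyAllocator::new(100);
    let a = heap.alloc(10); // Some(0)
    let b = heap.alloc(5);  // Some(10)
    heap.dealloc(a.unwrap());
    // The freed gap at offset 0 is found again by the first-fit scan.
    println!("a={:?} b={:?} refill={:?}", a, b, heap.alloc(8));
}
```

Even in this toy, you can see the cost: every `alloc` is a scan over live bookkeeping, and that scan gets worse as the map fragments.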

The issues arise when you allocate a bunch of different-sized units of memory with different lifetimes: you end up with a lot of fragmentation that amplifies the cost of allocating new memory. This is where the performance penalty kicks in, as the allocator is basically its own program just to figure out where to store things. Obviously there isn’t one solution to this problem; there are a lot of different allocation algorithms, from buddies to slabs to blocks. Each approach has tradeoffs, meaning you can choose which one fits your use case best (or just choose the default one like most people do).

Now for the cheating: you don’t have to choose just one approach. And for FaaS, you can go lax on the per-run dealloc and just clear the whole heap after each run. You can also use different allocators for different parts of the function lifecycle, e.g. init vs run. This allows a function to be either clean (reset to the exact same memory state each run) or stateful (maintaining state between runs), with each case optimized by a different memory strategy.

For my FaaS project, we ended up building a dynamic allocator that chooses the allocation algorithm based on usage and that choice persists between runs. For ‘low-usage’ functions (seemingly the majority of functions thus far), a naive stack allocator is used that just maintains a single pointer to the next free slot. When dealloc is called, if the unit is the last one on the stack it will just roll back the pointer, otherwise it is a noop. When the function has completed, the pointer is set to 0 (like Node.js exiting before GC). If the function hits a certain number of failed deallocs and a certain usage threshold, a different allocation algorithm is then used for the remainder of calls. The result is very fast memory allocation in the majority of cases.
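The 'low-usage' strategy above can be sketched like this (my simplified reconstruction, not the project's actual allocator): a bump allocator that only rolls back when the freed block is the most recent allocation, counts out-of-order frees toward a switch-strategy threshold, and resets wholesale at the end of a run.

```rust
// Sketch of a naive stack/bump allocator: allocation is a pointer
// bump, dealloc only reclaims the most recent allocation, and a
// whole-heap reset stands in for 'process exit before GC'.
struct BumpAllocator {
    next: usize,          // offset of the next free slot
    failed_deallocs: u32, // out-of-order frees turned into no-ops
}

impl BumpAllocator {
    fn new() -> Self {
        Self { next: 0, failed_deallocs: 0 }
    }

    // O(1) allocation: hand out the current offset, bump the pointer.
    fn alloc(&mut self, size: usize) -> usize {
        let offset = self.next;
        self.next += size;
        offset
    }

    // Only the last allocation can actually be reclaimed; anything else
    // is a no-op counted toward the 'switch allocators' threshold.
    fn dealloc(&mut self, offset: usize, size: usize) {
        if offset + size == self.next {
            self.next = offset;
        } else {
            self.failed_deallocs += 1;
        }
    }

    // End of a clean run: drop the entire heap at once.
    fn reset(&mut self) {
        self.next = 0;
    }
}

fn main() {
    let mut bump = BumpAllocator::new();
    let a = bump.alloc(16);
    let b = bump.alloc(8);
    bump.dealloc(b, 8);  // last allocation: pointer rolls back
    bump.dealloc(a, 16); // now `a` is last: rolls back too
    println!("next = {}, failed = {}", bump.next, bump.failed_deallocs);
}
```

Both the happy path and the failure path are a comparison and at most one store, which is why this wins so decisively for short-lived, mostly LIFO workloads.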

There is also another ‘heap’ used in the runtime: the host/function shared memory. It uses the same dynamic allocation strategy and allows writing directly to the function’s memory, bypassing the copy step from early versions. This means I/O gets copied directly from the kernel to the guest function, bypassing the host runtime and significantly improving throughput.

After optimizations, the Rust FaaS runtime ended up being >70% faster while using >90% less memory than our reference Node.js implementation. But the key is ‘after optimizations’: the initial implementation was slower. And it did require placing some restrictions on the WASM functions to work, though those are transparently applied during compilation with rare incompatibilities.

The major benefit of the Rust implementation is the low memory footprint: all the extra RAM can be used for things like caching and distributed in-memory stores. That means it can be even faster in production by reducing I/O overhead, which is probably a bigger win than the modest CPU performance gains.

We do have more optimizations slated, but they mostly involve changes to the host layer that have major security implications. They also aren’t directly related to memory management performance, but they do give plenty of fodder for the ‘Rust is faster than Node’ camp.

Not really sure. I guess a couple of points:

At the end of the day you’ve got to choose the best tech for your situation and it’s rarely a simple answer, but understanding the different characteristics of different stacks can surely help.

Cheers!
