11mo ago

Tool for instantiating a C++ template at runtime?

I'm working on a query engine, essentially a tool to scan, filter, annotate by lookups, group by, and aggregate a large dataset in the tens-of-terabytes range. The compute part seems to be the bottleneck for me (I'll be doing around 80-300 GB/s of reads, and yes, I will have hardware capable of providing that kind of throughput). My hypothesis is that by encoding the query in the form of template arguments I can make the compiler generate code optimized for a specific type of query (for example, for the filtering or aggregation keys). But I do not know what queries users will send, so I need a way to instantiate templates at runtime.
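To make the idea concrete, here is a minimal sketch of a query encoded as template arguments. The `Row` layout and the `RegionEquals`, `SumAmount`, and `scan` names are all made up for illustration; a real engine would likely work on columnar batches rather than a vector of rows.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical row layout, for illustration only.
struct Row {
    std::int32_t region;
    std::int64_t amount;
};

// Filter policy: the comparison constant is a template argument, so the
// compiler sees it as a compile-time constant inside the scan loop.
template <std::int32_t Region>
struct RegionEquals {
    static bool matches(const Row& r) { return r.region == Region; }
};

// Aggregation policy: sums the `amount` column.
struct SumAmount {
    using State = std::int64_t;
    static State init() { return 0; }
    static void accumulate(State& s, const Row& r) { s += r.amount; }
};

// The query shape (filter + aggregate) is fixed at compile time, so both
// policy calls can be inlined into a single loop.
template <typename Filter, typename Aggregate>
typename Aggregate::State scan(const std::vector<Row>& rows) {
    typename Aggregate::State state = Aggregate::init();
    for (const Row& r : rows) {
        if (Filter::matches(r)) {
            Aggregate::accumulate(state, r);
        }
    }
    return state;
}
```

A call such as `scan<RegionEquals<1>, SumAmount>(rows)` is one instantiation. The catch is that `RegionEquals<1>` has to be known when the code is compiled, which is exactly the problem with user-supplied queries.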

Sounds simple: for a new type of query, invoke a compiler at runtime to build a dynamic library with a new instantiation, then dynload it and off we go. Some prior work is here, though I'm pretty sure any JIT compiler also counts here. But there are enough technical details to worry about, and at the same time this idea isn't novel, so I wonder: are there any packaged solutions for this kind of approach?
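A rough sketch of the first half of that pipeline, generating the source for one instantiation and the command to build it, might look like the following. Everything named here is an assumption: `query_templates.hpp`, the `scan<Filter, Aggregate>(rows, n)` signature, the exported `run_query` symbol, and the `c++` compiler driver being on the PATH. Loading the result would then be a matter of `dlopen` on the library and `dlsym` on `run_query`.

```cpp
#include <string>

// Emits a translation unit that instantiates one query and exposes it under
// an unmangled C symbol so it can be looked up with dlsym after loading.
// Assumption: "query_templates.hpp" (hypothetical) defines Row, scan<>, and
// the filter/aggregate policy types named by the two arguments.
std::string make_instantiation_source(const std::string& filter_type,
                                      const std::string& aggregate_type) {
    std::string src;
    src += "#include \"query_templates.hpp\"\n";
    src += "extern \"C\" long long run_query(const Row* rows, unsigned long long n) {\n";
    src += "    return scan<" + filter_type + ", " + aggregate_type + ">(rows, n);\n";
    src += "}\n";
    return src;
}

// Builds the shell command that compiles the generated file into a shared
// library. Assumption: a GCC- or Clang-compatible driver is installed as `c++`.
std::string make_compile_command(const std::string& source_path,
                                 const std::string& library_path) {
    return "c++ -std=c++17 -O3 -shared -fPIC " + source_path + " -o " + library_path;
}
```

Since the type names are pasted straight into source code, they should come from a fixed whitelist built by the query planner, never from raw user input.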

7 comments

If the result sets are that large, you have enough time to generate a program and compile it, though as you say a more streamlined approach is to use a JIT library. Postgres uses LLVM for that, or depending on the workload, maybe you could even benefit from using a GPU. None of this is new.
- Fascinating, thanks for pointing that out https://www.postgresql.org/docs/current/jit-reason.html

build a dynamic library with a new instantiation, then dynload it and off we go
I haven't played around with the internals of C++ myself, but isn't that a one way thing? Wouldn't you need to be able to "unload" a query after you're done with it?
Personally I think child processes are the right approach for this. Launch a new process* for each query and it can (if you choose to go that route) dynamically load in compiled code. Exit when you're done, and the dynamically loaded code is gone. A side benefit of that is memory leaks are contained, since all memory you allocate is about to be removed anyway.
(*) On most operating systems launching a new process is a bit slow, so you likely wouldn't want to do that when the query is requested. Instead you'd maintain a pool of processes that are running and ready to receive a query. That's how HTTP servers are often configured to run. The size of the process pool is generally limited by how much memory each process needs. Is it 1MB per process? 2GB?
Honestly, I wonder if you could just use an actual HTTP server for this? They can handle hundreds or even thousands of simultaneous requests. They can handle requests that complete in a fraction of a millisecond or ones that run for several hours. And they have good tools to catch/deal with code that segfaults, hits an endless loop, attempts to allocate terabytes of swap, etc. HTTP also has wonderful tools to load balance across multiple servers if you do need to scale to massive numbers of requests.
I would also seriously consider using JavaScript instead of C++. I hate JavaScript... but modern JavaScript JIT compilers are really special... they apply compiler optimisations AT RUNTIME. So a loop will compile to different machine code if it iterates three times vs three million times. The code is literally recompiled on the fly when the JIT compiler detects a tight loop. Same thing with a function that's called over and over again - it will be inlined if inlining is appropriate.
As flexible as your system sounds, I suspect runtime optimisations like that would provide real performance advantages. Well optimised C++ code is faster than JavaScript, but you're probably not always going to generate well optimised code.
JavaScript would also eliminate entire categories of security vulnerabilities. And any time you're generating code on the fly, you really need to be careful about those.
The good news is if you use an HTTP server like I suggested... then you can literally use any language you want, C++, JavaScript, Python, Rust... you can decide on a case by case basis.
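For what it's worth, the process-per-query idea from earlier in this comment can be sketched on POSIX systems roughly like this. It is the simple fork-on-demand variant, not the pre-started pool, and `run_in_child` is a made-up helper name.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Runs `query` in a child process and returns the child's exit code, or -1
// if fork/waitpid fails or the child was terminated by a signal (for example
// a segfault in the generated code).
template <typename Fn>
int run_in_child(Fn query) {
    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        // Child: anything loaded or allocated here is released on exit.
        int code = query();
        _exit(code);
    }
    // Parent: wait for the child and translate its status.
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A pool would keep such children alive and hand them queries over a pipe or socket instead of forking per request.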
- Personally I think child processes are the right approach for this. Launch a new process* for each query and it can (if you choose to go that route) dynamically load in compiled code. Exit when you’re done, and the dynamically loaded code is gone. A side benefit of that is memory leaks are contained, since all memory you allocate is about to be removed anyway.
  I'd probably be fine with hundreds or thousands of these hanging in memory. I suspect the generated code for a single query would be in hundreds of kilobytes, maybe a megabyte. But yeah, this is one of those technical details I'd worry about.
  Honestly, I wonder if you could just use an actual HTTP server for this? They can handle hundreds or even thousands of simultaneous requests. They can handle requests that complete in a fraction of a millisecond or ones that run for several hours. And they have good tools to catch/deal with code that segfaults, hits an endless loop, attempts to allocate terabytes of swap, etc. HTTP also has wonderful tools to load balance across multiple servers if you do need to scale to massive numbers of requests.
  Not sure how an HTTP server would solve the CPU bottleneck of scanning terabytes of data per query?

This doesn't specifically use the template metaprogramming interface for C++, but seems to do what you want regardless. https://github.com/jmmartinez/easy-just-in-time
I've never used the library myself though.
- I somehow didn't think a regular JIT solution might be applicable here, but it is. Thank you! There seem to be a number of projects doing JIT for C++, will look at them.

It sounds like what you are looking for is a form of an object request broker. Provide the name of a class as a string (or, if the set of desired objects is more constrained, an integer or enum or something similar) and then build an instance based on that key. Typically, all these objects inherit from some base class like Object so that the broker can return an Object* and the client can dynamic cast it down to the actual thing. I've used a pattern like this in the past that worked pretty well, using macro magic to make classes eligible to be instantiated through the broker (register the key and the class name with the broker). This was pre-C++03, so doubtless there are cleaner and more modern ways to implement such a thing these days.
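A more modern take on that pattern, without the macros, could look something like the sketch below. `Broker`, `register_class`, `create`, and `SumQuery` are illustrative names, not from any particular library, and every class still has to be compiled in ahead of time.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Common base class so the broker can hand back a single pointer type.
struct Object {
    virtual ~Object() = default;
    virtual std::string name() const = 0;
};

// Maps string keys to factory functions that build the matching object.
class Broker {
public:
    using Factory = std::function<std::unique_ptr<Object>()>;

    void register_class(const std::string& key, Factory factory) {
        factories_[key] = std::move(factory);
    }

    // Returns nullptr when no class is registered under `key`.
    std::unique_ptr<Object> create(const std::string& key) const {
        auto it = factories_.find(key);
        if (it == factories_.end()) {
            return nullptr;
        }
        return it->second();
    }

private:
    std::map<std::string, Factory> factories_;
};

// Example class eligible for instantiation through the broker.
struct SumQuery : Object {
    std::string name() const override { return "SumQuery"; }
};
```

The client registers a lambda per class, calls `create("sum")`, and uses `dynamic_cast` on the returned pointer if it needs the concrete type.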
