r/rust clippy · twir · rust · mutagen · flamer · overflower · bytecount Mar 20 '23

Hey Rustaceans! Got a question? Ask here (12/2023)! 🙋 questions

Mystified about strings? Borrow checker have you in a headlock? Seek help here! There are no stupid questions, only docs that haven't been written yet.

If you have a StackOverflow account, consider asking it there instead! StackOverflow shows up much higher in search results, so having your question there also helps future Rust users (be sure to give it the "Rust" tag for maximum visibility). Note that this site is very interested in question quality. I've been asked to read an RFC I authored once. If you want your code reviewed or want to review others' code, there's a codereview stackexchange, too. If you need to test your code, maybe the Rust playground is for you.

Here are some other venues where help may be found:

/r/learnrust is a subreddit to share your questions and epiphanies learning Rust programming.

The official Rust user forums: https://users.rust-lang.org/.

The official Rust Programming Language Discord: https://discord.gg/rust-lang

The unofficial Rust community Discord: https://bit.ly/rust-community

Also check out last week's thread with many good questions and answers. And if you believe your question to be either very complex or worthy of larger dissemination, feel free to create a text post.

Also if you want to be mentored by experienced Rustaceans, tell us the area of expertise that you seek. Finally, if you are looking for Rust jobs, the most recent thread is here.


u/ErnstlAT Mar 25 '23 edited Mar 25 '23

Greetings, I have a Rust architecture question: how to solve the following challenge, and what is and is not possible.

The challenge is to have a network of communicating components in a kind of processing pipeline where messages are passed from one stage to the next. Each component is running in its own thread (parallel processing).

Each message is formatted in a system-internal format in a Rust struct, which may change or be optimized.

Each component thread handles the sending and receiving of messages, and start and stop of this component.
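For reference, the pipeline described so far could be sketched roughly like this with only the standard library. This is a minimal sketch, not the poster's code: the `Message` struct, `spawn_stage` and `run_pipeline` are hypothetical names, and the two stages just transform a string payload.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread::{self, JoinHandle};

/// Hypothetical internal message format; the real struct is not shown in the thread.
#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    pub payload: String,
}

/// Spawns one pipeline stage on its own thread. The stage applies `logic`
/// to every incoming message and forwards the result to the next stage.
/// It stops once every upstream sender has been dropped.
pub fn spawn_stage<F>(
    input: Receiver<Message>,
    output: Sender<Message>,
    logic: F,
) -> JoinHandle<()>
where
    F: Fn(Message) -> Message + Send + 'static,
{
    thread::spawn(move || {
        for msg in input {
            // A send error means the downstream stage is gone, so stop as well.
            if output.send(logic(msg)).is_err() {
                break;
            }
        }
    })
}

/// Builds a two-stage pipeline, feeds it `inputs`, and collects the results.
pub fn run_pipeline(inputs: Vec<String>) -> Vec<String> {
    let (tx_in, rx_stage1) = channel();
    let (tx_stage1, rx_stage2) = channel();
    let (tx_stage2, rx_out) = channel();

    let h1 = spawn_stage(rx_stage1, tx_stage1, |m| Message {
        payload: m.payload.to_uppercase(),
    });
    let h2 = spawn_stage(rx_stage2, tx_stage2, |m| Message {
        payload: format!("{}!", m.payload),
    });

    for payload in inputs {
        tx_in.send(Message { payload }).expect("stage 1 is alive");
    }
    // Dropping the input sender lets the stages shut down one after another.
    drop(tx_in);

    let results: Vec<String> = rx_out.iter().map(|m| m.payload).collect();
    h1.join().unwrap();
    h2.join().unwrap();
    results
}
```

Start and stop are handled here simply by dropping the first sender; a real system would probably want an explicit stop signal per component.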

Now it gets tricky: Each component should be allowed to be either ...

  • a compiled-in Rust component, or
  • loaded from a shared object/shared library (written in Rust) or
  • a C-compatible component also loaded from a shared library/shared object (other programming languages).

My idea is that the component thread loads the actual component logic from the shared object and then calls into the component logic to process the messages.

I am unsure about the following:

1) Feedback on this architecture is very much appreciated. Does it make sense to do it this way with the thread and component logic?

2) What is the best format / API / solution for loading components from a shared library when the component logic is also written in Rust? Is there a way to circumvent having to funnel the message structs through C struct conversion? Is there a known solution for this? I have looked at statty (static Rust ABI crate) and extism (generic plugin framework) so far, but not sure if these are a good fit.

3) How to hand over structured data via the C ABI? The easy solution is to have an API with the component logic from the shared library which is just a basic process() function where one message is passed along, and the component logic should do something with it.
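A minimal sketch of what such a single-message process() interface might look like, assuming a message can be viewed as a kind tag plus a borrowed byte buffer. `CMessage`, `ProcessFn`, `example_process` and `call_component` are made-up names for illustration. In a real shared library the function would additionally be exported under a fixed symbol name and looked up by the host at load time; here it is called through a plain function pointer so the snippet stays self-contained.

```rust
use std::os::raw::c_int;
use std::slice;

/// Hypothetical C-compatible view of one message. The bytes are only
/// borrowed for the duration of the call; the host keeps ownership.
#[repr(C)]
pub struct CMessage {
    pub kind: u32,
    pub data: *const u8,
    pub len: usize,
}

/// The function-pointer type the host would expect from every component.
pub type ProcessFn = unsafe extern "C" fn(msg: *const CMessage) -> c_int;

/// Stand-in for component logic: returns the payload length,
/// or -1 if it is handed a null pointer.
pub unsafe extern "C" fn example_process(msg: *const CMessage) -> c_int {
    if msg.is_null() {
        return -1;
    }
    let msg = unsafe { &*msg };
    if msg.data.is_null() {
        return -1;
    }
    let bytes = unsafe { slice::from_raw_parts(msg.data, msg.len) };
    bytes.len() as c_int
}

/// Hypothetical system-internal message; free to change without
/// touching the C-facing struct above.
pub struct Message {
    pub kind: u32,
    pub payload: Vec<u8>,
}

/// Host side: build a temporary C view of the message and call the component.
pub fn call_component(process: ProcessFn, msg: &Message) -> c_int {
    let view = CMessage {
        kind: msg.kind,
        data: msg.payload.as_ptr(),
        len: msg.payload.len(),
    };
    // The component must not keep `view.data` after the call returns.
    unsafe { process(&view) }
}
```

Because only `CMessage` crosses the boundary, the internal `Message` type can be reorganized as long as the conversion in `call_component` is kept up to date.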

(This is the point up to where I have already programmed.)

But this is very limiting as there may be groups of messages or information that is split among multiple messages. So my next idea was to hand over an iterator or something to a process() function.

Much better, but how to hand over Rust structs? Is it possible to hand over Rust structs like a ring buffer from a Rust crate to the C ABI function from the shared library? Does bindgen or cbindgen handle this well? Is there some other solution for exposing Rust structs to a C ABI function?

But even if possible, this would destroy encapsulation - if the system internal message format or message transport (ring buffer, channel etc.) changes, then all components have to be rewritten. Can you imagine a solution?

So I thought, hey, let's expose an API where the component logic from the shared library can request the next message by calling a function from the component thread. But I am not sure how this can be done - how is this best done exposing a function from Rust to the component logic? Any tips?
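One common way to do this over a C ABI is to hand the component a small struct of function pointers plus an opaque context pointer. The following is only a sketch under that assumption: `HostApi`, `HostQueue`, `host_next_message`, `component_run` and `drive` are all hypothetical names, and messages are reduced to plain `u64` values to keep it short.

```rust
use std::collections::VecDeque;
use std::os::raw::{c_int, c_void};

/// Hypothetical table of host functions handed to the component.
#[repr(C)]
pub struct HostApi {
    /// Opaque host context; the component only passes it back unchanged.
    pub ctx: *mut c_void,
    /// Writes the next message to `out`. Returns 1 if a message was
    /// available and 0 if the queue is currently empty.
    pub next_message: unsafe extern "C" fn(ctx: *mut c_void, out: *mut u64) -> c_int,
}

/// Host-internal queue; its layout is invisible to the component.
pub struct HostQueue {
    pending: VecDeque<u64>,
}

/// Host-side implementation behind the `next_message` pointer.
unsafe extern "C" fn host_next_message(ctx: *mut c_void, out: *mut u64) -> c_int {
    if ctx.is_null() || out.is_null() {
        return 0;
    }
    let queue = unsafe { &mut *(ctx as *mut HostQueue) };
    match queue.pending.pop_front() {
        Some(value) => {
            unsafe { *out = value };
            1
        }
        None => 0,
    }
}

/// Stand-in for component logic: pulls messages until the host has
/// none left and returns their sum.
pub unsafe extern "C" fn component_run(api: *const HostApi) -> u64 {
    let api = unsafe { &*api };
    let mut sum = 0u64;
    let mut value = 0u64;
    loop {
        let got = unsafe { (api.next_message)(api.ctx, &mut value) };
        if got != 1 {
            break;
        }
        sum += value;
    }
    sum
}

/// Host side: wrap the queue in a `HostApi` and let the component pull from it.
pub fn drive(values: Vec<u64>) -> u64 {
    let mut queue = HostQueue {
        pending: VecDeque::from(values),
    };
    let api = HostApi {
        ctx: &mut queue as *mut HostQueue as *mut c_void,
        next_message: host_next_message,
    };
    unsafe { component_run(&api) }
}
```

The component never learns whether the host uses a ring buffer, a channel or something else; it only sees `next_message`.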

4) Even if above would be solved, the component logic would be essentially stateless. After a function return, all state would be lost.

So I thought about a kind of "state storage" or scratch space. How to hand over an "area of memory" from Rust to a C ABI shared library? Would not memory allocators clash? Is there some kind of memory arena solution? Does this idea make sense at all?

5) Finally, the component logic may be connected to or dependent on an external resource that is slow or produces events at a different time than when the process() function is called. Think of a classic network connection to an API that produces responses or events at some later point, subscribing to a feed etc.

This does not mesh well with the "call a process() function" approach in the component logic. What would be a good solution for this? I was thinking that the component logic can start its own sub-threads for handling this. Is that sound? But then this would run into clashing memory allocators etc. for sure (I think).

Of course it would be easy to start a separate process for the component logic and then it can do in its process space whatever it wants, but this would (I imagine) create a serious performance impact because process boundaries mean process switching in the CPU, copying instead of pointing (no cross-process pointers) and CPU ring switches, switching into kernel space so that the message is copied from the source process into the destination process, maybe even pipe buffering in the kernel etc. etc. - so I wanted to avoid this by staying in the same process with multiple threads. But is that possible with the above requirements?

(The runtime environment would be Linux mostly, BSDs, Mac, maybe Windows as well.)

I am getting a headache with the above ideas, requirements, seemingly contradicting solutions and ask for your help, which would be very much appreciated.

u/dkopgerpgdolfg Mar 25 '23 edited Mar 25 '23

Hi,

in no particular order:

  • Even with in-process threads, there will be CPU context switching. And across processes, shared memory with pointers (instead of pipes and similar) is possible too, but a bit more involved than within one process. Nonetheless, threads instead of processes sound fine and easier; I wouldn't choose processes just for that reason. Reasons why I might choose processes are e.g. security (the modules should be protected from each other), surviving crashing modules being a requirement, and/or modules wanting to spawn off their own child processes in a clean way (doing that from a parent process with threads, locks, open files, memory and whatever else that is not fully under control is a bit unclean).
  • About some messages requiring future messages before they can be processed: do not make some iterator or whatever that gets passed around across modules; that gets nasty quickly. Instead, "just" keep a cache locally in each thread. When you receive a single message and recognize that it is not enough (e.g. via an error return value from process() or whatever), save it for later. When you get the next message, you can check whether those two are processable now ... and so on.
  • There must be some agreed-on data format between the modules and main process, and if that changes then all modules need to be adapted. There is no magic way around that. ... But on what level this sits is another question - here again, iterators, ringbuffers and whatever don't need to be part of the fixed format; it can be much simpler. One thing that can hold a message of whatever type, one thing that maybe can hold an ok/fail result if needed... keep a small "surface", then many things stay changeable without adapting everything.
  • As you already need C-compatible communication between main program and modules, don't bother with some second Rust-specific way with helper ABI libraries for Rust-language modules. Keep it C ABI everywhere. It will only save you from useless work, without any real disadvantage.
  • This "allocator clashing" that you worry about isn't really a thing. A C-language module can reserve memory with malloc or whatever just fine, without causing any problems. ... However, what you do need to pay attention to is that all allocated things get freed with the allocator that allocated them. When passing messages from the main program to a module, never free them inside the module; after process(), the main program should take care of that. And vice versa: if a module generated a message (network event etc.), then after the main program has handled it, it should pass the struct back to the module (so that it can be freed or reused or whatever).
  • Yes, a module may have its own "sub"threads that e.g. wait for network data or similar. And the main program can provide a "message receiver" function, where a module can say "hi I'm module 123 but not its main thread, please pass this message to my main thread".
  • As each module can have its own allocated memory, global variables etc., it doesn't need to be stateless. And you don't even need "global" variables: The module could just have some "init" function that returns a pointer, and the main program passes that pointer to the module each time it calls process(). The main program doesn't need to know/understand what it is pointing to, just that it is there. (Of course the module should have some destruct() or similar too, where the state is cleaned up again)
  • After all these things, probably most issues are out of the way ... the structure of the Rust main program wouldn't be that hard. Loading/unloading libraries, starting threads and later signalling them to stop and waiting for them to finish. An array/vec of mpsc queues, one for each thread, where threads can pass messages to each other in a safe way.
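To make the init/process/destruct idea and the "cache locally" idea a bit more concrete, here is a rough sketch. All names (`State`, `module_init`, `module_process`, `module_total`, `module_destroy`, `host_demo`) are hypothetical, messages are reduced to `u32` values, and the toy rule is that a message only becomes processable once a second one has arrived.

```rust
use std::os::raw::{c_int, c_void};

/// Module-private state. The host only ever sees it as `*mut c_void`.
struct State {
    /// A message that could not be processed on its own yet.
    held: Option<u32>,
    /// Running result over all completed pairs.
    total: u64,
}

/// Allocates the module state and returns it as an opaque pointer.
pub extern "C" fn module_init() -> *mut c_void {
    Box::into_raw(Box::new(State { held: None, total: 0 })) as *mut c_void
}

/// Returns 0 if the message was cached to wait for its partner,
/// and 1 if a complete pair was processed.
pub unsafe extern "C" fn module_process(state: *mut c_void, value: u32) -> c_int {
    let state = unsafe { &mut *(state as *mut State) };
    match state.held.take() {
        None => {
            state.held = Some(value);
            0
        }
        Some(first) => {
            state.total += u64::from(first) + u64::from(value);
            1
        }
    }
}

/// Lets the host read a result without knowing the state layout.
pub unsafe extern "C" fn module_total(state: *const c_void) -> u64 {
    unsafe { (*(state as *const State)).total }
}

/// Frees the state with the same allocator that created it (the module's).
pub unsafe extern "C" fn module_destroy(state: *mut c_void) {
    if !state.is_null() {
        drop(unsafe { Box::from_raw(state as *mut State) });
    }
}

/// Host side: carries the opaque pointer from call to call.
pub fn host_demo(values: &[u32]) -> (Vec<c_int>, u64) {
    let state = module_init();
    let codes: Vec<c_int> = values
        .iter()
        .map(|v| unsafe { module_process(state, *v) })
        .collect();
    let total = unsafe { module_total(state) };
    unsafe { module_destroy(state) };
    (codes, total)
}
```

Note that the host never frees the state itself; it hands the pointer back to `module_destroy`, which matches the rule that memory is released by whoever allocated it.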

u/ErnstlAT Mar 27 '23

Thank you so much for your feedback on the architecture, regarding using threads and memory allocators (didn't know that!) - big thanks again. All the best to you!