I have a read-only, 1 GB file containing serialized data. Two identical processes want to deserialize the file into C++ objects and use them in my program. With mmap, it seems I can map the file and avoid duplicating memory for the serialized bytes themselves. Is there a way for process 1 to deserialize the file, creating the objects in memory, and for process 2 to then reuse those deserialized objects, so that the memory is allocated only once rather than twice?
I tried mmap followed by deserialization, but memory is still duplicated, because deserialization allocates new memory to construct the objects.
asked Nov 16, 2024 at 3:12 by doper3 · 1 Answer
It duplicates memory because you are using different programs, that is, different processes, and processes have isolated address spaces.
One possible solution is to have only one process own the file data while other processes use it. For example, you could create a service application that opens the file for exclusive access, deserializes it, and then enters a message loop, answering requests from other processes over some inter-process messaging mechanism and returning the requested pieces of data. Such a design is usually easy to implement, though the effort depends on how complex the data and the possible requests are.
The particular forms of the service and the messaging mechanism depend on the platform and other factors, but suitable facilities typically exist everywhere.
Another approach is shared-memory IPC, which appears to be what user17732522 suggested in the first comment on your question. Credit for that goes to the commenter.
A different approach is to migrate from multiple processes to multiple threads within a single process; this was already suggested in the comments on your question (credit to Craig Estey). With multithreading, you can still run the same kind of service in one thread, but you can also access the shared data directly, for example read-only.
To reiterate, my suggestion is not the final word but just an example; which approach to choose is up to you.
Comments:
- Craig Estey (Nov 16, 2024 at 3:52): … (new/malloc) would immediately be shared by the other thread.
- Homer512 (Nov 16, 2024 at 8:37): fork the process to spawn the second. It will inherit all of the process's memory state as a copy-on-write mapping. So as long as the second process does not change the data, no additional memory will be consumed.