I have a read-only, 1 GB file containing serialized data. Two identical processes want to deserialize the file into C++ objects and use them in my program. With mmap, it seems I can map the file and avoid duplicating memory for the serialized bytes themselves. Is there a way for process 1 to deserialize the file, creating the objects in memory, and for process 2 to then reuse those deserialized objects, so that the memory is allocated only once rather than twice?
I tried mmap followed by deserialization, but memory is still duplicated, because deserialization allocates new memory to construct the objects.
asked Nov 16, 2024 at 3:12 by doper3 · 1 Answer
It duplicates memory because you are using different programs, that is, different processes, and processes have isolated address spaces.
One possible solution is to have only one process own the file data while other processes use it. For example, you could create a service application that opens the file for exclusive access, deserializes it, and then enters a message loop, answering requests from other processes over some inter-process messaging mechanism and returning the requested pieces of data. Such a design is usually easy to implement, though the effort depends on how complex the data and the possible requests are.
The particular forms of the service and the messaging mechanism depend on the platform and other factors, but suitable facilities typically exist everywhere.
Another approach is shared-memory IPC, which appears to be what user17732522 suggested in the first comment on your question. Credit for that goes to the commenter.
A different approach is to migrate from multiple processes to multiple threads within a single process; this was already suggested in the comments on your question (credit to Craig Estey). With multithreading, you can still run the same kind of service in one thread, but you can also access the shared data directly, for example read-only.
To reiterate, my suggestion is not the final word but just an example; which approach to choose is up to you.
Comments:
- Craig Estey (Nov 16, 2024 at 3:52): … (new/malloc) would immediately be shared by the other thread.
- Homer512 (Nov 16, 2024 at 8:37): fork the process to spawn the second. It will inherit all of the process's memory state as a copy-on-write mapping. So as long as the second process does not change the data, no additional memory will be consumed.