How to Use MDX
MDX = malloc() + free() + Garbage Collection
Memory Pools
MDX manages memory in pools, represented by C++ objects of class MDXPool. The pools function entirely independently of each other -- they do not share any resources.
Pools are created with the static method MDXPool *MDXPool::createPool(MDXConfiguration *config). This method takes a single parameter, an MDXConfiguration structure containing configuration information for the pool. The normal C++ way of construction (new MDXPool()) won't work.
Pools are deleted with the (non-static) deletePool() method, which also automatically frees all memory still allocated from the pool. This should be the last method called on an MDXPool object. Afterwards the pool has completely ceased to exist, including freeing the memory for the MDXPool object itself.
Memory's allocated and freed using an MDXPool object's malloc() and free() methods. There are variants of these methods for allocating and freeing multiple blocks at once.
Flags
Each memory block allocated by MDX has a set of flags associated with it, the 2 most interesting being COLLECTIBLE and INTERCEPT_FREE. Generally the flags are set as a parameter to malloc() when the memory's allocated.
Garbage Collection
MDX supports basic mark and sweep garbage collection for blocks allocated with the COLLECTIBLE flag. Other blocks are not affected by garbage collection, and must still be explicitly released with MDX's free() method. Both collectible and non-collectible memory may be allocated from the same pool at the same time.
A pool tracks the total amount of memory allocated, and when it goes over a threshold it triggers garbage collection. Afterwards, it computes a new threshold by multiplying the amount of allocated memory left over post-collection by a factor set in the configuration structure passed to MDX when the pool was created. In fact, it's a bit more complicated than this (see the source code for details) but ths is the gist of what happens.
The MDX configuration structure includes a callback responsible for marking, which the program using MDX must supply. The marking callback must locate all the memory blocks allocated with the COLLECTIBLE flag that should not be freed in the subsequent sweep phase. It then marks the wanted blocks using MDX's mark() method, a variant of which allows for marking multiple blocks at once.
MDX also provides a makeCollectible() method, which turns a non-collectible block (which would otherwise have to have been explicitly freed with free()) into a collectible block. Currently this is a 1 way journey, there's no provision for turning collectible blocks back into freeable ones.
Garbage Collection & Thread Synchronization
Each MDX pool has a background housekeeping thread associated with it, which is responsible for actually doing the garbage collection. It's this thread that calls the marking callback, and does the subsequent sweeping.
|
The marking callback can be called at any moment, not necessarily only during a memory allocation routine. Of course, the marking callback can use mutexes and etc. to synchronize itself with other threads.
The classic memory allocation interface is to have malloc() return the address of the newly allocated block. But MDX's malloc() has a different approach. It takes the address of a pointer as a parameter, and writes the address of the newly allocated block back into that pointer. Furthermore, MDX writes back the address of the new block (making it visible to other threads, such as the marking thread) before releasing it's locks. This allows MDX to offer the following thread safety guarantee:
Only (unmarked) blocks that were already allocated and with their addresses visible to the marking thread at the start of the marking callback will be freed in the subsequent sweep phase. Collectible blocks allocated during the marking callback, but after it's started, will not be freed in the subsequent sweep phase, even if they weren't marked. This also applies to blocks allocated just before the marking callback starts, if their addresses were not yet visible to the making thread when the callback started. In short, in each mark and sweep cycle a block will not be freed by the sweeper, unless the making callback has first had a chance to prevent that freeing by marking the block.
The upshot of this is that the program must be ready to mark a newly allocated collectible block from the moment that malloc() writes back the new block's address to the caller's pointer. Note that this happens before malloc() returns, and that therefore it may be necessary to mark the new block before the malloc() allocating it returns.
For example, the pointer may be initialized to null and arranged to be scanned by the making callback from before the call to malloc(). The marking callback can then mark the pointer only if it's not null. However, in many situations this kind of approach isn't practical. The marking callback's allowed to call operating system synchronization prmatives. It may wait on a condition variable/event object/semaphore, hold a mutex, and so on. In many situations it will be advantageous to use these primatives to carefully synchronize the actions of marking callback with the actions of the rest of the program. The makeCollectible() method offers another approach.
makeCollectible()
For situations where being ready to mark a new collectible block from before its malloc() returns is not practical makeCollectible() provides another technique. The block(s) can be shielded from garbage collection and the need to be marked until the network of pointers the marking routine must traverse is ready. First allocate the block(s) without the COLLECTIBLE flag. Next setup the pointers for the marking routine. Finally, when all's ready for marking, make the block(s) collectible with the makeCollectible() method.
Note that makeCollectible()'s fast, and in the case of blocks smaller than the minLargeBlockSizeBytes configuration parameter requires no locking at all. Also, it's harmless to mark a non-collectible block. It does nothing, but it's not an error.
Thread Termination
MDX caches memory for each thread using it, so as to increase the chances that a thread can allocate or free without lock contention with other threads. MDX automatically notices new threads using it, and transparently creates thread caches for them as needed.
But the memory in the thread cache must be reclaimed when the thread terminates. Threads should call MDX's uncacheThread() method before termination to tell MDX to reclaim their caches.
|
On Linux this isn't strictly necessary, since MDX uses the TSD destructor to automatically notice that a thread has terminated, and reclaim its cache. However, on Windows the equivalent TLS facility doesn't provide notification of thread termination, so the use of uncacheThread() is required, unless the thread only exits when the process terminates.
It's not an error for a thread to call uncacheThread() more than once, or to call it on Linux. So it's OK to use uncacheThread() and program the same way for both systems.
It's also not an error for a thread to allocate memory after calling uncacheThread(). However, in this situation, MDX creates a new cache for the thread. So it's necessary to call uncacheThread() again to reclaim the new cache.
Finalization
MDX provides optional finalization via the INTERCEPT_FREE flag. The intercept free mechanism works the same for both collectible and non-collectible blocks.
The pool configuration structure includes an intercept free callback. When a block allocated with the INTERCEPT_FREE flag's freed, INSTEAD of freeing the block, MDX calls this callback, passing it the address of the block being freed. The callback may then finalize the block.
For a collectible block the pool's background thread calls the intercept free callback during the sweep phase of garbage collection. For a non-collectible block the intercept free callback is called in-line as part of MDX's free() method, and so is run by the thread that called free() in the first place.
Although the intercept free callback must not allocate memory, freeing memory is supported via the innerFree() method. The normal free() method must not be used inside an intercept free callback, and innerFree() must not be used except inside an intercept free callback.
|
MDX either frees the block, or calls the intercept free callback, but not both. The intercept free callback then has the choice of whether or not to actually free the block (with innerFree()). In the case of intercepting the freeing of a collectible block, where the callback wishes to free the block, it may pass the collectible block to innerFree().
This case, where the block being passed to innerFree() is the same block that's having it's freeing intercepted, is the only case where it's ok to pass a collectible block to innerFree(). It's never ok to pass a collectible block to free().
Observe that when innerFree()'s called on a block allocated with the INTERCEPT_FREE flag, it will result in the intercept free callback being called a 2nd time from inside of itself. For this reason many intercept free callbacks will have to be re-entrant.
Of course, to prevent an infinite recursion, when the block being passed to innerFree()'s the same block that's already having it's freeing intercepted, then innerFree() doesn't call the intercept free callback a 2nd time (even though the block does have the INTERCEPT_FREE flag).
How Freeing Memory in Finalization Can Save CPU Time
The marking routine's often felt to be the big CPU time consumer in garbage collection. Generally, it must traverse the vast network of pointers and arrays and structures of pointers that link together all of the program's currently in use objects. It's clear that when the number of in use objects is large, this traversal can take time.
The idea is to reduce the duration of the marking traversal by deferring some of its work to the time when the memory's freed. Since memory may be marked many times, but is freed only once, moving work from marking to freeing has the potential to save time.
For example, think of a collection which has a master object that manages a flock of attendant objects containing the contents of the collection. Imagine that the collection hides its attendant objects from its users, and is only accessed through methods belonging to the master object.
First, suppose that both the master object and all the attendant objects are collectible. In this case it's necessary for every marking cycle to traverse the collection's entire network of attendant objects in order to mark them.
Now, suppose instead, that only the master object's collectible, and that it manages it's attendant objects as (explicitly freed) non-collectibles. This makes marking the collection quick, because only the master object need be marked. Eventually, when the master object's no longer referenced by the remainder of the program, it will go unmarked, and be deleted by the sweeper. At this point the intercept free callback must traverse the collection's entire network of attendant objects in order to explicitly free them.
Generally, the traversal of the attendant objects can be done either on every marking cycle, or, once only, when the master object's deleted. So, there's a good chance that if the master object's likely to be in use for long enough to be marked more than once, then managing the attendant objects as non-collectibles will probably save CPU time.
Debugging Aides
The checkIntegrity() method may help find bugs which write to addresses outside of any allocated block. It checks for inconsistencies in a pool's internal structure. Setting the CHECK_INTEGRITY_EVERY_CALL flag in MDXConfiguration::flags causes MDX to automatically call checkIntegrity() after every malloc() and free().
The flags FILL_ON_MALLOC and FILL_ON_FREE may help find bugs involving the improper resuse of memory values written to blocks previuosly allocated in the same address range. Setting these flags in MDXConfiguration::flags causes MDX to automatically apply them to every malloc() and free().
Ther're 2 methods to obtain information about a pool. dumpPoolStatus() writes a text report on the pool's status, with various levels of verbosity available. getStatistics() retrieves some of the same information programmatically, by filling the values in a statistics structure.
Finally a helper method, dumpMemory(), prints a standard hex dump of memory.