|
 |
 |
 |
 |
Thursday, 28 August 2008 |
Latest Resources  As shareware publishers we feel we should contribute out knowledge to the wider community of authors.
So we are developing a section of our web site dedicated to linking to useful resources for the development and publishing of software.
This section is largely incomplete at this point in time, but continue to check back as it is developing rapidly.
Suggestions to webmaster@marsjupiter.com are welcome.
Shareware sites for software submission
We hope you will find this shareware site submission list useful when you need to get your shareware known on the web.
Other Resource Categories
Browse these lists of articles for useful information.
|
Introduction
The Standard Template Library bitset class is a useful class suitable for most bitset operations, but if you are dealing with lots of large bitsets, then it may be that the stl bitset class is missing a trick. In many applications using bitsets, you will find that a typical bitset is either sparsely populated with bits, or conversely almost fully set. In these cases a large bit set can use a lot of memory for what is actually being represented.
To overcome this we have developed the CMJBitset class as a plug in replacement for bitset, The CMJBitset class depending on compilation options may take as little as 7 bytes to represent a bitset of any size, assuming all the bits are set or reset. In comparision a 1024 bitset will take 128 bytes. In essence the CMJBitset operates by run length encoding a bitset if the bitset is either almost all set/reset, but otherwise uses the stl bitset class. We will refer to the different encodings as normal, sparse and full.
Using the code
General UsageTo use the CMJBitset class should be simple, replace bitset in your code with CMJBitset and include the CMJBitset.h header file. Since CMJBitset presents the same interface as bitset then this should be enough to get things working.
Loading/Saving the bitsetThe class would not give a fraction of its potential benefit, if you were unable to save and load the bitset to and from file. Hence member functions: bool load(FILE *)
and:
bool save(FILE *)
are provided, but perhaps more to the point, all members of the class are public, enabling you to load and save at will.
Compilation OptionsIf you examine MJBitset.h, then there are a number of compilation options.
For debuggingThis is used by the example project to test the CMJBitset class. I would also recommend that when trying out the CMJBitset class you do some testing with this as well. As of the current date, a "full coverage" test has not been carried out on the class and hence bugs are a definite possibility. //
// When DEBUG_CMMJBITSET is defined, then a shadow bitset
// is kept and validated against operations.
// This is obviously very slow and uses loads of memory and
// defeats any purpose in using this class
// and hence should never be used outside of a debug session.
//
//
//#define DEBUG_CMJBITSET
For maximum bitset size you need to deal withThe default is for a maximum of 0xffff. //
// data representation type for bitsets <= 256
// CMJBITSET_USE_CHAR should be defined
//
//#define CMJBITSET_USE_CHAR
#define CMJBITSET_USE_SHORT
//#defineCMJBITSET_USE_LONG
The example project
The example project is a simple win32 console application that attempts to regress the CMJBitset by creating a succession of bitmaps with random population levels and performing a reasonably comprehensive set of operations on the bitsets.
You may easily alter the regression test to compare performance between CMJBitset and bitset.
Points of Interest
Performance
The performance of the CMJBitset class, is quite hard to quantify. For bitsets that it does not encode as they are neither full or sparse, the CMJBitset class is clearly several times slower than bitset.
If you run the example regression test and make it use sparse bitsets then the times will still be a little faster than bitset. But the comparitive speed will dramatically increase, once the operations cease to be completely in memory either due to paging or loading/saving to files.
We have tested the class in the WhereWasI product and see dramatic reductions in memory usages and very significant speed increases.
It is also notable that some operations with CMJBitset are dramatically faster than bitset, flip() for example simply needs to invert the full and sparse representation type.
At the top of the source file, there are a number of compilation options which effect the way CMJBitset allocates memory, the key option is set by default and reserves space within the class for bitsets with up to 4 bits set/unset. This often avoids the overhead of time and memory of using malloc/free, and in our examples is a definite performance bonus. Your mileage may vary and you may want to tune performance by changing some of these values.
History
- June 14th 2004 - 1.00 Released.
- June 18th 2004 - 1.01 Released. - Contains a number of optimisations
- June 22nd 2004 - 1.02 Released - Fixes compilation bug with VC7.0. Fixes operator precedence bug, that would effect normal bitset performance
|
|
Super bitset class - development log
Coming to this space soon, one of a series of C++ classes that we want to contribute to the wider community.
CMJBitset
is a class that acts as a drop in replacement for the standard template library bitset class.
The difference is that if the bits being represented are sparse e.g. mostly 0's, the the class not only takes up a lot less memory, but will be faster for many operations.
The class is still being developed and tested, we hope benchmarks and a link to the source will be available here by the end of June 2004.
6th June update
The class is now apparently working without issues in a version of a marsjupiter product built using the sparse bitset. The initital results show a 10% drop in performance but a large drop in memory consumption. However the replacement of bitsets in the product was global regardless of whether the bitsets were sparse or not.
Benchmarking performance is we think best done in the context of a product as opposed a contrived test, as the sparse bitset has a totally different performance to a standard bitset, for example a shift left operation on a standard bit set will be massively slower than the same operation of the sparse bitset. conversely setting and resetting bits is much slower in the sparse bitset.
8th June updateWe have improved the class to treat a fully set bitset which would otherwise take a disproportionate amount of memory as a special case. Even when dealing with generally sparse bitsets it is often the case that you start off with a fully set bitset, and this addition raised our benchmarks to be just about exactly on a par with stl bitset. Of course though with the extra benefit of far less memory consumption.
One notable issue that has become apparent is that the expression:
bitset1 &= ~bitset2;
is very inefficient in the sparse class, we have added the syntax:
bitset1.reset(bitset2);
to resolve this, but this breaks away a little from the idea of a drop in replacement for bitset.
10th June Update.
The class has moved on a lot, from being a sparse class it now automatically moves between 3 representations of a bitmap (full, normal, sparse). This should prevent performance degradation and generally make the class faster. No figures are available as yet, since a large amount of testing needs doing. With the new class many operations now have 9 permutations to deal with! this is not likely to be bug free yet a while.
12th June update
The CMJBitset class has now been regressed within our test applications, not surprisingly quite a few bugs were found and resolved.
The speed benchmarks were superficially very impressive with the test application running 35x faster! whilst using a lot less memory.
Looking behind the figures, we see that the massive increase in speed is fundermentally because the CMJBitset makes loading and saving a bitset to file so much faster. Note that the application already used a compressed format for bitset storage, but now that format needs practically no processing for for reading or writing from file.
So what is the underlying speed of the class? at this stage it is hard to tell, all operations (including ~ which was previously a problem) that in the CMJBitset do not need to visit individual bits are in most circumstances much quicker than the bitset class.
thus:
a &= b; // faster in CMJBitset
a.set(position); // faster in bitset.
But CMJBitset also adds significant overheads, it must check the representation type to decide on how to perform each operation and it must sometimes switch the representation, it must also allocate and reallocate memory.
Checking when to change representation relies on the count() function, which is very fast in CMJBitset but very slow in bitset. We avoid making this check all the time if we are using bitset for the representation type as it would slow things significantly. The parameter for how often to check is a pre-processor constant, and tweaking it effects performance considerably.
This brings us on to an aside about C++ classes in general, we are told to think of them as black boxes, and we protect ourselves from the internals of the classes with private etc. But whilst this has its benefits it does not lead to well optimised code. A good example of this is CString, it allocates short strings into 64 byte buffers as a a strategy to avoid memory fragmentation, a laudable tactic in the average program, but most programs will not be "average" and would benefit from larger buffers, smaller buffers, or precisely allocated buffers.
To conclude for the day though with a summary of where the CMJBitset class stands at the moment. It has been regressed though two (similar) applications and also some specific bitset testing code, and has no known bugs currently. Due to the complexity of the class though, this is far from saying I am confident the class is bug free. |
|
|
The following code is for a non standard allocator, which like all non-standard allocators is geared towards some peculiarity of the calling code.
Of course this somewhat breaks the "purity" of the code, it ceases to be a transplantable blackbox capable of slotting into any project!
In this case we present an allocator for a class that expects many instances of itself to be allocated, but for the total number of allocations to periodically get back to zero. The code can be compiled in two version, one of which maintains a "free list" so that if the class gets frequently deallocated as well as allocated, the memory usage will not expand excessively. Alternatively you can compile without a free list, in which case when you delete an instance, that memory will not be resused again.
In our tests of using this in a real life product, we gained a 13.5% increase in processing speed without the free list, and 10% with the free list. Given that the code in question has quite a lot of work to do, aside from allocation, this is a pretty impressive increase.
Fragment of the DictionaryLetterArray.h file:
#ifdef NSA_DICTIONARY_LETTER_ARRAY
//
// Code for Non Standard allocator
//
#ifdef NSA_DICTIONARY_LETTER_ARRAY_FREE_LIST
#define NSA_DICTIONARY_LETTER_ARRAY_POOL_SIZE 50 // optimal pool size
static CDictionaryLetterArray *m_free_list; // keep a free list
#else
#define NSA_DICTIONARY_LETTER_ARRAY_POOL_SIZE 150 // optimal pool size
static int m_current_pool_pos; // simply allocate from current pool
static CDictionaryLetterArray *m_current_pool; // current pool
#endif
static void NewPool();
static void FreePools();
static int m_nsa_alloc_count;> // keep track of instances allocated so we can free pools when zero
static vectorm_pools; // remember pool so we can free them
static CComAutoCriticalSection m_thread_alloc_section; // use a critical section for thread safety
#ifdef DEBUG
void * operator new (size_t size/*,const char *file, int line*/)
#else
void * operator new (size_t size)
#endif
{
m_thread_alloc_section.Lock();
CDictionaryLetterArray *pItem;
#ifdef NSA_DICTIONARY_LETTER_ARRAY_FREE_LIST
if (m_free_list)
{
// we use a linked list of free items
// using the memory allocated to the class. Thus it is
// essential that the class is at least void * in length.
pItem = m_free_list;
m_free_list = *((CDictionaryLetterArray **)m_free_list);
}
else
{
NewPool();
pItem = m_free_list;
m_free_list = *((CDictionaryLetterArray **)m_free_list);
}
#else
// if no free list simply return next in pool, creating new pools as needed
if (m_current_pool_pos == 0)
{
NewPool();
}
pItem = (m_current_pool + m_current_pool_pos);
m_current_pool_pos = (m_current_pool_pos +1) % NSA_DICTIONARY_LETTER_ARRAY_POOL_SIZE;
#endif
m_thread_alloc_section.Unlock();
m_nsa_alloc_count++;
return (void *)pItem;
}
void operator delete (void * mem)
{
CDictionaryLetterArray *pItem = (CDictionaryLetterArray *)mem;
m_thread_alloc_section.Lock();
#ifdef NSA_DICTIONARY_LETTER_ARRAY_FREE_LIST
void **pFree = (void **)pItem;
*pFree = m_free_list;
m_free_list = pItem;
#endif
m_nsa_alloc_count --;
if (!m_nsa_alloc_count)
{
FreePools();
}
m_thread_alloc_section.Unlock();
}
#endif
A fragment of the DictionaryLetterArray.cpp file
#ifdef NSA_DICTIONARY_LETTER_ARRAY
CComAutoCriticalSection CDictionaryLetterArray::m_thread_alloc_section;
int CDictionaryLetterArray::m_nsa_alloc_count=0;
vector CDictionaryLetterArray::m_pools;
#ifndef NSA_DICTIONARY_LETTER_ARRAY_FREE_LIST
int CDictionaryLetterArray::m_current_pool_pos = 0;
CDictionaryLetterArray * CDictionaryLetterArray::m_current_pool = NULL;
#else
CDictionaryLetterArray * CDictionaryLetterArray::m_free_list = NULL;
#endif
void CDictionaryLetterArray::FreePools()
{
for (int i = 0; i < m_pools.size();i++)
{
void *pPool = m_pools[i];
free(pPool);
}
m_pools.clear();
#ifndef NSA_DICTIONARY_LETTER_ARRAY_FREE_LIST
m_current_pool_pos = 0;
#else
m_free_list = NULL;
#endif
}
void CDictionaryLetterArray::NewPool()
{
CDictionaryLetterArray *pPool = (CDictionaryLetterArray *) malloc(NSA_DICTIONARY_LETTER_ARRAY_POOL_SIZE * sizeof(CDictionaryLetterArray));
m_pools.push_back((void *)pPool);
#ifdef NSA_DICTIONARY_LETTER_ARRAY_FREE_LIST
for (int i = 0; i < NSA_DICTIONARY_LETTER_ARRAY_POOL_SIZE; i++)
{
CDictionaryLetterArray *pItem = (CDictionaryLetterArray *)(pPool + i);
void **pFree = (void** )pItem;
*pFree = (void *)m_free_list;
m_free_list = pItem;
}
_ASSERT(m_free_list);
#else
m_current_pool = pPool;
#endif
}
#endif
|
|
|
This article in under construction
Introduction
Using replacements for the standard memory allocators, can seem a bit scary. But memory allocation is at the heart of programming and can easily be the most fundermental issue effecting program operation.
There are several issues involved in memory allocation:
- Speed
- Fragmentation
- Overhead
- Thread Safety
- Class specific allocators
Memory Allocation SpeedPerhaps the main point for most people, memory allocators must retain some form of map of what memory they have allocated and what has been freed so that they can return unallocated space on a request to malloc. There are any number of strategies for doing this, and we provide some references to custom allocators at the end of this article. Most memory allocators will use a system of "bins" each bin being used to store blocks in a certain size range.
Memory Allocation Fragmentation
When memory is allocated and freed as an application is running, spaces inevitably develop in the memory map, where memory has been freed and not reallocated. It is tempting to assume that the space wasted is marginal, but in many real life examples, the wastage can be massive in some cases exceeding 100%. In programs consuming large amounts of memory this will have a horrendus impact upon performance.
Memory Allocation Overhead
Each time you allocate memory, you incur an overhead over and above the amount you actually allocate. At a minimum this is likely to be 8 bytes, 4 bytes being a pointer to the next allocated block, and 4 bytes for the amount of memory allocated. Though having said that many allocators will have special routines for small block allocation which may reduce this overhead.
If you consider the case where you have a great number of small allocations to make, then this overhead can be very large. This is one of the scenarios when use of a custom allocator should be considered.
Memory Allocation Thread Safety
If you are writing single threaded code, then you do not need to worry about choosing an allocator which is thread safe. If you are using threads then you can still easily port custom allocators, by using critical sections to ensure thread safety.
Class Specific Allocators
This in many ways is the crux of the matter. In common with most issues of optimisation, where it does tend to be the case that one small section of the code takes up most of the time. It is likely that most memory issues can be sorted with just a few custom allocators. A custom allocator for a class has several benefits.
- Speed, as it only has to allocate one size it can be very fast.
- Low overhead, as it does not need to keep track of allocation size.
- Potential for garbage collection. You can program so as to start allocation pools, that can be released in a single call, assuming that no class members allocate memory themselves.
For examples we present some non standard allocations being used in MarsJupiter projects.
- Our simple non standard allocator
For a non standard allocator to offer real benefits, you need to consider just what special features your usage of a class gives you to work with? The simple lack of the need to store the size of each allocation whilst also tightly packing the allocations has real benefits when allocating lots of small blocks, but this benefit is less proportionally with bigger blocks. For example for one of the marsjupiter classes, we know that the pattern of allocation is to start with no allocation, followed by loads of allocations with only the odd free, and then to clear all the allocated space. This removes the need for a "free list" and hence gives real benefit.
Replacement Malloc
Given the notable inefficiencies of the standard malloc it may be desirable to replace malloc with you own code, if your compiling under unix, then a simple internet search will provide you with plenty of possibilities. Under windows with Visual C++ and MFC you will struggle to find drop in replacements, if you do know of any, please let us know.
We have ported quick fit malloc (see references) and made if thread safe and are currently experimenting with it, initial results on some simple benchmarks showed a 10% increase in speed. A 10% speed up is quite remarkable, and we look forward to reporting more real life results, since benchmark tests unless cleverly designed to make life difficult for malloc can all to easily simply "fit a niche" with a particular implementation making it look better than it really is.
References
A memory allocator - Doug Lee
Linux Scalability malloc performance report.
Are Mallocs Free of fragmentation?
Quick Fit Malloc
Malloc/Free implementations list - Ben Zorn
Mathtools.net - Memory Management Links |
|
|
C++ is a wonderful language. I could hardly be a bigger fan of inheritance, polymorphism, template and many of the other features of C++.
But all things come at a price and one of those prices tends to be program bloat.
Consider:
class Three{ long m_test; public: Three(void); virtual ~Three(void);
};
class One{ public: Three m_three; One(void); virtual ~One(void);
};
class Two :public One{ public: Two(void); virtual ~Two(void); };
class Three { long m_test; public: Three(void); virtual ~Three(void); };
Any instance of class Two will be 12 bytes long, e.g. 4 bytes for the "long" in class Three and 4 bytes each for the virtual function tables for class Three and One and Two.
This is not intended to be a course on C++ but let us remind ourselves of the purpose of the virtual destructor.
If we use polymorphism and have: Two *pTwo = new Two(); One *pOne = pTwo; delete pOne; then without the virtual destructor, the destructor code for class Two will never be called. With resulting memory leaks if class Two allocates any memory.
But here we have two "ifs", if we are using polymorphism and if class Two does allocate any memory. Meanwhile our class uses 8 bytes more memory than it needs to for every instance we allocate.
It is notable, that Microsoft Visual C++ 6.0 makes virtual destructors the default, even for generic classes, In Microsoft Visual C++ 7.00 they have to be selected as an option when you use the class wizard to create a new class. |
|
| | << Start < Prev 1 2 3 4 5 6 Next > End >>
| | Results 1 - 9 of 47 |
|
|
 |
 |
 |
 |
|
|
|