Smaointe Smaointe - 10 days ago 4
C Question

C Bitset Implementation Using Structs Unexpected Output

The below code, which creates and uses a bitset, is from the following tutorial "Intro to Bit Vectors". I am rewriting this code to try to learn and understand more about C structs and pointers.

#include <stdio.h>
#include <stdlib.h>

#define WORDSIZE 32
#define BITS_WORDSIZE 5
#define MASK 0x1f

// Create a bitset
int initbv(int **bv, int val) {
*bv = calloc(val/WORDSIZE + 1, sizeof(int));
return *bv != NULL;
}

// Place int 'i' in the biset
void set(int bv[], int i) {
bv[i>>BITS_WORDSIZE] |= (1 << (i & MASK));
}

// Return true if integer 'i' is a member of the bitset
int member(int bv[], int i) {
int boolean = bv[i>>BITS_WORDSIZE] & (1 << (i & MASK));
return boolean;
}

int main() {
int *bv, i;

int s1[] = {32, 5, 0};
int s2[] = {32, 4, 5, 0};

initbv(&bv, 32);

// Fill bitset with s1
for(i = 0; s1[i]; i++) {
set(bv, s1[i]);
}

// Print intersection of bitset (s1) and s2
for(i = 0; s2[i]; i++) {
if(member(bv, s2[i])) {
printf("%d\n", s2[i]);
}
}

free(bv);
return 0;
}


Below, is what I have rewritten to make use of structs.

#include <stdio.h>
#include <stdlib.h>

#define WORDSIZE 32
#define BITS_WS 5
#define MASK 0x1f

struct bitset {
int *bv;
};

/* Create bitset that can hold 'size' items */
struct bitset * bitset_new(int size) {
struct bitset * set = malloc(sizeof(struct bitset));

set->bv = calloc(size/WORDSIZE + 1, sizeof(int));

return set;
}

/* Add an item to a bitset */
int bitset_add(struct bitset * this, int item) {
return this->bv[item>>BITS_WS] |= (1 << (item & MASK));
}

/* Check if an item is in the bitset */
int bitset_lookup(struct bitset * this, int item) {
int boolean = this->bv[item>>BITS_WS] & (1 << (item & MASK));
printf("%d\n", boolean);
return boolean;
}

int main() {
struct bitset * test = bitset_new(32);

int num = 5;
bitset_add(test, num);

printf("%d\n", bitset_lookup(test, num));

return 0;
}


What I have rewritten is not working as expected. To test the implementation, in main() I try a bitset_lookup and expect a return value of 0 or 1 but I get a value of 32. I believe it must be something to do with my use of pointers, although I cannot see what I am doing wrong.

Any help appreciated!

Answer

That is not a tutorial, it is misleading examples at best.

First of all, use an unsigned type. I recommend unsigned long (for various reasons, none of them critical). The <limits.h> header file defines constant CHAR_BIT, and the number of bits you can use in any unsigned integer type is always CHAR_BIT * sizeof (unsigned_type).

Second, you can make the bit map (or ordered bit set) dynamically resizable, by adding the size information to the structure.

The above boils down to

#include <stdlib.h>
#include <limits.h>

#define ULONG_BITS (CHAR_BIT * sizeof (unsigned long))

typedef struct {
    size_t         ulongs;
    unsigned long *ulong;
} bitset;

#define BITSET_INIT { 0, NULL }

void bitset_init(bitset *bset)
{
    if (bset) {
        bset->ulongs = 0;
        bset->ulong  = NULL;
    }
}

void bitset_free(bitset *bset)
{
    if (bset) {
        free(bset->ulong);
        bset->ulongs = 0;
        bset->ulong  = NULL;
    }
}

/* Returns: 0 if successfully set
           -1 if bs is NULL
           -2 if out of memory. */
int bitset_set(bitset *bset, const size_t bit)
{
    if (bset) {
        const size_t  i = bit / ULONG_BITS;

        /* Need to grow the bitset? */
        if (i >= bset->ulongs) {
            const size_t   ulongs = i + 1; /* Use better strategy! */
            unsigned long *ulong;
            size_t         n = bset->ulongs;

            ulong = realloc(bset->ulong, ulongs * sizeof bset->ulong[0]);
            if (!ulong)
                return -2;

            /* Update the structure to reflect the changes */
            bset->ulongs = ulongs;
            bset->ulong  = ulong;

            /* Clear the newly acquired part of the ulong array */
            while (n < ulongs)
                ulong[n++] = 0UL;
        }

        bset->ulong[i] |= 1UL << (bit % ULONG_BITS);

        return 0;
    } else
        return -1;
}

/* Returns: 0 if SET
            1 if UNSET
           -1 if outside the bitset */
int bitset_get(bitset *bset, const size_t bit)
{
    if (bset) {
        const size_t  i = bit / ULONG_BITS;

        if (i >= bset->ulongs)
            return -1;

        return !(bset->ulong[i] & (1UL << (bit % ULONG_BITS)));
    } else
        return -1;
}

In a bitset structure, the ulong is a dynamically allocated array of ulongs unsigned longs. Thus, it stores ulongs * ULONG_BITS bits.

BITSET_INIT is a preprocessor macro you can use to initialize an empty bitset. If you cannot or do not want to use it, you can use bitset_init() to initialize a bitset. The two are equivalent.

bitset_free() releases the dynamic memory allocated for the bitset. After the call, the bit set is gone, and the variable used is re-initialized. (Note that it is perfectly okay to call bitset_free() on an un-used but initialized bit set, because calling free(NULL) is perfectly safe and does nothing.)

Because the OS/kernel will automatically release all memory used by a program (except for certain types of shared memory segments), it is not necessary to call bitset_free() just before a program exits. But, if you use bit sets as part of some algorithm, it is obviously a good practice to release the memory no longer needed, so that the application can potentially run indefinitely without "leaking" (wasting) memory.

bitset_set() automatically grows the bit set when necessary, but only to as large as is needed. This is not necessarily a good reallocation policy: malloc()/realloc() etc. calls are relatively slow, and if you happen to call bitset_set() in increasing order (by increasing bit number), you end up calling realloc() for every ULONG_BITS. Instead, it is often a good idea to adjust the new size (ulongs) upwards -- the exact formula you use for this is your reallocation policy --, but suggesting a good policy would require practical testing with practical programs. The shown one works, and is quite robust, but may be a bit slow in some situations. (You'd need to use at least tens of thousands of bits, though.)

The bitset_get() function return value is funky, because I wanted the function to return a similar value for both "unset" and "outside the bit set", because the two are logically similar. (That is, I consider the bit set, a set of set bits; in which case it is logical to think of all bits outside the set as unset.)

A much more traditional definition is

int bitset_get(bitset *bset, const size_t bit)
{
    if (bset) {
        const size_t  i = bit / ULONG_BITS;

        if (i >= bset->ulongs)
            return 0;

        return !!(bset->ulong[i] & (1UL << (bit % ULONG_BITS)));
    } else
        return 0;
}

which returns 1 only for bits set, and 0 for bits outside the set.

Note the !!. It is just two not operators, nothing too strange; making it a not-not operator. !!x is 0 if x is zero, and 1 if x is nonzero.

(A single not operator, !x, yields 1 if x is zero, and 0 if x is nonzero. Applying not twice yields the not-not I explained above.)

To use the above, try e.g.

int main(void)
{
    bitset train = BITSET_INIT;

    printf("bitset_get(&train, 5) = %d\n", bitset_get(&train, 5));

    if (bitset_set(&train, 5)) {
        printf("Oops; we ran out of memory.\n");
        return EXIT_FAILURE;
    } else
        printf("Called bitset_set(&train, 5) successfully\n");

    printf("bitset_get(&train, 5) = %d\n");

    bitset_free(&train);

    return EXIT_SUCCESS;
}

Because we do not make any assumptions about the hardware or system we are running (unless I goofed somewhere; if you notice I did, let me know in the comments so I can fix my goof!), and only stuff that the C standard says we can rely on, this should work on anything you can compile the code with a standards-compliant compiler. Windows, Linux, BSDs, old Unix, macOS, and others.

With some changes, it can be made to work on microcontrollers, even. I'm not sure if all development libraries have realloc(); even malloc() might not be available. Aside from that, on things like 32-bit ARMs this should work as-is just fine; on 8-bit AVRs and such it would be a good idea to use unsigned char and CHAR_BIT, instead, since they tend to emulate the larger types rather than supporting them in hardware. (The code above would work, but be slower than necessary.)

Comments