mattstone mattstone - 1 month ago 8
C Question

Memory Leaks and Invalid Reads/Writes

Before I get torn apart for not looking at the plethora of Valgrind questions on here: I did. I spent a good while looking, and all I've found were people who didn't

malloc()
the correct number of bytes. If I am incorrect and this is a duplicate, I will gladly accept your reference.

In this program, I am using a tree to create a spell-checker. There are many things that need to be fixed in my code, but the memory leak is the only one I really need help figuring out. (A lot of this code is temporary just so it will compile so that I can fix the memory leak.)

The issue is that I am fairly sure I am allocating the correct amount of space for my nodes, and I think Valgrind confirms this, because I only have 2 not-freed blocks (out of 365,371 allocs).

Anyway, I will post the entirety of the code (in case anyone needs the full context), but the relevant parts I presume are the load function and the clear function, where I allocate and free the memory, respectively.

/**
* Implements a dictionary's functionality.
*/
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#include "dictionary.h"

// number of characters we are using (a-z and ')
#define LETTERS 27

// max guaranteed number of nonnegative char values that exist
#define CHARVALUES 128

// create node structure for trie
typedef struct node
{
struct node *children[LETTERS];
bool is_word;
}
node;

// create root node for trie
node *root;

// stores the size of our dictionary
unsigned int dict_size = 0;

/**
* Returns true if word is in dictionary else false.
*/
bool check(const char *word)
{
// keeps track of where we are; starts with root for each new word
node *current_node = root;

while (*word != '\0')
{

// indices: 'a' -> 0, ..., 'z' -> 25, '\' -> 26
int index = (tolower(*word) - 'a') % CHARVALUES;
if (index >= LETTERS - 1)
{
// by assumption, the char must be '\'' if not '\n' or a letter
index = LETTERS - 1;
}

// if the node we need to go to is NULL, the word is not here
if (current_node->children[index] == NULL)
{
return false;
}

// go to the next logical node, and look at the next letter of the word
current_node = current_node->children[index];
word++;
}
}
return current_node->is_word;
}

/**
* Loads dictionary into memory. Returns true if successful else false.
*/
bool load(const char *dictionary)
{

FILE *inptr = fopen(dictionary, "r");
if (inptr == NULL)
{
return false;
}

// allocate memory for the root node
root = malloc(sizeof(node));

// store first letter (by assumption, it must be a lowercase letter)
char letter = fgetc(inptr);

// stores indices corresponding to letters
int index = 0;

/**
* we can assume that there is at least one word; we will execute the loop
* and assign letter a new value at the end. at the end of each loop, due
* to the inside loop, letter will be a newline; we know the EOF in the
* dictionary follows a newline, so the loop will terminate appropriately
*/
do
{
// keeps track of where we are; starts with root for each new word
node *current_node = root;

// this loop will only execute if our character is a letter or '\''
while (letter != '\n')
{
// indices: 'a' -> 0, ..., 'z' -> 25, '\' -> 26
index = (letter - 'a') % CHARVALUES;
if (index >= LETTERS - 1)
{
// by assumption, the char must be '\'' if not '\n' or a letter
index = LETTERS - 1;
}

// allocate memory for a node if we have not done so already
if (current_node->children[index] == NULL)
{
current_node->children[index] = malloc(sizeof(node));

// if we cannot allocate the memory, unload and return false
if (current_node->children[index] == NULL)
{
unload();
return false;
}

}

// go to the appropriate node for the next letter in our word
current_node = current_node->children[index];

// get the next letter
letter = fgetc(inptr);
}

// after each linefeed, our current node represents a dictionary word
current_node->is_word = true;
dict_size++;

// get the next letter
letter = fgetc(inptr);
}
while (letter != EOF);

fclose(inptr);

// if we haven't returned false yet, then loading the trie must have worked
return true;
}

/**
* Returns number of words in dictionary if loaded else 0 if not yet loaded.
*/
unsigned int size(void)
{
return dict_size;
}

void clear(node *head)
{
for (int i = 0; i < LETTERS; i++)
{
if (head->children[i] != NULL)
{
clear(head->children[i]);
}
}
free(head);
}

/**
* Unloads dictionary from memory. Returns true if successful else false.
*/
bool unload(void)
{
clear(root);
return true;
}


The relevant valgrind output is the following:

==18981== HEAP SUMMARY:
==18981== in use at exit: 448 bytes in 2 blocks
==18981== total heap usage: 365,371 allocs, 365,369 frees, 81,843,792 bytes allocated
==18981==
==18981== 448 (224 direct, 224 indirect) bytes in 1 blocks are definitely lost in loss record 2 of 2
==18981== at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==18981== by 0x4011B0: load (dictionary.c:111)
==18981== by 0x4008CD: main (speller.c:40)
==18981==
==18981== LEAK SUMMARY:
==18981== definitely lost: 224 bytes in 1 blocks
==18981== indirectly lost: 224 bytes in 1 blocks
==18981== possibly lost: 0 bytes in 0 blocks
==18981== still reachable: 0 bytes in 0 blocks
==18981== suppressed: 0 bytes in 0 blocks


So, my interpretation of this output is that, in the following block of code:

if (current_node->children[index] == NULL)
{
current_node->children[index] = malloc(sizeof(node));

// if we cannot allocate the memory, unload and return false
if (current_node->children[index] == NULL)
{
unload();
return false;
}

}


the
malloc
statement (which is indeed line dictionary.c:111) is executed twice such that the allocated memory is never freed. (Is this correct?) Now, that leads me to think that the real problem lies with my clear function, i.e. that it is written poorly and does not clear every node of my trie.

However, I've stared at the code for hours and I literally cannot see anything wrong with it. (I'm sure a lot is; I'm just not too good at this.)

Any help with this would be greatly appreciated.

alk alk
Answer

For starters:

The code misses to initialise the member array children to all NULLs for each node malloc()ed freshly. malloc() does not perform any initialisation on the memory allocated. It just contains "garbage".

So this here

       if (current_node->children[index] == NULL)
       {

becomes a "random" decision, if not provoking undefined behaviour by itself already.

(BTW, the "Invalid Read/Writes" your question's title mentions, and which you do not show us, most likely also referred to exactly this very line of code above ...)

To fix this you could use something like this:

#include <stdlib.h> (/* for malloc() */
#include <errno.h> (/* for errno */


/* Allocate and initialises a new node. */
/* Returns 0 on success and -1 on failure. */

int node_create(node ** pn)
{
  int result = 0; /* Be optimistic. */

  if (NULL == pn)
  {
    errno = EINVAL;
    result = -1;
  }
  else
  {
    node * n = malloc(sizeof *n);
    if (NULL == n)
    {
      result = -1;
    }
    else
    {
      for (size_t i = 0; i < LETTERS; ++i)
      {
        n -> children[i] = NULL;
      }

      n -> is_word = 0;
      (*pn) = n;
    }
  }

  return result;
}

And use it like this:

  ...

  if (-1 == node_create(&root))
  {
    perror("node_create() failed");
    return false;
  }