SomethingSomething - 1 month ago 12x
C Question

# C: What is the best and fastest way to concatenate strings

I currently concatenate strings in c using the

`strcat()`
function from
`string.h`
library.

I thought about it, and I got to a conclusion that it should be very expensive function, as before it starts to concatenate, it has to iterate over the char array until it finds the
`'\0'`
char.

For example, if I concatenate the string
`"horses"`
1000 times using
`strcat()`
, I'll have to pay
`(1 + 2 + 3 + ... + 1000) * strlen("horses") = (1000*1001)/2 * 6 = 3003000`

I thought about the non-standard way, of maintaining an integer with the string length, and then sending to
`strcat()`
the pointer to the end of the string:

``````strcat(dest + dest_len, "string");
``````

In this case, I'll pay only
`1000 * strlen("horses") = 1000 * 6 = 6000`
.

`6000`
is much lower than
`3003000`
, so it can be very critical for performance if you make a lot of such concatenations.

Is there some more standard way to do it, looks better than my solution?

Joel Spolsky, in his Back to Basics article, describes the problem of inefficient string concatenation with `strcat` as the Shlemiel the painter's algorithm (read the article, it's quite good). As an example of inefficient code, he gives this example, which runs in O(n2) time:

``````char bigString[1000];     /* I never know how much to allocate... */
bigString[0] = '\0';
strcat(bigString,"John, ");
strcat(bigString,"Paul, ");
strcat(bigString,"George, ");
strcat(bigString,"Joel ");
``````

It's not really a problem to walk over the first string the first time; since we've already got to walk over the second string, the runtime of one `strcat` is linear in the length of the result. Multiple `strcat`s is problematic though, because we walk over the previously concatenated results again and again. He provides this alternative:

How do we fix this? A few smart C programmers implemented their own `mystrcat` as follows:

``````char* mystrcat( char* dest, char* src )
{
while (*dest) dest++;
while (*dest++ = *src++);
return --dest;
}
``````

What have we done here? At very little extra cost we're returning a pointer to the end of the new, longer string. That way the code that calls this function can decide to append further without rescanning the string:

``````char bigString[1000];     /* I never know how much to allocate... */
char *p = bigString;
bigString[0] = '\0';
p = mystrcat(p,"John, ");
p = mystrcat(p,"Paul, ");
p = mystrcat(p,"George, ");
p = mystrcat(p,"Joel ");
``````

This is, of course, linear in performance, not n-squared, so it doesn't suffer from degradation when you have a lot of stuff to concatenate.

Of course, this is what you can do if you want to use standard C strings. The alternative that you're describing of caching the length of the string and using a special concatenation function (e.g., calling `strcat` with slightly different arguments) is sort of a variation on Pascal strings, which Joel also mentioned:

The designers of Pascal were aware of this problem and "fixed" it by storing a byte count in the first byte of the string. These are called Pascal Strings. They can contain zeros and are not null terminated. Because a byte can only store numbers between 0 and 255, Pascal strings are limited to 255 bytes in length, but because they are not null terminated they occupy the same amount of memory as ASCIZ strings. The great thing about Pascal strings is that you never have to have a loop just to figure out the length of your string. Finding the length of a string in Pascal is one assembly instruction instead of a whole loop. It is monumentally faster.

For a long time, if you wanted to put a Pascal string literal in your C code, you had to write:

``````char* str = "\006Hello!";
``````

Yep, you had to count the bytes by hand, yourself, and hardcode it into the first byte of your string. Lazy programmers would do this, and have slow programs:

``````char* str = "*Hello!";
str[0] = strlen(str) - 1;
``````