Ravi Gupta - 1 year ago
C Question

Defining a string with no null terminating char(\0) at the end

What are the various ways in C/C++ to define a string with no terminating null character (\0) at the end?

EDIT: I am interested in character arrays only and not in STL string.

Answer Source

Typically as another poster wrote:

char s[6] = {'s', 't', 'r', 'i', 'n', 'g'};

or, if your C character set is ASCII, which is usually true (there is not much EBCDIC around today):

char s[6] = {115, 116, 114, 105, 110, 103};

There is also a largely ignored way that works only in C (not C++)

char s[6] = "string";

If the array is too small to hold the final 0 (but large enough to hold all the other characters of the string literal), the final 0 is not copied. This is still valid C, but it is invalid C++.
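As a minimal sketch of that C-only form (the names tag and tag_to_cstring are made up for illustration), the following keeps exactly six chars and only adds a terminator when a copy is made for ordinary string functions. Some compilers may warn about the initializer, and a C++ compiler rejects it.

```c
#include <assert.h>
#include <string.h>

/* Exactly six chars: the literal's trailing '\0' does not fit and is dropped.
   Valid C; a C++ compiler rejects this initializer. */
const char tag[6] = "string";

/* Copies the six chars into out and appends a terminator, so the result can
   be passed to ordinary string functions. out must hold at least 7 bytes. */
void tag_to_cstring(char *out)
{
    assert(out != NULL);
    memcpy(out, tag, sizeof tag);
    out[sizeof tag] = '\0';
}
```

Here sizeof tag is 6, not 7, which is what shows that the terminator was never stored.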

Obviously you can also do it at run time:

char s[6];
s[0] = 's';
s[1] = 't';
s[2] = 'r';
s[3] = 'i';
s[4] = 'n';
s[5] = 'g';

or (same remark on ASCII charset as above)

char s[6];
s[0] = 115;
s[1] = 116;
s[2] = 114;
s[3] = 105;
s[4] = 110;
s[5] = 103;

Or using memcpy (or memmove, or bcopy, although in this case there is no benefit in using those):

memcpy(s, "string", 6);

or strncpy:

strncpy(s, "string", 6);
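To see that neither call writes a terminator, here is a small experiment (the function names and the '#' filler are assumptions chosen for illustration). The buffer is pre-filled so that any byte the copy did not touch stays visible. Some compilers warn about the strncpy call precisely because it truncates the terminator.

```c
#include <assert.h>
#include <string.h>

/* buf must have room for 8 bytes. Afterwards bytes 0..5 hold "string" and
   bytes 6..7 still hold the '#' filler: memcpy wrote no terminator. */
void fill_with_memcpy(char *buf)
{
    assert(buf != NULL);
    memset(buf, '#', 8);
    memcpy(buf, "string", 6);
}

/* Same experiment with strncpy. Because the source has at least as many
   chars as the count, strncpy stops after six chars and adds no '\0'. */
void fill_with_strncpy(char *buf)
{
    assert(buf != NULL);
    memset(buf, '#', 8);
    strncpy(buf, "string", 6);
}
```

In both cases the eight bytes end up as s t r i n g # #, so the array is not a zero-terminated string.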

What should be understood is that C has no dedicated string type (C++ has string objects, but that is another story entirely). So-called strings are just char arrays. Even the name char is misleading: it is not a character but a small integer type. It could probably have been called byte instead, but in the old days there was strange hardware around with 9-bit registers and the like, and byte suggests 8 bits.

As char is very often used to store a character code, the C designers came up with something simpler than writing the number yourself: put a letter between single quotes and the compiler understands that it must store this character's code in the char.

What I mean is (for example) that you don't have to do

char c = '\0';

To store a code 0 in a char, just do:

char c = 0;
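A short sketch of the same point, assuming a C11 compiler and, for the second part, an ASCII character set (letters_are_ascii is a made-up name):

```c
#include <assert.h>

/* '\0' and 0 are the same value in every C character set. */
static_assert('\0' == 0, "the null character always has code 0");

/* Returns 1 when the letters of "string" have their ASCII codes.
   On a non-ASCII system such as EBCDIC this returns 0. */
int letters_are_ascii(void)
{
    return 's' == 115 && 't' == 116 && 'r' == 114 &&
           'i' == 105 && 'n' == 110 && 'g' == 103;
}
```

The first check holds everywhere; the second only holds where the compiler's character set is ASCII-compatible.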

As we very often have to work with a bunch of chars of variable length, the C designers also chose a convention for "strings": just put a code 0 where the text should end. By the way, there is a name for this kind of string representation, "zero terminated string", and if you see the two letters sz at the beginning of a variable name it usually means that its content is a zero terminated string.

"C sz strings" are not a type at all, just arrays of chars as normal as, say, an array of int, but string manipulation functions (strcmp, strcpy, strcat, printf with %s, and many many others) understand and use the 0 ending convention. That also means that if you have a char array that is not zero terminated, you shouldn't call any of these functions, as they will likely read past the end of the array (or you must be extra careful and use functions with the letter n in their name, like strncpy, or ones that take an explicit length).

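For a char array that is not zero terminated, one way to stay safe is to pass the length everywhere. The helpers below are a sketch with made-up names (counted_equals, counted_format), not a standard API; they rely only on memcmp, strlen on the terminated side, and the "%.*s" precision of snprintf, which limits how many chars are read.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Compares a counted, unterminated char array against a zero-terminated
   string. strlen is only ever called on the terminated side. */
int counted_equals(const char *data, size_t len, const char *cstr)
{
    assert(data != NULL && cstr != NULL);
    return strlen(cstr) == len && memcmp(data, cstr, len) == 0;
}

/* Formats the counted array between brackets into out. The precision given
   through "%.*s" caps how many chars are read, so data needs no '\0'. */
int counted_format(char *out, size_t out_size, const char *data, size_t len)
{
    assert(out != NULL && data != NULL);
    return snprintf(out, out_size, "[%.*s]", (int)len, data);
}
```

Calling strcmp or printf("%s") directly on the six-char array would instead keep reading until some unrelated 0 byte happens to appear in memory.
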
The biggest problem with this convention is that there are many cases where it's inefficient. One typical example: you want to put something at the end of a 0 terminated string. If you had kept the size you could just jump to the end of the string; with the sz convention, you have to scan it char by char to find the end. Other kinds of problems occur when dealing with encoded Unicode and the like. But at the time C was created this convention was very simple and did the job perfectly.
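The difference can be sketched as follows. The struct counted_buf and both function names are hypothetical, and the 32-byte capacity is an arbitrary choice for the example.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical counted buffer: the length is stored next to the chars. */
struct counted_buf {
    char   data[32];
    size_t len;
};

/* Appends n chars at the known end without scanning.
   Returns 0 on success, -1 if the chars would not fit. */
int counted_append(struct counted_buf *b, const char *src, size_t n)
{
    assert(b != NULL && src != NULL && b->len <= sizeof b->data);
    if (n > sizeof b->data - b->len)
        return -1;
    memcpy(b->data + b->len, src, n);
    b->len += n;
    return 0;
}

/* Zero-terminated equivalent: strlen has to walk dst char by char to find
   the end before anything can be appended. dst must already be terminated
   within dst_size bytes. Returns 0 on success, -1 if there is no room. */
int sz_append(char *dst, size_t dst_size, const char *src)
{
    size_t used, n;
    assert(dst != NULL && src != NULL);
    used = strlen(dst);
    n = strlen(src);
    if (n >= dst_size - used)
        return -1;
    memcpy(dst + used, src, n + 1);   /* n + 1 also copies the '\0' */
    return 0;
}
```

With the counted version, the cost of an append depends only on the number of chars added; with the sz version it also grows with the length of what is already stored.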

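Nowadays, in C++, the letters between double quotes like "string" are no longer plain char arrays as in the past, but arrays of const char, usually handled through a const char *. (In C the type is still a plain char array, but modifying a string literal is undefined behavior, so it should be treated the same way.) That means that what the pointer points to is a constant that should not be modified (if you want to modify it you must first copy it), and that is a good thing because it helps to detect many programming errors at compile time.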
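A minimal sketch of the "copy first, then modify" rule (capitalized_copy is a made-up name for illustration):

```c
#include <assert.h>
#include <string.h>

/* Writes a modifiable copy of the literal into out and capitalizes it.
   out must hold at least 7 bytes (six letters plus the '\0'). */
void capitalized_copy(char *out)
{
    const char *literal = "string";   /* treat the literal as read-only */
    assert(out != NULL);
    memcpy(out, literal, strlen(literal) + 1);   /* copy including the '\0' */
    out[0] = 'S';                                /* fine: out is our own array */
}
```

Declaring the pointer as const char * lets the compiler reject an accidental literal[0] = 'S', which would otherwise be undefined behavior at run time.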