ValenceElectron - 2 months ago
Linux Question

Questions about putenv() and setenv()

I have been thinking a little about environment variables and have a few questions/observations.


  • putenv(char *string);

    This call seems fatally flawed. Because it doesn't copy the passed string you can't call it with a local and there is no guarantee a heap allocated string won't be overwritten or accidentally deleted. Furthermore (though I haven't tested it), since one use of environment variables is to pass values to child's environment this seems useless if the child calls one of the exec*() functions. Am I wrong in that?

  • The Linux man page indicates that glibc 2.0-2.1.1 abandoned the above behavior and began copying the string but this led to a memory leak that was fixed in glibc 2.1.2. It's not clear to me what this memory leak was or how it was fixed.

  • setenv() copies the string but I don't know exactly how that works. Space for the environment is allocated when the process loads but it is fixed. Is there some (arbitrary?) convention at work here? For example, allocating more slots in the env string pointer array than currently used and moving the null terminating pointer down as needed? Is the memory for the new (copied) string allocated in the address space of the environment itself and if it is too big to fit you just get ENOMEM?

  • Considering the above issues, is there any reason to prefer putenv() over setenv()?


Answer
  • [The] putenv(char *string); [...] call seems fatally flawed.

Yes, it is fatally flawed. I originally wrote that it was preserved in POSIX (1988) because it was the prior art, with setenv() arriving later; the standards history is a little different. Correction: The POSIX 1990 standard says in §B.4.6.1 "Additional functions putenv() and clearenv() were considered but rejected". The Single Unix Specification (SUS) version 2 from 1997 lists putenv() but not setenv() or unsetenv(). The next revision (2004) defined setenv() and unsetenv() as well.

Because it doesn't copy the passed string you can't call it with a local and there is no guarantee a heap allocated string won't be overwritten or accidentally deleted.

You're correct that a local variable is almost invariably a bad choice to pass to putenv(): the environment keeps pointing at that storage after the function returns, and the exceptions are obscure to the point of almost not existing. If the string is allocated on the heap (with malloc() et al), you must ensure that your code does not modify or free it while it is part of the environment. If it modifies the string, it is modifying the environment at the same time.
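To make the aliasing concrete, here is a minimal sketch, assuming a glibc-style putenv() that stores the pointer it is given. The variable name DEMO_ALIAS and the function name are made up for illustration.

```c
#define _XOPEN_SOURCE 700   /* ask for the putenv() declaration */
#include <stdlib.h>
#include <string.h>

/* Static storage: it must outlive the call, because putenv() keeps the pointer. */
static char entry[] = "DEMO_ALIAS=before";

/* Hypothetical demo function.
 * Returns 1 if editing the buffer after putenv() changed what getenv() reports,
 * 0 if it did not, -1 if putenv() failed. */
int putenv_aliases_buffer(void)
{
    if (putenv(entry) != 0)
        return -1;

    /* Overwrite the value part in place; "after" is shorter than "before". */
    strcpy(entry + strlen("DEMO_ALIAS="), "after");

    const char *seen = getenv("DEMO_ALIAS");
    return seen != NULL && strcmp(seen, "after") == 0;
}
```

Calling putenv_aliases_buffer() from main() should report that the environment followed the edit. The same mechanism is what makes an automatic (stack) buffer dangerous once its function has returned.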

Furthermore (though I haven't tested it), since one use of environment variables is to pass values to child's environment this seems useless if the child calls one of the exec*() functions. Am I wrong in that?

The exec*() functions make a copy of the environment and pass that to the executed process. There's no problem there.
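If you want to test it yourself, something like the following sketch should do. It assumes /bin/sh exists; DEMO_CHILD and the function name are invented for the example.

```c
#define _XOPEN_SOURCE 700   /* ask for the setenv() declaration */
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo function.
 * Returns 1 if a child that calls exec still sees DEMO_CHILD=42,
 * 0 if it does not, -1 on error. */
int child_sees_variable(void)
{
    if (setenv("DEMO_CHILD", "42", 1) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: replace the process image; the shell then tests the variable. */
        execl("/bin/sh", "sh", "-c", "test \"$DEMO_CHILD\" = 42", (char *)NULL);
        _exit(127);   /* only reached if exec itself failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

The shell only exits with status 0 if the variable survived the exec, so a return value of 1 means the executed program received the environment.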

The Linux man page indicates that glibc 2.0-2.1.1 abandoned the above behavior and began copying the string but this led to a memory leak that was fixed in glibc 2.1.2. It's not clear to me what this memory leak was or how it was fixed.

The memory leak arises because once you have called putenv() with a string, you cannot safely reuse or free that string, since you can't tell whether the environment is still referencing it. You could modify the value by overwriting it in place (with indeterminate results if you change the name to that of an environment variable found at another position in the environment). So, if you have allocated space, the classic putenv() leaks it if you change the variable again. When putenv() began to copy data, the leak took a different form: putenv() no longer kept a reference to the argument, but callers still assumed the environment was referencing it and so never freed it. I'm not sure what the fix was; my best guess is that it reverted to the old behaviour.
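One way to avoid the leak with a non-copying putenv() is to track ownership yourself. This is only a sketch: it relies on the POSIX wording that the space used by a string is no longer used once a new string defining the same name is passed to putenv(), and it assumes nothing else (such as setenv()) touches the variable in between. The name DEMO_OWNED and the helper are hypothetical.

```c
#define _XOPEN_SOURCE 700   /* ask for the putenv() declaration */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* The one heap string currently handed to putenv() for DEMO_OWNED, if any. */
static char *current_entry = NULL;

/* Hypothetical helper: set DEMO_OWNED to value without leaking the old string.
 * Returns 0 on success, -1 on failure. */
int put_owned_value(const char *value)
{
    const char *name = "DEMO_OWNED";
    size_t len = strlen(name) + 1 + strlen(value) + 1;   /* name, '=', value, NUL */

    char *entry = malloc(len);
    if (entry == NULL)
        return -1;
    snprintf(entry, len, "%s=%s", name, value);

    if (putenv(entry) != 0) {
        free(entry);          /* never became part of the environment */
        return -1;
    }

    /* The environment now points at entry, so the previous string is unused. */
    free(current_entry);
    current_entry = entry;
    return 0;
}
```

With a putenv() that copies its argument instead, the same caller would have no reason to think it should free entry at all, which is the kind of leak described above.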

setenv() copies the string but I don't know exactly how that works. Space for the environment is allocated when the process loads but it is fixed.

The original environment space is fixed; when you start modifying it, the rules change. Even with putenv(), the original environment is modified and could grow as a result of adding new variables, or as a result of changing existing variables to have longer values.

Is there some (arbitrary?) convention at work here? For example, allocating more slots in the env string pointer array than currently used and moving the null terminating pointer down as needed?

That is what the setenv() mechanism is likely to do. The (global) variable environ points to the start of the array of pointers to environment variables. If it points to one block of memory at one time and a different block at a different time, then the environment is switched, just like that.
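You can watch this from user code. The sketch below counts the entries reachable from environ before and after adding a variable, and checks that setenv() stored its own copy of the value. Whether environ itself moves is implementation-specific, so the code only reports it. The DEMO_* names and function names are made up.

```c
#define _XOPEN_SOURCE 700   /* ask for setenv()/unsetenv() declarations */
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

extern char **environ;

/* Count entries up to the terminating null pointer. */
size_t env_count(void)
{
    size_t n = 0;
    if (environ != NULL)
        while (environ[n] != NULL)
            n++;
    return n;
}

/* Adds a variable that was not set before and records in *moved whether
 * environ now holds a different address (implementation-specific).
 * Returns the change in entry count, or -1 on error. */
int add_and_observe(int *moved)
{
    if (unsetenv("DEMO_NEW") != 0)
        return -1;
    size_t before = env_count();
    uintptr_t old_addr = (uintptr_t)environ;   /* remember the address only */

    if (setenv("DEMO_NEW", "1", 1) != 0)
        return -1;

    *moved = ((uintptr_t)environ != old_addr);
    return (int)(env_count() - before);
}

/* Returns 1 if setenv() stored its own copy of the value, 0 if not, -1 on error. */
int setenv_copies_value(void)
{
    char value[] = "original";
    if (setenv("DEMO_COPY", value, 1) != 0)
        return -1;
    strcpy(value, "CHANGED!");   /* same length as "original", so it fits */

    const char *seen = getenv("DEMO_COPY");
    return seen != NULL && strcmp(seen, "original") == 0;
}
```

The copy check is the opposite of what you would see with putenv(), where editing the caller's buffer edits the environment.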

Is the memory for the new (copied) string allocated in the address space of the environment itself and if it is too big to fit you just get ENOMEM?

Well, yes, you could get ENOMEM, but you'd have to be trying pretty hard. And if you grow the environment too large, you may be unable to exec other programs properly - either the environment will be truncated or the exec operation will fail.
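To get a feel for how much room there is, you can compare the size of the current environment with the system limit on the combined argument and environment space for exec, which sysconf(_SC_ARG_MAX) reports. This is a rough sketch with made-up function names; the real accounting on a given kernel may include pointer arrays and other overhead, and an exec that exceeds the limit is expected to fail with E2BIG.

```c
#define _XOPEN_SOURCE 700
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

extern char **environ;

/* Total bytes of all "NAME=value" strings, including their terminators. */
size_t env_bytes(void)
{
    size_t total = 0;
    if (environ != NULL)
        for (char **p = environ; *p != NULL; p++)
            total += strlen(*p) + 1;
    return total;
}

/* Limit on argument plus environment space for exec, or -1 if indeterminate. */
long exec_arg_limit(void)
{
    return sysconf(_SC_ARG_MAX);
}
```

If env_bytes() is a small fraction of exec_arg_limit(), you are nowhere near trouble, which is the usual case.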

Considering the above issues, is there any reason to prefer putenv() over setenv()?

  • Use setenv() in new code.
  • Update old code to use setenv(), but don't make it a top priority.
  • Do not use putenv() in new code.
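When updating old code that builds "NAME=value" strings for putenv(), a small adapter can ease the move. The helper below is hypothetical, not part of any library: it splits the pair at the first '=' and hands the pieces to setenv(), which makes its own copies.

```c
#define _XOPEN_SOURCE 700   /* ask for the setenv() declaration */
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: accepts "NAME=value" as putenv() would, but copies
 * the data via setenv(), so the caller keeps ownership of pair.
 * Returns 0 on success, -1 on failure. */
int setenv_from_pair(const char *pair)
{
    const char *eq = strchr(pair, '=');
    if (eq == NULL || eq == pair) {     /* no '=' or empty name */
        errno = EINVAL;
        return -1;
    }

    size_t name_len = (size_t)(eq - pair);
    char *name = malloc(name_len + 1);
    if (name == NULL)
        return -1;
    memcpy(name, pair, name_len);
    name[name_len] = '\0';

    int rc = setenv(name, eq + 1, 1);   /* 1 = overwrite an existing value */
    free(name);                         /* safe: setenv() copied the name */
    return rc;
}
```

After setenv_from_pair("DEMO_PAIR=hello"), the caller is free to reuse or release its buffer, which is exactly what putenv() does not allow.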