Matthew Herbst Matthew Herbst - 7 months ago 9
Bash Question

Why can't environment variables with dashes be accessed in bash 4.1.2?

On a CentOS 5 host (with bash 3.2.32), we use Ruby (1.8.7) to

ENV['AWS_foo-bar_ACCESS_KEY'] = xxxxx


Then, using bash, we run a shell script that does:

BUCKET_NAME=$1
AWS_ACCESS_KEY_ID_VAR="AWS_${BUCKET_NAME}_ACCESS_KEY_ID"
AWS_ACCESS_KEY_ID="${!AWS_ACCESS_KEY_ID_VAR}"
export AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID}


This works fine on CentOS 5.

However, on CentOS 6 (with bash 4.1.2), we get the error

-bash: export: `AWS_foo-bar_ACCESS_KEY_ID=xxxxx': not a valid identifier


It is our understanding that this fails because
-
is not allowed in the variable name. But why does this work on bash 3.2 and not bash 4.1?

Answer

The "why" is almost irrelevant: The POSIX standard makes it very clear that export is only required to support arguments which are valid names, and anything with a dash is not a valid name. Thus, no POSIX shell is required to support exporting or expanding variable names with dashes, via indirect expansion or otherwise.

It's worth noting that ShellShock -- a major security bug caused by sloppy handling of environment contents -- is fixed in the bash 4.1 present in the current CentOS 6 updates repo; increased rigor in an area which spawned security bugs should be no surprise.

The remainder of this answer will focus on demonstrating that the new behavior of bash 4.1 is explicitly allowed, or even required, by POSIX -- and thus that the prior behavior was an undefined implementation artifact.


To quote POSIX on environment variables:

These strings have the form name=value; names shall not contain the character '='. For values to be portable across systems conforming to IEEE Std 1003.1-2001, the value shall be composed of characters from the portable character set (except NUL and as indicated below). There is no meaning associated with the order of strings in the environment. If more than one string in a process' environment has the same name, the consequences are undefined.

Environment variable names used by the utilities in the Shell and Utilities volume of IEEE Std 1003.1-2001 consist solely of uppercase letters, digits, and the '_' (underscore) from the characters defined in Portable Character Set and do not begin with a digit. Other characters may be permitted by an implementation; applications shall tolerate the presence of such names. Uppercase and lowercase letters shall retain their unique identities and shall not be folded together. The name space of environment variable names containing lowercase letters is reserved for applications. Applications can define any environment variables with names from this name space without modifying the behavior of the standard utilities.

Note: Other applications may have difficulty dealing with environment variable names that start with a digit. For this reason, use of such names is not recommended anywhere.

Thus:

  • Tools (including the shell) are required to fully support environment variable names with uppercase and lowercase letters, digits (except in the first position), and the underscore.
  • Tools (including the shell) should tolerate other names -- meaning they shouldn't crash or misbehave in their presence -- but are not required to support them.

Finally, shells are explicitly allowed to discard environment variable names which are not also shell variable names. From the relevant standard:

It is unspecified whether environment variables that were passed to the shell when it was invoked, but were not used to initialize shell variables (see Shell Variables) because they had invalid names, are included in the environment passed to execl() and (if execl() fails as described above) to the new shell.


Moreover, what defines a valid shell name is well-defined:

Name - In the shell command language, a word consisting solely of underscores, digits, and alphabetics from the portable character set. The first character of a name is not a digit.

Notably, only underscores (not dashes) are considered part of a valid name in a POSIX-compliant shell.


...and the POSIX specification for export explicitly uses the word "name" (which it defined in the text quoted above), and describes it as applying to "variables" (shell variables, the restrictions on names for which are also subject to restrictions quoted elsewhere in this document):

The shell shall give the export attribute to the variables corresponding to the specified names, which shall cause them to be in the environment of subsequently executed commands. If the name of a variable is followed by = word, then the value of that variable shall be set to word.


All the above being said -- if your operating system provides a /proc/self/environ which represents the state of your enviroment variables at process startup (before a shell has, as it's allowed to do, potentially discarded any variables which don't have valid names in shell), you can extract content with invalid names like so:

# using a lower-case name where possible is in line with POSIX guidelines, see above
aws_access_key_id_var="AWS_${BUCKET_NAME}_ACCESS_KEY_ID"
while IFS= read -r -d '' var; do
  [[ $var = "$aws_access_key_id_var"=* ]] || continue
  val=${var#"${aws_access_key_id_var}="}
  break
done </proc/self/environ
echo "Extracted value: $val"