Gentoo Websites Logo
Go to: Gentoo Home Documentation Forums Lists Bugs Planet Store Wiki Get Gentoo!
Bug 881391 - app-shells/bash: changes to handling of array subscripts in math context raise backward compatibility concerns
Summary: app-shells/bash: changes to handling of array subscripts in math context rais...
Status: CONFIRMED
Alias: None
Product: Gentoo Linux
Classification: Unclassified
Component: Current packages (show other bugs)
Hardware: All Linux
: Normal normal (vote)
Assignee: Gentoo's Team for Core System packages
URL: http://mywiki.wooledge.org/BashPitfal...
Whiteboard:
Keywords:
Depends on:
Blocks: bash-5.2
  Show dependency tree
 
Reported: 2022-11-15 12:34 UTC by kfm
Modified: 2023-02-15 03:51 UTC (History)
0 users

See Also:
Package list:
Runtime testing required: ---


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description kfm 2022-11-15 12:34:48 UTC
SUMMARY

From the 5.2 release notes:-

"The shell attempts to do a much better job of parsing and expanding array subscripts only once; this has visible effects in the `unset' builtin, word expansions, conditional commands, and other builtins that can assign variable values as a side effect."

This has several implications but the intent of this bug is to focus on how the arithmetic context is affected. Essentially, this is yet another episode in the ongoing saga of applying half-baked changes that attempt to curtail the issue of repeated expansions, even arbitrary code execution, occurring. However, this time, the changes made are able to break existing code that was provably correct when executed by prior versions of bash.

PROBLEM (PRE 5.2)

To understand the problem, let's consider a simple example. I'll use the (( keyword to demonstrate, as it is a convenient way of imposing an arithmetic context, though it should be borne in mind that it's not the only way. The following code can be considered unsafe, unless the value of key is validated beforehand.

    (( map[$key] )) && echo "the value is numerically true"

But why? Let's see.

$ declare -A map; map[foo]=1
$ key='foo$(date >&2)'; (( map[$key] )); echo $?
Tue 15 Nov 11:10:13 GMT 2022
0

The problem here is that the subscript is subject to repeated expansion. Consequently, the portion of the value of key that is a valid command substitution is interpreted as such, amounting to a code injection. As the comsub proceeds to capture nothing, the value of key is ultimately treated as 'foo' and, therefore, a value of 1 is obtained, which is arithmetically true. None of which was the intent.

While never properly documented, bash has long offered the means to defend against this problem. Namely, to quote the sigil in the case of referencing a variable whose name is expected to be a valid subscript for an associative array, and to forgo the sigil in the case of referencing a variable whose name is expected to be a valid subscript for an indexed array.

((   assoc_array[\$key] )) # good
((   assoc_array[$key]  )) # bad
(( indexed_array[i]     )) # good
(( indexed_array[$i]    )) # bad

To apply this principle to the example above:-

$ key='foo$(date >&2)'; (( map[\$key] )); echo $?
$ 1

That's better. No code injection occurs and the value of key is, in fact, taken literally. As an aside, bash 5.0 introduced a shell option named "assoc_expand_once", which is one of the half-baked measures that I alluded to earlier. It defaults to being off. I have never used this in my own code and have no recollection of ever seeing it be used in the wild.

PROBLEM (POST 5.2)

The problem with 5.2 is best explained by an excerpt from Wooledge's BashPitfalls page:-

"It appears bash has made the parser more complex by selectively removing the pre-expansion step for certain parts of expressions. That's bad because the rules for when that happens are undocumented, and you would need to know how bash decides that in order to know where to escape things. let looks unaffected, as expected, because its arguments are passed directly to arith.c without the parser mangling the expression unpredictably."

The long and the short of it is that the map[\$key] trick is no longer a dependable workaround. Here is an example, based on the one shown by the BashPitfalls page.

$ echo "$BASH_VERSION"
5.1.16(1)-release
$ declare -A map; key="']"
$ (( map[\$key]++ ))
$ declare -p map
declare -A map=(["']"]="1" )

That is the expected outcome. Except for NUL, any sequence of bytes is valid as a associative array key. Now let's see what happens in 5.2.

$ echo "$BASH_VERSION"
5.2.9(1)-release
$ declare -A map; key="']"
$ (( map[\$key]++ ))
-bash: ((: map\[']]++ : syntax error: invalid arithmetic operator (error token is "\[']]++ ")

Well that's no good. Refraining from escaping the sigil doesn't help either.

$ (( map[$key]++ ))
-bash: ((: map[']]++ : bad array subscript (error token is "map[']]++ ")

WORKAROUNDS

One workaround is never to install bash >=5.2. Needless to say, this is not viable in the long term. As far as I can gather, Chet has no intention of reverting this change and is unlikely to do so unless he receives considerable resistance. So far, that doesn't appear to be happening, though questions have been raised on the mailing lists. Even if he were to do so, I fear that he would simply 'fix' one thing and break two others in the course of doing so.

A second workaround is to avoid the arithmetic context where feasible. Depending on the purpose of the array, this can be a perfectly reasonable course of action. Consider the following program as an example, which prints unique lines.

	declare -A seen
	while read -r; do
		# This is dangerous as of 5.2
		if (( ! seen[\$REPLY]++ )); then
			printf '%s\n' "$REPLY"
		fi
	done

In order to serve its intended purpose, it really only needs to check whether a key exists, so it could instead be written as follows.

	declare -A seen
	while read -r; do
		# Works in 4.2 onwards
		if [[ ! -v 'seen[$REPLY]' ]]; then
			printf '%s\n' "$REPLY"
			seen[$REPLY]=
		fi
	done

Similarly, it could just as well set the the value for the key to any non-empty value and test with [[ ${seen[$REPLY]} ]] instead.

A third workaround - and the only one known to allow for the continued use of the arithmetic context in a definitely safe fashion - given an arbitrary key - is to look up a value and operate on it as necessary before re-assigning it. After applying this particular workaround to the above program, it might look as follows.

	declare -A seen
	while read -r; do
                i=${seen[$REPLY]}
		if (( ! i++ )); then
			printf '%s\n' "$REPLY"
		fi
		seen[$REPLY]=$i
	done

CONCLUSION

Firstly, any code that makes use of the previously dependable array[\$key] technique - and its variants - will need to be re-written.

Secondly, any code that continues to specify associative array subscripts within an arithmetic context, in such a way that their values are the consequences of a variable/parameter expansion and are not necessarily predictable in nature, will need to be reviewed and potentially re-written.

Thirdly, vendors that find this change to be burdensome would do well to voice their concerns.
Comment 1 kfm 2022-11-15 17:01:34 UTC
There is another workaround. As observed by the BashPitfalls page, let is unaffected because "its arguments are passed directly to arith.c without the parser mangling the expression unpredictably." Again, taking the unique lines program as an example, it could be written as follows.

	declare -A seen
	while read -r; do
		if ! let 'seen[$REPLY]++'; then
			printf '%s\n' "$REPLY"
		fi
	done

I think that it's improbable that the behaviour of let will change any time soon. That being said, who can ever tell what Chet is going to do next.
Comment 2 kfm 2022-12-04 11:47:39 UTC
So far, I have reviewed the following projects.

- portage
- the ::gentoo repo
- the ::haskell repo
- the ::guru repo

No changes are required in any of these at the current time.