Vi. Vi. - 3 months ago 18
Linux Question

How to execve a process, retaining capabilities in spite of missing filesystem-based capabilities?

I want to make system usable without

setuid
, file "+p" capabilities, and in general without things which are disabled when I set PR_SET_NO_NEW_PRIVS.

With this approach (
init
sets
PR_SET_NO_NEW_PRIVS
and filesystem-based capability elevation no longer possible) you cannot "refill" your capabilities and only need to be careful not to "splatter" them.

How to
execve
some other process without "splattering" any granted capabilities (such as if the new program's file is
setcap =ei
)? Just "I trust this new process as I trust myself". For example, a capability is given to a user (and the user wants to exercise it in any program he starts)...

Can I make the entire filesystem permanently
=ei
? I want to keep the filesystem just not interfering with the scheme, not capable of granting or revoking capabilities; controlling everything through parent->child things.

Answer

I am not saying that I recommend this for what you are doing, but here it is.

Extracted from the manual, There have been some changes. According to it: fork does not change capabilities. And now there is an ambient set added in Linux kernel 4.3, it seems that this is for what you are trying to do.

   Ambient (since Linux 4.3):
          This is a set of capabilities that are preserved across an execve(2) of a program that is not privileged.  The ambient capability set obeys the invariant that no capability can ever
          be ambient if it is not both permitted and inheritable.

          The ambient capability set can be directly modified using
          prctl(2).  Ambient capabilities are automatically lowered if
          either of the corresponding permitted or inheritable
          capabilities is lowered.

          Executing a program that changes UID or GID due to the set-
          user-ID or set-group-ID bits or executing a program that has
          any file capabilities set will clear the ambient set.  Ambient
          capabilities are added to the permitted set and assigned to
          the effective set when execve(2) is called.

   A child created via fork(2) inherits copies of its parent's
   capability sets.  See below for a discussion of the treatment of
   capabilities during execve(2).

Transformation of capabilities during execve()
   During an execve(2), the kernel calculates the new capabilities of
   the process using the following algorithm:

       P'(ambient) = (file is privileged) ? 0 : P(ambient)

       P'(permitted) = (P(inheritable) & F(inheritable)) |
                       (F(permitted) & cap_bset) | P'(ambient)

       P'(effective) = F(effective) ? P'(permitted) : P'(ambient)

       P'(inheritable) = P(inheritable)    [i.e., unchanged]

   where:

       P         denotes the value of a thread capability set before the
                 execve(2)

       P'        denotes the value of a thread capability set after the
                 execve(2)

       F         denotes a file capability set

       cap_bset  is the value of the capability bounding set (described
                 below).

   A privileged file is one that has capabilities or has the set-user-ID
   or set-group-ID bit set.