From 67d1131fd900200d1888313d62fe4ae29834c8ee Mon Sep 17 00:00:00 2001
From: Michael Kerrisk
Date: Wed, 27 Feb 2013 07:08:06 +0100
Subject: [PATCH] namespaces.7: Remove userns material shifted to user_namespaces(7)

The user namespaces section was getting long and unwieldy.
Split it into its own page, so that it can be better
structured with subtitles, etc.

Signed-off-by: Michael Kerrisk
---
 man7/namespaces.7 | 313 +---------------------------------------------------------
 1 file changed, 3 insertions(+), 310 deletions(-)

diff --git a/man7/namespaces.7 b/man7/namespaces.7
index 12b85739e..e75951333 100644
--- a/man7/namespaces.7
+++ b/man7/namespaces.7
@@ -486,316 +486,8 @@ option.
 .\" ==================== User namespaces ====================
 .\"
 .SS User namespaces (CLONE_NEWUSER)
-User namespaces isolate security-related identifiers, in particular,
-user IDs, group IDs, keys (see
-.BR keyctl (2)),
-and capabilities.
-A process's user and group IDs can be different
-inside and outside a user namespace.
-In particular,
-a process can have a normal unprivileged user ID outside a user namespace
-while at the same time having a user ID of 0 inside the namespace;
-in other words,
-the process has full privileges for operations inside the user namespace,
-but is unprivileged for operations outside the namespace.
-
-User namespaces can be nested;
-that is, each user namespace has a parent user namespace,
-and can have zero or more child user namespaces.
-The parent of a user namespace is the user namespace
-of the process that creates the user namespace via a call to
-.BR unshare (2)
-or
-.BR clone (2)
-with the
-.BR CLONE_NEWUSER
-flag.
-
-When a user namespace is created,
-it starts out without a mapping of user IDs (group IDs)
-to the parent user namespace.
-The desired mapping of user IDs (group IDs) to the parent user namespace
-may be set by writing into
-.IR /proc/[pid]/uid_map
-.RI ( /proc/[pid]/gid_map );
-see below.
-
-The first process in a user namespace starts out with a complete set
-of capabilities with respect to the new user namespace.
-
-System calls that return user IDs (group IDs) will return
-either the user ID (group ID) mapped into the current
-user namespace if there is a mapping, or the overflow user ID (group ID);
-the default value for the overflow user ID (group ID) is 65534.
-See the descriptions of
-.IR /proc/sys/kernel/overflowuid
-and
-.IR /proc/sys/kernel/overflowgid
-in
-.BR proc (5).
-
-Starting in Linux 3.8, unprivileged processes can create user namespaces,
-and mount, PID, IPC, network, and UTS namespaces can be created with just the
-.B CAP_SYS_ADMIN
-capability in the caller's user namespace.
-
-If
-.BR CLONE_NEWUSER
-is specified along with other
-.B CLONE_NEW*
-flags in a single
-.BR clone (2)
-or
-.BR unshare (2)
-call, the user namespace is guaranteed to be created first,
-giving the caller privileges over the remaining
-namespaces created by the call.
-Thus, it is possible for an unprivileged caller to specify this combination
-of flags.
-
-When a new IPC, mount, network, PID, or UTS namespace is created via
-.BR clone (2)
-or
-.BR unshare (2),
-the kernel records the user namespace of the creating process against
-the new namespace.
-When a process in the new namespace subsequently performs
-privileged operations that operate on global
-resources isolated by the namespace,
-the permission checks are performed according to the process's capabilities
-in the user namespace that the kernel associated with the new namespace.
-
-
-The following rules apply with respect to the capabilities granted
-to a process:
-.\" In the 3.8 sources, see security/commoncap.c::cap_capable():
-.IP 1. 3
-If a process has a capability in a parent user namespace,
-then it has that capability in all child (and further removed descendant)
-namespaces as well.
-.IP 2.
-.\" * The owner of the user namespace in the parent of the
-.\" * user namespace has all caps.
-When a user namespace is created, the kernel records the effective
-user ID of the creating process as being the "owner" of the namespace,
-and likewise associates the effective group ID of the creating process
-with the namespace.
-A process whose effective user ID matches that of the
-owner of a user namespace and which is a member of the parent namespace
-(or a further removed namespace that is a direct ancestor)
-has all capabilities in the user namespace.
-.\" As a rough approximation, this means that
-.\" the user who creates a user namespace
-.\" has all capabilities inside that namespace and its descendants.
-.PP
-Use of user namespaces requires a kernel that is configured with the
-.B CONFIG_USER_NS
-option.
-
-Over the years, there have been a lot of features that have been added
-to the Linux kernel that are only available to privileged users
-because of their potential to confuse set-user-ID-root applications.
-In general, it becomes safe to allow the root user in a user namespace to
-use those features because it is impossible, while in a user namespace,
-to gain more privilege than the root user of a user namespace has.
-
-The
-.IR /proc/[pid]/uid_map
-and
-.IR /proc/[pid]/gid_map
-files (available since Linux 3.5)
-.\" commit 22d917d80e842829d0ca0a561967d728eb1d6303
-expose the mappings for user and group IDs
-inside the user namespace for the process
-.IR pid .
-The description here explains the details for
-.IR uid_map ;
-.IR gid_map
-is exactly the same,
-but each instance of "user ID" is replaced by "group ID".
-
-The
-.I uid_map
-file exposes the mapping of user IDs from the user namespace
-of the process
-.IR pid
-to the user namespace of the process that opened
-.IR uid_map
-(but see a qualification to this point below).
-In other words, processes that are in different user namespaces
-will potentially see different values when reading from a particular
-.I uid_map
-file, depending on the user ID mappings for the user namespaces
-of the reading processes.
-
-Each line in the
-.I uid_map
-file specifies a 1-to-1 mapping of a range of contiguous
-user IDs between two user namespaces.
-(When a user namespace is first created, this file is empty.)
-The specification in each line takes the form of
-three numbers delimited by white space.
-The first two numbers specify the starting user ID in
-each user namespace.
-The third number specifies the length of the mapped range.
-In detail, the fields are interpreted as follows:
-.IP (1) 4
-The start of the range of user IDs in
-the user namespace of the process
-.IR pid .
-.IP (2)
-The start of the range of user
-IDs to which the user IDs specified by field one map.
-How field two is interpreted depends on whether the process that opened
-.I uid_map
-and the process
-.IR pid
-are in the same user namespace, as follows:
-.RS
-.IP a) 3
-If the two processes are in different user namespaces:
-field two is the start of a range of
-user IDs in the user namespace of the process that opened
-.IR uid_map .
-.IP b)
-If the two processes are in the same user namespace:
-field two is the start of the range of
-user IDs in the parent user namespace of the process
-.IR pid .
-This case enables the opener of
-.I uid_map
-(the common case here is opening
-.IR /proc/self/uid_map )
-to see the mapping of user IDs into the user namespace of the process
-that created this user namespace.
-.RE
-.IP (3)
-The length of the range of user IDs that is mapped between the two
-user namespaces.
-.PP
-After the creation of a new user namespace, the
-.I uid_map
-file of
-.I one
-of the process in the namespace may be written to
-.I once
-to define the mapping of user IDs in the new user namespace.
-(An attempt to write more than once to a
-.I uid_map
-file in a user namespace fails with the error
-.BR EPERM .)
-
-The lines written to
-.IR uid_map
-must conform to the following rules:
-.IP * 3
-The three fields must be valid numbers,
-and the last field must be greater than 0.
-.IP *
-Lines are terminated by newline characters.
-.IP *
-There is an (arbitrary) limit on the number of lines in the file.
-As at Linux 3.8, the limit is five lines.
-In addition, the number of bytes written to
-the file must be less than the system page size,
-.\" FIXME(Eric): the restriction "less than" rather than "less than or equal"
-.\" seems strangely arbitrary. Furthermore, the comment does not agree
-.\" with the code in kernel/user_namespace.c. Which is correct.
-and the write must be performed at the start of the file (i.e.,
-.BR lseek (2)
-and
-.BR pwrite (2)
-can't be used to write to nonzero offsets in the file).
-.IP *
-The range of user IDs specified in each line cannot overlap with the ranges
-in any other lines.
-In the current implementation (Linux 3.8), this requirement is
-satisfied by a simplistic implementation that imposes the further
-requirement that
-the values in both field 1 and field 2 of successive lines must be
-in ascending numerical order.
-.IP *
-At least one line must be written to the file.
-.PP
-Writes that violate the above rules fail with the error
-.BR EINVAL .
-
-In order for a process to write to the
-.I /proc/[pid]/uid_map
-.RI ( /proc/[pid]/gid_map )
-file, all of the following requirements must be met:
-.IP 1. 3
-The writing process must have the
-.BR CAP_SETUID
-.RB ( CAP_SETGID )
-capability in the user namespace of the process
-.IR pid .
-.\" FIXME(Eric):
-.\" Something isn't quite right in the description here.
-.\" Suppose UID 1000 creates a user namespace. At this point, UID 0 in
-.\" the parent namespace can write a map of (say) '0 1000 10' to uid_map.
-.\" That succeeds. But how is that case covered in the three rules here?
-.\" In other words, how does UID 0 in the parent namespace have any
-.\" capabilities in the new child namespace? Somewhere on the page,
-.\" I think there needs to be a statement about the privileges of
-.\" UID 0 when no mapping has yet been defined, right?
-.\" Or is it simply the case that UID 0 in the parent namespace
-.\" always has all capabilities in the child namespace?
-.\"
-.IP 2.
-The writing process must be in either the user namespace of the process
-.I pid
-or inside the parent user namespace of the process
-.IR pid .
-.IP 3.
-One of the following is true:
-.RS
-.IP * 3
-The data written to
-.I uid_map
-.RI ( gid_map )
-consists of a single line that maps the writing process's file system user ID
-(group ID) in the parent user namespace to a user ID (group ID)
-in the user namespace.
-The usual case here is that this single line provides a mapping for user ID
-of the process that created the namespace.
-.IP * 3
-The process has the
-.BR CAP_SETUID
-.RB ( CAP_SETGID )
-capability in the parent user namespace.
-Thus, a privileged process can make mappings to arbitrary user IDs (group IDs)
-in the parent user namespace.
-.RE
-.PP
-Writes that violate the above rules fail with the error
-.BR EPERM .
-.PP
-In order to create a new user namespace,
-there must exist a mapping of the caller's effective
-user and group IDs into the parent namespace.
-If such a mapping does not exist, then
-.BR clone (2)
-and
-.BR unshare (2)
-fail with the error
-.BR EPERM .
-.PP
-When a process inside a user namespace executes
-a set-user-ID (set-group-ID) program,
-the process's effective user (group) ID inside the namespace is changed
-to whatever value is mapped for the user (group) ID of the file.
-However, if either the user
-.I or
-the group ID of the file has no mapping inside the namespace,
-the set-user-ID (set-group-ID) bit is silently ignored:
-the new program is executed,
-but the process's effective user (group) ID is left unchanged.
-(This mirrors the semantics of executing a set-user-ID or set-group-ID
-program that resides on a file system that was mounted with the
-.BR MS_NOSUID
-flag (see
-.BR mount (2).)
+See
+.BR user_namespaces (7).
 .\"
 .\" ==================== UTS namespaces ====================
 .\"
@@ -827,4 +519,5 @@ Namespaces are a Linux-specific feature.
 .BR proc (5),
 .BR credentials (7),
 .BR capabilities (7),
+.BR user_namespaces (7),
 .BR switch_root (8)