namespaces.7: Remove userns material shifted to user_namespaces(7)

The user namespaces section was getting long and unwieldy. Split it into its own page, so that it can be better structured with subtitles, etc.

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
2013-02-27 07:08:06 +01:00 · 2013-02-27 07:08:06 +01:00 · 67d1131fd9
parent 046de6a7d7
commit 67d1131fd9
1 changed files with 3 additions and 310 deletions
--- a/man7/namespaces.7
+++ b/man7/namespaces.7
@@ -486,316 +486,8 @@ option.
 .\" ==================== User namespaces ====================
 .\"
 .SS User namespaces (CLONE_NEWUSER)
-User namespaces isolate security-related identifiers, in particular,
+See
-user IDs, group IDs, keys (see
+.BR user_namespaces (7).
 .BR keyctl (2)),
 and capabilities.
 A process's user and group IDs can be different
 inside and outside a user namespace.
 In particular,
 a process can have a normal unprivileged user ID outside a user namespace
 while at the same time having a user ID of 0 inside the namespace;
 in other words,
 the process has full privileges for operations inside the user namespace,
 but is unprivileged for operations outside the namespace.
 User namespaces can be nested;
 that is, each user namespace has a parent user namespace,
 and can have zero or more child user namespaces.
 The parent of a user namespace is the user namespace
 of the process that creates the user namespace via a call to
 .BR unshare (2)
 or
 .BR clone (2)
 with the 
 .BR CLONE_NEWUSER
 flag.
 When a user namespace is created,
 it starts out without a mapping of user IDs (group IDs)
 to the parent user namespace.
 The desired mapping of user IDs (group IDs) to the parent user namespace
 may be set by writing into  
 .IR /proc/[pid]/uid_map
 .RI ( /proc/[pid]/gid_map );
 see below.
 The first process in a user namespace starts out with a complete set
 of capabilities with respect to the new user namespace.  
 System calls that return user IDs (group IDs) will return
 either the user ID (group ID) mapped into the current
 user namespace if there is a mapping, or the overflow user ID (group ID);
 the default value for the overflow user ID (group ID) is 65534.
 See the descriptions of
 .IR /proc/sys/kernel/overflowuid
 and
 .IR /proc/sys/kernel/overflowgid
 in
 .BR proc (5).
 Starting in Linux 3.8, unprivileged processes can create user namespaces,
 and mount, PID, IPC, network, and UTS namespaces can be created with just the
 .B CAP_SYS_ADMIN
 capability in the caller's user namespace.
 If
 .BR CLONE_NEWUSER
 is specified along with other
 .B CLONE_NEW*
 flags in a single
 .BR clone (2)
 or
 .BR unshare (2)
 call, the user namespace is guaranteed to be created first,
 giving the caller privileges over the remaining
 namespaces created by the call.
 Thus, it is possible for an unprivileged caller to specify this combination
 of flags.
 When a new IPC,  mount, network, PID, or UTS namespace is created via
 .BR clone (2)
 or
 .BR unshare (2),
 the kernel records the user namespace of the creating process against
 the new namespace.
 When a process in the new namespace subsequently performs
 privileged operations that operate on global
 resources isolated by the namespace,
 the permission checks are performed according to the process's capabilities
 in the user namespace that the kernel associated with the new namespace.
 The following rules apply with respect to the capabilities granted
 to a process:
 .\" In the 3.8 sources, see security/commoncap.c::cap_capable():
 .IP 1. 3
 If a process has a capability in a parent user namespace,
 then it has that capability in all child (and further removed descendant)
 namespaces as well.
 .IP 2.
 .\" * The owner of the user namespace in the parent of the
 .\" * user namespace has all caps.
 When a user namespace is created, the kernel records the effective
 user ID of the creating process as being the "owner" of the namespace,
 and likewise associates the effective group ID of the creating process
 with the namespace.
 A process whose effective user ID matches that of the 
 owner of a user namespace and which is a member of the parent namespace
 (or a further removed namespace that is a direct ancestor)
 has all capabilities in the user namespace.
 .\" As a rough approximation, this means that
 .\" the user who creates a user namespace 
 .\" has all capabilities inside that namespace and its descendants.
 .PP
 Use of user namespaces requires a kernel that is configured with the
 .B CONFIG_USER_NS
 option.
 Over the years, there have been a lot of features that have been added
 to the Linux kernel that are only available to privileged users
 because of their potential to confuse set-user-ID-root applications.
 In general, it becomes safe to allow the root user in a user namespace to
 use those features because it is impossible, while in a user namespace,
 to gain more privilege than the root user of a user namespace has.
 The
 .IR /proc/[pid]/uid_map
 and
 .IR /proc/[pid]/gid_map
 files (available since Linux 3.5)
 .\" commit 22d917d80e842829d0ca0a561967d728eb1d6303
 expose the mappings for user and group IDs
 inside the user namespace for the process
 .IR pid .
 The description here explains the details for
 .IR uid_map ;
 .IR gid_map
 is exactly the same,
 but each instance of "user ID" is replaced by "group ID".
 The
 .I uid_map
 file exposes the mapping of user IDs from the user namespace
 of the process
 .IR pid
 to the user namespace of the process that opened
 .IR uid_map
 (but see a qualification to this point below).
 In other words, processes that are in different user namespaces
 will potentially see different values when reading from a particular
 .I uid_map
 file, depending on the user ID mappings for the user namespaces
 of the reading processes.
 Each line in the
 .I uid_map
 file specifies a 1-to-1 mapping of a range of contiguous
 user IDs between two user namespaces.
 (When a user namespace is first created, this file is empty.)
 The specification in each line takes the form of
 three numbers delimited by white space.
 The first two numbers specify the starting user ID in 
 each user namespace.
 The third number specifies the length of the mapped range.
 In detail, the fields are interpreted as follows:
 .IP (1) 4
 The start of the range of user IDs in
 the user namespace of the process
 .IR pid .
 .IP (2)
 The start of the range of user
 IDs to which the user IDs specified by field one map.
 How field two is interpreted depends on whether the process that opened
 .I uid_map
 and the process
 .IR pid
 are in the same user namespace, as follows:
 .RS
 .IP a) 3
 If the two processes are in different user namespaces:
 field two is the start of a range of
 user IDs in the user namespace of the process that opened
 .IR uid_map .
 .IP b)
 If the two processes are in the same user namespace:
 field two is the start of the range of
 user IDs in the parent user namespace of the process
 .IR pid .
 This case enables the opener of
 .I uid_map
 (the common case here is opening
 .IR /proc/self/uid_map )
 to see the mapping of user IDs into the user namespace of the process
 that created this user namespace.
 .RE
 .IP (3)
 The length of the range of user IDs that is mapped between the two
 user namespaces.
 .PP
 After the creation of a new user namespace, the
 .I uid_map
 file of
 .I one
 of the processes in the namespace may be written to
 .I once
 to define the mapping of user IDs in the new user namespace.
 (An attempt to write more than once to a
 .I uid_map
 file in a user namespace fails with the error
 .BR EPERM .)
 The lines written to
 .IR uid_map
 must conform to the following rules:
 .IP * 3
 The three fields must be valid numbers,
 and the last field must be greater than 0.
 .IP *
 Lines are terminated by newline characters.
 .IP *
 There is an (arbitrary) limit on the number of lines in the file.
 As at Linux 3.8, the limit is five lines.
 In addition, the number of bytes written to
 the file must be less than the system page size,
 .\" FIXME(Eric): the restriction "less than" rather than "less than or equal"
 .\" seems strangely arbitrary. Furthermore, the comment does not agree
 .\" with the code in kernel/user_namespace.c. Which is correct.
 and the write must be performed at the start of the file (i.e.,
 .BR lseek (2)
 and
 .BR pwrite (2)
 can't be used to write to nonzero offsets in the file).
 .IP *
 The range of user IDs specified in each line cannot overlap with the ranges
 in any other lines.
 In the current implementation (Linux 3.8), this requirement is 
 satisfied by a simplistic implementation that imposes the further
 requirement that
 the values in both field 1 and field 2 of successive lines must be
 in ascending numerical order.
 .IP *
 At least one line must be written to the file.
 .PP
 Writes that violate the above rules fail with the error
 .BR EINVAL .
 In order for a process to write to the
 .I /proc/[pid]/uid_map
 .RI ( /proc/[pid]/gid_map )
 file, all of the following requirements must be met:
 .IP 1. 3
 The writing process must have the
 .BR CAP_SETUID
 .RB ( CAP_SETGID )
 capability in the user namespace of the process
 .IR pid .
 .\" FIXME(Eric): 
 .\" Something isn't quite right in the description here.
 .\" Suppose UID 1000 creates a user namespace. At this point, UID 0 in
 .\" the parent namespace can write a map of (say) '0 1000 10' to uid_map.
 .\" That succeeds. But how is that case covered in the three rules here?
 .\" In other words, how does UID 0 in the parent namespace have any
 .\" capabilities in the new child namespace? Somewhere on the page,
 .\" I think there needs to be a statement about the privileges of
 .\" UID 0 when no mapping has yet been defined, right?
 .\" Or is it simply the case that UID 0 in the parent namespace
 .\" always has all capabilities in the child namespace?
 .\"
 .IP 2.
 The writing process must be in either the user namespace of the process
 .I pid
 or inside the parent user namespace of the process
 .IR pid .
 .IP 3.
 One of the following is true:
 .RS
 .IP * 3
 The data written to
 .I uid_map
 .RI ( gid_map )
 consists of a single line that maps the writing process's file system user ID
 (group ID) in the parent user namespace to a user ID (group ID)
 in the user namespace.
 The usual case here is that this single line provides a mapping for user ID
 of the process that created the namespace.
 .IP * 3
 The process has the
 .BR CAP_SETUID
 .RB ( CAP_SETGID )
 capability in the parent user namespace.
 Thus, a privileged process can make mappings to arbitrary user IDs (group IDs)
 in the parent user namespace.
 .RE
 .PP
 Writes that violate the above rules fail with the error
 .BR EPERM .
 .PP
 In order to create a new user namespace,
 there must exist a mapping of the caller's effective
 user and group IDs into the parent namespace.
 If such a mapping does not exist, then
 .BR clone (2)
 and
 .BR unshare (2)
 fail with the error
 .BR EPERM .
 .PP
 When a process inside a user namespace executes
 a set-user-ID (set-group-ID) program,
 the process's effective user (group) ID inside the namespace is changed
 to whatever value is mapped for the user (group) ID of the file.
 However, if either the user
 .I or
 the group ID of the file has no mapping inside the namespace,
 the set-user-ID (set-group-ID) bit is silently ignored:
 the new program is executed,
 but the process's effective user (group) ID is left unchanged.
 (This mirrors the semantics of executing a set-user-ID or set-group-ID
 program that resides on a file system that was mounted with the
 .BR MS_NOSUID
 flag (see 
 .BR mount (2)).)
 .\"
 .\" ==================== UTS namespaces ====================
 .\"
@@ -827,4 +519,5 @@ Namespaces are a Linux-specific feature.
 .BR proc (5),
 .BR credentials (7),
 .BR capabilities (7),
+.BR user_namespaces (7),
 .BR switch_root (8)
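
Note on the moved text: the uid_map/gid_map rules in the removed section are easier to follow with a concrete model. The following is a minimal Python sketch, not kernel code and not part of this commit. It covers only the format rules quoted above (three numeric fields, a length greater than 0, at least one line, at most five lines, no overlapping ranges) and the translation of an ID through the resulting map. The names parse_id_map, to_outside, and to_inside are hypothetical. Checking for overlap on both sides of the mapping is an assumption, and the stricter ascending-order requirement of the Linux 3.8 implementation is not reproduced, nor are the page-size and permission (EPERM) checks.

```python
import re
from typing import List, Optional, Tuple

# (first ID inside the namespace, first ID outside it, length of the range)
IdRange = Tuple[int, int, int]

MAX_LINES = 5        # line limit the removed text quotes for Linux 3.8
OVERFLOW_ID = 65534  # default overflow ID quoted in the removed text


def parse_id_map(text: str) -> List[IdRange]:
    """Parse uid_map-style text; raise ValueError if a format rule is broken."""
    lines = text.splitlines()
    if not lines:
        raise ValueError("at least one line is required")
    if len(lines) > MAX_LINES:
        raise ValueError("too many lines")
    ranges = []
    for line in lines:
        fields = line.split()
        if len(fields) != 3 or not all(re.fullmatch(r"[0-9]+", f) for f in fields):
            raise ValueError(f"expected three numbers: {line!r}")
        inside, outside, length = map(int, fields)
        if length == 0:
            raise ValueError("range length must be greater than 0")
        ranges.append((inside, outside, length))
    # Assumption: ranges may not overlap on either side of the mapping.
    for side in (0, 1):
        spans = sorted((r[side], r[side] + r[2]) for r in ranges)
        for (_, prev_end), (start, _) in zip(spans, spans[1:]):
            if start < prev_end:
                raise ValueError("overlapping ranges")
    return ranges


def to_outside(ranges: List[IdRange], inside_id: int) -> Optional[int]:
    """Translate an ID inside the namespace to the outside ID, or None."""
    for inside, outside, length in ranges:
        if inside <= inside_id < inside + length:
            return outside + (inside_id - inside)
    return None


def to_inside(ranges: List[IdRange], outside_id: int) -> int:
    """ID as seen inside the namespace, or the overflow ID if unmapped."""
    for inside, outside, length in ranges:
        if outside <= outside_id < outside + length:
            return inside + (outside_id - outside)
    return OVERFLOW_ID
```

With the map '0 1000 10' used in the FIXME comment above, parse_id_map("0 1000 10\n") yields a single range, to_outside maps ID 0 inside the namespace to 1000 outside it, and to_inside reports the overflow ID for any outside ID that falls outside 1000 to 1009.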