Reid noted a confusion between 'old_root' (my attempt at a
shorthand for the old root mount point) and 'put_old'. Eliminate the
confusion by replacing the shorthand with "old root mount point".
Reported-by: Reid Priedhorsky <reidpr@lanl.gov>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Eric Biederman notes that the change in commit f646ac88ef was
not strictly necessary for this example, since one of the already
documented requirements is that various mount points must not have
shared propagation, or else pivot_root() will fail. So, simplify
the example.
Reported-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Eric Biederman noted that my list of directories that could not
have shared propagation was incorrect. I had written that
new_root could not be shared; rather it should be: the parent of
the current root mount point.
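
One way to check whether a given mount has shared propagation is to
look at the optional fields of its line in /proc/self/mountinfo, where
a shared mount carries a "shared:N" tag. The following is only a
minimal sketch: the helper name mountinfo_line_is_shared() is
hypothetical, and it assumes the line follows the documented mountinfo
layout (six fixed fields, then optional fields, then a lone "-").

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: return true if one line of /proc/self/mountinfo
 * describes a mount with shared propagation. The optional fields (such
 * as "shared:N" or "master:N") start at field 7 and end at a lone "-".
 */
bool mountinfo_line_is_shared(const char *line)
{
    char buf[4096];
    char *save = NULL;
    int field = 0;

    assert(line != NULL);
    strncpy(buf, line, sizeof(buf) - 1);
    buf[sizeof(buf) - 1] = '\0';

    for (char *tok = strtok_r(buf, " \n", &save); tok != NULL;
         tok = strtok_r(NULL, " \n", &save)) {
        field++;
        if (field <= 6)
            continue;   /* IDs, device, root, mount point, options */
        if (strcmp(tok, "-") == 0)
            break;      /* end of the optional fields */
        if (strncmp(tok, "shared:", 7) == 0)
            return true;
    }
    return false;
}
```

A caller would read /proc/self/mountinfo line by line, pick the line
whose mount point field matches the directory of interest, and pass
that line to the helper.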
Reported-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Quoting Eric Biederman:
The concern from our conversation at the container
mini-summit was that there is a pathology if in your initial
mount namespace all of the mounts are marked MS_SHARED like
systemd does (and is almost necessary if you are going to
use mount propagation), that if new_root itself is MS_SHARED
then unmounting the old_root could propagate.
So I believe the desired sequence is:
>>> chdir(new_root);
+++ mount("", ".", MS_SLAVE | MS_REC, NULL);
>>> pivot_root(".", ".");
>>> umount2(".", MNT_DETACH);
The change to new_root could be either MS_SLAVE or
MS_PRIVATE. So long as it is not MS_SHARED the mount won't
propagate back to the parent mount namespace.
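
The quoted sequence could be wrapped up roughly as follows. This is a
sketch under assumptions, not the code from the manual page: the
function name enter_new_root() is hypothetical, the mount() call is
spelled with the five arguments that mount(2) takes, and pivot_root()
is invoked through syscall(2) because glibc provides no wrapper for it.
The caller needs CAP_SYS_ADMIN in its user namespace, and new_root must
already be a mount point.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/mount.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: make new_root the root mount of the caller's
 * mount namespace without a separate put_old directory. Returns 0 on
 * success, or -1 with errno set by the step that failed.
 */
int enter_new_root(const char *new_root)
{
    assert(new_root != NULL);

    if (chdir(new_root) == -1)
        return -1;

    /* Make sure new_root is not MS_SHARED, so that unmounting the old
       root cannot propagate; MS_PRIVATE would serve equally well. */
    if (mount(NULL, ".", NULL, MS_SLAVE | MS_REC, NULL) == -1)
        return -1;

    /* Stack the old root on top of the new root at "/". */
    if (syscall(SYS_pivot_root, ".", ".") == -1)
        return -1;

    /* The old root is now the topmost mount at the current directory;
       detach it lazily. */
    if (umount2(".", MNT_DETACH) == -1)
        return -1;

    return 0;
}
```

A typical caller would first create a new mount namespace (for example
with unshare(CLONE_NEWNS)) and then call enter_new_root() on a
directory that it has bind mounted onto itself.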
Reported-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
LXC uses this [1]. I tested, to double-check, and it works.
The fchdir() dance done by LXC is not needed though:
fchdir(old_root); umount2(".", MNT_DETACH); fchdir(new_root);
As far as I can see, just the umount() is sufficient, since,
after pivot_root(), old_root is at the top of the stack
of mounts at "/" and thus (so long as CWD is at "/")
the umount will remove the mount at the top of the stack.
Eric Biederman confirmed my understanding by mail, and
Philipp Wendler verified my results by experiment.
[1] See the following commit in LXC:
commit 2d489f9e87fa0cccd8a1762680a43eeff2fe1b6e
Author: Serge Hallyn <serge.hallyn@ubuntu.com>
Date: Sat Sep 20 03:15:44 2014 +0000
pivot_root: switch to a new mechanism (v2)
Helped-by: Eric W. Biederman <ebiederm@xmission.com>
Helped-by: Philipp Wendler <ml@philippwendler.de>
Helped-by: Aleksa Sarai <asarai@suse.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
After around 19 years, the behavior of pivot_root() has not been
changed, and will almost certainly not change in the future.
So, reword to remove the suggestion that the behavior may change.
Also, more clearly document the effect of pivot_root() on
the calling process's current working directory.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The reference of "Note that this also applies" was vague. So
combine this paragraph with an earlier one to make the linkage
clearer.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The idea that there might one day be a mechanism for kernel
threads to explicitly relinquish access to the filesystem never
came to pass (after 20 years), and the presence of text
describing this idea is, IMO, a distraction. So, remove it.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
One kernel printk() later, my suspicions seem confirmed: the text
describing the situation where the current root is not a mount
point (because of a chroot()) seems to be bogus. (Perhaps it was
true once upon a time.) In my testing, if the current root is not
a mount point, an EINVAL error results.
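
For callers that want to report such failures, a small errno-to-hint
mapping might look like the sketch below. The function name
pivot_root_error_hint() is hypothetical, and the hints cover only the
cases discussed in this log (EINVAL when the current root or new_root
is not a mount point, EBUSY for the "loop" check); the same errors can
have other causes.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper: map an errno value from a failed pivot_root()
 * call to a short hint. Only the causes described in this log are
 * listed; anything else falls back to strerror().
 */
const char *pivot_root_error_hint(int err)
{
    assert(err > 0);

    switch (err) {
    case EINVAL:
        return "current root or new_root is not a mount point";
    case EBUSY:
        return "new_root or put_old is on the current root mount";
    default:
        return strerror(err);
    }
}
```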
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
In this text:
If the current root is not a mount point (e.g., after an
earlier chroot(2) or pivot_root())...
mention of pivot_root() makes no sense, since (as noted in an
earlier commit message for this page) 'new_root' in a previous
pivot_root() must (since Linux 2.4.5) have been a mount point.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
One of these "bugs" is a philosophical point already covered
elsewhere in the page, while the other is a somewhat obscure joke.
Both pieces are a bit of a distraction, really.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The note that EBUSY is given if a filesystem is already mounted
on 'put_old' was never really true. That restriction was in
Linux 2.3.14, but removed in Linux 2.3.99-pre6 so it never made
it to mainline.
The relevant diff in pivot_root() was:
error = -EBUSY;
- if (d_new_root->d_sb == root->d_sb || d_put_old->d_sb == root->d_sb)
+ if (new_nd.mnt == root_mnt || old_nd.mnt == root_mnt)
goto out2; /* loop */
- if (d_put_old != d_put_old->d_covers)
- goto out2; /* mount point is busy */
error = -EINVAL;
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Some of the text was written long ago, and hinted that things
might change in the future. However, 20 years have passed
and these details have not changed, so rework the text to
hint at that fact.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
As far as I can see from the source code, the statement that
"No other filesystem may be mounted on 'put_old'" is incorrect.
Even looking at the 2.4.0 source code, I can't see such
a restriction. In addition, some testing on a 5.0 kernel
(mounting 'put_old' in the new mount namespace just before
pivot_root()) did not result in an error for this case when
calling pivot_root().
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
pivot_root() only affects the current working directory and root
directory of other processes in the same mount namespace as the
caller.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
It appears that 'new_root' may not have needed to be a mount
point on ancient kernels, but already in Linux 2.4.5, there
was the diff shown below. Verified also by testing.
@@ -1631,8 +1605,9 @@
* - we don't move root/cwd if they are not at the root (reason: if something
* cared enough to change them, it's probably wrong to force them elsewhere)
* - it's okay to pick a root that isn't the root of a file system, e.g.
- * /nfs/my_root where /nfs is the mount point. Better avoid creating
- * unreachable mount points this way, though.
+ * /nfs/my_root where /nfs is the mount point. It must be a mountpoint,
+ * though, so you may need to say mount --bind /nfs/my_root /nfs/my_root
+ * first.
*/
asmlinkage long sys_pivot_root(const char *new_root, const char *put_old)
@@ -1640,7 +1615,7 @@
struct dentry *root;
struct vfsmount *root_mnt;
struct vfsmount *tmp;
- struct nameidata new_nd, old_nd;
+ struct nameidata new_nd, old_nd, parent_nd, root_parent;
char *name;
int error;
@@ -1688,6 +1663,10 @@
if (new_nd.mnt == root_mnt || old_nd.mnt == root_mnt)
goto out2; /* loop */
error = -EINVAL;
+ if (root_mnt->mnt_root != root)
+ goto out2;
+ if (new_nd.mnt->mnt_root != new_nd.dentry)
+ goto out2; /* not a mountpoint */
tmp = old_nd.mnt; /* make sure we can reach put_old from new_root */
spin_lock(&dcache_lock);
if (tmp != new_nd.mnt) {
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Based on text from Documentation/filesystems/ramfs-rootfs-initramfs.txt.
Signed-off-by: Elvira Khabirova <lineprinter@altlinux.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Per the comment on the pivot_root syscall in fs/namespace.c:
Also, the current root cannot be on the 'rootfs'
(initial ramfs) filesystem. See
Documentation/filesystems/ramfs-rootfs-initramfs.txt
for alternatives in this situation.
Signed-off-by: Joseph C. Sible <josephcsible@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Blank lines shouldn't generally appear in *roff source (other
than in code examples), since they create large vertical
spaces between text blocks.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>