This is in effect a revert of
commit 1391278030
Reported-by: Alexander E. Patrakov <patrakov@gmail.com>
Reported-by: Jakub Wilk <jwilk@jwilk.net>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Extended information for timerfd file descriptors in
/proc/[pid]/fdinfo was added in commit af9c4957cf21 ("timerfd:
Implement show_fdinfo method", 2014-07-16), to support
checkpoint/restore for such file descriptors (see also the
TFD_IOC_SET_TICKS ioctl which is documented in timerfd_create.2).
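
For illustration, the extra fields can be read back with a few lines of Python. This is only a sketch: the sample text follows the layout that proc(5) shows for a timerfd, the values are invented, and `parse_timerfd_fdinfo` is a hypothetical helper name, not part of any existing tool.

```python
import re

# Illustrative /proc/[pid]/fdinfo/[fd] content for a timerfd.
# Field names follow proc(5); the values are invented for the example.
SAMPLE_FDINFO = """\
pos:\t0
flags:\t02004002
mnt_id:\t13
clockid: 0
ticks: 0
settime flags: 01
it_value: (7695568592, 640020877)
it_interval: (0, 0)
"""


def parse_timerfd_fdinfo(text):
    """Return a dict of the fields in a timerfd fdinfo file.

    (sec, nsec) pairs become tuples of ints; other values stay strings.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # Skip anything that is not a "key: value" line.
        value = value.strip()
        pair = re.fullmatch(r"\((\d+),\s*(\d+)\)", value)
        fields[key.strip()] = (
            (int(pair.group(1)), int(pair.group(2))) if pair else value
        )
    return fields


info = parse_timerfd_fdinfo(SAMPLE_FDINFO)
print(info["clockid"], info["ticks"], info["it_value"], info["it_interval"])
```

A checkpoint/restore tool would need roughly these values (clock ID, pending ticks, and the two timer settings) to recreate the timer.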
Signed-off-by: Lucas Werkmeister <mail@lucaswerkmeister.de>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Pathname escaping is not done properly in /proc/<pid>/maps;
because of this, different pathnames may appear the same
(verified by experiment and reading the source code).
Further details from Elvira about the relevant location in
the kernel code:
show_map_vma() from fs/proc/task_mmu.c uses seq_file_path()
from fs/seq_file.c to print the dentry name, which in turn
calls seq_path() from the same file. seq_path() uses
d_path() from fs/d_path.c to get the path name; this is
where the " (deleted)" part comes from. This is followed by
mangling the string with mangle_path() (fs/seq_file.c); this
function only replaces those characters that were supplied
in the "esc" argument and does not bother with escaping
anything else ('\\', for example). The value of this
argument comes without modifications from the initial call
of seq_file_path() by show_map_vma(), and that is "\n".
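
The effect can be sketched in Python. The function below only mimics the escaping rule described above (each character listed in "esc" becomes a backslash followed by three octal digits, everything else is copied unchanged). It is a simplified model that works on text rather than bytes, not a copy of the kernel code.

```python
def mangle_path(path, esc="\n"):
    """Mimic the escaping described above: only characters listed in
    `esc` are replaced, each by a backslash and three octal digits."""
    return "".join(
        "\\%03o" % ord(ch) if ch in esc else ch
        for ch in path
    )


# Two different pathnames: one contains a real newline, the other the
# literal four characters backslash, 0, 1, 2.
with_newline = "/tmp/a\nb"
with_literal = "/tmp/a\\012b"

print(mangle_path(with_newline))
print(mangle_path(with_literal))
```

Both calls print the same string, because the backslash itself is not in "esc" and so is never escaped.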
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The left-most pid namespace in a given procfs' `NStgid` does not
change based on the pid namespace of the reading process. Rather,
each procfs has an associated outer-most namespace, which gets
set when the procfs is mounted:
```
static struct dentry *proc_mount(struct file_system_type *fs_type,
	int flags, const char *dev_name, void *data)
{
	struct pid_namespace *ns;
	if (flags & MS_KERNMOUNT) {
		ns = data;
		data = NULL;
	} else {
		ns = task_active_pid_ns(current);
	}
	return mount_ns(fs_type, flags, data, ns, ns->user_ns, proc_fill_super);
}
```
i.e. either the root namespace for kernel mounts or the namespace
of the mounting process. This ns then gets saved in the fs' super
block and is the basis for most operations. It is this ns that the
left-most value of `NStgid` is relative to, not the reading process.
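
A reader of the field might extract it as follows. This is a minimal sketch: the status excerpt is invented, and `parse_nstgid` is a hypothetical helper.

```python
# Illustrative excerpt of /proc/[pid]/status for a process that is two
# PID namespaces below the one the procfs instance was mounted from.
# The numbers are invented.
SAMPLE_STATUS = """\
Name:\tsleep
Tgid:\t17168
NStgid:\t17168\t42\t1
"""


def parse_nstgid(status_text):
    """Return the NStgid values as a list of ints, outermost first."""
    for line in status_text.splitlines():
        if line.startswith("NStgid:"):
            return [int(v) for v in line.split(":", 1)[1].split()]
    return []  # No NStgid line present.


nstgid = parse_nstgid(SAMPLE_STATUS)
# nstgid[0] is relative to the PID namespace of the procfs mount,
# regardless of which process reads the file.
print(nstgid)
```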
Reported-by: Robert O'Callahan <robert@ocallahan.org>
Signed-off-by: Keno Fischer <keno@juliacomputing.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Explain how to determine the top-most mount at a particular
location by inspecting /proc/PID/mountinfo.
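
As a sketch of the procedure, assuming the rule given in proc(5) that the top-most mount at a location is the one that is not the parent of any other mount at that location: the sample records are invented, and `topmost_mounts` is a hypothetical helper.

```python
# Invented mountinfo records: two tmpfs mounts stacked on /mnt.
SAMPLE_MOUNTINFO = """\
25 1 8:2 / / rw,relatime shared:1 - ext4 /dev/sda2 rw
100 25 0:45 / /mnt rw,relatime shared:50 - tmpfs tmpfs rw
101 100 0:46 / /mnt rw,relatime shared:51 - tmpfs tmpfs rw
"""


def parse_mountinfo(text):
    """Yield (mount_id, parent_id, mount_point) for each record."""
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 5:
            yield int(fields[0]), int(fields[1]), fields[4]


def topmost_mounts(text, mount_point):
    """Return IDs of mounts at mount_point that are not the parent of
    another mount at the same location.

    mount_point must be given as it appears in mountinfo (that is,
    with the file's octal escapes for space, tab, and so on).
    """
    here = [(mid, parent) for mid, parent, mp in parse_mountinfo(text)
            if mp == mount_point]
    parents = {parent for _, parent in here}
    return [mid for mid, _ in here if mid not in parents]


print(topmost_mounts(SAMPLE_MOUNTINFO, "/mnt"))
```

The function returns a list because more than one candidate is possible when a mount higher up in the path has itself been covered by a stacked mount. In that case only one of the candidates is actually reachable.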
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Starting in Linux 4.11, if the process dumpable attribute is
not 1 and the process resides in a noninitial user namespace that
has valid mappings for UID 0 and GID 0, then the ownership of
/proc/PID/* is made the same as the root IDs of the namespace.
Determined by inspection of fs/proc/base.c
See also the following kernel commit:
commit 68eb94f16227336a5773b83ecfa8290f1d6b78ce
Author: Eric W. Biederman <ebiederm@xmission.com>
Date: Tue Jan 3 10:23:11 2017 +1300
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The statement that resetting the dumpable attribute of a process
to 1 causes the ownership of files to revert to the process's real
IDs looked suspect. And indeed it is at odds with the code in
fs/proc/base.c::task_dump_owner() (Linux 4.16 sources).
Further verified with a quick test that resetting dumpable to 1
causes the ownership of /proc/PID/* files to revert to the
process's effective IDs. Mea culpa for the original mistake.
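
The resulting rule can be summarized as a small decision function. This is a deliberately simplified model for illustration, not a transcription of task_dump_owner(): the function and parameter names are hypothetical, and the root:root fallback for nondumpable processes is taken from proc(5).

```python
def proc_pid_owner(dumpable, euid, egid, ns_root_uid=None, ns_root_gid=None):
    """Return the (uid, gid) expected to own /proc/PID/* files.

    ns_root_uid and ns_root_gid are the IDs that UID 0 and GID 0 of the
    process's user namespace map to, or None if there is no valid
    mapping (hypothetical parameters, for illustration only).
    """
    if dumpable == 1:
        # Dumpable: ownership follows the effective IDs, not the real IDs.
        return (euid, egid)
    if ns_root_uid is not None and ns_root_gid is not None:
        # Not dumpable, but the namespace maps both root IDs
        # (behavior starting in Linux 4.11).
        return (ns_root_uid, ns_root_gid)
    # Not dumpable and no usable mapping: root:root.
    return (0, 0)


print(proc_pid_owner(1, 1000, 1000))
print(proc_pid_owner(0, 1000, 1000))
print(proc_pid_owner(0, 1000, 1000, 100000, 100000))
```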
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
After comments from Miklos, and further digging in the kernel
source that showed that chroot() can also result in "hidden"
parent-IDs in mountinfo, I've revised the description of
mountinfo.
In fs/proc_namespace.c::show_mountinfo() there is:
    /* mountpoints outside of chroot jail will give SEQ_SKIP on this */
    err = seq_path_root(m, &mnt_path, &p->root, " \t\n\\");
    if (err)
            goto out;
I instrumented the 'if (err)' code path with printk()
to show that there is indeed a record corresponding to the
parent-ID for the process root that is being skipped.
Reported-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
I do not have an exact handle on the details, but I can see
roughly what is going on. Internally, there seems to be one
("hidden") mount ID reserved for each mount namespace, and that ID
is the parent of the root mount point.
Looking through the (4.14) kernel source, mount IDs are allocated
by a kernel function called mnt_alloc_id() (in fs/namespace.c),
which is in turn called by alloc_vfsmnt() which is in turn called
by clone_mnt().
A new mount namespace is created by the kernel function
copy_mnt_ns() (in fs/namespace.c, called by
create_new_namespaces() in kernel/nsproxy.c). The copy_mnt_ns()
function calls copy_tree() (in fs/namespace.c), and copy_tree()
calls clone_mnt() in *two* places. The first of these is the call
that creates the "hidden" mount ID that becomes the parent of the
root mount point. (I verified this by instrumenting the kernel
with a few printk() calls to display the IDs.) The second place
where copy_tree() calls clone_mnt() is in a loop that replicates
each of the mount points (including the root mount point) in the
source mount namespace.
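
From user space, such a "hidden" ID shows up as a parent ID that has no record of its own in mountinfo. A minimal sketch of that check follows. The sample records are invented, and `hidden_parent_ids` is a hypothetical helper.

```python
# Invented mountinfo records; mount ID 1 never appears in column one.
SAMPLE_MOUNTINFO = """\
25 1 8:2 / / rw,relatime shared:1 - ext4 /dev/sda2 rw
26 25 0:5 / /proc rw,relatime shared:2 - proc proc rw
27 25 0:6 / /sys rw,relatime shared:3 - sysfs sysfs rw
"""


def hidden_parent_ids(mountinfo_text):
    """Return parent IDs that have no record of their own in mountinfo."""
    mount_ids, parent_ids = set(), set()
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # Not a mountinfo record.
        mount_ids.add(int(fields[0]))
        parent_ids.add(int(fields[1]))
    return sorted(parent_ids - mount_ids)


print(hidden_parent_ids(SAMPLE_MOUNTINFO))
```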
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
After Linux 2.6.36, the heuristic calculation of oom_score
has changed to only consider used memory and CAP_SYS_ADMIN.
See kernel commit a63d83f427fbce97a6cea0db2e64b0eb8435cd10.
Signed-off-by: Marcus Folkesson <marcus.folkesson@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Document the seccomp /proc interfaces in Linux 4.14:
/proc/sys/kernel/seccomp/actions_avail and
/proc/sys/kernel/seccomp/actions_logged.
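
Both files hold a space-separated list of action names, so comparing them is straightforward. The sketch below uses invented sample contents (the real lists depend on the kernel version and on what has been written to actions_logged), and `not_logged` is a hypothetical helper.

```python
# Illustrative contents of the two files.
ACTIONS_AVAIL = "kill_process kill_thread trap errno trace log allow\n"
ACTIONS_LOGGED = "kill_process kill_thread trap errno trace log\n"


def not_logged(avail_text, logged_text):
    """Return the available actions that are currently not logged,
    in the order in which they appear in actions_avail."""
    logged = set(logged_text.split())
    return [a for a in avail_text.split() if a not in logged]


print(not_logged(ACTIONS_AVAIL, ACTIONS_LOGGED))
```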
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Since the symbolic links for pipes and sockets do not refer to real
files in the file system tree, it can be hard to discover that they
still have mode and ownership information (revealed e.g. by `stat -L`),
so let's point this out in the manpage.
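
The same information is available programmatically. The following sketch assumes a Linux system with procfs mounted at /proc, and `describe_fd_link` is a hypothetical helper name.

```python
import os
import stat


def describe_fd_link(fd):
    """Return (link target, file type, permission bits, uid, gid) for
    the /proc/self/fd entry of fd, following the symbolic link."""
    link = "/proc/self/fd/%d" % fd
    st = os.stat(link)  # Follows the link, like `stat -L`.
    return (os.readlink(link), stat.S_IFMT(st.st_mode),
            stat.S_IMODE(st.st_mode), st.st_uid, st.st_gid)


if os.path.isdir("/proc/self/fd"):  # Only meaningful with procfs mounted.
    r, w = os.pipe()
    target, ftype, perms, uid, gid = describe_fd_link(r)
    print(target, oct(perms), uid, gid)
    os.close(r)
    os.close(w)
```

Using os.lstat() instead would describe the symbolic link itself rather than the pipe behind it.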
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
When referring to the architecture, consistently use "x86-64",
not "x86_64". Hitherto, there was a mixture of usages, with
"x86-64" predominant.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
See Linux commit 56873f43abdcd574b25105867a990f067747b2f4
and Linux commit f074a8f49eb87cde95ac9d040ad5e7ea4f029738
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The location has been changed in Linux commit
v4.10-rc1~40^2~86^2~4.
* man5/proc.5 (.SS Files and directories)
<.TP .I /proc/sys/kernel/sysrq, .TP .IR /proc/sysrq-trigger>:
Amend pointer to Documentation/sysrq.txt with change introduced
in Linux 4.10 (move to Documentation/admin-guide/sysrq.rst).
Signed-off-by: Eugene Syromyatnikov <evgsyr@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The location of the file has been changed in Linux commit
v2.6.28-rc1~734^2^8~3.
* man5/proc.5 (.SS Files and directories) <.TP .I /proc/mtrr>:
Amend pointer to in-kernel MTRR documentation with the
location change that happened in Linux 2.6.28.
Signed-off-by: Eugene Syromyatnikov <evgsyr@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
DocBook documentation has been removed in commit
v4.13-rc1~34^2~21^2~11. Crypto API has been converted to
ReStructured format during the Linux 4.10 development cycle
(see commits v4.10-rc1~40^2~8 and v4.10-rc1~40^2~7).
* man5/proc.5 (.SS Files and directories) <.TP .I /proc/crypto>:
Amend the reference to the kernel's crypto API documentation
with the new location, effective since Linux 4.10.
Signed-off-by: Eugene Syromyatnikov <evgsyr@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The file has been moved twice since its mention on the man page.
* man5/proc.5 (.SS Files and directories)
<.TP .IR /proc/[pid]/attr/keycreate>: Amend security keys
documentation reference with the locations in different
versions of the Linux kernel tree.
Signed-off-by: Eugene Syromyatnikov <evgsyr@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
timer_stats was removed in Linux commit v4.11-rc1~177^2~5
citing security concerns.
* man5/proc.5 (.SS Files and directories)
<.TP .I /proc/timer_stats>: Mention the last Linux version where
the file was available along with the reasons for removal.
Signed-off-by: Eugene Syromyatnikov <evgsyr@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This PPC-specific sysctl option was removed in Linux 2.4.9.2,
according to the historic Linux repository commit log.
* man5/proc.5 (.SS Files and directories)
<.TP .I /proc/sys/kernel/htab-reclaim>: Mention the last Linux
version where the option was available.
Signed-off-by: Eugene Syromyatnikov <evgsyr@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Linux commit v4.10-rc1~40^2~86^2~4 moves initrd documentation from
Documentation/initrd.txt to Documentation/admin-guide/initrd.rst.
* man4/initrd.4 (.SS Changing the normal root filesystem,
.SH SEE ALSO): Amend pointer to in-kernel initrd documentation
with change introduced in Linux 4.10 (move to
Documentation/admin-guide/initrd.rst).
* man5/proc.5 (.SS Files and directories)
<.TP .I /proc/sys/kernel/real-root-dev>: Likewise.
* man7/bootparam.7 (.SS Boot arguments for ramdisk use)
<.TP .B 'noinitrd'>: Likewise.
Signed-off-by: Eugene Syromyatnikov <evgsyr@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
A recent patch fixed the use of dashes in ranges in
various places, but mistakenly used em-dashes, rather than
en-dashes. Fix that.
Reported-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Based on a patch by Bjarni Ingi Gislason.
Reported-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Based on a patch by Bjarni Ingi Gislason.
According to SI, "The numerical value always precedes the unit,
and a space is always used to separate the unit from the number
[...] The only exceptions to this rule are for the unit symbols
for degree, minute, and second for plane angle."
Cowritten-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The documentation moved in Linux
commit 9d85025b0418163fae079c9ba8f8445212de8568
("docs-rst: create an user's manual book").
Signed-off-by: Benjamin Peterson <bp@benjamin.pe>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Mention a few other system calls that create file descriptors
that display an 'anon_inode' symlink in /proc/PID/fd.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
/proc/pid/environ reflects process environment at
*start* of program execution.
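
The file is a sequence of NUL-terminated "name=value" strings, so it can be decoded as in the sketch below. The sample bytes are invented, and `parse_environ` is a hypothetical helper.

```python
def parse_environ(raw):
    """Split the NUL-separated bytes of /proc/[pid]/environ into a dict."""
    env = {}
    for entry in raw.split(b"\0"):
        if not entry:
            continue  # The trailing NUL leaves an empty last entry.
        name, _, value = entry.partition(b"=")
        env[name.decode(errors="replace")] = value.decode(errors="replace")
    return env


# Invented sample in the same layout as the real file.
SAMPLE = b"HOME=/root\0PATH=/usr/bin:/bin\0EMPTY=\0"
print(parse_environ(SAMPLE))
```

Because the file reflects the initial environment, variables that the program later adds or changes (for example with putenv(3)) are not expected to appear in it.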
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Refer to recently added descriptions of
/proc/sys/vm/admin_reserve_kbytes and
/proc/sys/vm/user_reserve_kbytes.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Rework the text, make it clearer that MMUPageSize is a separate
line, add kernel version numbers, and example output.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Add a shell example showing that /proc/[pid]/root is more
than a symlink. Based on an example provided by Mike Frysinger
in an earlier commit message.
Cowritten-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
If the target process is in a different mount namespace, the root
symlink actually shows that view of the filesystem. As an example:
/* Terminal 1 */
$ unshare -Urnm
# mount -t tmpfs tmpfs /etc
# mount --bind /bin /dev
# echo $$
17168
/* Terminal 2 */
# ls /etc # Normal view of /etc files.
# ls /proc/17168/root/etc # Empty view of the tmpfs.
# ls /dev # Normal view of /dev files.
# ls /proc/17168/root/dev # Contents of /bin files.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>