proc.5: Document inaccurate RSS due to SPLIT_RSS_COUNTING

[mtk: Manually applied patch, because of conflicts with other
merged changes; also added an edit suggested by Jann; see the
thread at
https://lore.kernel.org/linux-man/20201012114940.1317510-1-jannh@google.com/]

Since commit 34e55232e59f7b19050267a05ff1226e5cd122a5 (introduced
back in v2.6.34), Linux has used per-thread RSS counters to reduce
cache contention on the per-mm counters. With a 4K page size, the
counters can therefore be off by up to 252KiB (63 pages) per
thread.

Example:

$ cat rsstest.c
#include <stdlib.h>
#include <err.h>
#include <stdio.h>
#include <signal.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/eventfd.h>
#include <sys/prctl.h>
void dump(int pid) {
  char cmd[1000];
  sprintf(cmd,
    "grep '^VmRSS' /proc/%d/status;"
    "grep '^Rss:' /proc/%d/smaps_rollup;"
    "echo",
    pid, pid
  );
  system(cmd);
}
int main(void) {
  eventfd_t dummy;
  int child_wait = eventfd(0, EFD_SEMAPHORE|EFD_CLOEXEC);
  int child_resume = eventfd(0, EFD_SEMAPHORE|EFD_CLOEXEC);
  if (child_wait == -1 || child_resume == -1) err(1, "eventfd");
  pid_t child = fork();
  if (child == -1) err(1, "fork");
  if (child == 0) {
    if (prctl(PR_SET_PDEATHSIG, SIGKILL)) err(1, "PDEATHSIG");
    if (getppid() == 1) exit(0);
    char *mapping = mmap(NULL, 80 * 0x1000, PROT_READ|PROT_WRITE,
                         MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
    if (mapping == MAP_FAILED) err(1, "mmap");
    eventfd_write(child_wait, 1);
    eventfd_read(child_resume, &dummy);
    for (int i=0; i<40; i++) mapping[0x1000 * i] = 1;
    eventfd_write(child_wait, 1);
    eventfd_read(child_resume, &dummy);
    for (int i=40; i<80; i++) mapping[0x1000 * i] = 1;
    eventfd_write(child_wait, 1);
    eventfd_read(child_resume, &dummy);
    exit(0);
  }

  eventfd_read(child_wait, &dummy);
  dump(child);
  eventfd_write(child_resume, 1);

  eventfd_read(child_wait, &dummy);
  dump(child);
  eventfd_write(child_resume, 1);

  eventfd_read(child_wait, &dummy);
  dump(child);
  eventfd_write(child_resume, 1);

  exit(0);
}
$ gcc -o rsstest rsstest.c && ./rsstest
VmRSS:	      68 kB
Rss:                 616 kB

VmRSS:	      68 kB
Rss:                 776 kB

VmRSS:	     812 kB
Rss:                 936 kB

$

Let's document that those counters aren't entirely accurate.

Reported-by: Mark Mossberg <mark.mossberg@gmail.com>
Signed-off-by: Jann Horn <jannh@google.com>

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Jann Horn 2020-10-27 14:35:35 +01:00 committed by Michael Kerrisk
parent 14948ad6ec
commit 20e43cd694
1 changed file with 34 additions and 2 deletions

@@ -2265,6 +2265,9 @@ This is just the pages which
 count toward text, data, or stack space.
 This does not include pages
 which have not been demand-loaded in, or which are swapped out.
+This value is inaccurate; see
+.I /proc/[pid]/statm
+below.
 .TP
 (25) \fIrsslim\fP \ %lu
 Current soft limit in bytes on the rss of the process;
@@ -2409,10 +2412,11 @@ The columns are:
 size       (1) total program size
            (same as VmSize in \fI/proc/[pid]/status\fP)
 resident   (2) resident set size
-           (same as VmRSS in \fI/proc/[pid]/status\fP)
+           (inaccurate; same as VmRSS in \fI/proc/[pid]/status\fP)
 shared     (3) number of resident shared pages
            (i.e., backed by a file)
-           (same as RssFile+RssShmem in \fI/proc/[pid]/status\fP)
+           (inaccurate; same as RssFile+RssShmem in
+           \fI/proc/[pid]/status\fP)
 text       (4) text (code)
 .\" (not including libs; broken, includes data segment)
 lib        (5) library (unused since Linux 2.6; always 0)
@@ -2421,6 +2425,16 @@ data       (6) data + stack
 dt         (7) dirty pages (unused since Linux 2.6; always 0)
 .EE
 .in
+.IP
+.\" See SPLIT_RSS_COUNTING in the kernel.
+.\" Inaccuracy is bounded by TASK_RSS_EVENTS_THRESH.
+Some of these values are inaccurate because
+of a kernel-internal scalability optimization.
+If accurate values are required, use
+.I /proc/[pid]/smaps
+or
+.I /proc/[pid]/smaps_rollup
+instead, which are much slower but provide accurate, detailed information.
 .TP
 .I /proc/[pid]/status
 Provides much of the information in
@@ -2597,6 +2611,9 @@ directly access physical memory.
 .TP
 .IR VmHWM
 Peak resident set size ("high water mark").
+This value is inaccurate; see
+.I /proc/[pid]/statm
+above.
 .TP
 .IR VmRSS
 Resident set size.
@@ -2605,16 +2622,25 @@ Note that the value here is the sum of
 .IR RssFile ,
 and
 .IR RssShmem .
+This value is inaccurate; see
+.I /proc/[pid]/statm
+above.
 .TP
 .IR RssAnon
 Size of resident anonymous memory.
 .\" commit bf9683d6990589390b5178dafe8fd06808869293
 (since Linux 4.5).
+This value is inaccurate; see
+.I /proc/[pid]/statm
+above.
 .TP
 .IR RssFile
 Size of resident file mappings.
 .\" commit bf9683d6990589390b5178dafe8fd06808869293
 (since Linux 4.5).
+This value is inaccurate; see
+.I /proc/[pid]/statm
+above.
 .TP
 .IR RssShmem
 Size of resident shared memory (includes System V shared memory,
@@ -2626,6 +2652,9 @@ and shared anonymous mappings).
 .TP
 .IR VmData ", " VmStk ", " VmExe
 Size of data, stack, and text segments.
+This value is inaccurate; see
+.I /proc/[pid]/statm
+above.
 .TP
 .IR VmLib
 Shared library code size.
@@ -2641,6 +2670,9 @@ Size of second-level page tables (added in Linux 4.0; removed in Linux 4.15).
 .\" commit b084d4353ff99d824d3bc5a5c2c22c70b1fba722
 Swapped-out virtual memory size by anonymous private pages;
 shmem swap usage is not included (since Linux 2.6.34).
+This value is inaccurate; see
+.I /proc/[pid]/statm
+above.
 .TP
 .IR HugetlbPages
 Size of hugetlb memory portions