.\" Mirror of https://github.com/mkerrisk/man-pages
.\" Copyright (C) Michael Kerrisk, 2004
.\" using some material drawn from earlier man pages
.\" written by Thomas Kuhn, Copyright 1996
.\"
.\" %%%LICENSE_START(GPLv2+_DOC_FULL)
.\" This is free documentation; you can redistribute it and/or
.\" modify it under the terms of the GNU General Public License as
.\" published by the Free Software Foundation; either version 2 of
.\" the License, or (at your option) any later version.
.\"
.\" The GNU General Public License's references to "object code"
.\" and "executables" are to be interpreted as the output of any
.\" document formatting or typesetting system, including
.\" intermediate and printed output.
.\"
.\" This manual is distributed in the hope that it will be useful,
.\" but WITHOUT ANY WARRANTY; without even the implied warranty of
.\" MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
.\" GNU General Public License for more details.
.\"
.\" You should have received a copy of the GNU General Public
.\" License along with this manual; if not, see
.\" <http://www.gnu.org/licenses/>.
.\" %%%LICENSE_END
.\"
.TH MLOCK 2 2020-04-11 "Linux" "Linux Programmer's Manual"
.SH NAME
mlock, mlock2, munlock, mlockall, munlockall \- lock and unlock memory
.SH SYNOPSIS
.nf
.B #include <sys/mman.h>
.PP
.BI "int mlock(const void *" addr ", size_t " len );
.BI "int mlock2(const void *" addr ", size_t " len ", int " flags );
.BI "int munlock(const void *" addr ", size_t " len );
.PP
.BI "int mlockall(int " flags );
.B int munlockall(void);
.fi
.SH DESCRIPTION
.BR mlock (),
.BR mlock2 (),
and
.BR mlockall ()
lock part or all of the calling process's virtual address
space into RAM, preventing that memory from being paged to the
swap area.
.PP
.BR munlock ()
and
.BR munlockall ()
perform the converse operation,
unlocking part or all of the calling process's virtual
address space, so that pages in the specified virtual address range may
once more be swapped out if required by the kernel memory manager.
.PP
Memory locking and unlocking are performed in units of whole pages.
.SS mlock(), mlock2(), and munlock()
.BR mlock ()
locks pages in the address range starting at
.I addr
and continuing for
.I len
bytes.
All pages that contain a part of the specified address range are
guaranteed to be resident in RAM when the call returns successfully;
the pages are guaranteed to stay in RAM until later unlocked.
.PP
.BR mlock2 ()
.\" commit a8ca5d0ecbdde5cc3d7accacbd69968b0c98764e
.\" commit de60f5f10c58d4f34b68622442c0e04180367f3f
.\" commit b0f205c2a3082dd9081f9a94e50658c5fa906ff1
also locks pages in the specified range starting at
.I addr
and continuing for
.I len
bytes.
However, the state of the pages contained in that range after the call
returns successfully will depend on the value in the
.I flags
argument.
.PP
The
.I flags
argument can be either 0 or the following constant:
.TP
.B MLOCK_ONFAULT
Lock pages that are currently resident and mark the entire range so
that the remaining nonresident pages are locked when they are populated
by a page fault.
.PP
If
.I flags
is 0,
.BR mlock2 ()
behaves exactly the same as
.BR mlock ().
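.PP
As a rough illustration of the
.B MLOCK_ONFAULT
behavior described above, the following sketch marks a range so that
pages are locked only as they are faulted in.
The helper name
.I lock_on_fault
is hypothetical.
The sketch invokes the system call through
.BR syscall (2)
rather than relying on a C library wrapper, and the fallback
definition of
.B MLOCK_ONFAULT
is an assumption for older headers.
.EX
```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef MLOCK_ONFAULT
#define MLOCK_ONFAULT 0x01      /* assumed fallback for older headers */
#endif

/* Hypothetical helper: mark the range starting at addr so that its
   pages are locked as they are faulted in, without populating the
   whole range up front.  Returns 0, or -1 with errno set. */
int lock_on_fault(const void *addr, size_t len)
{
#ifdef SYS_mlock2
    return (int) syscall(SYS_mlock2, addr, len, MLOCK_ONFAULT);
#else
    errno = ENOSYS;             /* headers predate mlock2() */
    return -1;
#endif
}
```
.EE
.PP
A caller would typically apply this to a large, sparsely used mapping
and then touch only the pages it needs; the call is still subject to the
limits described under "Limits and permissions".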
.PP
.BR munlock ()
unlocks pages in the address range starting at
.I addr
and continuing for
.I len
bytes.
After this call, all pages that contain a part of the specified
memory range can be moved to external swap space again by the kernel.
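.PP
The following minimal sketch shows one way to pair
.BR mlock ()
with
.BR munlock ().
It assumes that the buffer comes from an anonymous
.BR mmap (2)
mapping, so that the locked range starts on a page boundary and shares
its pages with no other data.
The helper names
.I alloc_locked
and
.I free_locked
are hypothetical.
.EX
```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical helper: map len bytes of anonymous memory and lock it
   into RAM.  Returns the buffer, or NULL with errno set. */
void *alloc_locked(size_t len)
{
    void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return NULL;

    if (mlock(buf, len) == -1) {
        int saved = errno;      /* keep the mlock() error for the caller */
        munmap(buf, len);
        errno = saved;
        return NULL;
    }
    return buf;
}

/* Hypothetical helper: overwrite the contents, then unlock and unmap. */
void free_locked(void *buf, size_t len)
{
    if (buf == NULL)
        return;
    memset(buf, 0, len);
    munlock(buf, len);          /* optional: munmap() also drops the lock */
    munmap(buf, len);
}
```
.EE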
.SS mlockall() and munlockall()
.BR mlockall ()
locks all pages mapped into the address space of the
calling process.
This includes the pages of the code, data, and stack
segments, as well as shared libraries, user space kernel data, shared
memory, and memory-mapped files.
All mapped pages are guaranteed
to be resident in RAM when the call returns successfully;
the pages are guaranteed to stay in RAM until later unlocked.
.PP
The
.I flags
argument is constructed as the bitwise OR of one or more of the
following constants:
.TP
.B MCL_CURRENT
Lock all pages which are currently mapped into the address space of
the process.
.TP
.B MCL_FUTURE
Lock all pages which will become mapped into the address space of the
process in the future.
These could be, for instance, new pages required
by a growing heap and stack as well as new memory-mapped files or
shared memory regions.
.TP
.BR MCL_ONFAULT " (since Linux 4.4)"
Used together with
.BR MCL_CURRENT ,
.BR MCL_FUTURE ,
or both.
Mark all current (with
.BR MCL_CURRENT )
or future (with
.BR MCL_FUTURE )
mappings to lock pages when they are faulted in.
When used with
.BR MCL_CURRENT ,
all present pages are locked, but
.BR mlockall ()
will not fault in non-present pages.
When used with
.BR MCL_FUTURE ,
all future mappings will be marked to lock pages when they are faulted
in, but they will not be populated by the lock when the mapping is
created.
.B MCL_ONFAULT
must be used with either
.B MCL_CURRENT
or
.B MCL_FUTURE
or both.
.PP
If
.B MCL_FUTURE
has been specified, then a later system call (e.g.,
.BR mmap (2),
.BR sbrk (2),
.BR malloc (3)),
may fail if it would cause the number of locked bytes to exceed
the permitted maximum (see below).
In the same circumstances, stack growth may likewise fail:
the kernel will deny stack expansion and deliver a
.B SIGSEGV
signal to the process.
.PP
.BR munlockall ()
unlocks all pages mapped into the address space of the
calling process.
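.PP
The flag combinations described above might be used as in the following
sketch.
The helper names
.I lock_all_memory
and
.I lock_all_on_fault
are hypothetical, and the second helper assumes that the system headers
define
.BR MCL_ONFAULT .
.EX
```c
#include <errno.h>
#include <sys/mman.h>

/* Hypothetical helper: lock everything that is mapped now and
   everything that is mapped later.  Returns 0, or -1 with errno set. */
int lock_all_memory(void)
{
    return mlockall(MCL_CURRENT | MCL_FUTURE);
}

/* Hypothetical helper: as above, but lock pages only as they are
   faulted in, rather than populating every mapping immediately. */
int lock_all_on_fault(void)
{
#ifdef MCL_ONFAULT
    return mlockall(MCL_CURRENT | MCL_FUTURE | MCL_ONFAULT);
#else
    errno = EINVAL;             /* headers do not provide MCL_ONFAULT */
    return -1;
#endif
}
```
.EE
.PP
Either helper can be undone with a single call to
.BR munlockall ().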
.SH RETURN VALUE
On success, these system calls return 0.
On error, \-1 is returned,
.I errno
is set to indicate the error,
and no changes are made to any locks in the
address space of the process.
.SH ERRORS
.TP
.B ENOMEM
(Linux 2.6.9 and later) the caller had a nonzero
.B RLIMIT_MEMLOCK
soft resource limit, but tried to lock more memory than the limit
permitted.
This limit is not enforced if the process is privileged
.RB ( CAP_IPC_LOCK ).
.TP
.B ENOMEM
(Linux 2.4 and earlier) the calling process tried to lock more than
half of RAM.
.\" In the case of mlock(), this check is somewhat buggy: it doesn't
.\" take into account whether the to-be-locked range overlaps with
.\" already locked pages. Thus, suppose we allocate
.\" (num_physpages / 4 + 1) of memory, and lock those pages once using
.\" mlock(), and then lock the *same* page range a second time.
.\" In the case, the second mlock() call will fail, since the check
.\" calculates that the process is trying to lock (num_physpages / 2 + 2)
.\" pages, which of course is not true. (MTK, Nov 04, kernel 2.4.28)
.TP
.B EPERM
The caller is not privileged, but needs privilege
.RB ( CAP_IPC_LOCK )
to perform the requested operation.
.\"SVr4 documents an additional EAGAIN error code.
.PP
For
.BR mlock (),
.BR mlock2 (),
and
.BR munlock ():
.TP
.B EAGAIN
Some or all of the specified address range could not be locked.
.TP
.B EINVAL
The result of the addition
.IR addr + len
was less than
.IR addr
(e.g., the addition may have resulted in an overflow).
.TP
.B EINVAL
(Not on Linux)
.I addr
was not a multiple of the page size.
.TP
.B ENOMEM
Some of the specified address range does not correspond to mapped
pages in the address space of the process.
.TP
.B ENOMEM
Locking or unlocking a region would result in the total number of
mappings with distinct attributes (e.g., locked versus unlocked)
exceeding the allowed maximum.
.\" I.e., the number of VMAs would exceed the 64kB maximum
(For example, unlocking a range in the middle of a currently locked
mapping would result in three mappings:
one locked mapping at each end and an unlocked mapping in the middle.)
.PP
For
.BR mlock2 ():
.TP
.B EINVAL
Unknown \fIflags\fP were specified.
.PP
For
.BR mlockall ():
.TP
.B EINVAL
Unknown \fIflags\fP were specified or
.B MCL_ONFAULT
was specified without either
.B MCL_FUTURE
or
.BR MCL_CURRENT .
.PP
For
.BR munlockall ():
.TP
.B EPERM
(Linux 2.6.8 and earlier) The caller was not privileged
.RB ( CAP_IPC_LOCK ).
.SH VERSIONS
.BR mlock2 ()
is available since Linux 4.4;
glibc support was added in version 2.27.
.SH CONFORMING TO
POSIX.1-2001, POSIX.1-2008, SVr4.
.PP
.BR mlock2 ()
is Linux specific.
.PP
On POSIX systems on which
.BR mlock ()
and
.BR munlock ()
are available,
.B _POSIX_MEMLOCK_RANGE
is defined in \fI<unistd.h>\fP and the number of bytes in a page
can be determined from the constant
.B PAGESIZE
(if defined) in \fI<limits.h>\fP or by calling
.IR sysconf(_SC_PAGESIZE) .
.PP
On POSIX systems on which
.BR mlockall ()
and
.BR munlockall ()
are available,
.B _POSIX_MEMLOCK
is defined in \fI<unistd.h>\fP to a value greater than 0.
(See also
.BR sysconf (3).)
.\" POSIX.1-2001: It shall be defined to -1 or 0 or 200112L.
.\" -1: unavailable, 0: ask using sysconf().
.\" glibc defines it to 1.
.SH NOTES
Memory locking has two main applications: real-time algorithms and
high-security data processing.
Real-time applications require
deterministic timing, and, like scheduling, paging is one major cause
of unexpected program execution delays.
Real-time applications will
usually also switch to a real-time scheduler with
.BR sched_setscheduler (2).
Cryptographic security software often handles sensitive data, such as
passwords or secret keys, in memory.
As a result of paging,
these secrets could be written to persistent swap storage,
where they might be accessible to an attacker long after the security
software has erased the secrets in RAM and terminated.
(But be aware that the suspend mode on laptops and some desktop
computers will save a copy of the system's RAM to disk, regardless
of memory locks.)
.PP
Real-time processes that are using
.BR mlockall ()
to prevent delays on page faults should reserve enough
locked stack pages before entering the time-critical section,
so that no page fault can be caused by function calls.
This can be achieved by calling a function that allocates a
sufficiently large automatic variable (an array) and writes to the
memory occupied by this array in order to touch these stack pages.
This way, enough pages will be mapped for the stack and can be
locked into RAM.
The dummy writes ensure that not even copy-on-write
page faults can occur in the critical section.
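.PP
The stack-reservation technique described above can be sketched as
follows.
The helper names
.I prefault_stack
and
.I enter_realtime
are hypothetical, and the 64 kB figure is only an assumed upper bound on
the stack use of the time-critical section; a real application would
choose its own value.
.EX
```c
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

#define PREFAULT_STACK_BYTES (64 * 1024)    /* assumed worst-case stack use */

/* Hypothetical helper: write to a large automatic array so that the
   stack pages it occupies are mapped.  The array is volatile so that
   the compiler does not discard the dummy writes. */
int prefault_stack(void)
{
    volatile unsigned char dummy[PREFAULT_STACK_BYTES];
    size_t i;

    for (i = 0; i < sizeof(dummy); i++)
        dummy[i] = 0;
    return dummy[sizeof(dummy) - 1];
}

/* Hypothetical helper: lock current and future pages, then touch the
   stack before the time-critical section starts.
   Returns 0, or -1 with errno set by mlockall(). */
int enter_realtime(void)
{
    if (mlockall(MCL_CURRENT | MCL_FUTURE) == -1)
        return -1;
    prefault_stack();
    return 0;
}
```
.EE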
.PP
Memory locks are not inherited by a child created via
.BR fork (2)
and are automatically removed (unlocked) during an
.BR execve (2)
or when the process terminates.
The
.BR mlockall ()
.B MCL_FUTURE
and
.B MCL_FUTURE | MCL_ONFAULT
settings are not inherited by a child created via
.BR fork (2)
and are cleared during an
.BR execve (2).
.PP
Note that
.BR fork (2)
will prepare the address space for a copy-on-write operation.
The consequence is that any write access that follows will cause
a page fault that in turn may cause high latencies for a real-time process.
Therefore, it is crucial not to invoke
.BR fork (2)
after an
.BR mlockall ()
or
.BR mlock ()
operation\(emnot even from a thread which runs at a low priority within
a process which also has a thread running at elevated priority.
.PP
The memory lock on an address range is automatically removed
if the address range is unmapped via
.BR munmap (2).
.PP
Memory locks do not stack; that is, pages which have been locked several times
by calls to
.BR mlock (),
.BR mlock2 (),
or
.BR mlockall ()
will be unlocked by a single call to
.BR munlock ()
for the corresponding range or by
.BR munlockall ().
Pages which are mapped to several locations or by several processes stay
locked into RAM as long as they are locked at least at one location or by
at least one process.
.PP
If a call to
.BR mlockall ()
which uses the
.B MCL_FUTURE
flag is followed by another call that does not specify this flag, the
changes made by the
.B MCL_FUTURE
call will be lost.
.PP
The
.BR mlock2 ()
.B MLOCK_ONFAULT
flag and the
.BR mlockall ()
.B MCL_ONFAULT
flag allow efficient memory locking for applications that deal with
large mappings where only a (small) portion of pages in the mapping are touched.
In such cases, populating and locking every page in the mapping up front
would incur a significant penalty.
.SS Linux notes
Under Linux,
.BR mlock (),
.BR mlock2 (),
and
.BR munlock ()
automatically round
.I addr
down to the nearest page boundary.
However, the POSIX.1 specification of
.BR mlock ()
and
.BR munlock ()
allows an implementation to require that
.I addr
is page aligned, so portable applications should ensure this.
.PP
The
.I VmLck
field of the Linux-specific
.I /proc/[pid]/status
file shows how many kilobytes of memory the process with ID
.I PID
has locked using
.BR mlock (),
.BR mlock2 (),
.BR mlockall (),
and
.BR mmap (2)
.BR MAP_LOCKED .
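.PP
The
.I VmLck
field can also be read programmatically.
The following sketch reads it for the calling process via
.IR /proc/self/status ;
the helper name
.I read_vmlck_kb
is hypothetical, and the parsing assumes the usual "VmLck: N kB" layout
of that line.
.EX
```c
#include <stdio.h>

/* Hypothetical helper: return the VmLck value (in kB) of the calling
   process, or -1 if the file or the field cannot be read. */
long read_vmlck_kb(void)
{
    FILE *fp = fopen("/proc/self/status", "r");
    char line[256];
    long kb = -1;

    if (fp == NULL)
        return -1;
    while (fgets(line, sizeof(line), fp) != NULL) {
        /* Whitespace in the format matches the tab and padding. */
        if (sscanf(line, "VmLck: %ld kB", &kb) == 1)
            break;
    }
    fclose(fp);
    return kb;
}
```
.EE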
.SS Limits and permissions
In Linux 2.6.8 and earlier,
a process must be privileged
.RB ( CAP_IPC_LOCK )
in order to lock memory and the
.B RLIMIT_MEMLOCK
soft resource limit defines a limit on how much memory the process may lock.
.PP
Since Linux 2.6.9, no limits are placed on the amount of memory
that a privileged process can lock and the
.B RLIMIT_MEMLOCK
soft resource limit instead defines a limit on how much memory an
unprivileged process may lock.
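.PP
An application can check the soft limit before attempting to lock
memory.
The following sketch uses
.BR getrlimit (2);
the helper name
.I memlock_soft_limit
is hypothetical.
.EX
```c
#include <sys/resource.h>

/* Hypothetical helper: query the RLIMIT_MEMLOCK soft limit.
   Returns 1 if the limit is unlimited, 0 if a finite limit was stored
   in *bytes, or -1 on error (errno is set by getrlimit()). */
int memlock_soft_limit(rlim_t *bytes)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_MEMLOCK, &rl) == -1)
        return -1;
    if (rl.rlim_cur == RLIM_INFINITY)
        return 1;
    *bytes = rl.rlim_cur;
    return 0;
}
```
.EE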
.SH BUGS
In Linux 4.8 and earlier,
a bug in the kernel's accounting of locked memory for unprivileged processes
(i.e., without
.BR CAP_IPC_LOCK )
meant that if the region specified by
.I addr
and
.I len
overlapped an existing lock,
then the already locked bytes in the overlapping region were counted twice
when checking against the limit.
Such double accounting could incorrectly calculate a "total locked memory"
value for the process that exceeded the
.BR RLIMIT_MEMLOCK
limit, with the result that
.BR mlock ()
and
.BR mlock2 ()
would fail on requests that should have succeeded.
This bug was fixed
.\" commit 0cf2f6f6dc605e587d2c1120f295934c77e810e8
in Linux 4.9.
.PP
In the 2.4 series Linux kernels up to and including 2.4.17,
a bug caused the
.BR mlockall ()
.B MCL_FUTURE
flag to be inherited across a
.BR fork (2).
This was rectified in kernel 2.4.18.
.PP
Since kernel 2.6.9, if a privileged process calls
.I mlockall(MCL_FUTURE)
and later drops privileges (loses the
.B CAP_IPC_LOCK
capability by, for example,
setting its effective UID to a nonzero value),
then subsequent memory allocations (e.g.,
.BR mmap (2),
.BR brk (2))
will fail if the
.B RLIMIT_MEMLOCK
resource limit is encountered.
.\" See the following LKML thread:
.\" http://marc.theaimsgroup.com/?l=linux-kernel&m=113801392825023&w=2
.\" "Rationale for RLIMIT_MEMLOCK"
.\" 23 Jan 2006
.SH SEE ALSO
.BR mincore (2),
.BR mmap (2),
.BR setrlimit (2),
.BR shmctl (2),
.BR sysconf (3),
.BR proc (5),
.BR capabilities (7)