.\" Copyright 2003,2004 Andi Kleen, SuSE Labs.
.\" and Copyright 2007 Lee Schermerhorn, Hewlett Packard
.\"
.\" %%%LICENSE_START(VERBATIM_PROF)
.\" Permission is granted to make and distribute verbatim copies of this
.\" manual provided the copyright notice and this permission notice are
.\" preserved on all copies.
.\"
.\" Permission is granted to copy and distribute modified versions of this
.\" manual under the conditions for verbatim copying, provided that the
.\" entire resulting derived work is distributed under the terms of a
.\" permission notice identical to this one.
.\"
.\" Since the Linux kernel and libraries are constantly changing, this
.\" manual page may be incorrect or out-of-date. The author(s) assume no
.\" responsibility for errors or omissions, or for damages resulting from
.\" the use of the information contained herein.
.\"
.\" Formatted or processed versions of this manual, if unaccompanied by
.\" the source, must acknowledge the copyright and authors of this work.
.\" %%%LICENSE_END
.\"
.\" 2006-02-03, mtk, substantial wording changes and other improvements
.\" 2007-08-27, Lee Schermerhorn <Lee.Schermerhorn@hp.com>
.\" more precise specification of behavior.
.\"
.\" FIXME
.\" Linux 3.8 added MPOL_MF_LAZY, which needs to be documented.
.\" Does it also apply for move_pages()?
.\"
.\" commit b24f53a0bea38b266d219ee651b22dba727c44ae
.\" Author: Lee Schermerhorn <lee.schermerhorn@hp.com>
.\" Date: Thu Oct 25 14:16:32 2012 +0200
.\"
.TH MBIND 2 2021-03-22 Linux "Linux Programmer's Manual"
.SH NAME
mbind \- set memory policy for a memory range
.SH SYNOPSIS
.nf
.B "#include <numaif.h>"
.PP
.BI "long mbind(void *" addr ", unsigned long " len ", int " mode ,
.BI " const unsigned long *" nodemask ", unsigned long " maxnode ,
.BI " unsigned int " flags );
.PP
Link with \fI\-lnuma\fP.
.fi
.PP
.IR Note :
There is no glibc wrapper for this system call; see NOTES.
.SH DESCRIPTION
.BR mbind ()
sets the NUMA memory policy,
which consists of a policy mode and zero or more nodes,
for the memory range starting with
.I addr
and continuing for
.I len
bytes.
The memory policy defines from which node memory is allocated.
.PP
If the memory range specified by the
.IR addr " and " len
arguments includes an "anonymous" region of memory (that is,
a region of memory created using the
.BR mmap (2)
system call with the
.B MAP_ANONYMOUS
flag) or
a memory-mapped file, mapped using the
.BR mmap (2)
system call with the
.B MAP_PRIVATE
flag, pages will be allocated according to the specified
policy only when the application writes (stores) to the page.
For anonymous regions, an initial read access will use a shared
page in the kernel containing all zeros.
For a file mapped with
.BR MAP_PRIVATE ,
an initial read access will allocate pages according to the
memory policy of the thread that causes the page to be allocated.
This may not be the thread that called
.BR mbind ().
.PP
The specified policy will be ignored for any
.B MAP_SHARED
mappings in the specified memory range.
Rather, the pages will be allocated according to the memory policy
of the thread that caused the page to be allocated.
Again, this may not be the thread that called
.BR mbind ().
.PP
If the specified memory range includes a shared memory region
created using the
.BR shmget (2)
system call and attached using the
.BR shmat (2)
system call,
pages allocated for the anonymous or shared memory region will
be allocated according to the policy specified, regardless of which
process attached to the shared memory segment causes the allocation.
If, however, the shared memory region was created with the
.B SHM_HUGETLB
flag,
the huge pages will be allocated according to the policy specified
only if the page allocation is caused by the process that calls
.BR mbind ()
for that region.
.PP
By default,
.BR mbind ()
has an effect only for new allocations; if the pages inside
the range have already been touched before setting the policy,
then the policy has no effect.
This default behavior may be overridden by the
.B MPOL_MF_MOVE
and
.B MPOL_MF_MOVE_ALL
flags described below.
.PP
The
.I mode
argument must specify one of
.BR MPOL_DEFAULT ,
.BR MPOL_BIND ,
.BR MPOL_INTERLEAVE ,
.BR MPOL_PREFERRED ,
or
.B MPOL_LOCAL
(which are described in detail below).
All policy modes except
.B MPOL_DEFAULT
and
.B MPOL_LOCAL
require the caller to specify the node or nodes to which the mode applies,
via the
.I nodemask
argument.
.PP
The
.I mode
argument may also include an optional
.IR "mode flag" .
The supported
.I "mode flags"
are:
.TP
.BR MPOL_F_STATIC_NODES " (since Linux 2.6.26)"
A nonempty
.I nodemask
specifies physical node IDs.
Linux does not remap the
.I nodemask
when the thread moves to a different cpuset context,
nor when the set of nodes allowed by the thread's
current cpuset context changes.
.TP
.BR MPOL_F_RELATIVE_NODES " (since Linux 2.6.26)"
A nonempty
.I nodemask
specifies node IDs that are relative to the set of
node IDs allowed by the thread's current cpuset.
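.PP
As a hedged illustration of how a mode flag is combined with a policy mode,
the following C sketch ORs the two values together and rejects the one
combination that the ERRORS section lists as invalid.
The helper name ex_make_mode() is hypothetical,
and the numeric constants are assumed to match those in <numaif.h>;
they are repeated here only so that the fragment compiles
without the libnuma headers.
.PP
.in +4n
.EX
```c
#include <assert.h>
#include <errno.h>

/* Assumed to mirror <numaif.h>; defined only if the header is absent. */
#ifndef MPOL_BIND
#define MPOL_BIND 2
#endif
#ifndef MPOL_F_STATIC_NODES
#define MPOL_F_STATIC_NODES (1 << 15)
#endif
#ifndef MPOL_F_RELATIVE_NODES
#define MPOL_F_RELATIVE_NODES (1 << 14)
#endif

/*
 * Hypothetical helper (not part of any library).
 * Returns the value to pass as the mode argument, or -1 with errno set
 * to EINVAL when both mode flags are requested at once.
 */
static int ex_make_mode(int policy, int mode_flags)
{
    const int both = MPOL_F_STATIC_NODES | MPOL_F_RELATIVE_NODES;

    assert(policy >= 0);

    if ((mode_flags & both) == both) {
        errno = EINVAL;
        return -1;
    }
    return policy | mode_flags;
}
```
.EE
.in
.PP
A caller would pass the result as the
.I mode
argument, for example ex_make_mode(MPOL_BIND, MPOL_F_STATIC_NODES).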
.PP
.I nodemask
points to a bit mask of nodes containing up to
.I maxnode
bits.
The bit mask size is rounded to the next multiple of
.IR "sizeof(unsigned long)" ,
but the kernel will use bits only up to
.IR maxnode .
A NULL value of
.I nodemask
or a
.I maxnode
value of zero specifies the empty set of nodes.
If the value of
.I maxnode
is zero,
the
.I nodemask
argument is ignored.
Where a
.I nodemask
is required, it must contain at least one node that is on-line,
allowed by the thread's current cpuset context
(unless the
.B MPOL_F_STATIC_NODES
mode flag is specified),
and contains memory.
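.PP
One possible way to build the
.I nodemask
bit mask in C is sketched below.
The type struct ex_nodemask, the ex_ helpers, and the capacity
EX_MAX_NODES are hypothetical names chosen for this illustration;
they are not part of the kernel interface or of libnuma.
.PP
.in +4n
.EX
```c
#include <assert.h>
#include <limits.h>
#include <string.h>

#define EX_BITS_PER_ULONG (sizeof(unsigned long) * CHAR_BIT)
#define EX_MAX_NODES 128UL   /* hypothetical capacity, in bits */
#define EX_MASK_WORDS \
    ((EX_MAX_NODES + EX_BITS_PER_ULONG - 1) / EX_BITS_PER_ULONG)

/* An array of unsigned long: node N is bit (N % bits) of word (N / bits). */
struct ex_nodemask {
    unsigned long bits[EX_MASK_WORDS];
};

/* Clear every bit, yielding the empty set of nodes. */
static void ex_nodemask_zero(struct ex_nodemask *m)
{
    memset(m, 0, sizeof(*m));
}

/* Add one node ID to the set. */
static void ex_nodemask_set(struct ex_nodemask *m, unsigned int node)
{
    assert(node < EX_MAX_NODES);
    m->bits[node / EX_BITS_PER_ULONG] |= 1UL << (node % EX_BITS_PER_ULONG);
}

/* Return nonzero if the node ID is in the set. */
static int ex_nodemask_isset(const struct ex_nodemask *m, unsigned int node)
{
    assert(node < EX_MAX_NODES);
    return (m->bits[node / EX_BITS_PER_ULONG] >>
            (node % EX_BITS_PER_ULONG)) & 1UL;
}
```
.EE
.in
.PP
A caller would pass the bits array as
.I nodemask.
Passing the full capacity of the array in bits (here EX_MAX_NODES) as
.I maxnode
is an assumption of this sketch;
it keeps every bit that the kernel may inspect inside the array.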
.PP
The
.I mode
argument must include one of the following values:
.TP
.B MPOL_DEFAULT
This mode requests that any nondefault policy be removed,
restoring default behavior.
When applied to a range of memory via
.BR mbind (),
this means to use the thread memory policy,
which may have been set with
.BR set_mempolicy (2).
If the mode of the thread memory policy is also
.BR MPOL_DEFAULT ,
the system-wide default policy will be used.
The system-wide default policy allocates
pages on the node of the CPU that triggers the allocation.
For
.BR MPOL_DEFAULT ,
the
.I nodemask
and
.I maxnode
arguments must specify the empty set of nodes.
.TP
.B MPOL_BIND
This mode specifies a strict policy that restricts memory allocation to
the nodes specified in
.IR nodemask .
If
.I nodemask
specifies more than one node, page allocations will come from
the node with sufficient free memory that is closest to
the node where the allocation takes place.
Pages will not be allocated from any node not specified in the
.IR nodemask .
(Before Linux 2.6.26,
.\" commit 19770b32609b6bf97a3dece2529089494cbfc549
page allocations came from
the node with the lowest numeric node ID first, until that node
contained no free memory.
Allocations then came from the node with the next highest
node ID specified in
.I nodemask
and so forth, until none of the specified nodes contained free memory.)
.TP
.B MPOL_INTERLEAVE
This mode specifies that page allocations be interleaved across the
set of nodes specified in
.IR nodemask .
This optimizes for bandwidth instead of latency
by spreading out pages and memory accesses to those pages across
multiple nodes.
To be effective, the memory area should be fairly large
(at least 1\ MB) and have a fairly uniform access pattern.
Accesses to a single page of the area will still be limited to
the memory bandwidth of a single node.
.TP
.B MPOL_PREFERRED
This mode sets the preferred node for allocation.
The kernel will try to allocate pages from this
node first and fall back to other nodes if the
preferred node is low on free memory.
If
.I nodemask
specifies more than one node ID, the first node in the
mask will be selected as the preferred node.
If the
.I nodemask
and
.I maxnode
arguments specify the empty set, then the memory is allocated on
the node of the CPU that triggered the allocation.
.TP
.BR MPOL_LOCAL " (since Linux 3.8)"
.\" commit 479e2802d09f1e18a97262c4c6f8f17ae5884bd8
.\" commit f2a07f40dbc603c15f8b06e6ec7f768af67b424f
This mode specifies "local allocation"; the memory is allocated on
the node of the CPU that triggered the allocation (the "local node").
The
.I nodemask
and
.I maxnode
arguments must specify the empty set.
If the "local node" is low on free memory,
the kernel will try to allocate memory from other nodes.
The kernel will allocate memory from the "local node"
whenever memory for this node is available.
If the "local node" is not allowed by the thread's current cpuset context,
the kernel will try to allocate memory from other nodes.
The kernel will allocate memory from the "local node" whenever
it becomes allowed by the thread's current cpuset context.
By contrast,
.B MPOL_DEFAULT
reverts to the memory policy of the thread (which may be set via
.BR set_mempolicy (2));
that policy may be something other than "local allocation".
.PP
If
.B MPOL_MF_STRICT
is passed in
.I flags
and
.I mode
is not
.BR MPOL_DEFAULT ,
then the call fails with the error
.B EIO
if the existing pages in the memory range don't follow the policy.
.\" According to the kernel code, the following is not true
.\" --Lee Schermerhorn
.\" In 2.6.16 or later the kernel will also try to move pages
.\" to the requested node with this flag.
.PP
If
.B MPOL_MF_MOVE
is specified in
.IR flags ,
then the kernel will attempt to move all the existing pages
in the memory range so that they follow the policy.
Pages that are shared with other processes will not be moved.
If
.B MPOL_MF_STRICT
is also specified, then the call fails with the error
.B EIO
if some pages could not be moved.
.PP
If
.B MPOL_MF_MOVE_ALL
is passed in
.IR flags ,
then the kernel will attempt to move all existing pages in the memory range
regardless of whether other processes use the pages.
The calling thread must be privileged
.RB ( CAP_SYS_NICE )
to use this flag.
If
.B MPOL_MF_STRICT
is also specified, then the call fails with the error
.B EIO
if some pages could not be moved.
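.PP
The combination of
.B MPOL_MF_MOVE
and
.B MPOL_MF_STRICT
can be sketched in C as follows.
The helper ex_map_touch_and_move() is hypothetical:
it maps anonymous memory, touches every page so that the pages exist
before any policy is set, and then asks the kernel through
.BR syscall (2)
to bind the range to node 0 and migrate the existing pages.
That node 0 is on-line and contains memory is an assumption,
as is the match between the constants below and <numaif.h>.
.PP
.in +4n
.EX
```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumed to mirror <numaif.h>; defined only if the header is absent. */
#ifndef MPOL_BIND
#define MPOL_BIND 2
#endif
#ifndef MPOL_MF_STRICT
#define MPOL_MF_STRICT (1 << 0)
#endif
#ifndef MPOL_MF_MOVE
#define MPOL_MF_MOVE (1 << 1)
#endif

/*
 * Hypothetical helper. Returns the mapping (or NULL if mmap failed) and
 * stores 0 or the errno value from mbind in *mbind_errno.
 */
static void *ex_map_touch_and_move(size_t npages, int *mbind_errno)
{
    unsigned long nodemask = 1UL;                  /* bit 0: node 0 only */
    unsigned long maxnode = sizeof(nodemask) * 8;  /* bits in the mask */
    long page = sysconf(_SC_PAGESIZE);
    size_t len, i;
    char *p;

    assert(npages > 0 && page > 0 && mbind_errno != NULL);
    len = npages * (size_t) page;

    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    /* Writing one byte per page forces the pages to be allocated now. */
    for (i = 0; i < npages; i++)
        p[i * (size_t) page] = 1;

    /* Set the policy and request migration of the pages touched above. */
    if (syscall(SYS_mbind, p, (unsigned long) len, (long) MPOL_BIND,
                &nodemask, maxnode,
                (unsigned long) (MPOL_MF_MOVE | MPOL_MF_STRICT)) == -1)
        *mbind_errno = errno;
    else
        *mbind_errno = 0;

    return p;
}
```
.EE
.in
.PP
A stored value of
.B EIO
would indicate that at least one page could not be moved;
without
.BR MPOL_MF_STRICT ,
the same call would not report that condition.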
.\" ---------------------------------------------------------------
.SH RETURN VALUE
On success,
.BR mbind ()
returns 0;
on error, \-1 is returned and
.I errno
is set to indicate the error.
.\" ---------------------------------------------------------------
.SH ERRORS
.\" I think I got all of the error returns. --Lee Schermerhorn
.TP
.B EFAULT
Part or all of the memory range specified by
.I nodemask
and
.I maxnode
points outside your accessible address space.
Or, there was an unmapped hole in the memory range specified by
.I addr
and
.IR len .
.TP
.B EINVAL
An invalid value was specified for
.I flags
or
.IR mode ;
or
.I addr + len
was less than
.IR addr ;
or
.I addr
is not a multiple of the system page size.
Or,
.I mode
is
.B MPOL_DEFAULT
and
.I nodemask
specified a nonempty set;
or
.I mode
is
.B MPOL_BIND
or
.B MPOL_INTERLEAVE
and
.I nodemask
is empty.
Or,
.I maxnode
exceeds a kernel-imposed limit.
.\" As at 2.6.23, this limit is "a page worth of bits", e.g.,
.\" 8 * 4096 bits, assuming a 4kB page size.
Or,
.I nodemask
specifies one or more node IDs that are
greater than the maximum supported node ID.
Or, none of the node IDs specified by
.I nodemask
are on-line and allowed by the thread's current cpuset context,
or none of the specified nodes contain memory.
Or, the
.I mode
argument specified both
.B MPOL_F_STATIC_NODES
and
.BR MPOL_F_RELATIVE_NODES .
.TP
.B EIO
.B MPOL_MF_STRICT
was specified and an existing page was already on a node
that does not follow the policy;
or
.B MPOL_MF_MOVE
or
.B MPOL_MF_MOVE_ALL
was specified and the kernel was unable to move all existing
pages in the range.
.TP
.B ENOMEM
Insufficient kernel memory was available.
.TP
.B EPERM
The
.I flags
argument included the
.B MPOL_MF_MOVE_ALL
flag and the caller does not have the
.B CAP_SYS_NICE
privilege.
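.PP
Two of the
.B EINVAL
conditions listed above concern only
.I addr
and
.I len
and can be checked before the call is made.
The function ex_check_range() below is a hypothetical pre-flight check
written for illustration, not a kernel or libnuma interface;
the kernel remains the authority on which arguments it accepts.
.PP
.in +4n
.EX
```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical pre-flight check for the addr and len arguments.
 * Returns 0 if neither condition applies, otherwise the errno value
 * (EINVAL) that the ERRORS section associates with the condition.
 */
static int ex_check_range(uintptr_t addr, unsigned long len,
                          unsigned long page_size)
{
    /* The page size must be a nonzero power of two. */
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

    if (addr % page_size != 0)
        return EINVAL;      /* addr is not a multiple of the page size */

    if (addr + len < addr)
        return EINVAL;      /* addr + len wrapped around */

    return 0;
}
```
.EE
.in
.PP
The page size would normally be obtained with sysconf(_SC_PAGESIZE).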
.\" ---------------------------------------------------------------
.SH VERSIONS
The
.BR mbind ()
system call was added to the Linux kernel in version 2.6.7.
.SH CONFORMING TO
This system call is Linux-specific.
.SH NOTES
Glibc does not provide a wrapper for this system call.
For information on library support, see
.BR numa (7).
.PP
NUMA policy is not supported on a memory-mapped file range
that was mapped with the
.B MAP_SHARED
flag.
.PP
The
.B MPOL_DEFAULT
mode can have different effects for
.BR mbind ()
and
.BR set_mempolicy (2).
When
.B MPOL_DEFAULT
is specified for
.BR set_mempolicy (2),
the thread's memory policy reverts to the system default policy,
that is, local allocation.
When
.B MPOL_DEFAULT
is specified for a range of memory using
.BR mbind (),
any pages subsequently allocated for that range will use
the thread's memory policy, as set by
.BR set_mempolicy (2).
This effectively removes the explicit policy from the
specified range, "falling back" to a possibly nondefault
policy.
To select explicit "local allocation" for a memory range,
specify a
.I mode
of
.B MPOL_LOCAL
or
.B MPOL_PREFERRED
with an empty set of nodes.
This method will work for
.BR set_mempolicy (2),
as well.
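.PP
Because glibc provides no wrapper, a program that does not link against
libnuma can reach the system call through
.BR syscall (2).
The sketch below selects explicit local allocation for a range in the
two ways just described.
The names ex_mbind() and ex_set_local() are hypothetical,
and the constants are assumed to match <numaif.h>.
.PP
.in +4n
.EX
```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumed to mirror <numaif.h>; defined only if the header is absent. */
#ifndef MPOL_PREFERRED
#define MPOL_PREFERRED 1
#endif
#ifndef MPOL_LOCAL
#define MPOL_LOCAL 4
#endif

/* Hypothetical thin wrapper around the raw system call. */
static long ex_mbind(void *addr, unsigned long len, int mode,
                     const unsigned long *nodemask, unsigned long maxnode,
                     unsigned int flags)
{
    return syscall(SYS_mbind, addr, len, (long) mode, nodemask, maxnode,
                   (unsigned long) flags);
}

/*
 * Request local allocation for [addr, addr + len).
 * Tries MPOL_LOCAL first; if the kernel rejects the mode with EINVAL
 * (as a kernel older than Linux 3.8 would), retries with
 * MPOL_PREFERRED and an empty set of nodes.
 * Returns 0 on success, or -1 with errno set.
 */
static int ex_set_local(void *addr, unsigned long len)
{
    assert(addr != NULL);

    if (ex_mbind(addr, len, MPOL_LOCAL, NULL, 0, 0) == 0)
        return 0;
    if (errno != EINVAL)
        return -1;

    return (int) ex_mbind(addr, len, MPOL_PREFERRED, NULL, 0, 0);
}
```
.EE
.in
.PP
The address passed to ex_set_local() must be page-aligned,
as the
.B EINVAL
entry in ERRORS requires.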
.PP
Support for huge page policy was added in Linux 2.6.16.
For interleave policy to be effective on huge page mappings, the
memory covered by the policy needs to be tens of megabytes or larger.
.PP
Before Linux 5.7,
.\" commit dcf1763546d76c372f3136c8d6b2b6e77f140cf0
.B MPOL_MF_STRICT
was ignored on huge page mappings.
.PP
.B MPOL_MF_MOVE
and
.B MPOL_MF_MOVE_ALL
are available only on Linux 2.6.16 and later.
.SH SEE ALSO
.BR get_mempolicy (2),
.BR getcpu (2),
.BR mmap (2),
.BR set_mempolicy (2),
.BR shmat (2),
.BR shmget (2),
.BR numa (3),
.BR cpuset (7),
.BR numa (7),
.BR numactl (8)