man-pages/man2/mbind.2

267 lines
6.4 KiB
Groff

.\" Copyright 2003,2004 Andi Kleen, SuSE Labs.
.\"
.\" Permission is granted to make and distribute verbatim copies of this
.\" manual provided the copyright notice and this permission notice are
.\" preserved on all copies.
.\"
.\" Permission is granted to copy and distribute modified versions of this
.\" manual under the conditions for verbatim copying, provided that the
.\" entire resulting derived work is distributed under the terms of a
.\" permission notice identical to this one.
.\"
.\" Since the Linux kernel and libraries are constantly changing, this
.\" manual page may be incorrect or out-of-date. The author(s) assume no
.\" responsibility for errors or omissions, or for damages resulting from
.\" the use of the information contained herein.
.\"
.\" Formatted or processed versions of this manual, if unaccompanied by
.\" the source, must acknowledge the copyright and authors of this work.
.\"
.\" 2006-02-03, mtk, substantial wording changes and other improvements
.\"
.TH MBIND 2 "2006-02-07" "SuSE Labs" "Linux Programmer's Manual"
.SH NAME
mbind \- Set memory policy for a memory range
.SH SYNOPSIS
.nf
.B "#include <numaif.h>"
.sp
.BI "int mbind(void *" start ", unsigned long " len ", int " policy ,
.BI " unsigned long *" nodemask ", unsigned long " maxnode ,
.BI " unsigned " flags );
.sp
.BI "cc ... -lnuma"
.fi
.SH DESCRIPTION
.BR mbind ()
sets the NUMA memory
.I policy
for the memory range starting with
.I start
and continuing for
.IR len
bytes.
The memory of a NUMA machine is divided into multiple nodes.
The memory policy defines in which node memory is allocated.
.BR mbind ()
only has an effect for new allocations; if the pages inside
the range have been already touched before setting the policy,
then the policy has no effect.
Available policies are
.BR MPOL_DEFAULT ,
.BR MPOL_BIND ,
.BR MPOL_INTERLEAVE ,
and
.BR MPOL_PREFERRED .
All policies except
.B MPOL_DEFAULT
require the caller to specify the nodes to which the policy applies in the
.I nodemask
parameter.
.I nodemask
is a bitmask of nodes containing up to
.I maxnode
bits.
The actual number of bytes transferred via this argument
is rounded up to the next multiple of
.IR "sizeof(unsigned long)" ,
but the kernel will only use bits up to
.IR maxnode .
A NULL argument means an empty set of nodes.
The
.B MPOL_DEFAULT
policy is the default and means to use the underlying process policy
(which can be modified with
.BR set_mempolicy (2)).
Unless the process policy has been changed this means to allocate
memory on the node of the CPU that triggered the allocation.
.I nodemask
should be specified as NULL.
The
.B MPOL_BIND
policy is a strict policy that restricts memory allocation to the
nodes specified in
.IR nodemask .
There won't be allocations on other nodes.
.B MPOL_INTERLEAVE
interleaves allocations to the nodes specified in
.IR nodemask .
This optimizes for bandwidth instead of latency.
To be effective the memory area should be fairly large,
at least 1MB or bigger.
.B MPOL_PREFERRED
sets the preferred node for allocation.
The kernel will try to allocate in this
node first and fall back to other nodes if the
preferred nodes is low on free memory.
Only the first node in the
.I nodemask
is used.
If no node is set in the mask, then the memory is allocated on
the node of the CPU that triggered the allocation allocation).
If
.B MPOL_MF_STRICT
is passed in
.IR flags
and
.I policy
is not
.BR MPOL_DEFAULT ,
then the call will fail with the error
.B EIO
if the existing pages in the mapping don't follow the policy.
In 2.6.16 or later the kernel will also try to move pages
to the requested node with this flag.
If
.B MPOL_MF_MOVE
is passed in
.IR flags ,
then an attempt will be made to
move all the pages in the mapping so that they follow the policy.
Pages that are shared with other processes are not moved.
If
.B MPOL_MF_STRICT
is also specified, then the call will fail with the error
.B EIO
if some pages could not be moved.
If
.B MPOL_MF_MOVE_ALL
is passed in
.IR flags ,
then all pages in the mapping will be moved regardless of whether
other processes use the pages.
The calling process must be privileged
.RB ( CAP_SYS_NICE )
to use this flag.
If
.B MPOL_MF_STRICT
is also specified, then the call will fail with the error
.B EIO
if some pages could not be moved.
.SH RETURN VALUE
On success,
.BR mbind ()
returns 0;
on error, \-1 is returned and
.I errno
is set to indicate the error.
.SH ERRORS
.TP
.B EFAULT
There was a unmapped hole in the specified memory range
or a passed pointer was not valid.
.TP
.B EINVAL
An invalid value was specified for
.I flags
or
.IR mode ;
or
.I start + len
was less than
.IR start ;
or
.I policy
was
.B MPOL_DEFAULT
and
.I nodemask
pointed to a non-empty set;
or
.I policy
was
.B MPOL_BIND
or
.B MPOL_INTERLEAVE
and
.I nodemask
pointed to an empty set,
.TP
.B ENOMEM
System out of memory.
.TP
.B EIO
.B MPOL_MF_STRICT
was specified and an existing page was already on a node
that does not follow the policy.
.SH NOTES
NUMA policy is not supported on file mappings.
.B MPOL_MF_STRICT
is ignored on huge page mappings right now.
It is unfortunate that the same flag,
.BR MPOL_DEFAULT ,
has different effects for
.BR mbind (2)
and
.BR set_mempolicy (2).
To select "allocation on the node of the CPU that
triggered the allocation" (like
.BR set_mempolicy ()
.BR MPOL_DEFAULT )
when calling
.BR mbind (),
specify a
.I policy
of
.B MPOL_PREFERRED
with an empty
.IR nodemask .
.SH "VERSIONS AND LIBRARY SUPPORT"
The
.BR mbind (),
.BR get_mempolicy (),
and
.BR set_mempolicy ()
system calls were added to the Linux kernel with version 2.6.7.
They are only available on kernels compiled with
.BR CONFIG_NUMA .
Support for huge page policy was added with 2.6.16.
For interleave policy to be effective on huge page mappings the
policied memory needs to be tens of megabytes or larger.
.B MPOL_MF_MOVE
and
.B MPOL_MF_MOVE_ALL
are only available on Linux 2.6.16 and later.
These system calls should not be used directly.
Instead, the higher level interface provided by the
.BR numa (3)
functions in the
.I numactl
package is recommended.
The
.I numactl
package is available at
.IR ftp://ftp.suse.com/pub/people/ak/numa/ .
You can link with
.I -lnuma
to get system call definitions.
.I libnuma
is available in the
.I numactl
package.
This package also has the
.I numaif.h
header.
.SH CONFORMING TO
This system call is Linux specific.
.SH SEE ALSO
.BR numa (3),
.BR numactl (8),
.BR set_mempolicy (2),
.BR get_mempolicy (2),
.BR mmap (2)