2006-02-08 03:20:44 +00:00
|
|
|
.\" Copyright 2003,2004 Andi Kleen, SuSE Labs.
|
|
|
|
.\"
|
|
|
|
.\" Permission is granted to make and distribute verbatim copies of this
|
|
|
|
.\" manual provided the copyright notice and this permission notice are
|
|
|
|
.\" preserved on all copies.
|
|
|
|
.\"
|
|
|
|
.\" Permission is granted to copy and distribute modified versions of this
|
|
|
|
.\" manual under the conditions for verbatim copying, provided that the
|
|
|
|
.\" entire resulting derived work is distributed under the terms of a
|
|
|
|
.\" permission notice identical to this one.
|
|
|
|
.\"
|
|
|
|
.\" Since the Linux kernel and libraries are constantly changing, this
|
|
|
|
.\" manual page may be incorrect or out-of-date. The author(s) assume no
|
|
|
|
.\" responsibility for errors or omissions, or for damages resulting from
|
|
|
|
.\" the use of the information contained herein.
|
|
|
|
.\"
|
|
|
|
.\" Formatted or processed versions of this manual, if unaccompanied by
|
|
|
|
.\" the source, must acknowledge the copyright and authors of this work.
|
|
|
|
.\"
|
|
|
|
.\" 2006-02-03, mtk, substantial wording changes and other improvements
|
|
|
|
.\"
|
|
|
|
.TH MBIND 2 "2006-02-07" "SuSE Labs" "Linux Programmer's Manual"
|
|
|
|
.SH NAME
|
|
|
|
mbind \- Set memory policy for a memory range
|
|
|
|
.SH SYNOPSIS
|
|
|
|
.B "#include <numaif.h>"
|
|
|
|
.sp
|
|
|
|
.BI "int mbind(void *" start ", unsigned long " len ,
|
|
|
|
.BI "int " policy ", unsigned long *" nodemask ,
|
|
|
|
.BI "unsigned long " maxnode ", unsigned " flags );
|
|
|
|
.sp
|
|
|
|
.BI cc ... -lnuma
|
|
|
|
.SH DESCRIPTION
|
|
|
|
.BR mbind ()
|
|
|
|
sets the NUMA memory
|
|
|
|
.I policy
|
|
|
|
for the memory range starting with
|
|
|
|
.I start
|
|
|
|
and continuing for
|
|
|
|
.IR len
|
|
|
|
bytes.
|
|
|
|
The memory of a NUMA machine is divided into multiple nodes.
|
|
|
|
The memory policy defines in which node memory is allocated.
|
|
|
|
.BR mbind ()
|
|
|
|
only has an effect for new allocations; if the pages inside
|
|
|
|
the range have been already touched before setting the policy,
|
|
|
|
then the policy has no effect.
|
|
|
|
|
|
|
|
Available policies are
|
|
|
|
.BR MPOL_DEFAULT ,
|
|
|
|
.BR MPOL_BIND ,
|
|
|
|
.BR MPOL_INTERLEAVE ,
|
|
|
|
and
|
|
|
|
.BR MPOL_PREFERRED .
|
|
|
|
All policies except
|
|
|
|
.B MPOL_DEFAULT
|
|
|
|
require the caller to specify the nodes to which the policy applies in the
|
|
|
|
.I nodemask
|
|
|
|
parameter.
|
|
|
|
.I nodemask
|
|
|
|
is a bitmask of nodes containing up to
|
|
|
|
.I maxnode
|
|
|
|
bits.
|
|
|
|
The actual number of bytes transferred via this argument
|
|
|
|
is rounded up to the next multiple of
|
|
|
|
.IR "sizeof(unsigned long)" ,
|
|
|
|
but the kernel will only use bits up to
|
|
|
|
.IR maxnode .
|
|
|
|
A NULL argument means an empty set of nodes.
|
|
|
|
|
|
|
|
The
|
|
|
|
.B MPOL_DEFAULT
|
|
|
|
policy is the default and means to use the underlying process policy
|
|
|
|
(which can be modified with
|
|
|
|
.BR set_mempolicy (2)).
|
|
|
|
Unless the process policy has been changed this means to allocate
|
|
|
|
memory on the node of the CPU that triggered the allocation.
|
|
|
|
.I nodemask
|
|
|
|
should be specified as NULL.
|
|
|
|
|
|
|
|
The
|
|
|
|
.B MPOL_BIND
|
|
|
|
policy is a strict policy that restricts memory allocation to the
|
|
|
|
nodes specified in
|
|
|
|
.IR nodemask .
|
|
|
|
There won't be allocations on other nodes.
|
|
|
|
|
|
|
|
.B MPOL_INTERLEAVE
|
|
|
|
interleaves allocations to the nodes specified in
|
|
|
|
.IR nodemask .
|
|
|
|
This optimizes for bandwidth instead of latency.
|
|
|
|
To be effective the memory area should be fairly large,
|
|
|
|
at least 1MB or bigger.
|
|
|
|
|
|
|
|
.B MPOL_PREFERRED
|
|
|
|
sets the preferred node for allocation.
|
|
|
|
The kernel will try to allocate in this
|
|
|
|
node first and fall back to other nodes if the
|
|
|
|
preferred nodes is low on free memory.
|
|
|
|
Only the first node in the
|
|
|
|
.I nodemask
|
|
|
|
is used.
|
|
|
|
If no node is set in the mask, then the memory is allocated on
|
|
|
|
the node of the CPU that triggered the allocation allocation).
|
|
|
|
|
|
|
|
If
|
|
|
|
.B MPOL_MF_STRICT
|
|
|
|
is passed in
|
|
|
|
.IR flags
|
|
|
|
and
|
|
|
|
.I policy
|
|
|
|
is not
|
|
|
|
.BR MPOL_DEFAULT ,
|
|
|
|
then the call will fail with the error
|
|
|
|
.B EIO
|
|
|
|
if the existing pages in the mapping don't follow the policy.
|
|
|
|
In 2.6.16 or later the kernel will also try to move pages
|
|
|
|
to the requested node with this flag.
|
|
|
|
|
2006-03-15 10:26:29 +00:00
|
|
|
If
|
|
|
|
.B MPOL_MF_MOVE
|
|
|
|
is passed in
|
|
|
|
.IR flags ,
|
|
|
|
then an attempt will be made to
|
|
|
|
move all the pages in the mapping so that they follow the policy.
|
|
|
|
Pages that are shared with other processes are not moved.
|
|
|
|
If
|
|
|
|
.B MPOL_MF_STRICT
|
|
|
|
is also specified, then the call will fail with the error
|
|
|
|
.B EIO
|
|
|
|
if some pages could not be moved.
|
|
|
|
|
|
|
|
If
|
|
|
|
.B MPOL_MF_MOVE_ALL
|
|
|
|
is passed in
|
|
|
|
.IR flags ,
|
|
|
|
then all pages in the mapping will be moved regardless of whether
|
|
|
|
other processes use the pages.
|
|
|
|
The calling process must be privileged
|
2006-03-21 20:56:33 +00:00
|
|
|
.RB ( CAP_SYS_NICE )
|
2006-03-15 10:26:29 +00:00
|
|
|
to use this flag.
|
|
|
|
If
|
|
|
|
.B MPOL_MF_STRICT
|
|
|
|
is also specified, then the call will fail with the error
|
|
|
|
.B EIO
|
|
|
|
if some pages could not be moved.
|
2006-02-08 03:20:44 +00:00
|
|
|
.SH RETURN VALUE
|
|
|
|
On success,
|
|
|
|
.BR mbind ()
|
|
|
|
returns 0;
|
|
|
|
on error, \-1 is returned and
|
|
|
|
.I errno
|
|
|
|
is set to indicate the error.
|
|
|
|
.SH ERRORS
|
|
|
|
.TP
|
|
|
|
.B EFAULT
|
|
|
|
There was a unmapped hole in the specified memory range
|
|
|
|
or a passed pointer was not valid.
|
|
|
|
.TP
|
|
|
|
.B EINVAL
|
|
|
|
An invalid value was specified for
|
|
|
|
.I flags
|
|
|
|
or
|
|
|
|
.IR mode ;
|
|
|
|
or
|
|
|
|
.I end
|
|
|
|
was less than
|
|
|
|
.IR start ;
|
|
|
|
or
|
|
|
|
.I policy
|
|
|
|
was
|
|
|
|
.B MPOL_DEFAULT
|
|
|
|
and
|
|
|
|
.I nodemask
|
|
|
|
pointed to a non-empty set;
|
|
|
|
or
|
|
|
|
.I policy
|
|
|
|
was
|
|
|
|
.B MPOL_BIND
|
|
|
|
or
|
|
|
|
.B MPOL_INTERLEAVE
|
|
|
|
and
|
|
|
|
.I nodemask
|
|
|
|
pointed to an empty set,
|
|
|
|
.TP
|
|
|
|
.B ENOMEM
|
|
|
|
System out of memory.
|
|
|
|
.TP
|
|
|
|
.B EIO
|
|
|
|
.B MPOL_MF_STRICT
|
|
|
|
was specified and an existing page was already on a node
|
|
|
|
that does not follow the policy.
|
|
|
|
.SH NOTES
|
|
|
|
NUMA policy is not supported on file mappings.
|
|
|
|
|
2006-03-15 10:26:29 +00:00
|
|
|
.B MPOL_MF_STRICT
|
|
|
|
is ignored on huge page mappings right now.
|
|
|
|
|
2006-02-08 19:36:09 +00:00
|
|
|
It is unfortunate that the same flag,
|
|
|
|
.BR MPOL_DEFAULT ,
|
|
|
|
has different effects for
|
2006-02-08 03:20:44 +00:00
|
|
|
.BR mbind (2)
|
|
|
|
and
|
|
|
|
.BR set_mempolicy (2).
|
|
|
|
To select "allocation on the node of the CPU that
|
|
|
|
triggered the allocation" (like
|
|
|
|
.BR set_mempolicy ()
|
|
|
|
.BR MPOL_DEFAULT )
|
|
|
|
when calling
|
|
|
|
.BR mbind (),
|
|
|
|
specify a
|
|
|
|
.I policy
|
|
|
|
of
|
|
|
|
.B MPOL_PREFERRED
|
|
|
|
with an empty
|
|
|
|
.IR nodemask .
|
|
|
|
.SH "VERSIONS AND LIBRARY SUPPORT"
|
|
|
|
The
|
|
|
|
.BR mbind (),
|
|
|
|
.BR get_mempolicy (),
|
|
|
|
and
|
|
|
|
.BR set_mempolicy ()
|
|
|
|
system calls were added to the Linux kernel with version 2.6.7.
|
|
|
|
They are only available on kernels compiled with
|
|
|
|
.BR CONFIG_NUMA .
|
|
|
|
|
|
|
|
Support for huge page policy was added with 2.6.16.
|
|
|
|
For interleave policy to be effective on huge page mappings the
|
|
|
|
policied memory needs to be tens of megabytes or larger.
|
|
|
|
|
2006-03-15 10:26:29 +00:00
|
|
|
.B MPOL_MF_MOVE
|
|
|
|
and
|
|
|
|
.B MPOL_MF_MOVE_ALL
|
|
|
|
are only available on Linux 2.6.16 and later.
|
|
|
|
|
2006-06-03 01:26:30 +00:00
|
|
|
These system calls should not be used directly.
|
2006-02-08 03:20:44 +00:00
|
|
|
Instead, the higher level interface provided by the
|
2006-03-21 10:59:30 +00:00
|
|
|
.BR numa (3)
|
2006-02-08 03:20:44 +00:00
|
|
|
functions in the
|
|
|
|
.I numactl
|
|
|
|
package is recommended.
|
|
|
|
The
|
|
|
|
.I numactl
|
|
|
|
package is available at
|
|
|
|
.IR ftp://ftp.suse.com/pub/people/ak/numa/ .
|
|
|
|
|
|
|
|
You can link with
|
|
|
|
.I -lnuma
|
|
|
|
to get system call definitions.
|
|
|
|
.I libnuma
|
|
|
|
is available in the
|
|
|
|
.I numactl
|
|
|
|
package.
|
|
|
|
This package also has the
|
|
|
|
.I numaif.h
|
|
|
|
header.
|
2006-08-04 09:41:28 +00:00
|
|
|
.SH CONFORMING TO
|
|
|
|
This system call is Linux specific.
|
2006-02-08 03:20:44 +00:00
|
|
|
.SH SEE ALSO
|
|
|
|
.BR numa (3),
|
|
|
|
.BR numactl (8),
|
|
|
|
.BR set_mempolicy (2),
|
|
|
|
.BR get_mempolicy (2),
|
|
|
|
.BR mmap (2)
|