New pages on NUMA memory allocation policy API.

Michael Kerrisk 2006-02-08 03:20:44 +00:00
parent 9c2360f893
commit 314093c9cd
3 changed files with 525 additions and 0 deletions

164
man2/get_mempolicy.2 Normal file

@ -0,0 +1,164 @@
.\" Copyright 2003,2004 Andi Kleen, SuSE Labs.
.\"
.\" Permission is granted to make and distribute verbatim copies of this
.\" manual provided the copyright notice and this permission notice are
.\" preserved on all copies.
.\"
.\" Permission is granted to copy and distribute modified versions of this
.\" manual under the conditions for verbatim copying, provided that the
.\" entire resulting derived work is distributed under the terms of a
.\" permission notice identical to this one.
.\"
.\" Since the Linux kernel and libraries are constantly changing, this
.\" manual page may be incorrect or out-of-date. The author(s) assume no
.\" responsibility for errors or omissions, or for damages resulting from
.\" the use of the information contained herein.
.\"
.\" Formatted or processed versions of this manual, if unaccompanied by
.\" the source, must acknowledge the copyright and authors of this work.
.\"
.\" 2006-02-03, mtk, substantial wording changes and other improvements
.\"
.TH GET_MEMPOLICY 2 "2006-02-07" "SuSE Labs" "Linux Programmer's Manual"
.SH NAME
get_mempolicy \- Retrieve NUMA memory policy for a process
.SH SYNOPSIS
.B "#include <numaif.h>"
.sp
.BI "int get_mempolicy(int *" policy ", unsigned long *" nodemask ,
.BI "unsigned long " maxnode ", unsigned long " addr ", unsigned long " flags )
.\" TBD rewrite this. it is confusing.
.SH DESCRIPTION
.BR get_mempolicy ()
retrieves the NUMA policy of the calling process or of a memory address,
depending on the setting of
.IR flags .
A NUMA machine has different
memory controllers with different distances to specific CPUs.
The memory policy defines in which node memory is allocated for
the process.
If
.IR flags
is specified as 0,
then information about the calling process's default policy
(as set by
.BR set_mempolicy (2))
is returned.
If
.I flags
specifies
.BR MPOL_F_ADDR ,
then information is returned about the policy governing the memory
address given in
.IR addr .
This policy may be different from the process's default policy if
.BR mbind (2)
has been used to establish a policy for the page containing
.IR addr .
If
.I policy
is not NULL, then it is used to return the policy.
If
.IR nodemask
is not NULL, then it is used to return the nodemask associated
with the policy.
.I maxnode
is the maximum bit number plus one that can be stored into
.IR nodemask .
The bit number is always rounded up to a multiple of the number of bits in an
.IR "unsigned long" .
.\"
.\" If
.\" .I flags
.\" specifies both
.\" .B MPOL_F_NODE
.\" and
.\" .BR MPOL_F_ADDR ,
.\" then
.\" .I policy
.\" instead returns the number of the node on which the address
.\" .I addr
.\" is allocated.
.\"
.\" If
.\" .I flags
.\" specifies
.\" .B MPOL_F_NODE
.\" but not
.\" .BR MPOL_F_ADDR ,
.\" and the process's current policy is
.\" .BR MPOL_INTERLEAVE ,
.\" then
.\" FIXME checkme: Andi's text below says that the info is returned in
.\" 'nodemask', not 'policy':
.\" .I policy
.\" instead returns the number of the next node that will be used for
.\" interleaving allocation.
.\" FIXME
.\" The other valid flag is
.\" .I MPOL_F_NODE.
.\" It is only valid when the policy is
.\" .I MPOL_INTERLEAVE.
.\" In this case not the interleave mask, but an unsigned long with the next
.\" node that would be used for interleaving is returned in
.\" .I nodemask.
.\" Other flag values are reserved.
For an overview of the possible policies see
.BR set_mempolicy (2).
.SH RETURN VALUE
On success,
.BR get_mempolicy ()
returns 0;
on error, \-1 is returned and
.I errno
is set to indicate the error.
.\" .SH ERRORS
.\" FIXME writeme
.\" .TP
.\" .B EINVAL
.\" .I nodemask
.\" is non-NULL, and
.\" .I maxnode
.\" is too small;
.\" or
.\" .I flags
.\" specified values other than
.\" .B MPOL_F_NODE
.\" or
.\" .BR MPOL_F_ADDR ;
.\" or
.\" .I flags
.\" specified
.\" .B MPOL_F_ADDR
.\" and
.\" .I addr
.\" is NULL.
.\" (And there are other EINVAL cases.)
.SH NOTES
This manual page is incomplete:
it does not document the details of the
.BR MPOL_F_NODE
flag,
which modifies the operation of
.BR get_mempolicy ().
This is deliberate: this flag is not intended for application use,
and its operation may change or it may be removed altogether in
future kernel versions.
.B Do not use it.
.SH "VERSIONS AND LIBRARY SUPPORT"
See
.BR mbind (2).
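.SH EXAMPLE
The following is a minimal sketch rather than a definitive implementation.
It assumes a Linux system and invokes the system call through
.BR syscall (2),
so that it builds without
.IR libnuma ;
where
.I libnuma
is installed, the
.BR get_mempolicy ()
wrapper declared in
.I numaif.h
could be called instead.
The helper names
.IR raw_get_mempolicy ,
.IR process_policy ,
and
.I address_policy
are hypothetical and exist only for this illustration.
The first helper covers the case where
.I flags
is 0, the second the
.B MPOL_F_ADDR
case.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for the syscall() declaration */
#endif
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Flag value from the kernel ABI; <numaif.h> normally provides it. */
#ifndef MPOL_F_ADDR
#define MPOL_F_ADDR 2
#endif

/* Hypothetical thin wrapper around the raw system call.
 * Returns 0 on success, or -1 with errno set on failure. */
static int raw_get_mempolicy(int *policy, unsigned long *nodemask,
                             unsigned long maxnode, unsigned long addr,
                             unsigned long flags)
{
#ifdef SYS_get_mempolicy
    return (int) syscall(SYS_get_mempolicy, policy, nodemask, maxnode,
                         addr, flags);
#else
    /* Assumption: platforms without this system call report ENOSYS. */
    errno = ENOSYS;
    return -1;
#endif
}

/* flags == 0: fetch the calling process's default policy. */
static int process_policy(int *policy)
{
    return raw_get_mempolicy(policy, NULL, 0, 0, 0);
}

/* MPOL_F_ADDR: fetch the policy governing the page containing addr. */
static int address_policy(const void *addr, int *policy)
{
    return raw_get_mempolicy(policy, NULL, 0, (unsigned long) addr,
                             MPOL_F_ADDR);
}
```

A caller would pass the address of an
.I int
to either helper and examine
.I errno
when \-1 is returned; on kernels built without
.BR CONFIG_NUMA ,
or in restricted environments, the call can be expected to fail.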
.SH SEE ALSO
.BR mbind (2),
.BR set_mempolicy (2),
.BR numactl (8),
.BR numa (3)

233
man2/mbind.2 Normal file

@ -0,0 +1,233 @@
.\" Copyright 2003,2004 Andi Kleen, SuSE Labs.
.\"
.\" Permission is granted to make and distribute verbatim copies of this
.\" manual provided the copyright notice and this permission notice are
.\" preserved on all copies.
.\"
.\" Permission is granted to copy and distribute modified versions of this
.\" manual under the conditions for verbatim copying, provided that the
.\" entire resulting derived work is distributed under the terms of a
.\" permission notice identical to this one.
.\"
.\" Since the Linux kernel and libraries are constantly changing, this
.\" manual page may be incorrect or out-of-date. The author(s) assume no
.\" responsibility for errors or omissions, or for damages resulting from
.\" the use of the information contained herein.
.\"
.\" Formatted or processed versions of this manual, if unaccompanied by
.\" the source, must acknowledge the copyright and authors of this work.
.\"
.\" 2006-02-03, mtk, substantial wording changes and other improvements
.\"
.TH MBIND 2 "2006-02-07" "SuSE Labs" "Linux Programmer's Manual"
.SH NAME
mbind \- Set memory policy for a memory range
.SH SYNOPSIS
.B "#include <numaif.h>"
.sp
.BI "int mbind(void *" start ", unsigned long " len ,
.BI "int " policy ", unsigned long *" nodemask ,
.BI "unsigned long " maxnode ", unsigned " flags );
.sp
.BI cc ... -lnuma
.SH DESCRIPTION
.BR mbind ()
sets the NUMA memory
.I policy
for the memory range starting with
.I start
and continuing for
.IR len
bytes.
The memory of a NUMA machine is divided into multiple nodes.
The memory policy defines in which node memory is allocated.
.BR mbind ()
only has an effect for new allocations; if the pages inside
the range have already been touched before setting the policy,
then the policy has no effect.
Available policies are
.BR MPOL_DEFAULT ,
.BR MPOL_BIND ,
.BR MPOL_INTERLEAVE ,
and
.BR MPOL_PREFERRED .
All policies except
.B MPOL_DEFAULT
require the caller to specify the nodes to which the policy applies in the
.I nodemask
parameter.
.I nodemask
is a bitmask of nodes containing up to
.I maxnode
bits.
The actual number of bytes transferred via this argument
is rounded up to the next multiple of
.IR "sizeof(unsigned long)" ,
but the kernel will only use bits up to
.IR maxnode .
A NULL argument means an empty set of nodes.
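As an illustration of the layout described above, the following sketch
builds a
.I nodemask
in an array of
.IR "unsigned long" .
The helper names
.IR nodemask_longs ,
.IR nodemask_zero ,
and
.I nodemask_set
are hypothetical and are not part of any library;
the rounding shown is simply the arithmetic implied by the text,
under the assumption that bit
.I n
of the mask stands for node
.IR n .

```c
#include <limits.h>     /* CHAR_BIT */
#include <stddef.h>     /* size_t */
#include <string.h>     /* memset */

/* Number of bits held by one unsigned long on this platform. */
#define BITS_PER_ULONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical helper: how many unsigned longs are needed to hold
 * maxnode bits, rounding up to a whole unsigned long. */
static size_t nodemask_longs(unsigned long maxnode)
{
    return (maxnode + BITS_PER_ULONG - 1) / BITS_PER_ULONG;
}

/* Hypothetical helper: clear every bit in a mask of maxnode bits. */
static void nodemask_zero(unsigned long *mask, unsigned long maxnode)
{
    memset(mask, 0, nodemask_longs(maxnode) * sizeof(unsigned long));
}

/* Hypothetical helper: mark node as a member of the mask.
 * Returns 0 on success, or -1 if node does not fit in maxnode bits. */
static int nodemask_set(unsigned long *mask, unsigned long maxnode,
                        unsigned long node)
{
    if (node >= maxnode)
        return -1;
    mask[node / BITS_PER_ULONG] |= 1UL << (node % BITS_PER_ULONG);
    return 0;
}
```

The array and the same
.I maxnode
value would then be passed as the
.I nodemask
and
.I maxnode
arguments.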
The
.B MPOL_DEFAULT
policy is the default and means to use the underlying process policy
(which can be modified with
.BR set_mempolicy (2)).
Unless the process policy has been changed this means to allocate
memory on the node of the CPU that triggered the allocation.
.I nodemask
should be specified as NULL.
The
.B MPOL_BIND
policy is a strict policy that restricts memory allocation to the
nodes specified in
.IR nodemask .
There won't be allocations on other nodes.
.B MPOL_INTERLEAVE
interleaves allocations to the nodes specified in
.IR nodemask .
This optimizes for bandwidth instead of latency.
To be effective the memory area should be fairly large,
at least 1MB.
.B MPOL_PREFERRED
sets the preferred node for allocation.
The kernel will try to allocate in this
node first and fall back to other nodes if the
preferred node is low on free memory.
Only the first node in the
.I nodemask
is used.
If no node is set in the mask, then the memory is allocated on
the node of the CPU that triggered the allocation.
If
.B MPOL_MF_STRICT
is passed in
.IR flags
and
.I policy
is not
.BR MPOL_DEFAULT ,
then the call will fail with the error
.B EIO
if the existing pages in the mapping don't follow the policy.
In 2.6.16 or later the kernel will also try to move pages
to the requested node with this flag.
.\" FIXME 2.6.16-rc1 adds MPOL_MF_MOVE and MPOL_MF_MOVE_ALL.
.\" These will need to be documented
.SH RETURN VALUE
On success,
.BR mbind ()
returns 0;
on error, \-1 is returned and
.I errno
is set to indicate the error.
.SH ERRORS
.TP
.B EFAULT
There was an unmapped hole in the specified memory range
or a passed pointer was not valid.
.TP
.B EINVAL
An invalid value was specified for
.I flags
or
.IR policy ;
or
.I start
plus
.I len
was less than
.IR start ;
or
.I policy
was
.B MPOL_DEFAULT
and
.I nodemask
pointed to a non-empty set;
or
.I policy
was
.B MPOL_BIND
or
.B MPOL_INTERLEAVE
and
.I nodemask
pointed to an empty set.
.TP
.B ENOMEM
System out of memory.
.TP
.B EIO
.B MPOL_MF_STRICT
was specified and an existing page was already on a node
that does not follow the policy.
.SH NOTES
NUMA policy is not supported on file mappings.
.SH BUGS
.B MPOL_DEFAULT
has different meanings for
.BR mbind (2)
and
.BR set_mempolicy (2).
To select "allocation on the node of the CPU that
triggered the allocation" (like
.BR set_mempolicy ()
.BR MPOL_DEFAULT )
when calling
.BR mbind (),
specify a
.I policy
of
.B MPOL_PREFERRED
with an empty
.IR nodemask .
.SH "VERSIONS AND LIBRARY SUPPORT"
The
.BR mbind (),
.BR get_mempolicy (),
and
.BR set_mempolicy ()
system calls were added to the Linux kernel with version 2.6.7.
They are only available on kernels compiled with
.BR CONFIG_NUMA .
Support for huge page policy was added with 2.6.16.
For interleave policy to be effective on huge page mappings the
policied memory needs to be tens of megabytes or larger.
These system calls should not be used directly.
Instead, the higher level interface provided by the
.BR numa (3)
functions in the
.I numactl
package is recommended.
The
.I numactl
package is available at
.IR ftp://ftp.suse.com/pub/people/ak/numa/ .
You can link with
.I -lnuma
to get system call definitions.
.I libnuma
is available in the
.I numactl
package.
This package also has the
.I numaif.h
header.
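.SH EXAMPLE
The following is a minimal sketch, not a definitive implementation.
It maps anonymous memory and applies
.B MPOL_BIND
before any page in the range is touched,
since the policy only affects new allocations.
It assumes a Linux system and invokes the system call through
.BR syscall (2)
so that it builds without
.IR libnuma ;
the helper name
.I map_on_node
and the two-word mask (which merely leaves spare room above the
requested bit) are choices made for this illustration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for syscall() and MAP_ANONYMOUS */
#endif
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Policy value from the kernel ABI; <numaif.h> normally provides it. */
#ifndef MPOL_BIND
#define MPOL_BIND 2
#endif

#define BITS_PER_ULONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical helper: map len bytes of anonymous memory and restrict
 * its future page allocations to a single node.
 * Returns the mapping, or NULL with errno set on failure. */
static void *map_on_node(size_t len, unsigned long node)
{
    unsigned long mask[2] = { 0, 0 };   /* second word is spare room */
    void *p;
    long rc;
    int saved;

    if (node >= BITS_PER_ULONG) {
        errno = EINVAL;
        return NULL;
    }
    mask[0] = 1UL << node;

    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    /* Apply the policy before the pages are touched. */
#ifdef SYS_mbind
    rc = syscall(SYS_mbind, p, (unsigned long) len, (long) MPOL_BIND,
                 mask, (unsigned long) (2 * BITS_PER_ULONG), 0UL);
#else
    /* Assumption: platforms without this system call report ENOSYS. */
    errno = ENOSYS;
    rc = -1;
#endif
    if (rc != 0) {
        saved = errno;          /* munmap() may overwrite errno */
        munmap(p, len);
        errno = saved;
        return NULL;
    }
    return p;
}
```

On success the caller owns the mapping and releases it with
.BR munmap (2);
on kernels built without
.BR CONFIG_NUMA ,
or where the requested node is not available, NULL is the expected result.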
.SH SEE ALSO
.BR numa (3),
.BR numactl (8),
.BR set_mempolicy (2),
.BR get_mempolicy (2),
.BR mmap (2)

128
man2/set_mempolicy.2 Normal file

@ -0,0 +1,128 @@
.\" Copyright 2003,2004 Andi Kleen, SuSE Labs.
.\"
.\" Permission is granted to make and distribute verbatim copies of this
.\" manual provided the copyright notice and this permission notice are
.\" preserved on all copies.
.\"
.\" Permission is granted to copy and distribute modified versions of this
.\" manual under the conditions for verbatim copying, provided that the
.\" entire resulting derived work is distributed under the terms of a
.\" permission notice identical to this one.
.\"
.\" Since the Linux kernel and libraries are constantly changing, this
.\" manual page may be incorrect or out-of-date. The author(s) assume no
.\" responsibility for errors or omissions, or for damages resulting from
.\" the use of the information contained herein.
.\"
.\" Formatted or processed versions of this manual, if unaccompanied by
.\" the source, must acknowledge the copyright and authors of this work.
.\"
.\" 2006-02-03, mtk, substantial wording changes and other improvements
.\"
.TH SET_MEMPOLICY 2 "2006-02-07" "SuSE Labs" "Linux Programmer's Manual"
.SH NAME
set_mempolicy \- Set default NUMA memory policy for a process and its children
.SH SYNOPSIS
.B "#include <numaif.h>"
.sp
.BI "int set_mempolicy(int " policy ", unsigned long *" nodemask ,
.BI "unsigned long " maxnode )
.sp
.SH DESCRIPTION
.BR set_mempolicy ()
sets the NUMA memory policy of the calling process to
.IR policy .
A NUMA machine has different
memory controllers with different distances to specific CPUs.
The memory policy defines in which node memory is allocated for
the process.
This system call defines the default policy for the process;
in addition a policy can be set for specific memory ranges using
.BR mbind (2).
The policy is only applied when a new page is allocated
for the process. For anonymous memory this is when the page is first
touched by the application.
Available policies are
.BR MPOL_DEFAULT ,
.BR MPOL_BIND ,
.BR MPOL_INTERLEAVE ,
and
.BR MPOL_PREFERRED .
All policies except
.B MPOL_DEFAULT
require the caller to specify the nodes to which the policy applies in the
.I nodemask
parameter.
.I nodemask
is a pointer to a bit field of nodes that contains up to
.I maxnode
bits. The bit field size is rounded up to the next multiple of
.IR "sizeof(unsigned long)" ,
but the kernel will only use bits up to
.IR maxnode .
The
.B MPOL_DEFAULT
policy is the default and means to allocate memory locally,
i.e., on the node of the CPU that triggered the allocation.
.I nodemask
should be specified as NULL.
The
.B MPOL_BIND
policy is a strict policy that restricts memory allocation to the
nodes specified in
.IR nodemask .
There won't be allocations on other nodes.
.B MPOL_INTERLEAVE
interleaves allocations to the nodes specified in
.IR nodemask .
This optimizes for bandwidth instead of latency.
To be effective the memory area should be fairly large,
at least 1MB.
.B MPOL_PREFERRED
sets the preferred node for allocation.
The kernel will try to allocate in this
node first and fall back to other nodes if the preferred node is low on free
memory. Only the first node in the
.I nodemask
is used.
If no node is set in the mask, then the memory is allocated on
the node of the CPU that triggered the allocation (like
.BR MPOL_DEFAULT ).
The memory policy is inherited by child processes created using
.BR fork (2)
or
.BR clone (2).
.SH NOTES
Process policy is not remembered if the page is swapped out.
.SH RETURN VALUE
On success,
.BR set_mempolicy ()
returns 0;
on error, \-1 is returned and
.I errno
is set to indicate the error.
.\" .SH ERRORS
.\" FIXME writeme
.\" .TP
.\" .B EINVAL
.\" .I mode is invalid.
.SH "VERSIONS AND LIBRARY SUPPORT"
See
.BR mbind (2).
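.SH EXAMPLE
The following is a minimal sketch, not a definitive implementation.
It sets
.B MPOL_PREFERRED
for a single node and later returns the process to
.BR MPOL_DEFAULT .
It assumes a Linux system and invokes the system call through
.BR syscall (2)
so that it builds without
.IR libnuma ;
the helper names
.IR raw_set_mempolicy ,
.IR prefer_node ,
and
.I restore_default_policy
are hypothetical, as is the two-word mask, which merely leaves spare
room above the requested bit.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for the syscall() declaration */
#endif
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Policy values from the kernel ABI; <numaif.h> normally provides them. */
#ifndef MPOL_DEFAULT
#define MPOL_DEFAULT 0
#endif
#ifndef MPOL_PREFERRED
#define MPOL_PREFERRED 1
#endif

#define BITS_PER_ULONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical thin wrapper around the raw system call.
 * Returns 0 on success, or -1 with errno set on failure. */
static int raw_set_mempolicy(int policy, const unsigned long *nodemask,
                             unsigned long maxnode)
{
#ifdef SYS_set_mempolicy
    return (int) syscall(SYS_set_mempolicy, (long) policy, nodemask,
                         maxnode);
#else
    /* Assumption: platforms without this system call report ENOSYS. */
    errno = ENOSYS;
    return -1;
#endif
}

/* Make node the preferred node for the calling process. */
static int prefer_node(unsigned long node)
{
    unsigned long mask[2] = { 0, 0 };   /* second word is spare room */

    if (node >= BITS_PER_ULONG) {
        errno = EINVAL;
        return -1;
    }
    mask[0] = 1UL << node;
    return raw_set_mempolicy(MPOL_PREFERRED, mask,
                             (unsigned long) (2 * BITS_PER_ULONG));
}

/* Return to MPOL_DEFAULT; nodemask is NULL as the text requires. */
static int restore_default_policy(void)
{
    return raw_set_mempolicy(MPOL_DEFAULT, NULL, 0);
}
```

Because the policy is inherited by child processes, a program would
typically call a helper such as
.I prefer_node
before
.BR fork (2)
if its children are meant to share the policy.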
.SH SEE ALSO
.BR mbind (2),
.BR get_mempolicy (2),
.BR numactl (8),
.BR numa (3)