mirror of https://github.com/mkerrisk/man-pages
New pages on NUMA memory allocation policy API.
parent
9c2360f893
commit
314093c9cd
|
@ -0,0 +1,164 @@
|
|||
.\" Copyright 2003,2004 Andi Kleen, SuSE Labs.
|
||||
.\"
|
||||
.\" Permission is granted to make and distribute verbatim copies of this
|
||||
.\" manual provided the copyright notice and this permission notice are
|
||||
.\" preserved on all copies.
|
||||
.\"
|
||||
.\" Permission is granted to copy and distribute modified versions of this
|
||||
.\" manual under the conditions for verbatim copying, provided that the
|
||||
.\" entire resulting derived work is distributed under the terms of a
|
||||
.\" permission notice identical to this one.
|
||||
.\"
|
||||
.\" Since the Linux kernel and libraries are constantly changing, this
|
||||
.\" manual page may be incorrect or out-of-date. The author(s) assume no
|
||||
.\" responsibility for errors or omissions, or for damages resulting from
|
||||
.\" the use of the information contained herein.
|
||||
.\"
|
||||
.\" Formatted or processed versions of this manual, if unaccompanied by
|
||||
.\" the source, must acknowledge the copyright and authors of this work.
|
||||
.\"
|
||||
.\" 2006-02-03, mtk, substantial wording changes and other improvements
|
||||
.\"
|
||||
.TH GET_MEMPOLICY 2 "2006-02-07" "SuSE Labs" "Linux Programmer's Manual"
|
||||
.SH NAME
get_mempolicy \- Retrieve NUMA memory policy for a process
.SH SYNOPSIS
|
||||
.B "#include <numaif.h>"
|
||||
.sp
|
||||
.BI "int get_mempolicy(int *" policy ", unsigned long *" nodemask ,
|
||||
.BI "unsigned long " maxnode ", unsigned long " addr ", unsigned long " flags )
|
||||
.\" TBD rewrite this. it is confusing.
|
||||
.SH DESCRIPTION
|
||||
.BR get_mempolicy ()
|
||||
retrieves the NUMA policy of the calling process or of a memory address,
|
||||
depending on the setting of
|
||||
.IR flags .
|
||||
|
||||
A NUMA machine has different
|
||||
memory controllers with different distances to specific CPUs.
|
||||
The memory policy defines in which node memory is allocated for
|
||||
the process.
|
||||
|
||||
If
|
||||
.IR flags
|
||||
is specified as 0,
|
||||
then information about the calling process's default policy
|
||||
(as set by
|
||||
.BR set_mempolicy (2))
|
||||
is returned.
|
||||
|
||||
If
|
||||
.I flags
|
||||
specifies
|
||||
.BR MPOL_F_ADDR ,
|
||||
then information is returned about the policy governing the memory
|
||||
address given in
|
||||
.IR addr .
|
||||
This policy may be different from the process's default policy if
|
||||
.BR mbind (2)
|
||||
has been used to establish a policy for the page containing
|
||||
.IR addr .
|
||||
|
||||
If
|
||||
.I policy
|
||||
is not NULL, then it is used to return the policy.
|
||||
If
|
||||
.IR nodemask
|
||||
is not NULL, then it is used to return the nodemask associated
|
||||
with the policy.
|
||||
.I maxnode
|
||||
is the maximum bit number plus one that can be stored into
|
||||
.IR nodemask .
|
||||
The value of
.I maxnode
is always rounded up to a multiple of the number of bits in an
.IR "unsigned long" .
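As a rough illustration of the two query modes described above, the following C sketch reads either the process policy (flags of 0) or the policy governing an address (MPOL_F_ADDR). It invokes the system call through syscall(2) so that it builds without libnuma. The names sketch_get_mempolicy(), show_policy(), and SKETCH_MAXNODE are hypothetical, and the 1024-bit mask size is an assumption that may need adjusting for a given kernel configuration.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Flag value from the kernel ABI; normally supplied by <numaif.h>. */
#ifndef MPOL_F_ADDR
#define MPOL_F_ADDR (1 << 1)
#endif

/* Assumption for this sketch: a mask with room for 1024 node bits. */
#define SKETCH_MAXNODE 1024UL
#define SKETCH_BITS_PER_LONG (8 * sizeof(unsigned long))
#define SKETCH_LONGS (SKETCH_MAXNODE / SKETCH_BITS_PER_LONG)

/* Hypothetical raw wrapper, so that the sketch does not need -lnuma. */
static long
sketch_get_mempolicy(int *policy, unsigned long *nodemask,
                     unsigned long maxnode, void *addr,
                     unsigned long flags)
{
    return syscall(SYS_get_mempolicy, policy, nodemask, maxnode,
                   addr, flags);
}

/*
 * Print the process policy (addr == NULL, flags 0) or the policy
 * governing addr (flags MPOL_F_ADDR). Returns 0 on success, -1 on error.
 */
static int
show_policy(void *addr)
{
    unsigned long mask[SKETCH_LONGS];
    unsigned long flags = (addr != NULL) ? MPOL_F_ADDR : 0;
    int policy = -1;

    memset(mask, 0, sizeof(mask));
    if (sketch_get_mempolicy(&policy, mask, SKETCH_MAXNODE,
                             addr, flags) == -1) {
        fprintf(stderr, "get_mempolicy: %s\n", strerror(errno));
        return -1;
    }
    printf("policy=%d, first mask word=%#lx\n", policy, mask[0]);
    return 0;
}
```

Calling show_policy(NULL) reports the process policy; passing the address of a mapped object reports the policy for the page containing it. On a kernel built without CONFIG_NUMA the call is expected to fail, so the error path should be treated as normal.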
|
||||
.\"
|
||||
.\" If
|
||||
.\" .I flags
|
||||
.\" specifies both
|
||||
.\" .B MPOL_F_NODE
|
||||
.\" and
|
||||
.\" .BR MPOL_F_ADDR ,
|
||||
.\" then
|
||||
.\" .I policy
|
||||
.\" instead returns the number of the node on which the address
|
||||
.\" .I addr
|
||||
.\" is allocated.
|
||||
.\"
|
||||
.\" If
|
||||
.\" .I flags
|
||||
.\" specifies
|
||||
.\" .B MPOL_F_NODE
|
||||
.\" but not
|
||||
.\" .BR MPOL_F_ADDR ,
|
||||
.\" and the process's current policy is
|
||||
.\" .BR MPOL_INTERLEAVE ,
|
||||
.\" then
|
||||
.\" FIXME checkme: Andi's text below says that the info is returned in
|
||||
.\" 'nodemask', not 'policy':
|
||||
.\" .I policy
|
||||
.\" instead returns the number of the next node that will be used for
|
||||
.\" interleaving allocation.
|
||||
.\" FIXME
|
||||
.\" The other valid flag is
|
||||
.\" .I MPOL_F_NODE.
|
||||
.\" It is only valid when the policy is
|
||||
.\" .I MPOL_INTERLEAVE.
|
||||
.\" In this case not the interleave mask, but an unsigned long with the next
|
||||
.\" node that would be used for interleaving is returned in
|
||||
.\" .I nodemask.
|
||||
.\" Other flag values are reserved.
|
||||
|
||||
For an overview of the possible policies see
|
||||
.BR set_mempolicy (2).
|
||||
|
||||
.SH RETURN VALUE
|
||||
On success,
|
||||
.BR get_mempolicy ()
|
||||
returns 0;
|
||||
on error, \-1 is returned and
|
||||
.I errno
|
||||
is set to indicate the error.
|
||||
|
||||
.\" .SH ERRORS
|
||||
.\" FIXME writeme
|
||||
.\" .TP
|
||||
.\" .B EINVAL
|
||||
.\" .I nodemask
|
||||
.\" is non-NULL, and
|
||||
.\" .I maxnode
|
||||
.\" is too small;
|
||||
.\" or
|
||||
.\" .I flags
|
||||
.\" specified values other than
|
||||
.\" .B MPOL_F_NODE
|
||||
.\" or
|
||||
.\" .BR MPOL_F_ADDR ;
|
||||
.\" or
|
||||
.\" .I flags
|
||||
.\" specified
|
||||
.\" .B MPOL_F_ADDR
|
||||
.\" and
|
||||
.\" .I addr
|
||||
.\" is NULL.
|
||||
.\" (And there are other EINVAL cases.)
|
||||
.SH NOTES
|
||||
This manual page is incomplete:
|
||||
it does not document the details of the
|
||||
.BR MPOL_F_NODE
|
||||
flag,
|
||||
which modifies the operation of
|
||||
.BR get_mempolicy ().
|
||||
This is deliberate: this flag is not intended for application use,
|
||||
and its operation may change or it may be removed altogether in
|
||||
future kernel versions.
|
||||
.B Do not use it.
|
||||
|
||||
.SH "VERSIONS AND LIBRARY SUPPORT"
|
||||
See
|
||||
.BR mbind (2).
|
||||
|
||||
.SH SEE ALSO
|
||||
.BR mbind (2),
|
||||
.BR set_mempolicy (2),
|
||||
.BR numactl (8),
|
||||
.BR numa (3)
|
|
@ -0,0 +1,233 @@
|
|||
.\" Copyright 2003,2004 Andi Kleen, SuSE Labs.
|
||||
.\"
|
||||
.\" Permission is granted to make and distribute verbatim copies of this
|
||||
.\" manual provided the copyright notice and this permission notice are
|
||||
.\" preserved on all copies.
|
||||
.\"
|
||||
.\" Permission is granted to copy and distribute modified versions of this
|
||||
.\" manual under the conditions for verbatim copying, provided that the
|
||||
.\" entire resulting derived work is distributed under the terms of a
|
||||
.\" permission notice identical to this one.
|
||||
.\"
|
||||
.\" Since the Linux kernel and libraries are constantly changing, this
|
||||
.\" manual page may be incorrect or out-of-date. The author(s) assume no
|
||||
.\" responsibility for errors or omissions, or for damages resulting from
|
||||
.\" the use of the information contained herein.
|
||||
.\"
|
||||
.\" Formatted or processed versions of this manual, if unaccompanied by
|
||||
.\" the source, must acknowledge the copyright and authors of this work.
|
||||
.\"
|
||||
.\" 2006-02-03, mtk, substantial wording changes and other improvements
|
||||
.\"
|
||||
.TH MBIND 2 "2006-02-07" "SuSE Labs" "Linux Programmer's Manual"
|
||||
.SH NAME
|
||||
mbind \- Set memory policy for a memory range
|
||||
.SH SYNOPSIS
|
||||
.B "#include <numaif.h>"
|
||||
.sp
|
||||
.BI "int mbind(void *" start ", unsigned long " len ,
|
||||
.BI "int " policy ", unsigned long *" nodemask ,
|
||||
.BI "unsigned long " maxnode ", unsigned " flags );
|
||||
.sp
|
||||
.B "cc ... \-lnuma"
|
||||
.SH DESCRIPTION
|
||||
.BR mbind ()
|
||||
sets the NUMA memory
|
||||
.I policy
|
||||
for the memory range starting at
|
||||
.I start
|
||||
and continuing for
|
||||
.IR len
|
||||
bytes.
|
||||
The memory of a NUMA machine is divided into multiple nodes.
|
||||
The memory policy defines in which node memory is allocated.
|
||||
.BR mbind ()
|
||||
only has an effect for new allocations; if the pages inside
the range have already been touched before the policy is set,
then the policy has no effect on them.
|
||||
|
||||
Available policies are
|
||||
.BR MPOL_DEFAULT ,
|
||||
.BR MPOL_BIND ,
|
||||
.BR MPOL_INTERLEAVE ,
|
||||
and
|
||||
.BR MPOL_PREFERRED .
|
||||
All policies except
|
||||
.B MPOL_DEFAULT
|
||||
require the caller to specify the nodes to which the policy applies in the
|
||||
.I nodemask
|
||||
parameter.
|
||||
.I nodemask
|
||||
is a bitmask of nodes containing up to
|
||||
.I maxnode
|
||||
bits.
|
||||
The actual number of bytes transferred via this argument
|
||||
is rounded up to the next multiple of
|
||||
.IR "sizeof(unsigned long)" ,
|
||||
but the kernel will only use bits up to
|
||||
.IR maxnode .
|
||||
A NULL argument means an empty set of nodes.
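The bitmask layout described above can be sketched with a few small helpers. This is only an illustration, assuming that node N corresponds to bit N of the mask, counted from the least significant bit of the first unsigned long; the helper names are hypothetical and are not part of libnuma or the kernel interface.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Number of bits held by one unsigned long on this platform. */
#define NODE_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Number of unsigned longs needed to hold maxnode bits, rounded up. */
static size_t
nodemask_longs(unsigned long maxnode)
{
    return (maxnode + NODE_BITS_PER_LONG - 1) / NODE_BITS_PER_LONG;
}

/* Clear every bit of a mask sized for maxnode bits. */
static void
nodemask_zero(unsigned long *mask, unsigned long maxnode)
{
    memset(mask, 0, nodemask_longs(maxnode) * sizeof(unsigned long));
}

/* Add a node to the mask. The caller must ensure node < maxnode. */
static void
nodemask_set(unsigned long *mask, unsigned int node)
{
    mask[node / NODE_BITS_PER_LONG] |= 1UL << (node % NODE_BITS_PER_LONG);
}

/* Return nonzero if the node is present in the mask. */
static int
nodemask_isset(const unsigned long *mask, unsigned int node)
{
    return (int)((mask[node / NODE_BITS_PER_LONG]
                  >> (node % NODE_BITS_PER_LONG)) & 1UL);
}
```

A caller would typically declare an array of nodemask_longs(maxnode) words, clear it with nodemask_zero(), and set one bit per node before passing the array and maxnode to the system call.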
|
||||
|
||||
The
|
||||
.B MPOL_DEFAULT
|
||||
policy is the default and means to use the underlying process policy
|
||||
(which can be modified with
|
||||
.BR set_mempolicy (2)).
|
||||
Unless the process policy has been changed this means to allocate
|
||||
memory on the node of the CPU that triggered the allocation.
|
||||
.I nodemask
|
||||
should be specified as NULL.
|
||||
|
||||
The
|
||||
.B MPOL_BIND
|
||||
policy is a strict policy that restricts memory allocation to the
|
||||
nodes specified in
|
||||
.IR nodemask .
|
||||
There won't be allocations on other nodes.
|
||||
|
||||
.B MPOL_INTERLEAVE
|
||||
interleaves allocations to the nodes specified in
|
||||
.IR nodemask .
|
||||
This optimizes for bandwidth instead of latency.
|
||||
To be effective, the memory area should be fairly large,
at least 1MB.
|
||||
|
||||
.B MPOL_PREFERRED
|
||||
sets the preferred node for allocation.
|
||||
The kernel will try to allocate in this
node first and fall back to other nodes if the
preferred node is low on free memory.
|
||||
Only the first node in the
|
||||
.I nodemask
|
||||
is used.
|
||||
If no node is set in the mask, then the memory is allocated on
the node of the CPU that triggered the allocation.
|
||||
|
||||
If
|
||||
.B MPOL_MF_STRICT
|
||||
is passed in
|
||||
.IR flags
|
||||
and
|
||||
.I policy
|
||||
is not
|
||||
.BR MPOL_DEFAULT ,
|
||||
then the call will fail with the error
|
||||
.B EIO
|
||||
if the existing pages in the mapping don't follow the policy.
|
||||
In 2.6.16 or later the kernel will also try to move pages
|
||||
to the requested node with this flag.
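The following C sketch shows one way the pieces above might fit together: it maps anonymous memory and applies MPOL_BIND with MPOL_MF_STRICT before any page is touched, so that later page faults allocate under the policy. The function name alloc_bound_to_node0() is hypothetical, node 0 is assumed to exist and to have memory, and the call goes through syscall(2) so that the sketch builds without libnuma.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Values from the kernel ABI; normally supplied by <numaif.h>. */
#ifndef MPOL_BIND
#define MPOL_BIND 2
#endif
#ifndef MPOL_MF_STRICT
#define MPOL_MF_STRICT (1 << 0)
#endif

/*
 * Map len bytes of anonymous memory and bind the range to node 0
 * before any page is touched. Returns the mapping, or NULL with
 * errno set if either mmap() or mbind() fails.
 */
static void *
alloc_bound_to_node0(size_t len)
{
    unsigned long mask = 1UL;                  /* bit 0 set: node 0 only */
    unsigned long maxnode = 8 * sizeof(mask);  /* bits available in mask */
    void *p;

    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    if (syscall(SYS_mbind, p, (unsigned long)len, (long)MPOL_BIND,
                &mask, maxnode, (unsigned long)MPOL_MF_STRICT) == -1) {
        int saved = errno;      /* keep the mbind() error across munmap() */

        munmap(p, len);
        errno = saved;
        return NULL;
    }
    return p;
}
```

Because the policy is set before the first access, the pages of the returned mapping should be allocated on node 0 when they are first written. A caller that receives NULL can inspect errno to distinguish, for example, a kernel without NUMA support from an invalid node set.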
|
||||
|
||||
.\" FIXME 2.6.16-rc1 adds MPOL_MF_MOVE and MPOL_MF_MOVE_ALL.
|
||||
.\" These will need to be documented
|
||||
.SH RETURN VALUE
|
||||
On success,
|
||||
.BR mbind ()
|
||||
returns 0;
|
||||
on error, \-1 is returned and
|
||||
.I errno
|
||||
is set to indicate the error.
|
||||
|
||||
.SH ERRORS
|
||||
.TP
|
||||
.B EFAULT
|
||||
There was an unmapped hole in the specified memory range
|
||||
or a passed pointer was not valid.
|
||||
.TP
|
||||
.B EINVAL
|
||||
An invalid value was specified for
|
||||
.I flags
|
||||
or
|
||||
.IR policy ;
|
||||
or
|
||||
.IR start " + " len
|
||||
was less than
|
||||
.IR start ;
|
||||
or
|
||||
.I policy
|
||||
was
|
||||
.B MPOL_DEFAULT
|
||||
and
|
||||
.I nodemask
|
||||
pointed to a non-empty set;
|
||||
or
|
||||
.I policy
|
||||
was
|
||||
.B MPOL_BIND
|
||||
or
|
||||
.B MPOL_INTERLEAVE
|
||||
and
|
||||
.I nodemask
|
||||
pointed to an empty set.
|
||||
.TP
|
||||
.B ENOMEM
|
||||
System out of memory.
|
||||
.TP
|
||||
.B EIO
|
||||
.B MPOL_MF_STRICT
|
||||
was specified and an existing page was already on a node
|
||||
that does not follow the policy.
|
||||
|
||||
.SH NOTES
|
||||
NUMA policy is not supported on file mappings.
|
||||
|
||||
.SH BUGS
|
||||
.B MPOL_DEFAULT
|
||||
has different meanings for
|
||||
.BR mbind (2)
|
||||
and
|
||||
.BR set_mempolicy (2).
|
||||
To select "allocation on the node of the CPU that
|
||||
triggered the allocation" (like
|
||||
.BR set_mempolicy ()
|
||||
.BR MPOL_DEFAULT )
|
||||
when calling
|
||||
.BR mbind (),
|
||||
specify a
|
||||
.I policy
|
||||
of
|
||||
.B MPOL_PREFERRED
|
||||
with an empty
|
||||
.IR nodemask .
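A minimal sketch of the workaround described above, assuming that an empty node set can be expressed as a NULL nodemask with a maxnode of 0. The name mbind_local() is hypothetical, and the raw syscall(2) form is used only so that the sketch builds without libnuma.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>   /* for mmap() in callers that create the range */
#include <sys/syscall.h>
#include <unistd.h>

/* Value from the kernel ABI; normally supplied by <numaif.h>. */
#ifndef MPOL_PREFERRED
#define MPOL_PREFERRED 1
#endif

/*
 * Request local allocation for the range [start, start + len) by
 * passing MPOL_PREFERRED with an empty node set. start is assumed
 * to be page aligned. Returns 0 on success, -1 with errno set on error.
 */
static long
mbind_local(void *start, unsigned long len)
{
    return syscall(SYS_mbind, start, len, (long)MPOL_PREFERRED,
                   (unsigned long *)NULL, 0UL, 0UL);
}
```

This differs from passing MPOL_DEFAULT to mbind(), which falls back to the process policy rather than forcing allocation on the local node.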
|
||||
|
||||
.SH "VERSIONS AND LIBRARY SUPPORT"
|
||||
The
|
||||
.BR mbind (),
|
||||
.BR get_mempolicy (),
|
||||
and
|
||||
.BR set_mempolicy ()
|
||||
system calls were added to the Linux kernel with version 2.6.7.
|
||||
They are only available on kernels compiled with
|
||||
.BR CONFIG_NUMA .
|
||||
|
||||
Support for huge page policy was added with 2.6.16.
|
||||
For interleave policy to be effective on huge page mappings the
|
||||
memory covered by the policy needs to be tens of megabytes or larger.
|
||||
|
||||
These system calls should not be used directly.
|
||||
Instead, the higher level interface provided by the
|
||||
.BR numa (3)
|
||||
functions in the
|
||||
.I numactl
|
||||
package is recommended.
|
||||
The
|
||||
.I numactl
|
||||
package is available at
|
||||
.IR ftp://ftp.suse.com/pub/people/ak/numa/ .
|
||||
|
||||
You can link with
|
||||
.I \-lnuma
|
||||
to get system call definitions.
|
||||
.I libnuma
|
||||
is available in the
|
||||
.I numactl
|
||||
package.
|
||||
This package also has the
|
||||
.I numaif.h
|
||||
header.
|
||||
|
||||
.SH SEE ALSO
|
||||
.BR numa (3),
|
||||
.BR numactl (8),
|
||||
.BR set_mempolicy (2),
|
||||
.BR get_mempolicy (2),
|
||||
.BR mmap (2)
|
|
@ -0,0 +1,128 @@
|
|||
.\" Copyright 2003,2004 Andi Kleen, SuSE Labs.
|
||||
.\"
|
||||
.\" Permission is granted to make and distribute verbatim copies of this
|
||||
.\" manual provided the copyright notice and this permission notice are
|
||||
.\" preserved on all copies.
|
||||
.\"
|
||||
.\" Permission is granted to copy and distribute modified versions of this
|
||||
.\" manual under the conditions for verbatim copying, provided that the
|
||||
.\" entire resulting derived work is distributed under the terms of a
|
||||
.\" permission notice identical to this one.
|
||||
.\"
|
||||
.\" Since the Linux kernel and libraries are constantly changing, this
|
||||
.\" manual page may be incorrect or out-of-date. The author(s) assume no
|
||||
.\" responsibility for errors or omissions, or for damages resulting from
|
||||
.\" the use of the information contained herein.
|
||||
.\"
|
||||
.\" Formatted or processed versions of this manual, if unaccompanied by
|
||||
.\" the source, must acknowledge the copyright and authors of this work.
|
||||
.\"
|
||||
.\" 2006-02-03, mtk, substantial wording changes and other improvements
|
||||
.\"
|
||||
.TH SET_MEMPOLICY 2 "2006-02-07" "SuSE Labs" "Linux Programmer's Manual"
|
||||
.SH NAME
|
||||
set_mempolicy \- Set default NUMA memory policy for a process and its children
|
||||
|
||||
.SH SYNOPSIS
|
||||
.B "#include <numaif.h>"
|
||||
.sp
|
||||
.BI "int set_mempolicy(int " policy ", unsigned long *" nodemask ,
|
||||
.BI "unsigned long " maxnode )
|
||||
.sp
|
||||
.SH DESCRIPTION
|
||||
.BR set_mempolicy ()
|
||||
sets the NUMA memory policy of the calling process to
|
||||
.IR policy .
|
||||
|
||||
A NUMA machine has different
|
||||
memory controllers with different distances to specific CPUs.
|
||||
The memory policy defines in which node memory is allocated for
|
||||
the process.
|
||||
|
||||
This system call defines the default policy for the process;
|
||||
in addition a policy can be set for specific memory ranges using
|
||||
.BR mbind (2).
|
||||
The policy is only applied when a new page is allocated
|
||||
for the process. For anonymous memory this is when the page is first
|
||||
touched by the application.
|
||||
|
||||
Available policies are
|
||||
.BR MPOL_DEFAULT ,
|
||||
.BR MPOL_BIND ,
|
||||
.BR MPOL_INTERLEAVE ,
|
||||
and
.BR MPOL_PREFERRED .
|
||||
All policies except
|
||||
.B MPOL_DEFAULT
|
||||
require the caller to specify the nodes to which the policy applies in the
|
||||
.I nodemask
|
||||
parameter.
|
||||
.I nodemask
|
||||
is a pointer to a bit field of nodes that contains up to
|
||||
.I maxnode
|
||||
bits. The bit field size is rounded up to the next multiple of
|
||||
.IR "sizeof(unsigned long)" ,
|
||||
but the kernel will only use bits up to
|
||||
.IR maxnode .
|
||||
|
||||
The
|
||||
.B MPOL_DEFAULT
|
||||
policy is the default and means to allocate memory locally,
|
||||
i.e., on the node of the CPU that triggered the allocation.
|
||||
.I nodemask
|
||||
should be specified as NULL.
|
||||
|
||||
The
|
||||
.B MPOL_BIND
|
||||
policy is a strict policy that restricts memory allocation to the
|
||||
nodes specified in
|
||||
.IR nodemask .
|
||||
There won't be allocations on other nodes.
|
||||
|
||||
.B MPOL_INTERLEAVE
|
||||
interleaves allocations to the nodes specified in
|
||||
.IR nodemask .
|
||||
This optimizes for bandwidth instead of latency.
|
||||
To be effective, the memory area should be fairly large,
at least 1MB.
|
||||
|
||||
.B MPOL_PREFERRED
|
||||
sets the preferred node for allocation.
|
||||
The kernel will try to allocate in this
|
||||
node first and fall back to other nodes if the preferred node is low on free
|
||||
memory. Only the first node in the
|
||||
.I nodemask
|
||||
is used.
|
||||
If no node is set in the mask, then the memory is allocated on
|
||||
the node of the CPU that triggered the allocation (like
|
||||
.BR MPOL_DEFAULT ).
|
||||
|
||||
The memory policy is inherited by child processes created using
|
||||
.BR fork (2)
|
||||
or
|
||||
.BR clone (2).
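As an illustration of setting and resetting the process policy, the C sketch below prefers node 0 for future allocations and then restores MPOL_DEFAULT. The function names are hypothetical, node 0 is assumed to exist, and syscall(2) is used in place of the libnuma wrapper so that the sketch builds without -lnuma.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Values from the kernel ABI; normally supplied by <numaif.h>. */
#ifndef MPOL_DEFAULT
#define MPOL_DEFAULT 0
#endif
#ifndef MPOL_PREFERRED
#define MPOL_PREFERRED 1
#endif

/*
 * Prefer node 0 for pages allocated by the calling process from now
 * on. Returns 0 on success, -1 with errno set on error.
 */
static long
prefer_node0(void)
{
    unsigned long mask = 1UL;   /* bit 0 set: node 0 */

    return syscall(SYS_set_mempolicy, (long)MPOL_PREFERRED, &mask,
                   (unsigned long)(8 * sizeof(mask)));
}

/*
 * Return to MPOL_DEFAULT (local allocation); nodemask is NULL as the
 * text above recommends. Returns 0 on success, -1 on error.
 */
static long
restore_default_policy(void)
{
    return syscall(SYS_set_mempolicy, (long)MPOL_DEFAULT,
                   (unsigned long *)NULL, 0UL);
}
```

Since the policy is inherited across fork(2), a parent could call prefer_node0() before creating children and restore_default_policy() afterwards, leaving the children with the preferred-node policy. Pages that were already allocated are not affected by either call.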
|
||||
|
||||
.SH NOTES
|
||||
Process policy is not remembered if the page is swapped out.
|
||||
|
||||
.SH RETURN VALUE
|
||||
On success,
|
||||
.BR set_mempolicy ()
|
||||
returns 0;
|
||||
on error, \-1 is returned and
|
||||
.I errno
|
||||
is set to indicate the error.
|
||||
|
||||
.\" .SH ERRORS
|
||||
.\" FIXME writeme
|
||||
.\" .TP
|
||||
.\" .B EINVAL
|
||||
.\" .I mode is invalid.
|
||||
.SH "VERSIONS AND LIBRARY SUPPORT"
|
||||
See
|
||||
.BR mbind (2).
|
||||
|
||||
.SH SEE ALSO
|
||||
.BR mbind (2),
|
||||
.BR get_mempolicy (2),
|
||||
.BR numactl (8),
|
||||
.BR numa (3)
|