.\" This manpage is Copyright (C) 1992 Drew Eckhardt;
|
|
.\" and Copyright (C) 1993 Michael Haardt, Ian Jackson.
|
|
.\" and Copyright (C) 2008 Greg Banks
|
|
.\" and Copyright (C) 2006, 2008, 2013, 2014 Michael Kerrisk <mtk.manpages@gmail.com>
|
|
.\"
|
|
.\" %%%LICENSE_START(VERBATIM)
|
|
.\" Permission is granted to make and distribute verbatim copies of this
|
|
.\" manual provided the copyright notice and this permission notice are
|
|
.\" preserved on all copies.
|
|
.\"
|
|
.\" Permission is granted to copy and distribute modified versions of this
|
|
.\" manual under the conditions for verbatim copying, provided that the
|
|
.\" entire resulting derived work is distributed under the terms of a
|
|
.\" permission notice identical to this one.
|
|
.\"
|
|
.\" Since the Linux kernel and libraries are constantly changing, this
|
|
.\" manual page may be incorrect or out-of-date. The author(s) assume no
|
|
.\" responsibility for errors or omissions, or for damages resulting from
|
|
.\" the use of the information contained herein. The author(s) may not
|
|
.\" have taken the same level of care in the production of this manual,
|
|
.\" which is licensed free of charge, as they might when working
|
|
.\" professionally.
|
|
.\"
|
|
.\" Formatted or processed versions of this manual, if unaccompanied by
|
|
.\" the source, must acknowledge the copyright and authors of this work.
|
|
.\" %%%LICENSE_END
|
|
.\"
|
|
.\" Modified 1993-07-21 by Rik Faith <faith@cs.unc.edu>
|
|
.\" Modified 1994-08-21 by Michael Haardt
|
|
.\" Modified 1996-04-13 by Andries Brouwer <aeb@cwi.nl>
|
|
.\" Modified 1996-05-13 by Thomas Koenig
|
|
.\" Modified 1996-12-20 by Michael Haardt
|
|
.\" Modified 1999-02-19 by Andries Brouwer <aeb@cwi.nl>
|
|
.\" Modified 1998-11-28 by Joseph S. Myers <jsm28@hermes.cam.ac.uk>
|
|
.\" Modified 1999-06-03 by Michael Haardt
|
|
.\" Modified 2002-05-07 by Michael Kerrisk <mtk.manpages@gmail.com>
|
|
.\" Modified 2004-06-23 by Michael Kerrisk <mtk.manpages@gmail.com>
|
|
.\" 2004-12-08, mtk, reordered flags list alphabetically
|
|
.\" 2004-12-08, Martin Pool <mbp@sourcefrog.net> (& mtk), added O_NOATIME
|
|
.\" 2007-09-18, mtk, Added description of O_CLOEXEC + other minor edits
|
|
.\" 2008-01-03, mtk, with input from Trond Myklebust
|
|
.\" <trond.myklebust@fys.uio.no> and Timo Sirainen <tss@iki.fi>
|
|
.\" Rewrite description of O_EXCL.
|
|
.\" 2008-01-11, Greg Banks <gnb@melbourne.sgi.com>: add more detail
|
|
.\" on O_DIRECT.
|
|
.\" 2008-02-26, Michael Haardt: Reorganized text for O_CREAT and mode
|
|
.\"
|
|
.\" FIXME . Apr 08: The next POSIX revision has O_EXEC, O_SEARCH, and
|
|
.\" O_TTYINIT. Eventually these may need to be documented. --mtk
|
|
.\"
|
|
.TH OPEN 2 2014-12-31 "Linux" "Linux Programmer's Manual"
|
|
.SH NAME
|
|
open, openat, creat \- open and possibly create a file
|
|
.SH SYNOPSIS
|
|
.nf
.B #include <sys/types.h>
.B #include <sys/stat.h>
.B #include <fcntl.h>
.sp
.BI "int open(const char *" pathname ", int " flags );
.BI "int open(const char *" pathname ", int " flags ", mode_t " mode );

.BI "int creat(const char *" pathname ", mode_t " mode );
.sp
.BI "int openat(int " dirfd ", const char *" pathname ", int " flags );
.BI "int openat(int " dirfd ", const char *" pathname ", int " flags \
", mode_t " mode );
.fi
|
|
.sp
|
|
.in -4n
|
|
Feature Test Macro Requirements for glibc (see
|
|
.BR feature_test_macros (7)):
|
|
.in
|
|
.sp
|
|
.BR openat ():
|
|
.PD 0
|
|
.ad l
|
|
.RS 4
|
|
.TP 4
|
|
Since glibc 2.10:
|
|
_XOPEN_SOURCE\ >=\ 700 || _POSIX_C_SOURCE\ >=\ 200809L
|
|
.TP
|
|
Before glibc 2.10:
|
|
_ATFILE_SOURCE
|
|
.RE
|
|
.ad
|
|
.PD
|
|
.SH DESCRIPTION
|
|
Given a
|
|
.I pathname
|
|
for a file,
|
|
.BR open ()
|
|
returns a file descriptor, a small, nonnegative integer
|
|
for use in subsequent system calls
|
|
.RB ( read "(2), " write "(2), " lseek "(2), " fcntl "(2), etc.)."
|
|
The file descriptor returned by a successful call will be
|
|
the lowest-numbered file descriptor not currently open for the process.
|
|
.PP
|
|
By default, the new file descriptor is set to remain open across an
|
|
.BR execve (2)
|
|
(i.e., the
|
|
.B FD_CLOEXEC
|
|
file descriptor flag described in
|
|
.BR fcntl (2)
|
|
is initially disabled); the
|
|
.B O_CLOEXEC
|
|
flag, described below, can be used to change this default.
|
|
The file offset is set to the beginning of the file (see
|
|
.BR lseek (2)).
|
|
.PP
|
|
A call to
|
|
.BR open ()
|
|
creates a new
|
|
.IR "open file description" ,
|
|
an entry in the system-wide table of open files.
|
|
The open file description records the file offset and the file status flags
|
|
(see below).
|
|
A file descriptor is a reference to an open file description;
|
|
this reference is unaffected if
|
|
.I pathname
|
|
is subsequently removed or modified to refer to a different file.
|
|
For further details on open file descriptions, see NOTES.
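.PP
As a minimal sketch of the basic pattern described above,
the following opens a file read-only, reads from the initial offset,
and closes the descriptor again.
The helper name
.I read_prefix
is hypothetical and is used here only for illustration.
.in +4n
.nf
```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not part of any library): read up to len
   bytes from the start of path.  Returns the byte count from
   read(2), or -1 on error with errno set. */
ssize_t read_prefix(const char *path, void *buf, size_t len)
{
    int fd = open(path, O_RDONLY);      /* an access mode is mandatory */
    if (fd == -1)
        return -1;                      /* errno was set by open() */

    ssize_t n = read(fd, buf, len);     /* file offset starts at 0 */
    int saved_errno = errno;

    close(fd);                          /* release the descriptor */
    errno = saved_errno;                /* report read()'s error, not close()'s */
    return n;
}
```
.fi
.in
A caller would check the result for \-1 before using the buffer.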
|
|
.PP
|
|
The argument
|
|
.I flags
|
|
must include one of the following
|
|
.IR "access modes" :
|
|
.BR O_RDONLY ", " O_WRONLY ", or " O_RDWR .
|
|
These request opening the file read-only, write-only, or read/write,
|
|
respectively.
|
|
|
|
In addition, zero or more file creation flags and file status flags
|
|
can be
|
|
.RI bitwise- or 'd
|
|
in
|
|
.IR flags .
|
|
The
|
|
.I file creation flags
|
|
are
|
|
.BR O_CLOEXEC ,
|
|
.BR O_CREAT ,
|
|
.BR O_DIRECTORY ,
|
|
.BR O_EXCL ,
|
|
.BR O_NOCTTY ,
|
|
.BR O_NOFOLLOW ,
|
|
.BR O_TMPFILE ,
|
|
.BR O_TRUNC ,
|
|
and
|
|
.BR O_TTY_INIT .
|
|
The
|
|
.I file status flags
|
|
are all of the remaining flags listed below.
|
|
.\" SUSv4 divides the flags into:
|
|
.\" * Access mode
|
|
.\" * File creation
|
|
.\" * File status
|
|
.\" * Other (O_CLOEXEC, O_DIRECTORY, O_NOFOLLOW)
|
|
.\" though it's not clear what the difference between "other" and
|
|
.\" "File creation" flags is. I raised an Aardvark to see if this
|
|
.\" can be clarified in SUSv4; 10 Oct 2008.
|
|
.\" http://thread.gmane.org/gmane.comp.standards.posix.austin.general/64/focus=67
|
|
.\" TC1 (balloted in 2013), resolved this, so that those three constants
|
|
.\" are also categorized as file creation flags.
|
|
.\"
|
|
The distinction between these two groups of flags is that
|
|
the file status flags can be retrieved and (in some cases)
|
|
modified; see
|
|
.BR fcntl (2)
|
|
for details.
|
|
|
|
The full list of file creation flags and file status flags is as follows:
|
|
.TP
|
|
.B O_APPEND
|
|
The file is opened in append mode.
|
|
Before each
|
|
.BR write (2),
|
|
the file offset is positioned at the end of the file,
|
|
as if with
|
|
.BR lseek (2).
|
|
.B O_APPEND
|
|
may lead to corrupted files on NFS filesystems if more than one process
|
|
appends data to a file at once.
|
|
.\" For more background, see
|
|
.\" http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=453946
|
|
.\" http://nfs.sourceforge.net/
|
|
This is because NFS does not support
|
|
appending to a file, so the client kernel has to simulate it, which
|
|
can't be done without a race condition.
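.IP
The append behavior can be sketched as follows for a local filesystem.
The helper name
.I append_text
is hypothetical; the sketch assumes the text is short enough to be
written by a single
.BR write (2)
call.
.in +4n
.nf
```c
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: append text to path, creating the file if
   needed.  Returns the number of bytes written, or -1 on error. */
ssize_t append_text(const char *path, const char *text)
{
    /* O_APPEND moves the offset to end-of-file before each write */
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND,
                  S_IRUSR | S_IWUSR);
    if (fd == -1)
        return -1;

    ssize_t n = write(fd, text, strlen(text));
    close(fd);
    return n;
}
```
.fi
.in
Calling the helper twice on the same file leaves both pieces of text
in the file, the second after the first.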
|
|
.TP
|
|
.B O_ASYNC
|
|
Enable signal-driven I/O:
|
|
generate a signal
|
|
.RB ( SIGIO
|
|
by default, but this can be changed via
|
|
.BR fcntl (2))
|
|
when input or output becomes possible on this file descriptor.
|
|
This feature is available only for terminals, pseudoterminals,
|
|
sockets, and (since Linux 2.6) pipes and FIFOs.
|
|
See
|
|
.BR fcntl (2)
|
|
for further details.
|
|
See also BUGS, below.
|
|
.TP
|
|
.BR O_CLOEXEC " (since Linux 2.6.23)"
|
|
.\" NOTE! several other man pages refer to this text
|
|
Enable the close-on-exec flag for the new file descriptor.
|
|
Specifying this flag permits a program to avoid additional
|
|
.BR fcntl (2)
|
|
.B F_SETFD
|
|
operations to set the
|
|
.B FD_CLOEXEC
|
|
flag.
|
|
|
|
Note that the use of this flag is essential in some multithreaded programs,
|
|
because using a separate
|
|
.BR fcntl (2)
|
|
.B F_SETFD
|
|
operation to set the
|
|
.B FD_CLOEXEC
|
|
flag does not suffice to avoid race conditions
|
|
where one thread opens a file descriptor and
|
|
attempts to set its close-on-exec flag using
|
|
.BR fcntl (2)
|
|
at the same time as another thread does a
|
|
.BR fork (2)
|
|
plus
|
|
.BR execve (2).
|
|
Depending on the order of execution,
|
|
the race may lead to the file descriptor returned by
|
|
.BR open ()
|
|
being unintentionally leaked to the program executed by the child process
|
|
created by
|
|
.BR fork (2).
|
|
(This kind of race is in principle possible for any system call
|
|
that creates a file descriptor whose close-on-exec flag should be set,
|
|
and various other Linux system calls provide an equivalent of the
|
|
.BR O_CLOEXEC
|
|
flag to deal with this problem.)
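.IP
A minimal sketch of setting the flag atomically at open time,
and of checking it afterward with
.BR fcntl (2)
.BR F_GETFD ,
might look like this.
The function names
.I open_cloexec
and
.I has_cloexec
are hypothetical.
.in +4n
.nf
```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: open path read-only with close-on-exec set
   in the same step, so no window exists for fork()+execve(). */
int open_cloexec(const char *path)
{
    return open(path, O_RDONLY | O_CLOEXEC);
}

/* Hypothetical helper: 1 if FD_CLOEXEC is set on fd, 0 if it is
   clear, -1 on error. */
int has_cloexec(int fd)
{
    int fdflags = fcntl(fd, F_GETFD);
    if (fdflags == -1)
        return -1;
    return (fdflags & FD_CLOEXEC) != 0;
}
```
.fi
.in
A descriptor opened without
.B O_CLOEXEC
reports the flag as clear, matching the default described earlier.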
|
|
.\" This flag fixes only one form of the race condition;
|
|
.\" The race can also occur with, for example, descriptors
|
|
.\" returned by accept(), pipe(), etc.
|
|
.TP
|
|
.B O_CREAT
|
|
If the file does not exist, it will be created.
|
|
The owner (user ID) of the file is set to the effective user ID
|
|
of the process.
|
|
The group ownership (group ID) is set either to
|
|
the effective group ID of the process or to the group ID of the
|
|
parent directory (depending on filesystem type and mount options,
|
|
and the mode of the parent directory; see the mount options
|
|
.I bsdgroups
|
|
and
|
|
.I sysvgroups
|
|
described in
|
|
.BR mount (8)).
|
|
.\" As at 2.6.25, bsdgroups is supported by ext2, ext3, ext4, and
|
|
.\" XFS (since 2.6.14).
|
|
.RS
|
|
.PP
|
|
.I mode
|
|
specifies the permissions to use in case a new file is created.
|
|
This argument must be supplied when
|
|
.B O_CREAT
|
|
or
|
|
.B O_TMPFILE
|
|
is specified in
|
|
.IR flags ;
|
|
if neither
|
|
.B O_CREAT
|
|
nor
|
|
.B O_TMPFILE
|
|
is specified, then
|
|
.I mode
|
|
is ignored.
|
|
The effective permissions are modified by
|
|
the process's
|
|
.I umask
|
|
in the usual way: The permissions of the created file are
|
|
.IR "(mode\ &\ ~umask)" .
|
|
Note that this mode applies only to future accesses of the
|
|
newly created file; the
|
|
.BR open ()
|
|
call that creates a read-only file may well return a read/write
|
|
file descriptor.
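.PP
The effect of the umask can be illustrated with a short sketch.
The helper name
.I created_perms
is hypothetical; note that
.BR umask (2)
changes process-wide state, so this sketch is not suitable for
multithreaded programs.
.in +4n
.nf
```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create path with the given mode under the
   given umask and return the resulting permission bits, or
   (mode_t) -1 on error.  Fails if path already exists. */
mode_t created_perms(const char *path, mode_t mode, mode_t mask)
{
    mode_t old = umask(mask);           /* temporarily install mask */
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, mode);
    umask(old);                         /* restore previous umask */
    if (fd == -1)
        return (mode_t) -1;

    struct stat st;
    int rc = fstat(fd, &st);
    close(fd);
    return rc == 0 ? (st.st_mode & 07777) : (mode_t) -1;
}
```
.fi
.in
With a mode of 0666 and a umask of 022, the expected result on a
filesystem without default ACLs is 0644.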
|
|
.PP
|
|
The following symbolic constants are provided for
|
|
.IR mode :
|
|
.TP 9
|
|
.B S_IRWXU
|
|
00700 user (file owner) has read, write and execute permission
|
|
.TP
|
|
.B S_IRUSR
|
|
00400 user has read permission
|
|
.TP
|
|
.B S_IWUSR
|
|
00200 user has write permission
|
|
.TP
|
|
.B S_IXUSR
|
|
00100 user has execute permission
|
|
.TP
|
|
.B S_IRWXG
|
|
00070 group has read, write and execute permission
|
|
.TP
|
|
.B S_IRGRP
|
|
00040 group has read permission
|
|
.TP
|
|
.B S_IWGRP
|
|
00020 group has write permission
|
|
.TP
|
|
.B S_IXGRP
|
|
00010 group has execute permission
|
|
.TP
|
|
.B S_IRWXO
|
|
00007 others have read, write and execute permission
|
|
.TP
|
|
.B S_IROTH
|
|
00004 others have read permission
|
|
.TP
|
|
.B S_IWOTH
|
|
00002 others have write permission
|
|
.TP
|
|
.B S_IXOTH
|
|
00001 others have execute permission
|
|
.RE
|
|
.TP
|
|
.BR O_DIRECT " (since Linux 2.4.10)"
|
|
Try to minimize cache effects of the I/O to and from this file.
|
|
In general this will degrade performance, but it is useful in
|
|
special situations, such as when applications do their own caching.
|
|
File I/O is done directly to/from user-space buffers.
|
|
The
|
|
.B O_DIRECT
|
|
flag on its own makes an effort to transfer data synchronously,
|
|
but does not give the guarantees of the
|
|
.B O_SYNC
|
|
flag that data and necessary metadata are transferred.
|
|
To guarantee synchronous I/O,
|
|
.B O_SYNC
|
|
must be used in addition to
|
|
.BR O_DIRECT .
|
|
See NOTES below for further discussion.
|
|
.sp
|
|
A semantically similar (but deprecated) interface for block devices
|
|
is described in
|
|
.BR raw (8).
|
|
.TP
|
|
.B O_DIRECTORY
|
|
If \fIpathname\fP is not a directory, cause the open to fail.
|
|
.\" But see the following and its replies:
|
|
.\" http://marc.theaimsgroup.com/?t=112748702800001&r=1&w=2
|
|
.\" [PATCH] open: O_DIRECTORY and O_CREAT together should fail
|
|
.\" O_DIRECTORY | O_CREAT causes O_DIRECTORY to be ignored.
|
|
This flag was added in kernel version 2.1.126, to
|
|
avoid denial-of-service problems if
|
|
.BR opendir (3)
|
|
is called on a
|
|
FIFO or tape device.
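.IP
As a sketch, the flag can be used to test whether a pathname names
a directory without blocking on a FIFO or device.
The helper name
.I is_directory
is hypothetical.
.in +4n
.nf
```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: 1 if path can be opened as a directory,
   0 if it exists but is not a directory (ENOTDIR),
   -1 on any other error. */
int is_directory(const char *path)
{
    int fd = open(path, O_RDONLY | O_DIRECTORY);
    if (fd == -1)
        return errno == ENOTDIR ? 0 : -1;

    close(fd);
    return 1;
}
```
.fi
.in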
|
|
.TP
|
|
.B O_DSYNC
|
|
Write operations on the file will complete according to the requirements of
|
|
synchronized I/O
|
|
.I data
|
|
integrity completion.
|
|
|
|
By the time
|
|
.BR write (2)
|
|
(and similar)
|
|
return, the output data
|
|
has been transferred to the underlying hardware,
|
|
along with any file metadata that would be required to retrieve that data
|
|
(i.e., as though each
|
|
.BR write (2)
|
|
was followed by a call to
|
|
.BR fdatasync (2)).
|
|
.IR "See NOTES below" .
|
|
.TP
|
|
.B O_EXCL
|
|
Ensure that this call creates the file:
|
|
if this flag is specified in conjunction with
|
|
.BR O_CREAT ,
|
|
and
|
|
.I pathname
|
|
already exists, then
|
|
.BR open ()
|
|
will fail.
|
|
|
|
When these two flags are specified, symbolic links are not followed:
|
|
.\" POSIX.1-2001 explicitly requires this behavior.
|
|
if
|
|
.I pathname
|
|
is a symbolic link, then
|
|
.BR open ()
|
|
fails regardless of where the symbolic link points to.
|
|
|
|
In general, the behavior of
|
|
.B O_EXCL
|
|
is undefined if it is used without
|
|
.BR O_CREAT .
|
|
There is one exception: on Linux 2.6 and later,
|
|
.B O_EXCL
|
|
can be used without
|
|
.B O_CREAT
|
|
if
|
|
.I pathname
|
|
refers to a block device.
|
|
If the block device is in use by the system (e.g., mounted),
|
|
.BR open ()
|
|
fails with the error
|
|
.BR EBUSY .
|
|
|
|
On NFS,
|
|
.B O_EXCL
|
|
is supported only when using NFSv3 or later on kernel 2.6 or later.
|
|
In NFS environments where
|
|
.B O_EXCL
|
|
support is not provided, programs that rely on it
|
|
for performing locking tasks will contain a race condition.
|
|
Portable programs that want to perform atomic file locking using a lockfile,
|
|
and need to avoid reliance on NFS support for
|
|
.BR O_EXCL ,
|
|
can create a unique file on
|
|
the same filesystem (e.g., incorporating hostname and PID), and use
|
|
.BR link (2)
|
|
to make a link to the lockfile.
|
|
If
|
|
.BR link (2)
|
|
returns 0, the lock is successful.
|
|
Otherwise, use
|
|
.BR stat (2)
|
|
on the unique file to check if its link count has increased to 2,
|
|
in which case the lock is also successful.
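.IP
The
.BR link (2)-based
locking steps above can be sketched as follows.
The function name
.I lock_via_link
is hypothetical, and the caller is assumed to supply a unique
filename on the same filesystem as the lockfile.
.in +4n
.nf
```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: try to take lockfile by linking a unique
   file to it.  Returns 1 if the lock was obtained, 0 if it is
   held by someone else, -1 if the unique file cannot be created. */
int lock_via_link(const char *lockfile, const char *uniq)
{
    int fd = open(uniq, O_WRONLY | O_CREAT | O_EXCL, S_IRUSR | S_IWUSR);
    if (fd == -1)
        return -1;
    close(fd);

    int locked = 0;
    if (link(uniq, lockfile) == 0) {
        locked = 1;                     /* link() succeeded outright */
    } else {
        /* link() may report failure even though the link was made;
           a link count of 2 on the unique file shows it was. */
        struct stat st;
        if (stat(uniq, &st) == 0 && st.st_nlink == 2)
            locked = 1;
    }

    unlink(uniq);                       /* the lockfile name remains */
    return locked;
}
```
.fi
.in
The lock is released by removing the lockfile with
.BR unlink (2).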
|
|
.TP
|
|
.B O_LARGEFILE
|
|
(LFS)
|
|
Allow files whose sizes cannot be represented in an
|
|
.I off_t
|
|
(but can be represented in an
|
|
.IR off64_t )
|
|
to be opened.
|
|
The
|
|
.B _LARGEFILE64_SOURCE
|
|
macro must be defined
|
|
(before including
|
|
.I any
|
|
header files)
|
|
in order to obtain this definition.
|
|
Setting the
|
|
.B _FILE_OFFSET_BITS
|
|
feature test macro to 64 (rather than using
|
|
.BR O_LARGEFILE )
|
|
is the preferred
|
|
method of accessing large files on 32-bit systems (see
|
|
.BR feature_test_macros (7)).
|
|
.TP
|
|
.BR O_NOATIME " (since Linux 2.6.8)"
|
|
Do not update the file last access time
|
|
.RI ( st_atime
|
|
in the inode)
|
|
when the file is
|
|
.BR read (2).
|
|
This flag is intended for use by indexing or backup programs,
|
|
where its use can significantly reduce the amount of disk activity.
|
|
This flag may not be effective on all filesystems.
|
|
One example is NFS, where the server maintains the access time.
|
|
.\" The O_NOATIME flag also affects the treatment of st_atime
|
|
.\" by mmap() and readdir(2), MTK, Dec 04.
|
|
.TP
|
|
.B O_NOCTTY
|
|
If
|
|
.I pathname
|
|
refers to a terminal device\(emsee
|
|
.BR tty (4)\(emit
|
|
will not become the process's controlling terminal even if the
|
|
process does not have one.
|
|
.TP
|
|
.B O_NOFOLLOW
|
|
If \fIpathname\fP is a symbolic link, then the open fails.
|
|
This is a FreeBSD extension, which was added to Linux in version 2.1.126.
|
|
Symbolic links in earlier components of the pathname will still be
|
|
followed.
|
|
See also
|
|
.BR O_PATH
|
|
below.
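.IP
A brief sketch of refusing to follow a symbolic link in the final
pathname component is shown here.
The helper name
.I open_nofollow
is hypothetical.
.in +4n
.nf
```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: open path read-only, failing with ELOOP
   if the final component is a symbolic link.
   Returns the descriptor, or -1 with errno set. */
int open_nofollow(const char *path)
{
    return open(path, O_RDONLY | O_NOFOLLOW);
}
```
.fi
.in
A caller can compare
.I errno
against
.B ELOOP
to distinguish this case from other failures.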
|
|
.\" The headers from glibc 2.0.100 and later include a
|
|
.\" definition of this flag; \fIkernels before 2.1.126 will ignore it if
|
|
.\" used\fP.
|
|
.TP
|
|
.BR O_NONBLOCK " or " O_NDELAY
|
|
When possible, the file is opened in nonblocking mode.
|
|
Neither the
|
|
.BR open ()
|
|
nor any subsequent operations on the file descriptor which is
|
|
returned will cause the calling process to wait.
|
|
For the handling of FIFOs (named pipes), see also
|
|
.BR fifo (7).
|
|
For a discussion of the effect of
|
|
.B O_NONBLOCK
|
|
in conjunction with mandatory file locks and with file leases, see
|
|
.BR fcntl (2).
|
|
.TP
|
|
.BR O_PATH " (since Linux 2.6.39)"
|
|
.\" commit 1abf0c718f15a56a0a435588d1b104c7a37dc9bd
|
|
.\" commit 326be7b484843988afe57566b627fb7a70beac56
|
|
.\" commit 65cfc6722361570bfe255698d9cd4dccaf47570d
|
|
.\"
|
|
.\" http://thread.gmane.org/gmane.linux.man/2790/focus=3496
|
|
.\" Subject: Re: [PATCH] open(2): document O_PATH
|
|
.\" Newsgroups: gmane.linux.man, gmane.linux.kernel
|
|
.\"
|
|
Obtain a file descriptor that can be used for two purposes:
|
|
to indicate a location in the filesystem tree and
|
|
to perform operations that act purely at the file descriptor level.
|
|
The file itself is not opened, and other file operations (e.g.,
|
|
.BR read (2),
|
|
.BR write (2),
|
|
.BR fchmod (2),
|
|
.BR fchown (2),
|
|
.BR fgetxattr (2),
|
|
.BR mmap (2))
|
|
fail with the error
|
|
.BR EBADF .
|
|
|
|
The following operations
|
|
.I can
|
|
be performed on the resulting file descriptor:
|
|
.RS
|
|
.IP * 3
|
|
.BR close (2);
|
|
.BR fchdir (2)
|
|
(since Linux 3.5);
|
|
.\" commit 332a2e1244bd08b9e3ecd378028513396a004a24
|
|
.BR fstat (2)
|
|
(since Linux 3.6).
|
|
.\" fstat(): commit 55815f70147dcfa3ead5738fd56d3574e2e3c1c2
|
|
.IP *
|
|
Duplicating the file descriptor
|
|
.RB ( dup (2),
|
|
.BR fcntl (2)
|
|
.BR F_DUPFD ,
|
|
etc.).
|
|
.IP *
|
|
Getting and setting file descriptor flags
|
|
.RB ( fcntl (2)
|
|
.BR F_GETFD
|
|
and
|
|
.BR F_SETFD ).
|
|
.IP *
|
|
Retrieving open file status flags using the
|
|
.BR fcntl (2)
|
|
.BR F_GETFL
|
|
operation: the returned flags will include the bit
|
|
.BR O_PATH .
|
|
.IP *
|
|
Passing the file descriptor as the
|
|
.IR dirfd
|
|
argument of
|
|
.BR openat (2)
|
|
and the other "*at()" system calls.
|
|
This includes
|
|
.BR linkat (2)
|
|
with
|
|
.BR AT_EMPTY_PATH
|
|
(or via procfs using
|
|
.BR AT_SYMLINK_FOLLOW )
|
|
even if the file is not a directory.
|
|
.IP *
|
|
Passing the file descriptor to another process via a UNIX domain socket
|
|
(see
|
|
.BR SCM_RIGHTS
|
|
in
|
|
.BR unix (7)).
|
|
.RE
|
|
.IP
|
|
When
|
|
.B O_PATH
|
|
is specified in
|
|
.IR flags ,
|
|
flag bits other than
|
|
.BR O_CLOEXEC ,
|
|
.BR O_DIRECTORY ,
|
|
and
|
|
.BR O_NOFOLLOW
|
|
are ignored.
|
|
|
|
If
|
|
.I pathname
|
|
is a symbolic link and the
|
|
.BR O_NOFOLLOW
|
|
flag is also specified,
|
|
then the call returns a file descriptor referring to the symbolic link.
|
|
This file descriptor can be used as the
|
|
.I dirfd
|
|
argument in calls to
|
|
.BR fchownat (2),
|
|
.BR fstatat (2),
|
|
.BR linkat (2),
|
|
and
|
|
.BR readlinkat (2)
|
|
with an empty pathname to have the calls operate on the symbolic link.
|
|
.TP
|
|
.B O_SYNC
|
|
Write operations on the file will complete according to the requirements of
|
|
synchronized I/O
|
|
.I file
|
|
integrity completion
|
|
(by contrast with the
|
|
synchronized I/O
|
|
.I data
|
|
integrity completion
|
|
provided by
|
|
.BR O_DSYNC .)
|
|
|
|
By the time
|
|
.BR write (2)
|
|
(and similar)
|
|
return, the output data and associated file metadata
|
|
have been transferred to the underlying hardware
|
|
(i.e., as though each
|
|
.BR write (2)
|
|
was followed by a call to
|
|
.BR fsync (2)).
|
|
.IR "See NOTES below" .
|
|
.TP
|
|
.BR O_TMPFILE " (since Linux 3.11)"
|
|
.\" commit 60545d0d4610b02e55f65d141c95b18ccf855b6e
|
|
.\" commit f4e0c30c191f87851c4a53454abb55ee276f4a7e
|
|
.\" commit bb458c644a59dbba3a1fe59b27106c5e68e1c4bd
|
|
Create an unnamed temporary file.
|
|
The
|
|
.I pathname
|
|
argument specifies a directory;
|
|
an unnamed inode will be created in that directory's filesystem.
|
|
Anything written to the resulting file will be lost when
|
|
the last file descriptor is closed, unless the file is given a name.
|
|
|
|
.B O_TMPFILE
|
|
must be specified with one of
|
|
.B O_RDWR
|
|
or
|
|
.B O_WRONLY
|
|
and, optionally,
|
|
.BR O_EXCL .
|
|
If
|
|
.B O_EXCL
|
|
is not specified, then
|
|
.BR linkat (2)
|
|
can be used to link the temporary file into the filesystem, making it
|
|
permanent, using code like the following:
|
|
|
|
.in +4n
.nf
char path[PATH_MAX];
fd = open("/path/to/dir", O_TMPFILE | O_RDWR,
                        S_IRUSR | S_IWUSR);

/* File I/O on 'fd'... */

snprintf(path, PATH_MAX, "/proc/self/fd/%d", fd);
linkat(AT_FDCWD, path, AT_FDCWD, "/path/for/file",
                        AT_SYMLINK_FOLLOW);
.fi
.in
|
|
|
|
In this case,
|
|
the
|
|
.BR open ()
|
|
.I mode
|
|
argument determines the file permission mode, as with
|
|
.BR O_CREAT .
|
|
|
|
Specifying
|
|
.B O_EXCL
|
|
in conjunction with
|
|
.B O_TMPFILE
|
|
prevents a temporary file from being linked into the filesystem
|
|
in the above manner.
|
|
(Note that the meaning of
|
|
.B O_EXCL
|
|
in this case is different from the meaning of
|
|
.B O_EXCL
|
|
otherwise.)
|
|
|
|
|
|
There are two main use cases for
|
|
.\" Inspired by http://lwn.net/Articles/559147/
|
|
.BR O_TMPFILE :
|
|
.RS
|
|
.IP * 3
|
|
Improved
|
|
.BR tmpfile (3)
|
|
functionality: race-free creation of temporary files that
|
|
(1) are automatically deleted when closed;
|
|
(2) can never be reached via any pathname;
|
|
(3) are not subject to symlink attacks; and
|
|
(4) do not require the caller to devise unique names.
|
|
.IP *
|
|
Creating a file that is initially invisible, which is then populated
|
|
with data and adjusted to have appropriate filesystem attributes
|
|
.RB ( chown (2),
|
|
.BR chmod (2),
|
|
.BR fsetxattr (2),
|
|
etc.)
|
|
before being atomically linked into the filesystem
|
|
in a fully formed state (using
|
|
.BR linkat (2)
|
|
as described above).
|
|
.RE
|
|
.IP
|
|
.B O_TMPFILE
|
|
requires support by the underlying filesystem;
|
|
only a subset of Linux filesystems provide that support.
|
|
In the initial implementation, support was provided in
|
|
the ext2, ext3, ext4, UDF, Minix, and shmem filesystems.
|
|
XFS support was added
|
|
.\" commit 99b6436bc29e4f10e4388c27a3e4810191cc4788
|
|
.\" commit ab29743117f9f4c22ac44c13c1647fb24fb2bafe
|
|
in Linux 3.15.
|
|
.TP
|
|
.B O_TRUNC
|
|
If the file already exists and is a regular file and the access mode allows
|
|
writing (i.e., is
|
|
.B O_RDWR
|
|
or
|
|
.BR O_WRONLY ),
|
|
it will be truncated to length 0.
|
|
If the file is a FIFO or terminal device file, the
|
|
.B O_TRUNC
|
|
flag is ignored.
|
|
Otherwise, the effect of
|
|
.B O_TRUNC
|
|
is unspecified.
|
|
.SS creat()
|
|
.BR creat ()
|
|
is equivalent to
|
|
.BR open ()
|
|
with
|
|
.I flags
|
|
equal to
|
|
.BR O_CREAT|O_WRONLY|O_TRUNC .
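.PP
The equivalence can be written out as a one-line wrapper.
The name
.I creat_via_open
is hypothetical and is used only to avoid clashing with the real
.BR creat ().
.in +4n
.nf
```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for creat(): create path if needed,
   truncate it to length 0, and open it write-only. */
int creat_via_open(const char *path, mode_t mode)
{
    return open(path, O_CREAT | O_WRONLY | O_TRUNC, mode);
}
```
.fi
.in
Calling it on an existing regular file discards the file's
previous contents.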
|
|
.SS openat()
|
|
The
|
|
.BR openat ()
|
|
system call operates in exactly the same way as
|
|
.BR open (),
|
|
except for the differences described here.
|
|
|
|
If the pathname given in
|
|
.I pathname
|
|
is relative, then it is interpreted relative to the directory
|
|
referred to by the file descriptor
|
|
.I dirfd
|
|
(rather than relative to the current working directory of
|
|
the calling process, as is done by
|
|
.BR open ()
|
|
for a relative pathname).
|
|
|
|
If
|
|
.I pathname
|
|
is relative and
|
|
.I dirfd
|
|
is the special value
|
|
.BR AT_FDCWD ,
|
|
then
|
|
.I pathname
|
|
is interpreted relative to the current working
|
|
directory of the calling process (like
|
|
.BR open ()).
|
|
|
|
If
|
|
.I pathname
|
|
is absolute, then
|
|
.I dirfd
|
|
is ignored.
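.PP
A minimal sketch of resolving a relative pathname against a
directory file descriptor follows.
The helper name
.I open_in_dir
is hypothetical.
.in +4n
.nf
```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: open name read-only, interpreting it
   relative to the directory dirpath rather than the current
   working directory.  Returns a descriptor, or -1 with errno set. */
int open_in_dir(const char *dirpath, const char *name)
{
    int dirfd = open(dirpath, O_RDONLY | O_DIRECTORY);
    if (dirfd == -1)
        return -1;

    int fd = openat(dirfd, name, O_RDONLY);
    int saved_errno = errno;

    close(dirfd);                       /* fd stays valid afterward */
    errno = saved_errno;
    return fd;
}
```
.fi
.in
Passing
.B AT_FDCWD
as
.I dirfd
instead gives the same pathname resolution as
.BR open ().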
|
|
.SH RETURN VALUE
|
|
.BR open (),
|
|
.BR openat (),
|
|
and
|
|
.BR creat ()
|
|
return the new file descriptor, or \-1 if an error occurred
|
|
(in which case,
|
|
.I errno
|
|
is set appropriately).
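.PP
One common way to act on the return value is sketched below:
the call is retried when it fails with
.BR EINTR ,
and any other failure is passed back to the caller.
The helper name
.I open_retry
is hypothetical, and the sketch assumes that
.I flags
does not include
.B O_CREAT
or
.BR O_TMPFILE ,
since no
.I mode
is supplied.
.in +4n
.nf
```c
#include <errno.h>
#include <fcntl.h>

/* Hypothetical helper: open path, restarting the call if it is
   interrupted by a signal handler.  Returns the new descriptor,
   or -1 with errno describing the failure. */
int open_retry(const char *path, int flags)
{
    int fd;

    do {
        fd = open(path, flags);
    } while (fd == -1 && errno == EINTR);

    return fd;
}
```
.fi
.in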
|
|
.SH ERRORS
|
|
.BR open (),
|
|
.BR openat (),
|
|
and
|
|
.BR creat ()
|
|
can fail with the following errors:
|
|
.TP
|
|
.B EACCES
|
|
The requested access to the file is not allowed, or search permission
|
|
is denied for one of the directories in the path prefix of
|
|
.IR pathname ,
|
|
or the file did not exist yet and write access to the parent directory
|
|
is not allowed.
|
|
(See also
|
|
.BR path_resolution (7).)
|
|
.TP
|
|
.B EDQUOT
|
|
Where
|
|
.B O_CREAT
|
|
is specified, the file does not exist, and the user's quota of disk
|
|
blocks or inodes on the filesystem has been exhausted.
|
|
.TP
|
|
.B EEXIST
|
|
.I pathname
|
|
already exists and
|
|
.BR O_CREAT " and " O_EXCL
|
|
were used.
|
|
.TP
|
|
.B EFAULT
|
|
.I pathname
|
|
points outside your accessible address space.
|
|
.TP
|
|
.B EFBIG
|
|
See
|
|
.BR EOVERFLOW .
|
|
.TP
|
|
.B EINTR
|
|
While blocked waiting to complete an open of a slow device
|
|
(e.g., a FIFO; see
|
|
.BR fifo (7)),
|
|
the call was interrupted by a signal handler; see
|
|
.BR signal (7).
|
|
.TP
|
|
.B EINVAL
|
|
The filesystem does not support the
|
|
.BR O_DIRECT
|
|
flag.
|
|
See
|
|
.BR NOTES
|
|
for more information.
|
|
.TP
|
|
.B EINVAL
|
|
Invalid value in
|
|
.\" In particular, __O_TMPFILE instead of O_TMPFILE
|
|
.IR flags .
|
|
.TP
|
|
.B EINVAL
|
|
.B O_TMPFILE
|
|
was specified in
|
|
.IR flags ,
|
|
but neither
|
|
.B O_WRONLY
|
|
nor
|
|
.B O_RDWR
|
|
was specified.
|
|
.TP
|
|
.B EISDIR
|
|
.I pathname
|
|
refers to a directory and the access requested involved writing
|
|
(that is,
|
|
.B O_WRONLY
|
|
or
|
|
.B O_RDWR
|
|
is set).
|
|
.TP
|
|
.B EISDIR
|
|
.I pathname
|
|
refers to an existing directory,
|
|
.B O_TMPFILE
|
|
and one of
|
|
.B O_WRONLY
|
|
or
|
|
.B O_RDWR
|
|
were specified in
|
|
.IR flags ,
|
|
but this kernel version does not provide the
|
|
.B O_TMPFILE
|
|
functionality.
|
|
.TP
|
|
.B ELOOP
|
|
Too many symbolic links were encountered in resolving
|
|
.IR pathname .
|
|
.TP
|
|
.B ELOOP
|
|
.I pathname
|
|
was a symbolic link, and
|
|
.I flags
|
|
specified
|
|
.BR O_NOFOLLOW
|
|
but not
|
|
.BR O_PATH .
|
|
.TP
|
|
.B EMFILE
|
|
The process already has the maximum number of files open
|
|
(see the description of
|
|
.BR RLIMIT_NOFILE
|
|
in
|
|
.BR getrlimit (2)).
|
|
.TP
|
|
.B ENAMETOOLONG
|
|
.I pathname
|
|
was too long.
|
|
.TP
|
|
.B ENFILE
|
|
The system limit on the total number of open files has been reached.
|
|
.TP
|
|
.B ENODEV
|
|
.I pathname
|
|
refers to a device special file and no corresponding device exists.
|
|
(This is a Linux kernel bug; in this situation
|
|
.B ENXIO
|
|
must be returned.)
|
|
.TP
|
|
.B ENOENT
|
|
.B O_CREAT
|
|
is not set and the named file does not exist.
|
|
Or, a directory component in
|
|
.I pathname
|
|
does not exist or is a dangling symbolic link.
|
|
.TP
|
|
.B ENOENT
|
|
.I pathname
|
|
refers to a nonexistent directory,
|
|
.B O_TMPFILE
|
|
and one of
|
|
.B O_WRONLY
|
|
or
|
|
.B O_RDWR
|
|
were specified in
|
|
.IR flags ,
|
|
but this kernel version does not provide the
|
|
.B O_TMPFILE
|
|
functionality.
|
|
.TP
|
|
.B ENOMEM
|
|
Insufficient kernel memory was available.
|
|
.TP
|
|
.B ENOSPC
|
|
.I pathname
|
|
was to be created but the device containing
|
|
.I pathname
|
|
has no room for the new file.
|
|
.TP
|
|
.B ENOTDIR
|
|
A component used as a directory in
|
|
.I pathname
|
|
is not, in fact, a directory, or \fBO_DIRECTORY\fP was specified and
|
|
.I pathname
|
|
was not a directory.
|
|
.TP
|
|
.B ENXIO
|
|
.BR O_NONBLOCK " | " O_WRONLY
|
|
is set, the named file is a FIFO, and
|
|
no process has the FIFO open for reading.
|
|
Or, the file is a device special file and no corresponding device exists.
|
|
.TP
|
|
.BR EOPNOTSUPP
|
|
The filesystem containing
|
|
.I pathname
|
|
does not support
|
|
.BR O_TMPFILE .
|
|
.TP
|
|
.B EOVERFLOW
|
|
.I pathname
|
|
refers to a regular file that is too large to be opened.
|
|
The usual scenario here is that an application compiled
|
|
on a 32-bit platform without
|
|
.I \-D_FILE_OFFSET_BITS=64
|
|
tried to open a file whose size exceeds
|
|
.I (1<<31)-1
|
|
bytes;
|
|
see also
|
|
.B O_LARGEFILE
|
|
above.
|
|
This is the error specified by POSIX.1-2001;
|
|
in kernels before 2.6.24, Linux gave the error
|
|
.B EFBIG
|
|
for this case.
|
|
.\" See http://bugzilla.kernel.org/show_bug.cgi?id=7253
|
|
.\" "Open of a large file on 32-bit fails with EFBIG, should be EOVERFLOW"
|
|
.\" Reported 2006-10-03
|
|
.TP
|
|
.B EPERM
|
|
The
|
|
.B O_NOATIME
|
|
flag was specified, but the effective user ID of the caller
|
|
.\" Strictly speaking, it's the filesystem UID... (MTK)
|
|
did not match the owner of the file and the caller was not privileged
|
|
.RB ( CAP_FOWNER ).
|
|
.TP
|
|
.B EPERM
|
|
The operation was prevented by a file seal; see
|
|
.BR fcntl (2).
|
|
.TP
|
|
.B EROFS
|
|
.I pathname
|
|
refers to a file on a read-only filesystem and write access was
|
|
requested.
|
|
.TP
|
|
.B ETXTBSY
|
|
.I pathname
|
|
refers to an executable image which is currently being executed and
|
|
write access was requested.
|
|
.TP
|
|
.B EWOULDBLOCK
|
|
The
|
|
.B O_NONBLOCK
|
|
flag was specified, and an incompatible lease was held on the file
|
|
(see
|
|
.BR fcntl (2)).
|
|
.PP
|
|
The following additional errors can occur for
|
|
.BR openat ():
|
|
.TP
|
|
.B EBADF
|
|
.I dirfd
|
|
is not a valid file descriptor.
|
|
.TP
|
|
.B ENOTDIR
|
|
.I pathname
|
|
is a relative pathname and
|
|
.I dirfd
|
|
is a file descriptor referring to a file other than a directory.
|
|
.SH VERSIONS
|
|
.BR openat ()
|
|
was added to Linux in kernel 2.6.16;
|
|
library support was added to glibc in version 2.4.
|
|
.SH CONFORMING TO
|
|
.BR open (),
|
|
.BR creat ():
|
|
SVr4, 4.3BSD, POSIX.1-2001, POSIX.1-2008.
|
|
|
|
.BR openat ():
|
|
POSIX.1-2008.
|
|
|
|
The
|
|
.BR O_DIRECT ,
|
|
.BR O_NOATIME ,
|
|
.BR O_PATH ,
|
|
and
|
|
.BR O_TMPFILE
|
|
flags are Linux-specific.
|
|
One must define
|
|
.B _GNU_SOURCE
|
|
to obtain their definitions.
|
|
|
|
The
|
|
.BR O_CLOEXEC ,
|
|
.BR O_DIRECTORY ,
|
|
and
|
|
.BR O_NOFOLLOW
|
|
flags are not specified in POSIX.1-2001,
|
|
but are specified in POSIX.1-2008.
|
|
Since glibc 2.12, one can obtain their definitions by defining either
|
|
.B _POSIX_C_SOURCE
|
|
with a value greater than or equal to 200809L or
|
|
.BR _XOPEN_SOURCE
|
|
with a value greater than or equal to 700.
|
|
In glibc 2.11 and earlier, one obtains the definitions by defining
|
|
.BR _GNU_SOURCE .
|
|
|
|
As noted in
|
|
.BR feature_test_macros (7),
|
|
feature test macros such as
|
|
.BR _POSIX_C_SOURCE ,
|
|
.BR _XOPEN_SOURCE ,
|
|
and
|
|
.B _GNU_SOURCE
|
|
must be defined before including
|
|
.I any
|
|
header files.
|
|
.SH NOTES
|
|
Under Linux, the
|
|
.B O_NONBLOCK
|
|
flag indicates that one wants to open the file
but does not necessarily intend to read or write.
|
|
This is typically used to open devices in order to get a file descriptor
|
|
for use with
|
|
.BR ioctl (2).
|
|
|
|
.LP
|
|
The (undefined) effect of
|
|
.B O_RDONLY | O_TRUNC
|
|
varies among implementations.
|
|
On many systems the file is actually truncated.
|
|
.\" Linux 2.0, 2.5: truncate
|
|
.\" Solaris 5.7, 5.8: truncate
|
|
.\" Irix 6.5: truncate
|
|
.\" Tru64 5.1B: truncate
|
|
.\" HP-UX 11.22: truncate
|
|
.\" FreeBSD 4.7: truncate
|
|
|
|
Note that
|
|
.BR open ()
|
|
can open device special files, but
|
|
.BR creat ()
|
|
cannot create them; use
|
|
.BR mknod (2)
|
|
instead.
|
|
|
|
If the file is newly created, its
|
|
.IR st_atime ,
|
|
.IR st_ctime ,
|
|
.I st_mtime
|
|
fields
|
|
(respectively, time of last access, time of last status change, and
|
|
time of last modification; see
|
|
.BR stat (2))
|
|
are set
|
|
to the current time, and so are the
|
|
.I st_ctime
|
|
and
|
|
.I st_mtime
|
|
fields of the
|
|
parent directory.
|
|
Otherwise, if the file is modified because of the
|
|
.B O_TRUNC
|
|
flag, its
.I st_ctime
and
.I st_mtime
fields are set to the current time.
|
|
.\"
|
|
.\"
|
|
.SS Open file descriptions
|
|
The term open file description is the one used by POSIX to refer to the
|
|
entries in the system-wide table of open files.
|
|
In other contexts, this object is
|
|
variously also called an "open file object",
|
|
a "file handle", an "open file table entry",
|
|
or\(emin kernel-developer parlance\(ema
|
|
.IR "struct file" .
|
|
|
|
When a file descriptor is duplicated (using
|
|
.BR dup (2)
|
|
or similar),
|
|
the duplicate refers to the same open file description
|
|
as the original file descriptor,
|
|
and the two file descriptors consequently share
|
|
the file offset and file status flags.
|
|
Such sharing can also occur between processes:
|
|
a child process created via
|
|
.BR fork (2)
|
|
inherits duplicates of its parent's file descriptors,
|
|
and those duplicates refer to the same open file descriptions.
|
|
|
|
Each
|
|
.BR open (2)
|
|
of a file creates a new open file description;
|
|
thus, there may be multiple open file descriptions
|
|
corresponding to a file inode.
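.PP
The sharing rules above can be observed with
\fBlseek\fP(2).
The following sketch assumes a seekable regular file;
both function names are hypothetical.
The first checks that a duplicate sees an offset change made through
the original descriptor; the second checks that a separate
\fBopen\fP() of the same path does not.
.in +4n
.nf
```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Returns 1 if a dup(2) of fd sees a seek made through fd (shared
 * open file description), 0 if it does not, or -1 on error.
 */
int
dup_shares_offset(int fd)
{
    int copy = dup(fd);
    if (copy == -1)
        return -1;

    off_t moved = lseek(fd, 5, SEEK_SET);   /* move via the original */
    off_t seen = lseek(copy, 0, SEEK_CUR);  /* query via the duplicate */
    close(copy);

    if (moved == -1 || seen == -1)
        return -1;
    return seen == moved;
}

/*
 * Returns 1 if a second open() of path has its own offset (a
 * separate open file description), 0 if not, or -1 on error.
 */
int
reopen_has_own_offset(const char *path)
{
    int a = open(path, O_RDONLY);
    int b = open(path, O_RDONLY);
    int result = -1;

    if (a != -1 && b != -1 && lseek(a, 5, SEEK_SET) == 5) {
        off_t seen = lseek(b, 0, SEEK_CUR);
        if (seen != -1)
            result = (seen == 0);
    }
    if (a != -1)
        close(a);
    if (b != -1)
        close(b);
    return result;
}
```
.fi
.in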
|
|
.\"
|
|
.\"
|
|
.SS Synchronized I/O
|
|
The POSIX.1-2008 "synchronized I/O" option
|
|
specifies different variants of synchronized I/O,
|
|
and specifies the
|
|
.BR open ()
|
|
flags
|
|
.BR O_SYNC ,
|
|
.BR O_DSYNC ,
|
|
and
|
|
.BR O_RSYNC
|
|
for controlling the behavior.
|
|
Regardless of whether an implementation supports this option,
|
|
it must at least support the use of
|
|
.BR O_SYNC
|
|
for regular files.
|
|
|
|
Linux implements
|
|
.BR O_SYNC
|
|
and
|
|
.BR O_DSYNC ,
|
|
but not
|
|
.BR O_RSYNC .
|
|
(Somewhat incorrectly, glibc defines
|
|
.BR O_RSYNC
|
|
to have the same value as
|
|
.BR O_SYNC .)
|
|
|
|
.BR O_SYNC
|
|
provides synchronized I/O
|
|
.I file
|
|
integrity completion,
|
|
meaning write operations will flush data and all associated metadata
|
|
to the underlying hardware.
|
|
.BR O_DSYNC
|
|
provides synchronized I/O
|
|
.I data
|
|
integrity completion,
|
|
meaning write operations will flush data
|
|
to the underlying hardware,
|
|
but will only flush metadata updates that are required
|
|
to allow a subsequent read operation to complete successfully.
|
|
Data integrity completion can reduce the number of disk operations
|
|
that are required for applications that don't need the guarantees
|
|
of file integrity completion.
|
|
|
|
To understand the difference between the two types of completion,
|
|
consider two pieces of file metadata:
|
|
the file's last modification timestamp
|
|
.RI ( st_mtime )
|
|
and the file length.
|
|
All write operations will update the last modification timestamp,
|
|
but only writes that add data to the end of the
|
|
file will change the file length.
|
|
The last modification timestamp is not needed to ensure that
|
|
a read completes successfully, but the file length is.
|
|
Thus,
|
|
.BR O_DSYNC
|
|
would only guarantee to flush updates to the file length metadata
|
|
(whereas
|
|
.BR O_SYNC
|
|
would also always flush the last modification timestamp metadata).
|
|
|
|
Before Linux 2.6.33, Linux implemented only the
|
|
.BR O_SYNC
|
|
flag for
|
|
.BR open ().
|
|
However, when that flag was specified,
|
|
most filesystems actually provided the equivalent of synchronized I/O
|
|
.I data
|
|
integrity completion (i.e.,
|
|
.BR O_SYNC
|
|
was actually implemented as the equivalent of
|
|
.BR O_DSYNC ).
|
|
|
|
Since Linux 2.6.33, proper
|
|
.BR O_SYNC
|
|
support is provided.
|
|
However, to ensure backward binary compatibility,
|
|
.BR O_DSYNC
|
|
was defined with the same value as the historical
|
|
.BR O_SYNC ,
|
|
and
|
|
.BR O_SYNC
|
|
was defined as a new (two-bit) flag value that includes the
|
|
.BR O_DSYNC
|
|
flag value.
|
|
This ensures that applications compiled against
|
|
new headers get at least
|
|
.BR O_DSYNC
|
|
semantics on pre-2.6.33 kernels.
|
|
.\"
|
|
.\"
|
|
.SS NFS
|
|
There are many infelicities in the protocol underlying NFS, affecting
|
|
amongst others
|
|
.BR O_SYNC " and " O_NDELAY .
|
|
|
|
On NFS filesystems with UID mapping enabled,
|
|
.BR open ()
|
|
may
|
|
return a file descriptor but, for example,
|
|
.BR read (2)
|
|
requests are denied
|
|
with \fBEACCES\fP.
|
|
This is because the client performs
|
|
.BR open ()
|
|
by checking the
|
|
permissions, but UID mapping is performed by the server upon
|
|
read and write requests.
|
|
.\"
|
|
.\"
|
|
.SS File access mode
|
|
Unlike the other values that can be specified in
|
|
.IR flags ,
|
|
the
|
|
.I "access mode"
|
|
values
|
|
.BR O_RDONLY ", " O_WRONLY ", and " O_RDWR
|
|
do not specify individual bits.
|
|
Rather, they define the low order two bits of
|
|
.IR flags ,
|
|
and are defined respectively as 0, 1, and 2.
|
|
In other words, the combination
|
|
.B "O_RDONLY | O_WRONLY"
|
|
is a logical error, and certainly does not have the same meaning as
|
|
.BR O_RDWR .
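.PP
Because the access mode is a two-bit field rather than a set of
independent bits, it should be extracted with a mask and compared
for equality.
The following sketch uses the standard \fBO_ACCMODE\fP mask
(not otherwise discussed in this section) together with
\fBfcntl\fP(2) \fBF_GETFL\fP;
the function names are hypothetical.
.in +4n
.nf
```c
#include <fcntl.h>

/*
 * Returns the access mode of fd (O_RDONLY, O_WRONLY, or O_RDWR),
 * or -1 on error.
 */
int
access_mode(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    return flags & O_ACCMODE;   /* compare the result; do not bit-test */
}

/* Returns 1 if fd was opened for writing (O_WRONLY or O_RDWR). */
int
is_writable(int fd)
{
    int mode = access_mode(fd);
    return mode == O_WRONLY || mode == O_RDWR;
}
```
.fi
.in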
|
|
|
|
Linux reserves the special, nonstandard access mode 3 (binary 11) in
|
|
.I flags
|
|
to mean:
|
|
check for read and write permission on the file and return a descriptor
|
|
that can't be used for reading or writing.
|
|
This nonstandard access mode is used by some Linux drivers to return a
|
|
descriptor that is to be used only for device-specific
|
|
.BR ioctl (2)
|
|
operations.
|
|
.\" See for example util-linux's disk-utils/setfdprm.c
|
|
.\" For some background on access mode 3, see
|
|
.\" http://thread.gmane.org/gmane.linux.kernel/653123
|
|
.\" "[RFC] correct flags to f_mode conversion in __dentry_open"
|
|
.\" LKML, 12 Mar 2008
|
|
.\"
|
|
.\"
|
|
.SS Rationale for openat() and other "directory file descriptor" APIs
|
|
.BR openat ()
|
|
and the other system calls and library functions that take
|
|
a directory file descriptor argument
|
|
(i.e.,
.BR execveat (2),
.BR faccessat (2),
.BR fanotify_mark (2),
.BR fchmodat (2),
.BR fchownat (2),
.BR fstatat (2),
.BR futimesat (2),
.BR linkat (2),
.BR mkdirat (2),
.BR mknodat (2),
.BR name_to_handle_at (2),
.BR readlinkat (2),
.BR renameat (2),
.BR symlinkat (2),
.BR unlinkat (2),
.BR utimensat (2),
.BR mkfifoat (3),
and
.BR scandirat (3))
are supported
for two reasons.
|
|
Here, the explanation is in terms of the
|
|
.BR openat ()
|
|
call, but the rationale is analogous for the other interfaces.
|
|
|
|
First,
|
|
.BR openat ()
|
|
allows an application to avoid race conditions that could
|
|
occur when using
|
|
.BR open ()
|
|
to open files in directories other than the current working directory.
|
|
These race conditions result from the fact that some component
|
|
of the directory prefix given to
|
|
.BR open ()
|
|
could be changed in parallel with the call to
|
|
.BR open ().
|
|
Suppose, for example, that we wish to create the file
|
|
.I path/to/xxx.dep
|
|
if the file
|
|
.I path/to/xxx
|
|
exists.
|
|
The problem is that between the existence check and the file creation step,
|
|
.I path
|
|
or
|
|
.I to
|
|
(which might be symbolic links)
|
|
could be modified to point to a different location.
|
|
Such races can be avoided by
|
|
opening a file descriptor for the target directory,
|
|
and then specifying that file descriptor as the
|
|
.I dirfd
|
|
argument of (say)
|
|
.BR fstatat (2)
|
|
and
|
|
.BR openat ().
|
|
|
|
Second,
|
|
.BR openat ()
|
|
allows the implementation of a per-thread "current working
|
|
directory", via file descriptor(s) maintained by the application.
|
|
(This functionality can also be obtained by tricks based
|
|
on the use of
|
|
.IR /proc/self/fd/ dirfd,
|
|
but less efficiently.)
|
|
.\"
|
|
.\"
|
|
.SS O_DIRECT
|
|
.LP
|
|
The
|
|
.B O_DIRECT
|
|
flag may impose alignment restrictions on the length and address
|
|
of user-space buffers and the file offset of I/Os.
|
|
In Linux, alignment
|
|
restrictions vary by filesystem and kernel version and might be
|
|
absent entirely.
|
|
However, there is currently no filesystem\-independent
|
|
interface for an application to discover these restrictions for a given
|
|
file or filesystem.
|
|
Some filesystems provide their own interfaces
|
|
for doing so, for example the
|
|
.B XFS_IOC_DIOINFO
|
|
operation in
|
|
.BR xfsctl (3).
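.PP
Where such restrictions apply, an application typically has to
align three things: the buffer address, the transfer length,
and the file offset.
The following sketch assumes that the required block size has already
been obtained by some other means and is a power of two;
the function names are hypothetical,
and \fBposix_memalign\fP(3) is just one way to obtain aligned memory.
.in +4n
.nf
```c
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>

/*
 * Returns 1 if buf, len, and offset are all multiples of block
 * (which must be nonzero), else 0. offset must be nonnegative.
 */
int
dio_aligned(const void *buf, size_t len, off_t offset, size_t block)
{
    return (uintptr_t) buf % block == 0 &&
           len % block == 0 &&
           (uintmax_t) offset % block == 0;
}

/*
 * Allocates a buffer aligned to block, with len (nonzero) rounded
 * up to a multiple of block; the rounded size is stored in
 * *padded_len. block must be a power of two and a multiple of
 * sizeof(void *). Returns NULL on failure; release with free().
 */
void *
dio_alloc(size_t block, size_t len, size_t *padded_len)
{
    void *buf = NULL;
    size_t padded = (len + block - 1) / block * block;

    if (posix_memalign(&buf, block, padded) != 0)
        return NULL;
    *padded_len = padded;
    return buf;
}
```
.fi
.in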
|
|
.LP
|
|
Under Linux 2.4, transfer sizes and the alignment of the user buffer
|
|
and the file offset must all be multiples of the logical block size
|
|
of the filesystem.
|
|
Since Linux 2.6.0, alignment to the logical block size of the
|
|
underlying storage (typically 512 bytes) suffices.
|
|
The logical block size can be determined using the
|
|
.BR ioctl (2)
|
|
.B BLKSSZGET
|
|
operation or from the shell using the command:
|
|
|
|
blockdev \-\-getss
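.PP
From a program, the same value can be queried as in the following
sketch, which assumes that \fIfd\fP refers to an open block device.
The helper name is hypothetical.
.in +4n
.nf
```c
#include <linux/fs.h>       /* BLKSSZGET */
#include <sys/ioctl.h>

/*
 * Returns the logical block size, in bytes, of the block device
 * open on fd, or -1 on error (for example, if fd does not refer
 * to a block device).
 */
int
logical_block_size(int fd)
{
    int size = 0;

    if (ioctl(fd, BLKSSZGET, &size) == -1)
        return -1;
    return size;
}
```
.fi
.in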
|
|
.LP
|
|
.B O_DIRECT
|
|
I/Os should never be run concurrently with the
|
|
.BR fork (2)
|
|
system call,
|
|
if the memory buffer is a private mapping
|
|
(i.e., any mapping created with the
|
|
.BR mmap (2)
|
|
.BR MAP_PRIVATE
|
|
flag;
|
|
this includes memory allocated on the heap and statically allocated buffers).
|
|
Any such I/Os, whether submitted via an asynchronous I/O interface or from
|
|
another thread in the process,
|
|
should be completed before
|
|
.BR fork (2)
|
|
is called.
|
|
Failure to do so can result in data corruption and undefined behavior in
|
|
parent and child processes.
|
|
This restriction does not apply when the memory buffer for the
|
|
.B O_DIRECT
|
|
I/Os was created using
|
|
.BR shmat (2)
|
|
or
|
|
.BR mmap (2)
|
|
with the
|
|
.B MAP_SHARED
|
|
flag.
|
|
Nor does this restriction apply when the memory buffer has been advised as
|
|
.B MADV_DONTFORK
|
|
with
|
|
.BR madvise (2),
|
|
ensuring that it will not be available
|
|
to the child after
|
|
.BR fork (2).
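.PP
The last of these exemptions can be arranged as in the following
sketch, which maps an anonymous private buffer and then marks it with
\fBMADV_DONTFORK\fP.
The helper name is hypothetical;
the mapping returned by \fBmmap\fP(2) is page-aligned.
.in +4n
.nf
```c
#define _DEFAULT_SOURCE     /* MAP_ANONYMOUS, madvise() */

#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: map len bytes of private anonymous memory
 * and exclude the mapping from children created by fork(2).
 * Returns NULL on failure; release with munmap().
 */
void *
alloc_dontfork(size_t len)
{
    void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return NULL;

    if (madvise(buf, len, MADV_DONTFORK) == -1) {
        munmap(buf, len);
        return NULL;
    }
    return buf;
}
```
.fi
.in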
|
|
.LP
|
|
The
|
|
.B O_DIRECT
|
|
flag was introduced in SGI IRIX, where it has alignment
|
|
restrictions similar to those of Linux 2.4.
|
|
IRIX also has a
|
|
.BR fcntl (2)
|
|
call to query appropriate alignments and sizes.
|
|
FreeBSD 4.x introduced
|
|
a flag of the same name, but without alignment restrictions.
|
|
.LP
|
|
.B O_DIRECT
|
|
support was added under Linux in kernel version 2.4.10.
|
|
Older Linux kernels simply ignore this flag.
|
|
Some filesystems may not implement the flag and
|
|
.BR open ()
|
|
will fail with
|
|
.B EINVAL
|
|
if it is used.
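.PP
An application that treats direct I/O as optional can fall back to
buffered I/O when the flag is rejected.
The following is a sketch of one such policy, not a recommendation;
the helper name is hypothetical, and \fIflags\fP is assumed not to
include \fBO_CREAT\fP (no \fImode\fP argument is passed).
.in +4n
.nf
```c
#define _GNU_SOURCE     /* O_DIRECT; must precede all #include lines */

#include <errno.h>
#include <fcntl.h>

/*
 * Hypothetical helper: try to open path with O_DIRECT, and retry
 * without it if open() fails with EINVAL. On success, *is_direct
 * reports which kind of descriptor was obtained.
 * Returns a file descriptor, or -1 with errno set.
 */
int
open_prefer_direct(const char *path, int flags, int *is_direct)
{
    int fd = open(path, flags | O_DIRECT);
    if (fd != -1) {
        *is_direct = 1;
        return fd;
    }
    if (errno != EINVAL)
        return -1;

    /* EINVAL may mean that the filesystem lacks O_DIRECT support. */
    *is_direct = 0;
    return open(path, flags & ~O_DIRECT);
}
```
.fi
.in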
|
|
.LP
|
|
Applications should avoid mixing
|
|
.B O_DIRECT
|
|
and normal I/O to the same file,
|
|
and especially to overlapping byte regions in the same file.
|
|
Even when the filesystem correctly handles the coherency issues in
|
|
this situation, overall I/O throughput is likely to be slower than
|
|
using either mode alone.
|
|
Likewise, applications should avoid mixing
|
|
.BR mmap (2)
|
|
of files with direct I/O to the same files.
|
|
.LP
|
|
The behavior of
|
|
.B O_DIRECT
|
|
with NFS will differ from local filesystems.
|
|
Older kernels, or
|
|
kernels configured in certain ways, may not support this combination.
|
|
The NFS protocol does not support passing the flag to the server, so
|
|
.B O_DIRECT
|
|
I/O will bypass the page cache only on the client; the server may
|
|
still cache the I/O.
|
|
The client asks the server to make the I/O
|
|
synchronous to preserve the synchronous semantics of
|
|
.BR O_DIRECT .
|
|
Some servers will perform poorly under these circumstances, especially
|
|
if the I/O size is small.
|
|
Some servers may also be configured to
|
|
lie to clients about the I/O having reached stable storage; this
|
|
will avoid the performance penalty at some risk to data integrity
|
|
in the event of server power failure.
|
|
The Linux NFS client places no alignment restrictions on
|
|
.B O_DIRECT
|
|
I/O.
|
|
.PP
|
|
In summary,
|
|
.B O_DIRECT
|
|
is a potentially powerful tool that should be used with caution.
|
|
It is recommended that applications treat use of
|
|
.B O_DIRECT
|
|
as a performance option which is disabled by default.
|
|
.PP
|
|
.RS
|
|
"The thing that has always disturbed me about O_DIRECT is that the whole
|
|
interface is just stupid, and was probably designed by a deranged monkey
|
|
on some serious mind-controlling substances."\(emLinus
|
|
.RE
|
|
.SH BUGS
|
|
Currently, it is not possible to enable signal-driven
|
|
I/O by specifying
|
|
.B O_ASYNC
|
|
when calling
|
|
.BR open ();
|
|
use
|
|
.BR fcntl (2)
|
|
to enable this flag.
|
|
.\" FIXME . Check bugzilla report on open(O_ASYNC)
|
|
.\" See http://bugzilla.kernel.org/show_bug.cgi?id=5993
|
|
|
|
One must check for two different error codes,
|
|
.B EISDIR
|
|
and
|
|
.BR ENOENT ,
|
|
when trying to determine whether the kernel supports
|
|
.B O_TMPFILE
|
|
functionality.
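.PP
A probe along these lines might look like the following sketch.
The helper name and its three-way return convention are hypothetical.
Note that \fBENOENT\fP is also returned when \fIdir\fP itself does not
exist, so the probe should be given a directory that is known to exist.
.in +4n
.nf
```c
#define _GNU_SOURCE     /* O_TMPFILE; must precede all #include lines */

#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical probe. Returns 1 if an O_TMPFILE open in dir
 * succeeds, 0 if it fails with EISDIR or ENOENT (taken here to mean
 * that the kernel lacks O_TMPFILE support), or -1 on any other
 * error (for example, a filesystem that does not support the flag).
 */
int
probe_o_tmpfile(const char *dir)
{
    int fd = open(dir, O_TMPFILE | O_RDWR, 0600);
    if (fd != -1) {
        close(fd);
        return 1;
    }
    return (errno == EISDIR || errno == ENOENT) ? 0 : -1;
}
```
.fi
.in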
|
|
.SH SEE ALSO
|
|
.BR chmod (2),
.BR chown (2),
.BR close (2),
.BR dup (2),
.BR fcntl (2),
.BR link (2),
.BR lseek (2),
.BR mknod (2),
.BR mmap (2),
.BR mount (2),
.BR open_by_handle_at (2),
.BR read (2),
.BR socket (2),
.BR stat (2),
.BR umask (2),
.BR unlink (2),
.BR write (2),
.BR fopen (3),
.BR fifo (7),
.BR path_resolution (7),
.BR symlink (7)