Looking at the code in mm/fadvise.c, we have:

    case POSIX_FADV_DONTNEED:
        if (!inode_write_congested(mapping->host))
            __filemap_fdatawrite_range(mapping, offset, endbyte,
                                       WB_SYNC_NONE);

This suggests that *if* the backing device is not congested, then
__filemap_fdatawrite_range() is called. The comments for that
function say:
    __filemap_fdatawrite_range - start writeback on mapping dirty pages in range
So, my reading of this is that *maybe* some dirty pages will be
written to the backing device by the time that POSIX_FADV_DONTNEED
gets to calling invalidate_mapping_pages() whose description says:
    /**
     * invalidate_mapping_pages - Invalidate all the unlocked pages of one inode
     * @mapping: the address_space which holds the pages to invalidate
     * @start: the offset 'from' which to invalidate
     * @end: the offset 'to' which to invalidate (inclusive)
     *
     * This function only removes the unlocked pages, if you want to
     * remove all the pages of one inode, you must call truncate_inode_pages.
     *
     * invalidate_mapping_pages() will not block on IO activity. It will not
     * invalidate pages which are dirty, locked, under writeback or mapped into
     * pagetables.
     */
So, my reading of this is that the handling of dirty pages is an
optimization. If some pages can be written in time, they will be
freed by POSIX_FADV_DONTNEED. But there are no guarantees.
All of that said, some experimentation suggests that, in many
cases, POSIX_FADV_DONTNEED does free dirty pages.
See https://bugzilla.kernel.org/show_bug.cgi?id=95421.
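Since invalidate_mapping_pages() skips dirty pages, one way for a
caller to improve the odds is to flush the file before giving the
advice. The following is only a minimal sketch of that idea, not
something taken from the kernel or the manual page;
drop_file_cache() is a hypothetical helper name, and pages that are
locked or mapped into page tables can still survive the call.

```c
#define _POSIX_C_SOURCE 200112L
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of any library): write out the dirty
 * pages of 'fd' with fsync(2), then advise the kernel that the cached
 * pages of the whole file are no longer needed.
 *
 * Returns 0 on success, or a positive error number on failure.
 */
int
drop_file_cache(int fd)
{
    /* fsync() reports failure as -1 with errno set. */
    if (fsync(fd) == -1)
        return errno;

    /* posix_fadvise() returns the error number directly; it does not
       set errno. A length of 0 means "through to the end of the file". */
    return posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
}
```

A caller would invoke drop_file_cache(fd) after its last write to fd.
The result is still advisory: a return value of 0 means the advice
was accepted, not that every page was freed.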
Reported-by: Maik Zumstrull <maik@zumstrull.net>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Details for various flags were hidden under NOTES.
Move them to DESCRIPTION, to make the details more
obvious.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Provide some notes to kernel developers considering how to choose
which capability should govern a new kernel feature.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The existing description was hard to understand. Break
it into a bullet list that separates out the details
in a manner that is easier to parse.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
The same information is described in two consecutive paragraphs.
Remove the shorter paragraph, leaving the longer one that
contains more information.
Reported-by: John Wiersba <jrw32982@yahoo.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
See https://bugzilla.kernel.org/show_bug.cgi?id=192801.
See also the glibc source file string/strfry.c, which shows
an example of this initialization:
    if (!init)
      {
        static char state[32];
        rdata.state = NULL;
        __initstate_r (time ((time_t *) NULL) ^ getpid (),
                       state, sizeof (state), &rdata);
        init = 1;
      }
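The same initialization can be sketched with the public glibc
interfaces, initstate_r(3) and random_r(3). This is an illustration
under stated assumptions rather than text from the manual page:
struct rng, rng_init(), and rng_next() are hypothetical names, and
the 64-byte state size is an arbitrary choice. Zeroing the whole
structure has the same effect as the "rdata.state = NULL" assignment
in the glibc snippet.

```c
#define _DEFAULT_SOURCE
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical wrapper that keeps the generator bookkeeping and its
   state array together. After rng_init(), 'data' holds pointers into
   'statebuf', so the structure must not be copied or moved. */
struct rng {
    struct random_data data;
    char statebuf[64];
};

/* Returns 0 on success, -1 on error (as initstate_r() does). */
int
rng_init(struct rng *r, unsigned int seed)
{
    /* Zero the structure first, so that data.state is NULL
       before initstate_r() inspects it. */
    memset(&r->data, 0, sizeof(r->data));
    return initstate_r(seed, r->statebuf, sizeof(r->statebuf), &r->data);
}

/* Stores the next pseudo-random number in *result; returns 0 on
   success, -1 on error (as random_r() does). */
int
rng_next(struct rng *r, int32_t *result)
{
    return random_r(&r->data, result);
}
```

Two generators initialized with the same seed produce the same
sequence, which gives a simple check that the state was set up
correctly.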
Reported-by: Jan Ziak <0xe2.0x9a.0x9b@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
This has not been true since Linux 2.6.22. The description
of EINTR maintains a reference to signal(7), which explains
the historical details.
See https://bugzilla.kernel.org/show_bug.cgi?id=192071
Reported-by: Fabjan Sukalia <fsukalia@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Remove bogus text saying that POSIX permits a partial read
to return -1/EINTR on interrupt by a signal handler.
That statement already ceased to be true in SUSv1 (1995)!
See https://bugzilla.kernel.org/show_bug.cgi?id=193111
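For callers, the practical upshot is that -1/EINTR means no data
was transferred, so the call can simply be retried, while a short
count means some data did arrive. A minimal sketch of a loop built
on that assumption follows; read_full() is a hypothetical helper
name, not an existing library function.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read up to 'count' bytes from 'fd', retrying
 * when read(2) is interrupted before transferring any data and
 * continuing after short reads.
 *
 * Returns the number of bytes read (less than 'count' only at end
 * of file), or -1 on error with errno set by read(2).
 */
ssize_t
read_full(int fd, void *buf, size_t count)
{
    char *p = buf;
    size_t total = 0;

    while (total < count) {
        ssize_t n = read(fd, p + total, count - total);

        if (n == -1) {
            if (errno == EINTR)
                continue;       /* Nothing was read; try again */
            return -1;
        }
        if (n == 0)
            break;              /* End of file */
        total += (size_t) n;
    }
    return (ssize_t) total;
}
```

Note that if an error other than EINTR occurs after some bytes have
been accumulated, this sketch reports the error and discards the
count; a real program might prefer to return the partial count.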
Reported-by: Steven Luo <steven@steven676.net>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
As reported by Kent:
    The example input/output handler in this EXAMPLE is subject
    to introducing subtle bugs if the input stream contains
    literal null bytes.
    Subsequently, there should be some warning that this occurs,
    or an alternative using fwrite(3) might be better.
Change the example program to use fwrite(3).
See https://bugzilla.kernel.org/show_bug.cgi?id=192701
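The difference can be illustrated with a short sketch. This is not
the example program from the manual page; write_bytes() and
write_cstring() are hypothetical helpers used only to contrast a
length-based write with a string-based one.

```c
#include <stdio.h>

/* Length-based write: emits exactly 'len' bytes, including any
   embedded null bytes. Returns the number of bytes written. */
size_t
write_bytes(FILE *stream, const char *buf, size_t len)
{
    return fwrite(buf, 1, len, stream);
}

/* String-based write: fputs(3) stops at the first null byte, so any
   data after it is silently dropped. 'buf' must contain at least
   one null byte. Returns the result of fputs(). */
int
write_cstring(FILE *stream, const char *buf)
{
    return fputs(buf, stream);
}
```

Given the five bytes 'a', 'b', '\0', 'c', 'd', write_bytes() with a
length of 5 writes all five bytes, whereas write_cstring() writes
only "ab".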
Reported-by: Kent Fredric <kentfredric@gmail.com>
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>