LDP/LDP/guide/docbook/lkmpg-2.6/05-TheProcFileSystem.sgml

<sect1><title>The /proc File System</title>

	<indexterm><primary><filename role=directory>/proc</filename> filesystem</primary></indexterm>
	<indexterm><primary>filesystem</primary><secondary><filename role=directory>/proc</filename></secondary></indexterm>

	<para>
	In Linux, there is an additional mechanism for the kernel and kernel modules to send information to processes --- the
	<filename role="directory">/proc</filename> file system.  Originally designed to allow easy access to information about
	processes (hence the name), it is now used by every bit of the kernel which has something interesting to report, such as
	<filename>/proc/modules</filename> which provides the list of modules and <filename>/proc/meminfo</filename>
	which stats memory usage statistics.
	</para>

	<indexterm><primary><filename>/proc/modules</filename></primary></indexterm>
	<indexterm><primary><filename>/proc/meminfo</filename></primary></indexterm>

	<para>The method to use the proc file system is very similar to the one used with device drivers --- a structure is created
	with all the information needed for the <filename role="directory">/proc</filename> file, including pointers to any handler
	functions (in our case there is only one, the one called when somebody attempts to read from the <filename
	role="directory">/proc</filename> file). Then, <function>init_module</function> registers the structure with the kernel and
	<function>cleanup_module</function> unregisters it.</para>

	<para>The reason we use <function>proc_register_dynamic</function><footnote><para>In version 2.0, in version 2.2 this is done
	automatically if we set the inode to zero.</para></footnote> is because we don't want to determine the inode number
	used for our file in advance, but to allow the kernel to determine it to prevent clashes. Normal file systems are located on a
	disk, rather than just in memory (which is where <filename role="directory">/proc</filename> is), and in that case the inode
	number is a pointer to a disk location where the file's index-node (inode for short) is located. The inode contains
	information about the file, for example the file's permissions, together with a pointer to the disk location or locations
	where the file's data can be found.</para>

	<indexterm><primary><function>proc_register_dynamic</function></primary></indexterm>
	<indexterm><primary><function>proc_register</function></primary></indexterm>
	<indexterm><primary>inode</primary></indexterm>

	<para>
	Because we don't get called when the file is opened or closed, there's nowhere for us to put
	<varname>try_module_get</varname> and <varname>try_module_put</varname> in this module, and if
	the file is opened and then the module is removed, there's no way to avoid the consequences.
	</para>

        <para>
        Here a simple example showing how to use a /proc file. This is the HelloWorld for the /proc filesystem.
	There are three parts: create the file <filename>/proc/helloworld</filename> in the function <function>init_module</function>,
	return a value (and a buffer) when the file <filename>/proc/helloworld</filename> is read in the callback
	function <function>procfs_read</function>, and delete the file <filename>/proc/helloworld</filename>
	in the function <function>cleanup_module</function>.
	</para>

	<para>
        The <filename>/proc/helloworld</filename> is created when the module is loaded with the function
	<function>create_proc_entry</function>. The return value is a 'struct proc_dir_entry *', and it
	will be used to configure the file <filename>/proc/helloworld</filename> (for example, the owner of this file).
	A null return value means that the creation has failed.
	</para>

	<para>
        Each time, everytime the file <filename>/proc/helloworld</filename> is read, the function
	<function>procfs_read</function> is called.
	Two parameters of this function are very important: the buffer (the first parameter) and the offset (the third one).
	The content of the buffer will be returned to the application which read it (for example the cat command). The offset
	is the current position in the file. If the return value of the function isn't null, then this function is
	called again. So be careful with this function, if it never returns zero, the read function is called endlessly.
	</para>

	<screen>
% cat /proc/helloworld
HelloWorld!
        </screen>

        <example>
<title>procfs1.c</title>
<programlisting>
<inlinegraphic fileref="lkmpg-examples/05-TheProcFileSystem/procfs1.c" format="linespecific"/></inlinegraphic>
</programlisting>
        </example>

</sect1>


<sect1><title>Read and Write a /proc File</title>

        <para>
        We have seen a very simple example for a /proc file where we only read the file <filename>/proc/helloworld</filename>.
	It's also possible to write in a /proc file. It works the same way as read, a function is called when the /proc
	file is written. But there is a little difference with read, data comes from user, so you have to import data from
	user space to kernel space (with <function>copy_from_user</function> or <function>get_user</function>)
        </para>

	<indexterm><primary><function>get_user</function></primary></indexterm>
	<indexterm><primary><function>put_user</function></primary></indexterm>
	<indexterm><primary><function>copy_from_user</function></primary></indexterm>
	<indexterm><primary><function>copy_to_user</function></primary></indexterm>
	<indexterm><primary>memory segments</primary></indexterm>
	<indexterm><primary>segment</primary><secondary>memory</secondary></indexterm>

	<para>
	The reason for <function>copy_from_user</function> or <function>get_user</function> is that Linux memory (on Intel
	architecture, it may be different under some other processors) is segmented.  This means that a pointer, by itself,
	does not reference a unique location in memory, only a location in a memory segment, and you need to know which
	memory segment it is to be able to use it. There is one memory segment for the kernel, and one for each of the processes.
	</para>

	<para>
	The only memory segment accessible to a process is its own, so when writing regular programs to run as processes,
	there's no need to worry about segments. When you write a kernel module, normally you want to access the kernel memory
	segment, which is handled automatically by the system. However, when the content of a memory buffer needs to be passed between
	the currently running process and the kernel, the kernel function receives a pointer to the memory buffer which is in the
	process segment. The <function>put_user</function> and <function>get_user</function> macros allow you to access that
	memory. These functions handle only one caracter, you can handle several caracters with <function>copy_to_user</function> and
	<function>copy_from_user</function>. As the buffer (in read or write function) is in kernel space, for write function
	you need to import data because	it comes from user space, but not for the read function because data is already
	in kernel space.
	</para>

	<indexterm><primary>read</primary><secondary>in the kernel</secondary></indexterm>
	<indexterm><primary>write</primary><secondary>in the kernel</secondary></indexterm>

        <example>
<title>procfs2.c</title>
<programlisting>
<inlinegraphic fileref="lkmpg-examples/05-TheProcFileSystem/procfs2.c" format="linespecific"/></inlinegraphic>
</programlisting>
        </example>

</sect1>


<sect1><title>Manage /proc file with standard filesystem</title>

	<para>
	We have seen how to read and write a /proc file with the /proc interface. But it's also possible
	to manage /proc file with inodes. The main interest is to use advanced function, like permissions.
	</para>

	<para>
	In Linux, there is a standard mechanism for file system registration. Since every file system has to have its own
	functions to handle inode and file operations<footnote><para>The difference between the two is that file operations deal with
	the file itself, and inode operations deal with ways of referencing the file, such as creating links to it.</para></footnote>,
	there is a special structure to hold pointers to all those functions, <varname>struct inode_operations</varname>, which
	includes a pointer to <varname>struct file_operations</varname>. In /proc, whenever we register a new file, we're allowed to
	specify which <varname>struct inode_operations</varname> will be used to access to it. This is the mechanism we use, a
	<varname>struct inode_operations</varname> which includes a pointer to a <varname>struct file_operations</varname> which
	includes pointers to our <function>procfs_read</function> and <function>procfs_write</function> functions.
	</para>

	<indexterm><primary>filesystem</primary><secondary>registration</secondary></indexterm>
	<indexterm><primary>filesystem registration</primary></indexterm>
	<indexterm><primary><varname>struct inode_operations</varname></primary></indexterm>
	<indexterm><primary><varname>inode_operations</varname> structure</primary></indexterm>
	<indexterm><primary><varname>struct file_operations</varname></primary></indexterm>
	<indexterm><primary><varname>file_operations</varname> structure</primary></indexterm>

	<para>
	Another interesting point here is the <function>module_permission</function> function. This function is called whenever
	a process tries to do something with the <filename role="directory">/proc</filename> file, and it can decide whether to allow
	access or not. Right now it is only based on the operation and the uid of the current user (as available in
	<varname>current</varname>, a pointer to a structure which includes information on the currently running process), but it
	could be based on anything we like, such as what other processes are doing with the same file, the time of day, or the last
	input we received.</para>

	<indexterm><primary>pointer</primary><secondary>current</secondary></indexterm>
	<indexterm><primary>permission</primary></indexterm>
	<indexterm><primary><varname>module_permissions</varname></primary></indexterm>

	<para>
	It's important to note that the standard roles of read and write are reversed in the kernel. Read functions are used for
	output, whereas write functions are used for input. The reason for that is that read and write refer to the user's point of
	view --- if a process reads something from the kernel, then the kernel needs to output it, and if a process writes something
	to the kernel, then the kernel receives it as input.
	</para>

        <example>
<title>procfs3.c</title>
<programlisting>
<inlinegraphic fileref="lkmpg-examples/05-TheProcFileSystem/procfs3.c" format="linespecific"/></inlinegraphic>
</programlisting>
        </example>

	<para>
	Still hungry for procfs examples? Well, first of all keep in mind, there are rumors around, claiming
	that procfs is on it's way out, consider using sysfs instead. Second, if you really can't get enough,
	there's a highly recommendable bonus level for procfs below <filename> linux/Documentation/DocBook/ </filename>.
	Use <command> make help </command> in your toplevel kernel directory for instructions about how to convert it into
	your favourite format. Example: <command> make htmldocs </command>. Consider using this mechanism,
	in case you want to document something kernel related yourself.
	</para>

</sect1>

<sect1><title>Manage /proc file with seq_file</title>

        <indexterm><primary>seq_file</primary></indexterm>

	<para>
	As we have seen, writing a /proc file may be quite "complex". So to help people writting
	/proc file, there is an API named seq_file that helps formating a /proc file for output.
        It's based on sequence, which is composed of 3 functions: start(), next(), and stop().
        The seq_file API starts a sequence when a user read the /proc file.
	</para>

	<para>
	A sequence begins with the call of the function start(). If the return is a non NULL value,
	the function next() is called. This function is an iterator, the goal is to go thought
	all the data. Each time next() is called, the function show() is also called. It writes
	data values in the buffer read by the user. The function next() is called until it returns
	NULL. The sequence ends when next() returns NULL, then the function stop() is called.
	</para>

	<para>
	BE CARREFUL: when a sequence is finished, another one starts. That means that at the end
	of function stop(), the function start() is called again. This loop finishes when the
        function start() returns NULL. You can see a scheme of this in the figure "How seq_file works".
	</para>

	<figure>
		<title>How seq_file works</title>
		<graphic fileref="figures/seq_file.png"></graphic>
	</figure>

        <para>
        Seq_file provides basic functions for file_operations, as seq_read, seq_lseek, and some others.
        But nothing to write in the /proc file. Of course, you can still use the same way as in the
        previous example.
	</para>

	<example>
<title>procfs4.c</title>
<programlisting>
<inlinegraphic fileref="lkmpg-examples/05-TheProcFileSystem/procfs4.c" format="linespecific"/></inlinegraphic>
</programlisting>
	</example>

        <para>
	If you want more information, you can read this web page:
	</para>

	<itemizedlist>
		<listitem><para><ulink url="http://lwn.net/Articles/22355/"></ulink></para></listitem>
		<listitem><para><ulink url="http://www.kernelnewbies.org/documents/seq_file_howto.txt"></ulink></para></listitem>
	</itemizedlist>

	<para>
		You can also read the code of fs/seq_file.c in the linux kernel.
	</para>

</sect1>

<!--
vim:textwidth=128
-->