LDP/LDP/guide/docbook/lkmpg/06-UsingProcForInput.sgml

<sect1><title>Using /proc For Input</title>

<!-- \label{proc-input
\index{Input\\using /proc for}
\index{/proc\\using for input}
\index{proc\\using for input} -->

<para>So far we have two ways to generate output from kernel modules: we can
register a device driver and <command>mknod</command> a device file, or we
can create a <filename role="directory">/proc</filename> file. This allows
the kernel module to tell us anything it likes. The only problem is that
there is no way for us to talk back. The first way we'll send input to kernel
modules will be by writing back to the <filename
role="directory">/proc</filename> file.</para>

<para>Because the proc filesystem was written mainly to allow the kernel to
report its situation to processes, there are no special provisions for input.
The <varname>struct proc_dir_entry</varname> doesn't include a pointer to an
input function, the way it includes a pointer to an output function. Instead,
to write into a <filename role="directory">/proc</filename> file, we need to
use the standard filesystem mechanism.</para>

<!-- \index{proc\_dir\_entry structure}
\index{struct proc\_dir\_entry} -->

<para><para>In Linux there is a standard mechanism for file system
registration. Since every file system has to have its own functions to handle
inode and file operations<footnote><para>The difference between the two is
that file operations deal with the file itself, and inode operations deal
with ways of referencing the file, such as creating links to
it.</para></footnote>, there is a special structure to hold pointers to all
those functions, <varname>struct inode_operations</varname>, which includes a
pointer to <varname>struct file_operations</varname>. In /proc, whenever we
register a new file, we're allowed to specify which <varname>struct
inode_operations</varname> will be used for access to it. This is the
mechanism we use, a <varname>struct inode_operations</varname> which includes
a pointer to a <varname>struct file_operations</varname> which includes
pointers to our <function>module_input</function> and
<function>module_output</function> functions.</para>

<!-- \index{file system registration}
\index{registration\\file system}
\index{struct inode\_operations}
\index{inode\_operations structure}
\index{struct file\_operations}
\index{file\_operations structure} -->

<para>It's important to note that the standard roles of read and write are
reversed in the kernel. Read functions are used for output, whereas write
functions are used for input. The reason for that is that read and write
refer to the user's point of view --- if a process reads something from the
kernel, then the kernel needs to output it, and if a process writes something
to the kernel, then the kernel receives it as input.</para>

<!-- \index{read\\in the kernel}
\index{write\\in the kernel} -->

<para>Another interesting point here is the
<function>module_permission</function> function. This function is called
whenever a process tries to do something with the <filename
role="directory">/proc</filename> file, and it can decide whether to allow
access or not. Right now it is only based on the operation and the uid of the
current user (as available in <varname>current</varname>, a pointer to a
structure which includes information on the currently running process), but
it could be based on anything we like, such as what other processes are doing
with the same file, the time of day, or the last input we received.</para>

<!-- \index{module\_permissions} \index{permissions} \index{current pointer}
\index{pointer\\current} -->

<para>The reason for <function>put_user</function> and
<function>get_user</function> is that Linux memory (under Intel architecture,
it may be different under some other processors) is segmented.  This means
that a pointer, by itself, does not reference a unique location in memory,
only a location in a memory segment, and you need to know which memory
segment it is to be able to use it. There is one memory segment for the
kernel, and one of each of the processes.</para>

<!-- \index{put\_user} \index{get\_user} \index{memory segments}
\index{segment\\memory} -->


<para>The only memory segment accessible to a process is its own, so when
writing regular programs to run as processes, there's no need to worry about
segments. When you write a kernel module, normally you want to access the
kernel memory segment, which is handled automatically by the system. However,
when the content of a memory buffer needs to be passed between the currently
running process and the kernel, the kernel function receives a pointer to the
memory buffer which is in the process segment. The
<function>put_user</function> and <function>get_user</function> macros allow
you to access that memory.</para>


<example><title>procfs.c</title>
<programlisting><![CDATA[
/* procfs.c -  create a "file" in /proc, which allows
 * both input and output. */

/* Copyright (C) 2001 by Peter Jay Salzman */


/* The necessary header files */

/* Standard in kernel modules */
#include <linux/kernel.h>   /* We're doing kernel work */
#include <linux/module.h>   /* Specifically, a module */

/* Necessary because we use proc fs */
#include <linux/proc_fs.h>


/* In 2.2.3 /usr/include/linux/version.h includes a
 * macro for this, but 2.0.35 doesn't - so I add it
 * here if necessary. */
#ifndef KERNEL_VERSION
#define KERNEL_VERSION(a,b,c) ((a)*65536+(b)*256+(c))
#endif


#if LINUX_VERSION_CODE >= KERNEL_VERSION(2,2,0)
#include <asm/uaccess.h>  /* for get_user and put_user */
#endif

/* The module's file functions ********************** */


/* Here we keep the last message received, to prove
 * that we can process our input */
#define MESSAGE_LENGTH 80
static char Message[MESSAGE_LENGTH];


/* Since we use the file operations struct, we can't
 * use the special proc output provisions - we have to
 * use a standard read function, which is this function */
#if LINUX_VERSION_CODE &gt;= KERNEL_VERSION(2,2,0)
static ssize_t module_output(
    struct file *file,   /* The file read */
    char *buf, /* The buffer to put data to (in the
                * user segment) */
    size_t len,  /* The length of the buffer */
    loff_t *offset) /* Offset in the file - ignore */
#else
static int module_output(
    struct inode *inode, /* The inode read */
    struct file *file,   /* The file read */
    char *buf, /* The buffer to put data to (in the
                * user segment) */
    int len)  /* The length of the buffer */
#endif
{
  static int finished = 0;
  int i;
  char message[MESSAGE_LENGTH+30];

  /* We return 0 to indicate end of file, that we have
   * no more information. Otherwise, processes will
   * continue to read from us in an endless loop. */
  if (finished) {
    finished = 0;
    return 0;
  }

  /* We use put_user to copy the string from the kernel's
   * memory segment to the memory segment of the process
   * that called us. get_user, BTW, is
   * used for the reverse. */
  sprintf(message, "Last input:%s", Message);
  for(i=0; i&lt;len && message[i]; i++)
    put_user(message[i], buf+i);


  /* Notice, we assume here that the size of the message
   * is below len, or it will be received cut. In a real
   * life situation, if the size of the message is less
   * than len then we'd return len and on the second call
   * start filling the buffer with the len+1'th byte of
   * the message. */
  finished = 1;

  return i;  /* Return the number of bytes "read" */
}


/* This function receives input from the user when the
 * user writes to the /proc file. */
#if LINUX_VERSION_CODE >= KERNEL_VERSION(2,2,0)
static ssize_t module_input(
    struct file *file,   /* The file itself */
    const char *buf,     /* The buffer with input */
    size_t length,       /* The buffer's length */
    loff_t *offset)      /* offset to file - ignore */
#else
static int module_input(
    struct inode *inode, /* The file's inode */
    struct file *file,   /* The file itself */
    const char *buf,     /* The buffer with the input */
    int length)          /* The buffer's length */
#endif
{
  int i;

  /* Put the input into Message, where module_output
   * will later be able to use it */
  for(i=0; i<MESSAGE_LENGTH-1 && i<length; i++)
#if LINUX_VERSION_CODE >= KERNEL_VERSION(2,2,0)
    get_user(Message[i], buf+i);
  /* In version 2.2 the semantics of get_user changed,
   * it not longer returns a character, but expects a
   * variable to fill up as its first argument and a
   * user segment pointer to fill it from as the its
   * second.
   *
   * The reason for this change is that the version 2.2
   * get_user can also read an short or an int. The way
   * it knows the type of the variable it should read
   * is by using sizeof, and for that it needs the
   * variable itself.
   */
#else
    Message[i] = get_user(buf+i);
#endif
  Message[i] = '\0';  /* we want a standard, zero
                       * terminated string */

  /* We need to return the number of input characters
   * used */
  return i;
}


/* This function decides whether to allow an operation
 * (return zero) or not allow it (return a non-zero
 * which indicates why it is not allowed).
 *
 * The operation can be one of the following values:
 * 0 - Execute (run the "file" - meaningless in our case)
 * 2 - Write (input to the kernel module)
 * 4 - Read (output from the kernel module)
 *
 * This is the real function that checks file
 * permissions. The permissions returned by ls -l are
 * for referece only, and can be overridden here.
 */
static int module_permission(struct inode *inode, int op)
{
  /* We allow everybody to read from our module, but
   * only root (uid 0) may write to it */
  if (op == 4 || (op == 2 && current->euid == 0))
    return 0;

  /* If it's anything else, access is denied */
  return -EACCES;
}


/* The file is opened - we don't really care about
 * that, but it does mean we need to increment the
 * module's reference count. */
int module_open(struct inode *inode, struct file *file)
{
  MOD_INC_USE_COUNT;

  return 0;
}


/* The file is closed - again, interesting only because
 * of the reference count. */
#if LINUX_VERSION_CODE >= KERNEL_VERSION(2,2,0)
int module_close(struct inode *inode, struct file *file)
#else
void module_close(struct inode *inode, struct file *file)
#endif
{
  MOD_DEC_USE_COUNT;

#if LINUX_VERSION_CODE >= KERNEL_VERSION(2,2,0)
  return 0;  /* success */
#endif
}


/* Structures to register as the /proc file, with
 * pointers to all the relevant functions. ********** */


/* File operations for our proc file. This is where we
 * place pointers to all the functions called when
 * somebody tries to do something to our file. NULL
 * means we don't want to deal with something. */
static struct file_operations File_Ops_4_Our_Proc_File =
  {
    NULL,  /* lseek */
    module_output,  /* "read" from the file */
    module_input,   /* "write" to the file */
    NULL,  /* readdir */
    NULL,  /* select */
    NULL,  /* ioctl */
    NULL,  /* mmap */
    module_open,    /* Somebody opened the file */
#if LINUX_VERSION_CODE >= KERNEL_VERSION(2,2,0)
    NULL,   /* flush, added here in version 2.2 */
#endif
    module_close,    /* Somebody closed the file */
    /* etc. etc. etc. (they are all given in
     * /usr/include/linux/fs.h). Since we don't put
     * anything here, the system will keep the default
     * data, which in Unix is zeros (NULLs when taken as
     * pointers). */
  };


/* Inode operations for our proc file. We need it so
 * we'll have some place to specify the file operations
 * structure we want to use, and the function we use for
 * permissions. It's also possible to specify functions
 * to be called for anything else which could be done to
 * an inode (although we don't bother, we just put
 * NULL). */
static struct inode_operations Inode_Ops_4_Our_Proc_File =
  {
    &File_Ops_4_Our_Proc_File,
    NULL, /* create */
    NULL, /* lookup */
    NULL, /* link */
    NULL, /* unlink */
    NULL, /* symlink */
    NULL, /* mkdir */
    NULL, /* rmdir */
    NULL, /* mknod */
    NULL, /* rename */
    NULL, /* readlink */
    NULL, /* follow_link */
    NULL, /* readpage */
    NULL, /* writepage */
    NULL, /* bmap */
    NULL, /* truncate */
    module_permission /* check for permissions */
  };


/* Directory entry */
static struct proc_dir_entry Our_Proc_File =
  {
    0, /* Inode number - ignore, it will be filled by
        * proc_register[_dynamic] */
    7, /* Length of the file name */
    "rw_test", /* The file name */
    S_IFREG | S_IRUGO | S_IWUSR,
    /* File mode - this is a regular file which
     * can be read by its owner, its group, and everybody
     * else. Also, its owner can write to it.
     *
     * Actually, this field is just for reference, it's
     * module_permission that does the actual check. It
     * could use this field, but in our implementation it
     * doesn't, for simplicity. */
    1,  /* Number of links (directories where the
         * file is referenced) */
    0, 0,  /* The uid and gid for the file -
            * we give it to root */
    80, /* The size of the file reported by ls. */
    &Inode_Ops_4_Our_Proc_File,
    /* A pointer to the inode structure for
     * the file, if we need it. In our case we
     * do, because we need a write function. */
    NULL
    /* The read function for the file. Irrelevant,
     * because we put it in the inode structure above */
  };


/* Module initialization and cleanup ******************* */

/* Initialize the module - register the proc file */
int init_module()
{
  /* Success if proc_register[_dynamic] is a success,
   * failure otherwise */
#if LINUX_VERSION_CODE >= KERNEL_VERSION(2,2,0)
  /* In version 2.2, proc_register assign a dynamic
   * inode number automatically if it is zero in the
   * structure , so there's no more need for
   * proc_register_dynamic
   */
  return proc_register(&proc_root, &Our_Proc_File);
#else
  return proc_register_dynamic(&proc_root, &Our_Proc_File);
#endif
}


/* Cleanup - unregister our file from /proc */
void cleanup_module()
{
  proc_unregister(&proc_root, Our_Proc_File.low_ino);
}

]]></programlisting>
</example>

</sect1>