Software Auto'ing mini-HOWTO
Mark Hoebeke
INRA Unité Statistique & Génome
Mark.Hoebeke(at)jouy.inra.fr
Copyright 2003 Mark Hoebeke

Permission is granted to copy, distribute and/or modify this document under the terms of the Open Publication License, version 2.0.

Revision 0.1, 2003-07-22, mh: First draft.

This mini-HOWTO aims to help developers get started with the Gnu autoconf, automake and libtool utilities. Using a sample project, it describes a recipe for leveraging these tools to create portable, distributable programs. Pointers to more in-depth information are also provided.
Why bother with the Gnu Auto tools?

Our egos as software developers make us all dream of zillions of users happily running what our brains painstakingly gave birth to. Our legendary laziness, on the other hand, makes us balk at maintaining a bunch of configuration files, each tailored to make our programs compile on a specific hardware/software combination.

Enter the Gnu Auto tools (the term used here to designate autoconf, automake and libtool). With a little effort, these utilities let us roll our own tarballs, which any user can then install by issuing the universal command triplet:

./configure
make
make install

This mini-HOWTO is a guided tour of the steps it takes to adapt an existing software project to the Gnu Auto tools. It starts with the installation of these utilities, then describes how to organize a project's directory hierarchy to ease the job of writing the configuration files. Finally, each tool's main functionalities and usage are examined in turn.
Installing the Gnu Auto tools

As the Gnu Auto tools are installed on the vast majority of systems, this section is provided for the unlucky few on which they are not. To determine whether installation is needed, check the output of the following commands:

autoconf --version
automake --version
libtoolize --version

If each command prints out its version information, you can skip the remainder of this section. If one or more commands fail with a "command not found" error, the corresponding tools have to be installed.
Installing from binary packages

Most Linux distributions include pre-packaged bundles of the Gnu Auto tools. For example, both Mandrake and RedHat provide RPMs called autoconf-*.rpm, automake-*.rpm and libtool-*.rpm (where the asterisk stands for the specific release number). Debian also has autoconf-*.deb, automake-*.deb and libtool-*.deb packages. Finally, Solaris .pkg files can be found on the Sunfreeware site. Installation can thus be performed with your favourite software management tool.
Installing from sources

If you prefer to install the Gnu Auto tools from the source tarballs, they can be fetched from your nearest GNU ftp mirror site (the list of mirrors is available at ).
A typical directory layout

The average software project is made of different kinds of files: source code, documentation, data and possibly script files. Adopting a consistent directory layout that separates files according to their types greatly eases overall compilation and installation management. The hierarchy used throughout this document is as follows:

projectname\
 |---src\
 |---doc\
 |---data\
 |---scripts\

where projectname denotes the top-level directory of our project. If you wish to benefit from the Gnu Auto tools for an existing project, I suggest you create this directory layout and move the files to the appropriate locations.

As an example, I'll take a small project whose goal is to generate a random string of four letters (A, C, G and T), given their occurrence probabilities and the total string length. The string is preceded by a small header describing how it was generated. (Yes, this utility generates DNA sequences: the output is in FASTA format, one of the most widely used formats for storing sequence information in bioinformatics.)

Initially, the project comprises one source file, main_seqgen.c (listed in full in the Sample project section at the end of this document), and a Makefile that compiles it into an executable called seqgen. Both files are located in the src directory. Here are the Makefile's contents (the command line must begin with a tab character):

seqgen: main_seqgen.c
	gcc -o seqgen main_seqgen.c
Autoconf'ing

The main goal of using Gnu Autoconf is to ensure that compilation will be carried out correctly on most platforms. For our toy project, at least one parameter has to be taken into account: the name of the C compiler. In the Makefile, it is hardcoded as gcc. However, users may wish to use other compilers instead (icc, the Intel compiler, or cc, the Solaris C compiler). To achieve compiler independence, we have to turn the compiler's name into a variable whose value will be substituted when the compilation is configured. That's exactly what Gnu Autoconf will do for us.

The first step is to replace the name of the compiler with the corresponding variable name written à la Gnu Autoconf. Gnu Autoconf variables are simply names surrounded by '@' characters. Moreover, Gnu Autoconf already provides a variable for the C compiler, namely @CC@. Hence, after transformation our Makefile becomes:

seqgen: main_seqgen.c
	@CC@ -o seqgen main_seqgen.c

Now how do we get @CC@ substituted with its correct value? This is taken care of by the configure script we are about to generate (remember, ./configure is the first command run when installing a project from a source tarball). configure itself is generated by the autoconf command from a file called configure.in.

So we have moved a step further, but we still need to write this configure.in file. We could do so with our favourite text editor, but we're better off using the autoscan command, which is part of Gnu Autoconf. All we have to do is run autoscan in our project's root directory. This yields a configure.scan file, which provides a skeleton for the compilation configuration file configure.in. Let's rename configure.scan to configure.in and edit the latter.

This configuration file contains a set of macros whose purpose is to detect what kind of machine our project is compiled on. Macros are written in the (exotic?) m4 language, but Gnu Autoconf comes with a library of macros covering the needs of most software projects. (You can browse the list of existing macros in the "Existing Tests" node of the Gnu Autoconf manual, available through the info autoconf command.) Regarding our project, we need to insert the specific macro for determining the available C compiler. This macro is called AC_PROG_CC and stores the name of the C compiler in the CC variable. All we have to do is insert the line:

AC_PROG_CC

immediately below the line containing:

dnl Checks for programs.

The steps leading to the generation of the configure script are:

1. cd to the project root directory.
2. Run autoscan to generate configure.scan.
3. Copy configure.scan to configure.in.
4. Edit configure.in.
5. Run autoconf to generate configure from configure.in.

One last step is required before issuing the configure command. configure takes files having a .in suffix as input, performs variable substitution on them and writes the corresponding files without the .in suffix as output. So we'd better rename our Makefile to Makefile.in for it to be processed by configure.

Caution: running ./configure with no Makefile.in will cause any existing Makefile to be overwritten and emptied. From now on, do not edit the Makefile; any modifications have to be made to Makefile.in.

If things turned out right, you should be able to cd to the src directory and issue the make command, which compiles the executable with the locally available C compiler.
Automake'ing
Libtoolize'ing
Further information

A more in-depth tutorial on the usage of the Gnu Auto tools can be found in the book GNU Autoconf, Automake, and Libtool by Gary V. Vaughan, Ben Elliston, Tom Tromey and Ian Lance Taylor, published by New Riders. Manuals for Gnu Autoconf and automake are also available on Gnu's Web site at .
Sample project

main_seqgen.c file.

#include <assert.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>
#include <unistd.h>

// Command-line option strings.
static const char *a_opt = "-a";
static const char *c_opt = "-c";
static const char *g_opt = "-g";
static const char *t_opt = "-t";
static const char *seed_opt = "-s";
static const char *help_opt = "-h";

// Precision threshold for float comparisons.
static const float epsilon = 1e-6;

// Command-line usage string.
static const char *usage_string="Usage : %s [-h] [-a fa -c fc -g fg -t ft]"
  " [-s seed] seqlen\n";

// FASTA sequence header string.
static const char *fasta_header =">seqgen random sequence\n";

float max(float n1, float n2)
{
  return (n1 > n2 ? n1 : n2);
}

// If optstring[index] is the option nucstring, store the value following
// it in *freq. Returns 1 if the option matched and its value lies in
// [0, 1]; otherwise returns 0 and leaves *freq untouched.
int get_frequency(char **optstring, int index, const char *nucstring,
                  int nuclen, float *freq)
{
  int freq_ok = 0;

  if (!strncasecmp (nucstring, optstring[index], nuclen)) {
    float value = atof (optstring[index+1]);
    if (value >= 0.0 && value <= 1.0) {
      *freq = value;
      freq_ok = 1;
    }
  }
  return freq_ok;
}

int compute_frequencies(float *freq_a, float *freq_c, float *freq_g,
                        float *freq_t)
{
  int freqs_ok = 0;
  int auto_freqs = 0;
  float sum_freqs = 0.0;

  // Sum the frequencies given on the command line and count the others.
  if (*freq_a >= 0.0 && *freq_a <= 1.0) sum_freqs += *freq_a;
  else auto_freqs++;
  if (*freq_c >= 0.0 && *freq_c <= 1.0) sum_freqs += *freq_c;
  else auto_freqs++;
  if (*freq_g >= 0.0 && *freq_g <= 1.0) sum_freqs += *freq_g;
  else auto_freqs++;
  if (*freq_t >= 0.0 && *freq_t <= 1.0) sum_freqs += *freq_t;
  else auto_freqs++;

  // Share the remaining probability equally among unspecified letters.
  if (*freq_a < 0) *freq_a = max ((1.0 - sum_freqs)/auto_freqs, 0.0);
  if (*freq_c < 0) *freq_c = max ((1.0 - sum_freqs)/auto_freqs, 0.0);
  if (*freq_g < 0) *freq_g = max ((1.0 - sum_freqs)/auto_freqs, 0.0);
  if (*freq_t < 0) *freq_t = max ((1.0 - sum_freqs)/auto_freqs, 0.0);

  if (fabs (*freq_a + *freq_c + *freq_g + *freq_t - 1.0) <= epsilon)
    freqs_ok = 1;
#ifdef DEBUG
  fprintf (stderr,"Frequencies: a = %f\tc = %f\tg = %f\tt = %f\n",
           *freq_a,*freq_c,*freq_g,*freq_t);
#endif
  return (freqs_ok);
}

char *
generate_sequence (int seqlen, float freq_a, float freq_c, float freq_g,
                   float freq_t)
{
  const char *nuc_tab = "acgt";
  float freq_tab[4];
  int i = 0;
  int header_len = strlen (fasta_header);
  char *sequence;

  // Cumulative frequencies: each letter is chosen by locating a uniform
  // random value among these running sums.
  freq_tab[0] = freq_a;
  freq_tab[1] = freq_tab[0] + freq_c;
  freq_tab[2] = freq_tab[1] + freq_g;
  freq_tab[3] = freq_tab[2] + freq_t;

  // Room for the header, the sequence, a trailing newline and the NUL.
  sequence = (char *) calloc (header_len+seqlen+2, sizeof (char));
  if (sequence) {
    memcpy (sequence, fasta_header, header_len);
    for (i = 0; i < seqlen; i++) {
      double randval = drand48 ();
      int index = 0;
      // The index < 3 guard keeps rounding errors (freq_tab[3] slightly
      // below 1.0) from running past the end of the tables.
      while (index < 3 && randval > freq_tab[index])
        index++;
#ifdef DEBUG
      assert (index < 4);
#endif
      sequence[header_len+i] = nuc_tab[index];
    }
    sequence[header_len+seqlen] = '\n';
  }
  return sequence;
}

int
main (int argc, char **argv)
{
  float freq_a = -1.0;
  float freq_c = -1.0;
  float freq_g = -1.0;
  float freq_t = -1.0;
  int seed = getpid ();
  int seq_len = -1;
  int index;
  char *sequence;

  // -h anywhere on the command line prints the usage string and exits.
  for (index = 1; index < argc; index++)
    if (!strcasecmp (argv[index], help_opt)) {
      fprintf (stdout, usage_string, argv[0]);
      return (EXIT_SUCCESS);
    }

  if (argc < 2) {
    fprintf (stderr, usage_string, argv[0]);
    return (EXIT_FAILURE);
  }

  // The last argument is always the sequence length.
  seq_len = atoi (argv[argc-1]);
  if (seq_len <= 0) {
    fprintf (stderr, "%s : sequence length must be positive.\n", argv[0]);
    return (EXIT_FAILURE);
  }

  // Options come as "-x value" pairs before the sequence length. Testing
  // them in one if/else chain examines each argument exactly once, so a
  // non-matching test never resets a frequency set earlier.
  index = 1;
  while (index < argc - 1) {
    if (get_frequency (argv, index, a_opt, strlen (a_opt), &freq_a)
        || get_frequency (argv, index, c_opt, strlen (c_opt), &freq_c)
        || get_frequency (argv, index, g_opt, strlen (g_opt), &freq_g)
        || get_frequency (argv, index, t_opt, strlen (t_opt), &freq_t))
      index += 2;
    else if (!strncasecmp (seed_opt, argv[index], strlen (seed_opt))) {
      seed = atoi (argv[index+1]);
      index += 2;
    }
    else {
      fprintf (stderr, "%s : unknown option %s.\n", argv[0], argv[index]);
      index++;
    }
  }

  if (!compute_frequencies (&freq_a, &freq_c, &freq_g, &freq_t)) {
    fprintf (stderr, "%s : wrong frequencies given.\n", argv[0]);
    return (EXIT_FAILURE);
  }

  srand48 (seed);
  sequence = generate_sequence (seq_len, freq_a, freq_c, freq_g, freq_t);
  if (!sequence) {
    fprintf (stderr, "%s : unable to generate sequence.\n", argv[0]);
    return (EXIT_FAILURE);
  }
  fprintf (stdout, "%s", sequence);
  free (sequence);
  return (EXIT_SUCCESS);
}