352 lines
15 KiB
HTML
352 lines
15 KiB
HTML
<!--startcut ==============================================-->
|
|
<!-- *** BEGIN HTML header *** -->
|
|
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
|
|
<HTML><HEAD>
|
|
<title>Large Objects and Perl DBI LG #50</title>
|
|
</HEAD>
|
|
<BODY BGCOLOR="#FFFFFF" TEXT="#000000" LINK="#0000FF" VLINK="#0000AF"
|
|
ALINK="#FF0000">
|
|
<!-- *** END HTML header *** -->
|
|
|
|
<!-- *** BEGIN navbar *** -->
|
|
<A HREF="index.html"><IMG ALT="[ Table of Contents ]"
|
|
SRC="../gx/indexnew.gif" WIDTH=163 HEIGHT=60 ALIGN=bottom ></A>
|
|
<A HREF="../index.html"><IMG ALT="[ Front Page ]"
|
|
SRC="../gx/homenew.gif" WIDTH=163 HEIGHT=60 ALIGN=bottom></A>
|
|
<A HREF="lamb.html"><IMG ALT="[ Prev ]" SRC="../gx/back2.gif" WIDTH=41 HEIGHT=60 ALIGN=bottom></A>
|
|
<A HREF="../faq/index.html"><IMG ALT="[ Linux Gazette FAQ ]"
|
|
SRC="./../gx/dennis/faq.gif"WIDTH=163 HEIGHT=60 ALIGN=bottom></A>
|
|
<A HREF="nielsen2.html"><IMG ALT="[ Next ]" SRC="../gx/fwd.gif" WIDTH=41 HEIGHT=60 ALIGN=bottom ></A>
|
|
<!-- *** END navbar *** -->
|
|
|
|
<!--endcut ============================================================-->
|
|
|
|
<H4>
|
|
"Linux Gazette...<I>making Linux just a little more fun!</I>"
|
|
</H4>
|
|
|
|
<P> <HR> <P>
|
|
<!--===================================================================-->
|
|
|
|
<center>
|
|
<H1><font color="maroon">Large Objects and Perl DBI</font></H1>
|
|
<H4>By <a href="mailto:zing@tcu-inc.com">Mark Nielsen</a></H4>
|
|
</center>
|
|
<P> <HR> <P>
|
|
|
|
<!-- END header -->
|
|
|
|
|
|
|
|
|
|
If this document changes, it will be available here:
|
|
<a href="http://genericbooks.com/Literature/Articles/2">
|
|
http://genericbooks.com/Literature/Articles/2
|
|
</a>.
|
|
|
|
<h2>Index:</h2>
|
|
|
|
<ol>
|
|
<li><a href="#Resources"> Resources</a>
|
|
<li><a href="#Intro"> Introduction</a>
|
|
<li><a href="#DBI"> Using Perl DBI</a>
|
|
<li><a href="#Large"> Using Large Objects and Perl</a>
|
|
<li><a href="#Web">Using the database server with a webserver.</a>
|
|
<li><a href="#Comments"> Comments</a>
|
|
</ol>
|
|
<hr> <a name="Resources"></a> <h2>Resources</h2>
|
|
I assume that you have DBI and DBD::Pg installed with Perl.
|
|
<ol>
|
|
<li> man DBI
|
|
<li> man DBD::Pg
|
|
<li> <a href="ftp://ftp.perl.com/CPAN/modules/by-category/07_Database_Interfaces/">
|
|
ftp://ftp.perl.com/CPAN/modules/by-category/07_Database_Interfaces/
|
|
</a>
|
|
<li> <a href="http://www.postgresql.org/cgi/htsearch?config=pgsql-questions&restrict=&exclude=&method=and&format=builtin-long&sort=score&words=large+object+dbi"> A search for "large" and "dbi"</a> on one of the postgresql
|
|
newsgroups.
|
|
<li> <a href="http://www.postgresql.org/doxlist.html"> http://www.postgresql.org/doxlist.html</a> -- A chapter on Large Objects in the Programmer's
|
|
Guide for PostgreSQL.
|
|
<li> <a href="http://www.postgresql.org/docs/programmer/largeobjects3386.htm"> Two examples using psql</a> which explain how to use large objects.
|
|
</ol>
|
|
|
|
<hr> <a name="Intro"></a> <h2>Introduction</h2>
|
|
I work at ZING Is Not GNU. We sell and distribute free books or literature.
|
|
Well, besides selling books inexpensively, we also want people to download
|
|
the books. This creates a problem when you need to take megabytes of data
|
|
and put them into a database. I have programmed putting large amounts
|
|
of data into databases before, but only 100k or less of data per transaction.
|
|
This was the first time I needed to put really large files into a database.
|
|
My biggest problem was the fact it really isn't
|
|
documented well when you use Perl. Also, I never needed to use
|
|
Large Objects with PostgreSQL before. I have used other database servers
|
|
for Large Objects, but not PostgreSQL.
|
|
<p>
|
|
What are Large Objects? Large Objects are "things" (text or binary data) which
|
|
cannot fit into a normal field in a table. For example, a 100 megabyte file
|
|
cannot really fit into a field in a table.
|
|
<p>
|
|
What were the technologies I was using? RedHat Linux, postgresql-6.5.3,
|
|
Perl 5 (with DBI, DBD::Pg, and Pg), and apache_1.3.9.
|
|
<p>
|
|
In general, how do you use Large Objects in PostgreSQL? You can save
|
|
large objects like "files" where the database server will let you input
|
|
a file and it will give you a number in return, and when you want to
|
|
retrieve the data, you use the number to export your data to a temporary
|
|
file on your hard drive. It is kind of weird. In order to extract a Large
|
|
Object, it sort of takes two steps. First, you copy the data to a file
|
|
on your computer, and then you can read it. The problem is, from my
|
|
point of view, is that data normally is read once. Here, it is read twice,
|
|
to make the file, and then to read the file. There isn't any way around this
|
|
(as far I as know), but I am just new to Large Objects in PostgreSQL (I have
|
|
used them in other database servers before though), so perhaps there
|
|
are better ways.
|
|
|
|
|
|
|
|
<hr> <a name="DBI"></a> <h2>Using Perl DBI</h2>
|
|
<a href="http://www.perl.com">
|
|
PERL</a> is a programming language used by many web and database professionals.
|
|
It took a couple of years, but now people are finally recognizing that
|
|
free and open software, like
|
|
<a href="http://www.perl.com">PERL</a>,
|
|
<a href="http://www.php.net">PHP</a>, and
|
|
<a href="http://www.python.org">Python</a>
|
|
can be used for commercial companies
|
|
as an alternative to bloated, inefficient, and unstable commercial
|
|
programming languages (which most of the time you don't have source code
|
|
for, and hence, you can never be sure what you are using truly does
|
|
everything it claims to do -- there is one company I am
|
|
talking about in particular). Being a contractor, I have noticed an
|
|
explosion of Perl jobs (it pleases me greatly to see the rise of ethically
|
|
clean software), and hence, I am sure this will be useful to other contractors
|
|
out there who would prefer to use PostgreSQL over other alternatives.
|
|
Perl DBI is a generic database interface for all database
|
|
servers or files. In order to connect to a specific database, you also
|
|
need a Perl DBD driver. For example, to connect to PostgreSQL, I use Perl
|
|
DBI and Perl DBD:Pg. Here is a brief example of how I print out
|
|
the first and last names from a PostgreSQL database using the table
|
|
"people".
|
|
<pre>
|
|
#!/usr/bin/perl
|
|
|
|
use DBI;
|
|
use vars qw($dbh);
|
|
### zing is the database I am connecting to.
|
|
$dbh ||= DBI->connect("dbi:Pg:dbname=zing");
|
|
|
|
my $Command = "select first_name,last_name from people
|
|
sort by last_name,first_name";
|
|
|
|
my $sth = $dbh->prepare($Command);
|
|
my $Result = $sth->execute;
|
|
|
|
while (my @row_ary = $sth->fetchrow_array)
|
|
{print "<br> $row_ary[0] $row_ary[1]\n";}
|
|
</pre>
|
|
The problem with PERL DBI is that the Large Object interface isn't
|
|
fined tuned yet, and you have to use the specific DBD driver for each
|
|
type of database. The problem with DBD::Pg is that Large Objects is
|
|
not well documented, and it took me a long time looking through
|
|
newsgroups to find the answer I was looking for. Eventually, I also
|
|
did manage to find
|
|
<a href="http://www.postgresql.org/docs/programmer/largeobjects3386.htm">
|
|
this note on Large Objects</a>.
|
|
|
|
<hr> <a name="Large"></a> <h2>Using Large Objects and Perl</h2>
|
|
ONE WARNING: The reason why temporary files are actually a good thing is
|
|
for the following reason, for big files, if you suck the data all into
|
|
memory at once, Perl running under Apache using mod_perl will use up
|
|
a lot of system memory and it won't give it back (even though Perl will
|
|
reuse the memory itself). Imagine if you have 10 people downloading
|
|
10 megabyte files and your Perl script loads the files into memory
|
|
before it prints them (instead of just printing them line-by-line). Apache
|
|
will use 100 megabytes of system memory (actually Perl will) and it won't
|
|
give it back. This can be bad. In other Database servers using Large Objects,
|
|
I could load the data directly into memory. I would not want to do that
|
|
with really large files anyways. Read this
|
|
<a href="http://perl.apache.org/guide/performance.html">
|
|
performance guide</a> for more info.
|
|
<p>
|
|
|
|
Okay, how do we import and export large object from PostgreSQL?
|
|
This following example comes straight from the
|
|
<a href="http://www.postgresql.org/docs/programmer/largeobjects3386.htm">this
|
|
PostgreSQL documentation</a>. This example uses the program psql.
|
|
|
|
<pre>
|
|
CREATE TABLE image (
|
|
name text,
|
|
raster oid
|
|
);
|
|
|
|
INSERT INTO image (name, raster)
|
|
VALUES ('beautiful image', lo_import('/etc/motd'));
|
|
|
|
SELECT lo_export(image.raster, "/tmp/motd") from image
|
|
WHERE name = 'beautiful image';
|
|
</pre>
|
|
<p> Now, we need to convert this into perl. Here is a perl script which
|
|
would do the exact same thing.
|
|
<pre>
|
|
#!/usr/bin/perl
|
|
|
|
use vars qw($dbh);
|
|
$dbh ||= DBI->connect("dbi:Pg:dbname=zing");
|
|
|
|
my $Command = "INSERT INTO image (name, raster)
|
|
VALUES ('beautiful image', lo_import('/etc/motd'));";
|
|
|
|
my $sth = $dbh->prepare($Command);
|
|
$sth->execute();
|
|
|
|
|
|
$Command = "SELECT lo_export(image.raster, '/tmp/motd') from image
|
|
WHERE name = 'beautiful image'";
|
|
$sth = $dbh->prepare($Command);
|
|
$sth->execute();
|
|
</pre>
|
|
<p>
|
|
Those two examples do the following. The first command loads the file
|
|
"/etc/motd" into a table. The second command takes the data in the table
|
|
and exports it to a file called "/tmp/motd". If you want to get the "oid"
|
|
of data in the table and export it to a file, you could also do this.
|
|
<pre>
|
|
|
|
## This command will return a numeric "oid" number
|
|
$Command = "SELECT raster from image WHERE name = 'beautiful image'";
|
|
$sth = $dbh->prepare($Command);
|
|
$sth->execute();
|
|
my @row_ary = $sth->fetchrow_array;
|
|
my $Oid = $row_ary[0];
|
|
|
|
## This command will export the data with the "oid" to a file
|
|
$Command = "SELECT lo_export($Oid, '/tmp/motd') from image";
|
|
$sth = $dbh->prepare($Command);
|
|
$sth->execute();
|
|
</pre>
|
|
|
|
<p>
|
|
For live examples, ZING at
|
|
<a href="http://www.genericbooks.com">www.genericboosk.com</a>
|
|
has all of its Perl Scripts available for public viewing. Look for
|
|
scripts that view documents that are extracted from the database.
|
|
ZING now has setup scripts to let people upload and download documents,
|
|
so we should have real live examples floating around on the website.
|
|
|
|
<hr> <a name="Web"></a> <h2>Using the database server with a webserver.</h2>
|
|
Okay, what problems can you encounter when you use a webserver to
|
|
input/output the data from a database server? Well, the biggest problem
|
|
I saw, was that if the database server and the webserver run under two
|
|
different accounts (like "www" and "pgsql"), the webserver and database
|
|
server may have problem reading the others temporary files.
|
|
Here is a list of problems
|
|
and their solutions.
|
|
|
|
<ol>
|
|
<li> The webserver cannot upload the file to PostgreSQL because it doesn't
|
|
have super user privileges to use lo_import and lo_export. Well, give the
|
|
webserver its own database server (same account), give the webserver
|
|
super user status with the database server, or look into client-side
|
|
commands that I never had a chance to look at yet, concerning lo_import
|
|
and lo_export.
|
|
<li> The webserver cannot delete the exported files once the database
|
|
server has exported them and the webserver doesn't need them anymore.
|
|
Make the webserver and database server run under the same account,
|
|
setup a "group" where the webserver can delete the database server
|
|
files, or setup a cron job to delete the files every once in a while and
|
|
save the files by the pid of the webserver so that they will tend
|
|
to get overwritten by new processes later. ZING just saves files by their pid
|
|
number so that the next time a document is pulled, it will overwrite the
|
|
previous one (which isn't needed anymore).
|
|
</ol>
|
|
<p> At ZING, we export the files by the pid number of the child process
|
|
of the webserver. When the webserver needs to export another document,
|
|
it hands the process to one of its children, and if the child already
|
|
has exported a document before, it will just overwrite the previous one
|
|
which makes it so we don't end up with tons of exported undeleted files.
|
|
Then, once an hour, we go through and delete any files older than 15 minutes.
|
|
It is not as elegant as I would like, but it works fine.
|
|
<p>
|
|
In general, it is a bad idea to let your webserver have super user
|
|
status or to have the webserver and database server run under the same
|
|
username. For security, you don't want your webserver having the power
|
|
to blow away the database server. Oh well.
|
|
|
|
<hr> <a name="Comments"></a> <h2>Comments</h2>
|
|
I really don't like the way Large Objects are handled in PostgreSQL, or
|
|
any other database server I have used. There should be a way to treat
|
|
importing and exporting data from a database server as STDIN or STDOUT
|
|
where you can just grab the stuff line by line and not the whole darn
|
|
thing at once. I don't like the fact that we have to use intermediate files
|
|
to get to the data. It would be nice if we could choose to directly dump data
|
|
directly into memory or use temporary files if the data is too big to hold
|
|
in memory. CHOICE is the key word here.
|
|
What does this mean? Since PostgreSQL is the best
|
|
free and open database server out there, help the guys out by
|
|
<a href="http://www.postgresql.org/devel-contrib.html">
|
|
becoming a developer for PostgreSQL!</a> Perhaps there is a way to directly
|
|
dump Large Objects into memory using PostgreSQL, and if there is, write
|
|
a follow-up to this article and prove me wrong! Actually, being able to
|
|
get Large Objects one line at a time would really be cool.
|
|
<p>
|
|
Anyways, I will try to find better solutions, so
|
|
<a href="mailto:zing@tcu-inc.com">let me know</a> if you hear
|
|
of any!
|
|
<p>
|
|
Large Objects have always been a pain for me. I have always wanted to
|
|
use Large Objects in PostgreSQL. Now that I had a reason, I finally
|
|
did it. Hopefully it will save other people headaches. If it saved you
|
|
from headaches, donate time or money to
|
|
<a href="http://www.genericbooks.com">ZING</a>, or do something charitable
|
|
for some other cause! If we all do a little, it makes a big impact.
|
|
<p>
|
|
My next goal is to do handle Large Objects in PostgreSQL using the
|
|
programming languages PHP and Python. After that, I want to see if there
|
|
is a way to use Large Objects without being a super user.
|
|
After that, I want to compare this
|
|
to MySQL. For licensing reasons and since PostgreSQL has always been
|
|
100% free and open, I prefer PostgreSQL over MySQL. However I want to
|
|
compare and contrast them to help make PostgreSQL better. Recently, MySQL
|
|
has relaxed its license, but I am going to stick with PostgreSQL since
|
|
they have always had the best license.
|
|
<hr>
|
|
|
|
<a href="http://www.tcu-inc.com/mark">Mark Nielsen</a> works for
|
|
<a href="http://www.tcu-inc.com">The Computer Underground</a> as a
|
|
receptionist and as a
|
|
book binder at <a href="http://www.genericbooks.com">
|
|
ZING</a>.
|
|
In his spare time, he does volunteer stuff, like writing
|
|
these documents for The Linux Gazette (and other magazines).
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
<!-- *** BEGIN copyright *** -->
|
|
<P> <hr> <P>
|
|
<H5 ALIGN=center>
|
|
|
|
Copyright © 2000, Mark Nielsen<BR>
|
|
Published in Issue 50 of <i>Linux Gazette</i>, February 2000</H5>
|
|
<!-- *** END copyright *** -->
|
|
|
|
<!--startcut ==========================================================-->
|
|
<P> <HR> <P>
|
|
<!-- *** BEGIN navbar *** -->
|
|
<A HREF="index.html"><IMG ALT="[ Table of Contents ]"
|
|
SRC="../gx/indexnew.gif" WIDTH=163 HEIGHT=60 ALIGN=bottom ></A>
|
|
<A HREF="../index.html"><IMG ALT="[ Front Page ]"
|
|
SRC="../gx/homenew.gif" WIDTH=163 HEIGHT=60 ALIGN=bottom></A>
|
|
<A HREF="lamb.html"><IMG ALT="[ Prev ]" SRC="../gx/back2.gif" WIDTH=41 HEIGHT=60 ALIGN=bottom></A>
|
|
<A HREF="../faq/index.html"><IMG ALT="[ Linux Gazette FAQ ]"
|
|
SRC="./../gx/dennis/faq.gif"WIDTH=163 HEIGHT=60 ALIGN=bottom></A>
|
|
<A HREF="nielsen2.html"><IMG ALT="[ Next ]" SRC="../gx/fwd.gif" WIDTH=41 HEIGHT=60 ALIGN=bottom ></A>
|
|
<!-- *** END navbar *** -->
|
|
</BODY></HTML>
|
|
<!--endcut ============================================================-->
|