old-www/LDP/LG/issue70/williams.html

512 lines
23 KiB
HTML

<!--startcut ==============================================-->
<!-- *** BEGIN HTML header *** -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML><HEAD>
<title>Using Aggregate Functions and Operators in PostgreSQL LG #70</title>
</HEAD>
<BODY BGCOLOR="#FFFFFF" TEXT="#000000" LINK="#0000FF" VLINK="#0000AF"
ALINK="#FF0000">
<!-- *** END HTML header *** -->
<CENTER>
<A HREF="http://www.linuxgazette.com/">
<IMG ALT="LINUX GAZETTE" SRC="../gx/lglogo.png"
WIDTH="600" HEIGHT="124" border="0"></A>
<BR>
<!-- *** BEGIN navbar *** -->
<IMG ALT="" SRC="../gx/navbar/left.jpg" WIDTH="14" HEIGHT="45" BORDER="0" ALIGN="bottom"><A HREF="tranter.html"><IMG ALT="[ Prev ]" SRC="../gx/navbar/prev.jpg" WIDTH="16" HEIGHT="45" BORDER="0" ALIGN="bottom"></A><A HREF="index.html"><IMG ALT="[ Table of Contents ]" SRC="../gx/navbar/toc.jpg" WIDTH="220" HEIGHT="45" BORDER="0" ALIGN="bottom" ></A><A HREF="../index.html"><IMG ALT="[ Front Page ]" SRC="../gx/navbar/frontpage.jpg" WIDTH="137" HEIGHT="45" BORDER="0" ALIGN="bottom"></A><A HREF="http://www.linuxgazette.com/cgi-bin/talkback/all.py?site=LG&article=http://www.linuxgazette.com/issue70/williams.html"><IMG ALT="[ Talkback ]" SRC="../gx/navbar/talkback.jpg" WIDTH="121" HEIGHT="45" BORDER="0" ALIGN="bottom" ></A><A HREF="../faq/index.html"><IMG ALT="[ FAQ ]" SRC="./../gx/navbar/faq.jpg"WIDTH="62" HEIGHT="45" BORDER="0" ALIGN="bottom"></A><A HREF="lg_backpage70.html"><IMG ALT="[ Next ]" SRC="../gx/navbar/next.jpg" WIDTH="15" HEIGHT="45" BORDER="0" ALIGN="bottom" ></A><IMG ALT="" SRC="../gx/navbar/right.jpg" WIDTH="15" HEIGHT="45" ALIGN="bottom">
<!-- *** END navbar *** -->
<P>
</CENTER>
<!--endcut ============================================================-->
<H4 ALIGN="center">
"Linux Gazette...<I>making Linux just a little more fun!</I>"
</H4>
<P> <HR> <P>
<!--===================================================================-->
<center>
<H1><font color="maroon">Using Aggregate Functions and Operators in PostgreSQL</font></H1>
<H4>By <a href="mailto:brw@brw.net">Branden R Williams</a></H4>
</center>
<P> <HR> <P>
<!-- END header -->
<b>Preface:</b><br>
This article assumes the reader can
do basic SELECT, INSERT, UPDATE, and DELETE queries to and from a SQL
database. If you are not sure on how these functions work, please read a
tutorial on how these types of queries work. Specifically if you can use
a SELECT query, then you are armed with enough information to read through
this document with a high level of understanding. That said, lets get on
to aggregate functions!
</p>
<p>
<b>Summary:</b><br>
In the beginning of this rather extensive article, I will cover how to use the five most common and
basic aggregate functions on PostgreSQL. Those functions are count(),
min(), max(), avg(), and sum(). Then I will cover how to use several common operators
that exist for your use in PostgreSQL. Depending on your development
environment, a good philosophy to practice is letting your DataBase
Management System (DBMS) craft your results so that they are immediately
usable in your code with little or no processing. Good examples for
the reasoning behind this philosophy are exhibited when using aggregates. Finally, I will cover how to use several common operators with our
aggregate function&nbsp;counterparts&nbsp;that exist for your use in PostgreSQL.
Depending on your development environment, a good philosophy to practice is
letting your DataBase Management System (DBMS) craft your results so that they
are immediately usable in your code with little or no processing. In this article, I will demonstrate how to use some simple
operators in your queries to craft data exactly as you need it.
</p>
<p>
<b>What is an aggregate function?</b><br>
An aggregate function is a function such as count() or sum() that you can
use to calculate totals. In writing expressions and in programming, you
can use SQL aggregate functions to determine various statistics and
values. Aggregate functions can greatly reduce the amount of coding that
you need to do in order to get information from your database.
</p>
<p>
(Excerpt from the PostgreSQL 7.1 manual)<br>
<tt>aggregate_name (expression)<br>
aggregate_name (ALL expression)<br>
aggregate_name (DISTINCT expression)<br>
aggregate_name ( * )<br></tt>
</p>
<p>
where aggregate_name is a previously defined aggregate, and expression is
any expression that does not itself contain an aggregate expression.
</p>
<p>
The first form of aggregate expression invokes the aggregate across all
input rows for which the given expression yields a non-NULL value.
(Actually, it is up to the aggregate function whether to ignore NULLs or
not --- but all the standard ones do.) The second form is the same as the
first, since ALL is the default. The third form invokes the aggregate for
all distinct non-NULL values of the expression found in the input rows.
The last form invokes the aggregate once for each input row regardless of
NULL or non-NULL values; since no particular input value is specified, it
is generally only useful for the count() aggregate function.
</p>
<p>
Consider this example. You are writing a program which tracks sales of
books. You have a table called the "sale" table that contains the book
title, book price, and date of purchase. You want to know what the total
amount of money that you made by selling books for the month of March
2001. Without aggregate functions, you would have to select all the rows
with a date of purchase in March 2001, iterate through them one by one to
calculate the total. Now if you only have 10 rows, this does not make a
big difference (and if you only sell 10 books a month you should hope
those are pretty high dollar!). But consider a bookstore that sells on
average 2000 books a month. Now iterating through each row one by one
does not sound so efficient does it?
</p>
<p>
With aggregate functions you can simply select the sum() of the book price
column for the month of March 2001. Your query will return one value and
you will not have to iterate through them in your code!
</p>
<p>
<b>The SUM() function.</b><br>
The sum() function is very useful as described in the above example.
Based on our fictitious table, consider the following.
</p>
<pre>
table sale (
book_title varchar(200),
book_price real,
date_of_purchase datetime
)
</pre>
<p>
Without aggregate functions:<br>
<tt>SELECT * FROM sale WHERE date_of_purchase BETWEEN '03/01/2001' AND
'04/01/2001';</tt>
</p>
<p>
This returns all rows which correspond to a sale in the month of March
2001.
</p>
<p>
With aggregate functions:<br>
<tt>SELECT SUM(book_price) AS total FROM sale WHERE date_of_purchase BETWEEN
'03/01/2001' AND '04/01/2001';</tt>
</p>
<p>
This returns a single row with a single column called total containing the
total books sold in the month of March 2001.
</p>
<p>
You can also use mathematical operators within the context of the sum()
function to add additional functionality. Say for instance, you wanted to
get the value of 20% of your sum of book_price as all of your books have a
20% markup built in to the price. Your aggregate would look like:
</p>
<p>
<tt>SELECT SUM(book_price) AS total, SUM(book_price * .2) AS profit FROM sale
WHERE date_of_purchase BETWEEN '03/01/2001' AND '04/01/2001';</tt>
</p>
<p>
If you look on a grander scale, you will see even more uses for the sum()
function. For example calculating commissions, generating detailed
reports, and generating running statistical totals. When writing a
report, it is much easier to have SQL do the math for you and simply
display the results than attempting to iterate through thousands or
millions of records.
</p>
<p>
<b>The count() function.</b><br>
Yet another useful aggregate function is count(). This function allows
you to return the number of rows that match a given criteria. Say for
example you have a database table that contains news items and you want to
display your current total of news items in the database without selecting
them all and iterating through them one by one. Simply do the following:
</p>
<p>
<tt>SELECT COUNT(*) AS myCount FROM news;</tt>
</p>
<p>
This will return the total number of news articles in your database.
</p>
<p>
<b>The MAX() and MIN() functions.</b><br>
These two functions will simply return the maximum or minimum value in a
given column. This may be useful if you want to very quickly know the
highest priced book you sold and the lowest price book you sold (back to
the bookstore scenario). That query would look like this.
</p>
<p>
<tt>SELECT MAX(book_price) AS highestPrice, MIN(book_price) AS lowestPrice
FROM sale WHERE date_of_purchase BETWEEN '03/01/2001' AND '04/01/2001';</tt>
</p>
<p>
Again, this simply prevents you from having to select EVERYTHING from the
database, iterate through each row one by one, and calculate the
result by hand.
</p>
<p>
<b>The AVG() function.</b><br>
This particular aggregate is definitely very useful. Any time you would
like to generate an average value for any number of fields, you can use
the avg() aggregate. Without aggregates, you would once again have to
iterate through all rows returned, sum up your column and take a count of
the number of rows, then do your math. In our bookstore example, say you
would like to calculate the average book price that was sold during March
2001. Your query would look like this.
</p>
<p>
<tt>SELECT AVG(book_price) AS avg_price FROM sale WHERE date_of_purchase
BETWEEN '03/01/2001' AND '04/01/2001';</tt>
</p>
<p>
<b>What is an operator?</b><br>
An operator is something that performs on operation or function on the
values that are around it. For an example of this, let's look at
Mathematical Operators. If you wanted to subtract the values from
two fields in a select statement, you would use the subtraction (-)
operator.
</p>
<p>
<tt>SELECT salesperson_name, revenue - cost AS commission FROM sales;</tt>
</p>
<p>
What will be returned is the results of the revenue each sales person
brought in minus the cost of the products that they sold which will yield
their commission amount.
</p>
<p>
<table border=1 cellpadding=5>
<tr><th>salesperson_name</th><th>commission</th></tr>
<tr><td>Branden Williams</td><td>234.43</td></tr>
<tr><td>Matt Springfield</td><td>87.74</td></tr>
</table>
</p>
<p>
Operators can be VERY useful when you have complex calculations or a
need to produce the exact results you need without having your script
do any text or math based processing.
</p>
<p>
Let's refer to our bookstore example. You are writing a program which
will show you the highest margin books (largest amount of profit per book)
so that your marketing monkey can place them closer to the door of the
store. Instead of doing your math on the fly while iterating through
your result set, you can have the result set display the correct
information for you.
</p>
<p>
<pre>table inventory (
book_title varchar(200),
book_cost real,
selling_price real
)</pre>
</p>
<p>
SELECT book_title, selling_price - book_cost AS profit ORDER BY
profit DESC;
</p>
<p>
Which will produce results similar to the following.
</p>
<p>
<table border=1 cellpadding=5>
<tr><th>book_title</th><th>profit</th></tr>
<tr><td>How To Scam Customers Into Buying Your Books</td><td>15.01</td></tr>
<tr><td>How To Crash Windows 2000</td><td>13.84</td></tr>
</table>
</p>
<p>
Now your marketing guy can very quickly see which books are the highest
margin books.
</p>
<p>
Another good use for operators is when you are selecting information
from one table to another. For example, you may have a temporary
table that you select product data into so that it can be proofread
before it is sent into some master data table. Shopping Carts make
great examples of this. You can take the pertinent information from
your production tables and place it in a temporary table to be then
removed, quantity increased, or discounts added before it is placed
into your master order table.
</p>
<p>
In an example like this, you would not want to select out your various
kinds of information, perform some functions to get them just right,
and then insert them back into your temporary table. You can simply
do it all in one query by using operators. It also creates less of
a headache when you are dealing with very dynamic data. Let the
database handle as much of your dynamic data as it can.
</p>
<p>
Now I would like to go into some specific operators and their functions.
To see a complete list of operators, in your pgsql interface window
type '\do'.
</p>
<p>
<b>The +, -, *, and / operators.</b><br>
These are the basic math operators that you can use in PostgreSQL.
See above for good examples on how to use them. A few additional
examples are here.
</p>
<p>
<ul>
<li> Calculating tax (SELECT subtotal * tax AS taxamount)
<li> Calculating unit cost (SELECT extendedcost / quantity AS unitcost)
</ul>
</p>
<p>
Many more uses for math operators will be revealed in the next article
in this series which combines operators with aggregate functions.
</p>
<p>
<b>Inequality (<, >, <=, >=) operators.</b><br>
You most likely have used these in the WHERE clause of a specific
SQL query. For instance.
</p>
<p>
<tt>SELECT book_title FROM inventory WHERE selling_price >= '30.00';</tt>
</p>
<p>
This query will select all books that have a selling price of
$30.00 or more. You could even extend that to our profit example
earlier and do the following.
</p>
<p>
<tt>SELECT book_title, selling_price - book_cost AS profit WHERE
selling_price - book_cost >= '14.00' ORDER BY profit DESC;</tt>
</p>
<p>
Which will only produce the following results.
</p>
<p>
<table border=1 cellpadding=5>
<tr><th>book_title</th><th>profit</th></tr>
<tr><td>How To Scam Customers Into Buying Your Books</td><td>15.01</td></tr>
</table>
</p>
<p>
This can allow you to set thresholds for various kinds of queries
which is very useful in reporting.
</p>
<p>
<b>The || (concatenate) operator.</b><br>
When doing any sort of text concatenation, this operator comes in
handy. Say for instance, you have a product category which has
many different products within it. You might want to print out
the product category name as well as the product item on the
invoice.
</p>
<p>
<tt>SELECT category || CAST(': ' AS VARCHAR) || productname AS title
FROM products;</tt>
</p>
<p>
Notice the use of the CAST() function. Concatenate will require
knowledge about the elements it is operating on. You must tell
PostgreSQL that the string ': ' is of type VARCHAR in order for
your operator to function.
</p>
<p>
Your results may look like:
</p>
<p>
<table border=1 cellpadding=5>
<tr><th>title</th></tr>
<tr><td>Music CDs: Dave Matthews, Listener Supported</td></tr>
<tr><td>DVDs: Airplane</td></tr>
</table>
</p>
<P>In the previous articles, I showed you some simple ways to use operators and
aggregate functions to help speed up your applications. The true power of
operators and aggregate functions come when you combine their respective powers
together. You can cut down on the lines of code your application will need
by simply letting your database handle that for you. This article will arm
you with a plethora of information on this subject.</P>
<P>
<b> Our Scenario:</b><br>
You are hired to create a web-based shopping application. Here is your database
layout for your order table.</P>
<pre>
create table orders (
orderid integer (autoincrement),
customerid integer,
subtotal real,
tax real,
shipping real
)
create table orderdetail (
orderid integer,
productid integer,
price real,
qty integer
)
create table taxtable (
state varchar(2),
rate real
)
create table products (
productid integer,
description varchar(100),
price real
)
create table cart (
sessionid varchar(30),
productid integer,
price real,
qty integer
)
</pre>
<p>
In this example, I will use database driven shopping carts instead of storing the cart information in a session. However, I will need a sessionID to keep up with the changes in the database. Our <tt>cart</tt> table contains the current pre-checkout shopping cart. <tt>Orders</tt> and <tt>Orderdetail</tt> contain the completed order with items. We can calculate each order's Grand Total by adding up the sub parts when needed for tracking or billing. Finally, <tt>products</tt> is our product table which contains a price and description.
</p>
<p>
The point of this exercise is to pass as much of the computation back to the database so that your application layer does not have to perform many trips to and from the database, as well as to reduce the lines of code required to complete your task. In this example, several of your items are stored in a database table so they may be dynamic. Those items are the basis of your subtotal, tax and shipping calculations. If you do not use operators and aggregates (and potentially subqueries), you will run the risk of making many trips around the database and putting added overhead into your application layer. I will break down the calculation of each of those items for you, as well as an example of how to put it all together in the end.
</p>
<p>
<b>The subtotal calculation.</b><br>
This is a rather simple calculation, and will only use an aggregate function and simple operator to extract. In our case.
</p>
<p>
<tt>SELECT SUM(price*qty) AS subtotal FROM cart WHERE sessionid = '9j23iundo239new';</tt>
</p>
<p>
All we need is the sum of the results from every price * qty calculation. This shows how you can combine the power of operators and aggregates very nicely. Remember that the SUM aggregate will return the total sum from every calculation that is performed on a PER ROW basis. Don't forget your order of operations!
</p>
<p>
<b>The tax calculation.</b><br>
This one can be kind of tricky without some fancy SQL. I will be using <tt>COALESCE</tt> to determine the actual tax rate. <tt>COALESCE</tt> takes two arguments. If the results of the first argument are <tt>null</tt>, it will return the second. It is very handy in situations like this. Below is the query. Note: _subtotal_ is simply a placeholder.
</p>
<p>
<tt>SELECT _subtotal_ * COALESCE(tax, 0) AS tax FROM tax WHERE state = 'TX';</tt>
</p>
<p>
In the final query, I will show you how all these will add up so try not to get confused by my nifty placeholders.
</p>
<p>
<b>The shipping calculation.</b><br>
For simplicity, we will just assume that you charge shipping based on a $3 fee per item. You could easily expand that to add some fancy calculations in as well. By adding a weight field to your products table, you could easily calculate shipping based on an algorithm. In our instance, we will just count the number of items in our cart and multiply that by 3.
</p>
<p>
<tt>SELECT COUNT(*) * 3 FROM cart AS shipping WHERE sessionid = '9j23iundo239new';</tt>
</p>
<p>
<b>Tying it all together.</b><br>
Now that I have shown you how to get the results for those calculations separately, lets tie them all together into one big SQL query. This query will handle all of those calculations, and then place them into the <tt>orders</tt> table for you.
</p>
<p>
<tt>INSERT INTO orders (customerid, subtotal, tax, shipping) VALUES (customerid, (SELECT SUM(price*qty) FROM cart WHERE sessionid = '9j23iundo239new'), (SELECT SUM(price*qty) FROM cart WHERE sessionid = '9j23iundo239new') * (SELECT COALESCE(tax, 0) FROM tax WHERE state = 'TX'), (SELECT COUNT(*) * 3 FROM cart WHERE sessionid = '9j23iundo239new'));</tt>
</p>
<p>
Additionally, if you had a Grand Total field in your orders table, you could complete this by adding up the sub items in either a separate query, or inside your <tt>INSERT</tt> query. The first of those two examples might look like this.
</p>
<p>
<tt>UPDATE orders SET grandtotal = subtotal+tax+shipping WHERE orderid = 29898;</tt>
</p>
<p>
To move the rest of the items from the cart table to the orderdetail table the following two queries can be issued in sequence.
</p>
<p>
<tt>INSERT INTO orderdetail (orderid, productid, price, qty) values SELECT _yourorderid_, productid, price, qty FROM cart WHERE sessionid = '9j23iundo239new';</tt><br>
<tt>DELETE FROM cart WHERE sessionid = '9j23iundo239new';</tt>
</p>
<p>
<b>Conclusion:</b><br>
Aggregate functions can greatly simplify and speed up your applications by
allowing the SQL server to handle these kinds of calculations. In more
complex applications they can be used to return customized results from
multiple tables for reporting and other functions. Operators can greatly enhance the quality of the results that you return
from your database. The correct use of operators and aggregate functions can not only increase the speed and accuracy of your application, but it also can greatly reduce your code base by removing unneeded lines of code for looping through result sets, simple calculations, and other line hogs.
<P> I hope that you enjoy reading and learning from this article as much as I enjoyed writing it!
<!-- *** BEGIN bio *** -->
<SPACER TYPE="vertical" SIZE="30">
<P>
<H4><IMG ALIGN=BOTTOM ALT="" SRC="../gx/note.gif">Branden R Williams</H4>
<EM>Branden is currently a consultant for
<a href="http://www.elliptix.net">Elliptix</a>, an e-business and security
consulting firm he co-founded this year. He has over 10 years of experience in
various Internet-related technology disciplines including Unix administration,
network infrastructure design and deployment, and many scripting and
programming languages. For the last six years, Branden has been designing,
building and deploying enterprise-scale e-commerce applications. His real-world
experience is backed up by a Bachelors of Business Administration in Marketing
from the University of Texas, Arlington. Branden can also be reached at
<a href="mailto:brw@brw.net">brw@brw.net</a>. </EM>
<!-- *** END bio *** -->
<!-- *** BEGIN copyright *** -->
<P> <hr> <!-- P -->
<H5 ALIGN=center>
Copyright &copy; 2001, Branden R Williams.<BR>
Copying license <A HREF="../copying.html">http://www.linuxgazette.com/copying.html</A><BR>
Published in Issue 70 of <i>Linux Gazette</i>, September 2001</H5>
<!-- *** END copyright *** -->
<!--startcut ==========================================================-->
<HR><P>
<CENTER>
<!-- *** BEGIN navbar *** -->
<IMG ALT="" SRC="../gx/navbar/left.jpg" WIDTH="14" HEIGHT="45" BORDER="0" ALIGN="bottom"><A HREF="tranter.html"><IMG ALT="[ Prev ]" SRC="../gx/navbar/prev.jpg" WIDTH="16" HEIGHT="45" BORDER="0" ALIGN="bottom"></A><A HREF="index.html"><IMG ALT="[ Table of Contents ]" SRC="../gx/navbar/toc.jpg" WIDTH="220" HEIGHT="45" BORDER="0" ALIGN="bottom" ></A><A HREF="../index.html"><IMG ALT="[ Front Page ]" SRC="../gx/navbar/frontpage.jpg" WIDTH="137" HEIGHT="45" BORDER="0" ALIGN="bottom"></A><A HREF="http://www.linuxgazette.com/cgi-bin/talkback/all.py?site=LG&article=http://www.linuxgazette.com/issue70/williams.html"><IMG ALT="[ Talkback ]" SRC="../gx/navbar/talkback.jpg" WIDTH="121" HEIGHT="45" BORDER="0" ALIGN="bottom" ></A><A HREF="../faq/index.html"><IMG ALT="[ FAQ ]" SRC="./../gx/navbar/faq.jpg"WIDTH="62" HEIGHT="45" BORDER="0" ALIGN="bottom"></A><A HREF="lg_backpage70.html"><IMG ALT="[ Next ]" SRC="../gx/navbar/next.jpg" WIDTH="15" HEIGHT="45" BORDER="0" ALIGN="bottom" ></A><IMG ALT="" SRC="../gx/navbar/right.jpg" WIDTH="15" HEIGHT="45" ALIGN="bottom">
<!-- *** END navbar *** -->
</CENTER>
</BODY></HTML>
<!--endcut ============================================================-->