One of the most important responsibilities of a system administrator
is monitoring their systems. As a system administrator you'll need
the ability to find out what is happening on your system at any given
time, whether it's the percentage of the system's resources currently in use,
what commands are being run, or who is logged on. This chapter covers
how to monitor your system and, in some cases, how to resolve problems
that may arise.

When a performance issue arises, there are four main areas to consider:
CPU, memory, disk I/O, and network. The ability to determine where the
bottleneck is can save you a lot of time.

System Resources

Being able to monitor the performance of your system
is essential. If system resources run too low, it can cause a lot of
problems. System resources can be taken up by individual users, or by
services your system hosts, such as email or web pages. Knowing
what is happening helps you determine whether system upgrades are needed,
or whether some services need to be moved to another machine.

The top command

The most commonly used tool for this is top.
The top command displays a continually updating report
of system resource usage.
#top
12:10:49 up 1 day, 3:47, 7 users, load average: 0.23, 0.19, 0.10
125 processes: 105 sleeping, 2 running, 18 zombie, 0 stopped
CPU states: 5.1% user 1.1% system 0.0% nice 0.0% iowait 93.6% idle
Mem: 512716k av, 506176k used, 6540k free, 0k shrd, 21888k buff
Swap: 1044216k av, 161672k used, 882544k free 199388k cached
PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME CPU COMMAND
2330 root 15 0 161M 70M 2132 S 4.9 14.0 1000m 0 X
2605 weeksa 15 0 8240 6340 3804 S 0.3 1.2 1:12 0 kdeinit
3413 weeksa 15 0 6668 5324 3216 R 0.3 1.0 0:20 0 kdeinit
18734 root 15 0 1192 1192 868 R 0.3 0.2 0:00 0 top
1619 root 15 0 776 608 504 S 0.1 0.1 0:53 0 dhclient
1 root 15 0 480 448 424 S 0.0 0.0 0:03 0 init
2 root 15 0 0 0 0 SW 0.0 0.0 0:00 0 keventd
3 root 15 0 0 0 0 SW 0.0 0.0 0:00 0 kapmd
4 root 35 19 0 0 0 SWN 0.0 0.0 0:00 0 ksoftirqd_CPU0
9 root 25 0 0 0 0 SW 0.0 0.0 0:00 0 bdflush
5 root 15 0 0 0 0 SW 0.0 0.0 0:00 0 kswapd
10 root 15 0 0 0 0 SW 0.0 0.0 0:00 0 kupdated
11 root 25 0 0 0 0 SW 0.0 0.0 0:00 0 mdrecoveryd
15 root 15 0 0 0 0 SW 0.0 0.0 0:01 0 kjournald
81 root 25 0 0 0 0 SW 0.0 0.0 0:00 0 khubd
1188 root 15 0 0 0 0 SW 0.0 0.0 0:00 0 kjournald
1675 root 15 0 604 572 520 S 0.0 0.1 0:00 0 syslogd
1679 root 15 0 428 376 372 S 0.0 0.0 0:00 0 klogd
1707 rpc 15 0 516 440 436 S 0.0 0.0 0:00 0 portmap
1776 root 25 0 476 428 424 S 0.0 0.0 0:00 0 apmd
1813 root 25 0 752 528 524 S 0.0 0.1 0:00 0 sshd
1828 root 25 0 704 548 544 S 0.0 0.1 0:00 0 xinetd
1847 ntp 15 0 2396 2396 2160 S 0.0 0.4 0:00 0 ntpd
1930 root 24 0 76 4 0 S 0.0 0.0 0:00 0 rpc.rquotad
The top portion of the report lists information such as
the system time, uptime, CPU usage, physical and swap memory usage,
and number of processes. Below that is a list of the processes sorted
by CPU utilization.

You can modify the output of top while
it is running. If you press i, top will no longer
display idle processes. Press i again to see them
again. Pressing M will sort by memory usage,
T will sort by the CPU time the processes have
used, and P will sort by CPU usage again.

In addition to viewing options, you can also modify processes
from within the top command. You can use
u to view processes owned by a specific user,
k to kill processes, and r to
renice them.

For more in-depth information about processes you can look in
the /proc filesystem. In the /proc
filesystem you will find a series of subdirectories with numeric names.
These directories correspond to the process IDs of currently
running processes. In each directory you will find a series of files
containing information about the process.

YOU MUST TAKE EXTREME CAUTION NOT TO MODIFY THESE FILES. DOING
SO MAY CAUSE SYSTEM PROBLEMS!

The iostat command

The iostat command displays average CPU
utilization and disk I/O information. This is a great command for monitoring
your disk I/O usage.
#iostat
Linux 2.4.20-24.9 (myhost) 12/23/2003
avg-cpu: %user %nice %sys %idle
62.09 0.32 2.97 34.62
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
dev3-0 2.22 15.20 47.16 1546846 4799520
On 2.4 kernels, devices are named using the device's major
and minor numbers. In this case the device listed is
/dev/hda. To have iostat print the device name
for you, use the -x option.
#iostat -x
Linux 2.4.20-24.9 (myhost) 12/23/2003
avg-cpu: %user %nice %sys %idle
62.01 0.32 2.97 34.71
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
/dev/hdc 0.00 0.00 .00 0.00 0.00 0.00 0.00 0.00 0.00 2.35 0.00 0.00 14.71
/dev/hda 1.13 4.50 .81 1.39 15.18 47.14 7.59 23.57 28.24 1.99 63.76 70.48 15.56
/dev/hda1 1.08 3.98 .73 1.27 14.49 42.05 7.25 21.02 28.22 0.44 21.82 4.97 1.00
/dev/hda2 0.00 0.51 .07 0.12 0.55 5.07 0.27 2.54 30.35 0.97 52.67 61.73 2.99
/dev/hda3 0.05 0.01 .02 0.00 0.14 0.02 0.07 0.01 8.51 0.00 12.55 2.95 0.01
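When several devices are listed, it can help to pick out the busiest one with a small script. The following is a minimal Python sketch, not part of iostat itself. It assumes the extended report has been captured as text with the column layout shown above (other sysstat versions print different columns), and the names SAMPLE, parse_iostat, and busiest_device are made up for illustration.

```python
# Sketch: find the device with the highest %util in captured `iostat -x` text.
# Assumes the column layout shown in this chapter; other versions differ.

SAMPLE = """\
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
/dev/hdc 0.00 0.00 .00 0.00 0.00 0.00 0.00 0.00 0.00 2.35 0.00 0.00 14.71
/dev/hda 1.13 4.50 .81 1.39 15.18 47.14 7.59 23.57 28.24 1.99 63.76 70.48 15.56
/dev/hda1 1.08 3.98 .73 1.27 14.49 42.05 7.25 21.02 28.22 0.44 21.82 4.97 1.00
"""


def parse_iostat(text):
    """Return {device: {column: value}} for the rows after the Device: header."""
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    header_idx = next(i for i, f in enumerate(lines) if f[0] == "Device:")
    columns = lines[header_idx][1:]
    stats = {}
    for fields in lines[header_idx + 1:]:
        if len(fields) != len(columns) + 1:
            continue  # skip anything that is not a full data row
        stats[fields[0]] = dict(zip(columns, map(float, fields[1:])))
    return stats


def busiest_device(stats):
    """Name of the device with the largest %util value."""
    return max(stats, key=lambda dev: stats[dev]["%util"])


if __name__ == "__main__":
    stats = parse_iostat(SAMPLE)
    name = busiest_device(stats)
    print(name, stats[name]["%util"])
```

Keying each row by the header names rather than by position keeps the sketch readable and makes it easier to adapt if the column set changes.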
The iostat man page contains a detailed
explanation of what each of these columns means.

The ps command

The ps command provides a list of
processes currently running. It accepts a wide variety of
options.

A common use is to list all processes currently running.
To do this, use the ps -ef command.
(The full output of this command is too large to include; the following
is only partial output.)
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 Dec22 ? 00:00:03 init
root 2 1 0 Dec22 ? 00:00:00 [keventd]
root 3 1 0 Dec22 ? 00:00:00 [kapmd]
root 4 1 0 Dec22 ? 00:00:00 [ksoftirqd_CPU0]
root 9 1 0 Dec22 ? 00:00:00 [bdflush]
root 5 1 0 Dec22 ? 00:00:00 [kswapd]
root 6 1 0 Dec22 ? 00:00:00 [kscand/DMA]
root 7 1 0 Dec22 ? 00:01:28 [kscand/Normal]
root 8 1 0 Dec22 ? 00:00:00 [kscand/HighMem]
root 10 1 0 Dec22 ? 00:00:00 [kupdated]
root 11 1 0 Dec22 ? 00:00:00 [mdrecoveryd]
root 15 1 0 Dec22 ? 00:00:01 [kjournald]
root 81 1 0 Dec22 ? 00:00:00 [khubd]
root 1188 1 0 Dec22 ? 00:00:00 [kjournald]
root 1675 1 0 Dec22 ? 00:00:00 syslogd -m 0
root 1679 1 0 Dec22 ? 00:00:00 klogd -x
rpc 1707 1 0 Dec22 ? 00:00:00 portmap
root 1813 1 0 Dec22 ? 00:00:00 /usr/sbin/sshd
ntp 1847 1 0 Dec22 ? 00:00:00 ntpd -U ntp
root 1930 1 0 Dec22 ? 00:00:00 rpc.rquotad
root 1934 1 0 Dec22 ? 00:00:00 [nfsd]
root 1942 1 0 Dec22 ? 00:00:00 [lockd]
root 1943 1 0 Dec22 ? 00:00:00 [rpciod]
root 1949 1 0 Dec22 ? 00:00:00 rpc.mountd
root 1961 1 0 Dec22 ? 00:00:00 /usr/sbin/vsftpd /etc/vsftpd/vsftpd.conf
root 2057 1 0 Dec22 ? 00:00:00 /usr/bin/spamd -d -c -a
root 2066 1 0 Dec22 ? 00:00:00 gpm -t ps/2 -m /dev/psaux
bin 2076 1 0 Dec22 ? 00:00:00 /usr/sbin/cannaserver -syslog -u bin
root 2087 1 0 Dec22 ? 00:00:00 crond
daemon 2195 1 0 Dec22 ? 00:00:00 /usr/sbin/atd
root 2215 1 0 Dec22 ? 00:00:11 /usr/sbin/rcd
weeksa 3414 3413 0 Dec22 pts/1 00:00:00 /bin/bash
weeksa 4342 3413 0 Dec22 pts/2 00:00:00 /bin/bash
weeksa 19121 18668 0 12:58 pts/2 00:00:00 ps -ef
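A listing like this has eight whitespace-separated columns, with the command and its arguments last, so it is straightforward to process in a script. Below is a minimal Python sketch under that assumption. The names SAMPLE, parse_ps, and children_of are hypothetical, and the sample is a handful of lines copied from the output above. It assumes the start time never contains a space, which holds for the format shown here but may not for every ps implementation.

```python
# Sketch: split `ps -ef` lines into their eight columns and list the
# children of a given parent PID. Assumes the header and layout shown above.

SAMPLE = """\
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 Dec22 ? 00:00:03 init
root 1961 1 0 Dec22 ? 00:00:00 /usr/sbin/vsftpd /etc/vsftpd/vsftpd.conf
weeksa 3414 3413 0 Dec22 pts/1 00:00:00 /bin/bash
weeksa 4342 3413 0 Dec22 pts/2 00:00:00 /bin/bash
weeksa 19121 18668 0 12:58 pts/2 00:00:00 ps -ef
"""


def parse_ps(text):
    """Return one dict per process, keyed by the header names."""
    lines = text.strip().splitlines()
    names = lines[0].split()
    rows = []
    for line in lines[1:]:
        # Split at most len(names) - 1 times so CMD keeps its arguments.
        fields = line.split(None, len(names) - 1)
        if len(fields) == len(names):
            rows.append(dict(zip(names, fields)))
    return rows


def children_of(rows, ppid):
    """PIDs of the processes whose parent is ppid."""
    return [int(r["PID"]) for r in rows if int(r["PPID"]) == ppid]


if __name__ == "__main__":
    rows = parse_ps(SAMPLE)
    print(children_of(rows, 3413))
```

Limiting the number of splits is what keeps a command such as "/usr/sbin/vsftpd /etc/vsftpd/vsftpd.conf" together in the CMD field.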
The first column shows who owns the process. The second
column is the process ID. The third column is the parent process
ID, that is, the process that started this
process. The fourth column is the CPU usage (in
percent). The fifth column is the start time, or the start date if the process
has been running long enough. The sixth column is the tty associated
with the process, if applicable. The seventh column is the cumulative
CPU usage (the total amount of CPU time it has used while running). The
eighth column is the command itself.

With this information you can see exactly what is running on
your system and kill runaway processes, or those that are causing
problems.

The vmstat command

The vmstat command will provide a report
showing statistics for system processes, memory, swap,
I/O, and the CPU. Run without arguments, vmstat prints a single
report whose values are averages since the last reboot. If you give
it a delay in seconds (for example, vmstat 5), each later report
covers the interval since the previous one.

#vmstat
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 0 0 181604 17000 26296 201120 0 2 8 24 149 9 61 3 36
The following was taken from the
vmstat man page.
FIELD DESCRIPTIONS
Procs
r: The number of processes waiting for run time.
b: The number of processes in uninterruptible sleep.
w: The number of processes swapped out but otherwise runnable. This
field is calculated, but Linux never desperation swaps.
Memory
swpd: the amount of virtual memory used (kB).
free: the amount of idle memory (kB).
buff: the amount of memory used as buffers (kB).
Swap
si: Amount of memory swapped in from disk (kB/s).
so: Amount of memory swapped to disk (kB/s).
IO
bi: Blocks received from a block device (blocks/s).
bo: Blocks sent to a block device (blocks/s).
System
in: The number of interrupts per second, including the clock.
cs: The number of context switches per second.
CPU
These are percentages of total CPU time.
us: user time
sy: system time
id: idle time
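Because the field names sit on one line and the values on the next, a vmstat report maps naturally onto a dictionary. The following Python sketch illustrates this under the assumption that the report has the layout shown above. The names SAMPLE, parse_vmstat, and is_swapping are invented for this example and are not part of vmstat.

```python
# Sketch: turn a captured vmstat report into a dict keyed by field name.
# Assumes the last two lines are the field names and one row of values.

SAMPLE = """\
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 0 0 181604 17000 26296 201120 0 2 8 24 149 9 61 3 36
"""


def parse_vmstat(text):
    """Pair each field name with the integer value printed beneath it."""
    lines = text.strip().splitlines()
    names = lines[-2].split()
    values = [int(v) for v in lines[-1].split()]
    return dict(zip(names, values))


def is_swapping(report):
    """True if any memory was swapped in or out during the reported period."""
    return report["si"] > 0 or report["so"] > 0


if __name__ == "__main__":
    report = parse_vmstat(SAMPLE)
    print("idle:", report["id"], "swapping:", is_swapping(report))
```

In the sample, so is 2, so the check reports swap activity. Whether that matters depends on the period the report covers.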
The lsof command

The lsof command prints a list of
every file that is in use. Since Linux considers everything a file,
this list can be very long. However, this command
can be useful in diagnosing problems. For example, suppose you wish
to unmount a filesystem, but you are told that it is in use. You
could run this command and grep for the name of the
filesystem to see who is using it.

Or suppose you want to see all files in use by a particular process.
To do this you would use lsof -p PID, where PID is the process ID.

Finding More Utilities

To learn more about what command-line tools are available, Chris
Karakas has written a reference guide titled GNU/Linux
Command-Line Tools Summary. It's a good resource for learning
what tools are out there and how to do a number of tasks.

Filesystem Usage

Many reports talk about how cheap storage has
gotten, but if you're like most of us it isn't cheap enough. Most of
us have a limited amount of space, and need to be able to monitor it
and control how it's used.

The df command

The df command is the simplest tool available to
view disk usage. Simply type df and you'll
be shown disk usage for all your mounted filesystems in 1K blocks:
user@server:~> df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/hda3 5242904 759692 4483212 15% /
tmpfs 127876 8 127868 1% /dev/shm
/dev/hda1 127351 33047 87729 28% /boot
/dev/hda9 10485816 33508 10452308 1% /home
/dev/hda8 5242904 932468 4310436 18% /srv
/dev/hda7 3145816 32964 3112852 2% /tmp
/dev/hda5 5160416 474336 4423928 10% /usr
/dev/hda6 3145816 412132 2733684 14% /var
You can also use the -h option to see the output in
"human-readable" format. Sizes are then shown in kilobytes, megabytes, or gigabytes, depending
on the size of the filesystem. Alternatively, you can use the
-B option to specify the block size.

In addition to space usage, you can use the
-i option to view the number of used and available
inodes.
user@server:~> df -i
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/hda3 0 0 0 - /
tmpfs 31969 5 31964 1% /dev/shm
/dev/hda1 32912 47 32865 1% /boot
/dev/hda9 0 0 0 - /home
/dev/hda8 0 0 0 - /srv
/dev/hda7 0 0 0 - /tmp
/dev/hda5 656640 26651 629989 5% /usr
/dev/hda6 0 0 0 - /var
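A common use of the plain df listing is to flag filesystems that are filling up. Here is a minimal Python sketch of that idea. It assumes the six-column layout of the space report shown earlier in this section, with one filesystem per line (df -P requests the POSIX format, which does not wrap long device names). SAMPLE, parse_df, and over_threshold are hypothetical names, and the 15% limit in the usage line is chosen only so the sample produces some output.

```python
# Sketch: report mount points whose Use% is at or above a limit.
# Assumes one filesystem per line in the six-column df layout.

SAMPLE = """\
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/hda3 5242904 759692 4483212 15% /
tmpfs 127876 8 127868 1% /dev/shm
/dev/hda1 127351 33047 87729 28% /boot
/dev/hda9 10485816 33508 10452308 1% /home
/dev/hda8 5242904 932468 4310436 18% /srv
"""


def parse_df(text):
    """Return a list of (mount_point, percent_used) tuples."""
    rows = []
    for line in text.strip().splitlines()[1:]:  # skip the header line
        fields = line.split(None, 5)
        if len(fields) != 6 or not fields[4].endswith("%"):
            continue  # e.g. rows that show "-" instead of a percentage
        rows.append((fields[5], int(fields[4].rstrip("%"))))
    return rows


def over_threshold(rows, limit):
    """Mount points using at least `limit` percent of their space."""
    return [mount for mount, pct in rows if pct >= limit]


if __name__ == "__main__":
    print(over_threshold(parse_df(SAMPLE), 15))
```

Rows without a percentage, such as the "-" entries in the inode listing, are skipped rather than treated as errors.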
The du command

Now that you know how much space has been used on a filesystem,
how can you find out where that data is? To view usage by directory
or file you can use du. For example:
user@server:~> du file.txt
1300 file.txt
As with df, you can use the -h option
to get the same output in "human-readable" form.
user@server:~> du -h file.txt
1.3M file.txt
Unless you specify a filename, du acts
recursively and reports each subdirectory.
user@server:~> du -h /usr/local
4.0K /usr/local/games
16K /usr/local/include/nessus/net
180K /usr/local/include/nessus
208K /usr/local/include
62M /usr/local/lib/nessus/plugins/.desc
97M /usr/local/lib/nessus/plugins
164K /usr/local/lib/nessus/plugins_factory
97M /usr/local/lib/nessus
12K /usr/local/lib/pkgconfig
2.7M /usr/local/lib/ladspa
104M /usr/local/lib
112K /usr/local/man/man1
4.0K /usr/local/man/man2
4.0K /usr/local/man/man3
4.0K /usr/local/man/man4
16K /usr/local/man/man5
4.0K /usr/local/man/man
If you just want a summary of that directory you can use the
-s option.
user@server:~> du -hs /usr/local
210M /usr/local
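To make the recursive summary concrete, the sketch below walks a directory tree in Python and adds up file sizes, roughly what du -s does. It is an approximation, not a reimplementation: it sums apparent file sizes rather than allocated disk blocks, and it ignores the space taken by the directories themselves, so its totals will usually differ somewhat from du. The names tree_size and human are made up for this example.

```python
# Sketch: approximate `du -s` by summing apparent file sizes under a path.
# Unlike du, this does not count allocated blocks or directory entries.
import os


def tree_size(path):
    """Total apparent size in bytes of the regular files under path."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(path):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full):
                continue  # do not follow or count symbolic links
            total += os.lstat(full).st_size
    return total


def human(nbytes):
    """Format a byte count with a B, K, M, or G suffix (powers of 1024)."""
    for unit in ("B", "K", "M", "G"):
        if nbytes < 1024 or unit == "G":
            return f"{nbytes:.1f}{unit}"
        nbytes /= 1024


if __name__ == "__main__":
    print(human(tree_size(".")), ".")
```

Run as a script, it prints a one-line total for the current directory, similar in spirit to du -hs.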
Quotas

For more information about quotas you can read
The Quota HOWTO.
Monitoring Users
Just because you're paranoid doesn't mean they
AREN'T out to get you... Source Unknown
From time to time you will
want to know exactly what people are doing on your system. Maybe you
notice that a lot of RAM is being used, or a lot of CPU activity.
You are going to want to see who is on the system, what they are
running, and what kind of resources they are using.

The who command

The easiest way to see who is on the system is to run
who or w. The
who command is a simple tool that lists who is logged
on to the system and what port or terminal they are logged on at.
user@server:~> who
bjones pts/0 May 23 09:33
wally pts/3 May 20 11:35
aweeks pts/1 May 22 11:03
aweeks pts/2 May 23 15:04
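On a busy system it can be useful to count how many sessions each user has open. The following Python sketch does this for captured who output. It assumes each line begins with the user name, as in the listing above. SAMPLE and sessions_per_user are hypothetical names used only for illustration.

```python
# Sketch: count login sessions per user from captured `who` output.
# Assumes the user name is the first whitespace-separated field on each line.
from collections import Counter

SAMPLE = """\
bjones pts/0 May 23 09:33
wally pts/3 May 20 11:35
aweeks pts/1 May 22 11:03
aweeks pts/2 May 23 15:04
"""


def sessions_per_user(text):
    """Return a Counter mapping each user name to its number of sessions."""
    return Counter(
        line.split()[0] for line in text.splitlines() if line.strip()
    )


if __name__ == "__main__":
    for user, count in sessions_per_user(SAMPLE).most_common():
        print(user, count)
```

For the sample above this shows aweeks with two sessions and the other users with one each.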
The ps command - again!

In the previous section we saw that user aweeks is logged
on to both pts/1 and pts/2,
but what if we want to see what they are doing? We could run
ps -u aweeks and get the following output:
user@server:~> ps -u aweeks
20876 pts/1 00:00:00 bash
20904 pts/2 00:00:00 bash
20951 pts/2 00:00:00 ssh
21012 pts/1 00:00:00 ps
From this we can see that the user is running ps and ssh.

This is a much more focused use of
ps than the one discussed previously.

The w command

Even easier than using the who and
ps -u commands is to use w.
The w command prints not only who is on the system,
but also the commands they are running.
user@server:~> w
aweeks :0 09:32 ?xdm? 30:09 0.02s -:0
aweeks pts/0 09:33 5:49m 0.00s 0.82s kdeinit: kded
aweeks pts/2 09:35 8.00s 0.55s 0.36s vi sag-0.9.sgml
aweeks pts/1 15:03 59.00s 0.03s 0.03s /bin/bash
From this we can see that I have a kde session
running, I'm working in this document :-), and have another terminal
open sitting idle at a bash prompt.

The skill command

To Be Added

nice and renice

To Be Added