old-www/HOWTO/Serial-HOWTO-16.html

481 lines
25 KiB
HTML

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
<HEAD>
<META NAME="GENERATOR" CONTENT="LinuxDoc-Tools 0.9.21">
<TITLE> Serial HOWTO: Troubleshooting </TITLE>
<LINK HREF="Serial-HOWTO-17.html" REL=next>
<LINK HREF="Serial-HOWTO-15.html" REL=previous>
<LINK HREF="Serial-HOWTO.html#toc16" REL=contents>
</HEAD>
<BODY>
<A HREF="Serial-HOWTO-17.html">Next</A>
<A HREF="Serial-HOWTO-15.html">Previous</A>
<A HREF="Serial-HOWTO.html#toc16">Contents</A>
<HR>
<H2><A NAME="trouble_shoot"></A> <A NAME="s16">16.</A> <A HREF="Serial-HOWTO.html#toc16">Troubleshooting </A></H2>
<P> See Modem-HOWTO for troubleshooting related to modems or getty for
modems. For a Text-Terminal much of the info here will be of value as
well as the troubleshooting info in Text-Terminal-HOWTO.</P>
<H2><A NAME="ser_elect_test"></A> <A NAME="ss16.1">16.1</A> <A HREF="Serial-HOWTO.html#toc16.1">Serial Electrical Test Equipment </A>
</H2>
<H3>Breakout Gadgets, etc.</H3>
<P> While a multimeter (used as a voltmeter) may be all that you need
for just a few serial ports, simple special test equipment has been
made for testing serial port lines. Some are called "breakout ... "
where breakout means to break out conductors from a cable. These
gadgets have a couple of connectors which connect to serial port
connectors (either at the ends of serial cables or at the back of a
PC). Some have test points for connecting a voltmeter. Others have
LED lamps which light when certain modem control lines are asserted
(turned on). The color of the light may indicate the polarity of the
signal (positive or negative voltage). Still others have jumpers so
that you can connect any wire to any wire. Some have switches.</P>
<P>Radio Shack sells (in 2002) a "RS-232 Troubleshooter" (formerly called
"RS-232 Line Tester") Cat. #276-1401. It checks TD, RD, CD, RTS, CTS,
DTR, and DSR. A green light means on (+12 v) while red means off (-12
v). They also sell a "RS-232 Serial Jumper Box" Cat.
#276-1403. This permits connecting the pins anyway you choose. Both
these items are under the heading of "Peripheral hookup helpers".
Unfortunately, they are not listed in the index to the printed
catalog. They are on the same page as the D type connecters so look
in the index under "Connectors, Computer, D-Sub". A store chain named
"Active Components" may have them.</P>
<H3>Measuring voltages</H3>
<P> Any voltmeter or multimeter, even the cheapest that sells for
about $10, should work fine. Trying to use other methods for checking
voltage is tricky. Don't use a LED unless it has a series resistor to
reduce the voltage across the LED. A 470 ohm resistor is used for a
20 ma LED (but not all LED's are 20 ma). The LED will only light for
a certain polarity so you may test for + or - voltages. Does anyone
make such a gadget for automotive circuit testing?? Logic probes may
be damaged if you try to use them since the TTL voltages for which
they are designed are only 5 volts. Trying to use a 12 V incandescent
light bulb is not a good idea. It won't show polarity and due to
limited output current of the UART it probably will not even light up.</P>
<P>To measure voltage on a female connector you may plug in a bent paper
clip into the desired opening. The paper clip's diameter should be no
larger than the pins so that it doesn't damage the contact. Clip
an alligator clip (or the like) to the paper clip to connect up. Take
care not to touch two pins at the same time with any metal object.</P>
<H3>Taste voltage</H3>
<P> As a last resort, if you have no test equipment and are willing to
risk getting shocked (or even electrocuted) you can always taste the
voltage. Before touching one of the test leads with your tongue, test
them to make sure that there is no high voltage on them. Touch both
leads (at the same time) to one hand to see if they shock you. Then
if no shock, wet the skin contact points by licking and repeat. If
this test gives you a shock, you certainly don't want to use your
tongue.</P>
<P>For the test for 12 V, Lick a finger and hold one test lead in it.
Put the other test lead on your tongue. If the lead on your tongue is
positive, there will be a noticeable taste. You might try this with
flashlight batteries first so you will know what taste to expect.</P>
<H2><A NAME="ss16.2">16.2</A> <A HREF="Serial-HOWTO.html#toc16.2">Serial Monitoring/Diagnostics</A>
</H2>
<P> A few Linux programs will monitor the modem control lines and
indicate if they are positive (1) or negative (0). See section
<A HREF="Serial-HOWTO-11.html#serial_mon">Serial Monitoring/Diagnostics</A></P>
<H2><A NAME="ss16.3">16.3</A> <A HREF="Serial-HOWTO.html#toc16.3">(The following subsections are in both the Serial and Modem HOWTOs)</A>
</H2>
<H2><A NAME="cant_find_port"></A> <A NAME="ss16.4">16.4</A> <A HREF="Serial-HOWTO.html#toc16.4">Serial Port Can't be Found</A>
</H2>
<P>There are 3 possibilities:
<OL>
<LI>Your port is disabled since both the BIOS and Linux failed to
enable it. It has no IO address.</LI>
<LI>Your port is enabled and has an IO address but it has no ttyS
device number (like ttyS14) assigned to that address so the port
can't be used. As a last resort, you may need to use "setserial" to
assign a ttyS number to it.</LI>
<LI>Your port does have a ttyS number assigned to it (like ttyS14)
but you don't know which physical connector it is (on the back of your
PC).
See
<A HREF="Serial-HOWTO-10.html#identify_ttyS">Which Connector on the Back of my PC is ttyS1, etc?</A>
</LI>
</OL>
</P>
<P>First check BIOS messages at boot-time (and possibly the BIOS menu for
the serial port). Then for the PCI bus type lspci -v. If this shows
something like "LPC Bridge" then your port is likely on the LPC bus
which is not well supported by Linux yet (but the BIOS might find it)
?? If it's an ISA bus PnP serial port, try "pnpdump --dumpregs"
and/or see Plug-and-Play-HOWTO. If the port happens to be enabled
then the following two paragraphs may help find the IO port:</P>
<H3>Scanning/probing legacy ports</H3>
<P>This is mainly for legacy non-PCI ports and ISA ports that are not
Plug-and-Play.</P>
<P>Using "scanport" (Debian only ??) will scan all enabled bus ports and
may discover an unknown port that could be a serial port (but it
doesn't probe the port). It could hang your PC. If you suspect that
your port may be at a certain address, you may try manually probing
with setserial, but it's a slow tedious task if you have several
addresses to probe. See
<A HREF="Serial-HOWTO-11.html#probing_ss">Probing</A>.</P>
<H2><A NAME="ss16.5">16.5</A> <A HREF="Serial-HOWTO.html#toc16.5">Linux Creates an Interrupt Conflict (your PC has an ISA slot)</A>
</H2>
<P>If your PC has a BIOS that handles ISA (and likely PCI too) then
if you find a IRQ conflict, it might be due to a shortage of free
IRQs. The BIOS often maintains a list of reserved IRQs, reserved for
legacy ISA cards. If too many are reserved, the BIOS may not be able
to find a free IRQ and will erroneously assign an IRQ to the serial
port that creates a conflict. So check to see if all the reserved
IRQs are really needed and if not, unreserve an IRQ that the serial
port can use. For more details, see Plug-and-Play-HOWTO.</P>
<H2><A NAME="slow_"></A> <A NAME="ss16.6">16.6</A> <A HREF="Serial-HOWTO.html#toc16.6">Extremely Slow: Text appears on the screen slowly after long delays</A>
</H2>
<P> It's likely mis-set/conflicting interrupts. Here are some of the
symptoms which will happen the first time you try to use a modem,
terminal, or serial printer. In some cases you type something but
nothing appears on the screen until many seconds later. Only the last
character typed may show up. It may be just an invisible
&lt;return&gt; character so all you notice is that the cursor jumps
down one line. In other cases where a lot of data should appear on
the screen, only a batch of about 16 characters appear. Then there is
a long wait of many seconds for the next batch of characters. You
might also get "input overrun" error messages (or find them in logs).</P>
<P>For more details on the symptoms and why this
happens see
<A HREF="Serial-HOWTO-17.html#irq_prob_details">Interrupt Problem Details</A> and/or
<A HREF="Serial-HOWTO-17.html#irq_conflict">Interrupt Conflicts</A>
and/or
<A HREF="Serial-HOWTO-17.html#irq_ng">Mis-set Interrupts</A>.
If it involves Plug-and-Play devices, see also the Plug-and-Play-HOWTO.</P>
<P>As a quick check to see if it really is an interrupt problem, set the
IRQ to 0 with "setserial". This will tell the driver to use
polling instead of interrupts. If this seems to fix the "slow"
problem then you had an interrupt problem. You should still try to
solve the problem since polling uses excessive computer resources.</P>
<P>Checking to find the interrupt conflict may not be easy since Linux
supposedly doesn't permit any interrupt conflicts and will send you a
<A HREF="#busy_err">/dev/ttyS?: Device or resource busy</A> error
message if it thinks you are attempting to create a conflict. But a
real conflict can be created if "setserial" has told the kernel
incorrect info. The kernel has been lied to and thus doesn't think
there is any conflict. Thus using "setserial" will not reveal the
conflict (nor will looking at /proc/interrupts which bases its info on
"setserial"). You still need to know what "setserial" thinks so that
you can pinpoint where it's wrong and change it when you determine
what's really set in the hardware.</P>
<P>What you need to do is to check how the hardware is set by checking
jumpers or using PnP software to check how the hardware is actually
set. For PnP run either "pnpdump --dumpregs" (if ISA bus) or run
"lspci" (if PCI bus). Compare this to how Linux (e.g. "setserial")
thinks the hardware is set.</P>
<H2><A NAME="ss16.7">16.7</A> <A HREF="Serial-HOWTO.html#toc16.7">Somewhat Slow: I expected it to be a few times faster</A>
</H2>
<P> An obvious reason is that the baud rate is set too slow. It's
claimed that this once happened by trying to set the baud rate to a
speed higher than the hardware can support (such as 230400).</P>
<P>Another reason may be that whatever is on the serial port (such as a
modem, terminal, printer) doesn't work as fast as you thought it did.</P>
<P>Another possible reason is that you have an obsolete serial port: UART
8250, 16450 or early 16550 (or the serial driver thinks you do). See</P>
<P>
<A HREF="Serial-HOWTO-18.html#uart_">What Are UARTS?</A>
Use "setserial -g /dev/ttyS*".
If it shows anything less than a 16550A, this may be your problem.
If you think that "setserial" has it wrong check it out. See
<A HREF="Serial-HOWTO-11.html#set_serial">What is Setserial</A> for more info. If you
really do have an obsolete serial port, lying about it to setserial
will only make things worse.</P>
<H2><A NAME="irqs_shown_wrong"></A> <A NAME="ss16.8">16.8</A> <A HREF="Serial-HOWTO.html#toc16.8">The Startup Screen Shows Wrong IRQs for the Serial Ports</A>
</H2>
<P> For non-PnP ports, Linux does not do any IRQ detection on startup.
When the serial module loads it only does serial device detection.
Thus, disregard what it says about the IRQ, because it's just assuming
the standard IRQs. This is done, because IRQ detection is unreliable,
and can be fooled. But if and when setserial runs from a start-up
script, it changes the IRQ's and displays the new (and hopefully
correct) state on on the startup screen. If the wrong IRQ is not
corrected by a later display on the screen, then you've got a problem.</P>
<P>So, even though I have my <CODE>ttyS2</CODE> set at IRQ 5, I still see
<BLOCKQUOTE><CODE>
<PRE>
ttyS02 at 0x03e8 (irq = 4) is a 16550A
</PRE>
</CODE></BLOCKQUOTE>
at first when Linux boots. (Older kernels may show "ttyS02" as
"tty02" which is the same as ttyS2). You may need to use
<CODE>setserial</CODE> to tell Linux the IRQ you are using.</P>
<H2><A NAME="ss16.9">16.9</A> <A HREF="Serial-HOWTO.html#toc16.9">"Cannot open /dev/ttyS?: Device or resource busy</A>
</H2>
<P> See
<A HREF="#busy_err">/dev/tty? Device or resource busy</A></P>
<H2><A NAME="ss16.10">16.10</A> <A HREF="Serial-HOWTO.html#toc16.10">"Cannot open /dev/ttyS?: Permission denied"</A>
</H2>
<P> Check the file permissions on this port with "ls -l /dev/ttyS?"_
If you own the ttyS? then you need read and write permissions: crw
with the c (Character device) in col. 1. It you don't own it then it
will work for you if it shows rw- in cols. 8 &amp; 9 which means that
everyone has read and write permission on it. Use "chmod" to change
permissions. There are more complicated (and secure) ways to get
access like belonging to a "group" that has group permission. Some
programs change the permissions when they run but restore them when
the program exists normally. But if someone pulls the plug on your
PC it's an abnormal exit and correct permissions may not be restored.</P>
<H2><A NAME="ss16.11">16.11</A> <A HREF="Serial-HOWTO.html#toc16.11">"Cannot open /dev/ttyS?"</A>
</H2>
<P>Unless stty is set for clocal, the CD pin may need to be asserted
in order to open a serial port. If the physical port is not connected
to anything, or if it's connected to something that is not powered on
(such an external modem) then there will be no voltage on CD
from that device. Thus the "cannot open" message. Either set clocal
or connect the serial port connector to something and power it on.</P>
<P>Even if a device is powered on and connected to a port, it may
sometimes prevent opening the port. An example of this is where the
device has negated CD and the CD pin on your PC is negated (negative
voltage). </P>
<H2><A NAME="ss16.12">16.12</A> <A HREF="Serial-HOWTO.html#toc16.12">"Operation not supported by device" for ttyS?</A>
</H2>
<P> This means that an operation requested by setserial, stty, etc.
couldn't be done because the kernel doesn't support doing it.
Formerly this was often due to the "serial" module not being loaded.
But with the advent of PnP, it may likely mean that there is no modem
(or other serial device) at the address where the driver (and
setserial) thinks it is. If there is no modem there, commands (for
operations) sent to that address obviously don't get done. See
<A HREF="Serial-HOWTO-8.html#io-irq_in_hdw">What is set in my serial port hardware?</A></P>
<P>If the "serial" module wasn't loaded but "lsmod" shows you it's now
loaded it might be the case that it's loaded now but wasn't loaded
when you got the error message. In many cases the module will
automatically loaded when needed (if it can be found). To force
loading of the "serial" module it may be listed in the file:
/etc/modules.conf or /etc/modules. The actual module should reside
in: /lib/modules/.../misc/serial.o.</P>
<H2><A NAME="lockfile_"></A> <A NAME="ss16.13">16.13</A> <A HREF="Serial-HOWTO.html#toc16.13">"Cannot create lockfile. Sorry" </A>
</H2>
<P> Sometimes when it can't create a lockfile you get the erroneous
message: "... Device or resource busy" instead of the one above.
When a port is "opened" by a program a lockfile is created in
/var/lock/. Wrong permissions for the lock directory will not allow a
lockfile to be created there. Use "ls -ld /var/lock" to see if the
permissions are OK. Giving rwx permissions for the root owner and the
group should work, provided that the users that need to dialout belong
to that group. Others should have r-x permission. Even with this
scheme, there may be a security risk. Use "chmod" to change
permissions and "chgrp" to change groups. Of course, if there is no
"lock" directory no lockfile can be created there. For more info on
lockfiles see
<A HREF="Serial-HOWTO-13.html#lockfiles_">What Are Lock Files</A></P>
<H2><A NAME="ss16.14">16.14</A> <A HREF="Serial-HOWTO.html#toc16.14">"Device /dev/ttyS? is locked."</A>
</H2>
<P> This means that someone else (or some other process) is supposedly
using the serial port. There are various ways to try to find out what
process is "using" it. One way is to look at the contents of the
lockfile (/var/lock/LCK...). It should be the process id. If the
process id is say 100 type "ps 100" to find out what it is. Then if
the process is no longer needed, it may be gracefully killed by "kill
100". If it refuses to be killed use "kill -9 100" to force it to be
killed, but then the lockfile will not be removed and you'll need to
delete it manually. Of course if there is no such process as 100 then
you may just remove the lockfile but in most cases the lockfile should
have been automatically removed if it contained a stale process id
(such as 100).</P>
<H2><A NAME="busy_err"></A> <A NAME="ss16.15">16.15</A> <A HREF="Serial-HOWTO.html#toc16.15">"/dev/tty? Device or resource busy" </A>
</H2>
<P> This means that the device you are trying to access (or use) is
supposedly busy (in use) or that a resource it needs (such as an IRQ)
is supposedly being used by another device and can't be shared.
This message is easy to understand if it only means that the device is
busy (in use). But it sometimes means that a needed resource is already
in use (busy). What makes it even more confusing is that in some cases
neither the device nor the resources that it needs are actually
"busy". </P>
<P>In olden days, if a PC was shutdown by just turning off the power, a
bogus lockfile might remain and then later on one would get this bogus
message and not be able to use the serial port. Software today is
supposed to automatically remove such bogus lockfiles, but as of 2006
there is still a problem with the "wvdial" dialer program related to
lockfiles. If wvdial can't create a lockfile because it doesn't have
write permission in the /var/lock/ directory, you will see this
erroneous message. See
<A HREF="#lockfile_">Cannot create lockfile. Sorry</A> </P>
<P>The following example is where interrupts can't be shared (at least
one of the interrupts is on the ISA bus). The ``resource busy'' part
often means (example for <CODE>ttyS2</CODE>) ``You can't use <CODE>ttyS2</CODE> since
another device is using ttyS2's interrupt.'' The potential interrupt
conflict is inferred from what "setserial" thinks. A more accurate
error message would be ``Can't use <CODE>ttyS2</CODE> since the setserial data
(and kernel data) indicates that another device is using <CODE>ttyS2</CODE>'s
interrupt''. If two devices use the same IRQ and you start up only
one of the devices, everything is OK because there is no conflict yet.
But when you next try to start the second device (without quitting the
first device) you get a "... busy" error message. This is because the
kernel only keeps track of what IRQs are actually in use and actual
conflicts don't happen unless the devices are in use (open). The
situation for I/O address (such as 0x3f8) conflict is similar.</P>
<P>This error is sometimes due to having two serial drivers: one a module
and the other compiled into the kernel. Both drivers try to grab the
same resources and one driver finds them "busy".</P>
<P>There are two possible cases when you see this message:
<OL>
<LI> There may be a real resource conflict that is being avoided.</LI>
<LI> Setserial has it wrong and the only reason <CODE>ttyS2</CODE> can't be
used is that setserial erroneously predicts a conflict.</LI>
</OL>
</P>
<P>What you need to do is to find the interrupt setserial thinks
<CODE>ttyS2</CODE> is using. Look at /proc/tty/driver/serial. You should
also be able to find it with the "setserial" command for <CODE>ttyS2</CODE>. </P>
<P>Bug in old versions: Prior to 2001 there was a bug which wouldn't let
you see it with "setserial". Trying to see it would give the same
"... busy" error message.</P>
<P>To try to resolve this problem reboot or: exit or gracefully kill all
likely conflicting processes. If you reboot: 1. Watch the boot-time
messages for the serial ports. 2. Hope that the file that runs
"setserial" at boot-time doesn't (by itself) create the same conflict
again.</P>
<P>If you think you know what IRQ say <CODE>ttyS2</CODE> is using then you may
look at /proc/interrupts to find what else (besides another serial
port) is currently using this IRQ. You might also want to double
check that any suspicious IRQs shown here (and by "setserial") are
correct (the same as set in the hardware). A way to test whether or
not it's a potential interrupt conflict is to set the IRQ to 0
(polling) using "setserial". Then if the busy message goes away, it
was likely a potential interrupt conflict. It's not a good idea to
leave it permanently set at 0 since it will put more load on the CPU.</P>
<H2><A NAME="ss16.16">16.16</A> <A HREF="Serial-HOWTO.html#toc16.16">"Input/output error" from setserial, stty, pppd, etc.</A>
</H2>
<P> This means that communication with the serial port isn't working
right. It could mean that there isn't any serial port at the IO
address that setserial thinks your port is at. It could also be an
interrupt conflict (or an IO address conflict). It also may mean that
the serial port is in use (busy or opened) and thus the attempt to
get/set parameters by setserial or stty failed. It will also happen
if you make a typo in the serial port name such as typing "ttys"
instead of "ttyS". </P>
<H2><A NAME="ss16.17">16.17</A> <A HREF="Serial-HOWTO.html#toc16.17">"LSR safety check engaged"</A>
</H2>
<P>LSR is the name of a hardware register. It usually means that
there is no serial port at the address where the driver thinks your
serial port is located. You need to find your serial port and
possibly configure it. See
<A HREF="Serial-HOWTO-8.html#locate_port">Locating the Serial Port: IO address IRQs</A> and/or
<A HREF="Serial-HOWTO-11.html#set_serial">What is Setserial</A></P>
<H2><A NAME="ss16.18">16.18</A> <A HREF="Serial-HOWTO.html#toc16.18">Overrun errors on serial port</A>
</H2>
<P> This is an overrun of the hardware FIFO buffer and you can't
increase its size. Bug note (reported in 2002): Due to a bug in some
kernel 2.4 versions, the port number may be missing and you will only
see "ttyS" (no port number). But if devfs notation such as "tts/2"
was being used, there was no bug. See
<A HREF="Serial-HOWTO-12.html#higher_thruput">Higher Serial Thruput</A>.</P>
<H2><A NAME="ss16.19">16.19</A> <A HREF="Serial-HOWTO.html#toc16.19">Port gets characters only sporadically</A>
</H2>
<P> There could be some other program running on the port. Use "top"
(provided you've set it to display the port number) or type "ps
-alxw". Look at the results to see if the port is being used by
another program. Be on the lookout for the gpm mouse program which
often runs on a serial port.</P>
<H2><A NAME="ss16.20">16.20</A> <A HREF="Serial-HOWTO.html#toc16.20">Troubleshooting Tools</A>
</H2>
<P> These are some of the programs you might want to use in
troubleshooting:
<UL>
<LI> "lsof /dev/ttyS*" will list serial ports which are open.</LI>
<LI> "setserial" shows and sets the low-level hardware configuration
of a port (what the driver thinks it is). See
<A HREF="Serial-HOWTO-11.html#set_serial">What is Setserial</A></LI>
<LI> "stty" shows and sets the configuration of a port (except for
that handled by "setserial").
See the section
<A HREF="Serial-HOWTO-11.html#stty_">Stty</A></LI>
<LI> "modemstat" or "statserial" or "watch head
/proc/tty/driver/serial" will show the current state of various modem
signal lines (such as DTR, CTS, etc.). The one in /proc also shows
byte flow and errors.</LI>
<LI> "irqtune" will give serial port interrupts higher
priority to improve performance.</LI>
<LI> "hdparm" for hard-disk tuning may help some more.</LI>
<LI> "lspci" shows the actual IRQs, etc. of hardware on the PCI bus.</LI>
<LI> "pnpdump --dumpregs" shows the actual IRQs, etc. of hardware for
PnP devices on the ISA bus.</LI>
<LI> Some "files" in the /proc tree (such as ioports, interrupts,
and tty/driver/serial).</LI>
</UL>
</P>
<H2><A NAME="ss16.21">16.21</A> <A HREF="Serial-HOWTO.html#toc16.21">Almost all characters are wrong; Many missing or many extras</A>
</H2>
<P>Perhaps a baud mismatch. If one port sends at twice the speed that
the other port is set to receive, then every two characters sent will
be received as one character. Each bit of this received character
will be a sample of two bits of what is sent and will be wrong. Also,
only half the characters sent seem to get received. For flow in the
reverse direction, it's just the opposite. Twice as many characters
get received than were sent. A worse mismatch will produce even worse
results.</P>
<P>A speed mismatch is not likely to happen with a modem since the modem
autodetects the speed. One cause of a mismatch may be due to serial
port hardware that has been set to run at very fast speeds. It may
actually operate at a speed say 8 times that of which you (or an
application) set it via software. See
<A HREF="Serial-HOWTO-12.html#high_speed">Very High Speeds</A></P>
<HR>
<A HREF="Serial-HOWTO-17.html">Next</A>
<A HREF="Serial-HOWTO-15.html">Previous</A>
<A HREF="Serial-HOWTO.html#toc16">Contents</A>
</BODY>
</HTML>