324 lines
8.5 KiB
HTML
324 lines
8.5 KiB
HTML
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
|
|
<HTML
|
|
><HEAD
|
|
><TITLE
|
|
>How does my computer store things in memory?</TITLE
|
|
><META
|
|
NAME="GENERATOR"
|
|
CONTENT="Modular DocBook HTML Stylesheet Version 1.7"><LINK
|
|
REL="HOME"
|
|
TITLE="The Unix and Internet Fundamentals HOWTO"
|
|
HREF="index.html"><LINK
|
|
REL="PREVIOUS"
|
|
TITLE="How does my computer keep processes from stepping on each other?"
|
|
HREF="memory-management.html"><LINK
|
|
REL="NEXT"
|
|
TITLE="How does my computer store things on disk?"
|
|
HREF="disk-layout.html"></HEAD
|
|
><BODY
|
|
CLASS="sect1"
|
|
BGCOLOR="#FFFFFF"
|
|
TEXT="#000000"
|
|
LINK="#0000FF"
|
|
VLINK="#840084"
|
|
ALINK="#0000FF"
|
|
><DIV
|
|
CLASS="NAVHEADER"
|
|
><TABLE
|
|
SUMMARY="Header navigation table"
|
|
WIDTH="100%"
|
|
BORDER="0"
|
|
CELLPADDING="0"
|
|
CELLSPACING="0"
|
|
><TR
|
|
><TH
|
|
COLSPAN="3"
|
|
ALIGN="center"
|
|
>The Unix and Internet Fundamentals HOWTO</TH
|
|
></TR
|
|
><TR
|
|
><TD
|
|
WIDTH="10%"
|
|
ALIGN="left"
|
|
VALIGN="bottom"
|
|
><A
|
|
HREF="memory-management.html"
|
|
ACCESSKEY="P"
|
|
>Prev</A
|
|
></TD
|
|
><TD
|
|
WIDTH="80%"
|
|
ALIGN="center"
|
|
VALIGN="bottom"
|
|
></TD
|
|
><TD
|
|
WIDTH="10%"
|
|
ALIGN="right"
|
|
VALIGN="bottom"
|
|
><A
|
|
HREF="disk-layout.html"
|
|
ACCESSKEY="N"
|
|
>Next</A
|
|
></TD
|
|
></TR
|
|
></TABLE
|
|
><HR
|
|
ALIGN="LEFT"
|
|
WIDTH="100%"></DIV
|
|
><DIV
|
|
CLASS="sect1"
|
|
><H1
|
|
CLASS="sect1"
|
|
><A
|
|
NAME="core-formats"
|
|
></A
|
|
>9. How does my computer store things in memory?</H1
|
|
><P
|
|
>You probably know that everything on a computer is stored as strings of
|
|
bits (binary digits; you can think of them as lots of little on-off
|
|
switches). Here we'll explain how those bits are used to represent the
|
|
letters and numbers that your computer is crunching.</P
|
|
><P
|
|
>Before we can go into this, you need to understand about the
|
|
<I
|
|
CLASS="firstterm"
|
|
>word size</I
|
|
> of your computer. The word size is the
|
|
computer's preferred size for moving units of information around;
|
|
technically it's the width of your processor's
|
|
<I
|
|
CLASS="firstterm"
|
|
>registers</I
|
|
>,
|
|
which are the holding areas your processor uses to do arithmetic and
|
|
logical calculations. When people write about computers having bit sizes
|
|
(calling them, say, <SPAN
|
|
CLASS="QUOTE"
|
|
>"32-bit"</SPAN
|
|
> or <SPAN
|
|
CLASS="QUOTE"
|
|
>"64-bit"</SPAN
|
|
> computers), this is what
|
|
they mean.</P
|
|
><P
|
|
>Most computers now have a word size of 64 bits. In the recent past
|
|
(early 2000s) many PCs had 32-bit words. The old 286 machines back in the
|
|
1980s had a word size of 16. Old-style mainframes often had 36-bit
|
|
words.</P
|
|
><P
|
|
>The computer views your memory as a sequence of words numbered from
|
|
zero up to some large value dependent on your memory size. That value is
|
|
limited by your word size, which is why programs on older machines like
|
|
286s had to go through painful contortions to address large amounts of
|
|
memory. I won't describe them here; they still give older programmers
|
|
nightmares.</P
|
|
><DIV
|
|
CLASS="sect2"
|
|
><H2
|
|
CLASS="sect2"
|
|
><A
|
|
NAME="numbers"
|
|
></A
|
|
>9.1. Numbers</H2
|
|
><P
|
|
>Integer numbers are represented as either words or pairs of words,
|
|
depending on your processor's word size. One 64-bit machine word is the
|
|
most common integer representation.</P
|
|
><P
|
|
>Integer arithmetic is close to but not actually mathematical
|
|
base-two. The low-order bit is 1, next 2, then 4 and so forth as in pure
|
|
binary. But signed numbers are represented in
|
|
<I
|
|
CLASS="firstterm"
|
|
>twos-complement</I
|
|
>
|
|
notation. The highest-order bit is a <I
|
|
CLASS="firstterm"
|
|
>sign
|
|
bit</I
|
|
> which
|
|
makes the quantity negative, and every negative number can be obtained from
|
|
the corresponding positive value by inverting all the bits and adding one.
|
|
This is why integers on a 64-bit machine have the range
|
|
-2<SUP
|
|
>63</SUP
|
|
> to 2<SUP
|
|
>63</SUP
|
|
> - 1.
|
|
That 64th bit is being used for sign; 0 means a positive number or zero, 1
|
|
a negative number.</P
|
|
><P
|
|
>Some computer languages give you access to <I
|
|
CLASS="firstterm"
|
|
>unsigned
|
|
arithmetic</I
|
|
> which is straight base 2 with zero and
|
|
positive numbers only.</P
|
|
><P
|
|
>Most processors and some languages can do operations in
|
|
<I
|
|
CLASS="firstterm"
|
|
>floating-point</I
|
|
>
|
|
numbers (this capability is built into all recent processor chips).
|
|
Floating-point numbers give you a much wider range of values than integers
|
|
and let you express fractions. The ways in which this is done vary and are
|
|
rather too complicated to discuss in detail here, but the general idea is
|
|
much like so-called ‘scientific notation’, where one might
|
|
write (say) 1.234 * 10<SUP
|
|
>23</SUP
|
|
>; the encoding of the
|
|
number is split into a
|
|
<I
|
|
CLASS="firstterm"
|
|
>mantissa</I
|
|
>
|
|
(1.234) and the exponent part (23) for the power-of-ten multiplier (which
|
|
means the number multiplied out would have 20 zeros on it, 23 minus the
|
|
three decimal places).</P
|
|
></DIV
|
|
><DIV
|
|
CLASS="sect2"
|
|
><H2
|
|
CLASS="sect2"
|
|
><A
|
|
NAME="characters"
|
|
></A
|
|
>9.2. Characters</H2
|
|
><P
|
|
>Characters are normally represented as strings of seven bits each in
|
|
an encoding called ASCII (American Standard Code for Information
|
|
Interchange). On modern machines, each of the 128 ASCII characters is the
|
|
low seven bits of an
|
|
<I
|
|
CLASS="firstterm"
|
|
>octet</I
|
|
>
|
|
or 8-bit byte; octets are packed into memory words so that (for example) a
|
|
six-character string only takes up one 64-bit memory word. For an ASCII code
|
|
chart, type ‘man 7 ascii’ at your Unix prompt.</P
|
|
><P
|
|
>The preceding paragraph was misleading in two ways. The minor one is
|
|
that the term ‘octet’ is formally correct but seldom actually
|
|
used; most people refer to an octet as
|
|
<I
|
|
CLASS="firstterm"
|
|
>byte</I
|
|
>
|
|
and expect bytes to be eight bits long. Strictly speaking, the term
|
|
‘byte’ is more general; there used to be, for example, 36-bit
|
|
machines with 9-bit bytes (though there probably never will be
|
|
again).</P
|
|
><P
|
|
>The major one is that not all the world uses ASCII. In fact, much of
|
|
the world can't — ASCII, while fine for American English, lacks many
|
|
accented and other special characters needed by users of other languages.
|
|
Even British English has trouble with the lack of a pound-currency
|
|
sign.</P
|
|
><P
|
|
>There have been several attempts to fix this problem. All use the
|
|
extra high bit that ASCII doesn't, making it the low half of a
|
|
256-character set. The most widely-used of these is the so-called
|
|
‘Latin-1’ character set (more formally called ISO 8859-1).
|
|
This is the default character set for Linux, older versions of HTML, and X.
|
|
Microsoft Windows uses a mutant version of Latin-1 that adds a bunch of
|
|
characters such as right and left double quotes in places proper Latin-1
|
|
leaves unassigned for historical reasons (for a scathing account of the
|
|
trouble this causes, see the <A
|
|
HREF="http://www.fourmilab.ch/webtools/demoroniser/"
|
|
TARGET="_top"
|
|
>demoroniser</A
|
|
>
|
|
page).</P
|
|
><P
|
|
>Latin-1 handles western European languages, including English,
|
|
French, German, Spanish, Italian, Dutch, Norwegian, Swedish, Danish, and
|
|
Icelandic. However, this isn't good enough either, and as a result there
|
|
is a whole series of Latin-2 through -9 character sets to handle things
|
|
like Greek, Arabic, Hebrew, Esperanto, and Serbo-Croatian. For details,
|
|
see the <A
|
|
HREF="http://czyborra.com/charsets/iso8859.html"
|
|
TARGET="_top"
|
|
> ISO alphabet soup</A
|
|
> page.</P
|
|
><P
|
|
>The ultimate solution is a huge standard called Unicode (and its
|
|
identical twin ISO/IEC 10646-1:1993). Unicode is identical to Latin-1 in
|
|
its lowest 256 slots. Above these in 16-bit space it includes Greek,
|
|
Cyrillic, Armenian, Hebrew, Arabic, Devanagari, Bengali, Gurmukhi,
|
|
Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam, Thai, Lao, Georgian,
|
|
Tibetan, Japanese Kana, the complete set of modern Korean Hangul, and a
|
|
unified set of Chinese/Japanese/Korean (CJK) ideographs. For details, see
|
|
the <A
|
|
HREF="http://www.unicode.org/"
|
|
TARGET="_top"
|
|
>Unicode Home Page</A
|
|
>. XML
|
|
and XHTML use this character set.</P
|
|
><P
|
|
>Recent versions of Linux use an encoding of Unicode called UTF-8. In
|
|
UTF, characters 0-127 are ASCII. Characters 128-255 are used only in
|
|
sequences of 2 through 4 bytes that identify non-ASCII characters.</P
|
|
></DIV
|
|
></DIV
|
|
><DIV
|
|
CLASS="NAVFOOTER"
|
|
><HR
|
|
ALIGN="LEFT"
|
|
WIDTH="100%"><TABLE
|
|
SUMMARY="Footer navigation table"
|
|
WIDTH="100%"
|
|
BORDER="0"
|
|
CELLPADDING="0"
|
|
CELLSPACING="0"
|
|
><TR
|
|
><TD
|
|
WIDTH="33%"
|
|
ALIGN="left"
|
|
VALIGN="top"
|
|
><A
|
|
HREF="memory-management.html"
|
|
ACCESSKEY="P"
|
|
>Prev</A
|
|
></TD
|
|
><TD
|
|
WIDTH="34%"
|
|
ALIGN="center"
|
|
VALIGN="top"
|
|
><A
|
|
HREF="index.html"
|
|
ACCESSKEY="H"
|
|
>Home</A
|
|
></TD
|
|
><TD
|
|
WIDTH="33%"
|
|
ALIGN="right"
|
|
VALIGN="top"
|
|
><A
|
|
HREF="disk-layout.html"
|
|
ACCESSKEY="N"
|
|
>Next</A
|
|
></TD
|
|
></TR
|
|
><TR
|
|
><TD
|
|
WIDTH="33%"
|
|
ALIGN="left"
|
|
VALIGN="top"
|
|
>How does my computer keep processes from stepping on each other?</TD
|
|
><TD
|
|
WIDTH="34%"
|
|
ALIGN="center"
|
|
VALIGN="top"
|
|
> </TD
|
|
><TD
|
|
WIDTH="33%"
|
|
ALIGN="right"
|
|
VALIGN="top"
|
|
>How does my computer store things on disk?</TD
|
|
></TR
|
|
></TABLE
|
|
></DIV
|
|
></BODY
|
|
></HTML
|
|
> |