'\"
|
|
|
|
.\" (C) Copyright 1999-2000 David A. Wheeler (dwheeler@dwheeler.com)
|
|
|
|
.\"
|
|
|
|
.\" Permission is granted to make and distribute verbatim copies of this
|
|
|
|
.\" manual provided the copyright notice and this permission notice are
|
|
|
|
.\" preserved on all copies.
|
|
|
|
.\"
|
|
|
|
.\" Permission is granted to copy and distribute modified versions of this
|
|
|
|
.\" manual under the conditions for verbatim copying, provided that the
|
|
|
|
.\" entire resulting derived work is distributed under the terms of a
|
|
|
|
.\" permission notice identical to this one.
|
|
|
|
.\"
|
|
|
|
.\" Since the Linux kernel and libraries are constantly changing, this
|
|
|
|
.\" manual page may be incorrect or out-of-date. The author(s) assume no
|
|
|
|
.\" responsibility for errors or omissions, or for damages resulting from
|
|
|
|
.\" the use of the information contained herein. The author(s) may not
|
|
|
|
.\" have taken the same level of care in the production of this manual,
|
|
|
|
.\" which is licensed free of charge, as they might when working
|
|
|
|
.\" professionally.
|
|
|
|
.\"
|
|
|
|
.\" Formatted or processed versions of this manual, if unaccompanied by
|
|
|
|
.\" the source, must acknowledge the copyright and authors of this work.
|
|
|
|
.\"
|
|
|
|
.\" Fragments of this document are directly derived from IETF standards.
|
|
|
|
.\" For those fragments which are directly derived from such standards,
|
|
|
|
.\" the following notice applies, which is the standard copyright and
|
|
|
|
.\" rights announcement of The Internet Society:
|
|
|
|
.\"
|
|
|
|
.\" Copyright (C) The Internet Society (1998). All Rights Reserved.
|
|
|
|
.\" This document and translations of it may be copied and furnished to
|
|
|
|
.\" others, and derivative works that comment on or otherwise explain it
|
|
|
|
.\" or assist in its implementation may be prepared, copied, published
|
|
|
|
.\" and distributed, in whole or in part, without restriction of any
|
|
|
|
.\" kind, provided that the above copyright notice and this paragraph are
|
|
|
|
.\" included on all such copies and derivative works. However, this
|
|
|
|
.\" document itself may not be modified in any way, such as by removing
|
|
|
|
.\" the copyright notice or references to the Internet Society or other
|
|
|
|
.\" Internet organizations, except as needed for the purpose of
|
|
|
|
.\" developing Internet standards in which case the procedures for
|
|
|
|
.\" copyrights defined in the Internet Standards process must be
|
|
|
|
.\" followed, or as required to translate it into languages other than English.
|
|
|
|
.\"
|
|
|
|
.\" Modified Fri Jul 25 23:00:00 1999 by David A. Wheeler (dwheeler@dwheeler.com)
|
|
|
|
.\" Modified Fri Aug 21 23:00:00 1999 by David A. Wheeler (dwheeler@dwheeler.com)
|
|
|
|
.\" Modified Tue Mar 14 2000 by David A. Wheeler (dwheeler@dwheeler.com)
|
|
|
|
.\"
|
|
|
|
.TH URI 7 2000-03-14 "Linux" "Linux Programmer's Manual"
|
|
|
|
.SH NAME
|
|
|
|
uri, url, urn \- uniform resource identifier (URI), including a URL or URN
|
|
|
|
.SH SYNOPSIS
|
|
|
|
.nf
|
|
|
|
.HP 0.2i
|
|
|
|
URI = [ absoluteURI | relativeURI ] [ "#" fragment ]
|
|
|
|
.HP
|
|
|
|
absoluteURI = scheme ":" ( hierarchical_part | opaque_part )
|
|
|
|
.HP
|
|
|
|
relativeURI = ( net_path | absolute_path | relative_path ) [ "?" query ]
|
|
|
|
.sp
|
|
|
|
.HP
|
|
|
|
scheme = "http" | "ftp" | "gopher" | "mailto" | "news" | "telnet" | "file" | "man" | "info" | "whatis" | "ldap" | "wais" | \&...
|
|
|
|
.HP
|
|
|
|
hierarchical_part = ( net_path | absolute_path ) [ "?" query ]
|
|
|
|
.sp
|
|
|
|
.HP
|
|
|
|
net_path = "//" authority [ absolute_path ]
|
|
|
|
.HP
|
|
|
|
absolute_path = "/" path_segments
|
|
|
|
.HP
|
|
|
|
relative_path = relative_segment [ absolute_path ]
|
|
|
|
.fi
|
|
|
|
.SH DESCRIPTION
|
|
|
|
.PP
|
|
|
|
A Uniform Resource Identifier (URI) is a short string of characters
|
|
|
|
identifying an abstract or physical resource (for example, a web page).
|
|
|
|
A Uniform Resource Locator (URL) is a URI
|
|
|
|
that identifies a resource through its primary access
|
|
|
|
mechanism (e.g., its network "location"), rather than
|
|
|
|
by name or some other attribute of that resource.
|
|
|
|
A Uniform Resource Name (URN) is a URI
|
|
|
|
that must remain globally unique and persistent even when
|
|
|
|
the resource ceases to exist or becomes unavailable.
|
|
|
|
.PP
|
|
|
|
URIs are the standard way to name hypertext link destinations
|
|
|
|
for tools such as web browsers.
|
|
|
|
The string "http://www.kernelnotes.org" is a URL (and thus it's a URI).
|
|
|
|
Many people use the term URL loosely as a synonym for URI
|
|
|
|
(though technically URLs are a subset of URIs).
|
|
|
|
.PP
|
|
|
|
URIs can be absolute or relative.
|
|
|
|
An absolute identifier refers to a resource independent of
|
|
|
|
context, while a relative
|
|
|
|
identifier refers to a resource by describing the difference
|
|
|
|
from the current context.
|
|
|
|
Within a relative path reference, the complete path segments "." and
|
|
|
|
".." have special meanings: "the current hierarchy level" and "the
|
|
|
|
level above this hierarchy level", respectively, just like they do in
|
|
|
|
Unix-like systems.
|
|
|
|
A path segment which contains a colon
|
|
|
|
character can't be used as the first segment of a relative URI path
|
|
|
|
(e.g., "this:that"), because it would be mistaken for a scheme name;
|
|
|
|
precede such segments with ./ (e.g., "./this:that").
Note that descendants of MS-DOS (e.g., Microsoft Windows) replace
devicename colons with the vertical bar ("|") in URIs, so "C:" becomes "C|".
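.PP
As a rough illustration of the relative-reference rules above, the following Python sketch resolves a few references against a base URI using the standard library function urllib.parse.urljoin.
The base URI and the helper name resolve are made up for this example; they are not part of any standard.

```python
from urllib.parse import urljoin

# Hypothetical base URI, chosen only for illustration.
BASE = "http://example.com/a/b/c"


def resolve(reference, base=BASE):
    """Resolve a URI reference against a base URI."""
    return urljoin(base, reference)


if __name__ == "__main__":
    # "this:that" is read as scheme "this"; "./this:that" stays relative.
    for ref in ("d", "../d", "#top", "./this:that", "this:that"):
        print(ref, "->", resolve(ref))
```

Note how the reference "this:that" is returned unchanged because its first segment is taken for a scheme name, while "./this:that" is resolved as a relative path.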
|
|
|
|
.PP
|
|
|
|
A fragment identifier, if included, refers to a particular named portion
|
|
|
|
(fragment) of a resource; text after a '#' identifies the fragment.
|
|
|
|
A URI beginning with '#' refers to that fragment in the current resource.
|
|
|
|
.SH USAGE
|
|
|
|
There are many different URI schemes, each with specific
|
|
|
|
additional rules and meanings, but they are intentionally made to be
|
|
|
|
as similar as possible.
|
|
|
|
For example, many URL schemes
|
|
|
|
permit the authority to be the following format, called here an
|
|
|
|
.I ip_server
|
|
|
|
(square brackets show what's optional):
|
|
|
|
.HP
|
|
|
|
.IR "ip_server = " [ user " [ : " password " ] @ ] " host " [ : " port ]
|
|
|
|
.PP
|
|
|
|
This format allows you to optionally insert a user name,
|
|
|
|
a user plus password, and/or a port number.
|
|
|
|
The
|
|
|
|
.I host
|
|
|
|
is the name of the host computer, either its name as determined by DNS
|
|
|
|
or an IP address (numbers separated by periods).
|
|
|
|
Thus the URI
|
|
|
|
<http://fred:fredpassword@xyz.com:8080/>
|
|
|
|
logs into a web server on host xyz.com
|
|
|
|
as fred (using fredpassword) using port 8080.
|
|
|
|
Avoid including a password in a URI if possible because of the many
|
|
|
|
security risks of having a password written down.
|
|
|
|
If the URL supplies a user name but no password, and the remote
|
|
|
|
server requests a password, the program interpreting the URL
|
|
|
|
should request one from the user.
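.PP
The ip_server layout can be taken apart with ordinary URI parsing code.
The sketch below assumes Python's standard urllib.parse module; the function name split_ip_server is hypothetical and used only for illustration.

```python
from urllib.parse import urlsplit


def split_ip_server(uri):
    """Return the user, password, host and port of a URI's authority.

    Components that are absent from the URI are reported as None.
    """
    parts = urlsplit(uri)
    return {
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
    }


if __name__ == "__main__":
    print(split_ip_server("http://fred:fredpassword@xyz.com:8080/"))
```

A missing port is reported as None here, so a caller would still have to substitute the default port of the scheme in question.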
|
|
|
|
.PP
|
|
|
|
Here are some of the most common schemes in use on Unix-like systems
|
|
|
|
that are understood by many tools.
|
|
|
|
Note that many tools using URIs also have internal schemes or specialized
|
|
|
|
schemes; see those tools' documentation for information on those schemes.
.SS "http \- Web (HTTP) server"
.RI http:// ip_server / path
|
|
|
|
.br
|
|
|
|
.RI http:// ip_server / path ? query
|
|
|
|
.PP
|
|
|
|
This is a URL accessing a web (HTTP) server.
|
|
|
|
The default port is 80.
|
|
|
|
If the path refers to a directory, the web server will choose what
|
|
|
|
to return; usually if there is a file named "index.html" or "index.htm"
|
|
|
|
its content is returned; otherwise, a list of the files in the current
|
|
|
|
directory (with appropriate links) is generated and returned.
|
|
|
|
An example is <http://lwn.net>.
|
|
|
|
.PP
|
|
|
|
A query can be given in the archaic "isindex" format, consisting of a
|
|
|
|
word or phrase and not including an equal sign (=).
|
|
|
|
A query can also be in the longer "GET" format, which has one or more
|
|
|
|
query entries of the form
|
|
|
|
.IR key = value
|
|
|
|
separated by the ampersand character (&).
|
|
|
|
Note that
|
|
|
|
.I key
|
|
|
|
can be repeated more than once, though it's up to the web server
|
|
|
|
and its application programs to determine if there's any meaning to that.
|
|
|
|
There is an unfortunate interaction with HTML/XML/SGML and
|
|
|
|
the GET query format; when such URIs with more than one key
|
|
|
|
are embedded in SGML/XML documents (including HTML), the ampersand
|
|
|
|
(&) has to be rewritten as &amp;.
|
|
|
|
Note that not all queries use this format; larger forms
|
|
|
|
may be too long to store as a URI, so they use a different
|
|
|
|
interaction mechanism (called POST) which does not include the data in the URI.
|
|
|
|
See the Common Gateway Interface specification at
|
|
|
|
<http://www.w3.org/CGI> for more information.
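.PP
The key=value query format, repeated keys, and the ampersand rewriting needed inside HTML can be sketched as follows.
This is only an illustration using Python's standard urllib.parse and html modules; the URL, the keys, and the helper names are invented for the example.

```python
from html import escape
from urllib.parse import parse_qs, urlencode


def build_query(pairs):
    """Join (key, value) pairs with "&", escaping each key and value."""
    return urlencode(pairs)


def embed_in_html(uri):
    """Rewrite "&" (and other markup characters) for use in HTML text."""
    return escape(uri)


if __name__ == "__main__":
    # The same key may appear more than once.
    query = build_query([("key", "a"), ("key", "b"), ("q", "x y")])
    uri = "http://example.com/search?" + query
    print(uri)
    print(parse_qs(query))
    print(embed_in_html(uri))
```

Note that urlencode writes a space in a value as a plus sign, which matches common practice for query text rather than the %20 form.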
.SS "ftp \- File Transfer Protocol (FTP)"
.RI ftp:// ip_server / path
|
|
|
|
.PP
|
|
|
|
This is a URL accessing a file through the file transfer protocol (FTP).
|
|
|
|
The default port (for control) is 21.
|
|
|
|
If no username is included, the user name "anonymous" is supplied, and
|
|
|
|
in that case many clients provide as the password the requestor's
|
|
|
|
Internet email address.
|
|
|
|
An example is
|
|
|
|
<ftp://ftp.is.co.za/rfc/rfc1808.txt>.
.SS "gopher \- Gopher server"
.RI gopher:// ip_server / "gophertype selector"
|
|
|
|
.br
|
|
|
|
.RI gopher:// ip_server / "gophertype selector" %09 search
|
|
|
|
.br
|
|
|
|
.RI gopher:// ip_server / "gophertype selector" %09 search %09 gopher+_string
|
|
|
|
.br
|
|
|
|
.PP
|
|
|
|
The default gopher port is 70.
|
|
|
|
.I gophertype
|
|
|
|
is a single-character field to denote the
|
|
|
|
Gopher type of the resource to
|
|
|
|
which the URL refers.
|
|
|
|
The entire path may also be empty, in
|
|
|
|
which case the delimiting "/" is also optional and the gophertype
|
|
|
|
defaults to "1".
|
|
|
|
.PP
|
|
|
|
.I selector
|
|
|
|
is the Gopher selector string. In the Gopher protocol,
|
|
|
|
Gopher selector strings are a sequence of octets which may contain
|
|
|
|
any octets except 09 hexadecimal (US-ASCII HT or tab), 0A hexadecimal
|
|
|
|
(US-ASCII character LF), and 0D (US-ASCII character CR).
.SS "mailto \- Email address"
.RI mailto: email-address
|
|
|
|
.PP
|
|
|
|
This is an email address, usually of the form
|
|
|
|
.IR name @ hostname .
|
|
|
|
See
|
|
|
|
.BR mailaddr (7)
|
|
|
|
for more information on the correct format of an email address.
|
|
|
|
Note that any % character must be rewritten as %25.
|
|
|
|
An example is <mailto:dwheeler@dwheeler.com>.
.SS "news \- Newsgroup or News message"
.RI news: newsgroup-name
|
|
|
|
.br
|
|
|
|
.RI news: message-id
|
|
|
|
.PP
|
|
|
|
A
|
|
|
|
.I newsgroup-name
|
|
|
|
is a period-delimited hierarchical name, such as
|
|
|
|
"comp.infosystems.www.misc".
|
|
|
|
If <newsgroup-name> is "*" (as in <news:*>), it is used to refer
|
|
|
|
to "all available news groups".
|
|
|
|
An example is <news:comp.lang.ada>.
|
|
|
|
.PP
|
|
|
|
A
|
|
|
|
.I message-id
|
|
|
|
corresponds to the Message-ID of
.UR http://www.ietf.org/rfc/rfc1036.txt
IETF RFC\ 1036,
.UE
|
|
|
|
without the enclosing "<"
|
|
|
|
and ">"; it takes the form
|
|
|
|
.IR unique @ full_domain_name .
|
|
|
|
A message identifier may be distinguished from a news group name by the
|
|
|
|
presence of the "@" character.
.SS "telnet \- Telnet login"
.RI telnet:// ip_server /
|
|
|
|
.PP
|
|
|
|
The Telnet URL scheme is used to designate interactive text services that
|
|
|
|
may be accessed by the Telnet protocol. The final "/" character may be omitted.
|
|
|
|
The default port is 23.
|
|
|
|
An example is <telnet://melvyl.ucop.edu/>.
.SS "file \- Normal file"
.RI file:// ip_server / path_segments
|
|
|
|
.br
|
|
|
|
.RI file: path_segments
|
|
|
|
.PP
|
|
|
|
This represents a file or directory accessible locally.
|
|
|
|
As a special case,
|
|
|
|
.I host
|
|
|
|
can be the string "localhost" or the empty
|
|
|
|
string; this is interpreted as `the machine from which the URL is
|
|
|
|
being interpreted'.
|
|
|
|
If the path is to a directory, the viewer should display the
|
|
|
|
directory's contents with links to each containee;
|
|
|
|
not all viewers currently do this.
|
|
|
|
KDE supports generated files through the URL <file:/cgi-bin>.
|
|
|
|
If the given file isn't found, browser writers may want to try to expand
|
|
|
|
the filename via filename globbing
|
|
|
|
(see
|
|
|
|
.BR glob (7)
|
|
|
|
and
|
|
|
|
.BR glob (3)).
|
|
|
|
.PP
|
|
|
|
The second format (e.g., <file:/etc/passwd>)
|
|
|
|
is a correct format for referring to
|
|
|
|
a local file. However, older standards did not permit this format,
|
|
|
|
and some programs don't recognize this as a URI.
|
|
|
|
A more portable syntax is to use an empty string as the server name, e.g.,
|
|
|
|
<file:///etc/passwd>; this form does the same thing
|
|
|
|
and is easily recognized by pattern matchers and older programs as a URI.
|
|
|
|
Note that if you really mean to say "start from the current location," don't
|
|
|
|
specify the scheme at all; use a relative address like <../test.txt>,
|
|
|
|
which has the side-effect of being scheme-independent.
|
|
|
|
An example of this scheme is <file:///etc/passwd>.
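.PP
Converting between an absolute local path and the portable file:/// form can be sketched as below.
This assumes Unix-style paths and Python's standard urllib.parse module; the function names file_uri and local_path are hypothetical.

```python
from urllib.parse import quote, unquote, urlsplit


def file_uri(abs_path):
    """Build a file URI with an empty server name from an absolute path."""
    if not abs_path.startswith("/"):
        raise ValueError("expected an absolute Unix-style path")
    return "file://" + quote(abs_path)


def local_path(uri):
    """Return the local path named by a file URI.

    Both file:/path and file:///path are accepted, as is "localhost".
    """
    parts = urlsplit(uri)
    if parts.scheme != "file" or parts.netloc not in ("", "localhost"):
        raise ValueError("not a local file URI")
    return unquote(parts.path)


if __name__ == "__main__":
    print(file_uri("/etc/passwd"))
    print(local_path("file:/etc/passwd"))
```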
.SS "man \- Man page documentation"
.RI man: command-name
|
|
|
|
.br
|
|
|
|
.RI man: command-name ( section )
|
|
|
|
.PP
|
|
|
|
This refers to local online manual (man) reference pages.
|
|
|
|
The command name can optionally be followed by a section number in parentheses;
|
|
|
|
see
|
|
|
|
.BR man (7)
|
|
|
|
for more information on the meaning of the section numbers.
|
|
|
|
This URI scheme is unique to Unix-like systems (such as Linux)
|
|
|
|
and is not currently registered by the IETF.
|
|
|
|
An example is <man:ls(1)>.
.SS "info \- Info page documentation"
.RI info: virtual-filename
|
|
|
|
.br
|
|
|
|
.RI info: virtual-filename # nodename
|
|
|
|
.br
|
|
|
|
.RI info:( virtual-filename )
|
|
|
|
.br
|
|
|
|
.RI info:( virtual-filename ) nodename
|
|
|
|
.PP
|
|
|
|
This scheme refers to online info reference pages (generated from
|
|
|
|
texinfo files), a documentation format used by programs such as the GNU tools.
|
|
|
|
This URI scheme is unique to Unix-like systems (such as Linux)
|
|
|
|
and is not currently registered by the IETF.
|
|
|
|
As of this writing, GNOME and KDE differ in their URI syntax
|
|
|
|
and do not accept the other's syntax.
|
|
|
|
The first two formats are the GNOME format; in nodenames all spaces
|
|
|
|
are written as underscores.
|
|
|
|
The second two formats are the KDE format;
|
|
|
|
spaces in nodenames must be written as spaces, even though this
|
|
|
|
is forbidden by the URI standards.
|
|
|
|
It's hoped that in the future most tools will understand all of these
|
|
|
|
formats and will always accept underscores for spaces in nodenames.
|
|
|
|
In both GNOME and KDE, if the form without the nodename is used the
|
|
|
|
nodename is assumed to be "Top".
|
|
|
|
Examples of the GNOME format are <info:gcc> and <info:gcc#G++_and_GCC>.
|
|
|
|
Examples of the KDE format are <info:(gcc)> and <info:(gcc)G++ and GCC>.
.SS "whatis \- Documentation search"
.RI whatis: string
|
|
|
|
.PP
|
|
|
|
This scheme searches the database of short (one-line) descriptions of commands
|
|
|
|
and returns a list of descriptions containing that string.
|
|
|
|
Only complete word matches are returned.
|
|
|
|
See
|
|
|
|
.BR whatis (1).
|
|
|
|
This URI scheme is unique to Unix-like systems (such as Linux)
|
|
|
|
and is not currently registered by the IETF.
.SS "ghelp \- GNOME help documentation"
.RI ghelp: name-of-application
|
|
|
|
.PP
|
|
|
|
This loads GNOME help for the given application.
|
|
|
|
Note that not much documentation currently exists in this format.
.SS "ldap \- Lightweight Directory Access Protocol"
.RI ldap:// hostport
|
|
|
|
.br
|
|
|
|
.RI ldap:// hostport /
|
|
|
|
.br
|
|
|
|
.RI ldap:// hostport / dn
|
|
|
|
.br
|
|
|
|
.RI ldap:// hostport / dn ? attributes
|
|
|
|
.br
|
|
|
|
.RI ldap:// hostport / dn ? attributes ? scope
|
|
|
|
.br
|
|
|
|
.RI ldap:// hostport / dn ? attributes ? scope ? filter
|
|
|
|
.br
|
|
|
|
.RI ldap:// hostport / dn ? attributes ? scope ? filter ? extensions
|
|
|
|
.PP
|
|
|
|
This scheme supports queries to the
|
|
|
|
Lightweight Directory Access Protocol (LDAP), a protocol for querying
|
|
|
|
a set of servers for hierarchically-organized information
|
|
|
|
(such as people and computing resources).
|
|
|
|
More information on the LDAP URL scheme is available in
|
|
|
|
.UR http://www.ietf.org/rfc/rfc2255.txt
RFC\ 2255.
.UE
|
|
|
|
The components of this URL are:
|
|
|
|
.IP hostport 12
|
|
|
|
the LDAP server to query, written as a hostname optionally followed by
|
|
|
|
a colon and the port number.
|
|
|
|
The default LDAP port is TCP port 389.
|
|
|
|
If empty, the client determines which LDAP server to use.
|
|
|
|
.IP dn
|
|
|
|
the LDAP Distinguished Name, which identifies
|
|
|
|
the base object of the LDAP search (see
|
|
|
|
.UR http://www.ietf.org/rfc/rfc2253.txt
RFC\ 2253
.UE
|
|
|
|
section 3).
|
|
|
|
.IP attributes
|
|
|
|
a comma-separated list of attributes to be returned;
see RFC\ 2251 section 4.1.5.
|
|
|
|
If omitted, all attributes should be returned.
.IP scope
|
|
|
|
specifies the scope of the search, which can be one of
|
|
|
|
"base" (for a base object search), "one" (for a one-level search),
|
|
|
|
or "sub" (for a subtree search). If scope is omitted, "base" is assumed.
|
|
|
|
.IP filter
|
|
|
|
specifies the search filter (subset of entries
|
|
|
|
to return). If omitted, all entries should be returned.
|
|
|
|
See
|
|
|
|
.UR http://www.ietf.org/rfc/rfc2254.txt
RFC\ 2254
.UE
|
|
|
|
section 4.
|
|
|
|
.IP extensions
|
|
|
|
a comma-separated list of type=value
|
|
|
|
pairs, where the =value portion may be omitted for options not
|
|
|
|
requiring it. An extension prefixed with a '!' is critical
|
|
|
|
(must be supported to be valid); otherwise it's non-critical (optional).
|
|
|
|
.PP
|
|
|
|
LDAP queries are easiest to explain by example.
|
|
|
|
Here's a query that asks ldap.itd.umich.edu for information about
|
|
|
|
the University of Michigan in the U.S.:
|
|
|
|
.RS
|
|
|
|
ldap://ldap.itd.umich.edu/o=University%20of%20Michigan,c=US
|
|
|
|
.RE
|
|
|
|
.PP
|
|
|
|
To just get its postal address attribute, request:
|
|
|
|
.RS
|
|
|
|
ldap://ldap.itd.umich.edu/o=University%20of%20Michigan,c=US?postalAddress
|
|
|
|
.RE
|
|
|
|
.PP
|
|
|
|
To ask host.com at port 6666 for information about the person
|
|
|
|
with common name (cn) "Babs Jensen" at University of Michigan, request:
|
|
|
|
.RS
|
|
|
|
ldap://host.com:6666/o=University%20of%20Michigan,c=US??sub?(cn=Babs%20Jensen)
|
|
|
|
.RE
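.PP
The way the components are separated by question marks can be sketched in a few lines of Python.
This is a simplified illustration, not a complete RFC 2255 parser: it assumes the standard urllib.parse module, the function name parse_ldap_url is hypothetical, and extensions are returned as an unparsed string.

```python
from urllib.parse import unquote, urlsplit

# Fields that may follow the dn, each introduced by "?".
FIELDS = ("attributes", "scope", "filter", "extensions")


def parse_ldap_url(url):
    """Split an ldap URL into host, port, dn and the optional fields."""
    parts = urlsplit(url)
    if parts.scheme != "ldap":
        raise ValueError("not an ldap URL")
    # urlsplit puts everything after the first "?" into parts.query.
    raw = parts.query.split("?") if parts.query else []
    raw += [""] * (len(FIELDS) - len(raw))
    result = {name: unquote(value) for name, value in zip(FIELDS, raw)}
    result["attributes"] = [a for a in result["attributes"].split(",") if a]
    result["scope"] = result["scope"] or "base"  # default scope
    result["host"] = parts.hostname
    result["port"] = parts.port or 389  # default LDAP port
    result["dn"] = unquote(parts.path[1:])  # drop the leading "/"
    return result


if __name__ == "__main__":
    url = ("ldap://host.com:6666/o=University%20of%20Michigan,"
           "c=US??sub?(cn=Babs%20Jensen)")
    for key, value in parse_ldap_url(url).items():
        print(key, "=", value)
```

An empty attributes list stands for "all attributes", matching the description of the attributes component above.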
.SS "wais \- Wide Area Information Servers"
.RI wais:// hostport / database
|
|
|
|
.br
|
|
|
|
.RI wais:// hostport / database ? search
|
|
|
|
.br
|
|
|
|
.RI wais:// hostport / database / wtype / wpath
|
|
|
|
.PP
|
|
|
|
This scheme designates a WAIS database, search, or document
|
|
|
|
(see
|
|
|
|
.UR http://www.ietf.org/rfc/rfc1625.txt
IETF RFC\ 1625
.UE
|
|
|
|
for more information on WAIS).
|
|
|
|
Hostport is the hostname, optionally followed by a colon and port number
|
|
|
|
(the default port number is 210).
|
|
|
|
.PP
|
|
|
|
The first form designates a WAIS database for searching.
|
|
|
|
The second form designates a particular search of the WAIS database
|
|
|
|
.IR database .
|
|
|
|
The third form designates a particular document within a WAIS
|
|
|
|
database to be retrieved.
|
|
|
|
.I wtype
|
|
|
|
is the WAIS designation of the type of the object and
|
|
|
|
.I wpath
|
|
|
|
is the WAIS document-id.
|
|
|
|
.SS "other schemes"
|
|
|
|
There are many other URI schemes.
|
|
|
|
Most tools that accept URIs support a set of internal URIs
|
|
|
|
(e.g., Mozilla has the about: scheme for internal information,
|
|
|
|
and the GNOME help browser has the toc: scheme for various starting
|
|
|
|
locations).
|
|
|
|
There are many schemes that have been defined but are not as widely
|
|
|
|
used at the current time
|
|
|
|
(e.g., prospero).
|
|
|
|
The nntp: scheme is deprecated in favor of the news: scheme.
|
|
|
|
URNs are to be supported by the urn: scheme, with a hierarchical name space
|
|
|
|
(e.g., urn:ietf:... would identify IETF documents); at this time
|
|
|
|
URNs are not widely implemented.
|
|
|
|
Not all tools support all schemes.
|
|
|
|
.SH "CHARACTER ENCODING"
|
|
|
|
.PP
|
|
|
|
URIs use a limited number of characters so that they can be
|
|
|
|
typed in and used in a variety of situations.
|
|
|
|
.PP
|
|
|
|
The following characters are reserved, that is, they may appear in a
|
|
|
|
URI but their use is limited to their reserved purpose
|
|
|
|
(conflicting data must be escaped before forming the URI):
|
|
|
|
.IP
|
|
|
|
; / ? : @ & = + $ ,
|
|
|
|
.PP
|
|
|
|
Unreserved characters may be included in a URI.
|
|
|
|
Unreserved characters
|
|
|
|
include upper and lower case English letters,
|
|
|
|
decimal digits, and the following
|
|
|
|
limited set of punctuation marks and symbols:
|
|
|
|
.IP
\- _ . ! ~ * ' ( )
.PP
|
|
|
|
All other characters must be escaped.
|
|
|
|
An escaped octet is encoded as a character triplet, consisting of the
|
|
|
|
percent character "%" followed by the two hexadecimal digits
|
|
|
|
representing the octet code (you can use upper or lower case letters
|
|
|
|
for the hexadecimal digits). For example, a blank space must be escaped
|
|
|
|
as "%20", a tab character as "%09", and the "&" as "%26".
|
|
|
|
Because the percent "%" character always has the reserved purpose of
|
|
|
|
being the escape indicator, it must be escaped as "%25".
|
|
|
|
It is common practice to escape space characters as the plus symbol (+)
|
|
|
|
in query text; this practice isn't uniformly defined
|
|
|
|
in the relevant RFCs (which recommend %20 instead) but any tool accepting
|
|
|
|
URIs with query text should be prepared for it.
|
|
|
|
A URI is always shown in its "escaped" form.
|
|
|
|
.PP
|
|
|
|
Unreserved characters can be escaped without changing the semantics
|
|
|
|
of the URI, but this should not be done unless the URI is being used
|
|
|
|
in a context that does not allow the unescaped character to appear.
|
|
|
|
For example, "%7e" is sometimes used instead of "~" in an http URL
|
|
|
|
path, but the two are equivalent for an http URL.
|
|
|
|
.PP
|
|
|
|
For URIs which must handle characters outside the US-ASCII character set,
|
|
|
|
the HTML 4.01 specification (section B.2) and
IETF RFC\ 2718 (section 2.2.5) recommend the following approach:
.IP 1. 4
translate the character sequences into UTF-8 (IETF RFC\ 2279) \(em see
.BR utf-8 (7)
\(em and then
.IP 2.
|
|
|
|
use the URI escaping mechanism, that is,
|
|
|
|
use the %HH encoding for unsafe octets.
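.PP
The two steps above (UTF-8 first, then %HH escaping) can be sketched with Python's standard urllib.parse module, whose quote function encodes text as UTF-8 by default.
The helper name escape_component is hypothetical.
Note that Python follows a later URI standard and therefore also escapes some of the punctuation marks listed as unreserved above, such as the exclamation mark and the parentheses.

```python
from urllib.parse import quote, quote_plus, unquote


def escape_component(text):
    """Encode text as UTF-8, then %HH-escape every unsafe octet.

    safe="" means that "/" is escaped as well, which suits a single
    path segment or query value rather than a whole path.
    """
    return quote(text, safe="")


if __name__ == "__main__":
    print(escape_component("café menu"))    # non-ASCII and a space
    print(escape_component("100% & more"))  # "%" itself becomes %25
    print(quote_plus("a b&c"))              # query style: space as "+"
    print(unquote("%7euser"))               # escaped unreserved character
```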
|
|
|
|
.SH "WRITING A URI"
|
|
|
|
When written, URIs should be placed inside double quotes
|
|
|
|
(e.g., "http://www.kernelnotes.org"),
|
|
|
|
enclosed in angle brackets (e.g., <http://lwn.net>),
|
|
|
|
or placed on a line by themselves.
|
|
|
|
A warning for those who use double-quotes:
|
|
|
|
.B never
|
|
|
|
move extraneous punctuation (such as the period ending a sentence or the
|
|
|
|
comma in a list)
|
|
|
|
inside a URI, since this will change the value of the URI.
|
|
|
|
Instead, use angle brackets, or
|
|
|
|
switch to a quoting system that never includes extraneous characters
|
|
|
|
inside quotation marks.
|
|
|
|
This latter system, called the 'new' or 'logical' quoting system by
|
|
|
|
"Hart's Rules" and the "Oxford Dictionary for Writers and Editors",
|
|
|
|
is preferred practice in Great Britain and among hackers worldwide
|
|
|
|
(see the
Jargon File's section on Hacker Writing Style,
|
|
|
|
.IR http://www.fwi.uva.nl/~mes/jargon/h/HackerWritingStyle.html ,
for more information).
|
|
|
|
Older documents suggested inserting the prefix "URL:"
|
|
|
|
just before the URI, but this form has never caught on.
|
|
|
|
.PP
|
|
|
|
The URI syntax was designed to be unambiguous.
|
|
|
|
However, as URIs have become commonplace, traditional media
|
|
|
|
(television, radio, newspapers, billboards, etc.) have increasingly
|
|
|
|
used abbreviated URI references consisting of
|
|
|
|
only the authority and path portions of the identified resource
|
|
|
|
(e.g., <www.w3.org/Addressing>).
|
|
|
|
Such references are primarily
|
|
|
|
intended for human interpretation rather than machine, with the
|
|
|
|
assumption that context-based heuristics are sufficient to complete
|
|
|
|
the URI (e.g., hostnames beginning with "www" are likely to have
|
|
|
|
a URI prefix of "http://" and hostnames beginning with "ftp" likely
|
|
|
|
to have a prefix of "ftp://").
|
|
|
|
Many client implementations heuristically resolve these references.
|
|
|
|
Such heuristics may
|
|
|
|
change over time, particularly when new schemes are introduced.
|
|
|
|
Since an abbreviated URI has the same syntax as a relative URL path,
|
|
|
|
abbreviated URI references cannot be used where relative URIs are
|
|
|
|
permitted, and can only be used when there is no defined base
|
|
|
|
(such as in dialog boxes).
|
|
|
|
Don't use abbreviated URIs as hypertext links inside a document;
|
|
|
|
use the standard format as described here.
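.PP
The kind of context-based heuristic described above might look like the following Python sketch.
It is purely illustrative: the function name complete_abbreviated is hypothetical, and real clients use their own, more elaborate rules.

```python
def complete_abbreviated(reference):
    """Guess a scheme for an abbreviated URI such as "www.w3.org/x".

    Returns the reference unchanged if it already contains "://",
    and None if no guess can be made.
    """
    if "://" in reference:
        return reference
    host = reference.split("/", 1)[0].lower()
    if host.startswith("www"):
        return "http://" + reference
    if host.startswith("ftp"):
        return "ftp://" + reference
    return None


if __name__ == "__main__":
    for ref in ("www.w3.org/Addressing", "ftp.is.co.za/rfc/rfc1808.txt"):
        print(ref, "->", complete_abbreviated(ref))
```

Such guessing is only appropriate where no base URI is defined, for instance on text typed into a dialog box.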
|
|
|
|
.SH NOTES
|
|
|
|
Any tool accepting URIs (e.g., a web browser) on a Linux system should
|
|
|
|
be able to handle (directly or indirectly) all of the schemes described here,
|
|
|
|
including the man: and info: schemes.
|
|
|
|
Handling them by invoking some other program is fine and in fact encouraged.
|
|
|
|
.PP
|
|
|
|
Technically the fragment isn't part of the URI.
|
|
|
|
.PP
|
|
|
|
For information on how to embed URIs (including URLs) in a data format,
|
|
|
|
see documentation on that format.
|
|
|
|
HTML uses the format <A HREF="\fIuri\fP">
|
|
|
|
.I text
|
|
|
|
</A>.
|
|
|
|
Texinfo files use the format @uref{\fIuri\fP}.
|
|
|
|
Man and mdoc have the recently-added UR macro, or just include the
|
|
|
|
URI in the text (viewers should be able to detect :// as part of a URI).
|
|
|
|
.PP
|
|
|
|
The GNOME and KDE desktop environments currently vary in the URIs they accept,
|
|
|
|
in particular in their respective help browsers.
|
|
|
|
To list man pages, GNOME uses <toc:man> while KDE uses <man:(index)>, and
|
|
|
|
to list info pages, GNOME uses <toc:info> while KDE uses <info:(dir)>
|
|
|
|
(the author of this man page prefers the KDE approach here, though a more
|
|
|
|
regular format would be even better).
|
|
|
|
In general, KDE uses <file:/cgi-bin/> as a prefix to a set of generated
|
|
|
|
files.
|
|
|
|
KDE prefers documentation in HTML, accessed via the
|
|
|
|
<file:/cgi-bin/helpindex>.
|
|
|
|
GNOME prefers the ghelp scheme to store and find documentation.
|
|
|
|
Neither browser handles file: references to directories at the time
|
|
|
|
of this writing, making it difficult to refer to an entire directory with
|
|
|
|
a browsable URI.
|
|
|
|
As noted above, these environments differ in how they handle the info: scheme,
|
|
|
|
probably the most important variation.
|
|
|
|
It is expected that GNOME and KDE
|
|
|
|
will converge to common URI formats, and a future
|
|
|
|
version of this man page will describe the converged result.
|
|
|
|
Efforts to aid this convergence are encouraged.
|
|
|
|
.SH SECURITY
|
|
|
|
.PP
|
|
|
|
A URI does not in itself pose a security threat.
|
|
|
|
There is no general guarantee that a URL, which at one time
|
|
|
|
located a given resource, will continue to do so. Nor is there any
|
|
|
|
guarantee that a URL will not locate a different resource at some
|
|
|
|
later point in time; such a guarantee can only be
|
|
|
|
obtained from the person(s) controlling that namespace and the
|
|
|
|
resource in question.
|
|
|
|
.PP
|
|
|
|
It is sometimes possible to construct a URL such that an attempt to
|
|
|
|
perform a seemingly harmless operation, such as the
|
|
|
|
retrieval of an entity associated with the resource, will in fact
|
|
|
|
cause a possibly damaging remote operation to occur. The unsafe URL
|
|
|
|
is typically constructed by specifying a port number other than that
|
|
|
|
reserved for the network protocol in question. The client
|
|
|
|
unwittingly contacts a site that is in fact running a different
|
|
|
|
protocol. The content of the URL contains instructions that, when
|
|
|
|
interpreted according to this other protocol, cause an unexpected
|
|
|
|
operation. An example has been the use of a gopher URL to cause an
|
|
|
|
unintended or impersonating message to be sent via a SMTP server.
|
|
|
|
.PP
|
|
|
|
Caution should be used when using any URL that specifies a port
|
|
|
|
number other than the default for the protocol, especially when it is
|
|
|
|
a number within the reserved space.
|
|
|
|
.PP
|
|
|
|
Care should be taken when a URI contains escaped delimiters for a
|
|
|
|
given protocol (for example, CR and LF characters for telnet
|
|
|
|
protocols) that these are not unescaped before transmission. This
|
|
|
|
might violate the protocol, but avoids the potential for such
|
|
|
|
characters to be used to simulate an extra operation or parameter in
|
|
|
|
that protocol, which might lead to an unexpected and possibly harmful
|
|
|
|
remote operation being performed.
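.PP
A tool could flag URIs that carry escaped line delimiters before deciding how to handle them.
The following Python sketch shows one possible check; the function name has_escaped_line_break is hypothetical, and the check covers only escaped CR and LF octets.

```python
import re

# Matches %0A, %0a, %0D or %0d: an escaped LF or CR octet.
_ESCAPED_CRLF = re.compile("%0[ad]", re.IGNORECASE)


def has_escaped_line_break(uri):
    """Report whether the URI contains an escaped CR or LF."""
    return _ESCAPED_CRLF.search(uri) is not None


if __name__ == "__main__":
    print(has_escaped_line_break("http://example.com/a%20b"))
    print(has_escaped_line_break("gopher://example.com:25/1HELO%0d%0aMAIL"))
```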
|
|
|
|
.PP
|
|
|
|
It is clearly unwise to use a URI that contains a password which is
|
|
|
|
intended to be secret. In particular, the use of a password within
the 'userinfo' component of a URI is strongly recommended against except
in those rare cases where the 'password' parameter is intended to be public.
|
|
|
|
.SH "CONFORMING TO"
|
|
|
|
.PP
.IR http://www.ietf.org/rfc/rfc2396.txt
|
|
|
|
(IETF RFC\ 2396),
|
|
|
|
.I http://www.w3.org/TR/REC-html40
|
|
|
|
(HTML 4.0).
|
|
|
|
.SH BUGS
|
|
|
|
.PP
|
|
|
|
Documentation may be placed in a variety of locations, so there
|
|
|
|
currently isn't a good URI scheme for general online documentation
|
|
|
|
in arbitrary formats.
|
|
|
|
References of the form
|
|
|
|
<file:///usr/doc/ZZZ> don't work because different distributions and
|
|
|
|
local installation requirements may place the files in different
|
|
|
|
directories
|
|
|
|
(it may be in /usr/doc, or /usr/local/doc, or /usr/share, or somewhere else).
|
|
|
|
Also, the directory ZZZ usually changes when a version changes
|
|
|
|
(though filename globbing could partially overcome this).
|
|
|
|
Finally, using the file: scheme doesn't easily support people who dynamically
|
|
|
|
load documentation from the Internet (instead of loading the files
|
|
|
|
onto a local filesystem).
|
|
|
|
A future URI scheme may be added (e.g., "userdoc:") to permit
|
|
|
|
programs to include cross-references to more detailed documentation without
|
|
|
|
having to know the exact location of that documentation.
|
|
|
|
Alternatively, a future version of the filesystem specification may
|
|
|
|
specify file locations sufficiently so that the file: scheme will
|
|
|
|
be able to locate documentation.
|
|
|
|
.PP
|
|
|
|
Many programs and file formats don't include a way to incorporate
|
|
|
|
or implement links using URIs.
|
|
|
|
.PP
|
|
|
|
Many programs can't handle all of these different URI formats; there
|
|
|
|
should be a standard mechanism to load an arbitrary URI that automatically
|
|
|
|
detects the user's environment (e.g., text or graphics, desktop environment,
|
|
|
|
local user preferences, and currently-executing tools) and invokes the
|
|
|
|
right tool for any URI.
|
|
|
|
.SH AUTHOR
|
|
|
|
David A. Wheeler (dwheeler@dwheeler.com) wrote this man page.
|
|
|
|
.SH "SEE ALSO"
|
|
|
|
.BR lynx (1),
|
|
|
|
.BR man2html (1),
|
|
|
|
.BR mailaddr (7),
|
|
|
|
.BR utf-8 (7),
|
|
|
|
.UR http://www.ietf.org/rfc/rfc2255.txt
IETF RFC\ 2255.
.UE
|