\documentstyle[rfc,fancyheadings,times]{cernman}
\lhead[RFC XXXX]{1 Jan 1994}
\chead{Uniform Resource Locators}
\rhead[1 Jan 1994]{RFC XXXX}
\lfoot[\thepage]{Berners-Lee}
\rfoot[Berners-Lee]{\thepage}
\cfoot{}
\pagestyle{fancy}
\begin{document}
% First page special
\thispagestyle{plain}
\begin{tabular*}{\textwidth}{@{}l@{\extracolsep{\fill}}r@{}}
Uniform Resource Locators&Tim Berners-Lee\\
draft-ietf-uri-url-02.\{txt,ps\}&CERN\\
Expires 1 July 1994&1 Jan 1994\\[0.5cm]
\end{tabular*}

\begin{center}
\Large\bf\sf
Uniform Resource Locators\\[1cm]
\large A unifying syntax for the expression of names and addresses
of objects on the network\\[1cm]
\end{center}
% --------------------------------------------------------


\section*{Status of this memo}This document is an Internet Draft.
Internet Drafts are working documents
of the Internet Engineering Task
Force (IETF), its Areas, and its
Working Groups.  Note that other
groups may also distribute working
documents as Internet Drafts.  \par 
Internet Drafts are working documents
valid for a maximum of six months.
Internet Drafts may be updated, replaced,
or obsoleted by other documents at
any time.  It is not appropriate
to use Internet Drafts as reference
material or to cite them other than
as a "working draft" or "work in
progress".  \par 
Distribution of this document is
unlimited.  Please send comments
to the author at timbl@info.cern.ch
or to the discussion list ietf-url@merit.edu.
\section*{Abstract  }Many protocols and systems for document
search and retrieval are currently
in use, and many more protocols or
refinements of existing protocols
are to be expected in a field whose
expansion is explosive.  \par 
These systems are aiming to achieve
global search and readership of documents
across differing computing platforms,
and despite a plethora of protocols
and data formats.   As protocols
evolve, gateways can allow global
access to remain possible. As data
formats evolve, format conversion
programs can preserve global access.
There is one area, however, in which
it is impractical to make conversions,
and that is in the names and addresses
used to identify objects.  This is
because names and addresses of objects
are passed on in so many ways, from
the backs of envelopes to hypertext
objects, and may have a long life.\par 
A common feature of almost all the
data models of past and proposed
systems is something which can be
mapped onto a concept of "object"
and some kind of name, address, or
identifier for that object.  One
can therefore define a set of name
spaces in which these objects can
be said to exist.\par 
Practical systems need to access
and mix objects which are part of
different existing and proposed systems.
\par 
This paper discusses the requirements
on a universal syntax which can be
used to encapsulate a name in any
registered name space.  This will
allow names in different spaces to
be treated in a common way, even
though names in different spaces
have differing characteristics, as
do the objects to which they refer.\par 
The universal syntax allows access to objects available
using existing protocols, and may
be extended with technology.  This paper
makes a recommendation for a generic
syntax, and for specific forms for
"Uniform Resource Locators" (URLs) of
objects accessible using existing
Internet protocols.\par 
The syntax has been in widespread
use by World-Wide Web software since
1990.


\section*{Terms}The objects on the network which
are to be named and addressed include
typically objects which can be retrieved,
and objects which can be searched.
 There is a great variety of other
objects which may support other operations.
We imply nothing about the contents
of objects in this document. Whereas
human-readable documents are currently
the center of interest of the field,
we envisage all aspects discussed
in this paper applying to generalized
objects when systems to handle them
become available. The "object" is
the unit of reference and need not
correspond to any unit of storage.
We refer to objects which can be
searched as "indexes".  We emphasize
that this is the abstract view of
the client, and these objects need
not correspond to physical files
on computers. We refer to the person
who does the retrieval or searching
as the user.  \par 
Within this document, we use the
term "name" very generally for a
string of characters describing an
object, whatever its combination
of properties mentioned below.  (The
term usually has a narrower meaning
but we needed some term for the universal
set.)  This uniform syntax applied
to a generic name is known as a Uniform
Resource Identifier (URI). The term
"address" is reserved for a string
which specifies a more or less physical
location.  The term "locator" refers
to a URL as here defined.  URIs which
have a greater persistence than URLs
are referred to as URNs.


\section*{Characteristics  }This section describes characteristics of various
naming schemes, requirements which
some of the existing schemes meet, and
requirements for the URL scheme itself,
as an introduction to and background
for the Recommendations section.
\subsection*{Uses of names and addresses  }A name allows a user, with the help
of a "client" program, to retrieve
or operate on objects via a "server"
program.  A name may be passed for
example:  
\begin{itemize}
\item In communication of any form between
two people, to refer to a document,
or part of a document;
\item As part of the description of a link
associated with a hypertext document;
\item As part of the result of searching
an index.  
\end{itemize}Some typical requirements on a name
which are met to a varying degree
by various schemes are for example
that the name is
\begin{DL}{allow this much space}
\item[Persistent
] A given name will remain
valid as long as it is needed;  
\item[Extensible
] A given naming syntax
will remain valid through the introduction
of new protocols and directory technologies;
\item[Resolvable
] A name will contain enough
information to allow the document
or index to which it refers to be
accessed, perhaps via resolution
into an intermediate, more physical,
name. 
\item[Unique
] Each object can only have
one such name.  The fact that two
such names are different implies
that the objects to which they refer
are different (in some way).  
\item[Unambiguous
] The fact that two names
are identical implies that the objects
named are the same (in some way).
\end{DL}
The syntax discussed is the syntax
of one name, be it a lasting name
or a physical address.  When a directory
server or hypertext link contains
a set of alternative names, then
that is beyond the scope of this
syntax.  Similarly, a syntax for
describing a compound object is outside
the scope of this syntax.  The specific
locator name spaces (defined under
the umbrella of the general syntax)
each meet the requirements above
to a greater or lesser extent.
\subsection*{Current practice  }Current protocols use many different
standards for names. For some protocols,
such as ISO-10163 Search and Retrieve
protocol\lbrack 16\rbrack , the names returned
in a search are only valid during
the session. For others, such as
FTP\lbrack 9\rbrack , they are lasting names which
may be used for object retrieval
at a later time.  Typically, however,
they are not long-lasting names which
are independent of the location of
the object. Such names may be provided
using directory servers such as x.500.
They will refer to the registration,
however formal or informal, of an
object with a particular organisation
or person.  Both hypertext and manual
references rely on long-lasting
names.  Current names are basically
location specifiers (addresses).
These may be known as Uniform Resource
Locators (URLs). They give the necessary
parts of an address for a reader
to access an information provider
using the given protocol, and ask
for the object required. Examples
of names used by various protocols
include
\subsubsection*{File Transfer Protocol (Postel 1985):}
\begin{itemize}
\item Host name or IP-address
\item \lbrack TCP port\rbrack   
\item \lbrack user name, password\rbrack   
\item Filename
\end{itemize}
\subsubsection*{W.A.I.S. (Kahle 1990)}
\begin{itemize}
\item Host name or IP-address
\item \lbrack TCP port\rbrack   
\item local document id
\end{itemize}
\subsubsection*{Gopher (Alberti 1991)}
\begin{itemize}
\item Host name or IP-address  
\item \lbrack TCP port\rbrack 
\item database name
\item selector string
\end{itemize}
\subsubsection*{HTTP (Berners-Lee 1991)}
\begin{itemize}
\item Host name or IP-address  
\item \lbrack TCP port\rbrack   
\item local object id
\end{itemize}
\subsubsection*{NNTP (Kantor 1986)}
\paragraph{NNTP  group}
\begin{itemize}
\item Group name
\end{itemize}
\paragraph{NNTP article}
\begin{itemize}
\item Host name
\item unique message identifier
\end{itemize}
\subsubsection*{Prospero links (Neuman 1992)}
\begin{itemize}
\item Host name or IP address  
\item \lbrack UDP port\rbrack   
\item  Host specific object name  
\item \lbrack version\rbrack   
\item \lbrack identifier\rbrack *
\end{itemize}
\subsubsection*{x.500 distinguished name}
\begin{itemize}
\item Country  
\item Organisation  
\item Organisational unit  
\item Person  
\item Local object identifier  
\end{itemize}Other systems with their own naming
schemes include the BITNET "LISTSERV"
application, FTAM file retrieval,
SQLnet(TM) remote database search,
proprietary distributed file systems,
etc. Conventional syntaxes for writing
these addresses involve various forms
of punctuation to separate these
parts.  This sometimes,  but not
always, allows the naming scheme
to be deduced from the punctuation.
For example, a name of the form xxx.yyy.zz.edu:/pub.aa.bb.cc
often implies anonymous FTP access.
However, there is no well-defined
algorithm for parsing an arbitrary
name, as there is no common syntax.
\subsection*{Expandability  }There will necessarily be a phase
during which lasting names will become
more  common, as the deployment of
directory services increases to the
point where  every user has direct
or indirect access to one.  Even
then, however, one can envisage more
than one competing directory system,
and cases in which physical  names
are still required.  A directory
service takes a lasting name and
reduces it  to a physical address
(or set of addresses) which, though
less useful for lasting reference,
is the only way to actually retrieve
the object. An addressing syntax
is required which will be able to
encompass existing  physical address
spaces, and be extendible to any
future protocols.  This  requires
that it contain an identifier for
the protocol in use. The format of
the rest  of the address will necessarily
depend to a certain extent on the
protocol.
\subsection*{Relevance  }The life of a name is limited by
any information contained within
it which may  become prematurely
invalid. It is therefore necessary
to limit the contents of a name to
the information required for the
operations above.  Other extraneous
information about the object (its
size, data format, authorisation
details, etc.) may in general change
with time and should not be part
of the name.  One might expect such
information to be part of the "header"
of an object, and for protocols to
allow the header information to be
retrieved independently of the objects
themselves.  Any physical address
may be subject to change with time:
hence we encourage the move to lasting
names and directory services.
\subsection*{Uniqueness  }Clearly one requires unambiguous
names in the sense that one name
should refer to only one logical
object. This is the case with all
the addressing schemes in use, whether
they are directory systems or physical
addresses. (The internet addresses
all rely on the domain name (Mockapetris
1987) of the host to achieve this).
However, given that names can be
translated, many apparently different
names  may lead to the same object.
Any object may therefore be referred
to by many  names. One needs to be
able to know whether two objects,
retrieved through  different paths,
are in fact the same object.  It
is suggested that each object have
a unique "official" name. This name
could be stored in the object in
some representations, or stored in
a database  accessible to the server,
for example.  Any references within
that object should be parsed in the
context of the official name.  In
the presence of a  directory service,
the official name will normally be
the registered name of the object.
However, a name in any scheme will
do, so long as it is completely specified.
On systems which do not allow the
name to be stored (such as anonymous
FTP archive sites), a possible ambiguity
will always exist as to whether two
similarly named objects are in fact
the same.  Note that Internet newsgroup
names are unique world-wide, and
news articles carry a unique message
id. In most other cases, however,
there is no guarantee that dereferencing
a URL will work, or that if it does
the object it refers to will in fact
be the object intended.  URLs such
as FTP addresses are transient in
that files may be moved and even
replaced by different files of the
same name.  This disorganisation
may be limited by good server management,
but a naming scheme which is independent
also of internet host name is obviously
preferable.
\subsection*{Readability by people  }This requirement has been put forward
by several people (Clifford Lynch,
Douglas Engelbart among others),
and disputed by others.  The author's
view is that it will be a while before
technology and standardisation have
reached the point at which names
and addresses will be hidden from
human beings. As long as they must
be written on the backs of envelopes
and "cut and pasted" between workstation
windows, there is a strong need for
names to be  
\begin{itemize}
\item Short  
\item  Composed of printable (preferably
non-white) characters  
\item To a certain extent, understandable
by a human being.
\end{itemize}
\subsection*{Structure of names and addresses}A physical address is required in
order for:
\begin{itemize}
\item The user's program to contact the
server;
\item The server to perform the operation
(e.g. search an index, retrieve
an object, or look up the name) and
return a result;  
\item The user's program to locate an individual
position or element within a returned
object. 
\end{itemize}This suggests that a name be structured,
such that the parts necessary for
these  three operations be separate
and only used by those system elements
which need  those parts. This corresponds
to the basic principle of information
hiding.  In fact,  four parts are
necessary, including the indicator
of the naming scheme to be used:
\begin{itemize}
\item The naming scheme: a registered identifier
for the protocol.  
\item The name of a suitable server. The
format of this part must be well
defined. It will depend on the lower-layer
protocols in use.  Systems which
use widely distributed information,
such as x.500 and NNTP, do not need
this part as each client generally
contacts his nearest server (or a
particular server).
\item Information to be passed to the server.
This may be private to the server,
as all names may be generated and
used by the same server. This part
of the name should be opaque to the
client.
\item Information to be used by the application
once the object has been retrieved.
This part is private to the application
(or, more strictly, the data format)
and so cannot be defined here.
\end{itemize}Both lasting names and physical addresses
often share a hierarchical structure.
This follows often from the organisation
of the system. From the naming point
of view, it has the advantage that
a reference in one object to another
object need not include that part
of the structure which is common
to both names.
\subsection*{Choices for a universal syntax }The requirements above leave little
room for choice save for the order
and punctuation of the elements of
an address.  It is only reasonable
for the order of writing of the parts
to be consistently from left to right
(or right to left) with increasing
specificity.  Punctuation schemes
fall into two categories (Huitema
1991): tagged schemes in which fields
are given names, and schemes which
use special characters and field
order. The latter tend to be more
compact schemes.
\begin{verbatim}
        protocol: aftp host: xxx.yyy.edu path: /pub/doc/README

        PR=aftp; H=xx.yy.edu; PA=/pub/doc/README;

        PR:aftp/xx.yy.edu/pub/doc/README
  
        /aftp/xx.yy.edu/pub/doc/README

\end{verbatim}
Fig 1. Some alternative tagged and
untagged representations \par 
The choice of special symbols for
punctuation tends to be a matter
of taste. It is easier to read  addresses
whose symbols correspond to those
of one's favourite operating system.
A variety of symbols is needed so
that when a name is abbreviated it
is possible to tell which parts have
been omitted. \par 
The  recommendation below uses special
characters in order to achieve a
compact name, and uses where possible
punctuation symbols established in
the internet or unix community.\par 
The choice of escape character for
introducing representations of non-allowed
characters also tends to be a matter
of taste. An ANSI standard exists
in the C language, using the back-slash
character "\char'134 ". The use of this character
on unix command lines, however, can
be a problem as it is interpreted
by many shell programs, and would
have itself to be escaped. \par 
There is a conflict between the need
to be able to represent many characters
including spaces within a URL directly,
and the need to be able to use a
URL in environments which have limited
character sets or in which certain
characters are prone to corruption.
This conflict has been resolved by
use of a hexadecimal escaping method
which may be applied to any characters
forbidden in a given context. When
URLs are moved between contexts,
the set of characters escaped may
be enlarged or reduced unambiguously.\par 
The use of multiple white space characters
is discouraged  in URLs to be printed
or sent by electronic mail.  This
is because of the frequent introduction
of extraneous white space when lines
are wrapped by systems such as mail,
or by the sheer necessity of narrow column
width, and because of the  inter-conversion
of various forms of white space which
occurs during character code conversion
and the transfer of text between
applications.


\section*{Recommendations}This section describes the syntax
for "Uniform Resource Locators" (URLs):
that is, basically physical addresses
of objects which are retrievable
using protocols already deployed
on the net.  The generic syntax provides
a framework for new schemes for names
to be resolved using as yet undefined
protocols.  \par 
The syntax is described in two parts.
Firstly, we give the syntax rules
of a completely specified name; secondly,
we give the rules under which parts
of the name may be omitted in a well-defined
context.
\subsection*{Full form  }A complete URL consists of a naming
scheme specifier followed by a string
whose format is a function of the
naming scheme. For locators of information
on the internet, a common syntax
is used for the  IP address part.
A BNF description of the URL syntax
is given in a later section. The
components are as follows.  Fragment
identifiers and partial URLs are
not involved in the basic URL definition.
\subsection*{Scheme  }Within the URL of an object, the first
element is the name of the scheme,
separated from the rest of the object
by a colon. The rest of the URL follows
the colon in a format depending on
the scheme.
\subsubsection*{Internet protocol parts  }Those schemes which refer to internet
protocols have a common syntax for
the rest of the object name. This
starts with a double slash "//" to
indicate its presence, and continues
until the following slash "/".  Within
that section are  
\begin{DL}{allow this much space}
\item[An optional user name,
] if this must
be quoted to the server, followed
by  a commercial at sign "@".  (Use
of this field is discouraged. Provision
for encoding a password after the
user name, delimited by a colon,
could  be made but obviously is only
useful when the password is public,
in  which case it should not be necessary,
so that is also discouraged.)  
\item[The internet domain name
] of the host
in RFC 1034 format (or, optionally
and less advisably, the IP address
as a set of four decimal numbers)
\item[The port number,
] if it is not the
default number for the protocol,
is given in decimal notation after
a colon.
\item[Path
] The rest of the locator is known
as the "path". It may define details
of how the client should communicate
with the server, including information
to be passed transparently to the
server without any processing by
the client.  
\end{DL}
The path is interpreted in a manner
dependent on the protocol being used.
However, when it contains slashes,
these must imply a hierarchical structure.
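\par 
As an illustration only, the parts described above can be picked out
of a made-up FTP locator.  The user name, host name, port and path below
are placeholders chosen for this sketch and do not refer to a real service.
\begin{verbatim}
        ftp://guest@xxx.yyy.edu:2021/pub/doc/README

        scheme      ftp
        user name   guest
        host        xxx.yyy.edu
        port        2021
        path        pub/doc/README    (follows the delimiting "/")
\end{verbatim}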
\subsection*{Encoding prohibited characters}When a system uses a local addressing
scheme, it is useful to provide a
mapping from local addresses into
URLs so that objects
within the addressing scheme may
be referred to globally, and possibly
accessed through gateway servers.\par 
Any mapping scheme may be defined
provided it is unambiguous, reversible,
and provides valid URLs. It is recommended
that where hierarchical aspects to
the local naming scheme exist, they
be mapped onto the hierarchical URL
path syntax in order to allow the
partial form to be used.  \par 
The following encoding method shall
be used for mapping WAIS, FTP, Prospero
and Gopher addresses onto URLs. Where
the local naming scheme uses ASCII
characters which are not allowed
in the URL,  these may be represented
in the URL by a percent sign "\%"
followed by two hexadecimal digits
(0-9, A-F) giving the ISO Latin 1
code for that character.  Character
codes other than those allowed by
the syntax shall not be used in a
URL. \par 
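As a worked illustration of this rule (the names on the left are made up
for the example), a space has code 32, which is 20 in hexadecimal, and a
percent sign has code 37, which is 25 in hexadecimal:
\begin{verbatim}
        my file      ->   my%20file
        100% done    ->   100%25%20done
\end{verbatim}
Reading an escape back is the reverse calculation: in "\%7E" the digits
7 and E give $7 \times 16 + 14 = 126$, the code for a tilde.\par 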
The same encoding method may be used
for encoding characters whose use,
although technically allowed in a
URL, would be unwise due to problems
of corruption by imperfect gateways
or misrepresentation due to the use
of variant character sets, or which
would simply be awkward in a given
environment.  As a \% sign always
indicates an encoded character, a
URL may be made safer simply by encoding
any characters considered unsafe,
while leaving already encoded characters
still encoded.  \par 
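For instance, assuming purely for the sake of the example that a tilde
is considered unsafe in some environment, it may be escaped without
touching the escape already present:
\begin{verbatim}
        a%20b~c   ->   a%20b%7Ec      (not a%2520b%7Ec)
\end{verbatim}
The "\%20" is left alone because its percent sign already introduces an
encoded character; encoding it a second time would change the name.\par 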
(Note: If a new naming scheme is
introduced which encodes binary data
as opposed to text, then a more compact
encoding such as pure hex or base
64 would be more appropriate.)  \par 
The same considerations apply to
mapping local fragment identifiers
onto the fragmentid part of a URL.


\section*{Specific Schemes  }The mapping for some existing standard
and experimental protocols is outlined
in the BNF syntax definition.  Notes
on particular protocols follow.
\subsection*{HTTP  }The HTTP protocol specifies that
the path is handled transparently
by those who handle URLs, except
for the servers which de-reference
them.   The path is passed by the
client to the server with any request,
but is not otherwise understood by
the client.  The fragmentid part
is not sent with the request.  The
search part, if present, is sent.
Spaces in URLs should be escaped
for transmission in HTTP.
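\par 
A sketch of how a client might divide such a reference, using a made-up
host and document name (none of these refer to a real server):
\begin{verbatim}
        http://xxx.yyy.edu/docs/my%20notes?index#part2

        sent to the server     path        docs/my%20notes
                               search      index
        kept by the client     fragmentid  part2
\end{verbatim}
The space in the invented document name is written as "\%20", in line
with the advice on escaping spaces.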
\subsection*{FTP  }The ftp: prefix indicates a file
which is to be picked up from the
file system of the given host. The
FTP protocol is used. The port number
if given gives the port of the FTP
server if not the FTP default. (A
client may in practice use local
file access to retrieve objects which
are available through more efficient
means such as local file open or
NFS mounting, where this is available
and equivalent). \par 
 The syntax allows for the inclusion
of a user name and even a password
for those systems which do not use
the anonymous FTP convention. The
default, however, if no user or password
is supplied, will be to use that
convention, viz. that the user name
is "anonymous" and the password the
user's mail address. \par 
The adoption of a unix-style syntax
involves the conversion into non-unix
local forms by either the client
or server. Some non-unix servers
do this, but clients wishing to access
sites which do not have unix-style
naming will need certain algorithms
to enable  other file systems to
be identified and treated.  Client
software may also have to be flexible
in terms of the sequence of FTP commands
used with different varieties of
server.  In view of a tendency for
file systems to look increasingly
similar, it was felt that the URL
convention should not be weighed
down by extra mechanisms for identifying
these cases. \par 
The data format of a file can only,
in the general FTP case, be deduced
from the name, normally the suffix
of the name. This is not standardized.
An alternative is for it to be transferred
in information outside the URL. The
transfer mode (binary or text) must
in turn be deduced from the data
format.  It is recommended that conventions
for suffixes of public archives be
established, but this is outside the scope
of this paper.
\subsection*{News  }The news locators refer to either
news group names or article message
identifiers which must conform to
the rules of RFC 850.  A message
identifier may be distinguished from
a news group name by the presence
of the commercial at "@" character.
These rules imply that within an
article, a reference to a news group
or to another article will be a valid
URL (in the partial form). \par 
A news URL may be dereferenced using
NNTP or using any other protocol
for the conveyance of usenet news
articles.
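\par 
For illustration, with an invented group name and an invented message
identifier, news locators might be written as follows:
\begin{verbatim}
        news:comp.sample.group       a news group
        news:AA0134@xxx.yyy.edu      an article, marked by the "@"
\end{verbatim}
Neither form carries the name of a news host.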
\subsubsection*{Note 1:}Among URLs the news: URLs are anomalous
in that they are location-independent.
They are unsuitable as URN candidates
because the NNTP architecture relies
on the expiry of articles and therefore
a small number of articles being
available at any time.  When a news:
URL is quoted, the assumption is
that the reader will fetch the article
or group from his or her local news
host.  News host names are NOT part
of news URLs.
\subsubsection*{Note 2:}An outstanding problem is that the
message identifier is insufficient
to allow the retrieval of an expired
article, as no algorithm exists for
deriving an archive site and file
name. The addition of the date and
news group set to the article's URL
would allow this if a directory existed
of archive sites by news group. Suggested
subject of study in conjunction with
NNTP WG.  A further possible extension
may be to allow the naming of subject
threads as addressable objects.
\subsection*{NNTP}This is an alternative form of reference
for news articles, specifically to
be used with NNTP servers, and particularly
those incomplete server implementations
which do not allow retrieval by message
identifier.\par 
The news server name, newsgroup name,
and index number of an article within
the newsgroup on that particular
server are given. 
\subsubsection*{Note 1:}This form of URL is not globally
accessible, as typically NNTP
servers only allow access from local
clients.  This form of URL should
not be quoted outside this local
area.  It should not be used within
news articles for wider circulation
than the one server.
\subsection*{WAIS  }The current public domain WAIS
implementation requires that a client know
the "type" of an object prior to retrieval.
This value is returned along with
the internal object identifier in
the search response. It has been
encoded into the path part of the
URL in order to make the URL sufficient
for the retrieval of the object.
Within the WAIS world, names do not,
of course, need to be prefixed
by "wais:"  (by the partial form
rules).
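\par 
The two WAIS forms in the BNF might look as follows.  The host, database
name, type and document identifier are all invented for this sketch; in
practice the type and identifier are whatever the search response returned.
\begin{verbatim}
        wais://xxx.yyy.edu/sampledb?keyword       an index, with a search
        wais://xxx.yyy.edu/sampledb/TEXT/0123     a document of type TEXT
\end{verbatim}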
\subsection*{Prospero  }The Prospero (Neuman, 1991) directory
service is used to resolve the URL
yielding an access method for the
object (which can then itself be
represented as a URL if translated).
The host part contains a host name
or internet address.  The port part
is optional.  \par 
The path part contains a host specific
object name and an optional version
number. If present, the version number
is separated from the host specific
object name by the characters "\%00"
(percent zero zero), this being an
escaped string terminator (null).
 External Prospero links are represented
as URLs of the underlying access
method and are not represented as
Prospero URLs.
\subsection*{Gopher  }The first character of the URL path
part (after the initial single slash)
is a single-character "type" field
which is that used by the Gopher
protocol.  The rest of the path is
the "selector string", with disallowed
characters encoded. Note that some
selector strings begin with a copy
of the gopher type character, in
which case that character will occur
twice consecutively in the URL. If
the type character and selector are
omitted, the type defaults to "1".
Gopher links which refer to non-Gopher
protocols are represented directly
as URLs of the underlying access
method and are not represented as
Gopher URLs.
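\par 
As an illustration, using a placeholder host and made-up selector strings:
\begin{verbatim}
        gopher://xxx.yyy.edu/1/pub        type 1, selector /pub
        gopher://xxx.yyy.edu/00/README    type 0, selector 0/README
        gopher://xxx.yyy.edu/             type defaults to 1
\end{verbatim}
In the second line the selector happens to begin with a copy of the
type character, so that character appears twice.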
\subsection*{Mailto}This allows a URL to specify an RFC822
addr-spec mail address.  Note that
use of \% , for example as used in
forming a gatewayed mail address,
requires conversion to \%25 in a URL.\par 
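For example, with invented mail addresses:
\begin{verbatim}
        user@xxx.yyy.edu         ->   mailto:user@xxx.yyy.edu
        user%node@xxx.yyy.edu    ->   mailto:user%25node@xxx.yyy.edu
\end{verbatim}
\par 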
The semantics may be considered
to be that the object referred to
by the mailto: URL is the set of
messages sent to or from that address.
There is no algorithm to retrieve
this set, but the SMTP protocol allows
messages to be added to it, and any
given user may be aware of a subset
of its members.
\subsection*{Telnet, rlogin, tn3270  }The use of URLs to represent interactive
sessions is a convenient extension
to their uses for objects.  This
allows access to information systems
which only provide an interactive
service, and no information server.
As information within the service
cannot be addressed individually
or, in general, automatically retrieved,
this is a less desirable, though
currently common, solution.
\subsection*{x500  }The mapping of x500 names onto URLs
is not defined here. A decision is
required as to whether "distinguished
names" or "user friendly names" (ufn),
or both, should be allowed. If any
punctuation conversions are needed
from the adopted x500 representation
(such as the use of slashes between
parts of a ufn) they must be defined.
This is a subject for study.
\subsection*{WHOIS  }This prefix describes the access
using the "whois++" scheme in the
process of definition. The host name
part is the same as for other IP
based schemes. The path part can
be either a whois handle for a whois
object, or it can be a valid whois
query string. This is a subject for
further study.
\subsection*{Network Management Database  }This is a subject for study.
\subsection*{Registration of naming schemes  }A new naming scheme may be introduced
by defining a mapping onto a conforming
URL syntax, using a new scheme identifier.
Experimental scheme identifiers may
be used by mutual agreement between
parties, and must start with the
characters "x-".  The scheme name
"urn:" is reserved for the work in
progress on a scheme for more persistent
names.  Therefore URNs (Names) and
URLs (Locators) will be distinguishable.
An object which is either a URL or
a URN is known as a URI (Identifier).\par 
It is proposed that the Internet
Assigned Numbers Authority (IANA)
perform the function of registration
of new schemes. Any submission of
a new URI scheme must include a definition
of an algorithm for the retrieval
of any object within that scheme.
The algorithm must take  the URI
and produce either a set of URL(s)
which will lead to the desired object,
or the object itself, in a well-defined
or determinable format.\par 
It is recommended that those proposing
a new scheme demonstrate its utility
and operability by the provision
of a gateway which will provide images
of objects in the new scheme for
clients using an existing protocol.
If the new scheme is not a locator
scheme, then the properties of names
in the new space should be clearly
defined.  It is likewise recommended
that, where a protocol allows for
retrieval by URI, the client
software have provision for being
configured to use specific gateway
locators for indirect access through
new naming schemes.
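\par
As an illustration of the prefix conventions above, the following Python
sketch sorts an identifier into one of three classes by its scheme
identifier. The function name {\tt classify\_scheme} and the labels it
returns are assumptions made for this example only, as is the choice
to compare the prefix case-sensitively; none of them is defined by this
document.
\begin{verbatim}
```python
def classify_scheme(uri):
    """Classify a URI by its scheme identifier (illustrative labels only)."""
    scheme, colon, _rest = uri.partition(":")
    if not colon or not scheme:
        return "no-scheme"
    if scheme == "urn":
        return "urn"           # reserved for more persistent names
    if scheme.startswith("x-"):
        return "experimental"  # usable by mutual agreement between parties
    return "url"               # any other scheme is treated as a locator
```
\end{verbatim}
Under these assumptions, {\tt classify\_scheme("x-test:abc")} yields
"experimental", and an identifier with no colon is reported as having
no scheme.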


\section*{BNF syntax}This is a BNF-like description of
the Uniform Resource Locator syntax.
A vertical  line "$|$"  indicates alternatives,
and \lbrack brackets\rbrack   indicate optional
parts.  Spaces are represented
by the word "space", and the vertical
line character by "vline".   Single
letters stand for single letters.
All words of more than one letter
below are entities described somewhere
in this description.  \par 
The current IETF URI working group
preference is for the prefixedurl
production (as of Nov 1993; in July 1993 it was url).\par 
The "generic" production gives a
higher level parsing of the same
URLs as the other productions.  The
"national" and "punctuation" characters
do not appear in any productions
and therefore may not appear in URLs.\par 
The "afsaddress" is left in as a historical
note, but is not a url production.
\begin{DL}{allow this much space}
\item[prefixedurl
] u r l : url
\item[fragmentaddress
] uri \lbrack  \# fragmentid
\rbrack   
\item[uri
] url $|$ generic
\item[url
] generic $|$ httpaddress $|$ ftpaddress
$|$ newsaddress $|$ nntpaddress $|$ prosperoaddress
$|$ telnetaddress  $|$ gopheraddress
$|$ waisaddress $|$ mailtoaddress 
\item[generic
] scheme :  path \lbrack  ? search
\rbrack   
\item[scheme
] ialpha  
\item[httpaddress
] h t t p :   / / hostport
\lbrack   / path \rbrack  \lbrack  ? search \rbrack   
\item[ftpaddress
] f t p : / / login / path
\item[afsaddress
] a f s : / / cellname /
path  
\item[newsaddress
] n e w s : groupart  
\item[nntpaddress
] n n t p : group /  digits
\item[mailtoaddress
] m a i l t o : xalphas
@ hostname
\item[waisaddress
] waisindex $|$ waisdoc 
\item[waisindex
] w a i s : / / hostport
/ database \lbrack  ? search \rbrack   
\item[waisdoc
] w a i s : / / hostport /
database / wtype  / path
\item[groupart
] * $|$ group $|$ article  
\item[group
] ialpha \lbrack  . group \rbrack   
\item[article
] xalphas @ host  
\item[database
] xalphas  
\item[wtype
] xalphas  
\item[prosperoaddress
] prosperolink  
\item[prosperolink
] p r o s p e r o : /
/ hostport / hsoname \lbrack  \%  0 0 version
\lbrack  attributes \rbrack  \rbrack   
\item[hsoname
] path  
\item[version
] digits  
\item[attributes
] attribute \lbrack  attributes
\rbrack   
\item[attribute
] alphanums  
\item[telnetaddress
] t e l n e t : / / login
\item[gopheraddress
] g o p h e r : / / hostport
\lbrack / gtype  \lbrack  selector \rbrack  \rbrack  \lbrack  ? search
\rbrack   
\item[login
] \lbrack  user \lbrack  : password \rbrack  @ \rbrack  hostport
\item[hostport
] host \lbrack  : port \rbrack   
\item[host
] hostname $|$ hostnumber  
\item[cellname
] hostname  
\item[hostname
] ialpha \lbrack   .  hostname \rbrack 
\item[hostnumber
] digits . digits . digits
. digits
\item[port
] digits  
\item[selector
] path  
\item[path
] void $|$  xpalphas  \lbrack   / path
\rbrack   
\item[search
] xalphas \lbrack  + search \rbrack   
\item[user
] xalphas 
\item[password
] xalphas
\item[fragmentid
] xalphas  
\item[gtype
] xalpha  
\item[xalpha
] alpha $|$ digit $|$ safe $|$ extra
$|$ escape  
\item[xalphas
] xalpha \lbrack  xalphas \rbrack   
\item[xpalpha
] xalpha $|$ +  
\item[xpalphas
] xpalpha \lbrack  xpalphas \rbrack   
\item[ialpha
] alpha \lbrack  xalphas \rbrack 
\item[alpha
] a $|$ b $|$ c $|$ d $|$ e $|$ f $|$ g $|$
h $|$ i $|$ j $|$ k $|$ l $|$ m $|$ n $|$ o  $|$
p $|$ q $|$ r $|$ s $|$ t $|$ u $|$ v $|$ w $|$ x
$|$ y $|$ z $|$ A $|$ B $|$ C  $|$ D $|$ E $|$ F
$|$ G $|$ H $|$ I $|$ J $|$ K $|$ L $|$ M $|$ N $|$
O $|$ P $|$  Q $|$ R $|$ S $|$ T $|$ U $|$ V $|$
W $|$ X $|$ Y $|$ Z  
\item[digit
] 0 $|$1 $|$ 2 $|$ 3 $|$ 4 $|$ 5 $|$ 6 $|$ 7 $|$ 8
$|$ 9  
\item[safe
] \$ $|$ - $|$ \_ $|$ @ $|$ . $|$ \&  $|$ +
\item[extra
] ! $|$ * $|$ " $|$  ' $|$ ( $|$ ) $|$ :
$|$ ; $|$ , $|$ space  
\item[escape
] \% hex hex  
\item[hex
] digit $|$ a $|$ b $|$ c $|$ d $|$ e $|$ f
$|$ A $|$ B $|$ C $|$ D $|$ E $|$ F  
\item[national
] \{ $|$ \} $|$ vline $|$ \lbrack  $|$ \rbrack  $|$
\char'134  $|$ {\char94} $|$ \~{}  
\item[punctuation
] $<$ $|$ $>$
\item[digits
] digit \lbrack  digits \rbrack   
\item[alphanum
] alpha $|$ digit  
\item[alphanums
] alphanum \lbrack  alphanums \rbrack 
\item[void
]
\end{DL}
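
As a rough illustration of how the "generic" production may be read,
the Python sketch below splits a URL from left to right into its
scheme, path elements and search terms. The function name
{\tt split\_generic} and the dictionary keys are assumptions of this
example. It performs no character-class checking and no decoding of
escapes, and it expects any fragment-id to have been removed
beforehand.
\begin{verbatim}
```python
import string


def split_generic(url):
    """Split a URL per: generic ::= scheme : path [ ? search ]."""
    scheme, colon, rest = url.partition(":")
    # scheme ::= ialpha, so it must begin with a letter.
    if not colon or not scheme or scheme[0] not in string.ascii_letters:
        raise ValueError("no scheme found in %r" % url)
    path, qmark, search = rest.partition("?")
    return {
        "scheme": scheme,
        # path ::= void | xpalphas [ / path ]; a void element becomes "".
        "path": path.split("/"),
        # search ::= xalphas [ + search ]
        "search": search.split("+") if qmark else [],
    }
```
\end{verbatim}
For a URL such as {\tt http://info.cern.ch/pub/www?uri+syntax} the two
slashes after the colon show up as two void path elements, followed by
the host name and the remaining path elements.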


\section*{Security considerations  }The URL scheme does not in itself
pose a security threat. Users should
be aware that there is no general guarantee
that a URL which at one time points
to a given object will continue to do
so; at some later time it may even
point to a different object,
due to the movement of objects on
servers.\par 
The use of URLs containing passwords
is clearly unwise.


\section*{Conclusion}A need has been demonstrated, and
a number of requirements have been
stated for uniform resource locators
(URLs). A scheme has been proposed
which builds on existing conventions
to define a syntax for URLs.  This
scheme has been in serious use by
the World-Wide Web (W3) initiative since
1991.  Adoption of the scheme in
correspondence, standards and software
will ease the use of references to
on-line information in a flexible
way as the coming information age
arrives.


\section*{Acknowledgements  }This paper builds on the basic W3
design and much discussion of these
issues by many people on the network.
The discussion was particularly stimulated
by articles by Clifford Lynch (1991),
Brewster Kahle (1991) and Wengyik
Yeong (1991b). Contributions from
John Curran (NEARnet), Clifford Neuman
(ISI), Ed Vielmetti (MSEN) and later
the IETF URL BOF and URI working
group have been incorporated into
this issue of this paper.  \par 
The draft url4  (Internet Draft 00)
was generated from url3 following
discussion and overall approval of
the URL working group on 29 March
1993. The paper url3 had been generated
from udi2 in the light of discussion
at the UDI BOF meeting at the Boston
IETF in July 1992. Draft url5 incorporated
changes suggested by Clifford Neuman,
and draft url6 (ID 01) incorporated
character group changes and a few
other fixes defined by the IETF URI
WG in submitting it as a proposed
standard.  URL7 (Internet Draft 02)
incorporated changes introduced at
the Amsterdam IETF and refined in
net discussion.


\chapter{Appendices}The following are not formally part
of this document.


\section{Fragment-id  }This represents a part of, fragment
of, or a sub-function within, an
object. Its syntax and
semantics are defined by the application
responsible for the object, or the
specification of the content type
of the object. The only definition
here is of the allowed characters
by which it may be represented in
a URL.  \par 
The fragment-id follows the URL of
the whole object from which it is
separated by a hash sign (\#).  If
the fragment-id is void, the hash
sign may be omitted: A void fragment-id
with or without the hash sign means
that the URL refers to the whole
object.\par 
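The separation at the hash sign can be sketched in Python as follows.
The function names {\tt split\_fragment} and
{\tt refers\_to\_whole\_object} are assumptions of this example and
are not part of any specification; the sketch only separates the two
parts and does not interpret the fragment-id.
\begin{verbatim}
```python
def split_fragment(address):
    """Return (url, fragment_id); fragment_id is "" when void or absent."""
    url, _hash, fragment_id = address.partition("#")
    return url, fragment_id


def refers_to_whole_object(address):
    """True when the fragment-id is void, with or without the hash sign."""
    return split_fragment(address)[1] == ""
```
\end{verbatim}
\par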
While this hook is allowed for identification
of fragments, the question of addressing
of parts of objects, or of the grouping
of objects and relationship between
contained and containing objects,
is not addressed by this document.\par 
This document does not address the
question of objects which are different
versions of a "living" object, nor
of expressing the relationships between
different versions and the living
object.


\section{Partial form  }In a certain limited set of cases,
generally within a certain application,
it may be useful to pass only a section
of the URL. Within an object whose
URL is well defined, the URL of another
object may be given in abbreviated
form, where parts of the two URLs
are the same. This allows objects
within a group to refer to each other
without requiring the space for a
complete reference, and it incidentally
allows the group of objects  to be
moved without changing any references.
This is not discussed in detail here;
it is mentioned only so that the
characters required by the technique
may be reserved for that purpose.  It
must be emphasised that when a reference
is passed in anything other than
a well controlled context, the full
form must always be used.  \par 
The partial form relies on a property
of the URL syntax that certain characters
("/") and certain path elements ("..",
".") have a significance reserved
for representing a hierarchical space,
and must be recognised as such by
both clients and servers.  \par 
A partial form can be distinguished
from a full form in that a full form
must have a colon and that colon
must occur before any slash characters.\par 
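The test just described might be written in Python as in the sketch
below. The function name {\tt is\_full\_form} is an assumption of this
example, and the sketch checks only the position of the first colon
relative to the first slash, nothing else about the syntax.
\begin{verbatim}
```python
def is_full_form(reference):
    """True if a colon occurs, and occurs before any slash character."""
    colon = reference.find(":")
    if colon == -1:
        return False
    slash = reference.find("/")
    return slash == -1 or colon < slash
```
\end{verbatim}
\par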
The rules for the use of a partial
name are:  
\begin{itemize}
\item If the scheme parts  are different,
the whole absolute locator must be
given. Otherwise, the scheme is omitted,
and:
\item If the host and/or port parts are
different, the host, port
and all the rest of the locator must
be given.
\item If the scheme and host parts are
the same, then the path may be given
in absolute (fully qualified) or
relative form. Within the path:
\item If a leading slash is present, the
path is absolute. Otherwise, a relative
path is interpreted as follows: 
\item The last part of the path of the
context locator (anything following
the rightmost slash) is removed,
and the given partial URL appended
in its place.
\item Within the result,  all occurrences
of "xxx/../"  or "/." are recursively
removed, where xxx, ".." and "."
are complete path elements.
\end{itemize}Note:  If the path of the context locator
ends in a slash, partial URLs will be
treated differently to their treatment
with respect to the same path without
a slash.   Using a trailing slash
on a directory name is therefore not
recommended.  The significance of
a trailing slash may be considered
as that of the locator of a file
with void name within that  directory.
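\par
The rules above can be sketched in Python as follows. This is an
illustration under stated assumptions, not a normative algorithm: the
names {\tt resolve\_partial} and {\tt collapse\_dots} are invented for
the example; a partial form that supplies a new host is assumed to
begin with two slashes; the context locator is assumed to carry no
search or fragment part; and a ".." that ends the path is collapsed as
well, which goes slightly beyond the literal wording of the last rule.
\begin{verbatim}
```python
def collapse_dots(path):
    """Remove "." elements and "xxx/.." pairs that are complete elements."""
    kept = []
    for element in path.split("/"):
        if element == ".":
            continue
        if element == ".." and kept and kept[-1] not in ("", ".."):
            kept.pop()  # ".." cancels the preceding element "xxx"
        else:
            kept.append(element)
    return "/".join(kept)


def resolve_partial(context, partial):
    """Resolve a partial URL against the full URL of the context object."""
    colon, slash = partial.find(":"), partial.find("/")
    if colon != -1 and (slash == -1 or colon < slash):
        return partial  # already a full form
    scheme, _, rest = context.partition(":")
    if partial.startswith("//"):
        return scheme + ":" + partial  # assumed form when the host is given
    # Split the context into "//hostport" (possibly empty) and its path.
    site, path = "", rest
    if rest.startswith("//"):
        cut = rest.find("/", 2)
        site, path = (rest, "/") if cut == -1 else (rest[:cut], rest[cut:])
    if partial.startswith("/"):
        return scheme + ":" + site + partial  # absolute path
    base = path[: path.rfind("/") + 1]  # drop what follows the last slash
    return scheme + ":" + site + collapse_dots(base + partial)
```
\end{verbatim}
With the context {\tt http://host/a/b} the partial URL {\tt c} resolves
to {\tt http://host/a/c}, whereas with the context
{\tt http://host/a/b/} it resolves to {\tt http://host/a/b/c}, which is
the difference in treatment described in the note above.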


\section{Wrappers for URIs in plain text}This section does not formally form
part of the URL specification.\par 
URIs, including URLs, will ideally
be transmitted through protocols which
accept them and data formats which
define a context for them.  However,
in practice nowadays there are many
occasions when URLs are included
in plain ASCII non-marked-up text
such as electronic mail and usenet
news messages.\par 
In this case, it is convenient to
have a separate wrapper syntax to
define delimiters which will enable
the human or automated reader to
recognize that the URI is a URI.\par 
The recommendation is that the angle
brackets (less than and greater than
signs) of the ASCII set be used for
this purpose.\par 
These wrappers do not form part of
the URL, are not mandatory, and should
not be used in contexts (such as
SGML parameters, HTTP requests, etc)
in which delimiters are already specified.
\subsubsection{Example}
\begin{verbatim}		Yes, Jim, I found it under <ftp://info.cern.ch/pub> but
		you can probably pick it up from <ftp://ds.internic.net/rfc>.


\end{verbatim}
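
An automated reader might recover such wrapped URIs with a pattern
like the one in the Python sketch below. The names {\tt WRAPPED\_URI}
and {\tt find\_wrapped\_uris} are assumptions of this example. The
pattern accepts an opening angle bracket, a scheme that starts with a
letter and precedes any slash, a colon, and then any run of characters
other than angle brackets; it does not otherwise validate the URI.
\begin{verbatim}
```python
import re

# "<", a scheme starting with a letter, ":", then anything but "<" or ">".
WRAPPED_URI = re.compile(r"<([A-Za-z][^<>:/\s]*:[^<>]*)>")


def find_wrapped_uris(text):
    """Return the URIs found between angle brackets, without the wrappers."""
    return WRAPPED_URI.findall(text)
```
\end{verbatim}
Because a colon is required, a bracketed mail address that has no
scheme is not reported by this sketch.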


\chapter{References}
\begin{DL}{allow this much space}
\item[Alberti, R., et al.  (1991)
] "Notes
on the Internet Gopher  Protocol"
University of Minnesota, December
1991,  $<$ftp://boombox.micro.umn.edu/pub/gopher/gopher\_protocol$>$
. See also  $<$gopher://gopher.micro.umn.edu/00/Information
About Gopher/About Gopher$>$
\item[Berners-Lee, T., (1991)
] "Hypertext
Transfer Protocol (HTTP)" , CERN,
December 1991,  $<$ftp://info.cern.ch/pub/www/doc/http-spec.txt$>$
\item[Crocker, D. H., (1982)
] "Standard for ARPA Internet
Text Messages", RFC-822, August 1982.
\item[Davis, F, et  al., (1990)
] "WAIS Interface
Protocol: Prototype  Functional Specification",
Thinking Machines Corporation,  April
23, 1990  $<$ftp://quake.think.com/pub/wais/doc/protspec.txt$>$
\item[International Standards Organization,
(1991)
] Information and  Documentation
- Search and Retrieve Application
Protocol  Specification for Open
Systems Interconnection, ISO-10163
\item[Huitema, C., (1991)
] "Naming: strategies
and techniques",  Computer Networks
and ISDN Systems 23 (1991) 107-110.
\item[Kahle, Brewster, (1991) 
]"Document
Identifiers,  or  International Standard
Book Numbers for the Electronic Age",
$<$ftp://quake.think.com/pub/wais/doc/doc-ids.txt$>$
\item[Kantor, B., and Lapsley, P., (1986)
]
"A proposed standard for  the stream-based
transmission of news", Internet RFC-977,
February 1986. $<$ftp://ds.internic.net/rfc/rfc977.txt$>$
\item[Lynch, C., Coalition for Networked
Information: (1991)  
]"Workshop on
ID and Reference Structures for Networked
Information", November 1991. See
$<$wais://quake.think.com/wais-discussion-archives?lynch$>$
\item[Mockapetris, P., (1987)
] "Domain names
- concepts and  facilities", RFC-1034,
USC-ISI, November 1987,  $<$ftp://ds.internic.net/rfc/rfc1034.txt$>$
\item[Neuman, B. Clifford, (1992)
] "Prospero:
A Tool for Organizing  Internet Resources",
Electronic Networking: Research,
Applications and Policy, Vol 1 No
2, Meckler Westport CT  USA.  See
also  $<$ftp://prospero.isi.edu/pub/prospero/oir.ps$>$
\item[Postel, J. and Reynolds, J. (1985)
]
"File Transfer Protocol  (FTP)",
Internet RFC-959, October 1985. $<$ftp://ds.internic.net/rfc/rfc959.txt$>$
\item[Yeong, W., (1991a)
] "Towards Networked
Information Retrieval",  Technical
report 91-06-25-01, June 1991, Performance
Systems International, Inc.  $<$ftp://uu.psi.com/wp/nir.txt$>$
\item[Yeong, W., (1991b)
] "Representing
Public Archives in the  Directory",
Internet Draft, November 1991, now
expired.
\end{DL}

\section{Author's address  }
\begin{verbatim}
			   Tim Berners-Lee  
		Address:   World-Wide Web project  
			   CERN,
			   1211 Geneva 23,
		           Switzerland
 
	    	Telephone: +41 (22)767 3755
		Fax:       +41 (22)767 7155 
		Email:     timbl@info.cern.ch

\end{verbatim}


\end{document}