
What is a URL (Uniform Resource Locator)

Last Edit: 01/08/17
Introduction

A Uniform Resource Locator (URL) is a type of Uniform Resource Identifier (URI): a simple and extensible means of identifying a World Wide Web resource. The term "resource" is used in a general sense; common resources are images and electronic documents. Tim Berners-Lee, together with L. Masinter and M. McCahill, defined the URL as a compact string in RFC1738, which sets out the syntax and semantics of URLs. URLs have been in use since the launch of the World-Wide Web global information initiative (mid-to-late 1990), and their purpose has always been to locate a resource on the Web. A URL is written as a sequence of letters, digits and special characters; Berners-Lee has since said that he regrets including the two slashes ("//") in URLs. Many URLs contain domain names, so locating Web resources relies heavily on the Domain Name System (DNS). URLs are sometimes referred to as 'web addresses'.

URL Syntax

RFC1738 states that a URL is written in two parts: <scheme>:<scheme-specific-part>. A colon always separates the scheme from the scheme-specific-part.
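The two-part split described above can be sketched in a few lines of Python. This is a minimal illustration, not a full URL parser: the function name split_url is hypothetical, and lowercasing the scheme is an assumption based on RFC1738's advice to treat upper-case letters in scheme names as equivalent to lower-case.

```python
def split_url(url: str) -> tuple:
    """Split a URL at its first colon into (scheme, scheme-specific-part)."""
    scheme, colon, rest = url.partition(":")
    if not colon or not scheme:
        # No colon, or nothing before it: there is no scheme to report.
        raise ValueError(f"no scheme found in {url!r}")
    # Assumption: normalise the scheme to lower case for comparison.
    return scheme.lower(), rest


print(split_url("http://www.example.com/index.html"))
print(split_url("mailto:someone@example.com"))
```

Only the first colon is significant here, so a later colon (for example, one before a port number) stays inside the scheme-specific-part.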

<scheme>

The scheme within a URL is typically one of the following: http(s), mailto, telnet, news, gopher, tel, data, ftp, irc and file. The syntax for the scheme-specific-part of a URL differs for each scheme.

<scheme-specific-part>

For schemes based on Internet Protocol (IP) protocols, the scheme-specific-part of a URL usually takes the following form:

//<user>:<password>@<host>:<port>/<url-path>

Some or all of these parts may be omitted, depending on the scheme.
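As a rough illustration of this common syntax, Python's standard urllib.parse.urlsplit can pull the individual parts out of a URL. The helper name common_parts and the example URLs (including the user name, password and port) are made up for this sketch. Note that urlsplit keeps the leading "/" in the path and follows newer URL rules than RFC1738, so treat it as an approximation of the syntax above.

```python
from urllib.parse import urlsplit


def common_parts(url: str) -> dict:
    """Return the user, password, host, port and path of a URL.

    Parts that are missing from the URL come back as None
    (or as an empty string for the path).
    """
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
        "url-path": parts.path,
    }


# Hypothetical URL with every part present.
print(common_parts("ftp://alice:secret@ftp.example.com:2121/pub/readme.txt"))

# The same scheme with most parts left out.
print(common_parts("ftp://ftp.example.com"))
```

In the second call only the scheme and host are filled in, which matches the point that parts of the scheme-specific-part can be left out.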

HTTP Scheme

The most commonly used scheme for locating web resources is the HTTP (HyperText Transfer Protocol) scheme. RFC1738 defines the HTTP scheme as follows:

http://<host>:<port>/<path>?<searchpart>

If the port is omitted, port 80 (assigned by IANA) is used by default. RFC1738 does not allow a user name or password in the HTTP scheme, and both the path and the searchpart are optional. Many HTTP URLs therefore take the simple form http://<host>. The host is typically a domain name, as in http://www.example.com. A full stop "." separates the labels of the domain name: the top-level domain (com), the second-level domain (example) and the third-level domain (www). Slashes within the path create a hierarchical structure.
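The HTTP rules above (default port 80, optional path and searchpart, dot-separated host labels) can be sketched as follows. This is an illustrative helper built on Python's urllib.parse.urlsplit; the name http_parts, the constant DEFAULT_HTTP_PORT and the example URLs are assumptions made for the sketch and do not come from RFC1738.

```python
from urllib.parse import urlsplit

DEFAULT_HTTP_PORT = 80  # used when the URL does not name a port


def http_parts(url: str) -> dict:
    """Break an http URL into host, port, path, searchpart and host labels."""
    parts = urlsplit(url)
    if parts.scheme != "http":
        raise ValueError(f"not an http URL: {url!r}")
    host = parts.hostname or ""
    return {
        "host": host,
        # Fall back to port 80 only when no port was written in the URL.
        "port": DEFAULT_HTTP_PORT if parts.port is None else parts.port,
        "path": parts.path,
        "searchpart": parts.query,
        # Full stops separate the domain labels, e.g. www / example / com.
        "labels": host.split("."),
    }


print(http_parts("http://www.example.com"))
print(http_parts("http://www.example.com:8080/docs/page.html?q=url"))
```

The first call falls back to port 80 with an empty path and searchpart; the second keeps the explicit port 8080 and reports the path and searchpart separately.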