HTML stands for 'Hypertext Markup Language'. It was the first markup language used to create web documents (known as webpages), which are located at websites. Traditionally, a markup language is a document language: markup has been widely used by book publishing companies to automate the process of editing and publishing their books. Markup languages use 'elements' to distinguish two things:

  1. Annotation
  2. Text
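
The distinction between annotation and text can be illustrated with a short sketch. It uses the HTMLParser class from Python's standard library; the class name MarkupSplitter and the sample markup are invented for this illustration.

```python
from html.parser import HTMLParser


class MarkupSplitter(HTMLParser):
    """Collects element names (annotation) and character data (text) separately."""

    def __init__(self):
        super().__init__()
        self.annotation = []  # element names, in the order their start tags appear
        self.text = []        # non-empty runs of text found between tags

    def handle_starttag(self, tag, attrs):
        self.annotation.append(tag)

    def handle_data(self, data):
        if data.strip():
            self.text.append(data.strip())


# Hypothetical sample markup, used only for this illustration.
splitter = MarkupSplitter()
splitter.feed("<h1>Markup</h1><p>Elements annotate <em>text</em>.</p>")
splitter.close()

print(splitter.annotation)  # the annotation: element names
print(splitter.text)        # the text the elements annotate
```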

HTML was created for the World Wide Web by its inventor, Sir Tim Berners-Lee, and is standardised by the World Wide Web Consortium (W3C). The World Wide Web is a document system composed of hypertext documents (webpages), and originally HTML was the only markup language used to design webpages. Webpages are interconnected by an element called a hyperlink, which creates the 'web' of connected documents referred to as the World Wide Web.
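
As a rough sketch of how hyperlinks connect documents, the Python below collects the targets of the anchor (<a>) elements in a page and resolves them against the address of the page. The class name LinkCollector, the sample page and the base address www.example.com are assumptions made for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Records the href value of every <a> start tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical page and base address, used only for this illustration.
BASE_URL = "http://www.example.com/guide/"
PAGE = (
    '<p>See <a href="history.html">History</a> and '
    '<a href="structure.html">Structure</a>.</p>'
)

collector = LinkCollector()
collector.feed(PAGE)
collector.close()

# Relative targets are resolved against the address of the page that contains them.
absolute_links = [urljoin(BASE_URL, link) for link in collector.links]
print(absolute_links)
```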

The World Wide Web uses a client-server model: a client (browser) application retrieves webpages from servers (computers) connected to the Internet. A browser does more than retrieve webpages using Internet protocols like HTTP and TCP: it also uses the elements of a markup language to render the document into a visual display. Style sheet languages, like CSS, have also been developed to control the appearance of documents written in markup languages.

While HTML is the original markup language created for the World Wide Web, other markup languages, like XML and XHTML, have been developed as alternatives. The original version of HTML was inspired by the Standard Generalized Markup Language (SGML). However, HTML5 abandoned HTML's formal link with SGML: it defines its own syntax and parsing rules, and can also be written in an XML syntax.

After HTML 4.01, the World Wide Web Consortium (W3C) - the standards organisation that develops HTML - concentrated on XHTML rather than further enhancing HTML. XHTML is a form of HTML, but is an application of XML rather than SGML. This plan did not last: the W3C allowed the charter of its XHTML 2 working group to expire at the end of 2009, and returned to HTML with HTML5, which was released in 2014. Therefore, while XML has influenced HTML in a 'roundabout way' through XHTML, it has not replaced the original HTML.


The history of HTML is intertwined with the origins of the World Wide Web. Tim Berners-Lee, whilst working at CERN in the late 1980s, proposed a hypertext system that would be Internet-based. The World Wide Web was specified as a hypertext document system, and the hypertext documents would be written in a markup language that Berners-Lee originally referred to as 'HTML Tags'. The World Wide Web was launched in 1991, and HTML was initially released with eighteen elements.

It is generally accepted that, while developing HTML, Berners-Lee was heavily influenced by the Standard Generalized Markup Language (SGML). HTML includes a range of elements found within SGML, and in theory it is viewed as an SGML-based language. SGML itself is a markup language that evolved from IBM's Generalized Markup Language (GML). Therefore, HTML's syntax is derived from both SGML and GML.

One of the earliest specifications of HTML was defined in 1993. The specification document was named 'A Representation of Textual Information and MetaInformation for Retrieval and Interchange'. The introduction of this document is displayed below:

[Image: introduction of 'A Representation of Textual Information and MetaInformation for Retrieval and Interchange']

The authors of the document were Tim Berners-Lee and Dan Connolly, and it was published in June, 1993. Since 1993, HTML has evolved to include more elements and attributes. HTML has been developed, since 1995, by a number of organisations and working groups:

  1. Web Hypertext Application Technology Working Group
  2. World Wide Web Consortium
  3. Internet Engineering Task Force

As of 2015, HTML has been released in the following draft and standardised versions.

  1. HTML Tags: released in October, 1991. (draft version)
  2. HTML DTD: released in June, 1992. (draft version)
  3. HTML DTD 1.1: released in November, 1992. (draft version)
  4. HTML 1.0: released in June, 1993. (draft version)
  5. HTML 2.0: released on the 24th of November, 1995.
  6. HTML 3.0: released in April, 1995. (draft version)
  7. HTML 3.2: released in January, 1997.
  8. HTML 4.0: released in December, 1997.
  9. HTML 4.01: released in December, 1999.
  10. HTML5: released in October, 2014.

The last SGML-based version of HTML was version 4.01. From the year 2000, the World Wide Web Consortium (W3C) stopped developing HTML and focused its development resources upon XML and XHTML.

XML and XHTML: Future of HTML

In 1998, XML 1.0 was released. XML was a new markup language, developed by a range of W3C working groups, and it was viewed as an alternative to, and a possible replacement for, HTML. XML did not replace HTML, but its syntax was used to create the Extensible Hypertext Markup Language (XHTML).

The development of XHTML was inspired by the publication, in 1998, of a W3C Working Draft titled 'Reformulating HTML in XML'. XHTML 1.0 was released in 2000. After HTML 4.01 was released in December 1999, HTML itself was not updated for many years, because the World Wide Web Consortium (W3C) focused its development work upon XHTML.

By 2010, both HTML and XHTML were commonly used on the World Wide Web. HTML5 was developed as an attempt to create a single markup language that encompasses both HTML and XHTML. HTML5 is the first version of HTML that is not defined as an application of SGML: it specifies its own parsing rules, and can be written in either an HTML syntax or an XML syntax. It aims to increase the interoperability of future web content.

The Web Hypertext Application Technology Working Group (WHATWG) was formed in 2004 to encourage the development of web technologies and to evolve HTML. WHATWG played an important role in the development of HTML5: creating a range of proposals and papers for W3C working groups to vote upon. Members of WHATWG included the Mozilla Foundation and Opera Software (browser developers).

Structure of HTML

HTML is a markup language that is used to create a hypertext document (webpage). The World Wide Web Consortium (W3C) is responsible for creating a standardised version of HTML. HTML is written with tags. A tag includes the name of an HTML element, and the element is generated when the document is parsed by a web browser. An HTML parser is a software component within a browser that interprets the element names within tags to build a data structure, which is then rendered into a visual display.
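
The idea of a parser building a data structure from tags can be sketched in Python. This is a simplified illustration and not how a browser is implemented: it assumes well-nested markup in which every element has an explicit end tag, and the names TreeBuilder and outline are invented for the example.

```python
from html.parser import HTMLParser


class TreeBuilder(HTMLParser):
    """Builds a nested dictionary tree from well-nested markup."""

    def __init__(self):
        super().__init__()
        self.root = {"name": "#document", "children": []}
        self.stack = [self.root]  # the chain of elements that are currently open

    def handle_starttag(self, tag, attrs):
        node = {"name": tag, "children": []}
        self.stack[-1]["children"].append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the innermost open element if its name matches the end tag.
        if len(self.stack) > 1 and self.stack[-1]["name"] == tag:
            self.stack.pop()

    def handle_data(self, data):
        if data.strip():
            self.stack[-1]["children"].append(data.strip())


def outline(node, depth=0):
    """Returns the tree as a list of indented lines, one per element or text run."""
    lines = ["  " * depth + node["name"]]
    for child in node["children"]:
        if isinstance(child, dict):
            lines.extend(outline(child, depth + 1))
        else:
            lines.append("  " * (depth + 1) + repr(child))
    return lines


builder = TreeBuilder()
builder.feed("<html><head><title>Demo</title></head><body><p>Hello</p></body></html>")
builder.close()
print("\n".join(outline(builder.root)))
```

A browser would go on to render a tree of this kind into a visual display; the sketch stops at printing an indented outline.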

Typically an HTML element has a start tag and an end tag (<tag></tag>). Tags are written within angle brackets, and content is placed between the start tag and the end tag. As stated, tags contain an element name. Elements are the 'building blocks' of a webpage: they allow objects and images to be embedded into a webpage to create a structured document.

Elements can be categorised into two broad kinds: void elements and normal elements. Void elements only have a start tag, and therefore cannot contain any content. A prime example of a void element is the image tag (which loads an image into a webpage):

Void Element: <img src="picture.gif">

Normal elements have a start tag (<tag>) and an end tag (</tag>). Normal elements allow content and additional tags to be inserted in between the start tag and the end tag. A prime example of a normal element is the paragraph tag:

Normal Element: <p>Text inserted in-between a normal element</p>
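
The difference between the two kinds of element shows up when the two examples above are parsed. In the Python sketch below, the set VOID_ELEMENTS lists only a few of the void elements defined by HTML, and the class name TagRecorder is invented for the example.

```python
from html.parser import HTMLParser

# A few of the void elements defined by HTML (not an exhaustive list).
VOID_ELEMENTS = {"br", "hr", "img", "input", "link", "meta"}


class TagRecorder(HTMLParser):
    """Records every start tag and end tag in the order they are parsed."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        kind = "void" if tag in VOID_ELEMENTS else "normal"
        self.events.append(("start", tag, kind))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


recorder = TagRecorder()
recorder.feed('<img src="picture.gif"><p>Text inserted in-between a normal element</p>')
recorder.close()

# The void element produces a start event only; the normal element produces both.
for event in recorder.events:
    print(event)
```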

Void and normal elements can be classified as being:

  1. Structural element: either the <html>, <head> or <body> tag
  2. Head element: an example would be a <meta> tag
  3. Body element: an example would be a <p> tag

HTML tags can include attributes: an attribute modifies the element when it is parsed by a web browser. Consider the alt attribute, which supplies alternative text for an image element. The browser displays this text - which should describe what the image represents - when the image cannot be loaded, and screen readers read it aloud. (The text that appears when a user hovers a cursor over an image comes from the title attribute, although some older browsers also displayed alt text in this way.) Without the alt attribute, a browser has only the image itself to present.
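
Attributes are written as name and value pairs inside the start tag. The Python sketch below reads the attributes of each image element and lists the images that have no alt attribute; the class name ImageAttributes, the file names and the alt text are placeholders invented for the example.

```python
from html.parser import HTMLParser


class ImageAttributes(HTMLParser):
    """Stores the attributes of every <img> element as a dictionary."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            # attrs is a list of (name, value) pairs taken from the start tag.
            self.images.append(dict(attrs))


# Hypothetical markup: the first image has an alt attribute, the second does not.
reader = ImageAttributes()
reader.feed('<img src="picture.gif" alt="A red bicycle"><img src="spacer.gif">')
reader.close()

missing_alt = [image["src"] for image in reader.images if "alt" not in image]
print(reader.images)
print(missing_alt)
```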

The first tag written in an HTML document is the structural tag <html>. Before this tag, the document can include a document type declaration. The purpose of a document type declaration is to tell the client reading the document (the browser) what version of HTML the document is written in, so that the browser can render the document in the required standards mode.
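
A minimal sketch of reading the document type declaration, again using Python's HTMLParser, is given below. The function name read_doctype is invented for the example, and the two declarations are the HTML5 form and the HTML 4.01 Strict form.

```python
from html.parser import HTMLParser


class DoctypeReader(HTMLParser):
    """Remembers the document type declaration and the first start tag."""

    def __init__(self):
        super().__init__()
        self.doctype = None
        self.first_tag = None

    def handle_decl(self, decl):
        # decl is the text found between "<!" and ">".
        if self.doctype is None:
            self.doctype = decl

    def handle_starttag(self, tag, attrs):
        if self.first_tag is None:
            self.first_tag = tag


def read_doctype(markup):
    """Returns (doctype text or None, name of the first tag or None)."""
    reader = DoctypeReader()
    reader.feed(markup)
    reader.close()
    return reader.doctype, reader.first_tag


html5_page = "<!DOCTYPE html>\n<html><head><title>Demo</title></head><body></body></html>"
html4_page = (
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
    '"http://www.w3.org/TR/html4/strict.dtd">\n'
    "<html><head><title>Demo</title></head><body></body></html>"
)

print(read_doctype(html5_page))
print(read_doctype(html4_page))
```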

Shown below is the basic structure of an HTML document:

[Image: basic HTML code and document showing the essential elements and tags]
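
A minimal sketch of such a document is given below. It is held in a Python string so that the order of its tags can be checked by a parser; the title, heading and paragraph text are placeholders, and the names BASIC_DOCUMENT and TagOrder are invented for the example.

```python
from html.parser import HTMLParser

# A basic HTML document; the wording inside the tags is placeholder text.
BASIC_DOCUMENT = """<!DOCTYPE html>
<html>
  <head>
    <title>Page title</title>
  </head>
  <body>
    <h1>Main heading</h1>
    <p>First paragraph.</p>
  </body>
</html>
"""


class TagOrder(HTMLParser):
    """Lists element names in the order their start tags appear."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


order = TagOrder()
order.feed(BASIC_DOCUMENT)
order.close()
print(order.tags)
```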

Description of the HTML tags (in the document above):

<html> - Everything within this tag constitutes the document.
<head> - Contains information that defines and describes the document.
<title> - The title of the document, which describes its content.
<body> - The content of the document.
<h1> - Heading tag, which is used for the most important heading above a section of content.
<p> - Paragraph, which serves the same purpose as a paragraph in a written document.

List of additional (popular) HTML tags:

<applet> - Embeds a Java applet (a program) into an HTML document; obsolete in HTML5.
<frame> - Divides a document into sections, which are called frames; obsolete in HTML5.
<header> - Defines introductory content for a document or section, called a header.
<meta> - Defines metadata about the document; refresh tags are meta tags.
<noscript> - Provides alternative content when scripts can't be run by a user agent (browser).

Development of an HTML document

Once the structure and elements of HTML are understood, it is simple to create a static HTML document. All a developer requires is a text editor - like Notepad in Windows - then the code can be written and saved with an .html or .htm extension.
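
The same step can be scripted. The Python sketch below writes a small static page to a file with an .html extension; the function name save_page and the page text are invented for the example, and a temporary folder is used so that nothing is left behind.

```python
import tempfile
from pathlib import Path

# Placeholder page used only for this illustration.
STATIC_PAGE = (
    "<!DOCTYPE html>\n"
    "<html><head><title>My page</title></head>"
    "<body><p>Hello, web.</p></body></html>\n"
)


def save_page(directory, name, markup):
    """Writes markup to <directory>/<name>.html and returns the path of the file."""
    path = Path(directory) / f"{name}.html"
    path.write_text(markup, encoding="utf-8")
    return path


with tempfile.TemporaryDirectory() as folder:
    saved = save_page(folder, "index", STATIC_PAGE)
    saved_name = saved.name
    saved_text = saved.read_text(encoding="utf-8")

print(saved_name)
```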

When a web designer develops a template for a website, Cascading Style Sheets (CSS) can be used to define the appearance of a webpage. The World Wide Web Consortium (W3C), which develops and maintains HTML, recommends the use of CSS. While webmasters can write HTML code by hand, there is an even simpler way to create HTML documents, which is to use an HTML editor like Adobe Dreamweaver (formerly Macromedia Dreamweaver).

HTML editors automatically generate HTML code, and create a webpage in a graphical interface. Therefore, a webpage can be created with no knowledge of HTML. Likewise, there are numerous software applications that can generate webpage images, such as banners, animated GIFs and image maps. Interactive content can also be embedded into HTML documents, and is typically created by using Flash or Java.

While it is easy to create a static HTML document, it is rather more complex to create dynamic HTML documents. Dynamic HTML content is generated 'on the fly', typically using session IDs. Creating dynamic web content is complex, and typically requires the use of CGI, JavaScript, Perl or PHP. Dynamic content typically uses a database; therefore, websites that generate dynamic content are often described as database driven. CSV files (known as a CSV feed) and MySQL are two technologies that are widely used to create and maintain database driven websites. Server-side commands are used to administer dynamic web content, so that content from databases can be easily inserted into web documents.
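
As a simplified sketch of generating content 'on the fly', the Python below turns a small CSV feed into an HTML list. The feed contents and the function name render_list are invented for the example; a real database driven website would read from a database and would normally use a template system.

```python
import csv
import io
from html import escape

# Hypothetical CSV feed; the second row contains markup that must be escaped.
CSV_FEED = "name,price\nTea & biscuits,2.50\n<b>Coffee</b>,1.80\n"


def render_list(csv_text):
    """Builds an HTML list with one item for each row of the feed."""
    rows = csv.DictReader(io.StringIO(csv_text))
    items = [
        # escape() stops data from the feed being interpreted as markup.
        f"<li>{escape(row['name'])}: {escape(row['price'])}</li>"
        for row in rows
    ]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


print(render_list(CSV_FEED))
```

Escaping the data matters because text taken from a feed or database may itself contain angle brackets or ampersands.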

While webmasters can create a database driven website from scratch, they typically purchase 'off the shelf' software applications. For example, most webmasters do not create their own bulletin board software: instead they download and use a professionally developed application. There are numerous 'content applications' that help webmasters create and maintain web content. The most obvious of these is the wiki system (designed using CGI), which has resulted in websites like WikiLeaks being developed.

The homepage of a website (also known as the entry page) has a default file name, which is usually 'index'. Before content is uploaded to a website, the index page is sometimes uploaded as an 'under construction' holding page. During the construction of a website, an .htaccess file (a configuration file used by the Apache web server) can be uploaded to its web server to redirect visitors.
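
The role of the default file name can be sketched as follows. The function resolve_request_path is hypothetical and assumes the default name is index.html; real web servers make this name configurable.

```python
def resolve_request_path(url_path, index_name="index.html"):
    """Maps a requested path to a file path, adding the default file name
    when the request is for a directory (a path that ends with a slash)."""
    if url_path.endswith("/"):
        return url_path + index_name
    return url_path


print(resolve_request_path("/"))
print(resolve_request_path("/guide/"))
print(resolve_request_path("/guide/html.html"))
```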

Some factors to consider when building a webpage:

  1. Make sure the webpage is compatible with a wide range of browsers.
  2. Avoid using annoying images and text that either flashes or flickers.
  3. Make sure your pages are small (below 40K), so that they load well in mobile browsers.
  4. Avoid using too many images on your pages, for the above reason.
  5. Search engines love content (text) websites.
  6. Banner exchanges and tacky promotion gimmicks leave a poor impression.
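
Two of these factors, page size and the number of images, can be checked automatically. The Python sketch below is only an illustration: it reads '40K' as 40 kilobytes of markup, and the limit of ten images, the function name audit_page and the sample pages are assumptions made for the example.

```python
from html.parser import HTMLParser

MAX_PAGE_BYTES = 40 * 1024  # the 40K guideline, read here as 40 kilobytes (an assumption)


class ImageCounter(HTMLParser):
    """Counts the <img> elements in a page."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.count += 1


def audit_page(markup, max_images=10):
    """Reports the size of the markup and the number of images it contains.
    max_images is an arbitrary limit chosen for this illustration."""
    counter = ImageCounter()
    counter.feed(markup)
    counter.close()
    size = len(markup.encode("utf-8"))
    return {
        "bytes": size,
        "images": counter.count,
        "small_enough": size < MAX_PAGE_BYTES,
        "few_images": counter.count <= max_images,
    }


small_page = '<p>Hello</p><img src="a.gif" alt="A">'
large_page = "<p>" + "x" * 50000 + "</p>"
print(audit_page(small_page))
print(audit_page(large_page)["small_enough"])
```

The check measures only the markup itself; the size of the image files that a page loads would need to be added separately.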