University of Botswana History Department

Introduction to HTML



Back to contents   ||   Back to top

What is HTML?

Web pages are documents, not unlike word-processor documents with which you are already familiar. However, they are not in ordinary word-processor formats like the MS Word *.doc format, but in a special format known as HyperText Markup Language (HTML). "Hypertext" refers to the way in which pages are linked to others by hyperlinks, to create a vast "web" of text, the World Wide Web (WWW). HTML is quite easy to learn.

The following notes describe "HTML 4 Transitional", which is currently the most common standard for HTML. However, it is due to be replaced eventually by something called "XHTML". XHTML will not be very different in practice, at least at first, but it is a bit less "forgiving" than HTML. For example, whereas HTML allows many things to be either upper-case or lower-case, XHTML doesn't. We have noted some of these points below so that you can be aware of what would be good habits to pick up now!


The basics

Web-page documents include both text and formatting. "Text" is the actual words of the document; "formatting" is information about how they should be displayed by the browser. For example, consider the first sentence of this paragraph. The text is "Web-page documents include both text and formatting". The formatting is the extra information that the words "text" and "formatting" are to be put in italics.

In HTML, this information is conveyed by putting markers, usually known as tags, around the text to be formatted. Tags are shown by <angle brackets>. The tag for "italics" is <I>. The HTML would be:

Web-page documents include both <I>text</I> and <I>formatting</I>.

<I> means "start using italics", and </I> means "stop using italics". (Incidentally, tags are not case sensitive so we could equally well have written <i>.) [In XHTML, all tags should be lower-case.] Most tags are like this; for example <B> and </B> for bold type.

[Digression: You might ask why you need to indicate formatting indirectly with <I> etc. - why not just make it italic like in a word processor such as MS Word? The reason is that in reality, even in MS Word, the text and the formatting are separate. MS Word does not have a picture of the words - it too has the words, in characters, and information about how to display them. The difference is that in Word, the formatting is completely hidden from the user, and all you see is the end-result of italic type.]

HTML provides tags for formatting, but in fact it is better not to think of HTML as simply a set of codes for formatting. HTML is a way of creating a document which is centred on the logical structure of the text, rather than the precise formatting. For example, consider a list to appear something like this:

  • First item
  • Second item
  • Third item

In HTML, this would be defined as an unordered list of three items. The browser renders the list by putting the "bullets" at the left of the three items, but another browser might render the list with some other markings. What is defined by the HTML is that the three items should appear as a list but without numbers or letters marking an order: the precise rendering of this is left to the browser.
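
As a rough sketch, the list above could be marked up as follows (the UL and LI elements are described under lists further down; the item text is just the placeholder wording used above):

```html
<!-- An unordered list of three items; the browser chooses the bullet style -->
<UL>
  <LI>First item</LI>
  <LI>Second item</LI>
  <LI>Third item</LI>
</UL>
```

Note that nothing in this markup says what the bullets should look like; it records only that these are three items of an unordered list.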



Elements

HTML documents are made up of elements. Elements include, for example, paragraphs, lists, images, and tables. Elements often contain other elements. An element consists of

  1. an opening tag
  2. in most cases, some content
  3. if there is content, a closing tag.

For example, consider the <I> element.

Description of <I> element

  <I>                          opening tag
  text to appear in italics    content
  </I>                         closing tag

In some cases the closing tag is optional, but it is good practice always to include it. [In XHTML, closing tags can never be omitted.]

Some elements do not have content; they are "empty". That is to say, they do not have an opening and a closing tag, with content between, but rather consist of a single tag. An example is the <HR> element, which produces a horizontal line across the page. There is no </HR> tag. The <HR> stands alone. Another common empty element is the <IMG> (image) element. You might wonder how this can be said to "have no content". It does of course contain information, about the image, but this information is given inside the <IMG> tag (see below).

[In XHTML, empty elements are indicated by the ending "/>". Thus instead of <br>, XHTML uses <br />. The space before the "/" is not strictly required by XHTML, but is necessary for compatibility - i.e. existing browsers will have problems unless the space is included.]
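
To make the comparison concrete, here is a small sketch of the same empty elements written both ways. The file name flag.gif and the ALT text are taken from the image example later on this page; the fragment is illustrative rather than a complete document.

```html
<!-- HTML 4: empty elements are a single tag with no closing tag -->
<HR>
<IMG SRC="flag.gif" ALT="Botswana flag">
<BR>

<!-- XHTML: lower-case names, and the tag ends with " />" -->
<hr />
<img src="flag.gif" alt="Botswana flag" />
<br />
```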



Attributes

Elements can have attributes. Attributes define properties of the element. They appear as attribute/value pairs, within the start tag. For example, consider the <HR> element. This element simply means "draw a horizontal line across the page here". It can take various attributes, such as WIDTH and SIZE. WIDTH defines how wide the line should be; for example WIDTH="50%" means the line should be half the width of the page. SIZE="5" means the line should be 5 pixels thick. The HR element is now written

<HR WIDTH="50%" SIZE="5">

or

<HR SIZE="5" WIDTH="50%">

- the order of attributes does not matter. The values should be enclosed in double quotation marks. (You can omit the quotation marks in some but not all cases, so it is good practice always to use them. [In XHTML, quotation marks must always be used.])

In the case of the <IMG> element mentioned above, information about the image is given by various attributes.


Basic structure of an HTML document

An HTML document is made up of

  1. a DOCTYPE declaration, which defines the type of HTML being used
  2. an HTML element, which includes:
    1. a HEAD element, which contains information such as the title of the document
    2. a BODY element, which contains the actual text.

Thus, the outline of the document is like this:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "">

<HTML>

<HEAD> ... </HEAD>

<BODY> ... </BODY>

</HTML>

(1) The DOCTYPE declaration.

This is rather technical, but fortunately you do not need to understand exactly how it works. For History web-site documents the DOCTYPE declaration should be

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "">

The DOCTYPE is needed because HTML is in fact part of a larger system of document formatting known as SGML (Standard Generalized Markup Language). This includes XML, which in the future is expected to be used on the Web together with HTML (in particular for specialized applications such as commerce or technical papers). The DOCTYPE defines the DTD (Document Type Definition) being used. Actually, since browsers at present assume you are using HTML, you can get away with omitting the DOCTYPE part, and many web-sites do so. But it is good practice to include it.

(2) The HEAD element.

This includes information about the document. This information is not actually displayed as part of the document as viewed in a web-browser. The HEAD element must contain a TITLE element. (Other possible components are optional.) This title is displayed by the browser at the top of the screen (not in the actual window where the document itself is shown).

<TITLE>Test document no. 1</TITLE>

(3) The BODY element.

This contains all the actual document to be displayed by the browser. Typically, it will contain elements such as heading elements, paragraph elements, tables, and lists. BODY itself can take some attributes, notably BGCOLOR, which sets a background colour for the page. Some colours can be specified by their names, e.g.

<BODY BGCOLOR="yellow">

but this is limited; it is often better to use the colours' RGB values. See the page on colours in HTML.
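
As an illustration (this particular value is our own choice, not one prescribed for the site), an RGB value is written as a # followed by three two-digit hexadecimal numbers giving the amounts of red, green and blue. Yellow is full red plus full green with no blue, so the BGCOLOR="yellow" example above could equally be written:

```html
<!-- #FFFF00 = red FF, green FF, blue 00, i.e. yellow -->
<BODY BGCOLOR="#FFFF00">
```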


[XHTML is, technically, a version of HTML that is valid as XML and thus can be used by XML agents.] An XHTML document will look like the following:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "">
<html xmlns="" xml:lang="en" lang="en">
<head> ... </head>
<body>
<p>test text</p>
</body>
</html>



Paragraphs

The paragraph element consists of text enclosed between <P> and </P>. The </P> can be omitted since the browser will realize when it finds another <P> that the previous paragraph has ended, but it is clearer to include it. Example:

<P>This is a paragraph.</P> <P>This is the second
paragraph. It will be shown as separate from the first one.</P>

This will appear as:

This is a paragraph.

This is the second paragraph. It will be shown as separate from the first one.

Notice that the line breaks in the text as displayed are not the same as in the original HTML. This is because the browser gets its formatting only from the formatting instructions in the <P> tags, etc. Any amount of "white space" will be collapsed to a single space in the page displayed on screen. (There are some exceptions to this, but we can disregard them for the present.)
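
A small sketch of this behaviour (the wording of the paragraph is invented for the illustration): both of the following paragraphs should be displayed identically, because the extra spaces and line breaks in the second are collapsed.

```html
<P>White space is collapsed by the browser.</P>

<P>White     space
   is collapsed
        by the browser.</P>
```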



Headings

Headings are defined in up to six levels. A top-level heading is <H1> ... </H1>. A second-level heading is <H2>, and so on to <H6>. You should not jump from <H1> to <H3> without using <H2>.

That is, headings might be arranged like this:

Heading at H1 level
Some text
Another heading at H1 level
   A subheading at H2 level
   Some text
   Another subheading at H2 level
   More text
An H1 level heading
More text 
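
In HTML, an outline like the one above might be coded roughly as follows. This is a sketch only: the heading and paragraph text simply repeats the placeholder wording of the outline, and the indentation seen there is not part of the markup.

```html
<H1>Heading at H1 level</H1>
<P>Some text</P>
<H1>Another heading at H1 level</H1>
<H2>A subheading at H2 level</H2>
<P>Some text</P>
<H2>Another subheading at H2 level</H2>
<P>More text</P>
<H1>An H1 level heading</H1>
<P>More text</P>
```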



Lists

For a list, you define a series of list items <LI>. You enclose this series of list items within a list element. For example, consider the following three list items:

<LI>African Languages</LI>
<LI>French</LI>
<LI>History</LI>

We can enclose these in an ordered list element <OL>, like this:

<OL>
<LI>African Languages</LI>
<LI>French</LI>
<LI>History</LI>
</OL>

in which case the browser will show it as

  1. African Languages
  2. French
  3. History

Alternatively, we could make it an unordered list <UL>:

<UL>
<LI>African Languages</LI>
<LI>French</LI>
<LI>History</LI>
</UL>

in which case it will appear as

  • African Languages
  • French
  • History

There is also a "definition list", <DL>. This includes not LI list items but DT (definition term) items and DD (definition description) items. The basic use for this is in providing lists of definitions, as the name suggests, thus:

<DL>
   <DT>Motswana</DT>
   <DD>one Botswana citizen or Tswana person</DD>
   <DT>Batswana</DT>
   <DD>(plural) Botswana citizens or Tswana persons</DD>
</DL>

will appear as

Motswana
   one Botswana citizen or Tswana person
Batswana
   (plural) Botswana citizens or Tswana persons

Although the basic use is for lists of definitions, the HTML 4.0 specification allows other uses where you want a list with two parts. For example, you can legitimately use a DL to format a dialogue, with the speaker's name in DT and the text in DD, as in the following example marked up as a DL:

Hello, is this Gaborone?
No, this is Tsabong.
Ah, that would explain the camels.
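
The markup behind such a dialogue might look something like the sketch below. The speaker labels "Caller" and "Operator" are hypothetical, added purely for illustration; only the three lines of speech come from the example above.

```html
<DL>
  <!-- DT holds the speaker's name, DD holds what is said -->
  <DT>Caller</DT>
  <DD>Hello, is this Gaborone?</DD>
  <DT>Operator</DT>
  <DD>No, this is Tsabong.</DD>
  <DT>Caller</DT>
  <DD>Ah, that would explain the camels.</DD>
</DL>
```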

DL formatting does not seem particularly impressive in its plain form, but it can be combined with style sheets to produce very good effects.



Links

Links are fundamental to the Web; they are what constitutes hypertext. A link normally appears on the browser screen as a piece of text in a special colour and underlined: clicking on it sends you to some other page or part of a page, etc.

A link has two ends, which are called anchors. The source anchor - which will be shown on the screen as a clickable link - is defined by an anchor element <A>. The destination anchor may be a new file which the link leads to, or a particular point within a page. In the latter case the destination point must be marked, usually by an anchor element.


Source anchors:

A source anchor is created by an anchor element with an HREF attribute. The value of the HREF attribute is a URI (i.e., an Internet address). For example,

<P>If you are interested in Botswana's history, visit the <A HREF="">UB History Dept web-site</A> where there is lots of information.</P>

will appear as

If you are interested in Botswana's history, visit the UB History Dept web-site where there is lots of information.

with the underlined text being a clickable link leading to

A URI begins with a part which indicates how the connection is to be made. For web-pages this will normally be http://


Types of address:

In the above example, the link went to a page defined by its complete internet address (absolute URI). This is an absolute link. There are two other common types of address, though:


(1) Relative links:

A relative URI is incomplete; it defines an address in terms of the current document. For example, suppose we are writing a page and wish to make a link to the page arch.htm, which is in the same folder. We could write out the whole URI:

<A HREF="">Click here for archaeology</A>

but since it is in the same folder, we can just write

<A HREF="arch.htm">Click here for archaeology</A>

One advantage of this is that much less re-writing is needed when files and folders are rearranged, as the relative URI does not change if the files are still in the same folder. Relative URIs can be given for files not in the same folder, using relative paths. These work the same as DOS relative paths except that with URIs folders are separated by forward slashes / rather than backslashes \.

If you are unfamiliar with how relative addresses work in DOS etc., see the page on relative addresses.
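
For readers who want a quick reminder, here is a brief sketch of the commonest forms of relative URI. The folder name pictures and the file names map.gif and index.htm are hypothetical, chosen only for the example; arch.htm is the file used above.

```html
<!-- A file in the same folder as the current page -->
<A HREF="arch.htm">Archaeology</A>

<!-- A file in a sub-folder of the current folder -->
<A HREF="pictures/map.gif">Map</A>

<!-- A file in the parent folder: ".." means "go up one level" -->
<A HREF="../index.htm">Back to the main index</A>
```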


(2) Internal links:

To create a link to another part of the same page, you need first to define the destination anchor. The easiest way is to create an anchor element with a NAME attribute. For example, we could mark the start of Chapter 2 in a long document by

<A NAME="chapter2">Chapter 2</A>

We can now link to this by a source anchor

<A HREF="#chapter2">Click here to go to Chapter 2</A>

Notice that the destination is indicated by the NAME which we have defined, preceded by the character # which shows that this is an internal link. (You need to remember that the # is required in the source anchor, but must not appear in the NAME="" part. It is not part of the name, but indicates that a destination is an internal link and not another page.)

It is in fact possible to combine internal links with URIs, as in the following:

<A HREF="book.htm#chapter2">Click here for Chapt. 2 of the book</A>

which means

  1. go to the new page book.htm
  2. go, within that page, to the "chapter2" anchor.

Some older browsers may however not manage the second part, and will just go to the page.

As an alternative to using <A NAME="name">, you can add an ID attribute to almost any element, e.g. <P ID="thispara">. (Note that the ID name must be unique in the document.) You can then link to this ID by an <A HREF="#thispara"> in the normal way. However, this only works in the more recent browsers, so at present the <A NAME="thispara"> method is preferable.
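
A minimal sketch of the ID method, assuming a browser recent enough to support it (the paragraph text is invented for the example):

```html
<!-- Destination: any element with a unique ID -->
<P ID="thispara">This paragraph is the destination of the link.</P>

<!-- Source anchor: the same #name syntax as for <A NAME="..."> -->
<A HREF="#thispara">Click here to go to that paragraph</A>
```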


(3) Non-HTTP links:

All the above links have been HTTP links - that is, they tell the browser to go to an http:// address. But it is possible to specify other types of URI in the link. For example, consider the following URI:
This is an FTP address. The connection is to be made using the FTP (file transfer protocol) type of connection rather than the HTTP type of connection used for ordinary web-pages. FTP is typically used for downloading large files which are not web-pages. (In this case, the FTP site is one for downloading Gutenberg electronic texts.)

The URI is written in the anchor like an HTTP URI:

<A HREF="">text of link</A>

In the case of FTP links, the browser will often make the connection itself. In some other cases, it may start another program. For example, consider the mailto: link:

<A HREF=""></A>

In this case, clicking the link will start an email program which will open a message to the address stated.

Note that unlike http:// and ftp://, mailto: does not end with //
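
As a sketch of the difference in form, compare the two anchors below. Both addresses are placeholders on the reserved example.com and example.org domains, not real sites or mailboxes.

```html
<!-- FTP link: the scheme is followed by // and a host name -->
<A HREF="ftp://ftp.example.org/pub/file.txt">Download the file by FTP</A>

<!-- mailto link: the scheme is followed directly by the email address -->
<A HREF="mailto:someone@example.com">Send us an email</A>
```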




Images

Images or pictures have to be stored as separate files from the HTML files. Images are usually either GIF files (*.gif), PNG files (*.png) or JPEG files (*.jpg). In the HTML file, an image element <IMG> indicates that an image should be displayed at that point. The IMG element is always "empty": i.e. it consists just of a start tag. It must include a SRC (source) attribute, which indicates where the actual image file is to be found. The value of this SRC attribute is a URI. For example, suppose we wish to show the picture which has been stored as flag.gif:

<IMG SRC="flag.gif">

IMG can take a number of other attributes which define things such as how big the image should be, how it should be aligned in relation to the surrounding text, etc. One essential attribute is the ALT (alternative) attribute. This is a short text which is displayed if the image is not. Many people, when using the Web, have their browser set not to display images (as this speeds up the process greatly when you have a slow connection). Where an ALT is defined, the browser displays the ALT text where the image should be. The user thus knows what the picture would have been, and can decide whether it is worth the extra time to load it.

<IMG SRC="flag.gif" ALT="Botswana flag">
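
As a further sketch, the size and alignment attributes mentioned above can be combined with SRC and ALT. WIDTH, HEIGHT and ALIGN are standard attributes of IMG in HTML 4 Transitional; the pixel values here are assumed for the illustration and would need to match the real image.

```html
<!-- WIDTH and HEIGHT are in pixels; ALIGN="left" lets text flow round the right of the image -->
<IMG SRC="flag.gif" ALT="Botswana flag" WIDTH="120" HEIGHT="80" ALIGN="left">
```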


Nesting tags

Consider a paragraph. Within it, some of the text may be bold, and perhaps some text will be both bold and italic. In this case the various elements are nested one inside the other:

<P>This is a paragraph. <B>This sentence is in bold</B>. But this is <B><I>both in bold and in italics</I></B>.</P>

producing the following:

This is a paragraph. This sentence is in bold. But this is both in bold and in italics.

There are two things to note when nesting elements:


Nesting symmetrically

You must nest them symmetrically: that is, the <I> ... </I> must be entirely inside the <B> ... </B> (or vice versa). The example above is correctly nested. But the following is wrong:

This is <I><B>both in bold and in italics</I></B>.

The following diagram may help to clarify what is needed.

 /-------------------\
 |                   |
 |   /----------\    |
 |   |          |    |
 |   |          |    |
<I> <B> ..... </B> </I>
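
For comparison, here is a sketch of the two correct ways of writing the faulty example: whichever element is opened last is closed first.

```html
<!-- <B> inside <I> -->
This is <I><B>both in bold and in italics</B></I>.

<!-- or <I> inside <B> -->
This is <B><I>both in bold and in italics</I></B>.
```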


In-line versus block-level elements

Elements can be divided into block-level elements, such as paragraphs, tables, etc., and in-line elements such as <B>, <STRONG>, etc. Block-level elements are the building blocks of a document, whereas in-line elements are just little formatting pieces. In-line elements are inside block-level elements - not vice versa. I.e., the following is wrong:

<B><P>This paragraph is meant to be all bold. But the nesting is faulty.</P></B>.
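
A corrected version, as a sketch, simply swaps the two elements so that the in-line B element sits inside the block-level P element:

```html
<P><B>This paragraph is all bold, and the nesting is now correct.</B></P>
```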



Comments

You may want to add a note to the HTML source file which is not supposed to appear on the web-page as displayed. For example, you might want to add a note like "Pictures to be inserted here". The way to do this is to use an HTML comment. A comment is enclosed in <!-- -->, thus:

<P>The first President of Botswana was Sir Seretse Khama. <!--Picture to be added here--> He was succeeded by Sir Ketumile Masire.</P>

This will appear as

The first President of Botswana was Sir Seretse Khama. He was succeeded by Sir Ketumile Masire.

Caution: although this comment will not appear on the web-page, it can be read (along with the rest of your source code) by any visitor to the web-site, simply by using the "View Source" command. (See "How to learn from other people's HTML" below.) So do not use comments for anything which is confidential!

The "comment" marks <!-- and --> tell the browser to ignore whatever is between them - not to try to interpret it as HTML to appear on the screen. One use of comments is to remove something from the page to be displayed without actually deleting it. For example, if there is something which needs to be corrected, but which you don't want to have to rewrite completely, you can just put <!-- and --> marks around it. Thus, for example:

<P>This para is OK </P>
<!-- <P>But this one needs to be checked </P> -->

- only the first paragraph will appear on screen. This use of comments is called "commenting [something] out".

Digression: The concept of a "comment" in computer code comes from computer programming, where it is considered good practice to add human-readable explanatory comments to the program source. These have to be marked in some way so that they are not confused with the actual program, e.g. in the "C" language they are marked by /* and */ :

#include <stdio.h>
/* This is a comment in a "C" program 
  - as you see it is normal English, 
  unlike the computer program around it */
int main(void)
{
    printf("Hello world\n");
    return 0;
}


Putting it together

We have now got enough HTML to produce a simple web-page. Below is, firstly, the HTML code for this sample page, and secondly, what this code would produce. (Because this is being displayed within another page, the code actually used is not quite identical - for example we cannot use the DOCTYPE declaration inside a page.)

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML>
<HEAD>
<TITLE>Test HTML document</TITLE>
</HEAD>
<BODY>
<H1>Heading of document</H1>
<H2>First subheading</H2>
<P>This is a paragraph. It is enclosed in P tags. <A HREF="">Click here to go to</A>. Alternatively, <A HREF="#dest">try this internal link.</A></P>
<P>This is another paragraph. It will appear separately from the first</P>
<H2>Second subheading</H2>
<P>This is a paragraph containing a destination anchor. Here is the destination anchor: <A NAME="dest">the link in the first paragraph should arrive here.</A></P>
<P>Now a list:</P>
<UL>
<LI>An item</LI>
<LI>Another item</LI>
<LI>And yet another</LI>
</UL>
<P>The end</P>
</BODY>
</HTML>

And here is how the above code will appear:

Heading of document

First subheading

This is a paragraph. It is enclosed in P tags. Click here to go to Alternatively, try this internal link.

This is another paragraph. It will appear separately from the first

Second subheading

This is a paragraph containing a destination anchor. Here is the destination anchor: the link in the first paragraph should arrive here.

Now a list:

  • An item
  • Another item
  • And yet another

The end


Style sheets

We have already noted that many of the old stylistic and formatting elements in HTML, such as FONT, are deprecated. What are we supposed to use instead? The answer is that style is supposed to be defined by a system of style sheets. These work by clearly separating content and formatting. Style is defined for particular HTML elements; for example, instead of making each H2 red by code such as <H2><FONT COLOR="red">Text of heading</FONT></H2> we have a style declaration at the start of the document like this:

H2 {color: red}
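
As a rough sketch of where such a declaration goes: one common arrangement is to place it in a STYLE element inside the HEAD of the document. The title and heading text below are placeholders for the illustration.

```html
<HEAD>
<TITLE>Test document</TITLE>
<!-- The style rule is stated once and applies to every H2 in the page -->
<STYLE TYPE="text/css">
H2 {color: red}
</STYLE>
</HEAD>
<BODY>
<H2>Text of heading</H2>
<H2>Another heading, also red</H2>
</BODY>
```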

For an introduction to this system, see our Introduction to style sheets.

There is one major problem with style-sheets: older browsers may not understand them at all, and only the latest browsers render them correctly. Thus, we recommend that you should continue to use HTML formatting for essential formatting. (This does not necessarily apply to pages designed for an "intranet", i.e. a site intended only for local-network users for whom you know exactly what browsers they use.)


How to learn from other people's HTML

In most browsers, there is a menu command which displays the original HTML of the page you are looking at. In Internet Explorer it is View - Source; in Netscape it is View - Page Source. Whenever you see something that interests you in the way a page is set out, you can find out how it was done by viewing the source.

There is a slight complication with pages using frames. There will be two or more separate files which have to be accessed.


Creating HTML files

An HTML file is a plain text file, that is, it consists simply of characters without formatting - when you open it, it looks like something written on a typewriter. In Windows or DOS, a plain text file often has the extension .TXT, although HTML files should have the extension .HTM or .HTML. To create an HTML file, there are several possible options.


Composing your own HTML


Using an HTML editor

This is the best option, although it requires a little practice. We recommend the HTML-Kit HTML editor, which is free (see the software page). Using an HTML editor is a little like using a word-processor, but instead of adding formatting to text, it adds HTML coding. For example, in an ordinary word-processor you might select a string of words and click the  I  (italics) button, whereupon the words are changed to italics. In HTML-Kit, if you select a string of words and click the italics button, <I> and </I> tags are placed around those words.


Using a plain text editor

Text editors are simple word-processors that only produce plain text. If you are using Windows, there is a text editor provided called Notepad. To start it, go to the Start Menu, go up to Programs, then Accessories, then Notepad. A better text editor is Metapad. See the software page for information on how to download this. When you save a document in one of these, the application may try to add a ".TXT" extension. This can be overridden by entering the filename in quotation marks.


Using a word-processor

Ordinary word-processors (such as Microsoft Word, WordPerfect, StarOffice Write, AbiWord etc.) can save documents as plain text. Use the "Save As" command and select "Plain Text" or "Text Only". Enter the filename as "whatever.htm". This is often easier than working with a plain text editor, but you need to be careful that the word-processor hasn't somehow reverted to its normal format.


Automatic composition

It is quite possible to produce web-pages without knowing HTML, by using one of the many programs that generate HTML files. You use these like word-processors and do not need to understand the actual HTML produced. Even if you do know HTML, you may find these save time. Such programs are sometimes called WYSIWYG ("What You See Is What You Get", i.e. what you are editing is what the final display is supposed to look like). Some such programs are specialized applications intended purely for web design; others are word-processors that are capable of saving as HTML. (StarOffice, and recent versions of MS Word, can save as HTML.)

There are however serious disadvantages to relying on such programs.

Although such programs save time, they save less than you might think. Once you are familiar with an HTML editor such as HTML-Kit, it is almost as fast as using an ordinary word-processor.

However, the most important thing is to produce your web pages one way or another! Using WYSIWYG is a perfectly valid option. Automatic HTML generation has some uses that even expert HTML users sometimes find valuable:


Any Browser! (or, using correct HTML)

You may have seen, on many web-pages, the words "Best viewed with Microsoft Internet Explorer" or "optimized for Internet Explorer 5". You may even have had the experience of going to a web-site and getting a special page telling you that as you are using an old, or non-Microsoft browser, the pages may not display correctly. This is a bad thing. The whole point of the World Wide Web is its openness and availability. If you write correct HTML according to the international W3C standards, your pages can (broadly speaking) be read in any browser. They should not rely on any particular proprietary software.

There are, admittedly, a few special cases where there is an excuse. One is in the case of an intranet, i.e. where web-pages are intended solely for internal users within some large organization. In such an intranet, everyone has the same software and so compatibility issues do not arise. An example of such an intranet would be a hospital information system which allowed medical and other staff to access records all over the hospital, but was not open to the outside. However, this situation is a fairly limited one, and should not be confused with the more common one of an institution with documents that are mainly intended for internal use, but which might be read from outside the local network. There is also the point that older browsers will not be able to render more recent developments in HTML, but this is different from writing pages for one particular modern browser.

See the Any Browser Campaign for more information. If you write web-pages which rely on one particular browser, you are in effect casting your vote for monopoly. Luckily, it is easy to avoid this. Just write standard HTML.


Conclusion: to UB colleagues:

As you see, HTML is really quite simple. As you learn more of it, you will find that even apparently complex things can be coded very easily. HTML's brilliant simplicity and versatility derives from its origins: it was not dreamed up by some software company, or by a team of consultants with a mission statement, but was invented by working academics as a practical tool. It originates in fact at the European nuclear research laboratory CERN, where a scientist named Tim Berners-Lee created it to facilitate communication between scientists. (See CERN page on the origins of HTML.) His creation combined

The World-Wide Web, and HTML, were created by and for academics. They are now of course extensively used by all sorts of other interests, such as business, and academic usage is only a relatively small part of the present-day web. This is no doubt a good thing. The point is, however, that as academics we should be using the web actively, creating and publishing our own web-pages and sites. An academic web-site is not primarily something for experts, but something practical, for academics communicating with each other. To think of "someone developing a web-site for us" would be like thinking of a publisher "writing some academic books for us". Just as academics are expected to write the books, we should produce the web-site. A University will, these days, need to have an official web-site for general purpose communication; but this should not be confused with the academic part of the web-site, which is what academics create and use. So get writing!


Copyright © 2000 University of Botswana History Department
Last updated 19 November 2003