===========================================================================
Today on The World Vol. 4 #161 Monday, June 22, 1998
===========================================================================
Continuing the HTML lessons...
Remember that you can find the other chapters on
http://world.std.com/help/web/tutorial
for now.
(kibo)
---------------------------------------------------------------------------
HTML TUTORIAL -- CHAPTER 11 -- HTML DOCUMENT STRUCTURE
11.1 Parts of a page
Remember how in Chapter 1 I got you started with this skeleton:
<HTML>
<HEAD>
<TITLE>My Page.</TITLE>
</HEAD>
<BODY>
</BODY>
</HTML>
...and then immediately said "I'll explain this later"? Well, now
is later.
Every HTML document (that is, every Web page) begins with
<HTML> and ends with </HTML>. <HTML>...</HTML> encloses everything.
You'll notice that inside <HTML>...</HTML> there are two sections:
<HEAD>...</HEAD> and <BODY>...</BODY>.
The <HEAD> section, the heading, is where information about the page
goes -- the title, keywords, the creator's name, and other stuff
used for indexing, searching, or annotating your site. I'll say
more about what can go in <HEAD> below. For now, <HEAD> must
(repeat, MUST) contain <TITLE>...</TITLE> for the title of the
page, and may contain other stuff.
<TITLE>...</TITLE> should be 64 characters or fewer (the HTML
specification recommends this, because many browsers chop titles off
at or near 64 characters), and I recommend not using entities or
"weird" characters in it, because some programs do not process
entities and character sets in <TITLE>. I also recommend that, for purposes of people making
bookmarks and sorting them alphabetically, titles like
Welcome to The World!
or
The World
should be changed to alphabetize under something other than "The"
or "Welcome":
World: Welcome to The World!
(Yeah, I know that in the real world, you ignore "the" when alphabetizing.
Please tell my Web browsers that.)
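If you want to check your own titles against these recommendations, here is a minimal sketch in Python (an illustration only; nothing in the tutorial depends on it). It uses the standard html.parser module to pull out the <TITLE> text, then flags the things mentioned above: more than 64 characters, entities or non-ASCII characters, and a title that will alphabetize under "The" or "Welcome". The names TitleGrabber and check_title are made up for this example.

```python
from html.parser import HTMLParser


class TitleGrabber(HTMLParser):
    """Collects the text between <TITLE> and </TITLE>."""

    def __init__(self):
        super().__init__()  # entities are decoded by default (convert_charrefs=True)
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":  # html.parser reports tag names in lowercase
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def check_title(page):
    """Return (title, list of warnings) for one HTML page."""
    grabber = TitleGrabber()
    grabber.feed(page)
    grabber.close()
    title = grabber.title.strip()
    warnings = []
    if not title:
        warnings.append("no <TITLE> found")
    if len(title) > 64:
        warnings.append(f"title is {len(title)} characters; keep it to 64 or fewer")
    # Entities arrive already decoded (&eacute; becomes an accented e, &amp; becomes &),
    # so look for "&" and for anything outside plain ASCII.
    if "&" in title or any(ord(ch) > 127 for ch in title):
        warnings.append("title contains entities or non-ASCII characters")
    if title.lower().startswith(("the ", "welcome")):
        warnings.append("title will alphabetize under 'The' or 'Welcome'")
    return title, warnings


title, warnings = check_title(
    "<HTML><HEAD><TITLE>Welcome to The World!</TITLE></HEAD><BODY></BODY></HTML>"
)
print(title)
for w in warnings:
    print("warning:", w)
```

Run on "World: Welcome to The World!" instead, the same function returns no warnings at all.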
The other part of the HTML document -- usually much bigger than
<HEAD>...</HEAD> -- is <BODY>...</BODY>. <BODY> contains all the
displayable stuff in your document: text, plus the tags that format
it, link it, and put pictures in it. In other words, everything
interesting goes there. <HEAD> is for "meta-information" about the
page (generally not visible, except for the required <TITLE>) and
<BODY> is for the content of the page.
Do not put content in <HEAD>. Do not put stuff like <TITLE> in <BODY>.
Do not put anything outside those two sections (except <HTML>...</HTML>
and comments.)
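Those rules are mechanical enough for a program to check. Below is a rough Python sketch using the standard html.parser module; the name check_structure and the short lists of head-only tags are assumptions made for this example, and it is nowhere near a real validator (which checks a page against the full HTML definition). Comments are deliberately ignored wherever they appear.

```python
from html.parser import HTMLParser

# Assumed for this sketch: tags that belong only in <HEAD>, plus tags that
# may also appear there. Anything else is treated as body content.
HEAD_ONLY = {"title", "meta", "base", "link"}
ALLOWED_IN_HEAD = HEAD_ONLY | {"script", "style"}
# Tags whose text is not page content, so it may sit outside <BODY>.
TEXT_OK_IN_HEAD = {"title", "script", "style"}


class SectionChecker(HTMLParser):
    """Flags tags and text that sit in the wrong section of the page."""

    def __init__(self):
        super().__init__()
        self.section = None        # None (outside both), "head", or "body"
        self.open_text_tag = None  # set while inside TITLE, SCRIPT, or STYLE
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag in TEXT_OK_IN_HEAD:
            self.open_text_tag = tag
        if tag == "html":
            return
        if tag in ("head", "body"):
            self.section = tag
        elif tag in HEAD_ONLY and self.section != "head":
            self.problems.append(f"<{tag.upper()}> outside <HEAD>")
        elif tag not in ALLOWED_IN_HEAD and self.section != "body":
            self.problems.append(f"<{tag.upper()}> outside <BODY>")

    def handle_endtag(self, tag):
        if tag == self.open_text_tag:
            self.open_text_tag = None
        elif tag in ("head", "body"):
            self.section = None

    def handle_data(self, data):
        # Visible text belongs in <BODY>; whitespace between tags is ignored.
        if data.strip() and self.section != "body" and self.open_text_tag is None:
            self.problems.append(f"text {data.strip()!r} outside <BODY>")

    # handle_comment is not overridden, so comments are accepted anywhere.


def check_structure(page):
    checker = SectionChecker()
    checker.feed(page)
    checker.close()
    return checker.problems


page = """<!-- my notes -->
<HTML>
<HEAD>
<TITLE>My Page</TITLE>
</HEAD>
<BODY>
<P>Hello!</P>
</BODY>
</HTML>"""
print(check_structure(page))  # []
```

A page with <B> in the head or <TITLE> in the body comes back with one message per misplaced tag.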
Ah, yes, comments. Comments are fun.
11.2 Comments
You've seen comments already:
<!-- This is a comment. -->
Comments are not displayed when the page is displayed (although obviously
people can see them if they do "View Source".) The purpose of comments
is to leave notes to yourself about how your site works. Some people
also use them to temporarily disable tags (change "<" and ">" to "<!--" and "-->"), and sometimes new non-HTML features (like the Server-Side Includes
on some Web servers, or JavaScript) are added inside comments, so that
they will be ignored when they aren't supported.
Because comments don't do anything (with the exception of the special case
where you use them to contain JavaScript, Server-Side Includes, etc.)
you can put them anywhere, in <HEAD> or <BODY>. They even seem to work
outside <HTML> (there's no reason they shouldn't.)
Early versions of HTML defined comments this way:
<! This is a comment. >
...and some Web browsers interpret comments this way, meaning that any ">"
inside a comment turns off the comment and goes back to displaying stuff.
So you couldn't embed tags in comments (you still can't, in
some browsers!) The current HTML syntax,
<!-- This is a comment. -->
...starts with "<", an exclamation point, two hyphens, and a space, then the
comment, then another space, two hyphens, and ">". (They usually work without
the spaces, but you should get in the habit of using the spaces.)
So, because comments start with "<!--" and end with "-->", you're
supposed to be able to say:
<!-- <IMG SRC="picture.gif"> -->
...if you want to "comment out" some of your HTML so that it does not
show up. However, like I said earlier, some browsers still follow the
earlier HTML standard which says that any ">" will terminate a comment,
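so I usually take the time to strip the "<" and ">" off the tags inside,
leaving the comment's closing ">" as the only one:
<!-- IMG SRC="picture.gif" -->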
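To see why the second form is safer, here is a small Python simulation of the two rules. It is deliberately simplified (real browsers' comment handling is messier), and the function names and the sample IMG tag are invented for the example. The "early" rule ends a comment at the first ">"; the current rule ends it at the first "-->".

```python
import re


def strip_comments_current(html):
    """Current rule: a comment runs from "<!--" to the next "-->"."""
    return re.sub(r"<!--.*?-->", "", html, flags=re.DOTALL)


def strip_comments_early(html):
    """Early rule (still followed by some browsers): "<!" up to the first ">"."""
    return re.sub(r"<![^>]*>", "", html)


commented_out = '<!-- <IMG SRC="picture.gif"> -->Hello'
disabled = '<!-- IMG SRC="picture.gif" -->Hello'

for page in (commented_out, disabled):
    print(repr(strip_comments_current(page)), repr(strip_comments_early(page)))
# On the first line the early rule leaks a stray " -->" onto the page;
# on the second line both rules give 'Hello', because the only ">" is the last one.
```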
11.3 Things in your <HEAD>
There are all sorts of other things you can put in <HEAD>...</HEAD>.
I'll mention a few of them here.
<BASE HREF="URL"> specifies the URL this document is stored at. It is used
when a relative link (<A HREF="page2.html">) or an internal link
(<A HREF="#top">) is followed -- the <BASE HREF> URL and the link
are strung together to make the full path. On The World, if you are
using Home Page Alone (HPA), you should have one of these in your
main index.html (they're not needed elsewhere) to ensure that such
links work properly (because the Web server's standard security features
try to hide your actual home directory by pretending that your
"public_html" IS your home directory.)
The other nice thing about <BASE> is that if you move the page
to a different location, or even if someone saves the page to their
hard disk and then views it, all the relative links will still go
where they used to. This can be helpful or annoying depending on
how you set your site up. So don't put <BASE> on
every page unless you understand what it will do.
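Browsers do this "stringing together" by the standard rules for resolving relative URLs (RFC 1808 at the time, RFC 3986 today), which Python's urllib.parse.urljoin also follows. The sketch below uses it to show both behaviors described above. The URLs (a ~you home page and a copy saved under /home/me/saved) are hypothetical, and resolve is a helper made up for this example.

```python
from urllib.parse import urljoin


def resolve(link, page_url, base_href=None):
    """Return the URL a browser follows when `link` is clicked.

    With a <BASE HREF>, links are resolved against it; without one,
    they are resolved against wherever the page was actually loaded from.
    """
    return urljoin(base_href or page_url, link)


home = "http://world.std.com/~you/index.html"  # hypothetical home page
saved = "file:///home/me/saved/index.html"     # the same page saved to disk

print(resolve("pics/me.gif", home))        # http://world.std.com/~you/pics/me.gif
print(resolve("#top", home))               # http://world.std.com/~you/index.html#top
print(resolve("page2.html", saved))        # file:///home/me/saved/page2.html
print(resolve("page2.html", saved, home))  # http://world.std.com/~you/page2.html
```

The last two lines are the "helpful or annoying" part: without <BASE> the saved copy looks for page2.html on the disk, and with it the saved copy goes back to the Web site.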
Note: <BASE HREF> takes the full URL of the current document.
It will probably still work if you leave off part of the URL, but you
shouldn't.
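Here is why the full URL matters, again using Python's urllib.parse.urljoin as a stand-in for a browser (the addresses are hypothetical). The relative-URL rules treat everything after the last "/" of the base as a file name and throw it away, so a base that leaves off the trailing slash quietly loses your directory.

```python
from urllib.parse import urljoin

full_base = "http://world.std.com/~you/index.html"  # the page's complete URL
short_base = "http://world.std.com/~you"            # trailing "/" and file name left off

print(urljoin(full_base, "page2.html"))   # http://world.std.com/~you/page2.html
print(urljoin(short_base, "page2.html"))  # http://world.std.com/page2.html (wrong directory!)
```

A base ending in "~you/" would still work; it is the missing slash that does the damage.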
<LINK REV="made" HREF="mailto:you@world.std.com">
This is used by some programs (I think mainly lynx) for a "send mail
to author" command. Mostly it isn't used.
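The usual tag for this is <LINK REV="made" HREF="mailto:...">. Assuming that is the tag meant here, a program can dig the author's address out of a page roughly like the Python sketch below, built on the standard html.parser module. The names AuthorFinder and find_author and the sample address are made up for illustration.

```python
from html.parser import HTMLParser


class AuthorFinder(HTMLParser):
    """Remembers the address from <LINK REV="made" HREF="mailto:...">."""

    def __init__(self):
        super().__init__()
        self.author = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)  # attribute names arrive lowercased; values may be None
        rev = (a.get("rev") or "").lower()
        href = a.get("href") or ""
        if tag == "link" and rev == "made" and href.lower().startswith("mailto:"):
            self.author = href[len("mailto:"):]


def find_author(page):
    finder = AuthorFinder()
    finder.feed(page)
    finder.close()
    return finder.author


print(find_author('<HEAD><LINK REV="made" HREF="mailto:you@world.std.com"></HEAD>'))
```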
The "GENERATOR" <META> tag, <META NAME="GENERATOR" CONTENT="...">,
is used by programs that edit HTML (such as
Adobe PageMill, Microsoft FrontPage, Claris Home Page) to establish
what program created the page. It's not something that you want
to put in if you're editing your own HTML.
Some searching or indexing tools use <META NAME="DESCRIPTION" CONTENT="...">
and <META NAME="KEYWORDS" CONTENT="...">.
The convention is that the "KEYWORDS" should be separated by commas (not
spaces!) The "KEYWORDS" are often used by search engines. Some people
will put in dozens of copies of the same keyword in the hopes that this
makes their page go to the top of results lists. Well, this doesn't usually
work, but you should give some thought to constructing a good set of
keywords that covers all the bases:
<META NAME="KEYWORDS" CONTENT="first keyword, second keyword, third keyword">
...or whatever is appropriate for your site. Note: If you want to
funnel all your site's visitors to your front page (index.html), it
may be wise to only put the "DESCRIPTION" and "KEYWORDS" on that
one page.
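Here is roughly what an indexing program might do with those two tags, as a Python sketch using the standard html.parser module. The MetaCollector class, the keywords function and the sample keywords are all invented for illustration; real search engines don't publish exactly how they weigh keywords. It does show why commas matter: a space-separated list comes out as one long keyword, and repeats add nothing.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collects <META NAME=... CONTENT=...> pairs, keyed by lowercase NAME."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            a = dict(attrs)
            name = (a.get("name") or "").lower()
            if name:
                self.meta[name] = a.get("content") or ""


def keywords(page):
    """Split the KEYWORDS list on commas; each keyword is kept only once."""
    collector = MetaCollector()
    collector.feed(page)
    collector.close()
    result = []
    for word in collector.meta.get("keywords", "").split(","):
        word = word.strip().lower()
        if word and word not in result:
            result.append(word)
    return result


print(keywords('<META NAME="KEYWORDS" CONTENT="shell accounts, ISP, Boston, ISP, ISP">'))
print(keywords('<META NAME="KEYWORDS" CONTENT="shell accounts ISP Boston">'))
```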
Some search engines (not all) can be told NOT to index a page with:
<META NAME="ROBOTS" CONTENT="NOINDEX">
(Almost all of them support the Robot Exclusion Standard, which looks
for a "robots.txt" file on your Web site. See the URL below.)
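The robots.txt side is simple enough to show. Robots look for the file at the top level of the server (/robots.txt), and Python's standard urllib.robotparser module implements the check a well-behaved robot makes before fetching a page. The rules and URLs below are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt: every robot should stay out of /private/.
rules = """\
User-agent: *
Disallow: /private/
""".splitlines()

robots = RobotFileParser()
robots.parse(rules)  # parse() takes the file's lines; read() would fetch it instead

print(robots.can_fetch("AnyBot", "http://www.example.com/index.html"))          # True
print(robots.can_fetch("AnyBot", "http://www.example.com/private/notes.html"))  # False
```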
There are all sorts of other <META> options you can use, because
browser makers and search engines make them up, and each program
looks for its own particular brand of <META>s. In other words, if
you're creating a search tool for your own pages with some particular
program, find out which <META> tags that program looks at.
There's also a version of <META> with an HTTP-EQUIV option:
<META HTTP-EQUIV="header name" CONTENT="header value">
HTTP-EQUIV is used to override the HTTP response headers sent by the
server to the browser (these are invisible and are transmitted
before the Web page is sent.) Please don't start monkeying around
with HTTP-EQUIV unless you understand how HTTP works (as opposed to
HTML!) In practice, its most useful jobs are setting "cookies",
establishing PICS ratings, and specifying that your document uses a
particular character set.
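To make the "equivalent" part concrete: each HTTP-EQUIV tag is just a header name and value written into the page. The Python sketch below collects them into an email.message.Message, the same header class that Python's http.client builds its HTTP response headers on, and reads the character set back out. The class and function names are invented for the example, and the Content-Type value is a typical one, not something the tutorial prescribes.

```python
from email.message import Message
from html.parser import HTMLParser


class HttpEquivCollector(HTMLParser):
    """Collects <META HTTP-EQUIV=... CONTENT=...> tags as header/value pairs."""

    def __init__(self):
        super().__init__()
        self.headers = Message()  # header names are looked up case-insensitively

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            a = dict(attrs)  # "HTTP-EQUIV" arrives as "http-equiv"
            if a.get("http-equiv"):
                self.headers[a["http-equiv"]] = a.get("content") or ""


def http_equiv_headers(page):
    collector = HttpEquivCollector()
    collector.feed(page)
    collector.close()
    return collector.headers


headers = http_equiv_headers(
    '<HEAD><META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-1"></HEAD>'
)
print(headers["content-type"])        # text/html; charset=ISO-8859-1
print(headers.get_content_charset())  # iso-8859-1
```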
You can find more on <META>-matters at
http://searchenginewatch.internet.com/webmasters/meta.html
http://www.webdeveloper.com/categories/html/html_metatags.html
http://vancouver-webpages.com/META/
http://www.yahoo.com/Computers_and_Internet/Information_and_Documentation/Data_Formats/HTML/META_Tag/
Robot Exclusion Standard (for robots.txt):
http://info.webcrawler.com/mak/projects/robots/norobots.html
HTTP 1.1 specification (if you're curious about HTTP-EQUIV):
http://ds.internic.net/rfc/rfc2068.txt
http://www.cis.ohio-state.edu/htbin/rfc/rfc2068.html
PICS ratings:
http://www.w3.org/PICS/
http://vancouver-webpages.com/PICS/HOWTO.html
(kibo)
==========================================================================
[] Send suggestions for tips & URLs to today@world.std.com.
We're also collecting links for our Web pages at eyeguy@world.std.com.
[] To contact CUSTOMER SUPPORT, send mail to support@world.std.com or
call 617-739-0202.
[] To subscribe to the "Today" mailing list, send a note saying 'subscribe
announcements' to majordomo@world.std.com.
Subscriptions to this mailing list are open to World customers only.
[] Yesterday I took an old TV Guide out of my freezer. Guess I forgot.
[] Today on The World is (C) Copyright 1998 by Software Tool & Die.
Its contents may freely be redistributed as long as credit is given.