===========================================================================
Today on The World            Vol. 4 #161             Monday, June 22, 1998
===========================================================================

Continuing the HTML lessons... Remember that you can find the other
chapters on http://world.std.com/help/web/tutorial for now. (kibo)

---------------------------------------------------------------------------

HTML TUTORIAL -- CHAPTER 11 -- HTML DOCUMENT STRUCTURE

11.1 Parts of a page

Remember how in Chapter 1 I got you started with this skeleton:

    <HTML>
    <HEAD>
    <TITLE>My Page.</TITLE>
    </HEAD>
    <BODY>
    </BODY>
    </HTML>

...and then immediately said "I'll explain this later"? Well, now is
later.

Every HTML document (that is, every Web page) begins with <HTML> and ends
with </HTML>. <HTML>...</HTML> encloses everything.

You'll notice that inside <HTML>...</HTML> there are two sections:
<HEAD>...</HEAD> and <BODY>...</BODY>.

The <HEAD> section, the heading, is where information about the page
goes -- the title, keywords, the creator's name, and other stuff used for
indexing, searching, or annotating your site. I'll say more about what can
go in <HEAD> below. For now, <HEAD> must (repeat, MUST) contain
<TITLE>...</TITLE> for the title of the page, and may contain other stuff.

<TITLE>...</TITLE> should be 64 characters or fewer (because many browsers
chop it off at or near 64 -- this follows a recommendation in the HTML
specification) and I recommend not using entities or "weird" characters in
it, because some programs do not process entities and character sets in
<TITLE>. I also recommend that, for purposes of people making bookmarks and
sorting them alphabetically, titles like

    <TITLE>Welcome to The World!</TITLE>

or

    <TITLE>The World</TITLE>

should be changed to alphabetize under something other than "The" or
"Welcome":

    <TITLE>World: Welcome to The World!</TITLE>

(Yeah, I know that in the real world, you ignore "the" when alphabetizing.
Please tell my Web browsers that.)

The other part of the HTML document -- usually much bigger than
<HEAD>...</HEAD> -- is <BODY>...</BODY>. <BODY> contains all the
displayable stuff in your document: text and the tags that format it. In
other words, everything interesting goes there.

<HEAD> is for "meta-information" about the page (generally not visible,
except for the required <TITLE>) and <BODY> is for the content of the page.
Do not put content in <HEAD>. Do not put stuff like <TITLE> in <BODY>. Do
not put anything outside those two sections (except <HTML>...</HTML> and
comments.)

Ah, yes, comments. Comments are fun.

11.2 Comments

You've seen comments already:

    <!-- this is a comment -->

Comments are not displayed when the page is displayed (although obviously
people can see them if they do "View Source".) The purpose of comments is
to leave notes to yourself about how your site works. Some people also use
them to temporarily disable tags (change "<" and ">" to "<!-- " and " -->")
and sometimes new non-HTML features (like the Server-Side Includes on some
Web servers, or JavaScript) are added inside comments, so that they will be
ignored when they aren't supported.

Because comments don't do anything (with the exception of the special case
where you use them to contain JavaScript, Server-Side Includes, etc.) you
can put them anywhere, in <HEAD> or <BODY>. They even seem to work outside
<HTML> (there's no reason they shouldn't.)

Early versions of HTML wrote comments this way:

    <!blah blah blah>

...and some Web browsers still interpret comments this way, so any ">"
inside a comment ends the comment and the browser goes back to displaying
stuff. That meant you couldn't embed tags in comments (you still can't, in
some browsers!)

The current HTML syntax,

    <!-- blah blah blah -->

...starts with a less-than sign, an exclamation point, two hyphens, and a
space, then the comment, then another space, two hyphens, and a
greater-than sign. (Comments usually work without the spaces, but you
should get in the habit of using the spaces.)
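
To pull the pieces so far together, here is a minimal sketch of a whole page with comments in the current syntax placed before <HTML>, in <HEAD>, and in <BODY>. The title and the paragraph text are made up for illustration, and the comments deliberately avoid ">" so that older browsers do not end them early.

```html
<!-- A note placed before the document starts -->
<HTML>
<HEAD>
<!-- A note to yourself in the heading; it is never displayed -->
<TITLE>World: Welcome to The World!</TITLE>
</HEAD>
<BODY>
<!-- A note in the body; it is never displayed either -->
<P>Only this sentence shows up in the browser window.</P>
</BODY>
</HTML>
```
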
So, because comments start with "<!-- " and end with " -->", you're
supposed to be able to say:

    <!-- <A HREF="http://www.nowhere.com">Broken site</A> -->

...if you want to "comment out" some of your HTML so that it does not show
up. However, like I said earlier, some browsers still follow the earlier
HTML standard which says that any ">" will terminate a comment, so I
usually spend a little time doing it this way:

    <!-- [A HREF="http://www.nowhere.com"]Broken site[/A] -->

11.3 Things in your <HEAD>

There are all sorts of other things you can put in <HEAD>...</HEAD>. I'll
mention a few of them here.

    <BASE HREF="http://world.std.com/index.html">

<BASE HREF> specifies the URL this document is stored at. It is used when a
relative link (<A HREF="hobbies.html">) or an internal link
(<A HREF="#bottom">) is followed -- the <BASE HREF> and the <A HREF> are
combined to make the full URL. On The World, if you are using Home Page
Alone (HPA), you should have one of these in your main index.html (they're
not needed elsewhere) to ensure that such links work properly (because the
Web server's standard security features try to hide your actual home
directory by pretending that your "public_html" IS your home directory.)

The other nice thing about <BASE HREF> is that if you move the page to a
different location, or even if someone saves the page to their hard disk
and then views it, all the relative links will still go where they used to.
This can be helpful or annoying depending on how you set your site up. In
other words, don't put <BASE HREF> on every page unless you understand what
it will do.

Note: <BASE HREF="url"> takes the full URL of the current document. It will
probably still work if you leave off part of the URL, but you shouldn't.

    <LINK REV="MADE" HREF="mailto:webmaster@world.std.com">

This is used by some programs (I think mainly lynx) for a "send mail to
author" command. Mostly it isn't used.
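
As a rough sketch of how <BASE HREF> and <LINK REV="MADE"> sit in a page, and of how a relative link is resolved against the base, consider the page below. The address http://world.std.com/~yourname/ and the mailbox yourname@world.std.com are hypothetical placeholders, not real locations; substitute the full URL of your own page.

```html
<HTML>
<HEAD>
<TITLE>Hobbies: My Page</TITLE>
<!-- Hypothetical address: use the full URL of your own page here -->
<BASE HREF="http://world.std.com/~yourname/index.html">
<!-- Hypothetical mailbox for the "send mail to author" command -->
<LINK REV="MADE" HREF="mailto:yourname@world.std.com">
</HEAD>
<BODY>
<!-- Relative link: combined with the base, it points to
     http://world.std.com/~yourname/hobbies.html -->
<P><A HREF="hobbies.html">My hobbies</A></P>
<!-- Internal link: combined with the base, it points to
     http://world.std.com/~yourname/index.html#bottom -->
<P><A HREF="#bottom">Go to the bottom</A></P>
<P><A NAME="bottom">This is the bottom.</A></P>
</BODY>
</HTML>
```

If someone saves this page to a hard disk and opens the copy, the "My hobbies" link should still go to the address built from the base, not to a file next to the saved copy.
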
    <META NAME="GENERATOR" CONTENT="Adobe PepperMill 4.7 BeOS">

The "GENERATOR" <META> is used by programs that edit HTML (such as Adobe
PageMill, Microsoft FrontPage, Claris Home Page) to establish what program
created the page. It's not something that you want to put in if you're
editing your own HTML.

    <META NAME="DESCRIPTION" CONTENT="Official site of the 2004 Olympics">
    <META NAME="KEYWORDS" CONTENT="olympics,olympiad,2004,official,site,sports">

Some searching or indexing tools use the "DESCRIPTION" and "KEYWORDS"
<META>'s. The standard is that the "KEYWORDS" should be separated by commas
(not spaces!)

The "KEYWORDS" are often used by search engines. Some people will put in
dozens of copies of the same keyword in the hopes that this makes their
page go to the top of results lists. Well, this doesn't usually work, but
you should give some thought to constructing a good set of keywords that
covers all the bases:

    <META NAME="KEYWORDS" CONTENT="olympic,olympics,olympiad,2004,
    2004 olympics,2004olympics,official,site,sports,sport,sporting,
    javelin,discus,basketball,events,results">

...or whatever is appropriate for your site.

Note: If you want to funnel all your site's visitors to your front page
(index.html), it may be wise to only put the "DESCRIPTION" and "KEYWORDS"
on that one page.

Some search engines (not all) can be told to NOT index a page with:

    <META NAME="ROBOTS" CONTENT="NOINDEX">

(Almost all of them support the Robot Exclusion Standard, which looks for a
"robots.txt" file on your Web site. See the URL below.)

There are all sorts of other <META> options you can use, because they are
made up by the browser makers and the search engines, so every program
looks for its own particular brand of <META>'s. In other words, if you're
creating a search tool for your own pages with some particular program,
find out what <META> tags that program looks at.
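
Put together, the heading of a front page might look something like the sketch below. It reuses the made-up Olympics example from above; the title and body text are invented for illustration, and which of these <META>'s any given search engine actually reads is an assumption you would need to check.

```html
<HTML>
<HEAD>
<TITLE>Olympics 2004: Official Site</TITLE>
<!-- Short summary that some indexing tools show in their results -->
<META NAME="DESCRIPTION" CONTENT="Official site of the 2004 Olympics">
<!-- Keywords are separated by commas, not spaces -->
<META NAME="KEYWORDS" CONTENT="olympics,olympiad,2004,official,site,sports">
</HEAD>
<BODY>
<P>Welcome to the front page.</P>
</BODY>
</HTML>
```

A page you would rather keep out of search results could carry <META NAME="ROBOTS" CONTENT="NOINDEX"> in its <HEAD> in place of the "DESCRIPTION" and "KEYWORDS" lines, for those search engines that honor it.
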
There's also a version of <META> with an HTTP-EQUIV option:

    <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">

HTTP-EQUIV is used to override the HTTP response headers sent by the server
to the browser (these are invisible and are transmitted before the Web page
is sent.) Please don't start monkeying around with HTTP-EQUIV unless you
understand how HTTP works (as opposed to HTML!) Besides overriding the Web
server's headers, HTTP-EQUIV is useful for many things, such as setting
"cookies", establishing PICS ratings, and specifying that your document
uses a particular character set.

You can find more on <META>-matters at

    http://searchenginewatch.internet.com/webmasters/meta.html
    http://www.webdeveloper.com/categories/html/html_metatags.html
    http://vancouver-webpages.com/META/
    http://www.yahoo.com/Computers_and_Internet/Information_and_Documentation/Data_Formats/HTML/META_Tag/

Robot Exclusion Standard (for robots.txt):

    http://info.webcrawler.com/mak/projects/robots/norobots.html

HTTP 1.1 specification (if you're curious about HTTP-EQUIV):

    http://ds.internic.net/rfc/rfc2068.txt
    http://www.cis.ohio-state.edu/htbin/rfc/rfc2068.html

PICS ratings:

    http://www.w3.org/PICS/
    http://vancouver-webpages.com/PICS/HOWTO.html

(kibo)

==========================================================================
[] Send suggestions for tips & URLs to today@world.std.com.
   We're also collecting links for our Web pages at eyeguy@world.std.com.
[] To contact CUSTOMER SUPPORT, send mail to support@world.std.com
   or call 617-739-0202.
[] To subscribe to the "Today" mailing list, send a note saying
   'subscribe announcements' to majordomo@world.std.com.
   Subscriptions to this mailing list are open to World customers only.
[] Yesterday I took an old TV Guide out of my freezer. Guess I forgot.
[] Today on The World is (C) Copyright 1998 by Software Tool & Die.
   Its contents may freely be redistributed as long as credit is given.