Question: What are validators and what do you need to include in your code so that it will validate properly?
Answer: Validators are (usually) websites where you upload your code or submit a URL, and they evaluate your site's markup against independent standards for HTML and CSS. The best validators I'm aware of are available through either W3Schools.com or W3C (http://www.w3.org). W3C (the World Wide Web Consortium) is an independent group that has been working with browser companies, graphics companies like Adobe, etc. to define standards for HTML, CSS, XML, and many other internet-based technologies. After Microsoft's Internet Explorer defeated Netscape in Browser Wars I (circa 1995-2001), W3C was like a voice crying in the wilderness for independent standards: Microsoft effectively defined the standards, and since it controlled 90%+ of the browser market, it didn't really care what anyone else thought. Now that Microsoft faces off against the next generation of browsers (and is fighting to maintain dominance, at least according to W3Schools's numbers), it is learning to play a little more nicely with others, pushing to make IE8 compliant with W3C HTML and CSS standards (we've already discussed the problems they have going down that road).
The next generation of browsers includes the following: (1) Mozilla Firefox (which grew out of Netscape's code base, retooled and beefed up), (2) Opera (which I've heard is actually one of the most standards-compliant browsers out there, but which has negligible market share), and (3) Google Chrome, which is the new kid on the block, but is sure to begin throwing its weight around.
Now, as for getting your code to validate correctly... when the web was first created, it was basically designed with academia in mind: scholars at different institutions could use it to transmit their academic findings to one another. This is even apparent in the names of some of the oldest HTML tags (some of them since deprecated, or discouraged in favor of style sheets). For instance, <em></em> (for emphasis, usually rendered in italics), <u></u> (for underline), <b></b> (for bold), <address></address>, <blockquote></blockquote>, etc. There are lots of old tags that sound like they are much better suited to presenting a research paper than to presenting streaming media, high-resolution graphics, or data-driven applications, and that's just the point: none of those things fundamentally fit into HTML's design. Layout and styling, Flash and graphical designs are at best hacks on top of what is still fundamentally a means for transmitting text meaningfully from person A to person B, with ample metadata (i.e. information that describes/categorizes/organizes certain pieces of your text) to keep everything neat and explicit.
As time went on though, the web began to morph into more of what we see today: people wanted their webpages to show pictures, play music, format their text to be aesthetically pleasing, and use programming languages to control when/how/if certain parts of their webpages would even be shown. This led to a couple of interesting ideas, namely:
- The <table><tbody><tr><td></td>...</tr>...</tbody></table> tags to represent tabular data in an unambiguous row-and-column format;
- Layout tags like <h1></h1> through <h6></h6> and <center></center> (now deprecated); and
- Style tags like <font></font> (also deprecated).
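As a rough sketch of the first idea, here is what a small table looks like in markup. The browser names come from the list earlier in this answer; the column headings are just ones I picked for illustration.

```html
<!-- A small table of tabular data: each <tr> is a row, each <th>/<td> a cell -->
<table>
  <thead>
    <tr><th>Browser</th><th>Vendor</th></tr>
  </thead>
  <tbody>
    <tr><td>Firefox</td><td>Mozilla</td></tr>
    <tr><td>Chrome</td><td>Google</td></tr>
  </tbody>
</table>
```

Notice that nothing in the markup says how the table should look; it only says which cell belongs to which row and column.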
But this soon proved to not go far enough, so they said, "We should be able to apply different styles or formatting to any tag." What they eventually developed are Cascading Style Sheets which, as we know them today, can live in three separate places: (1) in an external file, (2) in the page's head, and (3) inline on an individual tag. They're called cascading because, as a rule of thumb, the closer to the actual tag you get, the higher precedence the styling has--an external style sheet applied to, say, a paragraph, has lower priority than a style block in the head, which has lower priority than putting your style right in the paragraph tag of your choosing. (Strictly speaking, external and head styles compete on selector specificity and source order, so the head block wins only when its rule is at least as specific and comes after the link to the external sheet, which is the usual arrangement.) While all this organization can get confusing, once you get the system down, it can really keep your code from getting cluttered, and allow you to make sweeping changes very quickly and easily (for instance, you could include the same style sheet on every page in your website, and have your paragraphs styled the same way throughout).
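Here is a minimal sketch of the three places a style can live, all targeting paragraphs. The file name site.css and its contents are assumptions made up for this example; the colors are only there to show which rule wins.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
  <title>Cascade demo</title>
  <!-- (1) External sheet. Assume the hypothetical site.css contains:
           p { color: blue; } -->
  <link rel="stylesheet" type="text/css" href="site.css" />
  <!-- (2) Head style block. Same selector, but it comes after the link,
           so it overrides the external rule. -->
  <style type="text/css">
    p { color: green; }
  </style>
</head>
<body>
  <!-- No inline style: the head rule applies, so this text is green -->
  <p>Styled by the head block.</p>
  <!-- (3) Inline style: overrides both rules above, so this text is red -->
  <p style="color: red;">Styled inline.</p>
</body>
</html>
```

If you moved the link tag below the style block, the external rule would come later and the first paragraph would turn blue instead, which is why the order in the head matters.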
Soon, even this proved to be inadequate, because people did begin cluttering up their code with inline styles, putting, for instance, left padding of 15 pixels on every paragraph tag on the page when they could have defined it once for all of them much more easily and quickly. So this idea--that the content and data you wish to present should be as far removed as possible from the styling and presentation applied to it--became one of the cornerstone principles of XHTML compliance. (When you run your code through a validator, it checks your page against whichever standard the page's DOCTYPE declares, most likely a flavor of XHTML.)
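The padding example above might look something like this before and after the cleanup. This is only a fragment with placeholder text, not a complete page.

```html
<!-- Before: the same inline style repeated on every paragraph -->
<p style="padding-left: 15px;">First paragraph.</p>
<p style="padding-left: 15px;">Second paragraph.</p>

<!-- After: one rule, placed in the page's head (or in an external sheet),
     covers every paragraph at once -->
<style type="text/css">
  p { padding-left: 15px; }
</style>

<p>First paragraph.</p>
<p>Second paragraph.</p>
```

Changing the padding later means editing one number in one place instead of hunting down every paragraph.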
XHTML is an intermediary stage between HTML (which you already know about, rather free-wheeling and footloose, fancy-free, etc.) and XML: it is essentially HTML rewritten to obey XML's stricter syntax rules. XML takes the stylesheet idea a step further and says that the content file, suffixed with .xml, should hold exclusively content and data, and that it relies on an external stylesheet document being applied to it in order to display in any meaningful way. XHTML is a step down the road to XML compliance.
One other thing about HTML: it can be a very forgiving language, sometimes frustratingly so. For instance, you can have valid HTML paragraphs that have an opening <p> tag, text, and then no closing </p> tag. This sort of leniency discourages good markup practices among coders and makes the language unnecessarily ambiguous. XHTML, in contrast, sets down the following rules (among other things):
- All tags must either be self-closed with a trailing slash (e.g. <img />) or have both opening and closing tags (<p>...</p>).
- All tag and attribute names must be all lowercase, and attribute values must be wrapped in quotes (double quotes by convention).
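To make those rules concrete, here is a minimal sketch of a page that should pass as XHTML 1.0 Strict. The DOCTYPE at the top is what tells the validator which standard to check against; the title, the text, and the photo.jpg file name are placeholders I made up for illustration.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
  <!-- Empty elements such as <meta> are self-closed with a trailing slash -->
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  <!-- A <title> is required inside <head> -->
  <title>My valid page</title>
</head>
<body>
  <!-- Every <p> has a matching </p>; lenient HTML would let you omit it -->
  <p>First paragraph.</p>
  <!-- Lowercase names, quoted values, a required alt attribute, and a
       self-closed <img /> wrapped in a block-level element -->
  <p><img src="photo.jpg" alt="A placeholder photo" /></p>
</body>
</html>
```

If you leave out the DOCTYPE, the validator has to guess which standard you meant, so it is worth getting that line in first.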
One of the things greater standardization brings is greater accessibility. There are now screen readers and other specialized browsers for people who are blind or have low vision; I'm sure you can imagine how much easier this software is to build and use when the language it expects is regimented, standardized, and well-defined. Accessibility standards have been codified in things like Bobby compliance and Section 508 compliance. These standards also pose a problem for modern-day web applications: they were written before such applications were really popular, and the mixing of technologies that modern web applications require seems to move further away from accessibility rather than towards it. But that's a story for another day.
So, to review: Why do we try to write compliant code?
- It makes our code less cluttered and easier to read.
- It makes our code easier to maintain, since we can make higher-level changes more quickly.
- Since many of the non-compliant aspects of our code are being deprecated or will be some day, it's best to plan for the future now.
- It helps enforce best practices in programming and markup, by standardizing and regimenting code.
- It fosters accessibility for people who otherwise might not have a chance to use the internet.