Monday, July 6, 2009

Validators

Question: What are validators and what do you need to include in your code so that it will validate properly?

Answer: Validators are (usually) websites that let you upload your code or submit a URL, and then evaluate the content of your website against independent standards for HTML and CSS coding. The best validators I'm aware of are available through either W3Schools.com or W3C (http://www.w3.org). W3C (World Wide Web Consortium) is an independent group that has been working with browser companies, graphics companies like Adobe, etc. to define standards for HTML, CSS, XML, and many other internet-based technologies. After Microsoft's Internet Explorer defeated Netscape in Browser Wars I (circa 1995-2001), W3C was like a voice crying in the wilderness for independent standards: Microsoft controlled 90%+ of the browser market, so it effectively defined the standards and didn't really care what anyone else thought. Now that Microsoft faces off against the next generation of browsers (and is fighting to maintain dominance, at least according to W3Schools's numbers), they are learning to play a little more nicely with others, pushing to have IE8 be compliant with W3C HTML and CSS standards (we've already discussed the problems they have going down that road).

The next generation of browsers includes the following: (1) Mozilla Firefox (which is really just Netscape, retooled and beefed up), (2) Opera (which I've heard is actually one of the most standards-compliant browsers out there, but which has negligible market share), and (3) Google Chrome, which is the new kid on the block, but sure to begin throwing its weight around.

Now, as for getting your code to validate correctly... when the internet was first formed, it was basically designed with academia in mind: scholars from different schools across the country could use the internet to transmit their academic findings to one another. This is even apparent in the names of some of the oldest HTML tags (mostly deprecated, or slated for scheduled obsolescence). For instance, <em></em> (for emphasis, or italics), <u></u> (for underline), <b></b> (for bold), <address></address>, <blockquote></blockquote>, etc. There are lots of old tags that sound like they are much better used for presenting a research paper than they are for presenting streaming media, high-resolution graphics, or data-driven applications, and that's just the point: none of those things fundamentally fit into HTML's design. Layout and styling, Flash and graphical designs are at best hacks to what is still fundamentally a means for transmitting text meaningfully from person A to person B, with ample metadata (i.e. information that describes/categorizes/organizes certain pieces of your text) to keep everything neat and explicit.

As time went on though, the internet began to morph more into some of what we see today: people wanted their webpages to show pictures, play music, format their text to be aesthetically pleasing, and use programming languages to show when/how/if certain parts of their webpages would even be shown. This led to a couple interesting ideas, namely:

  1. The <table><tbody><tr><td></td>...</tr>...</tbody></table> tags to represent tabular data in an unambiguous row-and-column format;
  2. Layout tags like <h1>...<h6> and <center></center> (now deprecated); and
  3. Style tags like <font></font> (also deprecated).

But this soon proved to not go far enough, so they said, "We should be able to apply different styles or formatting to any tag." What they eventually developed are Cascading Style Sheets which, as we know them today, can live in three separate places: (1) externally, (2) in the page's head, and (3) in an individual tag. They're called cascading because, as a rule of thumb, the closer to the actual tag you get, the higher precedence the styling has--an external style sheet applied to, say, a paragraph, has lower priority than a head stylesheet, which has lower priority than putting your style right in the paragraph tag of your choosing. (In practice, selector specificity and the order of declarations also play a part.) While all this organization can get confusing, once you get the system down, it can really keep your code from getting cluttered, and allow you to make sweeping changes very quickly and easily (for instance, you could include the same stylesheet on every page in your website, and make your paragraphs style the same way throughout).
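The precedence rule described above can be sketched in a few lines of Python. This is only a simplified model under stated assumptions: the function name resolve_styles and the sample declarations are hypothetical, and real CSS also weighs selector specificity, source order, and !important, which the sketch ignores.

```python
# Simplified model of the precedence described above: external < head < inline.
# Real CSS also weighs selector specificity, source order, and !important,
# all of which this sketch deliberately ignores.
PRECEDENCE = ("external", "head", "inline")  # lowest to highest priority


def resolve_styles(declarations):
    """Return the winning value for each property.

    `declarations` is a list of (source, property, value) tuples, where
    source is one of the names in PRECEDENCE.
    """
    resolved = {}
    for source in PRECEDENCE:
        for decl_source, prop, value in declarations:
            if decl_source == source:
                resolved[prop] = value  # closer sources overwrite earlier ones
    return resolved


# Hypothetical declarations that all target the same paragraph.
paragraph_styles = resolve_styles([
    ("inline", "color", "red"),
    ("external", "color", "black"),
    ("external", "padding-left", "15px"),
    ("head", "color", "blue"),
])
print(paragraph_styles)
```

The inline color wins over the head and external ones, while the external padding survives because nothing closer to the tag overrides it.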

Soon, even this proved to be inadequate, because people did begin cluttering up their code with inline styles, putting, for instance, left padding of 15 pixels on every paragraph tag on the page when they could have defined it once for all much more easily and quickly. So this idea--that the content and data you wish to present should be as far removed as possible from the styling and presentation applied to it--became one of the cornerstone principles of XHTML compliance. (When you run your code through a validator, you're most likely checking it against XHTML compliance.)

XHTML is an intermediary stage between HTML (which you already know about, rather free-wheeling and footloose, fancy-free, etc.) and XML. XML takes the stylesheet idea a step further and says that the content page, with its .xml suffix, should present exclusively content and data, and that it cannot and should not display correctly without having an external stylesheet document applied to it. XHTML is a step down the road to XML compliance.

One other thing about HTML: it can be a very forgiving language, sometimes frustratingly so. For instance, you can have valid HTML paragraphs that have an opening <p> tag, text, and then no closing </p> tag. This sort of leniency discourages good markup practices among coders and makes the language unnecessarily ambiguous. XHTML, in contrast, sets down the following rules (among other things):

  1. All tags should either end in a slash (e.g. <img />) or have opening and closing tags (<p>...</p>).
  2. All attributes in a tag should have names that are all lowercase, and values that are wrapped in quotes (double quotes by convention).
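
The two rules above can be spot-checked with Python's standard XML parser. This is a rough sketch, not a real validator: it only tests XML well-formedness and lowercase attribute names, it does not check a document against an XHTML DTD, and the helper names and sample snippets are hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if `markup` parses as well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


def has_lowercase_attributes(markup):
    """Return True if every attribute name in well-formed `markup` is lowercase."""
    root = ET.fromstring(markup)
    return all(
        name == name.lower()
        for element in root.iter()
        for name in element.attrib
    )


# Hypothetical snippets, each wrapped in a single root element.
closed = '<div><p>Hello <img src="a.png" /></p></div>'
unclosed = '<div><p>Hello</div>'               # <p> is never closed
unquoted = '<div><img src=a.png /></div>'      # attribute value not quoted
uppercase = '<div><img SRC="a.png" /></div>'   # attribute name not lowercase

for snippet in (closed, unclosed, unquoted, uppercase):
    print(is_well_formed(snippet), snippet)
```

Note that the uppercase-attribute snippet is still well-formed XML; it only fails the separate lowercase check, which is why the sketch keeps the two tests apart.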

One of the things greater standardization brings is greater accessibility. There are now HTML parsers and browsers built for people who are blind or have low vision; I'm sure you can imagine how much easier this software is to use when the language it expects is regimented, standardized, and well-defined. These standards have been codified in things like Bobby compliance and Section 508 compliance. They also pose a problem for modern-day web applications, because they were written before such applications were really popular, and the mixing of technologies that modern web applications require seems to move further away from accessibility rather than towards it. But that's a story for another day.

So, to review: Why do we try to write compliant code?

  1. It makes our code less cluttered and easier to read.
  2. It makes our code easier to maintain, since we can make higher-level changes more quickly.
  3. Since many of the non-compliant aspects of our code are being deprecated or will be some day, it's best to plan for the future now.
  4. It helps enforce best practices in programming and markup, by standardizing and regimenting code.
  5. It fosters accessibility for people who otherwise might not have a chance to use the internet.

Plugins

Question: Do you know where I can find some stats on browser plugins? Most popular or usage?

Answer: Mozilla publishes the total number of downloads for extensions (per week or ever) on their extensions website: https://addons.mozilla.org/en-US/firefox/

Is that the kind of plugin that you mean, or do you mean Flash/Quicktime/etc. ?

Question: Flash, Quicktime, Java... the only place I have found any stats was on Adobe.com.... They said that FlashPlayer has reached 99% of internet enabled desktops in a mature market...

Answer: Analytics keeps track of Java and Flash usage (including version). Very little of my development experience is in Flash, and essentially none is in Quicktime or Java, so I am probably not the best for that kind of information. The new kid on the block is Microsoft's Silverlight, a supposed answer to Flash, which seems to me to be a day late and a dollar short.

Flash version can sometimes be very important, since Flash is becoming more of an application platform, and the functionality that makes your app go might not be available in the version your users have. That's why a free product like SWFObject (http://code.google.com/p/swfobject/), a drop-in JavaScript library for using Flash on your site, not only allows all kinds of different options for how your Flash is displayed, but also lets you prompt your users to upgrade their Flash before continuing, even down to minor version and revision (i.e. you have version x.y.0 but this page requires version x.y.z).
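
The "you have x.y.0 but this page requires x.y.z" comparison can be sketched as follows. This is plain Python illustrating the comparison logic only; it is not SWFObject's API, and the function names and version numbers are hypothetical.

```python
def parse_version(text):
    """Turn a dotted version string such as '9.0.115' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def needs_upgrade(installed, required):
    """Return True if the installed version is older than the required one.

    Tuples compare element by element, so major, minor, and revision are
    checked in that order.
    """
    return parse_version(installed) < parse_version(required)


# Hypothetical version numbers, for illustration only.
print(needs_upgrade("9.0.0", "9.0.115"))    # same major and minor, older revision
print(needs_upgrade("10.0.0", "9.0.115"))   # newer major version
```

Comparing integer tuples rather than raw strings matters here: as strings, "10.0.0" would sort before "9.0.115".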

Browser statistics, cross-browser compatibility

Question: Do you ever look up browser usage statistics? If you do where do you get them from and how reliable do you think they are? Why is it important to view your web pages in different browsers? And do web pages look different on different operating systems?

UPDATE: Bear in mind that the W3Schools statistics are not based on the Internet population in general, but the population that frequents W3Schools.com's website. They will tend to have more diverse (and advanced) browser preferences.

Answer: For the internet in general, I look at W3Schools.com's statistics: Browser: http://w3schools.com/browsers/browsers_stats.asp, Operating System: http://w3schools.com/browsers/browsers_os.asp and Screen size: http://w3schools.com/browsers/browsers_display.asp. There's a lot of additional, very helpful information in the browser tutorial in particular, and on the entire W3Schools site in general. Wikipedia also usually offers great histories of the browsers.

As to reliability, I think these stats are reliable with a couple of caveats. My perception would be that the Blair County area is slightly behind the average in tendency to upgrade browser versions and screen sizes, and tends to be more uniform in its browser decisions. More specifically, users there tend to have smaller, lower-resolution screens and older browser versions, and tend to prefer Internet Explorer to other browser options like Firefox, Opera and Google Chrome. (Chrome was released a week or two ago, and is Google's first foray into the browser market. Check out their very technical comic book here: http://www.google.com/googlebooks/chrome/.) I also perceive that State College's browsing tastes would probably be at or above average; we have a client in State College whose site is going live in a month or two. In six months we'll have some good data on exactly what their visitors are like.

Internet Explorer 8 was also released in beta mode recently. We found it pretty humorous that sites like ibm.com, adobe.com and even microsoft.com don't display correctly in it. It makes you wonder why Microsoft even released it before looking into these issues. IE8 was supposed to conform strictly to W3C standards, but so far they seem to have had little success.

Getting a sense of what the browsing tastes of a specific audience are like is a function of several factors, like geographic region, perceived age range of target audience, perceived professional status, and industry. For instance, one of our clients has been tracking their site statistics for 5 months now, and the results are overwhelmingly in favor of Internet Explorer, with IE6 and 7 fighting it out.
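
If you have raw user-agent strings from a site's logs, a per-site browser breakdown can be approximated with a small tally like the one below. This is a minimal sketch under stated assumptions: the marker list, the function names, and the sample user-agent strings are illustrative, and real analytics packages use far more thorough detection.

```python
from collections import Counter

# Checked in order. Chrome's user-agent string also mentions Safari and
# Gecko, so the more specific markers have to come first.
BROWSER_MARKERS = (
    ("Opera", "Opera"),
    ("Chrome", "Chrome"),
    ("Firefox", "Firefox"),
    ("MSIE", "Internet Explorer"),
)


def classify(user_agent):
    """Map a raw user-agent string to a browser family name."""
    for marker, name in BROWSER_MARKERS:
        if marker in user_agent:
            return name
    return "Other"


def browser_share(user_agents):
    """Return each browser family's share of visits as a percentage."""
    counts = Counter(classify(ua) for ua in user_agents)
    total = sum(counts.values())
    return {name: 100.0 * count / total for name, count in counts.items()}


# Illustrative sample, not real traffic data.
sample_visits = [
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
    "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)",
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.1) Gecko/2008070208 Firefox/3.0.1",
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.2.149.27 Safari/525.13",
]
print(browser_share(sample_visits))
```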

Cross-browser compatibility is becoming an ever more important issue as Firefox takes an ever-greater piece of the browser market share. When FF was 5% of the population, you could mostly design for IE and not worry about anyone else, but now that FF is over 50% of the market in some regions of the economy, the game has definitely changed. Groups like W3C are trying to establish best practices standards that they hope all browser producers will adhere to.

Note that the cross-browser issue is not just an important one for how the page looks, but also for how it functions. Technologies like JavaScript are putting more and more intelligence in code that is directly downloaded onto the browser, and so naturally there are some differences between how Microsoft chooses to render it and how Mozilla (FF's creators) chooses. There are also attempts by JavaScript framework developers, like jQuery and YUI (developed and maintained by Yahoo!), to create cross-browser functional solutions so that the page functions as expected on every browser. You'll notice, if you read the Chrome comic, that Google has taken an entirely new, out-of-the-box approach to handling browser processes, and while this presents new opportunities, it will also create new challenges for developers as they try to get everything to work correctly in everyone's browser of choice.

As to web pages looking different on different operating systems, the area where this is most prevalent is in character sets, in my experience anyway. The original character set that most computers have used for decades is called ASCII, which basically comprises the US letters and numbers, punctuation, and control characters; 8-bit extensions of it added things like accented letters, borders, and arrows. These only required a single 8-bit byte per character (ASCII proper needs just 7 bits). Over the years though, as the international community has become increasingly connected to the internet, foreign character sets have required more storage for more characters: 8, 16, or even 32 bits' worth. The group most actively setting standards for this technology is the Unicode Consortium. Windows has used Unicode internally since the NT line (which includes Windows 2000), and Unix systems have supported it for years as well, I think. Most of the operating systems discrepancies I have seen have dealt with differences in rendering characters, or an inability to print characters created in one operating system on another one.
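
To make the storage point concrete, here is a small Python sketch that reports how many bytes a character needs in a few common Unicode encodings, and whether it fits in plain ASCII. The helper names and sample characters are only illustrative.

```python
def encoded_sizes(character):
    """Return the number of bytes `character` needs in several encodings.

    The little-endian variants are used so that no byte-order mark is
    counted in the totals.
    """
    return {
        "utf-8": len(character.encode("utf-8")),
        "utf-16": len(character.encode("utf-16-le")),
        "utf-32": len(character.encode("utf-32-le")),
    }


def fits_in_ascii(text):
    """Return True if every character in `text` is plain ASCII."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


# A Latin letter, an accented letter, the euro sign, and a musical G clef.
for ch in ("A", "\u00e9", "\u20ac", "\U0001D11E"):
    print(repr(ch), encoded_sizes(ch), fits_in_ascii(ch))
```

Only the first character fits in ASCII; the others are exactly the kind of character that one system may store or render differently from another.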