Ted's Radio Weblog

Thursday, January 26, 2006

What HTML markup is used on the web?
Slashdot post: A Statistical Review of 1 Billion Web Pages. chrisd writes "As part of a recent examination of the most popular html authoring techniques, my colleague Ian Hickson parsed through a billion web pages from the Google repository to find out what are the most popular class names, elements, attributes, and related metadata. We decided that to publish this would be of significant utility to developers. It's also a fascinating look into how people create web pages. For instance one thing that surprised me was that the <title> is more popular than ..." "The graphs in the report require a browser with SVG and CSS support (like Firefox 1.5!). Enjoy!"
The study by Google has some interesting conclusions, like this one from the page on the body tag:
One conclusion one can draw from the spread of attributes used on the body element is that authors don't care about what the specifications say. Of these top twenty attributes, nine are completely invalid, and five have been deprecated for nearly eight years, half the lifetime of the Web so far.
Where does all this bad code come from? Are individual authors writing junk in Notepad and vim, or are large commercial sites using bad HTML, augmented with lots of JavaScript and CSS tricks, to get some cross-browser effect they can't achieve through the standards? A few answers are on their page on Editors, but this is mostly a survey that indicates there's a need for more study.
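For anyone curious how a tally like the one behind the body-element quote could be produced on a small scale, here is a rough sketch in Python. It is not the method used in the Google study. It simply counts the attribute names found on body tags in a handful of made-up sample pages, using the standard library's html.parser. The names BodyAttrCounter, tally_body_attrs, and sample_pages are mine, and the set of deprecated attributes is my reading of HTML 4.01, not the study's own classification.

```python
from collections import Counter
from html.parser import HTMLParser

# Presentational <body> attributes deprecated in HTML 4.01.
# Assumption: this list is for illustration, not taken from the study.
DEPRECATED_BODY_ATTRS = {"background", "bgcolor", "text", "link", "vlink", "alink"}


class BodyAttrCounter(HTMLParser):
    """Collects the attribute names seen on <body> start tags."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "body":
            # Count each attribute name at most once per tag.
            self.counts.update({name for name, _value in attrs})


def tally_body_attrs(pages):
    """Return a Counter of body attribute names across the given HTML strings."""
    total = Counter()
    for html in pages:
        parser = BodyAttrCounter()
        parser.feed(html)
        parser.close()
        total.update(parser.counts)
    return total


# Hypothetical sample pages, standing in for a real crawl.
sample_pages = [
    '<html><body bgcolor="#ffffff" text="#000000">Hi</body></html>',
    '<HTML><BODY BGCOLOR="white" onLoad="init()" leftmargin="0">Hi</BODY></HTML>',
    '<html><body class="home">Hi</body></html>',
]

counts = tally_body_attrs(sample_pages)
deprecated_seen = sorted(name for name in counts if name in DEPRECATED_BODY_ATTRS)

for name, n in counts.most_common():
    flag = " (deprecated in HTML 4.01)" if name in DEPRECATED_BODY_ATTRS else ""
    print(f"{name}: {n}{flag}")
```

Pointing something like this at the pages of a few large commercial sites versus a few hand-written personal pages would be one crude way to start on the question above, though a real answer needs a far bigger and better-chosen sample.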
9:19:27 AM