Tonight I happened to read an article that made a claim about the BestBuy.com website and its use of certain semantic web technology. I was curious how they employed the technology, so I looked at one of their web pages for a random TV.
I was amused that even such a large retailer could make some simple mistakes. I found numerous places where invalid HTML was used, caused by reserved characters appearing in regular text. Proper HTML should use substitutes called entities. The error is triggered by a TV's screen size being measured in inches, which is often expressed with the double quote sign ("). However, the double quote is a reserved character in HTML and so needs to be replaced by &quot; wherever it is used, and above all inside attribute values, which are themselves delimited by double quotes.
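To make the substitution concrete, here is a minimal sketch in Python of how a page template could escape such a value before writing it into an attribute. The `meta_tag` helper is a hypothetical name of my own, not anything BestBuy.com uses; `html.escape` is part of the Python standard library.

```python
from html import escape


def meta_tag(name: str, content: str) -> str:
    """Build a <meta> tag (hypothetical helper), escaping both attribute values."""
    # quote=True also replaces " with &quot; so the value cannot end the attribute early.
    return f'<meta name="{escape(name, quote=True)}" content="{escape(content, quote=True)}" />'


tag = meta_tag("description", 'DYNEX 42" Class / LED / 1080p / 60Hz / HDTV')
print(tag)
# <meta name="description" content="DYNEX 42&quot; Class / LED / 1080p / 60Hz / HDTV" />
```

The point is only that the escaping happens in one place, at the moment the text is put into the markup, so an inch mark in a product name can never close the attribute by accident.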
Here are a few examples from BestBuy.com:
<meta name="keywords" content="DYNEX, 42" Class / LED / 1080p / 60Hz / HDTV, DX-42E250A12, 30"+ Televisions, Televisions" />
<meta name="description" content="DYNEX 42" Class / LED / 1080p / 60Hz / HDTV: 2 HDMI inputs; 1080p resolution; 160-degree horizontal and vertical viewing angles" />
<li class="property included-item">Dynex&trade; 42" Class / LED / 1080p / 60Hz / HDTV</li>
It's funny that the page encodes one special character properly (the trademark symbol as &trade;), but not the other. But then in other places it messes up the trademark symbol and encodes the double quote correctly:
<meta content="Dynexâ„¢ 42&quot; Class / LED / 1080p / 60Hz / HDTV" itemprop="name"/>
As it happens this error is in the area of code I was interested in. And yes, in one place both are correct.
<title>Dynex&trade; - 42&quot; Class / LED / 1080p / 60Hz / HDTV - DX-42E250A12</title>
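As an aside on the garbled trademark symbol in the itemprop example: that three-character sequence is exactly what you get when the UTF-8 bytes of the trademark sign are decoded as Windows-1252. Whether that is what actually happened on the BestBuy.com page is an assumption on my part; the sketch below only shows that the pattern can be reproduced this way.

```python
# The trademark sign is U+2122; in UTF-8 it is the three bytes E2 84 A2.
original = "Dynex\u2122"

# Decoding those bytes with the wrong codec (Windows-1252) yields three
# separate characters instead of one.
garbled = original.encode("utf-8").decode("cp1252")
print(garbled)

# Reversing the wrong step recovers the original text.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired == original)
```

Writing the symbol as an entity sidesteps the problem, because the entity consists of plain ASCII characters that survive a mix-up between these two encodings.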
If you read the source code, it is peppered with things like tracking codes and semantic web data to make it attractive for search engines and other programs that analyze code automatically. I think these encoding mistakes undermine those efforts to a certain degree.
For that reason I check all (well, most) of my pages with an HTML syntax validator. Not that I correct every mistake, because most browsers handle some of them just fine (including this one, except for the third example). However, every browser (and every other program that reads HTML, such as a search engine crawler) differs in its ability to handle invalid code. So I try to take as few chances as possible.
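A real validator does far more than this, but the particular mistake discussed here can be spotted with a very crude heuristic. The sketch below is my own illustration, not how any actual validator works: it removes every well-formed name="value" pair from a tag and flags the tag if a double quote is still left over. The names `suspicious_tags`, `TAG` and `ATTR` are hypothetical.

```python
import re

# A tag: "<", a letter, then everything up to the next ">".
TAG = re.compile(r"<[A-Za-z][^>]*>")
# A well-formed, double-quoted attribute such as name="keywords".
ATTR = re.compile(r'[\w:-]+\s*=\s*"[^"]*"')


def suspicious_tags(html_text: str) -> list[str]:
    """Return tags that still contain a double quote after removing well-formed attributes."""
    flagged = []
    for tag in TAG.findall(html_text):
        leftover = ATTR.sub("", tag)
        if '"' in leftover:
            flagged.append(tag)
    return flagged


bad = '<meta name="description" content="DYNEX 42" Class / LED / 1080p / 60Hz / HDTV" />'
good = '<meta name="description" content="DYNEX 42&quot; Class / LED / 1080p / 60Hz / HDTV" />'

print(suspicious_tags(bad))   # the unescaped tag is flagged
print(suspicious_tags(good))  # the escaped tag is not
```

This is a heuristic only: it ignores single-quoted attributes, comments and scripts, and it can be fooled, so it is no replacement for running the page through a proper validator.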