Archive for the ‘HTML’ Category

March

16

by Kaj Kandler

Tonight I happened to read an article that made a claim about the BestBuy.com website and its use of certain semantic web technology. I was curious how they employed the technology so I looked at one of their web pages for a random TV.

I was amused that even such a large retailer could make some simple mistakes. I found numerous places where invalid HTML was used because reserved characters appear in regular text. Proper HTML should use substitutes called entities. The error is triggered by a TV’s screen size being measured in inches, which is often expressed with the double quote sign ("). However, the double quote is a reserved character in HTML, where it delimits attribute values, and so needs to be replaced by &quot; wherever it is used.
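To illustrate the substitution, here is a minimal sketch in Python (my choice of language, nothing to do with how BestBuy.com builds its pages). The helper name `meta_tag` is made up for this example; `html.escape` is the standard library function that replaces `&`, `<`, `>` and, with `quote=True`, both quote characters with entities.

```python
from html import escape


def meta_tag(name: str, content: str) -> str:
    """Build a <meta> tag with reserved characters escaped in both attribute values.

    Hypothetical helper for illustration only.
    """
    # quote=True also converts " to &quot; and ' to &#x27;,
    # so the value cannot end the attribute early.
    safe_name = escape(name, quote=True)
    safe_content = escape(content, quote=True)
    return f'<meta name="{safe_name}" content="{safe_content}" />'


print(meta_tag("description", 'DYNEX 42" Class / LED / 1080p / 60Hz / HDTV'))
# prints: <meta name="description" content="DYNEX 42&quot; Class / LED / 1080p / 60Hz / HDTV" />
```

Most template engines can do this escaping automatically, which is usually safer than remembering to call it by hand.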

Here are a few examples from BestBuy.com

<meta name="keywords" content="DYNEX, 42" Class / LED / 1080p / 60Hz / HDTV, DX-42E250A12, 30"+ Televisions, Televisions" />
<meta name="description" content="DYNEX 42" Class / LED / 1080p / 60Hz / HDTV: 2 HDMI inputs; 1080p resolution; 160-degree horizontal and vertical viewing angles" />


<li class="property included-item">Dynex&#153; 42" Class / LED / 1080p / 60Hz / HDTV</li>

It’s funny that the page encodes one special character properly (the trademark symbol as &#153;), but not the other. But then in other places it messes up the trademark symbol and encodes the double quote correctly:

<meta content="Dynexâ„¢ 42&quot; Class / LED / 1080p / 60Hz / HDTV" itemprop="name"/>

As it happens this error is in the area of code I was interested in. And yes, in one place both are correct.

<title>
Dynex&#153; - 42&#34; Class / LED / 1080p / 60Hz / HDTV - DX-42E250A12</title>

If you read the source code, it is peppered with things like tracking codes and semantic web data to make it attractive for search engines and other programs that analyze code automatically. I think these encoding mistakes undermine those efforts to a certain degree.

For that reason I check all (well, most) of my pages with an HTML syntax validator. Not that I correct all mistakes, because most browsers can handle some of the mistakes just fine (including this one, except for the third example). However, every browser (and every other program reading HTML, such as a search engine crawler) differs in its ability to handle invalid code. So I try to take as few chances as possible.

June

05

by Kaj Kandler

According to Susan Lister, OpenOffice.org is a good tool to convert your PowerPoint presentations into good-looking web pages.

Susan is dissatisfied with “clunky ‘powerpoint to webpage’ slideshows” produced by MS PowerPoint, so she looked for a better solution and found it in OpenOffice.org Impress. She discovered:

These experiments showed that I can make a better web page set up using Open Office – my final website was a smaller file size as well as smaller in the amount of screen real estate. I liked the fact that html wizard gave me control over whether I wanted frames, show notes included and a title screen as well the size of the final presentation (640×480, 800×600, 1024×768).

Susan discovered that the HTML web pages created by Impress not only look better but are also smaller by a factor of 10.

June

02

by Kaj Kandler

I can’t believe what I just found on the Yahoo! Search Blog about removing pages from a website. The author says “The best way to remove dead URLs from the Yahoo! Search index is to return an HTTP Error 404 when our crawler requests the page.”

Are they serious, really serious?

The HTTP spec is clear about the difference: 404 “Not Found” gives no indication of whether the condition is temporary or permanent, while 410 “Gone” means the resource is permanently unavailable. The explanation for code 404 even says: “The 410 (Gone) status code SHOULD be used if the server knows, through some internally configurable mechanism, that an old resource is permanently unavailable and has no forwarding address.”

Yahoo! Slurp is free to treat a 404 page as if it were removed, although I don’t think that serves the searching public well. However, I can’t understand why the Yahoo! Search blog teaches webmasters to send a 404 when a 410 return code is appropriate.
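For what it is worth, telling the two cases apart on the server is not hard. Below is a rough sketch using Python's standard `http.server` module. The page tables `LIVE_PAGES` and `REMOVED_PAGES`, the paths in them, and the function name `status_for` are all invented for this example; a real site would look this up in its own configuration.

```python
from http import HTTPStatus
from http.server import BaseHTTPRequestHandler

# Hypothetical site data, for illustration only.
LIVE_PAGES = {"/": b"<p>Home</p>"}
REMOVED_PAGES = {"/old-product.html"}  # deleted on purpose, with no forwarding address


def status_for(path: str) -> HTTPStatus:
    """Pick the status code for a requested path."""
    if path in LIVE_PAGES:
        return HTTPStatus.OK
    if path in REMOVED_PAGES:
        # The server knows this page is permanently unavailable: 410 Gone.
        return HTTPStatus.GONE
    # Nothing is known about this path: 404 Not Found.
    return HTTPStatus.NOT_FOUND


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        status = status_for(self.path)
        if status is not HTTPStatus.OK:
            self.send_error(status)
            return
        body = LIVE_PAGES[self.path]
        self.send_response(status)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

To try it locally, pass `Handler` to `http.server.HTTPServer(("localhost", 8000), Handler)` and call `serve_forever()` on the result.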

I just needed to rant about this, because that blog surely has a wide readership.