If you run a website of some mild success, then you have come across so-called “scraper” sites. A scraper site copies content from RSS feeds, and potentially from the web pages of a site, and republishes it as its own content. Tonight I read a blog post about “benign scraper sites” by AK John.

Scraper sites hope to attract visitors who then click on advertisements and so make money for their owners. Combined with search engine optimization, they can even outrank the original. Scraper sites are certainly a violation of copyright. John thinks that even benign scrapers, those that link back to the original source, create harmful duplicate content that clogs the arteries of the Internet.

Then I also read John’s recent post on Google’s ambitions with “AuthorRank and the rel=author verification”, and it became clear to me that Google can, and probably will, use author verification to tell which site has the original content and which site has the copy, because the Google+ author profile points back only to the original site.

So, to outrun the scraper sites, I will claim authorship of my content.

Here is my question for readers: will Google be able to detect it if a scraper site sets up fake Google+ profiles and modifies the author links? Does Google have a way to detect who published first?



Tonight I happened to read an article that made a claim about the website and its use of certain semantic web technology. I was curious how they employed the technology, so I looked at one of their web pages for a random TV.

I was amused that even such a large retailer could make some simple mistakes. I found numerous places where the HTML was invalid because reserved characters were used without escaping. Proper HTML uses substitutes called entities for them. The error is triggered by a TV’s screen size being measured in inches, which is often expressed with the double quote sign ("). However, the double quote is a reserved character in HTML. Inside a double-quoted attribute value it ends the value, so there it must be replaced by &quot;, and escaping it wherever it is used is the safe habit.

Here are a few examples from

<meta name="keywords" content="DYNEX, 42" Class / LED / 1080p / 60Hz / HDTV, DX-42E250A12, 30"+ Televisions, Televisions" />
<meta name="description" content="DYNEX 42" Class / LED / 1080p / 60Hz / HDTV: 2 HDMI inputs; 1080p resolution; 160-degree horizontal and vertical viewing angles" />

<li class="property included-item">Dynex&#153; 42" Class / LED / 1080p / 60Hz / HDTV</li>
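To see why the meta tags above are the dangerous cases, here is a small Python sketch. It uses the standard library's `html.parser` as a stand-in for a crawler (real browsers and search engines use their own parsers) and a hypothetical helper `meta_content`. A bare " inside element text, as in the `<li>` line, is read as ordinary text. Inside a double-quoted attribute, the same character ends the value early. `html.escape` produces the &quot; form that keeps the whole string intact.

```python
from html import escape
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Record the attributes of every <meta> tag, as the parser sees them."""

    def __init__(self):
        super().__init__()
        self.metas = []

    def handle_starttag(self, tag, attrs):
        # Self-closing "<meta ... />" also lands here: the default
        # handle_startendtag() calls handle_starttag().
        if tag == "meta":
            self.metas.append(dict(attrs))


def meta_content(markup):
    """Return the content attribute of the first <meta> tag in markup."""
    parser = MetaCollector()
    parser.feed(markup)
    parser.close()
    return parser.metas[0].get("content")


title = 'DYNEX 42" Class / LED / 1080p / 60Hz / HDTV'

# Raw inch mark: the quoted attribute value stops right after "42".
broken = '<meta name="description" content="' + title + '" />'

# html.escape(quote=True) turns " into &quot; (and &, <, >, ' too).
fixed = '<meta name="description" content="' + escape(title, quote=True) + '" />'

print(meta_content(broken))  # DYNEX 42
print(meta_content(fixed))   # DYNEX 42" Class / LED / 1080p / 60Hz / HDTV
```

In the broken version, the words after the inch mark turn into stray attribute names ("class", "led", and so on) instead of description text.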

It’s funny that the page encodes one special character properly (the trademark symbol as &#153;) but not the other. But then, in other places, it messes up the trademark symbol and encodes the double quote correctly:

<meta content="Dynexâ„¢ 42&quot; Class / LED / 1080p / 60Hz / HDTV" itemprop="name"/>

As it happens, this error is in the area of code I was interested in. And yes, in one place both are correct:

<title>Dynex&#153; - 42&#34; Class / LED / 1080p / 60Hz / HDTV - DX-42E250A12</title>
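The garbled “Dynexâ„¢” in the itemprop example is typical mojibake. The trademark sign was stored as UTF-8 bytes, and those bytes were later read as Windows-1252. The Python sketch below reproduces that mix-up and shows the numeric references that avoid it. The page's actual publishing pipeline is unknown, so this is only the most likely explanation, not a diagnosis. As a side note, the portable references for ™ are &trade; or &#8482;. &#153; only works because browsers apply a legacy Windows-1252 mapping to it.

```python
# The trademark sign U+2122 is three bytes when encoded as UTF-8.
tm = "\u2122"
utf8_bytes = tm.encode("utf-8")
print(utf8_bytes)  # b'\xe2\x84\xa2'

# Decoding those bytes as Windows-1252 instead of UTF-8 produces mojibake:
# 0xE2 -> "â", 0x84 -> "„", 0xA2 -> "¢".
garbled = utf8_bytes.decode("cp1252")
print(garbled)  # â„¢

# Undoing the wrong decode recovers the original character.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired == tm)  # True

# Numeric character references sidestep the byte-encoding question entirely.
print("&#%d;" % ord(tm))   # &#8482;
print("&#%d;" % ord('"'))  # &#34;
```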

If you read the source code, you’ll see it is peppered with tracking codes and semantic web data meant to make the page attractive to search engines and other programs that analyze code automatically. I think these encoding mistakes undermine those efforts to a certain degree.

For that reason I check all (well, most) of my pages with an HTML syntax validator. Not that I correct every mistake, because most browsers handle some of them just fine (including this one, except for the third example). However, every browser (and every other program that reads HTML, such as search engine crawlers) differs in its ability to handle invalid code, so I try to take as few chances as possible.



Can 900,000+ users a week be wrong? It appears that nearly a million people a week have been downloading since the release of 2.3. Mark Herring, Senior Director, Marketing, StarOffice/ at Sun Microsystems Inc., reports in detail on the uptick in weekly downloads triggered by the latest release and the publicity around OOoCon 2007 in Barcelona.

While the numbers are impressive, I think Mark’s estimate of what a regular marketing campaign would cost to reach the same results is excessive. It is safe to assume that the majority of the extra downloads are upgrades by existing users. If this were a commercial product, one would not need to buy millions of e-mail addresses to reach them. In a traditional proprietary software model, users register their software and thereby allow the company to inform them of new releases. So there is no cost of 10c per e-mail to reach the existing user base. And some proprietary products even get their users to download automatically whatever they throw at them. I see this comparison as a bit shaky.
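To put the 10-cent figure in perspective, here is the arithmetic for a hypothetical list of one million addresses. The list size is an assumption for illustration and does not come from Mark's post:

```latex
10^{6}\ \text{e-mails} \times \$0.10 = \$100{,}000
```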



… unless they can get it for free.

A marketing study at the University of Arizona asks what makes students pay for office suite software, and whether free open source alternatives like Open Office are an alternative to pirated copies of the market-leading MS Office.

The research looked at how much students would be willing to pay for a legal copy when faced with two possible consequences of using an unlicensed one. It turns out that $98 is the median price students were willing to pay to own a legal license. A registration requirement was also slightly more effective than having every document they produce and share with others reveal that the software is not registered.

Interestingly, a group of students who were told about the free open source alternative Open Office were no less inclined to pay for the MS Office suite. The researchers conclude that the stability of the product and the longevity of its maker matter more than the price. Another important factor is the convenience of using an application that is already familiar and does not bring the pain of retraining.

* The article cited says in its introduction: “Microsoft Office suite claims an impressive 95 percent market share.” Benjamin Horst, an Open Office advocate from NY, pointed out in a discussion about this article that market share numbers are often misleading in the context of free software. That is because market sizes are measured in annual revenue spent on a particular product, and free products generate no revenue, so the basis for comparison is off. By Horst’s estimation, Microsoft claims 400 million Office installations, and claims 100 million. Ignoring the rest of the competition, he estimates a 20% market share for Open Office.
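Horst's 20% comes from counting installations instead of revenue. It uses only the two figures he gives and ignores all other competitors:

```latex
\text{share} = \frac{100\ \text{million}}{400\ \text{million} + 100\ \text{million}} = \frac{100}{500} = 20\%
```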



Last summer I went to the first BarCamp Boston. I had a great time there and did not want to miss BarCamp Boston 2 this past weekend.

BarCamp Boston 2 was held at MIT Stata Center, the famous building by architect Frank O. Gehry.

The rules for a BarCamp, an unconference of geeks, are simple. Every participant can chair a session or discussion, or give a lightning talk. The organizers set aside a few meeting rooms and put up a schedule on a blackboard, where one can read the program and add oneself to it. In addition, the organizers and sponsors provided food and refreshments.

The first session I attended was “JavaScript Encryption” by Alan Taylor. Alan presented a self-contained HTML document with encrypted content that could only be revealed with the correct password. He calls his project Message Vault. His account of making the application secure was very interesting. His biggest challenge was embedding an encrypted form of the password that was hard to decipher.
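The talk summary doesn't say how Message Vault handles its secret. One common technique is to embed only a random salt and a slow PBKDF2 hash of the password, never the password itself. The sketch below is in Python rather than the JavaScript Alan used, and the function names `make_verifier` and `check_password` and the iteration count are hypothetical. Because the whole page is in an attacker's hands, this only slows down offline guessing. In a real page, a key derived the same way would typically also decrypt the content, which this sketch leaves out.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

# Assumption: an illustrative work factor; real deployments tune this.
ITERATIONS = 100_000


def make_verifier(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Derive (salt, digest) to embed in the page in place of the password."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def check_password(candidate: str, salt: bytes, digest: bytes) -> bool:
    """Re-derive the digest for a guess and compare it in constant time."""
    _, attempt = make_verifier(candidate, salt)
    return hmac.compare_digest(attempt, digest)


salt, digest = make_verifier("open sesame")
print(check_password("open sesame", salt, digest))  # True
print(check_password("guess", salt, digest))        # False
```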

Next, I attended “Open/Collaborative/Green Mapping” by Jerrad Pierce. I had met Jerrad earlier in the hall, where he was showing his maps, and had talked him into presenting his experience with this project in a session. He has created a Green Map of Cambridge as part of the GreenMaps initiative. He also wrote his thesis on a better index for points on the map. Jerrad had 45+ interested listeners, and a lot of questions were asked. How did he get the data from public sources? What tools did he use? What other tools could he recommend, especially ones available at no cost?

Amanda Watlington presented “Video – How to Make It Found in Search Engines” before the afternoon break. She stressed that video and audio files become more important to search as people increasingly use the web to consume media. So she told webmasters to annotate their media assets with internal and external keyword tags and, if possible, to write a transcript of the media and post it on the page that contains the file. In addition, she recommended submitting media files to specialty search engines to make them available to the searching public.

My last session of the day was “Financing your Startup” by David Kaufman. It wasn’t all new, but it was certainly a comprehensive overview of how to finance a startup. I took away the following tidbits of wisdom: “Revenue or advance financing by your (future) customers is the best way to survive the first phase” and “VC financing is only appropriate if you can show a very fast adoption curve and a large market.” Typically VCs want to invest X million and have that returned tenfold within 3 to 5 years. If your business model does not make a plausible case for this kind of growth, do not spend (waste) your time talking to VCs. In addition, think about to whom the VC could eventually sell his share of the company. It helps to know who a potential buyer would be, especially as the default exit strategy of an Initial Public Offering (IPO) is not as available as it used to be.
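A tenfold return in 3 to 5 years implies a steep compound annual growth rate r. Solving the compound growth relation for the two ends of that range:

```latex
\begin{aligned}
(1+r)^n &= 10 \;\Longrightarrow\; r = 10^{1/n} - 1 \\
n = 5:\quad r &= 10^{1/5} - 1 \approx 0.585 \quad (\text{about } 58\%\ \text{per year}) \\
n = 3:\quad r &= 10^{1/3} - 1 \approx 1.154 \quad (\text{about } 115\%\ \text{per year})
\end{aligned}
```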

Unfortunately, I was not able to attend the socializing in the evening, as I had prior commitments.



I just received a nice New Year’s surprise gift (I’m still struggling with writing ’07 dates).

Michael Katz, a local e-newsletter marketing consultant, just published the recording of his audio seminar “FINDING (AND USING) YOUR AUTHENTIC VOICE”. He usually sells his recordings, but decided to make this one a free download this month.

Michael has a really authentic voice in e-newsletter marketing, and he teaches his clients to be themselves in their writing. He has started a series called “Coffee with Michael”. This month he invited Lissa Bergin-Boles, a life coach from Toronto. I trust him to know something about this topic.

If you have 20 minutes and are interested in finding your authentic voice and using it in newsletters, articles or blogs, listen in.



Some folks at Mozilla had an idea to show their love for Mozilla Firefox, the free and open source web browser that keeps gaining market share.

They discussed the idea with others at OSCON06 and found collaborators in the Oregon State University Linux User Group. The plan was to create a crop circle of the Firefox logo.

The execution is awesome. Congratulations on showing the creativity and stamina to make this beautiful work of art.



Steve Rubel from Micro Persuasion and Matt McAlister commented today on screencasting with advertising. They refer to Infoworld’s new series of screencasts, to which Infoworld now adds an advertisement trailer.

Just in case Infoworld intends to patent this one, I claim prior art since 2003.

See (wayback machine)

I did not do this for money, nor did I use arbitrary ads. I simply used the trailer to make the time required to load the screencast more entertaining and to benefit the sponsor (or buyer) of the screencast.



continues its line of competitive advertisements. The marketing project unveiled a new campaign that capitalizes on the market leader's delays in releasing Office 2007.

“Take a Test Drive – Keep the Car!” hints that you can test drive and keep for free. While Microsoft invited potential users to an online test drive of its beta 2 pre-release, in which saving and printing are disabled, aims at customers to download the latest release 2.0.3 and install the full version. The catch: the upcoming Office 2007 will pinch testers who like in the wallet once it is available. In contrast, is free and open source.

In addition, Office 2007 will have many user interface changes, which frighten many users because they will have to relearn their skills. Another point of criticism is that Microsoft does not support the new ISO 26300 office document standard and is still haggling with Adobe over support for PDF files.



Ten days ago I gave a talk about blogs and forums as business and marketing tools at the Network@TheLibrary in Winchester, MA. Today I discovered the “A-Z of Professional Blogging”, a list that answers many of the questions my audience had.

It covers all you wanted to know about blogging, from A like “AdSense”, the Google advertising program many bloggers use to defray some of their costs, to Z like “Zoundry”, a blog editor.

Don’t be afraid: it is not only for professional bloggers. The list especially helps those who want to learn a bit more about blogging or have already started.