Archive for the ‘Business’ Category



by Kaj Kandler

If you run a website of some mild success, then you have come across so-called “scraper” sites. A scraper site copies content from RSS feeds, and potentially from the web pages of a site, and re-publishes it as its own content. Tonight I read a blog post about “benign scraper sites” by AK John.

Scraper sites hope to attract visitors who then click on advertisements and so make money for their owners. If they are combined with Search Engine Optimization, they can outrank the original. Scraper sites are certainly a violation of copyright. John thinks that even benign scrapers, those that link back to the original source, are harmful duplication of content that clogs the arteries of the Internet.

When I also read John's recent post on Google’s ambitions with “AuthorRank and the rel=author verification”, it became clear to me that Google can, and probably will, use the author verification of content to know which site has the original content and which site has the copy, because the Google+ author profile will point back only to the original site.

So to outrun the scraper sites I will claim authorship for my content.

Here is the question for my readers: will Google be able to detect if the scraper site sets up fake Google+ profiles and modifies the author links? Does Google have a way to detect who published first?



by Kaj Kandler

Tonight I happened to read an article that made a claim about the website and its use of certain semantic web technology. I was curious how they employed the technology so I looked at one of their web pages for a random TV.

I was amused that even such a large retailer could make some simple mistakes. I found numerous places where invalid HTML was used, due to reserved characters in regular text. Proper HTML should use substitutes called entities. The error is triggered by a TV’s screen size being measured in inches, which is often expressed with the double quote sign ("). However, the double quote is a reserved character in HTML and so needs to be replaced by &quot; wherever it is used.
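A minimal Python sketch of that replacement, assuming the page is produced by a program that can escape each value before writing it into the markup. The function name `meta_tag` is made up for illustration; `html.escape` is part of Python's standard library.

```python
from html import escape


def meta_tag(name: str, content: str) -> str:
    """Build a <meta> tag whose attribute values are escaped."""
    # quote=True also replaces the double quote (") with &quot;,
    # so an inch mark cannot end the attribute value early.
    return '<meta name="{}" content="{}" />'.format(
        escape(name, quote=True), escape(content, quote=True)
    )


print(meta_tag("description", 'DYNEX 42" Class / LED / 1080p / 60Hz / HDTV'))
```

Escaping at the point where the text is written into the page means the product name itself can keep its plain inch mark wherever it is stored.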

Here are a few examples from

<meta name="keywords" content="DYNEX, 42" Class / LED / 1080p / 60Hz / HDTV, DX-42E250A12, 30"+ Televisions, Televisions" />
<meta name="description" content="DYNEX 42" Class / LED / 1080p / 60Hz / HDTV: 2 HDMI inputs; 1080p resolution; 160-degree horizontal and vertical viewing angles" />

<li class="property included-item">Dynex&#153; 42" Class / LED / 1080p / 60Hz / HDTV</li>

It's funny that the page encodes one special character properly (the trademark symbol as &#153;), but not the other. But then in other places it messes up the trademark symbol and encodes the double quote correctly.

<meta content="Dynexâ„¢ 42&quot; Class / LED / 1080p / 60Hz / HDTV" itemprop="name"/>

As it happens this error is in the area of code I was interested in. And yes, in one place both are correct.

Dynex&#153; - 42&#34; Class / LED / 1080p / 60Hz / HDTV - DX-42E250A12</title>

If you read the source code, it is peppered with things like tracking codes and semantic web data to make it attractive for search engines and other programs that analyze code automatically. I think these encoding mistakes undermine those efforts to a certain degree.

For that reason I check most of my pages with an HTML syntax validator. I do not correct every mistake, because most browsers handle some of them just fine (including this one, except for the third example). However, every browser, and every other program that reads HTML, such as a search engine crawler, differs in its ability to handle invalid code. So I try to take as few chances as possible.
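To see why the unescaped quote matters to a program, here is a small sketch that reads a shortened version of the description tag with Python's standard `html.parser`. This is only one parser among many, and the class and function names are my own; a crawler may recover differently.

```python
from html.parser import HTMLParser


class MetaContent(HTMLParser):
    """Remember the content attribute of the first <meta> tag seen."""

    def __init__(self) -> None:
        super().__init__()
        self.content = None

    def handle_starttag(self, tag, attrs):
        if tag == "meta" and self.content is None:
            self.content = dict(attrs).get("content")


def read_content(markup: str):
    """Return what the parser believes the content attribute holds."""
    parser = MetaContent()
    parser.feed(markup)
    parser.close()
    return parser.content


intended = 'DYNEX 42" Class / LED / 1080p / 60Hz / HDTV'
broken = '<meta name="description" content="DYNEX 42" Class / LED / 1080p / 60Hz / HDTV" />'
fixed = '<meta name="description" content="DYNEX 42&quot; Class / LED / 1080p / 60Hz / HDTV" />'

# Compare what the parser recovered with the text the author meant to publish.
print("broken tag matches intended text:", read_content(broken) == intended)
print("fixed tag matches intended text:", read_content(fixed) == intended)
```

With the escaped tag the parser gets back the full description. With the original tag it does not, because the value ends at the stray inch mark.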



by Kaj Kandler

Can 900,000+ users a week be wrong? It appears that nearly a million people download since the release of 2.3. Mark Herring, Senior Director, Marketing, StarOffice/ at Sun Microsystems Inc. reports in detail about the uptick in weekly downloads triggered by the latest release and the publicity of the OOoCon 2007 in Barcelona.

While the numbers are impressive, I think Mark’s speculation about the cost of a regular marketing campaign to reach the same results is excessive. I think it is safe to assume that the majority of extra downloads are upgrades by existing users. If this were a commercial product, one would not need to buy millions of e-mail addresses to reach the existing users. In a traditional proprietary software model, users register their software and with that allow the company to inform them of new releases. So there is no cost of 10c per e-mail to reach the existing user base. And some proprietary products even get their users to download automatically whatever they throw at them. I see this comparison as a bit shaky.



by Kaj Kandler

… unless they can get it for free.

A marketing study at the University of Arizona asks what makes students pay for office suite software, and whether free open source alternatives like Open Office are an alternative to pirated copies of the market-leading MS Office.

The research looked at how much students would be willing to pay for a legal copy when faced with one of two consequences. It turns out that $98 is the median price students were willing to pay to own a legal license. Registration was slightly more effective than announcing, in every document that is produced and shared with others, that the software is not registered.

Interestingly, a group of students that was informed about the free open source alternative Open Office was no less inclined to pay for the MS Office suite. The researchers conclude that the stability of the product and the longevity of its maker are more important than the price. Another important factor is the convenience of using an application that is already familiar and does not come with the pain of re-training.

* The article cited mentions in the introduction: “Microsoft Office suite claims an impressive 95 percent market share.” Benjamin Horst, an Open Office advocate from NY, pointed out in a discussion about this article that market share numbers are often misleading in the context of free software, because market sizes are measured in annual revenue spent on a particular product. Free products do not generate any revenue, so the basis for comparison is off. By Horst’s estimation, Microsoft claims 400 Million Office installations, and Open Office claims 100 Million. Ignoring the rest of the competition, he estimates a 20% market share for Open Office.
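The arithmetic behind that 20% can be written out in a few lines of Python. This is only a sketch: it counts installations instead of revenue, takes the 100 million figure as the Open Office count, and ignores every other office suite, as the estimate does. The function name `install_share` is made up for illustration.

```python
def install_share(own_installs: float, other_installs: float) -> float:
    """Share of installations when only two products are counted."""
    return own_installs / (own_installs + other_installs)


# Figures from Horst's estimate, in millions of installations.
share = install_share(100, 400)
print(f"{share:.0%}")
```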



by Kaj Kandler

As an increasing number of companies and institutions migrate to Linux and, interoperability becomes more and more important. The world is still geared towards Microsoft’s document formats and that poses barriers to migration, one of which is fonts and their influence on how documents print and break into pages.

The leading Linux distributions in the enterprise space, Red Hat and SuSE, delivered some new fonts that are metrically identical to the widely used Microsoft fonts. What does this mean for you? You can receive an MS Office document, use the equivalent font and print it without fear of it breaking into a different number of pages. It also means you do not need to update the table of contents because of re-pagination. Of course the same is true in the opposite direction, from an ODF to an MS Office document.

Use Plan-B for to learn more about how to configure Writer for optimal MS document compatibility.



by Kaj Kandler

Sun and the community reached an agreement with Pentaho to integrate business intelligence features into the next release. Pentaho has recently integrated the formerly separate open source projects JFreeReport, Mondrian, Kettle, and Weka into a powerful business intelligence server complete with reporting, analysis and OLAP capabilities.

The project offers a J2EE-compliant reporting server that can connect to many data sources and integrates workflow to create and distribute important report information to the authorized people in an enterprise. The project also offers a powerful report designer based on Eclipse and is modular, so it can be integrated into other applications.

Apparently, Sun has decided it will build a Report designer of its own that defines reports in Pentaho’s formats. These reports will draw data from the Pentaho business intelligence server as well as from other sources.

If you want to see an example of what the integration of OLAP features into an Excel spreadsheet could look like, watch the demos of Jedox Palo Server, a repository and OLAP server for Excel spreadsheets. These demos cover a specific case of OLAP and spreadsheet integration, which I think is one possible use of the Business Intelligence integration project. However, they make the abstract term business intelligence more concrete. By the way, Palo announced at the beginning of the year that it seeks sponsors to build a spreadsheet server for OpenOffice Calc. The sponsors' role is to help cover the cost of open source development and to become first users.



by Kaj Kandler

I’m going off topic for this post about PayPal.

I just completed a change of e-mail address at Phil Taylor’s website for Joomla components. Phil has an unusual security scheme, asking for any purchase date to verify my identity. Well, I bought his components 2-3 years ago and his page reminded me that I must have paid with PayPal. But I could hardly remember what day I bought the component.

So, I went to PayPal to see if I could still find the transaction in the archives. It turns out PayPal is doing a really good job here. The site gives you access to all past transactions, even those from years ago. Wow! That is much better than most banks, which cut you off after 3, 6, or 12 months.

It came as a shock to me how many transactions I have made through PayPal over time. I thought I used them only occasionally, but over the years it adds up.

Thanks, PayPal, you saved my day!



by Kaj Kandler

Last summer I went to the first BarCamp Boston. I had a great time there and did not want to miss BarCamp Boston 2 this past weekend.

BarCamp Boston 2 was held at MIT Stata Center, the famous building by architect Frank O. Gehry.

The rules for a BarCamp, an unconference of geeks, are simple. Every participant can chair a session or discussion, or give a lightning talk. The organizers have set aside a few appropriate meeting rooms and a schedule on a blackboard where one can read the program and add oneself to the offering. In addition, the organizers and sponsors provided us with food and refreshments.

The first session I attended was “JavaScript Encryption” by Alan Taylor. Alan presented a self-contained HTML document that included encrypted content which could only be revealed with the correct password. He calls his project Message Vault. His experience with making the application secure was very interesting. His biggest challenge was to embed an encrypted form of the password that was hard to decipher.

Next, I attended “Open/Collaborative/Green Mapping” by Jerrad Pierce. I had met Jerrad earlier in the hall where he presented his maps, and had talked him into presenting his experience with this project in a session. He has created a Green Map of Cambridge, as part of the GreenMaps initiative. He also wrote his thesis on the subject of a better index to points on the map. Jerrad had 45+ interested listeners and a lot of questions were asked. How did he get the data from public sources? What tools did he use? What other tools could he recommend, especially those that were available at no cost?

Amanda Watlington presented before the afternoon break about “Video – How to Make It Found in Search Engines”. She stressed that video and audio files are becoming more important to search as people increasingly use the web to consume media. So she told webmasters that it is important to annotate the media assets with internal and external keyword tags and, if possible, to write a transcript of the media and post it on a page that contains the file. In addition, she recommended submitting the media file to specialty search engines, in order to make it available to the searching public.

My last session for the day was “Financing your Startup” by David Kaufman. It wasn’t all new, but certainly a comprehensive overview of how to finance your startup. I took away the following tidbits of wisdom: “Revenue or advanced financing by your (future) customers is the best way to survive the first phase” and “VC financing is only appropriate if you can show a very fast adoption curve and a large market.” Typically VCs want to invest X Millions and have that returned tenfold within 3 to 5 years. If your business model does not show a plausible case for this kind of development, do not spend (waste) your time talking to VCs. In addition, think about to whom the VC could eventually sell his share in the company. It helps to know who would be a potential buyer, especially as the default exit strategy of an Initial Public Offering (IPO) is not as available as it used to be.

Unfortunately, I was not able to attend the socializing in the evening, as I had prior commitments.



by Kaj Kandler

I just received a nice New Year’s surprise gift (I’m still struggling with writing ’07 dates).

Michael Katz, a local e-newsletter marketing consultant, just published the recording of his audio seminar – “FINDING (AND USING) YOUR AUTHENTIC VOICE”. He usually sells his recordings, but decided to make it a free download this month.

Michael has a really authentic voice in e-newsletter marketing and he teaches his clients to be themselves in their writing. He has started a series called “Coffee with Michael”. This month he invited Lissa Bergin-Boles, a life coach from Toronto. I trust him to know something about this topic.

If you have 20 minutes and are interested in finding your authentic voice and using it in newsletters, articles or blogs, listen in.



by Kaj Kandler

Jedox, the company behind the Palo Spreadsheet server, has started to seek sponsors for supporting Calc. In an interesting marriage of open source and commercial project sponsorship, they have found pledges from an Australian winery and some German engineering firms. However, at this point the tally stands at 6500 Euro, which is not much for a medium-sized software project.

The idea behind this effort is to store spreadsheet data on a server and offer OLAP capability to create sophisticated reports that can be aggregated along many dimensions, such as sales data by month, quarter, year, sales person, region, customer size, promotional costs, support costs or any combination of these. This kind of application is geared towards enterprise customers who need analytical aggregation of data to support decision processes.
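To make “aggregated along many dimensions” a bit more concrete, here is a toy sketch in Python. It is not Palo's API or data model; the sample rows, the field names and the `roll_up` function are all invented for illustration.

```python
from collections import defaultdict

# Hypothetical sample rows; a real server would hold far more data.
SALES = [
    {"quarter": "Q1", "region": "North", "sales_person": "Ann", "amount": 1200},
    {"quarter": "Q1", "region": "South", "sales_person": "Bob", "amount": 800},
    {"quarter": "Q2", "region": "North", "sales_person": "Ann", "amount": 1500},
    {"quarter": "Q2", "region": "North", "sales_person": "Bob", "amount": 700},
]


def roll_up(rows, dimensions):
    """Sum the amount for every combination of the chosen dimensions."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row["amount"]
    return dict(totals)


print(roll_up(SALES, ["region"]))
print(roll_up(SALES, ["quarter", "sales_person"]))
```

The same rows answer different questions depending on which dimensions are chosen, which is the kind of slicing an OLAP server offers on a much larger scale.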

Palo server is an open source project and is currently only available for Microsoft Excel.