by Kaj Kandler
Some quick analysis revealed that a typical traceroute to www.google.com from my home went through 12 hops across the Internet and then another 17 hops inside Google's network, and the latency jumped by 100 ms four hops into Google's network. I found that rather odd. However, when I'm on the corporate VPN, the number of hops inside Google's network shrinks to 6-8 and the latency is much lower.
I also noticed that www.google.com resolved to a different IP address when I used the corporate network. Measuring latency and running a traceroute from home to the IP address I got on the VPN showed similarly good results. So my configured DNS servers were to blame.
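This kind of check is easy to script: time the system's configured resolver and inspect which addresses it hands back, then compare the answers on and off the VPN. A minimal sketch using only the Python standard library (the hostname in the comment is just the example from this post):

```python
import socket
import time

def resolve_and_time(hostname):
    """Resolve a hostname via the system's configured DNS resolver.

    Returns the distinct IP addresses in the answer and the
    wall-clock resolution time in milliseconds.
    """
    start = time.perf_counter()
    infos = socket.getaddrinfo(hostname, 80, proto=socket.IPPROTO_TCP)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    ips = sorted({info[4][0] for info in infos})
    return ips, elapsed_ms

# Run once on the home connection and once on the VPN, e.g.:
# ips, ms = resolve_and_time("www.google.com")
# print(ips, f"{ms:.1f} ms")
```

If the two runs return different address sets, the resolver (not the route) is what changed.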
Back when Verizon started to break the DNS protocol in their servers, I had configured public DNS servers from Level 3, namely 4.2.2.1 and 4.2.2.2, as they had the best latency at the time. I had to reconsider that decision.
Armed with namebench, a free open source tool, I found the fastest DNS servers available for my connection. But it turned out that the IPs they returned for *.google.com were as bad as the previous ones. So I tested the two name servers that Verizon configures automatically, and they provide IP addresses with 20-40 ms ping times and much shorter traceroutes. I guess with multi-homing the Internet's architecture has fundamentally changed. That said, Verizon still uses an intentionally broken implementation of DNS, which does not return a failure if a request can't be resolved; instead it returns its own web server. I almost considered leaving it at that, as better performance seemed more important than a broken DNS. However, the usability of this "helpful" Verizon server is horrible: it redirects to its own URL, so if I make a typo I essentially have to retype my address, or edit the original request in my URL bar, to correct it.
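The hijacking behavior itself is easy to probe: a well-behaved resolver fails a lookup for a nonsense name (NXDOMAIN), while a "helpful" resolver answers anyway with the address of its own web server. A small sketch of that probe, again with the standard library (the probe name under the reserved .invalid TLD is just an example):

```python
import socket

def resolver_answers(hostname):
    """Return True if the configured resolver produces an answer
    for `hostname`, False if the lookup properly fails."""
    try:
        socket.gethostbyname(hostname)
        return True
    except socket.gaierror:
        return False

# Names under .invalid are reserved (RFC 2606) and must never resolve,
# so an answer here strongly suggests the resolver hijacks NXDOMAIN:
# print(resolver_answers("no-such-name-xyz123.invalid"))
```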
As a last resort, I tested Google's public DNS servers, 8.8.8.8 and 8.8.4.4. While Google's DNS servers do not answer as fast as my local Verizon servers, they are only marginally slower, and they deliver the proper IP addresses for the Google network without breaking the DNS protocol in the process.
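For reference, on a typical Unix-like system the switch amounts to pointing the resolver configuration at Google's servers (the exact steps vary by OS, and on a home network the router's DHCP settings are usually the better place to change this):

```
# /etc/resolv.conf
nameserver 8.8.8.8
nameserver 8.8.4.4
```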
by Kaj Kandler
If you run a website of even mild success, then you have come across so-called "scraper" sites. A scraper site copies content from RSS feeds, and potentially from the web pages of a site, and re-publishes it as its own content. Tonight I read a blog post about "benign scraper sites" by AK John.
Scraper sites hope to attract visitors who then click on advertisements and so make money for their owners. Combined with Search Engine Optimization, they can outrank the original. Scraper sites are certainly a violation of copyright. John thinks that even benign scrapers, those that link back to the original source, are harmful duplication of content that clogs the arteries of the Internet.
When I then read John's recent post on Google's ambitions with "AuthorRank and the rel=author verification", it became clear to me that Google can, and likely will, use the author verification of content to know which site has the original content and which site has the copy, because the Google+ author profile will point back only to the original site.
So to outrun the scraper sites I will claim authorship for my content.
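Claiming authorship this way involves two pointers: a rel=author link from the content to the Google+ profile, and a link from the profile's "Contributor to" section back to the site. The on-page half looks like this (the profile ID below is a made-up placeholder):

```
<a href="https://plus.google.com/112233445566778899000" rel="author">Kaj Kandler</a>
```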
Here is the question for my readers: will Google be able to detect it if a scraper site sets up fake Google+ profiles and modifies the author links? Does Google have a way to detect who published first?
by Kaj Kandler
Sun and the OpenOffice.org community have reached an agreement with Pentaho to integrate business intelligence features into the next release of OpenOffice.org. Pentaho has recently integrated the formerly separate open source projects JFreeReport, Mondrian, Kettle, and Weka into a powerful business intelligence server complete with reporting, analysis, and OLAP capabilities.
The project offers a J2EE-compliant reporting server that can connect to many data sources and integrates workflow to create and distribute important report information to the authorized people in an enterprise. The project also offers a powerful, Eclipse-based report designer that is modular, so it can be integrated into other applications.
Apparently, Sun has decided to build a report designer of its own that defines reports in Pentaho's formats. These reports will draw data from the Pentaho business intelligence server as well as from other sources.
If you want to see an example of how the integration of OLAP features into an Excel spreadsheet could look, watch the demos of Jedox Palo Server, a repository and OLAP server for Excel spreadsheets. These demos cover a specific case of OLAP and spreadsheet integration, which I think is one possible use of the OpenOffice.org business intelligence integration project. In any case, it makes the abstract term "business intelligence" more concrete. By the way, Palo announced at the beginning of the year that it is seeking sponsors to build a spreadsheet server for OpenOffice Calc. The sponsors' role is to help cover the cost of open source development and to become first users.
by Kaj Kandler
IDC and OpenOffice.org have launched a survey (now closed) to better understand the usage of OpenOffice.org. IDC is a leading IT market analysis firm. This survey will analyze who is using the OpenOffice.org suite and how.
IDC and OpenOffice.org will share the results of the survey with the public three months after its conclusion. I think the OpenOffice.org community will welcome the feedback and use it to define the future direction of the productivity suite.
To attract more participants, IDC is entering everybody into a raffle for 5 x $100 prizes. I'd encourage all my readers to take the survey right now (it's too late now).