Archive for July 17th, 2006
We wanted to add a ‘search this site’ function to a client’s website
but did not have the time to study the 200+ existing ways of doing
this. An obvious candidate was the
“Microsoft Indexing Service” (or “Index Server”, IS), which fits well
with the software running the existing site (IIS) and can easily be
extended to search within MS Office and PDF documents.
But there is a problem with using IS for this: IS can only index files
on a local or remote file system; it does not crawl a website.
In our case that is not good enough because the content lives in a
database, and we have to follow links like http://mysite.com?page=42.
Moreover, we wanted to make sure that exactly the content exported
through HTTP is indexed, no more, no less.
The solution we came up with works like this:
- Use a standard webcrawler to download a
copy of the site through HTTP and store it on the local filesystem of
the server.
- Use Indexing Service to index the local copy of the site.
- Use a small hashtable for mapping the filenames returned by a
query back into URLs.
This cleanly separates the webcrawl and the indexing, and the search is
entirely ignorant about the (possibly heterogeneous and complicated)
software architecture of the site.
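The filename-to-URL mapping in the last step could look roughly like the sketch below. It is only an illustration, not the code from our prototype: the hashed naming scheme, the `UrlMap` class and the `C:\crawl` directory are all assumptions made up for the example.

```python
import hashlib
from pathlib import PureWindowsPath


def local_filename(url):
    """Derive a filesystem-safe filename from a URL.

    Hypothetical naming scheme: hashing avoids characters such as
    '?' and '=' that are awkward or illegal in filenames.
    """
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()
    return digest + ".html"


class UrlMap:
    """Small hashtable mapping crawled filenames back to their URLs."""

    def __init__(self):
        self._url_by_file = {}

    def register(self, url):
        """Record a crawled URL and return the filename to store it under."""
        name = local_filename(url)
        self._url_by_file[name] = url
        return name

    def url_for(self, path):
        """Translate a path returned by the indexer into the original URL.

        Only the final path component is used, so the lookup does not
        depend on where the crawl directory lives. Returns None for
        files that were never registered.
        """
        name = PureWindowsPath(path).name
        return self._url_by_file.get(name)


if __name__ == "__main__":
    urls = UrlMap()
    stored_as = urls.register("http://mysite.com?page=42")
    # Pretend the indexer returned this path as a search hit.
    hit = "C:\\crawl\\" + stored_as
    print(urls.url_for(hit))
```

The crawler would call `register` for each page it saves, and the search page would call `url_for` on each hit before showing it to the user.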
So far it is just a prototype, but it seems to work fine.
July 17th, 2006
sebastian
It’s the usual story: there’ve been other demands on my time (projects
here at LShift, among many other things), and so the release of Icing
has suffered.
The good news is that I’ve been given some time to package it up and
make it available, and barring unexpected interruptions, I ought to
have something presentable by the end of the week.
July 17th, 2006
tonyg
Projects like Xen, and the new hardware extensions to CPUs from Intel and AMD, allow multiple OSes to run on the same machine at the same time, but there are currently few cases where I need this. I work under Linux, and all I need from virtualisation is to run multiple Linuxes at the same time. Also, virtualisation at the level of Xen requires that you set aside hard disc space and RAM for each running OS instance: the instances don’t share resources very well.
Linux VServer is virtualisation at a different level: there is only ever one Linux kernel running, but a chroot-on-steroids-like system ensures that you can start up multiple instances of Linux without them interfering with each other in any way. Because only one kernel is running, the instances share resources such as RAM and hard disc partitions much more effectively. Once you have some vservers up and running, they can easily be cloned, moved between machines, started and stopped. You can do private networking between your vservers, and you can even get X up and running inside a vserver.
Once this is all up and running, it makes migration between different versions of a service very much easier. For example, last week I upgraded our Bugzilla installation. In the past I’ve tended to upgrade our main installation in place, which has been a bad idea in several cases. So this time, I copied our current installation onto a clean vserver and checked that it worked. I then cloned that vserver and performed the upgrade on the clone, fixing everything that broke. This meant that at all points I had a working copy of the original installation to refer to, and that I could bring the upgraded version up to at least the same level of functionality as the old version before rolling it out on top of our main installation. As a result, I knew all the “gotchas” of the upgrade before doing it on the main installation, and consequently it went very smoothly. Almost as important, now that the upgrade is complete I can quite happily delete the Bugzilla vservers, as they’re no longer needed. Because the vservers are totally separate, this is very easy (much easier than trying to uninstall packages and delete databases), and it means that if you use vservers, your main working environment is never polluted by the software you are working on.
It rather looks like I’ll be putting vserver on every machine I install from now on…
July 17th, 2006
matthew