Tuesday 21 April 2009

Search This Blog

The eagle-eyed amongst you may have spotted that I've added a search box to the blog (top right if you haven't spotted it yet).

I've got frustrated a few times recently when trying to find old posts so that I could link to them in new articles and so had started to think about adding a search feature. A request from a reader (thanks GB) provided the extra kick of motivation to finally get me to sort out a proper solution.

Now I could easily have added search to the blog by enabling the Blogger navbar, but I don't like the navbar for two reasons: it looks terrible and, more importantly, the search feature is weird.

The search provided by the navbar is limited to the content of each post (and I assume the title). It doesn't cover the labels or comments, which can make finding old posts more difficult than it should be. So I decided to try a different approach.

Google provide a rather cool custom search service. This allows you to build your own search engine by providing filters that select just a subset of Google's main web index. The simplest option for creating a custom search engine for a blog is to provide a single filter that selects everything. So for this blog I could have used www.dcs.shef.ac.uk/~mark/blog/* as the filter. Whilst this is easy, the downside is that you get a lot of repetition in the search results. Remember that each post appears on its own page as well as on the monthly archive page and the page for each of the labels it is tagged with.

To get around this problem I'm actually using three filters to select just the post pages. I need three filters as I've posted articles in three different years (2007, 2008 and 2009). So the first filter is www.dcs.shef.ac.uk/~mark/blog/2007/* and I'm sure you can guess at the other two.

Of course this solution isn't perfect either. Firstly, when we move into 2010 I will have to remember to add a new filter. Secondly, the whole of each post page is still being indexed, including the parts common to every page, which again can lead to repetitive search results. For example, only a few posts contain the word sugar (4 I think) but it appears on every page as it is in the blog description. Fortunately Google is quite good at filtering these useless results out as you can see here.
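
To make the difference between the single catch-all filter and the per-year filters a bit more concrete, here is a minimal Python sketch. It is only an illustration: Python's fnmatch wildcards stand in for whatever matching the custom search service really does, the 2008 and 2009 patterns are my guess at "the other two", and the three page URLs are made-up examples of a post page, a monthly archive page and a label page rather than real addresses.

```python
from fnmatch import fnmatchcase

# The single filter that selects everything on the blog.
CATCH_ALL = ["www.dcs.shef.ac.uk/~mark/blog/*"]

# One filter per year. Only the 2007 pattern is quoted in the post;
# the 2008 and 2009 patterns are assumed to follow the same shape.
YEAR_FILTERS = [
    "www.dcs.shef.ac.uk/~mark/blog/2007/*",
    "www.dcs.shef.ac.uk/~mark/blog/2008/*",
    "www.dcs.shef.ac.uk/~mark/blog/2009/*",
]

# Hypothetical URLs, invented purely for illustration.
PAGES = {
    "post": "www.dcs.shef.ac.uk/~mark/blog/2009/04/search-this-blog.html",
    "archive": "www.dcs.shef.ac.uk/~mark/blog/2009_04_01_archive.html",
    "label": "www.dcs.shef.ac.uk/~mark/blog/labels/search.html",
}


def is_selected(url, filters):
    """Return True if the URL matches at least one wildcard filter."""
    return any(fnmatchcase(url, pattern) for pattern in filters)


if __name__ == "__main__":
    for kind, url in PAGES.items():
        print(kind,
              "catch-all:", is_selected(url, CATCH_ALL),
              "per-year:", is_selected(url, YEAR_FILTERS))
```

Under these assumptions the catch-all filter selects all three kinds of page, whereas the per-year filters select only the post page, which is why the archive and label duplicates drop out of the results.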

There are quite a few options available for customizing the search engine so I may fine tune things later but for now at least I have a useful search tool for me and my readers. If you spot anything weird or have any suggestions then please leave me a comment.

If you have been annoyed at the way the standard navbar search works then I'd certainly have a play with the custom search service. Of course there are no limits on what pages you can include in the search. You could create a search engine which indexes multiple blogs or pages you frequently visit. One thing I have noticed though is that if you change any settings (or when you create the search engine) it can take five or ten minutes before the changes take effect, so don't be surprised if it doesn't work straight away -- I couldn't understand why I got no results for any search to start with but after about five minutes it started to work just as I had expected.
21 April 2009 at 17:39, Rob said...

Haven't looked into this sort of thing for a while but it sounds like you need to provide a sitemap so that you can give a clue which pages the actual content is on. Alternatively I seem to remember there are NOINDEX and NOFOLLOW HTTP (or meta equiv) headers which you could use to hide the indexing pages from Google and just show the content. Then you could just use the site search as first planned.

21 April 2009 at 17:46, Mark said...

Hmm, I could use a sitemap to get around the need for multiple filters, but I don't think that fixes the issue of the blog search not indexing the post labels or the fact that the navbar is ugly.

I was wondering if a sitemap would allow me to tell it about page markup in any way though so that it wouldn't look at the common page elements.

As I said I think it needs work but I still prefer it over enabling the navbar.

(Mark goes off to investigate sitemaps and the custom search developers documents...)

21 April 2009 at 21:50, GB said...

Thanks. Looks good - a bonus but, more importantly, I immediately found that for which I was looking. Great.
