October 21, 2010

Search Solutions 2010 (BCS-IRSG)

by Tyler Tate

Today I was fortunate enough to attend the Search Solutions conference in London put on by the Information Retrieval Specialist Group of the British Computer Society. Here are my notes from each talk of the day.

Behshad Behzadi on Web Search Freshness

When searching the web, it’s crucial to strike a balance between new results and highly relevant results.

Sometimes users express a desire for fresh results in their query, such as “latest news chilean miners.” In this case, the user’s desire for recent results is obvious.

Other times, users don’t explicitly ask for freshness in the query, but it can be inferred if that query is spiking in popularity. For instance, when there is a large surge of users searching for “chilean miners,” they are probably looking for news stories, not encyclopaedia entries.
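As a rough illustration of inferring freshness from a popularity spike, the sketch below compares the latest hourly volume for a query against the average of the preceding hours. The function name, the hourly bucketing, and the 3x threshold are my own assumptions for illustration, not details given in the talk.

```python
from statistics import mean

def is_spiking(hourly_counts, ratio_threshold=3.0, min_baseline=1.0):
    """Return True if the latest count is well above the earlier average.

    hourly_counts: query volumes, oldest first; the last entry is "now".
    ratio_threshold and min_baseline are illustrative assumptions.
    """
    if len(hourly_counts) < 2:
        return False  # not enough history to define a baseline
    *history, current = hourly_counts
    baseline = max(mean(history), min_baseline)  # guard against a near-zero baseline
    return current / baseline >= ratio_threshold

# Made-up volumes: a sudden surge versus a steady query.
print(is_spiking([40, 35, 50, 45, 400]))  # True
print(is_spiking([40, 35, 50, 45, 48]))   # False
```

A real system would presumably also account for time-of-day and seasonal effects, which this sketch ignores.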

Some queries have seasonal patterns. “American Idol Winner,” for instance, should show results for this year’s winner, not last year’s. But this isn’t universally true — a query like “Turkey recipe” shares the same query trending pattern as “American Idol Winner”, but this year’s turkey recipe is not necessarily better than a recipe from 5 years ago.

Challenges to freshness

  • Different queries have different freshness granularity needs. (Traffic = minutes, news = hours, politics = days)
  • How do you know how old a result is? It’s difficult to reliably pin down the age of a page. The search engine knows when the page was first indexed, but not necessarily when it was originally created. It’s also difficult to judge freshness purely from when the page was last modified, because a small modification may not affect the main content of the page.
  • When there are many fresh results, how do you decide which ones are worth showing? Freshness is at odds with relevancy.
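One common way to trade freshness off against relevance is to discount a relevance score by the age of the document, with a decay rate that depends on the kind of query. The sketch below uses exponential decay with a per-category half-life. The decay formula, the half-life values, and the names are assumptions of mine, chosen to mirror the minutes/hours/days granularities above; the talk did not describe a specific formula.

```python
# Hypothetical half-lives in hours, loosely matching the granularities above.
HALF_LIFE_HOURS = {"traffic": 0.25, "news": 6.0, "politics": 72.0}

def fresh_score(relevance, age_hours, category):
    """Discount a relevance score by exponential decay on document age."""
    half_life = HALF_LIFE_HOURS[category]
    decay = 0.5 ** (age_hours / half_life)  # halves once per half-life
    return relevance * decay

# The same 12-hour-old document is penalised far more as "news" than as "politics".
print(fresh_score(0.9, 12.0, "news"))
print(fresh_score(0.9, 12.0, "politics"))
```

Because the decay only multiplies the relevance score, a highly relevant older page can still outrank a marginal fresh one, which is the balance the talk was describing.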

Solutions to freshness

  • Show the date in the description synopsis
  • Allow users to restrict results to a particular date range
  • Offer a realtime search option

Vishwa Vinay on Click evidence — Signals and Tasks

One approach to scoring results is to rank them as a probability that the document will be clicked. This score can be constructed by using each click on a result as a “vote” for that document.

An advantage of a probability-based ranking is that the scores are comparable across indexes, making it easy to merge results when doing federated search (50% is always better than 25%).
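Here is a minimal sketch of both ideas: estimating a click probability from click "votes", and merging results from separate indexes by that probability. The smoothing prior and all names are assumptions for illustration; the talk did not specify how the estimate is computed.

```python
import heapq

def click_probability(clicks, impressions, prior_clicks=1.0, prior_impressions=10.0):
    """Smoothed estimate of P(click): each click is a vote, tempered by a weak prior."""
    return (clicks + prior_clicks) / (impressions + prior_impressions)

def merge_results(*ranked_lists):
    """Merge lists of (probability, doc_id) pairs, each already sorted best first."""
    return list(heapq.merge(*ranked_lists, key=lambda pair: pair[0], reverse=True))

# Two hypothetical indexes whose scores are directly comparable.
web = [(0.50, "web:a"), (0.10, "web:b")]
intranet = [(0.25, "intranet:x")]
print(merge_results(web, intranet))
# [(0.5, 'web:a'), (0.25, 'intranet:x'), (0.1, 'web:b')]
```

The prior stops a document with one impression and one click from being scored as a certain click.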

Downsides of probability ranking are “rank bias” and “lock-in”, where users click on the first result not because it’s the most relevant, but simply because it’s the most prominent. To combat this, the search engine can perform A/B testing of results by ordering them in descending order in one case (1,2,3), ascending order in another case (3,2,1), and comparing the outcome.

Another method to combat “rank bias” is to keep track of how many results are clicked by a given user searching for a given query. If the user clicks on three of the results, for instance, it can be inferred that he is most satisfied with the last document clicked. If the user only clicks a single result, it’s a strong sign of satisfaction with that document.
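The last-click heuristic could be sketched as follows. Each session is the ordered list of documents one user clicked for one query. Counting a sole click as two votes is my own assumption to reflect the "strong sign of satisfaction"; the talk gave no specific weights.

```python
from collections import Counter

def satisfied_document(clicked_doc_ids):
    """Return the document the user presumably settled on (the last click)."""
    return clicked_doc_ids[-1] if clicked_doc_ids else None

def satisfaction_votes(sessions):
    """Tally votes per document; a sole click counts double (an assumed weight)."""
    votes = Counter()
    for clicks in sessions:
        doc = satisfied_document(clicks)
        if doc is not None:
            votes[doc] += 2 if len(clicks) == 1 else 1
    return votes

sessions = [["d1", "d3", "d2"], ["d2"], []]
print(satisfaction_votes(sessions))  # Counter({'d2': 3})
```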

Vivian Lin Dufour on How to help searchers become better searchers

Yahoo track trending topics to help users discover timely results, and use several techniques:

  • Query suggestions
  • Result suggestions
  • Related searches

There are three trends affecting enterprise search.

Information Governance

There are several opportunities for search in information governance:

  • Archiving
  • eDisclosure
  • Records & storage management

There are two approaches to information governance, both involving search technologies:

  • Store everything and rely on a great search engine to find it (simple, but large volumes of data)
  • Rely on a great search engine and deduplication to selectively delete what is not needed and then search what’s left (harder to achieve, but saves on data volume).
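To give a feel for the deduplication step in the second approach, here is a minimal sketch that drops exact duplicates by hashing normalised text. The normalisation rule and function names are assumptions; real eDisclosure tools would also need near-duplicate detection, which this does not attempt.

```python
import hashlib

def normalise(text):
    """Collapse whitespace and case so trivial variants hash identically."""
    return " ".join(text.lower().split())

def deduplicate(documents):
    """Keep the first document seen for each distinct normalised content hash.

    documents: iterable of (doc_id, text) pairs.
    """
    seen = set()
    kept = []
    for doc_id, text in documents:
        digest = hashlib.sha256(normalise(text).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append((doc_id, text))
    return kept

docs = [("a", "Quarterly  report"), ("b", "quarterly report"), ("c", "Annual report")]
print([doc_id for doc_id, _ in deduplicate(docs)])  # ['a', 'c']
```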

Search-based applications

Search is gaining traction over relational databases as a primary driver of domain-specific applications, and has the potential to be hugely disruptive in data management. Success lies in:

  • Building partnerships with application vendors
  • Focusing on developers
  • Focusing on user interfaces

In 2010, search-based applications tend to be custom-built and expensive. Over the next few years, expect to see search-based apps appearing more as software as a service.

The smallest area of the three, enterprise search for intranets, looks destined to be dominated by Google Search Appliance, SharePoint, and open source search.

Chirag Gandhi on I still haven’t found what I am looking for

The history of enterprise search is one of businesses purchasing very expensive search systems and building very complicated applications, while users have been neglected. Companies often fall into a number of traps.

Gotchas

  • Throw it all in. Too often organisations thoughtlessly toss any and everything into their search index, rather than carefully considering the documents and metadata
  • No attention to the user experience. Companies just want it to look like Google
  • No proper support for multiple languages (especially individual documents that contain multiple languages)
  • Format conversions — search engines don’t digest a wide variety of documents very well
  • Federated search. As organisations begin to centralise their infrastructure, federated search is often required
  • Finding people, not just documents

Dusan Rnic on Enterprise Search and its evolution

  • Efficiency
  • Compliance
  • Centralised finding

Information growth is accelerating at a phenomenal rate (thanks to emails, file systems, CMS, databases, etc.). But there is a long tail of searches (a very few extremely popular queries, but a large body of rare queries). eCommerce was the first sector to successfully tackle the long tail problem, and it did so with techniques such as dynamic classification, spotlighting results, cross-content navigation, and visualisation.

In a word, eCommerce websites helped shoppers explore and discover products by focusing on the user experience. By telling users what content is there (through the likes of faceted navigation), you help users discover the needle in the haystack. “Start any search project by thinking about the user experience.”
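Faceted navigation boils down to counting how many results share each metadata value and letting the user narrow by one of them. The sketch below shows that idea over a small in-memory result set; the field names and product data are made up for illustration and have nothing to do with any particular vendor’s implementation.

```python
from collections import Counter

def facet_counts(results, fields):
    """Count how many results carry each value of each facet field."""
    counts = {field: Counter() for field in fields}
    for item in results:
        for field in fields:
            value = item.get(field)
            if value is not None:
                counts[field][value] += 1
    return counts

def apply_facet(results, field, value):
    """Narrow the result set to items matching the chosen facet value."""
    return [item for item in results if item.get(field) == value]

# Hypothetical product results.
results = [
    {"name": "Kettle", "brand": "Acme", "colour": "red"},
    {"name": "Toaster", "brand": "Acme", "colour": "white"},
    {"name": "Blender", "brand": "Zenith", "colour": "red"},
]
print(facet_counts(results, ["brand", "colour"]))
print(apply_facet(results, "colour", "red"))
```

Showing the counts next to each value is what tells users what content is there before they commit to a filter.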

Greg Lindahl on Instant indexing

Blekko are attempting to build a brand new web search engine from scratch. The company was founded in 2007 with $24 million of investment, and they’re currently in private beta. They are attempting to completely reinvent the technologies used in web search, and apparently they have a unique approach to indexing.

Charlie Hull on What’s the story with open source?

Search is no longer a bolt-on, but a platform for innovation, and open source is no longer the outsider.

Indexing

  • Content is created for publication, not for search
  • Content isn’t published consistently or available to all
  • Ranking is never simple
  • Must be able to publish rapidly
  • Essential metadata — byline, title, source
  • Content restriction and embargo data
  • Solution: lightweight, customisable index scripts using open source libraries

Searching

  • Free text with boolean operators
  • Filters for metadata and date ranges
  • Combine date and relevance ranking
  • Faceted search
  • Saved searches and alerting
  • Similar results
  • Solution: template-based user interface scripts, again using open source libraries

Why open source?

  • Flexible, extendable
  • Powerful and scalable
  • Lower cost
  • Commercial support is available
  • Freedom to innovate

Looking to the future

  • More and more content including social media
  • Multiple delivery platforms
  • Search-powered websites and applications
  • No-SQL
  • Cloud computing

Roberto Cornacchia on Search by strategy

Current search engines do a poor job of understanding what users are actually looking for. For instance, “amsterdam canals balcony faces west” is unlikely to return desirable results from a traditional search engine. The Spinque approach breaks the query down into its component parts (location, desired attributes, etc). What’s unique about Spinque is that it doesn’t treat these attributes as cut-and-dried filters, but instead provides sliders for the user to indicate the importance of each attribute.
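The slider idea can be thought of as a weighted score rather than a hard filter. Here is a minimal sketch under my own assumptions: each item has a match score between 0 and 1 per attribute, each slider is a weight between 0 and 1, and the ranking uses a weighted average. None of these names or formulas come from the talk.

```python
def rank_by_preferences(items, weights):
    """Order item ids by the weighted average of their per-attribute match scores.

    items:   {item_id: {attribute: match score in [0, 1]}}
    weights: {attribute: slider position in [0, 1]}
    """
    total = sum(weights.values())
    if total == 0:
        return list(items)  # no preferences expressed; keep input order

    def score(item_id):
        matches = items[item_id]
        return sum(w * matches.get(attr, 0.0) for attr, w in weights.items()) / total

    return sorted(items, key=score, reverse=True)

# Hypothetical listings for the Amsterdam example.
flats = {
    "flat_a": {"canal_view": 1.0, "balcony": 0.0, "faces_west": 1.0},
    "flat_b": {"canal_view": 0.5, "balcony": 1.0, "faces_west": 0.0},
}
print(rank_by_preferences(flats, {"canal_view": 1.0, "balcony": 0.2, "faces_west": 0.8}))
# ['flat_a', 'flat_b']
print(rank_by_preferences(flats, {"canal_view": 0.1, "balcony": 1.0, "faces_west": 0.1}))
# ['flat_b', 'flat_a']
```

Moving the sliders reorders the results instead of excluding anything, so a flat without a balcony still appears, just lower down.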

The combination of attributes expressed by users can be thought of as domain-specific preferences, and can be used to influence the ranking of results for users in general.

I wasn’t able to attend this talk, unfortunately.

Mihai Lupu (Information Retrieval Facility) on Scaling up innovation

The innovation cycle brings together information professionals, scientists, and technology experts to share one technical platform, one set of standards, and one community.

The Information Retrieval Facility (IRF) was founded in 2007 in Vienna to encourage cooperation between researchers and industry. Their key areas of research include multiple indexing, text annotation, information extraction, document categorisation, image retrieval, and machine translation. They are looking for partner companies who can help put their research into practice in the real world.

Rob Stacey on Reconciling facts

True Knowledge are a Cambridge-based company who strive to provide direct answers to users’ questions (rather than just a list of search results). They currently know 300 million static facts. True Knowledge check new, incoming knowledge against previously existing knowledge. When the two contradict, the historically more reliable source wins out.
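The reconciliation rule can be sketched in a few lines. The data layout, the numeric reliability scores, and the function name below are all assumptions on my part; the talk only stated that the historically more reliable source wins when facts contradict.

```python
def reconcile(known, incoming, reliability):
    """Merge an incoming fact into the store, preferring the more reliable source.

    known:       {(subject, relation): (value, source)}
    incoming:    (subject, relation, value, source)
    reliability: {source: historical reliability score} (hypothetical scale)
    """
    subject, relation, value, source = incoming
    key = (subject, relation)
    current = known.get(key)
    if current is None or current[0] == value:
        known.setdefault(key, (value, source))  # new fact, or agreement
        return known
    if reliability.get(source, 0.0) > reliability.get(current[1], 0.0):
        known[key] = (value, source)  # contradiction: the newcomer is more reliable
    return known

reliability = {"atlas": 0.9, "forum": 0.4}
known = {("France", "capital"): ("Paris", "atlas")}
reconcile(known, ("France", "capital", "Lyon", "forum"), reliability)
print(known)  # {('France', 'capital'): ('Paris', 'atlas')}
```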

Panel: What will search look like in 2015

  • Stefan Rueger (Open University): Search will be more immersive than it is now, performed less at the computer, and more on the move in the real world. Search will also offer more of a browsing experience than it does now
  • Jody Goodall (Trader Media): More data, and more commoditisation of data.
  • Charlie Hull (Flax): Open source search will be the dominant player as search technologies become commoditised. There will be an increase in the number of applications that are driven primarily by search, as opposed to search being an afterthought.
  • Nick Patience (451 Group): Open source search will be big, and there will be more leveraging of social search tools to improve relevancy.

In Summary

While Search Solutions was heavily represented by computer scientists and academics, there were quite a few references to the importance of the user experience of search. Dusan Rnic from Endeca, for instance, advocated: “Start any search project by thinking about the user experience,” while Nick Patience, an analyst at 451 Group, claimed that the success of search-based applications lies in “a focus on the user interface.”

Thanks to Tony Russell-Rose, Andy MacFarlane, Alex Bailey, Leif Azzopardi, and Udo Kruschwitz for organising a great event!
