Best of the Lists
(including Best of BUSLIB-L)

Do portal search engines work?

Posted to kmgov@list.jpl.nasa.gov on 10/20/2010

Question (posted by Jo Ann Remshard)
I'm interested in learning what best practices and pros/cons you've encountered in implementing Autonomy. Have you implemented functionality such as visualization? Have you conducted training on using Autonomy? If so, what tips do you have?

In addition, I've learned that NASA has implemented Autonomy and Google. How were these implemented? What tips do you have? Were the two search engines integrated into one interface? If so, how?

Answer (posted by Michael Corrigan)
The Air Force has used Autonomy extensively as the search engine in the Air Force portal. I can try to track down the engineers who did that work if you want. But before you say yes, let me give you some user-related feedback and my rationale for it.

There is a perception within the general AF user population that Portal discovery, well, sucks. That perception is anecdotal: after 6 years working for the CIO and now the CMO of the Air Force, I can't think of a single positive comment about portal discovery, whether in personal conversations, technical meetings, conferences, or a range of other fora. The portal engineers might have other data, so I do not want to present this as an official AF position. It is purely observational.

However, the AF has tried other search engines to improve the responsiveness of Portal discovery, without much improvement. We recently had one of the newer vendors come in to talk to us. He asked us directly what we thought of Portal discovery, and the unanimous response was, well, it sucks. He said he was surprised that so many users, not just us, had told him the same thing. We told him that we all used Google first to find DoD and AF information (the commercial site, not the appliance you can run on your own network), and that we found it far more useful than the portal.

So the question becomes: is it the technology? My answer is no. The key is how the engine is set up. Most search engines, Autonomy included, like to autogenerate a taxonomy for the content they will be searching by perusing a corpus of that content. This generates the basic taxonomy, which then needs to be edited by SMEs. As feedback to you, do not underestimate this editing process. When we did some work with Autonomy, they wanted a corpus of 10,000 documents to ensure a comprehensive taxonomy. Of course, garbage in, garbage out: if those 10,000 documents include emails about birthdays and NCAA basketball pools, well, you get the idea.
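To make the garbage-in, garbage-out point concrete, here is a deliberately naive sketch of corpus-driven taxonomy generation. It is not Autonomy's actual algorithm: it assumes candidate taxonomy nodes are simply the terms that appear in the most documents. The stop-word list, the function name `candidate_terms`, and the five-document corpus are all hypothetical and for illustration only.

```python
import re
from collections import Counter

# Hypothetical stop-word list; production engines use much larger ones.
STOP_WORDS = {"a", "an", "and", "by", "due", "for", "in", "of", "on", "the", "to"}


def candidate_terms(corpus, top_n=5):
    """Rank terms by document frequency as candidate taxonomy nodes.

    A naive stand-in for corpus-driven taxonomy generation (assumption,
    not any vendor's method): whatever the corpus mentions most often
    becomes a candidate category, whether or not it is relevant.
    """
    doc_freq = Counter()
    for doc in corpus:
        # Count each term at most once per document.
        words = set(re.findall(r"[a-z]+", doc.lower()))
        doc_freq.update(words - STOP_WORDS)
    # Sort by descending frequency, then alphabetically, so ties are stable.
    ranked = sorted(doc_freq.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:top_n]]


# Toy corpus (illustrative only): one mission-related document,
# the rest social and sports email.
NOISY_CORPUS = [
    "Logistics readiness report for the airlift squadron",
    "Birthday cake in the break room for Sam",
    "NCAA basketball pool brackets due Friday",
    "Birthday lunch for the squadron commander",
    "NCAA basketball pool standings",
]

print(candidate_terms(NOISY_CORPUS))
# ['basketball', 'birthday', 'ncaa', 'pool', 'squadron']
```

Four of the five top candidates come from the birthday and basketball-pool emails, which is the cleanup burden the SMEs inherit when the corpus is not curated first.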

So we ran a test in the Air Force of the ability of COTS search engines to autogenerate discovery metadata for us. We used an approach in which our SMEs, with the help of an ontologist (actually, a knowledge engineer), developed our own taxonomies. These taxonomies represented the information in a specific problem context (we are now working purely with ontologies, so we are much closer to actual knowledge representation). We found that, given these taxonomies, the discovery metadata generated by the search engines matched the metadata generated manually by the SMEs with 95% accuracy. (In fact, the search engines were far more accurate than the SMEs; we had to train the SMEs in generating metadata with the taxonomies before they got decent results. This confirms something we have all known: the average human consumer of information doesn't generate metadata very well. With an ontologist/knowledge engineer, the SMEs generated excellent taxonomies, but left to their own devices to apply those same taxonomies in metadata generation, they weren't so effective.)
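A minimal sketch of the kind of comparison described above, assuming a simple model: the SME-built taxonomy maps each category to indicator terms, the "engine" tags a document with every category whose terms it contains, and agreement is the fraction of documents whose engine tag set exactly matches the SME tag set. The post does not say how the 95% figure was measured, so the scoring rule, the taxonomy, and the documents here are all hypothetical, not the Air Force test data.

```python
import re

# Hypothetical SME-built taxonomy: category -> indicator terms.
TAXONOMY = {
    "aircraft maintenance": {"engine", "inspection", "overhaul", "sortie"},
    "logistics": {"airlift", "inventory", "shipment", "supply"},
    "personnel": {"assignment", "leave", "promotion", "training"},
}


def tag_document(text, taxonomy):
    """Return every category whose indicator terms appear in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return {category for category, terms in taxonomy.items() if words & terms}


def agreement(engine_tags, sme_tags):
    """Fraction of SME-tagged documents whose engine tag set matches exactly."""
    if not sme_tags:
        return 0.0
    matches = sum(1 for doc_id, tags in sme_tags.items() if engine_tags.get(doc_id) == tags)
    return matches / len(sme_tags)


# Toy documents and SME judgments (illustrative only).
DOCS = {
    "d1": "Engine inspection scheduled before the next sortie",
    "d2": "Supply shipment delayed; airlift rescheduled",
    "d3": "Promotion board and training assignment notice",
    "d4": "Inventory audit after engine overhaul",
}
SME_TAGS = {
    "d1": {"aircraft maintenance"},
    "d2": {"logistics"},
    "d3": {"personnel"},
    "d4": {"aircraft maintenance"},
}

engine_tags = {doc_id: tag_document(text, TAXONOMY) for doc_id, text in DOCS.items()}
print(agreement(engine_tags, SME_TAGS))  # 0.75
```

On d4 the engine adds "logistics" because of "inventory" while the SME did not. Deciding which side is right is exactly the judgment the test had to make, and why untrained SMEs were not automatically the gold standard.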

Forgive the long-windedness, but the bottom line is that the most critical aspect of a successful search engine is the specification of the problem context: the vocabulary, represented as a taxonomy or ontology, that will drive the search engine. Even Google now recognizes this, acknowledging the limitations of its core search technology by acquiring Metaweb (www.metaweb.com) to integrate context-based search into its engine. The engines we tested (Autonomy, FastSearch, ConceptSearching, and Convera) all performed well given a good vocabulary. That is the key. And if I can make one more recommendation: build the vocabulary first, using your SMEs and good knowledge elicitation, rather than having the search engine generate it from a corpus. We found that editing an engine-generated taxonomy takes just as long as, if not longer than, building one from scratch with SMEs, and the engine-generated taxonomy is less comprehensive. If you do this, you reduce the time you spend encountering and then eliminating errors such as the one we discovered when searching for information about terrorist incidents: we found out the Yankees had bombed the Red Sox.
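The Yankees example shows why a problem-context vocabulary matters more than the keyword itself. Below is a simplified sketch, assuming a context is modeled as three hypothetical term lists: trigger terms, supporting terms that must co-occur, and excluded terms that signal a different domain. Real taxonomy- or ontology-driven engines are far richer than this; the term lists and function names are illustrative only.

```python
import re

# Hypothetical problem-context vocabulary for "terrorist incidents".
TRIGGER_TERMS = {"attack", "bomb", "bombed", "bombing", "explosion"}
SUPPORTING_TERMS = {"casualties", "checkpoint", "explosive", "insurgents", "militants", "police"}
EXCLUDED_TERMS = {"homer", "homers", "inning", "pitcher", "sox", "yankees"}


def words_in(text):
    """Lowercase word set; digits and punctuation are ignored."""
    return set(re.findall(r"[a-z]+", text.lower()))


def keyword_match(text):
    """Plain keyword search: any trigger term is enough."""
    return bool(words_in(text) & TRIGGER_TERMS)


def context_match(text):
    """Require a trigger term plus supporting context, and no excluded context."""
    words = words_in(text)
    return (
        bool(words & TRIGGER_TERMS)
        and bool(words & SUPPORTING_TERMS)
        and not (words & EXCLUDED_TERMS)
    )


HEADLINES = [
    "Yankees bombed the Red Sox 12-2 behind four homers",
    "Militants bombed a police checkpoint, causing casualties",
]

print([h for h in HEADLINES if keyword_match(h)])  # both headlines
print([h for h in HEADLINES if context_match(h)])  # only the checkpoint attack
```

The keyword search returns the baseball score; the context filter does not. The quality of the result depends entirely on who writes those term lists, which is the argument for having SMEs build the vocabulary up front.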