Adding Unstructured Knowledge Sources

The search stage in Ephyra's pipeline is organized as a set of searcher components. There are two types of searchers: knowledge miners for unstructured resources and knowledge annotators for structured or semistructured resources. This tutorial deals with unstructured resources and thus focuses on knowledge miners. To add a new resource (e.g., a newswire corpus) or to use a different search engine (e.g., Yahoo) for an existing resource, add a knowledge miner by following these steps:

  • Extend the class info.ephyra.search.searchers.KnowledgeMiner. We recommend appending the suffix KM to the class name and adding your new class to the package info.ephyra.search.searchers.
  • Implement the methods doSearch(), getCopy(), getMaxResultsTotal(), and getMaxResultsPerQuery(). Refer to existing knowledge miners for examples.
  • Finally, add your knowledge miner to one of the init methods in the main class that you are running. For instance, assume that you have developed the knowledge miner MyWebKM to wrap a Web search engine and that you want to use it to answer factoid questions. You would then add the following line to the search part of the initFactoid() method in your main class:
    Search.addKnowledgeMiner(new MyWebKM());
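
The steps above can be illustrated with a minimal, self-contained sketch. It does not use the Ephyra library: the nested KnowledgeMiner class is a hypothetical stand-in for info.ephyra.search.searchers.KnowledgeMiner, and the method signatures, the setQuery() helper, the String[] return type, and the result limits are all assumptions made for illustration. The existing knowledge miners in the Ephyra sources show the actual signatures and result types.

```java
import java.util.ArrayList;
import java.util.List;

public class KnowledgeMinerSketch {

    // Hypothetical stand-in for info.ephyra.search.searchers.KnowledgeMiner.
    // The real base class and its exact signatures may differ.
    static abstract class KnowledgeMiner {
        protected String query = "";

        // Assumed helper for this sketch: stores the query to be searched.
        void setQuery(String query) {
            this.query = query;
        }

        // The four methods named in the tutorial (signatures are assumptions).
        protected abstract int getMaxResultsTotal();
        protected abstract int getMaxResultsPerQuery();
        public abstract KnowledgeMiner getCopy();
        protected abstract String[] doSearch();
    }

    // Example miner; the KM suffix follows the naming recommendation.
    static class MyWebKM extends KnowledgeMiner {
        // Illustrative limits, not values taken from Ephyra.
        private static final int MAX_RESULTS_TOTAL = 100;
        private static final int MAX_RESULTS_PER_QUERY = 10;

        @Override
        protected int getMaxResultsTotal() {
            return MAX_RESULTS_TOTAL;
        }

        @Override
        protected int getMaxResultsPerQuery() {
            return MAX_RESULTS_PER_QUERY;
        }

        // Returns a fresh instance of the same miner type.
        @Override
        public KnowledgeMiner getCopy() {
            return new MyWebKM();
        }

        // Placeholder search: a real miner would call the search engine
        // here and wrap each hit in Ephyra's result type.
        @Override
        protected String[] doSearch() {
            List<String> results = new ArrayList<>();
            for (int i = 1; i <= getMaxResultsPerQuery(); i++) {
                results.add("snippet " + i + " for: " + query);
            }
            return results.toArray(new String[0]);
        }
    }

    public static void main(String[] args) {
        KnowledgeMiner km = new MyWebKM().getCopy();
        km.setQuery("example query");
        String[] results = km.doSearch();
        System.out.println(results.length + " results");
    }
}
```

In an actual Ephyra setup, MyWebKM would extend the real KnowledgeMiner class and be registered with Search.addKnowledgeMiner() as shown in the last step, instead of being called directly from main().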

Comments about this tutorial? Please email Nico Schlaefer.