Overview of the Ephyra Architecture

Ephyra is a modular and extensible framework for question answering. This document gives an overview of the pipeline layout and the main data structures that are passed between its stages.

Overall Pipeline

The system is organized as a pipeline of standardized components for question analysis, query generation, search, and answer extraction and selection. These components can be combined and arranged arbitrarily, which facilitates experimenting with different setups and finding the most effective configuration. Furthermore, multiple approaches and knowledge sources can be combined in one system, and components can be shared among different approaches. An overview of a typical pipeline setup is shown below.

http://www.cs.cmu.edu/~nico/ephyra/doc/images/overall_architecture.jpg

Note that in Ephyra's pipeline, the answer extraction and selection stages have been combined, which is different from most other QA systems. This takes into account that both stages perform similar operations on the same data structures, and that an interleaved execution of answer extraction and selection components may be beneficial.
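To make the idea of a combined extraction and selection stage more concrete, here is a minimal sketch in Java. It is not Ephyra's actual API: the names Candidate, CandidateFilter, ScoreThreshold, CapitalizedWordExtractor and PipelineSketch are hypothetical and chosen only for illustration. The sketch assumes that extraction and selection components can share one interface that maps a list of candidates to a new list, so that they can be interleaved in any order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a retrieved passage or answer candidate.
class Candidate {
    final String text;
    final double score;

    Candidate(String text, double score) {
        this.text = text;
        this.score = score;
    }
}

// Assumption: extraction and selection components share this one interface.
interface CandidateFilter {
    List<Candidate> apply(List<Candidate> candidates);
}

// Selection-style component: drops candidates scored below a minimum.
class ScoreThreshold implements CandidateFilter {
    final double min;

    ScoreThreshold(double min) {
        this.min = min;
    }

    public List<Candidate> apply(List<Candidate> candidates) {
        List<Candidate> kept = new ArrayList<>();
        for (Candidate c : candidates) {
            if (c.score >= min) {
                kept.add(c);
            }
        }
        return kept;
    }
}

// Extraction-style component: turns each passage into one candidate per
// capitalized token (a toy heuristic, used only for this example).
class CapitalizedWordExtractor implements CandidateFilter {
    public List<Candidate> apply(List<Candidate> candidates) {
        List<Candidate> extracted = new ArrayList<>();
        for (Candidate c : candidates) {
            for (String token : c.text.split("\\s+")) {
                if (!token.isEmpty() && Character.isUpperCase(token.charAt(0))) {
                    extracted.add(new Candidate(token, c.score));
                }
            }
        }
        return extracted;
    }
}

class PipelineSketch {
    // Applies the components in the given order; because they share one
    // interface, selection and extraction steps can be freely interleaved.
    static List<Candidate> run(List<Candidate> input, List<CandidateFilter> filters) {
        List<Candidate> current = input;
        for (CandidateFilter filter : filters) {
            current = filter.apply(current);
        }
        return current;
    }

    public static void main(String[] args) {
        List<Candidate> passages = Arrays.asList(
                new Candidate("the capital of France is Paris", 0.9),
                new Candidate("some say Lyon", 0.2));
        // A selection step runs before the extraction step here.
        List<CandidateFilter> filters = Arrays.<CandidateFilter>asList(
                new ScoreThreshold(0.5), new CapitalizedWordExtractor());
        for (Candidate c : run(passages, filters)) {
            System.out.println(c.text + " " + c.score);
        }
    }
}
```

Reordering or extending the filter list is all it takes to try a different setup in this sketch, which is the kind of experimentation the pipeline design is meant to support.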

Main Data Structures

The following data structures are used to pass information between the pipeline stages.

AnalyzedQuestion

An AnalyzedQuestion represents a syntactic and semantic analysis of a question. It serves as an interface between the question analysis and query generation stages.

http://www.cs.cmu.edu/~nico/ephyra/doc/images/analyzedquestion_datastructure.jpg

Query

A Query is a search engine query generated at the query generation stage and executed at the search stage.

http://www.cs.cmu.edu/~nico/ephyra/doc/images/query_datastructure.jpg

Result

A Result is a document retrieved at the search stage or an answer candidate in the answer extraction and answer selection stages.

http://www.cs.cmu.edu/~nico/ephyra/doc/images/result_datastructure.jpg
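
The way these three data structures hand information from one stage to the next can be illustrated with a small Java sketch. The class names follow the headings above, but every field and method shown here (question, keywords, answerType, queryString, text, score, analyze, generateQueries, search) is an assumption made for this example and may differ from the fields shown in the diagrams or used in Ephyra's code. The analysis and search logic are deliberately toy implementations.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Output of question analysis (fields are illustrative assumptions).
class AnalyzedQuestion {
    final String question;
    final List<String> keywords;
    final String answerType;

    AnalyzedQuestion(String question, List<String> keywords, String answerType) {
        this.question = question;
        this.keywords = keywords;
        this.answerType = answerType;
    }
}

// Output of query generation; keeps a reference to the analysis it came from.
class Query {
    final String queryString;
    final AnalyzedQuestion analyzedQuestion;

    Query(String queryString, AnalyzedQuestion analyzedQuestion) {
        this.queryString = queryString;
        this.analyzedQuestion = analyzedQuestion;
    }
}

// A retrieved document or an answer candidate, linked back to its query.
class Result {
    final String text;
    final Query query;
    final double score;

    Result(String text, Query query, double score) {
        this.text = text;
        this.query = query;
        this.score = score;
    }
}

class DataFlowSketch {
    // Toy question analysis: keywords are words longer than three characters.
    static AnalyzedQuestion analyze(String question) {
        String normalized = question.toLowerCase().replaceAll("[^a-z0-9 ]", "");
        List<String> keywords = new ArrayList<>();
        for (String word : normalized.split("\\s+")) {
            if (word.length() > 3) {
                keywords.add(word);
            }
        }
        String answerType = normalized.startsWith("who") ? "PERSON" : "UNKNOWN";
        return new AnalyzedQuestion(question, keywords, answerType);
    }

    // Toy query generation: a single bag-of-keywords query.
    static List<Query> generateQueries(AnalyzedQuestion aq) {
        return Arrays.asList(new Query(String.join(" ", aq.keywords), aq));
    }

    // Toy search: scores each document by the fraction of keywords it contains.
    static List<Result> search(Query query, List<String> corpus) {
        List<String> keywords = query.analyzedQuestion.keywords;
        List<Result> results = new ArrayList<>();
        for (String doc : corpus) {
            int matched = 0;
            for (String keyword : keywords) {
                if (doc.toLowerCase().contains(keyword)) {
                    matched++;
                }
            }
            if (matched > 0) {
                results.add(new Result(doc, query, matched / (double) keywords.size()));
            }
        }
        return results;
    }
}
```

In this sketch each Result points back to its Query, and each Query to its AnalyzedQuestion, so a later stage can still consult the question analysis when it scores or filters answer candidates. Whether Ephyra links its structures in exactly this way is not stated in this overview.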