Running an Evaluation on TREC Data

This tutorial describes how to evaluate Ephyra on the data sets from the TREC 8-11 and TREC 13-16 question answering main tracks. Ephyra currently does not include an evaluation tool for the TREC 12 questions.

  1. Obtain the AQUAINT2 and Blog06 corpora (for TREC 16) or the AQUAINT corpus (for all earlier evaluations). Visit the TREC website for details.
  1. Build an index (or indices) and set up Ephyra to use it (them) by following steps 1 to 8 in the tutorial Using Indri to Search a Document Collection.
  1. Run an evaluation tool for TREC questions.

    TREC 13-16: Run the class info.ephyra.trec.EphyraTREC13to16.

    TREC 8-11: Run the class info.ephyra.trec.EphyraTREC8to11.

    We recommend using the following VM arguments:
    -server
    -Xms1000m
    -Xmx1500m
    -Djava.library.path=lib/search/
    
    The -server parameter speeds up execution, but not all VMs support it. We recommend assigning more memory to the VM if it is available. With less memory you can still try to run the evaluation tool, but it may run out of memory.

    For each class, the valid command line arguments are described in the Javadoc comment of the main method. You need to specify at least the file containing the question set. All TREC questions, answers and patterns are located in the folder res/testdata/trec/. (Thanks to the National Institute of Standards and Technology (NIST) for allowing us to redistribute this data!)

    A sample script for running an evaluation tool can be found in the scripts/ folder.
  1. Depending on the configuration of Ephyra and your hardware, the evaluation can take several hours. If you want to stop the evaluation and resume it later, there is an option for reading existing answers from a log file. See the Javadoc comment of the main method of the evaluation tool for details.
  1. Two files will be created in the log/ folder. The file ending in '_out' contains the results in the TREC submission format. If you used command line arguments to specify answer pattern files, a '+' or '-' before an answer will indicate whether it is correct or wrong. The second file contains intermediate results that may be useful for error analysis. If you specified answer pattern files, you can also find the overall scores at the end of this file.
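
The launch step above can be sketched as a small Java helper that assembles the command line for a child JVM. This is only an illustration, not part of Ephyra: the class name TrecEvalLauncher, the classpath "bin:lib/*" and the question file placeholder are hypothetical, and passing the question file as the first program argument is an assumption. Check the Javadoc comment of the main method and the sample script in scripts/ for the actual arguments.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Ephyra.
class TrecEvalLauncher {

    // Builds the command for running an evaluation class with the
    // VM arguments recommended in this tutorial.
    static List<String> buildCommand(String classpath, String mainClass, String questionFile) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-server");                          // not supported by all VMs
        cmd.add("-Xms1000m");                        // initial heap size
        cmd.add("-Xmx1500m");                        // maximum heap size
        cmd.add("-Djava.library.path=lib/search/");  // native search libraries
        cmd.add("-cp");
        cmd.add(classpath);
        cmd.add(mainClass);
        cmd.add(questionFile);                       // assumption: first program argument
        return cmd;
    }

    public static void main(String[] args) {
        // Assumptions: the classpath and the file name are placeholders.
        List<String> cmd = buildCommand(
                "bin:lib/*",
                "info.ephyra.trec.EphyraTREC13to16",
                "res/testdata/trec/QUESTION_FILE");
        System.out.println(String.join(" ", cmd));
        // To actually start the evaluation from the Ephyra root folder:
        // new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```

Running the sketch only prints the assembled command, so it can be inspected or pasted into a shell before anything is launched.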

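If you specified answer pattern files, a quick tally of the '+' and '-' markers in the '_out' file can serve as a sanity check against the overall scores in the second log file. The following is a minimal sketch and not part of Ephyra: the class name OutFileTally is hypothetical, and it assumes that the marker is the first non-blank character of a line. Inspect your own '_out' file first and adjust the check if the marker appears elsewhere.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper, not part of Ephyra.
class OutFileTally {

    // Returns {correct, wrong}.
    // Assumption: '+' or '-' is the first non-blank character of a line.
    static int[] tally(List<String> lines) {
        int correct = 0;
        int wrong = 0;
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.startsWith("+")) {
                correct++;
            } else if (trimmed.startsWith("-")) {
                wrong++;
            }
        }
        return new int[] {correct, wrong};
    }

    public static void main(String[] args) throws IOException {
        Path logDir = Paths.get("log");
        if (!Files.isDirectory(logDir)) {
            System.out.println("No log/ folder found; run an evaluation first.");
            return;
        }
        // Visit every file in log/ whose name ends in "_out".
        try (DirectoryStream<Path> outFiles = Files.newDirectoryStream(logDir, "*_out")) {
            for (Path file : outFiles) {
                int[] counts = tally(Files.readAllLines(file));
                System.out.println(file + ": " + counts[0] + " correct, " + counts[1] + " wrong");
            }
        }
    }
}
```

Run it from the Ephyra root folder so that the relative path log/ resolves.
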
Comments about this tutorial? Please email Nico Schlaefer.