Method

Given a full-sentence or telegraphic (keyword-style) query about entity counts, CoQEx uses a span-based QA model to extract candidate count contexts and candidate instances separately from the top 50 search-engine snippets. The user is then shown the following components:

  • The inferred answer is predicted by distribution-aware inference over the count contexts.
  • The count contexts are classified into semantic groups relative to the inferred answer to form the explanation by contexts. Each context is grouped by whether it is highly similar to the inferred answer, represents a subset of it, or is incomparable to it.
  • The instances are ranked by their compatibility with the answer type, which CoQEx extracts from the query. They form the explanation by instances, since they are likely to ground the counts in their constituent entities.
  • The snippets are annotated with the count context and instance candidates to form the explanation by provenance.

Figure: Overview of the CoQEx pipeline.
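
The first two components can be illustrated with a minimal sketch. It is not the CoQEx implementation: it assumes a confidence-weighted median as one plausible form of distribution-aware inference, and a purely numeric grouping rule with a hypothetical `tolerance` parameter. The names `CountContext`, `weighted_median`, and `group_contexts`, as well as the toy contexts and scores, are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class CountContext:
    text: str     # extracted span, e.g. "estimated 700 languages"
    count: int    # integer parsed from the span
    score: float  # extractor confidence (assumed to lie in (0, 1])


def weighted_median(contexts):
    """Return the count at which cumulative confidence reaches half the total."""
    ordered = sorted(contexts, key=lambda c: c.count)
    half = sum(c.score for c in ordered) / 2.0
    running = 0.0
    for c in ordered:
        running += c.score
        if running >= half:
            return c.count
    raise ValueError("no count contexts given")


def group_contexts(contexts, answer, tolerance=0.1):
    """Split contexts into similar / subset / incomparable by count alone.

    Assumption: counts within `tolerance` of the answer are 'similar',
    smaller counts are treated as 'subset', larger ones as 'incomparable'.
    """
    groups = {"similar": [], "subset": [], "incomparable": []}
    for c in contexts:
        if abs(c.count - answer) <= tolerance * answer:
            groups["similar"].append(c)
        elif c.count < answer:
            groups["subset"].append(c)
        else:
            groups["incomparable"].append(c)
    return groups


# Toy contexts with invented confidence scores (illustration only).
contexts = [
    CountContext("estimated 700 languages", 700, 0.9),
    CountContext("700 languages", 700, 0.8),
    CountContext("about 720 languages", 720, 0.5),
    CountContext("300 languages", 300, 0.4),
    CountContext("7,000 languages", 7000, 0.3),
]

answer = weighted_median(contexts)
groups = group_contexts(contexts, answer)
print(answer, {name: len(members) for name, members in groups.items()})
```

A weighted median is used here because it is robust to outlying counts such as the 7,000 above. Deciding that a smaller count is a true subset would also require comparing the context text, which this count-only rule does not attempt.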

CoQEx Output

Figure: System output.

This figure illustrates the output for the query "how many languages are spoken in Indonesia?". The query concerns the entity Indonesia and the set of languages spoken in the country. It can be answered through contexts containing counts, such as "estimated 700 languages", and through grounding instances, such as Javanese and Sundanese.
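
As a rough sketch of how such an output could be held in code, the snippet below defines a hypothetical `CountQueryOutput` container and fills it only with the values named in the example above. The class, its field names, and the `summary` method are assumptions made for illustration and are not part of CoQEx.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CountQueryOutput:
    """Hypothetical container for the explanations shown to the user."""
    query: str
    count_contexts: List[str] = field(default_factory=list)
    instances: List[str] = field(default_factory=list)

    def summary(self) -> str:
        """Render the query and its explanations as a short plain-text report."""
        lines = [f"Query: {self.query}"]
        lines.append("Count contexts: " + "; ".join(self.count_contexts))
        lines.append("Instances: " + ", ".join(self.instances))
        return "\n".join(lines)


output = CountQueryOutput(
    query="how many languages are spoken in Indonesia?",
    count_contexts=["estimated 700 languages"],
    instances=["Javanese", "Sundanese"],
)
print(output.summary())
```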

Related Papers

CoQEx: Entity Counts Explained. Shrestha Ghosh, Simon Razniewski, Gerhard Weikum. (WSDM 2023)

Answering Count Questions with Structured Answers from Text. Shrestha Ghosh, Simon Razniewski, Gerhard Weikum. (JoWS 2022)

Answering Count Queries with Explanations. Shrestha Ghosh, Simon Razniewski, Gerhard Weikum. (SIGIR 2022)

Contact

For feedback and clarifications, please contact: Shrestha Ghosh

To learn more about our group, please visit https://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/.