Temporal Recall@K
Retrieval Metric
Temporal Recall@K measures the fraction of all temporally relevant documents in the collection that appear in the top-K retrieved results. In other words, it tells you how many of the temporally "correct" documents your system actually surfaced within its first K results.
Formula
\[
\text{TemporalRecall@K} = \frac{|\{d \in D_K : \text{rel}(d, q) = 1\}|}{|\{d \in C : \text{rel}(d, q) = 1\}|}
\]
Where:
- \(D_K\) = set of top-K retrieved documents
- \(C\) = full document collection
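- \(QFT\) = the query's focus time and \(DFT_d\) = the focus time of document \(d\), each a set of time points (e.g. years)
- \(\text{rel}(d, q) = 1\) if \(|QFT \cap DFT_d| > 0\) (the two focus times overlap), otherwise 0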
In simple terms: what fraction of all temporally relevant documents made it into your top K?
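As a rough illustration of the formula in Focus Time mode, here is a minimal standalone sketch. It is not tempoeval's implementation: the function name `temporal_recall_at_k` is hypothetical, and it assumes focus times are Python sets (e.g. of years) with the retrieved documents' focus times listed best-ranked first.

```python
def temporal_recall_at_k(
    qft: set[int], dfts: list[set[int]], total_relevant: int, k: int
) -> float:
    """Hypothetical sketch of Temporal Recall@K in Focus Time mode.

    qft:            query focus time, e.g. a set of years
    dfts:           focus times of the retrieved documents, best-ranked first
    total_relevant: number of relevant documents in the whole collection C
    k:              rank cutoff
    """
    if k <= 0:
        raise ValueError("k must be positive")
    if total_relevant <= 0:
        # Assumption: with no relevant documents the metric is undefined, so we refuse.
        raise ValueError("total_relevant must be positive")
    # rel(d, q) = 1 when the focus times overlap, i.e. |QFT ∩ DFT_d| > 0
    hits = sum(1 for dft in dfts[:k] if qft & dft)
    return hits / total_relevant
```

With `qft={2020}`, `dfts=[{2020}, {2019}]`, `total_relevant=1`, and `k=2`, only the first document overlaps the query's focus time, so the sketch returns `1 / 1 = 1.0`.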
Inputs
- `retrieved_ids` and `gold_ids` (gold mode)
- `query` and `retrieved_docs` (LLM mode)
- `qft`, `dfts`, and `total_relevant` (Focus Time mode)
- `k` (cutoff)
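In gold mode, relevance comes from a labelled set of document IDs rather than from focus times. A minimal sketch under that assumption follows; `gold_temporal_recall_at_k` is a hypothetical name, not the library's API, and `gold_ids` plays the role of the relevant subset of \(C\).

```python
def gold_temporal_recall_at_k(retrieved_ids: list[str], gold_ids: set[str], k: int) -> float:
    """Hypothetical sketch of Temporal Recall@K in gold mode.

    retrieved_ids: ranked document IDs returned by the retriever
    gold_ids:      IDs labelled temporally relevant (the relevant subset of C)
    k:             rank cutoff
    """
    if k <= 0:
        raise ValueError("k must be positive")
    gold = set(gold_ids)
    if not gold:
        raise ValueError("gold_ids must not be empty")
    top_k = set(retrieved_ids[:k])  # a set, so a duplicated ID is counted once
    return len(top_k & gold) / len(gold)
```

For example, with `retrieved_ids=["d1", "d2", "d3", "d4"]`, `gold_ids={"d2", "d4", "d9"}`, and `k=3`, only `d2` falls in the top 3, giving 1/3.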
Output
- Range: [0, 1], higher is better.
Prompt (LLM mode)
```
## Task
Judge if this document is temporally relevant to answering the query.
## Query
{query}
## Document
{document}
## Criteria
A document is temporally relevant if it:
1. Contains temporal information (dates, periods, durations) needed to answer the query
2. Discusses events/facts from the time period the query asks about
3. Provides temporal context that helps answer the query
## Output (JSON)
{
"is_relevant": true or false,
"relevance_score": 0.0 to 1.0,
"reasoning": "brief explanation"
}
```
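The judge is asked to reply with a JSON object, and a caller has to validate that reply before counting a document as relevant. The sketch below shows one way to do that with the standard library; `RelevanceJudgment` and `parse_judgment` are hypothetical names, and how tempoeval itself parses the reply is not documented on this page.

```python
import json
from dataclasses import dataclass


@dataclass
class RelevanceJudgment:
    """One parsed verdict from the LLM judge (hypothetical type)."""
    is_relevant: bool
    relevance_score: float
    reasoning: str


def parse_judgment(raw: str) -> RelevanceJudgment:
    """Parse and validate the judge's JSON reply (hypothetical helper)."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on malformed JSON
    is_relevant = data["is_relevant"]
    if not isinstance(is_relevant, bool):
        raise ValueError("is_relevant must be a JSON boolean")
    score = float(data["relevance_score"])
    if not 0.0 <= score <= 1.0:
        raise ValueError("relevance_score must be in [0, 1]")
    return RelevanceJudgment(is_relevant, score, str(data.get("reasoning", "")))
```

Rejecting replies with a non-boolean `is_relevant` (such as the string `"yes"`) keeps a malformed answer from silently counting as a hit.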
Examples
Focus Time
```python
from tempoeval.metrics import TemporalRecall

metric = TemporalRecall(use_focus_time=True)
# qft: query focus time; dfts: focus times of the retrieved docs, in rank order
score = metric.compute(qft={2020}, dfts=[{2020}, {2019}], total_relevant=1, k=2)
```
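Working the Focus Time example through the formula by hand, assuming the metric follows the definition exactly and that `dfts` lists the top-ranked documents \(d_1, d_2\) in order:

\[
\begin{aligned}
|QFT \cap DFT_{d_1}| &= |\{2020\} \cap \{2020\}| = 1 &&\Rightarrow \text{rel}(d_1, q) = 1 \\
|QFT \cap DFT_{d_2}| &= |\{2020\} \cap \{2019\}| = 0 &&\Rightarrow \text{rel}(d_2, q) = 0 \\
\text{TemporalRecall@2} &= \frac{1 + 0}{|\{d \in C : \text{rel}(d, q) = 1\}|} = \frac{1}{1} = 1
\end{aligned}
\]

Here `total_relevant=1` supplies the denominator, since the collection-wide count of relevant documents cannot be derived from the top-K list alone.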
LLM
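```python
import asyncio

from tempoeval.metrics import TemporalRecall

async def main(llm):
    metric = TemporalRecall(use_llm=True)
    metric.llm = llm  # an LLM client configured elsewhere (setup not shown on this page)
    return await metric.acompute(
        query="When did X happen?",
        retrieved_docs=["..."],
        k=5,
    )

score = asyncio.run(main(llm))
```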
Synchronous usage
Use `compute(...)` for synchronous calls and `acompute(...)` from async code, where it must be awaited.