irgroup/query_sim_validation

Query Simulation Validation

The objective of this repository is to provide a tool for validating query simulation approaches in the context of (interactive) information retrieval.

Getting Started

We recommend using a virtual environment to run the code locally.

Create virtual environment

We provide instructions for creating a virtual environment using conda (recommended) or another method of your choice.

Conda environment (recommended)

To create a virtual environment using conda, run the following commands in your terminal:

  1. Make the script executable (if not already):
chmod +x scripts/create_environment.sh
  2. Run the script to create the environment, then activate it:
./scripts/create_environment.sh
conda activate query_sim_validation

Own method

To create a virtual environment using your own method, follow these steps:

  1. Create a virtual environment (e.g., using venv or virtualenv).
  2. Activate the virtual environment.
  3. Install the requirements: pip install -r requirements.txt
  4. Download the NLTK data files:
python -c "import nltk; nltk.download('wordnet'); nltk.download('punkt_tab'); nltk.download('averaged_perceptron_tagger_eng')"
  5. Download the spaCy model:
python -m spacy download en_core_web_sm
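After a manual setup, it can help to confirm that the active environment actually contains the packages used in the steps above. The following is a minimal sketch, not part of the repository: the helper name `check_packages` is hypothetical, and it only checks that `nltk` and `spacy` are importable, not that the data files or the `en_core_web_sm` model were downloaded.

```python
import importlib.util

# Package names taken from the installation steps above.
REQUIRED_PACKAGES = ("nltk", "spacy")


def check_packages(names=REQUIRED_PACKAGES):
    """Return {package_name: bool}, True if the package can be imported.

    Hypothetical helper for illustration; it does not verify downloaded
    NLTK data files or spaCy models.
    """
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_packages().items():
        print(f"{name}: {'ok' if found else 'missing'}")
```

If a package is reported as missing, the virtual environment is probably not activated or step 3 was skipped.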

Data Preparation

If you want to validate your own data, make sure it follows the correct structure. See the data description guide for more details on the expected data format, along with example data files.

Usage

To run the validation, use the following command:

python -m query_sim_validation.main --config <path_to_config_file>

Use python -m query_sim_validation.main --help to see all available options.

We provide a default configuration file at config/config_default.yaml. You can create a custom configuration file or specify the parameters directly in the command line. To run the validation with the default configuration, you need to unzip the original_sessions.zip and simulated_sessions.zip files in the data directory.
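The unzip step can also be scripted. The sketch below uses only the Python standard library and assumes that both archives sit directly inside the data directory and should be extracted in place; the function name `unzip_session_archives` is hypothetical and not part of the repository.

```python
import zipfile
from pathlib import Path

# Archive names as given in the text above.
ARCHIVES = ("original_sessions.zip", "simulated_sessions.zip")


def unzip_session_archives(data_dir="data", archives=ARCHIVES):
    """Extract each archive found in data_dir into data_dir.

    Returns the names of the archives that were extracted. Archives that
    are not present are skipped with a message instead of raising.
    """
    data_path = Path(data_dir)
    extracted = []
    for name in archives:
        archive = data_path / name
        if not archive.is_file():
            print(f"skipping {archive}: not found")
            continue
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(data_path)
        extracted.append(name)
    return extracted


if __name__ == "__main__":
    unzip_session_archives()
```

Run it from the repository root so that the relative `data` path resolves correctly.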

The configuration used for a run is automatically saved in the output directory (output by default).

⚠️ Note: Some measures (e.g., BERT Score Similarity) may increase runtime, and traditional IR measures require qrels to be provided in the configuration file.

Supported Validation Measures

Currently, the following validation measures are implemented:

Measures that assess how similar two queries are in terms of language, structure, and observable behavior.

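| Facet | Measure | Ref |
| --- | --- | --- |
| Behavioural Similarity Measures | Flesch Kincaid Score | flesch_kincaid_scores |
| Behavioural Similarity Measures | Type-Token Ratio | ttr |
| Data Similarity Measures | Query Length | query_length |
| Data Similarity Measures | Number of Query Terms | query_num_terms |
| Data Similarity Measures | Number of Named Entity Query Terms | query_num_named_entities |
| Query Similarity Measures | Jaccard Similarity | jaccard_similarity |
| Query Similarity Measures | Cosine Similarity | cosine_similarity |
| Query Similarity Measures | BERT Score Similarity | bert_score |
| Query Similarity Measures | WordNet Similarity | wordnet_similarity |
| Query Similarity Measures | Rank Diversity Score | rank_diversity_score |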
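To give a feel for what two of these measures compute, here is a minimal sketch of term-level Jaccard similarity and the type-token ratio. It is an illustration only and not the repository's implementation: the whitespace tokenizer, the function names, and the handling of empty queries are all assumptions made for this example.

```python
def tokenize(query):
    """Lowercase and split on whitespace (a simplifying assumption)."""
    return query.lower().split()


def jaccard_similarity(query_a, query_b):
    """Size of the shared term set divided by the size of the combined term set."""
    terms_a, terms_b = set(tokenize(query_a)), set(tokenize(query_b))
    if not terms_a and not terms_b:
        return 1.0  # convention chosen here: two empty queries count as identical
    return len(terms_a & terms_b) / len(terms_a | terms_b)


def type_token_ratio(query):
    """Number of distinct terms divided by the total number of terms."""
    tokens = tokenize(query)
    return len(set(tokens)) / len(tokens) if tokens else 0.0


if __name__ == "__main__":
    original = "covid vaccine side effects"
    simulated = "covid vaccine effectiveness"
    # Shared terms: covid, vaccine (2); combined terms: 5.
    print(jaccard_similarity(original, simulated))  # 0.4
    # Three distinct terms out of four tokens.
    print(type_token_ratio("new york new jersey"))  # 0.75
```

A stemmer or a different tokenizer would change these values, so scores are only comparable when both query sets are preprocessed the same way.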

Measures that estimate how similarly two queries perform in retrieval settings.

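| Facet | Measure | Ref |
| --- | --- | --- |
| SERP Overlap Measures | SERP-based Jaccard Similarity | serp_jaccard |
| SERP Overlap Measures | Rank-Biased Overlap | rbo |
| Traditional IR Measures | Mean Average Precision (MAP) / MAP@K | map / map_cut@k |
| Traditional IR Measures | Normalized Discounted Cumulative Gain (NDCG) / NDCG@K | ndcg / ndcg_cut@k |
| Traditional IR Measures | Precision / Precision@K | P / P@k |
| Traditional IR Measures | Recall@K | recall@k |
| Traditional IR Measures | Reciprocal Rank | recip_rank |
| Traditional IR Measures | Rank-biased Precision | rbp |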
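The two SERP overlap measures compare the result lists that two queries retrieve. The sketch below illustrates the idea with a set-based Jaccard similarity and a truncated (non-extrapolated) form of rank-biased overlap. It is not the repository's implementation: the function names, the default persistence `p=0.9`, and the truncation at the shorter list are assumptions for this example, and each list is assumed to contain no duplicate document IDs.

```python
def serp_jaccard(serp_a, serp_b):
    """Jaccard similarity of two result lists, ignoring rank order."""
    set_a, set_b = set(serp_a), set(serp_b)
    if not set_a and not set_b:
        return 1.0  # convention chosen here: two empty SERPs count as identical
    return len(set_a & set_b) / len(set_a | set_b)


def rbo_truncated(serp_a, serp_b, p=0.9):
    """Truncated rank-biased overlap.

    Computes (1 - p) * sum_{d=1..k} p^(d-1) * |A[:d] & B[:d]| / d, where k is
    the length of the shorter list. Because the sum stops at depth k, the
    result is a lower bound: identical lists of length k score 1 - p**k.
    """
    depth = min(len(serp_a), len(serp_b))
    seen_a, seen_b = set(), set()
    score = 0.0
    for d in range(1, depth + 1):
        seen_a.add(serp_a[d - 1])
        seen_b.add(serp_b[d - 1])
        agreement = len(seen_a & seen_b) / d  # overlap of the two prefixes at depth d
        score += p ** (d - 1) * agreement
    return (1 - p) * score


if __name__ == "__main__":
    original_serp = ["d1", "d2", "d3"]
    simulated_serp = ["d2", "d3", "d4"]
    print(serp_jaccard(original_serp, simulated_serp))  # 0.5
    print(rbo_truncated(original_serp, simulated_serp))
```

Unlike the Jaccard variant, the rank-biased score weights agreement near the top of the lists more heavily, with smaller values of `p` concentrating the weight on the first ranks.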

Citation

If you use the framework in your work, please cite it as follows:

@inproceedings{Kruff:2026:ECIR,
  title={Validating Search Query Simulations: A Taxonomy of Measures},
  booktitle={Proceedings of the 48th European Conference on Information Retrieval},
  author={Kruff, Andreas Konstantin and Bernard, Nolwenn and Schaer, Philipp},
  year={2026},
  series={ECIR '26}
}
