How are search results ranked?

Elasticsearch examines multiple fields (i.e., title, keywords, and full OCR text) and assigns a relevancy score based on:

  1. The number of times a term appears in the field
  2. The length of the field
  3. How often that term appears across all of the texts in the corpus

If a field is short, like the title, then a term appearing in that field is weighted higher.

If a field is very long, and a term appears many times in the field and many times across all of the texts in the corpus, then that term is weighted lower. This helps give a lower weight to common words like “a”, “and”, or “the”.

Even if a field is very long, a term that appears infrequently across all of the texts in the corpus is weighted higher. This helps give a stronger weight to rare words like “hippopotamus” or “giraffe”.
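As a rough illustration, the three factors can be sketched in Python using the classic TF/IDF formulas described in the guide linked at the end of this answer. This is a simplified sketch, not the exact scoring code: the function names are made up for this example, the real scoring function includes additional terms, and recent Elasticsearch versions use BM25 by default, which relies on the same three ingredients but different formulas.

```python
import math

def term_frequency(count):
    # More occurrences of the term in the field raise the score,
    # but with diminishing returns (square root).
    return math.sqrt(count)

def inverse_document_frequency(num_docs, docs_with_term):
    # Terms found in many documents across the corpus count for less.
    return 1.0 + math.log(num_docs / (docs_with_term + 1))

def field_length_norm(num_terms):
    # A match in a short field (like a title) counts for more.
    return 1.0 / math.sqrt(num_terms)

def term_weight(count, num_terms, num_docs, docs_with_term):
    # Simplified: multiply the three factors together.
    return (term_frequency(count)
            * inverse_document_frequency(num_docs, docs_with_term)
            * field_length_norm(num_terms))

# "giraffe" once in a 5-word title, found in 9 of 1,000 documents
rare_in_title = term_weight(1, 5, 1000, 9)

# "the" 400 times in a 10,000-word OCR text, found in 999 of 1,000 documents
common_in_long_text = term_weight(400, 10000, 1000, 999)

print(rare_in_title > common_in_long_text)
```

With these made-up numbers, the single rare word in the short title outweighs the common word repeated hundreds of times in the long text.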

All three of these factors are combined to produce a score for each field in the document, and then the scores for each field are combined to assign a score to the entire document. On top of this, we can force certain fields to be given a greater “weight” than others.

Fields with a higher “weight” have more of an effect on the score than fields with a lower “weight”. The final document score is used to rank the document in the search results.
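For illustration, one common way to express field weights in an Elasticsearch query is the caret syntax of a multi_match query, where "title^3" means that matches in the title field count three times as much. The sketch below only builds the query body as a Python dictionary. The field names (title, keywords, ocr_text), the weight values, and the build_query helper are assumptions made for this example, not the actual configuration used here.

```python
def build_query(text, field_weights):
    # Turn {"title": 3} into "title^3", the boost syntax accepted
    # by the "fields" list of a multi_match query.
    fields = [f"{name}^{weight}" for name, weight in field_weights.items()]
    return {"query": {"multi_match": {"query": text, "fields": fields}}}

# Hypothetical weights: title matters most, full OCR text least.
query = build_query("giraffe", {"title": 3, "keywords": 2, "ocr_text": 1})
print(query)
```

The resulting dictionary could be sent as the body of a search request. How the per-field scores are then combined into one document score depends on the type of query used.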

Nitty gritty details: https://www.elastic.co/guide/en/elasticsearch/guide/current/scoring-theory.html

Tags: user interface, browse