Elasticsearch examines multiple fields (i.e., title, keywords, and full OCR text) and assigns each document a relevance score based on:
- The number of times a term appears in the field
- The length of the field
- How often that term appears across all of the texts in the corpus
If a field is short, like the title, then a term appearing in that field is weighted higher.
If a field is very long, and a term appears many times in the field and many times across all of the texts in the corpus, then that term is weighted lower. This helps give a lower weight to words like “a”, “and”, or “the”.
Even if a field is very long, a term that appears infrequently across all of the texts in the corpus is weighted higher. This helps give a stronger weight to words like “hippopotamus” or “giraffe”.
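To make the three factors concrete, here is a minimal Python sketch of a classic TF/IDF-style score for one term in one field. It is a simplification for illustration only, not the code Elasticsearch runs: the function name field_score and the sample corpus are made up, and recent Elasticsearch versions use BM25 by default, which follows the same ideas with a different formula.

```python
import math

def field_score(term, field_tokens, corpus_fields):
    """Score one term against one field (simplified TF/IDF, for illustration)."""
    if not field_tokens:
        return 0.0
    # Term frequency: more occurrences in this field raise the score.
    tf = math.sqrt(field_tokens.count(term))
    # Inverse document frequency: terms found in many texts count for less.
    doc_freq = sum(1 for tokens in corpus_fields if term in tokens)
    idf = 1 + math.log(len(corpus_fields) / (doc_freq + 1))
    # Field-length norm: a match in a short field counts for more.
    norm = 1 / math.sqrt(len(field_tokens))
    return tf * idf * norm

# Hypothetical corpus: each entry is one tokenized field.
corpus = [
    ["the", "hippopotamus"],
    ["the", "river", "and", "the", "bank"],
    ["the", "giraffe", "and", "the", "tree"],
]

rare = field_score("hippopotamus", corpus[0], corpus)
common = field_score("the", corpus[0], corpus)
print(rare > common)  # the rare word outscores the common one
```

In this sketch, the same word also scores higher in a one-word title than in a ten-word field, because the length norm shrinks as the field grows.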
All three of these factors are combined to produce a score for each field in the document, and then the scores for each field are combined to assign a score to the entire document. On top of this, we can force certain fields to be given a greater “weight” than others.
Fields with a higher “weight” have more of an effect on the score than fields with a lower “weight”. The final document score is used to rank the document in the search results.
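As a rough illustration of field weights, the Python sketch below combines per-field scores as a weighted sum and builds a multi_match query body that uses the field^boost syntax. The field names (title, keywords, ocr_text) and the boost values are assumptions chosen for the example, not the actual configuration. The weighted sum is also a simplification: how Elasticsearch combines field scores depends on the query type, and most_fields is used here because it adds up the scores from each matching field.

```python
# Hypothetical field names and boost values, for illustration only.
FIELD_BOOSTS = {"title": 3.0, "keywords": 2.0, "ocr_text": 1.0}

def document_score(field_scores, boosts=FIELD_BOOSTS):
    """Combine per-field scores into one document score as a weighted sum."""
    return sum(boosts.get(name, 1.0) * score
               for name, score in field_scores.items())

def build_query(text, boosts=FIELD_BOOSTS):
    """Build a multi_match query body with per-field boosts."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                # most_fields combines the scores from every matching field.
                "type": "most_fields",
                # "title^3.0" means: multiply the title field's score by 3.
                "fields": [f"{name}^{boost}" for name, boost in boosts.items()],
            }
        }
    }

# The same raw score counts for more in the title than in the OCR text.
title_hit = document_score({"title": 0.5})
ocr_hit = document_score({"ocr_text": 0.5})
print(title_hit > ocr_hit)
```

The dictionary returned by build_query could be sent as the request body of a search call; raising a field's boost makes matches in that field move documents further up the results.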
Nitty gritty details: https://www.elastic.co/guide/en/elasticsearch/guide/current/scoring-theory.html
Tags: user interface, browse