18/03/2016 - ELASTICSEARCH
Elasticsearch scores query results on its own, so you don't have to tell it how. However, if you want to control how each record in the result is scored, it will score them the way you tell it to. For that we use the boost flag. It is like saying to a car salesman, "Although I like BMW and Mercedes equally, I would be happier with a BMW, so please do your best to show me all the BMW cars first".
When the score is calculated, a shorter field scores better than a longer one unless you modify boost_mode in your query. E.g. if you're searching for the keyword "hello", the record that contains just "hello" will score higher than the one that contains "hello world".
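To get a feel for why the shorter field wins, here is a minimal Python sketch of a field-length norm. It assumes the classic TF/IDF similarity, where the norm is roughly 1 / sqrt(number of terms in the field). The function name length_norm is made up for this illustration. Real Lucene stores a lossy, compressed version of this value, and other similarities such as BM25 normalise differently, so treat the numbers as an approximation only.

```python
import math


def length_norm(field_text: str) -> float:
    """Approximate classic TF/IDF length norm: 1 / sqrt(number of terms).

    Hypothetical helper for illustration; it is not an Elasticsearch API.
    """
    num_terms = len(field_text.split())
    return 1.0 / math.sqrt(num_terms)


# The field with fewer terms gets the larger norm, hence the better score.
norm_short = length_norm("hello")        # one term
norm_long = length_norm("hello world")   # two terms

print(norm_short, norm_long)
```

The norm is only one factor in the final score, so it explains the direction of the effect rather than the exact values Elasticsearch returns.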
+----+-------------------+
| id | title             |
+----+-------------------+
|  1 | one               |
|  2 | two               |
|  3 | three             |
|  4 | one two           |
|  5 | one three         |
|  6 | one two three     |
|  7 | two three         |
|  8 | none              | <- This will never appear in queries below
|  9 | one abc           |
| 10 | two abc           |
| 11 | three abc         |
| 12 | one two abc       |
| 13 | one two three abc |
| 14 | two three abc     |
+----+-------------------+
14 rows in set (0.00 sec)
The query below returns all the records that contain the "one", "two" or "three" keywords in the "title" field. It is also meant to make sure that records containing "three" score higher than those containing "two", and records containing "two" score higher than those containing "one".
curl -XPOST "http://127.0.0.1:9200/_search?post_dev" -d'
{
  "query": {
    "bool": {
      "must": {
        "match": {
          "title": {
            "query": "one two three",
            "operator": "or" /* if you remove this line, it would still run an OR query because that is the default behaviour */
          }
        }
      },
      "should": [
        {
          "match": {
            "title": {
              "query": "one",
              "boost": 1.5
            }
          }
        },
        {
          "match": {
            "title": {
              "query": "two",
              "boost": 2.5
            }
          }
        },
        {
          "match": {
            "title": {
              "query": "three",
              "boost": 3.5
            }
          }
        }
      ]
    }
  },
  "sort": [
    {
      "_score": {
        "order": "desc"
      }
    }
  ],
  "from": 0,
  "size": 100
}'
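Since Elasticsearch uses a complex scoring algorithm, it is sometimes hard to explain why some records score lower than others even though they contain more valuable keywords, and that applies to our example here. The "OUR _score" column is the sum of the boosts of the keywords each title contains, i.e. the ranking we were hoping for.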
+----+-------------------+-------------+------------+
| id | title             | ES _score   | OUR _score |
+----+-------------------+-------------+------------+
|  6 | one two three     | 0.9846948   | 7.5        |
| 13 | one two three abc | 0.9846948   | 7.5        |
|  4 | one two           | 0.5968839   | 4          |
|  3 | three             | 0.51140535  | 3.5        |
|  5 | one three         | 0.49692816  | 5          |
|  7 | two three         | 0.48417675  | 6          |
| 14 | two three abc     | 0.38734144  | 6          |
| 11 | three abc         | 0.35777172  | 3.5        |
| 12 | one two abc       | 0.31694633  | 4          |
| 10 | two abc           | 0.23624702  | 2.5        |
|  2 | two               | 0.1286011   | 2.5        |
|  1 | one               | 0.09882839  | 1.5        |
|  9 | one abc           | 0.08662249  | 1.5        |
+----+-------------------+-------------+------------+
13 rows in set (0.00 sec)
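The "OUR _score" values in the table look like the plain sum of the boosts of the keywords found in each title. A small Python sketch of that arithmetic is below. The names BOOSTS and expected_score are made up for this illustration, and this is only the ranking we hoped for, not how Elasticsearch computes _score for the bool query.

```python
# Boost per keyword, copied from the "should" clauses of the query.
BOOSTS = {"one": 1.5, "two": 2.5, "three": 3.5}


def expected_score(title: str) -> float:
    """Sum the boosts of the distinct boosted keywords in the title."""
    return sum(BOOSTS.get(word, 0.0) for word in set(title.split()))


titles = [
    "one", "two", "three", "one two", "one three", "one two three",
    "two three", "none", "one abc", "two abc", "three abc",
    "one two abc", "one two three abc", "two three abc",
]

# Rank the titles the way we would like the search result to be ordered.
ranked = sorted(titles, key=expected_score, reverse=True)

for title in ranked:
    print(f"{title:<18} {expected_score(title)}")
```

Comparing this ordering with the "ES _score" column makes the mismatch easy to spot, e.g. "one two" (4) is returned above "two three" (6).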
The query below returns all the records that contain the "one", "two" or "three" keywords in the "title" field. It is also meant to make sure that records containing "three" score better than those containing "two", and records containing "two" score better than those containing "one".
curl -XPOST "http://127.0.0.1:9200/_search?post_dev" -d'
{
  "query": {
    "bool": {
      "must": {
        "match": {
          "title": {
            "query": "one two three",
            "operator": "or" /* if you remove this line, it would still run an OR query because that is the default behaviour */
          }
        }
      },
      "should": [
        {
          "match": {
            "title": {
              "query": "one",
              "boost": 1.5
            }
          }
        },
        {
          "match": {
            "title": {
              "query": "two",
              "boost": 2.5
            }
          }
        },
        {
          "match": {
            "title": {
              "query": "three",
              "boost": 3.5
            }
          }
        }
      ]
    }
  },
  "sort": [
    {
      "_score": {
        "order": "desc"
      }
    }
  ],
  "from": 0,
  "size": 10
}'
Since Elasticsearch uses a complex scoring algorithm, it is sometimes hard to explain why some records score lower than others even though they contain more valuable keywords, and that applies to our example here.
+----+-------------------+-------------+------------+
| id | title             | ES _score   | OUR _score |
+----+-------------------+-------------+------------+
|  6 | one two three     | 0.762778    | 7.5        |
| 13 | one two three abc | 0.762778    | 7.5        |
|  4 | one two           | 0.43070683  | 4          |
|  7 | two three         | 0.3947725   | 6          |
|  5 | one three         | 0.38228768  | 5          |
|  3 | three             | 0.33812702  | 3.5        |
| 14 | two three abc     | 0.315818    | 6          |
| 11 | three abc         | 0.23125261  | 3.5        |
| 12 | one two abc       | 0.22994567  | 4          |
| 10 | two abc           | 0.15094957  | 2.5        |
|  2 | two               | 0.08401244  | 2.5        |
|  1 | one               | 0.057243854 | 1.5        |
|  9 | one abc           | 0.050172962 | 1.5        |
+----+-------------------+-------------+------------+
13 rows in set (0.00 sec)
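The query below returns all the records that contain the "one", "two" or "three" keywords in the "title" field, but this time it uses function_score. Each keyword filter carries a weight, the weights of the matching filters are summed (score_mode is "sum") and that sum replaces the relevance score (boost_mode is "replace"). This makes sure that records containing "three" score better than those containing "two", and records containing "two" score better than those containing "one".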
curl -XPOST "http://127.0.0.1:9200/_search?post_dev" -d'
{
  "query": {
    "function_score": {
      "query": {
        "match": {
          "title": "one two three"
        }
      },
      "functions": [
        {
          "filter": {
            "query": {
              "match": {
                "title": "one"
              }
            }
          },
          "weight": 1.5
        },
        {
          "filter": {
            "query": {
              "match": {
                "title": "two"
              }
            }
          },
          "weight": 2.5
        },
        {
          "filter": {
            "query": {
              "match": {
                "title": "three"
              }
            }
          },
          "weight": 3.5
        }
      ],
      "score_mode": "sum",
      "boost_mode": "replace"
    }
  },
  "sort": [
    {
      "_score": {
        "order": "desc"
      }
    }
  ],
  "from": 0,
  "size": 100
}'
There we go, we got what we wanted. The result is ordered as we expected.
+----+-------------------+-------------+------------+
| id | title             | ES _score   | OUR _score |
+----+-------------------+-------------+------------+
|  6 | one two three     | 7.5         | 7.5        |
| 13 | one two three abc | 7.5         | 7.5        |
|  7 | two three         | 6           | 6          |
| 14 | two three abc     | 6           | 6          |
|  5 | one three         | 5           | 5          |
|  4 | one two           | 4           | 4          |
| 12 | one two abc       | 4           | 4          |
| 11 | three abc         | 3.5         | 3.5        |
|  3 | three             | 3.5         | 3.5        |
|  2 | two               | 2.5         | 2.5        |
| 10 | two abc           | 2.5         | 2.5        |
|  1 | one               | 1.5         | 1.5        |
|  9 | one abc           | 1.5         | 1.5        |
+----+-------------------+-------------+------------+
13 rows in set (0.00 sec)
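To see why score_mode "sum" together with boost_mode "replace" gives exactly these numbers, here is a simplified Python model of how the function_score query above combines its parts. The names WEIGHTED_FILTERS and combined_score are made up for this illustration, only a few of the available modes are modelled, and the case where no filter matches is deliberately left out, so this is a sketch of the idea rather than Elasticsearch's implementation.

```python
# (keyword, weight) pairs, copied from the "functions" list of the query.
WEIGHTED_FILTERS = [("one", 1.5), ("two", 2.5), ("three", 3.5)]


def combined_score(title: str, query_score: float,
                   score_mode: str = "sum",
                   boost_mode: str = "replace") -> float:
    """Simplified model: combine matching weights, then merge with query score."""
    words = set(title.split())
    weights = [weight for keyword, weight in WEIGHTED_FILTERS if keyword in words]
    if not weights:
        raise ValueError("no filter matched; this case is not modelled here")

    # score_mode decides how the matching function weights are combined.
    if score_mode == "sum":
        function_score = sum(weights)
    elif score_mode == "max":
        function_score = max(weights)
    else:
        raise ValueError(f"score_mode not modelled: {score_mode}")

    # boost_mode decides how that value is merged with the query's own score.
    if boost_mode == "replace":
        return function_score
    if boost_mode == "multiply":
        return query_score * function_score
    raise ValueError(f"boost_mode not modelled: {boost_mode}")


# With "replace" the original relevance score no longer matters.
print(combined_score("two three abc", query_score=0.38734144))  # 6.0
print(combined_score("one", query_score=0.09882839))            # 1.5
```

With "replace" the field length and term frequency drop out entirely, which is why "two three" and "two three abc" end up with the same score of 6.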