Task #11340 (new)
Opened 11 years ago
Last modified 8 years ago
Improve Ricerca handling of negative examples — at Initial Version
Reported by: | spli | Owned by: | icao-berg-x |
---|---|---|---|
Priority: | major | Milestone: | Unscheduled |
Component: | API | Version: | 4.4.8 |
Keywords: | searcher | Cc: | analysis@…, pwalczysko |
Resources: | n.a. | Referenced By: | n.a. |
References: | n.a. | Remaining Time: | n.a. |
Sprint: | n.a. |
Description
The handling of negative results in Ricerca doesn't take into account the dissimilarity score, only the ranking:
https://github.com/icaoberg/ricerca/blob/17e0252b49d1f197553cd6d1319e143631ebf9f3/ricerca/content.py#L151
This means that when the negative samples are very spread out, and d([-],x) >> d([+],x) for many positive and negative samples, a negative image can still be ranked ahead of samples you would intuitively expect to be positive. See the 1D example below, where x1- is ranked above x2+ and x3+. This is probably an even bigger problem with high-dimensional data.
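[-] = negative reference, [+] = positive reference
x1, x4: negative samples
x2, x3: positive samples

Position | 0 | 9 | 13 | 16 | 29 | 55 |
---|---|---|---|---|---|---|
Label | - | + | [+] | + | [-] | - |
Sample | x1 | x2 | | x3 | | x4 |

Sample | d[+] | d[-] | rank+ | rank- | rev. rank- | avg rank |
---|---|---|---|---|---|---|
x1 | 13 | 29 | 3 | 4 | 1 | 2 |
x2 | 4 | 20 | 2 | 2 | 3 | 2.5 |
x3 | 3 | 13 | 1 | 1 | 4 | 2.5 |
x4 | 42 | 26 | 4 | 3 | 2 | 3 |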
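
The numbers in the 1D example can be reproduced with a short sketch. This is not Ricerca's code: `rank_based_scores` only mirrors the averaging of rank+ and reversed rank- used in the example, and `ratio_scores` is a hypothetical illustration of one way the dissimilarity values themselves could be used, not a proposed fix. All function and variable names here are assumptions made for this example.

```python
# Positions taken from the 1D example in this ticket.
POS_REF = 13  # [+] positive reference
NEG_REF = 29  # [-] negative reference
SAMPLES = {"x1": 0, "x2": 9, "x3": 16, "x4": 55}


def ranks(values, reverse=False):
    """Return 1-based ranks for a dict of values.

    Rank 1 goes to the smallest value, or to the largest if reverse=True.
    Ties are not handled specially (the example contains none).
    """
    ordered = sorted(values, key=values.get, reverse=reverse)
    return {name: i + 1 for i, name in enumerate(ordered)}


def rank_based_scores(samples, pos_ref, neg_ref):
    """Average of rank+ and reversed rank-, as in the example table."""
    d_pos = {k: abs(v - pos_ref) for k, v in samples.items()}
    d_neg = {k: abs(v - neg_ref) for k, v in samples.items()}
    rank_pos = ranks(d_pos)                    # closest to [+] gets rank 1
    rev_rank_neg = ranks(d_neg, reverse=True)  # farthest from [-] gets rank 1
    return {k: (rank_pos[k] + rev_rank_neg[k]) / 2 for k in samples}


def ratio_scores(samples, pos_ref, neg_ref):
    """Hypothetical score that uses the distances: d[+] / (d[+] + d[-]).

    Lower means relatively closer to the positive reference. Assumes no
    sample coincides with both references at once (non-zero denominator).
    """
    scores = {}
    for k, v in samples.items():
        d_pos = abs(v - pos_ref)
        d_neg = abs(v - neg_ref)
        scores[k] = d_pos / (d_pos + d_neg)
    return scores


if __name__ == "__main__":
    by_rank = rank_based_scores(SAMPLES, POS_REF, NEG_REF)
    by_ratio = ratio_scores(SAMPLES, POS_REF, NEG_REF)
    print("rank-based:", by_rank)
    print("ratio order:", sorted(by_ratio, key=by_ratio.get))
```

The rank-based scores match the avg rank column (x1 = 2, x2 = 2.5, x3 = 2.5, x4 = 3), so x1- comes out first. With the hypothetical ratio, the order becomes x2, x3, x1, x4, which places both positive samples ahead of x1-.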