I've been using Lucene for a while and it works fine, but recently I found a wrong result set that I really need to resolve, or the search module of my system will be doomed. The query the user wrote was:
term1 NOT term2 AND term3
The user meant "find documents that have term1 and term3, but not term2". Of course, the way the user wrote it is a little confusing. But let's look at the facts. If Lucene considers the query valid, it could interpret it as:
option A: term1 NOT (term2 AND term3)
option B: (term1 NOT term2) AND term3
Option A should return documents that have term1, excluding any that have term2 and term3 simultaneously. That would include documents with term1 and term2, or with term1 and term3.
Option B should return documents that have both term1 and term3, but never term2.
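To make the two readings concrete, here is a small plain-Java sketch of what I would expect each option to match. It does not use Lucene at all: each document is just modelled as a set of terms, and the class and method names (BooleanReadings, optionA, optionB, doc) are made up for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class BooleanReadings {
    // Option A: term1 NOT (term2 AND term3)
    // Matches documents with term1 unless they contain both term2 and term3.
    static boolean optionA(Set<String> terms) {
        return terms.contains("term1")
                && !(terms.contains("term2") && terms.contains("term3"));
    }

    // Option B: (term1 NOT term2) AND term3
    // Matches documents with term1 and term3 that do not contain term2.
    static boolean optionB(Set<String> terms) {
        return terms.contains("term1")
                && !terms.contains("term2")
                && terms.contains("term3");
    }

    // Hypothetical helper: builds a "document" as a set of terms.
    static Set<String> doc(String... terms) {
        return new HashSet<>(Arrays.asList(terms));
    }

    public static void main(String[] args) {
        Set<String> d = doc("term1", "term2");
        System.out.println("term1+term2 -> A: " + optionA(d) + ", B: " + optionB(d));
    }
}
```

Under either reading a matching document must contain term1, which is why a result set with no term1 at all looks wrong to me.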
If Lucene had produced results according to Option A or Option B, I would have been satisfied, because it would only be a question of how we should use the connectors. The problem is that Lucene returned a small number of documents, none of which contained term1 in its text, something totally disconnected from options A and B.
I tried all other combinations and subsets of the query, and all of them returned perfectly logical results.
I could also get the result the user wanted when I used the following query:
term1 AND term3 NOT term2
The result was ZERO docs. I don't know if it was a coincidence, or if this "zero docs" has something to do with the wrong result from the user's query.
I'm using Lucene 3.4.0, and maybe I should upgrade to the newest version, but I know I'll have to change the way some methods are used, and that will take time I don't have right now.
Does anyone have a clue?
Thank you in advance.