Excellent explanation of the concept. Very informative and helpful.
|
|
|
|
 |
|
 |
I want to run this classifier over the Reuters-21578 collection.
As you know, the collection contains both training (learning) stories and test stories,
and a Bayesian classifier works the same way (I mean you first have to do
some training, then some testing, and then run it in the real world). In your program I don't
understand how you are doing this!
I mean, what kind of addressing does your system use to pick up the training and
test stories?
|
|
|
|
 |
|
 |
Hi
I am developing an e-mail filter for Microsoft Outlook, and to filter on body content I hope to use a Naive Bayesian classifier. Your code seems very useful to me, but I don't understand its output. Please help me.
|
|
|
|
 |
|
 |
Given that several categories were loaded with the test file why would the probabilities of a match with that category be as low as your results suggest? How do you verify the scores you calculate as being accurate? I know that sorting them will rank them in the order "most like" but I just want to have a better feel for the numbers that come out of the Classify function...
Also do you care to comment on my other post to this forum?
Thanks
Keith
|
|
|
|
 |
|
 |
In the forum notes below the author stated that the returned score is really the logarithm of the Bayesian probability. For those of you who don't like math, you can undo the logarithm by raising 10 to the power of the value returned by this program (this assumes a base-10 logarithm; if the classifier uses natural logarithms, use Math.Exp(score[c]) instead).
So if you got a returned value like Cat1: -0.30102999566, you would take 10^-0.30102999566, which roughly equals 0.5, or 50%.
In the buttonTest_Click event handler of the BayesClassifierDemo you can change the values back to proportions by using Math.Pow(10, score[c]), which raises 10 to the power of score[c], the value returned by the classifier.
It's a shame the guy who wrote this couldn't have added that simple step; it would have removed a lot of other people's confusion.
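To make the conversion above concrete, here is a minimal sketch. `ScoreConversion` and `ToProportions` are hypothetical names and not part of the article's code, and the log base is passed in as a parameter because base 10 is only an assumption about what the classifier uses.

```csharp
using System;
using System.Collections.Generic;

public static class ScoreConversion
{
    // Turns log scores back into raw proportions by raising logBase to each score.
    // logBase is an assumption about the classifier: pass 10 for base-10 logs,
    // Math.E for natural logs.
    public static Dictionary<string, double> ToProportions(
        IDictionary<string, double> logScores, double logBase)
    {
        var result = new Dictionary<string, double>();
        foreach (KeyValuePair<string, double> entry in logScores)
        {
            result[entry.Key] = Math.Pow(logBase, entry.Value);
        }
        return result;
    }
}
```

With the example from above, a score of -0.30102999566 and a base of 10 comes back as roughly 0.5. For long documents the raw values will usually be extremely small, so they are mostly useful for comparing categories against each other.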
|
|
|
|
 |
|
 |
Did you study this project much? I applied your suggestion and all of the results went to very small numbers (10^ -29 ish). Given that the test file was used to also load categories 2,3,4 why would the probabilities be so close to 0?
|
|
|
|
 |
|
 |
Very good, straight to the point, useful article. I need to implement this using plain old vanilla VC++ (6.0). Did anybody try that? I have not found any other VC++ code samples even close to this. I have no real desire to switch to C# at this point. The "interface" got me stumped. Where can I find some examples of what the equivalent of "interface" would be in VC++? Inheritance, perhaps? Thanks for reading. Vaclav
|
|
|
|
 |
|
 |
What precisely does the score number mean? I understand that the smaller |score| is, the better the match, but a number of, say, 20 or 27 is completely arbitrary. For example, let's say cat1 = 20 and cat2 = 27. The only thing this tells me is that, comparatively, there is a difference between the subject data, the cat1 data and the cat2 data. It tells me nothing about the degree of difference. Is there a way to convert these numbers into probabilities? For example, SpamAssassin uses Bayesian classification to determine the probability that an email is spam. Somehow it returns a percent probability for each piece of email. Is such a thing possible with your code? I have to admit I know very little about the math of Bayesian classification.
|
|
|
|
 |
|
 |
Hello, I have a simple question. Please contact me at yonido[@]slimail.com
It cannot be asked here.
|
|
|
|
 |
|
 |
I have a pretty basic question...
I see that you are classifying the file with
Dictionary<string, double> score = m_Classifier.Classify(new System.IO.StreamReader(file));
but can you please explain the part
'Dictionary<string, double> score'?
|
|
|
|
 |
|
 |
Sorry, I was confused about the part...
<string, double>
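For anyone else puzzled by that part: `<string, double>` just means the dictionary maps a category name (a string) to that category's score (a double). A minimal sketch of reading such a result follows. `ScoreReader` and `BestCategory` are hypothetical names, not part of the article's code, and the sketch assumes the scores are log-probabilities, so the highest value (the one closest to zero) is treated as the best match.

```csharp
using System;
using System.Collections.Generic;

public static class ScoreReader
{
    // Returns the key (category name) with the highest score, or null if the
    // dictionary is empty. Assumes higher means better, which holds for
    // log-probabilities.
    public static string BestCategory(IDictionary<string, double> score)
    {
        string best = null;
        double bestScore = double.NegativeInfinity;
        foreach (KeyValuePair<string, double> entry in score)
        {
            if (entry.Value > bestScore)
            {
                bestScore = entry.Value;
                best = entry.Key;
            }
        }
        return best;
    }
}
```

You could pass the dictionary returned by `Classify` straight into `ScoreReader.BestCategory(score)`, or simply loop over it with `foreach` and print `entry.Key` and `entry.Value`.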
|
|
|
|
 |
|
 |
I have some questions and suggestions concerning your Bayesian Classifier and would really appreciate it, if you would contact me via email in German: malte [at] stecki.de
|
|
|
|
 |
|
 |
I am working on pixel classification of an image. Please give me suggestions on how to use this Bayes classifier.
kamalakar
|
|
|
|
 |
|
 |
I'm certainly not an expert in image classification, but I have the feeling that this classifier is not a perfect fit for that task:
1) The Bayes classifier does not take any kind of proximity (or context) into account.
2) It only works for groups of discrete symbols (i.e. words) and their occurrence. Pixels probably have groups of values, and might need to be fuzzified first.
Sorry
|
|
|
|
 |
|
 |
for example:
m_Classifier.TeachCategory("Cat1", new System.IO.StreamReader(TrainFile));
...
Dictionary<string, double> score = m_Classifier.Classify(new System.IO.StreamReader(TestFile));
I use different training files to train the classifier, and I want to figure out the similarity between the training file and TestFile. But the result is -7164.342439***, and I am confused by that figure. How could that be? Could you please give me some tips? Thank you.
|
|
|
|
 |
|
 |
As stated below, the final step involves calculating log(P(Cati|Doc)). P(Cati|Doc) is always < 1, and the logarithm of a number below 1 is negative.
In general, the smaller the number the better the match. Interestingly, the match with Cat4 in the sample is better than the exact matches Cat1 and Cat3 - but that is only due to the doubled training data.
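To see why a figure like -7164 is not unusual, here is a minimal sketch of how a naive Bayes log score is typically built up. This is not the article's actual code: `LogScoreSketch`, `LogScore`, the add-one smoothing and the base-10 logarithm are all assumptions made for illustration, and the category prior is left out. Each word contributes the logarithm of a probability below 1, so every word adds a negative term and long documents end up with large negative totals.

```csharp
using System;
using System.Collections.Generic;

public static class LogScoreSketch
{
    // Sums log10 of the add-one smoothed probability of each document word
    // under a single category. categoryCounts holds word counts from training.
    public static double LogScore(
        IEnumerable<string> docWords,
        IDictionary<string, int> categoryCounts,
        int vocabularySize)
    {
        int totalWords = 0;
        foreach (int c in categoryCounts.Values)
        {
            totalWords += c;
        }

        double score = 0.0;
        foreach (string word in docWords)
        {
            int count;
            categoryCounts.TryGetValue(word, out count); // count stays 0 for unseen words
            double p = (count + 1.0) / (totalWords + vocabularySize);
            score += Math.Log10(p); // p < 1, so this term is negative
        }
        return score;
    }
}
```

In this sketch a single word with probability 0.5 scores about -0.301, and a document of 1000 such words scores about -301, so the magnitude mostly reflects document length rather than a poor match.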
|
|
|
|
 |
|
 |
By "the smaller the number the better the match", do you mean that the value of the number itself is smaller, or the number without the "-" (minus sign)? The scores are negative (e.g. -5 < -2), and my tests do not clearly reveal which match is better, so I'm a little confused.
|
|
|
|
 |
|
 |
My first question:
Why do the results tend to zero, and why are they not normalized?
My second question:
Why don't you report that cat1 is relevant with 10%, cat2 with 20%, and so on?
My last question:
Is Naive Bayes suitable for story link detection, i.e. assigning all the relevant documents to category 1 and everything else to category 2?
Thanks a lot
AG
|
|
|
|
 |
|
 |
1) The final step involves calculating log(P(Cati|Doc)). P(Cati|Doc) is always < 1, and the logarithm of a number below 1 is negative.
You don't get absolute values for a category, just values relative to the other categories.
2) This can be easily recalculated - I was just interested in finding the best match.
3) I did not get the point - Sorry
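Regarding point 2, one common way to recalculate the scores into percentages is to rescale them so that they sum to 1 across the categories. The sketch below is an illustration under assumptions, not the article's code: `ScoreNormalizer` and `Normalize` are hypothetical names, and the log base (10 or e) has to match whatever the classifier uses. Subtracting the largest score before exponentiating avoids underflow for values like -7164.

```csharp
using System;
using System.Collections.Generic;

public static class ScoreNormalizer
{
    // Converts log scores into relative weights that sum to 1.
    // logBase is an assumption about the classifier (10 or Math.E).
    public static Dictionary<string, double> Normalize(
        IDictionary<string, double> logScores, double logBase)
    {
        // Find the best (largest) log score.
        double max = double.NegativeInfinity;
        foreach (double s in logScores.Values)
        {
            if (s > max) max = s;
        }

        // Exponentiate the differences to the best score; the best gets weight 1.
        var weights = new Dictionary<string, double>();
        double total = 0.0;
        foreach (KeyValuePair<string, double> entry in logScores)
        {
            double w = Math.Pow(logBase, entry.Value - max);
            weights[entry.Key] = w;
            total += w;
        }

        // Divide by the sum so the results add up to 1.
        var result = new Dictionary<string, double>();
        foreach (KeyValuePair<string, double> entry in weights)
        {
            result[entry.Key] = entry.Value / total;
        }
        return result;
    }
}
```

The resulting numbers are only relative to the categories that were trained, and with naive Bayes they tend to come out very close to 0 or 1, so they should be read as a ranking aid rather than as calibrated probabilities.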
|
|
|
|
 |
|
 |
I gave you 5 stars, although your description is somewhat sparse (some might not fully understand the
probability formula on Wikipedia); however, this is the first Naive Bayesian Classifier I have seen posted.
|
|
|
|
 |
|
 |
Sorry about the briefness - but I was in a bit of a hurry and I hoped the sources would be sufficient.