java - Understanding algorithm - Multinomial Naive Bayes
I have been introduced to the naive Bayes classification method (multinomial NB), with reference to how it is described by Michael Sipser in his book "The Theory of Computation".
I am looking at the algorithm described for both training and applying multinomial NB, which is presented as follows:
However, I'm at a loss when interpreting some aspects of the algorithm. For instance, in trainmultinomialnb(c, d) on line 6:
- What does concatenate_text_of_all_docs_in_class(d, c) do?
So far, I understand it as follows. Suppose we have three documents in each of the classes "movies" and "songs":
movies
doc1 = "big fish"
doc2 = "big lebowski"
doc3 = "mystic river"

songs
doc1 = "purple rain"
doc2 = "crying in rain"
doc3 = "anaconda"
After applying concatenate_text_of_all_docs_in_class(d, c), we would be left with, say, these strings:
String concatenatedmovies = "big fish big lebowski mystic river";
String concatenatedsongs = "purple rain crying in rain anaconda";
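Is this right? Any help understanding this is highly appreciated.
In the end, you want to be able to classify some text based on its content. You want to be able to say whether it is about songs or movies, etc.
In order to do this with Bayes (or some other method), you first use your training data to build a model.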
First, you create the priors (docs in class / total docs) on line 5. Then you compute the conditional probabilities (the probability of the word fish given the class movies, the probability of the word rain given the class songs), lines 7-10. You simply divide the occurrences of the term by the total number of terms in the class (plus some smoothing -> +1). That is why you concatenate: to be able to count all the occurrences of a term in a class.
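To make the training step concrete, here is a minimal Java sketch using the example documents from the question. The class and method names (MultinomialNbTraining, concatenate, countTerms, condProb) are made up for illustration and are not taken from the pseudocode. It also assumes the usual add-one (Laplace) form of smoothing, where 1 is added to the term count and the vocabulary size is added to the denominator, and it splits text on whitespace only.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class MultinomialNbTraining {

    // Joins every document of one class into a single string (hypothetical helper).
    static String concatenate(List<String> docsInClass) {
        return String.join(" ", docsInClass);
    }

    // Counts how often each whitespace-separated term occurs in the text.
    static Map<String, Integer> countTerms(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String term : text.trim().split("\\s+")) {
            if (!term.isEmpty()) {
                counts.merge(term, 1, Integer::sum);
            }
        }
        return counts;
    }

    // P(term | class) with add-one smoothing (assumed form):
    // (count of term in class + 1) / (total terms in class + vocabulary size).
    static double condProb(String term, Map<String, Integer> classCounts, int vocabularySize) {
        int total = 0;
        for (int count : classCounts.values()) {
            total += count;
        }
        return (classCounts.getOrDefault(term, 0) + 1.0) / (total + vocabularySize);
    }

    public static void main(String[] args) {
        List<String> movies = List.of("big fish", "big lebowski", "mystic river");
        List<String> songs = List.of("purple rain", "crying in rain", "anaconda");

        // Prior: docs in class / total docs.
        double priorMovies = (double) movies.size() / (movies.size() + songs.size());

        Map<String, Integer> movieCounts = countTerms(concatenate(movies));
        Map<String, Integer> songCounts = countTerms(concatenate(songs));

        // Vocabulary: every distinct term seen in any class.
        Set<String> vocabulary = new HashSet<>(movieCounts.keySet());
        vocabulary.addAll(songCounts.keySet());

        System.out.println("P(movies) = " + priorMovies);
        System.out.println("P(big | movies) = " + condProb("big", movieCounts, vocabulary.size()));
        System.out.println("P(rain | movies) = " + condProb("rain", movieCounts, vocabulary.size()));
    }
}
```

With 6 terms per class and 10 distinct terms overall, this gives a prior of 0.5, P(big | movies) = 3/16 = 0.1875 and P(rain | movies) = 1/16 = 0.0625. Note that "rain" never occurs in the movies class, yet smoothing keeps its probability above zero.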
In the end, you plug these values into the Bayes formula and you can categorize an unknown document as movies, songs, ... More on the wiki.
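The classification step could look roughly like the sketch below. It is an illustration, not the exact pseudocode: the names (MultinomialNbApply, classify) are hypothetical, the probabilities are written in by hand for a few terms of the example (6 terms per class, 10 distinct terms, add-one smoothing assumed), and it sums logarithms instead of multiplying probabilities, which picks the same class while avoiding underflow on long documents.

```java
import java.util.Map;

class MultinomialNbApply {

    // Returns the class with the highest score: log P(c) + sum of log P(term | c).
    // Terms without a stored probability are skipped, which is a simplification.
    static String classify(String text,
                           Map<String, Double> priors,
                           Map<String, Map<String, Double>> condProbs) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> prior : priors.entrySet()) {
            double score = Math.log(prior.getValue());
            Map<String, Double> probs = condProbs.get(prior.getKey());
            for (String term : text.trim().split("\\s+")) {
                Double p = probs.get(term);
                if (p != null) {
                    score += Math.log(p);
                }
            }
            if (score > bestScore) {
                bestScore = score;
                best = prior.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, Double> priors = Map.of("movies", 0.5, "songs", 0.5);
        // Hand-written smoothed values for a few terms of the example documents.
        Map<String, Map<String, Double>> condProbs = Map.of(
            "movies", Map.of("big", 3.0 / 16, "rain", 1.0 / 16, "purple", 1.0 / 16),
            "songs", Map.of("big", 1.0 / 16, "rain", 3.0 / 16, "purple", 2.0 / 16));

        System.out.println(classify("purple rain", priors, condProbs));
    }
}
```

For the text "purple rain" the songs class scores higher, because both terms are more probable under songs than under movies, so the sketch prints songs.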