java - Understanding algorithm - Multinomial Naive Bayes


I have been introduced to the Naive Bayes classification method (multinomial NB), with reference to how it is described by Michael Sipser in his book "The Theory of Computation".

I am looking at the algorithm described for both training and applying multinomial NB, presented as follows:

[image: pseudocode for training and applying multinomial NB]

However, I'm at a loss when interpreting some aspects of the algorithm. For instance, in trainmultinomialnb(c, d) on line 6:

  • What does concatenate_text_of_all_docs_in_class(d, c) do?

So far, I understand it as follows. Suppose we have three documents in each of the classes "movies" and "songs":

movies
    doc1 = "big fish"
    doc2 = "big lebowski"
    doc3 = "mystic river"

songs
    doc1 = "purple rain"
    doc2 = "crying in rain"
    doc3 = "anaconda"

After applying concatenate_text_of_all_docs_in_class(d, c), I am left with the following strings:

String concatenatedMovies = "big fish big lebowski mystic river";
String concatenatedSongs = "purple rain crying in rain anaconda";

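Is that right? Any help understanding this would be highly appreciated.

In the end, you want to be able to classify a text based on its content. You want to be able to tell whether it is about songs or movies, etc.
In order to do that with Bayes (or any other method), you first use training data to build a model.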

First, you create the priors (docs in class / total docs) on line 5. Then you compute the conditional probabilities (the probability of the word "fish" given the class movies, the probability of the word "rain" given the class songs) on lines 7-10. You simply divide the occurrences of a term by the total number of terms in the class (plus smoothing -> +1). That is why you concatenate: to be able to count the occurrences of each term in the class.
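To make the counting concrete, here is a minimal Java sketch using the movies/songs example from the question. The class and method names (TrainSketch, concatenate, countTerms, condProb) are made up for illustration and are not taken from the pseudocode. It also assumes the usual add-one (Laplace) form of smoothing, where 1 is added to each term count and the vocabulary size is added to the denominator, and it splits text on whitespace only.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper class; the names are illustrative, not from the pseudocode.
final class TrainSketch {

    // Join all documents of one class into a single text.
    static String concatenate(List<String> docsInClass) {
        return String.join(" ", docsInClass);
    }

    // Count how often each whitespace-separated term occurs in the text.
    static Map<String, Integer> countTerms(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String term : text.trim().split("\\s+")) {
            if (!term.isEmpty()) {
                counts.merge(term, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Assumed add-one smoothing: (count + 1) / (termsInClass + vocabularySize).
    static double condProb(Map<String, Integer> counts, String term, int vocabularySize) {
        int termsInClass = counts.values().stream().mapToInt(Integer::intValue).sum();
        return (counts.getOrDefault(term, 0) + 1.0) / (termsInClass + vocabularySize);
    }

    public static void main(String[] args) {
        List<String> movies = List.of("big fish", "big lebowski", "mystic river");
        List<String> songs = List.of("purple rain", "crying in rain", "anaconda");

        Map<String, Integer> movieCounts = countTerms(concatenate(movies));
        Map<String, Integer> songCounts = countTerms(concatenate(songs));

        // Vocabulary = distinct terms over all classes.
        Set<String> vocabulary = new HashSet<>(movieCounts.keySet());
        vocabulary.addAll(songCounts.keySet());

        // Prior = docs in class / total docs.
        double priorMovies = (double) movies.size() / (movies.size() + songs.size());

        System.out.println("prior(movies) = " + priorMovies);
        System.out.println("P(big | movies) = " + condProb(movieCounts, "big", vocabulary.size()));
        System.out.println("P(rain | movies) = " + condProb(movieCounts, "rain", vocabulary.size()));
    }
}
```

With six terms per class and ten distinct terms overall, "big" (which occurs twice in movies) gets (2 + 1) / (6 + 10), while "rain" (which never occurs in movies) still gets a non-zero 1 / 16 thanks to the smoothing.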
In the end, you plug these values into the Bayes formula and can categorize an unknown document as movies, songs, ... More on the wiki.
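As a rough sketch of that last step, the Java below scores an unknown text against each class and picks the best one. The names (ApplySketch, classify) are hypothetical. The sketch assumes add-one smoothing, whitespace tokenization, and that terms never seen in training are simply skipped. It sums logarithms instead of multiplying probabilities, which is a common way to avoid numeric underflow and picks the same class.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical class and method names, used for illustration only.
final class ApplySketch {

    // Trains on docsByClass (class label -> documents) and returns the label with
    // the highest log prior + sum of log conditional probabilities for the text.
    static String classify(Map<String, List<String>> docsByClass, String text) {
        Map<String, Map<String, Integer>> countsByClass = new LinkedHashMap<>();
        Map<String, Integer> termTotals = new HashMap<>();
        Set<String> vocabulary = new HashSet<>();
        int totalDocs = 0;

        for (Map.Entry<String, List<String>> entry : docsByClass.entrySet()) {
            Map<String, Integer> counts = new HashMap<>();
            int total = 0;
            // Concatenate the class's documents, then count term occurrences.
            for (String term : String.join(" ", entry.getValue()).split("\\s+")) {
                if (term.isEmpty()) {
                    continue;
                }
                counts.merge(term, 1, Integer::sum);
                total++;
            }
            countsByClass.put(entry.getKey(), counts);
            termTotals.put(entry.getKey(), total);
            vocabulary.addAll(counts.keySet());
            totalDocs += entry.getValue().size();
        }

        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : countsByClass.keySet()) {
            // Start from the log prior: docs in class / total docs.
            double score = Math.log((double) docsByClass.get(label).size() / totalDocs);
            for (String term : text.trim().split("\\s+")) {
                if (!vocabulary.contains(term)) {
                    continue; // assumption: skip terms never seen in training
                }
                int count = countsByClass.get(label).getOrDefault(term, 0);
                // Assumed add-one smoothed probability of the term given the class.
                score += Math.log((count + 1.0) / (termTotals.get(label) + vocabulary.size()));
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, List<String>> docs = new LinkedHashMap<>();
        docs.put("movies", List.of("big fish", "big lebowski", "mystic river"));
        docs.put("songs", List.of("purple rain", "crying in rain", "anaconda"));

        System.out.println(classify(docs, "purple rain")); // prints "songs"
        System.out.println(classify(docs, "big fish"));    // prints "movies"
    }
}
```

Retraining on every call keeps the sketch short. In practice you would build the counts once and reuse them for every document you classify.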

