Trouble understanding the data field in the MALLET Instance object


I'm currently working on a project that uses the CsvIterator from the MALLET API to create an InstanceList. However, I'm not quite sure how the data field in the MALLET Instance object is supposed to be formatted. I'm attempting to write the data to be parsed as a line of a text file.

I understand that the data field is typically a FeatureVector object in an InstanceList, but I'm not sure what the CsvIterator is looking for.

Thanks.

For classification or topic modeling, the "data" field in the input file should be the original document with newline characters replaced by spaces, so that each document occupies a single line.
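As a rough illustration of that one-document-per-line layout, the sketch below flattens a multi-line document into a single "name label data" line and then splits it back into its three fields. It uses only the Java standard library and does not call MALLET. The class and method names are hypothetical, and the regular expression is an assumption based on what I believe the default line pattern of MALLET's command-line importer looks like (group 1 = name, group 2 = label, group 3 = data); check it against the pattern you actually pass to CsvIterator.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of MALLET.
class InstanceLineSketch {
    // Assumed line pattern: name, then label, then the rest of the line as data.
    static final Pattern LINE =
        Pattern.compile("^(\\S*)[\\s,]*(\\S*)[\\s,]*(.*)$");

    // Collapse all whitespace runs (including newlines) in the document to
    // single spaces, then join name, label and data with tabs.
    static String toLine(String name, String label, String document) {
        String data = document.replaceAll("\\s+", " ").trim();
        return name + "\t" + label + "\t" + data;
    }

    // Split one input line back into {name, label, data}.
    static String[] parseLine(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unparseable line: " + line);
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String line = toLine("doc1", "sports", "The game\nwent well.");
        String[] fields = parseLine(line);
        System.out.println("name:  " + fields[0]);
        System.out.println("label: " + fields[1]);
        System.out.println("data:  " + fields[2]);
    }
}
```

Note that the name and label must not contain whitespace under this pattern, because each is matched as a run of non-space characters.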

How MALLET interprets the "data" field is determined by the Pipes you use. These classes define the rules that convert a String input into a FeatureVector.

The default behavior implemented in the Csv2Vectors class, for example, divides the string into tokens based on a regular expression, and then converts each token string into a feature in a data alphabet. There are Pipe objects for many common transformations, such as lower-casing and stopword removal.
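To make those steps concrete, here is a minimal stand-alone sketch of the same idea: tokenize with a regular expression, lower-case, drop stopwords, and map each remaining token to an index in a growing alphabet. It deliberately does not use MALLET; in MALLET these steps correspond roughly to pipes such as CharSequence2TokenSequence, TokenSequenceLowercase, TokenSequenceRemoveStopwords and TokenSequence2FeatureSequence. The class name, method names and token pattern below are all hypothetical choices made for illustration, not MALLET's defaults.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for a pipe sequence, not part of MALLET.
class PipeSketch {
    // Assumed token pattern: any run of Unicode letters.
    static final Pattern TOKEN = Pattern.compile("\\p{L}+");

    // Split the data string into lower-cased tokens.
    static List<String> tokenize(String data) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(data);
        while (m.find()) {
            tokens.add(m.group().toLowerCase(Locale.ROOT));
        }
        return tokens;
    }

    // Keep only the tokens that are not in the stopword set.
    static List<String> removeStopwords(List<String> tokens, Set<String> stopwords) {
        List<String> kept = new ArrayList<>();
        for (String t : tokens) {
            if (!stopwords.contains(t)) {
                kept.add(t);
            }
        }
        return kept;
    }

    // Look up (or assign) an index for each token in the alphabet and
    // count occurrences per index, giving a sparse feature vector.
    static Map<Integer, Integer> toCounts(List<String> tokens, Map<String, Integer> alphabet) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (String t : tokens) {
            Integer index = alphabet.get(t);
            if (index == null) {
                index = alphabet.size();
                alphabet.put(t, index);
            }
            counts.merge(index, 1, Integer::sum);
        }
        return counts;
    }
}
```

Passing the same alphabet map to every document keeps feature indices consistent across the whole collection, which is the role the data alphabet plays in an InstanceList.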

