Trouble understanding the data field in a MALLET Instance object


I'm currently working on a project that uses the CsvIterator in the MALLET API to create an InstanceList. However, I'm not quite sure how the data field in a MALLET Instance object is supposed to be formatted. I'm attempting to write the data that will be parsed from a line of a text file.

I understand that the data field is typically a FeatureVector object in an InstanceList, but I'm not sure what CsvIterator is looking for.

Thanks.

For classification or topic modeling, the "data" field in the input file should be the original document, with newline characters replaced by spaces.
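To make the one-document-per-line format concrete, here is a minimal plain-Java sketch that does not call MALLET at all. It flattens a document onto a single line and then splits that line into name, label and data with a regular expression, which is roughly the job CsvIterator does with the line pattern and group numbers you give it. The class name, the method names and the particular pattern are assumptions made for illustration, so check them against your own input format.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class InstanceLineSketch {
    // Assumed layout: name, then label, then everything else on the line as data.
    static final Pattern LINE = Pattern.compile("^(\\S*)[\\s,]*(\\S*)[\\s,]*(.*)$");

    // Build one input line: the whole document goes in the data field,
    // with every line break (and surrounding whitespace) replaced by one space.
    static String toInputLine(String name, String label, String document) {
        String flat = document.replaceAll("\\s*\\R\\s*", " ").trim();
        return name + " " + label + " " + flat;
    }

    // Split a line back into { name, label, data }.
    static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("Line does not match the expected layout: " + line);
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String line = toInputLine("doc1", "sports", "The match ended\nin a draw.");
        String[] fields = parse(line);
        System.out.println("name  = " + fields[0]);
        System.out.println("label = " + fields[1]);
        System.out.println("data  = " + fields[2]);
    }
}
```

At this stage the data field is still just a string. Turning it into features is a separate step.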

How MALLET understands the "data" field is determined by the Pipes you use. These classes define the rules for converting the string input into a FeatureVector.

The default behavior implemented in the Csv2Vectors class, for example, divides the string into tokens based on a regular expression and converts each token string into a feature in the data alphabet. There are Pipe objects for many common transformations, such as lower-casing and stopword removal.
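The chain of transformations can be imitated in plain Java to show what ends up in the data field. This is only a sketch of the idea, not MALLET's own Pipe classes: the token pattern, the tiny stopword list and the names `PipeSketch` and `toFeatureCounts` are all hypothetical. It tokenizes the string with a regular expression, lower-cases each token, drops stopwords, assigns each remaining token an index in a growing alphabet, and returns the counts per index.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PipeSketch {
    // Assumed token pattern: runs of letters. A real setup may use a different one.
    static final Pattern TOKEN = Pattern.compile("\\p{L}+");
    // Illustrative stopword list only.
    static final Set<String> STOPWORDS = new HashSet<>(Arrays.asList("the", "a", "in"));

    // Stand-in for the data alphabet: token string -> feature index.
    private final Map<String, Integer> alphabet = new LinkedHashMap<>();

    // Convert the raw data string into feature index -> count.
    Map<Integer, Integer> toFeatureCounts(String data) {
        Map<Integer, Integer> counts = new TreeMap<>();
        Matcher m = TOKEN.matcher(data);
        while (m.find()) {
            String token = m.group().toLowerCase(Locale.ROOT); // lower-casing step
            if (STOPWORDS.contains(token)) {
                continue;                                      // stopword removal step
            }
            Integer index = alphabet.get(token);
            if (index == null) {                               // new token: grow the alphabet
                index = alphabet.size();
                alphabet.put(token, index);
            }
            counts.merge(index, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        PipeSketch pipe = new PipeSketch();
        // "match" -> 0, "ended" -> 1, "draw" -> 2
        System.out.println(pipe.toFeatureCounts("The match ended in a draw, the match"));
        // prints {0=2, 1=1, 2=1}
    }
}
```

Because the alphabet is shared across calls, a token seen in an earlier document keeps the same index in later ones, which is why every instance in a list should pass through the same pipe.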

