nlp - Why does my NamedEntityAnnotator for date mentions differ from CoreNLP demo's output?
The date detected by the following program gets split into 2 separate mentions, whereas the date detected in the NER output of the CoreNLP demo is a single one, as it should be. What should I edit in my program to correct this?
    import java.util.List;
    import java.util.Properties;
    import edu.stanford.nlp.ling.CoreAnnotations.*;
    import edu.stanford.nlp.pipeline.Annotation;
    import edu.stanford.nlp.pipeline.StanfordCoreNLP;
    import edu.stanford.nlp.util.CoreMap;

    // Build a pipeline that runs NER and then groups tokens into entity mentions.
    Properties props = new Properties();
    props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, entitymentions");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

    String text = "This software released on Februrary 5, 2015.";
    Annotation document = new Annotation(text);
    pipeline.annotate(document);

    // Print every entity mention found in each sentence.
    List<CoreMap> sentences = document.get(SentencesAnnotation.class);
    for (CoreMap sentence : sentences) {
        List<CoreMap> mentions = sentence.get(MentionsAnnotation.class);
        if (mentions != null) {
            for (CoreMap mention : mentions) {
                System.out.println("== token=" + mention.get(TextAnnotation.class));
                System.out.println("ner=" + mention.get(NamedEntityTagAnnotation.class));
                System.out.println("normalized ner=" + mention.get(NormalizedNamedEntityTagAnnotation.class));
            }
        }
    }
Output of the program:

    == token=Februrary 5,
    ner=DATE
    normalized ner=****0205
    == token=2015
    ner=DATE
    normalized ner=2015
Note that the online demo shows any sequence of consecutive tokens with the same NER tag as belonging to the same unit. Consider this sentence:
The event happened on February 5th January 9th.
This example yields "February 5th January 9th" as a single DATE in the online demo.
Yet the pipeline recognizes "February 5th" and "January 9th" as separate entity mentions.
Your sample code is looking at mentions, not NER chunks. Mentions are not shown by the online demo.
That being said, I am not sure why SUTime is not joining February 5th, 2015 in your example. Thanks for bringing this up; we will look into improving the module to fix this issue in future releases.
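To make the difference between NER chunks and mentions concrete, here is a minimal sketch of the demo-style grouping described above: consecutive tokens that carry the same NER tag are merged into one unit. It does not call CoreNLP. The class name `NerChunkSketch`, the method `chunk`, and the hard-coded tokens and tags are assumptions made for illustration; in a real pipeline the tags would presumably be read from each token's `NamedEntityTagAnnotation`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of CoreNLP: merges runs of identical NER tags.
class NerChunkSketch {

    /** Groups consecutive tokens that share the same non-"O" tag into one chunk. */
    static List<String> chunk(String[] tokens, String[] tags) {
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        String currentTag = "O";
        for (int i = 0; i < tokens.length; i++) {
            String tag = tags[i];
            if (!tag.equals(currentTag)) {
                // The tag changed: close the running chunk, if there is one.
                if (!currentTag.equals("O")) {
                    chunks.add(currentTag + ": " + current);
                }
                current.setLength(0);
                currentTag = tag;
            }
            if (!tag.equals("O")) {
                if (current.length() > 0) {
                    current.append(' ');
                }
                current.append(tokens[i]);
            }
        }
        // Close a chunk that runs to the end of the sentence.
        if (!currentTag.equals("O")) {
            chunks.add(currentTag + ": " + current);
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Hand-written tags (an assumption), mirroring the example sentence above.
        String[] tokens = {"The", "event", "happened", "on", "February", "5th", "January", "9th", "."};
        String[] tags = {"O", "O", "O", "O", "DATE", "DATE", "DATE", "DATE", "O"};
        System.out.println(chunk(tokens, tags)); // prints [DATE: February 5th January 9th]
    }
}
```

Grouping purely by tag cannot tell two adjacent dates apart, which is why this view shows one DATE where the entity mentions show two.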