src/edu/stanford/nlp/tagger/maxent/MaxentTagger.java
Lines changed: 1 addition & 1 deletion
@@ -168,7 +168,7 @@
 The second format is a file of Penn Treebank formatted (i.e., s-expression) tree files. Trees are loaded one at a time and the tagged words in a tree are used as a training sentence.
 To specify this format, preface the filename with "{@code format=TREES,}". <br>
 The final possible format is TSV files (tab-separated columns). To specify a TSV file, set {@code trainFile} to "{@code format=TSV,wordColumn=x,tagColumn=y,filename}".
-Column numbers are indexed from 0, and sentences are separated with blank lines. The default wordColumn is 0 and default tagColumn is 1.
+Column numbers are indexed from 0, and sentences are separated with blank lines. The default wordColumn is 0 and default tagColumn is 1. If comments=true, then comment lines will be skipped (a common thing to appear in conllu files)
 <br>
 A file can be in a different character set encoding than the tagger's default encoding by prefacing the filename with {@code "encoding=ENC,"}.
 You can specify the tagSeparator character in a TEXT file by prefacing the filename with "tagSeparator=c,". <br>
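
To make the documented TSV behavior concrete, here is a minimal, self-contained sketch of a reader that follows the rules stated in the Javadoc: columns indexed from 0, sentences separated by blank lines, and comment lines optionally skipped. It is not the tagger's own reader. The class name `TsvTaggedReaderSketch`, the `read` method, and its `skipComments` parameter are hypothetical, and the sketch assumes comment lines begin with `#`, as they do in CoNLL-U files; the diff does not say how the tagger detects them. The new option is presumably given like the other comma-separated options that precede the filename in {@code trainFile}, but the diff does not show the exact spelling.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of the TSV rules described in the Javadoc; not MaxentTagger code. */
public class TsvTaggedReaderSketch {

  /** One word/tag pair taken from a single TSV row. */
  public static final class TaggedWord {
    public final String word;
    public final String tag;

    public TaggedWord(String word, String tag) {
      this.word = word;
      this.tag = tag;
    }

    @Override
    public String toString() {
      return word + "/" + tag;
    }
  }

  /**
   * Groups TSV rows into sentences.
   * Assumption: a comment line is any line starting with '#', as in CoNLL-U.
   */
  public static List<List<TaggedWord>> read(List<String> lines, int wordColumn,
                                            int tagColumn, boolean skipComments) {
    List<List<TaggedWord>> sentences = new ArrayList<>();
    List<TaggedWord> current = new ArrayList<>();
    for (String line : lines) {
      if (line.trim().isEmpty()) {
        // A blank line closes the current sentence.
        if (!current.isEmpty()) {
          sentences.add(current);
          current = new ArrayList<>();
        }
        continue;
      }
      if (skipComments && line.startsWith("#")) {
        continue; // skipped only when the comments option is on
      }
      String[] columns = line.split("\t");
      if (columns.length <= Math.max(wordColumn, tagColumn)) {
        throw new IllegalArgumentException("Too few columns in line: " + line);
      }
      // Column numbers are indexed from 0.
      current.add(new TaggedWord(columns[wordColumn], columns[tagColumn]));
    }
    if (!current.isEmpty()) {
      sentences.add(current); // the last sentence may lack a trailing blank line
    }
    return sentences;
  }

  public static void main(String[] args) {
    List<String> lines = Arrays.asList(
        "# sent_id = 1", "The\tDT", "dog\tNN", "", "# sent_id = 2", "Runs\tVBZ");
    // Defaults from the Javadoc: wordColumn 0, tagColumn 1.
    System.out.println(read(lines, 0, 1, true));
  }
}
```

With `skipComments` set to false, the `# sent_id = 1` line in this sample has no tab, so the sketch rejects it as a malformed row. That is the kind of failure the new option is meant to avoid when training on CoNLL-U style files.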