ami search
Support material for ami search.
The code is currently moving from the old style (SearchArgProcessor) to AMISearchTool.
The first part of this page shows picocli commands that already run.
(Beneath the surface, the old style passes commands to SearchArgProcessor; later they will go directly to AMISearchTool.)
ami search --help
Usage: ami search [OPTIONS]
Description
===========
Searches text (and maybe SVG).
Options
=======
--dictionary=<dictionaryList>...
symbolic names of builtin dictionaries (likely to be obsoleted). Good values are (country, disease, funders)
--dictionarySuffix=<dictionarySuffix>
suffix for search dictionary
Default: xml
--dictionaryTop=<dictionaryTopList>
local dictionary home directory
-h, --help Show this help message and exit.
--ignorePlugins=<ignorePluginList>...
list of plugins to skip (mainly for debugging)
Default: []
--no-oldstyle (A) use oldstyle style of processing (project based) for unconverted tools; new style is per tree
--stripNumbers Strip numbers from words
-V, --version Print version information and exit.
--wikidataBiblio lookup wikidata biblographic object
--wordCount=<wordCountRange>
count range for words for frequencies (comma-separated); null
Default: (20,1000000)
--wordLength=<wordLengthRange>
length range for words for wordlengths (comma-separated); null
Default: (1,20)
The command ami search chains together several (too many) separate subprocesses. It requires fulltext (scholarly.html), which is normally created from published documents (fulltext.xml or fulltext.pdf) by ami transform. The scholarly.html is then analysed with ami word to discover word frequencies ("word clouds"). Next it is searched (ami search) using dictionaries to create results subtrees. The results are displayed as a "dashboard" (dataTables). Finally we run 1-D (histogram) and 2-D (co-occurrence) analyses and plot the results.
The actions above will be highlighted as picocli commands. Most of the code already exists.
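The word-frequency step can be illustrated with a minimal Java sketch. It is not the ami word implementation: the class WordFrequencySketch and its method are hypothetical, and treating the --wordCount and --wordLength ranges as inclusive bounds is an assumption made only for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of frequency counting with range filters; not ami3 code. */
class WordFrequencySketch {

    /**
     * Counts each word whose length lies in [minLength, maxLength], then
     * drops words whose count lies outside [minCount, maxCount].
     * Inclusive bounds are an assumption, not documented ami behaviour.
     */
    static Map<String, Integer> countWords(List<String> words,
            int minCount, int maxCount, int minLength, int maxLength) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : words) {
            if (word.length() >= minLength && word.length() <= maxLength) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        // remove entries whose frequency falls outside the count range
        counts.values().removeIf(c -> c < minCount || c > maxCount);
        return counts;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList(
                "malaria", "malaria", "fever", "a", "malaria", "fever");
        // keep words of length 2..20 that occur at least twice
        System.out.println(countWords(words, 2, 1000000, 2, 20));
        // prints {fever=2, malaria=3}
    }
}
```

A sorted map is used here only so that the printed output is stable; a real word cloud would more likely sort by frequency.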
Searching with dictionaries: usage
ami -p <project> search --dictionary <dict1> <dict2> ...
The dictionaries are addressed in several ways
For developers
The call stack during a dictionary search (innermost call first) is:
SearchSearcher(AMISearcher).searchWithDictionary(List<String>) line: 328
SearchSearcher.searchWordList() line: 49
SearchArgProcessor(AbstractSearchArgProcessor).runSearch() line: 94
SearchArgProcessor.runSearch(ArgumentOption) line: 64
Note: createWordList() is discussed at https://github.com/petermr/ami3/wiki/ami-words
public ResultsElement searchWordList() {
    // create wordList from document
    List<String> wordsToSearch = new WordCollectionFactory(
            (AbstractSearchArgProcessor) this.getArgProcessor()).createWordList();
    // search, includes phrases
    ResultsElement resultsElement = searchWithDictionary(wordsToSearch);
    return resultsElement;
}
org.contentmine.ami.plugins.AMISearcher:
public ResultsElement searchWithDictionary(List<String> strings) {
    LOG.debug("SEARCH with dictionary");
    ResultsElement resultsElement = new ResultsElement();
    if (strings != null) {
        // treat every word in turn as the possible first word of a dictionary term
        for (int pos = 0; pos < strings.size(); pos++) {
            String firstword = strings.get(pos);
            // word sequences that may follow firstword in a term; null if firstword starts no term
            List<List<String>> trailingListList = dictionary.getTrailingWords(firstword);
            if (trailingListList != null) {
                // -1 signals that no trailing sequence fits at this position
                int trailingOffset = canFitTrailing(trailingListList, strings, pos);
                if (trailingOffset != -1) {
                    ResultElement resultElement = createResultElement(strings, pos, trailingOffset);
                    resultsElement.appendChild(resultElement);
                }
            }
        }
    }
    return resultsElement;
}
public List<List<String>> getTrailingWords(String headWord) {
    return trailingWordsByLeadWord != null ? trailingWordsByLeadWord.get(headWord) : null;
}
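The snippets above do not show canFitTrailing() or createResultElement(), so the following self-contained Java sketch fills them in with assumptions to illustrate the head-word lookup. The class PhraseSearchSketch, its addTerm() method, whitespace tokenisation of terms, and the "longest match wins" rule are all hypothetical; only the shape of the loop and the trailingWordsByLeadWord map follow the code shown above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of head-word phrase matching; not the ami3 implementation. */
class PhraseSearchSketch {

    // head word -> trailing-word sequences; an empty sequence is a single-word term
    private final Map<String, List<List<String>>> trailingWordsByLeadWord = new HashMap<>();

    /** Registers a term, split on whitespace (an assumed tokenisation). */
    void addTerm(String term) {
        String[] words = term.trim().split("\\s+");
        List<String> trailing = new ArrayList<>(Arrays.asList(words).subList(1, words.length));
        trailingWordsByLeadWord.computeIfAbsent(words[0], k -> new ArrayList<>()).add(trailing);
    }

    List<List<String>> getTrailingWords(String headWord) {
        return trailingWordsByLeadWord.get(headWord);
    }

    /** Returns the length of the longest trailing sequence matching after pos, or -1 (assumed contract). */
    static int canFitTrailing(List<List<String>> trailingListList, List<String> words, int pos) {
        int best = -1;
        for (List<String> trailing : trailingListList) {
            int end = pos + 1 + trailing.size();
            if (end <= words.size() && words.subList(pos + 1, end).equals(trailing)) {
                best = Math.max(best, trailing.size());
            }
        }
        return best;
    }

    /** Scans the word list and returns every matched term as a space-joined string. */
    List<String> search(List<String> words) {
        List<String> matches = new ArrayList<>();
        for (int pos = 0; pos < words.size(); pos++) {
            List<List<String>> trailingListList = getTrailingWords(words.get(pos));
            if (trailingListList == null) {
                continue; // this word starts no dictionary term
            }
            int trailingOffset = canFitTrailing(trailingListList, words, pos);
            if (trailingOffset != -1) {
                matches.add(String.join(" ", words.subList(pos, pos + 1 + trailingOffset)));
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        PhraseSearchSketch dictionary = new PhraseSearchSketch();
        dictionary.addTerm("yellow fever");
        dictionary.addTerm("malaria");
        List<String> words = Arrays.asList("cases", "of", "yellow", "fever", "and", "malaria");
        System.out.println(dictionary.search(words)); // prints [yellow fever, malaria]
    }
}
```

As in searchWithDictionary(), the loop does not skip past a matched phrase, so a term that begins inside an earlier match would also be reported.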