Emanuel Faria edited this page Aug 19, 2020 · 11 revisions

ami search

Support material for ami search.

The code is currently being migrated from the oldstyle SearchArgProcessor to AMISearchTool.

The first part of this page covers the picocli commands that currently run. (Beneath the surface, oldstyle passes commands to SearchArgProcessor; later they will go directly to AMISearchTool.)

picocli 2020-08-10


ami search --help
Usage: ami search [OPTIONS]
Description
===========
Searches text (and maybe SVG).
Options
=======
      --dictionary=<dictionaryList>...
                         symbolic names of builtin dictionaries (likely to be obsoleted). Good values are (country, disease,
                           funders)
      --dictionarySuffix=<dictionarySuffix>
                         suffix for search dictionary
                           Default: xml
      --dictionaryTop=<dictionaryTopList>
                          local dictionary home directory
  -h, --help             Show this help message and exit.
      --ignorePlugins=<ignorePluginList>...
                          list of plugins to skip (mainly for debugging)
                           Default: []
      --no-oldstyle      (A) use oldstyle style of processing (project based) for unconverted tools; new style is per
                           tree
      --stripNumbers     Strip numbers from words
  -V, --version          Print version information and exit.
      --wikidataBiblio    lookup wikidata biblographic object
      --wordCount=<wordCountRange>
                         count range for words for frequencies (comma-separated); null
                           Default: (20,1000000)
      --wordLength=<wordLengthRange>
                         length range for words for wordlengths (comma-separated); null
                           Default: (1,20)
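The --wordCount and --wordLength options each take a comma-separated range, and the help text prints their defaults as (20,1000000) and (1,20). The following is a minimal sketch of how such a value could be parsed and used to filter words by length. It assumes the range is written as "(min,max)" and that both bounds are inclusive. WordRange and its methods are hypothetical names for illustration and are not ami3's own classes.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical inclusive integer range parsed from strings such as "(1,20)".
 * Inclusive bounds are an assumption; the help text does not say.
 */
final class WordRange {
    final int min;
    final int max;

    WordRange(int min, int max) {
        if (min > max) {
            throw new IllegalArgumentException("min > max: " + min + " > " + max);
        }
        this.min = min;
        this.max = max;
    }

    /** Parses "(min,max)" or "min,max"; parentheses and surrounding spaces are optional. */
    static WordRange parse(String text) {
        String body = text.trim();
        if (body.startsWith("(") && body.endsWith(")")) {
            body = body.substring(1, body.length() - 1);
        }
        String[] parts = body.split(",");
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected two comma-separated values: " + text);
        }
        // NumberFormatException (a subclass of IllegalArgumentException) signals a bad number
        return new WordRange(Integer.parseInt(parts[0].trim()), Integer.parseInt(parts[1].trim()));
    }

    boolean contains(int value) {
        return value >= min && value <= max;
    }

    /** Keeps only the words whose length lies inside this range. */
    List<String> filterByLength(List<String> words) {
        List<String> kept = new ArrayList<>();
        for (String word : words) {
            if (contains(word.length())) {
                kept.add(word);
            }
        }
        return kept;
    }
}
```

Under these assumptions, WordRange.parse("(1,20)").filterByLength(words) would keep words of 1 to 20 characters, matching the --wordLength default.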

existing architecture

The ami search command chains together (too many) separate subprocesses. It requires fulltext (scholarly.html), which is normally created from published documents (fulltext.xml or fulltext.pdf) by ami transform. The scholarly.html is then analysed with ami words to discover word frequencies ("word clouds"). Next it is searched (ami search) using dictionaries to create results subtrees. The results are then displayed as a "dashboard" (dataTables). Finally, 1-D (histogram) and 2-D (co-occurrence) analyses are run and the results are plotted.

proposed new architecture

Each of the actions above will be exposed as a separate picocli command. Most of the code already exists.

ami transform

ami words

ami search

ami display --datatables

ami display --histogram and --cooccurrence

ami summary
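The sketch below models the proposed command sequence as an ordered Java enum, with each stage's input and output paraphrased from the "existing architecture" paragraph. It describes data flow only and does not call ami. AmiStage and describeAll are hypothetical names. ami summary is only named on this page, so its input and output are left unspecified.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model of the proposed picocli command sequence, in run order.
 * Inputs/outputs are paraphrased from this page; "unspecified" marks gaps.
 */
enum AmiStage {
    TRANSFORM("ami transform", "fulltext.xml or fulltext.pdf", "scholarly.html"),
    WORDS("ami words", "scholarly.html", "word frequencies"),
    SEARCH("ami search", "scholarly.html + dictionaries", "results subtrees"),
    DATATABLES("ami display --datatables", "results subtrees", "dashboard"),
    PLOTS("ami display --histogram / --cooccurrence", "results subtrees", "histograms and co-occurrence plots"),
    SUMMARY("ami summary", "unspecified", "unspecified");

    final String command;
    final String input;
    final String output;

    AmiStage(String command, String input, String output) {
        this.command = command;
        this.input = input;
        this.output = output;
    }

    /** One "command: input -> output" line per stage, in declaration (run) order. */
    static List<String> describeAll() {
        List<String> lines = new ArrayList<>();
        for (AmiStage stage : values()) {
            lines.add(stage.command + ": " + stage.input + " -> " + stage.output);
        }
        return lines;
    }
}
```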

dictionary

Searches use dictionaries. Usage:

ami -p <project> search --dictionary <dict1> <dict2> ...

Dictionaries can be addressed in several ways:

local dictionary

relative or absolute filename

builtin

URL-based
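The four addressing modes above imply that each --dictionary argument has to be classified before it is loaded. The following is a minimal sketch of one way to do that. The precedence order (URL, then file path, then builtin name, then local name) and the heuristics are assumptions for illustration, not ami3's actual lookup rules. DictionaryKind and DictionaryRefClassifier are hypothetical names. The local path combines --dictionaryTop with --dictionarySuffix (default xml), which is also an assumption based on the option descriptions.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;
import java.util.Set;

/** The four ways this page says a dictionary can be addressed. */
enum DictionaryKind { URL, FILE, BUILTIN, LOCAL }

/** Hypothetical classifier; precedence and heuristics are illustrative assumptions. */
final class DictionaryRefClassifier {
    private final Set<String> builtinNames;

    DictionaryRefClassifier(Set<String> builtinNames) {
        this.builtinNames = builtinNames;
    }

    DictionaryKind classify(String ref) {
        String lower = ref.toLowerCase(Locale.ROOT);
        // 1. an http(s) scheme means the dictionary is fetched remotely
        if (lower.startsWith("http://") || lower.startsWith("https://")) {
            return DictionaryKind.URL;
        }
        // 2. a path separator or an .xml suffix suggests a relative or absolute filename
        if (ref.contains("/") || ref.contains("\\") || lower.endsWith(".xml")) {
            return DictionaryKind.FILE;
        }
        // 3. bare symbolic names such as "country" may be builtin
        if (builtinNames.contains(lower)) {
            return DictionaryKind.BUILTIN;
        }
        // 4. otherwise assume a name to look up under the --dictionaryTop directory
        return DictionaryKind.LOCAL;
    }

    /** Assumed layout for LOCAL refs: <dictionaryTop>/<name>.<suffix>. */
    static Path localPath(String dictionaryTop, String name, String suffix) {
        return Paths.get(dictionaryTop, name + "." + suffix);
    }
}
```

For example, with the builtin names listed in the help text (country, disease, funders), "country" would classify as BUILTIN and a bare "virus" as LOCAL.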


For developers

stack

Call stack when searching against a dictionary (deepest frame first):

SearchSearcher(AMISearcher).searchWithDictionary(List<String>) line: 328	
SearchSearcher.searchWordList() line: 49	
SearchArgProcessor(AbstractSearchArgProcessor).runSearch() line: 94	
SearchArgProcessor.runSearch(ArgumentOption) line: 64	

searchWordList()

Note: createWordList() is discussed in https://github.com/petermr/ami3/wiki/ami-words.

	public ResultsElement searchWordList() {
		// create wordList from document
		List<String> wordsToSearch = new WordCollectionFactory(
				(AbstractSearchArgProcessor) this.getArgProcessor()).createWordList();
		// search, includes phrases
		ResultsElement resultsElement = searchWithDictionary(wordsToSearch);
		return resultsElement;
	}

ResultsElement AMISearcher.searchWithDictionary(List<String> strings)

org.contentmine.ami.plugins.AMISearcher:
	public ResultsElement searchWithDictionary(List<String> strings) {
		LOG.debug("SEARCH with dictionary");
		ResultsElement resultsElement = new ResultsElement();
		if (strings != null) {
			// treat every word in the document as a possible start of a dictionary term
			for (int pos = 0; pos < strings.size(); pos++) {
				String firstword = strings.get(pos);
				// candidate continuations of terms whose first word is firstword (null if none)
				List<List<String>> trailingListList = dictionary.getTrailingWords(firstword);
				if (trailingListList != null) {
					// check whether a candidate matches the words after pos; -1 means no match
					int trailingOffset = canFitTrailing(trailingListList, strings, pos);
					if (trailingOffset != -1) {
						// record the matched term as a result
						ResultElement resultElement = createResultElement(strings, pos, trailingOffset);
						resultsElement.appendChild(resultElement);
					}
				}
			}
		}
		return resultsElement;
	}
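The loop above finds multi-word terms by looking up each document word as a possible lead word, then testing whether the following words match one of that word's trailing-word lists. The standalone sketch below reproduces this idea so it can be run in isolation. canFitTrailing is not shown on this page, so fitTrailing here is a hypothetical stand-in. It assumes the helper returns the number of trailing words in the longest candidate that fits at pos, or -1 if none fits. PhraseMatcherSketch is also a hypothetical name, and matches are returned as plain strings rather than ResultElements.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Standalone sketch of lead-word / trailing-words matching (not ami3 code). */
final class PhraseMatcherSketch {
    private final Map<String, List<List<String>>> trailingWordsByLeadWord;

    PhraseMatcherSketch(Map<String, List<List<String>>> trailingWordsByLeadWord) {
        this.trailingWordsByLeadWord = trailingWordsByLeadWord;
    }

    /** Returns each matched term (lead word plus trailing words) joined with spaces. */
    List<String> search(List<String> words) {
        List<String> hits = new ArrayList<>();
        for (int pos = 0; pos < words.size(); pos++) {
            List<List<String>> candidates = trailingWordsByLeadWord.get(words.get(pos));
            if (candidates == null) {
                continue; // no dictionary term starts with this word
            }
            int trailing = fitTrailing(candidates, words, pos);
            if (trailing != -1) {
                hits.add(String.join(" ", words.subList(pos, pos + 1 + trailing)));
            }
        }
        return hits;
    }

    /** Hypothetical canFitTrailing: length of the longest fitting tail, or -1. */
    static int fitTrailing(List<List<String>> candidates, List<String> words, int pos) {
        int best = -1;
        for (List<String> trailing : candidates) {
            int end = pos + 1 + trailing.size();
            // fits only if enough words remain and they equal the tail word by word;
            // an empty tail (single-word term) always fits and yields 0
            if (end <= words.size() && words.subList(pos + 1, end).equals(trailing)
                    && trailing.size() > best) {
                best = trailing.size();
            }
        }
        return best;
    }
}
```

Like the original loop, this sketch does not advance pos past a match. If "yellow fever" and "fever" are both dictionary terms, the text "yellow fever" therefore produces both hits.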


List<List<String>> getTrailingWords(String headWord) (in DefaultStringDictionary)

	public List<List<String>> getTrailingWords(String headWord) {
		return trailingWordsByLeadWord != null ? trailingWordsByLeadWord.get(headWord) : null;
	}
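getTrailingWords only reads the trailingWordsByLeadWord map, and this page does not show how that map is filled. The following is a minimal sketch of building such an index from a list of dictionary terms. It assumes terms are split on whitespace, the first word becomes the key, and the remaining words (possibly none) form one candidate tail. TrailingWordIndex is a hypothetical name, and ami3's own tokenisation and case handling may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical builder for a lead-word -> trailing-word-lists index. */
final class TrailingWordIndex {
    static Map<String, List<List<String>>> build(List<String> terms) {
        Map<String, List<List<String>>> index = new HashMap<>();
        for (String term : terms) {
            String trimmed = term.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank dictionary entries
            }
            // assumption: whitespace tokenisation, case left unchanged
            String[] words = trimmed.split("\\s+");
            // first word is the key; the rest (empty for single-word terms) is one tail
            List<String> trailing = new ArrayList<>(Arrays.asList(words).subList(1, words.length));
            index.computeIfAbsent(words[0], k -> new ArrayList<>()).add(trailing);
        }
        return index;
    }
}
```

Terms that share a first word, such as "yellow fever" and "yellow fever virus", end up under one key with two candidate tails, which is why getTrailingWords returns a List<List<String>>.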

adding synonyms
