ami search

Support material for `ami search`.

`ami search` is currently moving from oldstyle (`SearchArgProcessor`) to `AMISearchTool`.

The first part of this page lists picocli commands that already run. Beneath the surface, oldstyle passes commands to `SearchArgProcessor`; later they will go directly to `AMISearchTool`.

`picocli` 2020-08-10


ami search --help
Usage: ami search [OPTIONS]
Description
===========
Searches text (and maybe SVG).
Options
=======
      --dictionary=<dictionaryList>...
                         symbolic names of builtin dictionaries (likely to be obsoleted). Good values are (country, disease,
                           funders)
      --dictionarySuffix=<dictionarySuffix>
                         suffix for search dictionary
                           Default: xml
      --dictionaryTop=<dictionaryTopList>
                          local dictionary home directory
  -h, --help             Show this help message and exit.
      --ignorePlugins=<ignorePluginList>...
                          list of plugins to skip (mainly for debugging)
                           Default: []
      --no-oldstyle      (A) use oldstyle style of processing (project based) for unconverted tools; new style is per
                           tree
      --stripNumbers     Strip numbers from words
  -V, --version          Print version information and exit.
      --wikidataBiblio    lookup wikidata biblographic object
      --wordCount=<wordCountRange>
                         count range for words for frequencies (comma-separated); null
                           Default: (20,1000000)
      --wordLength=<wordLengthRange>
                         length range for words for wordlengths (comma-separated); null
                           Default: (1,20)
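The `--wordLength` and `--wordCount` options each take a comma-separated range. The sketch below shows one plausible reading of them, assumed here rather than taken from the ami3 source: words whose length lies outside the length range are ignored, and words whose frequency lies outside the count range are dropped from the result. The class `WordRangeFilterSketch`, its method name, and the choice of inclusive bounds are all hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative only; not part of ami3. */
class WordRangeFilterSketch {

    /**
     * Counts word occurrences, keeping only words whose length is in
     * [minLength, maxLength] and whose count is in [minCount, maxCount].
     * Inclusive bounds are an assumption.
     */
    static Map<String, Integer> frequencies(List<String> words,
            int minLength, int maxLength, int minCount, int maxCount) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : words) {
            if (word.length() >= minLength && word.length() <= maxLength) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        // drop words that are too rare or too common
        counts.values().removeIf(c -> c < minCount || c > maxCount);
        return counts;
    }
}
```

With the defaults shown in the help text, a call would look like `frequencies(words, 1, 20, 20, 1000000)`.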

existing architecture

The command `ami search` chains together (too many) separate subprocesses. It requires full text (`scholarly.html`), which is normally created from published documents (`fulltext.xml` or `fulltext.pdf`) by `ami transform`. The `scholarly.html` is then analysed with `ami words` to discover word frequencies ("word clouds"). Next it is searched (`ami search`) using dictionaries to create results subtrees. The results are displayed as a "dashboard" (dataTables). Finally we run 1-D (histogram) and 2-D (cooccurrence) analyses and plot the results.
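As an illustration of the 2-D (cooccurrence) step, the sketch below counts how many documents contain a hit from one dictionary together with a hit from another. This is one common definition of cooccurrence and is an assumption here; the class `CooccurrenceSketch` and its input shape (one set of hits per document per dictionary) are hypothetical and do not come from the ami3 code.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Illustrative only; not part of ami3. */
class CooccurrenceSketch {

    /**
     * For each (rowTerm, colTerm) pair, counts the documents in which both occur.
     * Element i of each list holds the hits found in document i.
     */
    static Map<String, Map<String, Integer>> cooccurrence(
            List<Set<String>> rowHitsPerDoc, List<Set<String>> colHitsPerDoc) {
        Map<String, Map<String, Integer>> matrix = new TreeMap<>();
        int docs = Math.min(rowHitsPerDoc.size(), colHitsPerDoc.size());
        for (int i = 0; i < docs; i++) {
            for (String row : rowHitsPerDoc.get(i)) {
                for (String col : colHitsPerDoc.get(i)) {
                    matrix.computeIfAbsent(row, k -> new TreeMap<>())
                          .merge(col, 1, Integer::sum);
                }
            }
        }
        return matrix;
    }
}
```

The 1-D histogram is the simpler case: a single count per term rather than per pair of terms.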

proposed new architecture

The actions above will be exposed as separate picocli commands. Most of the code already exists.

`ami transform`

`ami words`

`ami search`

`ami display --datatables`

`ami display --histogram` and `--cooccurrence`

`ami summary`

dictionary

Searches use dictionaries. Usage:

ami -p <project> search --dictionary <dict1> <dict2> ...

Dictionaries can be addressed in several ways:

local dictionary

relative or absolute filename

builtin

URL-based

For developers

stack

against dictionary (deepest first)

SearchSearcher(AMISearcher).searchWithDictionary(List<String>) line: 328	
SearchSearcher.searchWordList() line: 49	
SearchArgProcessor(AbstractSearchArgProcessor).runSearch() line: 94	
SearchArgProcessor.runSearch(ArgumentOption) line: 64

`searchWordList()`

Note: `createWordList()` is discussed in https://github.com/petermr/ami3/wiki/ami-words

	public ResultsElement searchWordList() {
		// create the word list from the document
		List<String> wordsToSearch = new WordCollectionFactory(
				(AbstractSearchArgProcessor) this.getArgProcessor()).createWordList();
		// search the word list against the dictionary; this includes phrases
		ResultsElement resultsElement = searchWithDictionary(wordsToSearch);
		return resultsElement;
	}

`ResultsElement AMISearcher.searchWithDictionary(List<String> strings)`

In `org.contentmine.ami.plugins.AMISearcher`:

	public ResultsElement searchWithDictionary(List<String> strings) {
		LOG.debug("SEARCH with dictionary");
		ResultsElement resultsElement = new ResultsElement();
		if (strings != null) {
			// try each word in turn as the first word of a dictionary term
			for (int pos = 0; pos < strings.size(); pos++) {
				String firstword = strings.get(pos);
				// trailing-word lists for terms starting with firstword; null if there are none
				List<List<String>> trailingListList = dictionary.getTrailingWords(firstword);
				if (trailingListList != null) {
					// -1 means none of the trailing-word lists fits at this position
					int trailingOffset = canFitTrailing(trailingListList, strings, pos);
					if (trailingOffset != -1) {
						ResultElement resultElement = createResultElement(strings, pos, trailingOffset);
						resultsElement.appendChild(resultElement);
					}
				}
			}
		}
		return resultsElement;
	}
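The loop above can be tried in isolation with the following self-contained sketch. It is not the ami3 implementation: `TrailingWordSearchSketch` is a hypothetical class, results are plain strings instead of `ResultElement`s, and the behaviour of `canFitTrailing` (returning the number of trailing words matched, preferring the longest match, and -1 when nothing fits) is an assumption inferred only from how the return value is used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative stand-in for the head-word / trailing-words search; not the ami3 classes. */
class TrailingWordSearchSketch {

    // head word -> trailing-word lists; an empty list stands for a single-word term
    private final Map<String, List<List<String>>> trailingWordsByLeadWord;

    TrailingWordSearchSketch(Map<String, List<List<String>>> trailingWordsByLeadWord) {
        this.trailingWordsByLeadWord = trailingWordsByLeadWord;
    }

    /** Returns the length of the longest trailing list that matches after pos, or -1 (assumed semantics). */
    static int canFitTrailing(List<List<String>> trailingLists, List<String> words, int pos) {
        int best = -1;
        for (List<String> trailing : trailingLists) {
            int end = pos + 1 + trailing.size();
            if (end <= words.size() && words.subList(pos + 1, end).equals(trailing)) {
                best = Math.max(best, trailing.size());
            }
        }
        return best;
    }

    /** Scans the word list and returns each matched term as a space-joined string. */
    List<String> search(List<String> words) {
        List<String> hits = new ArrayList<>();
        for (int pos = 0; pos < words.size(); pos++) {
            List<List<String>> trailingLists = trailingWordsByLeadWord.get(words.get(pos));
            if (trailingLists == null) {
                continue; // no dictionary term starts with this word
            }
            int offset = canFitTrailing(trailingLists, words, pos);
            if (offset != -1) {
                hits.add(String.join(" ", words.subList(pos, pos + 1 + offset)));
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<String, List<List<String>>> index = new HashMap<>();
        index.put("Zika", Collections.singletonList(Collections.<String>emptyList()));
        index.put("yellow", Collections.singletonList(Collections.singletonList("fever")));
        List<String> words = Arrays.asList("outbreaks", "of", "yellow", "fever", "and", "Zika");
        System.out.println(new TrailingWordSearchSketch(index).search(words)); // prints [yellow fever, Zika]
    }
}
```

Keying the dictionary by head word means each position in the document costs one map lookup, and only terms that start with the current word are compared further.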

`List<List<String>> getTrailingWords(String headWord)` (in DefaultStringDictionary)

	public List<List<String>> getTrailingWords(String headWord) {
		return trailingWordsByLeadWord != null ? trailingWordsByLeadWord.get(headWord) : null;
	}
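The page does not show how `trailingWordsByLeadWord` is filled. A minimal sketch of one way to build such a map from a list of dictionary terms follows; the class `LeadWordIndexSketch`, splitting on whitespace, and case-sensitive keys are all assumptions and may differ from what `DefaultStringDictionary` actually does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only; not the ami3 DefaultStringDictionary. */
class LeadWordIndexSketch {

    private final Map<String, List<List<String>>> trailingWordsByLeadWord = new HashMap<>();

    /** Indexes each term by its first word; the remaining words become one trailing list. */
    LeadWordIndexSketch(List<String> terms) {
        for (String term : terms) {
            String[] words = term.trim().split("\\s+");
            List<String> trailing = new ArrayList<>(Arrays.asList(words).subList(1, words.length));
            trailingWordsByLeadWord
                .computeIfAbsent(words[0], k -> new ArrayList<>())
                .add(trailing);
        }
    }

    /** Returns the trailing-word lists for headWord, or null if no term starts with it. */
    List<List<String>> getTrailingWords(String headWord) {
        return trailingWordsByLeadWord.get(headWord);
    }
}
```

In this sketch the terms "yellow fever" and "yellow fever virus" share the key "yellow" with two trailing lists, while a single-word term such as "Zika" is stored with one empty trailing list.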
