Description
I currently have two options to merge the frequency file of the corpus to be corrected with the background file (representing a lexicon or a large background corpus or some combination of both).
However, I seem to have only one option for creating a focus file, and it results in all the ngrams that do not meet the artifrq being incorporated in the focus file. (Unless I am mistaken and have overlooked another option...)
I would like to have the option to incorporate in the focus file only those ngrams from the corpus to be corrected that do not meet the artifrq.
This might perhaps be achieved by deferring the inclusion of the background file until TICCL-anahash, and having this (or TICCL-unk?) produce the focus file.
It would also be handier if the focus file listed the actual word forms included, for easy reference.
This would in turn enable TICCL-LDcalc to focus only on this (probably single) version of the possible anagrams associated with a particular anagram value, rather than processing them all.
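To make the request concrete, here is a minimal Python sketch of the behaviour I have in mind. It is not how the TICCL tools are implemented. The function names, the toy frequencies, the `value word#word` output layout and the anagram key (sum of each character's code point to the fifth power) are all assumptions for illustration. I also assume that an ngram "meets" the artifrq when its merged frequency is at least artifrq.

```python
from collections import defaultdict


def anagram_value(word: str) -> int:
    # Illustrative anagram key (assumption): identical for all anagrams of a word.
    return sum(ord(ch) ** 5 for ch in word)


def build_focus(corpus_freq: dict, background_freq: dict, artifrq: int) -> dict:
    """Map anagram value -> corpus word forms whose merged frequency is below artifrq.

    Only word forms that occur in the corpus to be corrected are considered;
    background-only entries never enter the focus set.
    """
    focus = defaultdict(list)
    for word, freq in corpus_freq.items():
        merged = freq + background_freq.get(word, 0)
        if merged < artifrq:
            focus[anagram_value(word)].append(word)
    return {value: sorted(forms) for value, forms in focus.items()}


# Hypothetical toy data: "the" is validated by the background lexicon,
# "rarewordx" occurs only in the background.
corpus = {"teh": 3, "the": 120, "hte": 1}
background = {"the": 100_000_000, "rarewordx": 5}

focus = build_focus(corpus, background, artifrq=100_000_000)
for value, forms in sorted(focus.items()):
    # One line per anagram value, followed by the actual word forms.
    print(value, "#".join(forms))
```

In this toy case the focus set holds a single anagram value listing only `hte` and `teh`. The validated `the` shares that anagram value but is left out, as is the background-only `rarewordx`.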
You may wish to regard the above as two separate issues and handle them accordingly.
Thank you!