Focus-file creation #21

@martinreynaert

I currently have two options to merge the frequency file of the corpus to be corrected with the background file (representing a lexicon or a large background corpus or some combination of both).

However, I seem to have only one option for creating a focus file. As a result, every ngram that does not meet the artifrq is incorporated in the focus file, regardless of whether it comes from the corpus or from the background file. (Unless I am mistaken and am overlooking another option...)

I would like an option to incorporate in the focus file only those ngrams that come from the corpus to be corrected and do not meet the artifrq.
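To make the difference concrete, here is a minimal Python sketch of the two selections. It is not TICCL code: the function names, the toy frequencies, and the reading of "does not meet the artifrq" as "merged frequency below artifrq" are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical artifrq value, used here only for illustration.
ARTIFRQ = 100_000_000


def merge_freqs(corpus, background):
    """Sum the frequencies of the corpus and background lists."""
    merged = Counter(corpus)
    merged.update(background)
    return merged


def focus_all(merged, artifrq):
    """Current behaviour as described above: every merged ngram below artifrq."""
    return {ngram for ngram, freq in merged.items() if freq < artifrq}


def focus_corpus_only(corpus, merged, artifrq):
    """Requested behaviour: only corpus ngrams whose merged frequency is below artifrq."""
    return {ngram for ngram in corpus if merged[ngram] < artifrq}


# Toy data: "the" is validated by the background, "rare" occurs only there.
corpus = {"the": 120, "teh": 2}
background = {"the": ARTIFRQ, "rare": 3}

merged = merge_freqs(corpus, background)
current = focus_all(merged, ARTIFRQ)
requested = focus_corpus_only(corpus, merged, ARTIFRQ)
```

With this toy data, `current` contains both `teh` and `rare`, while `requested` contains only `teh`.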

This might be achieved by deferring the inclusion of the background file until TICCL-anahash, and having this tool (or TICCL-unk?) produce the focus file.

It would also be handier if the focus file listed the actual word forms included, for easy reference.

The latter would also enable TICCL-LDcalc to focus only on this (probably) single version of the possible anagrams associated with a particular anagram value, rather than processing them all.
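As a rough illustration of a focus file that carries its word forms, the Python sketch below groups focus words by anagram value. The hash used here (the sum of each character's code point raised to the fifth power) is only a stand-in and may differ from what TICCL-anahash actually computes. The tab-and-`#` line layout and the function names are likewise hypothetical.

```python
from collections import defaultdict


def anagram_value(word):
    """Stand-in anagram hash (assumption): sum of code points to the 5th power."""
    return sum(ord(ch) ** 5 for ch in word)


def focus_entries(focus_words):
    """Map each anagram value to the sorted focus word forms that share it."""
    by_value = defaultdict(list)
    for word in sorted(focus_words):
        by_value[anagram_value(word)].append(word)
    return dict(by_value)


def format_focus_line(value, forms):
    """Hypothetical line layout: anagram value, a tab, then '#'-joined word forms."""
    return f"{value}\t{'#'.join(forms)}"


entries = focus_entries({"teh", "hte", "cat"})
lines = [format_focus_line(value, forms) for value, forms in sorted(entries.items())]
```

A later step could then restrict itself to the listed forms for a given value instead of expanding every anagram that shares it.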

You may wish to regard the above as two separate issues and handle them accordingly.

Thank you!
