This repository was archived by the owner on Sep 9, 2025. It is now read-only.

What is the difference between the cdb.dat files provided by the UMLS Small/Full models versus creating own cdb.dat from UMLS DB #137

@stefanhgm

Hi everyone,

First of all, thanks for your great work! I want to set up MedCatTrainer for UMLS tagging. When I use one of the prepared UMLS models (small/large), the annotations include the CUI but no other information. I think the cause is the missing "concepts imported" step (shown as a red cross for the project). Hence, I tried to upload the necessary concepts (cdb.dat).

I wondered whether the cdb.dat files of the prepared UMLS models already come with the necessary information (i.e. UMLS names, types, ...) or whether it is still necessary to load the UMLS into Postgres, run the script from the MedCat paper to get a CSV, and build the CDB from that. I tried the latter process, but it is very cumbersome and the documentation seems outdated (e.g. prep_cdb.prepare_csvs(paths), as used here https://towardsdatascience.com/medcat-extracting-diseases-from-electronic-health-records-f53c45b3d1c1, no longer exists).
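For context, the CSV step of that process can be sketched roughly as follows. This is only an illustration under assumptions. The column layout (`cui`, `name`, `ontologies`, `name_status`, `type_ids`, `description`) and the `P`/`A` status codes are what I believe newer MedCAT versions expect and should be checked against the documentation for the installed version. The helper `rows_to_concept_csv` and the example rows are hypothetical stand-ins for the output of a query against the UMLS tables.

```python
import csv
import io

# Assumed column layout for a MedCAT concept CSV; verify against the
# documentation of the MedCAT version in use.
COLUMNS = ["cui", "name", "ontologies", "name_status", "type_ids", "description"]


def rows_to_concept_csv(rows):
    """Serialise (cui, name, is_preferred, tui) tuples to CSV text.

    Hypothetical helper: it only formats rows and does not touch UMLS itself.
    """
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for cui, name, is_preferred, tui in rows:
        writer.writerow({
            "cui": cui,
            "name": name,
            "ontologies": "UMLS",
            # Assumption: "P" marks a preferred name, "A" lets MedCAT decide.
            "name_status": "P" if is_preferred else "A",
            "type_ids": tui,
            "description": "",
        })
    return buf.getvalue()


# Hypothetical rows standing in for the result of a UMLS database query.
example_rows = [
    ("C0011849", "Diabetes Mellitus", True, "T047"),
    ("C0011849", "DM", False, "T047"),
]
csv_text = rows_to_concept_csv(example_rows)
print(csv_text)
```

A file written this way would then be passed to whatever CDB-building entry point the installed MedCAT version provides. The sketch deliberately leaves that call out, since it appears to have changed between versions.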

In case a newly generated cdb.dat file is needed, could you also provide it via the NIH links, as you do for the UMLS models? I think this would save a lot of hassle for anyone trying to use the UMLS as the backbone of MedCat.

Cheers!
