❗ Most of the functionality in this project has now been made available in the library clinlp: production-ready NLP pipelines for Dutch clinical text. Although the code here might still benefit some projects, this project itself is no longer maintained (and has therefore been archived).
This package bundles some functionality for applying NLP (preprocessing) techniques to clinical text in psychiatry. Specifically, it contains the following submodules:
- `preprocessing` -- Preprocessing text
- `spelling` -- Spelling correction
- `entity` -- Entity matching
- `context` -- Detecting properties of entities (e.g. negation, plausibility) based on context
These submodules are further documented in their respective readmes, which you will find by following the links above.
Since some paths need to be initialized, installation is easiest from source: download the code, modify the paths in psynlp/utils.py (see Requirements below), and run:
pip install -r requirements.txt
python setup.py install

The psynlp package has the following dependencies (automatically installed when using the commands above):
- doublemetaphone
- gensim
- nltk
- pandas
- spacy
Some functionality requires specific models, which are not included in the repository because of their privacy-sensitive nature. Their paths should be specified in psynlp/utils.py.
- A spacy model, which can be obtained here (e.g. `python -m spacy download nl_core_news_sm` for the standard Dutch model)
- A gensim-trained Word2Vec model, used for the `EmbeddingRanker` in the `spelling` module
- Token frequencies in the specific corpus, required for the `NoisyRanker`, in a csv file (;-separated, with a `token` and a `frequency` column)
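As a minimal sketch of the token-frequency file format described above (the token values here are made up for illustration; only the ;-separator and the `token`/`frequency` column names come from the requirements), such a file can be read with pandas, which is already a dependency:

```python
import io
import pandas as pd

# Hypothetical contents of a ;-separated token-frequency csv file,
# with the "token" and "frequency" columns the NoisyRanker expects.
csv_text = "token;frequency\npatient;120\nopname;45\nspelfout;2\n"

# Read the ;-separated file (io.StringIO stands in for a real file path).
freqs = pd.read_csv(io.StringIO(csv_text), sep=";")

print(freqs.columns.tolist())  # ['token', 'frequency']
```

A real corpus file would simply replace the in-memory string with the path configured in psynlp/utils.py.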
psynlp follows an object-oriented paradigm, much like the sklearn library for machine learning. For instance, to use the spelling correction from the spelling submodule, the following code can be used:
from psynlp.spelling import SpellChecker
c = SpellChecker(spacy_model="your_spacy_model_name")
c.correct("Dit is een tekst met daarin een splefout")
>>> "Dit is een tekst met daarin een spelfout"

Usage is documented in further detail in the respective submodule READMEs.
Basic usage and API of each submodule is documented in the submodule README. Additionally, some use cases are documented in the following notebooks (also referenced in the relevant submodule READMEs):
- preprocessing.ipynb -- Example code for preprocessing
- spelling.ipynb -- Example code for spelling correction
- entity.ipynb -- Example code for entity recognition
- context.ipynb -- Example code for context matching
- example_pipeline.ipynb -- Example code for extracting variables from text, using all four submodules
Vincent Menger -- Conceptualization, developing code
Nick Ermers -- Improving context detection