 
 ## Presentation
 
-EDS-NLP offers two components to extract behavioral patterns, namely the tobacco and alcohol consumption status. Each component is based on the ContextualMatcher component.
-Some general considerations about those components:
+EDS-NLP offers two components to extract behavioral patterns, namely the tobacco and alcohol consumption status. Each component is based on the [ContextualMatcher][edsnlp.pipes.core.contextual_matcher.ContextualMatcher] matcher, itself based on `eds.contextual_matcher` component.
 
-- Extracted entities are stored in `doc.ents` and `doc.spans`. For instance, the `eds.tobacco` component stores matches in `doc.spans["tobacco"]`.
-- The matched comorbidity is also available under the `ent.label_` of each match.
-- Matches have an associated `_.status` attribute taking the value `1`, or `2`. A corresponding `_.detailed_status` attribute stores the human-readable status, which can be component-dependent. See each component documentation for more details.
-- Some components add additional information to matches. For instance, the `tobacco` adds, if relevant, extracted *pack-year* (= *paquet-année*). Those information are available under the `ent._.assigned` attribute.
-- Those components work on **normalized** documents. Please use the `eds.normalizer` pipeline with the following parameters:
-  ```{ .python .no-check }
-  nlp.add_pipe(
-      eds.normalizer(
-          accents=True,
-          lowercase=True,
-          quotes=True,
-          spaces=True,
-          pollution=dict(
-              information=True,
-              bars=True,
-              biology=True,
-              doctors=True,
-              web=True,
-              coding=True,
-              footer=True,
-          ),
-      ),
-  )
-  ```
-
-!!! warning "Use qualifiers"
-    Those components **should be used with a qualification pipeline** to avoid extracted unwanted matches. At the very least, you can use available rule-based qualifiers (`eds.negation`, `eds.hypothesis` and `eds.family`). Better, a machine learning qualification component was developed and trained specifically for those components. For privacy reason, the model isn't publicly available yet.
-
-    !!! aphp "Use the ML model"
-
-        The model will soon be available in the models catalogue of AP-HP's CDW.
-
-## Usage
-
-```{ .python .no-check }
-import edsnlp, edsnlp.pipes as eds
-
-nlp = edsnlp.blank("eds")
-nlp.add_pipe(eds.sentences())
-nlp.add_pipe(
-    eds.normalizer(
-        accents=True,
-        lowercase=True,
-        quotes=True,
-        spaces=True,
-        pollution=dict(
-            information=True,
-            bars=True,
-            biology=True,
-            doctors=True,
-            web=True,
-            coding=True,
-            footer=True,
-        ),
-    ),
-)
-nlp.add_pipe(eds.tobacco())
-nlp.add_pipe(eds.diabetes())
-
-text = """
-Compte-rendu de consultation.
-
-Je vois ce jour M. SCOTT pour le suivi de sa rétinopathie diabétique.
-Le patient va bien depuis la dernière fois.
-Je le félicite pour la poursuite de son sevrage tabagique (toujours à 10 paquet-année).
-
-Sur le plan de son diabète, la glycémie est stable.
-"""
-
-doc = nlp(text)
-
-doc.spans
-# Out: {
-# 'pollutions': [],
-# 'tobacco': [sevrage tabagique (toujours à 10 paquet-année],
-# 'diabetes': [rétinopathie diabétique, diabète]
-# }
-
-tobacco_matches = doc.spans["tobacco"]
-tobacco_matches[0]._.detailed_status
-# Out: "ABSTINENCE" #
-
-tobacco_matches[0]._.assigned["PA"]  # paquet-année
-# Out: 10 # (1)
-
-
-diabetes = doc.spans["diabetes"]
-(diabetes[0]._.detailed_status, diabetes[1]._.detailed_status)
-# Out: ('WITH_COMPLICATION', 'WITHOUT_COMPLICATION') # (2)
-```
-
-1. Here we see an example of additional information that can be extracted
-2. Here we see the importance of document-level aggregation to extract the correct severity of each comorbidity.
+--8<-- "docs/pipes/ner/disorders/presentation.md"