Commit c1cf750

docs: add details for disorders and behavior pipes
1 parent 5d790d2 commit c1cf750

5 files changed: +90 -149 lines changed

docs/pipes/ner/behaviors/alcohol.md

Lines changed: 2 additions & 0 deletions
@@ -1,5 +1,7 @@
 # Alcohol consumption {: #edsnlp.pipes.ner.behaviors.alcohol.factory.create_component }

+--8<-- "docs/pipes/ner/disorders/warning.md"
+
 ::: edsnlp.pipes.ner.behaviors.alcohol.factory.create_component
     options:
         heading_level: 2

docs/pipes/ner/behaviors/index.md

Lines changed: 2 additions & 95 deletions
@@ -2,99 +2,6 @@

 ## Presentation

-EDS-NLP offers two components to extract behavioral patterns, namely the tobacco and alcohol consumption status. Each component is based on the ContextualMatcher component.
-Some general considerations about those components:
+EDS-NLP offers two components to extract behavioral patterns, namely the tobacco and alcohol consumption status. Each component is based on the [ContextualMatcher][edsnlp.pipes.core.contextual_matcher.ContextualMatcher] matcher, itself based on the `eds.contextual_matcher` component.

-- Extracted entities are stored in `doc.ents` and `doc.spans`. For instance, the `eds.tobacco` component stores matches in `doc.spans["tobacco"]`.
-- The matched comorbidity is also available under the `ent.label_` of each match.
-- Matches have an associated `_.status` attribute taking the value `1`, or `2`. A corresponding `_.detailed_status` attribute stores the human-readable status, which can be component-dependent. See each component documentation for more details.
-- Some components add additional information to matches. For instance, the `tobacco` adds, if relevant, extracted *pack-year* (= *paquet-année*). Those information are available under the `ent._.assigned` attribute.
-- Those components work on **normalized** documents. Please use the `eds.normalizer` pipeline with the following parameters:
-```{ .python .no-check }
-nlp.add_pipe(
-    eds.normalizer(
-        accents=True,
-        lowercase=True,
-        quotes=True,
-        spaces=True,
-        pollution=dict(
-            information=True,
-            bars=True,
-            biology=True,
-            doctors=True,
-            web=True,
-            coding=True,
-            footer=True,
-        ),
-    ),
-)
-```
-
-!!! warning "Use qualifiers"
-    Those components **should be used with a qualification pipeline** to avoid extracted unwanted matches. At the very least, you can use available rule-based qualifiers (`eds.negation`, `eds.hypothesis` and `eds.family`). Better, a machine learning qualification component was developed and trained specifically for those components. For privacy reason, the model isn't publicly available yet.
-
-!!! aphp "Use the ML model"
-
-    The model will soon be available in the models catalogue of AP-HP's CDW.
-
-## Usage
-
-```{ .python .no-check }
-import edsnlp, edsnlp.pipes as eds
-
-nlp = edsnlp.blank("eds")
-nlp.add_pipe(eds.sentences())
-nlp.add_pipe(
-    eds.normalizer(
-        accents=True,
-        lowercase=True,
-        quotes=True,
-        spaces=True,
-        pollution=dict(
-            information=True,
-            bars=True,
-            biology=True,
-            doctors=True,
-            web=True,
-            coding=True,
-            footer=True,
-        ),
-    ),
-)
-nlp.add_pipe(eds.tobacco())
-nlp.add_pipe(eds.diabetes())
-
-text = """
-Compte-rendu de consultation.
-
-Je vois ce jour M. SCOTT pour le suivi de sa rétinopathie diabétique.
-Le patient va bien depuis la dernière fois.
-Je le félicite pour la poursuite de son sevrage tabagique (toujours à 10 paquet-année).
-
-Sur le plan de son diabète, la glycémie est stable.
-"""
-
-doc = nlp(text)
-
-doc.spans
-# Out: {
-# 'pollutions': [],
-# 'tobacco': [sevrage tabagique (toujours à 10 paquet-année],
-# 'diabetes': [rétinopathie diabétique, diabète]
-# }
-
-tobacco_matches = doc.spans["tobacco"]
-tobacco_matches[0]._.detailed_status
-# Out: "ABSTINENCE" #
-
-tobacco_matches[0]._.assigned["PA"] # paquet-année
-# Out: 10 # (1)
-
-
-diabetes = doc.spans["diabetes"]
-(diabetes[0]._.detailed_status, diabetes[1]._.detailed_status)
-# Out: ('WITH_COMPLICATION', 'WITHOUT_COMPLICATION') # (2)
-```
-
-1. Here we see an example of additional information that can be extracted
-2. Here we see the importance of document-level aggregation to extract the correct severity of each comorbidity.
+--8<-- "docs/pipes/ner/disorders/presentation.md"

docs/pipes/ner/disorders/index.md

Lines changed: 2 additions & 54 deletions
@@ -2,58 +2,6 @@

 ## Presentation

-The following components extract 16 different conditions from the [Charlson Comorbidity Index](https://www.rdplf.org/calculateurs/pages/charlson/charlson.html). Each component is based on the ContextualMatcher component.
+The following components extract 16 different conditions from the [Charlson Comorbidity Index](https://www.rdplf.org/calculateurs/pages/charlson/charlson.html). Each component is based on the [ContextualMatcher][edsnlp.pipes.core.contextual_matcher.ContextualMatcher] matcher, itself based on the `eds.contextual_matcher` component.

-The components were developed by AP-HP's Data Science team with a team of medical experts, following the insights of the algorithm proposed by [@petitjean_2024]
-
-Some general considerations about those components:
-
-- Extracted entities are stored in `doc.ents` and `doc.spans`. For instance, the `eds.tobacco` component stores matches in `doc.spans["tobacco"]`.
-- The matched comorbidity is also available under the `ent.label_` of each match.
-- Matches have an associated `_.status` attribute taking the value `1`, or `2`. A corresponding `_.detailed_status` attribute stores the human-readable status, which can be component-dependent. See each component documentation for more details.
-- Some components add additional information to matches. For instance, the `tobacco` adds, if relevant, extracted *pack-year* (= *paquet-année*). Those information are available under the `ent._.assigned` attribute.
-- Those components work on **normalized** documents. Please use the `eds.normalizer` pipeline with the following parameters:
-
-```{ .python .no-check }
-import edsnlp, edsnlp.pipes as eds
-...
-
-nlp.add_pipe(
-    eds.normalizer(
-        accents=True,
-        lowercase=True,
-        quotes=True,
-        spaces=True,
-        pollution=dict(
-            information=True,
-            bars=True,
-            biology=True,
-            doctors=True,
-            web=True,
-            coding=True,
-            footer=True,
-        ),
-    ),
-)
-```
-
-!!! warning "Use qualifiers"
-    Those components **should be used with a qualification pipeline** to avoid extracted unwanted matches. At the very least, you can use available rule-based qualifiers (`eds.negation`, `eds.hypothesis` and `eds.family`). Better, a machine learning qualification component was developed and trained specifically for those components. For privacy reason, the model isn't publicly available yet.
-
-!!! aphp "Use the ML model"
-
-    The model will soon be available in the models catalogue of AP-HP's CDW.
-
-!!! tip "On the medical definition of the comorbidities"
-
-    Those components were developped to extract **chronic** and **symptomatic** conditions only.
-
-## Aggregation
-
-For relevant phenotyping, matches should be aggregated at the document-level. For instance, a document might mention a complicated diabetes at the beginning ("*Le patient a une rétinopathie diabétique*"), and then refer to this diabetes without mentionning that it is complicated anymore ("*Concernant son diabète, le patient ...*").
-Thus, a good and simple aggregation rule is, for each comorbidity, to
-
-- disregard all entities tagged as irrelevant by the qualification component(s)
-- take the maximum (i.e., the most severe) status of the leftover entities
-
-An implementation of this rule is presented [here][aggregating-results]
+--8<-- "docs/pipes/ner/disorders/presentation.md"

docs/pipes/ner/disorders/presentation.md

Lines changed: 77 additions & 0 deletions
@@ -0,0 +1,77 @@
+The components were developed by AP-HP's Data Science team with a team of medical experts, following the insights of the algorithm proposed by [@petitjean_2024].
+
+Some general considerations about those components:
+
+- Extracted entities are stored in `doc.ents` and `doc.spans`. For instance, the `eds.tobacco` component stores matches in `doc.spans["tobacco"]`.
+- The matched comorbidity is also available under the `ent.label_` of each match.
+- Matches have an associated `_.status` attribute taking the value `1` or `2`. A corresponding `_.detailed_status` attribute stores the human-readable status, which can be component-dependent. See each component's documentation for more details.
+- Some components add additional information to matches. For instance, the `eds.tobacco` component adds, if relevant, the extracted *pack-year* (= *paquet-année*). This information is available under the `ent._.assigned` attribute.
+- Those components work on **normalized** documents. Please use the `eds.normalizer` pipeline (see [Usage](#usage) below).
+
+--8<-- "docs/pipes/ner/disorders/warning.md"
+
+!!! warning "Use qualifiers"
+    Those components **should be used with a qualification pipeline** to avoid extracting unwanted matches. At the very least, you should use available rule-based qualifiers (`eds.negation`, `eds.hypothesis` and `eds.family`). Better, a machine learning qualification component was developed and trained specifically for those components. For privacy reasons, the model isn't publicly available yet.
+
+!!! aphp "Use the ML model"
+
+    For projects working on AP-HP's CDW, this model is available via its models catalogue.
+
+## Usage
+
+```{ .python .no-check }
+import edsnlp, edsnlp.pipes as eds
+
+nlp = edsnlp.blank("eds")
+nlp.add_pipe(eds.sentences())
+nlp.add_pipe(
+    eds.normalizer(
+        accents=True,
+        lowercase=True,
+        quotes=True,
+        spaces=True,
+        pollution=dict(
+            biology=True, #(1)
+            coding=True, #(2)
+        ),
+    ),
+)
+nlp.add_pipe(eds.tobacco())
+nlp.add_pipe(eds.diabetes())
+
+text = """
+Compte-rendu de consultation.
+
+Je vois ce jour M. SCOTT pour le suivi de sa rétinopathie diabétique.
+Le patient va bien depuis la dernière fois.
+Je le félicite pour la poursuite de son sevrage tabagique (toujours à 10 paquet-année).
+
+Sur le plan de son diabète, la glycémie est stable.
+"""
+
+doc = nlp(text)
+
+doc.spans
+# Out: {
+# 'pollutions': [],
+# 'tobacco': [sevrage tabagique (toujours à 10 paquet-année],
+# 'diabetes': [rétinopathie diabétique, diabète]
+# }
+
+tobacco_matches = doc.spans["tobacco"]
+tobacco_matches[0]._.detailed_status
+# Out: "ABSTINENCE" #
+
+tobacco_matches[0]._.assigned["PA"] # paquet-année
+# Out: 10 # (3)
+
+
+diabetes = doc.spans["diabetes"]
+(diabetes[0]._.detailed_status, diabetes[1]._.detailed_status)
+# Out: ('WITH_COMPLICATION', 'WITHOUT_COMPLICATION') # (4)
+```
+
+1. This will discard mentions of biology results, which often lead to false positives
+2. This will discard mentions of ICD10 coding that sometimes appear at the end of clinical documents
+3. Here we see an example of additional information that can be extracted
+4. Here we see the importance of document-level aggregation to extract the correct severity of each comorbidity.
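
Annotation 4 points at document-level aggregation, and the Aggregation section removed from `docs/pipes/ner/disorders/index.md` states the rule: drop entities disqualified by the qualifiers, then keep the maximum status per comorbidity. The sketch below illustrates that rule in plain Python. It is not EDS-NLP code: the `Match` class and its fields are hypothetical stand-ins for extracted spans and their qualifier attributes, and the status values in the sample data are assumed for illustration.

```python
from dataclasses import dataclass


@dataclass
class Match:
    """Hypothetical stand-in for an extracted span (not an EDS-NLP class)."""

    label: str  # component name, e.g. "diabetes"
    status: int  # 1 or 2; a higher value means a more severe status
    negation: bool = False  # assumed to be set by a negation qualifier
    hypothesis: bool = False  # assumed to be set by a hypothesis qualifier
    family: bool = False  # assumed to be set by a family-context qualifier


def aggregate(matches):
    """Return the most severe status per label, ignoring disqualified matches."""
    result = {}
    for match in matches:
        # Step 1: disregard entities tagged as irrelevant by a qualifier
        if match.negation or match.hypothesis or match.family:
            continue
        # Step 2: keep the maximum status seen so far for this label
        result[match.label] = max(result.get(match.label, 0), match.status)
    return result


matches = [
    Match("diabetes", status=2),  # assumed status for a complicated mention
    Match("diabetes", status=1),  # assumed status for a plain mention
    Match("tobacco", status=1, negation=True),  # hypothetical negated mention
]
print(aggregate(matches))  # {'diabetes': 2}
```

The negated tobacco mention is dropped entirely, so the label does not appear in the result rather than appearing with a default value.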

docs/pipes/ner/disorders/warning.md

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+!!! danger "On overlapping entities"
+    When using multiple disorder or behavior pipelines, some entities may be extracted by several pipes. For instance:
+
+    * "Intoxication éthylotabagique" will be tagged both by `eds.tobacco` and `eds.alcohol`
+    * "Cirrhose alcoolique" will be tagged both by `eds.liver_disease` and `eds.alcohol`
+
+    As `doc.ents` discards overlapping entities, you should use `doc.spans` instead.
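
To make the warning concrete, here is a minimal sketch in plain Python of why a flat, non-overlapping entity list loses one of two matches on the same text while per-label groups keep both. The overlap filter shown is a generic greedy one chosen for illustration, not the implementation behind `doc.ents`; the helper names and the `(start, end, label)` tuples are hypothetical.

```python
from collections import defaultdict


def drop_overlaps(spans):
    """Greedy filter: longest spans first, skip any span overlapping a kept one."""
    kept = []
    ordered = sorted(spans, key=lambda s: (-(s[1] - s[0]), s[0]))
    for start, end, label in ordered:
        if all(end <= k_start or start >= k_end for k_start, k_end, _ in kept):
            kept.append((start, end, label))
    return sorted(kept)


def group_by_label(spans):
    """Keep every span, grouped under its label (similar in spirit to doc.spans)."""
    groups = defaultdict(list)
    for start, end, label in spans:
        groups[label].append((start, end))
    return dict(groups)


text = "Intoxication éthylotabagique"
# Both hypothetical matches cover the whole mention
spans = [(0, len(text), "tobacco"), (0, len(text), "alcohol")]

flat = drop_overlaps(spans)  # only one of the two labels survives
grouped = group_by_label(spans)  # both labels are kept

print([label for _, _, label in flat])
print(sorted(grouped))
```

Which of the two labels survives in the flat list depends on the filter's tie-breaking, which is one more reason to read matches from the per-label groups.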
