Commit 0b4eda6

committed
Modified: changelog
1 parent 47add63 commit 0b4eda6

1 file changed

+114
-0
lines changed

changelog.md

Lines changed: 114 additions & 0 deletions
@@ -1,5 +1,119 @@
# Changelog

## Unreleased

### Added

- Relation implementation in `doc.spans["<label>"][i]._.rel = [{'type':'rel_type', 'target': <span>},]`
- Relation connector with brat2docs and docs2brat in `edsnlp.connectors.brat`, compatible with `edsnlp.data.read_*` and `edsnlp.data.write_*` (modified files: `edsnlp.data.converters`, `edsnlp.data.standoff`)
- Rule-based relation model using proximity and/or sentence in `edsnlp.pipes.misc.relations`, registered as `eds.relation`
- MkDocs documentation for relations in `docs.pipes.misc.relations.md` and `docs.pipes.misc.index.md`
- Tests for relations in `tests.pipelines.misc.test_relations` and resources in `ressources.relations`
- `data.set_processing(...)` now exposes an `autocast` parameter to disable or tweak the automatic casting of tensors
  during processing. Autocasting should result in a slight speedup, but may lead to numerical instability.
- Use `torch.inference_mode` to disable view tracking and version counter bumps during inference.
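
To illustrate the relation format and the proximity rule listed above, here is a minimal, library-free sketch. `ToySpan`, `token_gap` and `link_by_proximity` are hypothetical names invented for this example; they are not part of EDS-NLP, and the actual `eds.relation` pipe may use different criteria. Only the `{'type': ..., 'target': ...}` shape is taken from the entry above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToySpan:
    """Hypothetical stand-in for a spaCy span: token offsets plus a label."""

    start: int
    end: int  # exclusive
    label: str
    # Mirrors the `_.rel` extension: a list of {"type": ..., "target": ...} dicts.
    rel: List[Dict] = field(default_factory=list)


def token_gap(a: ToySpan, b: ToySpan) -> int:
    """Number of tokens separating two spans (0 if they touch or overlap)."""
    return max(0, max(a.start, b.start) - min(a.end, b.end))


def link_by_proximity(sources, targets, rel_type, max_gap):
    """Attach a relation to each source for every target within `max_gap` tokens."""
    for src in sources:
        for tgt in targets:
            if token_gap(src, tgt) <= max_gap:
                src.rel.append({"type": rel_type, "target": tgt})


drug = ToySpan(0, 1, "drug")
dose = ToySpan(2, 4, "dose")
far_dose = ToySpan(30, 32, "dose")

# Assumed rule: link a drug to any dose at most 5 tokens away.
link_by_proximity([drug], [dose, far_dose], rel_type="has_dose", max_gap=5)
```

After this call, `drug.rel` holds a single relation pointing at `dose`; `far_dose` is ignored because it lies 29 tokens away.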

### Changed

### Fixed

- `edsnlp.load("your/huggingface-model", install_dependencies=True)` now correctly resolves the python pip
  (especially on Colab) to auto-install the model dependencies
- We now better handle empty documents in the `eds.transformer`, `eds.text_cnn` and `eds.ner_crf` components

## v0.12.3

### Changed

Packages:

- Pip-installable models are now built with `hatch` instead of poetry, which allows us to expose `artifacts` (weights)
  at the root of the sdist package (uploadable to HF) and move them inside the package upon installation to avoid conflicts.
- Dependencies are no longer inferred with dill-magic (this didn't work well before anyway)
- Option to perform substitutions in the model's README.md file (e.g., for the model's name, metrics, ...)
- Huggingface models are now installed with pip *editable* installations, which is faster since it doesn't copy around the weights

## v0.12.1

### Added

- Added binary distribution for linux aarch64 (Streamlit's environment)
- Added new separator option in `eds.tables` and new input check

### Fixed

- Make catalogue & entrypoints compatible with py37-py312
- Check that a date has a doc before trying to use the document's `note_datetime`

## v0.12.0

### Added

- The `eds.transformer` component now accepts `prompts` (passed to its `preprocess` method, see breaking change below) to add before each window of text to embed.
- `LazyCollection.map` / `map_batches` now support generator functions as arguments.
- Window stride can now be disabled (i.e., stride = window) during training in the `eds.transformer` component by setting `training_stride = False`
- Added a new `eds.ner_overlap_scorer` to evaluate matches between two lists of entities, counting a match as true when the Dice overlap is above a given threshold
- `edsnlp.load` now accepts EDS-NLP models from the huggingface hub 🤗 !
- New `python -m edsnlp.package` command to package a model for the huggingface hub or pypi-like registries
- Improve table detection in `eds.tables` and support new options in `table._.to_pd_table(...)`:
  - `header=True` to use first row as header
  - `index=True` to use first column as index
  - `as_spans=True` to fill cells as document spans instead of strings
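
As a rough illustration of the Dice-overlap criterion mentioned for `eds.ner_overlap_scorer`, the sketch below scores predicted entities against gold ones using character offsets. The function names, the `(start, end, label)` tuple format, the same-label requirement and the `>=` comparison are all assumptions made for this example; the changelog does not specify how the real scorer handles these details.

```python
def dice_overlap(a, b):
    """Dice coefficient between two (start, end) ranges, with `end` exclusive."""
    intersection = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    total = (a[1] - a[0]) + (b[1] - b[0])
    return 2 * intersection / total if total else 0.0


def count_matches(pred, gold, threshold=0.5):
    """Count predicted (start, end, label) spans that overlap a gold span
    with the same label by at least `threshold` (assumed matching rule)."""
    matches = 0
    for p_start, p_end, p_label in pred:
        if any(
            p_label == g_label
            and dice_overlap((p_start, p_end), (g_start, g_end)) >= threshold
            for g_start, g_end, g_label in gold
        ):
            matches += 1
    return matches


gold = [(0, 10, "drug"), (20, 30, "date")]
pred = [(2, 10, "drug"), (28, 40, "date")]

n_true = count_matches(pred, gold, threshold=0.5)
```

Here the first prediction shares 8 of its characters with the gold span (Dice = 16/18), so it counts as a match, while the second overlaps by only 2 characters (Dice = 4/22) and does not.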

### Changed

- :boom: Major breaking change in trainable components, moving towards a more "task-centric" design:
  - the `eds.transformer` component is no longer responsible for deciding which spans of text ("contexts") should be embedded. These contexts are now passed via the `preprocess` method, which now accepts more arguments than just the docs to process.
  - similarly, the `eds.span_pooler` is no longer responsible for deciding which spans to pool, and instead pools all spans passed to it in the `preprocess` method.

Consequently, the `eds.transformer` and `eds.span_pooler` no longer accept their `span_getter` argument, and the `eds.ner_crf`, `eds.span_classifier`, `eds.span_linker` and `eds.span_qualifier` components now accept a `context_getter` argument instead, as well as a `span_getter` argument for the latter two. This refactoring can be summarized as follows:

```diff
- eds.transformer.span_getter
+ eds.ner_crf.context_getter
+ eds.span_classifier.context_getter
+ eds.span_linker.context_getter

- eds.span_pooler.span_getter
+ eds.span_qualifier.span_getter
+ eds.span_linker.span_getter
```

and as an example for the `eds.span_linker` component:

```diff
nlp.add_pipe(
    eds.span_linker(
        metric="cosine",
        probability_mode="sigmoid",
+       span_getter="ents",
+       # context_getter="ents",  -> by default, same as span_getter
        embedding=eds.span_pooler(
            hidden_size=128,
-           span_getter="ents",
            embedding=eds.transformer(
-               span_getter="ents",
                model="prajjwal1/bert-tiny",
                window=128,
                stride=96,
            ),
        ),
    ),
    name="linker",
)
```
- Trainable embedding components now all use `foldedtensor` to return embeddings, instead of returning a tensor of floats and a mask tensor.
- :boom: TorchComponent `__call__` no longer applies the end-to-end method, and instead calls the `forward` method directly, like all torch modules.
- The trainable `eds.span_qualifier` component has been renamed to `eds.span_classifier` to reflect its general purpose (it doesn't only predict qualifiers, but any attribute of a span using its context or not).
- `omop` converter now takes the `note_datetime` field into account by default when building a document
- `span._.date.to_datetime()` and `span._.date.to_duration()` now automatically take the `note_datetime` into account
- `nlp.vocab` is no longer serialized when saving a model, as it may contain sensitive information and can be recomputed during inference anyway
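
For readers unfamiliar with the `window` and `stride` settings used in the `eds.span_linker` example (`window=128`, `stride=96`), the following sketch shows how such parameters typically split a long token sequence into overlapping windows, and what "stride = window" means. `window_bounds` is a hypothetical helper written for this illustration; it is not EDS-NLP code, and the way `eds.transformer` actually builds or merges its windows may differ.

```python
def window_bounds(n_tokens, window, stride):
    """Return (start, end) token ranges covering `n_tokens` tokens.

    Each range spans at most `window` tokens and consecutive ranges
    start `stride` tokens apart, so they overlap when stride < window.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if n_tokens == 0:
        return []
    bounds = []
    start = 0
    while True:
        end = min(start + window, n_tokens)
        bounds.append((start, end))
        if end >= n_tokens:
            break
        start += stride
    return bounds


# Overlapping windows, as in the example configuration above.
overlapping = window_bounds(300, window=128, stride=96)

# Stride equal to the window size: windows no longer overlap.
disjoint = window_bounds(300, window=128, stride=128)
```

With 300 tokens, the first call yields `(0, 128)`, `(96, 224)` and `(192, 300)`, each sharing 32 tokens with its neighbour, while the second yields `(0, 128)`, `(128, 256)` and `(256, 300)`.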

### Fixed

- `edsnlp.data.read_json` now correctly reads the files from the directory passed as an argument, and not from the parent directory.
- Overwrite spaCy's Doc, Span and Token pickling utils to allow recursively storing Doc, Span and Token objects in the extension values (in particular, `span._.date.doc`)
- Removed pendulum dependency, solving various pickling, multiprocessing and missing attributes errors

## v0.11.2

### Fixed
