|
1 | 1 | # Changelog
|
2 | 2 |
|
| 3 | +## Unreleased |
| 4 | + |
| 5 | +### Added |
| 6 | +- Relation implementation in `doc.spans["<label>"][i]._.rel = [{'type':'rel_type', 'target': <span>},]` |
| 7 | +- Relation connector with brat2docs and docs2brat in `edsnlp.connectors.brat`, compatible with `edsnlp.data.read_*` and `edsnlp.data.write_*` (modified files: `edsnlp.data.converters`, `edsnlp.data.standoff`) |
| 8 | +- Rule-based relation model using proximity and/or sentence in `edsnlp.pipes.misc.relations`, registered as `eds.relation` |
| 9 | +- MkDocs documentation for relations in `docs.pipes.misc.relations.md` and `docs.pipes.misc.index.md` |
| 10 | +- Tests for relations in `tests.pipelines.misc.test_relations` and resources in `ressources.relations` |
| 11 | +- `data.set_processing(...)` now exposes an `autocast` parameter to disable or tweak the automatic casting of tensors |
| 12 | + during processing. Autocasting should result in a slight speedup, but may lead to numerical instability. |
| 13 | +- Use `torch.inference_mode` to disable view tracking and version counter bumps during inference. |
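The relation layout and the proximity rule listed above can be illustrated with a minimal sketch in plain Python. It deliberately avoids spaCy and EDS-NLP: `FakeSpan` and `link_by_proximity` are hypothetical names, and the distance rule is an assumption that may differ from what `eds.relation` actually does. Only the shape of the `rel` list (`{'type': ..., 'target': <span>}`) comes from the changelog entry.

```python
from dataclasses import dataclass, field


@dataclass
class FakeSpan:
    """Hypothetical stand-in for a spaCy span (not an EDS-NLP class)."""

    start: int  # index of the first token
    end: int  # index one past the last token
    label: str
    rel: list = field(default_factory=list)  # mirrors `span._.rel`


def link_by_proximity(sources, targets, rel_type, max_dist):
    """Attach a relation to each source for every target within `max_dist` tokens."""
    for src in sources:
        for tgt in targets:
            # Gap in tokens between the two spans (0 if they touch or overlap).
            gap = max(tgt.start - src.end, src.start - tgt.end, 0)
            if gap <= max_dist:
                src.rel.append({"type": rel_type, "target": tgt})


drug = FakeSpan(0, 1, "drug")
dose = FakeSpan(2, 4, "dose")
far_dose = FakeSpan(30, 32, "dose")
link_by_proximity([drug], [dose, far_dose], "has_dose", max_dist=5)
```

Here only the nearby `dose` span ends up in `drug.rel`; the distant one exceeds the assumed `max_dist`.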
| 14 | + |
| 15 | +### Changed |
| 16 | + |
| 17 | +### Fixed |
| 18 | + |
| 19 | +- `edsnlp.load("your/huggingface-model", install_dependencies=True)` now correctly resolves the python pip |
| 20 | + (especially on Colab) to auto-install the model dependencies |
| 21 | +- We now better handle empty documents in the `eds.transformer`, `eds.text_cnn` and `eds.ner_crf` components |
| 22 | + |
| 23 | +## v0.12.3 |
| 24 | + |
| 25 | +### Changed |
| 26 | + |
| 27 | +Packages: |
| 28 | + |
| 29 | +- Pip-installable models are now built with `hatch` instead of poetry, which allows us to expose `artifacts` (weights) |
| 30 | + at the root of the sdist package (uploadable to HF) and move them inside the package upon installation to avoid conflicts. |
| 31 | +- Dependencies are no longer inferred with dill-magic (this didn't work well before anyway) |
| 32 | +- Option to perform substitutions in the model's README.md file (e.g., for the model's name, metrics, ...) |
| 33 | +- Huggingface models are now installed with pip *editable* installations, which is faster since it doesn't copy around the weights |
| 34 | + |
| 35 | +## v0.12.1 |
| 36 | + |
| 37 | +### Added |
| 38 | + |
| 39 | +- Added binary distribution for Linux aarch64 (Streamlit's environment)
| 40 | +- Added new separator option in `eds.tables` and new input check
| 41 | + |
| 42 | +### Fixed |
| 43 | + |
| 44 | +- Make catalogue & entrypoints compatible with py37-py312 |
| 45 | +- Check that a date has a doc before trying to use the document's `note_datetime`
| 46 | + |
| 47 | +## v0.12.0 |
| 48 | + |
| 49 | +### Added |
| 50 | + |
| 51 | +- The `eds.transformer` component now accepts `prompts` (passed to its `preprocess` method, see breaking change below) to add before each window of text to embed. |
| 52 | +- `LazyCollection.map` / `map_batches` now support generator functions as arguments. |
| 53 | +- Window stride can now be disabled (i.e., stride = window) during training in the `eds.transformer` component by setting `training_stride = False`
| 54 | +- Added a new `eds.ner_overlap_scorer` to evaluate matches between two lists of entities, counting a match as correct when the Dice overlap is above a given threshold
| 55 | +- `edsnlp.load` now accepts EDS-NLP models from the huggingface hub 🤗 ! |
| 56 | +- New `python -m edsnlp.package` command to package a model for the huggingface hub or pypi-like registries |
| 57 | +- Improve table detection in `eds.tables` and support new options in `table._.to_pd_table(...)`: |
| 58 | + - `header=True` to use first row as header |
| 59 | + - `index=True` to use first column as index |
| 60 | + - `as_spans=True` to fill cells as document spans instead of strings |
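The "dice overlap" criterion mentioned for `eds.ner_overlap_scorer` above can be illustrated with the standard Dice coefficient on token offsets. This is a minimal sketch, assuming spans are half-open `(start, end)` token ranges; `dice_overlap` and `is_match` are illustrative names, and the scorer's real implementation may differ (for instance in how labels are compared or whether the threshold comparison is strict).

```python
def dice_overlap(a, b):
    """Dice coefficient between two half-open (start, end) token ranges."""
    intersection = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    total = (a[1] - a[0]) + (b[1] - b[0])
    return 2 * intersection / total if total else 0.0


def is_match(pred, gold, threshold=0.5):
    """Count a prediction as correct when its overlap reaches the threshold.

    Using `>=` here is an assumption; the changelog only says "above".
    """
    return dice_overlap(pred, gold) >= threshold
```

With this definition, identical spans score 1.0, disjoint spans score 0.0, and two four-token spans sharing two tokens score 0.5.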
| 61 | + |
| 62 | +### Changed |
| 63 | + |
| 64 | +- :boom: Major breaking change in trainable components, moving towards a more "task-centric" design: |
| 65 | + - the `eds.transformer` component is no longer responsible for deciding which spans of text ("contexts") should be embedded. These contexts are now passed via the `preprocess` method, which now accepts more arguments than just the docs to process. |
| 66 | + - similarly the `eds.span_pooler` is no longer responsible for deciding which spans to pool, and instead pools all spans passed to it in the `preprocess` method.
| 67 | + |
| 68 | + Consequently, the `eds.transformer` and `eds.span_pooler` no longer accept their `span_getter` argument, and the `eds.ner_crf`, `eds.span_classifier`, `eds.span_linker` and `eds.span_qualifier` components now accept a `context_getter` argument instead, as well as a `span_getter` argument for the latter two. This refactoring can be summarized as follows: |
| 69 | + |
| 70 | + ```diff |
| 71 | + - eds.transformer.span_getter |
| 72 | + + eds.ner_crf.context_getter |
| 73 | + + eds.span_classifier.context_getter |
| 74 | + + eds.span_linker.context_getter |
| 75 | + |
| 76 | + - eds.span_pooler.span_getter |
| 77 | + + eds.span_qualifier.span_getter |
| 78 | + + eds.span_linker.span_getter |
| 79 | + ``` |
| 80 | + |
| 81 | + and as an example for the `eds.span_linker` component: |
| 82 | + |
| 83 | + ```diff |
| 84 | + nlp.add_pipe( |
| 85 | + eds.span_linker( |
| 86 | + metric="cosine", |
| 87 | + probability_mode="sigmoid", |
| 88 | + + span_getter="ents", |
| 89 | + + # context_getter="ents", -> by default, same as span_getter |
| 90 | + embedding=eds.span_pooler( |
| 91 | + hidden_size=128, |
| 92 | + - span_getter="ents", |
| 93 | + embedding=eds.transformer( |
| 94 | + - span_getter="ents", |
| 95 | + model="prajjwal1/bert-tiny", |
| 96 | + window=128, |
| 97 | + stride=96, |
| 98 | + ), |
| 99 | + ), |
| 100 | + ), |
| 101 | + name="linker", |
| 102 | + ) |
| 103 | + ``` |
| 104 | +- Trainable embedding components now all use `foldedtensor` to return embeddings, instead of returning a tensor of floats and a mask tensor. |
| 105 | +- :boom: TorchComponent `__call__` no longer applies the end-to-end method, and instead calls the `forward` method directly, like all torch modules.
| 106 | +- The trainable `eds.span_qualifier` component has been renamed to `eds.span_classifier` to reflect its general purpose (it doesn't only predict qualifiers, but any attribute of a span using its context or not). |
| 107 | +- `omop` converter now takes the `note_datetime` field into account by default when building a document |
| 108 | +- `span._.date.to_datetime()` and `span._.date.to_duration()` now automatically take the `note_datetime` into account |
| 109 | +- `nlp.vocab` is no longer serialized when saving a model, as it may contain sensitive information and can be recomputed during inference anyway |
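To make the `window=128, stride=96` settings in the `eds.span_linker` example above more concrete, here is a minimal sketch of how a window and a stride can split a token sequence. It assumes a new window starts every `stride` tokens and covers at most `window` tokens; `split_windows` is a hypothetical helper, and the actual windowing logic of `eds.transformer` may differ.

```python
def split_windows(n_tokens, window, stride):
    """Return half-open (start, end) token ranges covering `n_tokens` tokens."""
    if stride <= 0 or window <= 0:
        raise ValueError("window and stride must be positive")
    windows = []
    start = 0
    while True:
        end = min(start + window, n_tokens)
        windows.append((start, end))
        if end >= n_tokens:
            break
        start += stride
    return windows


# Overlapping windows (stride < window) versus disjoint ones (stride == window).
overlapping = split_windows(300, window=128, stride=96)
disjoint = split_windows(300, window=128, stride=128)
```

Setting the stride equal to the window, which is what the changelog describes for `training_stride = False`, yields windows that do not overlap.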
| 110 | + |
| 111 | +### Fixed |
| 112 | + |
| 113 | +- `edsnlp.data.read_json` now correctly reads the files from the directory passed as an argument, and not from the parent directory.
| 114 | +- Overwrite spaCy's Doc, Span and Token pickling utils to allow recursively storing Doc, Span and Token objects in the extension values (in particular, `span._.date.doc`)
| 115 | +- Removed pendulum dependency, solving various pickling, multiprocessing and missing attributes errors |
| 116 | + |
3 | 117 | ## v0.11.2
|
4 | 118 |
|
5 | 119 | ### Fixed