@ArneBinder ArneBinder commented Sep 23, 2025

This PR implements #459, i.e., it adds the models and taskmodules implemented originally in pie-modules (except for QA and span-pair based RE, see potential follow-ups below).

  • added models:
    • SequenceClassificationModelWithPooler
    • SequencePairSimilarityModelWithPooler
    • SimpleTokenClassificationModel
    • SimpleGenerativeModel
    • SimpleSequenceClassificationModel
    • TokenClassificationModelWithSeq2SeqEncoderAndCrf
  • added taskmodules:
    • RETextClassificationWithIndicesTaskModule
    • TextToTextTaskModule
    • LabeledSpanExtractionByTokenClassificationTaskModule
    • PointerNetworkTaskModuleForEnd2EndRE
    • CrossTextBinaryCorefTaskModule

IMPORTANT: This restricts the version of transformers to >=4.35.0,<4.37.0! So, this is breaking.
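
For anyone who wants to check their environment against the range above, a minimal sketch follows. The helper names `parse_version` and `in_supported_range` are hypothetical and not part of this PR, and the parsing is deliberately simplified (it is not a full PEP 440 implementation).

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '4.36.2' into (4, 36, 2).

    Simplified on purpose: only the leading digits of each dot-separated
    piece are used, parsing stops at the first piece without digits,
    and the result is padded with zeros to at least three components.
    """
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def in_supported_range(
    version: str, lower: str = "4.35.0", upper: str = "4.37.0"
) -> bool:
    """Return True if lower <= version < upper (the range stated in this PR)."""
    return parse_version(lower) <= parse_version(version) < parse_version(upper)
```

To test the installed package, the version string can be obtained with `importlib.metadata.version("transformers")` and passed to `in_supported_range`.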

requires:

Additional changes:

  • add tabulate and pytorch-crf to dev dependencies
  • set the dependency torchmetrics[text] >=1.5,<2 to resolve conflicts with nltk (the text extra installs the required additional dependencies, and >=1.5 ensures that no deprecated nltk models are loaded; note that we already use the modern nltk models in pie_documents.document.processing.NltkSentenceSplitter)
  • add SpanNotAlignedWithTokenException and get_aligned_token_span to utils.document
  • add RequiresMaxInputLength and RequiresTaskmoduleConfig to models.interface
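
To illustrate the idea behind aligning a character span with tokens, here is a rough sketch. It is not the code added in this PR: `SpanNotAlignedError` and `align_char_span_to_tokens` are hypothetical stand-ins, and the sketch assumes the tokenizer output is available as a list of `(char_start, char_end)` offsets per token.

```python
from typing import List, Tuple


class SpanNotAlignedError(Exception):
    """Hypothetical stand-in: raised when a span does not start and end on token boundaries."""


def align_char_span_to_tokens(
    token_offsets: List[Tuple[int, int]], char_start: int, char_end: int
) -> Tuple[int, int]:
    """Map the character span [char_start, char_end) to a token span [token_start, token_end)."""
    token_start = None
    token_end = None
    for idx, (tok_start, tok_end) in enumerate(token_offsets):
        if tok_start == tok_end:
            # Skip zero-length entries (e.g. special tokens without text).
            continue
        if tok_start == char_start:
            token_start = idx
        if tok_end == char_end:
            token_end = idx + 1
    if token_start is None or token_end is None or token_start >= token_end:
        raise SpanNotAlignedError(
            f"character span ({char_start}, {char_end}) is not aligned with token boundaries"
        )
    return token_start, token_end


# Offsets for "Jane lives in Berlin." split into: Jane | lives | in | Berlin | .
offsets = [(0, 4), (5, 10), (11, 13), (14, 20), (20, 21)]
berlin = align_char_span_to_tokens(offsets, 14, 20)  # covers only the token "Berlin"
```

A span such as `(15, 20)` starts inside the token "Berlin" and would raise the exception in this sketch.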

follow-ups:

  • make AutoAnnotationPipeline work #502
  • [OPTIONAL] add remaining models (SimpleExtractiveQuestionAnsweringModel and SpanTupleClassificationModel)
  • [OPTIONAL] add remaining taskmodules (ExtractiveQuestionAnsweringTaskModule and RESpanPairClassificationTaskModule)

@ArneBinder ArneBinder self-assigned this Sep 23, 2025
@ArneBinder ArneBinder force-pushed the add-models-and-taskmodules-from-pie-modules branch from 8236e00 to b6ac518 Compare September 23, 2025 17:50
@ArneBinder ArneBinder added the breaking Breaking Changes label Sep 23, 2025

ArneBinder commented Sep 24, 2025

regarding: "investigate why tests fail with missing nltk model error (only on CI, not local!)"

@RainbowRivey did some investigation:

In the test run, it installs 'torchmetrics (1.4.0.post0)', an old version that uses 'punkt', but we manually install the latest NLTK, which now supports only 'punkt_tab'.
This should be fixed by either raising the minimum torchmetrics version or using an older nltk (the torchmetrics dependencies specify nltk >=3.6, <=3.8.1).
I wonder if torchmetrics could install the proper nltk version itself, without us manually searching its dependencies...

codecov bot commented Sep 24, 2025

Codecov Report

❌ Patch coverage is 88.92947% with 394 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.68%. Comparing base (a6bb91d) to head (cf64392).
⚠️ Report is 1 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...taskmodules/re_text_classification_with_indices.py | 88.23% | 47 Missing and 31 partials ⚠️ |
| ...h_ie/taskmodules/pointer_network_for_end2end_re.py | 88.00% | 22 Missing and 26 partials ⚠️ |
| ...dels/base_models/bart_with_decoder_position_ids.py | 83.11% | 13 Missing and 13 partials ⚠️ |
| src/pytorch_ie/taskmodules/text_to_text.py | 86.17% | 11 Missing and 15 partials ⚠️ |
| ...ules/pointer_network/annotation_encoder_decoder.py | 89.47% | 12 Missing and 12 partials ⚠️ |
| ...es/metrics/wrapped_metric_with_prepare_function.py | 76.31% | 9 Missing and 9 partials ⚠️ |
| ...h_ie/models/base_models/bart_as_pointer_network.py | 89.17% | 11 Missing and 6 partials ⚠️ |
| ...odels/common/model_with_metrics_from_taskmodule.py | 75.00% | 9 Missing and 8 partials ⚠️ |
| ...h_ie/models/sequence_classification_with_pooler.py | 89.44% | 9 Missing and 8 partials ⚠️ |
| src/pytorch_ie/taskmodules/common/mixins.py | 86.50% | 10 Missing and 7 partials ⚠️ |

... and 16 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #498      +/-   ##
==========================================
+ Coverage   76.98%   82.68%   +5.70%     
==========================================
  Files          31       64      +33     
  Lines        1803     5361    +3558     
  Branches      350     1129     +779     
==========================================
+ Hits         1388     4433    +3045     
- Misses        337      675     +338     
- Partials       78      253     +175     
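
The percentages in the diff above can be reproduced from the raw counts, since hits, misses and partials add up to the line total and coverage is hits divided by lines. The sketch below is only a worked check of these numbers. The cut-off after two decimals is inferred from the figures shown here (rounding would give 82.69 for the head) and is not taken from Codecov's documentation; the function name `coverage_percent` is made up for this example.

```python
import math


def coverage_percent(hits: int, lines: int) -> float:
    """Coverage in percent, cut off (not rounded) after two decimals."""
    return math.floor(hits / lines * 10_000) / 100


# Counts taken from the coverage diff above.
base = coverage_percent(1388, 1803)  # main
head = coverage_percent(4433, 5361)  # this PR
delta = round(head - base, 2)
```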

@ArneBinder ArneBinder marked this pull request as ready for review September 24, 2025 13:21
@ArneBinder ArneBinder merged commit 39a25be into main Sep 24, 2025
4 checks passed
@ArneBinder ArneBinder deleted the add-models-and-taskmodules-from-pie-modules branch September 24, 2025 13:29
@ArneBinder ArneBinder linked an issue Sep 24, 2025 that may be closed by this pull request
ArneBinder added a commit that referenced this pull request Sep 24, 2025
make a copy to ensure original documents are not modified in non-inplace
mode.

background: This is a regression introduced in #498, where we changed
the scope of the `documents` fixture to `module`.
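
The regression described in this commit message can be sketched as follows. A pytest fixture with `scope="module"` is created once per test module and the same object is handed to every test in it, so any in-place modification leaks into later tests. The code below is an illustration under assumptions, not the actual fix: `add_prediction` and the dict-based `Document` type are hypothetical, and a plain module-level list stands in for the fixture value.

```python
import copy
from typing import Dict, List

# Hypothetical stand-in for a real document class.
Document = Dict[str, object]


def add_prediction(documents: List[Document], inplace: bool = True) -> List[Document]:
    """Attach a dummy prediction to every document.

    In non-inplace mode the input is deep-copied first, so the caller's
    objects (for example a shared fixture value) are left untouched.
    """
    if not inplace:
        documents = copy.deepcopy(documents)
    for doc in documents:
        doc["prediction"] = "PER"
    return documents


# Stand-in for a module-scoped fixture value: created once, reused by every test.
shared_documents: List[Document] = [{"text": "Jane lives in Berlin."}]

result = add_prediction(shared_documents, inplace=False)
```

Without the copy, the call above would add the `prediction` key to `shared_documents` itself, and every later test using the same fixture would see it.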

Labels

breaking Breaking Changes epic

Development

Successfully merging this pull request may close these issues.

add taskmodules and models from pie-modules

3 participants