What should be the final architecture of the PIE family?
pie-core
dependencies: huggingface-hub, pytorch-lightning (just for pytorch_lightning.core.mixins.HyperparametersMixin)
content (see #7 for most recent overview):
- document and annotation
- metric and statistic
- module (de-)serialization, i.e., hf_hub_mixin
- (generic) auto
- taskmodule (including AutoTaskmodule)
- model (including AutoModel)
- annotation_pipeline (including AutoAnnotationPipeline)
- common utilities (e.g. hydra and dictionary helpers)
- common classes required by pie-modules as well as pie-datasets
  - WithDocumentTypeMixin: this is a bit unfortunate because it requires pie_datasets.dataset. EDIT: "allow WithDocumentTypeMixin for document_type for DatasetDict.to_document_type" (pie-datasets#176) should allow removing convert_dataset
  - (Enter|Exit)Dataset(Dict)Mixin
required actions:
- increase test coverage, see implement missing tests #7
- create a release: https://github.com/ArneBinder/pie-core/releases/tag/v0.2.0
pie-datasets
dependencies: pie_core, datasets
dataset_builder dependencies: pie-modules (for documents, annotations, and utils.sequence_tagging)
dev dependencies: same as dataset_builder dependencies
content:
- dataset, iterable dataset, dataset dict
- common base dataset builders (e.g., Brat)
- with required annotations and documents
- helper methods and classes for dataset builders (e.g., Pipeline, Caster, Converter from pie_dataset.document.processing.generic)
  - or should this rather live in pie-modules?
- dataset builders (but this does not live in the actual pie_datasets package)
required actions:
- remove explicit usage of pytorch-ie (core only) pie-datasets#190
- remove explicit usage of pytorch-ie (datasets only) pie-datasets#192
- (BREAKING) use pie-core instead of pie-modules (core only, not for datasets) pie-datasets#178
- (OPTIONAL) (BREAKING) update brat attribute layers pie-datasets#145
- create breaking release
- update dataset scripts at HF hub with versions before the breaking release (update dataset scripts at HF hub with versions <0.11 pie-datasets#206)
- update dataset scripts at HF hub with versions after the breaking release (update dataset scripts at HF hub with versions =0.11 pie-datasets#207)
- use pie-documents instead of pie-modules pie-datasets#208
- update dataset scripts at HF hub with versions from #211 pie-datasets#210
pie-documents (formerly pie-modules)
dependencies: pie-core
lazy dependencies (imported on usage): nltk, flair (both for sentence splitters)
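A minimal sketch of what "imported on usage" could look like, assuming a small helper around importlib. The names lazy_import and NltkSentenceSplitterSketch are hypothetical and not taken from any PIE package; the point is only that the optional package is not touched until the splitter is actually called.

```python
import importlib
from types import ModuleType


def lazy_import(module_name: str, hint: str = "") -> ModuleType:
    """Import an optional dependency at call time and fail with a readable message."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        message = f"'{module_name}' is an optional dependency that is required for this feature."
        raise ImportError(f"{message} {hint}".strip()) from exc


class NltkSentenceSplitterSketch:
    """Hypothetical splitter: constructing it does not import nltk, calling it does."""

    def __call__(self, text: str) -> list[str]:
        nltk = lazy_import("nltk", "Install it with `pip install nltk`.")
        # nltk.sent_tokenize additionally needs the punkt tokenizer data to be downloaded.
        return nltk.sent_tokenize(text)
```

With this shape, importing the package that defines the splitter works without nltk or flair installed; the ImportError only surfaces when a user actually calls a splitter that needs them.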
content: everything not PyTorch-specific
- document processing (e.g., tokenization, regex partitioner, etc.)
- document types and annotations required for that
- base document and annotations types
- (document) metrics
- (document) statistics
required actions:
still in pie-modules:
- integrate (document) metrics and statistics from pytorch_ie, see
- create non-breaking release of pie-modules (requires "Add argument_and_relation_type_whitelist parameter to RETextClassificationWithIndicesTaskModule" pie-modules#170) with all current features as "backup"
- create non-breaking release including pytorch-ie using pie-core, i.e., "use pie-core in a non-breaking way" pytorch-ie#456: pie-modules v0.15.9
- deprecate pie-modules
in pie-documents:
- use pie_core instead of pytorch_ie, see "remove pytorch-ie" pie-documents#1
  - remove models and taskmodules
  - integrate annotations and documents from pytorch_ie
- create release, see pie-documents v0.1.0
PyTorch-IE
dependencies: torch, torchmetrics, pie-core, pie-modules, maybe pie-datasets?
content: torch related model (and taskmodule) implementations
- (pytorch) model
- (pytorch) pipeline
- models
- taskmodules
- datamodule?
required actions:
- use pie_core, see "use pie-core in a non-breaking way" pytorch-ie#456
- create (hopefully) non-breaking release: 0.31.9
- derive PyTorchIEPipeline from pie_core.AnnotationPipeline pytorch-ie#475
- create another (hopefully) non-breaking release, see v0.31.11
- create a breaking release that uses the new pie-core v0.3.0, see v0.32.0
- (BREAKING) use pie-documents pytorch-ie#476
  - requires pie-documents release that
    - does not depend on pytorch-ie
    - has annotations and documents
  - remove annotations and documents and import them from pie_modules
  - remove metrics and statistics and import them from pie_modules
- add tokenize_document from pie-modules pytorch-ie#462
- add taskmodules and models from pie-modules pytorch-ie#459
- (BREAKING) make AutoAnnotationPipeline work
  - annotate all model implementations with @Model.register() (instead of @PyTorchIEModel.register())
  - remove auto_model_class = AutoPyTorchIEModel from PyTorchIEPipeline
  - remove AutoPyTorchIEModel
- (OPTIONAL) increase test coverage
- (OPTIONAL) increase mypy coverage
- remove backwards compatibilities and create breaking release, see https://github.com/ArneBinder/pytorch-ie/releases/tag/v0.33.0
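To illustrate why the AutoAnnotationPipeline item asks for @Model.register() instead of @PyTorchIEModel.register(), here is a rough sketch of a name-based registry that is keyed by the class on which register() is called. This is an assumption about the general pattern, not the actual pie-core implementation; Registrable, by_name, and the stub classes below are stand-ins written for this example.

```python
from collections import defaultdict
from typing import Callable, Dict, Optional


class Registrable:
    # One name -> class table per base class on which register() was called.
    _registry: Dict[type, Dict[str, type]] = defaultdict(dict)

    @classmethod
    def register(cls, name: Optional[str] = None) -> Callable[[type], type]:
        def decorator(subclass: type) -> type:
            Registrable._registry[cls][name or subclass.__name__] = subclass
            return subclass

        return decorator

    @classmethod
    def by_name(cls, name: str) -> type:
        try:
            return Registrable._registry[cls][name]
        except KeyError:
            raise KeyError(f"{name!r} is not registered for {cls.__name__}") from None


class Model(Registrable):
    """Stand-in for the generic model base class."""


class PyTorchIEModel(Model):
    """Stand-in for the torch-specific base class."""


@PyTorchIEModel.register()
class OldStyleModel(PyTorchIEModel):
    """Registered under PyTorchIEModel only."""


@Model.register()
class NewStyleModel(PyTorchIEModel):
    """Registered under the generic Model base."""
```

Under this assumption, a generic lookup such as Model.by_name("NewStyleModel") succeeds, while Model.by_name("OldStyleModel") raises a KeyError because that class is only known to the PyTorchIEModel table. Registering everything on the generic base is what would let a backend-agnostic auto class resolve torch models without a separate AutoPyTorchIEModel.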
Template repo
Future
requirements (blockers) for all PIE, but non-PyTorch projects:
- pie-modules release implicitly including pie-core ("use pie-core implicitly" pie-modules#176), i.e., including
  - a pytorch-ie release including "use pie-core in a non-breaking way" pytorch-ie#456 which requires
- use pie-core instead of pie-modules (core only, not for datasets) pie-datasets#178 (requires above, i.e., "pie-modules release implicitly including pie-core")
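The blocker chain above can be written down as a small dependency graph to read off a resolution order. This is only a sketch using the standard-library graphlib module; the blocked_by mapping is my reading of the list (pie-datasets#178 waits for pie-modules#176, which in turn includes pytorch-ie#456), and resolution_order is a hypothetical helper name.

```python
from graphlib import TopologicalSorter

# Each key is blocked by the items in its set, as read from the list above.
blocked_by = {
    "pie-datasets#178": {"pie-modules#176"},
    "pie-modules#176": {"pytorch-ie#456"},
    "pytorch-ie#456": set(),
}


def resolution_order(graph: dict) -> list:
    """Return the items in an order where every blocker comes before what it blocks."""
    # TopologicalSorter raises graphlib.CycleError if the blockers form a cycle.
    return list(TopologicalSorter(graph).static_order())


order = resolution_order(blocked_by)
print(" -> ".join(order))  # pytorch-ie#456 -> pie-modules#176 -> pie-datasets#178
```

The same structure would also flag an accidental circular requirement between the packages, since a cycle makes static_order() fail instead of returning an order.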
SKLearn-IE
similar to PyTorch-IE, but with scikit-learn models and taskmodules instead of torch
dependencies: scikit-learn, pie_core, pie_modules, maybe pie_datasets
content: scikit-learn related model (and taskmodule) implementations, i.e., Python modules
- model with SKLearnIEModel
- pipeline with SKLearnIEPipeline
- models with SKLearnIEModel implementations
- taskmodules with respective TaskModule implementations
LLM-IE (or LangchainIE?)
similar to PyTorch-IE, but with LLM models and taskmodules instead of torch (i.e., based on langchain)
dependencies: langchain, pie_core, pie_modules
content: LLM (or just langchain?) related model (and taskmodule) implementations, i.e., Python modules
- model with LangchainIEModel
- pipeline with LangchainIEPipeline
- models with respective LangchainIEModel implementations
- taskmodules with respective TaskModule implementations