the role of pie-core in the PIE family #17

@ArneBinder

What should be the final architecture of the PIE family?

pie-core

dependencies: huggingface-hub, pytorch-lightning (only for pytorch_lightning.core.mixins.HyperparametersMixin)

content (see #7 for most recent overview):

  • document and annotation
  • metric and statistic
  • module (de-)serialization, i.e., hf_hub_mixin
  • (generic) auto
  • taskmodule (including AutoTaskmodule)
  • model (including AutoModel)
  • annotation_pipeline (including AutoAnnotationPipeline)
  • common utilities (e.g. hydra and dictionary helpers)
  • common classes required by pie-modules as well as pie-datasets
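
To make the "(generic) auto" item concrete, here is a minimal sketch of how an auto class could resolve an implementation from a plain config. All names (`Registrable`, `ToyTaskModule`, `ToyAutoTaskModule`, the `taskmodule_type` key) are hypothetical and chosen for illustration; this is not the actual pie-core API.

```python
from typing import Any, Dict, Type


class Registrable:
    """Hypothetical base class: subclasses register themselves by class name."""

    _registry: Dict[str, Type["Registrable"]]

    def __init_subclass__(cls, **kwargs: Any) -> None:
        super().__init_subclass__(**kwargs)
        if Registrable in cls.__bases__:
            # A direct child (e.g. a taskmodule base class) opens its own registry.
            cls._registry = {}
        else:
            # Deeper subclasses register into the registry of their family.
            cls._registry[cls.__name__] = cls

    @classmethod
    def by_name(cls, name: str) -> Type["Registrable"]:
        return cls._registry[name]


class ToyTaskModule(Registrable):
    def __init__(self, **config: Any) -> None:
        self.config = config


class ToyAutoTaskModule:
    """Hypothetical auto entry point: builds a taskmodule from a config dict."""

    @staticmethod
    def from_config(config: Dict[str, Any]) -> ToyTaskModule:
        config = dict(config)  # copy, so the caller's dict is not mutated
        cls = ToyTaskModule.by_name(config.pop("taskmodule_type"))
        return cls(**config)


class SpanTaggingTaskModule(ToyTaskModule):  # made-up example implementation
    pass


tm = ToyAutoTaskModule.from_config(
    {"taskmodule_type": "SpanTaggingTaskModule", "max_length": 128}
)
print(type(tm).__name__, tm.config)
```

The same registry mechanism could back the model and annotation pipeline auto classes, which is one argument for keeping a generic auto in pie-core.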

required actions:

pie-datasets

dependencies: pie-core, datasets
dataset_builder dependencies: pie-modules (for documents, annotations, and utils.sequence_tagging)
dev dependencies: same as dataset_builder dependencies

content:

  • dataset, iterable dataset, dataset dict
  • common base dataset builders (e.g., Brat)
    • with required annotations and documents
  • helper methods and classes for dataset builders (e.g., Pipeline, Caster, Converter from pie_datasets.document.processing.generic)
    • or should this rather live in pie-modules?
  • dataset builders (but this does not live in the actual pie_datasets package)

required actions:

pie-documents (formerly pie-modules)

dependencies: pie-core
lazy dependencies (imported on usage): nltk, flair (both for sentence splitters)
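
A minimal sketch of what "imported on usage" could look like, assuming a small helper that defers the import until the feature is called. The helper `require` and the class `NltkSentenceSplitterSketch` are hypothetical names, not existing package code.

```python
import importlib
from types import ModuleType
from typing import List


def require(module_name: str, feature: str) -> ModuleType:
    """Import an optional dependency only when the feature is actually used."""
    try:
        return importlib.import_module(module_name)
    except ImportError as e:
        raise ImportError(
            f"{feature} requires the optional package '{module_name}'. "
            "Install it to use this feature."
        ) from e


class NltkSentenceSplitterSketch:
    """Hypothetical splitter; importing this class does not import nltk."""

    def split(self, text: str) -> List[str]:
        # nltk is imported on first use; its tokenizer data must also be available.
        nltk = require("nltk", "NltkSentenceSplitterSketch")
        return nltk.sent_tokenize(text)
```

With this pattern the package installs and imports without nltk or flair, and users only hit an error (with an actionable message) when they call a splitter whose backend is missing.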

content: everything not PyTorch-specific

  • document processing (e.g., tokenization, regex partitioner, etc.)
  • document types and annotations required for that
  • base document and annotation types
  • (document) metrics
  • (document) statistics

required actions:

still in pie-modules:

in pie-documents:

PyTorch-IE

dependencies: torch, torchmetrics, pie-core, pie-modules, maybe pie-datasets?

content: torch related model (and taskmodule) implementations

  • (pytorch) model
  • (pytorch) pipeline
  • models
  • taskmodules
  • datamodule?
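
The dependency lines listed so far can be checked for cycles with a few lines of standard-library Python. This is only a sketch: package names are taken as written in the dependency lines above, third-party packages are left out, and the "maybe" dependency on pie-datasets is treated as if it were required.

```python
from graphlib import CycleError, TopologicalSorter  # standard library, Python 3.9+

# package -> PIE packages it depends on, as stated in this issue
deps = {
    "pie-core": set(),
    "pie-modules": {"pie-core"},
    "pie-datasets": {"pie-core"},
    # dataset builders live outside the pie_datasets package and need pie-modules
    "dataset builders": {"pie-datasets", "pie-modules"},
    # "maybe pie-datasets?" is included here as a hard dependency
    "pytorch-ie": {"pie-core", "pie-modules", "pie-datasets"},
}

try:
    # static_order() yields each package after all of its dependencies
    order = list(TopologicalSorter(deps).static_order())
    print("install order:", order)
except CycleError as e:
    print("dependency cycle:", e.args[1])
```

Under these assumptions the graph is acyclic, with pie-core as the only package without PIE-internal dependencies.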

required actions:

Template repo

Future

requirements (blockers) for all non-PyTorch PIE projects:

SKLearn-IE

similar to PyTorch-IE, but with scikit-learn models and taskmodules instead of torch
dependencies: scikit-learn, pie-core, pie-modules, maybe pie-datasets
content: scikit-learn related model (and taskmodule) implementations, i.e., Python modules

  • model with SKLearnIEModel
  • pipeline with SKLearnIEPipeline
  • models with SKLearnIEModel implementations
  • taskmodules with respective TaskModule implementations
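
The split sketched in this list (framework-independent interfaces, framework-specific implementations) could look roughly as follows. All class names are hypothetical stand-ins rather than existing PIE classes, and the toy model replaces a real scikit-learn estimator so the example runs without any ML framework installed.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Sequence


class TaskModuleSketch(ABC):
    """Interface that could live in the core package: documents <-> model inputs."""

    @abstractmethod
    def encode(self, documents: Sequence[str]) -> List[Any]: ...

    @abstractmethod
    def decode(self, documents: Sequence[str], outputs: Sequence[Any]) -> List[Dict[str, Any]]: ...


class ModelSketch(ABC):
    """Backend-specific part: a torch, scikit-learn, or LLM package would subclass this."""

    @abstractmethod
    def predict(self, inputs: Sequence[Any]) -> List[Any]: ...


class PipelineSketch:
    """Chains encode -> predict -> decode without knowing the backend."""

    def __init__(self, taskmodule: TaskModuleSketch, model: ModelSketch) -> None:
        self.taskmodule = taskmodule
        self.model = model

    def __call__(self, documents: Sequence[str]) -> List[Dict[str, Any]]:
        inputs = self.taskmodule.encode(documents)
        outputs = self.model.predict(inputs)
        return self.taskmodule.decode(documents, outputs)


class LengthTaskModule(TaskModuleSketch):  # toy: one feature, the text length
    def encode(self, documents):
        return [[len(d)] for d in documents]

    def decode(self, documents, outputs):
        return [{"text": d, "label": o} for d, o in zip(documents, outputs)]


class ThresholdModel(ModelSketch):  # toy stand-in for a fitted estimator
    def __init__(self, threshold: int) -> None:
        self.threshold = threshold

    def predict(self, inputs):
        return ["long" if x[0] > self.threshold else "short" for x in inputs]


result = PipelineSketch(LengthTaskModule(), ThresholdModel(threshold=5))(["hi", "hello world"])
print(result)
```

Only `ThresholdModel` would need to change to swap in another backend, which is the property the per-framework packages rely on.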

LLM-IE (or LangchainIE?)

similar to PyTorch-IE, but with LLM-based models and taskmodules instead of torch ones (i.e., based on langchain)
dependencies: langchain, pie-core, pie-modules
content: LLM (or just langchain?) related model (and taskmodule) implementations, i.e., Python modules

  • model with LangchainIEModel
  • pipeline with LangchainIEPipeline
  • models with respective LangchainIEModel implementations
  • taskmodules with respective TaskModule implementations
