Commit abdb914

Feature/Update default language models to UD 2.5 (#12)
* Refactor code (mostly static type hints)
* Update default models to latest versions (UD 2.5)
* Update LICENSE, setup.py, and README.md
* Add more tests
1 parent 934e9ad commit abdb914

12 files changed: +537 -456 lines changed

.github/workflows/pythonpackage.yml

Lines changed: 5 additions & 1 deletion
@@ -32,7 +32,11 @@ jobs:
         flake8 **/*.py --count --show-source --statistics
         # exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
         flake8 **/*.py --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
+    - name: Install package
+      run: |
+        # required because of entry points
+        pip install .
     - name: Test with pytest
       run: |
         pip install pytest
-        python -m pytest -vvv test
+        python -m pytest -vvv tests
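
The new `Install package` step matters because entry points are only registered once the package is installed. As a rough illustration, the registered names for a group can be listed with the standard library alone. This is a sketch that assumes Python 3.8+ (for `importlib.metadata`); the helper name `entry_point_names` is hypothetical and not part of this repository.

```python
from importlib.metadata import entry_points


def entry_point_names(group):
    """Return the sorted names of installed entry points in `group`."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: selectable interface
        selected = eps.select(group=group)
    else:
        # Python 3.8/3.9: plain dict keyed by group name
        selected = eps.get(group, [])
    return sorted(ep.name for ep in selected)


if __name__ == "__main__":
    # After `pip install .`, names such as "udpipe_en" would be expected here;
    # without the install step the list is empty.
    print(entry_point_names("spacy_languages"))
```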

LICENSE

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 MIT License

-Copyright (c) 2019 Text Analysis and Knowledge Engineering Lab (TakeLab)
+Copyright (c) 2019-2020 Text Analysis and Knowledge Engineering Lab (TakeLab)

 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal

README.md

Lines changed: 11 additions & 10 deletions
@@ -2,9 +2,9 @@

 This package wraps the fast and efficient [UDPipe](http://ufal.mff.cuni.cz/udpipe) language-agnostic NLP pipeline
 (via its [Python bindings](https://github.com/ufal/udpipe/tree/master/bindings/python)), so you can use
-[UDPipe pre-trained models](https://lindat.mff.cuni.cz/repository/xmlui/handle/11234/1-2998) as a [spaCy](https://spacy.io/) pipeline for 50+ languages out-of-the-box.
-Inspired by [spacy-stanfordnlp](https://github.com/explosion/spacy-stanfordnlp), this package offers slightly less accurate
-models that are in turn much faster (see benchmarks for [UDPipe](https://ufal.mff.cuni.cz/udpipe/models#universal_dependencies_24_models_performance) and [StanfordNLP](https://stanfordnlp.github.io/stanfordnlp/performance.html)).
+[UDPipe pre-trained models](https://lindat.mff.cuni.cz/repository/xmlui/handle/11234/1-3131) as a [spaCy](https://spacy.io/) pipeline for 50+ languages out-of-the-box.
+Inspired by [spacy-stanza](https://github.com/explosion/spacy-stanza), this package offers slightly less accurate
+models that are in turn much faster (see benchmarks for [UDPipe](https://ufal.mff.cuni.cz/udpipe/models#universal_dependencies_25_models_performance) and [Stanza](https://stanfordnlp.github.io/stanza/performance.html)).

 ## Installation

@@ -14,10 +14,10 @@ Use the package manager [pip](https://pip.pypa.io/en/stable/) to install spacy-u
 pip install spacy-udpipe
 ```

-After installation, use `spacy_udpipe.download(<language ISO code>)` to download the pre-trained model for the desired language.
+After installation, use `spacy_udpipe.download()` to download the pre-trained model for the desired language.

 ## Usage
-The loaded UDPipeLanguage class returns a spaCy [`Language` object](https://spacy.io/api/language), i.e., the nlp object you can use to process text and create a [`Doc` object](https://spacy.io/api/doc).
+The loaded UDPipeLanguage class returns a spaCy [`Language` object](https://spacy.io/api/language), i.e., the object you can use to process text and create a [`Doc` object](https://spacy.io/api/doc).

 ```python
 import spacy_udpipe
@@ -56,18 +56,19 @@ Created by [Antonio Šajatović](http://github.com/asajatovic) during an interns
 ## Contributing
 Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

-Please make sure to update tests as appropriate. Tests are run automatically for each pull request on the master branch. To start the tests locally, just run [`pytest`](https://docs.pytest.org/en/latest/contents.html) in the root source directory.
+Please make sure to update the tests as appropriate. Tests are run automatically for each pull request on the master branch.
+To start the tests locally, first, install the package with `pip install -e .`, then run [`pytest`](https://docs.pytest.org/en/latest/contents.html) in the root source directory.

 ## License
-[MIT](https://choosealicense.com/licenses/mit/) © Text Analysis and Knowledge Engineering Lab (TakeLab)
+[MIT](https://choosealicense.com/licenses/mit/) © Text Analysis and Knowledge Engineering Lab (TakeLab)

 ## Project status
 Maintained by [Text Analysis and Knowledge Engineering Lab (TakeLab)](http://takelab.fer.hr/).

 ## Notes
 * All available pre-trained models are licensed under [CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/).

-* Full list of pre-trained models for supported languages is available in [`languages.json`](https://github.com/TakeLab/spacy-udpipe/blob/master/spacy_udpipe/languages.json).
+* A full list of pre-trained models for supported languages is available in [`languages.json`](https://github.com/TakeLab/spacy-udpipe/blob/master/spacy_udpipe/languages.json).

 * This package exposes a `spacy_languages` entry point in its [`setup.py`](https://github.com/TakeLab/spacy-udpipe/blob/master/setup.py) so full suport for serialization is enabled:
 ```python
@@ -84,10 +85,10 @@ Maintained by [Text Analysis and Knowledge Engineering Lab (TakeLab)](http://tak
 * Known possible issues:
   * Tag map

-    `Token.tag_` is a [CoNLL](https://universaldependencies.org/format.html) XPOS tag (language-specific part-of-speech tag), defined for each language separately by the corresponding [Universal Dependencies](https://universaldependencies.org/) treebank. Mappings between between XPOS and Universal Dependencies POS tags should be defined in a `TAG_MAP` dictionary (located in language-specific `tag_map.py` files), along with optional morphological features. See [spaCy tag map](https://spacy.io/usage/adding-languages#tag-map) for more details.
+    `Token.tag_` is a [CoNLL](https://universaldependencies.org/format.html) XPOS tag (language-specific part-of-speech tag), defined for each language separately by the corresponding [Universal Dependencies](https://universaldependencies.org/) treebank. Mappings between XPOS and Universal Dependencies POS tags should be defined in a `TAG_MAP` dictionary (located in language-specific `tag_map.py` files), along with optional morphological features. See [spaCy tag map](https://spacy.io/usage/adding-languages#tag-map) for more details.
   * Syntax iterators

     In order to extract `Doc.noun_chunks`, a proper syntax iterator implementation for the language of interest is required. For more details, please see [spaCy syntax iterators](https://spacy.io/usage/adding-languages#syntax-iterators).
   * Other language-specific issues

-    A quick way to check language-specific defaults in [spaCy](https://spacy.io) is to visit [spaCy language support](https://spacy.io/usage/models#languages). Also, please see [spaCy language data](https://spacy.io/usage/adding-languages#language-data) for details regarding other language-specific data.
+    A quick way to check language-specific defaults in [spaCy](https://spacy.io) is to visit [spaCy language support](https://spacy.io/usage/models#languages). Also, please see [spaCy language data](https://spacy.io/usage/adding-languages#language-data) for details regarding other language-specific data.
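
The tag map note in the README describes a mapping from language-specific XPOS tags to Universal Dependencies POS tags. The idea can be sketched with a plain dictionary. This is only an illustration: the name `XPOS_TO_UPOS`, the helper `to_upos`, and the handful of English tags are assumptions, and the layout is deliberately simpler than spaCy's own `TAG_MAP` format.

```python
# Hypothetical mapping for a few Penn Treebank style English XPOS tags.
# This is NOT spaCy's TAG_MAP format, only a simplified illustration.
XPOS_TO_UPOS = {
    "NN": "NOUN",
    "NNS": "NOUN",
    "VBZ": "VERB",
    "JJ": "ADJ",
}


def to_upos(xpos, default="X"):
    """Look up the UPOS tag for an XPOS tag, falling back to "X" (other)."""
    return XPOS_TO_UPOS.get(xpos, default)


if __name__ == "__main__":
    for tag in ("NN", "VBZ", "UNKNOWN"):
        print(tag, "->", to_upos(tag))
```

Tags missing from the dictionary fall back to `X`, the Universal Dependencies tag for "other", which mirrors what happens when a language has no mapping defined for a given XPOS tag.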

setup.py

Lines changed: 31 additions & 27 deletions
@@ -3,41 +3,44 @@

 import setuptools

-with open("README.md", "r") as fh:
-    long_description = fh.read()
-
 URL = "https://github.com/TakeLab/spacy-udpipe"

-# get a dict of available languages from languages.json
-root = os.path.abspath(os.path.dirname(__file__))
-langs_path = os.path.join(root, "spacy_udpipe", "languages.json")
-with open(langs_path, "r") as f:
-    LANGUAGES = json.load(f)
+with open("README.md", "r") as f:
+    readme = f.read()
+
+# Get available languages and models from spacy_udpipe/languages.json
+languages_path = os.path.join(
+    os.path.abspath(os.path.dirname(__file__)),
+    "spacy_udpipe",
+    "languages.json"
+)
+with open(languages_path, "r") as f:
+    languages = json.load(f)

-ENTRY_LANGS = set(f"udpipe_{s.split('-')[0]} = spacy_udpipe:UDPipeLanguage"
-                  for s in LANGUAGES.keys())
+ENTRY_POINTS = {"spacy_languages":
+                set(f"udpipe_{s.split('-')[0]} = "
+                    "spacy_udpipe:UDPipeLanguage"
+                    for s in languages.keys()
+                    )
+                }

 setuptools.setup(
-    name="spacy-udpipe",
-    version="0.1.0",
+    name="spacy_udpipe",
+    version="0.2.0",
     description="Use fast UDPipe models directly in spaCy",
-    long_description=long_description,
+    long_description=readme,
     long_description_content_type="text/markdown",
     url=URL,
     author="TakeLab",
     author_email="[email protected]",
-    license='MIT',
-    keywords='udpipe spacy nlp',
+    license="MIT",
+    keywords="nlp udpipe spacy python",
     packages=setuptools.find_packages(),
     install_requires=["spacy>=2.1.0", "ufal.udpipe>=1.2.0"],
     python_requires=">=3.6",
-    entry_points={
-        "spacy_languages": ENTRY_LANGS
-    },
+    entry_points=ENTRY_POINTS,
     tests_require=["pytest>=5.0.0"],
-    package_data={
-        'spacy_udpipe': ['./languages.json'],
-    },
+    package_data={"spacy_udpipe": ["./languages.json"], },
     classifiers=[
         "Development Status :: 4 - Beta",
         "Intended Audience :: Developers",
@@ -46,10 +49,11 @@
         "Operating System :: OS Independent",
     ],
     project_urls={
-        'SpaCy': 'https://spacy.io/',
-        'TakeLab': 'http://takelab.fer.hr/',
-        'UDPipe': 'http://ufal.mff.cuni.cz/udpipe',
-        'Source': URL,
-        'Tracker': URL + '/issues',
-    }
+        "SpaCy": "https://spacy.io/",
+        "TakeLab": "http://takelab.fer.hr/",
+        "UDPipe": "http://ufal.mff.cuni.cz/udpipe",
+        "Source": URL,
+        "Tracker": URL + "/issues",
+    },
+    zip_safe=False
 )
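
To make the `ENTRY_POINTS` construction above easier to follow, here is a minimal standalone sketch of the same idea: each key of the languages mapping is cut at the first hyphen, and the set collapses keys that share a language prefix into one entry point. The function name `build_entry_points` and the sample keys and model file names are hypothetical; the real keys come from `spacy_udpipe/languages.json`.

```python
def build_entry_points(languages):
    """Build a `spacy_languages` entry point group from language keys.

    Keys such as "en" and "en-gum" share the prefix "en", so the set
    keeps a single "udpipe_en" entry for both.
    """
    return {
        "spacy_languages": {
            f"udpipe_{key.split('-')[0]} = spacy_udpipe:UDPipeLanguage"
            for key in languages
        }
    }


if __name__ == "__main__":
    # Hypothetical sample data, not the real contents of languages.json.
    sample = {
        "en": "english-model.udpipe",
        "en-gum": "english-gum-model.udpipe",
        "hr": "croatian-model.udpipe",
    }
    for spec in sorted(build_entry_points(sample)["spacy_languages"]):
        print(spec)
```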

spacy_udpipe/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 from .language import UDPipeLanguage, UDPipeModel, load, load_from_path
-from .util import download
+from .utils import download

 __all__ = ["UDPipeLanguage", "UDPipeModel",
            "load", "load_from_path", "download"]
