GitHub / aphp / edspdf
EDS-PDF is a generic, pure-Python framework for extracting text from PDF documents. It provides the machinery to classify text blocks as body or metadata, using either rule-based or machine-learning-based approaches.
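To make the rule-based idea concrete, here is a minimal sketch of classifying text blocks as body or metadata with a rectangular mask. It does not use the EDS-PDF API: `TextBlock`, `Mask`, and `classify` are hypothetical names introduced for illustration, and the relative-coordinate convention (0 to 1, origin at the top-left of the page) is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TextBlock:
    """Hypothetical text block with a bounding box in relative page coordinates."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


@dataclass
class Mask:
    """Hypothetical rectangular region of the page assumed to hold the body."""
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, block: TextBlock) -> bool:
        # A block counts as inside only if its whole bounding box fits in the mask.
        return (
            self.x0 <= block.x0
            and block.x1 <= self.x1
            and self.y0 <= block.y0
            and block.y1 <= self.y1
        )


def classify(blocks: List[TextBlock], mask: Mask) -> List[str]:
    """Label each block 'body' if it lies inside the mask, else 'meta'."""
    return ["body" if mask.contains(b) else "meta" for b in blocks]


blocks = [
    TextBlock("Hospital letterhead", 0.05, 0.02, 0.95, 0.08),
    TextBlock("The patient was admitted for observation.", 0.10, 0.30, 0.90, 0.35),
    TextBlock("Page 1/2", 0.45, 0.96, 0.55, 0.99),
]
body_mask = Mask(x0=0.05, y0=0.10, x1=0.95, y1=0.90)
labels = classify(blocks, body_mask)
print(labels)
```

A fixed mask like this only suits documents with a stable layout; a machine-learning classifier replaces the hand-written `contains` rule with a learned decision.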
JSON API: https://data.code.gouv.fr/api/v1/hosts/GitHub/repositories/aphp%2Fedspdf
Stars: 47
Forks: 7
Open issues: 0
License: bsd-3-clause
Language: Python
Size: 8.93 MB
Dependencies parsed at:
39
Created at: almost 3 years ago
Updated at: 20 days ago
Pushed at: 3 months ago
Last synced at: 6 days ago
Commit Stats
Commits: 293
Authors: 10
Mean commits per author: 29.3
Development Distribution Score: 0.621
More commit stats: https://commits.ecosystem.code.gouv.fr/hosts/GitHub/repositories/aphp/edspdf
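The derived commit figures above can be reproduced with simple arithmetic. This is a minimal sketch, assuming the Development Distribution Score is defined as one minus the top author's share of all commits (the page does not state its formula). The per-author breakdown passed to the second call is a made-up toy input, not this repository's data.

```python
from typing import Sequence


def mean_commits_per_author(total_commits: int, authors: int) -> float:
    """Average number of commits per author, rounded to one decimal."""
    return round(total_commits / authors, 1)


def development_distribution_score(commits_by_author: Sequence[int]) -> float:
    """Assumed definition: 1 - (commits by the top author / total commits)."""
    total = sum(commits_by_author)
    return round(1 - max(commits_by_author) / total, 3)


# Totals taken from the listing above: 293 commits, 10 authors.
mean = mean_commits_per_author(293, 10)

# Toy breakdown (hypothetical): three authors with 6, 3 and 1 commits.
toy_dds = development_distribution_score([6, 3, 1])

print(mean)
print(toy_dds)
```

Under this assumed definition, a score near 0 means one author wrote almost everything, while a score near 1 means the work is spread across many authors.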
Topics: extraction, machine-learning, pdf
- actions/checkout v2 composite
- actions/setup-python v2 composite
- actions/checkout v2 composite
- actions/download-artifact v3 composite
- actions/setup-python v2 composite
- actions/upload-artifact v3 composite
- pypa/gh-action-pypi-publish release/v1 composite
- actions/cache v2 composite
- actions/checkout v3 composite
- actions/checkout v2 composite
- actions/setup-python v3 composite
- actions/setup-python v2 composite
- codecov/codecov-action v2 composite
- pre-commit/action v3.0.0 composite
- accelerate >=0.12.0,<1.0.0
- anyascii >=0.3.2
- attrs >=23.1
- build >=0.10.0
- catalogue >=2.0
- confit >=0.5.3,<1.0.0
- dill *
- foldedtensor >=0.3.3
- fsspec <2023.1.0 ; python_version<'3.8'
- fsspec python_version>='3.8'
- loguru *
- networkx >=2.6
- pdfminer.six >=20220319,<20231228 ; python_version<'3.8'
- pdfminer.six python_version>='3.8'
- pyarrow *
- pydantic >=1.2,<2.0.0
- pypdfium2 >=4.0
- regex *
- rich-logger >=0.3
- safetensors >=0.3
- scikit-learn >=1.0.2,<2.0.0
- toml *
- torch >1.0.0
- tqdm >=4.64