GitHub / dataesr / bso-parser-html
Extract structured metadata (affiliations, author names and ORCID iDs, keywords, ...) from raw HTML pages
JSON API: https://data.code.gouv.fr/api/v1/hosts/GitHub/repositories/dataesr%2Fbso-parser-html
Stars: 1
Forks: 0
Open issues: 0
License: MIT
Language: Python
Size: 249 KB
Dependencies parsed: 28
Created at: almost 4 years ago
Updated at: over 1 year ago
Pushed at: about 1 year ago
Last synced at: 6 days ago
Dockerfile
docker
- ubuntu 18.04 build
requirements.txt
pypi
- Flask ==2.1.0
- Flask-Bootstrap ==3.3.7.1
- Flask-Testing ==0.7.1
- SPARQLWrapper ==1.8.5
- Unidecode ==1.3.2
- Werkzeug ==2.2.2
- beautifulsoup4 ==4.9.3
- boto3 ==1.26.41
- bs4 ==0.0.1
- elasticsearch ==7.8.0
- gunicorn ==20.0.4
- jsonschema ==3.2.0
- lxml ==4.6.3
- multiprocess ==0.70.12.2
- pandas ==1.2.5
- pycountry ==20.7.3
- pymongo ==3.8.0
- pysftp ==0.2.9
- python-dateutil *
- python-keystoneclient ==4.0.0
- python-swiftclient ==3.9.0
- redis ==4.3.4
- regex ==2017.4.5
- requests ==2.28.1
- retry *
- rq ==1.9.0
- tokenizers ==0.10.1
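
The repository description mentions extracting affiliations, author names, ORCID iDs, and keywords from raw HTML. The following is only a minimal sketch of that idea, not the repository's actual code. It assumes the page exposes Highwire-style `citation_*` meta tags, which many publisher pages do but which is not confirmed for this project. It uses the standard-library `html.parser` to stay dependency-free, whereas the repository itself pins beautifulsoup4 and lxml. The names `CitationMetaParser` and `parse_metadata`, and the sample page, are hypothetical.

```python
from html.parser import HTMLParser


class CitationMetaParser(HTMLParser):
    """Collect <meta name="citation_*"> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.authors = []   # one dict per citation_author tag, in page order
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        content = (attributes.get("content") or "").strip()
        if not content:
            return
        if name == "citation_author":
            # Start a new author record; later tags attach to it.
            self.authors.append({"name": content, "affiliations": [], "orcid": None})
        elif name == "citation_author_institution" and self.authors:
            # Assumption: an institution tag belongs to the preceding author.
            self.authors[-1]["affiliations"].append(content)
        elif name == "citation_author_orcid" and self.authors:
            self.authors[-1]["orcid"] = content
        elif name == "citation_keywords":
            # Assumption: keywords are separated by semicolons.
            self.keywords.extend(k.strip() for k in content.split(";") if k.strip())


def parse_metadata(html):
    """Return authors and keywords found in the page's meta tags."""
    parser = CitationMetaParser()
    parser.feed(html)
    return {"authors": parser.authors, "keywords": parser.keywords}


# Made-up sample page; the ORCID iD is a placeholder, not a real identifier.
sample = """
<html><head>
<meta name="citation_author" content="Jane Doe">
<meta name="citation_author_institution" content="Example University">
<meta name="citation_author_orcid" content="0000-0000-0000-0000">
<meta name="citation_author" content="John Roe">
<meta name="citation_keywords" content="open access; bibliometrics">
</head></html>
"""
result = parse_metadata(sample)
print(result)
```

Attaching each institution or ORCID tag to the most recent author relies on tag order in the page. Pages that do not follow that ordering, or that carry metadata only in the visible body, would need publisher-specific rules.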