mat-o-lab / csvtocsvw
Generates JSON-LD for various types of CSVs
Home Page: https://csvtocsvw.matolab.org
License: Apache License 2.0
Describe the bug
I'm starting csvtocsvw via podman and the container exits immediately with the following error message:
[2023-01-18 14:26:29 +0000] [1] [INFO] Starting gunicorn 20.1.0
[2023-01-18 14:26:29 +0000] [1] [INFO] Listening at: http://0.0.0.0:8080 (1)
[2023-01-18 14:26:29 +0000] [1] [INFO] Using worker: sync
[2023-01-18 14:26:29 +0000] [3] [INFO] Booting worker with pid: 3
[2023-01-18 14:26:29 +0000] [4] [INFO] Booting worker with pid: 4
[2023-01-18 14:26:29 +0000] [5] [INFO] Booting worker with pid: 5
[2023-01-18 14:26:30 +0000] [3] [ERROR] Exception in worker process
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/gunicorn/arbiter.py", line 589, in spawn_worker
worker.init_process()
File "/usr/local/lib/python3.8/site-packages/gunicorn/workers/base.py", line 134, in init_process
self.load_wsgi()
File "/usr/local/lib/python3.8/site-packages/gunicorn/workers/base.py", line 146, in load_wsgi
self.wsgi = self.app.wsgi()
File "/usr/local/lib/python3.8/site-packages/gunicorn/app/base.py", line 67, in wsgi
self.callable = self.load()
File "/usr/local/lib/python3.8/site-packages/gunicorn/app/wsgiapp.py", line 58, in load
return self.load_wsgiapp()
File "/usr/local/lib/python3.8/site-packages/gunicorn/app/wsgiapp.py", line 48, in load_wsgiapp
return util.import_app(self.app_uri)
File "/usr/local/lib/python3.8/site-packages/gunicorn/util.py", line 359, in import_app
mod = importlib.import_module(module)
File "/usr/local/lib/python3.8/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
File "<frozen importlib._bootstrap>", line 991, in _find_and_load
File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 843, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/src/wsgi.py", line 1, in <module>
from app import app
File "/src/app.py", line 18, in <module>
from annotator import CSV_Annotator
File "/src/annotator.py", line 47, in <module>
units_graph.parse(QUDT_UNIT_URL, format='turtle')
File "/usr/local/lib/python3.8/site-packages/rdflib/graph.py", line 1306, in parse
source = create_input_source(
File "/usr/local/lib/python3.8/site-packages/rdflib/parser.py", line 404, in create_input_source
) = _create_input_source_from_location(
File "/usr/local/lib/python3.8/site-packages/rdflib/parser.py", line 458, in _create_input_source_from_location
input_source = URLInputSource(absolute_location, format)
File "/usr/local/lib/python3.8/site-packages/rdflib/parser.py", line 282, in __init__
response: HTTPResponse = _urlopen(req)
File "/usr/local/lib/python3.8/site-packages/rdflib/parser.py", line 270, in _urlopen
return urlopen(req)
File "/usr/local/lib/python3.8/urllib/request.py", line 222, in urlopen
return opener.open(url, data, timeout)
File "/usr/local/lib/python3.8/urllib/request.py", line 531, in open
response = meth(req, response)
File "/usr/local/lib/python3.8/urllib/request.py", line 640, in http_response
response = self.parent.error(
File "/usr/local/lib/python3.8/urllib/request.py", line 569, in error
return self._call_chain(*args)
File "/usr/local/lib/python3.8/urllib/request.py", line 502, in _call_chain
result = func(*args)
File "/usr/local/lib/python3.8/urllib/request.py", line 649, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found
[2023-01-18 14:26:30 +0000] [3] [INFO] Worker exiting (pid: 3)
[2023-01-18 14:26:30 +0000] [4] [ERROR] Exception in worker process
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/gunicorn/arbiter.py", line 589, in spawn_worker
worker.init_process()
File "/usr/local/lib/python3.8/site-packages/gunicorn/workers/base.py", line 134, in init_process
self.load_wsgi()
File "/usr/local/lib/python3.8/site-packages/gunicorn/workers/base.py", line 146, in load_wsgi
self.wsgi = self.app.wsgi()
File "/usr/local/lib/python3.8/site-packages/gunicorn/app/base.py", line 67, in wsgi
self.callable = self.load()
File "/usr/local/lib/python3.8/site-packages/gunicorn/app/wsgiapp.py", line 58, in load
return self.load_wsgiapp()
File "/usr/local/lib/python3.8/site-packages/gunicorn/app/wsgiapp.py", line 48, in load_wsgiapp
return util.import_app(self.app_uri)
File "/usr/local/lib/python3.8/site-packages/gunicorn/util.py", line 359, in import_app
mod = importlib.import_module(module)
File "/usr/local/lib/python3.8/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
File "<frozen importlib._bootstrap>", line 991, in _find_and_load
File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 843, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/src/wsgi.py", line 1, in <module>
from app import app
File "/src/app.py", line 18, in <module>
from annotator import CSV_Annotator
File "/src/annotator.py", line 47, in <module>
units_graph.parse(QUDT_UNIT_URL, format='turtle')
File "/usr/local/lib/python3.8/site-packages/rdflib/graph.py", line 1306, in parse
source = create_input_source(
File "/usr/local/lib/python3.8/site-packages/rdflib/parser.py", line 404, in create_input_source
) = _create_input_source_from_location(
File "/usr/local/lib/python3.8/site-packages/rdflib/parser.py", line 458, in _create_input_source_from_location
input_source = URLInputSource(absolute_location, format)
File "/usr/local/lib/python3.8/site-packages/rdflib/parser.py", line 282, in __init__
response: HTTPResponse = _urlopen(req)
File "/usr/local/lib/python3.8/site-packages/rdflib/parser.py", line 270, in _urlopen
return urlopen(req)
File "/usr/local/lib/python3.8/urllib/request.py", line 222, in urlopen
return opener.open(url, data, timeout)
File "/usr/local/lib/python3.8/urllib/request.py", line 531, in open
response = meth(req, response)
File "/usr/local/lib/python3.8/urllib/request.py", line 640, in http_response
response = self.parent.error(
File "/usr/local/lib/python3.8/urllib/request.py", line 569, in error
return self._call_chain(*args)
File "/usr/local/lib/python3.8/urllib/request.py", line 502, in _call_chain
result = func(*args)
File "/usr/local/lib/python3.8/urllib/request.py", line 649, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found
[2023-01-18 14:26:30 +0000] [4] [INFO] Worker exiting (pid: 4)
[2023-01-18 14:26:30 +0000] [5] [ERROR] Exception in worker process
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/gunicorn/arbiter.py", line 589, in spawn_worker
worker.init_process()
File "/usr/local/lib/python3.8/site-packages/gunicorn/workers/base.py", line 134, in init_process
self.load_wsgi()
File "/usr/local/lib/python3.8/site-packages/gunicorn/workers/base.py", line 146, in load_wsgi
self.wsgi = self.app.wsgi()
File "/usr/local/lib/python3.8/site-packages/gunicorn/app/base.py", line 67, in wsgi
self.callable = self.load()
File "/usr/local/lib/python3.8/site-packages/gunicorn/app/wsgiapp.py", line 58, in load
return self.load_wsgiapp()
File "/usr/local/lib/python3.8/site-packages/gunicorn/app/wsgiapp.py", line 48, in load_wsgiapp
return util.import_app(self.app_uri)
File "/usr/local/lib/python3.8/site-packages/gunicorn/util.py", line 359, in import_app
mod = importlib.import_module(module)
File "/usr/local/lib/python3.8/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
File "<frozen importlib._bootstrap>", line 991, in _find_and_load
File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 843, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/src/wsgi.py", line 1, in <module>
from app import app
File "/src/app.py", line 18, in <module>
from annotator import CSV_Annotator
File "/src/annotator.py", line 47, in <module>
units_graph.parse(QUDT_UNIT_URL, format='turtle')
File "/usr/local/lib/python3.8/site-packages/rdflib/graph.py", line 1306, in parse
source = create_input_source(
File "/usr/local/lib/python3.8/site-packages/rdflib/parser.py", line 404, in create_input_source
) = _create_input_source_from_location(
File "/usr/local/lib/python3.8/site-packages/rdflib/parser.py", line 458, in _create_input_source_from_location
input_source = URLInputSource(absolute_location, format)
File "/usr/local/lib/python3.8/site-packages/rdflib/parser.py", line 282, in __init__
response: HTTPResponse = _urlopen(req)
File "/usr/local/lib/python3.8/site-packages/rdflib/parser.py", line 270, in _urlopen
return urlopen(req)
File "/usr/local/lib/python3.8/urllib/request.py", line 222, in urlopen
return opener.open(url, data, timeout)
File "/usr/local/lib/python3.8/urllib/request.py", line 531, in open
response = meth(req, response)
File "/usr/local/lib/python3.8/urllib/request.py", line 640, in http_response
response = self.parent.error(
File "/usr/local/lib/python3.8/urllib/request.py", line 569, in error
return self._call_chain(*args)
File "/usr/local/lib/python3.8/urllib/request.py", line 502, in _call_chain
result = func(*args)
File "/usr/local/lib/python3.8/urllib/request.py", line 649, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found
[2023-01-18 14:26:30 +0000] [5] [INFO] Worker exiting (pid: 5)
[2023-01-18 14:26:30 +0000] [1] [WARNING] Worker with pid 4 was terminated due to signal 15
[2023-01-18 14:26:30 +0000] [1] [WARNING] Worker with pid 5 was terminated due to signal 15
[2023-01-18 14:26:30 +0000] [1] [INFO] Shutting down: Master
[2023-01-18 14:26:30 +0000] [1] [INFO] Reason: Worker failed to boot.
To Reproduce
podman run -it ghcr.io/mat-o-lab/csvtocsvw:latest gunicorn -b 0.0.0.0:8080 wsgi:app --workers=3
Running the image without an explicit command also crashes:
podman run -it ghcr.io/mat-o-lab/csvtocsvw:latest
Expected behavior
The container starts without issues and keeps running.
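The traceback shows the worker dying at import time because the download of QUDT_UNIT_URL returns a 404. One possible way to keep the service bootable is to fall back to a locally cached copy when the download fails. The sketch below is only an illustration using the standard library: the function name fetch_text_with_fallback and the fallback file are hypothetical and not part of csvtocsvw.

```python
import logging
from urllib.request import urlopen

log = logging.getLogger(__name__)


def fetch_text_with_fallback(url: str, fallback_path: str, timeout: float = 10.0) -> str:
    """Return the text behind `url`, or the content of a local copy on failure.

    Hypothetical helper: neither the name nor the fallback file exist in
    csvtocsvw; they only illustrate the idea.
    """
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.read().decode("utf-8")
    except OSError as exc:
        # urllib.error.URLError, urllib.error.HTTPError (e.g. the 404 above)
        # and socket timeouts are all subclasses of OSError.
        log.warning("Could not fetch %s (%s); using %s", url, exc, fallback_path)
        with open(fallback_path, encoding="utf-8") as fh:
            return fh.read()
```

The returned Turtle text could then be handed to rdflib with units_graph.parse(data=text, format='turtle') instead of passing the URL directly, so that an unreachable URL no longer raises while the module is being imported.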
To make the code usable, it would be nice to add some documentation; an example in particular would be great.
Trying to download the ontology from https://purl.matolab.org/mseo/mid causes timeout errors.
Full error message:
File "C:\Users\richard\anaconda3\envs\csv2csvw\lib\site-packages\owlready2\namespace.py", line 903, in load
except: raise OwlReadyOntologyParsingError("Cannot download '%s'!" % f)
owlready2.base.OwlReadyOntologyParsingError: Cannot download 'https://purl.matolab.org/mseo/mid'!
No code was altered, and the code was working fine last week. Did some file destinations change?
Need to rework the get_header_length function
Line 120 in 78b9480
For the part at
Line 145 in 78b9480
it would be better to use an alternative way to get the header_length, perhaps with pandas functions.
Find a pandas (or any other suitable) function for determining the column count in a CSV row (maybe just count the occurrences of the separator in each row?).
Find the beginning of the data table (including the header).
Uploaded a Jupyter notebook in branch get_header_length_updated as a prototype.
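The idea of counting columns per row to locate the table could look roughly like the following. This is a minimal sketch, not the code from the prototype notebook: the function name find_table_start is hypothetical, and it assumes that the data table is the trailing block of rows sharing one column count, with any metadata rows above it having a different count.

```python
import csv
import io
from typing import Tuple


def find_table_start(text: str, sep: str = ",") -> Tuple[int, int]:
    """Return (index of the table's header row, column count of the table).

    Heuristic (an assumption of this sketch): the table is the last run of
    consecutive rows that all have the same number of fields.
    """
    # csv.reader is used instead of counting raw separators so that
    # separators inside quoted fields are not miscounted.
    rows = list(csv.reader(io.StringIO(text), delimiter=sep))
    counts = [len(row) for row in rows]

    # Ignore blank lines at the end of the file.
    end = len(counts)
    while end > 0 and counts[end - 1] == 0:
        end -= 1
    if end == 0:
        raise ValueError("no non-empty rows found")

    # Walk upwards from the last row while the column count stays the same.
    width = counts[end - 1]
    start = end - 1
    while start > 0 and counts[start - 1] == width:
        start -= 1
    return start, width


sample = "Test report\noperator;Jane\n\ntime;force;strain\n0;1.0;0.1\n1;2.0;0.2\n"
print(find_table_start(sample, sep=";"))  # (3, 3)
```

The returned row index could be passed to pandas as read_csv(..., sep=sep, skiprows=start). The heuristic misplaces the start when a metadata row directly above the table happens to have the same number of fields, so real files may need an extra check.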
Open question: integrate XLS-to-CSV conversion straight into the annotator, or make it another microservice like csv_to_csvw or map_to_method?
swagger.json must be corrected: the necessary input is not documented correctly; maybe see example.