mat-o-lab / csvtocsvw

Generates JSON-LD for various types of CSVs

Home Page: https://csvtocsvw.matolab.org

License: Apache License 2.0

Jupyter Notebook 99.64% Dockerfile 0.01% Python 0.32% HTML 0.04%

csvtocsvw's Issues

Automatic API documentation

Is your feature request related to a problem? Please describe.
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

Describe the solution you'd like
A clear and concise description of what you want to happen.

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.

Additional context
Add any other context or screenshots about the feature request here.

container crashes at startup

Describe the bug
I'm starting csvtocsvw via podman and the container exits immediately with the following error message:

[2023-01-18 14:26:29 +0000] [1] [INFO] Starting gunicorn 20.1.0
[2023-01-18 14:26:29 +0000] [1] [INFO] Listening at: http://0.0.0.0:8080 (1)
[2023-01-18 14:26:29 +0000] [1] [INFO] Using worker: sync
[2023-01-18 14:26:29 +0000] [3] [INFO] Booting worker with pid: 3
[2023-01-18 14:26:29 +0000] [4] [INFO] Booting worker with pid: 4
[2023-01-18 14:26:29 +0000] [5] [INFO] Booting worker with pid: 5
[2023-01-18 14:26:30 +0000] [3] [ERROR] Exception in worker process
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/gunicorn/arbiter.py", line 589, in spawn_worker
    worker.init_process()
  File "/usr/local/lib/python3.8/site-packages/gunicorn/workers/base.py", line 134, in init_process
    self.load_wsgi()
  File "/usr/local/lib/python3.8/site-packages/gunicorn/workers/base.py", line 146, in load_wsgi
    self.wsgi = self.app.wsgi()
  File "/usr/local/lib/python3.8/site-packages/gunicorn/app/base.py", line 67, in wsgi
    self.callable = self.load()
  File "/usr/local/lib/python3.8/site-packages/gunicorn/app/wsgiapp.py", line 58, in load
    return self.load_wsgiapp()
  File "/usr/local/lib/python3.8/site-packages/gunicorn/app/wsgiapp.py", line 48, in load_wsgiapp
    return util.import_app(self.app_uri)
  File "/usr/local/lib/python3.8/site-packages/gunicorn/util.py", line 359, in import_app
    mod = importlib.import_module(module)
  File "/usr/local/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 843, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/src/wsgi.py", line 1, in <module>
    from app import app
  File "/src/app.py", line 18, in <module>
    from annotator import CSV_Annotator
  File "/src/annotator.py", line 47, in <module>
    units_graph.parse(QUDT_UNIT_URL, format='turtle')
  File "/usr/local/lib/python3.8/site-packages/rdflib/graph.py", line 1306, in parse
    source = create_input_source(
  File "/usr/local/lib/python3.8/site-packages/rdflib/parser.py", line 404, in create_input_source
    ) = _create_input_source_from_location(
  File "/usr/local/lib/python3.8/site-packages/rdflib/parser.py", line 458, in _create_input_source_from_location
    input_source = URLInputSource(absolute_location, format)
  File "/usr/local/lib/python3.8/site-packages/rdflib/parser.py", line 282, in __init__
    response: HTTPResponse = _urlopen(req)
  File "/usr/local/lib/python3.8/site-packages/rdflib/parser.py", line 270, in _urlopen
    return urlopen(req)
  File "/usr/local/lib/python3.8/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/local/lib/python3.8/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/usr/local/lib/python3.8/urllib/request.py", line 640, in http_response
    response = self.parent.error(
  File "/usr/local/lib/python3.8/urllib/request.py", line 569, in error
    return self._call_chain(*args)
  File "/usr/local/lib/python3.8/urllib/request.py", line 502, in _call_chain
    result = func(*args)
  File "/usr/local/lib/python3.8/urllib/request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found
[2023-01-18 14:26:30 +0000] [3] [INFO] Worker exiting (pid: 3)
[2023-01-18 14:26:30 +0000] [4] [ERROR] Exception in worker process
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/gunicorn/arbiter.py", line 589, in spawn_worker
    worker.init_process()
  File "/usr/local/lib/python3.8/site-packages/gunicorn/workers/base.py", line 134, in init_process
    self.load_wsgi()
  File "/usr/local/lib/python3.8/site-packages/gunicorn/workers/base.py", line 146, in load_wsgi
    self.wsgi = self.app.wsgi()
  File "/usr/local/lib/python3.8/site-packages/gunicorn/app/base.py", line 67, in wsgi
    self.callable = self.load()
  File "/usr/local/lib/python3.8/site-packages/gunicorn/app/wsgiapp.py", line 58, in load
    return self.load_wsgiapp()
  File "/usr/local/lib/python3.8/site-packages/gunicorn/app/wsgiapp.py", line 48, in load_wsgiapp
    return util.import_app(self.app_uri)
  File "/usr/local/lib/python3.8/site-packages/gunicorn/util.py", line 359, in import_app
    mod = importlib.import_module(module)
  File "/usr/local/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 843, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/src/wsgi.py", line 1, in <module>
    from app import app
  File "/src/app.py", line 18, in <module>
    from annotator import CSV_Annotator
  File "/src/annotator.py", line 47, in <module>
    units_graph.parse(QUDT_UNIT_URL, format='turtle')
  File "/usr/local/lib/python3.8/site-packages/rdflib/graph.py", line 1306, in parse
    source = create_input_source(
  File "/usr/local/lib/python3.8/site-packages/rdflib/parser.py", line 404, in create_input_source
    ) = _create_input_source_from_location(
  File "/usr/local/lib/python3.8/site-packages/rdflib/parser.py", line 458, in _create_input_source_from_location
    input_source = URLInputSource(absolute_location, format)
  File "/usr/local/lib/python3.8/site-packages/rdflib/parser.py", line 282, in __init__
    response: HTTPResponse = _urlopen(req)
  File "/usr/local/lib/python3.8/site-packages/rdflib/parser.py", line 270, in _urlopen
    return urlopen(req)
  File "/usr/local/lib/python3.8/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/local/lib/python3.8/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/usr/local/lib/python3.8/urllib/request.py", line 640, in http_response
    response = self.parent.error(
  File "/usr/local/lib/python3.8/urllib/request.py", line 569, in error
    return self._call_chain(*args)
  File "/usr/local/lib/python3.8/urllib/request.py", line 502, in _call_chain
    result = func(*args)
  File "/usr/local/lib/python3.8/urllib/request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found
[2023-01-18 14:26:30 +0000] [4] [INFO] Worker exiting (pid: 4)
[2023-01-18 14:26:30 +0000] [5] [ERROR] Exception in worker process
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/gunicorn/arbiter.py", line 589, in spawn_worker
    worker.init_process()
  File "/usr/local/lib/python3.8/site-packages/gunicorn/workers/base.py", line 134, in init_process
    self.load_wsgi()
  File "/usr/local/lib/python3.8/site-packages/gunicorn/workers/base.py", line 146, in load_wsgi
    self.wsgi = self.app.wsgi()
  File "/usr/local/lib/python3.8/site-packages/gunicorn/app/base.py", line 67, in wsgi
    self.callable = self.load()
  File "/usr/local/lib/python3.8/site-packages/gunicorn/app/wsgiapp.py", line 58, in load
    return self.load_wsgiapp()
  File "/usr/local/lib/python3.8/site-packages/gunicorn/app/wsgiapp.py", line 48, in load_wsgiapp
    return util.import_app(self.app_uri)
  File "/usr/local/lib/python3.8/site-packages/gunicorn/util.py", line 359, in import_app
    mod = importlib.import_module(module)
  File "/usr/local/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 843, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/src/wsgi.py", line 1, in <module>
    from app import app
  File "/src/app.py", line 18, in <module>
    from annotator import CSV_Annotator
  File "/src/annotator.py", line 47, in <module>
    units_graph.parse(QUDT_UNIT_URL, format='turtle')
  File "/usr/local/lib/python3.8/site-packages/rdflib/graph.py", line 1306, in parse
    source = create_input_source(
  File "/usr/local/lib/python3.8/site-packages/rdflib/parser.py", line 404, in create_input_source
    ) = _create_input_source_from_location(
  File "/usr/local/lib/python3.8/site-packages/rdflib/parser.py", line 458, in _create_input_source_from_location
    input_source = URLInputSource(absolute_location, format)
  File "/usr/local/lib/python3.8/site-packages/rdflib/parser.py", line 282, in __init__
    response: HTTPResponse = _urlopen(req)
  File "/usr/local/lib/python3.8/site-packages/rdflib/parser.py", line 270, in _urlopen
    return urlopen(req)
  File "/usr/local/lib/python3.8/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/local/lib/python3.8/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/usr/local/lib/python3.8/urllib/request.py", line 640, in http_response
    response = self.parent.error(
  File "/usr/local/lib/python3.8/urllib/request.py", line 569, in error
    return self._call_chain(*args)
  File "/usr/local/lib/python3.8/urllib/request.py", line 502, in _call_chain
    result = func(*args)
  File "/usr/local/lib/python3.8/urllib/request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found
[2023-01-18 14:26:30 +0000] [5] [INFO] Worker exiting (pid: 5)
[2023-01-18 14:26:30 +0000] [1] [WARNING] Worker with pid 4 was terminated due to signal 15
[2023-01-18 14:26:30 +0000] [1] [WARNING] Worker with pid 5 was terminated due to signal 15
[2023-01-18 14:26:30 +0000] [1] [INFO] Shutting down: Master
[2023-01-18 14:26:30 +0000] [1] [INFO] Reason: Worker failed to boot.

To Reproduce
podman run -it ghcr.io/mat-o-lab/csvtocsvw:latest gunicorn -b 0.0.0.0:8080 wsgi:app --workers=3
podman run -it ghcr.io/mat-o-lab/csvtocsvw:latest also crashes

Expected behavior
container starts without issues and keeps running

Screenshots

Desktop (please complete the following information):

  • OS: Fedora 37
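
The traceback above shows the worker failing while annotator.py fetches QUDT_UNIT_URL at import time, so a single 404 takes down every gunicorn worker. One possible mitigation, sketched here under assumptions, is to wrap the download so that a failed request falls back to a locally cached copy instead of raising. The function name fetch_or_fallback, the cache path, and the opener parameter are hypothetical and not part of csvtocsvw; only the standard library is used.

```python
from pathlib import Path
from urllib.request import urlopen


def fetch_or_fallback(url, cache_path, opener=urlopen, timeout=10):
    """Return the bytes at url; on failure return the cached copy, or None.

    Hypothetical helper: `opener` is injectable so the function can be
    exercised without network access.
    """
    cache = Path(cache_path)
    try:
        with opener(url, timeout=timeout) as response:
            data = response.read()
    except OSError as err:
        # URLError, HTTPError (e.g. 404) and socket timeouts all derive from OSError.
        print(f"could not fetch {url}: {err}")
        return cache.read_bytes() if cache.exists() else None
    # Refresh the cache so a later outage can still be served.
    cache.write_bytes(data)
    return data
```

The returned bytes could then be handed to the parser (rdflib's Graph.parse accepts a data= argument), and a None result could be logged and handled by the application rather than crashing the worker at startup. Whether stale unit data is acceptable is a design decision for the maintainers.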

Documentation

In order to be able to use the code, it would be nice to add some documentation, in particular an example would be great.

currently having problems with retrieving matolab ontologies

Trying to download the ontology from https://purl.matolab.org/mseo/mid causes timeout errors.

Full error message:

File "C:\Users\richard\anaconda3\envs\csv2csvw\lib\site-packages\owlready2\namespace.py", line 903, in load
except: raise OwlReadyOntologyParsingError("Cannot download '%s'!" % f)
owlready2.base.OwlReadyOntologyParsingError: Cannot download 'https://purl.matolab.org/mseo/mid'!

No code was altered, and the code was working fine last week. Did some file destinations change?

Reworking get_header_length function

Need to rework the get_header_length function

def get_header_length(self, file_data, separator_string, encoding):

The part at

with redirect_stderr(f):
redirects the errors that pandas emits while parsing the CSV file so that they can be analyzed.

It would be better to determine header_length in another way. --> pandas functions?

  1. find a pandas (or any other suitable) function for determining the column count in a csv row. (--> maybe just count the occurrences of the separator in each row?)

  2. find the beginning of the data table (including header)
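
The two steps above can be sketched without pandas by counting separators per line. This is a naive illustration, not the project's implementation: it assumes the data table is the largest group of lines sharing the same separator count, that the table has no blank lines inside it, and that the separator does not occur inside quoted fields. The name find_table_start is hypothetical.

```python
from collections import Counter


def find_table_start(lines, separator):
    """Return the index of the first table line (the header row), or None.

    Naive sketch: the table is taken to be the last contiguous run of lines
    whose separator count equals the most common non-zero count.
    """
    counts = [line.count(separator) for line in lines]
    nonzero = [c for c in counts if c > 0]
    if not nonzero:
        return None  # no line contains the separator at all
    table_count = Counter(nonzero).most_common(1)[0][0]
    # Last line that looks like a table row.
    end = max(i for i, c in enumerate(counts) if c == table_count)
    # Walk upwards while the preceding line has the same column count.
    start = end
    while start > 0 and counts[start - 1] == table_count:
        start -= 1
    return start


sample = [
    "Operator: X",
    "Date; 2023",
    "",
    "time;force;strain",
    "s;N;%",
    "0;1;2",
    "1;2;3",
]
print(find_table_start(sample, ";"))  # → 3
```

The returned index is the number of metadata lines before the table, which is one way to read "header length" here. Quoted separators would need the csv module or a real parser instead of str.count.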

xls to csv prototype

Uploaded a Jupyter notebook in branch get_header_length_updated as a prototype.

Open question: integrate xls to csv straight into the annotator, or make another microservice like csv_to_csvw or map_to_method?
