
gluish's Issues

Python3 TSV unicode seems broken

I get an error when trying to use iter_tsv() from python3:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/luigi/worker.py", line 194, in run
    new_deps = self._run_get_new_deps()
  File "/usr/local/lib/python3.5/dist-packages/luigi/worker.py", line 131, in _run_get_new_deps
    task_gen = self.task.run()
  File "/home/bnewbold/DEBUG/luigi_utf8/small.py", line 21, in run
    cols=('col1', 'col2', 'col3')):
  File "/usr/local/lib/python3.5/dist-packages/gluish-0.2.8-py3.5.egg/gluish/format.py", line 86, in iter_tsv
    yield Record._make(str(line).rstrip('\n').split('\t'))
  File "<string>", line 21, in _make
TypeError: Expected 3 arguments, got 1

Running the gluish tests (particularly format_test.FormatTest) under Python 3 (nosetests3) fails with the same error. Is Python 3 (in my case, Python 3.5) intended to be supported? I can see there was a recent commit touching these lines, so maybe I'm doing something wrong.

Minimal test case, and a patch that fixes things for python3 (but presumably breaks 2.7): https://gist.github.com/bnewbold/8919a20b1f01532b0da4f1c5594c6a05
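The traceback is consistent with str() being applied to a bytes line: on Python 3, str() of a bytes object returns its repr (for example "b'a\\tb\\tc\\n'"), which contains no real tab or newline characters. rstrip('\n') and split('\t') therefore leave a single field, and namedtuple._make reports 1 argument instead of 3. A minimal sketch of an iter_tsv that decodes bytes explicitly instead of calling str() follows; the signature and the encoding parameter are assumptions for illustration, not gluish's actual code. On Python 2, bytes is an alias of str, so the same check would decode plain byte strings to unicode there.

```python
import collections
import io


def iter_tsv(input_stream, cols=None, encoding='utf-8'):
    """Yield one record per TSV line (sketch, not gluish's implementation).

    Accepts text streams (str lines) as well as binary streams (bytes lines).
    """
    Record = collections.namedtuple('Record', cols) if cols else None
    for line in input_stream:
        if isinstance(line, bytes):
            # str(b'...') would return the repr "b'...'", so decode instead.
            line = line.decode(encoding)
        fields = line.rstrip('\n').split('\t')
        yield Record._make(fields) if Record else tuple(fields)


# The failure mode from the traceback: the repr has no real tabs -> one field.
assert len(str(b'a\tb\tc\n').rstrip('\n').split('\t')) == 1

stream = io.BytesIO('a\tb\tc\nä\tö\tü\n'.encode('utf-8'))
for record in iter_tsv(stream, cols=('col1', 'col2', 'col3')):
    print(record)
```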

0.2 ideas / cleanup

gluish 0.1.X contains both luigi-related and non-luigi-related code. For 0.2, only the luigi-related code should stay in the package:

Keep:

  • task
  • format (TSV)
  • intervals
  • database helpers (seems generic enough)

Cleanup:

  • parameters (keep ClosestDateParameter)
  • utils (keep shellout)

Remove:

  • benchmark (timing decorators are simple)
  • colors
  • common
  • configuration
  • esindex (it's in luigi.contrib)
  • oai
  • path
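To illustrate why the benchmark module is considered expendable, a timing decorator takes only a few lines of standard-library Python. The name timed and the choice of the logging module are assumptions for this sketch; it does not reproduce what gluish's benchmark module actually provides.

```python
import functools
import logging
import time

logger = logging.getLogger(__name__)


def timed(func):
    """Log the wall-clock duration of each call to func (illustrative sketch)."""
    @functools.wraps(func)  # keep func's name and docstring on the wrapper
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Logged even if func raises.
            logger.info('%s took %.3fs', func.__name__, time.perf_counter() - start)
    return wrapper


@timed
def slow_add(a, b):
    time.sleep(0.01)
    return a + b


logging.basicConfig(level=logging.INFO)
print(slow_add(2, 3))
```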

list parameters are not slugified

Gluish is a great addition to luigi! I have noticed that list parameters are not properly slugified. For example, a task that takes a list like ['cd', 'statesd', 'dma', 'county'] resulted in the following filename: geo-["'cd'", "'statesd'", "'dma'", "'county'"]-state-nd.h5.

I think this might be rectified by adding a delist function:

def delist(x):
    # Join list/tuple values (sorted, so the name is stable) with dashes
    if isinstance(x, (list, tuple)):
        return '-'.join(sorted(str(v) for v in x))
    return x

parts = ('{k}-{v}'.format(k=k, v=delist(v))
         for k, v in task_params.items())

or something like this:

parts = []
for k, v in task_params.items():
    if isinstance(v, (list, tuple)):
        v = '-'.join(sorted(str(item) for item in v))
    parts.append('{k}-{v}'.format(k=k, v=v))
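Combining the proposed delist helper with sorted keys gives a deterministic slug for the example in this issue. This is a sketch: params_to_slug is a hypothetical helper, and flattening tuples as well as lists is an assumption (luigi's ListParameter may hand the task a tuple rather than a list).

```python
def delist(value):
    """Join list/tuple parameter values into a sorted, dash-separated string."""
    if isinstance(value, (list, tuple)):
        return '-'.join(sorted(str(item) for item in value))
    return value


def params_to_slug(task_params):
    """Build a filename-friendly slug from a dict of task parameters (hypothetical)."""
    # Sort by key so the slug does not depend on dict ordering.
    parts = ('{k}-{v}'.format(k=k, v=delist(v)) for k, v in sorted(task_params.items()))
    return '-'.join(parts)


params = {'geo': ['cd', 'statesd', 'dma', 'county'], 'state': 'nd'}
print(params_to_slug(params) + '.h5')  # geo-cd-county-dma-statesd-state-nd.h5
```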

Encapsulate __init__ imports in try/except (individually)

On Windows, it is an incredible pain to install sqlitebck. Because of that, I opted for installing gluish without dependencies, and just use the parts that I needed.

However, this cherry-picking approach does not work because you import all the submodules in __init__.py: importing any part of the package fails unless every dependency is installed, no matter which parts I actually want to use.

I have two alternative solutions:

Solution 1: Don't import the submodules in __init__.py the way you currently do.
Solution 2: Make the imports conditional by enclosing each one in try/except, as described here: http://stackoverflow.com/a/3496790/1319284. This gives convenient imports when the dependencies are present, and no convenient imports otherwise.
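Solution 2 could look roughly like the sketch below: each submodule is imported on its own inside try/except ImportError, so a missing optional dependency such as sqlitebck only disables the submodule that needs it. The helper name optional_import, the warning behaviour, and the submodule name in the comment are assumptions for illustration.

```python
import importlib
import warnings


def optional_import(name, package=None):
    """Import a module, or return None (with a warning) if it or a dependency is missing."""
    try:
        return importlib.import_module(name, package=package)
    except ImportError as exc:  # also covers ModuleNotFoundError
        warnings.warn('optional module {!r} not available: {}'.format(name, exc))
        return None


# Inside a package's __init__.py this might be used once per submodule, e.g.
#     database = optional_import('.database', __name__)  # hypothetical submodule name
# so that importing the package still works when sqlitebck is not installed.
json_module = optional_import('json')                           # stdlib, present
missing_module = optional_import('module_that_does_not_exist')  # returns None
```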

shellout broken

I am on Windows and just went through a long debugging session on a shellout call that failed with "File not found" errors. I suspect the problem is that the command is passed to subprocess.call as one big string, which then gets interpreted as the name of the executable. The problem appears on Windows, at the very least with Cygwin.
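One way to avoid handing the whole command line to subprocess as a single string is to split it into an argument list first, for example with the standard library's shlex.split. This is a sketch under stated assumptions: run_command is a hypothetical helper, not gluish's shellout; running without a shell means pipes, redirections and globbing are no longer available; and shlex.split follows POSIX quoting rules, so backslashes outside quotes (as in unquoted Windows paths) are treated as escape characters.

```python
import shlex
import subprocess
import sys


def run_command(command):
    """Split a command string into an argv list and run it without a shell.

    Returns the process exit code (hypothetical helper, not gluish's shellout).
    """
    argv = shlex.split(command)  # "prog -x 'a b'" -> ['prog', '-x', 'a b']
    return subprocess.call(argv)


# Portable example command: the current Python interpreter exiting with code 3.
# shlex.quote single-quotes the path if needed, so shlex.split keeps it intact.
cmd = '{} -c "import sys; sys.exit(3)"'.format(shlex.quote(sys.executable))
print(run_command(cmd))
```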

shellout broken when using JSON strings and .format()-style templates because of internal double-format

e.g.
cmd="esbulk -server {server} -purge -mapping '{"mappings":{"{type}":{"properties":{"location":{"type":"geo_point"}}}}}' -index {index} -type {type} -w {workers} -id id -verbose {file}.ldj".format(**self.config)
output=shellout(cmd)

leads to:

Traceback (most recent call last):
  File "/home/metadata/.local/lib/python3.5/site-packages/luigi/worker.py", line 194, in run
    new_deps = self._run_get_new_deps()
  File "/home/metadata/.local/lib/python3.5/site-packages/luigi/worker.py", line 131, in _run_get_new_deps
    task_gen = self.task.run()
  File "/home/metadata/git/efre-lod-elasticsearch-tools/luigi/update_gn.py", line 91, in run
    cmd="esbulk -server {server} -purge -mapping '{"mappings":{"{type}":{"properties":{"location":{"type":"geo_point"}}}}}' -index {index} -type {type} -w {workers} -id id -verbose {file}.ldj".format(**self.config)
KeyError: '"mappings"'

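Escaping the braces ({{ and }}) doesn't work either, presumably because the first .format() call reduces the doubled braces to single ones, which shellout's internal .format() then parses again.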
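The failure can be reproduced without luigi. Assuming, as the issue title says, that shellout calls .format() on the command it receives, any JSON braces left in the string after the caller's own .format() are parsed a second time. The sketch below shows two workarounds: format only once and pass the JSON as a value (values substituted by str.format are not re-parsed), or, if the string must be pre-formatted, double the braces inside the JSON value so the second pass turns them back into single braces. second_format is a hypothetical stand-in for shellout's internal call, and the config values are made up.

```python
import json


def second_format(command, **kwargs):
    # Hypothetical stand-in for the .format() that shellout applies internally.
    return command.format(**kwargs)


config = {'server': 'http://localhost:9200', 'index': 'geonames', 'type': 'record',
          'workers': 4, 'file': 'gn'}  # made-up example values
template = ("esbulk -server {server} -purge -mapping '{mapping}' -index {index} "
            "-type {type} -w {workers} -id id -verbose {file}.ldj")
# Build the mapping with json.dumps instead of embedding braces in the template.
mapping = json.dumps({'mappings': {config['type']: {
    'properties': {'location': {'type': 'geo_point'}}}}})

# Workaround 1: a single formatting pass; the JSON is inserted as a value only.
cmd1 = second_format(template, mapping=mapping, **config)

# Workaround 2: pre-format, but escape the JSON's braces for the second pass.
escaped = mapping.replace('{', '{{').replace('}', '}}')
cmd2 = second_format(template.format(mapping=escaped, **config))

print(cmd1 == cmd2)
print(cmd1)
```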

Migration from Goodtables to Frictionless Repository

Hi,

Goodtables.io is going to be deprecated in 2022, so we recommend migrating to the new Frictionless Repository (https://repository.frictionlessdata.io/), a continuous data validation system provided by Frictionless Data. The core difference between the two projects is that Frictionless Repository doesn't rely on any hosted infrastructure except GitHub Actions, which makes it more sustainable. It also uses the newer Frictionless Framework under the hood, which brings many improvements over the old goodtables-py library in validation quality and performance.

If you have any doubts or questions, please come and ask in our Discord chat or in the GitHub Discussion.
