miku / gluish
Utils around luigi.
License: GNU General Public License v3.0
I get an error when trying to use iter_tsv() from python3:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/luigi/worker.py", line 194, in run
    new_deps = self._run_get_new_deps()
  File "/usr/local/lib/python3.5/dist-packages/luigi/worker.py", line 131, in _run_get_new_deps
    task_gen = self.task.run()
  File "/home/bnewbold/DEBUG/luigi_utf8/small.py", line 21, in run
    cols=('col1', 'col2', 'col3')):
  File "/usr/local/lib/python3.5/dist-packages/gluish-0.2.8-py3.5.egg/gluish/format.py", line 86, in iter_tsv
    yield Record._make(str(line).rstrip('\n').split('\t'))
  File "<string>", line 21, in _make
TypeError: Expected 3 arguments, got 1
Running the gluish tests (particularly format_test.FormatTest) fails with the same error under python3 (nosetests3). Is Python 3 (in my particular case, python3.5) intended to be supported? I can see there was a recent commit touching these lines, so maybe I'm doing something wrong.
Minimal test case, and a patch that fixes things for python3 (but presumably breaks 2.7): https://gist.github.com/bnewbold/8919a20b1f01532b0da4f1c5594c6a05
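For readers without access to the gist, here is a minimal sketch of one likely cause, assuming the input stream yields bytes under Python 3: str() applied to a bytes line produces its repr (b'...'), which contains no real tab characters, so split('\t') returns a single field. The function name iter_tsv_sketch and the encoding parameter are hypothetical and not part of gluish.

```python
import collections
import io


def iter_tsv_sketch(input_stream, cols, encoding="utf-8"):
    """Yield one namedtuple per tab-separated line (hypothetical helper)."""
    Record = collections.namedtuple("Record", cols)
    for line in input_stream:
        # Under Python 3 a binary stream yields bytes; decode instead of str().
        if isinstance(line, bytes):
            line = line.decode(encoding)
        yield Record._make(line.rstrip("\n").split("\t"))


# str() on bytes gives the repr, so the tab is never seen as a separator.
broken_fields = str(b"a\tb\tc\n").rstrip("\n").split("\t")

stream = io.BytesIO(b"a\tb\tc\n1\t2\t3\n")
rows = list(iter_tsv_sketch(stream, cols=("col1", "col2", "col3")))
```

Decoding explicitly keeps the function usable with both text and binary streams; whether this matches the fix in the gist is not confirmed here.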
As the title says. The new function according to luigi source code is luigi.tools.parse_task.id_to_name_and_params
gluish 0.1.X contains both luigi-related and non-luigi-related code. For 0.2 only luigi-related code should stay in the package:
Keep:
Cleanup:
ClosestDateParameter
)shellout
)Remove:
Gluish is a great addition to luigi! I have noticed that list parameters are not properly slugified. For example, a task that takes a list like ['cd', 'statesd', 'dma', 'county'] resulted in the following filename: geo-["'cd'", "'statesd'", "'dma'", "'county'"]-state-nd.h5.
I think this might be rectified by adding a delist function:
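    def delist(x):
        # Join list values into a sorted, dash-separated string;
        # return any other value unchanged.
        if isinstance(x, list):
            return '-'.join(sorted(x))
        return x

    # items() works on both Python 2 and 3 (iteritems() is Python 2 only).
    parts = ('{k}-{v}'.format(k=k, v=delist(v))
             for k, v in task_params.items())

or something like this:

    parts = []
    for k, v in task_params.items():
        if isinstance(v, list):
            v = '-'.join(sorted(v))
        parts.append('{k}-{v}'.format(k=k, v=v))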
On Windows, it is an incredible pain to install sqlitebck. Because of that, I opted to install gluish without dependencies and use just the parts that I needed.
However, this cherry-picking approach does not work, because you import all the sub-modules in __init__.py. That makes the import fail no matter which parts I actually want to use, since all the dependencies have to be present.
I have two alternative solutions:
Solution 1: Don't import the sub-modules in __init__.py.
Solution 2: Make the imports conditional by enclosing them in try/except, like here: http://stackoverflow.com/a/3496790/1319284. This gives convenient imports if the dependencies are there, and no convenient imports otherwise.
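Solution 2 could be sketched roughly as follows. The helper optional_import is hypothetical and not part of gluish; the standard-library json module stands in for an available dependency, and a made-up module name stands in for a missing one such as sqlitebck.

```python
import importlib


def optional_import(module_name):
    """Return the imported module, or None if it cannot be imported."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # The module itself, or one of its own dependencies, is missing.
        return None


# Hypothetical use inside a package's __init__.py:
available = optional_import("json")  # standard library, always present
missing = optional_import("surely_not_installed_xyz")  # stands in for sqlitebck
```

Code that needs the optional module can then check for None and raise a clear error only when the feature is actually used.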
I am on Windows and just went through a long session of debugging a shellout call that failed with "File not found" errors. I think the problem is that the command is given to subprocess.call as one big blob, which then gets interpreted as the file name of the command. This problem appears on Windows, at the very least with cygwin.
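If that diagnosis is right, the general remedy is to pass the command as a list of arguments, so the first element is the program and the rest are its arguments. The following is only a sketch of that difference using the standard subprocess module; how shellout itself calls subprocess has not been verified here, and the child command is purely illustrative.

```python
import subprocess
import sys

# Program and arguments as separate list items: no shell is involved and
# nothing in the list can be mistaken for part of the program's file name.
args = [sys.executable, "-c", "print('hello from child')"]

result = subprocess.run(
    args,
    stdout=subprocess.PIPE,
    universal_newlines=True,  # decode the child's output as text
)
output = result.stdout.strip()
```

Commands that rely on shell features such as pipes or redirection still need a shell, so this form only fits simple invocations.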
The function "which" is defined in both common.py and utils.py. I guess one place would be enough? ;)
e.g.
    cmd="esbulk -server {server} -purge -mapping '{"mappings":{"{type}":{"properties":{"location":{"type":"geo_point"}}}}}' -index {index} -type {type} -w {workers} -id id -verbose {file}.ldj".format(**self.config)
    output=shellout(cmd)
leads to:
Traceback (most recent call last):
  File "/home/metadata/.local/lib/python3.5/site-packages/luigi/worker.py", line 194, in run
    new_deps = self._run_get_new_deps()
  File "/home/metadata/.local/lib/python3.5/site-packages/luigi/worker.py", line 131, in _run_get_new_deps
    task_gen = self.task.run()
  File "/home/metadata/git/efre-lod-elasticsearch-tools/luigi/update_gn.py", line 91, in run
    cmd="esbulk -server {server} -purge -mapping '{"mappings":{"{type}":{"properties":{"location":{"type":"geo_point"}}}}}' -index {index} -type {type} -w {workers} -id id -verbose {file}.ldj".format(**self.config)
KeyError: '"mappings"'
Escaping the braces doesn't work either.
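The KeyError arises because str.format treats every {...} in the template as a replacement field, including the braces of the JSON mapping. Doubling the braces ({{ and }}) is the standard escape, but it only survives a single formatting pass; if the string is formatted a second time (possibly inside shellout, which is an assumption here), the problem returns. One way around both issues is to keep literal braces out of the template and pass the JSON in as a value. The sketch below uses a hypothetical config dictionary in place of self.config.

```python
import json
import shlex

# Hypothetical stand-in for self.config from the report above.
config = {
    "server": "http://localhost:9200",
    "type": "record",
    "index": "geonames",
    "workers": 4,
    "file": "geonames",
}

# Build the mapping as data rather than embedding literal braces in the template.
mapping = json.dumps(
    {"mappings": {config["type"]: {"properties": {"location": {"type": "geo_point"}}}}}
)

# The template contains replacement fields only, no literal braces.
template = (
    "esbulk -server {server} -purge -mapping {mapping} -index {index} "
    "-type {type} -w {workers} -id id -verbose {file}.ldj"
)

# shlex.quote wraps the JSON in single quotes for a POSIX shell.
# If the command runner formats its template itself (an assumption about
# shellout), pass `template` and these keyword arguments to it instead of
# the pre-formatted `cmd`, so that formatting happens exactly once.
cmd = template.format(mapping=shlex.quote(mapping), **config)
```

Because the JSON is inserted as a value, its braces are never parsed as format fields.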
Hi,
Goodtables.io is going to be deprecated in 2022. We therefore recommend migrating to the new Frictionless Repository (https://repository.frictionlessdata.io/), the continuous data validation system provided by Frictionless Data. The core difference between the two projects is that Frictionless Repository doesn't rely on any hosted infrastructure except for GitHub Actions, which makes the project more sustainable. It also uses the newer Frictionless Framework under the hood, which brings many improvements over the old goodtables-py library in validation quality and performance.
If you have any doubts or questions, please come and ask in our Discord chat or in the GitHub Discussion.