I'm Mariano. I'm a software engineer, passionate about the field.
I'm the author of the book Clean Code in Python. I also write about software in my blog.
You can reach out to me at LinkedIn.
A text compression tool & library
Home Page: https://compr.readthedocs.io/en/latest/?badge=latest
License: MIT License
https://github.com/google/python-fire
Prototype to see if it's a suitable replacement for the command line interface.
Depends upon #25
Generate documentation for the low-level internals of the project.
All code related to the command line interface should be moved from the __init__ to a new module, called (for example) cli.py
The lib currently encodes the binary bit representation as byte characters '1' or '0', not as actual bits in an array.
It is not strictly required to port everything to C at this point; doing the optimisation in Python will suffice.
Some alternatives might be:
<< 1
(dumping in chunks of 64 bits, for instance), etc.
Compare memory utilisation before and after the change.
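A minimal sketch of the bit-level idea, assuming the encoder currently produces a string of '0'/'1' characters. The names pack_bits and unpack_bits are hypothetical and not part of the project; padding the last byte with zero bits and passing the bit count separately is also an assumption.

```python
def pack_bits(bits: str) -> bytes:
    """Pack a string of '0'/'1' characters into bytes, 8 bits per byte."""
    padded = bits + "0" * (-len(bits) % 8)  # pad the tail with zero bits
    out = bytearray()
    for start in range(0, len(padded), 8):
        value = 0
        for char in padded[start:start + 8]:
            # Shift the accumulated value left and append the next bit.
            value = (value << 1) | (1 if char == "1" else 0)
        out.append(value)
    return bytes(out)


def unpack_bits(data: bytes, bit_count: int) -> str:
    """Inverse of pack_bits; bit_count discards the padding bits."""
    return "".join("{:08b}".format(byte) for byte in data)[:bit_count]
```

Stored this way, each encoded bit takes one bit instead of one byte, which is the difference the memory comparison is meant to measure.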
Ability to compress multiple files, packaging the compression into a single one.
This changes the CLI: for now the user has to specify the name of the output file first (the default one will probably not work anymore), and then the list of files to process (similar to tar, etc.), like:
pycompress --output <ofilename> [files...]
The compression can be done sequentially; there is no need for parallelism. Any sort of optimisation will be done later on.
For each compressor/<X>.py there should be a corresponding test file tests/unit/test_<X>.py
depends on [blocked by]: #29
Change the underlying implementation for mmap, and compare performance results.
pytest style of tests (functions with asserts, etc.)
nose dependency
Python 3.5 and Python 3.6)
make checklist: check for style issues (pylint), syntax, & run tests
sha256sum for instance).
Helper object that yields the contents of the file, reading it in chunks of a given buffer size.
Conditions:
pseudocode:
with IterableFile('/tmp/foo/bar') as streamed_file:
    for chunk in streamed_file.stream(buffer_size=1024):
        print(chunk)
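One possible shape for such a helper, as a sketch only. The class name and the stream(buffer_size=...) call come from the pseudocode above; opening the file in binary mode and everything else about the implementation are assumptions.

```python
class IterableFile:
    """Context manager that yields a file's contents in fixed-size chunks."""

    def __init__(self, path):
        self.path = path
        self._file = None

    def __enter__(self):
        # Binary mode is an assumption: compression works on raw bytes.
        self._file = open(self.path, "rb")
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self._file.close()

    def stream(self, buffer_size=1024):
        """Yield successive chunks of at most buffer_size bytes."""
        while True:
            chunk = self._file.read(buffer_size)
            if not chunk:  # an empty read means end of file
                break
            yield chunk
```

Because stream is a generator, only one chunk is held in memory at a time, regardless of the file size.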
Create a setup.py that allows the project to be installed as a package for development and installation.
Remove FIXME at 72f879f
at least 90%
setup codecov
This optional parameter, when selected, should gather information along the process of the main command being performed, and display the results just before the program finishes.
For example, it can collect the time elapsed, the sizes of both files (before and after the program was called), and the compression/extraction ratio (in %), etc.
This information is rendered on stdout.
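A rough sketch of how these figures could be gathered around the main command. The function names are hypothetical, and the ratio is assumed to mean the resulting size as a percentage of the original size; the project may define it differently.

```python
import os
import time


def ratio_percent(original_size, final_size):
    """Resulting size as a percentage of the original (assumed definition)."""
    if original_size == 0:
        return 0.0
    return 100.0 * final_size / original_size


def run_with_stats(command, source, target):
    """Run command(source, target), then print the collected figures to stdout."""
    start = time.perf_counter()
    command(source, target)
    elapsed = time.perf_counter() - start
    original_size = os.path.getsize(source)
    final_size = os.path.getsize(target)
    print("time elapsed: {:.3f}s".format(elapsed))
    print("original size: {} bytes".format(original_size))
    print("final size: {} bytes".format(final_size))
    print("ratio: {:.1f}%".format(ratio_percent(original_size, final_size)))
```

Here command stands for whichever compress or extract callable the CLI dispatches to; the report is printed only after it returns, just before the program finishes.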
Code linting for all code
Tests should ignore the dataset on the run
Check that coverage level did not decrease. Fail if it did.
Automate coverage level report per branch, and PR. Link directly in the project main page and documentation.
Create a checklist target in Makefile, and separate tests from checklist.
Check for security issues and updates automatically. Maybe https://github.com/integrations/src-clr can help
At the moment, if no output file is provided for the file being worked on (extraction/compression), it uses <original-file>.comp as the default. If an absolute path is provided, it will still use that absolute path with the .comp suffix.
A user might have read permissions for the file being worked on (that's all it should take for compression), but not write permissions (for the output file).
The proposal is to change the default to:
`pwd`/`basename <original-file>`.comp
This leaves the resulting file in the current directory, where write permissions are assumed.
Use pathlib: https://docs.python.org/3.5/library/pathlib.html
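With pathlib, the proposed default could look roughly like this. The function name default_output_path is hypothetical; only the .comp suffix and the "current directory plus basename" rule come from the proposal.

```python
from pathlib import Path


def default_output_path(original_file):
    """Return <current directory>/<basename of original_file>.comp."""
    # Path(...).name drops any directory part, like `basename` does.
    return Path.cwd() / (Path(original_file).name + ".comp")
```

For an input such as /var/data/report.txt, this points at report.txt.comp inside the directory the command was run from, rather than next to the original file.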
Add a new target in Makefile that checks type hinting. If the mypy validation has some issues, the target should fail.
This new target will be part of the checklist, so make checklist should run mypy among other things.
Include as one of the items of the checklist. Build must fail
make lint should be part of the checklist, and should run linting checks automatically (pycodestyle, pylint, etc.).
If issues are found in any of the files, it should fail with exit code 1.
Enable the user to indicate an output directory for the file/s that are going to be written.
Parameter must be called --output-dir or -O.
If this parameter is provided, all files will be written inside this directory with the default naming convention.
Update documentation with examples of this use.
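A sketch of how the option might be wired with argparse. The --output-dir / -O names come from the description above; the pycompress program name, the positional files argument, the helper names, and the use of the .comp suffix as the default naming convention are assumptions for illustration.

```python
import argparse
from pathlib import Path


def build_parser():
    """Build a parser that accepts -O/--output-dir and a list of files."""
    parser = argparse.ArgumentParser(prog="pycompress")
    parser.add_argument(
        "-O", "--output-dir", type=Path, default=None,
        help="directory where all resulting files are written",
    )
    parser.add_argument("files", nargs="+", help="files to process")
    return parser


def target_path(original_file, output_dir=None):
    """Place the default-named file in output_dir, or in the current directory."""
    base = output_dir if output_dir is not None else Path.cwd()
    return base / (Path(original_file).name + ".comp")
```

For example, parsing ["-O", "/tmp/out", "a.txt"] and passing the result to target_path gives /tmp/out/a.txt.comp.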
Generate the documentation for the project, describing the main functions, their parameters, etc.
High-level project information
API documentation: generated from docstrings + adding custom information about each function on the project, modules, how to use, etc.
Python annotations
[Low-level file documentation:]
Make documentation available online (RTD)
Update Readme
Document:
cli:
Programmatic API
In case the target file already exists (regardless of whether it was user-specified or the default one), warn the user about it, and ask for confirmation before continuing with the processing.
This has to be done before any actual processing of the file takes place.
If -f | --force is indicated, assume the output file will be overwritten and do not prompt.
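The check could be sketched as a small function called before any processing starts. The name should_write, the prompt wording, and the injectable ask parameter (which defaults to input) are assumptions made for illustration and testability.

```python
import os


def should_write(target, force=False, ask=input):
    """Decide whether target may be written; prompt only if it already exists."""
    if force or not os.path.exists(target):
        return True  # nothing to overwrite, or the user passed -f/--force
    answer = ask("{} already exists. Overwrite? [y/N] ".format(target))
    return answer.strip().lower() in ("y", "yes")
```

Defaulting to "no" on anything other than an explicit yes keeps an accidental Enter from overwriting an existing file.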
Travis CI for the project.
Run tests against the following Python versions:
Update travis CI
Automatically run performance checks on the platform, to measure differences between changes, detect regressions, etc. It is recommended to run them as part of the CI along with the unit tests. It should be possible to compare performance across different branches and revisions.
Instrument the code, to support performance testability.
Have a separate target in Makefile.
The benchmark has to include the following relevant metrics (to be reviewed):
N files (traceability to determine how it scales as more files are added).