wireservice / agate
A Python data analysis library that is optimized for humans instead of machines.
Home Page: https://agate.readthedocs.io
License: MIT License
Return a list of quintiles for the given column.
This should be standard... maybe support both?
Should output a dict of tables
Should take a function name that returns a grouping key
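A minimal sketch of the grouping idea described above, assuming rows are plain dicts and each "table" is just a list of rows. The name `group_by` and its signature are hypothetical, not the library's API.

```python
def group_by(rows, key_func):
    """Split rows into groups keyed by whatever key_func returns for each row.

    Hypothetical helper: returns a dict mapping each grouping key to the
    list of rows that produced it, preserving the original row order.
    """
    groups = {}
    for row in rows:
        groups.setdefault(key_func(row), []).append(row)
    return groups


rows = [
    {"state": "TX", "count": 3},
    {"state": "CA", "count": 5},
    {"state": "TX", "count": 7},
]
by_state = group_by(rows, lambda row: row["state"])
```

Taking a function rather than a column name keeps the key flexible: the same call can group on a computed value such as the first letter of a name.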
Should probably just drop this now that Table.compute is in?
See #7
Return a list of percentiles for a given column.
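One way the percentile list could be computed, sketched in plain Python. This assumes linear interpolation between the closest ranks, which is only one of several common definitions; the function names are hypothetical.

```python
def percentile(values, p):
    """Return the p-th percentile (0 <= p <= 100) of values.

    Assumption: linear interpolation between the two closest ranks.
    Other definitions (e.g. nearest rank) give slightly different results.
    """
    if not values:
        raise ValueError("percentile() requires at least one value")
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    ordered = sorted(values)
    rank = (len(ordered) - 1) * p / 100
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def percentiles(values):
    """Return a list of 101 values: the 0th through 100th percentiles."""
    return [percentile(values, p) for p in range(101)]
```

Quartiles and quintiles fall out of the same helper by asking for p in steps of 25 or 20.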
Combine Table.where with the Levenshtein algorithm. Maybe this one? https://pypi.python.org/pypi/python-Levenshtein/0.11.2
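A rough sketch of what a fuzzy "where" might look like. To stay self-contained it uses a pure-Python edit distance instead of the linked package, and `where_similar` is a hypothetical name operating on a list of dict rows.

```python
def levenshtein(a, b):
    """Return the edit distance between strings a and b (two-row dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            insert_cost = current[j - 1] + 1
            delete_cost = previous[j] + 1
            substitute_cost = previous[j - 1] + (char_a != char_b)
            current.append(min(insert_cost, delete_cost, substitute_cost))
        previous = current
    return previous[-1]


def where_similar(rows, key, target, max_distance):
    """Keep rows whose value under key is within max_distance edits of target."""
    return [row for row in rows if levenshtein(row[key], target) <= max_distance]


names = [{"name": "Jon"}, {"name": "John"}, {"name": "Jane"}]
matches = where_similar(names, "name", "John", 1)
```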
Should have a way to create a new table from a subset of columns of an existing table. What is the API for this? What is it called? subset? part? subtable?
Find the first row that satisfies the criteria
Like Table.join (#11) but for rows.
Count instances of a value
See #7
Table.rank(self, func)
func returns a value, these values are compared to generate a rank IntColumn.
Optional sort_by_rank param?
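A sketch of the ranking idea, assuming rows are dicts and that ties share the lowest rank ("1, 2, 2, 4" style). Both the tie rule and the standalone `rank` function are assumptions for illustration; the proposal above does not settle either.

```python
def rank(rows, func, sort_by_rank=False):
    """Pair each row with a 1-based rank derived from func(row).

    Assumptions: lower values rank first, ties share the lowest rank,
    and the values returned by func are hashable and mutually comparable.
    """
    keys = [func(row) for row in rows]
    first_position = {}
    for position, key in enumerate(sorted(keys), start=1):
        first_position.setdefault(key, position)
    ranked = [(first_position[key], row) for key, row in zip(keys, rows)]
    if sort_by_rank:
        # The sort is stable, so tied rows keep their original order.
        ranked.sort(key=lambda pair: pair[0])
    return ranked


scores = [
    {"name": "a", "score": 30},
    {"name": "b", "score": 10},
    {"name": "c", "score": 30},
    {"name": "d", "score": 20},
]
ranks = [r for r, _ in rank(scores, lambda row: row["score"])]
ordered_names = [row["name"] for _, row in rank(scores, lambda row: row["score"], sort_by_rank=True)]
```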
Return a list of quartiles for the given column.
FAIL: test_median (tests.test_columns.TestFloatColumn)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/onyxfish/src/journalism/tests/test_columns.py", line 191, in test_median
self.assertEqual(self.table.columns['two'].median(), 2.805)
AssertionError: 2.8049999999999997 != 2.805
======================================================================
FAIL: test_sum (tests.test_columns.TestFloatColumn)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/onyxfish/src/journalism/tests/test_columns.py", line 177, in test_sum
self.assertEqual(self.table.columns['two'].sum(), 10.71)
AssertionError: 10.709999999999999 != 10.71
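Both failures above look like ordinary binary floating-point rounding rather than logic errors. A short sketch of two possible ways around it: compare with a tolerance (unittest's assertAlmostEqual does the same job inside a test case), or do the arithmetic with Decimal. The Decimal inputs here are illustrative values, not the test fixture's data.

```python
import math
from decimal import Decimal

# The values reported in the failures above.
observed_sum = 10.709999999999999
observed_median = 2.8049999999999997

# Exact comparison fails, tolerance-based comparison passes.
sum_exact = observed_sum == 10.71
sum_close = math.isclose(observed_sum, 10.71, rel_tol=1e-9)
median_close = math.isclose(observed_median, 2.805, rel_tol=1e-9)

# Decimal arithmetic on decimal literals stays exact.
decimal_total = Decimal("0.1") + Decimal("0.2")
decimal_exact = decimal_total == Decimal("0.3")
```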
Since tables are immutable this can be cached:
i = self._table._column_names.index(self._k)
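A sketch of how that lookup could be cached. `MiniTable` and `MiniColumn` are hypothetical stand-ins that only mirror the attribute names used in the line above; they are not the library's classes.

```python
class MiniTable:
    """Hypothetical immutable table: data and column names are stored as tuples."""

    def __init__(self, rows, column_names):
        self._data = tuple(tuple(row) for row in rows)
        self._column_names = tuple(column_names)


class MiniColumn:
    """Hypothetical column view that looks up its index once and reuses it."""

    def __init__(self, table, k):
        self._table = table
        self._k = k
        self._index = None  # filled in on first use

    def _column_index(self):
        # Safe to cache because the table's column names never change.
        if self._index is None:
            self._index = self._table._column_names.index(self._k)
        return self._index

    def values(self):
        i = self._column_index()
        return tuple(row[i] for row in self._table._data)


table = MiniTable([(1, "a"), (2, "b")], ["one", "two"])
column = MiniColumn(table, "two")
```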
Table.compute
Returns true if any value in the column passes a truth test
Stubbed as freq, but should be spelled out. Returns a dict, I guess? Or a Table?
_data, _data_without_nulls, _data_sorted
Reminder: tables are immutable so we can cache this indefinitely
Compute percent change between two columns into another
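A sketch of the calculation, assuming dict rows and a hypothetical `percent_change` helper. Returning None for null inputs or a zero starting value is an assumption; the issue does not say how those cases should behave.

```python
def percent_change(rows, before_key, after_key):
    """Return a list with the percent change from before_key to after_key per row.

    Assumption: rows with a null input or a zero starting value yield None,
    since the change is undefined there.
    """
    result = []
    for row in rows:
        before, after = row[before_key], row[after_key]
        if before is None or after is None or before == 0:
            result.append(None)
        else:
            result.append((after - before) / before * 100)
    return result


rows = [
    {"y2012": 100, "y2013": 150},
    {"y2012": 0, "y2013": 5},
    {"y2012": 50, "y2013": 25},
]
changes = percent_change(rows, "y2012", "y2013")
```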
Iterate over the table and compute a new column based on existing columns
What to call this?
How should this work? Operator overloading? Function call? What sort of validation do we need?
csvkit does this by returning None when a key is outside the row range, which seems sound
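The behavior described above could be sketched like this. `get_or_none` is a hypothetical helper and is not taken from csvkit's code.

```python
def get_or_none(row, index):
    """Return row[index], or None when index falls outside the row.

    Note: negative indexes still count from the end, as with any sequence.
    """
    try:
        return row[index]
    except IndexError:
        return None


short_row = ("a", "b")
```

This lets ragged rows be read column by column without special-casing the short ones.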
http://latimes-calculate.readthedocs.org/en/latest/
Most of these probably don't make sense, but a few, like Benford's would be a good fit