Comments (5)
After discussion, we settled on the following:
There are three field spec variants and they are named thusly:
- field_spec can be a string, an array/tuple or None. The string can be a single column or a.deep.column. The array can be a list of columns, including ['using.deep', 'column.syntax']. A field_spec can imply retrieving one or more columns. If it is None then it will default to the default value column.
- field_spec_list is an array/tuple of strings. It is used when the arg requires the request of multiple columns. Each element can be a simple column name or can use the deep syntax, as in ['use.deep', 'column.syntax'].
- field_path is a lower level deal that can only address a single column. It can be a string that is either a single simple column or a.deep.column, OR an array/tuple that only contains segments to a single column, i.e.: ['a', 'deep', 'column']. These values will eventually hit the sanitizer and be turned into the array form.
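To make the "turned into the array form" step concrete, here is a minimal sketch of what such a sanitizer could look like. The name sanitize_field_path is hypothetical and is not taken from either code base; it only illustrates the two accepted input shapes for a field_path.

```python
def sanitize_field_path(field_path):
    """Hypothetical helper: normalize a field_path to its array form.

    Accepts either a string ('column' or 'a.deep.column') or an
    array/tuple of segments, and always returns a list of segments.
    """
    if isinstance(field_path, str):
        # 'a.deep.column' -> ['a', 'deep', 'column']
        return field_path.split(".")
    if isinstance(field_path, (list, tuple)):
        # Already segments of a single column; just copy into a list.
        return list(field_path)
    raise TypeError("field_path must be a string or an array/tuple of segments")
```

Either input shape ends up as the same list of segments, so code downstream of the sanitizer only has to deal with one form.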
These names should be used exclusively in class methods for clarity, both as incoming args and internal args. To wit:
@staticmethod
def is_valid_value(event, field_path=None):
    val = event.value(field_path)
    return not bool(val is None or val == '' or is_nan(val))
As for the aforementioned "single sanitizer method": this is to be deployed inside Event.get() to handle simple one-off cases where it makes sense to call in = e.get('in'), but the developer should also use the same method "upstream" so that splitting.deep.string.paths does not keep happening inside a loop. The final test to see if .get() got an array is inexpensive.
@pjm17971 please review.
from pond.
There is one final issue, while we're picking at this scab, that I wanted to hammer out before we close this issue and make everything holy. It's the issue of what the default for field_spec should be. Of course it should be value …mostly.
I propose the following:
Method prototypes handling a field_spec argument should set the default value to None in python land (or whatever the proper corollary is in JS land), and the uber sanitize method should be responsible for setting the default if it receives None as the value.
Reasoning:
- That value really never needs to be set until the moment before it finally hits Event.get() and needs to become ['value']. So there really isn't any reason to pepper the entire rest of the code base with a million (..., field_spec=['value']) method prototypes. Set it to one language-specific null value and get on with your life.
- If we ever want to change the default, which we will never want to do until we do, it's easy per point 1 because the sanitizer is doing it.
- There are some methods (Event.map(), et al) that take a field_spec, and if that is None/etc it defaults to "mapping all the columns."
And that makes the two cases basically consistent: not providing a field_spec at all telegraphs "do your default thing" to all of those methods with the least amount of code to make it happen.
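A rough sketch of the proposal, assuming toy stand-ins rather than the real classes: sanitize(), get() and total() below are hypothetical names, events are plain nested dicts, and for brevity the argument is treated as a path to a single column. The point is only that None travels untouched through every method signature and the default is defined in exactly one place.

```python
DEFAULT_FIELD = "value"  # assumed default column name, defined only here


def sanitize(field_spec=None):
    """Hypothetical sanitizer: the single place that applies the default."""
    if field_spec is None:
        return [DEFAULT_FIELD]
    if isinstance(field_spec, str):
        return field_spec.split(".")
    return list(field_spec)


def get(data, field_spec=None):
    """Toy stand-in for Event.get(), walking a nested dict."""
    node = data
    for segment in sanitize(field_spec):
        node = node[segment]
    return node


def total(rows, field_spec=None):
    """An 'upstream' method: it passes None along and never names the default."""
    return sum(get(row, field_spec) for row in rows)


rows = [{"value": 1, "in": {"bytes": 10}}, {"value": 2, "in": {"bytes": 20}}]
print(total(rows))              # uses the default column
print(total(rows, "in.bytes"))  # uses an explicit deep column
```

Changing the default later would mean editing DEFAULT_FIELD and nothing else, which is the second bullet above.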
@pjm17971 also for your perusal.
In the Collapser processor, _field_spec is used as an internal attribute from a passed-in Option. For consistency, this should be changed to _field_spec_list because that's what Event.collapse() takes as an argument.
I've taken a pass through the python code and renamed everything using a consistent naming scheme, and I've also come up with cut-and-paste arg docstrings so it's clear what is doing what. Both of these things help when you have a situation where TimeSeries.collapse() calls Pipeline.collapse(), which invokes the Collapser processor, which in turn calls Event.collapse().
Whew.
To be on the same page I have used the following scheme:
- field_spec can be a string, an array/tuple or None. The string can be a single column or a.deep.column. The array can be a list of columns, including ['using.deep', 'column.syntax']. A field_spec can imply retrieving one or more columns. If it is None then it will default to the default value column.
- field_spec_list is an array/tuple of strings. It is used when the arg requires the request of multiple columns. Each element can be a simple column name or can use the deep syntax, as in ['use.deep', 'column.syntax'].
- field_path_array is a lower level deal that can only address a single column. It is used to access a column, and if the array has multiple values, they are segments of a.deep.column.path.
These names are used on "both sides" of a method. Example: Event.is_valid_value() is a very light abstraction around Event.get(), which takes a field path array. So Event.is_valid_value() looks like this:
@staticmethod
def is_valid_value(event, field_path_array=None):
    val = event.value(field_path_array)
    return not bool(val is None or val == '' or is_nan(val))
And things calling it, when possible, should look like this example from Collection.clean():
def clean(self, field_path_array=None):
    flt_events = list()
    for i in self.events():
        if Event.is_valid_value(i, field_path_array):
            flt_events.append(i)
    return Collection(flt_events)
This not only makes things consistent but also makes it easier to track down the points where we should pre-split the field_path_array into an actual array so that's not happening inside a loop.
Speaking of optimizing those splits, we could consider removing any code that can handle a string from Event.get(); this would force the developer to split.this into ['split', 'this'] farther upstream. Both code bases currently still do that split in get(), which could silently lead to writing non-optimal code.
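As an illustration of that stricter option, here is a sketch under stated assumptions: events are plain nested dicts, and to_path_array(), get() and clean() are hypothetical stand-ins, not the pypond or pond implementations. get() refuses strings outright, so the split has to happen once, before the loop.

```python
import math


def to_path_array(field_path="value"):
    """Hypothetical upstream split: 'split.this' -> ['split', 'this']."""
    if isinstance(field_path, str):
        return field_path.split(".")
    return list(field_path)


def get(data, field_path_array):
    """Strict toy get(): accepts only a pre-split list of segments."""
    if not isinstance(field_path_array, list):
        raise TypeError("get() expects a pre-split field_path_array")
    node = data
    for segment in field_path_array:
        node = node.get(segment) if isinstance(node, dict) else None
    return node


def is_valid_value(data, field_path_array):
    """Same checks as the thread's example: None, empty string, NaN."""
    val = get(data, field_path_array)
    is_nan = isinstance(val, float) and math.isnan(val)
    return not (val is None or val == "" or is_nan)


def clean(rows, field_path="value"):
    # Split once here, outside the loop, instead of on every get() call.
    path = to_path_array(field_path)
    return [row for row in rows if is_valid_value(row, path)]
```

With a strict get(), a caller that forgets to pre-split gets a TypeError right away instead of silently paying for a split on every iteration.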
The python code is currently doing the split with the sanitizer method in get(), but that is by design while I was reorganizing things. My next step is to use the renamed methods to swim back upstream to see where we should do the splits.
For reference in the future, the python changes are mixed in here:
esnet/pypond@1048844
The javascript changes are here:
c933d09
Closing now.