Comments (4)
The benefits of this approach are that:
a) it is recognisable to anyone who knows the data
b) one can sort a CSV file on the column (using e.g. the Linux sort command) and then iterate over it to do e.g. a time series map without actually loading it into a database, since it is already in time-sorted order
c) it is possible that pre-sorting data will make it load quicker into a data store (e.g. database) as there is less fragmentation in the data to arrange
d) there is no chance of collisions due to duplicate hashing
This structure would colocate data about an individual in the file. If your query patterns were primarily time oriented (e.g. across all individuals) then it might make sense to reverse the key order (time first, then individual).
The downside is that this is likely to require more bytes than a hash. Changing the resolution (e.g. from seconds to nanoseconds) would also change the length of the key (a different number of bytes), which means a variable-width type is needed.
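To illustrate the key ordering discussed above, here is a minimal Python sketch. It assumes a key made of an individual ID and a fixed-width timestamp (YYYYMMDDhhmmss) joined by a separator; the function name make_key, the separator, and the sample records are hypothetical and only there for illustration. A plain string sort then groups records per individual in time order, and reversing the parts gives a time-first order across all individuals.

```python
from datetime import datetime


def make_key(individual_id, timestamp, time_first=False, sep=":"):
    """Build a sortable key from an individual ID and a timestamp.

    Assumption: second resolution, formatted as fixed-width YYYYMMDDhhmmss,
    so that string order equals chronological order.
    """
    stamp = timestamp.strftime("%Y%m%d%H%M%S")
    parts = (stamp, individual_id) if time_first else (individual_id, stamp)
    return sep.join(parts)


# Hypothetical sample records: (individual ID, observation time)
records = [
    ("891", datetime(2015, 3, 1, 11, 25, 32)),
    ("754", datetime(2015, 3, 1, 11, 20, 0)),
    ("891", datetime(2015, 2, 28, 23, 59, 59)),
]

# Individual first: rows for the same individual end up next to each other.
by_individual = sorted(make_key(i, t) for i, t in records)

# Time first: rows are ordered chronologically across all individuals.
by_time = sorted(make_key(i, t, time_first=True) for i, t in records)

print(by_individual)
print(by_time)
```

Note that the string sort only orders individuals numerically if their IDs have the same number of digits; within one individual the fixed-width timestamp keeps the rows in time order.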
from data-publication.
We will probably use 891:20150301112532 (with a colon), to match the format of the occurrenceIDs of our other datasets. Any downsides?
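As a quick check that the colon-separated form stays easy to take apart, here is a small Python sketch. It assumes the part before the colon is the individual ID and the part after it is a timestamp read as YYYYMMDDhhmmss; that reading of 891:20150301112532 and the function name parse_occurrence_id are assumptions for illustration, not something confirmed in this thread.

```python
from datetime import datetime


def parse_occurrence_id(occurrence_id):
    """Split an ID of the assumed form <individual>:<YYYYMMDDhhmmss>."""
    # Split on the first colon only; neither part is expected to contain one.
    individual_id, stamp = occurrence_id.split(":", 1)
    return individual_id, datetime.strptime(stamp, "%Y%m%d%H%M%S")


individual_id, observed_at = parse_occurrence_id("891:20150301112532")
print(individual_id, observed_at.isoformat())
```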
Looks good to me.
It was just a suggestion...
Done in ff49bcd