dgit's People

Contributors

dependabot[bot], gitter-badger, pingali

dgit's Issues

More extensions

Database backends:

  1. Postgres, Hive (native and Qubole), HBase

Transformations:

  1. Analysis of dataset contents for outliers etc.
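Here is one example of what such an analysis could look like. This minimal Python sketch flags outliers in a numeric CSV column using Tukey's IQR fences, i.e. values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]. The names iqr_outliers and column_outliers are hypothetical and not part of dgit. The choice of the IQR rule is ours; the issue does not specify a method.

```python
import csv
import io
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    # statistics.quantiles (default 'exclusive' method) needs at least two points.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def column_outliers(csv_text, column, k=1.5):
    """Parse one numeric column from CSV text (skipping blanks) and return its outliers."""
    reader = csv.DictReader(io.StringIO(csv_text))
    values = [float(row[column]) for row in reader if row[column].strip()]
    return iqr_outliers(values, k)
```

For example, in the series 10, 11, 12, 11, 10, 12, 11, 100 only 100 falls outside the fences.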

Support for dat

Some possible integration points:

  1. Make dgit repos available through dat for sharing and discovery
  2. Extend dat to support validation, materialization etc.

http://dat-data.com

RST from metadata

Instead of running a separate metadata server, how about converting the metadata into an RST file that can be searched and viewed on GitHub itself? This would remove one service dependency.
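A minimal sketch of that conversion follows. It assumes a hypothetical metadata dict with a title, a description, owner/repo/commit fields and a list of files; dgit's actual metadata schema may differ. GitHub renders .rst files, so committing the output (for example as README.rst) would make the metadata viewable in the repository without a server.

```python
def rst_heading(text, char):
    """Return an RST section title underlined with `char`."""
    return f"{text}\n{char * len(text)}\n"


def metadata_to_rst(meta):
    """Render a (hypothetical) dataset metadata dict as reStructuredText."""
    lines = [rst_heading(meta["title"], "=")]
    if meta.get("description"):
        lines.append(meta["description"] + "\n")
    # Simple scalar attributes become an RST field list.
    for key in ("owner", "repo", "commit"):
        if key in meta:
            lines.append(f":{key}: {meta[key]}")
    files = meta.get("files", [])
    if files:
        lines.append("")
        lines.append(rst_heading("Files", "-"))
        # One bullet per file, with the path shown as inline literal text.
        for f in files:
            lines.append(f"* ``{f['path']}`` ({f['size']} bytes)")
    return "\n".join(lines) + "\n"
```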

Generator script

Support a generator script: accessing the dataset's content
requires running the generator script (bash?).
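The sketch below shows how the generator step might run. It assumes, hypothetically, that the script is invoked with bash and receives the output directory as its only argument. The materialize name and this calling convention are illustrative and are not dgit's.

```python
import subprocess
from pathlib import Path


def materialize(script: Path, outdir: Path, timeout: int = 600) -> list[Path]:
    """Run a generator script and return the files it produced in outdir.

    Hypothetical convention: the script is run with bash and receives
    the output directory as its only argument.
    """
    outdir.mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if the script exits non-zero.
    subprocess.run(["bash", str(script), str(outdir)], check=True, timeout=timeout)
    return sorted(p for p in outdir.rglob("*") if p.is_file())
```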

Rename dgit

The name conflicts with multiple projects:
(a) a Debian project
(b) GitHub's own DGit (distributed git)

Some alternative names:
(a) dmgit (data management git)
(b) mgit (metadata git)
(c) git-data-manage
... Anything else?

Tutorials for end users

The tutorials should show the following (ideally in a single tutorial):

  1. Base use case
  2. Validation
  3. Transformation

Extract dataset contents

dgit extract

Right now we only have a generic shell command to use.

dgit sh cp -r <filename or pattern> ~/target

This extract should:
(a) leave an action record in the log
(b) incorporate git commit information, e.g.:
~/target/username/reponame//filename
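Here is a minimal sketch of such an extract that covers both requirements. Purely for illustration, it assumes that the commit id fills the empty path segment above and that the log is a JSON-lines file. The extract and current_commit names are hypothetical and not existing dgit functions.

```python
import fnmatch
import json
import shutil
import subprocess
import time
from pathlib import Path


def current_commit(repo_dir: Path) -> str:
    """Return the HEAD commit id of a git working tree."""
    result = subprocess.run(["git", "rev-parse", "HEAD"], cwd=repo_dir,
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()


def extract(repo_dir: Path, pattern: str, target: Path, username: str,
            reponame: str, commit: str, log_file: Path) -> list[Path]:
    """Copy files matching pattern to target/username/reponame/<commit>/ and log it.

    Assumptions (hypothetical): the commit id is used as a path segment,
    and the log is an append-only JSON-lines file.
    """
    dest_root = target / username / reponame / commit
    copied = []
    for src in sorted(repo_dir.rglob("*")):
        rel = src.relative_to(repo_dir)
        # Skip git internals and directories; match the pattern on the relative path.
        if ".git" in rel.parts or not src.is_file():
            continue
        if not fnmatch.fnmatch(str(rel), pattern):
            continue
        dest = dest_root / rel
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)
        copied.append(dest)
    # (a) leave an action record, (b) including the commit the files came from.
    record = {"action": "extract", "repo": f"{username}/{reponame}",
              "commit": commit, "pattern": pattern,
              "files": [str(p) for p in copied], "time": time.time()}
    with log_file.open("a") as fh:
        fh.write(json.dumps(record) + "\n")
    return copied
```

In a real working tree, the commit argument could come from current_commit(repo_dir).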

Renaming cloned repos

When repos are cloned from S3, the original username is replaced by
the cloning user's. It should be preserved:

[Wrong] DrDrake/regression.git -> pingali/regression.git
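The fix amounts to deriving the local name from the remote URL instead of from the user running the clone. The sketch below assumes, hypothetically, that the S3 URL ends in <username>/<reponame>.git. The bucket layout and the repo_name_from_s3 name are illustrative.

```python
from urllib.parse import urlparse


def repo_name_from_s3(url: str) -> str:
    """Derive 'username/reponame.git' from an S3 remote URL, keeping the original owner.

    Assumes (hypothetically) that the last two path components are
    <username>/<reponame>[.git].
    """
    parsed = urlparse(url)
    if parsed.scheme != "s3":
        raise ValueError(f"not an s3 URL: {url}")
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"cannot find username/reponame in: {url}")
    username, reponame = parts[-2], parts[-1]
    if not reponame.endswith(".git"):
        reponame += ".git"
    return f"{username}/{reponame}"
```

With this layout, s3://mybucket/git/DrDrake/regression.git maps to DrDrake/regression.git regardless of who runs the clone.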

Snippets and links

  1. Add support for including snippets and READMEs using a separate command, e.g.:

dgit add --snippet

  2. Add support for adding resource links:

dgit add --link s3://....

Such a resource lives outside the git repository for some reason (e.g., it is too big).
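The following sketch shows how --link might record such a resource. It assumes, hypothetically, that links are stored in a JSON metadata file inside the repository and that only s3, http and https URLs are accepted. add_link and the file layout are illustrative and are not dgit's format. Re-adding the same URL updates its entry rather than duplicating it.

```python
import json
from pathlib import Path
from urllib.parse import urlparse

# Assumed set of accepted schemes, for illustration only.
ALLOWED_SCHEMES = {"s3", "http", "https"}


def add_link(metadata_file: Path, url: str, description: str = "") -> dict:
    """Record an external resource (kept outside git) in a JSON metadata file."""
    scheme = urlparse(url).scheme
    if scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"unsupported link scheme: {scheme!r}")
    data = json.loads(metadata_file.read_text()) if metadata_file.exists() else {}
    # Drop any existing entry for this URL so re-adding updates it instead of duplicating.
    links = [link for link in data.get("links", []) if link["url"] != url]
    entry = {"url": url, "description": description}
    links.append(entry)
    data["links"] = links
    metadata_file.write_text(json.dumps(data, indent=2) + "\n")
    return entry
```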
