Comments (5)

nilesh-c avatar nilesh-c commented on May 30, 2024

Hi, I'm a third-year computer science student pursuing my Bachelor's degree at RCC Institute of Information Technology, India. I am interested in learning about this too, and I have the same doubts as you do. Some clarification of what types of data we are looking at would be very helpful, especially if someone can provide example datasets in CSV/XML format to play with and get an idea.

Cheers,
Nilesh

from dev.

mick avatar mick commented on May 30, 2024

@nikitsaraf @nilesh-c

Let me help clarify.

  1. The focus of the project as of now is matching records from 2 datasets based on an address found in both datasets. Many of the datasets we have been working on with cities do not clearly match up with data from other sources, or with data from other departments in the city. Focusing on the use case of datasets that match on address is just a place to start; this tool could also prompt the user to select which columns they would like to match.
  2. For this project let's assume the data sources are all CSV files. We could support other sources as well, but CSV files are common in government.
  3. I think building this as a tool that includes dedupe as a dependency makes the most sense. Dedupe is a powerful tool, so making it easier to use would be great.
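
To make the address-matching use case in point 1 concrete, here is a minimal sketch in plain Python. It is not dedupe and not the project's actual approach, only a naive baseline that joins two CSV files on a normalized address. The abbreviation table, the column names (`address`, `location`), and the sample rows are all made up for illustration.

```python
import csv
import io
import re

# Hypothetical abbreviation table; a real tool would need a much fuller list.
ABBREVIATIONS = {"street": "st", "avenue": "ave", "boulevard": "blvd", "road": "rd"}


def normalize_address(raw):
    """Lowercase, strip punctuation, and abbreviate common street suffixes."""
    cleaned = re.sub(r"[^\w\s]", "", raw.lower())
    return " ".join(ABBREVIATIONS.get(word, word) for word in cleaned.split())


def match_on_address(rows_a, rows_b, column_a="address", column_b="address"):
    """Return (row_a, row_b) pairs whose normalized addresses are equal."""
    # Index the second dataset by normalized address for fast lookup.
    index = {}
    for row in rows_b:
        index.setdefault(normalize_address(row[column_b]), []).append(row)
    return [
        (row_a, row_b)
        for row_a in rows_a
        for row_b in index.get(normalize_address(row_a[column_a]), [])
    ]


# Made-up sample data standing in for two city CSV files.
BUSINESSES_CSV = "name,address\nCafe Uno,123 Main Street\nBook Nook,9 Oak Ave.\n"
INSPECTIONS_CSV = "score,location\n92,123 main st\n78,55 Pine Rd\n"

businesses = list(csv.DictReader(io.StringIO(BUSINESSES_CSV)))
inspections = list(csv.DictReader(io.StringIO(INSPECTIONS_CSV)))

matches = match_on_address(businesses, inspections, "address", "location")
for business, inspection in matches:
    print(business["name"], inspection["score"])
```

Exact matching after normalization misses typos and reordered address parts, which is the gap a learned matcher such as dedupe is meant to close.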

from dev.

nikitsaraf avatar nikitsaraf commented on May 30, 2024

Hi Mick!

Thank you so much for your prompt reply and for helping me clarify my doubts.

As I said before, I have dedupe installed and running on my system. I tried a couple of examples on their sample data, and it is fairly easy to use without any complications.

Can you provide me with some more details on the use case for this tool? Who will use it? That would help decide whether to build a web-based tool or a Python tool with a simpler user interface, so that I can start thinking about the user interface and the level of abstraction this tool should offer.

Also, if you can provide me with some of your sample data, I can test it on dedupe and check whether it can serve our use cases.

from dev.

mick avatar mick commented on May 30, 2024

@nikitsaraf,

The use case that we have been talking about is for the user to run this all in their browser, so they don't even need to install a tool. It should be flexible enough to allow the user to select what columns to match on, provide training, and work through manual matching if needed (this might make more sense as a separate tool).

Dedupe has some sample data you can get started with. But if you want something more advanced, I'd suggest grabbing two datasets off https://data.sfgov.org/ (or another city's open data portal) that include addresses that should match, like all businesses vs. restaurant inspection scores.
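
As a rough illustration of matching on user-selected columns with a fallback to manual matching, the sketch below scores candidate pairs with the standard library's `difflib.SequenceMatcher`. It is a stand-in for dedupe's trained model, not a use of dedupe itself. The `0.85` threshold, the function names, the column names, and the sample rows are all assumptions chosen for the example.

```python
from difflib import SequenceMatcher


def similarity(row_a, row_b, column_pairs):
    """Average string similarity (0.0 to 1.0) over the selected column pairs."""
    scores = [
        SequenceMatcher(
            None, row_a[col_a].lower().strip(), row_b[col_b].lower().strip()
        ).ratio()
        for col_a, col_b in column_pairs
    ]
    return sum(scores) / len(scores)


def best_matches(rows_a, rows_b, column_pairs, threshold=0.85):
    """Pair each row in rows_a with its closest row in rows_b.

    Returns (matched, needs_review): pairs scoring at or above the threshold,
    and rows from rows_a left over for manual matching.
    """
    matched, needs_review = [], []
    for row_a in rows_a:
        best = max(rows_b, key=lambda row_b: similarity(row_a, row_b, column_pairs))
        if similarity(row_a, best, column_pairs) >= threshold:
            matched.append((row_a, best))
        else:
            needs_review.append(row_a)
    return matched, needs_review


# Made-up rows standing in for a business registry and inspection scores.
businesses = [
    {"name": "Cafe Uno", "address": "123 Main St"},
    {"name": "Book Nook", "address": "9 Oak Ave"},
]
inspections = [
    {"business": "Cafe Uno", "location": "123 Main St."},
    {"business": "Taqueria Sol", "location": "400 Elm Blvd"},
]

# The user picks which columns correspond; here, address against location.
matched, needs_review = best_matches(businesses, inspections, [("address", "location")])
```

Comparing every row against every other row scales quadratically, so this only suits small files; for larger ones some form of blocking or indexing would be needed.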

from dev.

nikitsaraf avatar nikitsaraf commented on May 30, 2024

@dthompson I have submitted my proposal. Please review it and let me know if you need any clarification.

from dev.
