Comments (5)
Hi, I'm a third-year computer science student pursuing my Bachelor's degree at RCC Institute of Information Technology, India. I am interested in learning about this too, and I have the same doubts as you do. Some clarification of what types of data we are looking at would be very helpful, especially if someone can provide example datasets in CSV/XML format to play with and get an idea.
Cheers,
Nilesh
from dev.
Let me help clarify.
- The focus of the project for now is matching records from two datasets based on an address found in both. Many of the datasets we have worked on with cities do not clearly match up with data from other sources, or with data from other departments in the same city. Matching on address is just a place to start; the tool could also prompt the user to select which columns they would like to match on.
- For this project, let's assume the data sources are all CSV files. We could support other sources as well, but CSV is common in government.
- I think building this as a tool that includes dedupe as a dependency makes the most sense. Dedupe is a powerful tool, so making it easier to use would be great.
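As a rough illustration of the starting use case above, here is a minimal standard-library sketch of matching two CSV files on a normalized address column. It does not use dedupe or its API; the column names, the sample rows, the abbreviation table, and the `normalize_address` and `match_on_address` helpers are all assumptions made up for the example.

```python
import csv
import io
import re

# Hypothetical abbreviation table; a real tool would need a far larger one.
ABBREVIATIONS = {"street": "st", "avenue": "ave", "boulevard": "blvd", "suite": "ste"}


def normalize_address(raw):
    """Lowercase, strip punctuation, and abbreviate common street words."""
    cleaned = re.sub(r"[^a-z0-9\s]", " ", raw.lower())
    words = [ABBREVIATIONS.get(word, word) for word in cleaned.split()]
    return " ".join(words)


def match_on_address(rows_a, rows_b, col_a, col_b):
    """Return (row_a, row_b) pairs whose normalized addresses are identical."""
    # Index the second dataset by normalized address for fast lookup.
    index = {}
    for row in rows_b:
        index.setdefault(normalize_address(row[col_b]), []).append(row)
    pairs = []
    for row in rows_a:
        for other in index.get(normalize_address(row[col_a]), []):
            pairs.append((row, other))
    return pairs


# Made-up sample data standing in for two city CSV exports.
BUSINESSES_CSV = """name,address
Joe's Diner,"123 Main Street, Suite 4"
Corner Cafe,45 Oak Avenue
"""
INSPECTIONS_CSV = """business,location,score
JOES DINER,123 Main St. Ste 4,92
Taco Stand,9 Pine Blvd,88
"""

businesses = list(csv.DictReader(io.StringIO(BUSINESSES_CSV)))
inspections = list(csv.DictReader(io.StringIO(INSPECTIONS_CSV)))
matches = match_on_address(businesses, inspections, "address", "location")
for biz, insp in matches:
    print(biz["name"], "<->", insp["business"], insp["score"])
```

Exact matching after normalization only catches differences in case, punctuation, and the listed abbreviations. Typos and reordered address parts would slip through, which is the gap a trained tool such as dedupe is meant to cover.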
from dev.
Hi Mick!
Thank you so much for your prompt reply and for helping clarify my doubts.
As I said before, I have dedupe installed and running on my system. I tried a couple of examples on their sample data, and it is fairly easy to use without any complications.
Can you give me some more details on the use case for this tool? Who will use it? That would help decide whether to build a web-based tool or a Python tool with a simpler user interface, so that I can start thinking about the user interface and the level of abstraction the tool should provide.
Also, if you can provide me with some of your sample data, I can test it with dedupe and check whether it can serve our use cases.
from dev.
The use case we have been talking about is for the user to run all of this in their browser, so they don't even need to install a tool. It should be flexible enough to allow the user to select which columns to match on, provide training, and work through manual matching if needed (this last part might make more sense as a separate tool).
Dedupe has some sample data you can get started with. But if you want something more advanced, I'd suggest grabbing two datasets from https://data.sfgov.org/ (or another city's open data portal) that include addresses that should match, such as all businesses vs. restaurant inspection scores.
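To make the idea of user-selected columns and a manual-matching step concrete, here is a small sketch. It is only an assumption about how such a flow could work: it uses `difflib.SequenceMatcher` from the standard library as a stand-in similarity measure rather than dedupe's trained model, and the `classify_pair` helper, the thresholds, and the sample records are hypothetical.

```python
from difflib import SequenceMatcher


def classify_pair(record_a, record_b, columns, auto_threshold=0.9, review_threshold=0.5):
    """Label a record pair as "match", "review", or "no_match".

    The score is the average string similarity over the columns the user
    selected. Pairs in the middle band are left for manual matching.
    """
    if not columns:
        raise ValueError("select at least one column to match on")
    total = 0.0
    for column in columns:
        left = record_a[column].strip().lower()
        right = record_b[column].strip().lower()
        total += SequenceMatcher(None, left, right).ratio()
    score = total / len(columns)
    if score >= auto_threshold:
        return "match"
    if score >= review_threshold:
        return "review"
    return "no_match"


# Made-up records; in the browser tool these would come from uploaded CSV files.
business = {"name": "Corner Cafe", "address": "45 Oak Ave"}
inspection = {"name": "Corner Cafe Inc", "address": "45 Oak Ave"}

# The user decides which columns take part in the comparison.
print(classify_pair(business, inspection, ["address"]))
print(classify_pair(business, inspection, ["name", "address"]))
```

Pairs labeled "review" are the ones a person would work through by hand, and those decisions could in turn serve as training examples.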
from dev.
@dthompson I have submitted my proposal. Please review it and let me know if anything needs clarification.
from dev.