This project forked from nickcrews/mismo

The SQL/Ibis powered sklearn of record linkage

Home Page: https://nickcrews.github.io/mismo/

License: GNU Lesser General Public License v3.0

Python 99.52% Just 0.48%

mismo's Introduction

Mismo

The SQL/Ibis powered sklearn of record linkage.

Still in alpha stage. Breaking changes will happen frequently and with no warning. Once things have stabilized I will come up with a stability policy. Any suggestions for how you would like the API to look would be greatly appreciated.


Installation

I have claimed mismo on PyPI, but I won't update it often until this is more stable. Until then, install from source:

python -m pip install "mismo[viz] @ git+https://github.com/NickCrews/mismo@<SOME-SHA-OR-BRANCH>"

Goals

Mismo tries to be the sklearn of record linkage, backed by the scalability and power of SQL and Ibis. It is made of many small data structures and functions, each with a well-defined and standard API that allows them to be composed together and extended easily. None of the other record linkage packages I have seen, such as Splink, Dedupe, or Record Linkage Toolkit, have all of these properties, so I decided to make my own.

See Goals and Alternatives for a more detailed discussion of the goals of Mismo and how it compares to other record linkage packages.

Features

  • Supports larger-than-memory datasets, executed on powerful SQL engines. Use DuckDB for prototyping and for jobs up to roughly 10M records, or Spark or other distributed backends for larger tasks, without needing to change your code!
  • Use the clean, strongly typed, Pythonic dataframe API of Ibis.
  • Small, modular functions and data structures that are easy to plug together and extend.
  • Layered API: Use top-level APIs if your task is common enough that it is supported out of the box.
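
To give a feel for what "small, modular functions that are easy to plug together" can mean in a record linkage pipeline, here is a minimal pure-Python sketch. It does not use Mismo or Ibis, and it is not Mismo's API: the names `block_on`, `name_similarity`, and `link`, the toy records, and the 0.8 threshold are all assumptions made up for illustration. It only shows the general pattern of a blocking step and a comparison step composed by a thin top-level function.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical toy records, invented for this sketch.
RECORDS = [
    {"id": 1, "name": "Jon Smith", "zip": "99501"},
    {"id": 2, "name": "John Smith", "zip": "99501"},
    {"id": 3, "name": "Jane Doe", "zip": "99501"},
    {"id": 4, "name": "Jon Smith", "zip": "10001"},
]


def block_on(records, key):
    """Blocking step: yield candidate pairs that share the same value for `key`."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec)
    for group in groups.values():
        yield from combinations(group, 2)


def name_similarity(left, right):
    """Comparison step: score a pair from 0.0 to 1.0 by name similarity."""
    return SequenceMatcher(None, left["name"].lower(), right["name"].lower()).ratio()


def link(records, blocker, scorer, threshold):
    """Top-level step: compose a blocker and a scorer, keep pairs at or above threshold."""
    return [
        (left["id"], right["id"])
        for left, right in blocker(records)
        if scorer(left, right) >= threshold
    ]


# Swap in a different blocker or scorer without touching `link`.
matches = link(RECORDS, lambda recs: block_on(recs, "zip"), name_similarity, 0.8)
print(matches)  # [(1, 2)]
```

Record 4 has the same name as record 1 but is never compared with it, because blocking on `zip` puts them in different groups. That trade-off between fewer comparisons and missed matches is why the blocking step is worth keeping as its own swappable piece.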

Examples

See the example notebook.

Documentation

See the documentation.

Contributing

See the contributing guide.

License

mismo is distributed under the terms of the LGPL-3.0-or-later license.

Contributors

dependabot[bot], jstammers, lmores, nickcrews, olivierbinette
