
lenamax2355 / python-dataframe-diff-overlap


This project forked from frank-yifei-wang/python-dataframe-diff-overlap


Compare two pandas DataFrames in Python to find their difference and overlap (rows that are in DataFrame A but not in B and vice versa, and rows that are in both A and B)

Python 34.87% Jupyter Notebook 65.13%

python-dataframe-diff-overlap's Introduction

Python DataFrame Diff and Overlap Functions

Ever found yourself wondering which rows are only in pandas DataFrame A but not in B (or vice versa), and which rows are in both A and B?

pandas offers pandas.DataFrame.diff() (which subtracts a previous row from each row within a single DataFrame), pandas.Index.intersection() (which works only on the index) and pandas.DataFrame.merge() (a SQL-style join of two DataFrames), but none of them does exactly what we need here.
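To make the distinction concrete, here is a small sketch of what the three built-ins return. The frames `df_a` and `df_b` and their column names are made up for illustration and are not taken from the repository.

```python
import pandas as pd

# Illustrative data only; not from the repository.
df_a = pd.DataFrame({"id": [1, 2, 3], "value": [10, 20, 30]})
df_b = pd.DataFrame({"id": [2, 3, 4], "value": [20, 30, 40]})

# DataFrame.diff() works within ONE frame: each row minus the previous row.
# The first row has no predecessor, so it becomes NaN.
row_deltas = df_a.diff()

# Index.intersection() compares index labels only, never column values.
shared_labels = df_a.index.intersection(df_b.index)

# merge() joins the two frames. An outer join with indicator=True adds a
# "_merge" column that marks each row as left_only, right_only, or both.
merged = df_a.merge(df_b, on="id", how="outer", indicator=True)

print(row_deltas)
print(list(shared_labels))
print(merged)
```

The merge with `indicator=True` comes closest to a diff/overlap report, but it returns one joined frame with suffixed columns rather than the original rows of A or B.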

Therefore I wrote these two functions df_diff() and df_overlap() to do exactly that -- see the visual demo below.

Great things about them:

  • They can compare on any column, not only on the index (they default to the index if you omit the arguments on_A and/or on_B)
  • They can compare on different columns in each DataFrame (for example column_this in DataFrame A and column_that in DataFrame B; you can also add concatenated id columns of your own and compare on those)
  • They handle duplicated and missing ids correctly
  • They rely mainly on set operations, which keeps them fast
  • They include error checking and handling (they safely return an empty DataFrame if there is no result or the input is invalid)
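The behaviour listed above can be approximated in a few lines. This is a minimal sketch and not the code in df_diff_overlap.py: the names `sketch_diff`, `sketch_overlap` and `_id_index` are hypothetical, only single-column ids are supported, and treating missing ids as never matching is an assumption. Only the `on_A`/`on_B` arguments and the empty-DataFrame fallback are taken from the description.

```python
import pandas as pd


def _id_index(df, on):
    """Ids to compare on, as a pandas Index: column `on` if given, else the index."""
    return df.index if on is None else pd.Index(df[on])


def _in_other(df_A, df_B, on_A, on_B):
    """Boolean array, one entry per row of df_A: does its id occur in df_B?"""
    # Assumption: missing ids (NaN) are dropped, so they never count as a match.
    ids_B = set(_id_index(df_B, on_B).dropna())
    return _id_index(df_A, on_A).isin(ids_B)


def sketch_diff(df_A, df_B, on_A=None, on_B=None):
    """Rows of df_A whose id does not occur in df_B (hypothetical helper)."""
    try:
        in_B = _in_other(df_A, df_B, on_A, on_B)
    except (AttributeError, KeyError, TypeError):
        return pd.DataFrame()  # invalid input: fall back to an empty frame
    return df_A[~in_B]


def sketch_overlap(df_A, df_B, on_A=None, on_B=None):
    """Rows of df_A whose id also occurs in df_B (hypothetical helper)."""
    try:
        in_B = _in_other(df_A, df_B, on_A, on_B)
    except (AttributeError, KeyError, TypeError):
        return pd.DataFrame()
    return df_A[in_B]


# Illustrative data: the id column has a different name in each frame,
# and id 2 is duplicated in df_A.
df_A = pd.DataFrame({"id": [1, 2, 2, 3], "val": ["a", "b", "c", "d"]})
df_B = pd.DataFrame({"key": [2, 3, 4]})

only_in_A = sketch_diff(df_A, df_B, on_A="id", on_B="key")
in_both = sketch_overlap(df_A, df_B, on_A="id", on_B="key")
print(only_in_A)
print(in_both)
```

Because the mask is built per row of A, duplicated ids in A are all kept or all dropped together, and swapping the two frames gives the B-side result.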

To use the functions, simply copy them from the file df_diff_overlap.py. Or try the notebook df_diff_overlap.ipynb if you prefer the step-by-step Jupyter way!

[Image: visual demo of df_diff() and df_overlap()]

