PANDAS_WIDE_TO_LONG

A Python script to convert a wide-format CSV file into a long-format CSV file.

REQUIREMENTS

Pandas library

BRIEF EXPLANATION

I've decided to publish this very simple Python script because, for a beginner like me, there is a lack of plain-language examples showing how to transform a wide-format CSV file into a long-format one. The truth is that technical documentation is written by technical people, and getting started is not as easy as technical people think.

EXAMPLE

For this script I used a fairly complex wide-format CSV with 145 columns and 278 rows. It's COVID-19 data from Portugal, compiled by all of us at @VOSTPT.

The structure of the wide-format CSV is as follows:

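| CONCELHO         | LAT  | LONG | DATE 1 | DATE 2 | DATE 3 to DATE 144 | DATE 145 |
|------------------|------|------|--------|--------|--------------------|----------|
| COUNTY 1         | X.XX | Y.YY | value  | value  | values             | value    |
| COUNTIES 2 - 277 | X.XX | Y.YY | value  | value  | values             | value    |
| COUNTY 278       | X.XX | Y.YY | value  | value  | values             | value    |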

To work with most existing data-analysis and plotting packages, we need the data in what is called "long format". Basically, we need to turn the above into this:

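| CONCELHO   | LAT  | LONG | DATA     | VALUE            |
|------------|------|------|----------|------------------|
| COUNTY 1   | X.XX | Y.YY | DATE 1   | VALUE (DATE 1)   |
| COUNTY 2   | X.XX | Y.YY | DATE 1   | VALUE (DATE 1)   |
| ...        | ...  | ...  | ...      | ...              |
| COUNTY 278 | X.XX | Y.YY | DATE 1   | VALUE (DATE 1)   |
| ...        | ...  | ...  | ...      | ...              |
| COUNTY 1   | X.XX | Y.YY | DATE 145 | VALUE (DATE 145) |
| COUNTY 2   | X.XX | Y.YY | DATE 145 | VALUE (DATE 145) |
| ...        | ...  | ...  | ...      | ...              |
| COUNTY 278 | X.XX | Y.YY | DATE 145 | VALUE (DATE 145) |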
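
To see the reshape on something small, here is a minimal sketch using two hypothetical counties and two date columns. The coordinates and values are invented for illustration and are not taken from the VOST PT data. `pd.melt` keeps the listed identifier columns and turns every remaining column into rows:

```python
import pandas as pd

# Hypothetical toy data: two counties, two date columns (numbers are made up)
wide = pd.DataFrame({
    "CONCELHO": ["COUNTY 1", "COUNTY 2"],
    "LAT": [38.72, 41.15],
    "LONG": [-9.14, -8.61],
    "DATE 1": [0, 1],
    "DATE 2": [2, 3],
})

# One row per county and date: the date column names go into "DATA",
# the cell values go into "VALUE"
long = pd.melt(
    wide,
    id_vars=["CONCELHO", "LAT", "LONG"],
    var_name="DATA",
    value_name="VALUE",
)
print(long)
```

`pd.melt` stacks the result date by date (all counties for DATE 1, then all counties for DATE 2), which matches the row order of the long-format table above.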

THE SCRIPT

This script lets you turn any wide-format data file into a long-format data file.

We start by importing the pandas library.

Then we assign the result of reading the data file to a variable named "df".
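
A minimal sketch of these two steps, assuming a comma-separated file. Because the real file name isn't given here, the example reads a tiny, made-up CSV from a string via `io.StringIO`; with a file on disk you would pass its path (for example a hypothetical `"data.csv"`) to `pd.read_csv` instead:

```python
import io

import pandas as pd

# Stand-in for the real CSV file; these contents are hypothetical
csv_text = """CONCELHO,LAT,LONG,DATE 1,DATE 2
COUNTY 1,38.72,-9.14,0,2
COUNTY 2,41.15,-8.61,1,3
"""

# With a real file: df = pd.read_csv("data.csv")  (hypothetical name)
df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)  # (number of rows, number of columns)
```

If your file uses a different delimiter, `pd.read_csv` accepts a `sep` argument (for example `sep=";"`).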

As you can see from the example, there are 3 columns (CONCELHO, LAT, LONG) that we want to keep, and the dates start from the 4th column onward.

We assign all the columns from the 4th column onward to a variable called "dates".
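
Selecting the date columns by position can be sketched like this, assuming (as in the example) that the three identifier columns always come first. The one-row frame below is hypothetical:

```python
import pandas as pd

# Hypothetical frame with the same layout: 3 identifier columns, then dates
df = pd.DataFrame(
    [["COUNTY 1", 38.72, -9.14, 0, 2, 5]],
    columns=["CONCELHO", "LAT", "LONG", "DATE 1", "DATE 2", "DATE 3"],
)

# Positions are 0-based, so the 4th column has index 3
dates = df.columns[3:]
print(list(dates))  # ['DATE 1', 'DATE 2', 'DATE 3']
```

Because this selects by position, adding or reordering identifier columns in the source file would mean changing the `3`.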

Then we create a variable called "longform" and use pd.melt to do the following:

  • Based on our "df" dataframe, we tell pandas to keep CONCELHO, LAT and LONG as identifier columns
  • We tell pandas to use our variable "dates" to create two new columns, one named "DATA" (holding the date) and one named "INC" (holding the value)
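
Putting the two bullets together, a minimal sketch of the `pd.melt` call could look like this. The column names CONCELHO, LAT, LONG, DATA and INC come from the explanation above; the small input frame and its values are made up, and this is a reconstruction rather than the author's exact script:

```python
import pandas as pd

# Hypothetical wide frame mirroring the layout described above (values made up)
df = pd.DataFrame(
    [["COUNTY 1", 38.72, -9.14, 0, 2],
     ["COUNTY 2", 41.15, -8.61, 1, 3]],
    columns=["CONCELHO", "LAT", "LONG", "DATE 1", "DATE 2"],
)
dates = df.columns[3:]

# Keep the identifier columns and unpivot every date column:
# each date header becomes a value in "DATA", each cell a value in "INC"
longform = pd.melt(
    df,
    id_vars=["CONCELHO", "LAT", "LONG"],
    value_vars=list(dates),
    var_name="DATA",
    value_name="INC",
)
print(longform)
```

The result has one row per county per date column, so its row count is the number of counties multiplied by the number of date columns.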

From this point on you can work with your long-format dataframe, as I did here, or save the result to a CSV file.
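
A short sketch of both options, assuming a long-format frame with the columns described above (the rows here are made up). It totals INC per date with `groupby` and writes the frame to a hypothetical `longform.csv`:

```python
import pandas as pd

# Hypothetical long-format data with the columns produced by pd.melt
longform = pd.DataFrame({
    "CONCELHO": ["COUNTY 1", "COUNTY 2", "COUNTY 1", "COUNTY 2"],
    "LAT": [38.72, 41.15, 38.72, 41.15],
    "LONG": [-9.14, -8.61, -9.14, -8.61],
    "DATA": ["DATE 1", "DATE 1", "DATE 2", "DATE 2"],
    "INC": [0, 1, 2, 3],
})

# Example analysis: sum INC over all counties for each date
totals = longform.groupby("DATA", as_index=False)["INC"].sum()
print(totals)

# Save the long-format data; index=False stops pandas from writing the
# row index as an extra unnamed first column
longform.to_csv("longform.csv", index=False)

# Read the file back to check that the columns survived the round trip
saved = pd.read_csv("longform.csv")
print(saved.columns.tolist())
```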

CONCLUSIONS

Once you understand how pd.melt (pandas.melt) works, it's quite easy to transform a wide-format data file into a long-format one. However, the existing documentation is not very clear for those who are just starting out. I hope this script and explanation are helpful.

CONTRIBUTORS

jorgemiguelgomes