
Comments (6)

mikapfl commented on August 27, 2024

Is this the right way in general? Maybe I'm also missing something conceptually.

That's one way to do it. If you already have the data in wide format, it is probably easiest to go via the interchange format because we have reading routines there. If you have data in some other format and have to roll your own reading function, I would also consider building the native xarray format directly, because it is more expressive and you can use xarray's toolbox immediately.

from primap2.

rgieseke commented on August 27, 2024

Thanks for the quick reply, Mika! (Maybe you could enable "Discussions" on this repo, for questions like this.)

For the record, the following works:

import pandas as pd
import primap2 as pm2

df = pd.read_csv("rcmip-emissions-annual-means-v5-1-0.csv")
df = df.drop(["Activity_Id", "Mip_Era"], axis=1)
df["Entity"] = df.Unit.apply(lambda x: x.split(" ")[1].split("/")[0])

coords_cols = {
    "unit": "Unit",
    "area": "Region",
    "model": "Model",
    "scenario": "Scenario",
    "entity": "Entity",
    "category": "Variable"
}
coords_defaults = {
    "source": "RCMIP",
}
coords_terminologies = {
    "area": "RCMIP",
    "category": "RCMIP",
    "scenario": "RCMIP"
}
meta_data = {
    "references": "doi:10.5194/gmd-13-5175-2020",
    "rights": "CC BY 4.0 International",
}
data_if = pm2.pm2io.convert_wide_dataframe_if(
    df,
    coords_cols=coords_cols,
    coords_defaults=coords_defaults,
    coords_terminologies=coords_terminologies,
    meta_data=meta_data,
    filter_keep={"f1": {
        "Model": "CEDS/UVA/GCP/PRIMAP",
        "Region": "World"
    }}
)

rcmip = pm2.pm2io.from_interchange_format(data_if)

As you noted

rcmip["CH4"].loc[{"category (RCMIP)":"Emissions|CO2"}]

and the like will be NaN, but I'm not sure what you mean by processing the category?


rgieseke commented on August 27, 2024

I've never used it, why is it better than issues?

It feels a bit more lightweight than an actual "issue" (one can also move issues to discussions), and I guess, a bit like on Stack Overflow, one can select an answer as "the right one".


mikapfl commented on August 27, 2024

Hi Robert,

Unfortunately, read_wide_csv_file_if doesn't support re-using a source column multiple times in coords_cols because behind the scenes it just does a renaming. Probably something it should support, so maybe worth opening a bug report - but I can't commit to when I will have time to fix it.

Meanwhile, you can pre-process the dataframe yourself and use convert_wide_dataframe_if directly. This does less checking of arguments etc., so error messages might be weirder, but it works with surprisingly few changes to your code since you are reading the dataframe anyway:

import pandas as pd
import primap2 as pm2

df = pd.read_csv("rcmip-emissions-annual-means-v5-1-0.csv")
df["Entity"] = df.Unit.apply(lambda x: x.split(" ")[1].split("/")[0])

coords_cols = {
    "unit": "Unit",
    "area": "Region",
    "model": "Model",
    "scenario": "Scenario",
    "entity": "Entity",
    "category": "Variable"
}
coords_defaults = {
    "source": "RCMIP",
}
coords_terminologies = {
    "area": "RCMIP",
    "category": "RCMIP",
}
meta_data = {
    "url": "https://doi.org/10.5194/gmd-13-5175-2020",
    "rights": "CC BY 4.0 International",
}
data_if = pm2.pm2io.convert_wide_dataframe_if(
    df,
    coords_cols=coords_cols,
    coords_defaults=coords_defaults,
    coords_terminologies=coords_terminologies,
    meta_data=meta_data,
    filter_keep={"f1": {
        "Model": "CEDS/UVA/GCP/PRIMAP",
    }}
)
data_if

Of course, I would do some processing on category before using the data. If you convert it as is into the primap2 native xarray format, you might run out of memory storing all those NaNs for combinations such as entity CO2 and category Emissions|CH4.
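To make the memory argument concrete, here is a rough sketch in plain Python. It does not use primap2. It only assumes that a dense array reserves one cell for every combination of entity and category, and it uses three example gases chosen for illustration. The helper name dense_fill_fraction is hypothetical.

```python
def dense_fill_fraction(pairs):
    """Fraction of (entity, category) cells that actually hold data
    when every entity is crossed with every category in a dense array."""
    entities = {entity for entity, _ in pairs}
    categories = {category for _, category in pairs}
    return len(set(pairs)) / (len(entities) * len(categories))


# Category still carries the gas name: only the "diagonal" cells are filled.
raw = [
    ("CO2", "Emissions|CO2"),
    ("CH4", "Emissions|CH4"),
    ("N2O", "Emissions|N2O"),
]

# Category cleaned up so that it no longer repeats the entity.
clean = [
    ("CO2", "Emissions"),
    ("CH4", "Emissions"),
    ("N2O", "Emissions"),
]

raw_fill = dense_fill_fraction(raw)
clean_fill = dense_fill_fraction(clean)
print(f"raw: {raw_fill:.2f}, clean: {clean_fill:.2f}")
```

With N gases and one category per gas, only N of the N x N entity/category cells contain values, so the share of NaN cells grows as more gases are added.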

Cheers,

Mika


mikapfl commented on August 27, 2024

(Maybe you could enable "Discussions" on this repo, for questions like this.)

I've never used it, why is it better than issues?

and the like will be NaN, but I'm not sure what you mean by processing the category?

If you want to use this data with primap2, then it would be good to properly separate the dimensions entity and category, so Emissions|CO2 and Emissions|CH4 should both be translated to Emissions. That gives you efficient memory use and easy filtering in both dimensions without string processing (e.g. you can efficiently answer the question "give me all Emissions of Germany in 2015" without doing any category.str.split('|') calls). Of course, you can't use the RCMIP categorization then, and have to use something custom instead.
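A minimal sketch of what such a separation could look like, in plain Python. The helper names entity_from_unit and strip_entity are hypothetical, the unit parsing simply mirrors the lambda used earlier in this thread, and the "Emissions|CO2|Energy" row is a made-up sub-category for illustration. How the resulting custom categorization should be named is left open.

```python
def entity_from_unit(unit: str) -> str:
    """Mirror the lambda from this thread: 'Mt CO2/yr' -> 'CO2'."""
    return unit.split(" ")[1].split("/")[0]


def strip_entity(variable: str, entity: str) -> str:
    """Drop the entity component from a '|'-separated variable string."""
    return "|".join(part for part in variable.split("|") if part != entity)


# (Variable, Unit) pairs as they might appear in the wide CSV.
rows = [
    ("Emissions|CO2", "Mt CO2/yr"),
    ("Emissions|CH4", "Mt CH4/yr"),
    ("Emissions|CO2|Energy", "Mt CO2/yr"),  # hypothetical sub-category
]

categories = [strip_entity(var, entity_from_unit(unit)) for var, unit in rows]
print(categories)  # ['Emissions', 'Emissions', 'Emissions|Energy']
```

On the dataframe from this thread, the same function could be applied row by row to the Variable and Entity columns to fill a new category column before the conversion. Since the result is no longer the RCMIP categorization, the terminology given for "category" would need a custom name.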


rgieseke commented on August 27, 2024

Got it, that's where we started with the RCMIP climate categories discussion, thanks!

