xgi-org / xgi-data

Standardized higher-order datasets with corresponding datasheets

Home Page: https://zenodo.org/communities/xgi

License: Other

Languages: Python 86.37%, Jupyter Notebook 13.63%

Topics: datasheet, hypergraph, json

xgi-data's People

Contributors

maximelucas, nwlandry

Forkers

romups

xgi-data's Issues

Create a README for each dataset

Each dataset should have an accompanying README explaining (among other things) the statistics of the hypergraphs, dataset attributes, limitations, details on how the dataset was collected, and references.

Move xgi-data to Zenodo

During our last meeting it came up that GitHub's file size restrictions are harming xgi-data. After discussing alternatives, I volunteered to determine whether we can host our datasets on Zenodo. We want not only to host the datasets there but also to give XGI users a programmatic way to download them.

The answer is yes, I think we can do this. This code should work:

import requests
import gzip
import pandas as pd
from io import StringIO

# This URL points to one of the files of this dataset: https://zenodo.org/record/2539424
# It is a compressed plain text file containing a directed edge list
url = 'https://zenodo.org/api/files/c2d18dc0-c4ca-4454-aa3e-fd6e035b4f87/dewiki.wikilink_graph.2003-03-01.csv.gz'

# Download the file and fail loudly on an HTTP error status
r = requests.get(url)
r.raise_for_status()

# Decompress (it is a .gz file) and decode (the return type is bytes, not str)
data = gzip.decompress(r.content).decode('utf-8')

# Now data is a string with the entire contents of the downloaded file. Since pd.read_csv
# expects a file and not a string, use StringIO to give the data a file-like interface.
# Also, watch the delimiter!
df = pd.read_csv(StringIO(data), delimiter='\t')

df.head()
#    page_id_from page_title_from  page_id_to      page_title_to
# 0            10   Aussagenlogik        7334         Äquivalenz
# 1            10   Aussagenlogik         767  Boolesche Algebra
# 2            10   Aussagenlogik        1217        Disjunktion
# 3            10   Aussagenlogik        2446        Implikation
# 4            10   Aussagenlogik        2895        Konjunktion

So we can easily encapsulate these lines in a function that would then replace xgi.download_xgi_data.
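
As a rough sketch of what that encapsulation might look like: the function name `load_zenodo_edge_list`, the helper `fetch_bytes`, and the `fetch` parameter are all hypothetical and do not reflect the actual signature of xgi.download_xgi_data. To keep the sketch free of third-party dependencies, it swaps in the standard library (`urllib.request` and `csv`) for `requests` and `pandas`, so it returns a list of dicts rather than a DataFrame.

```python
import csv
import gzip
import io
import urllib.request


def fetch_bytes(url, timeout=30):
    """Download `url` and return the raw response body as bytes."""
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses,
    # so no separate status check is needed here.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def load_zenodo_edge_list(url, delimiter="\t", fetch=fetch_bytes):
    """Hypothetical helper: fetch a gzipped delimited text file and
    return its rows as a list of dicts keyed by the header line.

    `fetch` is any callable mapping a URL to bytes; it is injectable
    so the parsing step can be exercised without network access.
    """
    raw = fetch(url)
    # Decompress (.gz) and decode (bytes -> str), as in the snippet above.
    text = gzip.decompress(raw).decode("utf-8")
    # All values come back as strings; type conversion is left to the caller.
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))
```

Calling `load_zenodo_edge_list(url)` with the Zenodo file URL from the snippet above would download and parse that file. Passing a stub for `fetch` that returns locally gzipped bytes lets the parsing be checked offline.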
