Issues from the xgi-org/xgi-data repository

Scripts to check that JSON is in standard format

A Python script that takes a JSON file as input and checks that it conforms to the standard format specification.
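
A minimal sketch of such a checker follows. The specification is still being designed, so the schema here is an assumption for illustration: a top-level object with the keys "hypergraph-data", "node-data", "edge-data", and "edge-dict", where "edge-dict" maps each edge ID to a list of node IDs and every node in an edge has an entry in "node-data". The function names are hypothetical.

```python
import json
import sys

# ASSUMED schema: placeholder key names until the standard format is agreed on.
REQUIRED_KEYS = ("hypergraph-data", "node-data", "edge-data", "edge-dict")


def validate_hypergraph(data):
    """Return a list of problems found in parsed JSON; an empty list means it passed."""
    if not isinstance(data, dict):
        return ["top level must be a JSON object"]

    errors = [f"missing key: {key!r}" for key in REQUIRED_KEYS if key not in data]
    if errors:
        return errors

    # Every top-level section must itself be a JSON object.
    for key in REQUIRED_KEYS:
        if not isinstance(data[key], dict):
            errors.append(f"{key!r} must be an object")
    if errors:
        return errors

    node_data, edge_data, edge_dict = data["node-data"], data["edge-data"], data["edge-dict"]
    for edge_id, members in edge_dict.items():
        if not isinstance(members, list):
            errors.append(f"edge {edge_id!r}: members must be a list")
            continue
        for node in members:
            # JSON object keys are always strings, so node IDs are compared as strings.
            if str(node) not in node_data:
                errors.append(f"edge {edge_id!r}: node {node!r} not in 'node-data'")

    # Attributes may only be attached to edges that exist.
    for edge_id in edge_data:
        if edge_id not in edge_dict:
            errors.append(f"'edge-data' refers to unknown edge {edge_id!r}")
    return errors


def main(argv):
    """Command-line entry point: argv is [program, path-to-json]; returns an exit code."""
    if len(argv) != 2:
        print(f"usage: {argv[0]} FILE.json", file=sys.stderr)
        return 2
    try:
        with open(argv[1], encoding="utf-8") as f:
            data = json.load(f)
    except (OSError, json.JSONDecodeError) as exc:
        print(f"could not read {argv[1]}: {exc}", file=sys.stderr)
        return 1
    errors = validate_hypergraph(data)
    for message in errors:
        print(message, file=sys.stderr)
    return 1 if errors else 0
```

To use it as a script, add the usual entry point, sys.exit(main(sys.argv)), under an if __name__ == "__main__" guard. Returning a list of messages rather than stopping at the first problem lets one run report everything wrong with a file.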

Standard JSON format

Develop a JSON file format for storing hypergraphs in a clear, self-explanatory way.
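
One possible shape for such a format, sketched with only the standard library. The four top-level keys (dataset metadata, node attributes, edge attributes, and edge membership) and the function name to_standard_json are assumptions for illustration, not a settled specification.

```python
import json


def to_standard_json(edges, node_attrs=None, edge_attrs=None, metadata=None):
    """Serialize a hypergraph to a JSON string using a hypothetical layout.

    edges: dict mapping edge ID -> iterable of node IDs
    node_attrs / edge_attrs: optional dicts mapping ID -> dict of attributes
    metadata: optional dict describing the dataset as a whole
    """
    node_attrs = node_attrs or {}
    edge_attrs = edge_attrs or {}
    # Every node that appears in some edge gets an entry, even without attributes.
    nodes = {n for members in edges.values() for n in members} | set(node_attrs)
    doc = {
        "hypergraph-data": dict(metadata or {}),
        "node-data": {str(n): dict(node_attrs.get(n, {})) for n in sorted(nodes, key=str)},
        "edge-data": {str(e): dict(edge_attrs.get(e, {})) for e in edges},
        # JSON object keys must be strings, so IDs are stringified throughout.
        "edge-dict": {str(e): [str(n) for n in members] for e, members in edges.items()},
    }
    return json.dumps(doc, indent=2)
```

For example, to_standard_json({"e0": [1, 2, 3], "e1": [2, 4]}, metadata={"name": "toy"}) yields an "edge-dict" of {"e0": ["1", "2", "3"], "e1": ["2", "4"]}. Keeping edge membership separate from edge attributes means a reader can load the structure without parsing every attribute.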

Move xgi-data to Zenodo

During our last meeting, it came up that GitHub's file size limits are a problem for xgi-data. After we discussed alternatives, I volunteered to find out whether we can host our datasets on Zenodo. We want not only to host the datasets there but also to give XGI users a programmatic way to download them.

I think the answer is yes. This code should work:

```python
import gzip
from io import StringIO

import pandas as pd
import requests

# This URL points to one of the files of this dataset: https://zenodo.org/record/2539424
# It is a compressed plain-text file containing a directed edge list.
url = 'https://zenodo.org/api/files/c2d18dc0-c4ca-4454-aa3e-fd6e035b4f87/dewiki.wikilink_graph.2003-03-01.csv.gz'

# Download the file and make sure everything went OK
# (raise_for_status raises an HTTPError for 4xx/5xx responses).
r = requests.get(url, timeout=60)
r.raise_for_status()

# Decompress (it is a .gz file) and decode (the content is bytes, not str).
data = gzip.decompress(r.content).decode('utf-8')

# Now data is a string with the entire contents of the downloaded file. Since pd.read_csv
# expects a file and not a string, use StringIO to give the data a file-like interface.
# Also, watch the delimiter: despite the .csv extension, the file is tab-separated.
df = pd.read_csv(StringIO(data), delimiter='\t')

df.head()
#    page_id_from page_title_from  page_id_to      page_title_to
# 0            10   Aussagenlogik        7334         Äquivalenz
# 1            10   Aussagenlogik         767  Boolesche Algebra
# 2            10   Aussagenlogik        1217        Disjunktion
# 3            10   Aussagenlogik        2446        Implikation
# 4            10   Aussagenlogik        2895        Konjunktion
```

These lines can easily be wrapped in a function that would then replace xgi.download_xgi_data.
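
As a sketch of what such a function might look like, the version below assumes each dataset is stored on Zenodo as a JSON file, optionally gzip-compressed, and swaps requests for the standard-library urllib.request so it has no third-party dependencies. The function names decode_dataset and download_zenodo_json are hypothetical, and a real replacement for xgi.download_xgi_data would also need a mapping from dataset names to Zenodo file URLs, which is not specified here.

```python
import gzip
import json
import urllib.request

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def decode_dataset(raw):
    """Turn downloaded bytes into a Python object, decompressing gzip data first."""
    # Checking the magic bytes instead of the file name also works when the URL
    # carries a query string or lacks a .gz suffix.
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    # The payload is bytes, not str, so decode it before parsing the JSON text.
    return json.loads(raw.decode("utf-8"))


def download_zenodo_json(url, timeout=60):
    """Fetch a (possibly gzipped) JSON file from a Zenodo file URL.

    urlopen raises urllib.error.HTTPError on 4xx/5xx responses, which takes the
    place of the explicit status check in the requests-based snippet.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        raw = response.read()
    return decode_dataset(raw)
```

Splitting the network call from the decoding step keeps the decoding testable offline; only download_zenodo_json touches the network.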

Convert data repository to JSON

Convert the datasets at https://www.cs.cornell.edu/~arb/data/ to the standard JSON format.
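
Many datasets on that page are distributed as plain-text files in which one file (ending in -nverts.txt) gives the number of nodes in each hyperedge, another (ending in -simplices.txt) lists the nodes of all hyperedges back to back, and an optional third (ending in -times.txt) gives one integer timestamp per hyperedge. The sketch below assumes that layout, which should be checked dataset by dataset, and converts it into an assumed JSON layout with the keys "hypergraph-data", "node-data", "edge-data", and "edge-dict". The function names are hypothetical.

```python
import json


def read_ints(text):
    """Parse whitespace-separated integers (one or more per line)."""
    return [int(tok) for tok in text.split()]


def edges_from_nverts(nverts, simplices):
    """Split the flat node list into hyperedges using the per-edge sizes."""
    if sum(nverts) != len(simplices):
        raise ValueError("sizes in nverts do not add up to the length of simplices")
    edges, start = [], 0
    for size in nverts:
        edges.append(simplices[start:start + size])
        start += size
    return edges


def convert(nverts_text, simplices_text, times_text=None, name=None):
    """Build a JSON string in the assumed standard layout from the file contents."""
    edges = edges_from_nverts(read_ints(nverts_text), read_ints(simplices_text))
    times = read_ints(times_text) if times_text is not None else None
    if times is not None and len(times) != len(edges):
        raise ValueError("times file must have one entry per hyperedge")
    nodes = sorted({n for edge in edges for n in edge})
    doc = {
        "hypergraph-data": {"name": name} if name else {},
        "node-data": {str(n): {} for n in nodes},
        # Edge IDs are the 0-based positions of the hyperedges in the input files.
        "edge-data": {
            str(i): ({"timestamp": times[i]} if times is not None else {})
            for i in range(len(edges))
        },
        "edge-dict": {str(i): [str(n) for n in edge] for i, edge in enumerate(edges)},
    }
    return json.dumps(doc)
```

Node labels (the -node-labels.txt files some datasets include) are not handled here; they could be added as "node-data" attributes in the same way as the timestamps.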

Create a README for each dataset

Each dataset should have an accompanying README explaining (among other things) the statistics of the hypergraphs, dataset attributes, limitations, details on how the dataset was collected, and references.
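
The statistics part of each README could be generated automatically. A minimal sketch, assuming hyperedges are available as lists of node IDs; the section headings and the function names hypergraph_stats and readme_skeleton are hypothetical, and the remaining sections (attributes, limitations, collection details, references) still have to be written by hand.

```python
from collections import Counter
from statistics import mean


def hypergraph_stats(edges):
    """Basic statistics for a non-empty list of hyperedges (each a list of node IDs)."""
    if not edges:
        raise ValueError("need at least one hyperedge")
    sizes = [len(edge) for edge in edges]
    # A node's degree is the number of hyperedges containing it.
    degrees = Counter(n for edge in edges for n in set(edge))
    return {
        "nodes": len(degrees),
        "edges": len(edges),
        "min edge size": min(sizes),
        "max edge size": max(sizes),
        "mean edge size": round(mean(sizes), 2),
        "max degree": max(degrees.values()),
    }


def readme_skeleton(name, edges):
    """Markdown README with generated statistics and placeholders for the rest."""
    lines = [f"# {name}", "", "## Statistics", ""]
    lines += [f"- {key}: {value}" for key, value in hypergraph_stats(edges).items()]
    for section in ("Dataset attributes", "Limitations", "Collection details", "References"):
        lines += ["", f"## {section}", "", "TODO"]
    return "\n".join(lines) + "\n"
```

Generating the numbers from the data itself keeps every README consistent with the file it describes when a dataset is updated.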

