The analysis can be replicated as follows:
- Obtain the environmental microbe data graph, with file name:
masterG.edgelist.tsv
- From the main repository directory, run:
make
This will generate a set of files. Among them, the file:
masterG_edges.tsv
will be formatted to work with the Embiggen package.
The file:
masterG_edges_nodes_intindex.txt
will be formatted to work with the SNAP HPC implementation, including node2vec.
In addition, the file:
masterG.edgelist_col12_nodes_meta.txt
contains node types formatted as metadata for the embedding projector (see below).
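For readers unfamiliar with integer-indexed edge lists, the following is a minimal sketch of the kind of conversion such a file implies. It is not the Makefile's actual code: the function name `index_edges`, the two-column tab-separated layout, and the sample node names are all assumptions made for illustration.

```python
import csv
import io


def index_edges(rows):
    """Map node names to consecutive integers in first-seen order.

    `rows` is an iterable of sequences whose first two fields are the
    source and target node names (assumed layout, not confirmed by the repo).
    """
    node_ids = {}
    int_edges = []
    for row in rows:
        if len(row) < 2:
            continue  # skip blank or malformed lines
        src, dst = row[0], row[1]
        # setdefault assigns the next free integer only to unseen nodes
        u = node_ids.setdefault(src, len(node_ids))
        v = node_ids.setdefault(dst, len(node_ids))
        int_edges.append((u, v))
    return node_ids, int_edges


# Made-up sample data standing in for a tab-separated edge list.
sample_tsv = "taxon_A\tenv_soil\ntaxon_B\tenv_soil\ntaxon_A\ttaxon_B\n"
reader = csv.reader(io.StringIO(sample_tsv), delimiter="\t")
node_ids, int_edges = index_edges(reader)

for u, v in int_edges:
    print(f"{u}\t{v}")
```

Keeping the `node_ids` mapping around matters in practice, since it is the only way to translate integer node ids in the embedding output back to the original node names.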
- Run SNAP node2vec with the following parameters and Slurm script.
- The resulting embeddings can be visualized:
  - Using the UMAP notebook.
  - Using the embedding projector.
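The embedding projector loads vectors as a tab-separated file with one vector per line and no header, and its metadata file has to list rows in the same order. Below is a minimal sketch of converting node2vec output into that layout. It assumes the output is in the word2vec-style text format (a first line giving node count and dimension, then one line per node with the integer node id followed by its vector), and that metadata rows are ordered by integer node id. Both points, along with the function names, are assumptions to check against the actual files.

```python
def parse_embeddings(lines):
    """Parse word2vec-style text embeddings into {node_id: [floats]}."""
    it = iter(lines)
    header = next(it).split()
    n_nodes, dim = int(header[0]), int(header[1])
    vectors = {}
    for line in it:
        parts = line.split()
        if not parts:
            continue  # ignore blank lines
        node_id = int(parts[0])
        values = [float(x) for x in parts[1:]]
        if len(values) != dim:
            raise ValueError(f"node {node_id}: expected {dim} values, got {len(values)}")
        vectors[node_id] = values
    if len(vectors) != n_nodes:
        raise ValueError(f"header declares {n_nodes} nodes, found {len(vectors)}")
    return vectors


def to_projector_rows(vectors):
    """Return tab-separated vector lines sorted by node id (assumed metadata order)."""
    return ["\t".join(str(x) for x in vectors[node_id]) for node_id in sorted(vectors)]


# Made-up sample standing in for a node2vec output file.
sample_emb = ["2 2", "1 0.25 2.0", "0 0.5 -1.0"]
vectors = parse_embeddings(sample_emb)
rows = to_projector_rows(vectors)
print("\n".join(rows))
```

Sorting by node id is a design choice that only helps if the metadata file follows the same order. If it does not, the rows should be reordered to match the metadata instead.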