alibaba / euler Goto Github PK

A distributed graph deep learning framework.

License: Apache License 2.0

CMake 1.72% Python 22.20% C++ 74.92% Shell 0.24% Lex 0.15% Yacc 0.72% C 0.06%

graph graph-learning network-embedding deep-learning graph-convolutional-networks graph-neural-networks graphsage random-walk node2vec graph-embedding

euler's Introduction

概述

Euler-2.0 新增功能

基础教程

进阶教程

详细接口

联系我们

如果有任何问题，请直接提交issues，也欢迎通过Euler开源技术支持邮件组（[email protected]）联系我们。

License

Euler使用Apache-2.0许可

致谢

Euler由阿里妈妈工程平台团队与搜索广告算法团队共同探讨与开发，也获得了阿里妈妈多个团队的大力支持。同时也特别感谢蚂蚁金服的机器学习团队，项目早期的一些技术交流给予我们的帮助。

euler's People

Contributors

Stargazers

Watchers

Forkers

chengenbao young-scott diorsking ali-jiandu tutty427 mbrukman jacklee20151 liangguochang lengrui1988 howww zhoudaqing zjx4041739 aisnnu futuremac liukangling mrcaohh shanhaijun jokoz bobo-peng zhuzhibin1988 supersakura shaoguangcheng yinjie1230 deephao husthuke skymysky wocclyl xjump hengqujushi burness catyans ouyangwen-it jdlc105 gaoyz0625 jiangxunyue emilywangattri xuchensjtu hitflame gaoli1990 lucosax dingxiye kevin891219 formath yishuihanhan zhanglae youngjt wangchen663 blank-1 dmjvictory yuehanlyu wycharry b-xiang hhh920406 ixiejun tangkuo zeigar zhouyonglong batermj maxiao001 enjoyor-ai theseusyang zyody wowcplusplus sunshinehome wwjiang007 fangchen1993 zyz282994112 kongdzh gharthur jepsonw linsocurry gaoxiaoninghit sain fangqingan hhy5277 302775293 zhiwuya liao1995 littlegreay yangzhihui0627 hi-yan solarwinds1991 tianyikenan dongzhaoyu ezhangle awesomemachinelearning moonwater xeransis onceagainitsrxvn archsaber bineea ceciet shaunstanislauslau autowonderman waipptt hyzcn mysqlsc kevinyzy yxh1990 jabc1

euler's Issues

OSError: libjvm.so: cannot open shared object file: No such file or directory

执行python ppi_data.py报错，检查下java环境变量，export JAVA_HOME=/usr/lib/jvm/java-8-oracle
export LD_LIBRARY_PATH=$JAVA_HOME/lib/server:$LD_LIBRARY_PATH，这里$JAVA_HOME/lib文件夹下并没有server，是还缺少什么吗

English description

Hi,

Could you please add an English description? It would be really nice.
Thanks a lot for your work!

Will this project will support the Pytorch?

Excellent work!

May I kindly ask if this project will support the Pytorch?

Thank you so much.

Node2vec模型的几个疑问？

首先感谢开源这么好的框架！
在run_loop.py第200行，我注意到Node2Vec模型后面几个参数（如walk_len）都是不支持自定义的。
另外，在模型编写一节，

将DeepWalk套入到上面的范式中，其实现可以分为以下三步：

从顶点进行随机游走生成源点正例对，并采样出负例；

将源点、正例和负例embedding到向量；

从源点、正例和负例的embedding向量计算出交叉熵loss和mrr。

因此，对于Node2vec我有下面这几个疑问，感谢您的解答。
Q1. 后续walk_len等这些参数会支持自定义吗？
Q2. 是不是walk_len和num_negs分别表示对于某一个源点的正例和负例的数量？
walk_len不适合设置过大的值吗（如40）?
walk_len是不是和word2vec中的window_size参数的含义是相似的？
Q3. num_negs, walk_len, left_win_size, right_win_size的建议取值范围是怎样的？

关于分布式训练导致测试过程F1下降问题

欧拉你好，我们这边在测试官方提供的PPI数据过程中，单机训练默认参数同样step下，f1 = 0.6左右，但是在分布式环境部署后同样版本的欧拉系统 f1训练结束后降低到0.36左右。数据切分方式是直接切分，比如固定block数切分。请问f1 score下降的主要原因是什么？

PS: 分布式训练采用 1ps + 3worker

train模式下指定的 --id_file不生效

你好，我想利用部分数据进行模型的训练，使用如下命令后

python -m tf_euler --data_dir test_data --id_file test_data/train_id.csv \
--max_id 999 \
--feature_idx 0 --feature_dim 3 -—label_idx 1 --label_dim 1 \
--num_classes 2  \
--model graphsage_supervised --mode save_embedding \
--batch_size 10 --num_epochs 100

发现，embdeding中的id并没有只使用指定的部分id。当然，查看文档中的描述是说 --id_file是用于测试集id文件，evaluate必须；，请问在训练模式下，有什么办法指定训练、校验、和测试集的参数吗？

关于节点标签的问题

使用半监督GraphSage模型，在生成数据block时，只有部分节点有标签，其他没有标签的节点如何设置，以给的PPI的数据为例，

当节点的标签未知时，float_feature的第0项如何设置

tf高级接口的所有op操作都是在cpu上执行的吗？

首先感谢能够开源这么好的框架，我现有个疑惑想请教一下，我记得tf的操作会首先放到gpu上，而我看咱们的tf接口只是实现了cpp。所以有些疑惑如果用tf调咱们的框架是否需要其它配置？

why not support python3

Good job!

but I have a question,why not support python3 ? python2 is outdated~

deepwalk

在用ppi数据测试deepwalk时，遇到如下错误：
Traceback (most recent call last):
File "scripts/DeepWalk.py", line 59, in
_, loss, metric_name, metric = model(source)
File "/usr/local/lib/python2.7/dist-packages/tf_euler/python/layers.py", line 62, in call
outputs = self.call(inputs)
File "scripts/DeepWalk.py", line 25, in call
loss, mrr = self.decoder(embedding, embedding_pos, embedding_negs)
File "scripts/DeepWalk.py", line 50, in decoder
mrr = tf_euler.metrics.mrr_score(logits, neg_logits)
File "/usr/local/lib/python2.7/dist-packages/tf_euler/python/metrics.py", line 40, in mrr_score
all_logits = tf.concat([negative_logits, logits], axis=2)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/array_ops.py", line 1124, in concat
return gen_array_ops.concat_v2(values=values, axis=axis, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_array_ops.py", line 1033, in concat_v2
"ConcatV2", values=values, axis=axis, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
return func(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1792, in init
control_input_ops)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1631, in _create_c_op
raise ValueError(str(e))
ValueError: Dimension 0 in both shapes must be equal, but are 8 and 1. Shapes are [8] and [1]. for 'deepwalk_1/mrr/concat' (op: 'ConcatV2') with input shapes: [128,6,1,8], [128,6,1,1], [] and with computed input tensors: input[2] = <2>.

分布式训练中模型保存到了哪里？

Euler支持传统的GCN实现么，还是只能用ScalableGCN?

如题，谢谢~

运行"快速开始"的例子时出错

我的python环境是2.7.5，操作系统是redhat7.2，执行
pip install euler-gl
安装，在运行
https://github.com/alibaba/euler/wiki/%E5%BF%AB%E9%80%9F%E5%BC%80%E5%A7%8B
这个例子时，跑到
python ppi_data.py
这一步，报这个错误：

Traceback (most recent call last):
File "ppi_data.py", line 32, in
from euler.tools import json2dat
File "/usr/lib/python2.7/site-packages/euler/init.py", line 20, in
from euler.python.service import *
File "/usr/lib/python2.7/site-packages/euler/python/service.py", line 27, in
_LIB = ctypes.CDLL(_LIB_PATH)
File "/usr/lib64/python2.7/ctypes/init.py", line 360, in init
self._handle = _dlopen(self._name, mode)
OSError: libhdfs.so.0.0.0: cannot open shared object file: No such file or directory

单机安装还需要配置hdfs么？

运行demo时报如下错误：tensorflow.python.framework.errors_impl.NotFoundError: /usr/local/lib/python2.7/dist-packages/tf_euler/python/euler_ops/libtf_euler.so: undefined symbol: _ZN10tensorflow12OpDefBuilder4AttrESs

其中tensorflow是源码安装的1.12,euler-gl是pip安装的，运行快速开始的demo时报了如下错误：
Traceback (most recent call last):
File "/usr/lib/python2.7/runpy.py", line 163, in _run_module_as_main
mod_name, _Error)
File "/usr/lib/python2.7/runpy.py", line 111, in _get_module_details
import(mod_name) # Do not catch exceptions initializing package
File "/usr/local/lib/python2.7/dist-packages/tf_euler/init.py", line 21, in
from tf_euler.python import encoders
File "/usr/local/lib/python2.7/dist-packages/tf_euler/python/encoders.py", line 24, in
from tf_euler.python import euler_ops
File "/usr/local/lib/python2.7/dist-packages/tf_euler/python/euler_ops/init.py", line 20, in
from tf_euler.python.euler_ops.base import *
File "/usr/local/lib/python2.7/dist-packages/tf_euler/python/euler_ops/base.py", line 31, in
_LIB_OP = tf.load_op_library(_LIB_PATH)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/load_library.py", line 60, in load_op_library
lib_handle = py_tf.TF_LoadLibrary(library_filename)
tensorflow.python.framework.errors_impl.NotFoundError: /usr/local/lib/python2.7/dist-packages/tf_euler/python/euler_ops/libtf_euler.so: undefined symbol: _ZN10tensorflow12OpDefBuilder4AttrESs

请教两个问题

1、采样是在训练时进行的吗，对比提前采样这样的好处是什么
2、在性能测试中，5台机器一共480个物理核，然而一个tf worker需要16个核，600个worker是如何分配资源的

谢谢！

why git clone Failed...THX!

git clone --recursive https://github.com/alibaba/euler.git
...
# Failed to recurse into submodule path 'third_party/grpc'

see https://github.com/IvanaXu/MyLearn/blob/master/Euler/001.install_error.md
SO I GOT IT BY PYPI.

euler会修改tensorlow安装吗

我的tensorflow是1.12的，docker版，装euler前是能用的，装完euler就不能用了。
安装方式是pip install euler-gl
之后再import tensorflow就会返回：
from tensorflow.python.keras._impl.keras.backend import abs
ImportError: cannot import name abs
请问何解?

关于LasGNN的参考文献

LasGNN提到的把node-wise转化成layer-wise的想法，在nips-2018的一篇文章Adaptive Sampling
Towards Fast Graph Representation Learning已经提过。是否考虑引用？另外一个相似的工作是fastGCN。

关于计算资源和复杂度的咨询

刚切入到图神经网络这块，之前也一直在调研图神经网络，效果不错，目前工作也涉及到这些，以下有几点比较关切，还望能够得到靠谱的建议。
1.关于计算资源的，如果是上亿规模的节点，以及与之相对应的量级的边，用多大的机器配置比较合适？
2.关于数据倒入的问题，怎么样能够快速生成适合euler格式的json数据？
3.spark里面的GraphX也是专门处理图的，图计算引擎是基于pregrel模式，存储是基于点分割，据说也是目前业界的主流。那么euler跟graphX之间的关系是怎么样的呢？相互补充，还是包含？
4.目前来看euler主要是图计算，图模型，图存储这块看到的不多，请问这是打算用其它工具吗？还是考虑后续开发和改进呢？
Thanks。

Doubts on Hadoop Support

I read the quick start introduction and it says that you are using a Hadoop 2.9.2 which is a pretty high version. Is there any requirements for Hadoop(HBase Hive…) version? Also is it support Apache Hadoop only? I currently have a cluster running CDH5.7(Hadoop 2.6), need I update my Hadoop or change it to an original Apache Hadoop?

Looking forward to reply, thank you very much!

LINE / DeepWalk之类的非监督算法如何启动？

我今天尝试了用euler在本地跑LINE算法生成embedding，图参照ppi_data.py的方法生成了，执行指令是：
python -m tf_euler --data_dir graph --model line --mode save_embedding --max_id 109915

2019-02-26 18:23:56.241471: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Not found: Key line_1/target_embedding/embeddings not found in checkpoint

请问是哪里出了问题？能否提供一个Line / Deepwalk算法的示例工程？

编译安装时遇到问题

主机环境：Ubuntu16.04
遇到问题：执行git clone --recursive https://github.com/alibaba/euler.git时，报错fatal: unable to access 'https://chromium.googlesource.com/chromium/llvm-project/llvm/lib/Fuzzer/': Failed to connect to chromium.googlesource.com port 443: Connection timed out
请问这个问题需要怎么解决，要配置代理服务器吗

关于数据 block

看到数据格式是block 如下所示，
{
"node_id": "顶点编号，int",
"node_type": "顶点类型，int",
"node_weight": "顶点权重，float",
"neighbor": {"边类型": {"邻居id": "权重", "...": "..."}, "...": "..."},
"uint64_feature": {"属性编号": ["int", "..."], "...": "..."},
"float_feature": {"属性编号": ["float", "..."], "...": "..."},
"binary_feature": {"属性编号": "string", "...": "..."},
"edge":[{
"src_id": "起点id, int",
"dst_id": "终点id, int",
"edge_type": "边类型, int",
"weight": "边权重, float",
"uint64_feature": {"属性编号": ["int", "..."], "...": ["int", "..."]},
"float_feature": {"属性编号": ["float", "..."], "...": ["float", "..."]},
"binary_feature": {"属性编号": "string", "...": "..."}
}, "..."]
}

请问一下是否支持自定义block，也就是不使用block中的所有键值对，如下：
{
"node_id": "顶点编号，int",
"node_type": "顶点类型，int",
"neighbor": {[]},
"edge":[{
"src_id": "起点id, int",
"dst_id": "终点id, int",

}, "..."]
}

graphsage 无监督 micro f1 结果

我直接使用 graphsage github 上的代码，跑一个 epoch，其他参数默认，得到 ppi 无监督的 microp f1是 0.7左右，数据集就是project官网上的，这个和论文以及你们测的都不符，你们有观察都这个问题吗？
还有就是你们有测试过，reddit 数据集的无监督 micro f1吗？

hdfs://xxx/ppi/ppi_train.id data error!

同一份数据本地验证是正常的，放到HDFS上跑分布式的报错，日志如下：

E0223 15:59:28.125142   831 graph_builder.cc:74] hdfs://xxx/ppi/ppi_train.id data error!
E0223 15:59:28.664247   830 graph_builder.cc:74] hdfs://xxx/ppi/ppi_test.id data error!
19/02/23 15:59:28 WARN hdfs.DFSClient: zero
E0223 15:59:28.777889   832 graph_builder.cc:74] hdfs://xxx/ppi/ppi_val.id data error!
E0223 15:59:28.994524   825 graph_builder.cc:74] hdfs://xxx/ppi/ppi-id_map.json data error!
E0223 15:59:29.763633   829 graph_builder.cc:74] hdfs://xxx/ppi/ppi_meta.json data error!
E0223 15:59:30.792249   823 graph_builder.cc:74] hdfs://xxx/ppi/ppi-class_map.json data error!
E0223 15:59:30.850896   824 compact_node.cc:296] edge group size list error
E0223 15:59:31.422977   824 graph_builder.cc:74] hdfs://xxx/ppi/ppi-feats.npy data error!
E0223 15:59:31.619906   826 graph_builder.cc:74] hdfs://xxx/ppi/ppi-walks.txt data error!
E0223 15:59:32.472931   822 graph_builder.cc:74] hdfs://xxx/ppi/ppi-G.json data error!
I0223 15:59:33.548483   827 graph_builder.cc:80] Load Done: hdfs://xxx/ppi/ppi_data.dat
E0223 16:00:10.703824   828 graph_builder.cc:74] hdfs://xxx/ppi/ppi_data.json data error!

Difference between compact and fast graph engine

Hi, Thanks for your work!

I'm curious about what's the difference between compact and fast graph engine. Could you elaborate more on this? Thanks.

lasgnn和lshne还不能使用吗?

看了下run_loop.py,lasgnn并没有加到里面,而lshne的超参都是固定的值,请问是什么原因呢?

Deepwalk/Node2vec 随机游走图算法的实现

非常感谢贵司提供这个开源项目！

我对于代码实现有两点疑问，还请不吝赐教：

欧拉自带的几个算法中，deepwalk和node2vec的随机游走的图算法的实现是不是直接调用的TensorFlow的随机游走，而没有用到欧拉自己的c++的库？
有没有一个完整的图算法的实现来展示欧拉c++的库的使用的整个流程呢？wiki里面的代码只有简单的遍历之类的，但是感觉欧拉的c++的执行流程和模型都和传统的图计算系统（例如powergraph）不太一样，不是那种定义好算子后就一遍执行到底的，而是需要用户不断调用欧拉的接口来进行交互，不知道我的理解是否正确？

预祝新年快乐！

编译出错

集群环境：Ubuntu16.04，Hadoop2.6，euler
遇到问题：
执行make -j 32 时
CMakeFiles/Makefile2:104: recipe for target 'CMakeFiles/linenoise.dir/all' failed

分布式训练

在用euler做分布式训练，启动bash中--euler_zk_path要设置什么参数？我分别尝试不设置--euler_zk_path和设置--euler_zk_path本地路径（/data/baymax/euler_zk），控制台和log也没有输出，是不是设置的问题？如果不是设置问题，怎么查看分布式训练过程（考虑既没有控制输出，本地log也没输出）？

分布式训练时报错

训练指令为：

python -m tf_euler --data_dir hdfs://zzz:8020/xxx/ppi --max_id 56944 --feature_idx 1 --feature_dim 50 --label_idx 0 --label_dim 121 --model graphsage_supervised --mode train

节点日志如下：

I0218 17:07:00.295877   356 remote_graph.cc:91] Initialize RemoteGraph, connect to server monitor: [127.0.0.1:2181, /tf_euler]
E0218 17:07:01.358556   674 zk_server_monitor.cc:150] ZK error when checking root node: connection loss.
could not find method isEncrypted from class org/apache/hadoop/fs/FileStatus with signature ()Z
hdfsListDirectory(/xxx/ppi): getFileInfoFromStat(0 out of 11) error:
java.lang.NoSuchMethodError: isEncrypted
WARNING: Logging before InitGoogleLogging() is written to STDERR
E0218 17:07:03.510931   670 graph_engine.cc:99] no file valid in dir: /xxx/ppi
I0218 17:07:03.510964   670 graph_service.cc:179] service init finish
E0218 17:07:03.510977   670 graph_service.cc:157] service error

例子中ppi数据+faiss检索部分执行报错

使用https://github.com/alibaba/euler/wiki/%E5%BF%AB%E9%80%9F%E5%BC%80%E5%A7%8B 中提供的embedding输出与在Faiss中进行检索部分的代码，与文档中的ppi示例数据执行报错，执行流程如下：

[lsy@localhost data]$ python -m tf_euler \
>   --data_dir ppi \
>   --max_id 56944 --feature_idx 1 --feature_dim 50 --label_idx 0 --label_dim 121 \
>   --model graphsage_supervised --mode save_embedding
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0227 04:06:31.182329 22828 graph_builder.cc:59] Load Done: ppi/ppi_data.dat
I0227 04:10:39.233194 22828 graph_builder.cc:109] Done: build all sampler
I0227 04:10:39.238440 22828 graph_builder.cc:112] Graph build finish
WARNING:tensorflow:From /usr/lib/python2.7/site-packages/tf_euler/python/layers.py:77: __init__ (from tensorflow.python.ops.init_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.initializers.variance_scaling instead with distribution=uniform to get equivalent behavior.
INFO:tensorflow:Graph was finalized.
2019-02-27 04:10:43.968531: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
2019-02-27 04:10:45.815944: W tensorflow/core/framework/allocator.cc:122] Allocation of 10240000 exceeds 10% of system memory.
2019-02-27 04:10:45.932727: W tensorflow/core/framework/allocator.cc:122] Allocation of 1024000 exceeds 10% of system memory.
2019-02-27 04:10:45.950182: W tensorflow/core/framework/allocator.cc:122] Allocation of 2621440 exceeds 10% of system memory.
2019-02-27 04:10:45.952823: W tensorflow/core/framework/allocator.cc:122] Allocation of 1024000 exceeds 10% of system memory.
2019-02-27 04:10:45.957001: W tensorflow/core/framework/allocator.cc:122] Allocation of 2621440 exceeds 10% of system memory.
[lsy@localhost data]$ cat faiss_test.py 
import faiss
import numpy as np

embedding = np.load('ckpt/embedding.npy')
index = faiss.IndexFlatIP(128)
index.add(embedding)
print(index.search(embedding[:5], 4))

[lsy@localhost data]$ python faiss_test.py    
Traceback (most recent call last):
  File "faiss_test.py", line 6, in <module>
    index.add(embedding)
  File "/usr/lib/python2.7/site-packages/faiss-0.1-py2.7.egg/faiss/__init__.py", line 102, in replacement_add
    assert d == self.d
AssertionError

关于数据分片的问题

请问数据分片在hdfs储存中如何体现？在已有hdfs的情况下，分片的意义是什么？

以及tf_euler.initialize_shared_graph是否可以认为是同时启动了一个图引擎（service）和一个worker？

谢谢！

无监督算法如LINE / DeepWalk等产生的embedding如何保存到HDFS？

看了源码，run_save_embedding接口貌似只支持写入到本地model_dir，并不支持HDFS路径？

CentOS 7环境下编译错误

换用阿里云的CentOS 7环境，按照build_wheel.sh一路跑下来，执行到cmake ..时出错，ErrorLog如下：

/usr/include/c++/4.8.2/bits/c++0x_warning.h:32:2: 错误：#error This file requires compiler and library support for the ISO C++ 2011 standard. This support is currently experimental, and must be enabled with the -std=c++11 or -std=gnu++11 compiler options.
 #error This file requires compiler and library support for the \

看样子是需要c11的特性，然而我发现CMakeLists.txt已经指定了，请问是哪里出问题了吗？

# Set C++11 as standard for the whole project
set(CMAKE_CXX_STANDARD 11)
set(CMAKE_POSITION_INDEPENDENT_CODE ON)
add_definitions(-D_GLIBCXX_USE_CXX11_ABI=0 -O3 -fno-omit-frame-pointer)

通过pip安装和编译安装有什么区别？

服务器上通过pip安装一切正常，但是通过编译安装总会在安装zookeeper的那行报错，请问是什么原因？

Unknown CMake command "protobuf_generate_grpc_cpp"

CMake Error at client/testing/CMakeLists.txt:1 (protobuf_generate_grpc_cpp):
  Unknown CMake command "protobuf_generate_grpc_cpp".


CMake Warning (dev) in CMakeLists.txt:
  No cmake_minimum_required command is present.  A line of code such as

    cmake_minimum_required(VERSION 3.11)

  should be added at the top of the file.  The version specified may be lower
  if you wish to support older CMake versions for this project.  For more
  information run "cmake --help-policy CMP0000".

cmake的时候报错了，请问是什么原因？

json数据生成

非常感谢开源这么好的框架，在使用过程中发现如果需要将传统的图数据格式转换为Euler可以接受的JSON数据比较麻烦，请问有增加更多数据接口的计划吗，或者转换工具的开发计划。

关于LasGNN的sampling策略的请教

你好！我看了下LasGNN的介绍（https://github.com/alibaba/euler/wiki/LasGNN ），
对于layer-wise的sample， LasGNN是根据网络中每一层学得的权重W进行加权采样的，但是，sampling这个操作不应该在预处理的时候做的嘛，如何根据训练过程的W进行sample？
不是特别理解，能否详细介绍下？
此外，我在代码中没太找到对应的操作代码，能否指明一下是在哪一部分？谢谢了！

euler如何支持hdfs

看到吗现在会判断hdfs_prefix='hdfs://'
若何使用viewfs:// ?

source = dataset.make_one_shot_iterator().get_next() save_embedding报错

找不到设备？我是分布式训练完之后，save_embedding过程中,出现这个错误，为啥可用设备名字变成下面这个列表
available devices are [ /job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:XLA_CPU:0 ].

异常栈如下：
InvalidArgumentError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a mismatch between the current graph and the graph from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Cannot assign a device for operation IteratorToStringHandle: Operation was explicitly assigned to /job:worker/task:0 but available devices are [ /job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:XLA_CPU:0 ]. Make sure the device specification refers to a valid device.
[[node IteratorToStringHandle (defined at /usr/lib/python2.7/site-packages/tf_euler/python/run_loop.py:163) = IteratorToStringHandle_device="/job:worker/task:0"]]

输入数据格式中，neighbor字段的内容貌似都可以从edge字段生成，是否冗余了？

输入数据格式中，neighbor字段的内容貌似可以从edge字段生成，是否冗余了？

请问什么时候可以支持MAC安装

在mac上安装时，报了这个错误：
OSError: dlopen($HOME/anaconda2/lib/python2.7/site-packages/euler_gl-0.1.0-py2.7.egg/euler/python/libeuler_service.so, 6): image not found

Any performance numbers?

Great work!
It would be great if some performance numbers(for Node2Vec and LINE) are provided.

输入数据的格式是否合理？

目前的输入JSON格式是要求每个点需要合并它的全部邻居节点，如果一个节点有非常大量的邻居（例如百万），那么这条记录的生成以及存储都有很大的问题，请问作为开发者你们是怎么看待的？在你们所落地的业务之中，是否也存在这种有非常大量邻居的节点呢？

”由于每个TensorFlow worker中都包含了其他worker可能会访问到的图数据“ 什么出现？

https://github.com/alibaba/euler/wiki/%E9%9B%86%E7%BE%A4%E4%BD%BF%E7%94%A8 提到过

最后，由于每个TensorFlow worker中都包含了其他worker可能会访问到的图数据，worker之间需要同步地退出。可以使用以下的Hook来达到这一目的：

什么情况下会发生？

Problem with model graphsage

运行如下命令时报错：
python2 -m tf_euler --data_dir cnet --max_id 40012 --model graphsage --model_dir graphsage --batch_size 256 --learning_rate 0.01 --num_epochs 10 --dim 256 --mode train --log_steps 100 --aggregator mean

WARNING: Logging before InitGoogleLogging() is written to STDERR
I0123 02:46:48.594172 3304 graph_builder.cc:59] Load Done: cnet/CausalNet.dat
I0123 02:46:50.339694 3304 graph_builder.cc:109] Done: build all sampler
I0123 02:46:50.339740 3304 graph_builder.cc:112] Graph build finish
WARNING:tensorflow:From /home/zyli/.local/lib/python2.7/site-packages/tf_euler/python/layers.py:77: init (from tensorflow.python.ops.init_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.initializers.variance_scaling instead with distribution=uniform to get equivalentbehavior.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from graphsage/model.ckpt-0
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 0 into graphsage/model.ckpt.
Traceback (most recent call last):
File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
"main", fname, loader, pkg_name)
File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/home/zyli/.local/lib/python2.7/site-packages/tf_euler/main.py", line 28, in
tf.app.run(run_loop.main)
File "/home/zyli/.local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "/home/zyli/.local/lib/python2.7/site-packages/tf_euler/python/run_loop.py", line340, in main
run_local(flags_obj, run_network_embedding)
File "/home/zyli/.local/lib/python2.7/site-packages/tf_euler/python/run_loop.py", line303, in run_local
run(flags_obj, master='', is_chief=True)
File "/home/zyli/.local/lib/python2.7/site-packages/tf_euler/python/run_loop.py", line292, in run_network_embedding
run_train(model, flags_obj, master, is_chief)
File "/home/zyli/.local/lib/python2.7/site-packages/tf_euler/python/run_loop.py", line139, in run_train
sess.run(train_op)
File "/home/zyli/.local/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 671, in run
run_metadata=run_metadata)
File "/home/zyli/.local/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 1156, in run
run_metadata=run_metadata)
File "/home/zyli/.local/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 1255, in run
raise six.reraise(*original_exc_info)
File "/home/zyli/.local/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 1240, in run
return self._sess.run(*args, **kwargs)
File "/home/zyli/.local/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 1312, in run
run_metadata=run_metadata)
File "/home/zyli/.local/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 1076, in run
return self._sess.run(*args, **kwargs)
File "/home/zyli/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 929, in run
run_metadata_ptr)
File "/home/zyli/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1152, in _run
feed_dict_tensor, options, run_metadata)
File "/home/zyli/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
run_metadata)
File "/home/zyli/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Reshape cannot infer the missing input size for an empty tensor unless all specified input sizes are non-zero
[[node graphsage_1/sageencoder_1/Reshape_7 (defined at /home/zyli/.local/lib/python2.7/site-packages/tf_euler/python/encoders.py:125) = Reshape[T=DT_FLOAT, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"](graphsage_1/sageencoder_1/GetDenseFeature_1, graphsage_1/sageencoder_2/Reshape_7/shape)]]

Caused by op u'graphsage_1/sageencoder_1/Reshape_7', defined at:
File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
"main", fname, loader, pkg_name)
File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/home/zyli/.local/lib/python2.7/site-packages/tf_euler/main.py", line 28, in
tf.app.run(run_loop.main)
File "/home/zyli/.local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "/home/zyli/.local/lib/python2.7/site-packages/tf_euler/python/run_loop.py", line340, in main
run_local(flags_obj, run_network_embedding)
File "/home/zyli/.local/lib/python2.7/site-packages/tf_euler/python/run_loop.py", line303, in run_local
run(flags_obj, master='', is_chief=True)
File "/home/zyli/.local/lib/python2.7/site-packages/tf_euler/python/run_loop.py", line292, in run_network_embedding
run_train(model, flags_obj, master, is_chief)
File "/home/zyli/.local/lib/python2.7/site-packages/tf_euler/python/run_loop.py", line107, in run_train
_, loss, metric_name, metric = model(source)
File "/home/zyli/.local/lib/python2.7/site-packages/tf_euler/python/layers.py", line 62, in call
outputs = self.call(inputs)
File "/home/zyli/.local/lib/python2.7/site-packages/tf_euler/python/models/base.py", line 99, in call
embedding = self.target_encoder(src)
File "/home/zyli/.local/lib/python2.7/site-packages/tf_euler/python/models/graphsage.py", line 58, in target_encoder
return self._target_encoder(inputs)
File "/home/zyli/.local/lib/python2.7/site-packages/tf_euler/python/layers.py", line 62, in call
outputs = self.call(inputs)
File "/home/zyli/.local/lib/python2.7/site-packages/tf_euler/python/encoders.py", line125, in call
h = aggregator((hidden[hop], tf.reshape(hidden[hop + 1], neigh_shape)))
File "/home/zyli/.local/lib/python2.7/site-packages/tensorflow/python/ops/gen_array_ops.py", line 6482, in reshape
"Reshape", tensor=tensor, shape=shape, name=name)
File "/home/zyli/.local/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/zyli/.local/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
return func(*args, **kwargs)
File "/home/zyli/.local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
op_def=op_def)
File "/home/zyli/.local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1770, in init
self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): Reshape cannot infer the missing input size for an empty tensor unless all specified input sizes are non-zero
[[node graphsage_1/sageencoder_1/Reshape_7 (defined at /home/zyli/.local/lib/python2.7/site-packages/tf_euler/python/encoders.py:125) = Reshape[T=DT_FLOAT, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"](graphsage_1/sageencoder_1/GetDenseFeature_1, graphsage_1/sageencoder_2/Reshape_7/shape)]]

应该是encoders.py:125报错，cannot infer the missing input size for an empty tensor unless all specified input sizes are non-zero. 我对tensorflow不熟悉，请问该怎么修复？

关于测试集中正负样本的个数

首先感谢开源这么好的仓库！
因为我的数据中还没有节点的特征，所以就将 float_feature的第0维设成了[0]和[1]这样的一维二分类特征，将float_feature的第1维设置成了[1].也就是

buf["float_feature"][0] = [class_map[node]]
buf["float_feature"][1] = [1]

边只有类别信息没有特征信息。

生成的data.json文件为：

{"node_weight": 1, "uint64_feature": {}, "float_feature": {"0": [1], "1": [1]}, "edge": [{"src_id": 0, "weight": 1, "uint64_feature": {}, "float_feature": {}, "dst_id": 1210723, "edge_type": 0, "binary_feature": {}}, {"src_id": 0, "weight": 1, "uint64_feature": {}, "float_feature": {}, "dst_id": 979074, "edge_type": 0, "binary_feature": {}}

但是在打印结果(tp,fp,fn,tp)的时候发现这四个值的总和并不等于test_id中的个数，而且tp的总数远多于test_id中应该为正样本的个数。想请问：1. 总数不匹配的问题是由于采样引起的吗？2. 已排除class_map文件出错的可能性，但是tp值比label为1的数据多出来很多 3. 对于二分类的问题是否因该将标签赋值为[0],[1]，还是有其他标注方法呢？

内存消耗

请问对于10亿节点，数百亿边的大图来说。用GCN算法需要多少内存

请问是否能提供分布式数据生成的用例？

我现在希望从Edge转化到Block，因为我本身也是从分布式的Edge文件转的（而不是JSON文件），我尝试着用Spark Scala写了个demo，但是输出的文件非常小（只有M级别），请问是哪里出了问题么？同样的伪代码在Python本地是work的。

    val blockRDD = nEdges.map(r => (r.srcId, Set((r.dstId.toLong, r.attr.toFloat)))).reduceByKey(_ ++ _, 1000)
      .map { case (nodeId, neighbors: Set[(Long, Float)]) => {
        val block = new Block
        block.setNode_id(nodeId)
        block.setNode_weight(1.0f)
        block.setNodeType(0)
        val neighbors_ = neighbors.toArray
        val neighbor = new java.util.HashMap[Integer, java.util.HashMap[java.lang.Long, java.lang.Float]]()
        for (i <- 0 to meta.getEdge_type_num - 1) {
          neighbor.put(i, new java.util.HashMap[java.lang.Long, java.lang.Float])
        }
        val edgeList = new java.util.ArrayList[EdgeItem]()
        val parser = new BlockParser(meta)
        for (i <- 0 to neighbors_.length - 1) {
          val dstId = neighbors_(i)._1
          val weight = neighbors_(i)._2
          neighbor.get(0).put(dstId, weight)
          val edge = new EdgeItem
          edge.setSrc_id(nodeId)
          edge.setDst_id(dstId)
          edge.setWeight(weight)
          edgeList.add(edge)
        }
        block.setNeighbor(neighbor)
        block.setEdge(edgeList)
        parser.BlockJsonToBytes(block)
      }
      }