shenweichen / graphembedding

3.6K 63.0 989.0 681 KB

Implementation and experiments of graph embedding algorithms.

License: MIT License

Python 100.00%

deepwalk node2vec line sdne struc2vec graph graphembedding

graphembedding's Issues

SDNE

WARNING:tensorflow:From /home/ant/.conda/envs/lxh_tf/lib/python3.7/site-packages/tensorflow_core/python/ops/resource_variable_ops.py:1630: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
Traceback (most recent call last):
  File "sdne_wiki.py", line 54, in <module>
    model = SDNE(G, hidden_size=[256, 128],)
  File "/home/ant/researchInstitute/luoxianhao/ge/shenweichen/graphEmbedding/ge/models/sdne.py", line 93, in __init__
    self.reset_model()
  File "/home/ant/researchInstitute/luoxianhao/ge/shenweichen/graphEmbedding/ge/models/sdne.py", line 101, in reset_model
    self.model.compile(opt, [l_2nd(self.beta), l_1st(self.alpha)])
  File "/home/ant/.conda/envs/lxh_tf/lib/python3.7/site-packages/tensorflow_core/python/training/tracking/base.py", line 457, in _method_wrapper
    result = method(self, *args, **kwargs)
  File "/home/ant/.conda/envs/lxh_tf/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training.py", line 373, in compile
    self._compile_weights_loss_and_weighted_metrics()
  File "/home/ant/.conda/envs/lxh_tf/lib/python3.7/site-packages/tensorflow_core/python/training/tracking/base.py", line 457, in _method_wrapper
    result = method(self, *args, **kwargs)
  File "/home/ant/.conda/envs/lxh_tf/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training.py", line 1652, in _compile_weights_loss_and_weighted_metrics
    self.total_loss = self._prepare_total_loss(masks)
  File "/home/ant/.conda/envs/lxh_tf/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training.py", line 1712, in _prepare_total_loss
    per_sample_losses = loss_fn.call(y_true, y_pred)
  File "/home/ant/.conda/envs/lxh_tf/lib/python3.7/site-packages/tensorflow_core/python/keras/losses.py", line 216, in call
    return self.fn(y_true, y_pred, **self._fn_kwargs)
  File "/home/ant/researchInstitute/luoxianhao/ge/shenweichen/graphEmbedding/ge/models/sdne.py", line 37, in loss_2nd
    b_[y_true != 0] = beta
TypeError: 'Tensor' object does not support item assignment

Environment: TF 1.15

How should the following line of code be changed?
b_[y_true != 0] = beta
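
One way to avoid item assignment on a symbolic tensor is to build the weight matrix with arithmetic only. The sketch below shows the idea on plain Python lists; `weight_mask` is a hypothetical helper, not part of the repo. With TensorFlow, the same expression could presumably be written as `1 + (beta - 1) * tf.cast(tf.not_equal(y_true, 0), tf.float32)`, but that has not been checked against sdne.py.

```python
def weight_mask(y_true, beta):
    """Return B where B[i][j] is beta if y_true[i][j] != 0, else 1.

    Uses only comparison and arithmetic, so no element is assigned in place.
    """
    return [[1.0 + (beta - 1.0) * float(value != 0) for value in row]
            for row in y_true]


# Hypothetical 2 x 3 adjacency batch: non-zero entries are observed edges.
y_true = [[0, 1, 0],
          [2, 0, 0]]
mask = weight_mask(y_true, beta=5.0)
```

Because the mask is computed rather than mutated, the same formula works for values that cannot be modified in place.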

I read your multi-label classification code and learned a lot. Is there code for link prediction as well?

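Fails with many nodes; uses too much memory

With 480,000 nodes and 32 GB of RAM it does not run; _create_A_L reports insufficient memory.
Can the way the matrix is constructed in _create_A_L be optimized? Storing a sparse matrix this way is too wasteful.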
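
A rough back-of-the-envelope estimate shows why a dense N x N matrix cannot fit at this scale. It assumes 8-byte floats and 4-byte indices, and the edge count used for the sparse case is hypothetical, since the report does not give one. It is not a measurement of what _create_A_L actually allocates.

```python
def dense_matrix_bytes(num_nodes, bytes_per_entry=8):
    """Memory for a fully materialized num_nodes x num_nodes matrix."""
    return num_nodes * num_nodes * bytes_per_entry


def sparse_coo_bytes(num_edges, bytes_per_entry=8, bytes_per_index=4):
    """Memory for a symmetric matrix stored as (value, row, col) triples.

    Each undirected edge is stored twice, once per direction.
    """
    return 2 * num_edges * (bytes_per_entry + 2 * bytes_per_index)


dense = dense_matrix_bytes(480_000)        # the node count from the report
sparse = sparse_coo_bytes(5_000_000)       # hypothetical edge count
print(dense / 1e9, "GB dense vs", sparse / 1e9, "GB sparse")
```

Under these assumptions the dense matrix needs about 1843 GB, so any step that converts the full matrix to dense form will run out of memory regardless of how the sparse input was built.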

About the wiki data

Do none of these examples set a weight?

About top_k_list

A question:
classify.py defines top_k_list = [len(l) for l in Y]
Is each element of top_k_list just the length of the corresponding element of the test set Y?
What is top_k_list used for?
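
In the usual multi-label evaluation for node embeddings, the number of true labels of each test node is given to the classifier, which then predicts that many labels by taking the most probable ones. The sketch below illustrates that idea; `predict_top_k` and the sample data are hypothetical and are not copied from classify.py.

```python
def predict_top_k(probabilities, top_k_list, classes):
    """For each sample, return the k most probable labels, k taken per sample."""
    predictions = []
    for probs, k in zip(probabilities, top_k_list):
        ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
        predictions.append(sorted(classes[i] for i in ranked[:k]))
    return predictions


# Hypothetical test labels: the first node has two labels, the second has one.
Y_test = [["a", "c"], ["b"]]
top_k_list = [len(labels) for labels in Y_test]

# Hypothetical per-class probabilities from a one-vs-rest classifier.
probs = [[0.5, 0.1, 0.4],
         [0.2, 0.7, 0.1]]
pred = predict_top_k(probs, top_k_list, classes=["a", "b", "c"])
```

So each entry of top_k_list is indeed the label count of one test node, and it controls how many labels are predicted for that node.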

Error when running deepwalk_wiki.py: __init__() got an unexpected keyword argument 'size'

How was this solved?

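How to use GPU acceleration

Hello, node2vec runs very slowly when there are many nodes. How can I use GPU acceleration? Thanks.

How to run faster with the GPU version of Keras

It currently seems to run on the CPU and is very slow. How can I use GPU acceleration?

LINE: the generated probability distribution does not sum to 1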

GraphEmbedding/ge/models/line.py

Lines 134 to 135 in c186681

 norm_prob = [self.graph[edge[0]][edge[1]].get('weight', 1.0) * 

 numEdges / total_sum for edge in self.graph.edges()]

Why is this multiplied by numEdges? Summed up, the total probability is then no longer 1.
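
A small worked example with made-up weights shows the difference between the two normalizations. Dividing by the total gives values that sum to 1. Multiplying by the number of edges as well gives values whose mean is 1 and whose sum equals the number of edges. The alias method uses the mean-1 form internally when it splits entries into "small" and "large", so which form is correct as input depends on whether create_alias_table performs that scaling itself.

```python
# Hypothetical edge weights, chosen so the arithmetic is exact.
weights = [1.0, 2.0, 3.0, 2.0]
num_edges = len(weights)
total = sum(weights)

# Plain probabilities: sum to 1.
probs = [w / total for w in weights]

# The form used in the quoted lines: sum to num_edges, mean 1.
scaled = [w * num_edges / total for w in weights]

print(sum(probs), sum(scaled))
```

If the table builder multiplies its input by the number of entries again, passing the already scaled values would apply the scaling twice.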

About random sampling in DeepWalk

    def deepwalk_walk(self, walk_length, start_node):

        walk = [start_node]

        while len(walk) < walk_length:
            cur = walk[-1]
            cur_nbrs = list(self.G.neighbors(cur))
            if len(cur_nbrs) > 0:
                walk.append(random.choice(cur_nbrs))
            else:
                break
        return walk

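I think the function above may not be quite right: it makes no use of the transition probabilities between nodes. Those probabilities can be computed from the graph, so wouldn't it be better to use them?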
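
On an unweighted graph, choosing a neighbor uniformly already matches the transition probabilities. For weighted graphs, a walk that respects edge weights could look like the sketch below. It uses a plain dict-of-dicts graph rather than the repo's graph object, and `weighted_walk` is a hypothetical name.

```python
import random


def weighted_walk(graph, start_node, walk_length, rng=random):
    """Random walk where the next node is drawn proportionally to edge weight.

    graph maps node -> {neighbor: weight}.
    """
    walk = [start_node]
    while len(walk) < walk_length:
        cur = walk[-1]
        nbrs = list(graph.get(cur, {}))
        if not nbrs:
            break  # dead end: stop early, as in the unweighted version
        weights = [graph[cur][n] for n in nbrs]
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk


# Hypothetical graph: the edge a -> c has weight 0 and is never taken.
graph = {"a": {"b": 1.0, "c": 0.0}, "b": {"a": 1.0}, "c": {}}
walk = weighted_walk(graph, "a", 5, random.Random(0))
```

`random.choices` recomputes cumulative weights on every step, so for large graphs a precomputed alias table per node would be faster.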

About fastdtw

Traceback (most recent call last):
  File "/Users/yangfengyu/Desktop/GraphEmbedding-master/examples/line_wiki.py", line 4, in <module>
    from ge.classify import read_node_label, Classifier
  File "/Users/yangfengyu/Desktop/GraphEmbedding-master/ge/__init__.py", line 1, in <module>
    from .models import *
  File "/Users/yangfengyu/Desktop/GraphEmbedding-master/ge/models/__init__.py", line 5, in <module>
    from .struc2vec import Struc2Vec
  File "/Users/yangfengyu/Desktop/GraphEmbedding-master/ge/models/struc2vec.py", line 28, in <module>
    from fastdtw import fastdtw
ModuleNotFoundError: No module named 'fastdtw'

I ran the experiment locally but keep getting the error above.
Is the fastdtw module part of this project or a third-party library? I searched the whole project and could not find it.
Looking forward to your reply!
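
fastdtw is a third-party package distributed on PyPI, so it has to be installed separately. A quick way to see which dependencies are missing before running an example is sketched below; the list of names is only an illustration and not the project's official requirements.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be imported here."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Illustrative list of top-level modules; adjust to what the example imports.
required = ["fastdtw", "joblib", "json"]
print("missing:", missing_modules(required))
```

Anything reported as missing can then be installed with pip into the same environment that runs the example.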

SDNE Examples unable to run

I've tried TensorFlow versions 1.15 and >2 and get this error with both. Were there breaking changes to this repo? If you or anyone else doesn't get these errors, could you share your environment configuration?

(py3SDNE3) mac0632:examples patrick.mullen$ python sdne_wiki.py
WARNING:tensorflow:From /Users/patrick.mullen/aws/py3SDNE3/lib/python3.7/site-packages/tensorflow_core/python/ops/resource_variable_ops.py:1630: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
Traceback (most recent call last):
  File "sdne_wiki.py", line 49, in <module>
    model = SDNE(G, hidden_size=[256, 128],)
  File "/Users/patrick.mullen/aws/py3SDNE3/lib/python3.7/site-packages/ge-0.0.0-py3.7.egg/ge/models/sdne.py", line 93, in __init__
  File "/Users/patrick.mullen/aws/py3SDNE3/lib/python3.7/site-packages/ge-0.0.0-py3.7.egg/ge/models/sdne.py", line 101, in reset_model
  File "/Users/patrick.mullen/aws/py3SDNE3/lib/python3.7/site-packages/tensorflow_core/python/training/tracking/base.py", line 457, in _method_wrapper
    result = method(self, *args, **kwargs)
  File "/Users/patrick.mullen/aws/py3SDNE3/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training.py", line 373, in compile
    self._compile_weights_loss_and_weighted_metrics()
  File "/Users/patrick.mullen/aws/py3SDNE3/lib/python3.7/site-packages/tensorflow_core/python/training/tracking/base.py", line 457, in _method_wrapper
    result = method(self, *args, **kwargs)
  File "/Users/patrick.mullen/aws/py3SDNE3/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training.py", line 1652, in _compile_weights_loss_and_weighted_metrics
    self.total_loss = self._prepare_total_loss(masks)
  File "/Users/patrick.mullen/aws/py3SDNE3/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training.py", line 1712, in _prepare_total_loss
    per_sample_losses = loss_fn.call(y_true, y_pred)
  File "/Users/patrick.mullen/aws/py3SDNE3/lib/python3.7/site-packages/tensorflow_core/python/keras/losses.py", line 216, in call
    return self.fn(y_true, y_pred, **self._fn_kwargs)
  File "/Users/patrick.mullen/aws/py3SDNE3/lib/python3.7/site-packages/ge-0.0.0-py3.7.egg/ge/models/sdne.py", line 36, in loss_2nd
  File "<__array_function__ internals>", line 6, in ones_like
  File "/Users/patrick.mullen/aws/py3SDNE3/lib/python3.7/site-packages/numpy-1.18.2-py3.7-macosx-10.14-x86_64.egg/numpy/core/numeric.py", line 278, in ones_like
    res = empty_like(a, dtype=dtype, order=order, subok=subok, shape=shape)
  File "<__array_function__ internals>", line 6, in empty_like
  File "/Users/patrick.mullen/aws/py3SDNE3/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 736, in __array__
    " array.".format(self.name))
NotImplementedError: Cannot convert a symbolic Tensor (2nd_target:0) to a numpy array.

nodevec

node2vec keyerror(' ',' ')???

SyntaxError: invalid syntax in File "setup.py" line 16

(venv) ➜  GraphEmbedding git:(master) python setup.py install
  File "setup.py", line 16
    `tensorflow`
    ^
SyntaxError: invalid syntax

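LINE sampling question

In lines 111 to 137 of line.py, when the node alias table is built, norm_prob sums to 1, and create_alias_table then rescales norm_prob to have a mean of 1. Why does norm_prob already have a mean of 1 when the edge alias table is built?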

    def _gen_sampling_table(self):

        # create sampling table for vertex
        power = 0.75
        numNodes = self.node_size
        node_degree = np.zeros(numNodes)  # out degree
        node2idx = self.node2idx

        for edge in self.graph.edges():
            node_degree[node2idx[edge[0]]
                        ] += self.graph[edge[0]][edge[1]].get('weight', 1.0)

        total_sum = sum([math.pow(node_degree[i], power)
                         for i in range(numNodes)])
        norm_prob = [float(math.pow(node_degree[j], power)) /
                     total_sum for j in range(numNodes)]

        self.node_accept, self.node_alias = create_alias_table(norm_prob)

        # create sampling table for edge
        numEdges = self.graph.number_of_edges()
        total_sum = sum([self.graph[edge[0]][edge[1]].get('weight', 1.0)
                         for edge in self.graph.edges()])
        norm_prob = [self.graph[edge[0]][edge[1]].get('weight', 1.0) *
                     numEdges / total_sum for edge in self.graph.edges()]

        self.edge_accept, self.edge_alias = create_alias_table(norm_prob)

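Is Windows not supported? Installation fails

GBK encoding is not supported

Why does SDNE report insufficient memory?

When running SDNE I run out of memory even though I have 128 GB of RAM. Why is that? I have already made the network smaller.

Is there anything else I can do, and how is the required memory calculated? Thanks.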

Adding Algorithms into https://github.com/benedekrozemberczki/karateclub

KarateClub contains many algorithms that may be worth a look, including the common algorithms for community graph embedding. Conversely, the algorithms in this repo are written in a way that would allow them to be included in that repo.

report the results on all datasets

Results of node2vec, deepwalk, line, sdne and struc2vec on all datasets. I hope this helps anyone who is interested in this project.

wiki

Alg	micro	macro	samples	weighted	acc	NMI
node2vec	0.7447	0.6771	0.7193	0.7450	0.6279	0.3536
deepwalk	0.7307	0.6579	0.7058	0.7296	0.6091	0.3416
line	0.5059	0.2461	0.4536	0.4523	0.3160	0.0798
sdne	0.6916	0.5119	0.6528	0.6718	0.5530	0.1801
struc2vec	0.4512	0.1249	0.3933	0.3383	0.2308	0.0516

brazil

Alg	micro	macro	samples	weighted	acc	NMI
node2vec	0.1481	0.1579	0.1481	0.1648	0.1481	0.0442
deepwalk	0.1852	0.1694	0.1852	0.2004	0.1852	0.0471
line	0.4444	0.4167	0.4444	0.4753	0.4444	0.2822
sdne	0.5926	0.5814	0.5926	0.5928	0.5926	0.4041
struc2vec	0.7778	0.7739	0.7778	0.7762	0.7778	0.3906

europe

Alg	micro	macro	samples	weighted	acc	NMI
node2vec	0.4125	0.4156	0.4125	0.4209	0.4125	0.0155
deepwalk	0.4375	0.4358	0.4375	0.4347	0.4375	0.0180
line	0.5000	0.4983	0.5000	0.5016	0.5000	0.1186
sdne	0.5000	0.4818	0.5000	0.4916	0.5000	0.1714
struc2vec	0.5375	0.5247	0.5375	0.5294	0.5375	0.0783

usa

Alg	micro	macro	samples	weighted	acc	NMI
node2vec	0.5420	0.5278	0.5420	0.5351	0.5420	0.0822
deepwalk	0.5504	0.5394	0.5504	0.5472	0.5504	0.0910
line	0.4160	0.4032	0.4160	0.4175	0.4160	0.1660
sdne	0.6092	0.5819	0.6092	0.5971	0.6092	0.2028
struc2vec	0.5210	0.5040	0.5210	0.5211	0.5210	0.0702

Could you provide the source of the data? I would like to understand its background and meaning.

joblib backend issue

Preprocess transition probs...
[Parallel(n_jobs=30)]: Using backend MultiprocessingBackend with 30 concurrent workers.
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 567, in __call__
    return self.func(*args, **kwargs)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/joblib/parallel.py", line 225, in __call__
    for func, args, kwargs in self.items]
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/joblib/parallel.py", line 225, in <listcomp>
    for func, args, kwargs in self.items]
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/ge/walker.py", line 88, in _simulate_walks
    walk_length=walk_length, start_node=v))
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/ge/walker.py", line 56, in node2vec_walk
    next_node = cur_nbrs[alias_sample(alias_edges[edge][0],
KeyError: (9836, 7324)
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "blog_node.py", line 69, in <module>
    model = Node2Vec(G, 10, 80, workers=30,p=0.25,q=2 )
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/ge/models/node2vec.py", line 39, in __init__
    num_walks=num_walks, walk_length=walk_length, workers=workers, verbose=1)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/ge/walker.py", line 72, in simulate_walks
    partition_num(num_walks, workers))
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/joblib/parallel.py", line 934, in __call__
    self.retrieve()
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/joblib/parallel.py", line 833, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "/home/ubuntu/anaconda3/lib/python3.7/multiprocessing/pool.py", line 657, in get

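After downloading, I changed the joblib backend to multiprocessing, but whether I use the default backend or another one, I get a KeyError on what looks like a random pair of nodes. I am using the http://socialcomputing.asu.edu/datasets/BlogCatalog3 dataset, and both struc2vec and node2vec hit this problem. Any advice?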
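
One possible cause, not confirmed by the traceback, is that the per-edge table is keyed by ordered pairs while an undirected graph lists each edge only once. A walk that traverses the edge in the opposite direction would then look up a key that was never stored. The sketch below shows the difference; `directed_edge_keys` is a hypothetical helper and the edge list is made up around the node pair from the error.

```python
def directed_edge_keys(edges, directed=False):
    """Return every (previous, current) key a walk may need to look up.

    For an undirected graph each edge must be registered in both directions.
    """
    keys = set()
    for u, v in edges:
        keys.add((u, v))
        if not directed:
            keys.add((v, u))
    return keys


# Hypothetical edge list: the edge is stored as (7324, 9836) only.
edges = [(7324, 9836), (7324, 11)]
one_way = directed_edge_keys(edges, directed=True)
both_ways = directed_edge_keys(edges, directed=False)
```

If this is the cause, loading the data as a directed graph with both directions present, or registering both orientations when the table is built, would avoid the missing key.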

classify

Hello, what is the TopKRanker class for?

cannot find random_walks.pkl

in deepwalk.py

`
def train(self, embed_size=128, window_size=5, workers=3, iter=5, **kwargs):

    sentences = pd.read_pickle('random_walks.pkl')
    kwargs["sentences"] = sentences
    kwargs["min_count"] = kwargs.get("min_count", 0)
    kwargs["size"] = embed_size
    kwargs["sg"] = 1  # skip gram
    kwargs["hs"] = 1  # deepwalk use Hierarchical Softmax
    kwargs["workers"] = workers
    kwargs["window"] = window_size
    kwargs["iter"] = iter

`
I cannot find random_walks.pkl.

Could you provide it?
Thanks!

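Ran the karate network; the vectors do not match the nodes

It seems nodetype is str when the edge data is read, but changing it to int raises an error. How can I make the order of the embedding vectors match the nodes?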
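
If the embeddings come back as a mapping from node label to vector, which is an assumption here, the rows can be put in any desired order by looking them up explicitly instead of relying on iteration order. The sketch keeps string keys and sorts numerically; the sample values are made up.

```python
def embeddings_to_rows(embeddings, node_order):
    """Return vectors in the given node order, looking each node up by label."""
    return [embeddings[str(node)] for node in node_order]


# Hypothetical embeddings keyed by string node labels.
embeddings = {"2": [0.2, 0.2], "10": [1.0, 1.0], "1": [0.1, 0.1]}

# Sort numerically, not lexically ("10" would sort before "2" as a string).
node_order = sorted(int(node) for node in embeddings)
rows = embeddings_to_rows(embeddings, node_order)
```

Row i of the result then belongs to node_order[i], independent of the order in which the model stored the vectors.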

ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory

Is it possible not to use CUDA 9.0? If I use CUDA 10, which code needs to be changed?

ZeroDivisionError

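When applying the node2vec code to a new dataset that contains edge weights of 0, a ZeroDivisionError occurs. I modified lines 185 and 186 of walker.py, adding a try/except statement and appending items to an empty list instead of using the original list comprehension, but it still fails and I don't know why. Thanks.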
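
A division by zero during normalization happens when all weights around a node sum to 0. Rather than catching the exception, the sum can be checked first. The sketch below falls back to a uniform distribution in that case, which is one possible policy and not necessarily the right one for every dataset. `normalize_weights` is a hypothetical helper.

```python
def normalize_weights(weights):
    """Normalize non-negative weights to sum to 1.

    Falls back to a uniform distribution when the total is zero.
    """
    if not weights:
        return []
    total = sum(weights)
    if total <= 0:
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in weights]


print(normalize_weights([0.0, 0.0]))   # all-zero weights around a node
print(normalize_weights([1.0, 3.0]))   # ordinary case
```

Another option is to drop zero-weight edges before building the graph, so that they can never be sampled.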

unsupported operand type(s) for +: 'int' and 'str'

I got this issue using struc2vec and node2vec methods
`---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
in
1 model_struc2vec = ge.Struc2Vec(G, 10, 80, workers=4, verbose=40, ) #init model
----> 2 model_struc2vec.train(window_size = 5, iter = 3)# train model
3 embeddings_struc2vec = model_struc3vec.get_embeddings()# get embedding vectors

/anaconda3/envs/python36/lib/python3.6/site-packages/ge-0.0.0-py3.6.egg/ge/models/struc2vec.py in train(self, embed_size, window_size, workers, iter)
114 print("Learning representation...")
115 model = Word2Vec(sentences, size=embed_size, window=window_size, min_count=0, hs=1, sg=1, workers=workers,
--> 116 iter=iter)
117 print("Learning representation done!")
118 self.w2v_model = model

/anaconda3/envs/python36/lib/python3.6/site-packages/gensim-3.6.0-py3.6-macosx-10.7-x86_64.egg/gensim/models/word2vec.py in init(self, sentences, corpus_file, size, alpha, window, min_count, max_vocab_size, sample, seed, workers, min_alpha, sg, hs, negative, ns_exponent, cbow_mean, hashfxn, iter, null_word, trim_rule, sorted_vocab, batch_words, compute_loss, callbacks, max_final_vocab)
765 callbacks=callbacks, batch_words=batch_words, trim_rule=trim_rule, sg=sg, alpha=alpha, window=window,
766 seed=seed, hs=hs, negative=negative, cbow_mean=cbow_mean, min_alpha=min_alpha, compute_loss=compute_loss,
--> 767 fast_version=FAST_VERSION)
768
769 def _do_train_epoch(self, corpus_file, thread_id, offset, cython_vocab, thread_private_mem, cur_epoch,

/anaconda3/envs/python36/lib/python3.6/site-packages/gensim-3.6.0-py3.6-macosx-10.7-x86_64.egg/gensim/models/base_any2vec.py in init(self, sentences, corpus_file, workers, vector_size, epochs, callbacks, batch_words, trim_rule, sg, alpha, window, seed, hs, negative, ns_exponent, cbow_mean, min_alpha, compute_loss, fast_version, **kwargs)
757 raise TypeError("You can't pass a generator as the sentences argument. Try an iterator.")
758
--> 759 self.build_vocab(sentences=sentences, corpus_file=corpus_file, trim_rule=trim_rule)
760 self.train(
761 sentences=sentences, corpus_file=corpus_file, total_examples=self.corpus_count,

/anaconda3/envs/python36/lib/python3.6/site-packages/gensim-3.6.0-py3.6-macosx-10.7-x86_64.egg/gensim/models/base_any2vec.py in build_vocab(self, sentences, corpus_file, update, progress_per, keep_raw_vocab, trim_rule, **kwargs)
941 trim_rule=trim_rule, **kwargs)
942 report_values['memory'] = self.estimate_memory(vocab_size=report_values['num_retained_words'])
--> 943 self.trainables.prepare_weights(self.hs, self.negative, self.wv, update=update, vocabulary=self.vocabulary)
944
945 def build_vocab_from_freq(self, word_freq, keep_raw_vocab=False, corpus_count=None, trim_rule=None, update=False):

/anaconda3/envs/python36/lib/python3.6/site-packages/gensim-3.6.0-py3.6-macosx-10.7-x86_64.egg/gensim/models/word2vec.py in prepare_weights(self, hs, negative, wv, update, vocabulary)
1820 # set initial input/projection and hidden weights
1821 if not update:
-> 1822 self.reset_weights(hs, negative, wv)
1823 else:
1824 self.update_weights(hs, negative, wv)

/anaconda3/envs/python36/lib/python3.6/site-packages/gensim-3.6.0-py3.6-macosx-10.7-x86_64.egg/gensim/models/word2vec.py in reset_weights(self, hs, negative, wv)
1837 for i in xrange(len(wv.vocab)):
1838 # construct deterministic seed from word AND seed argument
-> 1839 wv.vectors[i] = self.seeded_vector(wv.index2word[i] + str(self.seed), wv.vector_size)
1840 if hs:
1841 self.syn1 = zeros((len(wv.vocab), self.layer1_size), dtype=REAL)

TypeError: unsupported operand type(s) for +: 'int' and 'str'
`
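
The last frame fails on `wv.index2word[i] + str(self.seed)`, which suggests the walk tokens are integers while gensim expects strings. Converting the walks before training is one way around this. The sketch is independent of the repo's classes, and `walks_to_sentences` is a hypothetical name.

```python
def walks_to_sentences(walks):
    """Convert every node id in every walk to a string token."""
    return [[str(node) for node in walk] for walk in walks]


# Hypothetical walks over integer node ids.
walks = [[1, 2, 3], [3, 2]]
sentences = walks_to_sentences(walks)
```

Alternatively, reading the graph with string node labels avoids the conversion. Either way, the resulting embeddings are keyed by strings.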

_get_order_degreelist_node in struc2vec is very slow

I have more than 800,000 edges and over a hundred thousand nodes. The loop over _get_order_degreelist_node is very slow. How can I speed it up?

Observing Fast Implementation of Node Embeddings

There are a few popular node embedding repos that might be good to talk about. https://github.com/VHRanger/nodevectors

Confusing code

Hi, I am trying to re-implement the SDNE code but got stuck while reading it. Could you explain the following lines?

GraphEmbedding/ge/models/sdne.py

Line 162 in 7de7a09

edge_weight = graph[v1][v2].get('weight', 1)

: What does get('weight', 1) mean? Why is the value 1 here?

GraphEmbedding/ge/models/sdne.py

Line 169 in 7de7a09

A_ = sp.csr_matrix((A_data + A_data, (A_row_index + A_col_index, A_col_index + A_row_index)),

: What is A_ and what is its purpose?
GraphEmbedding/ge/models/sdne.py

Line 172 in 7de7a09

D = sp.diags(A_.sum(axis=1).flatten().tolist()[0])

: How does this line create a diagonal matrix? I suppose it implements a formula.
GraphEmbedding/ge/models/sdne.py

Line 125 in 7de7a09

L_mat_train = self.L[index][:, index].todense()

: This code gets L[i:j, :] for a mini-batch to train on. But what does L[index][:, index].todense() mean with index = [i:j]?
GraphEmbedding/ge/models/sdne.py

Line 46 in 7de7a09

def loss_1st(y_true, y_pred):

: Where do you pass the parameters to the loss function? How do we know that L = y_true and Y = y_pred?

Thanks for your help!
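
Several of the questions above can be illustrated on a tiny graph with plain Python lists. The sketch assumes that `get('weight', 1)` supplies a default weight of 1 for edges without a weight attribute, that A_ is the symmetric adjacency matrix, that D is the diagonal matrix of row sums and that L = D - A_ is the graph Laplacian. It is not the repo's implementation, and the helper names are hypothetical.

```python
def adjacency_and_laplacian(num_nodes, edges):
    """Build a symmetric adjacency matrix A and the Laplacian L = D - A.

    edges holds (u, v, attrs) triples; a missing 'weight' defaults to 1.
    """
    A = [[0.0] * num_nodes for _ in range(num_nodes)]
    for u, v, attrs in edges:
        w = attrs.get("weight", 1)   # unweighted edges count as weight 1
        A[u][v] += w
        A[v][u] += w                 # mirror the entry: undirected graph
    degree = [sum(row) for row in A]  # row sums form the diagonal of D
    L = [[(degree[i] if i == j else 0.0) - A[i][j] for j in range(num_nodes)]
         for i in range(num_nodes)]
    return A, L


def submatrix(M, index):
    """Rows and columns of M restricted to index (one mini-batch of nodes)."""
    return [[M[i][j] for j in index] for i in index]


edges = [(0, 1, {}), (1, 2, {"weight": 2})]
A, L = adjacency_and_laplacian(3, edges)
batch_L = submatrix(L, [0, 1])
```

Selecting rows and then columns with the same index list yields the square block of L that belongs to the nodes in the batch. As for the loss arguments, Keras calls each loss as loss(y_true, y_pred), pairing the targets passed at training time with the corresponding model outputs, so the mapping follows from what is passed as targets.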

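Question about the code logic

    if not self.use_rejection_sampling:
        alias_edges = {}
        for edge in G.edges():
            alias_edges[edge] = self.get_alias_edge(edge[0], edge[1])
        self.alias_edges = alias_edges

This is the code in walk.py that computes the probabilities dynamically from the previous node t and the current node v. Why is it placed under the condition if not self.use_rejection_sampling:? Even without negative sampling, this computation should still happen and alias_edges should be updated, shouldn't it?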

The use of alias sampling seems problematic

In node2vec, when sampling neighbors, you normalize the weights (probabilities)
`

    alias_nodes = {}

    for node in G.nodes():

        unnormalized_probs = [G[node][nbr].get('weight', 1.0)

                              for nbr in G.neighbors(node)]

        norm_const = sum(unnormalized_probs)

        normalized_probs = [

            float(u_prob)/norm_const for u_prob in unnormalized_probs]

        alias_nodes[node] = create_alias_table(normalized_probs)`

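All the probabilities are then less than 1, so with alias sampling they would all be placed in small, which is no different from not using it. Doesn't that amount to uniform sampling?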
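
In the standard alias method, the normalized probabilities are first multiplied by the number of entries, so their mean becomes 1 and they split into "small" and "large" groups. Normalized input therefore does not end up entirely in small. The sketch below is a generic implementation of that method with hypothetical names; it is not the repo's create_alias_table.

```python
import random


def build_alias_table(probs):
    """Build alias tables from probabilities that sum to 1."""
    n = len(probs)
    scaled = [p * n for p in probs]            # mean is now 1
    accept, alias = [0.0] * n, [0] * n
    small = [i for i, p in enumerate(scaled) if p < 1.0]
    large = [i for i, p in enumerate(scaled) if p >= 1.0]
    while small and large:
        s, lg = small.pop(), large.pop()
        accept[s] = scaled[s]                  # keep s with this probability
        alias[s] = lg                          # otherwise fall through to lg
        scaled[lg] -= 1.0 - scaled[s]          # lg donated mass to slot s
        (small if scaled[lg] < 1.0 else large).append(lg)
    for i in large + small:                    # leftovers fill a whole slot
        accept[i] = 1.0
    return accept, alias


def alias_sample(accept, alias, rng=random):
    """Draw one index in O(1): pick a slot, then keep it or take its alias."""
    i = int(rng.random() * len(accept))
    return i if rng.random() < accept[i] else alias[i]


def implied_probs(accept, alias):
    """Recover the distribution that the tables encode."""
    n = len(accept)
    out = [a / n for a in accept]
    for i in range(n):
        out[alias[i]] += (1.0 - accept[i]) / n
    return out


accept, alias = build_alias_table([0.5, 0.25, 0.25])
recovered = implied_probs(accept, alias)
```

The recovered distribution matches the input, which shows that sampling through the tables is not uniform unless the input itself is uniform.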

ReFeX (recursive features)

Reference: https://github.com/dkaslovsky/GraphRole

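Questions about the Struc2vec code that builds the similarity graph

Hello, I found some problems while using the struc2vec code that builds the structural similarity graph.

Specifically, when opt2_reduce_sim_calc is enabled, get_vertices returns, for each node, the neighbors that are similar to that node, and this similarity is one-directional. That is, if a is similar to b, then b is among a's neighbors; if b is also similar to a, then a is among b's neighbors (much like a directed graph). However, the later _get_layer_rep method treats this similarity as undirected, that is, it only handles the case where opt2_reduce_sim_calc is False.
So when opt2 is True, a and b each have the other among their similar neighbors, and since both an "in" edge and an "out" edge are stored for each node when edges are built, duplicate edges result. In other words, I believe _get_layer_rep behaves incorrectly when opt2_reduce_sim_calc is True.

I look forward to your reply.
Finally, thank you for open-sourcing this code. It has made my work much easier and saved me a lot of time.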

About the TensorFlow to_float problem

Hello, when I tried to run your sdne_wiki.py, I got the following error:
AttributeError: module 'tensorflow' has no attribute 'to_float'
My Python version is 3.6.7 and tf.version = "2.0.0-alpha0". I suspect my version is too new. How can I get your code to run? Thanks!
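
`tf.to_float` was removed in TensorFlow 2.x, where `tf.cast(x, tf.float32)` is the usual replacement. A small compatibility wrapper could look like the sketch below. To keep it runnable without TensorFlow installed, it takes the module as an argument and is exercised here with a stand-in object; `to_float` and `fake_tf2` are hypothetical names.

```python
from types import SimpleNamespace


def to_float(tf_module, x):
    """Cast x to float32 using whichever API the given module provides."""
    if hasattr(tf_module, "to_float"):          # TensorFlow 1.x style
        return tf_module.to_float(x)
    return tf_module.cast(x, tf_module.float32)  # TensorFlow 2.x style


# Stand-in for a TF 2.x module: it has cast and float32 but no to_float.
fake_tf2 = SimpleNamespace(cast=lambda x, dtype: ("cast", x, dtype),
                           float32="float32")
result = to_float(fake_tf2, [1, 2])
```

In real use the call would be `to_float(tf, y_true)` after `import tensorflow as tf`. Other TF 1.x calls in the code may need similar treatment.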

about tensorflow

Hello, why does your project require TensorFlow?

DeepWalk results are very poor

I ran experiments with the network embedding code provided by this author for nearly a month without getting usable results, which was extremely frustrating. After checking my own algorithm many times, I finally concluded that the problem lies in the DeepWalk code provided here. If you want to use DeepWalk, please use https://github.com/phanein/deepwalk.

Documentation: Node Embedding in relations to Role Discovery

It is not obvious to everyone how node embeddings are useful in practical applications such as role discovery in social networks. I hope documentation on this topic can be added.

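Reproducing the paper's experiments on BlogCatalog gives very different results.

I ran evaluate_embeddings with the parameters from the DeepWalk paper:
--number-walks 80 --representation-size 128 --walk-length 40 --window-size 10
The results are far from those in the original paper. What could be the problem?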

ValueError: Input contains NaN

Hello!

Thank you for providing this wonderful tool for study. I changed the second option into all for LINE (line 48 of GraphEmbedding/examples/line_wiki.py), and encountered the following error:

....
97/97 - 1s - loss: nan - first_order_loss: nan - second_order_loss: 0.0503
Epoch 48/50
97/97 - 1s - loss: nan - first_order_loss: nan - second_order_loss: 0.0480
Epoch 49/50
97/97 - 1s - loss: nan - first_order_loss: nan - second_order_loss: 0.0485
Epoch 50/50
97/97 - 1s - loss: nan - first_order_loss: nan - second_order_loss: 0.0472
Training classifier using 80.00% nodes...
Traceback (most recent call last):
  File "line_wiki.py", line 52, in <module>
    evaluate_embeddings(embeddings)
  File "line_wiki.py", line 19, in evaluate_embeddings
    clf.split_train_evaluate(X, Y, tr_frac)
  File "/export/d1/shuaiw/GraphEmbedding/env/lib/python3.6/site-packages/ge-0.0.0-py3.6.egg/ge/classify.py", line 66, in split_train_evaluate
  File "/export/d1/shuaiw/GraphEmbedding/env/lib/python3.6/site-packages/ge-0.0.0-py3.6.egg/ge/classify.py", line 34, in train
  File "/export/d1/shuaiw/GraphEmbedding/env/lib/python3.6/site-packages/scikit_learn-0.22.2.post1-py3.6-linux-x86_64.egg/sklearn/multiclass.py", line 239, in fit
    for i, column in enumerate(columns))
  File "/export/d1/shuaiw/GraphEmbedding/env/lib/python3.6/site-packages/joblib-0.13.0-py3.6.egg/joblib/parallel.py", line 917, in __call__
    if self.dispatch_one_batch(iterator):
  File "/export/d1/shuaiw/GraphEmbedding/env/lib/python3.6/site-packages/joblib-0.13.0-py3.6.egg/joblib/parallel.py", line 759, in dispatch_one_batch
    self._dispatch(tasks)
  File "/export/d1/shuaiw/GraphEmbedding/env/lib/python3.6/site-packages/joblib-0.13.0-py3.6.egg/joblib/parallel.py", line 716, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "/export/d1/shuaiw/GraphEmbedding/env/lib/python3.6/site-packages/joblib-0.13.0-py3.6.egg/joblib/_parallel_backends.py", line 182, in apply_async
    result = ImmediateResult(func)
  File "/export/d1/shuaiw/GraphEmbedding/env/lib/python3.6/site-packages/joblib-0.13.0-py3.6.egg/joblib/_parallel_backends.py", line 549, in __init__
    self.results = batch()
  File "/export/d1/shuaiw/GraphEmbedding/env/lib/python3.6/site-packages/joblib-0.13.0-py3.6.egg/joblib/parallel.py", line 225, in __call__
    for func, args, kwargs in self.items]
  File "/export/d1/shuaiw/GraphEmbedding/env/lib/python3.6/site-packages/joblib-0.13.0-py3.6.egg/joblib/parallel.py", line 225, in <listcomp>
    for func, args, kwargs in self.items]
  File "/export/d1/shuaiw/GraphEmbedding/env/lib/python3.6/site-packages/scikit_learn-0.22.2.post1-py3.6-linux-x86_64.egg/sklearn/multiclass.py", line 79, in _fit_binary
    estimator.fit(X, y)
  File "/export/d1/shuaiw/GraphEmbedding/env/lib/python3.6/site-packages/scikit_learn-0.22.2.post1-py3.6-linux-x86_64.egg/sklearn/linear_model/_logistic.py", line 1527, in fit
    accept_large_sparse=solver != 'liblinear')
  File "/export/d1/shuaiw/GraphEmbedding/env/lib/python3.6/site-packages/scikit_learn-0.22.2.post1-py3.6-linux-x86_64.egg/sklearn/utils/validation.py", line 755, in check_X_y
    estimator=estimator)
  File "/export/d1/shuaiw/GraphEmbedding/env/lib/python3.6/site-packages/scikit_learn-0.22.2.post1-py3.6-linux-x86_64.egg/sklearn/utils/validation.py", line 578, in check_array
    allow_nan=force_all_finite == 'allow-nan')
  File "/export/d1/shuaiw/GraphEmbedding/env/lib/python3.6/site-packages/scikit_learn-0.22.2.post1-py3.6-linux-x86_64.egg/sklearn/utils/validation.py", line 60, in _assert_all_finite
    msg_dtype if msg_dtype is not None else X.dtype)
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

Could anyone take a look and see if that can be fixed? Thank you very much!
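
The training log already reports first_order_loss as nan, so the embeddings handed to the classifier very likely contain NaN values. A check like the one below can confirm this before evaluation. It does not fix the cause of the NaN loss, and it assumes the embeddings are a mapping from node to a list of floats. `nodes_with_non_finite` is a hypothetical helper.

```python
import math


def nodes_with_non_finite(embeddings):
    """Return the nodes whose vectors contain NaN or infinite values."""
    return sorted(node for node, vector in embeddings.items()
                  if any(not math.isfinite(x) for x in vector))


# Hypothetical embeddings with two damaged vectors.
embeddings = {"a": [0.1, 0.2],
              "b": [float("nan"), 0.3],
              "c": [float("inf"), 0.0]}
bad = nodes_with_non_finite(embeddings)
```

If every node is reported, the problem lies in training itself rather than in a few isolated nodes.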

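Negative samples in LINE

If I understand the code correctly, this part is the negative sampling.
https://github.com/shenweichen/GraphEmbedding/blob/master/ge/models/line.py#L170-L173
However, the negative sampling here does not exclude neighboring nodes or the node itself.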
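
A version of negative sampling that skips the source node and its neighbors could look like the sketch below. For simplicity it samples uniformly, whereas LINE draws negatives from a degree-based distribution, so this illustrates only the exclusion step. `sample_negatives` is a hypothetical helper.

```python
import random


def sample_negatives(node, neighbors, all_nodes, k, rng=random):
    """Draw k negatives, excluding the node itself and its neighbors."""
    excluded = set(neighbors) | {node}
    candidates = [n for n in all_nodes if n not in excluded]
    if not candidates:
        return []                      # nothing valid to sample from
    return [rng.choice(candidates) for _ in range(k)]


# Hypothetical graph with five nodes; node 0 is connected to 1 and 2.
all_nodes = [0, 1, 2, 3, 4]
negs = sample_negatives(0, neighbors=[1, 2], all_nodes=all_nodes, k=5,
                        rng=random.Random(0))
```

On large graphs, building the candidate list for every node is expensive. Drawing from the full distribution and rejecting excluded nodes is usually cheaper, since collisions are rare in sparse graphs.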

ModuleNotFoundError: No module named 'tensorflow'

Problem with installation

Hello, I have a problem at python3.7

error: python-dateutil 2.8.1 is installed but python-dateutil<2.8.1,>=2.1 is required by {'botocore'}

Full log

eurvanov@eurvanov-HP-ProBook-430-G5:~/python/adeo/market-radar/synonyms-service/research/GraphEmbedding$ python setup.py install
running install
running bdist_egg
running egg_info
writing ge.egg-info/PKG-INFO
writing dependency_links to ge.egg-info/dependency_links.txt
writing requirements to ge.egg-info/requires.txt
writing top-level names to ge.egg-info/top_level.txt
reading manifest file 'ge.egg-info/SOURCES.txt'
writing manifest file 'ge.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
creating build/bdist.linux-x86_64/egg
creating build/bdist.linux-x86_64/egg/ge
copying build/lib/ge/utils.py -> build/bdist.linux-x86_64/egg/ge
copying build/lib/ge/classify.py -> build/bdist.linux-x86_64/egg/ge
copying build/lib/ge/alias.py -> build/bdist.linux-x86_64/egg/ge
copying build/lib/ge/init.py -> build/bdist.linux-x86_64/egg/ge
creating build/bdist.linux-x86_64/egg/ge/models
copying build/lib/ge/models/node2vec.py -> build/bdist.linux-x86_64/egg/ge/models
copying build/lib/ge/models/deepwalk.py -> build/bdist.linux-x86_64/egg/ge/models
copying build/lib/ge/models/struc2vec.py -> build/bdist.linux-x86_64/egg/ge/models
copying build/lib/ge/models/init.py -> build/bdist.linux-x86_64/egg/ge/models
copying build/lib/ge/models/line.py -> build/bdist.linux-x86_64/egg/ge/models
copying build/lib/ge/models/sdne.py -> build/bdist.linux-x86_64/egg/ge/models
copying build/lib/ge/walker.py -> build/bdist.linux-x86_64/egg/ge
byte-compiling build/bdist.linux-x86_64/egg/ge/utils.py to utils.cpython-37.pyc
byte-compiling build/bdist.linux-x86_64/egg/ge/classify.py to classify.cpython-37.pyc
byte-compiling build/bdist.linux-x86_64/egg/ge/alias.py to alias.cpython-37.pyc
byte-compiling build/bdist.linux-x86_64/egg/ge/init.py to init.cpython-37.pyc
byte-compiling build/bdist.linux-x86_64/egg/ge/models/node2vec.py to node2vec.cpython-37.pyc
byte-compiling build/bdist.linux-x86_64/egg/ge/models/deepwalk.py to deepwalk.cpython-37.pyc
byte-compiling build/bdist.linux-x86_64/egg/ge/models/struc2vec.py to struc2vec.cpython-37.pyc
byte-compiling build/bdist.linux-x86_64/egg/ge/models/init.py to init.cpython-37.pyc
byte-compiling build/bdist.linux-x86_64/egg/ge/models/line.py to line.cpython-37.pyc
byte-compiling build/bdist.linux-x86_64/egg/ge/models/sdne.py to sdne.cpython-37.pyc
byte-compiling build/bdist.linux-x86_64/egg/ge/walker.py to walker.cpython-37.pyc
creating build/bdist.linux-x86_64/egg/EGG-INFO
copying ge.egg-info/PKG-INFO -> build/bdist.linux-x86_64/egg/EGG-INFO
copying ge.egg-info/SOURCES.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying ge.egg-info/dependency_links.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying ge.egg-info/requires.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying ge.egg-info/top_level.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
zip_safe flag not set; analyzing archive contents...
creating 'dist/ge-0.0.0-py3.7.egg' and adding 'build/bdist.linux-x86_64/egg' to it
removing 'build/bdist.linux-x86_64/egg' (and everything under it)
Processing ge-0.0.0-py3.7.egg
Removing /home/eurvanov/anaconda3/envs/mr-research/lib/python3.7/site-packages/ge-0.0.0-py3.7.egg
Copying ge-0.0.0-py3.7.egg to /home/eurvanov/anaconda3/envs/mr-research/lib/python3.7/site-packages
ge 0.0.0 is already the active version in easy-install.pth

Installed /home/eurvanov/anaconda3/envs/mr-research/lib/python3.7/site-packages/ge-0.0.0-py3.7.egg
Processing dependencies for ge==0.0.0
Searching for python-dateutil>=2.1
Reading https://pypi.org/simple/python-dateutil/
Downloading https://files.pythonhosted.org/packages/d4/70/d60450c3dd48ef87586924207ae8907090de0b306af2bce5d134d78615cb/python_dateutil-2.8.1-py2.py3-none-any.whl#sha256=75bb3f31ea686f1197762692a9ee6a7550b59fc6ca3a1f4b5d7e32fb98e2da2a
Best match: python-dateutil 2.8.1
Processing python_dateutil-2.8.1-py2.py3-none-any.whl
Installing python_dateutil-2.8.1-py2.py3-none-any.whl to /home/eurvanov/anaconda3/envs/mr-research/lib/python3.7/site-packages
writing requirements to /home/eurvanov/anaconda3/envs/mr-research/lib/python3.7/site-packages/python_dateutil-2.8.1-py3.7.egg/EGG-INFO/requires.txt
Adding python-dateutil 2.8.1 to easy-install.pth file

Installed /home/eurvanov/anaconda3/envs/mr-research/lib/python3.7/site-packages/python_dateutil-2.8.1-py3.7.egg
Searching for botocore<1.14.0,>=1.13.26
Reading https://pypi.org/simple/botocore/
Downloading https://files.pythonhosted.org/packages/8a/93/ea2ec042794dfda186348df02c6057223a8bbc21c055124fbe3e16925441/botocore-1.13.26-py2.py3-none-any.whl#sha256=9fefb42c6d4fa0079a52b49e5491fa0738cca63649f68be180b3ed6c253d2622
Best match: botocore 1.13.26
Processing botocore-1.13.26-py2.py3-none-any.whl
Installing botocore-1.13.26-py2.py3-none-any.whl to /home/eurvanov/anaconda3/envs/mr-research/lib/python3.7/site-packages
writing requirements to /home/eurvanov/anaconda3/envs/mr-research/lib/python3.7/site-packages/botocore-1.13.26-py3.7.egg/EGG-INFO/requires.txt
Adding botocore 1.13.26 to easy-install.pth file

Installed /home/eurvanov/anaconda3/envs/mr-research/lib/python3.7/site-packages/botocore-1.13.26-py3.7.egg
error: python-dateutil 2.8.1 is installed but python-dateutil<2.8.1,>=2.1 is required by {'botocore'}

__init__() got an unexpected keyword argument 'size'

I was able to fix it. The problem was caused by the installed gensim version.
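
For context: gensim 4.0 renamed the Word2Vec parameters `size` to `vector_size` and `iter` to `epochs`, so code written for gensim 3.x fails with this error on newer versions. One option is to pin gensim below 4. Another is to choose the keyword names by version, as in this sketch, where `word2vec_kwargs` is a hypothetical helper.

```python
def word2vec_kwargs(gensim_version, embed_size, epochs):
    """Return size/epoch keyword arguments matching the gensim major version."""
    major = int(gensim_version.split(".")[0])
    if major >= 4:
        return {"vector_size": embed_size, "epochs": epochs}
    return {"size": embed_size, "iter": epochs}


print(word2vec_kwargs("4.3.2", embed_size=128, epochs=5))
print(word2vec_kwargs("3.6.0", embed_size=128, epochs=5))
```

In practice the version string would come from `gensim.__version__`, and the returned dictionary would be merged into the keyword arguments passed to Word2Vec.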

ImportError

ImportError: No module named 'tensorflow.python.keras'

Is there a weighted-graph dataset or example?

What is the difference between TransX representation learning and DeepWalk? Both produce embeddings.

No module named 'joblib'

What causes this?
Traceback (most recent call last):
  File "deepwalk_wiki.py", line 4, in <module>
    from ge.classify import read_node_label, Classifier
  File "//anaconda/lib/python3.5/site-packages/ge-0.0.0-py3.5.egg/ge/__init__.py", line 1, in <module>
  File "//anaconda/lib/python3.5/site-packages/ge-0.0.0-py3.5.egg/ge/models/__init__.py", line 1, in <module>
  File "//anaconda/lib/python3.5/site-packages/ge-0.0.0-py3.5.egg/ge/models/deepwalk.py", line 20, in <module>
  File "//anaconda/lib/python3.5/site-packages/ge-0.0.0-py3.5.egg/ge/walker.py", line 7, in <module>
ImportError: No module named 'joblib'

About the data set

Do none of these examples have weights?

SDNE

The _create_A_L function

In this step A_data + A_data has length twice node_size, while shape=(node_size, node_size), and it raised an error in my local test.

    A_ = sp.csr_matrix((A_data + A_data, (A_row_index + A_col_index, A_col_index + A_row_index)),
                       shape=(node_size, node_size))

    D = sp.diags(A_.sum(axis=1).flatten().tolist()[0])
    L = D - A_
    return A, L

