jiangxinyang227 / nlp-project

Includes text classifiers, language models, pre-trained models, multi-label classifiers, text generators, dialogue models, etc.

Python 99.78% Shell 0.21% Jupyter Notebook 0.01%

nlp-project's Issues

Word vector question

Hello! For the sentiment analysis project, which corpus did you use to train the word vectors?
sentences = LineSentence("corpus.txt")

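Unsupervised keyword extraction from a single document

Hello, I really like the framework you describe for text classification; the overall structure is very clear.
Have you looked into unsupervised keyword extraction from a single document?
The only method I know of is TextRank, and it does not seem to work very well.
If you know of any good blogs or open-source projects, I would appreciate a recommendation. Thanks!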

  File "D:\Anaconda3\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\training\saver.py", line 825, in __init__
    self.build()
  File "D:\Anaconda3\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\training\saver.py", line 837, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "D:\Anaconda3\envs\tensorflow-gpu\lib\site-packages\tensorflow\python\training\saver.py", line 862, in _build
    raise ValueError("No variables to save")
ValueError: No variables to save

What is causing this?

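lstm_siamese data format question

Sorry to open an issue for this, but could you attach one or two sample records? The content can be anything. I am a beginner trying to learn from this project, but I have not been able to get the data part working. You could also send them to my email, [email protected]. Thank you very much.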

TypeError: only integer scalar arrays can be converted to a scalar index

  File "train.py", line 125, in train
    self.config["batch_size"]):
  File "/home/rejae/PycharmProjects/NLP-Project-master/text_classifier/data_helpers/eval_data.py", line 155, in next_batch
    y = y[perm]
TypeError: only integer scalar arrays can be converted to a scalar index
The following change is needed in eval_data:
def next_batch(self, x, y, batch_size):

    perm = np.arange(len(x))
    # y may be a plain list; a list must be converted to an array before it can be indexed with perm.
    # out_images = np.array(X_train)[indices.astype(int)]
    np.random.shuffle(perm)
    x = x[perm]
    # y = y[perm]
    y = np.array(y)[perm]  # changed: convert y to an array before indexing
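
The error can be reproduced outside the project. The sketch below assumes that `y` arrives as a plain Python list while `perm` is a NumPy index array; the sample labels are made up for illustration.

```python
import numpy as np

# Hypothetical labels stored as a plain Python list, as in the failing call.
y = [0, 1, 1, 0, 1]
perm = np.arange(len(y))
np.random.shuffle(perm)

# Indexing a list with a multi-element index array raises the TypeError
# shown in the traceback.
try:
    y[perm]
    list_indexing_failed = False
except TypeError:
    list_indexing_failed = True

# Converting to an array first makes index-array (fancy) indexing valid.
y_shuffled = np.array(y)[perm]
```

If `x` can also arrive as a list, the same conversion would be needed for `x = x[perm]`.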

Matrix operation error in text_classifier\models\bilstmatten.py

    # Use the computed alpha values to take a weighted sum of H, done directly as a matrix operation

    r = tf.matmul(tf.transpose(H, [0, 2, 1]), tf.reshape(self.alpha, [-1, self.config["sequence_length"], 1]))

Error:
InvalidArgumentError (see above for traceback): In[0].dim(0) and In[1].dim(0) must be the same: [128,512,50] vs [256,50,1]
[[Node: Attention/MatMul_1 = BatchMatMul[T=DT_FLOAT, adj_x=false, adj_y=false, _device="/job:localhost/replica:0/task:0/cpu:0"](Attention/transpose, Attention/Reshape_3)]]
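
The shapes in the message suggest one possible cause, offered here as an assumption rather than a diagnosis. The reshaped alpha holds 256 × 50 = 12,800 values, exactly twice the 128 × 50 expected for a batch of 128 and a sequence length of 50. That would happen if H carries 512 features (forward and backward outputs concatenated) while an earlier reshape uses a hidden size of 256, because a `-1` dimension silently absorbs the extra factor of 2. The NumPy sketch below mimics that shape arithmetic with small stand-in dimensions; `hidden_size`, `W`, and the score computation are hypothetical names, not taken from bilstmatten.py, and the softmax is omitted because only the shapes matter here.

```python
import numpy as np

batch, seq_len, hidden_size = 4, 5, 3   # small stand-ins for 128, 50, 256

# H with forward and backward outputs concatenated: 2 * hidden_size features.
H = np.random.rand(batch, seq_len, 2 * hidden_size)

# Flattening with the single-direction hidden size: -1 absorbs the factor of 2.
flat = H.reshape(-1, hidden_size)            # (batch * seq_len * 2, hidden_size)
W = np.random.rand(hidden_size, 1)
scores = flat @ W                            # (batch * seq_len * 2, 1)
alpha_bad = scores.reshape(-1, seq_len, 1)   # leading dimension is 2 * batch

# Flattening with the actual feature size keeps the batch dimension intact.
feat = H.shape[-1]
W_ok = np.random.rand(feat, 1)
alpha_ok = (H.reshape(-1, feat) @ W_ok).reshape(-1, seq_len, 1)

# Weighted sum over time steps: (batch, feat, seq_len) @ (batch, seq_len, 1).
r = np.transpose(H, (0, 2, 1)) @ alpha_ok
```

Under this reading, either summing the forward and backward outputs so that H keeps 256 features, or using the concatenated feature size consistently in every reshape, would make the two batch dimensions agree.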

cnn_b not found in checkpoint

NotFoundError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Key conv-maxpool-3/cnn_b not found in checkpoint
[[node save/RestoreV2 (defined at /root/anaconda3/envs/tensorflow-gpu/lib/python3.6/site-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py:2418) = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_INT64, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

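Question about BERT's maximum sequence length

Hello. When using BERT, I noticed that the maximum sequence length in the official code is 512. For long texts, for example those exceeding 512 tokens, does performance degrade sharply?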
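
The official code truncates inputs to `max_seq_length`, so anything beyond the limit is simply not seen by the model. One common workaround, not something this repository is confirmed to implement, is to split the token sequence into overlapping windows and aggregate the per-window predictions. A minimal sketch, assuming the text is already tokenized; `split_into_windows`, `max_len`, and `stride` are hypothetical names.

```python
from typing import List


def split_into_windows(tokens: List[str], max_len: int = 510, stride: int = 255) -> List[List[str]]:
    """Split tokens into overlapping windows of at most max_len tokens.

    max_len defaults to 510 to leave room for the [CLS] and [SEP] tokens.
    """
    if max_len <= 0 or stride <= 0:
        raise ValueError("max_len and stride must be positive")
    windows = []
    start = 0
    while True:
        windows.append(tokens[start:start + max_len])
        # Stop once the current window reaches the end of the sequence.
        if start + max_len >= len(tokens):
            break
        start += stride
    return windows


# Made-up token sequence that exceeds the 512-token limit.
tokens = [f"tok{i}" for i in range(1200)]
windows = split_into_windows(tokens)
```

Each window can then be classified separately and the results combined, for example by averaging the predicted probabilities; whether that recovers accuracy on long documents depends on the task.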

text_classifier error

After initializing a Predictor object and calling its predict method, running test.py raises:
InvalidArgumentError (see above for traceback): Inputs to operation transformer/transformer-1/multi_head_atten/Select of type Select must have the same size and shape. Input 0: [8,100,100] != input 1: [1024,100,100]

How can I fix this?
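
One reading of the two shapes, stated as an assumption: 8 = 1 × 8 and 1024 = 128 × 8, so with 8 attention heads one input of the Select op may be sized from the actual batch (a single example at prediction time) and the other from a fixed batch size of 128 taken from the config. The NumPy sketch below illustrates that mismatch and a shape-agnostic alternative; all names and the reduced sequence length are hypothetical stand-ins, not code from the repository.

```python
import numpy as np

num_heads, seq_len = 8, 10   # seq_len is a small stand-in for 100
config_batch_size = 128      # hypothetical training batch size from the config
actual_batch = 1             # a single example at prediction time

# Attention scores tiled across heads: (actual_batch * num_heads, seq_len, seq_len).
scores = np.random.rand(actual_batch * num_heads, seq_len, seq_len)
mask = np.random.rand(*scores.shape) > 0.5

# A padding tensor sized from the config does not match the scores.
fixed_padding = np.full((config_batch_size * num_heads, seq_len, seq_len), -1e9)
shapes_match = fixed_padding.shape == scores.shape

# Sizing the padding from the scores tensor works for any batch size.
dynamic_padding = np.full_like(scores, -1e9)
masked = np.where(mask, scores, dynamic_padding)
```

If this is the cause, predicting with the same batch size used in training, or building the padding tensor from the input's own shape, would avoid the error.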

Sorry, I got some things wrong.

Sorry, this was my mistake. I am unable to delete this issue.

dialogue_generator (bilstm) error

AttributeError: 'Predictor' object has no attribute 'sess'
Could someone please help?

Error in Bert_CNN during do_predict

  File "bert_cnn.py", line 911, in main
    probabilities = prediction["probabilities"]
IndexError: invalid index to scalar variable.

        outputs, current_state = tf.nn.bidirectional_dynamic_rnn(lstm_fw_cell, lstm_bw_cell, embedded_words, dtype=tf.float32, scope="bi-lstm" + str(idx))
        embedded_words = tf.concat(outputs, 2)

    # Split the output of the last Bi-LSTM layer into forward and backward outputs
    outputs = tf.split(embedded_words, 2, -1)

Why is the last layer's output concatenated and then split apart again? Both operations act on axis=2. Couldn't the outputs tuple returned by bidirectional_dynamic_rnn be used directly?
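
For the last layer, splitting the concatenated tensor in two along the same axis returns exactly the forward and backward outputs, so using the tuple directly would give the same values. The concatenation itself is still needed inside the loop, where `embedded_words` is fed to the next layer. A NumPy sketch of the round trip, with random arrays standing in for the two LSTM outputs:

```python
import numpy as np

batch, seq_len, hidden = 2, 4, 3
fw = np.random.rand(batch, seq_len, hidden)   # stand-in for the forward outputs
bw = np.random.rand(batch, seq_len, hidden)   # stand-in for the backward outputs

# Concatenate along the feature axis, as done to feed the next Bi-LSTM layer.
embedded_words = np.concatenate((fw, bw), axis=2)

# Splitting the concatenation in two along the same axis recovers both halves.
fw_again, bw_again = np.split(embedded_words, 2, axis=-1)
```

The split is therefore redundant for the final layer, though harmless; it mainly keeps the code after the loop independent of how many layers ran.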
