Deep Learning based Pipeline with Multichannel Inputs for Patent Classification (DeepL4Patent)

In this work, we introduce DeepL4Patent, a deep learning pipeline for automatic patent classification with multichannel inputs. A neural network model is trained on multichannel inputs, namely embeddings of different segments of the patent text and sparse linear inputs built from different metadata.

[Figure: DeepL4Patent architecture, with the deep layers on the left and the wide layers on the right]

The deep learning architecture has two components, a deep part and a wide part. The embedding of each text segment is fed through its own neural sub-network, and these sub-networks form the deep layers of the model; the patent metadata forms the wide part.

For the wide part of the model, we use one-hot representations of the patent metadata features (such as inventors, citations, and assignees). These one-hot vectors are fed into separate sub-networks, so that in the end they are also represented as deep networks. The right side of the figure above shows the wide layers, where the sparse metadata inputs are fed into separate sub-networks.

For the deep part of the model, we create deep layers for the most important patent text segments. A pre-trained word embedding model (section IV) encodes the text of each segment into vectors, and these embeddings are the sequential input to a Long Short-Term Memory (LSTM) network. The left side of the figure shows the deep layers.

To avoid overfitting and improve network stability, we add further layers to each input channel: a dropout layer, which drops some inputs to prevent the network from overfitting, and a batch normalization layer, which normalizes its input by adjusting and scaling the activations. The exponential linear unit (ELU) is used as the activation function. Finally, we concatenate the nine components into a fully connected layer with dropout, batch normalization, and an activation function.
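The per-channel layers described above (dropout, batch normalization, ELU) and the final concatenation can be illustrated with a minimal, framework-free sketch. It is not the repository's code: the function names, the layer order inside `channel_block`, the dropout rate, and the toy values are all assumptions made for illustration, and a real model would use a deep learning framework with learned parameters.

```python
import math
import random


def elu(values, alpha=1.0):
    """Exponential linear unit: x for x > 0, alpha * (exp(x) - 1) otherwise."""
    return [x if x > 0 else alpha * (math.exp(x) - 1.0) for x in values]


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each feature over the batch, then scale by gamma and shift by beta.

    Simplification: always uses the statistics of the given batch
    (no running averages, no learned gamma/beta).
    """
    n = len(batch)
    columns = list(zip(*batch))
    means = [sum(col) / n for col in columns]
    variances = [sum((x - m) ** 2 for x in col) / n for col, m in zip(columns, means)]
    return [
        [gamma * (x - m) / math.sqrt(v + eps) + beta
         for x, m, v in zip(row, means, variances)]
        for row in batch
    ]


def dropout(values, rate, rng):
    """Inverted dropout: zero each value with probability `rate`, rescale the rest."""
    keep = 1.0 - rate
    return [x / keep if rng.random() < keep else 0.0 for x in values]


def channel_block(batch, rate, rng, training=True):
    """Assumed order for one channel: dropout (training only) -> batch norm -> ELU."""
    if training:
        batch = [dropout(row, rate, rng) for row in batch]
    return [elu(row) for row in batch_norm(batch)]


def concatenate(channels):
    """Join the per-channel outputs of each sample into one feature vector."""
    return [[x for row in rows for x in row] for rows in zip(*channels)]


rng = random.Random(0)
# Two toy channels for a batch of three patents (values are made up).
text_channel = [[0.5, -1.0], [1.5, 0.0], [-0.5, 2.0]]
metadata_channel = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
merged = concatenate([
    channel_block(text_channel, rate=0.2, rng=rng, training=False),
    channel_block(metadata_channel, rate=0.2, rng=rng, training=False),
])
print(len(merged), len(merged[0]))  # 3 samples, 5 features each
```

In the described model the same idea applies to nine channels rather than two, and the concatenated vector is passed on to the fully connected layer.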

Currently, the repository contains three examples:

  1. Multi-class patent classification with multichannel inputs: the main IPC code is used as the label for the patent document.

  2. Multi-label patent classification with multichannel inputs: all IPC codes of the patent document are used as its labels.

  3. Multi-class patent classification with a single-channel input.
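
The difference between the multi-class and multi-label setups comes down to how the target vector is built from the IPC codes. The following is a minimal sketch under stated assumptions: the record layout (`main_ipc`, `ipc_codes`), the helper names, and the sample records are hypothetical and not taken from the repository.

```python
def build_label_index(patents):
    """Map every IPC code seen in the corpus to a column index (sorted for stability)."""
    codes = sorted({code for patent in patents for code in patent["ipc_codes"]})
    return {code: i for i, code in enumerate(codes)}


def multi_class_target(patent, label_index):
    """One-hot vector with a single 1 at the main IPC code."""
    target = [0] * len(label_index)
    target[label_index[patent["main_ipc"]]] = 1
    return target


def multi_label_target(patent, label_index):
    """Multi-hot vector with a 1 for every IPC code assigned to the patent."""
    target = [0] * len(label_index)
    for code in patent["ipc_codes"]:
        target[label_index[code]] = 1
    return target


# Hypothetical records; field names and code assignments are for illustration only.
# The main IPC code is assumed to also appear in the full code list.
patents = [
    {"main_ipc": "G06F", "ipc_codes": ["G06F", "G06N"]},
    {"main_ipc": "H04L", "ipc_codes": ["H04L"]},
]
label_index = build_label_index(patents)
for patent in patents:
    print(multi_class_target(patent, label_index), multi_label_target(patent, label_index))
# [1, 0, 0] [1, 1, 0]
# [0, 0, 1] [0, 0, 1]
```

A one-hot target is commonly paired with a softmax output and categorical cross-entropy, and a multi-hot target with sigmoid outputs and binary cross-entropy; whether the repository's examples make exactly these choices is an assumption here.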

References
Mustafa Sofean. Deep Learning based Pipeline with Multichannel Inputs for Patent Classification. 1st Workshop on Patent Text Mining and Semantic Technologies (PatentSemTech 2019).
