Hi, thanks for sharing these projects, super neat work! I just w

Differences between KeyBERT and BERTopic

maartengr commented on August 25, 2024

MaartenGr commented on August 25, 2024

Although the approach may look similar, their implementation is actually quite different. In practice, you will not be able to recreate KeyBERT with BERTopic and vice versa. To make this clear, I'll go through the models individually and then compare them.

BERTopic

The procedure of BERTopic consists of three distinct steps:

Embedding documents
Clustering documents
Creating a topic representation.

The main output of BERTopic is a set of words per topic. Thus, multiple documents have the same topic representation.
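
To make the three steps above concrete, here is a rough, standard-library-only sketch. It is not how BERTopic is implemented: a bag-of-words vector stands in for the sentence embedding, a greedy cosine-threshold grouping stands in for UMAP/HDBSCAN, and the word weighting is only in the spirit of c-TF-IDF (term frequency within a cluster times `log(1 + average words per cluster / frequency across all clusters)`, as I understand it). All function names, the threshold, and the toy documents are assumptions made up for illustration.

```python
import math
from collections import Counter


def embed(doc):
    """Toy stand-in for a sentence embedding: a bag-of-words count vector."""
    return Counter(doc.lower().split())


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster(embeddings, threshold=0.3):
    """Greedy grouping: join the first cluster whose seed document is similar enough."""
    seeds, labels = [], []
    for emb in embeddings:
        for label, seed in enumerate(seeds):
            if cosine(emb, seed) >= threshold:
                labels.append(label)
                break
        else:  # no existing cluster matched, so this document starts a new one
            seeds.append(emb)
            labels.append(len(seeds) - 1)
    return labels


def topic_words(docs, labels, top_n=2):
    """Score words per cluster by treating each cluster as one large document."""
    per_topic = {}
    for doc, label in zip(docs, labels):
        per_topic.setdefault(label, Counter()).update(doc.lower().split())
    total = Counter()
    for counts in per_topic.values():
        total.update(counts)
    avg_words = sum(total.values()) / len(per_topic)
    topics = {}
    for label, counts in per_topic.items():
        scores = {w: tf * math.log(1 + avg_words / total[w]) for w, tf in counts.items()}
        # Sort by descending score, breaking ties alphabetically.
        topics[label] = sorted(scores, key=lambda w: (-scores[w], w))[:top_n]
    return topics


docs = [
    "cats purr and cats sleep",
    "cats chase mice",
    "stocks fell as markets closed",
    "markets rallied and stocks rose",
]
labels = cluster([embed(d) for d in docs])  # steps 1 and 2
topics = topic_words(docs, labels)          # step 3
print(labels, topics)
```

Note that the output is one word list per cluster, so the two cat documents end up sharing a single representation.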

KeyBERT

KeyBERT can roughly be divided into the following steps:

Embedding documents
Creating candidate keywords
Calculating best keywords through either MMR, Max Sum Similarity, or Cosine Similarity

The main output of KeyBERT is a set of words per document. Thus, each document is expected to have different keywords.
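
The same kind of toy sketch for the KeyBERT steps, using plain cosine similarity for the ranking. Again, this is not KeyBERT's code: the bag-of-words `embed` function replaces a real embedding model, and the stop-word list, function names, and example documents are all hypothetical.

```python
import math
from collections import Counter

# Tiny, made-up stop-word list for the example.
STOP_WORDS = {"a", "and", "the", "of", "to", "in", "is", "on"}


def embed(text):
    """Toy stand-in for a sentence embedding: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def extract_keywords(doc, top_n=2):
    doc_embedding = embed(doc)                                   # 1. embed the document
    candidates = sorted(set(doc.lower().split()) - STOP_WORDS)   # 2. create candidate keywords
    scored = [(c, cosine(embed(c), doc_embedding)) for c in candidates]  # 3. compare to the document
    scored.sort(key=lambda pair: (-pair[1], pair[0]))            # best first, ties alphabetical
    return scored[:top_n]


docs = [
    "the cat sat on the mat and the cat slept",
    "the dog chased the ball and the dog barked",
]
keywords = [extract_keywords(d) for d in docs]
for doc, kws in zip(docs, keywords):
    print(doc, "->", [word for word, _ in kws])
```

Here every document gets its own keyword list, and nothing is shared between documents.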

BERTopic vs. KeyBERT

The main similarities between the two methods are that they embed documents and leverage MMR (although both models may opt not to). To me, that is essentially where the similarities end. The main difference is everything that happens between embedding documents and, in some cases, leveraging MMR. For example, BERTopic aims to cluster documents and create a broad representation of multiple documents whereas KeyBERT does not. Moreover, when it comes down to algorithmic implementation, the UMAP/HDBSCAN/c-TF-IDF route is quite different from generating candidate keywords and comparing them to the individual documents.
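
Since MMR is the one piece both methods can share, here is a minimal sketch of Maximal Marginal Relevance in the form I have in mind: each pick maximizes `(1 - diversity) * similarity to the document - diversity * highest similarity to anything already picked`. The vectors, the candidate names, and the `diversity` values are invented for the example, and neither library necessarily structures its code this way.

```python
import math


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def mmr(doc_vec, candidates, top_n=2, diversity=0.5):
    """Pick top_n candidate names, trading relevance against redundancy."""
    remaining = sorted(candidates)  # sorted so ties resolve deterministically
    selected = []
    while remaining and len(selected) < top_n:
        def score(name):
            relevance = cosine(candidates[name], doc_vec)
            redundancy = max(
                (cosine(candidates[name], candidates[s]) for s in selected),
                default=0.0,
            )
            return (1 - diversity) * relevance - diversity * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical 2-D embeddings: "learn" and "learning" are near-duplicates.
doc_vec = (1.0, 0.2)
candidates = {
    "learning": (1.0, 0.1),
    "learn": (1.0, 0.15),
    "supervised": (0.5, 1.0),
}
by_relevance = mmr(doc_vec, candidates, diversity=0.0)
diversified = mmr(doc_vec, candidates, diversity=0.7)
print(by_relevance, diversified)
```

With `diversity=0.0` this reduces to plain similarity ranking and returns the two near-duplicates. A higher value swaps the second pick for a less similar but less redundant candidate.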

When to use BERTopic vs. KeyBERT

As you might have already noticed from the descriptions above, both the purpose and the output of the methods differ. BERTopic, like most topic modeling techniques, is meant to explore the data and build an understanding of the perhaps millions of documents that you have collected. KeyBERT, in contrast, cannot do this because it creates a completely different set of words for each document. A typical use of KeyBERT, as of most keyword extraction algorithms, is automatically creating relevant keywords for content (blogs, articles, etc.) that businesses post on their websites.

P.S. I kinda went overboard with this explanation, but seeing as several people liked your question, it seemed to be important to others as well. If I wasn't clear or if you have any follow-up questions, don't hesitate to ask!

shoegazerstella commented on August 25, 2024

Hello @MaartenGr and thanks a lot for the clean clarification!

voxmenthe commented on August 25, 2024

Great explanation really appreciated being able to find this thanks!
