johndef64 / bertopic_graph

This repo offers a workflow for using BERTopic in semantic graph-based information retrieval for nutrigenomics. It includes Jupyter notebooks on topic modeling and semantic graph creation, aimed at enhancing the exploration of genetic literature. Intended for genomics researchers, it simplifies the analysis of nutrition-related genetic information.

Jupyter Notebook 99.93% Python 0.07%
natural-language-processing topicmodeling gene-network genomics knowledge-graph

bertopic_graph's Introduction

BERTopic for Semantic Graph Information Retrieval

This repository contains the Jupyter notebook bertopic_graph.ipynb, which presents a semantic graph-based information retrieval system using BERTopic to analyze hidden topics within genetic literature related to MeSH terms specific to nutrigenomics.

Repository Structure

  • data: Holds the data produced during notebook execution.
  • utlis: Contains auxiliary Python code.
  • bertopic_graph.ipynb: Jupyter notebook detailing the GRPM BERTopic analysis process with a focus on semantic graph-based information retrieval in the nutrigenomics domain.
  • bertopic_tutorial.ipynb: Jupyter notebook created for educational purposes.

Requirements

All required libraries and the specific versions used in this project are listed in the grpm_bertopic.ipynb notebook. Install these dependencies before running the notebook.

Usage

To conduct the GRPM BERTopic analysis for semantic graph-based information retrieval in nutrigenomics, follow the steps provided in the grpm_bertopic.ipynb notebook. Each step includes detailed documentation and corresponding code snippets. By following these steps, you will explore complex relationships between genetic variations and MeSH terms specific to nutrigenomics.

The general workflow is illustrated below:

[Figure: Workflow]

About GRPM BERTopic Analysis

This analysis employs the BERTopic pipeline to:

  1. Utilize a curated dataset enriched with genetic features related to nutrigenomics.
  2. Preprocess the dataset based on MeSH terms specifically chosen for the nutrigenomics domain.
  3. Implement the BERTopic methodology to perform topic modeling on the dataset, enabling a structured exploration of genetic influences, interactions, and implications in nutrigenomics.
  4. Construct a semantic graph representing intertopic relationships and dependencies, fostering enhanced information retrieval and contextual understanding within the nutrigenomics domain.
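
Steps 2 and 3 above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the record fields (`pmid`, `abstract`, `mesh`), the MeSH term list, and the helper names are all hypothetical. The `fit_topics` helper relies on the third-party `bertopic` package and is defined here but not called.

```python
from typing import Iterable

# Hypothetical MeSH terms of interest; the notebook's real list is not shown here.
NUTRIGENOMICS_MESH = {"Nutrigenomics", "Diet", "Obesity"}


def filter_by_mesh(records: Iterable[dict], mesh_terms: set) -> list:
    """Keep records annotated with at least one of the chosen MeSH terms."""
    wanted = {t.lower() for t in mesh_terms}
    return [r for r in records if wanted & {m.lower() for m in r.get("mesh", [])}]


def fit_topics(abstracts: list):
    """Fit BERTopic on a list of abstracts (requires `pip install bertopic`)."""
    from bertopic import BERTopic  # imported lazily so the filter works without it

    topic_model = BERTopic()
    topics, _probs = topic_model.fit_transform(abstracts)
    return topic_model, topics


# Toy records for illustration only (assumed schema, made-up content).
records = [
    {"pmid": "PMID-A", "abstract": "Gene variants and dietary fat intake.", "mesh": ["Diet", "Genotype"]},
    {"pmid": "PMID-B", "abstract": "Crystal structure of a kinase.", "mesh": ["Protein Kinases"]},
]
selected = filter_by_mesh(records, NUTRIGENOMICS_MESH)
print([r["pmid"] for r in selected])
```

On a real corpus, the abstracts of `selected` would be passed to `fit_topics`. BERTopic generally needs many more documents than this toy list to produce meaningful topics.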

About the Semantic Graph

The semantic graph is the foundation of the information retrieval system: it captures the relationships and dependencies between topics. By querying the graph, researchers can navigate interconnected topics and retrieve relevant information in context. The hierarchical clustering tree, which aggregates semantically related topics into coherent clusters, serves as the root of the semantic graph.
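
As a rough sketch of how a hierarchical clustering tree can act as the root of such a graph, the example below turns a list of cluster merges into a parent-to-children map and collects the topics under a cluster. BERTopic exposes a `hierarchical_topics` method that returns a table of merges of this kind. The simplified field names (`parent`, `left`, `right`) and the identifiers used here are hypothetical.

```python
from collections import defaultdict

# Hypothetical merge records: each parent cluster joins two children.
# Leaves "0".."3" stand for detected topics; the IDs are illustrative only.
merges = [
    {"parent": "4", "left": "0", "right": "1"},
    {"parent": "5", "left": "2", "right": "3"},
    {"parent": "6", "left": "4", "right": "5"},
]


def build_tree(merges):
    """Return a parent -> children adjacency map for the cluster tree."""
    children = defaultdict(list)
    for m in merges:
        children[m["parent"]].extend([m["left"], m["right"]])
    return dict(children)


def leaves_under(tree, node):
    """Collect the leaf topics reachable from `node`, depth-first."""
    if node not in tree:
        return [node]
    found = []
    for child in tree[node]:
        found.extend(leaves_under(tree, child))
    return found


tree = build_tree(merges)
print(leaves_under(tree, "6"))  # all topics grouped under the root cluster
```

Each parent-child pair in `tree` corresponds to one edge of the backbone, so the same map can be used to emit graph edges.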

[Figure: Workflow]

The complete graph consists of three layers:

  1. The first layer is the backbone of the graph. It consists of the hierarchical tree connecting each detected topic to its most semantically significant terms (ranked by c-TF-IDF score). Some terms are shared across multiple topics.
  2. The second layer contains nodes for the papers, each with its PubMed URL.
  3. The third layer comprises the network of genes associated with each cluster of papers.
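
The three layers can be pictured with a small edge list, sketched below under stated assumptions: every node identifier is made up, and the way papers attach to topics and genes attach to papers is an assumption about the graph's layout, not a description of the actual file. The sketch shows how a query on a term can walk through topics and papers to reach genes.

```python
# Edges are (source, target, layer); all identifiers below are made up.
edges = [
    ("topic_0", "term:obesity", 1),
    ("topic_1", "term:obesity", 1),  # a term shared by two topics
    ("topic_1", "term:folate", 1),
    ("topic_0", "paper:A", 2),
    ("topic_1", "paper:B", 2),
    ("paper:A", "gene:G1", 3),
    ("paper:B", "gene:G1", 3),
    ("paper:B", "gene:G2", 3),
]


def neighbors(node, layer):
    """Targets linked from `node` within the given layer."""
    return [t for s, t, lay in edges if s == node and lay == layer]


def topics_for_term(term):
    """Layer 1 lookup: topics whose top terms include `term`."""
    return sorted(s for s, t, lay in edges if t == term and lay == 1)


def genes_for_term(term):
    """Walk term -> topics -> papers -> genes across the three layers."""
    genes = set()
    for topic in topics_for_term(term):
        for paper in neighbors(topic, 2):
            genes.update(neighbors(paper, 3))
    return sorted(genes)


print(topics_for_term("term:obesity"))
print(genes_for_term("term:folate"))
```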

The complete graph is available in 'data/semantic_graph.graphml' and can be loaded into tools such as Neo4j or Cytoscape for inspection.
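
For a quick check outside those tools, a GraphML file can also be read with Python's standard library. The sketch below counts nodes and edges. The helper name and the inline sample are illustrative only, and the attribute keys stored in the repository's file are not assumed here.

```python
import io
import xml.etree.ElementTree as ET

# Standard GraphML XML namespace.
GRAPHML_NS = {"g": "http://graphml.graphdrawing.org/xmlns"}


def summarize_graphml(source):
    """Return (node_count, edge_count) for a GraphML path or file object."""
    root = ET.parse(source).getroot()
    nodes = root.findall(".//g:node", GRAPHML_NS)
    edges = root.findall(".//g:edge", GRAPHML_NS)
    return len(nodes), len(edges)


# Tiny inline sample so the sketch runs without the repository's data file.
SAMPLE = (
    '<graphml xmlns="http://graphml.graphdrawing.org/xmlns">'
    '<graph edgedefault="undirected">'
    '<node id="a"/><node id="b"/>'
    '<edge source="a" target="b"/>'
    "</graph></graphml>"
)
print(summarize_graphml(io.StringIO(SAMPLE)))
```

With the repository checked out, `summarize_graphml("data/semantic_graph.graphml")` would report the size of the full graph. For actual graph analysis, networkx's `read_graphml` function is a common alternative.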

If you encounter any challenges or have inquiries, feel free to raise an issue in this repository.
