GithubHelp home page GithubHelp logo

technical-solutions-lakehouse / jsl-medical-text-summarization Goto Github PK

View Code? Open in Web Editor NEW

This project forked from databricks-industry-solutions/jsl-medical-text-summarization

0.0 0.0 0.0 68 KB

License: Other

Python 100.00%

jsl-medical-text-summarization's Introduction

Exploring Medical Text Summarization with John Snow Labs

➤ Introduction

Text Summarization, a crucial task in Natural Language Processing (NLP), condenses lengthy text documents into shorter, compact versions while retaining the most critical information and meaning. The primary goal is to create a summary that accurately echoes the content of the original text in a more succinct form.

In the context of healthcare, where the prompt and accurate comprehension of extensive medical literature is paramount, medical text summarization plays a significant role.

➤ Purpose

John Snow Labs offers a suite of specialized text summarization models tailored for the healthcare industry. This joint Solution Accelerator serves as a definitive guide for utilizing these models via Spark NLP for Healthcare libraries, offering users an opportunity to harness the power of NLP for summarization tasks.

🪄 MODELS:

Model Name Model Description
summarizer_clinical_jsl This summarization model can quickly summarize clinical notes, encounters, critical care notes, discharge notes, reports, etc. It is based on a large language model that is finetuned with healthcare-specific additional data curated by medical doctors at John Snow Labs. It can generate summaries up to 512 tokens given an input text (max 1024 tokens).
summarizer_clinical_jsl_augmented This summarization model can quickly summarize clinical notes, encounters, critical care notes, discharge notes, reports, etc. It is based on a large language model that is finetuned with healthcare-specific additional data curated by medical doctors at John Snow Labs. This model is further optimized by augmenting the training methodology, and dataset. It can generate summaries up to 512 tokens given an input text (max 1024 tokens).
summarizer_biomedical_pubmed This summarization model can quickly summarize biomedical research or short papers. It is based on a large language model that is finetuned with healthcare-specific additional data curated by medical doctors at John Snow Labs. It can generate summaries up to 512 tokens given an input text (max 1024 tokens).
summarizer_generic_jsl A model optimized with custom data and training methodology from John Snow Labs, generating summaries from clinical notes.
summarizer_clinical_questions This summarization model efficiently summarizes medical questions from a range of clinical sources, providing concise summaries and generating relevant questions. It is based on a large language model that is finetuned with healthcare-specific additional data curated by medical doctors at John Snow Labs. It can generate summaries up to 512 tokens given an input text (max 1024 tokens).
summarizer_radiology This summarization model enables users to rapidly access a succinct synopsis of a report’s key findings without compromising on essential details. It is based on a large language model that is finetuned with healthcare-specific additional data curated by medical doctors at John Snow Labs. It can generate summaries up to 512 tokens given an input text (max 1024 tokens).
summarizer_clinical_guidelines_large The Medical Summarizer Model efficiently categorizes clinical guidelines into four sections for easy understanding. With a 768-token context length, it provides detailed yet concise summaries, leveraging data curated by doctors from John Snow Labs.
summarizer_clinical_laymen This model has been carefully fine-tuned with a custom dataset curated by John Snow Labs, expressly designed to minimize the use of clinical terminology in the generated summaries. The summarizer_clinical_laymen model is capable of producing summaries of up to 512 tokens from an input text of a maximum of 1024 tokens.

➤ License

Copyright / License info of the notebook. Copyright [2021] the Notebook Authors. The source in this notebook is provided subject to the Apache 2.0 License. All included or referenced third party libraries are subject to the licenses set forth below.

Library Name Library License Library License URL Library Source URL
Pandas BSD 3-Clause License https://github.com/pandas-dev/pandas/blob/master/LICENSE https://github.com/pandas-dev/pandas
Numpy BSD 3-Clause License https://github.com/numpy/numpy/blob/main/LICENSE.txt https://github.com/numpy/numpy
NLTK Apache License 2.0 https://github.com/nltk/nltk/blob/develop/LICENSE.txt https://github.com/nltk/nltk/tree/develop
textwrap MIT License https://github.com/mgeisler/textwrap/blob/master/LICENSE https://github.com/mgeisler/textwrap/tree/master
Apache Spark Apache License 2.0 https://github.com/apache/spark/blob/master/LICENSE https://github.com/apache/spark/tree/master/python/pyspark
Spark NLP Apache License 2.0 https://github.com/JohnSnowLabs/spark-nlp/blob/master/LICENSE https://github.com/JohnSnowLabs/spark-nlp
Spark NLP for Healthcare Proprietary license - John Snow Labs Inc. NA NA
Author
Databricks Inc.
John Snow Labs Inc.

➤ Disclaimers

Databricks Inc. (“Databricks”) does not dispense medical, diagnosis, or treatment advice. This Solution Accelerator (“tool”) is for informational purposes only and may not be used as a substitute for professional medical advice, treatment, or diagnosis. This tool may not be used within Databricks to process Protected Health Information (“PHI”) as defined in the Health Insurance Portability and Accountability Act of 1996, unless you have executed with Databricks a contract that allows for processing PHI, an accompanying Business Associate Agreement (BAA), and are running this notebook within a HIPAA Account. Please note that if you run this notebook within Azure Databricks, your contract with Microsoft applies.

➤ Instruction

To run this accelerator, set up JSL Partner Connect AWS, Azure and navigate to My Subscriptions tab. Make sure you have a valid subscription for the workspace you clone this repo into, then install on cluster as shown in the screenshot below, with the default options. You will receive an email from JSL when the installation completes.


Once the JSL installation completes successfully, clone this repo into a Databricks workspace. Attach the RUNME notebook to any cluster and execute the notebook via Run-All. A multi-step-job describing the accelerator pipeline will be created, and the link will be provided. Execute the multi-step-job to see how the pipeline runs.

jsl-medical-text-summarization's People

Contributors

dbbnicole avatar muhammetsnts avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.