|
---

language:
- en
license: apache-2.0
pipeline_tag: text-generation
tags:
- healthcare
- diabetes

model-index:
- name: HAH 2024 v0.11
  results:
  - task:
      name: Text Generation
      type: text-generation
    dataset:
      name: Custom Dataset (3000 review articles on diabetes)
      type: diabetes
    metrics:
    - name: Placeholder Metric for Development
      type: Placeholder Type
      value: 0

model-description:
  short-description: "HAH 2024 v0.1 is a language model fine-tuned specifically for generating text based on diabetes-related content. Leveraging a dataset constructed from 3000 open-source review articles, this model provides informative and contextually relevant answers to queries about diabetes care, research, and therapies."

intended-use:
  primary-use: "HAH 2024 v0.1 is intended for research purposes only."
  secondary-potential-uses:
  - "A prototype for researchers to assess (not for formal use in real-life cases) the generation of educational content about diabetes care and management for patients and the general public."
  - "Assessing the use of adapters to assist researchers in summarizing large volumes of diabetes-related literature."

limitations:
- "While HAH 2024 v0.1 generates contextually appropriate responses, it may occasionally produce outputs that require further verification."
- "The training dataset, being limited to published articles, might not capture all contemporary research or emerging trends in diabetes care."

training-data:
  description: "The training data for HAH 2024 v0.1 consists of 3000 open-source review articles about diabetes, curated to cover a wide range of topics within the field. The dataset was enriched with questions generated by prompting OpenAI GPT-4 to ensure diversity in content and perspectives."

training-procedure:
  description: "HAH 2024 v0.1 was fine-tuned on an A100 GPU using Google Colab. The fine-tuning process was monitored to maintain the model's relevance to diabetes-related content while minimizing biases that might arise from the dataset's specific nature."

---
|
|
|
# Model Card for HAH 2024 v0.1 |
|
|
|
|
|
|
## Model Details |
|
|
|
### Model Description |
|
|
|
HAH 2024 v0.1 aims to assess how an advanced language model fine-tuned for generating insights from diabetes-related healthcare data performs. HAH 2024 v0.1 is intended for research purposes only.
|
|
|
- **Developed by:** Dr M As'ad |
|
- **Funded by:** Self-funded
|
- **Model type:** Transformer-based language model |
|
- **Language(s) (NLP):** English |
|
- **License:** Apache-2.0 |
|
- **Finetuned from model:** Mistral 7B Instruct v0.2
|
|
|
## Uses |
|
|
|
### Direct Use |
|
|
|
HAH 2024 v0.1 is designed to assess performance when the model is used directly in a chat interface within the diabetes domain.
|
|
|
### Downstream Use
|
|
|
The model can also be fine-tuned for specialized tasks, such as subtypes or subgroups within the diabetes field.
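For such specialized fine-tuning, a parameter-efficient approach such as LoRA is a common starting point. The configuration sketch below is illustrative only — the hyperparameters and target modules are assumptions for a Mistral-style model, not the settings used to train HAH 2024 v0.1:

```python
from peft import LoraConfig

# Illustrative LoRA adapter configuration for further fine-tuning;
# these values are examples, not the ones used to train this model
lora_config = LoraConfig(
    r=16,                     # adapter rank
    lora_alpha=32,            # scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    bias="none",
    task_type="CAUSAL_LM",
)
```

A config like this can be passed to `peft.get_peft_model` so that only the small adapter matrices are trained, which keeps fine-tuning feasible on a single GPU.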
|
|
|
### Out-of-Scope Use |
|
|
|
This model is not recommended for non-English text or for contexts outside of healthcare.

It is a research project, not intended for deployment in any real chat interface.
|
|
|
## Bias, Risks, and Limitations |
|
|
|
The model may inherently carry biases from the training data related to diabetes literature, potentially reflecting the geographic and demographic focus of the sources. |
|
|
|
### Recommendations |
|
|
|
Users should verify the model-generated information with current medical guidelines and consider a manual review for sensitive applications. |
|
|
|
## How to Get Started with the Model |
|
|
|
Use the code below to get started with the model: |
|
|
|
```python |
|
from transformers import pipeline, AutoModelForCausalLM, AutoTokenizer |
|
|
|
# Load the fine-tuned model and tokenizer from the Hugging Face Hub

model = AutoModelForCausalLM.from_pretrained("drmasad/HAH_2024_v0.11")

tokenizer = AutoTokenizer.from_pretrained("drmasad/HAH_2024_v0.11")
|
|
|
# Setting up the instruction and the user prompt |
|
instructions = "You are an expert endocrinologist. Answer the query in accurate, informative language any patient will understand."

user_prompt = "What is diabetic retinopathy?"
|
|
|
# Using the pipeline for text-generation |
|
pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer, max_new_tokens=200)
|
|
|
# Mistral-Instruct format: the instruction and the user query both go inside [INST] ... [/INST]
# (the tokenizer adds the <s> BOS token automatically)
result = pipe(f"[INST] {instructions}\n{user_prompt} [/INST]")

# The pipeline output contains the prompt followed by the completion
generated_text = result[0]["generated_text"]

# Keep only the text generated after the closing [/INST] tag
answer = generated_text.split("[/INST]")[-1].strip()

# Print the answer
print(answer)
```
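The Mistral-Instruct prompt convention used above can be captured in a small stdlib-only helper (the instruction text here is illustrative), which makes the format explicit and easy to test:

```python
def build_prompt(system: str, user: str) -> str:
    """Wrap a system instruction and a user query in Mistral-Instruct [INST] tags."""
    return f"[INST] {system}\n{user} [/INST]"

prompt = build_prompt(
    "You are an expert endocrinologist.",
    "What is diabetic retinopathy?",
)
# prompt == "[INST] You are an expert endocrinologist.\nWhat is diabetic retinopathy? [/INST]"
```

Keeping the prompt construction in one place avoids subtle mismatches between the format used at fine-tuning time and the format used at inference time.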
|
|
|
|