---
language:
- en
license: apache-2.0
pipeline_tag: text-generation
tags:
- healthcare
- diabetes
model-index:
- name: HAH 2024 v0.11
results:
- task:
name: Text Generation
type: text-generation
dataset:
name: Custom Dataset (3000 review articles on diabetes)
type: diabetes
metrics:
- name: Placeholder Metric for Development
type: Placeholder Type
value: 0 # Temporary placeholder value
model-description:
short-description: "HAH 2024 v0.11 is a language model fine-tuned specifically for generating text based on diabetes-related content. Leveraging a dataset constructed from 3000 open-source review articles, this model provides informative and contextually relevant answers to various queries about diabetes care, research, and therapies."
intended-use:
primary-use: "HAH 2024 v0.11 is intended for research purposes only."
secondary-potential-uses:
- "A prototype for researchers to assess (not for formal use in real-life cases) the generation of educational content for patients and the general public about diabetes care and management."
- "Assessing the use of adapters to assist researchers in summarizing large volumes of diabetes-related literature."
limitations:
- "While HAH 2024 v0.11 excels at generating contextually appropriate responses, it may occasionally produce outputs that require further verification."
- "The training dataset, being limited to published articles, might not capture all contemporary research or emerging trends in diabetes care."
training-data:
description: "The training data for HAH 2024 v0.11 consists of 3000 open-source review articles about diabetes, carefully curated to cover a wide range of topics within the field. The dataset was enriched with questions generated by prompting OpenAI GPT-4 to ensure diversity in content and perspectives."
training-procedure:
description: "HAH 2024 v0.11 was fine-tuned on an A100 GPU using Google Colab. The fine-tuning process was carefully monitored to maintain the model's relevance to diabetes-related content while minimizing biases that might arise from the dataset's specific nature."
---
# Model Card for HAH 2024 v0.11
## Model Details
### Model Description
HAH 2024 v0.11 aims to assess how an advanced language model fine-tuned for generating insights from diabetes-related healthcare data performs. HAH 2024 v0.11 is intended for research purposes only.
- **Developed by:** Dr M As'ad
- **Funded by:** Self-funded
- **Model type:** Transformer-based language model
- **Language(s) (NLP):** English
- **License:** Apache-2.0
- **Finetuned from model:** Mistral 7B Instruct v0.2
## Uses
### Direct Use
HAH 2024 v0.11 is designed to assess performance for direct use in a chat interface within the diabetes domain.
### Downstream Use
The model can also be fine-tuned for specialized tasks, such as subtypes or subgroups within the diabetes field.
### Out-of-Scope Use
This model is not recommended for non-English text or for contexts outside of healthcare.
It is a research project and is not intended for deployment in any production chat interface.
## Bias, Risks, and Limitations
The model may inherently carry biases from the training data related to diabetes literature, potentially reflecting the geographic and demographic focus of the sources.
### Recommendations
Users should verify the model-generated information with current medical guidelines and consider a manual review for sensitive applications.
## How to Get Started with the Model
Use the code below to get started with the model:
```python
from transformers import pipeline, AutoModelForCausalLM, AutoTokenizer

# Load the fine-tuned model and tokenizer from the Hugging Face Hub
model = AutoModelForCausalLM.from_pretrained("drmasad/HAH_2024_v0.11")
tokenizer = AutoTokenizer.from_pretrained("drmasad/HAH_2024_v0.11")

# Set up the instruction and the user prompt
instructions = (
    "You are an expert endocrinologist. Answer the query in accurate, "
    "informative language any patient will understand."
)
user_prompt = "What is diabetic retinopathy?"

# Build a text-generation pipeline
pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer, max_length=200)

# Format the input with the special tokens [INST] and [/INST]
result = pipe(f"<s>[INST] {instructions} [/INST] {user_prompt}</s>")

# Keep only the text generated after the last occurrence of </s>
generated_text = result[0]["generated_text"]
answer = generated_text.split("</s>")[-1].strip()

# Print the answer
print(answer)