---
language:
- en
license: apache-2.0
pipeline_tag: text-generation
tags:
- healthcare
- diabetes
model-index:
- name: HAH 2024 v0.11
results:
- task:
name: Text Generation
type: text-generation
dataset:
name: Custom Dataset (3000 review articles on diabetes)
type: diabetes
metrics:
- name: Placeholder Metric for Development
type: Placeholder Type
value: 0 # Temporary placeholder value
model-description:
short-description: "HAH 2024 v0.11 is a language model fine-tuned specifically for generating text based on diabetes-related content. Leveraging a dataset constructed from 3000 open-source review articles, this model provides informative and contextually relevant answers to various queries about diabetes care, research, and therapies."
intended-use:
primary-use: "HAH 2024 v0.11 is intended for research purposes only."
secondary-potential-uses:
- "A prototype for researchers to assess (not for formal use in real-life cases) the generation of educational content for patients and the general public about diabetes care and management."
- "Assessing the use of adapters to assist researchers in summarizing large volumes of diabetes-related literature."
limitations:
- "While HAH 2024 v0.11 excels at generating contextually appropriate responses, it may occasionally produce outputs that require further verification."
- "The training dataset, being limited to published articles, might not capture all contemporary research or emerging trends in diabetes care."
training-data:
description: "The training data for HAH 2024 v0.11 consists of 3000 open-source review articles about diabetes, carefully curated to cover a wide range of topics within the field. The dataset was enriched with questions generated by prompting OpenAI GPT-4 to ensure diversity in content and perspectives."
training-procedure:
description: "HAH 2024 v0.11 was fine-tuned on an A100 GPU using Google Colab. The fine-tuning process was carefully monitored to maintain the model's relevance to diabetes-related content while minimizing biases that might arise from the dataset's specific nature."
---
# Model Card for HAH 2024 v0.11
## Model Details
### Model Description
HAH 2024 v0.11 aims to assess how an advanced language model fine-tuned for generating insights from diabetes-related healthcare data performs. HAH 2024 v0.11 is intended for research purposes only.
- **Developed by:** Dr M As'ad
- **Funded by:** Self-funded
- **Model type:** Transformer-based language model
- **Language(s) (NLP):** English
- **License:** Apache-2.0
- **Finetuned from model:** Mistral 7B Instruct v0.2
## Uses
### Direct Use
HAH 2024 v0.11 is designed to assess performance for direct use in a chat interface within the diabetes domain.
### Downstream Use
The model can also be fine-tuned for specialized tasks, such as subtypes or subgroups within the diabetes field.
### Out-of-Scope Use
This model is not recommended for non-English text or for contexts outside of healthcare.
It is a research project and is not intended for deployment in any production chat interface.
## Bias, Risks, and Limitations
The model may inherently carry biases from the training data related to diabetes literature, potentially reflecting the geographic and demographic focus of the sources.
### Recommendations
Users should verify the model-generated information with current medical guidelines and consider a manual review for sensitive applications.
## How to Get Started with the Model
Use the code below to get started with the model:
```python
from transformers import pipeline, AutoModelForCausalLM, AutoTokenizer

# Load the fine-tuned model and tokenizer from the Hugging Face Hub
model = AutoModelForCausalLM.from_pretrained("drmasad/HAH_2024_v0.11")
tokenizer = AutoTokenizer.from_pretrained("drmasad/HAH_2024_v0.11")

# Set up the instruction and the user prompt
instructions = (
    "You are an expert endocrinologist. Answer the query in accurate, "
    "informative language any patient will understand."
)
user_prompt = "What is diabetic retinopathy?"

# Build a text-generation pipeline
pipe = pipeline(task="text-generation", model=model, tokenizer=tokenizer, max_length=200)

# Format the input with the special tokens [INST] and [/INST]
result = pipe(f"<s>[INST] {instructions} [/INST] {user_prompt}</s>")

# Keep only the text generated after the last occurrence of </s>
generated_text = result[0]["generated_text"]
answer = generated_text.split("</s>")[-1].strip()

# Print the answer
print(answer)