---
license: apache-2.0
datasets:
- cxllin/medinstruct
language:
- en
metrics:
- accuracy
library_name: transformers
pipeline_tag: question-answering
tags:
- medical
---
# cxllin/Llama2-7b-med-v1

## Model Details

### Description

The **cxllin/Llama2-7b-med-v1** model is derived from a Llama 2 7B model and is intended to specialize in natural language processing tasks within the medical domain.

#### Development Details
- **Developer**: Collin Heenan 
- **Model Architecture**: Transformer 
- **Base Model**: [Llama-2-7b](https://huggingface.co/NousResearch/Nous-Hermes-llama-2-7b)
- **Primary Language**: English
- **License**: Apache 2.0

### Model Source Links
- **Repository**: Not Specified
- **Paper**: [Jin, Di, et al. "What Disease does this Patient Have?..."](https://github.com/jind11/MedQA)

### Direct Applications
The model is intended for NLP tasks within the medical domain, such as:
- Medical text generation or summarization.
- Question answering related to medical topics.

### Downstream Applications
Potential downstream applications include:
- Healthcare chatbot development.
- Information extraction from medical documentation.

### Out-of-Scope Uses
- Rendering definitive medical diagnoses or advice.
- Deployment in critical healthcare systems without stringent validation.
- Use in any high-stakes or legal context without thorough expert validation.

## Bias, Risks, and Limitations

- **Biases**: The model may reproduce biases present in its training data, which can skew its outputs.
- **Risks**: The model may generate inaccurate or misleading medical information.
- **Limitations**: The model may perform poorly on highly specialized or novel medical topics.

### Recommendations for Use
Users are urged to:
- Confirm outputs via expert medical review, especially in professional contexts.
- Employ the model judiciously, adhering to pertinent legal and ethical guidelines.
- Maintain transparency with end-users regarding the model’s capabilities and limitations.

## Getting Started with the Model

Details regarding model deployment and interaction remain to be provided.
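
Until official instructions are published, the following is a minimal sketch of how the model might be loaded with the `transformers` library. It rests on several assumptions that this card does not confirm: that the repository contains full causal language model weights loadable through `AutoModelForCausalLM`, and that an instruction-style prompt is suitable. The `build_prompt` template and the `generate_answer` helper are hypothetical names introduced for illustration only.

```python
def build_prompt(question: str) -> str:
    """Wrap a question in a hypothetical instruction-style template.

    The model card does not document a prompt format, so this layout
    is an assumption and may need to be adjusted.
    """
    return f"### Instruction:\n{question.strip()}\n\n### Response:\n"


def generate_answer(
    question: str,
    model_id: str = "cxllin/Llama2-7b-med-v1",
    max_new_tokens: int = 128,
) -> str:
    """Generate a completion for a medical question (illustrative only)."""
    # Imported lazily so build_prompt can be used without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumption: the repository ships a tokenizer and full model weights.
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    inputs = tokenizer(build_prompt(question), return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Keep only the tokens produced after the prompt.
    prompt_length = inputs["input_ids"].shape[1]
    new_tokens = output_ids[0][prompt_length:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate_answer("What are common symptoms of iron-deficiency anemia?")` would download the weights on first use and run on CPU by default, which is slow for a 7B model. Any output should be reviewed as described under Recommendations for Use.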

### Training Dataset
- **Dataset Source**: [cxllin/medinstruct](https://huggingface.co/datasets/cxllin/medinstruct)
- **Size**: 10.2k rows
- **Scope**: Medical exam-related question-answering data.
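
To inspect the training data, a sketch along the following lines may help. It assumes the dataset can be loaded with the `datasets` library and exposes a `train` split, neither of which this card states. The helper names `load_medinstruct` and `preview_rows` are hypothetical, and no column names are assumed.

```python
def load_medinstruct(split: str = "train"):
    """Load the dataset from the Hugging Face Hub.

    Assumption: a split with this name exists; the card does not list splits.
    """
    # Imported lazily so preview_rows works without the datasets package.
    from datasets import load_dataset

    return load_dataset("cxllin/medinstruct", split=split)


def preview_rows(rows, n: int = 3):
    """Return the first n rows as plain dicts, whatever columns they contain."""
    return [dict(row) for _, row in zip(range(n), rows)]
```

For example, `preview_rows(load_medinstruct())` would return the first three records, which shows the column layout before any preprocessing is designed.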

#### Preprocessing Steps
Details regarding data cleaning, tokenization, and special term handling during training are not specified.

## Citation

@article{jin2020disease,
  title={What Disease does this Patient Have? A Large-scale Open Domain Question Answering Dataset from Medical Exams},
  author={Jin, Di and Pan, Eileen and Oufattole, Nassim and Weng, Wei-Hung and Fang, Hanyi and Szolovits, Peter},
  journal={arXiv preprint arXiv:2009.13081},
  year={2020}
}