
Llama2-Fin-Summarizer

Model Description

This is a fine-tuned version of the LLaMA2 7B model, quantized to 4-bit precision, specifically trained for financial text summarization. The model was fine-tuned on a custom dataset of 200+ large financial documents, allowing it to generate concise and accurate summaries of financial reports, articles, and other related documents.

Model Details:

  • Base Model: LLaMA2 7B
  • Fine-tuning Dataset: Custom dataset with 200+ large financial documents
  • Quantization: 4-bit (low memory usage)
  • Task: Financial text summarization
  • Trainable Parameters: Fine-tuned with parameter-efficient fine-tuning (PEFT, via LoRA adapters), so only a small subset of parameters was updated during training; an illustrative adapter configuration is sketched below.
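
The adapter hyperparameters are not published on this card; the configuration below is only a plausible sketch of the lora_config referenced in the training snippet further down, with assumed rank, alpha, dropout, and target modules.

from peft import LoraConfig

# Hypothetical LoRA configuration -- every value here is an assumption,
# not the setting used to train this checkpoint.
lora_config = LoraConfig(
    r=16,                                  # adapter rank (assumed)
    lora_alpha=32,                         # scaling factor (assumed)
    lora_dropout=0.05,                     # adapter dropout (assumed)
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "v_proj"],   # typical LLaMA attention projections (assumed)
)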

How to Use the Model

Installation

To use this model, you need to install the required Python libraries:

pip install accelerate peft bitsandbytes git+https://github.com/huggingface/transformers py7zr
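
The inference example below moves tensors to the GPU and loads the weights in 4-bit via bitsandbytes, so a CUDA-capable GPU is required. A quick sanity check (a convenience snippet, not part of the original card):

import torch

# 4-bit loading via bitsandbytes and the .cuda() call below need a CUDA GPU.
assert torch.cuda.is_available(), "A CUDA-capable GPU is required for 4-bit inference."
print(torch.cuda.get_device_name(0))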

Input/Output Format

  • Input: The model accepts text input only.
  • Output: The model generates summarized text output only.

Loading the Model with Hugging Face Transformers and PEFT

from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer
import torch

# Hub repository containing the fine-tuned PEFT adapter
peft_model_dir = "Karthikeyan-M3011/llama2-fin-summarizer"

# Load the base model together with the adapter in 4-bit precision
trained_model = AutoPeftModelForCausalLM.from_pretrained(
    peft_model_dir,
    low_cpu_mem_usage=True,
    torch_dtype=torch.float16,
    load_in_4bit=True,
)
tokenizer = AutoTokenizer.from_pretrained(peft_model_dir)

Inference with Llama2-Fin-Summarizer

query = 'Your text to summarize'
dialogue = query[:10000]  # keep only the first 10,000 characters; the tokenizer truncates further to the model's context window

prompt = f"""
Summarize the following conversation.

### Input:
{dialogue}

### Summary:
"""

input_ids = tokenizer(prompt, return_tensors='pt', truncation=True).input_ids.cuda()
outputs = trained_model.generate(input_ids=input_ids, max_new_tokens=200)
# batch_decode returns the prompt followed by the completion, so strip the prompt
output = tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True)[0][len(prompt):]

dash_line = '-' * 100
print(dash_line)
print(f'INPUT PROMPT:\n{prompt}')
print(dash_line)
print(f'TRAINED MODEL GENERATED TEXT:\n{output}')
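
The call above uses greedy decoding capped at 200 new tokens. If summaries come out too short or repetitive, the standard generate() arguments can be adjusted; the values below are illustrative and have not been tuned for this checkpoint.

# Illustrative alternative generation settings (assumed, not tuned):
outputs = trained_model.generate(
    input_ids=input_ids,
    max_new_tokens=300,      # allow a longer summary
    num_beams=4,             # beam search instead of greedy decoding
    no_repeat_ngram_size=3,  # discourage repeated phrases
    early_stopping=True,
)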

Limitations

  • Dataset Bias: The model was fine-tuned on a relatively small dataset (200+ financial documents), so its summaries may reflect the style and coverage of that corpus and generalize poorly outside it.
  • Quantization Effects: The 4-bit quantization reduces memory usage but may introduce slight inaccuracies compared to models using higher precision.
  • Context Limitations: The inference example truncates the input to its first 10,000 characters, and the tokenizer truncates further to the model's context window, so very long documents cannot be summarized in a single pass; a chunked workaround is sketched below.
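
One way around the single-pass limit is to summarize a long document chunk by chunk and then summarize the concatenated partial summaries. The helper below is a minimal sketch of that idea, reusing the trained_model and tokenizer loaded earlier; the chunk size is an assumption and this approach is not part of the original pipeline.

PROMPT_TEMPLATE = """
Summarize the following conversation.

### Input:
{chunk}

### Summary:
"""

def summarize_chunk(text):
    # Summarize a single chunk with the same prompt format used above.
    prompt = PROMPT_TEMPLATE.format(chunk=text)
    input_ids = tokenizer(prompt, return_tensors='pt', truncation=True).input_ids.cuda()
    outputs = trained_model.generate(input_ids=input_ids, max_new_tokens=200)
    decoded = tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True)[0]
    return decoded[len(prompt):]

def summarize_long_document(document, chunk_chars=8000):
    # Split on character boundaries (assumed size), summarize each chunk,
    # then summarize the combined partial summaries.
    chunks = [document[i:i + chunk_chars] for i in range(0, len(document), chunk_chars)]
    partial_summaries = [summarize_chunk(c) for c in chunks]
    return summarize_chunk("\n".join(partial_summaries))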

Training Parameters

The model was fine-tuned using the following training parameters:

from transformers import TrainingArguments

training_arguments = TrainingArguments(
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,   # effective batch size of 16 per device
    optim="paged_adamw_32bit",
    logging_steps=1,
    learning_rate=1e-4,
    fp16=True,
    max_grad_norm=0.3,
    num_train_epochs=4,
    evaluation_strategy="steps",
    eval_steps=0.2,                  # evaluate every 20% of the training steps
    warmup_ratio=0.05,
    save_strategy="epoch",
    group_by_length=True,
    output_dir=OUTPUT_DIR,           # output directory defined earlier in the training script
    report_to="tensorboard",
    save_safetensors=True,
    lr_scheduler_type="cosine",
    seed=42,
)
model.config.use_cache = False       # disable the KV cache during training
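
The variables model and OUTPUT_DIR above (and train_data, validation_data, and lora_config below) come from earlier in the training script and are not reproduced on this card. As a hedged illustration only, a 4-bit quantized LLaMA2 7B base model prepared for LoRA fine-tuning could be set up roughly like this; the exact quantization and tokenizer settings used for this checkpoint are not documented here.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import prepare_model_for_kbit_training

BASE_MODEL = "meta-llama/Llama-2-7b-hf"   # assumed base checkpoint
OUTPUT_DIR = "./llama2-fin-summarizer"    # assumed output path

# Assumed 4-bit quantization settings (NF4 with fp16 compute)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)   # make the quantized model trainable with adapters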

Training Execution

from trl import SFTTrainer

# train_data, validation_data, and lora_config are prepared earlier in the
# training script (lora_config is a peft.LoraConfig; see the sketch above).
trainer = SFTTrainer(
    model=model,
    train_dataset=train_data,
    eval_dataset=validation_data,
    peft_config=lora_config,
    dataset_text_field="text",
    max_seq_length=1024,
    tokenizer=tokenizer,
    args=training_arguments,
)

trainer.train()
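
After training, the LoRA adapter and tokenizer are typically saved (and optionally pushed to the Hub) so that they can be reloaded with AutoPeftModelForCausalLM as shown above. The snippet below is a generic sketch of that step, not the exact commands used to publish this repository.

# Persist the adapter and tokenizer (sketch).
trainer.save_model(OUTPUT_DIR)
tokenizer.save_pretrained(OUTPUT_DIR)

# Optionally push to the Hugging Face Hub:
# trainer.model.push_to_hub("Karthikeyan-M3011/llama2-fin-summarizer")
# tokenizer.push_to_hub("Karthikeyan-M3011/llama2-fin-summarizer")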

Authors

  • Karthikeyan M (Karthikeyan-M3011 on Hugging Face)

Citation

If you use this model in your research or applications, please cite it as follows:

@misc{llama2-fin-summarizer,
  author = {Karthikeyan M},
  title = {Fine-tuned LLaMA2 7B Model for Financial Summarization},
  year = {2024},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Karthikeyan-M3011/llama2-fin-summarizer}},
}