vhab10 committed
Commit a062dbe
1 Parent(s): 318b0c1

Create README.md

Files changed (1): README.md (+35, -0)
---
language: en
tags:
- llama
- quantization
- text-generation
- cpu
- gpu
- efficient-inference
license: apache-2.0
base_model:
- meta-llama/Llama-3.1-8B
---

# Llama 3.1 8B Q4_K_M GGUF Model

## Overview
This is a quantized version of the Llama 3.1 8B model in Q4_K_M format, optimized for efficient inference on both CPU and GPU. The model was quantized with the llama.cpp library, allowing it to run in resource-constrained environments. Quantization reduces the model's memory footprint while maintaining strong language-generation capabilities.

The model was originally trained by Meta AI and has been adapted to the GGUF format for compatibility with llama.cpp.
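
As a rough back-of-the-envelope illustration of the memory savings, the sketch below compares FP16 storage against Q4_K_M, assuming an effective rate of about 4.85 bits per weight for Q4_K_M (a commonly cited figure, used here as an assumption; actual GGUF file sizes also include metadata and vary by tensor layout):

```python
# Illustrative estimate only; real file sizes differ somewhat.
PARAMS = 8_000_000_000           # nominal 8B parameters

fp16_bytes = PARAMS * 2          # FP16 stores 16 bits (2 bytes) per weight
q4_k_m_bytes = PARAMS * 4.85 / 8 # assumed ~4.85 effective bits per weight

print(f"FP16:   {fp16_bytes / 1e9:.1f} GB")
print(f"Q4_K_M: {q4_k_m_bytes / 1e9:.1f} GB")
print(f"reduction: {fp16_bytes / q4_k_m_bytes:.1f}x")
```

Under these assumptions the quantized model needs roughly a third of the FP16 memory, which is what makes CPU and single-GPU inference practical.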

## Model Details
- **Base Model**: meta-llama/Llama-3.1-8B
- **Quantization Type**: Q4_K_M (4-bit k-quant, medium variant)
- **Model Size**: 8B parameters
- **Format**: GGUF (single-file format for efficient loading in llama.cpp)
- **Intended Use**: Text generation; inference on CPUs/GPUs with a reduced memory footprint

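To illustrate the idea behind 4-bit quantization, here is a deliberately simplified block-quantization sketch. This is not llama.cpp's actual Q4_K_M codec (which uses super-blocks with per-sub-block scales and minimums); it only shows how a block of floats can be stored as one scale plus 4-bit codes:

```python
# Simplified 4-bit block quantization (illustration, NOT the real Q4_K_M scheme).
# Each block of weights is stored as one float scale plus small integer codes,
# cutting per-weight storage from 16 bits to roughly 4 bits.

def quantize_block(weights):
    """Map a block of floats to codes in 0..14 plus a shared scale."""
    max_abs = max(abs(w) for w in weights) or 1.0
    scale = max_abs / 7.0                       # symmetric range -7..7
    codes = [max(-7, min(7, round(w / scale))) + 7 for w in weights]
    return scale, codes

def dequantize_block(scale, codes):
    """Reconstruct approximate floats from scale + codes."""
    return [(c - 7) * scale for c in codes]

block = [0.12, -0.53, 0.31, 0.02, -0.17, 0.44, -0.29, 0.08]
scale, codes = quantize_block(block)
restored = dequantize_block(scale, codes)
max_err = max(abs(a - b) for a, b in zip(block, restored))
print(codes)
print(f"max reconstruction error: {max_err:.3f}")
```

The reconstruction error is bounded by half the block's scale, which is why per-block scales keep 4-bit weights usable for generation quality.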
## Intended Use
The model is intended for text-generation tasks and is optimized for efficient inference on both CPUs and GPUs, making it suitable for resource-constrained environments.

## License
This model is licensed under the Apache 2.0 License.

---