NeuralNovel committed
Commit 7b840c0
1 Parent(s): 4a45f61

Update README.md

Files changed (1)
  1. README.md +33 -113
README.md CHANGED
@@ -1,134 +1,54 @@
  ---
- datasets: Intel/orca_dpo_pairs
  license: apache-2.0
- base_model: NeuralNovel/Gecko-7B-v0.1
- model-index:
- - name: out
-   results: []
  ---

- [<img src="https://raw.githubusercontent.com/OpenAccess-AI-Collective/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/OpenAccess-AI-Collective/axolotl)
- <details><summary>See axolotl config</summary>
-
- axolotl version: `0.3.0`
- ```yaml
- base_model: NeuralNovel/Gecko-7B-v0.1
- model_type: MistralForCausalLM
- tokenizer_type: LlamaTokenizer
- is_mistral_derived_model: true
-
- load_in_8bit: false
- load_in_4bit: false
- strict: false
-
- datasets:
-   - path: Intel/orca_dpo_pairs
-     type:
-       system_prompt: ""
-       field_system: system
-       field_instruction: question
-       field_output: rejected
-       field_output: chosen
-       format: "[INST] {instruction} [/INST]"
-       no_input_format: "[INST] {instruction} [/INST]"
-
- dataset_prepared_path:
- val_set_size: 0.05
- output_dir: ./out
-
- sequence_len: 8192
- sample_packing: true
- pad_to_sequence_len: true
- eval_sample_packing: false
-
- wandb_project:
- wandb_entity:
- wandb_watch:
- wandb_name:
- wandb_log_model:
-
- gradient_accumulation_steps: 4
- micro_batch_size: 2
- num_epochs: 1
- optimizer: adamw_bnb_8bit
- lr_scheduler: cosine
- learning_rate: 0.000005
-
- train_on_inputs: false
- group_by_length: false
- bf16: true
- fp16: false
- tf32: false
-
- gradient_checkpointing: true
- early_stopping_patience:
- resume_from_checkpoint:
- local_rank:
- logging_steps: 1
- xformers_attention:
- flash_attention: true
-
- warmup_steps: 10
- evals_per_epoch: 4
- eval_table_size:
- eval_table_max_new_tokens: 128
- saves_per_epoch: 1
- debug:
- deepspeed:
- weight_decay: 0.0
- fsdp:
- fsdp_config:
- special_tokens:
-   bos_token: "<s>"
-   eos_token: "</s>"
-   unk_token: "<unk>"

- ```

- </details><br>
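For context, the `datasets` block in the config above maps Intel/orca_dpo_pairs fields onto axolotl's user-defined prompt format. The snippet below is an editorial illustration only (not part of either version of the card) of how one record would be rendered under the `format: "[INST] {instruction} [/INST]"` template; the example record itself is invented.

```python
# Editorial illustration only: render one Intel/orca_dpo_pairs-style record
# with the config's `format: "[INST] {instruction} [/INST]"` template.
# The record below is invented for demonstration.
example = {
    "system": "",
    "question": "What is 17 * 24?",
    "chosen": "17 * 24 = 408.",
    "rejected": "17 * 24 = 398.",
}

prompt = "[INST] {instruction} [/INST]".format(instruction=example["question"])
target = example["chosen"]  # the config maps field_output to chosen/rejected; chosen is used here

print(prompt)
print(target)
```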
- ## Model description

- More information needed

- ## Intended uses & limitations

- More information needed

- ## Training and evaluation data

- Trained using an A100 for 2 hours.

- ## Training procedure

- ### Training hyperparameters

- The following hyperparameters were used during training (a rough mapping to standard PyTorch/Transformers objects is sketched after the list):
- - learning_rate: 5e-06
- - train_batch_size: 2
- - eval_batch_size: 2
- - seed: 42
- - gradient_accumulation_steps: 4
- - total_train_batch_size: 8
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- - lr_scheduler_type: cosine
- - lr_scheduler_warmup_steps: 10
- - num_epochs: 1
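The sketch below is editorial, not from the card: it shows how the listed values could map onto standard PyTorch/Transformers objects. The stand-in model, the step count, and the use of plain AdamW in place of the 8-bit `adamw_bnb_8bit` optimizer from the axolotl config are illustrative assumptions.

```python
# Editorial sketch, not from the card: map the listed hyperparameters onto
# standard PyTorch / Transformers objects. Plain AdamW stands in for the
# 8-bit adamw_bnb_8bit optimizer used in the original run.
import torch
from transformers import get_cosine_schedule_with_warmup

model = torch.nn.Linear(8, 8)         # stand-in for the fine-tuned model

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=5e-6,                          # learning_rate
    betas=(0.9, 0.999),               # optimizer betas
    eps=1e-8,                         # optimizer epsilon
    weight_decay=0.0,                 # weight_decay from the config
)

num_training_steps = 75               # ~1 epoch at total batch size 8 (assumed)
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=10,              # lr_scheduler_warmup_steps
    num_training_steps=num_training_steps,
)
```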

- ### Training results

- | Training Loss | Epoch | Step | Validation Loss |
- |:-------------:|:-----:|:----:|:---------------:|
- | 1.1108 | 0.01 | 1 | 1.2742 |
- | 1.0158 | 0.26 | 19 | 0.8302 |
- | 0.8999 | 0.51 | 38 | 0.8009 |
- | 0.851 | 0.77 | 57 | 0.7924 |

- ### Framework versions

- - Transformers 4.37.0.dev0
- - Pytorch 2.0.1+cu117
- - Datasets 2.16.1
- - Tokenizers 0.15.0
  ---
  license: apache-2.0
+ base_model: mistralai/Mistral-7B-Instruct-v0.2
+ library_name: transformers
+ inference: false
  ---

+ ![Gecko](https://i.ibb.co/z5hMcBw/OIG-42.jpg)

+ # Gecko-7B-v0.1
+
+ Designed to generate instructive and narrative text, with a specific focus on mathematics and numeracy.
+
+ Gecko-7B-v0.1 is a full-parameter fine-tune (FFT) of Mistral-7B-Instruct-v0.2, released under the Apache 2.0 license.
+
+ You may download and use this model for research, training, and commercial purposes; it is suitable for commercial deployment.
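The updated card does not include a usage example. The following is a minimal editorial sketch of loading the model with the Transformers library; the repository id `NeuralNovel/Gecko-7B-v0.1`, the use of the built-in Mistral-Instruct chat template, and the generation settings are assumptions rather than part of the card.

```python
# Editorial sketch, not from the card: load and query the model with Transformers.
# Repository id, chat template usage, and generation settings are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "NeuralNovel/Gecko-7B-v0.1"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",  # requires the accelerate package
)

# Mistral-Instruct-style prompt applied via the tokenizer's chat template.
messages = [
    {"role": "user", "content": "A train travels 60 km in 45 minutes. What is its average speed in km/h?"}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```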
 
+ ### Dataset
+
+ The model was fine-tuned on the Neural-Mini-Math dataset (currently private).
+ ### Summary
+
+ Fine-tuned with the intention of following all prompt directions, making it more suitable for roleplay and problem solving.
+
+ #### Out-of-Scope Use
+
+ The model may not perform well in scenarios unrelated to instructive and narrative text generation. Misuse, or applications outside its designed scope, may result in suboptimal outcomes.
+
+ ### Bias, Risks, and Limitations
+
+ This model may not always work as intended, so all users are encouraged to use it with caution.
+
+ This model is for testing and research purposes only; it has reduced levels of alignment and, as a result, may produce NSFW or otherwise harmful content.
+ Users are responsible for its output and must use the model responsibly.
+ ### Hardware and Training
+
+ The model was trained with the following hyperparameters:
+
+ ```
+ n_epochs = 3
+ n_checkpoints = 3
+ batch_size = 12
+ learning_rate = 1e-5
+ ```
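These values are given without further context. As a rough editorial sketch (not from the card), they could correspond to a Transformers `TrainingArguments` setup along the following lines; the output path, save strategy, and precision setting are assumptions, and `n_checkpoints` is only approximated.

```python
# Editorial sketch, not from the card: approximate the listed hyperparameters
# with the Transformers Trainer API. Output path, save strategy, and precision
# are assumptions; n_checkpoints is approximated with save_total_limit.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./gecko-7b-finetune",   # assumed output path
    num_train_epochs=3,                  # n_epochs = 3
    per_device_train_batch_size=12,      # batch_size = 12
    learning_rate=1e-5,                  # learning_rate = 1e-5
    save_strategy="epoch",               # assumption: one checkpoint per epoch
    save_total_limit=3,                  # keeps roughly n_checkpoints = 3
    bf16=True,                           # precision choice is an assumption
)
```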
+
+ *Sincere appreciation to Techmind for their generous sponsorship.*