gokceuludogan committed on
Commit
c057932
1 Parent(s): 0d484bb

Upload trainer_state.json

Files changed (1)
  1. trainer_state.json +2475 -0
trainer_state.json ADDED
@@ -0,0 +1,2475 @@
1
+ {
2
+ "best_metric": null,
3
+ "best_model_checkpoint": null,
4
+ "epoch": 10.0,
5
+ "global_step": 198580,
6
+ "is_hyper_param_search": false,
7
+ "is_local_process_zero": true,
8
+ "is_world_process_zero": true,
9
+ "log_history": [
10
+ {
11
+ "epoch": 0.03,
12
+ "learning_rate": 4.987410615369121e-05,
13
+ "loss": 1.5195,
14
+ "step": 500
15
+ },
16
+ {
17
+ "epoch": 0.05,
18
+ "learning_rate": 4.974821230738242e-05,
19
+ "loss": 0.9589,
20
+ "step": 1000
21
+ },
22
+ {
23
+ "epoch": 0.08,
24
+ "learning_rate": 4.962231846107363e-05,
25
+ "loss": 0.849,
26
+ "step": 1500
27
+ },
28
+ {
29
+ "epoch": 0.1,
30
+ "learning_rate": 4.949642461476483e-05,
31
+ "loss": 0.7958,
32
+ "step": 2000
33
+ },
34
+ {
35
+ "epoch": 0.13,
36
+ "learning_rate": 4.937053076845604e-05,
37
+ "loss": 0.7606,
38
+ "step": 2500
39
+ },
40
+ {
41
+ "epoch": 0.15,
42
+ "learning_rate": 4.9244636922147244e-05,
43
+ "loss": 0.7385,
44
+ "step": 3000
45
+ },
46
+ {
47
+ "epoch": 0.18,
48
+ "learning_rate": 4.911874307583846e-05,
49
+ "loss": 0.7186,
50
+ "step": 3500
51
+ },
52
+ {
53
+ "epoch": 0.2,
54
+ "learning_rate": 4.899284922952966e-05,
55
+ "loss": 0.6996,
56
+ "step": 4000
57
+ },
58
+ {
59
+ "epoch": 0.23,
60
+ "learning_rate": 4.886695538322087e-05,
61
+ "loss": 0.6902,
62
+ "step": 4500
63
+ },
64
+ {
65
+ "epoch": 0.25,
66
+ "learning_rate": 4.874106153691208e-05,
67
+ "loss": 0.6809,
68
+ "step": 5000
69
+ },
70
+ {
71
+ "epoch": 0.28,
72
+ "learning_rate": 4.861516769060329e-05,
73
+ "loss": 0.6723,
74
+ "step": 5500
75
+ },
76
+ {
77
+ "epoch": 0.3,
78
+ "learning_rate": 4.848927384429449e-05,
79
+ "loss": 0.6689,
80
+ "step": 6000
81
+ },
82
+ {
83
+ "epoch": 0.33,
84
+ "learning_rate": 4.8363379997985705e-05,
85
+ "loss": 0.6621,
86
+ "step": 6500
87
+ },
88
+ {
89
+ "epoch": 0.35,
90
+ "learning_rate": 4.823748615167691e-05,
91
+ "loss": 0.6569,
92
+ "step": 7000
93
+ },
94
+ {
95
+ "epoch": 0.38,
96
+ "learning_rate": 4.811159230536812e-05,
97
+ "loss": 0.6519,
98
+ "step": 7500
99
+ },
100
+ {
101
+ "epoch": 0.4,
102
+ "learning_rate": 4.798569845905932e-05,
103
+ "loss": 0.6445,
104
+ "step": 8000
105
+ },
106
+ {
107
+ "epoch": 0.43,
108
+ "learning_rate": 4.785980461275053e-05,
109
+ "loss": 0.643,
110
+ "step": 8500
111
+ },
112
+ {
113
+ "epoch": 0.45,
114
+ "learning_rate": 4.7733910766441734e-05,
115
+ "loss": 0.6377,
116
+ "step": 9000
117
+ },
118
+ {
119
+ "epoch": 0.48,
120
+ "learning_rate": 4.760801692013295e-05,
121
+ "loss": 0.6386,
122
+ "step": 9500
123
+ },
124
+ {
125
+ "epoch": 0.5,
126
+ "learning_rate": 4.748212307382415e-05,
127
+ "loss": 0.6264,
128
+ "step": 10000
129
+ },
130
+ {
131
+ "epoch": 0.53,
132
+ "learning_rate": 4.7356229227515365e-05,
133
+ "loss": 0.6231,
134
+ "step": 10500
135
+ },
136
+ {
137
+ "epoch": 0.55,
138
+ "learning_rate": 4.723033538120657e-05,
139
+ "loss": 0.6254,
140
+ "step": 11000
141
+ },
142
+ {
143
+ "epoch": 0.58,
144
+ "learning_rate": 4.710444153489778e-05,
145
+ "loss": 0.6205,
146
+ "step": 11500
147
+ },
148
+ {
149
+ "epoch": 0.6,
150
+ "learning_rate": 4.697854768858899e-05,
151
+ "loss": 0.6222,
152
+ "step": 12000
153
+ },
154
+ {
155
+ "epoch": 0.63,
156
+ "learning_rate": 4.685265384228019e-05,
157
+ "loss": 0.6156,
158
+ "step": 12500
159
+ },
160
+ {
161
+ "epoch": 0.65,
162
+ "learning_rate": 4.6726759995971395e-05,
163
+ "loss": 0.612,
164
+ "step": 13000
165
+ },
166
+ {
167
+ "epoch": 0.68,
168
+ "learning_rate": 4.660086614966261e-05,
169
+ "loss": 0.6182,
170
+ "step": 13500
171
+ },
172
+ {
173
+ "epoch": 0.71,
174
+ "learning_rate": 4.647497230335381e-05,
175
+ "loss": 0.6127,
176
+ "step": 14000
177
+ },
178
+ {
179
+ "epoch": 0.73,
180
+ "learning_rate": 4.634907845704502e-05,
181
+ "loss": 0.6087,
182
+ "step": 14500
183
+ },
184
+ {
185
+ "epoch": 0.76,
186
+ "learning_rate": 4.622318461073623e-05,
187
+ "loss": 0.6062,
188
+ "step": 15000
189
+ },
190
+ {
191
+ "epoch": 0.78,
192
+ "learning_rate": 4.609729076442744e-05,
193
+ "loss": 0.604,
194
+ "step": 15500
195
+ },
196
+ {
197
+ "epoch": 0.81,
198
+ "learning_rate": 4.597139691811864e-05,
199
+ "loss": 0.6029,
200
+ "step": 16000
201
+ },
202
+ {
203
+ "epoch": 0.83,
204
+ "learning_rate": 4.5845503071809856e-05,
205
+ "loss": 0.6024,
206
+ "step": 16500
207
+ },
208
+ {
209
+ "epoch": 0.86,
210
+ "learning_rate": 4.571960922550106e-05,
211
+ "loss": 0.599,
212
+ "step": 17000
213
+ },
214
+ {
215
+ "epoch": 0.88,
216
+ "learning_rate": 4.559371537919227e-05,
217
+ "loss": 0.5967,
218
+ "step": 17500
219
+ },
220
+ {
221
+ "epoch": 0.91,
222
+ "learning_rate": 4.546782153288347e-05,
223
+ "loss": 0.5961,
224
+ "step": 18000
225
+ },
226
+ {
227
+ "epoch": 0.93,
228
+ "learning_rate": 4.534192768657468e-05,
229
+ "loss": 0.5954,
230
+ "step": 18500
231
+ },
232
+ {
233
+ "epoch": 0.96,
234
+ "learning_rate": 4.521603384026589e-05,
235
+ "loss": 0.5927,
236
+ "step": 19000
237
+ },
238
+ {
239
+ "epoch": 0.98,
240
+ "learning_rate": 4.50901399939571e-05,
241
+ "loss": 0.5859,
242
+ "step": 19500
243
+ },
244
+ {
245
+ "epoch": 1.0,
246
+ "eval_loss": 0.6686062812805176,
247
+ "eval_runtime": 51.5648,
248
+ "eval_samples_per_second": 342.346,
249
+ "step": 19858
250
+ },
251
+ {
252
+ "epoch": 1.01,
253
+ "learning_rate": 4.49642461476483e-05,
254
+ "loss": 0.5871,
255
+ "step": 20000
256
+ },
257
+ {
258
+ "epoch": 1.03,
259
+ "learning_rate": 4.4838352301339516e-05,
260
+ "loss": 0.579,
261
+ "step": 20500
262
+ },
263
+ {
264
+ "epoch": 1.06,
265
+ "learning_rate": 4.471245845503072e-05,
266
+ "loss": 0.5806,
267
+ "step": 21000
268
+ },
269
+ {
270
+ "epoch": 1.08,
271
+ "learning_rate": 4.458656460872193e-05,
272
+ "loss": 0.5788,
273
+ "step": 21500
274
+ },
275
+ {
276
+ "epoch": 1.11,
277
+ "learning_rate": 4.446067076241314e-05,
278
+ "loss": 0.5765,
279
+ "step": 22000
280
+ },
281
+ {
282
+ "epoch": 1.13,
283
+ "learning_rate": 4.433477691610434e-05,
284
+ "loss": 0.5756,
285
+ "step": 22500
286
+ },
287
+ {
288
+ "epoch": 1.16,
289
+ "learning_rate": 4.4208883069795545e-05,
290
+ "loss": 0.5756,
291
+ "step": 23000
292
+ },
293
+ {
294
+ "epoch": 1.18,
295
+ "learning_rate": 4.408298922348676e-05,
296
+ "loss": 0.5758,
297
+ "step": 23500
298
+ },
299
+ {
300
+ "epoch": 1.21,
301
+ "learning_rate": 4.3957095377177963e-05,
302
+ "loss": 0.5712,
303
+ "step": 24000
304
+ },
305
+ {
306
+ "epoch": 1.23,
307
+ "learning_rate": 4.3831201530869176e-05,
308
+ "loss": 0.5708,
309
+ "step": 24500
310
+ },
311
+ {
312
+ "epoch": 1.26,
313
+ "learning_rate": 4.370530768456038e-05,
314
+ "loss": 0.5684,
315
+ "step": 25000
316
+ },
317
+ {
318
+ "epoch": 1.28,
319
+ "learning_rate": 4.357941383825159e-05,
320
+ "loss": 0.5738,
321
+ "step": 25500
322
+ },
323
+ {
324
+ "epoch": 1.31,
325
+ "learning_rate": 4.34535199919428e-05,
326
+ "loss": 0.5697,
327
+ "step": 26000
328
+ },
329
+ {
330
+ "epoch": 1.33,
331
+ "learning_rate": 4.3327626145634006e-05,
332
+ "loss": 0.5659,
333
+ "step": 26500
334
+ },
335
+ {
336
+ "epoch": 1.36,
337
+ "learning_rate": 4.320173229932521e-05,
338
+ "loss": 0.569,
339
+ "step": 27000
340
+ },
341
+ {
342
+ "epoch": 1.38,
343
+ "learning_rate": 4.307583845301642e-05,
344
+ "loss": 0.5688,
345
+ "step": 27500
346
+ },
347
+ {
348
+ "epoch": 1.41,
349
+ "learning_rate": 4.2949944606707624e-05,
350
+ "loss": 0.5695,
351
+ "step": 28000
352
+ },
353
+ {
354
+ "epoch": 1.44,
355
+ "learning_rate": 4.282405076039883e-05,
356
+ "loss": 0.5659,
357
+ "step": 28500
358
+ },
359
+ {
360
+ "epoch": 1.46,
361
+ "learning_rate": 4.269815691409004e-05,
362
+ "loss": 0.5641,
363
+ "step": 29000
364
+ },
365
+ {
366
+ "epoch": 1.49,
367
+ "learning_rate": 4.257226306778125e-05,
368
+ "loss": 0.5628,
369
+ "step": 29500
370
+ },
371
+ {
372
+ "epoch": 1.51,
373
+ "learning_rate": 4.2446369221472454e-05,
374
+ "loss": 0.5611,
375
+ "step": 30000
376
+ },
377
+ {
378
+ "epoch": 1.54,
379
+ "learning_rate": 4.2320475375163666e-05,
380
+ "loss": 0.5604,
381
+ "step": 30500
382
+ },
383
+ {
384
+ "epoch": 1.56,
385
+ "learning_rate": 4.219458152885487e-05,
386
+ "loss": 0.5647,
387
+ "step": 31000
388
+ },
389
+ {
390
+ "epoch": 1.59,
391
+ "learning_rate": 4.2068687682546085e-05,
392
+ "loss": 0.5629,
393
+ "step": 31500
394
+ },
395
+ {
396
+ "epoch": 1.61,
397
+ "learning_rate": 4.194279383623729e-05,
398
+ "loss": 0.5619,
399
+ "step": 32000
400
+ },
401
+ {
402
+ "epoch": 1.64,
403
+ "learning_rate": 4.181689998992849e-05,
404
+ "loss": 0.5574,
405
+ "step": 32500
406
+ },
407
+ {
408
+ "epoch": 1.66,
409
+ "learning_rate": 4.16910061436197e-05,
410
+ "loss": 0.5568,
411
+ "step": 33000
412
+ },
413
+ {
414
+ "epoch": 1.69,
415
+ "learning_rate": 4.156511229731091e-05,
416
+ "loss": 0.5585,
417
+ "step": 33500
418
+ },
419
+ {
420
+ "epoch": 1.71,
421
+ "learning_rate": 4.1439218451002114e-05,
422
+ "loss": 0.556,
423
+ "step": 34000
424
+ },
425
+ {
426
+ "epoch": 1.74,
427
+ "learning_rate": 4.1313324604693327e-05,
428
+ "loss": 0.5582,
429
+ "step": 34500
430
+ },
431
+ {
432
+ "epoch": 1.76,
433
+ "learning_rate": 4.118743075838453e-05,
434
+ "loss": 0.5541,
435
+ "step": 35000
436
+ },
437
+ {
438
+ "epoch": 1.79,
439
+ "learning_rate": 4.106153691207574e-05,
440
+ "loss": 0.5524,
441
+ "step": 35500
442
+ },
443
+ {
444
+ "epoch": 1.81,
445
+ "learning_rate": 4.093564306576695e-05,
446
+ "loss": 0.5565,
447
+ "step": 36000
448
+ },
449
+ {
450
+ "epoch": 1.84,
451
+ "learning_rate": 4.0809749219458157e-05,
452
+ "loss": 0.5544,
453
+ "step": 36500
454
+ },
455
+ {
456
+ "epoch": 1.86,
457
+ "learning_rate": 4.068385537314936e-05,
458
+ "loss": 0.5534,
459
+ "step": 37000
460
+ },
461
+ {
462
+ "epoch": 1.89,
463
+ "learning_rate": 4.055796152684057e-05,
464
+ "loss": 0.5545,
465
+ "step": 37500
466
+ },
467
+ {
468
+ "epoch": 1.91,
469
+ "learning_rate": 4.0432067680531774e-05,
470
+ "loss": 0.5521,
471
+ "step": 38000
472
+ },
473
+ {
474
+ "epoch": 1.94,
475
+ "learning_rate": 4.030617383422299e-05,
476
+ "loss": 0.5467,
477
+ "step": 38500
478
+ },
479
+ {
480
+ "epoch": 1.96,
481
+ "learning_rate": 4.018027998791419e-05,
482
+ "loss": 0.5512,
483
+ "step": 39000
484
+ },
485
+ {
486
+ "epoch": 1.99,
487
+ "learning_rate": 4.00543861416054e-05,
488
+ "loss": 0.5488,
489
+ "step": 39500
490
+ },
491
+ {
492
+ "epoch": 2.0,
493
+ "eval_loss": 0.6351193785667419,
494
+ "eval_runtime": 51.4996,
495
+ "eval_samples_per_second": 342.779,
496
+ "step": 39716
497
+ },
498
+ {
499
+ "epoch": 2.01,
500
+ "learning_rate": 3.992849229529661e-05,
501
+ "loss": 0.5438,
502
+ "step": 40000
503
+ },
504
+ {
505
+ "epoch": 2.04,
506
+ "learning_rate": 3.980259844898782e-05,
507
+ "loss": 0.5404,
508
+ "step": 40500
509
+ },
510
+ {
511
+ "epoch": 2.06,
512
+ "learning_rate": 3.967670460267902e-05,
513
+ "loss": 0.5412,
514
+ "step": 41000
515
+ },
516
+ {
517
+ "epoch": 2.09,
518
+ "learning_rate": 3.9550810756370235e-05,
519
+ "loss": 0.5414,
520
+ "step": 41500
521
+ },
522
+ {
523
+ "epoch": 2.12,
524
+ "learning_rate": 3.942491691006144e-05,
525
+ "loss": 0.5452,
526
+ "step": 42000
527
+ },
528
+ {
529
+ "epoch": 2.14,
530
+ "learning_rate": 3.929902306375265e-05,
531
+ "loss": 0.5373,
532
+ "step": 42500
533
+ },
534
+ {
535
+ "epoch": 2.17,
536
+ "learning_rate": 3.917312921744385e-05,
537
+ "loss": 0.5373,
538
+ "step": 43000
539
+ },
540
+ {
541
+ "epoch": 2.19,
542
+ "learning_rate": 3.904723537113506e-05,
543
+ "loss": 0.539,
544
+ "step": 43500
545
+ },
546
+ {
547
+ "epoch": 2.22,
548
+ "learning_rate": 3.8921341524826264e-05,
549
+ "loss": 0.5363,
550
+ "step": 44000
551
+ },
552
+ {
553
+ "epoch": 2.24,
554
+ "learning_rate": 3.879544767851748e-05,
555
+ "loss": 0.5376,
556
+ "step": 44500
557
+ },
558
+ {
559
+ "epoch": 2.27,
560
+ "learning_rate": 3.866955383220868e-05,
561
+ "loss": 0.5421,
562
+ "step": 45000
563
+ },
564
+ {
565
+ "epoch": 2.29,
566
+ "learning_rate": 3.8543659985899895e-05,
567
+ "loss": 0.5399,
568
+ "step": 45500
569
+ },
570
+ {
571
+ "epoch": 2.32,
572
+ "learning_rate": 3.84177661395911e-05,
573
+ "loss": 0.5343,
574
+ "step": 46000
575
+ },
576
+ {
577
+ "epoch": 2.34,
578
+ "learning_rate": 3.829187229328231e-05,
579
+ "loss": 0.5365,
580
+ "step": 46500
581
+ },
582
+ {
583
+ "epoch": 2.37,
584
+ "learning_rate": 3.816597844697351e-05,
585
+ "loss": 0.5397,
586
+ "step": 47000
587
+ },
588
+ {
589
+ "epoch": 2.39,
590
+ "learning_rate": 3.804008460066472e-05,
591
+ "loss": 0.5342,
592
+ "step": 47500
593
+ },
594
+ {
595
+ "epoch": 2.42,
596
+ "learning_rate": 3.7914190754355925e-05,
597
+ "loss": 0.5327,
598
+ "step": 48000
599
+ },
600
+ {
601
+ "epoch": 2.44,
602
+ "learning_rate": 3.778829690804714e-05,
603
+ "loss": 0.54,
604
+ "step": 48500
605
+ },
606
+ {
607
+ "epoch": 2.47,
608
+ "learning_rate": 3.766240306173834e-05,
609
+ "loss": 0.5362,
610
+ "step": 49000
611
+ },
612
+ {
613
+ "epoch": 2.49,
614
+ "learning_rate": 3.753650921542955e-05,
615
+ "loss": 0.5351,
616
+ "step": 49500
617
+ },
618
+ {
619
+ "epoch": 2.52,
620
+ "learning_rate": 3.741061536912076e-05,
621
+ "loss": 0.5351,
622
+ "step": 50000
623
+ },
624
+ {
625
+ "epoch": 2.54,
626
+ "learning_rate": 3.728472152281197e-05,
627
+ "loss": 0.5284,
628
+ "step": 50500
629
+ },
630
+ {
631
+ "epoch": 2.57,
632
+ "learning_rate": 3.715882767650317e-05,
633
+ "loss": 0.5345,
634
+ "step": 51000
635
+ },
636
+ {
637
+ "epoch": 2.59,
638
+ "learning_rate": 3.7032933830194386e-05,
639
+ "loss": 0.5352,
640
+ "step": 51500
641
+ },
642
+ {
643
+ "epoch": 2.62,
644
+ "learning_rate": 3.690703998388559e-05,
645
+ "loss": 0.5328,
646
+ "step": 52000
647
+ },
648
+ {
649
+ "epoch": 2.64,
650
+ "learning_rate": 3.67811461375768e-05,
651
+ "loss": 0.5362,
652
+ "step": 52500
653
+ },
654
+ {
655
+ "epoch": 2.67,
656
+ "learning_rate": 3.6655252291268e-05,
657
+ "loss": 0.5304,
658
+ "step": 53000
659
+ },
660
+ {
661
+ "epoch": 2.69,
662
+ "learning_rate": 3.652935844495921e-05,
663
+ "loss": 0.5322,
664
+ "step": 53500
665
+ },
666
+ {
667
+ "epoch": 2.72,
668
+ "learning_rate": 3.640346459865042e-05,
669
+ "loss": 0.5295,
670
+ "step": 54000
671
+ },
672
+ {
673
+ "epoch": 2.74,
674
+ "learning_rate": 3.627757075234163e-05,
675
+ "loss": 0.5284,
676
+ "step": 54500
677
+ },
678
+ {
679
+ "epoch": 2.77,
680
+ "learning_rate": 3.615167690603283e-05,
681
+ "loss": 0.5318,
682
+ "step": 55000
683
+ },
684
+ {
685
+ "epoch": 2.79,
686
+ "learning_rate": 3.6025783059724046e-05,
687
+ "loss": 0.5312,
688
+ "step": 55500
689
+ },
690
+ {
691
+ "epoch": 2.82,
692
+ "learning_rate": 3.589988921341525e-05,
693
+ "loss": 0.5282,
694
+ "step": 56000
695
+ },
696
+ {
697
+ "epoch": 2.85,
698
+ "learning_rate": 3.577399536710646e-05,
699
+ "loss": 0.5292,
700
+ "step": 56500
701
+ },
702
+ {
703
+ "epoch": 2.87,
704
+ "learning_rate": 3.564810152079766e-05,
705
+ "loss": 0.5289,
706
+ "step": 57000
707
+ },
708
+ {
709
+ "epoch": 2.9,
710
+ "learning_rate": 3.552220767448887e-05,
711
+ "loss": 0.5293,
712
+ "step": 57500
713
+ },
714
+ {
715
+ "epoch": 2.92,
716
+ "learning_rate": 3.5396313828180075e-05,
717
+ "loss": 0.5241,
718
+ "step": 58000
719
+ },
720
+ {
721
+ "epoch": 2.95,
722
+ "learning_rate": 3.527041998187129e-05,
723
+ "loss": 0.5275,
724
+ "step": 58500
725
+ },
726
+ {
727
+ "epoch": 2.97,
728
+ "learning_rate": 3.5144526135562493e-05,
729
+ "loss": 0.5335,
730
+ "step": 59000
731
+ },
732
+ {
733
+ "epoch": 3.0,
734
+ "learning_rate": 3.5018632289253706e-05,
735
+ "loss": 0.5244,
736
+ "step": 59500
737
+ },
738
+ {
739
+ "epoch": 3.0,
740
+ "eval_loss": 0.6179068088531494,
741
+ "eval_runtime": 51.415,
742
+ "eval_samples_per_second": 343.344,
743
+ "step": 59574
744
+ },
745
+ {
746
+ "epoch": 3.02,
747
+ "learning_rate": 3.489273844294491e-05,
748
+ "loss": 0.5182,
749
+ "step": 60000
750
+ },
751
+ {
752
+ "epoch": 3.05,
753
+ "learning_rate": 3.476684459663612e-05,
754
+ "loss": 0.5147,
755
+ "step": 60500
756
+ },
757
+ {
758
+ "epoch": 3.07,
759
+ "learning_rate": 3.464095075032733e-05,
760
+ "loss": 0.5187,
761
+ "step": 61000
762
+ },
763
+ {
764
+ "epoch": 3.1,
765
+ "learning_rate": 3.4515056904018536e-05,
766
+ "loss": 0.5201,
767
+ "step": 61500
768
+ },
769
+ {
770
+ "epoch": 3.12,
771
+ "learning_rate": 3.438916305770974e-05,
772
+ "loss": 0.5199,
773
+ "step": 62000
774
+ },
775
+ {
776
+ "epoch": 3.15,
777
+ "learning_rate": 3.426326921140095e-05,
778
+ "loss": 0.5239,
779
+ "step": 62500
780
+ },
781
+ {
782
+ "epoch": 3.17,
783
+ "learning_rate": 3.4137375365092154e-05,
784
+ "loss": 0.5176,
785
+ "step": 63000
786
+ },
787
+ {
788
+ "epoch": 3.2,
789
+ "learning_rate": 3.401148151878336e-05,
790
+ "loss": 0.5199,
791
+ "step": 63500
792
+ },
793
+ {
794
+ "epoch": 3.22,
795
+ "learning_rate": 3.388558767247457e-05,
796
+ "loss": 0.5216,
797
+ "step": 64000
798
+ },
799
+ {
800
+ "epoch": 3.25,
801
+ "learning_rate": 3.375969382616578e-05,
802
+ "loss": 0.5152,
803
+ "step": 64500
804
+ },
805
+ {
806
+ "epoch": 3.27,
807
+ "learning_rate": 3.3633799979856984e-05,
808
+ "loss": 0.5176,
809
+ "step": 65000
810
+ },
811
+ {
812
+ "epoch": 3.3,
813
+ "learning_rate": 3.3507906133548196e-05,
814
+ "loss": 0.5155,
815
+ "step": 65500
816
+ },
817
+ {
818
+ "epoch": 3.32,
819
+ "learning_rate": 3.33820122872394e-05,
820
+ "loss": 0.5151,
821
+ "step": 66000
822
+ },
823
+ {
824
+ "epoch": 3.35,
825
+ "learning_rate": 3.3256118440930615e-05,
826
+ "loss": 0.5176,
827
+ "step": 66500
828
+ },
829
+ {
830
+ "epoch": 3.37,
831
+ "learning_rate": 3.3130224594621814e-05,
832
+ "loss": 0.5144,
833
+ "step": 67000
834
+ },
835
+ {
836
+ "epoch": 3.4,
837
+ "learning_rate": 3.300433074831302e-05,
838
+ "loss": 0.5172,
839
+ "step": 67500
840
+ },
841
+ {
842
+ "epoch": 3.42,
843
+ "learning_rate": 3.287843690200423e-05,
844
+ "loss": 0.5173,
845
+ "step": 68000
846
+ },
847
+ {
848
+ "epoch": 3.45,
849
+ "learning_rate": 3.275254305569544e-05,
850
+ "loss": 0.5172,
851
+ "step": 68500
852
+ },
853
+ {
854
+ "epoch": 3.47,
855
+ "learning_rate": 3.2626649209386644e-05,
856
+ "loss": 0.5124,
857
+ "step": 69000
858
+ },
859
+ {
860
+ "epoch": 3.5,
861
+ "learning_rate": 3.2500755363077856e-05,
862
+ "loss": 0.518,
863
+ "step": 69500
864
+ },
865
+ {
866
+ "epoch": 3.53,
867
+ "learning_rate": 3.237486151676906e-05,
868
+ "loss": 0.5164,
869
+ "step": 70000
870
+ },
871
+ {
872
+ "epoch": 3.55,
873
+ "learning_rate": 3.224896767046027e-05,
874
+ "loss": 0.5175,
875
+ "step": 70500
876
+ },
877
+ {
878
+ "epoch": 3.58,
879
+ "learning_rate": 3.212307382415148e-05,
880
+ "loss": 0.5156,
881
+ "step": 71000
882
+ },
883
+ {
884
+ "epoch": 3.6,
885
+ "learning_rate": 3.1997179977842687e-05,
886
+ "loss": 0.5164,
887
+ "step": 71500
888
+ },
889
+ {
890
+ "epoch": 3.63,
891
+ "learning_rate": 3.187128613153389e-05,
892
+ "loss": 0.5165,
893
+ "step": 72000
894
+ },
895
+ {
896
+ "epoch": 3.65,
897
+ "learning_rate": 3.17453922852251e-05,
898
+ "loss": 0.5171,
899
+ "step": 72500
900
+ },
901
+ {
902
+ "epoch": 3.68,
903
+ "learning_rate": 3.1619498438916304e-05,
904
+ "loss": 0.5133,
905
+ "step": 73000
906
+ },
907
+ {
908
+ "epoch": 3.7,
909
+ "learning_rate": 3.149360459260752e-05,
910
+ "loss": 0.5157,
911
+ "step": 73500
912
+ },
913
+ {
914
+ "epoch": 3.73,
915
+ "learning_rate": 3.136771074629872e-05,
916
+ "loss": 0.5116,
917
+ "step": 74000
918
+ },
919
+ {
920
+ "epoch": 3.75,
921
+ "learning_rate": 3.124181689998993e-05,
922
+ "loss": 0.5137,
923
+ "step": 74500
924
+ },
925
+ {
926
+ "epoch": 3.78,
927
+ "learning_rate": 3.111592305368114e-05,
928
+ "loss": 0.5154,
929
+ "step": 75000
930
+ },
931
+ {
932
+ "epoch": 3.8,
933
+ "learning_rate": 3.099002920737235e-05,
934
+ "loss": 0.5147,
935
+ "step": 75500
936
+ },
937
+ {
938
+ "epoch": 3.83,
939
+ "learning_rate": 3.086413536106355e-05,
940
+ "loss": 0.5158,
941
+ "step": 76000
942
+ },
943
+ {
944
+ "epoch": 3.85,
945
+ "learning_rate": 3.0738241514754765e-05,
946
+ "loss": 0.5139,
947
+ "step": 76500
948
+ },
949
+ {
950
+ "epoch": 3.88,
951
+ "learning_rate": 3.0612347668445964e-05,
952
+ "loss": 0.5162,
953
+ "step": 77000
954
+ },
955
+ {
956
+ "epoch": 3.9,
957
+ "learning_rate": 3.0486453822137173e-05,
958
+ "loss": 0.5133,
959
+ "step": 77500
960
+ },
961
+ {
962
+ "epoch": 3.93,
963
+ "learning_rate": 3.0360559975828383e-05,
964
+ "loss": 0.5126,
965
+ "step": 78000
966
+ },
967
+ {
968
+ "epoch": 3.95,
969
+ "learning_rate": 3.023466612951959e-05,
970
+ "loss": 0.5128,
971
+ "step": 78500
972
+ },
973
+ {
974
+ "epoch": 3.98,
975
+ "learning_rate": 3.0108772283210794e-05,
976
+ "loss": 0.5135,
977
+ "step": 79000
978
+ },
979
+ {
980
+ "epoch": 4.0,
981
+ "eval_loss": 0.6064777374267578,
982
+ "eval_runtime": 51.6259,
983
+ "eval_samples_per_second": 341.941,
984
+ "step": 79432
985
+ },
986
+ {
987
+ "epoch": 4.0,
988
+ "learning_rate": 2.9982878436902007e-05,
989
+ "loss": 0.507,
990
+ "step": 79500
991
+ },
992
+ {
993
+ "epoch": 4.03,
994
+ "learning_rate": 2.9856984590593213e-05,
995
+ "loss": 0.4969,
996
+ "step": 80000
997
+ },
998
+ {
999
+ "epoch": 4.05,
1000
+ "learning_rate": 2.9731090744284422e-05,
1001
+ "loss": 0.5011,
1002
+ "step": 80500
1003
+ },
1004
+ {
1005
+ "epoch": 4.08,
1006
+ "learning_rate": 2.9605196897975628e-05,
1007
+ "loss": 0.5023,
1008
+ "step": 81000
1009
+ },
1010
+ {
1011
+ "epoch": 4.1,
1012
+ "learning_rate": 2.9479303051666834e-05,
1013
+ "loss": 0.5025,
1014
+ "step": 81500
1015
+ },
1016
+ {
1017
+ "epoch": 4.13,
1018
+ "learning_rate": 2.9353409205358046e-05,
1019
+ "loss": 0.5032,
1020
+ "step": 82000
1021
+ },
1022
+ {
1023
+ "epoch": 4.15,
1024
+ "learning_rate": 2.9227515359049252e-05,
1025
+ "loss": 0.5036,
1026
+ "step": 82500
1027
+ },
1028
+ {
1029
+ "epoch": 4.18,
1030
+ "learning_rate": 2.9101621512740458e-05,
1031
+ "loss": 0.5004,
1032
+ "step": 83000
1033
+ },
1034
+ {
1035
+ "epoch": 4.2,
1036
+ "learning_rate": 2.8975727666431667e-05,
1037
+ "loss": 0.5041,
1038
+ "step": 83500
1039
+ },
1040
+ {
1041
+ "epoch": 4.23,
1042
+ "learning_rate": 2.8849833820122873e-05,
1043
+ "loss": 0.505,
1044
+ "step": 84000
1045
+ },
1046
+ {
1047
+ "epoch": 4.26,
1048
+ "learning_rate": 2.872393997381408e-05,
1049
+ "loss": 0.5072,
1050
+ "step": 84500
1051
+ },
1052
+ {
1053
+ "epoch": 4.28,
1054
+ "learning_rate": 2.859804612750529e-05,
1055
+ "loss": 0.5012,
1056
+ "step": 85000
1057
+ },
1058
+ {
1059
+ "epoch": 4.31,
1060
+ "learning_rate": 2.8472152281196497e-05,
1061
+ "loss": 0.505,
1062
+ "step": 85500
1063
+ },
1064
+ {
1065
+ "epoch": 4.33,
1066
+ "learning_rate": 2.8346258434887703e-05,
1067
+ "loss": 0.5039,
1068
+ "step": 86000
1069
+ },
1070
+ {
1071
+ "epoch": 4.36,
1072
+ "learning_rate": 2.8220364588578912e-05,
1073
+ "loss": 0.5006,
1074
+ "step": 86500
1075
+ },
1076
+ {
1077
+ "epoch": 4.38,
1078
+ "learning_rate": 2.8094470742270118e-05,
1079
+ "loss": 0.5061,
1080
+ "step": 87000
1081
+ },
1082
+ {
1083
+ "epoch": 4.41,
1084
+ "learning_rate": 2.796857689596133e-05,
1085
+ "loss": 0.5018,
1086
+ "step": 87500
1087
+ },
1088
+ {
1089
+ "epoch": 4.43,
1090
+ "learning_rate": 2.7842683049652536e-05,
1091
+ "loss": 0.5009,
1092
+ "step": 88000
1093
+ },
1094
+ {
1095
+ "epoch": 4.46,
1096
+ "learning_rate": 2.771678920334374e-05,
1097
+ "loss": 0.503,
1098
+ "step": 88500
1099
+ },
1100
+ {
1101
+ "epoch": 4.48,
1102
+ "learning_rate": 2.759089535703495e-05,
1103
+ "loss": 0.5018,
1104
+ "step": 89000
1105
+ },
1106
+ {
1107
+ "epoch": 4.51,
1108
+ "learning_rate": 2.7465001510726157e-05,
1109
+ "loss": 0.502,
1110
+ "step": 89500
1111
+ },
1112
+ {
1113
+ "epoch": 4.53,
1114
+ "learning_rate": 2.7339107664417363e-05,
1115
+ "loss": 0.5039,
1116
+ "step": 90000
1117
+ },
1118
+ {
1119
+ "epoch": 4.56,
1120
+ "learning_rate": 2.7213213818108572e-05,
1121
+ "loss": 0.5044,
1122
+ "step": 90500
1123
+ },
1124
+ {
1125
+ "epoch": 4.58,
1126
+ "learning_rate": 2.7087319971799778e-05,
1127
+ "loss": 0.4998,
1128
+ "step": 91000
1129
+ },
1130
+ {
1131
+ "epoch": 4.61,
1132
+ "learning_rate": 2.6961426125490984e-05,
1133
+ "loss": 0.5046,
1134
+ "step": 91500
1135
+ },
1136
+ {
1137
+ "epoch": 4.63,
1138
+ "learning_rate": 2.6835532279182197e-05,
1139
+ "loss": 0.5039,
1140
+ "step": 92000
1141
+ },
1142
+ {
1143
+ "epoch": 4.66,
1144
+ "learning_rate": 2.6709638432873402e-05,
1145
+ "loss": 0.5024,
1146
+ "step": 92500
1147
+ },
1148
+ {
1149
+ "epoch": 4.68,
1150
+ "learning_rate": 2.6583744586564608e-05,
1151
+ "loss": 0.503,
1152
+ "step": 93000
1153
+ },
1154
+ {
1155
+ "epoch": 4.71,
1156
+ "learning_rate": 2.6457850740255818e-05,
1157
+ "loss": 0.5007,
1158
+ "step": 93500
1159
+ },
1160
+ {
1161
+ "epoch": 4.73,
1162
+ "learning_rate": 2.6331956893947023e-05,
1163
+ "loss": 0.5027,
1164
+ "step": 94000
1165
+ },
1166
+ {
1167
+ "epoch": 4.76,
1168
+ "learning_rate": 2.6206063047638236e-05,
1169
+ "loss": 0.5081,
1170
+ "step": 94500
1171
+ },
1172
+ {
1173
+ "epoch": 4.78,
1174
+ "learning_rate": 2.6080169201329442e-05,
1175
+ "loss": 0.5002,
1176
+ "step": 95000
1177
+ },
1178
+ {
1179
+ "epoch": 4.81,
1180
+ "learning_rate": 2.5954275355020648e-05,
1181
+ "loss": 0.5045,
1182
+ "step": 95500
1183
+ },
1184
+ {
1185
+ "epoch": 4.83,
1186
+ "learning_rate": 2.5828381508711857e-05,
1187
+ "loss": 0.4986,
1188
+ "step": 96000
1189
+ },
1190
+ {
1191
+ "epoch": 4.86,
1192
+ "learning_rate": 2.5702487662403063e-05,
1193
+ "loss": 0.5026,
1194
+ "step": 96500
1195
+ },
1196
+ {
1197
+ "epoch": 4.88,
1198
+ "learning_rate": 2.557659381609427e-05,
1199
+ "loss": 0.502,
1200
+ "step": 97000
1201
+ },
1202
+ {
1203
+ "epoch": 4.91,
1204
+ "learning_rate": 2.545069996978548e-05,
1205
+ "loss": 0.5016,
1206
+ "step": 97500
1207
+ },
1208
+ {
1209
+ "epoch": 4.94,
1210
+ "learning_rate": 2.5324806123476687e-05,
1211
+ "loss": 0.5056,
1212
+ "step": 98000
1213
+ },
1214
+ {
1215
+ "epoch": 4.96,
1216
+ "learning_rate": 2.519891227716789e-05,
1217
+ "loss": 0.4983,
1218
+ "step": 98500
1219
+ },
1220
+ {
1221
+ "epoch": 4.99,
1222
+ "learning_rate": 2.5073018430859102e-05,
1223
+ "loss": 0.5002,
1224
+ "step": 99000
1225
+ },
1226
+ {
1227
+ "epoch": 5.0,
1228
+ "eval_loss": 0.6010987758636475,
1229
+ "eval_runtime": 51.6361,
1230
+ "eval_samples_per_second": 341.873,
1231
+ "step": 99290
1232
+ },
1233
+ {
1234
+ "epoch": 5.01,
1235
+ "learning_rate": 2.4947124584550308e-05,
1236
+ "loss": 0.4916,
1237
+ "step": 99500
1238
+ },
1239
+ {
1240
+ "epoch": 5.04,
1241
+ "learning_rate": 2.4821230738241517e-05,
1242
+ "loss": 0.4912,
1243
+ "step": 100000
1244
+ },
1245
+ {
1246
+ "epoch": 5.06,
1247
+ "learning_rate": 2.4695336891932723e-05,
1248
+ "loss": 0.4941,
1249
+ "step": 100500
1250
+ },
1251
+ {
1252
+ "epoch": 5.09,
1253
+ "learning_rate": 2.456944304562393e-05,
1254
+ "loss": 0.4876,
1255
+ "step": 101000
1256
+ },
1257
+ {
1258
+ "epoch": 5.11,
1259
+ "learning_rate": 2.4443549199315138e-05,
1260
+ "loss": 0.4942,
1261
+ "step": 101500
1262
+ },
1263
+ {
1264
+ "epoch": 5.14,
1265
+ "learning_rate": 2.4317655353006347e-05,
1266
+ "loss": 0.4937,
1267
+ "step": 102000
1268
+ },
1269
+ {
1270
+ "epoch": 5.16,
1271
+ "learning_rate": 2.4191761506697556e-05,
1272
+ "loss": 0.4902,
1273
+ "step": 102500
1274
+ },
1275
+ {
1276
+ "epoch": 5.19,
1277
+ "learning_rate": 2.4065867660388762e-05,
1278
+ "loss": 0.4909,
1279
+ "step": 103000
1280
+ },
1281
+ {
1282
+ "epoch": 5.21,
1283
+ "learning_rate": 2.3939973814079968e-05,
1284
+ "loss": 0.4933,
1285
+ "step": 103500
1286
+ },
1287
+ {
1288
+ "epoch": 5.24,
1289
+ "learning_rate": 2.3814079967771177e-05,
1290
+ "loss": 0.4935,
1291
+ "step": 104000
1292
+ },
1293
+ {
1294
+ "epoch": 5.26,
1295
+ "learning_rate": 2.3688186121462383e-05,
1296
+ "loss": 0.4951,
1297
+ "step": 104500
1298
+ },
1299
+ {
1300
+ "epoch": 5.29,
1301
+ "learning_rate": 2.3562292275153592e-05,
1302
+ "loss": 0.4913,
1303
+ "step": 105000
1304
+ },
1305
+ {
1306
+ "epoch": 5.31,
1307
+ "learning_rate": 2.3436398428844798e-05,
1308
+ "loss": 0.4973,
1309
+ "step": 105500
1310
+ },
1311
+ {
1312
+ "epoch": 5.34,
1313
+ "learning_rate": 2.3310504582536007e-05,
1314
+ "loss": 0.49,
1315
+ "step": 106000
1316
+ },
1317
+ {
1318
+ "epoch": 5.36,
1319
+ "learning_rate": 2.3184610736227213e-05,
1320
+ "loss": 0.4897,
1321
+ "step": 106500
1322
+ },
1323
+ {
1324
+ "epoch": 5.39,
1325
+ "learning_rate": 2.3058716889918422e-05,
1326
+ "loss": 0.4929,
1327
+ "step": 107000
1328
+ },
1329
+ {
1330
+ "epoch": 5.41,
1331
+ "learning_rate": 2.293282304360963e-05,
1332
+ "loss": 0.4876,
1333
+ "step": 107500
1334
+ },
1335
+ {
1336
+ "epoch": 5.44,
1337
+ "learning_rate": 2.2806929197300837e-05,
1338
+ "loss": 0.4951,
1339
+ "step": 108000
1340
+ },
1341
+ {
1342
+ "epoch": 5.46,
1343
+ "learning_rate": 2.2681035350992043e-05,
1344
+ "loss": 0.4913,
1345
+ "step": 108500
1346
+ },
1347
+ {
1348
+ "epoch": 5.49,
1349
+ "learning_rate": 2.2555141504683252e-05,
1350
+ "loss": 0.4952,
1351
+ "step": 109000
1352
+ },
1353
+ {
1354
+ "epoch": 5.51,
1355
+ "learning_rate": 2.242924765837446e-05,
1356
+ "loss": 0.4894,
1357
+ "step": 109500
1358
+ },
1359
+ {
1360
+ "epoch": 5.54,
1361
+ "learning_rate": 2.2303353812065667e-05,
1362
+ "loss": 0.4936,
1363
+ "step": 110000
1364
+ },
1365
+ {
1366
+ "epoch": 5.56,
1367
+ "learning_rate": 2.2177459965756873e-05,
1368
+ "loss": 0.488,
1369
+ "step": 110500
1370
+ },
1371
+ {
1372
+ "epoch": 5.59,
1373
+ "learning_rate": 2.2051566119448082e-05,
1374
+ "loss": 0.4908,
1375
+ "step": 111000
1376
+ },
1377
+ {
1378
+ "epoch": 5.61,
1379
+ "learning_rate": 2.192567227313929e-05,
1380
+ "loss": 0.4935,
1381
+ "step": 111500
1382
+ },
1383
+ {
1384
+ "epoch": 5.64,
1385
+ "learning_rate": 2.1799778426830498e-05,
1386
+ "loss": 0.4883,
1387
+ "step": 112000
1388
+ },
1389
+ {
1390
+ "epoch": 5.67,
1391
+ "learning_rate": 2.1673884580521707e-05,
1392
+ "loss": 0.4912,
1393
+ "step": 112500
1394
+ },
1395
+ {
1396
+ "epoch": 5.69,
1397
+ "learning_rate": 2.1547990734212913e-05,
1398
+ "loss": 0.4903,
1399
+ "step": 113000
1400
+ },
1401
+ {
1402
+ "epoch": 5.72,
1403
+ "learning_rate": 2.142209688790412e-05,
1404
+ "loss": 0.4914,
1405
+ "step": 113500
1406
+ },
1407
+ {
1408
+ "epoch": 5.74,
1409
+ "learning_rate": 2.1296203041595328e-05,
1410
+ "loss": 0.489,
1411
+ "step": 114000
1412
+ },
1413
+ {
1414
+ "epoch": 5.77,
1415
+ "learning_rate": 2.1170309195286537e-05,
1416
+ "loss": 0.4929,
1417
+ "step": 114500
1418
+ },
1419
+ {
1420
+ "epoch": 5.79,
1421
+ "learning_rate": 2.1044415348977743e-05,
1422
+ "loss": 0.4892,
1423
+ "step": 115000
1424
+ },
1425
+ {
1426
+ "epoch": 5.82,
1427
+ "learning_rate": 2.091852150266895e-05,
1428
+ "loss": 0.4854,
1429
+ "step": 115500
1430
+ },
1431
+ {
1432
+ "epoch": 5.84,
1433
+ "learning_rate": 2.0792627656360158e-05,
1434
+ "loss": 0.4938,
1435
+ "step": 116000
1436
+ },
1437
+ {
1438
+ "epoch": 5.87,
1439
+ "learning_rate": 2.0666733810051367e-05,
1440
+ "loss": 0.4916,
1441
+ "step": 116500
1442
+ },
1443
+ {
1444
+ "epoch": 5.89,
1445
+ "learning_rate": 2.0540839963742573e-05,
1446
+ "loss": 0.4945,
1447
+ "step": 117000
1448
+ },
1449
+ {
1450
+ "epoch": 5.92,
1451
+ "learning_rate": 2.0414946117433782e-05,
1452
+ "loss": 0.4901,
1453
+ "step": 117500
1454
+ },
1455
+ {
1456
+ "epoch": 5.94,
1457
+ "learning_rate": 2.0289052271124988e-05,
1458
+ "loss": 0.4922,
1459
+ "step": 118000
1460
+ },
1461
+ {
1462
+ "epoch": 5.97,
1463
+ "learning_rate": 2.0163158424816194e-05,
1464
+ "loss": 0.4903,
1465
+ "step": 118500
1466
+ },
1467
+ {
1468
+ "epoch": 5.99,
1469
+ "learning_rate": 2.0037264578507403e-05,
1470
+ "loss": 0.4911,
1471
+ "step": 119000
1472
+ },
1473
+ {
1474
+ "epoch": 6.0,
1475
+ "eval_loss": 0.5948862433433533,
1476
+ "eval_runtime": 51.885,
1477
+ "eval_samples_per_second": 340.233,
1478
+ "step": 119148
1479
+ },
1480
+ {
1481
+ "epoch": 6.02,
1482
+ "learning_rate": 1.9911370732198612e-05,
1483
+ "loss": 0.4846,
1484
+ "step": 119500
1485
+ },
1486
+ {
1487
+ "epoch": 6.04,
1488
+ "learning_rate": 1.978547688588982e-05,
1489
+ "loss": 0.4849,
1490
+ "step": 120000
1491
+ },
1492
+ {
1493
+ "epoch": 6.07,
1494
+ "learning_rate": 1.9659583039581027e-05,
1495
+ "loss": 0.4815,
1496
+ "step": 120500
1497
+ },
1498
+ {
1499
+ "epoch": 6.09,
1500
+ "learning_rate": 1.9533689193272233e-05,
1501
+ "loss": 0.4835,
1502
+ "step": 121000
1503
+ },
1504
+ {
1505
+ "epoch": 6.12,
1506
+ "learning_rate": 1.9407795346963442e-05,
1507
+ "loss": 0.4802,
1508
+ "step": 121500
1509
+ },
1510
+ {
1511
+ "epoch": 6.14,
1512
+ "learning_rate": 1.9281901500654648e-05,
1513
+ "loss": 0.482,
1514
+ "step": 122000
1515
+ },
1516
+ {
1517
+ "epoch": 6.17,
1518
+ "learning_rate": 1.9156007654345857e-05,
1519
+ "loss": 0.4865,
1520
+ "step": 122500
1521
+ },
1522
+ {
1523
+ "epoch": 6.19,
1524
+ "learning_rate": 1.9030113808037063e-05,
1525
+ "loss": 0.4846,
1526
+ "step": 123000
1527
+ },
1528
+ {
1529
+ "epoch": 6.22,
1530
+ "learning_rate": 1.8904219961728272e-05,
1531
+ "loss": 0.4826,
1532
+ "step": 123500
1533
+ },
1534
+ {
1535
+ "epoch": 6.24,
1536
+ "learning_rate": 1.8778326115419478e-05,
1537
+ "loss": 0.4782,
1538
+ "step": 124000
1539
+ },
1540
+ {
1541
+ "epoch": 6.27,
1542
+ "learning_rate": 1.8652432269110687e-05,
1543
+ "loss": 0.4839,
1544
+ "step": 124500
1545
+ },
1546
+ {
1547
+ "epoch": 6.29,
1548
+ "learning_rate": 1.8526538422801896e-05,
1549
+ "loss": 0.485,
1550
+ "step": 125000
1551
+ },
1552
+ {
1553
+ "epoch": 6.32,
1554
+ "learning_rate": 1.8400644576493102e-05,
1555
+ "loss": 0.4803,
1556
+ "step": 125500
1557
+ },
1558
+ {
1559
+ "epoch": 6.35,
1560
+ "learning_rate": 1.8274750730184308e-05,
1561
+ "loss": 0.4792,
1562
+ "step": 126000
1563
+ },
1564
+ {
1565
+ "epoch": 6.37,
1566
+ "learning_rate": 1.8148856883875517e-05,
1567
+ "loss": 0.4859,
1568
+ "step": 126500
1569
+ },
1570
+ {
1571
+ "epoch": 6.4,
1572
+ "learning_rate": 1.8022963037566727e-05,
1573
+ "loss": 0.4847,
1574
+ "step": 127000
1575
+ },
1576
+ {
1577
+ "epoch": 6.42,
1578
+ "learning_rate": 1.7897069191257932e-05,
1579
+ "loss": 0.4832,
1580
+ "step": 127500
1581
+ },
1582
+ {
1583
+ "epoch": 6.45,
1584
+ "learning_rate": 1.7771175344949138e-05,
1585
+ "loss": 0.4865,
1586
+ "step": 128000
1587
+ },
1588
+ {
1589
+ "epoch": 6.47,
1590
+ "learning_rate": 1.7645281498640347e-05,
1591
+ "loss": 0.4836,
1592
+ "step": 128500
1593
+ },
1594
+ {
1595
+ "epoch": 6.5,
1596
+ "learning_rate": 1.7519387652331553e-05,
1597
+ "loss": 0.4855,
1598
+ "step": 129000
1599
+ },
1600
+ {
1601
+ "epoch": 6.52,
1602
+ "learning_rate": 1.7393493806022763e-05,
1603
+ "loss": 0.4806,
1604
+ "step": 129500
1605
+ },
1606
+ {
1607
+ "epoch": 6.55,
1608
+ "learning_rate": 1.7267599959713972e-05,
1609
+ "loss": 0.4801,
1610
+ "step": 130000
1611
+ },
1612
+ {
1613
+ "epoch": 6.57,
1614
+ "learning_rate": 1.7141706113405178e-05,
1615
+ "loss": 0.4807,
1616
+ "step": 130500
1617
+ },
1618
+ {
1619
+ "epoch": 6.6,
1620
+ "learning_rate": 1.7015812267096383e-05,
1621
+ "loss": 0.4832,
1622
+ "step": 131000
1623
+ },
1624
+ {
1625
+ "epoch": 6.62,
1626
+ "learning_rate": 1.6889918420787593e-05,
1627
+ "loss": 0.4803,
1628
+ "step": 131500
1629
+ },
1630
+ {
1631
+ "epoch": 6.65,
1632
+ "learning_rate": 1.6764024574478802e-05,
1633
+ "loss": 0.4821,
1634
+ "step": 132000
1635
+ },
1636
+ {
1637
+ "epoch": 6.67,
1638
+ "learning_rate": 1.6638130728170008e-05,
1639
+ "loss": 0.4792,
1640
+ "step": 132500
1641
+ },
1642
+ {
1643
+ "epoch": 6.7,
1644
+ "learning_rate": 1.6512236881861213e-05,
1645
+ "loss": 0.4827,
1646
+ "step": 133000
1647
+ },
1648
+ {
1649
+ "epoch": 6.72,
1650
+ "learning_rate": 1.6386343035552423e-05,
1651
+ "loss": 0.4821,
1652
+ "step": 133500
1653
+ },
1654
+ {
1655
+ "epoch": 6.75,
1656
+ "learning_rate": 1.6260449189243632e-05,
1657
+ "loss": 0.4754,
1658
+ "step": 134000
1659
+ },
1660
+ {
1661
+ "epoch": 6.77,
1662
+ "learning_rate": 1.6134555342934838e-05,
1663
+ "loss": 0.4822,
1664
+ "step": 134500
1665
+ },
1666
+ {
1667
+ "epoch": 6.8,
1668
+ "learning_rate": 1.6008661496626047e-05,
1669
+ "loss": 0.4813,
1670
+ "step": 135000
1671
+ },
1672
+ {
1673
+ "epoch": 6.82,
1674
+ "learning_rate": 1.5882767650317253e-05,
1675
+ "loss": 0.4802,
1676
+ "step": 135500
1677
+ },
1678
+ {
1679
+ "epoch": 6.85,
1680
+ "learning_rate": 1.575687380400846e-05,
1681
+ "loss": 0.4835,
1682
+ "step": 136000
1683
+ },
1684
+ {
1685
+ "epoch": 6.87,
1686
+ "learning_rate": 1.5630979957699668e-05,
1687
+ "loss": 0.4842,
1688
+ "step": 136500
1689
+ },
1690
+ {
1691
+ "epoch": 6.9,
1692
+ "learning_rate": 1.5505086111390877e-05,
1693
+ "loss": 0.4867,
1694
+ "step": 137000
1695
+ },
1696
+ {
1697
+ "epoch": 6.92,
1698
+ "learning_rate": 1.5379192265082086e-05,
1699
+ "loss": 0.4832,
1700
+ "step": 137500
1701
+ },
1702
+ {
1703
+ "epoch": 6.95,
1704
+ "learning_rate": 1.525329841877329e-05,
1705
+ "loss": 0.4835,
1706
+ "step": 138000
1707
+ },
1708
+ {
1709
+ "epoch": 6.97,
1710
+ "learning_rate": 1.5127404572464498e-05,
1711
+ "loss": 0.4813,
1712
+ "step": 138500
1713
+ },
1714
+ {
1715
+ "epoch": 7.0,
1716
+ "learning_rate": 1.5001510726155707e-05,
1717
+ "loss": 0.4799,
1718
+ "step": 139000
1719
+ },
1720
+ {
1721
+ "epoch": 7.0,
1722
+ "eval_loss": 0.5898594856262207,
1723
+ "eval_runtime": 51.3831,
1724
+ "eval_samples_per_second": 343.557,
1725
+ "step": 139006
1726
+ },
1727
+ {
1728
+ "epoch": 7.02,
1729
+ "learning_rate": 1.4875616879846913e-05,
1730
+ "loss": 0.4741,
1731
+ "step": 139500
1732
+ },
1733
+ {
1734
+ "epoch": 7.05,
1735
+ "learning_rate": 1.474972303353812e-05,
1736
+ "loss": 0.4743,
1737
+ "step": 140000
1738
+ },
1739
+ {
1740
+ "epoch": 7.08,
1741
+ "learning_rate": 1.462382918722933e-05,
1742
+ "loss": 0.477,
1743
+ "step": 140500
1744
+ },
1745
+ {
1746
+ "epoch": 7.1,
1747
+ "learning_rate": 1.4497935340920537e-05,
1748
+ "loss": 0.476,
1749
+ "step": 141000
1750
+ },
1751
+ {
1752
+ "epoch": 7.13,
1753
+ "learning_rate": 1.4372041494611743e-05,
1754
+ "loss": 0.474,
1755
+ "step": 141500
1756
+ },
1757
+ {
1758
+ "epoch": 7.15,
1759
+ "learning_rate": 1.4246147648302952e-05,
1760
+ "loss": 0.4761,
1761
+ "step": 142000
1762
+ },
1763
+ {
1764
+ "epoch": 7.18,
1765
+ "learning_rate": 1.412025380199416e-05,
1766
+ "loss": 0.4757,
1767
+ "step": 142500
1768
+ },
1769
+ {
1770
+ "epoch": 7.2,
1771
+ "learning_rate": 1.3994359955685366e-05,
1772
+ "loss": 0.4725,
1773
+ "step": 143000
1774
+ },
1775
+ {
1776
+ "epoch": 7.23,
1777
+ "learning_rate": 1.3868466109376573e-05,
1778
+ "loss": 0.4743,
1779
+ "step": 143500
1780
+ },
1781
+ {
1782
+ "epoch": 7.25,
1783
+ "learning_rate": 1.3742572263067782e-05,
1784
+ "loss": 0.4789,
1785
+ "step": 144000
1786
+ },
1787
+ {
1788
+ "epoch": 7.28,
1789
+ "learning_rate": 1.361667841675899e-05,
1790
+ "loss": 0.4683,
1791
+ "step": 144500
1792
+ },
1793
+ {
1794
+ "epoch": 7.3,
1795
+ "learning_rate": 1.3490784570450196e-05,
1796
+ "loss": 0.4746,
1797
+ "step": 145000
1798
+ },
1799
+ {
1800
+ "epoch": 7.33,
1801
+ "learning_rate": 1.3364890724141405e-05,
1802
+ "loss": 0.4698,
1803
+ "step": 145500
1804
+ },
1805
+ {
1806
+ "epoch": 7.35,
1807
+ "learning_rate": 1.3238996877832612e-05,
1808
+ "loss": 0.4726,
1809
+ "step": 146000
1810
+ },
1811
+ {
1812
+ "epoch": 7.38,
1813
+ "learning_rate": 1.3113103031523818e-05,
1814
+ "loss": 0.4759,
1815
+ "step": 146500
1816
+ },
1817
+ {
1818
+ "epoch": 7.4,
1819
+ "learning_rate": 1.2987209185215027e-05,
1820
+ "loss": 0.4727,
1821
+ "step": 147000
1822
+ },
1823
+ {
1824
+ "epoch": 7.43,
1825
+ "learning_rate": 1.2861315338906235e-05,
1826
+ "loss": 0.4737,
1827
+ "step": 147500
1828
+ },
1829
+ {
1830
+ "epoch": 7.45,
1831
+ "learning_rate": 1.2735421492597444e-05,
1832
+ "loss": 0.4732,
1833
+ "step": 148000
1834
+ },
1835
+ {
1836
+ "epoch": 7.48,
1837
+ "learning_rate": 1.2609527646288648e-05,
1838
+ "loss": 0.476,
1839
+ "step": 148500
1840
+ },
1841
+ {
1842
+ "epoch": 7.5,
1843
+ "learning_rate": 1.2483633799979858e-05,
1844
+ "loss": 0.4751,
1845
+ "step": 149000
1846
+ },
1847
+ {
1848
+ "epoch": 7.53,
1849
+ "learning_rate": 1.2357739953671065e-05,
1850
+ "loss": 0.4769,
1851
+ "step": 149500
1852
+ },
1853
+ {
1854
+ "epoch": 7.55,
1855
+ "learning_rate": 1.2231846107362273e-05,
1856
+ "loss": 0.4751,
1857
+ "step": 150000
1858
+ },
1859
+ {
1860
+ "epoch": 7.58,
1861
+ "learning_rate": 1.210595226105348e-05,
1862
+ "loss": 0.4732,
1863
+ "step": 150500
1864
+ },
1865
+ {
1866
+ "epoch": 7.6,
1867
+ "learning_rate": 1.1980058414744688e-05,
1868
+ "loss": 0.4745,
1869
+ "step": 151000
1870
+ },
1871
+ {
1872
+ "epoch": 7.63,
1873
+ "learning_rate": 1.1854164568435895e-05,
1874
+ "loss": 0.471,
1875
+ "step": 151500
1876
+ },
1877
+ {
1878
+ "epoch": 7.65,
1879
+ "learning_rate": 1.1728270722127103e-05,
1880
+ "loss": 0.4744,
1881
+ "step": 152000
1882
+ },
1883
+ {
1884
+ "epoch": 7.68,
1885
+ "learning_rate": 1.160237687581831e-05,
1886
+ "loss": 0.4799,
1887
+ "step": 152500
1888
+ },
1889
+ {
1890
+ "epoch": 7.7,
1891
+ "learning_rate": 1.147648302950952e-05,
1892
+ "loss": 0.474,
1893
+ "step": 153000
1894
+ },
1895
+ {
1896
+ "epoch": 7.73,
1897
+ "learning_rate": 1.1350589183200725e-05,
1898
+ "loss": 0.4768,
1899
+ "step": 153500
1900
+ },
1901
+ {
1902
+ "epoch": 7.76,
1903
+ "learning_rate": 1.1224695336891933e-05,
1904
+ "loss": 0.4751,
1905
+ "step": 154000
1906
+ },
1907
+ {
1908
+ "epoch": 7.78,
1909
+ "learning_rate": 1.109880149058314e-05,
1910
+ "loss": 0.477,
1911
+ "step": 154500
1912
+ },
1913
+ {
1914
+ "epoch": 7.81,
1915
+ "learning_rate": 1.0972907644274348e-05,
1916
+ "loss": 0.4759,
1917
+ "step": 155000
1918
+ },
1919
+ {
1920
+ "epoch": 7.83,
1921
+ "learning_rate": 1.0847013797965557e-05,
1922
+ "loss": 0.472,
1923
+ "step": 155500
1924
+ },
1925
+ {
1926
+ "epoch": 7.86,
1927
+ "learning_rate": 1.0721119951656763e-05,
1928
+ "loss": 0.4741,
1929
+ "step": 156000
1930
+ },
1931
+ {
1932
+ "epoch": 7.88,
1933
+ "learning_rate": 1.0595226105347972e-05,
1934
+ "loss": 0.4776,
1935
+ "step": 156500
1936
+ },
1937
+ {
1938
+ "epoch": 7.91,
1939
+ "learning_rate": 1.0469332259039178e-05,
1940
+ "loss": 0.4735,
1941
+ "step": 157000
1942
+ },
1943
+ {
1944
+ "epoch": 7.93,
1945
+ "learning_rate": 1.0343438412730385e-05,
1946
+ "loss": 0.4773,
1947
+ "step": 157500
1948
+ },
1949
+ {
1950
+ "epoch": 7.96,
1951
+ "learning_rate": 1.0217544566421595e-05,
1952
+ "loss": 0.474,
1953
+ "step": 158000
1954
+ },
1955
+ {
1956
+ "epoch": 7.98,
1957
+ "learning_rate": 1.00916507201128e-05,
1958
+ "loss": 0.4749,
1959
+ "step": 158500
1960
+ },
1961
+ {
1962
+ "epoch": 8.0,
1963
+ "eval_loss": 0.5894228219985962,
1964
+ "eval_runtime": 51.3606,
1965
+ "eval_samples_per_second": 343.707,
1966
+ "step": 158864
1967
+ },
1968
+ {
1969
+ "epoch": 8.01,
1970
+ "learning_rate": 9.96575687380401e-06,
1971
+ "loss": 0.4705,
1972
+ "step": 159000
1973
+ },
1974
+ {
1975
+ "epoch": 8.03,
1976
+ "learning_rate": 9.839863027495217e-06,
1977
+ "loss": 0.4665,
1978
+ "step": 159500
1979
+ },
1980
+ {
1981
+ "epoch": 8.06,
1982
+ "learning_rate": 9.713969181186425e-06,
1983
+ "loss": 0.4634,
1984
+ "step": 160000
1985
+ },
1986
+ {
1987
+ "epoch": 8.08,
1988
+ "learning_rate": 9.588075334877632e-06,
1989
+ "loss": 0.4673,
1990
+ "step": 160500
1991
+ },
1992
+ {
1993
+ "epoch": 8.11,
1994
+ "learning_rate": 9.462181488568838e-06,
1995
+ "loss": 0.4659,
1996
+ "step": 161000
1997
+ },
1998
+ {
1999
+ "epoch": 8.13,
2000
+ "learning_rate": 9.336287642260047e-06,
2001
+ "loss": 0.469,
2002
+ "step": 161500
2003
+ },
2004
+ {
2005
+ "epoch": 8.16,
2006
+ "learning_rate": 9.210393795951255e-06,
2007
+ "loss": 0.4635,
2008
+ "step": 162000
2009
+ },
2010
+ {
2011
+ "epoch": 8.18,
2012
+ "learning_rate": 9.084499949642462e-06,
2013
+ "loss": 0.4695,
2014
+ "step": 162500
2015
+ },
2016
+ {
2017
+ "epoch": 8.21,
2018
+ "learning_rate": 8.95860610333367e-06,
2019
+ "loss": 0.4697,
2020
+ "step": 163000
2021
+ },
2022
+ {
2023
+ "epoch": 8.23,
2024
+ "learning_rate": 8.832712257024877e-06,
2025
+ "loss": 0.4686,
2026
+ "step": 163500
2027
+ },
2028
+ {
2029
+ "epoch": 8.26,
2030
+ "learning_rate": 8.706818410716085e-06,
2031
+ "loss": 0.4683,
2032
+ "step": 164000
2033
+ },
2034
+ {
2035
+ "epoch": 8.28,
2036
+ "learning_rate": 8.580924564407292e-06,
2037
+ "loss": 0.4716,
2038
+ "step": 164500
2039
+ },
2040
+ {
2041
+ "epoch": 8.31,
2042
+ "learning_rate": 8.4550307180985e-06,
2043
+ "loss": 0.4632,
2044
+ "step": 165000
2045
+ },
2046
+ {
2047
+ "epoch": 8.33,
2048
+ "learning_rate": 8.329136871789707e-06,
2049
+ "loss": 0.4687,
2050
+ "step": 165500
2051
+ },
2052
+ {
2053
+ "epoch": 8.36,
2054
+ "learning_rate": 8.203243025480915e-06,
2055
+ "loss": 0.4683,
2056
+ "step": 166000
2057
+ },
2058
+ {
2059
+ "epoch": 8.38,
2060
+ "learning_rate": 8.077349179172123e-06,
2061
+ "loss": 0.4652,
2062
+ "step": 166500
2063
+ },
2064
+ {
2065
+ "epoch": 8.41,
2066
+ "learning_rate": 7.95145533286333e-06,
2067
+ "loss": 0.4688,
2068
+ "step": 167000
2069
+ },
2070
+ {
2071
+ "epoch": 8.43,
2072
+ "learning_rate": 7.825561486554538e-06,
2073
+ "loss": 0.4674,
2074
+ "step": 167500
2075
+ },
2076
+ {
2077
+ "epoch": 8.46,
2078
+ "learning_rate": 7.699667640245745e-06,
2079
+ "loss": 0.4707,
2080
+ "step": 168000
2081
+ },
2082
+ {
2083
+ "epoch": 8.49,
2084
+ "learning_rate": 7.573773793936953e-06,
2085
+ "loss": 0.4718,
2086
+ "step": 168500
2087
+ },
2088
+ {
2089
+ "epoch": 8.51,
2090
+ "learning_rate": 7.44787994762816e-06,
2091
+ "loss": 0.4693,
2092
+ "step": 169000
2093
+ },
2094
+ {
2095
+ "epoch": 8.54,
2096
+ "learning_rate": 7.3219861013193685e-06,
2097
+ "loss": 0.4655,
2098
+ "step": 169500
2099
+ },
2100
+ {
2101
+ "epoch": 8.56,
2102
+ "learning_rate": 7.196092255010575e-06,
2103
+ "loss": 0.4683,
2104
+ "step": 170000
2105
+ },
2106
+ {
2107
+ "epoch": 8.59,
2108
+ "learning_rate": 7.0701984087017836e-06,
2109
+ "loss": 0.471,
2110
+ "step": 170500
2111
+ },
2112
+ {
2113
+ "epoch": 8.61,
2114
+ "learning_rate": 6.94430456239299e-06,
2115
+ "loss": 0.4665,
2116
+ "step": 171000
2117
+ },
2118
+ {
2119
+ "epoch": 8.64,
2120
+ "learning_rate": 6.818410716084198e-06,
2121
+ "loss": 0.4693,
2122
+ "step": 171500
2123
+ },
2124
+ {
2125
+ "epoch": 8.66,
2126
+ "learning_rate": 6.692516869775406e-06,
2127
+ "loss": 0.4708,
2128
+ "step": 172000
2129
+ },
2130
+ {
2131
+ "epoch": 8.69,
2132
+ "learning_rate": 6.566623023466613e-06,
2133
+ "loss": 0.468,
2134
+ "step": 172500
2135
+ },
2136
+ {
2137
+ "epoch": 8.71,
2138
+ "learning_rate": 6.440729177157821e-06,
2139
+ "loss": 0.4656,
2140
+ "step": 173000
2141
+ },
2142
+ {
2143
+ "epoch": 8.74,
2144
+ "learning_rate": 6.314835330849028e-06,
2145
+ "loss": 0.4693,
2146
+ "step": 173500
2147
+ },
2148
+ {
2149
+ "epoch": 8.76,
2150
+ "learning_rate": 6.188941484540236e-06,
2151
+ "loss": 0.4716,
2152
+ "step": 174000
2153
+ },
2154
+ {
2155
+ "epoch": 8.79,
2156
+ "learning_rate": 6.063047638231444e-06,
2157
+ "loss": 0.4671,
2158
+ "step": 174500
2159
+ },
2160
+ {
2161
+ "epoch": 8.81,
2162
+ "learning_rate": 5.937153791922651e-06,
2163
+ "loss": 0.4663,
2164
+ "step": 175000
2165
+ },
2166
+ {
2167
+ "epoch": 8.84,
2168
+ "learning_rate": 5.811259945613859e-06,
2169
+ "loss": 0.4729,
2170
+ "step": 175500
2171
+ },
2172
+ {
2173
+ "epoch": 8.86,
2174
+ "learning_rate": 5.685366099305066e-06,
2175
+ "loss": 0.4655,
2176
+ "step": 176000
2177
+ },
2178
+ {
2179
+ "epoch": 8.89,
2180
+ "learning_rate": 5.559472252996274e-06,
2181
+ "loss": 0.4647,
2182
+ "step": 176500
2183
+ },
2184
+ {
2185
+ "epoch": 8.91,
2186
+ "learning_rate": 5.433578406687481e-06,
2187
+ "loss": 0.4676,
2188
+ "step": 177000
2189
+ },
2190
+ {
2191
+ "epoch": 8.94,
2192
+ "learning_rate": 5.307684560378689e-06,
2193
+ "loss": 0.4658,
2194
+ "step": 177500
2195
+ },
2196
+ {
2197
+ "epoch": 8.96,
2198
+ "learning_rate": 5.181790714069896e-06,
2199
+ "loss": 0.4681,
2200
+ "step": 178000
2201
+ },
2202
+ {
2203
+ "epoch": 8.99,
2204
+ "learning_rate": 5.055896867761104e-06,
2205
+ "loss": 0.4675,
2206
+ "step": 178500
2207
+ },
2208
+ {
2209
+ "epoch": 9.0,
2210
+ "eval_loss": 0.5858681201934814,
2211
+ "eval_runtime": 51.7912,
2212
+ "eval_samples_per_second": 340.849,
2213
+ "step": 178722
2214
+ },
2215
+ {
2216
+ "epoch": 9.01,
2217
+ "learning_rate": 4.9300030214523114e-06,
2218
+ "loss": 0.4637,
2219
+ "step": 179000
2220
+ },
2221
+ {
2222
+ "epoch": 9.04,
2223
+ "learning_rate": 4.80410917514352e-06,
2224
+ "loss": 0.4622,
2225
+ "step": 179500
2226
+ },
2227
+ {
2228
+ "epoch": 9.06,
2229
+ "learning_rate": 4.6782153288347265e-06,
2230
+ "loss": 0.4619,
2231
+ "step": 180000
2232
+ },
2233
+ {
2234
+ "epoch": 9.09,
2235
+ "learning_rate": 4.552321482525934e-06,
2236
+ "loss": 0.4614,
2237
+ "step": 180500
2238
+ },
2239
+ {
2240
+ "epoch": 9.11,
2241
+ "learning_rate": 4.4264276362171415e-06,
2242
+ "loss": 0.4597,
2243
+ "step": 181000
2244
+ },
2245
+ {
2246
+ "epoch": 9.14,
2247
+ "learning_rate": 4.30053378990835e-06,
2248
+ "loss": 0.4625,
2249
+ "step": 181500
2250
+ },
2251
+ {
2252
+ "epoch": 9.17,
2253
+ "learning_rate": 4.174639943599557e-06,
2254
+ "loss": 0.464,
2255
+ "step": 182000
2256
+ },
2257
+ {
2258
+ "epoch": 9.19,
2259
+ "learning_rate": 4.048746097290765e-06,
2260
+ "loss": 0.4601,
2261
+ "step": 182500
2262
+ },
2263
+ {
2264
+ "epoch": 9.22,
2265
+ "learning_rate": 3.9228522509819725e-06,
2266
+ "loss": 0.4649,
2267
+ "step": 183000
2268
+ },
2269
+ {
2270
+ "epoch": 9.24,
2271
+ "learning_rate": 3.7969584046731796e-06,
2272
+ "loss": 0.4634,
2273
+ "step": 183500
2274
+ },
2275
+ {
2276
+ "epoch": 9.27,
2277
+ "learning_rate": 3.671064558364387e-06,
2278
+ "loss": 0.466,
2279
+ "step": 184000
2280
+ },
2281
+ {
2282
+ "epoch": 9.29,
2283
+ "learning_rate": 3.5451707120555946e-06,
2284
+ "loss": 0.4663,
2285
+ "step": 184500
2286
+ },
2287
+ {
2288
+ "epoch": 9.32,
2289
+ "learning_rate": 3.4192768657468025e-06,
2290
+ "loss": 0.465,
2291
+ "step": 185000
2292
+ },
2293
+ {
2294
+ "epoch": 9.34,
2295
+ "learning_rate": 3.29338301943801e-06,
2296
+ "loss": 0.4639,
2297
+ "step": 185500
2298
+ },
2299
+ {
2300
+ "epoch": 9.37,
2301
+ "learning_rate": 3.1674891731292176e-06,
2302
+ "loss": 0.4631,
2303
+ "step": 186000
2304
+ },
2305
+ {
2306
+ "epoch": 9.39,
2307
+ "learning_rate": 3.041595326820425e-06,
2308
+ "loss": 0.4633,
2309
+ "step": 186500
2310
+ },
2311
+ {
2312
+ "epoch": 9.42,
2313
+ "learning_rate": 2.9157014805116326e-06,
2314
+ "loss": 0.4609,
2315
+ "step": 187000
2316
+ },
2317
+ {
2318
+ "epoch": 9.44,
2319
+ "learning_rate": 2.7898076342028406e-06,
2320
+ "loss": 0.461,
2321
+ "step": 187500
2322
+ },
2323
+ {
2324
+ "epoch": 9.47,
2325
+ "learning_rate": 2.6639137878940477e-06,
2326
+ "loss": 0.4593,
2327
+ "step": 188000
2328
+ },
2329
+ {
2330
+ "epoch": 9.49,
2331
+ "learning_rate": 2.538019941585255e-06,
2332
+ "loss": 0.4623,
2333
+ "step": 188500
2334
+ },
2335
+ {
2336
+ "epoch": 9.52,
2337
+ "learning_rate": 2.412126095276463e-06,
2338
+ "loss": 0.4625,
2339
+ "step": 189000
2340
+ },
2341
+ {
2342
+ "epoch": 9.54,
2343
+ "learning_rate": 2.2862322489676707e-06,
2344
+ "loss": 0.4607,
2345
+ "step": 189500
2346
+ },
2347
+ {
2348
+ "epoch": 9.57,
2349
+ "learning_rate": 2.160338402658878e-06,
2350
+ "loss": 0.4638,
2351
+ "step": 190000
2352
+ },
2353
+ {
2354
+ "epoch": 9.59,
2355
+ "learning_rate": 2.0344445563500857e-06,
2356
+ "loss": 0.4613,
2357
+ "step": 190500
2358
+ },
2359
+ {
2360
+ "epoch": 9.62,
2361
+ "learning_rate": 1.9085507100412932e-06,
2362
+ "loss": 0.4652,
2363
+ "step": 191000
2364
+ },
2365
+ {
2366
+ "epoch": 9.64,
2367
+ "learning_rate": 1.7826568637325008e-06,
2368
+ "loss": 0.4652,
2369
+ "step": 191500
2370
+ },
2371
+ {
2372
+ "epoch": 9.67,
2373
+ "learning_rate": 1.6567630174237083e-06,
2374
+ "loss": 0.4627,
2375
+ "step": 192000
2376
+ },
2377
+ {
2378
+ "epoch": 9.69,
2379
+ "learning_rate": 1.530869171114916e-06,
2380
+ "loss": 0.4628,
2381
+ "step": 192500
2382
+ },
2383
+ {
2384
+ "epoch": 9.72,
2385
+ "learning_rate": 1.4049753248061235e-06,
2386
+ "loss": 0.462,
2387
+ "step": 193000
2388
+ },
2389
+ {
2390
+ "epoch": 9.74,
2391
+ "learning_rate": 1.2790814784973313e-06,
2392
+ "loss": 0.4646,
2393
+ "step": 193500
2394
+ },
2395
+ {
2396
+ "epoch": 9.77,
2397
+ "learning_rate": 1.1531876321885388e-06,
2398
+ "loss": 0.4626,
2399
+ "step": 194000
2400
+ },
2401
+ {
2402
+ "epoch": 9.79,
2403
+ "learning_rate": 1.027293785879746e-06,
2404
+ "loss": 0.4618,
2405
+ "step": 194500
2406
+ },
2407
+ {
2408
+ "epoch": 9.82,
2409
+ "learning_rate": 9.013999395709538e-07,
2410
+ "loss": 0.4622,
2411
+ "step": 195000
2412
+ },
2413
+ {
2414
+ "epoch": 9.84,
2415
+ "learning_rate": 7.755060932621615e-07,
2416
+ "loss": 0.4616,
2417
+ "step": 195500
2418
+ },
2419
+ {
2420
+ "epoch": 9.87,
2421
+ "learning_rate": 6.496122469533689e-07,
2422
+ "loss": 0.4631,
2423
+ "step": 196000
2424
+ },
2425
+ {
2426
+ "epoch": 9.9,
2427
+ "learning_rate": 5.237184006445765e-07,
2428
+ "loss": 0.4625,
2429
+ "step": 196500
2430
+ },
2431
+ {
2432
+ "epoch": 9.92,
2433
+ "learning_rate": 3.978245543357841e-07,
2434
+ "loss": 0.4621,
2435
+ "step": 197000
2436
+ },
2437
+ {
2438
+ "epoch": 9.95,
2439
+ "learning_rate": 2.7193070802699166e-07,
2440
+ "loss": 0.461,
2441
+ "step": 197500
2442
+ },
2443
+ {
2444
+ "epoch": 9.97,
2445
+ "learning_rate": 1.4603686171819923e-07,
2446
+ "loss": 0.4617,
2447
+ "step": 198000
2448
+ },
2449
+ {
2450
+ "epoch": 10.0,
2451
+ "learning_rate": 2.014301540940679e-08,
2452
+ "loss": 0.4652,
2453
+ "step": 198500
2454
+ },
2455
+ {
2456
+ "epoch": 10.0,
2457
+ "eval_loss": 0.5854414701461792,
2458
+ "eval_runtime": 51.7785,
2459
+ "eval_samples_per_second": 340.933,
2460
+ "step": 198580
2461
+ },
2462
+ {
2463
+ "epoch": 10.0,
2464
+ "step": 198580,
2465
+ "total_flos": 1.2104979074144256e+17,
2466
+ "train_runtime": 21748.254,
2467
+ "train_samples_per_second": 9.131
2468
+ }
2469
+ ],
2470
+ "max_steps": 198580,
2471
+ "num_train_epochs": 10,
2472
+ "total_flos": 1.2104979074144256e+17,
2473
+ "trial_name": null,
2474
+ "trial_params": null
2475
+ }