07/10/2024 09:29:44 - INFO - llamafactory.hparams.parser - Process rank: 1, device: cuda:1, n_gpu: 1, distributed training: True, compute dtype: torch.bfloat16

07/10/2024 09:29:44 - INFO - llamafactory.data.template - Add pad token: </s>

[INFO|parser.py:325] 2024-07-10 09:29:44,482 >> Process rank: 0, device: cuda:0, n_gpu: 1, distributed training: True, compute dtype: torch.bfloat16

07/10/2024 09:29:44 - INFO - llamafactory.hparams.parser - Process rank: 4, device: cuda:4, n_gpu: 1, distributed training: True, compute dtype: torch.bfloat16

07/10/2024 09:29:44 - INFO - llamafactory.hparams.parser - Process rank: 6, device: cuda:6, n_gpu: 1, distributed training: True, compute dtype: torch.bfloat16

07/10/2024 09:29:44 - INFO - llamafactory.hparams.parser - Process rank: 5, device: cuda:5, n_gpu: 1, distributed training: True, compute dtype: torch.bfloat16

07/10/2024 09:29:44 - INFO - llamafactory.hparams.parser - Process rank: 2, device: cuda:2, n_gpu: 1, distributed training: True, compute dtype: torch.bfloat16

07/10/2024 09:29:44 - INFO - llamafactory.hparams.parser - Process rank: 3, device: cuda:3, n_gpu: 1, distributed training: True, compute dtype: torch.bfloat16

07/10/2024 09:29:44 - INFO - llamafactory.hparams.parser - Process rank: 7, device: cuda:7, n_gpu: 1, distributed training: True, compute dtype: torch.bfloat16

07/10/2024 09:29:44 - INFO - llamafactory.data.template - Add pad token: </s>

07/10/2024 09:29:44 - INFO - llamafactory.data.template - Add pad token: </s>

07/10/2024 09:29:44 - INFO - llamafactory.data.template - Add pad token: </s>

07/10/2024 09:29:44 - INFO - llamafactory.data.template - Add pad token: </s>

[INFO|tokenization_utils_base.py:2161] 2024-07-10 09:29:44,956 >> loading file tokenizer.model from cache at /root/.cache/huggingface/hub/models--meta-llama--Llama-2-7b-hf/snapshots/01c7f73d771dfac7d292323805ebc428287df4f9/tokenizer.model

[INFO|tokenization_utils_base.py:2161] 2024-07-10 09:29:44,957 >> loading file tokenizer.json from cache at /root/.cache/huggingface/hub/models--meta-llama--Llama-2-7b-hf/snapshots/01c7f73d771dfac7d292323805ebc428287df4f9/tokenizer.json

[INFO|tokenization_utils_base.py:2161] 2024-07-10 09:29:44,957 >> loading file added_tokens.json from cache at None

[INFO|tokenization_utils_base.py:2161] 2024-07-10 09:29:44,957 >> loading file special_tokens_map.json from cache at /root/.cache/huggingface/hub/models--meta-llama--Llama-2-7b-hf/snapshots/01c7f73d771dfac7d292323805ebc428287df4f9/special_tokens_map.json

[INFO|tokenization_utils_base.py:2161] 2024-07-10 09:29:44,957 >> loading file tokenizer_config.json from cache at /root/.cache/huggingface/hub/models--meta-llama--Llama-2-7b-hf/snapshots/01c7f73d771dfac7d292323805ebc428287df4f9/tokenizer_config.json

[INFO|template.py:372] 2024-07-10 09:29:45,060 >> Add pad token: </s>

[INFO|loader.py:50] 2024-07-10 09:29:45,060 >> Loading dataset train_output.json...

07/10/2024 09:29:45 - INFO - llamafactory.data.template - Add pad token: </s>

07/10/2024 09:29:45 - INFO - llamafactory.data.template - Add pad token: </s>

07/10/2024 09:29:46 - INFO - llamafactory.data.loader - Loading dataset train_output.json...

07/10/2024 09:29:46 - INFO - llamafactory.data.loader - Loading dataset train_output.json...

07/10/2024 09:29:46 - INFO - llamafactory.data.loader - Loading dataset train_output.json...

07/10/2024 09:29:46 - INFO - llamafactory.data.loader - Loading dataset train_output.json...

07/10/2024 09:29:46 - INFO - llamafactory.data.loader - Loading dataset train_output.json...

07/10/2024 09:29:46 - INFO - llamafactory.data.loader - Loading dataset train_output.json...

07/10/2024 09:29:46 - INFO - llamafactory.data.loader - Loading dataset train_output.json...

[INFO|configuration_utils.py:733] 2024-07-10 09:29:47,068 >> loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--meta-llama--Llama-2-7b-hf/snapshots/01c7f73d771dfac7d292323805ebc428287df4f9/config.json

[INFO|configuration_utils.py:800] 2024-07-10 09:29:47,069 >> Model config LlamaConfig {
  "_name_or_path": "meta-llama/Llama-2-7b-hf",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 11008,
  "max_position_embeddings": 4096,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 32,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 10000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "float16",
  "transformers_version": "4.42.3",
  "use_cache": true,
  "vocab_size": 32000
}


[INFO|modeling_utils.py:3556] 2024-07-10 09:29:47,090 >> loading weights file model.safetensors from cache at /root/.cache/huggingface/hub/models--meta-llama--Llama-2-7b-hf/snapshots/01c7f73d771dfac7d292323805ebc428287df4f9/model.safetensors.index.json

[INFO|modeling_utils.py:1531] 2024-07-10 09:29:47,091 >> Instantiating LlamaForCausalLM model under default dtype torch.bfloat16.

[INFO|configuration_utils.py:1000] 2024-07-10 09:29:47,092 >> Generate config GenerationConfig {
  "bos_token_id": 1,
  "eos_token_id": 2
}


07/10/2024 09:30:04 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.

07/10/2024 09:30:04 - INFO - llamafactory.model.model_utils.attention - Using torch SDPA for faster training and inference.

07/10/2024 09:30:04 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.

07/10/2024 09:30:04 - INFO - llamafactory.model.adapter - Fine-tuning method: Full

07/10/2024 09:30:04 - INFO - llamafactory.model.loader - trainable params: 6,738,415,616 || all params: 6,738,415,616 || trainable%: 100.0000

07/10/2024 09:30:04 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.

07/10/2024 09:30:04 - INFO - llamafactory.model.model_utils.attention - Using torch SDPA for faster training and inference.

07/10/2024 09:30:04 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.

07/10/2024 09:30:04 - INFO - llamafactory.model.adapter - Fine-tuning method: Full

07/10/2024 09:30:04 - INFO - llamafactory.model.loader - trainable params: 6,738,415,616 || all params: 6,738,415,616 || trainable%: 100.0000

[INFO|modeling_utils.py:4364] 2024-07-10 09:30:04,564 >> All model checkpoint weights were used when initializing LlamaForCausalLM.


[INFO|modeling_utils.py:4372] 2024-07-10 09:30:04,564 >> All the weights of LlamaForCausalLM were initialized from the model checkpoint at meta-llama/Llama-2-7b-hf.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.

07/10/2024 09:30:04 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.

07/10/2024 09:30:04 - INFO - llamafactory.model.model_utils.attention - Using torch SDPA for faster training and inference.

07/10/2024 09:30:04 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.

07/10/2024 09:30:04 - INFO - llamafactory.model.adapter - Fine-tuning method: Full

07/10/2024 09:30:04 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.

07/10/2024 09:30:04 - INFO - llamafactory.model.model_utils.attention - Using torch SDPA for faster training and inference.

07/10/2024 09:30:04 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.

07/10/2024 09:30:04 - INFO - llamafactory.model.adapter - Fine-tuning method: Full

07/10/2024 09:30:04 - INFO - llamafactory.model.loader - trainable params: 6,738,415,616 || all params: 6,738,415,616 || trainable%: 100.0000

07/10/2024 09:30:04 - INFO - llamafactory.model.loader - trainable params: 6,738,415,616 || all params: 6,738,415,616 || trainable%: 100.0000

[INFO|configuration_utils.py:955] 2024-07-10 09:30:04,747 >> loading configuration file generation_config.json from cache at /root/.cache/huggingface/hub/models--meta-llama--Llama-2-7b-hf/snapshots/01c7f73d771dfac7d292323805ebc428287df4f9/generation_config.json

[INFO|configuration_utils.py:1000] 2024-07-10 09:30:04,748 >> Generate config GenerationConfig {
  "bos_token_id": 1,
  "do_sample": true,
  "eos_token_id": 2,
  "max_length": 4096,
  "pad_token_id": 0,
  "temperature": 0.6,
  "top_p": 0.9
}


[INFO|checkpointing.py:103] 2024-07-10 09:30:04,755 >> Gradient checkpointing enabled.

[INFO|attention.py:80] 2024-07-10 09:30:04,755 >> Using torch SDPA for faster training and inference.

[INFO|adapter.py:302] 2024-07-10 09:30:04,755 >> Upcasting trainable params to float32.

[INFO|adapter.py:48] 2024-07-10 09:30:04,755 >> Fine-tuning method: Full

[INFO|loader.py:196] 2024-07-10 09:30:04,814 >> trainable params: 6,738,415,616 || all params: 6,738,415,616 || trainable%: 100.0000

07/10/2024 09:30:04 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.

07/10/2024 09:30:04 - INFO - llamafactory.model.model_utils.attention - Using torch SDPA for faster training and inference.

07/10/2024 09:30:04 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.

07/10/2024 09:30:04 - INFO - llamafactory.model.adapter - Fine-tuning method: Full

07/10/2024 09:30:04 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.

07/10/2024 09:30:04 - INFO - llamafactory.model.model_utils.attention - Using torch SDPA for faster training and inference.

07/10/2024 09:30:04 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.

07/10/2024 09:30:04 - INFO - llamafactory.model.adapter - Fine-tuning method: Full

07/10/2024 09:30:05 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.

07/10/2024 09:30:05 - INFO - llamafactory.model.model_utils.attention - Using torch SDPA for faster training and inference.

07/10/2024 09:30:05 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.

07/10/2024 09:30:05 - INFO - llamafactory.model.adapter - Fine-tuning method: Full

07/10/2024 09:30:05 - INFO - llamafactory.model.loader - trainable params: 6,738,415,616 || all params: 6,738,415,616 || trainable%: 100.0000

07/10/2024 09:30:05 - INFO - llamafactory.model.loader - trainable params: 6,738,415,616 || all params: 6,738,415,616 || trainable%: 100.0000

07/10/2024 09:30:05 - INFO - llamafactory.model.loader - trainable params: 6,738,415,616 || all params: 6,738,415,616 || trainable%: 100.0000

[INFO|trainer.py:642] 2024-07-10 09:30:04,819 >> Using auto half precision backend

[INFO|trainer.py:2128] 2024-07-10 09:30:24,499 >> ***** Running training *****

[INFO|trainer.py:2129] 2024-07-10 09:30:24,499 >>   Num examples = 19,880

[INFO|trainer.py:2130] 2024-07-10 09:30:24,499 >>   Num Epochs = 5

[INFO|trainer.py:2131] 2024-07-10 09:30:24,499 >>   Instantaneous batch size per device = 2

[INFO|trainer.py:2134] 2024-07-10 09:30:24,499 >>   Total train batch size (w. parallel, distributed & accumulation) = 128

[INFO|trainer.py:2135] 2024-07-10 09:30:24,500 >>   Gradient Accumulation steps = 8

[INFO|trainer.py:2136] 2024-07-10 09:30:24,500 >>   Total optimization steps = 775

[INFO|trainer.py:2137] 2024-07-10 09:30:24,501 >>   Number of trainable parameters = 6,738,415,616
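
Note on the totals above: the reported batch size of 128 and the 775 optimization steps can be reproduced from the per-device settings. The sketch below is a rough check, not the trainer's own code. It assumes a world size of 8 (inferred from ranks 0 through 7 in this log), that each rank receives an equal, rounded-up share of the examples, and that steps per epoch are the per-rank batch count integer-divided by the accumulation steps. All variable names are illustrative.

```python
import math

# Values copied from the "Running training" banner in this log.
num_examples = 19_880
num_epochs = 5
per_device_batch = 2
grad_accum_steps = 8

# Assumption: 8 processes, inferred from "Process rank: 0" .. "Process rank: 7".
world_size = 8

# Effective batch size per optimizer step.
total_batch = per_device_batch * grad_accum_steps * world_size

# Assumption: each rank gets ceil(N / world_size) samples, batched by 2.
samples_per_rank = math.ceil(num_examples / world_size)
batches_per_rank = math.ceil(samples_per_rank / per_device_batch)

# One optimizer step per grad_accum_steps batches; the remainder is dropped.
steps_per_epoch = max(batches_per_rank // grad_accum_steps, 1)
total_steps = steps_per_epoch * num_epochs

print(total_batch, total_steps)  # 128 775
```

Under these assumptions each epoch contributes 155 optimizer steps, which matches the logged total of 775 over 5 epochs.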

[INFO|callbacks.py:310] 2024-07-10 09:30:38,200 >> {'loss': 8.4196, 'learning_rate': 8.3333e-09, 'epoch': 0.01, 'throughput': 499.92}

[INFO|callbacks.py:310] 2024-07-10 09:30:49,316 >> {'loss': 8.4400, 'learning_rate': 1.6667e-08, 'epoch': 0.01, 'throughput': 564.21}

[INFO|callbacks.py:310] 2024-07-10 09:31:00,361 >> {'loss': 8.3839, 'learning_rate': 2.5000e-08, 'epoch': 0.02, 'throughput': 589.87}

[INFO|callbacks.py:310] 2024-07-10 09:31:11,423 >> {'loss': 8.4024, 'learning_rate': 3.3333e-08, 'epoch': 0.03, 'throughput': 601.53}

[INFO|callbacks.py:310] 2024-07-10 09:31:22,496 >> {'loss': 8.4594, 'learning_rate': 4.1667e-08, 'epoch': 0.03, 'throughput': 609.72}

[INFO|callbacks.py:310] 2024-07-10 09:31:33,578 >> {'loss': 8.4107, 'learning_rate': 5.0000e-08, 'epoch': 0.04, 'throughput': 610.81}

[INFO|callbacks.py:310] 2024-07-10 09:31:44,687 >> {'loss': 8.4551, 'learning_rate': 5.8333e-08, 'epoch': 0.05, 'throughput': 612.19}

[INFO|callbacks.py:310] 2024-07-10 09:31:55,787 >> {'loss': 8.4415, 'learning_rate': 6.6667e-08, 'epoch': 0.05, 'throughput': 611.88}

[INFO|callbacks.py:310] 2024-07-10 09:32:06,882 >> {'loss': 8.4965, 'learning_rate': 7.5000e-08, 'epoch': 0.06, 'throughput': 616.53}

[INFO|callbacks.py:310] 2024-07-10 09:32:17,990 >> {'loss': 8.4251, 'learning_rate': 8.3333e-08, 'epoch': 0.06, 'throughput': 616.52}

[INFO|callbacks.py:310] 2024-07-10 09:32:29,076 >> {'loss': 8.4291, 'learning_rate': 9.1667e-08, 'epoch': 0.07, 'throughput': 619.46}

[INFO|callbacks.py:310] 2024-07-10 09:32:40,134 >> {'loss': 8.4463, 'learning_rate': 1.0000e-07, 'epoch': 0.08, 'throughput': 621.33}

[INFO|callbacks.py:310] 2024-07-10 09:32:51,191 >> {'loss': 8.5116, 'learning_rate': 1.0833e-07, 'epoch': 0.08, 'throughput': 621.94}

[INFO|callbacks.py:310] 2024-07-10 09:33:02,261 >> {'loss': 8.4749, 'learning_rate': 1.1667e-07, 'epoch': 0.09, 'throughput': 621.00}

[INFO|callbacks.py:310] 2024-07-10 09:33:13,309 >> {'loss': 8.3311, 'learning_rate': 1.2500e-07, 'epoch': 0.10, 'throughput': 621.21}

[INFO|callbacks.py:310] 2024-07-10 09:33:24,408 >> {'loss': 8.3729, 'learning_rate': 1.3333e-07, 'epoch': 0.10, 'throughput': 619.70}

[INFO|callbacks.py:310] 2024-07-10 09:33:35,514 >> {'loss': 8.4261, 'learning_rate': 1.4167e-07, 'epoch': 0.11, 'throughput': 619.77}

[INFO|callbacks.py:310] 2024-07-10 09:33:46,626 >> {'loss': 8.3051, 'learning_rate': 1.5000e-07, 'epoch': 0.12, 'throughput': 620.22}

[INFO|callbacks.py:310] 2024-07-10 09:33:57,716 >> {'loss': 8.2461, 'learning_rate': 1.5833e-07, 'epoch': 0.12, 'throughput': 620.60}

[INFO|callbacks.py:310] 2024-07-10 09:34:08,814 >> {'loss': 8.2894, 'learning_rate': 1.6667e-07, 'epoch': 0.13, 'throughput': 621.35}

[INFO|callbacks.py:310] 2024-07-10 09:34:19,853 >> {'loss': 8.2484, 'learning_rate': 1.7500e-07, 'epoch': 0.14, 'throughput': 622.66}

[INFO|callbacks.py:310] 2024-07-10 09:34:30,927 >> {'loss': 8.3034, 'learning_rate': 1.8333e-07, 'epoch': 0.14, 'throughput': 622.79}

[INFO|callbacks.py:310] 2024-07-10 09:34:42,007 >> {'loss': 8.0540, 'learning_rate': 1.9167e-07, 'epoch': 0.15, 'throughput': 620.79}

[INFO|callbacks.py:310] 2024-07-10 09:34:53,068 >> {'loss': 7.9583, 'learning_rate': 2.0000e-07, 'epoch': 0.15, 'throughput': 620.06}

[INFO|callbacks.py:310] 2024-07-10 09:35:04,183 >> {'loss': 7.9626, 'learning_rate': 2.0833e-07, 'epoch': 0.16, 'throughput': 618.76}

[INFO|callbacks.py:310] 2024-07-10 09:35:15,275 >> {'loss': 7.8761, 'learning_rate': 2.1667e-07, 'epoch': 0.17, 'throughput': 617.72}

[INFO|callbacks.py:310] 2024-07-10 09:35:26,369 >> {'loss': 7.8896, 'learning_rate': 2.2500e-07, 'epoch': 0.17, 'throughput': 619.19}

[INFO|callbacks.py:310] 2024-07-10 09:35:37,467 >> {'loss': 7.8352, 'learning_rate': 2.3333e-07, 'epoch': 0.18, 'throughput': 619.67}

[INFO|callbacks.py:310] 2024-07-10 09:35:48,529 >> {'loss': 7.6910, 'learning_rate': 2.4167e-07, 'epoch': 0.19, 'throughput': 619.31}

[INFO|callbacks.py:310] 2024-07-10 09:35:59,589 >> {'loss': 7.7851, 'learning_rate': 2.5000e-07, 'epoch': 0.19, 'throughput': 619.35}

[INFO|callbacks.py:310] 2024-07-10 09:36:10,651 >> {'loss': 7.7249, 'learning_rate': 2.5833e-07, 'epoch': 0.20, 'throughput': 620.08}

[INFO|callbacks.py:310] 2024-07-10 09:36:21,731 >> {'loss': 6.8838, 'learning_rate': 2.6667e-07, 'epoch': 0.21, 'throughput': 620.73}

[INFO|callbacks.py:310] 2024-07-10 09:36:32,784 >> {'loss': 6.7173, 'learning_rate': 2.7500e-07, 'epoch': 0.21, 'throughput': 620.79}

[INFO|callbacks.py:310] 2024-07-10 09:36:43,885 >> {'loss': 6.6793, 'learning_rate': 2.8333e-07, 'epoch': 0.22, 'throughput': 620.63}

[INFO|callbacks.py:310] 2024-07-10 09:36:54,976 >> {'loss': 6.7250, 'learning_rate': 2.9167e-07, 'epoch': 0.23, 'throughput': 621.73}

[INFO|callbacks.py:310] 2024-07-10 09:37:06,088 >> {'loss': 6.6905, 'learning_rate': 3.0000e-07, 'epoch': 0.23, 'throughput': 621.42}

[INFO|callbacks.py:310] 2024-07-10 09:37:17,200 >> {'loss': 6.6179, 'learning_rate': 3.0833e-07, 'epoch': 0.24, 'throughput': 620.70}

[INFO|callbacks.py:310] 2024-07-10 09:37:28,267 >> {'loss': 6.5010, 'learning_rate': 3.1667e-07, 'epoch': 0.24, 'throughput': 620.42}

[INFO|callbacks.py:310] 2024-07-10 09:37:39,337 >> {'loss': 6.4588, 'learning_rate': 3.2500e-07, 'epoch': 0.25, 'throughput': 620.00}

[INFO|callbacks.py:310] 2024-07-10 09:37:50,404 >> {'loss': 6.3614, 'learning_rate': 3.3333e-07, 'epoch': 0.26, 'throughput': 620.23}

[INFO|callbacks.py:310] 2024-07-10 09:38:01,465 >> {'loss': 6.2775, 'learning_rate': 3.4167e-07, 'epoch': 0.26, 'throughput': 620.83}

[INFO|callbacks.py:310] 2024-07-10 09:38:12,548 >> {'loss': 5.9868, 'learning_rate': 3.5000e-07, 'epoch': 0.27, 'throughput': 621.38}

[INFO|callbacks.py:310] 2024-07-10 09:38:23,635 >> {'loss': 5.2286, 'learning_rate': 3.5833e-07, 'epoch': 0.28, 'throughput': 622.16}

[INFO|callbacks.py:310] 2024-07-10 09:38:34,725 >> {'loss': 4.5076, 'learning_rate': 3.6667e-07, 'epoch': 0.28, 'throughput': 622.41}

[INFO|callbacks.py:310] 2024-07-10 09:38:45,821 >> {'loss': 4.1167, 'learning_rate': 3.7500e-07, 'epoch': 0.29, 'throughput': 622.36}

[INFO|callbacks.py:310] 2024-07-10 09:38:56,918 >> {'loss': 3.6585, 'learning_rate': 3.8333e-07, 'epoch': 0.30, 'throughput': 622.56}

[INFO|callbacks.py:310] 2024-07-10 09:39:07,957 >> {'loss': 3.3613, 'learning_rate': 3.9167e-07, 'epoch': 0.30, 'throughput': 623.15}

[INFO|callbacks.py:310] 2024-07-10 09:39:19,011 >> {'loss': 3.1068, 'learning_rate': 4.0000e-07, 'epoch': 0.31, 'throughput': 624.24}

[INFO|callbacks.py:310] 2024-07-10 09:39:30,083 >> {'loss': 2.9368, 'learning_rate': 4.0833e-07, 'epoch': 0.32, 'throughput': 624.86}

[INFO|callbacks.py:310] 2024-07-10 09:39:41,141 >> {'loss': 2.3466, 'learning_rate': 4.1667e-07, 'epoch': 0.32, 'throughput': 624.66}

[INFO|callbacks.py:310] 2024-07-10 09:39:52,217 >> {'loss': 2.0645, 'learning_rate': 4.2500e-07, 'epoch': 0.33, 'throughput': 625.72}

[INFO|callbacks.py:310] 2024-07-10 09:40:03,316 >> {'loss': 1.7729, 'learning_rate': 4.3333e-07, 'epoch': 0.33, 'throughput': 625.36}

[INFO|callbacks.py:310] 2024-07-10 09:40:14,413 >> {'loss': 1.6199, 'learning_rate': 4.4167e-07, 'epoch': 0.34, 'throughput': 625.75}

[INFO|callbacks.py:310] 2024-07-10 09:40:25,526 >> {'loss': 1.1593, 'learning_rate': 4.5000e-07, 'epoch': 0.35, 'throughput': 625.44}

[INFO|callbacks.py:310] 2024-07-10 09:40:36,622 >> {'loss': 0.7199, 'learning_rate': 4.5833e-07, 'epoch': 0.35, 'throughput': 625.45}

[INFO|callbacks.py:310] 2024-07-10 09:40:47,675 >> {'loss': 0.4394, 'learning_rate': 4.6667e-07, 'epoch': 0.36, 'throughput': 625.32}

[INFO|callbacks.py:310] 2024-07-10 09:40:58,741 >> {'loss': 0.3806, 'learning_rate': 4.7500e-07, 'epoch': 0.37, 'throughput': 624.67}

[INFO|callbacks.py:310] 2024-07-10 09:41:09,807 >> {'loss': 0.3185, 'learning_rate': 4.8333e-07, 'epoch': 0.37, 'throughput': 624.67}

[INFO|callbacks.py:310] 2024-07-10 09:41:20,863 >> {'loss': 0.3056, 'learning_rate': 4.9167e-07, 'epoch': 0.38, 'throughput': 624.92}

[INFO|callbacks.py:310] 2024-07-10 09:41:31,931 >> {'loss': 0.2981, 'learning_rate': 5.0000e-07, 'epoch': 0.39, 'throughput': 624.68}

[INFO|callbacks.py:310] 2024-07-10 09:41:43,017 >> {'loss': 0.2473, 'learning_rate': 5.0833e-07, 'epoch': 0.39, 'throughput': 625.08}

[INFO|callbacks.py:310] 2024-07-10 09:41:54,122 >> {'loss': 0.2924, 'learning_rate': 5.1667e-07, 'epoch': 0.40, 'throughput': 624.78}

[INFO|callbacks.py:310] 2024-07-10 09:42:05,233 >> {'loss': 0.2656, 'learning_rate': 5.2500e-07, 'epoch': 0.41, 'throughput': 624.70}

[INFO|callbacks.py:310] 2024-07-10 09:42:16,322 >> {'loss': 0.2335, 'learning_rate': 5.3333e-07, 'epoch': 0.41, 'throughput': 624.63}

[INFO|callbacks.py:310] 2024-07-10 09:42:27,376 >> {'loss': 0.2647, 'learning_rate': 5.4167e-07, 'epoch': 0.42, 'throughput': 624.86}

[INFO|callbacks.py:310] 2024-07-10 09:42:38,440 >> {'loss': 0.3003, 'learning_rate': 5.5000e-07, 'epoch': 0.42, 'throughput': 625.10}

[INFO|callbacks.py:310] 2024-07-10 09:42:49,510 >> {'loss': 0.2656, 'learning_rate': 5.5833e-07, 'epoch': 0.43, 'throughput': 625.39}

[INFO|callbacks.py:310] 2024-07-10 09:43:00,566 >> {'loss': 0.2212, 'learning_rate': 5.6667e-07, 'epoch': 0.44, 'throughput': 625.81}

[INFO|callbacks.py:310] 2024-07-10 09:43:11,641 >> {'loss': 0.2532, 'learning_rate': 5.7500e-07, 'epoch': 0.44, 'throughput': 626.41}

[INFO|callbacks.py:310] 2024-07-10 09:43:22,719 >> {'loss': 0.2799, 'learning_rate': 5.8333e-07, 'epoch': 0.45, 'throughput': 626.50}

[INFO|callbacks.py:310] 2024-07-10 09:43:33,824 >> {'loss': 0.2876, 'learning_rate': 5.9167e-07, 'epoch': 0.46, 'throughput': 626.18}

[INFO|callbacks.py:310] 2024-07-10 09:43:44,926 >> {'loss': 0.2191, 'learning_rate': 6.0000e-07, 'epoch': 0.46, 'throughput': 625.63}

[INFO|callbacks.py:310] 2024-07-10 09:43:56,013 >> {'loss': 0.2344, 'learning_rate': 6.0833e-07, 'epoch': 0.47, 'throughput': 624.93}

[INFO|callbacks.py:310] 2024-07-10 09:44:07,066 >> {'loss': 0.2547, 'learning_rate': 6.1667e-07, 'epoch': 0.48, 'throughput': 625.13}

[INFO|callbacks.py:310] 2024-07-10 09:44:18,140 >> {'loss': 0.2349, 'learning_rate': 6.2500e-07, 'epoch': 0.48, 'throughput': 625.12}

[INFO|callbacks.py:310] 2024-07-10 09:44:29,223 >> {'loss': 0.2045, 'learning_rate': 6.3333e-07, 'epoch': 0.49, 'throughput': 624.85}

[INFO|callbacks.py:310] 2024-07-10 09:44:40,278 >> {'loss': 0.2262, 'learning_rate': 6.4167e-07, 'epoch': 0.50, 'throughput': 624.97}

[INFO|callbacks.py:310] 2024-07-10 09:44:51,386 >> {'loss': 0.2393, 'learning_rate': 6.5000e-07, 'epoch': 0.50, 'throughput': 624.88}

[INFO|callbacks.py:310] 2024-07-10 09:45:02,474 >> {'loss': 0.2206, 'learning_rate': 6.5833e-07, 'epoch': 0.51, 'throughput': 624.86}

[INFO|callbacks.py:310] 2024-07-10 09:45:13,589 >> {'loss': 0.2029, 'learning_rate': 6.6667e-07, 'epoch': 0.51, 'throughput': 624.75}

[INFO|callbacks.py:310] 2024-07-10 09:45:24,690 >> {'loss': 0.2125, 'learning_rate': 6.7500e-07, 'epoch': 0.52, 'throughput': 625.17}

[INFO|callbacks.py:310] 2024-07-10 09:45:35,760 >> {'loss': 0.2023, 'learning_rate': 6.8333e-07, 'epoch': 0.53, 'throughput': 625.00}

[INFO|callbacks.py:310] 2024-07-10 09:45:46,811 >> {'loss': 0.2262, 'learning_rate': 6.9167e-07, 'epoch': 0.53, 'throughput': 624.62}

[INFO|callbacks.py:310] 2024-07-10 09:45:57,866 >> {'loss': 0.2003, 'learning_rate': 7.0000e-07, 'epoch': 0.54, 'throughput': 625.13}

[INFO|callbacks.py:310] 2024-07-10 09:46:08,949 >> {'loss': 0.2351, 'learning_rate': 7.0833e-07, 'epoch': 0.55, 'throughput': 625.03}

[INFO|callbacks.py:310] 2024-07-10 09:46:20,006 >> {'loss': 0.1839, 'learning_rate': 7.1667e-07, 'epoch': 0.55, 'throughput': 624.96}

[INFO|callbacks.py:310] 2024-07-10 09:46:31,099 >> {'loss': 0.2315, 'learning_rate': 7.2500e-07, 'epoch': 0.56, 'throughput': 625.09}

[INFO|callbacks.py:310] 2024-07-10 09:46:42,194 >> {'loss': 0.2732, 'learning_rate': 7.3333e-07, 'epoch': 0.57, 'throughput': 624.84}

[INFO|callbacks.py:310] 2024-07-10 09:46:53,298 >> {'loss': 0.2405, 'learning_rate': 7.4167e-07, 'epoch': 0.57, 'throughput': 625.12}

[INFO|callbacks.py:310] 2024-07-10 09:47:04,402 >> {'loss': 0.2005, 'learning_rate': 7.5000e-07, 'epoch': 0.58, 'throughput': 625.10}

[INFO|callbacks.py:310] 2024-07-10 09:47:15,465 >> {'loss': 0.2005, 'learning_rate': 7.5833e-07, 'epoch': 0.59, 'throughput': 624.99}

[INFO|callbacks.py:310] 2024-07-10 09:47:26,522 >> {'loss': 0.2115, 'learning_rate': 7.6667e-07, 'epoch': 0.59, 'throughput': 624.99}

[INFO|callbacks.py:310] 2024-07-10 09:47:37,581 >> {'loss': 0.2141, 'learning_rate': 7.7500e-07, 'epoch': 0.60, 'throughput': 625.28}

[INFO|callbacks.py:310] 2024-07-10 09:47:48,654 >> {'loss': 0.1898, 'learning_rate': 7.8333e-07, 'epoch': 0.60, 'throughput': 625.15}

[INFO|callbacks.py:310] 2024-07-10 09:47:59,715 >> {'loss': 0.2099, 'learning_rate': 7.9167e-07, 'epoch': 0.61, 'throughput': 625.51}

[INFO|callbacks.py:310] 2024-07-10 09:48:10,807 >> {'loss': 0.2014, 'learning_rate': 8.0000e-07, 'epoch': 0.62, 'throughput': 625.29}

[INFO|callbacks.py:310] 2024-07-10 09:48:21,907 >> {'loss': 0.2349, 'learning_rate': 8.0833e-07, 'epoch': 0.62, 'throughput': 625.44}

[INFO|callbacks.py:310] 2024-07-10 09:48:33,011 >> {'loss': 0.2205, 'learning_rate': 8.1667e-07, 'epoch': 0.63, 'throughput': 625.72}

[INFO|callbacks.py:310] 2024-07-10 09:48:44,106 >> {'loss': 0.1990, 'learning_rate': 8.2500e-07, 'epoch': 0.64, 'throughput': 625.80}

[INFO|callbacks.py:310] 2024-07-10 09:48:55,161 >> {'loss': 0.2360, 'learning_rate': 8.3333e-07, 'epoch': 0.64, 'throughput': 625.95}

[INFO|callbacks.py:310] 2024-07-10 09:49:06,220 >> {'loss': 0.2265, 'learning_rate': 8.4167e-07, 'epoch': 0.65, 'throughput': 625.80}

[INFO|callbacks.py:310] 2024-07-10 09:49:17,286 >> {'loss': 0.2443, 'learning_rate': 8.5000e-07, 'epoch': 0.66, 'throughput': 625.83}

[INFO|callbacks.py:310] 2024-07-10 09:49:28,351 >> {'loss': 0.2086, 'learning_rate': 8.5833e-07, 'epoch': 0.66, 'throughput': 625.91}

[INFO|callbacks.py:310] 2024-07-10 09:49:39,428 >> {'loss': 0.1915, 'learning_rate': 8.6667e-07, 'epoch': 0.67, 'throughput': 625.95}

[INFO|callbacks.py:310] 2024-07-10 09:49:50,517 >> {'loss': 0.1967, 'learning_rate': 8.7500e-07, 'epoch': 0.68, 'throughput': 626.20}

[INFO|callbacks.py:310] 2024-07-10 09:50:01,619 >> {'loss': 0.1880, 'learning_rate': 8.8333e-07, 'epoch': 0.68, 'throughput': 626.18}

[INFO|callbacks.py:310] 2024-07-10 09:50:12,726 >> {'loss': 0.1895, 'learning_rate': 8.9167e-07, 'epoch': 0.69, 'throughput': 625.93}

[INFO|callbacks.py:310] 2024-07-10 09:50:23,838 >> {'loss': 0.1667, 'learning_rate': 9.0000e-07, 'epoch': 0.70, 'throughput': 625.76}

[INFO|callbacks.py:310] 2024-07-10 09:50:34,871 >> {'loss': 0.1976, 'learning_rate': 9.0833e-07, 'epoch': 0.70, 'throughput': 626.08}

[INFO|callbacks.py:310] 2024-07-10 09:50:45,934 >> {'loss': 0.2057, 'learning_rate': 9.1667e-07, 'epoch': 0.71, 'throughput': 626.31}

[INFO|callbacks.py:310] 2024-07-10 09:50:56,994 >> {'loss': 0.2002, 'learning_rate': 9.2500e-07, 'epoch': 0.71, 'throughput': 626.09}

[INFO|callbacks.py:310] 2024-07-10 09:51:08,070 >> {'loss': 0.2173, 'learning_rate': 9.3333e-07, 'epoch': 0.72, 'throughput': 626.01}

[INFO|callbacks.py:310] 2024-07-10 09:51:19,151 >> {'loss': 0.1756, 'learning_rate': 9.4167e-07, 'epoch': 0.73, 'throughput': 625.93}

[INFO|callbacks.py:310] 2024-07-10 09:51:30,234 >> {'loss': 0.1786, 'learning_rate': 9.5000e-07, 'epoch': 0.73, 'throughput': 626.19}

[INFO|callbacks.py:310] 2024-07-10 09:51:41,326 >> {'loss': 0.1879, 'learning_rate': 9.5833e-07, 'epoch': 0.74, 'throughput': 626.40}

[INFO|callbacks.py:310] 2024-07-10 09:51:52,430 >> {'loss': 0.1730, 'learning_rate': 9.6667e-07, 'epoch': 0.75, 'throughput': 626.51}

[INFO|callbacks.py:310] 2024-07-10 09:52:03,516 >> {'loss': 0.1714, 'learning_rate': 9.7500e-07, 'epoch': 0.75, 'throughput': 626.27}

[INFO|callbacks.py:310] 2024-07-10 09:52:14,581 >> {'loss': 0.2015, 'learning_rate': 9.8333e-07, 'epoch': 0.76, 'throughput': 626.38}

[INFO|callbacks.py:310] 2024-07-10 09:52:25,633 >> {'loss': 0.1847, 'learning_rate': 9.9167e-07, 'epoch': 0.77, 'throughput': 626.57}

[INFO|callbacks.py:310] 2024-07-10 09:52:36,711 >> {'loss': 0.1876, 'learning_rate': 1.0000e-06, 'epoch': 0.77, 'throughput': 626.56}

[INFO|callbacks.py:310] 2024-07-10 09:52:47,767 >> {'loss': 0.1937, 'learning_rate': 1.0083e-06, 'epoch': 0.78, 'throughput': 626.51}

[INFO|callbacks.py:310] 2024-07-10 09:52:58,855 >> {'loss': 0.1974, 'learning_rate': 1.0167e-06, 'epoch': 0.79, 'throughput': 626.65}

[INFO|callbacks.py:310] 2024-07-10 09:53:09,927 >> {'loss': 0.2099, 'learning_rate': 1.0250e-06, 'epoch': 0.79, 'throughput': 626.67}

[INFO|callbacks.py:310] 2024-07-10 09:53:21,036 >> {'loss': 0.2239, 'learning_rate': 1.0333e-06, 'epoch': 0.80, 'throughput': 626.55}

[INFO|callbacks.py:310] 2024-07-10 09:53:32,129 >> {'loss': 0.1953, 'learning_rate': 1.0417e-06, 'epoch': 0.80, 'throughput': 626.52}

[INFO|callbacks.py:310] 2024-07-10 09:53:43,192 >> {'loss': 0.1906, 'learning_rate': 1.0500e-06, 'epoch': 0.81, 'throughput': 626.91}

[INFO|callbacks.py:310] 2024-07-10 09:53:54,235 >> {'loss': 0.1852, 'learning_rate': 1.0583e-06, 'epoch': 0.82, 'throughput': 626.83}

[INFO|callbacks.py:310] 2024-07-10 09:54:05,302 >> {'loss': 0.1980, 'learning_rate': 1.0667e-06, 'epoch': 0.82, 'throughput': 626.82}

[INFO|callbacks.py:310] 2024-07-10 09:54:16,377 >> {'loss': 0.2105, 'learning_rate': 1.0750e-06, 'epoch': 0.83, 'throughput': 627.18}

[INFO|callbacks.py:310] 2024-07-10 09:54:27,441 >> {'loss': 0.1899, 'learning_rate': 1.0833e-06, 'epoch': 0.84, 'throughput': 627.22}

[INFO|callbacks.py:310] 2024-07-10 09:54:38,533 >> {'loss': 0.1742, 'learning_rate': 1.0917e-06, 'epoch': 0.84, 'throughput': 627.28}

[INFO|callbacks.py:310] 2024-07-10 09:54:49,617 >> {'loss': 0.1585, 'learning_rate': 1.1000e-06, 'epoch': 0.85, 'throughput': 627.54}

[INFO|callbacks.py:310] 2024-07-10 09:55:00,716 >> {'loss': 0.1878, 'learning_rate': 1.1083e-06, 'epoch': 0.86, 'throughput': 627.69}

[INFO|callbacks.py:310] 2024-07-10 09:55:11,807 >> {'loss': 0.1748, 'learning_rate': 1.1167e-06, 'epoch': 0.86, 'throughput': 627.79}

[INFO|callbacks.py:310] 2024-07-10 09:55:22,872 >> {'loss': 0.1944, 'learning_rate': 1.1250e-06, 'epoch': 0.87, 'throughput': 627.55}

[INFO|callbacks.py:310] 2024-07-10 09:55:33,926 >> {'loss': 0.1727, 'learning_rate': 1.1333e-06, 'epoch': 0.88, 'throughput': 627.40}

[INFO|callbacks.py:310] 2024-07-10 09:55:44,980 >> {'loss': 0.1748, 'learning_rate': 1.1417e-06, 'epoch': 0.88, 'throughput': 627.51}

[INFO|callbacks.py:310] 2024-07-10 09:55:56,057 >> {'loss': 0.1625, 'learning_rate': 1.1500e-06, 'epoch': 0.89, 'throughput': 627.55}

[INFO|callbacks.py:310] 2024-07-10 09:56:07,109 >> {'loss': 0.1649, 'learning_rate': 1.1583e-06, 'epoch': 0.89, 'throughput': 627.37}

[INFO|callbacks.py:310] 2024-07-10 09:56:18,205 >> {'loss': 0.1738, 'learning_rate': 1.1667e-06, 'epoch': 0.90, 'throughput': 627.21}

[INFO|callbacks.py:310] 2024-07-10 09:56:29,283 >> {'loss': 0.2221, 'learning_rate': 1.1750e-06, 'epoch': 0.91, 'throughput': 627.14}

[INFO|callbacks.py:310] 2024-07-10 09:56:40,397 >> {'loss': 0.1989, 'learning_rate': 1.1833e-06, 'epoch': 0.91, 'throughput': 627.09}

[INFO|callbacks.py:310] 2024-07-10 09:56:51,500 >> {'loss': 0.1569, 'learning_rate': 1.1917e-06, 'epoch': 0.92, 'throughput': 627.32}

[INFO|callbacks.py:310] 2024-07-10 09:57:02,570 >> {'loss': 0.1594, 'learning_rate': 1.2000e-06, 'epoch': 0.93, 'throughput': 627.15}

[INFO|callbacks.py:310] 2024-07-10 09:57:13,611 >> {'loss': 0.1609, 'learning_rate': 1.2083e-06, 'epoch': 0.93, 'throughput': 627.25}

[INFO|callbacks.py:310] 2024-07-10 09:57:24,671 >> {'loss': 0.1649, 'learning_rate': 1.2167e-06, 'epoch': 0.94, 'throughput': 627.25}

[INFO|callbacks.py:310] 2024-07-10 09:57:35,745 >> {'loss': 0.1412, 'learning_rate': 1.2250e-06, 'epoch': 0.95, 'throughput': 627.06}

[INFO|callbacks.py:310] 2024-07-10 09:57:46,808 >> {'loss': 0.1650, 'learning_rate': 1.2333e-06, 'epoch': 0.95, 'throughput': 627.08}

[INFO|callbacks.py:310] 2024-07-10 09:57:57,903 >> {'loss': 0.1971, 'learning_rate': 1.2417e-06, 'epoch': 0.96, 'throughput': 627.15}

[INFO|callbacks.py:310] 2024-07-10 09:58:08,999 >> {'loss': 0.1843, 'learning_rate': 1.2500e-06, 'epoch': 0.97, 'throughput': 627.22}

[INFO|callbacks.py:310] 2024-07-10 09:58:20,105 >> {'loss': 0.1628, 'learning_rate': 1.2583e-06, 'epoch': 0.97, 'throughput': 627.19}

[INFO|callbacks.py:310] 2024-07-10 09:58:31,213 >> {'loss': 0.1878, 'learning_rate': 1.2667e-06, 'epoch': 0.98, 'throughput': 627.21}

[INFO|callbacks.py:310] 2024-07-10 09:58:42,266 >> {'loss': 0.1153, 'learning_rate': 1.2750e-06, 'epoch': 0.98, 'throughput': 627.12}

[INFO|callbacks.py:310] 2024-07-10 09:58:53,326 >> {'loss': 0.1622, 'learning_rate': 1.2833e-06, 'epoch': 0.99, 'throughput': 627.19}

[INFO|callbacks.py:310] 2024-07-10 09:59:04,393 >> {'loss': 0.1549, 'learning_rate': 1.2917e-06, 'epoch': 1.00, 'throughput': 627.31}

[INFO|callbacks.py:310] 2024-07-10 09:59:15,452 >> {'loss': 0.1729, 'learning_rate': 1.3000e-06, 'epoch': 1.00, 'throughput': 627.57}

[INFO|callbacks.py:310] 2024-07-10 09:59:26,529 >> {'loss': 0.1198, 'learning_rate': 1.3083e-06, 'epoch': 1.01, 'throughput': 627.77}

[INFO|callbacks.py:310] 2024-07-10 09:59:37,623 >> {'loss': 0.1723, 'learning_rate': 1.3167e-06, 'epoch': 1.02, 'throughput': 627.70}

[INFO|callbacks.py:310] 2024-07-10 09:59:48,714 >> {'loss': 0.1485, 'learning_rate': 1.3250e-06, 'epoch': 1.02, 'throughput': 627.63}

[INFO|callbacks.py:310] 2024-07-10 09:59:59,831 >> {'loss': 0.1757, 'learning_rate': 1.3333e-06, 'epoch': 1.03, 'throughput': 627.47}

[INFO|callbacks.py:310] 2024-07-10 10:00:10,914 >> {'loss': 0.1907, 'learning_rate': 1.3417e-06, 'epoch': 1.04, 'throughput': 627.38}

[INFO|callbacks.py:310] 2024-07-10 10:00:21,964 >> {'loss': 0.1842, 'learning_rate': 1.3500e-06, 'epoch': 1.04, 'throughput': 627.39}

[INFO|callbacks.py:310] 2024-07-10 10:00:33,017 >> {'loss': 0.1549, 'learning_rate': 1.3583e-06, 'epoch': 1.05, 'throughput': 627.36}

[INFO|callbacks.py:310] 2024-07-10 10:00:44,102 >> {'loss': 0.1530, 'learning_rate': 1.3667e-06, 'epoch': 1.06, 'throughput': 627.55}

[INFO|callbacks.py:310] 2024-07-10 10:00:55,163 >> {'loss': 0.1532, 'learning_rate': 1.3750e-06, 'epoch': 1.06, 'throughput': 627.47}

[INFO|callbacks.py:310] 2024-07-10 10:01:06,241 >> {'loss': 0.1629, 'learning_rate': 1.3833e-06, 'epoch': 1.07, 'throughput': 627.42}

[INFO|callbacks.py:310] 2024-07-10 10:01:17,335 >> {'loss': 0.1708, 'learning_rate': 1.3917e-06, 'epoch': 1.07, 'throughput': 627.48}

[INFO|callbacks.py:310] 2024-07-10 10:01:28,422 >> {'loss': 0.1461, 'learning_rate': 1.4000e-06, 'epoch': 1.08, 'throughput': 627.66}

[INFO|callbacks.py:310] 2024-07-10 10:01:39,514 >> {'loss': 0.1402, 'learning_rate': 1.4083e-06, 'epoch': 1.09, 'throughput': 627.69}

[INFO|callbacks.py:310] 2024-07-10 10:01:50,597 >> {'loss': 0.1430, 'learning_rate': 1.4167e-06, 'epoch': 1.09, 'throughput': 627.84}

[INFO|callbacks.py:310] 2024-07-10 10:02:01,647 >> {'loss': 0.1454, 'learning_rate': 1.4250e-06, 'epoch': 1.10, 'throughput': 627.85}

[INFO|callbacks.py:310] 2024-07-10 10:02:12,707 >> {'loss': 0.1286, 'learning_rate': 1.4333e-06, 'epoch': 1.11, 'throughput': 627.77}

[INFO|callbacks.py:310] 2024-07-10 10:02:23,790 >> {'loss': 0.1523, 'learning_rate': 1.4417e-06, 'epoch': 1.11, 'throughput': 627.62}

[INFO|callbacks.py:310] 2024-07-10 10:02:34,856 >> {'loss': 0.1345, 'learning_rate': 1.4500e-06, 'epoch': 1.12, 'throughput': 627.52}

[INFO|callbacks.py:310] 2024-07-10 10:02:45,936 >> {'loss': 0.1749, 'learning_rate': 1.4583e-06, 'epoch': 1.13, 'throughput': 627.59}

[INFO|callbacks.py:310] 2024-07-10 10:02:57,011 >> {'loss': 0.1660, 'learning_rate': 1.4667e-06, 'epoch': 1.13, 'throughput': 627.77}

[INFO|callbacks.py:310] 2024-07-10 10:03:08,126 >> {'loss': 0.1708, 'learning_rate': 1.4750e-06, 'epoch': 1.14, 'throughput': 627.81}

[INFO|callbacks.py:310] 2024-07-10 10:03:19,236 >> {'loss': 0.1363, 'learning_rate': 1.4833e-06, 'epoch': 1.15, 'throughput': 627.67}

[INFO|callbacks.py:310] 2024-07-10 10:03:30,288 >> {'loss': 0.1369, 'learning_rate': 1.4917e-06, 'epoch': 1.15, 'throughput': 627.89}

[INFO|callbacks.py:310] 2024-07-10 10:03:41,338 >> {'loss': 0.1632, 'learning_rate': 1.5000e-06, 'epoch': 1.16, 'throughput': 627.96}

[INFO|callbacks.py:310] 2024-07-10 10:03:52,390 >> {'loss': 0.1734, 'learning_rate': 1.5083e-06, 'epoch': 1.16, 'throughput': 628.08}

[INFO|callbacks.py:310] 2024-07-10 10:04:03,453 >> {'loss': 0.1350, 'learning_rate': 1.5167e-06, 'epoch': 1.17, 'throughput': 628.15}

[INFO|callbacks.py:310] 2024-07-10 10:04:14,498 >> {'loss': 0.1355, 'learning_rate': 1.5250e-06, 'epoch': 1.18, 'throughput': 628.23}

[INFO|callbacks.py:310] 2024-07-10 10:04:25,603 >> {'loss': 0.1324, 'learning_rate': 1.5333e-06, 'epoch': 1.18, 'throughput': 628.25}

[INFO|callbacks.py:310] 2024-07-10 10:04:36,684 >> {'loss': 0.1754, 'learning_rate': 1.5417e-06, 'epoch': 1.19, 'throughput': 628.12}

[INFO|callbacks.py:310] 2024-07-10 10:04:47,787 >> {'loss': 0.1484, 'learning_rate': 1.5500e-06, 'epoch': 1.20, 'throughput': 628.23}

[INFO|callbacks.py:310] 2024-07-10 10:04:58,887 >> {'loss': 0.1315, 'learning_rate': 1.5583e-06, 'epoch': 1.20, 'throughput': 628.17}

[INFO|callbacks.py:310] 2024-07-10 10:05:09,934 >> {'loss': 0.1665, 'learning_rate': 1.5667e-06, 'epoch': 1.21, 'throughput': 628.32}

[INFO|callbacks.py:310] 2024-07-10 10:05:20,981 >> {'loss': 0.1406, 'learning_rate': 1.5750e-06, 'epoch': 1.22, 'throughput': 628.54}

[INFO|callbacks.py:310] 2024-07-10 10:05:32,049 >> {'loss': 0.1391, 'learning_rate': 1.5833e-06, 'epoch': 1.22, 'throughput': 628.80}

[INFO|callbacks.py:310] 2024-07-10 10:05:43,101 >> {'loss': 0.1339, 'learning_rate': 1.5917e-06, 'epoch': 1.23, 'throughput': 628.85}

[INFO|callbacks.py:310] 2024-07-10 10:05:54,165 >> {'loss': 0.1613, 'learning_rate': 1.6000e-06, 'epoch': 1.24, 'throughput': 628.74}

[INFO|callbacks.py:310] 2024-07-10 10:06:05,258 >> {'loss': 0.1274, 'learning_rate': 1.6083e-06, 'epoch': 1.24, 'throughput': 628.59}

[INFO|callbacks.py:310] 2024-07-10 10:06:16,340 >> {'loss': 0.1882, 'learning_rate': 1.6167e-06, 'epoch': 1.25, 'throughput': 628.63}

[INFO|callbacks.py:310] 2024-07-10 10:06:27,453 >> {'loss': 0.1106, 'learning_rate': 1.6250e-06, 'epoch': 1.26, 'throughput': 628.59}

[INFO|callbacks.py:310] 2024-07-10 10:06:38,561 >> {'loss': 0.1249, 'learning_rate': 1.6333e-06, 'epoch': 1.26, 'throughput': 628.62}

[INFO|callbacks.py:310] 2024-07-10 10:06:49,605 >> {'loss': 0.1499, 'learning_rate': 1.6417e-06, 'epoch': 1.27, 'throughput': 628.76}

[INFO|callbacks.py:310] 2024-07-10 10:07:00,664 >> {'loss': 0.1512, 'learning_rate': 1.6500e-06, 'epoch': 1.27, 'throughput': 628.61}

[INFO|callbacks.py:310] 2024-07-10 10:07:11,735 >> {'loss': 0.1166, 'learning_rate': 1.6583e-06, 'epoch': 1.28, 'throughput': 628.35}

[INFO|callbacks.py:310] 2024-07-10 10:07:22,801 >> {'loss': 0.2100, 'learning_rate': 1.6667e-06, 'epoch': 1.29, 'throughput': 628.25}

[INFO|callbacks.py:310] 2024-07-10 10:07:33,882 >> {'loss': 0.1915, 'learning_rate': 1.6750e-06, 'epoch': 1.29, 'throughput': 628.26}

[INFO|callbacks.py:310] 2024-07-10 10:07:44,965 >> {'loss': 0.1911, 'learning_rate': 1.6833e-06, 'epoch': 1.30, 'throughput': 628.43}

[INFO|callbacks.py:310] 2024-07-10 10:07:56,062 >> {'loss': 0.1487, 'learning_rate': 1.6917e-06, 'epoch': 1.31, 'throughput': 628.31}

[INFO|callbacks.py:310] 2024-07-10 10:08:07,174 >> {'loss': 0.2154, 'learning_rate': 1.7000e-06, 'epoch': 1.31, 'throughput': 628.40}

[INFO|callbacks.py:310] 2024-07-10 10:08:18,265 >> {'loss': 0.2909, 'learning_rate': 1.7083e-06, 'epoch': 1.32, 'throughput': 628.32}

[INFO|callbacks.py:310] 2024-07-10 10:08:29,310 >> {'loss': 0.2710, 'learning_rate': 1.7167e-06, 'epoch': 1.33, 'throughput': 628.26}

[INFO|callbacks.py:310] 2024-07-10 10:08:40,375 >> {'loss': 0.1616, 'learning_rate': 1.7250e-06, 'epoch': 1.33, 'throughput': 628.34}

[INFO|callbacks.py:310] 2024-07-10 10:08:51,453 >> {'loss': 0.1427, 'learning_rate': 1.7333e-06, 'epoch': 1.34, 'throughput': 628.19}

[INFO|callbacks.py:310] 2024-07-10 10:09:02,518 >> {'loss': 0.1602, 'learning_rate': 1.7417e-06, 'epoch': 1.35, 'throughput': 628.09}

[INFO|callbacks.py:310] 2024-07-10 10:09:13,600 >> {'loss': 0.1690, 'learning_rate': 1.7500e-06, 'epoch': 1.35, 'throughput': 627.98}

[INFO|callbacks.py:310] 2024-07-10 10:09:24,701 >> {'loss': 0.1285, 'learning_rate': 1.7583e-06, 'epoch': 1.36, 'throughput': 627.95}

[INFO|callbacks.py:310] 2024-07-10 10:09:35,810 >> {'loss': 0.1636, 'learning_rate': 1.7667e-06, 'epoch': 1.36, 'throughput': 627.99}

[INFO|callbacks.py:310] 2024-07-10 10:09:46,911 >> {'loss': 0.1702, 'learning_rate': 1.7750e-06, 'epoch': 1.37, 'throughput': 628.02}

[INFO|callbacks.py:310] 2024-07-10 10:09:58,009 >> {'loss': 0.1562, 'learning_rate': 1.7833e-06, 'epoch': 1.38, 'throughput': 627.99}

[INFO|callbacks.py:310] 2024-07-10 10:10:09,055 >> {'loss': 0.1537, 'learning_rate': 1.7917e-06, 'epoch': 1.38, 'throughput': 628.06}

[INFO|callbacks.py:310] 2024-07-10 10:10:20,108 >> {'loss': 0.1114, 'learning_rate': 1.8000e-06, 'epoch': 1.39, 'throughput': 628.15}

[INFO|callbacks.py:310] 2024-07-10 10:10:31,183 >> {'loss': 0.1017, 'learning_rate': 1.8083e-06, 'epoch': 1.40, 'throughput': 628.35}

[INFO|callbacks.py:310] 2024-07-10 10:10:42,251 >> {'loss': 0.1184, 'learning_rate': 1.8167e-06, 'epoch': 1.40, 'throughput': 628.35}

[INFO|callbacks.py:310] 2024-07-10 10:10:53,342 >> {'loss': 0.1343, 'learning_rate': 1.8250e-06, 'epoch': 1.41, 'throughput': 628.40}

[INFO|callbacks.py:310] 2024-07-10 10:11:04,442 >> {'loss': 0.1271, 'learning_rate': 1.8333e-06, 'epoch': 1.42, 'throughput': 628.47}

[INFO|callbacks.py:310] 2024-07-10 10:11:15,540 >> {'loss': 0.1383, 'learning_rate': 1.8417e-06, 'epoch': 1.42, 'throughput': 628.42}

[INFO|callbacks.py:310] 2024-07-10 10:11:26,643 >> {'loss': 0.1558, 'learning_rate': 1.8500e-06, 'epoch': 1.43, 'throughput': 628.57}

[INFO|callbacks.py:310] 2024-07-10 10:11:37,742 >> {'loss': 0.1312, 'learning_rate': 1.8583e-06, 'epoch': 1.44, 'throughput': 628.57}

[INFO|callbacks.py:310] 2024-07-10 10:11:48,791 >> {'loss': 0.1212, 'learning_rate': 1.8667e-06, 'epoch': 1.44, 'throughput': 628.54}

[INFO|callbacks.py:310] 2024-07-10 10:11:59,847 >> {'loss': 0.1342, 'learning_rate': 1.8750e-06, 'epoch': 1.45, 'throughput': 628.57}

[INFO|callbacks.py:310] 2024-07-10 10:12:10,930 >> {'loss': 0.0960, 'learning_rate': 1.8833e-06, 'epoch': 1.45, 'throughput': 628.46}

[INFO|callbacks.py:310] 2024-07-10 10:12:21,984 >> {'loss': 0.0982, 'learning_rate': 1.8917e-06, 'epoch': 1.46, 'throughput': 628.44}

[INFO|callbacks.py:310] 2024-07-10 10:12:33,101 >> {'loss': 0.1099, 'learning_rate': 1.9000e-06, 'epoch': 1.47, 'throughput': 628.29}

[INFO|callbacks.py:310] 2024-07-10 10:12:44,191 >> {'loss': 0.1265, 'learning_rate': 1.9083e-06, 'epoch': 1.47, 'throughput': 628.11}

[INFO|callbacks.py:310] 2024-07-10 10:12:55,301 >> {'loss': 0.1066, 'learning_rate': 1.9167e-06, 'epoch': 1.48, 'throughput': 628.17}

[INFO|callbacks.py:310] 2024-07-10 10:13:06,412 >> {'loss': 0.1593, 'learning_rate': 1.9250e-06, 'epoch': 1.49, 'throughput': 628.06}

[INFO|callbacks.py:310] 2024-07-10 10:13:17,482 >> {'loss': 0.1891, 'learning_rate': 1.9333e-06, 'epoch': 1.49, 'throughput': 627.95}

[INFO|callbacks.py:310] 2024-07-10 10:13:28,527 >> {'loss': 0.0867, 'learning_rate': 1.9417e-06, 'epoch': 1.50, 'throughput': 627.94}

[INFO|callbacks.py:310] 2024-07-10 10:13:39,602 >> {'loss': 0.1286, 'learning_rate': 1.9500e-06, 'epoch': 1.51, 'throughput': 627.92}

[INFO|callbacks.py:310] 2024-07-10 10:13:50,679 >> {'loss': 0.1262, 'learning_rate': 1.9583e-06, 'epoch': 1.51, 'throughput': 627.86}

[INFO|callbacks.py:310] 2024-07-10 10:14:01,751 >> {'loss': 0.0878, 'learning_rate': 1.9667e-06, 'epoch': 1.52, 'throughput': 627.84}

[INFO|callbacks.py:310] 2024-07-10 10:14:12,847 >> {'loss': 0.1532, 'learning_rate': 1.9750e-06, 'epoch': 1.53, 'throughput': 627.87}

[INFO|callbacks.py:310] 2024-07-10 10:14:23,927 >> {'loss': 0.1232, 'learning_rate': 1.9833e-06, 'epoch': 1.53, 'throughput': 627.91}

[INFO|callbacks.py:310] 2024-07-10 10:14:35,033 >> {'loss': 0.1119, 'learning_rate': 1.9917e-06, 'epoch': 1.54, 'throughput': 627.87}

[INFO|callbacks.py:310] 2024-07-10 10:14:46,134 >> {'loss': 0.1334, 'learning_rate': 2.0000e-06, 'epoch': 1.54, 'throughput': 627.79}

[INFO|callbacks.py:310] 2024-07-10 10:14:57,194 >> {'loss': 0.1224, 'learning_rate': 2.0083e-06, 'epoch': 1.55, 'throughput': 627.75}

[INFO|callbacks.py:310] 2024-07-10 10:15:08,251 >> {'loss': 0.1060, 'learning_rate': 2.0167e-06, 'epoch': 1.56, 'throughput': 627.72}

[INFO|callbacks.py:310] 2024-07-10 10:15:19,306 >> {'loss': 0.1237, 'learning_rate': 2.0250e-06, 'epoch': 1.56, 'throughput': 627.75}

[INFO|callbacks.py:310] 2024-07-10 10:15:30,373 >> {'loss': 0.0950, 'learning_rate': 2.0333e-06, 'epoch': 1.57, 'throughput': 627.72}

[INFO|callbacks.py:310] 2024-07-10 10:15:41,436 >> {'loss': 0.0741, 'learning_rate': 2.0417e-06, 'epoch': 1.58, 'throughput': 627.77}

[INFO|callbacks.py:310] 2024-07-10 10:15:52,535 >> {'loss': 0.1239, 'learning_rate': 2.0500e-06, 'epoch': 1.58, 'throughput': 627.82}

[INFO|callbacks.py:310] 2024-07-10 10:16:03,626 >> {'loss': 0.0910, 'learning_rate': 2.0583e-06, 'epoch': 1.59, 'throughput': 627.78}

[INFO|callbacks.py:310] 2024-07-10 10:16:14,752 >> {'loss': 0.1403, 'learning_rate': 2.0667e-06, 'epoch': 1.60, 'throughput': 627.75}

[INFO|callbacks.py:310] 2024-07-10 10:16:25,845 >> {'loss': 0.1638, 'learning_rate': 2.0750e-06, 'epoch': 1.60, 'throughput': 627.93}

[INFO|callbacks.py:310] 2024-07-10 10:16:36,895 >> {'loss': 0.1343, 'learning_rate': 2.0833e-06, 'epoch': 1.61, 'throughput': 627.98}

[INFO|callbacks.py:310] 2024-07-10 10:16:47,938 >> {'loss': 0.1558, 'learning_rate': 2.0917e-06, 'epoch': 1.62, 'throughput': 628.09}

[INFO|callbacks.py:310] 2024-07-10 10:16:59,013 >> {'loss': 0.0809, 'learning_rate': 2.1000e-06, 'epoch': 1.62, 'throughput': 628.04}

[INFO|callbacks.py:310] 2024-07-10 10:17:10,075 >> {'loss': 0.1435, 'learning_rate': 2.1083e-06, 'epoch': 1.63, 'throughput': 628.05}

[INFO|callbacks.py:310] 2024-07-10 10:17:21,151 >> {'loss': 0.0887, 'learning_rate': 2.1167e-06, 'epoch': 1.63, 'throughput': 628.12}

[INFO|callbacks.py:310] 2024-07-10 10:17:32,247 >> {'loss': 0.1038, 'learning_rate': 2.1250e-06, 'epoch': 1.64, 'throughput': 628.10}

[INFO|callbacks.py:310] 2024-07-10 10:17:43,350 >> {'loss': 0.0889, 'learning_rate': 2.1333e-06, 'epoch': 1.65, 'throughput': 628.31}

[INFO|callbacks.py:310] 2024-07-10 10:17:54,457 >> {'loss': 0.0751, 'learning_rate': 2.1417e-06, 'epoch': 1.65, 'throughput': 628.11}

[INFO|callbacks.py:310] 2024-07-10 10:18:05,562 >> {'loss': 0.0921, 'learning_rate': 2.1500e-06, 'epoch': 1.66, 'throughput': 628.06}

[INFO|callbacks.py:310] 2024-07-10 10:18:16,609 >> {'loss': 0.1102, 'learning_rate': 2.1583e-06, 'epoch': 1.67, 'throughput': 628.16}

[INFO|callbacks.py:310] 2024-07-10 10:18:27,683 >> {'loss': 0.1036, 'learning_rate': 2.1667e-06, 'epoch': 1.67, 'throughput': 628.14}

[INFO|callbacks.py:310] 2024-07-10 10:18:38,758 >> {'loss': 0.1093, 'learning_rate': 2.1750e-06, 'epoch': 1.68, 'throughput': 628.16}

[INFO|callbacks.py:310] 2024-07-10 10:18:49,829 >> {'loss': 0.1714, 'learning_rate': 2.1833e-06, 'epoch': 1.69, 'throughput': 628.18}

[INFO|callbacks.py:310] 2024-07-10 10:19:00,900 >> {'loss': 0.1007, 'learning_rate': 2.1917e-06, 'epoch': 1.69, 'throughput': 628.16}

[INFO|callbacks.py:310] 2024-07-10 10:19:11,995 >> {'loss': 0.1090, 'learning_rate': 2.2000e-06, 'epoch': 1.70, 'throughput': 628.07}

[INFO|callbacks.py:310] 2024-07-10 10:19:23,097 >> {'loss': 0.0574, 'learning_rate': 2.2083e-06, 'epoch': 1.71, 'throughput': 628.11}

[INFO|callbacks.py:310] 2024-07-10 10:19:34,224 >> {'loss': 0.0697, 'learning_rate': 2.2167e-06, 'epoch': 1.71, 'throughput': 628.02}

[INFO|callbacks.py:310] 2024-07-10 10:19:45,308 >> {'loss': 0.1188, 'learning_rate': 2.2250e-06, 'epoch': 1.72, 'throughput': 627.97}

[INFO|callbacks.py:310] 2024-07-10 10:19:56,363 >> {'loss': 0.0998, 'learning_rate': 2.2333e-06, 'epoch': 1.72, 'throughput': 627.98}

[INFO|callbacks.py:310] 2024-07-10 10:20:07,411 >> {'loss': 0.1083, 'learning_rate': 2.2417e-06, 'epoch': 1.73, 'throughput': 627.95}

[INFO|callbacks.py:310] 2024-07-10 10:20:18,493 >> {'loss': 0.0749, 'learning_rate': 2.2500e-06, 'epoch': 1.74, 'throughput': 627.99}

[INFO|callbacks.py:310] 2024-07-10 10:20:29,573 >> {'loss': 0.0771, 'learning_rate': 2.2583e-06, 'epoch': 1.74, 'throughput': 627.94}

[INFO|callbacks.py:310] 2024-07-10 10:20:40,665 >> {'loss': 0.0646, 'learning_rate': 2.2667e-06, 'epoch': 1.75, 'throughput': 627.98}

[INFO|callbacks.py:310] 2024-07-10 10:20:51,753 >> {'loss': 0.1147, 'learning_rate': 2.2750e-06, 'epoch': 1.76, 'throughput': 628.03}

[INFO|callbacks.py:310] 2024-07-10 10:21:02,859 >> {'loss': 0.0917, 'learning_rate': 2.2833e-06, 'epoch': 1.76, 'throughput': 628.02}

[INFO|callbacks.py:310] 2024-07-10 10:21:13,970 >> {'loss': 0.1060, 'learning_rate': 2.2917e-06, 'epoch': 1.77, 'throughput': 628.01}

[INFO|callbacks.py:310] 2024-07-10 10:21:25,036 >> {'loss': 0.0849, 'learning_rate': 2.3000e-06, 'epoch': 1.78, 'throughput': 628.04}

[INFO|callbacks.py:310] 2024-07-10 10:21:36,081 >> {'loss': 0.1074, 'learning_rate': 2.3083e-06, 'epoch': 1.78, 'throughput': 628.08}

[INFO|callbacks.py:310] 2024-07-10 10:21:47,142 >> {'loss': 0.0981, 'learning_rate': 2.3167e-06, 'epoch': 1.79, 'throughput': 628.08}

[INFO|callbacks.py:310] 2024-07-10 10:21:58,216 >> {'loss': 0.1008, 'learning_rate': 2.3250e-06, 'epoch': 1.80, 'throughput': 628.09}

[INFO|callbacks.py:310] 2024-07-10 10:22:09,264 >> {'loss': 0.1032, 'learning_rate': 2.3333e-06, 'epoch': 1.80, 'throughput': 628.13}

[INFO|callbacks.py:310] 2024-07-10 10:22:20,355 >> {'loss': 0.0711, 'learning_rate': 2.3417e-06, 'epoch': 1.81, 'throughput': 628.10}

[INFO|callbacks.py:310] 2024-07-10 10:22:31,450 >> {'loss': 0.0901, 'learning_rate': 2.3500e-06, 'epoch': 1.81, 'throughput': 628.14}

[INFO|callbacks.py:310] 2024-07-10 10:22:42,563 >> {'loss': 0.0847, 'learning_rate': 2.3583e-06, 'epoch': 1.82, 'throughput': 628.01}

[INFO|callbacks.py:310] 2024-07-10 10:22:53,668 >> {'loss': 0.0752, 'learning_rate': 2.3667e-06, 'epoch': 1.83, 'throughput': 627.92}

[INFO|callbacks.py:310] 2024-07-10 10:23:04,734 >> {'loss': 0.0680, 'learning_rate': 2.3750e-06, 'epoch': 1.83, 'throughput': 627.95}

[INFO|callbacks.py:310] 2024-07-10 10:23:15,786 >> {'loss': 0.1088, 'learning_rate': 2.3833e-06, 'epoch': 1.84, 'throughput': 627.90}

[INFO|callbacks.py:310] 2024-07-10 10:23:26,851 >> {'loss': 0.1017, 'learning_rate': 2.3917e-06, 'epoch': 1.85, 'throughput': 627.99}

[INFO|callbacks.py:310] 2024-07-10 10:23:37,920 >> {'loss': 0.1300, 'learning_rate': 2.4000e-06, 'epoch': 1.85, 'throughput': 628.03}

[INFO|callbacks.py:310] 2024-07-10 10:23:48,988 >> {'loss': 0.0998, 'learning_rate': 2.4083e-06, 'epoch': 1.86, 'throughput': 627.95}

[INFO|callbacks.py:310] 2024-07-10 10:24:00,086 >> {'loss': 0.0779, 'learning_rate': 2.4167e-06, 'epoch': 1.87, 'throughput': 627.90}

[INFO|callbacks.py:310] 2024-07-10 10:24:11,165 >> {'loss': 0.0907, 'learning_rate': 2.4250e-06, 'epoch': 1.87, 'throughput': 627.82}

[INFO|callbacks.py:310] 2024-07-10 10:24:22,274 >> {'loss': 0.0911, 'learning_rate': 2.4333e-06, 'epoch': 1.88, 'throughput': 627.88}

[INFO|callbacks.py:310] 2024-07-10 10:24:33,373 >> {'loss': 0.0987, 'learning_rate': 2.4417e-06, 'epoch': 1.89, 'throughput': 627.87}

[INFO|callbacks.py:310] 2024-07-10 10:24:44,449 >> {'loss': 0.1254, 'learning_rate': 2.4500e-06, 'epoch': 1.89, 'throughput': 627.89}

[INFO|callbacks.py:310] 2024-07-10 10:24:55,500 >> {'loss': 0.0494, 'learning_rate': 2.4583e-06, 'epoch': 1.90, 'throughput': 627.82}

[INFO|callbacks.py:310] 2024-07-10 10:25:06,577 >> {'loss': 0.1082, 'learning_rate': 2.4667e-06, 'epoch': 1.91, 'throughput': 627.72}

[INFO|callbacks.py:310] 2024-07-10 10:25:17,634 >> {'loss': 0.0785, 'learning_rate': 2.4750e-06, 'epoch': 1.91, 'throughput': 627.95}

[INFO|callbacks.py:310] 2024-07-10 10:25:28,712 >> {'loss': 0.1026, 'learning_rate': 2.4833e-06, 'epoch': 1.92, 'throughput': 627.88}

[INFO|callbacks.py:310] 2024-07-10 10:25:39,812 >> {'loss': 0.0617, 'learning_rate': 2.4917e-06, 'epoch': 1.92, 'throughput': 627.87}

[INFO|callbacks.py:310] 2024-07-10 10:25:50,911 >> {'loss': 0.0659, 'learning_rate': 2.5000e-06, 'epoch': 1.93, 'throughput': 627.81}

[INFO|callbacks.py:310] 2024-07-10 10:26:02,010 >> {'loss': 0.0880, 'learning_rate': 2.5083e-06, 'epoch': 1.94, 'throughput': 627.93}

[INFO|callbacks.py:310] 2024-07-10 10:26:13,096 >> {'loss': 0.0771, 'learning_rate': 2.5167e-06, 'epoch': 1.94, 'throughput': 628.06}

[INFO|callbacks.py:310] 2024-07-10 10:26:24,141 >> {'loss': 0.0642, 'learning_rate': 2.5250e-06, 'epoch': 1.95, 'throughput': 628.05}

[INFO|callbacks.py:310] 2024-07-10 10:26:35,201 >> {'loss': 0.0947, 'learning_rate': 2.5333e-06, 'epoch': 1.96, 'throughput': 628.08}

[INFO|callbacks.py:310] 2024-07-10 10:26:46,274 >> {'loss': 0.0975, 'learning_rate': 2.5417e-06, 'epoch': 1.96, 'throughput': 627.94}

[INFO|callbacks.py:310] 2024-07-10 10:26:57,330 >> {'loss': 0.0527, 'learning_rate': 2.5500e-06, 'epoch': 1.97, 'throughput': 627.83}

[INFO|callbacks.py:310] 2024-07-10 10:27:08,395 >> {'loss': 0.0869, 'learning_rate': 2.5583e-06, 'epoch': 1.98, 'throughput': 627.88}

[INFO|callbacks.py:310] 2024-07-10 10:27:19,513 >> {'loss': 0.0605, 'learning_rate': 2.5667e-06, 'epoch': 1.98, 'throughput': 627.75}

[INFO|callbacks.py:310] 2024-07-10 10:27:30,602 >> {'loss': 0.0684, 'learning_rate': 2.5750e-06, 'epoch': 1.99, 'throughput': 627.82}

[INFO|callbacks.py:310] 2024-07-10 10:27:41,711 >> {'loss': 0.0701, 'learning_rate': 2.5833e-06, 'epoch': 2.00, 'throughput': 627.86}

[INFO|callbacks.py:310] 2024-07-10 10:27:52,807 >> {'loss': 0.0699, 'learning_rate': 2.5917e-06, 'epoch': 2.00, 'throughput': 627.84}

[INFO|callbacks.py:310] 2024-07-10 10:28:03,854 >> {'loss': 0.0949, 'learning_rate': 2.6000e-06, 'epoch': 2.01, 'throughput': 627.85}

[INFO|callbacks.py:310] 2024-07-10 10:28:14,908 >> {'loss': 0.0911, 'learning_rate': 2.6083e-06, 'epoch': 2.01, 'throughput': 627.90}

[INFO|callbacks.py:310] 2024-07-10 10:28:25,979 >> {'loss': 0.0615, 'learning_rate': 2.6167e-06, 'epoch': 2.02, 'throughput': 627.86}

[INFO|callbacks.py:310] 2024-07-10 10:28:37,044 >> {'loss': 0.0448, 'learning_rate': 2.6250e-06, 'epoch': 2.03, 'throughput': 627.81}

[INFO|callbacks.py:310] 2024-07-10 10:28:48,123 >> {'loss': 0.0619, 'learning_rate': 2.6333e-06, 'epoch': 2.03, 'throughput': 627.84}

[INFO|callbacks.py:310] 2024-07-10 10:28:59,218 >> {'loss': 0.0475, 'learning_rate': 2.6417e-06, 'epoch': 2.04, 'throughput': 627.82}

[INFO|callbacks.py:310] 2024-07-10 10:29:10,330 >> {'loss': 0.0438, 'learning_rate': 2.6500e-06, 'epoch': 2.05, 'throughput': 627.90}

[INFO|callbacks.py:310] 2024-07-10 10:29:21,421 >> {'loss': 0.0245, 'learning_rate': 2.6583e-06, 'epoch': 2.05, 'throughput': 627.81}

[INFO|callbacks.py:310] 2024-07-10 10:29:32,509 >> {'loss': 0.0716, 'learning_rate': 2.6667e-06, 'epoch': 2.06, 'throughput': 627.85}

[INFO|callbacks.py:310] 2024-07-10 10:29:43,558 >> {'loss': 0.1015, 'learning_rate': 2.6750e-06, 'epoch': 2.07, 'throughput': 627.78}

[INFO|callbacks.py:310] 2024-07-10 10:29:54,623 >> {'loss': 0.0753, 'learning_rate': 2.6833e-06, 'epoch': 2.07, 'throughput': 627.73}

[INFO|callbacks.py:310] 2024-07-10 10:30:05,698 >> {'loss': 0.0521, 'learning_rate': 2.6917e-06, 'epoch': 2.08, 'throughput': 627.62}

[INFO|callbacks.py:310] 2024-07-10 10:30:16,773 >> {'loss': 0.0287, 'learning_rate': 2.7000e-06, 'epoch': 2.09, 'throughput': 627.58}

[INFO|callbacks.py:310] 2024-07-10 10:30:27,868 >> {'loss': 0.0471, 'learning_rate': 2.7083e-06, 'epoch': 2.09, 'throughput': 627.59}

[INFO|callbacks.py:310] 2024-07-10 10:30:38,967 >> {'loss': 0.0697, 'learning_rate': 2.7167e-06, 'epoch': 2.10, 'throughput': 627.62}

[INFO|callbacks.py:310] 2024-07-10 10:30:50,068 >> {'loss': 0.0438, 'learning_rate': 2.7250e-06, 'epoch': 2.10, 'throughput': 627.58}

[INFO|callbacks.py:310] 2024-07-10 10:31:01,167 >> {'loss': 0.0475, 'learning_rate': 2.7333e-06, 'epoch': 2.11, 'throughput': 627.61}

[INFO|callbacks.py:310] 2024-07-10 10:31:12,238 >> {'loss': 0.0601, 'learning_rate': 2.7417e-06, 'epoch': 2.12, 'throughput': 627.60}

[INFO|callbacks.py:310] 2024-07-10 10:31:23,289 >> {'loss': 0.0355, 'learning_rate': 2.7500e-06, 'epoch': 2.12, 'throughput': 627.48}

[INFO|callbacks.py:310] 2024-07-10 10:31:34,348 >> {'loss': 0.0515, 'learning_rate': 2.7583e-06, 'epoch': 2.13, 'throughput': 627.55}

[INFO|callbacks.py:310] 2024-07-10 10:31:45,436 >> {'loss': 0.0601, 'learning_rate': 2.7667e-06, 'epoch': 2.14, 'throughput': 627.53}

[INFO|callbacks.py:310] 2024-07-10 10:31:56,498 >> {'loss': 0.0288, 'learning_rate': 2.7750e-06, 'epoch': 2.14, 'throughput': 627.42}

[INFO|callbacks.py:310] 2024-07-10 10:32:07,601 >> {'loss': 0.0563, 'learning_rate': 2.7833e-06, 'epoch': 2.15, 'throughput': 627.34}

[INFO|callbacks.py:310] 2024-07-10 10:32:18,698 >> {'loss': 0.1176, 'learning_rate': 2.7917e-06, 'epoch': 2.16, 'throughput': 627.31}

[INFO|callbacks.py:310] 2024-07-10 10:32:29,833 >> {'loss': 0.0965, 'learning_rate': 2.8000e-06, 'epoch': 2.16, 'throughput': 627.23}

[INFO|callbacks.py:310] 2024-07-10 10:32:40,924 >> {'loss': 0.0359, 'learning_rate': 2.8083e-06, 'epoch': 2.17, 'throughput': 627.33}

[INFO|callbacks.py:310] 2024-07-10 10:32:51,996 >> {'loss': 0.0609, 'learning_rate': 2.8167e-06, 'epoch': 2.18, 'throughput': 627.33}

[INFO|callbacks.py:310] 2024-07-10 10:33:03,044 >> {'loss': 0.0768, 'learning_rate': 2.8250e-06, 'epoch': 2.18, 'throughput': 627.29}

[INFO|callbacks.py:310] 2024-07-10 10:33:14,106 >> {'loss': 0.0849, 'learning_rate': 2.8333e-06, 'epoch': 2.19, 'throughput': 627.32}

[INFO|callbacks.py:310] 2024-07-10 10:33:25,171 >> {'loss': 0.0581, 'learning_rate': 2.8417e-06, 'epoch': 2.19, 'throughput': 627.29}

[INFO|callbacks.py:310] 2024-07-10 10:33:36,232 >> {'loss': 0.0460, 'learning_rate': 2.8500e-06, 'epoch': 2.20, 'throughput': 627.26}

[INFO|callbacks.py:310] 2024-07-10 10:33:47,328 >> {'loss': 0.0674, 'learning_rate': 2.8583e-06, 'epoch': 2.21, 'throughput': 627.32}

[INFO|callbacks.py:310] 2024-07-10 10:33:58,410 >> {'loss': 0.0563, 'learning_rate': 2.8667e-06, 'epoch': 2.21, 'throughput': 627.35}

[INFO|callbacks.py:310] 2024-07-10 10:34:09,520 >> {'loss': 0.0621, 'learning_rate': 2.8750e-06, 'epoch': 2.22, 'throughput': 627.43}

[INFO|callbacks.py:310] 2024-07-10 10:34:20,625 >> {'loss': 0.0659, 'learning_rate': 2.8833e-06, 'epoch': 2.23, 'throughput': 627.44}

[INFO|callbacks.py:310] 2024-07-10 10:34:31,691 >> {'loss': 0.0390, 'learning_rate': 2.8917e-06, 'epoch': 2.23, 'throughput': 627.44}

[INFO|callbacks.py:310] 2024-07-10 10:34:42,747 >> {'loss': 0.0239, 'learning_rate': 2.9000e-06, 'epoch': 2.24, 'throughput': 627.51}

[INFO|callbacks.py:310] 2024-07-10 10:34:53,800 >> {'loss': 0.0521, 'learning_rate': 2.9083e-06, 'epoch': 2.25, 'throughput': 627.54}

[INFO|callbacks.py:310] 2024-07-10 10:35:04,863 >> {'loss': 0.0561, 'learning_rate': 2.9167e-06, 'epoch': 2.25, 'throughput': 627.49}

[INFO|callbacks.py:310] 2024-07-10 10:35:15,935 >> {'loss': 0.0622, 'learning_rate': 2.9250e-06, 'epoch': 2.26, 'throughput': 627.49}

[INFO|callbacks.py:310] 2024-07-10 10:35:27,033 >> {'loss': 0.0875, 'learning_rate': 2.9333e-06, 'epoch': 2.27, 'throughput': 627.46}

[INFO|callbacks.py:310] 2024-07-10 10:35:38,121 >> {'loss': 0.0307, 'learning_rate': 2.9417e-06, 'epoch': 2.27, 'throughput': 627.42}

[INFO|callbacks.py:310] 2024-07-10 10:35:49,236 >> {'loss': 0.0630, 'learning_rate': 2.9500e-06, 'epoch': 2.28, 'throughput': 627.36}

[INFO|callbacks.py:310] 2024-07-10 10:36:00,324 >> {'loss': 0.0404, 'learning_rate': 2.9583e-06, 'epoch': 2.28, 'throughput': 627.42}

[INFO|callbacks.py:310] 2024-07-10 10:36:11,382 >> {'loss': 0.0759, 'learning_rate': 2.9667e-06, 'epoch': 2.29, 'throughput': 627.39}

[INFO|callbacks.py:310] 2024-07-10 10:36:22,433 >> {'loss': 0.0488, 'learning_rate': 2.9750e-06, 'epoch': 2.30, 'throughput': 627.31}

[INFO|callbacks.py:310] 2024-07-10 10:36:33,517 >> {'loss': 0.0499, 'learning_rate': 2.9833e-06, 'epoch': 2.30, 'throughput': 627.28}

[INFO|callbacks.py:310] 2024-07-10 10:36:44,575 >> {'loss': 0.0441, 'learning_rate': 2.9917e-06, 'epoch': 2.31, 'throughput': 627.30}

[INFO|callbacks.py:310] 2024-07-10 10:36:55,649 >> {'loss': 0.0485, 'learning_rate': 3.0000e-06, 'epoch': 2.32, 'throughput': 627.37}

[INFO|callbacks.py:310] 2024-07-10 10:37:06,736 >> {'loss': 0.0674, 'learning_rate': 3.0083e-06, 'epoch': 2.32, 'throughput': 627.29}

[INFO|callbacks.py:310] 2024-07-10 10:37:17,829 >> {'loss': 0.0689, 'learning_rate': 3.0167e-06, 'epoch': 2.33, 'throughput': 627.26}

[INFO|callbacks.py:310] 2024-07-10 10:37:28,939 >> {'loss': 0.0618, 'learning_rate': 3.0250e-06, 'epoch': 2.34, 'throughput': 627.21}

[INFO|callbacks.py:310] 2024-07-10 10:37:40,018 >> {'loss': 0.0629, 'learning_rate': 3.0333e-06, 'epoch': 2.34, 'throughput': 627.20}

[INFO|callbacks.py:310] 2024-07-10 10:37:51,069 >> {'loss': 0.0499, 'learning_rate': 3.0417e-06, 'epoch': 2.35, 'throughput': 627.18}

[INFO|callbacks.py:310] 2024-07-10 10:38:02,129 >> {'loss': 0.0361, 'learning_rate': 3.0500e-06, 'epoch': 2.36, 'throughput': 627.15}

[INFO|callbacks.py:310] 2024-07-10 10:38:13,198 >> {'loss': 0.0471, 'learning_rate': 3.0583e-06, 'epoch': 2.36, 'throughput': 627.23}

[INFO|callbacks.py:310] 2024-07-10 10:38:24,255 >> {'loss': 0.0622, 'learning_rate': 3.0667e-06, 'epoch': 2.37, 'throughput': 627.39}

[INFO|callbacks.py:310] 2024-07-10 10:38:35,328 >> {'loss': 0.0328, 'learning_rate': 3.0750e-06, 'epoch': 2.37, 'throughput': 627.32}

[INFO|callbacks.py:310] 2024-07-10 10:38:46,420 >> {'loss': 0.0437, 'learning_rate': 3.0833e-06, 'epoch': 2.38, 'throughput': 627.36}

[INFO|callbacks.py:310] 2024-07-10 10:38:57,528 >> {'loss': 0.0644, 'learning_rate': 3.0917e-06, 'epoch': 2.39, 'throughput': 627.41}

[INFO|callbacks.py:310] 2024-07-10 10:39:08,633 >> {'loss': 0.0553, 'learning_rate': 3.1000e-06, 'epoch': 2.39, 'throughput': 627.37}

[INFO|callbacks.py:310] 2024-07-10 10:39:19,713 >> {'loss': 0.0610, 'learning_rate': 3.1083e-06, 'epoch': 2.40, 'throughput': 627.43}

[INFO|callbacks.py:310] 2024-07-10 10:39:30,769 >> {'loss': 0.0866, 'learning_rate': 3.1167e-06, 'epoch': 2.41, 'throughput': 627.41}

[INFO|callbacks.py:310] 2024-07-10 10:39:41,832 >> {'loss': 0.0632, 'learning_rate': 3.1250e-06, 'epoch': 2.41, 'throughput': 627.48}

[INFO|callbacks.py:310] 2024-07-10 10:39:52,923 >> {'loss': 0.0395, 'learning_rate': 3.1333e-06, 'epoch': 2.42, 'throughput': 627.45}

[INFO|callbacks.py:310] 2024-07-10 10:40:03,977 >> {'loss': 0.0819, 'learning_rate': 3.1417e-06, 'epoch': 2.43, 'throughput': 627.51}

[INFO|callbacks.py:310] 2024-07-10 10:40:15,059 >> {'loss': 0.0640, 'learning_rate': 3.1500e-06, 'epoch': 2.43, 'throughput': 627.59}

[INFO|callbacks.py:310] 2024-07-10 10:40:26,138 >> {'loss': 0.0803, 'learning_rate': 3.1583e-06, 'epoch': 2.44, 'throughput': 627.65}

[INFO|callbacks.py:310] 2024-07-10 10:40:37,234 >> {'loss': 0.0798, 'learning_rate': 3.1667e-06, 'epoch': 2.45, 'throughput': 627.66}

[INFO|callbacks.py:310] 2024-07-10 10:40:48,339 >> {'loss': 0.0544, 'learning_rate': 3.1750e-06, 'epoch': 2.45, 'throughput': 627.72}

[INFO|callbacks.py:310] 2024-07-10 10:40:59,421 >> {'loss': 0.0629, 'learning_rate': 3.1833e-06, 'epoch': 2.46, 'throughput': 627.72}

[INFO|callbacks.py:310] 2024-07-10 10:41:10,476 >> {'loss': 0.0471, 'learning_rate': 3.1917e-06, 'epoch': 2.47, 'throughput': 627.74}

[INFO|callbacks.py:310] 2024-07-10 10:41:21,546 >> {'loss': 0.0533, 'learning_rate': 3.2000e-06, 'epoch': 2.47, 'throughput': 627.83}

[INFO|callbacks.py:310] 2024-07-10 10:41:32,607 >> {'loss': 0.0542, 'learning_rate': 3.2083e-06, 'epoch': 2.48, 'throughput': 627.88}

[INFO|callbacks.py:310] 2024-07-10 10:41:43,662 >> {'loss': 0.0340, 'learning_rate': 3.2167e-06, 'epoch': 2.48, 'throughput': 627.85}

[INFO|callbacks.py:310] 2024-07-10 10:41:54,762 >> {'loss': 0.0517, 'learning_rate': 3.2250e-06, 'epoch': 2.49, 'throughput': 627.82}

[INFO|callbacks.py:310] 2024-07-10 10:42:05,856 >> {'loss': 0.0698, 'learning_rate': 3.2333e-06, 'epoch': 2.50, 'throughput': 627.81}

[INFO|callbacks.py:310] 2024-07-10 10:42:16,959 >> {'loss': 0.0573, 'learning_rate': 3.2417e-06, 'epoch': 2.50, 'throughput': 627.76}

[INFO|callbacks.py:310] 2024-07-10 10:42:28,063 >> {'loss': 0.0568, 'learning_rate': 3.2500e-06, 'epoch': 2.51, 'throughput': 627.71}

[INFO|callbacks.py:310] 2024-07-10 10:42:39,114 >> {'loss': 0.0669, 'learning_rate': 3.2583e-06, 'epoch': 2.52, 'throughput': 627.79}

[INFO|callbacks.py:310] 2024-07-10 10:42:50,181 >> {'loss': 0.0270, 'learning_rate': 3.2667e-06, 'epoch': 2.52, 'throughput': 627.74}

[INFO|callbacks.py:310] 2024-07-10 10:43:01,252 >> {'loss': 0.0303, 'learning_rate': 3.2750e-06, 'epoch': 2.53, 'throughput': 627.72}

[INFO|callbacks.py:310] 2024-07-10 10:43:12,331 >> {'loss': 0.0392, 'learning_rate': 3.2833e-06, 'epoch': 2.54, 'throughput': 627.71}

[INFO|callbacks.py:310] 2024-07-10 10:43:23,401 >> {'loss': 0.0813, 'learning_rate': 3.2917e-06, 'epoch': 2.54, 'throughput': 627.70}

[INFO|callbacks.py:310] 2024-07-10 10:43:34,505 >> {'loss': 0.0403, 'learning_rate': 3.3000e-06, 'epoch': 2.55, 'throughput': 627.76}

[INFO|callbacks.py:310] 2024-07-10 10:43:45,593 >> {'loss': 0.0222, 'learning_rate': 3.3083e-06, 'epoch': 2.56, 'throughput': 627.68}

[INFO|callbacks.py:310] 2024-07-10 10:43:56,708 >> {'loss': 0.0611, 'learning_rate': 3.3167e-06, 'epoch': 2.56, 'throughput': 627.74}

[INFO|callbacks.py:310] 2024-07-10 10:44:07,810 >> {'loss': 0.0350, 'learning_rate': 3.3250e-06, 'epoch': 2.57, 'throughput': 627.75}

[INFO|callbacks.py:310] 2024-07-10 10:44:18,867 >> {'loss': 0.0680, 'learning_rate': 3.3333e-06, 'epoch': 2.57, 'throughput': 627.68}

[INFO|callbacks.py:310] 2024-07-10 10:44:29,921 >> {'loss': 0.0543, 'learning_rate': 3.3417e-06, 'epoch': 2.58, 'throughput': 627.73}

[INFO|callbacks.py:310] 2024-07-10 10:44:40,998 >> {'loss': 0.0466, 'learning_rate': 3.3500e-06, 'epoch': 2.59, 'throughput': 627.69}

[INFO|callbacks.py:310] 2024-07-10 10:44:52,057 >> {'loss': 0.0626, 'learning_rate': 3.3583e-06, 'epoch': 2.59, 'throughput': 627.70}

[INFO|callbacks.py:310] 2024-07-10 10:45:03,131 >> {'loss': 0.0589, 'learning_rate': 3.3667e-06, 'epoch': 2.60, 'throughput': 627.75}

[INFO|callbacks.py:310] 2024-07-10 10:45:14,222 >> {'loss': 0.0415, 'learning_rate': 3.3750e-06, 'epoch': 2.61, 'throughput': 627.67}

[INFO|callbacks.py:310] 2024-07-10 10:45:25,326 >> {'loss': 0.0344, 'learning_rate': 3.3833e-06, 'epoch': 2.61, 'throughput': 627.69}

[INFO|callbacks.py:310] 2024-07-10 10:45:36,435 >> {'loss': 0.0423, 'learning_rate': 3.3917e-06, 'epoch': 2.62, 'throughput': 627.76}

[INFO|callbacks.py:310] 2024-07-10 10:45:47,525 >> {'loss': 0.0838, 'learning_rate': 3.4000e-06, 'epoch': 2.63, 'throughput': 627.84}

[INFO|callbacks.py:310] 2024-07-10 10:45:58,585 >> {'loss': 0.0732, 'learning_rate': 3.4083e-06, 'epoch': 2.63, 'throughput': 627.73}

[INFO|callbacks.py:310] 2024-07-10 10:46:09,638 >> {'loss': 0.0464, 'learning_rate': 3.4167e-06, 'epoch': 2.64, 'throughput': 627.82}

[INFO|callbacks.py:310] 2024-07-10 10:46:20,706 >> {'loss': 0.0387, 'learning_rate': 3.4250e-06, 'epoch': 2.65, 'throughput': 627.80}

[INFO|callbacks.py:310] 2024-07-10 10:46:31,767 >> {'loss': 0.0507, 'learning_rate': 3.4333e-06, 'epoch': 2.65, 'throughput': 627.81}

[INFO|callbacks.py:310] 2024-07-10 10:46:42,845 >> {'loss': 0.0549, 'learning_rate': 3.4417e-06, 'epoch': 2.66, 'throughput': 627.66}

[INFO|callbacks.py:310] 2024-07-10 10:46:53,927 >> {'loss': 0.0285, 'learning_rate': 3.4500e-06, 'epoch': 2.66, 'throughput': 627.71}

[INFO|callbacks.py:310] 2024-07-10 10:47:05,023 >> {'loss': 0.0668, 'learning_rate': 3.4583e-06, 'epoch': 2.67, 'throughput': 627.81}

[INFO|callbacks.py:310] 2024-07-10 10:47:16,150 >> {'loss': 0.0691, 'learning_rate': 3.4667e-06, 'epoch': 2.68, 'throughput': 627.77}

[INFO|callbacks.py:310] 2024-07-10 10:47:27,220 >> {'loss': 0.0660, 'learning_rate': 3.4750e-06, 'epoch': 2.68, 'throughput': 627.84}

[INFO|callbacks.py:310] 2024-07-10 10:47:38,275 >> {'loss': 0.0495, 'learning_rate': 3.4833e-06, 'epoch': 2.69, 'throughput': 627.77}

[INFO|callbacks.py:310] 2024-07-10 10:47:49,335 >> {'loss': 0.0340, 'learning_rate': 3.4917e-06, 'epoch': 2.70, 'throughput': 627.75}

[INFO|callbacks.py:310] 2024-07-10 10:48:00,399 >> {'loss': 0.0604, 'learning_rate': 3.5000e-06, 'epoch': 2.70, 'throughput': 627.72}

[INFO|callbacks.py:310] 2024-07-10 10:48:11,462 >> {'loss': 0.0535, 'learning_rate': 3.5083e-06, 'epoch': 2.71, 'throughput': 627.76}

[INFO|callbacks.py:310] 2024-07-10 10:48:22,530 >> {'loss': 0.0369, 'learning_rate': 3.5167e-06, 'epoch': 2.72, 'throughput': 627.83}

[INFO|callbacks.py:310] 2024-07-10 10:48:33,628 >> {'loss': 0.0450, 'learning_rate': 3.5250e-06, 'epoch': 2.72, 'throughput': 627.78}

[INFO|callbacks.py:310] 2024-07-10 10:48:44,724 >> {'loss': 0.0314, 'learning_rate': 3.5333e-06, 'epoch': 2.73, 'throughput': 627.81}

[INFO|callbacks.py:310] 2024-07-10 10:48:55,833 >> {'loss': 0.0611, 'learning_rate': 3.5417e-06, 'epoch': 2.74, 'throughput': 627.72}

[INFO|callbacks.py:310] 2024-07-10 10:49:06,920 >> {'loss': 0.0478, 'learning_rate': 3.5500e-06, 'epoch': 2.74, 'throughput': 627.71}

[INFO|callbacks.py:310] 2024-07-10 10:49:17,965 >> {'loss': 0.0380, 'learning_rate': 3.5583e-06, 'epoch': 2.75, 'throughput': 627.64}

[INFO|callbacks.py:310] 2024-07-10 10:49:29,025 >> {'loss': 0.0430, 'learning_rate': 3.5667e-06, 'epoch': 2.75, 'throughput': 627.68}

[INFO|callbacks.py:310] 2024-07-10 10:49:40,098 >> {'loss': 0.0478, 'learning_rate': 3.5750e-06, 'epoch': 2.76, 'throughput': 627.73}

[INFO|callbacks.py:310] 2024-07-10 10:49:51,154 >> {'loss': 0.0977, 'learning_rate': 3.5833e-06, 'epoch': 2.77, 'throughput': 627.74}

[INFO|callbacks.py:310] 2024-07-10 10:50:02,243 >> {'loss': 0.0835, 'learning_rate': 3.5917e-06, 'epoch': 2.77, 'throughput': 627.70}

[INFO|callbacks.py:310] 2024-07-10 10:50:13,345 >> {'loss': 0.0406, 'learning_rate': 3.6000e-06, 'epoch': 2.78, 'throughput': 627.64}

[INFO|callbacks.py:310] 2024-07-10 10:50:24,445 >> {'loss': 0.0661, 'learning_rate': 3.6083e-06, 'epoch': 2.79, 'throughput': 627.55}

[INFO|callbacks.py:310] 2024-07-10 10:50:35,549 >> {'loss': 0.0647, 'learning_rate': 3.6167e-06, 'epoch': 2.79, 'throughput': 627.54}

[INFO|callbacks.py:310] 2024-07-10 10:50:46,621 >> {'loss': 0.0499, 'learning_rate': 3.6250e-06, 'epoch': 2.80, 'throughput': 627.54}

[INFO|callbacks.py:310] 2024-07-10 10:50:57,684 >> {'loss': 0.0422, 'learning_rate': 3.6333e-06, 'epoch': 2.81, 'throughput': 627.49}

[INFO|callbacks.py:310] 2024-07-10 10:51:08,751 >> {'loss': 0.0480, 'learning_rate': 3.6417e-06, 'epoch': 2.81, 'throughput': 627.57}

[INFO|callbacks.py:310] 2024-07-10 10:51:19,824 >> {'loss': 0.0474, 'learning_rate': 3.6500e-06, 'epoch': 2.82, 'throughput': 627.50}

[INFO|callbacks.py:310] 2024-07-10 10:51:30,893 >> {'loss': 0.0801, 'learning_rate': 3.6583e-06, 'epoch': 2.83, 'throughput': 627.51}

[INFO|callbacks.py:310] 2024-07-10 10:51:41,993 >> {'loss': 0.0528, 'learning_rate': 3.6667e-06, 'epoch': 2.83, 'throughput': 627.48}

[INFO|callbacks.py:310] 2024-07-10 10:51:53,074 >> {'loss': 0.0461, 'learning_rate': 3.6750e-06, 'epoch': 2.84, 'throughput': 627.47}

[INFO|callbacks.py:310] 2024-07-10 10:52:04,179 >> {'loss': 0.0646, 'learning_rate': 3.6833e-06, 'epoch': 2.84, 'throughput': 627.43}

[INFO|callbacks.py:310] 2024-07-10 10:52:15,285 >> {'loss': 0.0490, 'learning_rate': 3.6917e-06, 'epoch': 2.85, 'throughput': 627.41}

[INFO|callbacks.py:310] 2024-07-10 10:52:26,348 >> {'loss': 0.0531, 'learning_rate': 3.7000e-06, 'epoch': 2.86, 'throughput': 627.46}

[INFO|callbacks.py:310] 2024-07-10 10:52:37,387 >> {'loss': 0.0521, 'learning_rate': 3.7083e-06, 'epoch': 2.86, 'throughput': 627.52}

[INFO|callbacks.py:310] 2024-07-10 10:52:48,453 >> {'loss': 0.0927, 'learning_rate': 3.7167e-06, 'epoch': 2.87, 'throughput': 627.57}

[INFO|callbacks.py:310] 2024-07-10 10:52:59,524 >> {'loss': 0.0491, 'learning_rate': 3.7250e-06, 'epoch': 2.88, 'throughput': 627.53}

[INFO|callbacks.py:310] 2024-07-10 10:53:10,580 >> {'loss': 0.0399, 'learning_rate': 3.7333e-06, 'epoch': 2.88, 'throughput': 627.60}

[INFO|callbacks.py:310] 2024-07-10 10:53:21,681 >> {'loss': 0.0499, 'learning_rate': 3.7417e-06, 'epoch': 2.89, 'throughput': 627.58}

[INFO|callbacks.py:310] 2024-07-10 10:53:32,766 >> {'loss': 0.0569, 'learning_rate': 3.7500e-06, 'epoch': 2.90, 'throughput': 627.63}

[INFO|callbacks.py:310] 2024-07-10 10:53:43,872 >> {'loss': 0.0608, 'learning_rate': 3.7583e-06, 'epoch': 2.90, 'throughput': 627.70}

[INFO|callbacks.py:310] 2024-07-10 10:53:54,975 >> {'loss': 0.0325, 'learning_rate': 3.7667e-06, 'epoch': 2.91, 'throughput': 627.69}

[INFO|callbacks.py:310] 2024-07-10 10:54:06,019 >> {'loss': 0.0643, 'learning_rate': 3.7750e-06, 'epoch': 2.92, 'throughput': 627.66}

[INFO|callbacks.py:310] 2024-07-10 10:54:17,089 >> {'loss': 0.0479, 'learning_rate': 3.7833e-06, 'epoch': 2.92, 'throughput': 627.67}

[INFO|callbacks.py:310] 2024-07-10 10:54:28,155 >> {'loss': 0.0549, 'learning_rate': 3.7917e-06, 'epoch': 2.93, 'throughput': 627.67}

[INFO|callbacks.py:310] 2024-07-10 10:54:39,228 >> {'loss': 0.0789, 'learning_rate': 3.8000e-06, 'epoch': 2.93, 'throughput': 627.71}

[INFO|callbacks.py:310] 2024-07-10 10:54:50,303 >> {'loss': 0.0319, 'learning_rate': 3.8083e-06, 'epoch': 2.94, 'throughput': 627.72}

[INFO|callbacks.py:310] 2024-07-10 10:55:01,404 >> {'loss': 0.0500, 'learning_rate': 3.8167e-06, 'epoch': 2.95, 'throughput': 627.74}

[INFO|callbacks.py:310] 2024-07-10 10:55:12,500 >> {'loss': 0.0677, 'learning_rate': 3.8250e-06, 'epoch': 2.95, 'throughput': 627.79}

[INFO|callbacks.py:310] 2024-07-10 10:55:23,592 >> {'loss': 0.0512, 'learning_rate': 3.8333e-06, 'epoch': 2.96, 'throughput': 627.78}

[INFO|callbacks.py:310] 2024-07-10 10:55:34,684 >> {'loss': 0.0681, 'learning_rate': 3.8417e-06, 'epoch': 2.97, 'throughput': 627.80}

[INFO|callbacks.py:310] 2024-07-10 10:55:45,737 >> {'loss': 0.0651, 'learning_rate': 3.8500e-06, 'epoch': 2.97, 'throughput': 627.78}

[INFO|callbacks.py:310] 2024-07-10 10:55:56,787 >> {'loss': 0.0498, 'learning_rate': 3.8583e-06, 'epoch': 2.98, 'throughput': 627.83}

[INFO|callbacks.py:310] 2024-07-10 10:56:07,870 >> {'loss': 0.0496, 'learning_rate': 3.8667e-06, 'epoch': 2.99, 'throughput': 627.90}

[INFO|callbacks.py:310] 2024-07-10 10:56:18,930 >> {'loss': 0.0446, 'learning_rate': 3.8750e-06, 'epoch': 2.99, 'throughput': 627.89}

[INFO|callbacks.py:310] 2024-07-10 10:56:30,009 >> {'loss': 0.0543, 'learning_rate': 3.8833e-06, 'epoch': 3.00, 'throughput': 627.93}

[INFO|callbacks.py:310] 2024-07-10 10:56:41,105 >> {'loss': 0.0137, 'learning_rate': 3.8917e-06, 'epoch': 3.01, 'throughput': 627.79}

[INFO|callbacks.py:310] 2024-07-10 10:56:52,197 >> {'loss': 0.0201, 'learning_rate': 3.9000e-06, 'epoch': 3.01, 'throughput': 627.86}

[INFO|callbacks.py:310] 2024-07-10 10:57:03,299 >> {'loss': 0.0119, 'learning_rate': 3.9083e-06, 'epoch': 3.02, 'throughput': 627.85}

[INFO|callbacks.py:310] 2024-07-10 10:57:14,395 >> {'loss': 0.0305, 'learning_rate': 3.9167e-06, 'epoch': 3.02, 'throughput': 627.78}

[INFO|callbacks.py:310] 2024-07-10 10:57:25,440 >> {'loss': 0.0213, 'learning_rate': 3.9250e-06, 'epoch': 3.03, 'throughput': 627.79}

[INFO|callbacks.py:310] 2024-07-10 10:57:36,511 >> {'loss': 0.0350, 'learning_rate': 3.9333e-06, 'epoch': 3.04, 'throughput': 627.74}

[INFO|callbacks.py:310] 2024-07-10 10:57:47,594 >> {'loss': 0.0438, 'learning_rate': 3.9417e-06, 'epoch': 3.04, 'throughput': 627.66}

[INFO|callbacks.py:310] 2024-07-10 10:57:58,656 >> {'loss': 0.0464, 'learning_rate': 3.9500e-06, 'epoch': 3.05, 'throughput': 627.67}

[INFO|callbacks.py:310] 2024-07-10 10:58:09,732 >> {'loss': 0.0172, 'learning_rate': 3.9583e-06, 'epoch': 3.06, 'throughput': 627.73}

[INFO|callbacks.py:310] 2024-07-10 10:58:20,822 >> {'loss': 0.0287, 'learning_rate': 3.9667e-06, 'epoch': 3.06, 'throughput': 627.72}

[INFO|callbacks.py:310] 2024-07-10 10:58:31,913 >> {'loss': 0.0579, 'learning_rate': 3.9750e-06, 'epoch': 3.07, 'throughput': 627.80}

[INFO|callbacks.py:310] 2024-07-10 10:58:43,016 >> {'loss': 0.0095, 'learning_rate': 3.9833e-06, 'epoch': 3.08, 'throughput': 627.78}

[INFO|callbacks.py:310] 2024-07-10 10:58:54,094 >> {'loss': 0.0238, 'learning_rate': 3.9917e-06, 'epoch': 3.08, 'throughput': 627.77}

[INFO|callbacks.py:310] 2024-07-10 10:59:05,147 >> {'loss': 0.0313, 'learning_rate': 4.0000e-06, 'epoch': 3.09, 'throughput': 627.82}

[INFO|callbacks.py:310] 2024-07-10 10:59:16,207 >> {'loss': 0.0146, 'learning_rate': 4.0083e-06, 'epoch': 3.10, 'throughput': 627.82}

[INFO|callbacks.py:310] 2024-07-10 10:59:27,277 >> {'loss': 0.0347, 'learning_rate': 4.0167e-06, 'epoch': 3.10, 'throughput': 627.78}

[INFO|callbacks.py:310] 2024-07-10 10:59:38,327 >> {'loss': 0.0240, 'learning_rate': 4.0250e-06, 'epoch': 3.11, 'throughput': 627.75}

[INFO|callbacks.py:310] 2024-07-10 10:59:49,428 >> {'loss': 0.0128, 'learning_rate': 4.0333e-06, 'epoch': 3.12, 'throughput': 627.71}

[INFO|callbacks.py:310] 2024-07-10 11:00:00,518 >> {'loss': 0.0311, 'learning_rate': 4.0417e-06, 'epoch': 3.12, 'throughput': 627.77}

[INFO|callbacks.py:310] 2024-07-10 11:00:11,633 >> {'loss': 0.0178, 'learning_rate': 4.0500e-06, 'epoch': 3.13, 'throughput': 627.74}

[INFO|callbacks.py:310] 2024-07-10 11:00:22,742 >> {'loss': 0.0379, 'learning_rate': 4.0583e-06, 'epoch': 3.13, 'throughput': 627.78}

[INFO|callbacks.py:310] 2024-07-10 11:00:33,803 >> {'loss': 0.0299, 'learning_rate': 4.0667e-06, 'epoch': 3.14, 'throughput': 627.80}

[INFO|callbacks.py:310] 2024-07-10 11:00:44,855 >> {'loss': 0.0261, 'learning_rate': 4.0750e-06, 'epoch': 3.15, 'throughput': 627.80}

[INFO|callbacks.py:310] 2024-07-10 11:00:55,920 >> {'loss': 0.0347, 'learning_rate': 4.0833e-06, 'epoch': 3.15, 'throughput': 627.81}

[INFO|callbacks.py:310] 2024-07-10 11:01:06,980 >> {'loss': 0.0110, 'learning_rate': 4.0917e-06, 'epoch': 3.16, 'throughput': 627.83}

[INFO|callbacks.py:310] 2024-07-10 11:01:18,044 >> {'loss': 0.0541, 'learning_rate': 4.1000e-06, 'epoch': 3.17, 'throughput': 627.84}

[INFO|callbacks.py:310] 2024-07-10 11:01:29,125 >> {'loss': 0.0465, 'learning_rate': 4.1083e-06, 'epoch': 3.17, 'throughput': 627.88}

[INFO|callbacks.py:310] 2024-07-10 11:01:40,220 >> {'loss': 0.0187, 'learning_rate': 4.1167e-06, 'epoch': 3.18, 'throughput': 627.82}

[INFO|callbacks.py:310] 2024-07-10 11:01:51,331 >> {'loss': 0.0089, 'learning_rate': 4.1250e-06, 'epoch': 3.19, 'throughput': 627.77}

[INFO|callbacks.py:310] 2024-07-10 11:02:02,437 >> {'loss': 0.0289, 'learning_rate': 4.1333e-06, 'epoch': 3.19, 'throughput': 627.73}

[INFO|callbacks.py:310] 2024-07-10 11:02:13,493 >> {'loss': 0.0103, 'learning_rate': 4.1417e-06, 'epoch': 3.20, 'throughput': 627.66}

[INFO|callbacks.py:310] 2024-07-10 11:02:24,550 >> {'loss': 0.0132, 'learning_rate': 4.1500e-06, 'epoch': 3.21, 'throughput': 627.72}

[INFO|callbacks.py:310] 2024-07-10 11:02:35,620 >> {'loss': 0.0375, 'learning_rate': 4.1583e-06, 'epoch': 3.21, 'throughput': 627.72}

[INFO|callbacks.py:310] 2024-07-10 11:02:46,676 >> {'loss': 0.0143, 'learning_rate': 4.1667e-06, 'epoch': 3.22, 'throughput': 627.85}

[INFO|callbacks.py:310] 2024-07-10 11:02:57,729 >> {'loss': 0.0159, 'learning_rate': 4.1750e-06, 'epoch': 3.22, 'throughput': 627.79}

[INFO|callbacks.py:310] 2024-07-10 11:03:08,834 >> {'loss': 0.0073, 'learning_rate': 4.1833e-06, 'epoch': 3.23, 'throughput': 627.81}

[INFO|callbacks.py:310] 2024-07-10 11:03:19,916 >> {'loss': 0.0186, 'learning_rate': 4.1917e-06, 'epoch': 3.24, 'throughput': 627.82}

[INFO|callbacks.py:310] 2024-07-10 11:03:31,024 >> {'loss': 0.0035, 'learning_rate': 4.2000e-06, 'epoch': 3.24, 'throughput': 627.80}

[INFO|callbacks.py:310] 2024-07-10 11:03:42,115 >> {'loss': 0.0416, 'learning_rate': 4.2083e-06, 'epoch': 3.25, 'throughput': 627.79}

[INFO|callbacks.py:310] 2024-07-10 11:03:53,178 >> {'loss': 0.0266, 'learning_rate': 4.2167e-06, 'epoch': 3.26, 'throughput': 627.77}

[INFO|callbacks.py:310] 2024-07-10 11:04:04,229 >> {'loss': 0.0351, 'learning_rate': 4.2250e-06, 'epoch': 3.26, 'throughput': 627.77}

[INFO|callbacks.py:310] 2024-07-10 11:04:15,295 >> {'loss': 0.0209, 'learning_rate': 4.2333e-06, 'epoch': 3.27, 'throughput': 627.77}

[INFO|callbacks.py:310] 2024-07-10 11:04:26,363 >> {'loss': 0.0434, 'learning_rate': 4.2417e-06, 'epoch': 3.28, 'throughput': 627.78}

[INFO|callbacks.py:310] 2024-07-10 11:04:37,437 >> {'loss': 0.0174, 'learning_rate': 4.2500e-06, 'epoch': 3.28, 'throughput': 627.71}

[INFO|callbacks.py:310] 2024-07-10 11:04:48,534 >> {'loss': 0.0529, 'learning_rate': 4.2583e-06, 'epoch': 3.29, 'throughput': 627.78}

[INFO|callbacks.py:310] 2024-07-10 11:04:59,629 >> {'loss': 0.0033, 'learning_rate': 4.2667e-06, 'epoch': 3.30, 'throughput': 627.76}

[INFO|callbacks.py:310] 2024-07-10 11:05:10,732 >> {'loss': 0.0196, 'learning_rate': 4.2750e-06, 'epoch': 3.30, 'throughput': 627.69}

[INFO|callbacks.py:310] 2024-07-10 11:05:21,819 >> {'loss': 0.0242, 'learning_rate': 4.2833e-06, 'epoch': 3.31, 'throughput': 627.70}

[INFO|callbacks.py:310] 2024-07-10 11:05:32,859 >> {'loss': 0.0316, 'learning_rate': 4.2917e-06, 'epoch': 3.31, 'throughput': 627.73}

[INFO|callbacks.py:310] 2024-07-10 11:05:43,932 >> {'loss': 0.0268, 'learning_rate': 4.3000e-06, 'epoch': 3.32, 'throughput': 627.73}

[INFO|callbacks.py:310] 2024-07-10 11:05:55,001 >> {'loss': 0.0248, 'learning_rate': 4.3083e-06, 'epoch': 3.33, 'throughput': 627.75}

[INFO|callbacks.py:310] 2024-07-10 11:06:06,058 >> {'loss': 0.0320, 'learning_rate': 4.3167e-06, 'epoch': 3.33, 'throughput': 627.80}

[INFO|callbacks.py:310] 2024-07-10 11:06:17,132 >> {'loss': 0.0202, 'learning_rate': 4.3250e-06, 'epoch': 3.34, 'throughput': 627.80}

[INFO|callbacks.py:310] 2024-07-10 11:06:28,227 >> {'loss': 0.0329, 'learning_rate': 4.3333e-06, 'epoch': 3.35, 'throughput': 627.77}

[INFO|callbacks.py:310] 2024-07-10 11:06:39,331 >> {'loss': 0.0190, 'learning_rate': 4.3417e-06, 'epoch': 3.35, 'throughput': 627.76}

[INFO|callbacks.py:310] 2024-07-10 11:06:50,444 >> {'loss': 0.0182, 'learning_rate': 4.3500e-06, 'epoch': 3.36, 'throughput': 627.71}

[INFO|callbacks.py:310] 2024-07-10 11:07:01,540 >> {'loss': 0.0124, 'learning_rate': 4.3583e-06, 'epoch': 3.37, 'throughput': 627.71}

[INFO|callbacks.py:310] 2024-07-10 11:07:12,593 >> {'loss': 0.0122, 'learning_rate': 4.3667e-06, 'epoch': 3.37, 'throughput': 627.76}

[INFO|callbacks.py:310] 2024-07-10 11:07:23,657 >> {'loss': 0.0388, 'learning_rate': 4.3750e-06, 'epoch': 3.38, 'throughput': 627.74}

[INFO|callbacks.py:310] 2024-07-10 11:07:34,734 >> {'loss': 0.0106, 'learning_rate': 4.3833e-06, 'epoch': 3.39, 'throughput': 627.70}

[INFO|callbacks.py:310] 2024-07-10 11:07:45,798 >> {'loss': 0.0305, 'learning_rate': 4.3917e-06, 'epoch': 3.39, 'throughput': 627.68}

[INFO|callbacks.py:310] 2024-07-10 11:07:56,880 >> {'loss': 0.0512, 'learning_rate': 4.4000e-06, 'epoch': 3.40, 'throughput': 627.63}

[INFO|callbacks.py:310] 2024-07-10 11:08:07,969 >> {'loss': 0.0031, 'learning_rate': 4.4083e-06, 'epoch': 3.40, 'throughput': 627.57}

[INFO|callbacks.py:310] 2024-07-10 11:08:19,062 >> {'loss': 0.0309, 'learning_rate': 4.4167e-06, 'epoch': 3.41, 'throughput': 627.55}

[INFO|callbacks.py:310] 2024-07-10 11:08:30,165 >> {'loss': 0.0472, 'learning_rate': 4.4250e-06, 'epoch': 3.42, 'throughput': 627.51}

[INFO|callbacks.py:310] 2024-07-10 11:08:41,247 >> {'loss': 0.0222, 'learning_rate': 4.4333e-06, 'epoch': 3.42, 'throughput': 627.46}

[INFO|callbacks.py:310] 2024-07-10 11:08:52,302 >> {'loss': 0.0077, 'learning_rate': 4.4417e-06, 'epoch': 3.43, 'throughput': 627.43}

[INFO|callbacks.py:310] 2024-07-10 11:09:03,360 >> {'loss': 0.0064, 'learning_rate': 4.4500e-06, 'epoch': 3.44, 'throughput': 627.52}

[INFO|callbacks.py:310] 2024-07-10 11:09:14,437 >> {'loss': 0.0104, 'learning_rate': 4.4583e-06, 'epoch': 3.44, 'throughput': 627.56}

[INFO|callbacks.py:310] 2024-07-10 11:09:25,499 >> {'loss': 0.0121, 'learning_rate': 4.4667e-06, 'epoch': 3.45, 'throughput': 627.53}

[INFO|callbacks.py:310] 2024-07-10 11:09:36,589 >> {'loss': 0.0386, 'learning_rate': 4.4750e-06, 'epoch': 3.46, 'throughput': 627.50}

[INFO|callbacks.py:310] 2024-07-10 11:09:47,688 >> {'loss': 0.0319, 'learning_rate': 4.4833e-06, 'epoch': 3.46, 'throughput': 627.47}

[INFO|callbacks.py:310] 2024-07-10 11:09:58,793 >> {'loss': 0.0579, 'learning_rate': 4.4917e-06, 'epoch': 3.47, 'throughput': 627.56}

[INFO|callbacks.py:310] 2024-07-10 11:10:09,903 >> {'loss': 0.0255, 'learning_rate': 4.5000e-06, 'epoch': 3.48, 'throughput': 627.50}

[INFO|callbacks.py:310] 2024-07-10 11:10:20,961 >> {'loss': 0.0301, 'learning_rate': 4.5083e-06, 'epoch': 3.48, 'throughput': 627.46}

[INFO|callbacks.py:310] 2024-07-10 11:10:32,020 >> {'loss': 0.0589, 'learning_rate': 4.5167e-06, 'epoch': 3.49, 'throughput': 627.49}

[INFO|callbacks.py:310] 2024-07-10 11:10:43,080 >> {'loss': 0.0442, 'learning_rate': 4.5250e-06, 'epoch': 3.49, 'throughput': 627.49}

[INFO|callbacks.py:310] 2024-07-10 11:10:54,153 >> {'loss': 0.0366, 'learning_rate': 4.5333e-06, 'epoch': 3.50, 'throughput': 627.54}

[INFO|callbacks.py:310] 2024-07-10 11:11:05,205 >> {'loss': 0.0643, 'learning_rate': 4.5417e-06, 'epoch': 3.51, 'throughput': 627.55}

[INFO|callbacks.py:310] 2024-07-10 11:11:16,310 >> {'loss': 0.0213, 'learning_rate': 4.5500e-06, 'epoch': 3.51, 'throughput': 627.53}

[INFO|callbacks.py:310] 2024-07-10 11:11:27,391 >> {'loss': 0.0154, 'learning_rate': 4.5583e-06, 'epoch': 3.52, 'throughput': 627.52}

[INFO|callbacks.py:310] 2024-07-10 11:11:38,502 >> {'loss': 0.0252, 'learning_rate': 4.5667e-06, 'epoch': 3.53, 'throughput': 627.50}

[INFO|callbacks.py:310] 2024-07-10 11:11:49,602 >> {'loss': 0.0380, 'learning_rate': 4.5750e-06, 'epoch': 3.53, 'throughput': 627.49}

[INFO|callbacks.py:310] 2024-07-10 11:12:00,665 >> {'loss': 0.0169, 'learning_rate': 4.5833e-06, 'epoch': 3.54, 'throughput': 627.50}

[INFO|callbacks.py:310] 2024-07-10 11:12:11,719 >> {'loss': 0.0253, 'learning_rate': 4.5917e-06, 'epoch': 3.55, 'throughput': 627.52}

[INFO|callbacks.py:310] 2024-07-10 11:12:22,790 >> {'loss': 0.0164, 'learning_rate': 4.6000e-06, 'epoch': 3.55, 'throughput': 627.52}

[INFO|callbacks.py:310] 2024-07-10 11:12:33,850 >> {'loss': 0.0269, 'learning_rate': 4.6083e-06, 'epoch': 3.56, 'throughput': 627.59}

[INFO|callbacks.py:310] 2024-07-10 11:12:44,926 >> {'loss': 0.0192, 'learning_rate': 4.6167e-06, 'epoch': 3.57, 'throughput': 627.59}

[INFO|callbacks.py:310] 2024-07-10 11:12:56,022 >> {'loss': 0.0132, 'learning_rate': 4.6250e-06, 'epoch': 3.57, 'throughput': 627.62}

[INFO|callbacks.py:310] 2024-07-10 11:13:07,118 >> {'loss': 0.0156, 'learning_rate': 4.6333e-06, 'epoch': 3.58, 'throughput': 627.66}

[INFO|callbacks.py:310] 2024-07-10 11:13:18,225 >> {'loss': 0.0081, 'learning_rate': 4.6417e-06, 'epoch': 3.58, 'throughput': 627.61}

[INFO|callbacks.py:310] 2024-07-10 11:13:29,325 >> {'loss': 0.0351, 'learning_rate': 4.6500e-06, 'epoch': 3.59, 'throughput': 627.62}

[INFO|callbacks.py:310] 2024-07-10 11:13:40,385 >> {'loss': 0.0288, 'learning_rate': 4.6583e-06, 'epoch': 3.60, 'throughput': 627.55}

[INFO|callbacks.py:310] 2024-07-10 11:13:51,447 >> {'loss': 0.0335, 'learning_rate': 4.6667e-06, 'epoch': 3.60, 'throughput': 627.54}

[INFO|callbacks.py:310] 2024-07-10 11:14:02,532 >> {'loss': 0.0492, 'learning_rate': 4.6750e-06, 'epoch': 3.61, 'throughput': 627.56}

[INFO|callbacks.py:310] 2024-07-10 11:14:13,605 >> {'loss': 0.0082, 'learning_rate': 4.6833e-06, 'epoch': 3.62, 'throughput': 627.54}

[INFO|callbacks.py:310] 2024-07-10 11:14:24,672 >> {'loss': 0.0491, 'learning_rate': 4.6917e-06, 'epoch': 3.62, 'throughput': 627.60}

[INFO|callbacks.py:310] 2024-07-10 11:14:35,763 >> {'loss': 0.0568, 'learning_rate': 4.7000e-06, 'epoch': 3.63, 'throughput': 627.59}

[INFO|callbacks.py:310] 2024-07-10 11:14:46,861 >> {'loss': 0.0414, 'learning_rate': 4.7083e-06, 'epoch': 3.64, 'throughput': 627.63}

[INFO|callbacks.py:310] 2024-07-10 11:14:57,976 >> {'loss': 0.0340, 'learning_rate': 4.7167e-06, 'epoch': 3.64, 'throughput': 627.63}

[INFO|callbacks.py:310] 2024-07-10 11:15:09,086 >> {'loss': 0.0392, 'learning_rate': 4.7250e-06, 'epoch': 3.65, 'throughput': 627.59}

[INFO|callbacks.py:310] 2024-07-10 11:15:20,130 >> {'loss': 0.0510, 'learning_rate': 4.7333e-06, 'epoch': 3.66, 'throughput': 627.66}

[INFO|callbacks.py:310] 2024-07-10 11:15:31,193 >> {'loss': 0.0340, 'learning_rate': 4.7417e-06, 'epoch': 3.66, 'throughput': 627.64}

[INFO|callbacks.py:310] 2024-07-10 11:15:42,272 >> {'loss': 0.0140, 'learning_rate': 4.7500e-06, 'epoch': 3.67, 'throughput': 627.68}

[INFO|callbacks.py:310] 2024-07-10 11:15:53,335 >> {'loss': 0.0406, 'learning_rate': 4.7583e-06, 'epoch': 3.67, 'throughput': 627.69}

[INFO|callbacks.py:310] 2024-07-10 11:16:04,431 >> {'loss': 0.0407, 'learning_rate': 4.7667e-06, 'epoch': 3.68, 'throughput': 627.65}

[INFO|callbacks.py:310] 2024-07-10 11:16:15,524 >> {'loss': 0.0282, 'learning_rate': 4.7750e-06, 'epoch': 3.69, 'throughput': 627.65}

[INFO|callbacks.py:310] 2024-07-10 11:16:26,634 >> {'loss': 0.0326, 'learning_rate': 4.7833e-06, 'epoch': 3.69, 'throughput': 627.66}

[INFO|callbacks.py:310] 2024-07-10 11:16:37,755 >> {'loss': 0.0348, 'learning_rate': 4.7917e-06, 'epoch': 3.70, 'throughput': 627.62}

[INFO|callbacks.py:310] 2024-07-10 11:16:48,840 >> {'loss': 0.0256, 'learning_rate': 4.8000e-06, 'epoch': 3.71, 'throughput': 627.70}

[INFO|callbacks.py:310] 2024-07-10 11:16:59,875 >> {'loss': 0.0765, 'learning_rate': 4.8083e-06, 'epoch': 3.71, 'throughput': 627.80}

[INFO|callbacks.py:310] 2024-07-10 11:17:10,939 >> {'loss': 0.0099, 'learning_rate': 4.8167e-06, 'epoch': 3.72, 'throughput': 627.79}

[INFO|callbacks.py:310] 2024-07-10 11:17:22,014 >> {'loss': 0.0173, 'learning_rate': 4.8250e-06, 'epoch': 3.73, 'throughput': 627.77}

[INFO|callbacks.py:310] 2024-07-10 11:17:33,068 >> {'loss': 0.0084, 'learning_rate': 4.8333e-06, 'epoch': 3.73, 'throughput': 627.76}

[INFO|callbacks.py:310] 2024-07-10 11:17:44,146 >> {'loss': 0.0251, 'learning_rate': 4.8417e-06, 'epoch': 3.74, 'throughput': 627.75}

[INFO|callbacks.py:310] 2024-07-10 11:17:55,245 >> {'loss': 0.0909, 'learning_rate': 4.8500e-06, 'epoch': 3.75, 'throughput': 627.77}

[INFO|callbacks.py:310] 2024-07-10 11:18:06,345 >> {'loss': 0.0390, 'learning_rate': 4.8583e-06, 'epoch': 3.75, 'throughput': 627.73}

[INFO|callbacks.py:310] 2024-07-10 11:18:17,440 >> {'loss': 0.0079, 'learning_rate': 4.8667e-06, 'epoch': 3.76, 'throughput': 627.72}

[INFO|callbacks.py:310] 2024-07-10 11:18:28,507 >> {'loss': 0.0242, 'learning_rate': 4.8750e-06, 'epoch': 3.77, 'throughput': 627.72}

[INFO|callbacks.py:310] 2024-07-10 11:18:39,555 >> {'loss': 0.0138, 'learning_rate': 4.8833e-06, 'epoch': 3.77, 'throughput': 627.70}

[INFO|callbacks.py:310] 2024-07-10 11:18:50,609 >> {'loss': 0.0467, 'learning_rate': 4.8917e-06, 'epoch': 3.78, 'throughput': 627.69}

[INFO|callbacks.py:310] 2024-07-10 11:19:01,679 >> {'loss': 0.0241, 'learning_rate': 4.9000e-06, 'epoch': 3.78, 'throughput': 627.68}

[INFO|callbacks.py:310] 2024-07-10 11:19:12,734 >> {'loss': 0.0404, 'learning_rate': 4.9083e-06, 'epoch': 3.79, 'throughput': 627.65}

[INFO|callbacks.py:310] 2024-07-10 11:19:23,829 >> {'loss': 0.0094, 'learning_rate': 4.9167e-06, 'epoch': 3.80, 'throughput': 627.59}

[INFO|callbacks.py:310] 2024-07-10 11:19:34,928 >> {'loss': 0.0196, 'learning_rate': 4.9250e-06, 'epoch': 3.80, 'throughput': 627.56}

[INFO|callbacks.py:310] 2024-07-10 11:19:46,044 >> {'loss': 0.0514, 'learning_rate': 4.9333e-06, 'epoch': 3.81, 'throughput': 627.59}

[INFO|callbacks.py:310] 2024-07-10 11:19:57,148 >> {'loss': 0.0286, 'learning_rate': 4.9417e-06, 'epoch': 3.82, 'throughput': 627.61}

[INFO|callbacks.py:310] 2024-07-10 11:20:08,215 >> {'loss': 0.0417, 'learning_rate': 4.9500e-06, 'epoch': 3.82, 'throughput': 627.63}

[INFO|callbacks.py:310] 2024-07-10 11:20:19,257 >> {'loss': 0.0347, 'learning_rate': 4.9583e-06, 'epoch': 3.83, 'throughput': 627.64}

[INFO|callbacks.py:310] 2024-07-10 11:20:30,318 >> {'loss': 0.0229, 'learning_rate': 4.9667e-06, 'epoch': 3.84, 'throughput': 627.67}

[INFO|callbacks.py:310] 2024-07-10 11:20:41,383 >> {'loss': 0.0176, 'learning_rate': 4.9750e-06, 'epoch': 3.84, 'throughput': 627.66}

[INFO|callbacks.py:310] 2024-07-10 11:20:52,450 >> {'loss': 0.0274, 'learning_rate': 4.9833e-06, 'epoch': 3.85, 'throughput': 627.66}

[INFO|callbacks.py:310] 2024-07-10 11:21:03,555 >> {'loss': 0.0253, 'learning_rate': 4.9917e-06, 'epoch': 3.86, 'throughput': 627.65}

[INFO|callbacks.py:310] 2024-07-10 11:21:14,642 >> {'loss': 0.0429, 'learning_rate': 5.0000e-06, 'epoch': 3.86, 'throughput': 627.67}

[INFO|callbacks.py:310] 2024-07-10 11:21:25,745 >> {'loss': 0.0142, 'learning_rate': 4.9996e-06, 'epoch': 3.87, 'throughput': 627.68}

[INFO|callbacks.py:310] 2024-07-10 11:21:36,857 >> {'loss': 0.0148, 'learning_rate': 4.9984e-06, 'epoch': 3.87, 'throughput': 627.68}

[INFO|callbacks.py:310] 2024-07-10 11:21:47,915 >> {'loss': 0.0166, 'learning_rate': 4.9964e-06, 'epoch': 3.88, 'throughput': 627.70}

[INFO|callbacks.py:310] 2024-07-10 11:21:58,970 >> {'loss': 0.0224, 'learning_rate': 4.9936e-06, 'epoch': 3.89, 'throughput': 627.68}

[INFO|callbacks.py:310] 2024-07-10 11:22:10,033 >> {'loss': 0.0227, 'learning_rate': 4.9899e-06, 'epoch': 3.89, 'throughput': 627.71}

[INFO|callbacks.py:310] 2024-07-10 11:22:21,096 >> {'loss': 0.0597, 'learning_rate': 4.9855e-06, 'epoch': 3.90, 'throughput': 627.77}

[INFO|callbacks.py:310] 2024-07-10 11:22:32,152 >> {'loss': 0.0339, 'learning_rate': 4.9803e-06, 'epoch': 3.91, 'throughput': 627.83}

[INFO|callbacks.py:310] 2024-07-10 11:22:43,259 >> {'loss': 0.0192, 'learning_rate': 4.9743e-06, 'epoch': 3.91, 'throughput': 627.82}

[INFO|callbacks.py:310] 2024-07-10 11:22:54,352 >> {'loss': 0.0070, 'learning_rate': 4.9674e-06, 'epoch': 3.92, 'throughput': 627.77}

[INFO|callbacks.py:310] 2024-07-10 11:23:05,462 >> {'loss': 0.0170, 'learning_rate': 4.9598e-06, 'epoch': 3.93, 'throughput': 627.74}

[INFO|callbacks.py:310] 2024-07-10 11:23:16,556 >> {'loss': 0.0296, 'learning_rate': 4.9514e-06, 'epoch': 3.93, 'throughput': 627.77}

[INFO|callbacks.py:310] 2024-07-10 11:23:27,614 >> {'loss': 0.0429, 'learning_rate': 4.9422e-06, 'epoch': 3.94, 'throughput': 627.77}

[INFO|callbacks.py:310] 2024-07-10 11:23:38,678 >> {'loss': 0.0027, 'learning_rate': 4.9322e-06, 'epoch': 3.95, 'throughput': 627.78}

[INFO|callbacks.py:310] 2024-07-10 11:23:49,754 >> {'loss': 0.0395, 'learning_rate': 4.9215e-06, 'epoch': 3.95, 'throughput': 627.75}

[INFO|callbacks.py:310] 2024-07-10 11:24:00,825 >> {'loss': 0.0300, 'learning_rate': 4.9099e-06, 'epoch': 3.96, 'throughput': 627.74}

[INFO|callbacks.py:310] 2024-07-10 11:24:11,907 >> {'loss': 0.0178, 'learning_rate': 4.8976e-06, 'epoch': 3.96, 'throughput': 627.74}

[INFO|callbacks.py:310] 2024-07-10 11:24:23,007 >> {'loss': 0.0139, 'learning_rate': 4.8845e-06, 'epoch': 3.97, 'throughput': 627.76}

[INFO|callbacks.py:310] 2024-07-10 11:24:34,099 >> {'loss': 0.0076, 'learning_rate': 4.8706e-06, 'epoch': 3.98, 'throughput': 627.80}

[INFO|callbacks.py:310] 2024-07-10 11:24:45,203 >> {'loss': 0.0683, 'learning_rate': 4.8560e-06, 'epoch': 3.98, 'throughput': 627.89}

[INFO|callbacks.py:310] 2024-07-10 11:24:56,293 >> {'loss': 0.0367, 'learning_rate': 4.8406e-06, 'epoch': 3.99, 'throughput': 627.88}

[INFO|callbacks.py:310] 2024-07-10 11:25:07,338 >> {'loss': 0.0497, 'learning_rate': 4.8244e-06, 'epoch': 4.00, 'throughput': 627.92}

[INFO|callbacks.py:310] 2024-07-10 11:25:18,410 >> {'loss': 0.0054, 'learning_rate': 4.8075e-06, 'epoch': 4.00, 'throughput': 627.91}

[INFO|callbacks.py:310] 2024-07-10 11:25:29,477 >> {'loss': 0.0259, 'learning_rate': 4.7899e-06, 'epoch': 4.01, 'throughput': 627.95}

[INFO|callbacks.py:310] 2024-07-10 11:25:40,538 >> {'loss': 0.0082, 'learning_rate': 4.7715e-06, 'epoch': 4.02, 'throughput': 627.93}

[INFO|callbacks.py:310] 2024-07-10 11:25:51,616 >> {'loss': 0.0359, 'learning_rate': 4.7524e-06, 'epoch': 4.02, 'throughput': 627.89}

[INFO|callbacks.py:310] 2024-07-10 11:26:02,715 >> {'loss': 0.0081, 'learning_rate': 4.7326e-06, 'epoch': 4.03, 'throughput': 627.83}

[INFO|callbacks.py:310] 2024-07-10 11:26:13,814 >> {'loss': 0.0118, 'learning_rate': 4.7120e-06, 'epoch': 4.04, 'throughput': 627.82}

[INFO|callbacks.py:310] 2024-07-10 11:26:24,925 >> {'loss': 0.0235, 'learning_rate': 4.6908e-06, 'epoch': 4.04, 'throughput': 627.78}

[INFO|callbacks.py:310] 2024-07-10 11:26:36,011 >> {'loss': 0.0145, 'learning_rate': 4.6688e-06, 'epoch': 4.05, 'throughput': 627.79}

[INFO|callbacks.py:310] 2024-07-10 11:26:47,057 >> {'loss': 0.0060, 'learning_rate': 4.6461e-06, 'epoch': 4.05, 'throughput': 627.81}

[INFO|callbacks.py:310] 2024-07-10 11:26:58,101 >> {'loss': 0.0191, 'learning_rate': 4.6228e-06, 'epoch': 4.06, 'throughput': 627.84}

[INFO|callbacks.py:310] 2024-07-10 11:27:09,174 >> {'loss': 0.0075, 'learning_rate': 4.5987e-06, 'epoch': 4.07, 'throughput': 627.87}

[INFO|callbacks.py:310] 2024-07-10 11:27:20,239 >> {'loss': 0.0253, 'learning_rate': 4.5740e-06, 'epoch': 4.07, 'throughput': 627.82}

[INFO|callbacks.py:310] 2024-07-10 11:27:31,322 >> {'loss': 0.0317, 'learning_rate': 4.5486e-06, 'epoch': 4.08, 'throughput': 627.85}

[INFO|callbacks.py:310] 2024-07-10 11:27:42,405 >> {'loss': 0.0082, 'learning_rate': 4.5225e-06, 'epoch': 4.09, 'throughput': 627.87}

[INFO|callbacks.py:310] 2024-07-10 11:27:53,519 >> {'loss': 0.0101, 'learning_rate': 4.4958e-06, 'epoch': 4.09, 'throughput': 627.86}

[INFO|callbacks.py:310] 2024-07-10 11:28:04,623 >> {'loss': 0.0112, 'learning_rate': 4.4685e-06, 'epoch': 4.10, 'throughput': 627.84}

[INFO|callbacks.py:310] 2024-07-10 11:28:15,703 >> {'loss': 0.0034, 'learning_rate': 4.4405e-06, 'epoch': 4.11, 'throughput': 627.82}

[INFO|callbacks.py:310] 2024-07-10 11:28:26,757 >> {'loss': 0.0037, 'learning_rate': 4.4119e-06, 'epoch': 4.11, 'throughput': 627.79}

[INFO|callbacks.py:310] 2024-07-10 11:28:37,819 >> {'loss': 0.0085, 'learning_rate': 4.3827e-06, 'epoch': 4.12, 'throughput': 627.83}

[INFO|callbacks.py:310] 2024-07-10 11:28:48,903 >> {'loss': 0.0328, 'learning_rate': 4.3528e-06, 'epoch': 4.13, 'throughput': 627.77}

[INFO|callbacks.py:310] 2024-07-10 11:28:59,973 >> {'loss': 0.0098, 'learning_rate': 4.3224e-06, 'epoch': 4.13, 'throughput': 627.77}

[INFO|callbacks.py:310] 2024-07-10 11:29:11,062 >> {'loss': 0.0051, 'learning_rate': 4.2914e-06, 'epoch': 4.14, 'throughput': 627.80}

[INFO|callbacks.py:310] 2024-07-10 11:29:22,169 >> {'loss': 0.0047, 'learning_rate': 4.2598e-06, 'epoch': 4.14, 'throughput': 627.78}

[INFO|callbacks.py:310] 2024-07-10 11:29:33,279 >> {'loss': 0.0111, 'learning_rate': 4.2277e-06, 'epoch': 4.15, 'throughput': 627.73}

[INFO|callbacks.py:310] 2024-07-10 11:29:44,363 >> {'loss': 0.0334, 'learning_rate': 4.1949e-06, 'epoch': 4.16, 'throughput': 627.74}

[INFO|callbacks.py:310] 2024-07-10 11:29:55,430 >> {'loss': 0.0130, 'learning_rate': 4.1617e-06, 'epoch': 4.16, 'throughput': 627.71}

[INFO|callbacks.py:310] 2024-07-10 11:30:06,480 >> {'loss': 0.0305, 'learning_rate': 4.1279e-06, 'epoch': 4.17, 'throughput': 627.70}

[INFO|callbacks.py:310] 2024-07-10 11:30:17,554 >> {'loss': 0.0092, 'learning_rate': 4.0936e-06, 'epoch': 4.18, 'throughput': 627.69}

[INFO|callbacks.py:310] 2024-07-10 11:30:28,611 >> {'loss': 0.0223, 'learning_rate': 4.0587e-06, 'epoch': 4.18, 'throughput': 627.69}

[INFO|callbacks.py:310] 2024-07-10 11:30:39,680 >> {'loss': 0.0234, 'learning_rate': 4.0234e-06, 'epoch': 4.19, 'throughput': 627.69}

[INFO|callbacks.py:310] 2024-07-10 11:30:50,780 >> {'loss': 0.0185, 'learning_rate': 3.9876e-06, 'epoch': 4.20, 'throughput': 627.66}

[INFO|callbacks.py:310] 2024-07-10 11:31:01,870 >> {'loss': 0.0075, 'learning_rate': 3.9512e-06, 'epoch': 4.20, 'throughput': 627.69}

[INFO|callbacks.py:310] 2024-07-10 11:31:12,990 >> {'loss': 0.0217, 'learning_rate': 3.9145e-06, 'epoch': 4.21, 'throughput': 627.66}

[INFO|callbacks.py:310] 2024-07-10 11:31:24,092 >> {'loss': 0.0027, 'learning_rate': 3.8772e-06, 'epoch': 4.22, 'throughput': 627.61}

[INFO|callbacks.py:310] 2024-07-10 11:31:35,151 >> {'loss': 0.0153, 'learning_rate': 3.8396e-06, 'epoch': 4.22, 'throughput': 627.61}

[INFO|callbacks.py:310] 2024-07-10 11:31:46,202 >> {'loss': 0.0092, 'learning_rate': 3.8015e-06, 'epoch': 4.23, 'throughput': 627.70}

[INFO|callbacks.py:310] 2024-07-10 11:31:57,251 >> {'loss': 0.0082, 'learning_rate': 3.7629e-06, 'epoch': 4.23, 'throughput': 627.75}

[INFO|callbacks.py:310] 2024-07-10 11:32:08,322 >> {'loss': 0.0163, 'learning_rate': 3.7240e-06, 'epoch': 4.24, 'throughput': 627.74}

[INFO|callbacks.py:310] 2024-07-10 11:32:19,402 >> {'loss': 0.0099, 'learning_rate': 3.6847e-06, 'epoch': 4.25, 'throughput': 627.76}

[INFO|callbacks.py:310] 2024-07-10 11:32:30,494 >> {'loss': 0.0269, 'learning_rate': 3.6450e-06, 'epoch': 4.25, 'throughput': 627.72}

[INFO|callbacks.py:310] 2024-07-10 11:32:41,583 >> {'loss': 0.0070, 'learning_rate': 3.6049e-06, 'epoch': 4.26, 'throughput': 627.77}

[INFO|callbacks.py:310] 2024-07-10 11:32:52,686 >> {'loss': 0.0427, 'learning_rate': 3.5644e-06, 'epoch': 4.27, 'throughput': 627.78}

[INFO|callbacks.py:310] 2024-07-10 11:33:03,792 >> {'loss': 0.0122, 'learning_rate': 3.5237e-06, 'epoch': 4.27, 'throughput': 627.75}

[INFO|callbacks.py:310] 2024-07-10 11:33:14,841 >> {'loss': 0.0137, 'learning_rate': 3.4826e-06, 'epoch': 4.28, 'throughput': 627.68}

[INFO|callbacks.py:310] 2024-07-10 11:33:25,903 >> {'loss': 0.0200, 'learning_rate': 3.4411e-06, 'epoch': 4.29, 'throughput': 627.69}

[INFO|callbacks.py:310] 2024-07-10 11:33:36,964 >> {'loss': 0.0060, 'learning_rate': 3.3994e-06, 'epoch': 4.29, 'throughput': 627.65}

[INFO|callbacks.py:310] 2024-07-10 11:33:48,029 >> {'loss': 0.0085, 'learning_rate': 3.3574e-06, 'epoch': 4.30, 'throughput': 627.65}

[INFO|callbacks.py:310] 2024-07-10 11:33:59,105 >> {'loss': 0.0112, 'learning_rate': 3.3151e-06, 'epoch': 4.31, 'throughput': 627.63}

[INFO|callbacks.py:310] 2024-07-10 11:34:10,196 >> {'loss': 0.0131, 'learning_rate': 3.2725e-06, 'epoch': 4.31, 'throughput': 627.64}

[INFO|callbacks.py:310] 2024-07-10 11:34:21,290 >> {'loss': 0.0124, 'learning_rate': 3.2297e-06, 'epoch': 4.32, 'throughput': 627.68}

[INFO|callbacks.py:310] 2024-07-10 11:34:32,399 >> {'loss': 0.0042, 'learning_rate': 3.1867e-06, 'epoch': 4.33, 'throughput': 627.70}

[INFO|callbacks.py:310] 2024-07-10 11:34:43,488 >> {'loss': 0.0011, 'learning_rate': 3.1434e-06, 'epoch': 4.33, 'throughput': 627.70}

[INFO|callbacks.py:310] 2024-07-10 11:34:54,540 >> {'loss': 0.0008, 'learning_rate': 3.1000e-06, 'epoch': 4.34, 'throughput': 627.70}

[INFO|callbacks.py:310] 2024-07-10 11:35:05,591 >> {'loss': 0.0056, 'learning_rate': 3.0563e-06, 'epoch': 4.34, 'throughput': 627.71}

[INFO|callbacks.py:310] 2024-07-10 11:35:16,659 >> {'loss': 0.0079, 'learning_rate': 3.0125e-06, 'epoch': 4.35, 'throughput': 627.75}

[INFO|callbacks.py:310] 2024-07-10 11:35:27,711 >> {'loss': 0.0216, 'learning_rate': 2.9685e-06, 'epoch': 4.36, 'throughput': 627.74}

[INFO|callbacks.py:310] 2024-07-10 11:35:38,807 >> {'loss': 0.0258, 'learning_rate': 2.9243e-06, 'epoch': 4.36, 'throughput': 627.65}

[INFO|callbacks.py:310] 2024-07-10 11:35:49,889 >> {'loss': 0.0049, 'learning_rate': 2.8800e-06, 'epoch': 4.37, 'throughput': 627.66}

[INFO|callbacks.py:310] 2024-07-10 11:36:00,996 >> {'loss': 0.0034, 'learning_rate': 2.8356e-06, 'epoch': 4.38, 'throughput': 627.67}

[INFO|callbacks.py:310] 2024-07-10 11:36:12,090 >> {'loss': 0.0341, 'learning_rate': 2.7911e-06, 'epoch': 4.38, 'throughput': 627.63}

[INFO|callbacks.py:310] 2024-07-10 11:36:23,185 >> {'loss': 0.0074, 'learning_rate': 2.7464e-06, 'epoch': 4.39, 'throughput': 627.61}

[INFO|callbacks.py:310] 2024-07-10 11:36:34,236 >> {'loss': 0.0272, 'learning_rate': 2.7017e-06, 'epoch': 4.40, 'throughput': 627.63}

[INFO|callbacks.py:310] 2024-07-10 11:36:45,287 >> {'loss': 0.0168, 'learning_rate': 2.6570e-06, 'epoch': 4.40, 'throughput': 627.64}

[INFO|callbacks.py:310] 2024-07-10 11:36:56,354 >> {'loss': 0.0047, 'learning_rate': 2.6122e-06, 'epoch': 4.41, 'throughput': 627.62}

[INFO|callbacks.py:310] 2024-07-10 11:37:07,412 >> {'loss': 0.0198, 'learning_rate': 2.5673e-06, 'epoch': 4.42, 'throughput': 627.67}

[INFO|callbacks.py:310] 2024-07-10 11:37:18,505 >> {'loss': 0.0012, 'learning_rate': 2.5224e-06, 'epoch': 4.42, 'throughput': 627.65}

[INFO|callbacks.py:310] 2024-07-10 11:37:29,586 >> {'loss': 0.0256, 'learning_rate': 2.4776e-06, 'epoch': 4.43, 'throughput': 627.66}

[INFO|callbacks.py:310] 2024-07-10 11:37:40,693 >> {'loss': 0.0187, 'learning_rate': 2.4327e-06, 'epoch': 4.43, 'throughput': 627.62}

[INFO|callbacks.py:310] 2024-07-10 11:37:51,810 >> {'loss': 0.0135, 'learning_rate': 2.3878e-06, 'epoch': 4.44, 'throughput': 627.57}

[INFO|callbacks.py:310] 2024-07-10 11:38:02,880 >> {'loss': 0.0036, 'learning_rate': 2.3430e-06, 'epoch': 4.45, 'throughput': 627.56}

[INFO|callbacks.py:310] 2024-07-10 11:38:13,935 >> {'loss': 0.0135, 'learning_rate': 2.2983e-06, 'epoch': 4.45, 'throughput': 627.65}

[INFO|callbacks.py:310] 2024-07-10 11:38:25,002 >> {'loss': 0.0046, 'learning_rate': 2.2536e-06, 'epoch': 4.46, 'throughput': 627.64}

[INFO|callbacks.py:310] 2024-07-10 11:38:36,083 >> {'loss': 0.0019, 'learning_rate': 2.2089e-06, 'epoch': 4.47, 'throughput': 627.58}

[INFO|callbacks.py:310] 2024-07-10 11:38:47,131 >> {'loss': 0.0022, 'learning_rate': 2.1644e-06, 'epoch': 4.47, 'throughput': 627.59}

[INFO|callbacks.py:310] 2024-07-10 11:38:58,235 >> {'loss': 0.0093, 'learning_rate': 2.1200e-06, 'epoch': 4.48, 'throughput': 627.57}

[INFO|callbacks.py:310] 2024-07-10 11:39:09,313 >> {'loss': 0.0012, 'learning_rate': 2.0757e-06, 'epoch': 4.49, 'throughput': 627.60}

[INFO|callbacks.py:310] 2024-07-10 11:39:20,422 >> {'loss': 0.0036, 'learning_rate': 2.0315e-06, 'epoch': 4.49, 'throughput': 627.60}

[INFO|callbacks.py:310] 2024-07-10 11:39:31,521 >> {'loss': 0.0071, 'learning_rate': 1.9875e-06, 'epoch': 4.50, 'throughput': 627.60}

[INFO|callbacks.py:310] 2024-07-10 11:39:42,574 >> {'loss': 0.0055, 'learning_rate': 1.9437e-06, 'epoch': 4.51, 'throughput': 627.62}

[INFO|callbacks.py:310] 2024-07-10 11:39:53,619 >> {'loss': 0.0291, 'learning_rate': 1.9000e-06, 'epoch': 4.51, 'throughput': 627.63}

[INFO|callbacks.py:310] 2024-07-10 11:40:04,687 >> {'loss': 0.0407, 'learning_rate': 1.8566e-06, 'epoch': 4.52, 'throughput': 627.65}

[INFO|callbacks.py:310] 2024-07-10 11:40:15,754 >> {'loss': 0.0245, 'learning_rate': 1.8133e-06, 'epoch': 4.52, 'throughput': 627.65}

[INFO|callbacks.py:310] 2024-07-10 11:40:26,817 >> {'loss': 0.0207, 'learning_rate': 1.7703e-06, 'epoch': 4.53, 'throughput': 627.66}

[INFO|callbacks.py:310] 2024-07-10 11:40:37,916 >> {'loss': 0.0269, 'learning_rate': 1.7275e-06, 'epoch': 4.54, 'throughput': 627.64}

[INFO|callbacks.py:310] 2024-07-10 11:40:49,015 >> {'loss': 0.0010, 'learning_rate': 1.6849e-06, 'epoch': 4.54, 'throughput': 627.64}

[INFO|callbacks.py:310] 2024-07-10 11:41:00,132 >> {'loss': 0.0205, 'learning_rate': 1.6426e-06, 'epoch': 4.55, 'throughput': 627.57}

[INFO|callbacks.py:310] 2024-07-10 11:41:11,234 >> {'loss': 0.0148, 'learning_rate': 1.6006e-06, 'epoch': 4.56, 'throughput': 627.56}

[INFO|callbacks.py:310] 2024-07-10 11:41:22,294 >> {'loss': 0.0239, 'learning_rate': 1.5589e-06, 'epoch': 4.56, 'throughput': 627.54}

[INFO|callbacks.py:310] 2024-07-10 11:41:33,354 >> {'loss': 0.0165, 'learning_rate': 1.5174e-06, 'epoch': 4.57, 'throughput': 627.53}

[INFO|callbacks.py:310] 2024-07-10 11:41:44,406 >> {'loss': 0.0079, 'learning_rate': 1.4763e-06, 'epoch': 4.58, 'throughput': 627.60}

[INFO|callbacks.py:310] 2024-07-10 11:41:55,475 >> {'loss': 0.0338, 'learning_rate': 1.4356e-06, 'epoch': 4.58, 'throughput': 627.56}

[INFO|callbacks.py:310] 2024-07-10 11:42:06,543 >> {'loss': 0.0136, 'learning_rate': 1.3951e-06, 'epoch': 4.59, 'throughput': 627.55}

[INFO|callbacks.py:310] 2024-07-10 11:42:17,633 >> {'loss': 0.0126, 'learning_rate': 1.3550e-06, 'epoch': 4.60, 'throughput': 627.60}

[INFO|callbacks.py:310] 2024-07-10 11:42:28,733 >> {'loss': 0.0174, 'learning_rate': 1.3153e-06, 'epoch': 4.60, 'throughput': 627.61}

[INFO|callbacks.py:310] 2024-07-10 11:42:39,839 >> {'loss': 0.0155, 'learning_rate': 1.2760e-06, 'epoch': 4.61, 'throughput': 627.61}

[INFO|callbacks.py:310] 2024-07-10 11:42:50,932 >> {'loss': 0.0349, 'learning_rate': 1.2371e-06, 'epoch': 4.61, 'throughput': 627.65}

[INFO|callbacks.py:310] 2024-07-10 11:43:01,989 >> {'loss': 0.0082, 'learning_rate': 1.1985e-06, 'epoch': 4.62, 'throughput': 627.67}

[INFO|callbacks.py:310] 2024-07-10 11:43:13,052 >> {'loss': 0.0204, 'learning_rate': 1.1604e-06, 'epoch': 4.63, 'throughput': 627.69}

[INFO|callbacks.py:310] 2024-07-10 11:43:24,131 >> {'loss': 0.0074, 'learning_rate': 1.1228e-06, 'epoch': 4.63, 'throughput': 627.68}

[INFO|callbacks.py:310] 2024-07-10 11:43:35,188 >> {'loss': 0.0154, 'learning_rate': 1.0855e-06, 'epoch': 4.64, 'throughput': 627.69}

[INFO|callbacks.py:310] 2024-07-10 11:43:46,279 >> {'loss': 0.0101, 'learning_rate': 1.0488e-06, 'epoch': 4.65, 'throughput': 627.65}

[INFO|callbacks.py:310] 2024-07-10 11:43:57,371 >> {'loss': 0.0107, 'learning_rate': 1.0124e-06, 'epoch': 4.65, 'throughput': 627.64}

[INFO|callbacks.py:310] 2024-07-10 11:44:08,486 >> {'loss': 0.0279, 'learning_rate': 9.7661e-07, 'epoch': 4.66, 'throughput': 627.62}

[INFO|callbacks.py:310] 2024-07-10 11:44:19,595 >> {'loss': 0.0044, 'learning_rate': 9.4128e-07, 'epoch': 4.67, 'throughput': 627.61}

[INFO|callbacks.py:310] 2024-07-10 11:44:30,682 >> {'loss': 0.0021, 'learning_rate': 9.0644e-07, 'epoch': 4.67, 'throughput': 627.61}

[INFO|callbacks.py:310] 2024-07-10 11:44:41,722 >> {'loss': 0.0086, 'learning_rate': 8.7212e-07, 'epoch': 4.68, 'throughput': 627.67}

[INFO|callbacks.py:310] 2024-07-10 11:44:52,781 >> {'loss': 0.0050, 'learning_rate': 8.3832e-07, 'epoch': 4.69, 'throughput': 627.71}

[INFO|callbacks.py:310] 2024-07-10 11:45:03,852 >> {'loss': 0.0096, 'learning_rate': 8.0506e-07, 'epoch': 4.69, 'throughput': 627.71}

[INFO|callbacks.py:310] 2024-07-10 11:45:14,912 >> {'loss': 0.0029, 'learning_rate': 7.7234e-07, 'epoch': 4.70, 'throughput': 627.72}

[INFO|callbacks.py:310] 2024-07-10 11:45:26,003 >> {'loss': 0.0082, 'learning_rate': 7.4018e-07, 'epoch': 4.70, 'throughput': 627.69}

[INFO|callbacks.py:310] 2024-07-10 11:45:37,098 >> {'loss': 0.0033, 'learning_rate': 7.0859e-07, 'epoch': 4.71, 'throughput': 627.69}

[INFO|callbacks.py:310] 2024-07-10 11:45:48,185 >> {'loss': 0.0253, 'learning_rate': 6.7758e-07, 'epoch': 4.72, 'throughput': 627.72}

[INFO|callbacks.py:310] 2024-07-10 11:45:59,293 >> {'loss': 0.0117, 'learning_rate': 6.4715e-07, 'epoch': 4.72, 'throughput': 627.69}

[INFO|callbacks.py:310] 2024-07-10 11:46:10,363 >> {'loss': 0.0052, 'learning_rate': 6.1732e-07, 'epoch': 4.73, 'throughput': 627.71}

[INFO|callbacks.py:310] 2024-07-10 11:46:21,413 >> {'loss': 0.0305, 'learning_rate': 5.8810e-07, 'epoch': 4.74, 'throughput': 627.69}

[INFO|callbacks.py:310] 2024-07-10 11:46:32,459 >> {'loss': 0.0018, 'learning_rate': 5.5949e-07, 'epoch': 4.74, 'throughput': 627.73}

[INFO|callbacks.py:310] 2024-07-10 11:46:43,535 >> {'loss': 0.0066, 'learning_rate': 5.3151e-07, 'epoch': 4.75, 'throughput': 627.75}

[INFO|callbacks.py:310] 2024-07-10 11:46:54,587 >> {'loss': 0.0032, 'learning_rate': 5.0416e-07, 'epoch': 4.76, 'throughput': 627.77}

[INFO|callbacks.py:310] 2024-07-10 11:47:05,692 >> {'loss': 0.0089, 'learning_rate': 4.7746e-07, 'epoch': 4.76, 'throughput': 627.77}

[INFO|callbacks.py:310] 2024-07-10 11:47:16,775 >> {'loss': 0.0124, 'learning_rate': 4.5141e-07, 'epoch': 4.77, 'throughput': 627.76}

[INFO|callbacks.py:310] 2024-07-10 11:47:27,872 >> {'loss': 0.0022, 'learning_rate': 4.2601e-07, 'epoch': 4.78, 'throughput': 627.85}

[INFO|callbacks.py:310] 2024-07-10 11:47:38,965 >> {'loss': 0.0063, 'learning_rate': 4.0129e-07, 'epoch': 4.78, 'throughput': 627.86}

[INFO|callbacks.py:310] 2024-07-10 11:47:50,038 >> {'loss': 0.0149, 'learning_rate': 3.7724e-07, 'epoch': 4.79, 'throughput': 627.92}

[INFO|callbacks.py:310] 2024-07-10 11:48:01,087 >> {'loss': 0.0012, 'learning_rate': 3.5388e-07, 'epoch': 4.79, 'throughput': 627.94}

[INFO|callbacks.py:310] 2024-07-10 11:48:12,150 >> {'loss': 0.0127, 'learning_rate': 3.3121e-07, 'epoch': 4.80, 'throughput': 627.91}

[INFO|callbacks.py:310] 2024-07-10 11:48:23,214 >> {'loss': 0.0018, 'learning_rate': 3.0923e-07, 'epoch': 4.81, 'throughput': 627.93}

[INFO|callbacks.py:310] 2024-07-10 11:48:34,274 >> {'loss': 0.0034, 'learning_rate': 2.8797e-07, 'epoch': 4.81, 'throughput': 627.93}

[INFO|callbacks.py:310] 2024-07-10 11:48:45,358 >> {'loss': 0.0137, 'learning_rate': 2.6741e-07, 'epoch': 4.82, 'throughput': 628.03}

[INFO|callbacks.py:310] 2024-07-10 11:48:56,449 >> {'loss': 0.0219, 'learning_rate': 2.4758e-07, 'epoch': 4.83, 'throughput': 628.02}

[INFO|callbacks.py:310] 2024-07-10 11:49:07,559 >> {'loss': 0.0014, 'learning_rate': 2.2847e-07, 'epoch': 4.83, 'throughput': 627.99}

[INFO|callbacks.py:310] 2024-07-10 11:49:18,677 >> {'loss': 0.0190, 'learning_rate': 2.1009e-07, 'epoch': 4.84, 'throughput': 628.01}

[INFO|callbacks.py:310] 2024-07-10 11:49:29,723 >> {'loss': 0.0075, 'learning_rate': 1.9245e-07, 'epoch': 4.85, 'throughput': 628.07}

[INFO|callbacks.py:310] 2024-07-10 11:49:40,779 >> {'loss': 0.0078, 'learning_rate': 1.7556e-07, 'epoch': 4.85, 'throughput': 628.06}

[INFO|callbacks.py:310] 2024-07-10 11:49:51,843 >> {'loss': 0.0244, 'learning_rate': 1.5941e-07, 'epoch': 4.86, 'throughput': 628.07}

[INFO|callbacks.py:310] 2024-07-10 11:50:02,918 >> {'loss': 0.0218, 'learning_rate': 1.4402e-07, 'epoch': 4.87, 'throughput': 628.05}

[INFO|callbacks.py:310] 2024-07-10 11:50:13,988 >> {'loss': 0.0044, 'learning_rate': 1.2939e-07, 'epoch': 4.87, 'throughput': 628.03}

[INFO|callbacks.py:310] 2024-07-10 11:50:25,097 >> {'loss': 0.0122, 'learning_rate': 1.1552e-07, 'epoch': 4.88, 'throughput': 628.01}

[INFO|callbacks.py:310] 2024-07-10 11:50:36,211 >> {'loss': 0.0070, 'learning_rate': 1.0242e-07, 'epoch': 4.88, 'throughput': 627.95}

[INFO|callbacks.py:310] 2024-07-10 11:50:47,321 >> {'loss': 0.0019, 'learning_rate': 9.0093e-08, 'epoch': 4.89, 'throughput': 627.93}

[INFO|callbacks.py:310] 2024-07-10 11:50:58,426 >> {'loss': 0.0147, 'learning_rate': 7.8542e-08, 'epoch': 4.90, 'throughput': 627.93}

[INFO|callbacks.py:310] 2024-07-10 11:51:09,491 >> {'loss': 0.0087, 'learning_rate': 6.7772e-08, 'epoch': 4.90, 'throughput': 627.92}

[INFO|callbacks.py:310] 2024-07-10 11:51:20,554 >> {'loss': 0.0085, 'learning_rate': 5.7785e-08, 'epoch': 4.91, 'throughput': 627.91}

[INFO|callbacks.py:310] 2024-07-10 11:51:31,632 >> {'loss': 0.0026, 'learning_rate': 4.8586e-08, 'epoch': 4.92, 'throughput': 627.94}

[INFO|callbacks.py:310] 2024-07-10 11:51:42,696 >> {'loss': 0.0018, 'learning_rate': 4.0176e-08, 'epoch': 4.92, 'throughput': 627.94}

[INFO|callbacks.py:310] 2024-07-10 11:51:53,786 >> {'loss': 0.0072, 'learning_rate': 3.2559e-08, 'epoch': 4.93, 'throughput': 627.89}

[INFO|callbacks.py:310] 2024-07-10 11:52:04,882 >> {'loss': 0.0079, 'learning_rate': 2.5738e-08, 'epoch': 4.94, 'throughput': 627.83}

[INFO|callbacks.py:310] 2024-07-10 11:52:15,977 >> {'loss': 0.0275, 'learning_rate': 1.9713e-08, 'epoch': 4.94, 'throughput': 627.87}

[INFO|callbacks.py:310] 2024-07-10 11:52:27,074 >> {'loss': 0.0036, 'learning_rate': 1.4488e-08, 'epoch': 4.95, 'throughput': 627.90}

[INFO|callbacks.py:310] 2024-07-10 11:52:38,165 >> {'loss': 0.0501, 'learning_rate': 1.0064e-08, 'epoch': 4.96, 'throughput': 627.95}

[INFO|callbacks.py:310] 2024-07-10 11:52:49,203 >> {'loss': 0.0082, 'learning_rate': 6.4427e-09, 'epoch': 4.96, 'throughput': 627.95}

[INFO|callbacks.py:310] 2024-07-10 11:53:00,266 >> {'loss': 0.0032, 'learning_rate': 3.6247e-09, 'epoch': 4.97, 'throughput': 627.93}

[INFO|callbacks.py:310] 2024-07-10 11:53:11,342 >> {'loss': 0.0017, 'learning_rate': 1.6112e-09, 'epoch': 4.98, 'throughput': 627.96}

[INFO|callbacks.py:310] 2024-07-10 11:53:22,400 >> {'loss': 0.0042, 'learning_rate': 4.0283e-10, 'epoch': 4.98, 'throughput': 627.99}

[INFO|callbacks.py:310] 2024-07-10 11:53:33,482 >> {'loss': 0.0176, 'learning_rate': 0.0000e+00, 'epoch': 4.99, 'throughput': 627.97}

[INFO|trainer.py:3478] 2024-07-10 11:53:40,538 >> Saving model checkpoint to saves/LLaMA2-7B/full/train_2024-07-10-09-28-39/checkpoint-775

[INFO|configuration_utils.py:472] 2024-07-10 11:53:40,541 >> Configuration saved in saves/LLaMA2-7B/full/train_2024-07-10-09-28-39/checkpoint-775/config.json

[INFO|configuration_utils.py:769] 2024-07-10 11:53:40,542 >> Configuration saved in saves/LLaMA2-7B/full/train_2024-07-10-09-28-39/checkpoint-775/generation_config.json

[INFO|modeling_utils.py:2698] 2024-07-10 11:53:53,881 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 3 checkpoint shards. You can find where each parameters has been saved in the index located at saves/LLaMA2-7B/full/train_2024-07-10-09-28-39/checkpoint-775/model.safetensors.index.json.

[INFO|tokenization_utils_base.py:2574] 2024-07-10 11:53:53,882 >> tokenizer config file saved in saves/LLaMA2-7B/full/train_2024-07-10-09-28-39/checkpoint-775/tokenizer_config.json

[INFO|tokenization_utils_base.py:2583] 2024-07-10 11:53:53,882 >> Special tokens file saved in saves/LLaMA2-7B/full/train_2024-07-10-09-28-39/checkpoint-775/special_tokens_map.json

[INFO|trainer.py:2383] 2024-07-10 11:54:25,382 >> Training completed. Do not forget to share your model on huggingface.co/models =)

[INFO|trainer.py:3478] 2024-07-10 11:54:32,799 >> Saving model checkpoint to saves/LLaMA2-7B/full/train_2024-07-10-09-28-39

[INFO|configuration_utils.py:472] 2024-07-10 11:54:32,802 >> Configuration saved in saves/LLaMA2-7B/full/train_2024-07-10-09-28-39/config.json

[INFO|configuration_utils.py:769] 2024-07-10 11:54:32,802 >> Configuration saved in saves/LLaMA2-7B/full/train_2024-07-10-09-28-39/generation_config.json

[INFO|modeling_utils.py:2698] 2024-07-10 11:54:46,750 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 3 checkpoint shards. You can find where each parameters has been saved in the index located at saves/LLaMA2-7B/full/train_2024-07-10-09-28-39/model.safetensors.index.json.

[INFO|tokenization_utils_base.py:2574] 2024-07-10 11:54:46,752 >> tokenizer config file saved in saves/LLaMA2-7B/full/train_2024-07-10-09-28-39/tokenizer_config.json

[INFO|tokenization_utils_base.py:2583] 2024-07-10 11:54:46,752 >> Special tokens file saved in saves/LLaMA2-7B/full/train_2024-07-10-09-28-39/special_tokens_map.json

[WARNING|ploting.py:89] 2024-07-10 11:54:47,873 >> No metric eval_loss to plot.

[WARNING|ploting.py:89] 2024-07-10 11:54:47,874 >> No metric eval_accuracy to plot.

[INFO|modelcard.py:449] 2024-07-10 11:54:47,875 >> Dropping the following result as it does not have all the necessary fields:
{'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}}