[2023-04-18 19:40:04,917] [WARNING] [runner.py:190:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2023-04-18 19:40:04,965] [INFO] [runner.py:540:main] cmd = /home/ubuntu/miniconda3/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMCwgMSwgMiwgM119 --master_addr=127.0.0.1 --master_port=29500 --enable_each_rank_log=None main.py --sft_only_data_path Dahoas/rm-static Dahoas/full-hh-rlhf --model_name_or_path EleutherAI/gpt-neo-1.3B --per_device_train_batch_size 4 --per_device_eval_batch_size 4 --max_seq_len 512 --learning_rate 1e-3 --weight_decay 0.1 --num_train_epochs 1 --gradient_accumulation_steps 2 --lr_scheduler_type cosine --num_warmup_steps 0 --seed 1234 --zero_stage 3 --lora_dim 16 --lora_module_name h. --only_optimize_lora --deepspeed --output_dir ./gpt_neo_1.3b_sft_lora_dim_32_zero_stage3
[2023-04-18 19:40:06,687] [INFO] [launch.py:229:main] WORLD INFO DICT: {'localhost': [0, 1, 2, 3]}
[2023-04-18 19:40:06,688] [INFO] [launch.py:235:main] nnodes=1, num_local_procs=4, node_rank=0
[2023-04-18 19:40:06,688] [INFO] [launch.py:246:main] global_rank_mapping=defaultdict(, {'localhost': [0, 1, 2, 3]})
[2023-04-18 19:40:06,688] [INFO] [launch.py:247:main] dist_world_size=4
[2023-04-18 19:40:06,688] [INFO] [launch.py:249:main] Setting CUDA_VISIBLE_DEVICES=0,1,2,3
[2023-04-18 19:40:09,955] [INFO] [comm.py:586:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
[2023-04-18 19:40:16,152] [INFO] [partition_parameters.py:436:__exit__] finished initializing model with 1.42B parameters
Found cached dataset parquet (/home/ubuntu/.cache/huggingface/datasets/Dahoas___parquet/default-b9d2c4937d617106/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec)
0%| | 0/2 [00:00
[2023-04-18 19:45:41,698] [INFO] [logging.py:96:log_dist] [Rank 0] Creating torch.float16 ZeRO stage 3 optimizer
Using /home/ubuntu/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/ubuntu/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/ubuntu/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
[2023-04-18 19:45:42,744] [INFO] [utils.py:785:see_memory_usage] Stage 3 initialize beginning
[2023-04-18 19:45:42,745] [INFO] [utils.py:786:see_memory_usage] MA 0.97 GB Max_MA 1.33 GB CA 3.57 GB Max_CA 4 GB
[2023-04-18 19:45:42,745] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory: used = 88.33 GB, percent = 47.3%
[2023-04-18 19:45:42,747] [INFO] [stage3.py:113:__init__] Reduce bucket size 500,000,000
[2023-04-18 19:45:42,747] [INFO] [stage3.py:114:__init__] Prefetch bucket size 30000000
Using /home/ubuntu/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Emitting ninja build file /home/ubuntu/.cache/torch_extensions/py39_cu117/utils/build.ninja...
Building extension module utils...
Allowing ninja to set a default number of workers...
(overridable by setting the environment variable MAX_JOBS=N)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
ninja: no work to do.
Loading extension module utils...
Time to load utils op: 1.7378928661346436 seconds
Loading extension module utils...
Time to load utils op: 0.7027928829193115 seconds
Loading extension module utils...
Time to load utils op: 1.8046653270721436 seconds
Loading extension module utils...
Time to load utils op: 1.8046658039093018 seconds
[2023-04-18 19:45:44,299] [INFO] [utils.py:785:see_memory_usage] DeepSpeedZeRoOffload initialize [begin]
[2023-04-18 19:45:44,300] [INFO] [utils.py:786:see_memory_usage] MA 0.97 GB Max_MA 0.97 GB CA 3.57 GB Max_CA 4 GB
[2023-04-18 19:45:44,300] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory: used = 88.3 GB, percent = 47.3%
Parameter Offload: Total persistent parameters: 495616 in 170 params
[2023-04-18 19:45:45,209] [INFO] [utils.py:785:see_memory_usage] DeepSpeedZeRoOffload initialize [end]
[2023-04-18 19:45:45,210] [INFO] [utils.py:786:see_memory_usage] MA 0.81 GB Max_MA 1.02 GB CA 3.57 GB Max_CA 4 GB
[2023-04-18 19:45:45,210] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory: used = 88.3 GB, percent = 47.3%
[2023-04-18 19:45:46,056] [INFO] [utils.py:785:see_memory_usage] Before creating fp16 partitions
[2023-04-18 19:45:46,057] [INFO] [utils.py:786:see_memory_usage] MA 0.81 GB Max_MA 0.81 GB CA 3.57 GB Max_CA 4 GB
[2023-04-18 19:45:46,057] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory: used = 88.3 GB, percent = 47.3%
[2023-04-18 19:45:47,906] [INFO] [utils.py:785:see_memory_usage] After creating fp16 partitions: 1
[2023-04-18 19:45:47,907] [INFO] [utils.py:786:see_memory_usage] MA 0.81 GB Max_MA 0.81 GB
CA 1.77 GB Max_CA 4 GB
[2023-04-18 19:45:47,907] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory: used = 88.33 GB, percent = 47.3%
[2023-04-18 19:45:48,752] [INFO] [utils.py:785:see_memory_usage] Before creating fp32 partitions
[2023-04-18 19:45:48,753] [INFO] [utils.py:786:see_memory_usage] MA 0.81 GB Max_MA 0.81 GB CA 1.77 GB Max_CA 2 GB
[2023-04-18 19:45:48,753] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory: used = 88.33 GB, percent = 47.3%
[2023-04-18 19:45:49,605] [INFO] [utils.py:785:see_memory_usage] After creating fp32 partitions
[2023-04-18 19:45:49,605] [INFO] [utils.py:786:see_memory_usage] MA 0.82 GB Max_MA 0.83 GB CA 1.77 GB Max_CA 2 GB
[2023-04-18 19:45:49,606] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory: used = 88.33 GB, percent = 47.3%
[2023-04-18 19:45:50,455] [INFO] [utils.py:785:see_memory_usage] Before initializing optimizer states
[2023-04-18 19:45:50,455] [INFO] [utils.py:786:see_memory_usage] MA 0.82 GB Max_MA 0.82 GB CA 1.77 GB Max_CA 2 GB
[2023-04-18 19:45:50,455] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory: used = 88.33 GB, percent = 47.3%
[2023-04-18 19:45:51,303] [INFO] [utils.py:785:see_memory_usage] After initializing optimizer states
[2023-04-18 19:45:51,304] [INFO] [utils.py:786:see_memory_usage] MA 0.85 GB Max_MA 0.86 GB CA 1.77 GB Max_CA 2 GB
[2023-04-18 19:45:51,304] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory: used = 88.33 GB, percent = 47.3%
[2023-04-18 19:45:51,305] [INFO] [stage3.py:366:_setup_for_real_optimizer] optimizer state initialized
Using /home/ubuntu/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0003459453582763672 seconds
Using /home/ubuntu/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0003991127014160156 seconds
Using /home/ubuntu/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0003223419189453125 seconds
[2023-04-18 19:45:52,309] [INFO] [utils.py:785:see_memory_usage] After initializing ZeRO optimizer
[2023-04-18 19:45:52,310] [INFO] [utils.py:786:see_memory_usage] MA 1.79 GB Max_MA 1.79 GB CA 2.7 GB Max_CA 3 GB
[2023-04-18 19:45:52,310] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory: used = 88.33 GB, percent = 47.3%
[2023-04-18 19:45:52,311] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam
[2023-04-18 19:45:52,311] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed using client LR scheduler
[2023-04-18 19:45:52,311] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed LR Scheduler =
[2023-04-18 19:45:52,311] [INFO] [logging.py:96:log_dist] [Rank 0] step=0, skipped=0, lr=[0.001], mom=[(0.9, 0.95)]
[2023-04-18 19:45:52,312] [INFO] [config.py:953:print] DeepSpeedEngine configuration:
[2023-04-18 19:45:52,312] [INFO] [config.py:957:print] activation_checkpointing_config { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false }
[2023-04-18 19:45:52,312] [INFO] [config.py:957:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2023-04-18 19:45:52,312] [INFO] [config.py:957:print] amp_enabled .................. False
[2023-04-18 19:45:52,312] [INFO] [config.py:957:print] amp_params ................... False
[2023-04-18 19:45:52,313] [INFO] [config.py:957:print] autotuning_config ............
{ "enabled": false, "start_step": null, "end_step": null, "metric_path": null, "arg_mappings": null, "metric": "throughput", "model_info": null, "results_dir": "autotuning_results", "exps_dir": "autotuning_exps", "overwrite": true, "fast": true, "start_profile_step": 3, "end_profile_step": 5, "tuner_type": "gridsearch", "tuner_early_stopping": 5, "tuner_num_trials": 50, "model_info_path": null, "mp_size": 1, "max_train_batch_size": null, "min_train_batch_size": 1, "max_train_micro_batch_size_per_gpu": 1.024000e+03, "min_train_micro_batch_size_per_gpu": 1, "num_tuning_micro_batch_sizes": 3 }
[2023-04-18 19:45:52,313] [INFO] [config.py:957:print] bfloat16_enabled ............. False
[2023-04-18 19:45:52,313] [INFO] [config.py:957:print] checkpoint_parallel_write_pipeline False
[2023-04-18 19:45:52,313] [INFO] [config.py:957:print] checkpoint_tag_validation_enabled True
[2023-04-18 19:45:52,313] [INFO] [config.py:957:print] checkpoint_tag_validation_fail False
[2023-04-18 19:45:52,313] [INFO] [config.py:957:print] comms_config .................
[2023-04-18 19:45:52,313] [INFO] [config.py:957:print] communication_data_type ...... None
[2023-04-18 19:45:52,313] [INFO] [config.py:957:print] compression_config ...........
{'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}}
[2023-04-18 19:45:52,313] [INFO] [config.py:957:print] curriculum_enabled_legacy .... False
[2023-04-18 19:45:52,313] [INFO] [config.py:957:print] curriculum_params_legacy ..... False
[2023-04-18 19:45:52,313] [INFO] [config.py:957:print] data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}}
[2023-04-18 19:45:52,313] [INFO] [config.py:957:print] data_efficiency_enabled ...... False
[2023-04-18 19:45:52,313] [INFO] [config.py:957:print] dataloader_drop_last ......... False
[2023-04-18 19:45:52,313] [INFO] [config.py:957:print] disable_allgather ............ False
[2023-04-18 19:45:52,313] [INFO] [config.py:957:print] dump_state ...................
False
[2023-04-18 19:45:52,313] [INFO] [config.py:957:print] dynamic_loss_scale_args ...... {'init_scale': 65536, 'scale_window': 100, 'delayed_shift': 2, 'min_scale': 1}
[2023-04-18 19:45:52,313] [INFO] [config.py:957:print] eigenvalue_enabled ........... False
[2023-04-18 19:45:52,313] [INFO] [config.py:957:print] eigenvalue_gas_boundary_resolution 1
[2023-04-18 19:45:52,313] [INFO] [config.py:957:print] eigenvalue_layer_name ........ bert.encoder.layer
[2023-04-18 19:45:52,313] [INFO] [config.py:957:print] eigenvalue_layer_num ......... 0
[2023-04-18 19:45:52,313] [INFO] [config.py:957:print] eigenvalue_max_iter .......... 100
[2023-04-18 19:45:52,313] [INFO] [config.py:957:print] eigenvalue_stability ......... 1e-06
[2023-04-18 19:45:52,313] [INFO] [config.py:957:print] eigenvalue_tol ............... 0.01
[2023-04-18 19:45:52,313] [INFO] [config.py:957:print] eigenvalue_verbose ........... False
[2023-04-18 19:45:52,313] [INFO] [config.py:957:print] elasticity_enabled ........... False
[2023-04-18 19:45:52,313] [INFO] [config.py:957:print] flops_profiler_config ........ { "enabled": false, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null }
[2023-04-18 19:45:52,313] [INFO] [config.py:957:print] fp16_auto_cast ............... False
[2023-04-18 19:45:52,313] [INFO] [config.py:957:print] fp16_enabled ................. True
[2023-04-18 19:45:52,313] [INFO] [config.py:957:print] fp16_master_weights_and_gradients False
[2023-04-18 19:45:52,313] [INFO] [config.py:957:print] global_rank .................. 0
[2023-04-18 19:45:52,313] [INFO] [config.py:957:print] grad_accum_dtype ............. None
[2023-04-18 19:45:52,313] [INFO] [config.py:957:print] gradient_accumulation_steps .. 2
[2023-04-18 19:45:52,313] [INFO] [config.py:957:print] gradient_clipping ............ 1.0
[2023-04-18 19:45:52,313] [INFO] [config.py:957:print] gradient_predivide_factor ....
1.0
[2023-04-18 19:45:52,313] [INFO] [config.py:957:print] hybrid_engine ................ enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8
[2023-04-18 19:45:52,313] [INFO] [config.py:957:print] initial_dynamic_scale ........ 65536
[2023-04-18 19:45:52,313] [INFO] [config.py:957:print] load_universal_checkpoint .... False
[2023-04-18 19:45:52,313] [INFO] [config.py:957:print] loss_scale ................... 0
[2023-04-18 19:45:52,314] [INFO] [config.py:957:print] memory_breakdown ............. False
[2023-04-18 19:45:52,314] [INFO] [config.py:957:print] monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False
[2023-04-18 19:45:52,314] [INFO] [config.py:957:print] nebula_config ................ { "enabled": false, "persistent_storage_path": null, "persistent_time_interval": 100, "num_of_version_in_retention": 2, "enable_nebula_load": true, "load_path": null }
[2023-04-18 19:45:52,314] [INFO] [config.py:957:print] optimizer_legacy_fusion ...... False
[2023-04-18 19:45:52,314] [INFO] [config.py:957:print] optimizer_name ............... None
[2023-04-18 19:45:52,314] [INFO] [config.py:957:print] optimizer_params ............. None
[2023-04-18 19:45:52,314] [INFO] [config.py:957:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2023-04-18 19:45:52,314] [INFO] [config.py:957:print] pld_enabled .................. False
[2023-04-18 19:45:52,314] [INFO] [config.py:957:print] pld_params ................... False
[2023-04-18 19:45:52,314] [INFO] [config.py:957:print] prescale_gradients ...........
False
[2023-04-18 19:45:52,314] [INFO] [config.py:957:print] scheduler_name ............... None
[2023-04-18 19:45:52,314] [INFO] [config.py:957:print] scheduler_params ............. None
[2023-04-18 19:45:52,314] [INFO] [config.py:957:print] sparse_attention ............. None
[2023-04-18 19:45:52,314] [INFO] [config.py:957:print] sparse_gradients_enabled ..... False
[2023-04-18 19:45:52,314] [INFO] [config.py:957:print] steps_per_print .............. 10
[2023-04-18 19:45:52,314] [INFO] [config.py:957:print] train_batch_size ............. 32
[2023-04-18 19:45:52,314] [INFO] [config.py:957:print] train_micro_batch_size_per_gpu 4
[2023-04-18 19:45:52,314] [INFO] [config.py:957:print] use_node_local_storage ....... False
[2023-04-18 19:45:52,314] [INFO] [config.py:957:print] wall_clock_breakdown ......... False
[2023-04-18 19:45:52,314] [INFO] [config.py:957:print] world_size ................... 4
[2023-04-18 19:45:52,314] [INFO] [config.py:957:print] zero_allow_untested_optimizer False
[2023-04-18 19:45:52,314] [INFO] [config.py:957:print] zero_config ..................
stage=3 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500,000,000 allgather_partitions=True allgather_bucket_size=500,000,000 overlap_comm=True load_from_fp32_weights=True elastic_checkpoint=False offload_param=DeepSpeedZeroOffloadParamConfig(device='none', nvme_path=None, buffer_count=5, buffer_size=100,000,000, max_in_cpu=1,000,000,000, pin_memory=False) offload_optimizer=DeepSpeedZeroOffloadOptimizerConfig(device='none', nvme_path=None, buffer_count=4, pin_memory=False, pipeline=False, pipeline_read=False, pipeline_write=False, fast_init=False) sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=30000000 param_persistence_threshold=10000 model_persistence_threshold=sys.maxsize max_live_parameters=30000000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False memory_efficient_linear=False
[2023-04-18 19:45:52,314] [INFO] [config.py:957:print] zero_enabled ................. True
[2023-04-18 19:45:52,314] [INFO] [config.py:957:print] zero_force_ds_cpu_optimizer .. True
[2023-04-18 19:45:52,314] [INFO] [config.py:957:print] zero_optimization_stage ......
3
[2023-04-18 19:45:52,314] [INFO] [config.py:943:print_user_config] json = { "train_batch_size": 32, "train_micro_batch_size_per_gpu": 4, "steps_per_print": 10, "zero_optimization": { "stage": 3, "offload_param": { "device": "none" }, "offload_optimizer": { "device": "none" }, "stage3_param_persistence_threshold": 1.000000e+04, "stage3_max_live_parameters": 3.000000e+07, "stage3_prefetch_bucket_size": 3.000000e+07, "memory_efficient_linear": false }, "fp16": { "enabled": true, "loss_scale_window": 100 }, "gradient_clipping": 1.0, "prescale_gradients": false, "wall_clock_breakdown": false, "hybrid_engine": { "enabled": false, "inference_tp_size": 1, "release_inference_cache": false, "pin_parameters": true, "tp_gather_partition_size": 8 } }
Using /home/ubuntu/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0003299713134765625 seconds
***** Running training *****
***** Evaluating perplexity, Epoch 0/1 *****
ppl: 8769.4208984375
Beginning of Epoch 1/1, Total Micro Batches 14629
Invalidate trace cache @ step 0: expected module 16, but got module 0
[2023-04-18 20:02:28,764] [INFO] [loss_scaler.py:188:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, but hysteresis is 2. Reducing hysteresis to 1
[2023-04-18 20:02:33,001] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step.
Attempted loss scale: 65536, reducing to 32768
[2023-04-18 20:02:51,368] [INFO] [logging.py:96:log_dist] [Rank 0] step=10, skipped=2, lr=[0.0009999970488543783], mom=[(0.9, 0.95)]
[2023-04-18 20:02:51,368] [INFO] [timer.py:199:stop] epoch=0/micro_step=20/global_step=10, RunningAvgSamplesPerSec=11.335973990501, CurrSamplesPerSec=12.168557409509441, MemAllocated=2.0GB, MaxMemAllocated=13.66GB
[2023-04-18 20:03:17,633] [INFO] [logging.py:96:log_dist] [Rank 0] step=20, skipped=2, lr=[0.0009999850598849966], mom=[(0.9, 0.95)]
[2023-04-18 20:03:17,634] [INFO] [timer.py:199:stop] epoch=0/micro_step=40/global_step=20, RunningAvgSamplesPerSec=11.798197133114973, CurrSamplesPerSec=12.210991898321893, MemAllocated=2.0GB, MaxMemAllocated=13.66GB
[2023-04-18 20:03:43,887] [INFO] [logging.py:96:log_dist] [Rank 0] step=30, skipped=2, lr=[0.0009999638488662156], mom=[(0.9, 0.95)]
[2023-04-18 20:03:43,887] [INFO] [timer.py:199:stop] epoch=0/micro_step=60/global_step=30, RunningAvgSamplesPerSec=11.939112684202591, CurrSamplesPerSec=12.228458661515965, MemAllocated=2.0GB, MaxMemAllocated=13.66GB
[2023-04-18 20:04:10,058] [INFO] [logging.py:96:log_dist] [Rank 0] step=40, skipped=2, lr=[0.0009999334161892649], mom=[(0.9, 0.95)]
[2023-04-18 20:04:10,058] [INFO] [timer.py:199:stop] epoch=0/micro_step=80/global_step=40, RunningAvgSamplesPerSec=12.016793269631771, CurrSamplesPerSec=12.21201960762013, MemAllocated=2.0GB, MaxMemAllocated=13.66GB
[2023-04-18 20:04:37,823] [INFO] [logging.py:96:log_dist] [Rank 0] step=50, skipped=2, lr=[0.000999893762415465], mom=[(0.9, 0.95)]
[2023-04-18 20:04:37,824] [INFO] [timer.py:199:stop] epoch=0/micro_step=100/global_step=50, RunningAvgSamplesPerSec=11.9134352055899, CurrSamplesPerSec=12.221342438214569, MemAllocated=2.0GB, MaxMemAllocated=13.66GB
[2023-04-18 20:05:03,963] [INFO] [logging.py:96:log_dist] [Rank 0] step=60, skipped=2, lr=[0.0009998448882762156], mom=[(0.9, 0.95)]
[2023-04-18 20:05:03,963] [INFO] [timer.py:199:stop]
epoch=0/micro_step=120/global_step=60, RunningAvgSamplesPerSec=11.97090345035429, CurrSamplesPerSec=12.246154239191824, MemAllocated=2.0GB, MaxMemAllocated=13.66GB
[2023-04-18 20:05:30,061] [INFO] [logging.py:96:log_dist] [Rank 0] step=70, skipped=2, lr=[0.0009997867946729831], mom=[(0.9, 0.95)]
[2023-04-18 20:05:30,061] [INFO] [timer.py:199:stop] epoch=0/micro_step=140/global_step=70, RunningAvgSamplesPerSec=12.014524512473129, CurrSamplesPerSec=12.29348081852044, MemAllocated=2.0GB, MaxMemAllocated=13.66GB
[2023-04-18 20:05:57,088] [INFO] [logging.py:96:log_dist] [Rank 0] step=80, skipped=2, lr=[0.0009997194826772836], mom=[(0.9, 0.95)]
[2023-04-18 20:05:57,088] [INFO] [timer.py:199:stop] epoch=0/micro_step=160/global_step=80, RunningAvgSamplesPerSec=11.993421014065062, CurrSamplesPerSec=9.151259683150508, MemAllocated=2.0GB, MaxMemAllocated=13.66GB
[2023-04-18 20:06:24,077] [INFO] [logging.py:96:log_dist] [Rank 0] step=90, skipped=2, lr=[0.0009996429535306637], mom=[(0.9, 0.95)]
[2023-04-18 20:06:24,077] [INFO] [timer.py:199:stop] epoch=0/micro_step=180/global_step=90, RunningAvgSamplesPerSec=11.979095227598512, CurrSamplesPerSec=12.27225856629029, MemAllocated=2.0GB, MaxMemAllocated=13.66GB
[2023-04-18 20:06:50,181] [INFO] [logging.py:96:log_dist] [Rank 0] step=100, skipped=2, lr=[0.0009995572086446763], mom=[(0.9, 0.95)]
[2023-04-18 20:06:50,181] [INFO] [timer.py:199:stop] epoch=0/micro_step=200/global_step=100, RunningAvgSamplesPerSec=12.008271174153988, CurrSamplesPerSec=12.241814869286635, MemAllocated=2.0GB, MaxMemAllocated=13.66GB
[2023-04-18 20:07:16,323] [INFO] [logging.py:96:log_dist] [Rank 0] step=110, skipped=2, lr=[0.0009994622496008558], mom=[(0.9, 0.95)]
[2023-04-18 20:07:16,324] [INFO] [timer.py:199:stop] epoch=0/micro_step=220/global_step=110, RunningAvgSamplesPerSec=12.030517355288328, CurrSamplesPerSec=12.215574037488485, MemAllocated=2.0GB, MaxMemAllocated=13.66GB
[2023-04-18 20:07:44,230] [INFO] [logging.py:96:log_dist] [Rank 0] step=120,
skipped=2, lr=[0.000999358078150689], mom=[(0.9, 0.95)]
[2023-04-18 20:07:44,230] [INFO] [timer.py:199:stop] epoch=0/micro_step=240/global_step=120, RunningAvgSamplesPerSec=11.981630642398017, CurrSamplesPerSec=12.30396383735795, MemAllocated=2.0GB, MaxMemAllocated=13.66GB
[2023-04-18 20:08:10,370] [INFO] [logging.py:96:log_dist] [Rank 0] step=130, skipped=2, lr=[0.0009992446962155813], mom=[(0.9, 0.95)]
[2023-04-18 20:08:10,371] [INFO] [timer.py:199:stop] epoch=0/micro_step=260/global_step=130, RunningAvgSamplesPerSec=12.002480729053055, CurrSamplesPerSec=12.252090248186164, MemAllocated=2.0GB, MaxMemAllocated=13.66GB
[2023-04-18 20:08:36,510] [INFO] [logging.py:96:log_dist] [Rank 0] step=140, skipped=2, lr=[0.0009991221058868233], mom=[(0.9, 0.95)]
[2023-04-18 20:08:36,511] [INFO] [timer.py:199:stop] epoch=0/micro_step=280/global_step=140, RunningAvgSamplesPerSec=12.020395152268957, CurrSamplesPerSec=12.205063450217603, MemAllocated=2.0GB, MaxMemAllocated=13.66GB
[2023-04-18 20:09:02,663] [INFO] [logging.py:96:log_dist] [Rank 0] step=150, skipped=2, lr=[0.00099899030942555], mom=[(0.9, 0.95)]
[2023-04-18 20:09:02,663] [INFO] [timer.py:199:stop] epoch=0/micro_step=300/global_step=150, RunningAvgSamplesPerSec=12.035566593920572, CurrSamplesPerSec=12.221544976219851, MemAllocated=2.0GB, MaxMemAllocated=13.66GB
[2023-04-18 20:09:30,553] [INFO] [logging.py:96:log_dist] [Rank 0] step=160, skipped=2, lr=[0.0009988493092627016], mom=[(0.9, 0.95)]
[2023-04-18 20:09:30,554] [INFO] [timer.py:199:stop] epoch=0/micro_step=320/global_step=160, RunningAvgSamplesPerSec=11.999129094488865, CurrSamplesPerSec=12.30201170901501, MemAllocated=2.0GB, MaxMemAllocated=13.66GB
[2023-04-18 20:09:56,642] [INFO] [logging.py:96:log_dist] [Rank 0] step=170, skipped=2, lr=[0.0009986991079989765], mom=[(0.9, 0.95)]
[2023-04-18 20:09:56,642] [INFO] [timer.py:199:stop] epoch=0/micro_step=340/global_step=170, RunningAvgSamplesPerSec=12.015404952691531, CurrSamplesPerSec=12.303538624341728,
MemAllocated=2.0GB, MaxMemAllocated=13.66GB
[2023-04-18 20:10:22,733] [INFO] [logging.py:96:log_dist] [Rank 0] step=180, skipped=2, lr=[0.0009985397084047844], mom=[(0.9, 0.95)]
[2023-04-18 20:10:22,733] [INFO] [timer.py:199:stop] epoch=0/micro_step=360/global_step=180, RunningAvgSamplesPerSec=12.029826033939697, CurrSamplesPerSec=12.282876262340395, MemAllocated=2.0GB, MaxMemAllocated=13.66GB
[2023-04-18 20:10:48,837] [INFO] [logging.py:96:log_dist] [Rank 0] step=190, skipped=2, lr=[0.0009983711134201952], mom=[(0.9, 0.95)]
[2023-04-18 20:10:48,838] [INFO] [timer.py:199:stop] epoch=0/micro_step=380/global_step=190, RunningAvgSamplesPerSec=12.042421296969295, CurrSamplesPerSec=12.264527547726424, MemAllocated=2.0GB, MaxMemAllocated=13.66GB
[2023-04-18 20:11:16,584] [INFO] [logging.py:96:log_dist] [Rank 0] step=200, skipped=2, lr=[0.0009981933261548842], mom=[(0.9, 0.95)]
[2023-04-18 20:11:16,585] [INFO] [timer.py:199:stop] epoch=0/micro_step=400/global_step=200, RunningAvgSamplesPerSec=12.016221297063279, CurrSamplesPerSec=12.248700093596181, MemAllocated=2.0GB, MaxMemAllocated=13.66GB
[2023-04-18 20:11:26,932] [INFO] [loss_scaler.py:188:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1
[2023-04-18 20:11:29,486] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step.
Attempted loss scale: 131072, reducing to 65536
[2023-04-18 20:11:42,561] [INFO] [logging.py:96:log_dist] [Rank 0] step=210, skipped=4, lr=[0.0009980444800984588], mom=[(0.9, 0.95)]
[2023-04-18 20:11:42,561] [INFO] [timer.py:199:stop] epoch=0/micro_step=420/global_step=210, RunningAvgSamplesPerSec=12.031015501593783, CurrSamplesPerSec=12.248146799928127, MemAllocated=2.0GB, MaxMemAllocated=13.66GB
[2023-04-18 20:12:08,658] [INFO] [logging.py:96:log_dist] [Rank 0] step=220, skipped=4, lr=[0.0009978501551053908], mom=[(0.9, 0.95)]
[2023-04-18 20:12:08,659] [INFO] [timer.py:199:stop] epoch=0/micro_step=440/global_step=220, RunningAvgSamplesPerSec=12.04195836311696, CurrSamplesPerSec=12.233970488305204, MemAllocated=2.0GB, MaxMemAllocated=13.66GB
[2023-04-18 20:12:34,759] [INFO] [logging.py:96:log_dist] [Rank 0] step=230, skipped=4, lr=[0.000997646647440495], mom=[(0.9, 0.95)]
[2023-04-18 20:12:34,759] [INFO] [timer.py:199:stop] epoch=0/micro_step=460/global_step=230, RunningAvgSamplesPerSec=12.051900473135651, CurrSamplesPerSec=12.26839968896128, MemAllocated=2.0GB, MaxMemAllocated=13.66GB
[2023-04-18 20:13:02,592] [INFO] [logging.py:96:log_dist] [Rank 0] step=240, skipped=4, lr=[0.0009974339608573982], mom=[(0.9, 0.95)]
[2023-04-18 20:13:02,592] [INFO] [timer.py:199:stop] epoch=0/micro_step=480/global_step=240, RunningAvgSamplesPerSec=12.028027738986369, CurrSamplesPerSec=12.359944776209568, MemAllocated=2.0GB, MaxMemAllocated=13.66GB
[2023-04-18 20:13:28,673] [INFO] [logging.py:96:log_dist] [Rank 0] step=250, skipped=4, lr=[0.0009972120992790313], mom=[(0.9, 0.95)]
[2023-04-18 20:13:28,673] [INFO] [timer.py:199:stop] epoch=0/micro_step=500/global_step=250, RunningAvgSamplesPerSec=12.038069663293612, CurrSamplesPerSec=12.276255735327421, MemAllocated=2.0GB, MaxMemAllocated=13.66GB
[2023-04-18 20:13:54,734] [INFO] [logging.py:96:log_dist] [Rank 0] step=260, skipped=4, lr=[0.0009969810667975524], mom=[(0.9, 0.95)]
[2023-04-18 20:13:54,734] [INFO] [timer.py:199:stop]
epoch=0/micro_step=520/global_step=260, RunningAvgSamplesPerSec=12.047710954567755, CurrSamplesPerSec=12.3150647299641, MemAllocated=2.0GB, MaxMemAllocated=13.66GB
[2023-04-18 20:14:20,836] [INFO] [logging.py:96:log_dist] [Rank 0] step=270, skipped=4, lr=[0.0009967408676742752], mom=[(0.9, 0.95)]
[2023-04-18 20:14:20,836] [INFO] [timer.py:199:stop] epoch=0/micro_step=540/global_step=270, RunningAvgSamplesPerSec=12.055941268397993, CurrSamplesPerSec=12.244962141881025, MemAllocated=2.0GB, MaxMemAllocated=13.66GB
[2023-04-18 20:14:48,604] [INFO] [logging.py:96:log_dist] [Rank 0] step=280, skipped=4, lr=[0.0009964915063395883], mom=[(0.9, 0.95)]
[2023-04-18 20:14:48,604] [INFO] [timer.py:199:stop] epoch=0/micro_step=560/global_step=280, RunningAvgSamplesPerSec=12.036396146501568, CurrSamplesPerSec=12.249838135481378, MemAllocated=2.0GB, MaxMemAllocated=13.66GB
[2023-04-18 20:15:14,701] [INFO] [logging.py:96:log_dist] [Rank 0] step=290, skipped=4, lr=[0.0009962329873928742], mom=[(0.9, 0.95)]
[2023-04-18 20:15:14,702] [INFO] [timer.py:199:stop] epoch=0/micro_step=580/global_step=290, RunningAvgSamplesPerSec=12.044495094823716, CurrSamplesPerSec=12.32235842658575, MemAllocated=2.0GB, MaxMemAllocated=13.66GB
[2023-04-18 20:15:40,737] [INFO] [logging.py:96:log_dist] [Rank 0] step=300, skipped=4, lr=[0.0009959653156024245], mom=[(0.9, 0.95)]
[2023-04-18 20:15:40,738] [INFO] [timer.py:199:stop] epoch=0/micro_step=600/global_step=300, RunningAvgSamplesPerSec=12.053005665707325, CurrSamplesPerSec=12.330150088265599, MemAllocated=2.0GB, MaxMemAllocated=13.66GB
[2023-04-18 20:15:56,313] [INFO] [loss_scaler.py:188:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1
[2023-04-18 20:15:58,863] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step.
Attempted loss scale: 131072, reducing to 65536 [2023-04-18 20:16:06,708] [INFO] [logging.py:96:log_dist] [Rank 0] step=310, skipped=6, lr=[0.0009957445914346443], mom=[(0.9, 0.95)] [2023-04-18 20:16:06,708] [INFO] [timer.py:199:stop] epoch=0/micro_step=620/global_step=310, RunningAvgSamplesPerSec=12.06194150263518, CurrSamplesPerSec=12.287195302700209, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:16:34,357] [INFO] [logging.py:96:log_dist] [Rank 0] step=320, skipped=6, lr=[0.0009954604570803836], mom=[(0.9, 0.95)] [2023-04-18 20:16:34,357] [INFO] [timer.py:199:stop] epoch=0/micro_step=640/global_step=320, RunningAvgSamplesPerSec=12.046345160584982, CurrSamplesPerSec=12.268276334668375, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:17:00,381] [INFO] [logging.py:96:log_dist] [Rank 0] step=330, skipped=6, lr=[0.0009951671841314383], mom=[(0.9, 0.95)] [2023-04-18 20:17:00,382] [INFO] [timer.py:199:stop] epoch=0/micro_step=660/global_step=330, RunningAvgSamplesPerSec=12.054178449766308, CurrSamplesPerSec=12.326675848306767, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:17:26,411] [INFO] [logging.py:96:log_dist] [Rank 0] step=340, skipped=6, lr=[0.000994864777997125], mom=[(0.9, 0.95)] [2023-04-18 20:17:26,411] [INFO] [timer.py:199:stop] epoch=0/micro_step=680/global_step=340, RunningAvgSamplesPerSec=12.061494657613391, CurrSamplesPerSec=12.297391547347178, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:17:53,189] [INFO] [logging.py:96:log_dist] [Rank 0] step=350, skipped=6, lr=[0.0009945532442552198], mom=[(0.9, 0.95)] [2023-04-18 20:17:53,189] [INFO] [timer.py:199:stop] epoch=0/micro_step=700/global_step=350, RunningAvgSamplesPerSec=12.058615181406726, CurrSamplesPerSec=9.530630717312462, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:18:20,067] [INFO] [logging.py:96:log_dist] [Rank 0] step=360, skipped=6, lr=[0.000994232588651853], mom=[(0.9, 0.95)] [2023-04-18 20:18:20,068] [INFO] [timer.py:199:stop] 
epoch=0/micro_step=720/global_step=360, RunningAvgSamplesPerSec=12.054625606904096, CurrSamplesPerSec=12.283160656613864, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:18:46,114] [INFO] [logging.py:96:log_dist] [Rank 0] step=370, skipped=6, lr=[0.000993902817101405], mom=[(0.9, 0.95)] [2023-04-18 20:18:46,115] [INFO] [timer.py:199:stop] epoch=0/micro_step=740/global_step=370, RunningAvgSamplesPerSec=12.061112779409191, CurrSamplesPerSec=12.274136159438076, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:19:12,120] [INFO] [logging.py:96:log_dist] [Rank 0] step=380, skipped=6, lr=[0.0009935639356863967], mom=[(0.9, 0.95)] [2023-04-18 20:19:12,121] [INFO] [timer.py:199:stop] epoch=0/micro_step=760/global_step=380, RunningAvgSamplesPerSec=12.067757257006717, CurrSamplesPerSec=12.2720509776936, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:19:39,032] [INFO] [logging.py:96:log_dist] [Rank 0] step=390, skipped=6, lr=[0.0009932159506573768], mom=[(0.9, 0.95)] [2023-04-18 20:19:39,032] [INFO] [timer.py:199:stop] epoch=0/micro_step=780/global_step=390, RunningAvgSamplesPerSec=12.063444959368105, CurrSamplesPerSec=12.299884357869095, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:20:05,056] [INFO] [logging.py:96:log_dist] [Rank 0] step=400, skipped=6, lr=[0.000992858868432808], mom=[(0.9, 0.95)] [2023-04-18 20:20:05,056] [INFO] [timer.py:199:stop] epoch=0/micro_step=800/global_step=400, RunningAvgSamplesPerSec=12.069494928147808, CurrSamplesPerSec=12.338559552208057, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:20:25,784] [INFO] [loss_scaler.py:188:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 [2023-04-18 20:20:28,338] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 131072, reducing to 65536 [2023-04-18 20:20:30,959] [INFO] [logging.py:96:log_dist] [Rank 0] step=410, skipped=8, lr=[0.000992566657092687], mom=[(0.9, 0.95)] [2023-04-18 20:20:30,960] [INFO] [timer.py:199:stop] epoch=0/micro_step=820/global_step=410, RunningAvgSamplesPerSec=12.076600427191641, CurrSamplesPerSec=12.220857264581303, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:20:56,971] [INFO] [logging.py:96:log_dist] [Rank 0] step=420, skipped=8, lr=[0.0009921932166261764], mom=[(0.9, 0.95)] [2023-04-18 20:20:56,971] [INFO] [timer.py:199:stop] epoch=0/micro_step=840/global_step=420, RunningAvgSamplesPerSec=12.082201106501527, CurrSamplesPerSec=12.318397887315864, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:21:24,801] [INFO] [logging.py:96:log_dist] [Rank 0] step=430, skipped=8, lr=[0.000991810697828088], mom=[(0.9, 0.95)] [2023-04-18 20:21:24,801] [INFO] [timer.py:199:stop] epoch=0/micro_step=860/global_step=430, RunningAvgSamplesPerSec=12.068168483681742, CurrSamplesPerSec=12.327085679563455, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:21:50,792] [INFO] [logging.py:96:log_dist] [Rank 0] step=440, skipped=8, lr=[0.0009914191077538468], mom=[(0.9, 0.95)] [2023-04-18 20:21:50,792] [INFO] [timer.py:199:stop] epoch=0/micro_step=880/global_step=440, RunningAvgSamplesPerSec=12.073906282954773, CurrSamplesPerSec=12.286187517381011, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:22:16,782] [INFO] [logging.py:96:log_dist] [Rank 0] step=450, skipped=8, lr=[0.0009910184536261945], mom=[(0.9, 0.95)] [2023-04-18 20:22:16,783] [INFO] [timer.py:199:stop] epoch=0/micro_step=900/global_step=450, RunningAvgSamplesPerSec=12.079397149335867, CurrSamplesPerSec=12.342092701517592, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:22:42,776] [INFO] [logging.py:96:log_dist] [Rank 0] step=460, skipped=8, lr=[0.0009906087428350565], mom=[(0.9, 0.95)] [2023-04-18 20:22:42,776] [INFO] [timer.py:199:stop] 
epoch=0/micro_step=920/global_step=460, RunningAvgSamplesPerSec=12.084620684970092, CurrSamplesPerSec=12.363165617179915, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:23:10,549] [INFO] [logging.py:96:log_dist] [Rank 0] step=470, skipped=8, lr=[0.0009901899829374047], mom=[(0.9, 0.95)] [2023-04-18 20:23:10,549] [INFO] [timer.py:199:stop] epoch=0/micro_step=940/global_step=470, RunningAvgSamplesPerSec=12.072281583920605, CurrSamplesPerSec=12.291929381111668, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:23:36,525] [INFO] [logging.py:96:log_dist] [Rank 0] step=480, skipped=8, lr=[0.000989762181657119], mom=[(0.9, 0.95)] [2023-04-18 20:23:36,525] [INFO] [timer.py:199:stop] epoch=0/micro_step=960/global_step=480, RunningAvgSamplesPerSec=12.077599244699185, CurrSamplesPerSec=12.321703438696606, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:24:02,564] [INFO] [logging.py:96:log_dist] [Rank 0] step=490, skipped=8, lr=[0.0009893253468848443], mom=[(0.9, 0.95)] [2023-04-18 20:24:02,565] [INFO] [timer.py:199:stop] epoch=0/micro_step=980/global_step=490, RunningAvgSamplesPerSec=12.082109626029117, CurrSamplesPerSec=12.275131864786156, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:24:28,593] [INFO] [logging.py:96:log_dist] [Rank 0] step=500, skipped=8, lr=[0.0009888794866778447], mom=[(0.9, 0.95)] [2023-04-18 20:24:28,593] [INFO] [timer.py:199:stop] epoch=0/micro_step=1000/global_step=500, RunningAvgSamplesPerSec=12.08654395616738, CurrSamplesPerSec=12.36849068592284, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:24:56,154] [INFO] [loss_scaler.py:188:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. 
Reducing hysteresis to 1 [2023-04-18 20:24:56,155] [INFO] [logging.py:96:log_dist] [Rank 0] step=510, skipped=9, lr=[0.0009884705025383356], mom=[(0.9, 0.95)] [2023-04-18 20:24:56,155] [INFO] [timer.py:199:stop] epoch=0/micro_step=1020/global_step=510, RunningAvgSamplesPerSec=12.077031653904772, CurrSamplesPerSec=12.651527559720142, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:24:58,706] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, reducing to 65536 [2023-04-18 20:25:22,166] [INFO] [logging.py:96:log_dist] [Rank 0] step=520, skipped=10, lr=[0.0009880542205660853], mom=[(0.9, 0.95)] [2023-04-18 20:25:22,167] [INFO] [timer.py:199:stop] epoch=0/micro_step=1040/global_step=520, RunningAvgSamplesPerSec=12.081534654358283, CurrSamplesPerSec=12.297560557715082, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:25:48,177] [INFO] [logging.py:96:log_dist] [Rank 0] step=530, skipped=10, lr=[0.0009875831334229218], mom=[(0.9, 0.95)] [2023-04-18 20:25:48,178] [INFO] [timer.py:199:stop] epoch=0/micro_step=1060/global_step=530, RunningAvgSamplesPerSec=12.085876179033589, CurrSamplesPerSec=12.339433007153811, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:26:14,221] [INFO] [logging.py:96:log_dist] [Rank 0] step=540, skipped=10, lr=[0.000987103052979551], mom=[(0.9, 0.95)] [2023-04-18 20:26:14,222] [INFO] [timer.py:199:stop] epoch=0/micro_step=1080/global_step=540, RunningAvgSamplesPerSec=12.089779401633521, CurrSamplesPerSec=12.314363064065773, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:26:42,001] [INFO] [logging.py:96:log_dist] [Rank 0] step=550, skipped=10, lr=[0.0009866139880908885], mom=[(0.9, 0.95)] [2023-04-18 20:26:42,002] [INFO] [timer.py:199:stop] epoch=0/micro_step=1100/global_step=550, RunningAvgSamplesPerSec=12.079082366636198, CurrSamplesPerSec=12.292214194771242, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:27:07,982] [INFO] 
[logging.py:96:log_dist] [Rank 0] step=560, skipped=10, lr=[0.0009861159477775653], mom=[(0.9, 0.95)] [2023-04-18 20:27:07,983] [INFO] [timer.py:199:stop] epoch=0/micro_step=1120/global_step=560, RunningAvgSamplesPerSec=12.083480319943893, CurrSamplesPerSec=12.396626346315621, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:27:34,016] [INFO] [logging.py:96:log_dist] [Rank 0] step=570, skipped=10, lr=[0.0009856089412257605], mom=[(0.9, 0.95)] [2023-04-18 20:27:34,017] [INFO] [timer.py:199:stop] epoch=0/micro_step=1140/global_step=570, RunningAvgSamplesPerSec=12.087294999231815, CurrSamplesPerSec=12.31723576801541, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:28:00,063] [INFO] [logging.py:96:log_dist] [Rank 0] step=580, skipped=10, lr=[0.0009850929777870322], mom=[(0.9, 0.95)] [2023-04-18 20:28:00,064] [INFO] [timer.py:199:stop] epoch=0/micro_step=1160/global_step=580, RunningAvgSamplesPerSec=12.09088260135357, CurrSamplesPerSec=12.278715270783097, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:28:27,844] [INFO] [logging.py:96:log_dist] [Rank 0] step=590, skipped=10, lr=[0.0009845680669781458], mom=[(0.9, 0.95)] [2023-04-18 20:28:27,844] [INFO] [timer.py:199:stop] epoch=0/micro_step=1180/global_step=590, RunningAvgSamplesPerSec=12.080887660309184, CurrSamplesPerSec=12.33638325900005, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:28:53,876] [INFO] [logging.py:96:log_dist] [Rank 0] step=600, skipped=10, lr=[0.0009840342184808972], mom=[(0.9, 0.95)] [2023-04-18 20:28:53,877] [INFO] [timer.py:199:stop] epoch=0/micro_step=1200/global_step=600, RunningAvgSamplesPerSec=12.084565595269366, CurrSamplesPerSec=12.256546592422382, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:29:19,891] [INFO] [logging.py:96:log_dist] [Rank 0] step=610, skipped=10, lr=[0.000983491442141935], mom=[(0.9, 0.95)] [2023-04-18 20:29:19,892] [INFO] [timer.py:199:stop] epoch=0/micro_step=1220/global_step=610, 
RunningAvgSamplesPerSec=12.088258020173946, CurrSamplesPerSec=12.272264176890374, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:29:25,012] [INFO] [loss_scaler.py:188:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 [2023-04-18 20:29:27,562] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, reducing to 65536 [2023-04-18 20:29:45,784] [INFO] [logging.py:96:log_dist] [Rank 0] step=620, skipped=12, lr=[0.00098305079974681], mom=[(0.9, 0.95)] [2023-04-18 20:29:45,785] [INFO] [timer.py:199:stop] epoch=0/micro_step=1240/global_step=620, RunningAvgSamplesPerSec=12.092733405459684, CurrSamplesPerSec=12.329831799029343, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:30:13,586] [INFO] [logging.py:96:log_dist] [Rank 0] step=630, skipped=12, lr=[0.00098249197863183], mom=[(0.9, 0.95)] [2023-04-18 20:30:13,586] [INFO] [timer.py:199:stop] epoch=0/micro_step=1260/global_step=630, RunningAvgSamplesPerSec=12.083190452523965, CurrSamplesPerSec=12.303177723886426, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:30:39,596] [INFO] [logging.py:96:log_dist] [Rank 0] step=640, skipped=12, lr=[0.0009819242581212105], mom=[(0.9, 0.95)] [2023-04-18 20:30:39,597] [INFO] [timer.py:199:stop] epoch=0/micro_step=1280/global_step=640, RunningAvgSamplesPerSec=12.086761545518334, CurrSamplesPerSec=12.328188510029337, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:31:05,633] [INFO] [logging.py:96:log_dist] [Rank 0] step=650, skipped=12, lr=[0.0009813476486863575], mom=[(0.9, 0.95)] [2023-04-18 20:31:05,634] [INFO] [timer.py:199:stop] epoch=0/micro_step=1300/global_step=650, RunningAvgSamplesPerSec=12.090036425081548, CurrSamplesPerSec=12.313324835631487, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:31:33,259] [INFO] [logging.py:96:log_dist] [Rank 0] step=660, skipped=12, 
lr=[0.0009807621609626304], mom=[(0.9, 0.95)] [2023-04-18 20:31:33,259] [INFO] [timer.py:199:stop] epoch=0/micro_step=1320/global_step=660, RunningAvgSamplesPerSec=12.082203897972962, CurrSamplesPerSec=12.314776596610471, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:31:59,272] [INFO] [logging.py:96:log_dist] [Rank 0] step=670, skipped=12, lr=[0.0009801678057491455], mom=[(0.9, 0.95)] [2023-04-18 20:31:59,272] [INFO] [timer.py:199:stop] epoch=0/micro_step=1340/global_step=670, RunningAvgSamplesPerSec=12.085608396215243, CurrSamplesPerSec=12.247506383009787, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:32:25,365] [INFO] [logging.py:96:log_dist] [Rank 0] step=680, skipped=12, lr=[0.000979564594008576], mom=[(0.9, 0.95)] [2023-04-18 20:32:25,365] [INFO] [timer.py:199:stop] epoch=0/micro_step=1360/global_step=680, RunningAvgSamplesPerSec=12.088378625141418, CurrSamplesPerSec=12.26181043735079, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:32:51,408] [INFO] [logging.py:96:log_dist] [Rank 0] step=690, skipped=12, lr=[0.0009789525368669519], mom=[(0.9, 0.95)] [2023-04-18 20:32:51,409] [INFO] [timer.py:199:stop] epoch=0/micro_step=1380/global_step=690, RunningAvgSamplesPerSec=12.09139915140873, CurrSamplesPerSec=12.357919090812999, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:33:19,207] [INFO] [logging.py:96:log_dist] [Rank 0] step=700, skipped=12, lr=[0.0009783316456134527], mom=[(0.9, 0.95)] [2023-04-18 20:33:19,207] [INFO] [timer.py:199:stop] epoch=0/micro_step=1400/global_step=700, RunningAvgSamplesPerSec=12.082852522313205, CurrSamplesPerSec=12.322180815044074, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:33:45,209] [INFO] [logging.py:96:log_dist] [Rank 0] step=710, skipped=12, lr=[0.0009777019317002004], mom=[(0.9, 0.95)] [2023-04-18 20:33:45,210] [INFO] [timer.py:199:stop] epoch=0/micro_step=1420/global_step=710, RunningAvgSamplesPerSec=12.086126904461215, CurrSamplesPerSec=12.318890836827737, 
MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:33:55,553] [INFO] [loss_scaler.py:188:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 [2023-04-18 20:33:58,101] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, reducing to 65536 [2023-04-18 20:34:11,128] [INFO] [logging.py:96:log_dist] [Rank 0] step=720, skipped=14, lr=[0.0009771918160542945], mom=[(0.9, 0.95)] [2023-04-18 20:34:11,128] [INFO] [timer.py:199:stop] epoch=0/micro_step=1440/global_step=720, RunningAvgSamplesPerSec=12.08985107984681, CurrSamplesPerSec=12.30357245984496, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:34:37,142] [INFO] [logging.py:96:log_dist] [Rank 0] step=730, skipped=14, lr=[0.0009765462507321333], mom=[(0.9, 0.95)] [2023-04-18 20:34:37,142] [INFO] [timer.py:199:stop] epoch=0/micro_step=1460/global_step=730, RunningAvgSamplesPerSec=12.092870079596592, CurrSamplesPerSec=12.273542405294739, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:35:04,957] [INFO] [logging.py:96:log_dist] [Rank 0] step=740, skipped=14, lr=[0.0009758918956812024], mom=[(0.9, 0.95)] [2023-04-18 20:35:04,958] [INFO] [timer.py:199:stop] epoch=0/micro_step=1480/global_step=740, RunningAvgSamplesPerSec=12.084660964807226, CurrSamplesPerSec=12.342031415792531, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:35:31,023] [INFO] [logging.py:96:log_dist] [Rank 0] step=750, skipped=14, lr=[0.0009752287629708516], mom=[(0.9, 0.95)] [2023-04-18 20:35:31,023] [INFO] [timer.py:199:stop] epoch=0/micro_step=1500/global_step=750, RunningAvgSamplesPerSec=12.08734989430793, CurrSamplesPerSec=12.344402709155725, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:35:57,035] [INFO] [logging.py:96:log_dist] [Rank 0] step=760, skipped=14, lr=[0.0009745568648323315], mom=[(0.9, 0.95)] [2023-04-18 20:35:57,036] [INFO] [timer.py:199:stop] 
epoch=0/micro_step=1520/global_step=760, RunningAvgSamplesPerSec=12.090290682213423, CurrSamplesPerSec=12.379345272627829, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:36:23,004] [INFO] [logging.py:96:log_dist] [Rank 0] step=770, skipped=14, lr=[0.0009738762136585684], mom=[(0.9, 0.95)] [2023-04-18 20:36:23,004] [INFO] [timer.py:199:stop] epoch=0/micro_step=1540/global_step=770, RunningAvgSamplesPerSec=12.093419993075683, CurrSamplesPerSec=12.36084516752489, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:36:50,600] [INFO] [logging.py:96:log_dist] [Rank 0] step=780, skipped=14, lr=[0.0009731868220039344], mom=[(0.9, 0.95)] [2023-04-18 20:36:50,600] [INFO] [timer.py:199:stop] epoch=0/micro_step=1560/global_step=780, RunningAvgSamplesPerSec=12.086911031239993, CurrSamplesPerSec=12.307881239681933, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:37:16,642] [INFO] [logging.py:96:log_dist] [Rank 0] step=790, skipped=14, lr=[0.0009724887025840177], mom=[(0.9, 0.95)] [2023-04-18 20:37:16,643] [INFO] [timer.py:199:stop] epoch=0/micro_step=1580/global_step=790, RunningAvgSamplesPerSec=12.089571818248377, CurrSamplesPerSec=12.264008682721563, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:37:42,634] [INFO] [logging.py:96:log_dist] [Rank 0] step=800, skipped=14, lr=[0.0009717818682753865], mom=[(0.9, 0.95)] [2023-04-18 20:37:42,634] [INFO] [timer.py:199:stop] epoch=0/micro_step=1600/global_step=800, RunningAvgSamplesPerSec=12.092457883310368, CurrSamplesPerSec=12.423264126750169, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:38:08,539] [INFO] [logging.py:96:log_dist] [Rank 0] step=810, skipped=14, lr=[0.0009710663321153521], mom=[(0.9, 0.95)] [2023-04-18 20:38:08,539] [INFO] [timer.py:199:stop] epoch=0/micro_step=1620/global_step=810, RunningAvgSamplesPerSec=12.095764250377693, CurrSamplesPerSec=12.325909469061324, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:38:25,529] [INFO] 
[loss_scaler.py:188:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 [2023-04-18 20:38:28,078] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, reducing to 65536 [2023-04-18 20:38:35,931] [INFO] [logging.py:96:log_dist] [Rank 0] step=820, skipped=16, lr=[0.0009704876467178586], mom=[(0.9, 0.95)] [2023-04-18 20:38:35,932] [INFO] [timer.py:199:stop] epoch=0/micro_step=1640/global_step=820, RunningAvgSamplesPerSec=12.090678762465215, CurrSamplesPerSec=12.246010102627535, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:39:02,038] [INFO] [logging.py:96:log_dist] [Rank 0] step=830, skipped=16, lr=[0.0009697564805914922], mom=[(0.9, 0.95)] [2023-04-18 20:39:02,039] [INFO] [timer.py:199:stop] epoch=0/micro_step=1660/global_step=830, RunningAvgSamplesPerSec=12.09281225136637, CurrSamplesPerSec=12.367109418889896, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:39:28,006] [INFO] [logging.py:96:log_dist] [Rank 0] step=840, skipped=16, lr=[0.0009690166499712892], mom=[(0.9, 0.95)] [2023-04-18 20:39:28,006] [INFO] [timer.py:199:stop] epoch=0/micro_step=1680/global_step=840, RunningAvgSamplesPerSec=12.095655450312861, CurrSamplesPerSec=12.360471790857497, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:39:53,972] [INFO] [logging.py:96:log_dist] [Rank 0] step=850, skipped=16, lr=[0.0009682681685031668], mom=[(0.9, 0.95)] [2023-04-18 20:39:53,973] [INFO] [timer.py:199:stop] epoch=0/micro_step=1700/global_step=850, RunningAvgSamplesPerSec=12.098436984171046, CurrSamplesPerSec=12.285820886505359, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:40:21,833] [INFO] [logging.py:96:log_dist] [Rank 0] step=860, skipped=16, lr=[0.0009675110499926034], mom=[(0.9, 0.95)] [2023-04-18 20:40:21,833] [INFO] [timer.py:199:stop] epoch=0/micro_step=1720/global_step=860, 
RunningAvgSamplesPerSec=12.091064402748923, CurrSamplesPerSec=12.234263774171101, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:40:47,943] [INFO] [logging.py:96:log_dist] [Rank 0] step=870, skipped=16, lr=[0.0009667453084043849], mom=[(0.9, 0.95)] [2023-04-18 20:40:47,943] [INFO] [timer.py:199:stop] epoch=0/micro_step=1740/global_step=870, RunningAvgSamplesPerSec=12.093077142929516, CurrSamplesPerSec=12.33473368903281, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:41:13,993] [INFO] [logging.py:96:log_dist] [Rank 0] step=880, skipped=16, lr=[0.000965970957862347], mom=[(0.9, 0.95)] [2023-04-18 20:41:13,994] [INFO] [timer.py:199:stop] epoch=0/micro_step=1760/global_step=880, RunningAvgSamplesPerSec=12.095354602421352, CurrSamplesPerSec=12.321611813818414, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:41:40,063] [INFO] [logging.py:96:log_dist] [Rank 0] step=890, skipped=16, lr=[0.0009651880126491142], mom=[(0.9, 0.95)] [2023-04-18 20:41:40,064] [INFO] [timer.py:199:stop] epoch=0/micro_step=1780/global_step=890, RunningAvgSamplesPerSec=12.097484335741479, CurrSamplesPerSec=12.292291873514554, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:42:07,904] [INFO] [logging.py:96:log_dist] [Rank 0] step=900, skipped=16, lr=[0.000964396487205837], mom=[(0.9, 0.95)] [2023-04-18 20:42:07,905] [INFO] [timer.py:199:stop] epoch=0/micro_step=1800/global_step=900, RunningAvgSamplesPerSec=12.090548985570663, CurrSamplesPerSec=12.249117052963316, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:42:33,919] [INFO] [logging.py:96:log_dist] [Rank 0] step=910, skipped=16, lr=[0.000963596396131925], mom=[(0.9, 0.95)] [2023-04-18 20:42:33,920] [INFO] [timer.py:199:stop] epoch=0/micro_step=1820/global_step=910, RunningAvgSamplesPerSec=12.092959816411017, CurrSamplesPerSec=12.341308517155003, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:42:54,691] [INFO] [loss_scaler.py:188:update_scale] [deepspeed] OVERFLOW! 
Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 [2023-04-18 20:42:57,242] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, reducing to 65536 [2023-04-18 20:42:59,870] [INFO] [logging.py:96:log_dist] [Rank 0] step=920, skipped=18, lr=[0.0009629501659304096], mom=[(0.9, 0.95)] [2023-04-18 20:42:59,870] [INFO] [timer.py:199:stop] epoch=0/micro_step=1840/global_step=920, RunningAvgSamplesPerSec=12.095637722398463, CurrSamplesPerSec=12.191627801523989, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:43:27,671] [INFO] [logging.py:96:log_dist] [Rank 0] step=930, skipped=18, lr=[0.0009621346940159981], mom=[(0.9, 0.95)] [2023-04-18 20:43:27,671] [INFO] [timer.py:199:stop] epoch=0/micro_step=1860/global_step=930, RunningAvgSamplesPerSec=12.089145468128093, CurrSamplesPerSec=9.145276145067985, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:43:53,684] [INFO] [logging.py:96:log_dist] [Rank 0] step=940, skipped=18, lr=[0.0009613106981889352], mom=[(0.9, 0.95)] [2023-04-18 20:43:53,684] [INFO] [timer.py:199:stop] epoch=0/micro_step=1880/global_step=940, RunningAvgSamplesPerSec=12.091502342562391, CurrSamplesPerSec=12.265973429123234, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:44:19,742] [INFO] [logging.py:96:log_dist] [Rank 0] step=950, skipped=18, lr=[0.000960478193647536], mom=[(0.9, 0.95)] [2023-04-18 20:44:19,742] [INFO] [timer.py:199:stop] epoch=0/micro_step=1900/global_step=950, RunningAvgSamplesPerSec=12.093590878944898, CurrSamplesPerSec=12.272636732215714, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:44:45,771] [INFO] [logging.py:96:log_dist] [Rank 0] step=960, skipped=18, lr=[0.000959637195747055], mom=[(0.9, 0.95)] [2023-04-18 20:44:45,772] [INFO] [timer.py:199:stop] epoch=0/micro_step=1920/global_step=960, RunningAvgSamplesPerSec=12.095775115785447, CurrSamplesPerSec=12.310032809883632, 
MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:45:13,460] [INFO] [logging.py:96:log_dist] [Rank 0] step=970, skipped=18, lr=[0.0009587877199994042], mom=[(0.9, 0.95)] [2023-04-18 20:45:13,461] [INFO] [timer.py:199:stop] epoch=0/micro_step=1940/global_step=970, RunningAvgSamplesPerSec=12.090081312969048, CurrSamplesPerSec=12.303597272665604, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:45:39,505] [INFO] [logging.py:96:log_dist] [Rank 0] step=980, skipped=18, lr=[0.000957929782072866], mom=[(0.9, 0.95)] [2023-04-18 20:45:39,505] [INFO] [timer.py:199:stop] epoch=0/micro_step=1960/global_step=980, RunningAvgSamplesPerSec=12.092184789283008, CurrSamplesPerSec=12.328542952270315, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:46:05,591] [INFO] [logging.py:96:log_dist] [Rank 0] step=990, skipped=18, lr=[0.0009570633977918058], mom=[(0.9, 0.95)] [2023-04-18 20:46:05,592] [INFO] [timer.py:199:stop] epoch=0/micro_step=1980/global_step=990, RunningAvgSamplesPerSec=12.094051682092955, CurrSamplesPerSec=12.186127499630697, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:46:31,640] [INFO] [logging.py:96:log_dist] [Rank 0] step=1000, skipped=18, lr=[0.0009561885831363777], mom=[(0.9, 0.95)] [2023-04-18 20:46:31,641] [INFO] [timer.py:199:stop] epoch=0/micro_step=2000/global_step=1000, RunningAvgSamplesPerSec=12.096053024319552, CurrSamplesPerSec=12.319157679167981, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:46:59,405] [INFO] [logging.py:96:log_dist] [Rank 0] step=1010, skipped=18, lr=[0.000955305354242232], mom=[(0.9, 0.95)] [2023-04-18 20:46:59,405] [INFO] [timer.py:199:stop] epoch=0/micro_step=2020/global_step=1010, RunningAvgSamplesPerSec=12.090236392887418, CurrSamplesPerSec=12.351537972879715, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:47:25,364] [INFO] [loss_scaler.py:188:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. 
Reducing hysteresis to 1 [2023-04-18 20:47:25,364] [INFO] [logging.py:96:log_dist] [Rank 0] step=1020, skipped=19, lr=[0.0009545032675245813], mom=[(0.9, 0.95)] [2023-04-18 20:47:25,365] [INFO] [timer.py:199:stop] epoch=0/micro_step=2040/global_step=1020, RunningAvgSamplesPerSec=12.092638234056357, CurrSamplesPerSec=12.650809686261772, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:47:27,912] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, reducing to 65536 [2023-04-18 20:47:51,286] [INFO] [logging.py:96:log_dist] [Rank 0] step=1030, skipped=20, lr=[0.000953694390450482], mom=[(0.9, 0.95)] [2023-04-18 20:47:51,286] [INFO] [timer.py:199:stop] epoch=0/micro_step=2060/global_step=1030, RunningAvgSamplesPerSec=12.095160188065448, CurrSamplesPerSec=12.380176550098687, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:48:17,279] [INFO] [logging.py:96:log_dist] [Rank 0] step=1040, skipped=20, lr=[0.0009527876888494757], mom=[(0.9, 0.95)] [2023-04-18 20:48:17,279] [INFO] [timer.py:199:stop] epoch=0/micro_step=2080/global_step=1040, RunningAvgSamplesPerSec=12.097321691910333, CurrSamplesPerSec=12.344697907564958, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:48:44,919] [INFO] [logging.py:96:log_dist] [Rank 0] step=1050, skipped=20, lr=[0.0009518726357380564], mom=[(0.9, 0.95)] [2023-04-18 20:48:44,919] [INFO] [timer.py:199:stop] epoch=0/micro_step=2100/global_step=1050, RunningAvgSamplesPerSec=12.092260584391378, CurrSamplesPerSec=12.338322492916515, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:49:10,933] [INFO] [logging.py:96:log_dist] [Rank 0] step=1060, skipped=20, lr=[0.0009509492479940584], mom=[(0.9, 0.95)] [2023-04-18 20:49:10,933] [INFO] [timer.py:199:stop] epoch=0/micro_step=2120/global_step=1060, RunningAvgSamplesPerSec=12.094316365223163, CurrSamplesPerSec=12.282069238843803, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:49:36,926] 
[INFO] [logging.py:96:log_dist] [Rank 0] step=1070, skipped=20, lr=[0.0009500175426490454], mom=[(0.9, 0.95)] [2023-04-18 20:49:36,927] [INFO] [timer.py:199:stop] epoch=0/micro_step=2140/global_step=1070, RunningAvgSamplesPerSec=12.09642251357148, CurrSamplesPerSec=12.3439111215997, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:50:02,925] [INFO] [logging.py:96:log_dist] [Rank 0] step=1080, skipped=20, lr=[0.0009490775368879965], mom=[(0.9, 0.95)] [2023-04-18 20:50:02,925] [INFO] [timer.py:199:stop] epoch=0/micro_step=2160/global_step=1080, RunningAvgSamplesPerSec=12.098467601662309, CurrSamplesPerSec=12.363761241061868, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:50:30,519] [INFO] [logging.py:96:log_dist] [Rank 0] step=1090, skipped=20, lr=[0.0009481292480489885], mom=[(0.9, 0.95)] [2023-04-18 20:50:30,519] [INFO] [timer.py:199:stop] epoch=0/micro_step=2180/global_step=1090, RunningAvgSamplesPerSec=12.093771537207406, CurrSamplesPerSec=12.30088086319149, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:50:56,621] [INFO] [logging.py:96:log_dist] [Rank 0] step=1100, skipped=20, lr=[0.0009471726936228775], mom=[(0.9, 0.95)] [2023-04-18 20:50:56,622] [INFO] [timer.py:199:stop] epoch=0/micro_step=2200/global_step=1100, RunningAvgSamplesPerSec=12.095370487586163, CurrSamplesPerSec=12.280902726595047, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:51:22,585] [INFO] [logging.py:96:log_dist] [Rank 0] step=1110, skipped=20, lr=[0.0009462078912529748], mom=[(0.9, 0.95)] [2023-04-18 20:51:22,585] [INFO] [timer.py:199:stop] epoch=0/micro_step=2220/global_step=1110, RunningAvgSamplesPerSec=12.097516655503412, CurrSamplesPerSec=12.366964700020013, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:51:48,505] [INFO] [logging.py:96:log_dist] [Rank 0] step=1120, skipped=20, lr=[0.0009452348587347224], mom=[(0.9, 0.95)] [2023-04-18 20:51:48,505] [INFO] [timer.py:199:stop] epoch=0/micro_step=2240/global_step=1120, 
RunningAvgSamplesPerSec=12.099800523313704, CurrSamplesPerSec=12.328944980118298, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:51:54,512] [INFO] [loss_scaler.py:188:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 [2023-04-18 20:51:57,829] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, reducing to 65536 [2023-04-18 20:52:16,084] [INFO] [logging.py:96:log_dist] [Rank 0] step=1130, skipped=22, lr=[0.0009444505190687644], mom=[(0.9, 0.95)] [2023-04-18 20:52:16,084] [INFO] [timer.py:199:stop] epoch=0/micro_step=2260/global_step=1130, RunningAvgSamplesPerSec=12.095320245032243, CurrSamplesPerSec=12.363728212613298, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:52:42,009] [INFO] [logging.py:96:log_dist] [Rank 0] step=1140, skipped=22, lr=[0.0009434627176123377], mom=[(0.9, 0.95)] [2023-04-18 20:52:42,009] [INFO] [timer.py:199:stop] epoch=0/micro_step=2280/global_step=1140, RunningAvgSamplesPerSec=12.09756242223335, CurrSamplesPerSec=12.39843223964722, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:53:07,940] [INFO] [logging.py:96:log_dist] [Rank 0] step=1150, skipped=22, lr=[0.000942466736641328], mom=[(0.9, 0.95)] [2023-04-18 20:53:07,940] [INFO] [timer.py:199:stop] epoch=0/micro_step=2300/global_step=1150, RunningAvgSamplesPerSec=12.099742969918955, CurrSamplesPerSec=12.336620243778757, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:53:33,940] [INFO] [logging.py:96:log_dist] [Rank 0] step=1160, skipped=22, lr=[0.0009414625945262557], mom=[(0.9, 0.95)] [2023-04-18 20:53:33,940] [INFO] [timer.py:199:stop] epoch=0/micro_step=2320/global_step=1160, RunningAvgSamplesPerSec=12.101614594608836, CurrSamplesPerSec=12.33862874347941, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:54:02,293] [INFO] [logging.py:96:log_dist] [Rank 0] step=1170, skipped=22, 
lr=[0.0009404503097881707], mom=[(0.9, 0.95)] [2023-04-18 20:54:02,293] [INFO] [timer.py:199:stop] epoch=0/micro_step=2340/global_step=1170, RunningAvgSamplesPerSec=12.094240423302988, CurrSamplesPerSec=12.438102941269142, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:54:28,085] [INFO] [logging.py:96:log_dist] [Rank 0] step=1180, skipped=22, lr=[0.0009394299010983105], mom=[(0.9, 0.95)] [2023-04-18 20:54:28,086] [INFO] [timer.py:199:stop] epoch=0/micro_step=2360/global_step=1180, RunningAvgSamplesPerSec=12.096929931785912, CurrSamplesPerSec=12.349986621081422, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:54:54,085] [INFO] [logging.py:96:log_dist] [Rank 0] step=1190, skipped=22, lr=[0.0009384013872777561], mom=[(0.9, 0.95)] [2023-04-18 20:54:54,086] [INFO] [timer.py:199:stop] epoch=0/micro_step=2380/global_step=1190, RunningAvgSamplesPerSec=12.098774706643809, CurrSamplesPerSec=12.26792086389854, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:55:22,794] [INFO] [logging.py:96:log_dist] [Rank 0] step=1200, skipped=22, lr=[0.0009373647872970852], mom=[(0.9, 0.95)] [2023-04-18 20:55:22,794] [INFO] [timer.py:199:stop] epoch=0/micro_step=2400/global_step=1200, RunningAvgSamplesPerSec=12.090253743234936, CurrSamplesPerSec=7.362322351208578, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:55:48,867] [INFO] [logging.py:96:log_dist] [Rank 0] step=1210, skipped=22, lr=[0.0009363201202760212], mom=[(0.9, 0.95)] [2023-04-18 20:55:48,868] [INFO] [timer.py:199:stop] epoch=0/micro_step=2420/global_step=1210, RunningAvgSamplesPerSec=12.091844602826002, CurrSamplesPerSec=12.232465249850646, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:56:14,915] [INFO] [logging.py:96:log_dist] [Rank 0] step=1220, skipped=22, lr=[0.0009352674054830817], mom=[(0.9, 0.95)] [2023-04-18 20:56:14,916] [INFO] [timer.py:199:stop] epoch=0/micro_step=2440/global_step=1220, RunningAvgSamplesPerSec=12.093506559958724, 
CurrSamplesPerSec=12.310423469409749, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:56:25,268] [INFO] [loss_scaler.py:188:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 [2023-04-18 20:56:27,821] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, reducing to 65536 [2023-04-18 20:56:40,884] [INFO] [logging.py:96:log_dist] [Rank 0] step=1230, skipped=24, lr=[0.0009344194522961957], mom=[(0.9, 0.95)] [2023-04-18 20:56:40,885] [INFO] [timer.py:199:stop] epoch=0/micro_step=2460/global_step=1230, RunningAvgSamplesPerSec=12.095434440378217, CurrSamplesPerSec=12.242872342133415, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:57:09,604] [INFO] [logging.py:96:log_dist] [Rank 0] step=1240, skipped=24, lr=[0.0009333523005441313], mom=[(0.9, 0.95)] [2023-04-18 20:57:09,605] [INFO] [timer.py:199:stop] epoch=0/micro_step=2480/global_step=1240, RunningAvgSamplesPerSec=12.08717905216454, CurrSamplesPerSec=12.257030126668168, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:57:35,661] [INFO] [logging.py:96:log_dist] [Rank 0] step=1250, skipped=24, lr=[0.0009322771557605873], mom=[(0.9, 0.95)] [2023-04-18 20:57:35,661] [INFO] [timer.py:199:stop] epoch=0/micro_step=2500/global_step=1250, RunningAvgSamplesPerSec=12.088804138946198, CurrSamplesPerSec=12.306108396443472, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:58:01,751] [INFO] [logging.py:96:log_dist] [Rank 0] step=1260, skipped=24, lr=[0.0009311940377762329], mom=[(0.9, 0.95)] [2023-04-18 20:58:01,751] [INFO] [timer.py:199:stop] epoch=0/micro_step=2520/global_step=1260, RunningAvgSamplesPerSec=12.090284797203196, CurrSamplesPerSec=12.252466053649584, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:58:27,825] [INFO] [logging.py:96:log_dist] [Rank 0] step=1270, skipped=24, lr=[0.0009301029665688003], mom=[(0.9, 0.95)] 
[2023-04-18 20:58:27,825] [INFO] [timer.py:199:stop] epoch=0/micro_step=2540/global_step=1270, RunningAvgSamplesPerSec=12.091796987679135, CurrSamplesPerSec=12.303809313578459, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:58:56,309] [INFO] [logging.py:96:log_dist] [Rank 0] step=1280, skipped=24, lr=[0.000929003962262716], mom=[(0.9, 0.95)] [2023-04-18 20:58:56,309] [INFO] [timer.py:199:stop] epoch=0/micro_step=2560/global_step=1280, RunningAvgSamplesPerSec=12.084678049640367, CurrSamplesPerSec=12.295967537803044, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:59:22,319] [INFO] [logging.py:96:log_dist] [Rank 0] step=1290, skipped=24, lr=[0.0009278970451287296], mom=[(0.9, 0.95)] [2023-04-18 20:59:22,320] [INFO] [timer.py:199:stop] epoch=0/micro_step=2580/global_step=1290, RunningAvgSamplesPerSec=12.08643534045386, CurrSamplesPerSec=12.27283648504841, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 20:59:48,376] [INFO] [logging.py:96:log_dist] [Rank 0] step=1300, skipped=24, lr=[0.0009267822355835401], mom=[(0.9, 0.95)] [2023-04-18 20:59:48,376] [INFO] [timer.py:199:stop] epoch=0/micro_step=2600/global_step=1300, RunningAvgSamplesPerSec=12.088002816199612, CurrSamplesPerSec=12.272369657126754, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:00:14,456] [INFO] [logging.py:96:log_dist] [Rank 0] step=1310, skipped=24, lr=[0.0009256595541894195], mom=[(0.9, 0.95)] [2023-04-18 21:00:14,456] [INFO] [timer.py:199:stop] epoch=0/micro_step=2620/global_step=1310, RunningAvgSamplesPerSec=12.089466716533163, CurrSamplesPerSec=12.223672030650224, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:00:42,156] [INFO] [logging.py:96:log_dist] [Rank 0] step=1320, skipped=24, lr=[0.0009245290216538327], mom=[(0.9, 0.95)] [2023-04-18 21:00:42,156] [INFO] [timer.py:199:stop] epoch=0/micro_step=2640/global_step=1320, RunningAvgSamplesPerSec=12.08529645059973, CurrSamplesPerSec=12.306297956819218, MemAllocated=2.0GB, 
MaxMemAllocated=13.66GB [2023-04-18 21:00:57,744] [INFO] [loss_scaler.py:188:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 [2023-04-18 21:01:00,296] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, reducing to 65536 [2023-04-18 21:01:08,132] [INFO] [logging.py:96:log_dist] [Rank 0] step=1330, skipped=26, lr=[0.0009236189568113822], mom=[(0.9, 0.95)] [2023-04-18 21:01:08,133] [INFO] [timer.py:199:stop] epoch=0/micro_step=2660/global_step=1330, RunningAvgSamplesPerSec=12.087112499814504, CurrSamplesPerSec=12.296471085763246, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:01:34,184] [INFO] [logging.py:96:log_dist] [Rank 0] step=1340, skipped=26, lr=[0.0009224743448659623], mom=[(0.9, 0.95)] [2023-04-18 21:01:34,185] [INFO] [timer.py:199:stop] epoch=0/micro_step=2680/global_step=1340, RunningAvgSamplesPerSec=12.088645248152245, CurrSamplesPerSec=12.324833076584587, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:02:00,199] [INFO] [logging.py:96:log_dist] [Rank 0] step=1350, skipped=26, lr=[0.0009213219405291475], mom=[(0.9, 0.95)] [2023-04-18 21:02:00,200] [INFO] [timer.py:199:stop] epoch=0/micro_step=2700/global_step=1350, RunningAvgSamplesPerSec=12.090280536825933, CurrSamplesPerSec=12.308798895105651, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:02:27,804] [INFO] [logging.py:96:log_dist] [Rank 0] step=1360, skipped=26, lr=[0.0009201617650566323], mom=[(0.9, 0.95)] [2023-04-18 21:02:27,804] [INFO] [timer.py:199:stop] epoch=0/micro_step=2720/global_step=1360, RunningAvgSamplesPerSec=12.086546104079329, CurrSamplesPerSec=12.314590164566154, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:02:53,843] [INFO] [logging.py:96:log_dist] [Rank 0] step=1370, skipped=26, lr=[0.000918993839847447], mom=[(0.9, 0.95)] [2023-04-18 21:02:53,843] [INFO] [timer.py:199:stop] 
epoch=0/micro_step=2740/global_step=1370, RunningAvgSamplesPerSec=12.088091942395627, CurrSamplesPerSec=12.333664821884787, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:03:19,875] [INFO] [logging.py:96:log_dist] [Rank 0] step=1380, skipped=26, lr=[0.0009178181864435633], mom=[(0.9, 0.95)] [2023-04-18 21:03:19,876] [INFO] [timer.py:199:stop] epoch=0/micro_step=2760/global_step=1380, RunningAvgSamplesPerSec=12.089637621676248, CurrSamplesPerSec=12.338164836517468, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:03:45,943] [INFO] [logging.py:96:log_dist] [Rank 0] step=1390, skipped=26, lr=[0.0009166348265294967], mom=[(0.9, 0.95)] [2023-04-18 21:03:45,943] [INFO] [timer.py:199:stop] epoch=0/micro_step=2780/global_step=1390, RunningAvgSamplesPerSec=12.091046263566415, CurrSamplesPerSec=12.250656583146236, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:04:13,757] [INFO] [logging.py:96:log_dist] [Rank 0] step=1400, skipped=26, lr=[0.0009154437819319065], mom=[(0.9, 0.95)] [2023-04-18 21:04:13,758] [INFO] [timer.py:199:stop] epoch=0/micro_step=2800/global_step=1400, RunningAvgSamplesPerSec=12.08672699524532, CurrSamplesPerSec=12.277629135228455, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:04:39,841] [INFO] [logging.py:96:log_dist] [Rank 0] step=1410, skipped=26, lr=[0.0009142450746191935], mom=[(0.9, 0.95)] [2023-04-18 21:04:39,841] [INFO] [timer.py:199:stop] epoch=0/micro_step=2820/global_step=1410, RunningAvgSamplesPerSec=12.088084726233372, CurrSamplesPerSec=12.305096378259675, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:05:05,918] [INFO] [logging.py:96:log_dist] [Rank 0] step=1420, skipped=26, lr=[0.0009130387267010942], mom=[(0.9, 0.95)] [2023-04-18 21:05:05,918] [INFO] [timer.py:199:stop] epoch=0/micro_step=2840/global_step=1420, RunningAvgSamplesPerSec=12.089442392739935, CurrSamplesPerSec=12.285172025982911, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:05:26,709] [INFO] 
[loss_scaler.py:188:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 [2023-04-18 21:05:29,268] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, reducing to 65536 [2023-04-18 21:05:31,905] [INFO] [logging.py:96:log_dist] [Rank 0] step=1430, skipped=28, lr=[0.0009120681620784591], mom=[(0.9, 0.95)] [2023-04-18 21:05:31,905] [INFO] [timer.py:199:stop] epoch=0/micro_step=2860/global_step=1430, RunningAvgSamplesPerSec=12.091071248182784, CurrSamplesPerSec=12.148830490079916, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:05:59,560] [INFO] [logging.py:96:log_dist] [Rank 0] step=1440, skipped=28, lr=[0.0009108481172367826], mom=[(0.9, 0.95)] [2023-04-18 21:05:59,560] [INFO] [timer.py:199:stop] epoch=0/micro_step=2880/global_step=1440, RunningAvgSamplesPerSec=12.087379683494394, CurrSamplesPerSec=12.220300920216436, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:06:25,622] [INFO] [logging.py:96:log_dist] [Rank 0] step=1450, skipped=28, lr=[0.0009096204944454096], mom=[(0.9, 0.95)] [2023-04-18 21:06:25,622] [INFO] [timer.py:199:stop] epoch=0/micro_step=2900/global_step=1450, RunningAvgSamplesPerSec=12.08876138894592, CurrSamplesPerSec=12.287460773663557, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:06:51,661] [INFO] [logging.py:96:log_dist] [Rank 0] step=1460, skipped=28, lr=[0.0009083853163474129], mom=[(0.9, 0.95)] [2023-04-18 21:06:51,661] [INFO] [timer.py:199:stop] epoch=0/micro_step=2920/global_step=1460, RunningAvgSamplesPerSec=12.09019929347288, CurrSamplesPerSec=12.323449096324783, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:07:18,631] [INFO] [logging.py:96:log_dist] [Rank 0] step=1470, skipped=28, lr=[0.0009071426057252201], mom=[(0.9, 0.95)] [2023-04-18 21:07:18,632] [INFO] [timer.py:199:stop] epoch=0/micro_step=2940/global_step=1470, 
RunningAvgSamplesPerSec=12.0887166217993, CurrSamplesPerSec=12.318292744933235, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:07:45,549] [INFO] [logging.py:96:log_dist] [Rank 0] step=1480, skipped=28, lr=[0.0009058923855001934], mom=[(0.9, 0.95)] [2023-04-18 21:07:45,549] [INFO] [timer.py:199:stop] epoch=0/micro_step=2960/global_step=1480, RunningAvgSamplesPerSec=12.08742002994358, CurrSamplesPerSec=12.323274847744564, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:08:11,609] [INFO] [logging.py:96:log_dist] [Rank 0] step=1490, skipped=28, lr=[0.0009046346787322075], mom=[(0.9, 0.95)] [2023-04-18 21:08:11,610] [INFO] [timer.py:199:stop] epoch=0/micro_step=2980/global_step=1490, RunningAvgSamplesPerSec=12.088770651493192, CurrSamplesPerSec=12.267636053604134, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:08:37,658] [INFO] [logging.py:96:log_dist] [Rank 0] step=1500, skipped=28, lr=[0.000903369508619223], mom=[(0.9, 0.95)] [2023-04-18 21:08:37,658] [INFO] [timer.py:199:stop] epoch=0/micro_step=3000/global_step=1500, RunningAvgSamplesPerSec=12.090138163253881, CurrSamplesPerSec=12.308583295863752, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:09:05,485] [INFO] [logging.py:96:log_dist] [Rank 0] step=1510, skipped=28, lr=[0.0009020968984968603], mom=[(0.9, 0.95)] [2023-04-18 21:09:05,485] [INFO] [timer.py:199:stop] epoch=0/micro_step=3020/global_step=1510, RunningAvgSamplesPerSec=12.08610396047713, CurrSamplesPerSec=12.323081369704942, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:09:31,525] [INFO] [logging.py:96:log_dist] [Rank 0] step=1520, skipped=28, lr=[0.0009008168718379671], mom=[(0.9, 0.95)] [2023-04-18 21:09:31,525] [INFO] [timer.py:199:stop] epoch=0/micro_step=3040/global_step=1520, RunningAvgSamplesPerSec=12.08749731455105, CurrSamplesPerSec=12.271350838757645, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:09:57,490] [INFO] [loss_scaler.py:188:update_scale] [deepspeed] OVERFLOW! 
Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 [2023-04-18 21:09:57,490] [INFO] [logging.py:96:log_dist] [Rank 0] step=1530, skipped=29, lr=[0.0008996585262167807], mom=[(0.9, 0.95)] [2023-04-18 21:09:57,491] [INFO] [timer.py:199:stop] epoch=0/micro_step=3060/global_step=1530, RunningAvgSamplesPerSec=12.089095289154216, CurrSamplesPerSec=12.651349872489918, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:10:00,045] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, reducing to 65536 [2023-04-18 21:10:23,524] [INFO] [logging.py:96:log_dist] [Rank 0] step=1540, skipped=30, lr=[0.0008984942096289488], mom=[(0.9, 0.95)] [2023-04-18 21:10:23,525] [INFO] [timer.py:199:stop] epoch=0/micro_step=3080/global_step=1540, RunningAvgSamplesPerSec=12.090469560421512, CurrSamplesPerSec=12.300618193968875, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:10:51,321] [INFO] [logging.py:96:log_dist] [Rank 0] step=1550, skipped=30, lr=[0.0008971935427060562], mom=[(0.9, 0.95)] [2023-04-18 21:10:51,322] [INFO] [timer.py:199:stop] epoch=0/micro_step=3100/global_step=1550, RunningAvgSamplesPerSec=12.08662430522739, CurrSamplesPerSec=12.349084402295853, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:11:17,338] [INFO] [logging.py:96:log_dist] [Rank 0] step=1560, skipped=30, lr=[0.0008958855496873145], mom=[(0.9, 0.95)] [2023-04-18 21:11:17,339] [INFO] [timer.py:199:stop] epoch=0/micro_step=3120/global_step=1560, RunningAvgSamplesPerSec=12.08804574022021, CurrSamplesPerSec=12.32520203957906, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:11:43,377] [INFO] [logging.py:96:log_dist] [Rank 0] step=1570, skipped=30, lr=[0.0008945702546981969], mom=[(0.9, 0.95)] [2023-04-18 21:11:43,377] [INFO] [timer.py:199:stop] epoch=0/micro_step=3140/global_step=1570, RunningAvgSamplesPerSec=12.089386592382095, CurrSamplesPerSec=12.270635074168112, 
MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:12:09,392] [INFO] [logging.py:96:log_dist] [Rank 0] step=1580, skipped=30, lr=[0.0008932476819988589], mom=[(0.9, 0.95)] [2023-04-18 21:12:09,393] [INFO] [timer.py:199:stop] epoch=0/micro_step=3160/global_step=1580, RunningAvgSamplesPerSec=12.090776092354165, CurrSamplesPerSec=12.321315456260379, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:12:36,899] [INFO] [logging.py:96:log_dist] [Rank 0] step=1590, skipped=30, lr=[0.0008919178559836909], mom=[(0.9, 0.95)] [2023-04-18 21:12:36,900] [INFO] [timer.py:199:stop] epoch=0/micro_step=3180/global_step=1590, RunningAvgSamplesPerSec=12.087861205886668, CurrSamplesPerSec=12.315589054263834, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:13:02,865] [INFO] [logging.py:96:log_dist] [Rank 0] step=1600, skipped=30, lr=[0.0008905808011808685], mom=[(0.9, 0.95)] [2023-04-18 21:13:02,865] [INFO] [timer.py:199:stop] epoch=0/micro_step=3200/global_step=1600, RunningAvgSamplesPerSec=12.089386943336445, CurrSamplesPerSec=12.3424048141895, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:13:28,930] [INFO] [logging.py:96:log_dist] [Rank 0] step=1610, skipped=30, lr=[0.0008892365422518995], mom=[(0.9, 0.95)] [2023-04-18 21:13:28,930] [INFO] [timer.py:199:stop] epoch=0/micro_step=3220/global_step=1610, RunningAvgSamplesPerSec=12.090612441393514, CurrSamplesPerSec=12.291420576911648, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:13:54,977] [INFO] [logging.py:96:log_dist] [Rank 0] step=1620, skipped=30, lr=[0.0008878851039911688], mom=[(0.9, 0.95)] [2023-04-18 21:13:54,978] [INFO] [timer.py:199:stop] epoch=0/micro_step=3240/global_step=1620, RunningAvgSamplesPerSec=12.091871722741272, CurrSamplesPerSec=12.3047511793628, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:14:22,616] [INFO] [logging.py:96:log_dist] [Rank 0] step=1630, skipped=30, lr=[0.0008865265113254826], mom=[(0.9, 0.95)] [2023-04-18 21:14:22,616] [INFO] 
[timer.py:199:stop] epoch=0/micro_step=3260/global_step=1630, RunningAvgSamplesPerSec=12.088650237208666, CurrSamplesPerSec=12.291587171878804, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:14:27,758] [INFO] [loss_scaler.py:188:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 [2023-04-18 21:14:30,309] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, reducing to 65536 [2023-04-18 21:14:48,536] [INFO] [logging.py:96:log_dist] [Rank 0] step=1640, skipped=32, lr=[0.0008854345028564341], mom=[(0.9, 0.95)] [2023-04-18 21:14:48,536] [INFO] [timer.py:199:stop] epoch=0/micro_step=3280/global_step=1640, RunningAvgSamplesPerSec=12.090261231587373, CurrSamplesPerSec=12.307318072440733, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:15:14,629] [INFO] [logging.py:96:log_dist] [Rank 0] step=1650, skipped=32, lr=[0.0008840630954983003], mom=[(0.9, 0.95)] [2023-04-18 21:15:14,630] [INFO] [timer.py:199:stop] epoch=0/micro_step=3300/global_step=1650, RunningAvgSamplesPerSec=12.091371724911633, CurrSamplesPerSec=12.284754857237418, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:15:40,695] [INFO] [logging.py:96:log_dist] [Rank 0] step=1660, skipped=32, lr=[0.0008826846042308196], mom=[(0.9, 0.95)] [2023-04-18 21:15:40,695] [INFO] [timer.py:199:stop] epoch=0/micro_step=3320/global_step=1660, RunningAvgSamplesPerSec=12.092546308380316, CurrSamplesPerSec=12.277254030674628, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:16:08,505] [INFO] [logging.py:96:log_dist] [Rank 0] step=1670, skipped=32, lr=[0.0008812990544797805], mom=[(0.9, 0.95)] [2023-04-18 21:16:08,506] [INFO] [timer.py:199:stop] epoch=0/micro_step=3340/global_step=1670, RunningAvgSamplesPerSec=12.088926669840058, CurrSamplesPerSec=12.303516067442963, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:16:34,554] [INFO] 
[logging.py:96:log_dist] [Rank 0] step=1680, skipped=32, lr=[0.0008799064718011633], mom=[(0.9, 0.95)] [2023-04-18 21:16:34,554] [INFO] [timer.py:199:stop] epoch=0/micro_step=3360/global_step=1680, RunningAvgSamplesPerSec=12.09014813221978, CurrSamplesPerSec=12.296307737945874, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:17:00,563] [INFO] [logging.py:96:log_dist] [Rank 0] step=1690, skipped=32, lr=[0.000878506881880668], mom=[(0.9, 0.95)] [2023-04-18 21:17:00,564] [INFO] [timer.py:199:stop] epoch=0/micro_step=3380/global_step=1690, RunningAvgSamplesPerSec=12.091459900564466, CurrSamplesPerSec=12.3236301390302, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:17:26,576] [INFO] [logging.py:96:log_dist] [Rank 0] step=1700, skipped=32, lr=[0.0008771003105332407], mom=[(0.9, 0.95)] [2023-04-18 21:17:26,577] [INFO] [timer.py:199:stop] epoch=0/micro_step=3400/global_step=1700, RunningAvgSamplesPerSec=12.092748209554678, CurrSamplesPerSec=12.289969815718754, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:17:54,353] [INFO] [logging.py:96:log_dist] [Rank 0] step=1710, skipped=32, lr=[0.0008756867837025975], mom=[(0.9, 0.95)] [2023-04-18 21:17:54,353] [INFO] [timer.py:199:stop] epoch=0/micro_step=3420/global_step=1710, RunningAvgSamplesPerSec=12.089302318959184, CurrSamplesPerSec=12.33605217589512, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:18:20,375] [INFO] [logging.py:96:log_dist] [Rank 0] step=1720, skipped=32, lr=[0.000874266327460746], mom=[(0.9, 0.95)] [2023-04-18 21:18:20,375] [INFO] [timer.py:199:stop] epoch=0/micro_step=3440/global_step=1720, RunningAvgSamplesPerSec=12.09056363145482, CurrSamplesPerSec=12.300892136799748, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:18:46,388] [INFO] [logging.py:96:log_dist] [Rank 0] step=1730, skipped=32, lr=[0.0008728389680075041], mom=[(0.9, 0.95)] [2023-04-18 21:18:46,388] [INFO] [timer.py:199:stop] epoch=0/micro_step=3460/global_step=1730, 
RunningAvgSamplesPerSec=12.091835432553086, CurrSamplesPerSec=12.321284916406258, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:18:56,727] [INFO] [loss_scaler.py:188:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 [2023-04-18 21:18:59,278] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, reducing to 65536 [2023-04-18 21:19:13,183] [INFO] [logging.py:96:log_dist] [Rank 0] step=1740, skipped=34, lr=[0.0008716921278202858], mom=[(0.9, 0.95)] [2023-04-18 21:19:13,183] [INFO] [timer.py:199:stop] epoch=0/micro_step=3480/global_step=1740, RunningAvgSamplesPerSec=12.091035305499142, CurrSamplesPerSec=12.289691857909272, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:19:39,942] [INFO] [logging.py:96:log_dist] [Rank 0] step=1750, skipped=34, lr=[0.0008702524090162022], mom=[(0.9, 0.95)] [2023-04-18 21:19:39,942] [INFO] [timer.py:199:stop] epoch=0/micro_step=3500/global_step=1750, RunningAvgSamplesPerSec=12.090338613022197, CurrSamplesPerSec=12.329180546645178, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:20:05,983] [INFO] [logging.py:96:log_dist] [Rank 0] step=1760, skipped=34, lr=[0.0008688058610360489], mom=[(0.9, 0.95)] [2023-04-18 21:20:05,983] [INFO] [timer.py:199:stop] epoch=0/micro_step=3520/global_step=1760, RunningAvgSamplesPerSec=12.091515025589777, CurrSamplesPerSec=12.313034524509472, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:20:32,001] [INFO] [logging.py:96:log_dist] [Rank 0] step=1770, skipped=34, lr=[0.000867352510560897], mom=[(0.9, 0.95)] [2023-04-18 21:20:32,002] [INFO] [timer.py:199:stop] epoch=0/micro_step=3540/global_step=1770, RunningAvgSamplesPerSec=12.092737322873381, CurrSamplesPerSec=12.303024347604179, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:20:59,809] [INFO] [logging.py:96:log_dist] [Rank 0] step=1780, skipped=34, 
lr=[0.0008658923843972875], mom=[(0.9, 0.95)] [2023-04-18 21:20:59,809] [INFO] [timer.py:199:stop] epoch=0/micro_step=3560/global_step=1780, RunningAvgSamplesPerSec=12.089349008101008, CurrSamplesPerSec=9.132435760235413, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:21:25,830] [INFO] [logging.py:96:log_dist] [Rank 0] step=1790, skipped=34, lr=[0.0008644255094767358], mom=[(0.9, 0.95)] [2023-04-18 21:21:25,831] [INFO] [timer.py:199:stop] epoch=0/micro_step=3580/global_step=1790, RunningAvgSamplesPerSec=12.090561398104985, CurrSamplesPerSec=12.285670191816042, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:21:51,829] [INFO] [logging.py:96:log_dist] [Rank 0] step=1800, skipped=34, lr=[0.0008629519128552368], mom=[(0.9, 0.95)] [2023-04-18 21:21:51,829] [INFO] [timer.py:199:stop] epoch=0/micro_step=3600/global_step=1800, RunningAvgSamplesPerSec=12.091821637524424, CurrSamplesPerSec=12.385087703352289, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:22:17,868] [INFO] [logging.py:96:log_dist] [Rank 0] step=1810, skipped=34, lr=[0.000861471621712764], mom=[(0.9, 0.95)] [2023-04-18 21:22:17,868] [INFO] [timer.py:199:stop] epoch=0/micro_step=3620/global_step=1810, RunningAvgSamplesPerSec=12.092963400253105, CurrSamplesPerSec=12.297554923961306, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:22:45,661] [INFO] [logging.py:96:log_dist] [Rank 0] step=1820, skipped=34, lr=[0.0008599846633527696], mom=[(0.9, 0.95)] [2023-04-18 21:22:45,661] [INFO] [timer.py:199:stop] epoch=0/micro_step=3640/global_step=1820, RunningAvgSamplesPerSec=12.08968298030801, CurrSamplesPerSec=12.342723752898939, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:23:11,705] [INFO] [logging.py:96:log_dist] [Rank 0] step=1830, skipped=34, lr=[0.0008584910652016797], mom=[(0.9, 0.95)] [2023-04-18 21:23:11,706] [INFO] [timer.py:199:stop] epoch=0/micro_step=3660/global_step=1830, RunningAvgSamplesPerSec=12.090810703874503, 
CurrSamplesPerSec=12.327041525162441, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:23:27,250] [INFO] [loss_scaler.py:188:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 [2023-04-18 21:23:29,803] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, reducing to 65536 [2023-04-18 21:23:37,615] [INFO] [logging.py:96:log_dist] [Rank 0] step=1840, skipped=36, lr=[0.0008572914245399747], mom=[(0.9, 0.95)] [2023-04-18 21:23:37,615] [INFO] [timer.py:199:stop] epoch=0/micro_step=3680/global_step=1840, RunningAvgSamplesPerSec=12.092259887005177, CurrSamplesPerSec=12.379543946925889, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:24:03,652] [INFO] [logging.py:96:log_dist] [Rank 0] step=1850, skipped=36, lr=[0.0008557859442701034], mom=[(0.9, 0.95)] [2023-04-18 21:24:03,653] [INFO] [timer.py:199:stop] epoch=0/micro_step=3700/global_step=1850, RunningAvgSamplesPerSec=12.09337834113586, CurrSamplesPerSec=12.327875983619869, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:24:31,321] [INFO] [logging.py:96:log_dist] [Rank 0] step=1860, skipped=36, lr=[0.0008542739016530403], mom=[(0.9, 0.95)] [2023-04-18 21:24:31,322] [INFO] [timer.py:199:stop] epoch=0/micro_step=3720/global_step=1860, RunningAvgSamplesPerSec=12.09047268739885, CurrSamplesPerSec=12.29582222626125, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:24:57,344] [INFO] [logging.py:96:log_dist] [Rank 0] step=1870, skipped=36, lr=[0.0008527553245778823], mom=[(0.9, 0.95)] [2023-04-18 21:24:57,344] [INFO] [timer.py:199:stop] epoch=0/micro_step=3740/global_step=1870, RunningAvgSamplesPerSec=12.091624689959637, CurrSamplesPerSec=12.26886060740572, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:25:23,377] [INFO] [logging.py:96:log_dist] [Rank 0] step=1880, skipped=36, lr=[0.0008512302410542519], mom=[(0.9, 0.95)] [2023-04-18 
21:25:23,377] [INFO] [timer.py:199:stop] epoch=0/micro_step=3760/global_step=1880, RunningAvgSamplesPerSec=12.092739571635207, CurrSamplesPerSec=12.263629927783887, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:25:49,411] [INFO] [logging.py:96:log_dist] [Rank 0] step=1890, skipped=36, lr=[0.0008496986792117805], mom=[(0.9, 0.95)] [2023-04-18 21:25:49,411] [INFO] [timer.py:199:stop] epoch=0/micro_step=3780/global_step=1890, RunningAvgSamplesPerSec=12.093840347411032, CurrSamplesPerSec=12.270343407216275, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:26:16,897] [INFO] [logging.py:96:log_dist] [Rank 0] step=1900, skipped=36, lr=[0.0008481606672995907], mom=[(0.9, 0.95)] [2023-04-18 21:26:16,897] [INFO] [timer.py:199:stop] epoch=0/micro_step=3800/global_step=1900, RunningAvgSamplesPerSec=12.091434394080379, CurrSamplesPerSec=12.35411191696777, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:26:42,935] [INFO] [logging.py:96:log_dist] [Rank 0] step=1910, skipped=36, lr=[0.0008466162336857734], mom=[(0.9, 0.95)] [2023-04-18 21:26:42,936] [INFO] [timer.py:199:stop] epoch=0/micro_step=3820/global_step=1910, RunningAvgSamplesPerSec=12.092519774121822, CurrSamplesPerSec=12.228371760456154, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:27:08,924] [INFO] [logging.py:96:log_dist] [Rank 0] step=1920, skipped=36, lr=[0.0008450654068568662], mom=[(0.9, 0.95)] [2023-04-18 21:27:08,924] [INFO] [timer.py:199:stop] epoch=0/micro_step=3840/global_step=1920, RunningAvgSamplesPerSec=12.093712949854492, CurrSamplesPerSec=12.318966591740773, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:27:34,984] [INFO] [logging.py:96:log_dist] [Rank 0] step=1930, skipped=36, lr=[0.0008435082154173269], mom=[(0.9, 0.95)] [2023-04-18 21:27:34,984] [INFO] [timer.py:199:stop] epoch=0/micro_step=3860/global_step=1930, RunningAvgSamplesPerSec=12.094723134853204, CurrSamplesPerSec=12.30546303228538, MemAllocated=2.0GB, MaxMemAllocated=13.66GB 
[2023-04-18 21:27:57,484] [INFO] [loss_scaler.py:188:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 [2023-04-18 21:28:00,032] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, reducing to 65536 [2023-04-18 21:28:02,644] [INFO] [logging.py:96:log_dist] [Rank 0] step=1940, skipped=38, lr=[0.0008422578990431933], mom=[(0.9, 0.95)] [2023-04-18 21:28:02,645] [INFO] [timer.py:199:stop] epoch=0/micro_step=3880/global_step=1940, RunningAvgSamplesPerSec=12.091949716028086, CurrSamplesPerSec=12.263330750333566, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:28:28,663] [INFO] [logging.py:96:log_dist] [Rank 0] step=1950, skipped=38, lr=[0.0008406893237621304], mom=[(0.9, 0.95)] [2023-04-18 21:28:28,664] [INFO] [timer.py:199:stop] epoch=0/micro_step=3900/global_step=1950, RunningAvgSamplesPerSec=12.093055209889254, CurrSamplesPerSec=12.31685371832213, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:28:54,701] [INFO] [logging.py:96:log_dist] [Rank 0] step=1960, skipped=38, lr=[0.0008391144645857564], mom=[(0.9, 0.95)] [2023-04-18 21:28:54,701] [INFO] [timer.py:199:stop] epoch=0/micro_step=3920/global_step=1960, RunningAvgSamplesPerSec=12.094107861220037, CurrSamplesPerSec=12.320225164141645, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:29:20,789] [INFO] [logging.py:96:log_dist] [Rank 0] step=1970, skipped=38, lr=[0.0008375333505617974], mom=[(0.9, 0.95)] [2023-04-18 21:29:20,789] [INFO] [timer.py:199:stop] epoch=0/micro_step=3940/global_step=1970, RunningAvgSamplesPerSec=12.09503127284497, CurrSamplesPerSec=12.290759869724164, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:29:48,573] [INFO] [logging.py:96:log_dist] [Rank 0] step=1980, skipped=38, lr=[0.0008359460108533478], mom=[(0.9, 0.95)] [2023-04-18 21:29:48,574] [INFO] [timer.py:199:stop] 
epoch=0/micro_step=3960/global_step=1980, RunningAvgSamplesPerSec=12.092026014477067, CurrSamplesPerSec=12.310902230501377, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:30:14,654] [INFO] [logging.py:96:log_dist] [Rank 0] step=1990, skipped=38, lr=[0.0008343524747383329], mom=[(0.9, 0.95)] [2023-04-18 21:30:14,655] [INFO] [timer.py:199:stop] epoch=0/micro_step=3980/global_step=1990, RunningAvgSamplesPerSec=12.092966001589982, CurrSamplesPerSec=12.30242666769815, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:30:40,697] [INFO] [logging.py:96:log_dist] [Rank 0] step=2000, skipped=38, lr=[0.0008327527716089686], mom=[(0.9, 0.95)] [2023-04-18 21:30:40,698] [INFO] [timer.py:199:stop] epoch=0/micro_step=4000/global_step=2000, RunningAvgSamplesPerSec=12.09398431659448, CurrSamplesPerSec=12.333627420556706, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:31:07,611] [INFO] [logging.py:96:log_dist] [Rank 0] step=2010, skipped=38, lr=[0.0008311469309712192], mom=[(0.9, 0.95)] [2023-04-18 21:31:07,611] [INFO] [timer.py:199:stop] epoch=0/micro_step=4020/global_step=2010, RunningAvgSamplesPerSec=12.09301139651263, CurrSamplesPerSec=12.297687881927004, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:31:34,440] [INFO] [logging.py:96:log_dist] [Rank 0] step=2020, skipped=38, lr=[0.0008295349824442528], mom=[(0.9, 0.95)] [2023-04-18 21:31:34,441] [INFO] [timer.py:199:stop] epoch=0/micro_step=4040/global_step=2020, RunningAvgSamplesPerSec=12.092238206297186, CurrSamplesPerSec=12.321778097012281, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:32:00,450] [INFO] [logging.py:96:log_dist] [Rank 0] step=2030, skipped=38, lr=[0.000827916955759896], mom=[(0.9, 0.95)] [2023-04-18 21:32:00,451] [INFO] [timer.py:199:stop] epoch=0/micro_step=4060/global_step=2030, RunningAvgSamplesPerSec=12.093319724249307, CurrSamplesPerSec=12.33025769866387, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:32:26,385] [INFO] 
[loss_scaler.py:188:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 [2023-04-18 21:32:26,385] [INFO] [logging.py:96:log_dist] [Rank 0] step=2040, skipped=39, lr=[0.0008264555595831111], mom=[(0.9, 0.95)] [2023-04-18 21:32:26,386] [INFO] [timer.py:199:stop] epoch=0/micro_step=4080/global_step=2040, RunningAvgSamplesPerSec=12.094558722597741, CurrSamplesPerSec=12.64948863044113, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:32:28,933] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, reducing to 65536 [2023-04-18 21:32:54,000] [INFO] [logging.py:96:log_dist] [Rank 0] step=2050, skipped=40, lr=[0.0008249892861045059], mom=[(0.9, 0.95)] [2023-04-18 21:32:54,001] [INFO] [timer.py:199:stop] epoch=0/micro_step=4100/global_step=2050, RunningAvgSamplesPerSec=12.092036380928082, CurrSamplesPerSec=9.571874102978192, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:33:20,020] [INFO] [logging.py:96:log_dist] [Rank 0] step=2060, skipped=40, lr=[0.0008233543997049375], mom=[(0.9, 0.95)] [2023-04-18 21:33:20,020] [INFO] [timer.py:199:stop] epoch=0/micro_step=4120/global_step=2060, RunningAvgSamplesPerSec=12.093081766565378, CurrSamplesPerSec=12.321075665922635, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:33:46,069] [INFO] [logging.py:96:log_dist] [Rank 0] step=2070, skipped=40, lr=[0.0008217135491466636], mom=[(0.9, 0.95)] [2023-04-18 21:33:46,070] [INFO] [timer.py:199:stop] epoch=0/micro_step=4140/global_step=2070, RunningAvgSamplesPerSec=12.094052570025612, CurrSamplesPerSec=12.308663439202592, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:34:12,052] [INFO] [logging.py:96:log_dist] [Rank 0] step=2080, skipped=40, lr=[0.0008200667646945983], mom=[(0.9, 0.95)] [2023-04-18 21:34:12,053] [INFO] [timer.py:199:stop] epoch=0/micro_step=4160/global_step=2080, 
RunningAvgSamplesPerSec=12.095158728771384, CurrSamplesPerSec=12.32546575979628, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:34:39,850] [INFO] [logging.py:96:log_dist] [Rank 0] step=2090, skipped=40, lr=[0.0008184140767231043], mom=[(0.9, 0.95)] [2023-04-18 21:34:39,850] [INFO] [timer.py:199:stop] epoch=0/micro_step=4180/global_step=2090, RunningAvgSamplesPerSec=12.092282035016884, CurrSamplesPerSec=12.2769912465272, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:35:05,862] [INFO] [logging.py:96:log_dist] [Rank 0] step=2100, skipped=40, lr=[0.0008167555157154328], mom=[(0.9, 0.95)] [2023-04-18 21:35:05,862] [INFO] [timer.py:199:stop] epoch=0/micro_step=4200/global_step=2100, RunningAvgSamplesPerSec=12.093322884198916, CurrSamplesPerSec=12.25485452295514, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:35:31,874] [INFO] [logging.py:96:log_dist] [Rank 0] step=2110, skipped=40, lr=[0.0008150911122631606], mom=[(0.9, 0.95)] [2023-04-18 21:35:31,875] [INFO] [timer.py:199:stop] epoch=0/micro_step=4220/global_step=2110, RunningAvgSamplesPerSec=12.09435351161068, CurrSamplesPerSec=12.29186746684519, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:35:57,892] [INFO] [logging.py:96:log_dist] [Rank 0] step=2120, skipped=40, lr=[0.0008134208970656271], mom=[(0.9, 0.95)] [2023-04-18 21:35:57,893] [INFO] [timer.py:199:stop] epoch=0/micro_step=4240/global_step=2120, RunningAvgSamplesPerSec=12.095361119255227, CurrSamplesPerSec=12.330822967286108, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:36:25,528] [INFO] [logging.py:96:log_dist] [Rank 0] step=2130, skipped=40, lr=[0.0008117449009293668], mom=[(0.9, 0.95)] [2023-04-18 21:36:25,529] [INFO] [timer.py:199:stop] epoch=0/micro_step=4260/global_step=2130, RunningAvgSamplesPerSec=12.092884693994279, CurrSamplesPerSec=12.287158182674558, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:36:51,498] [INFO] [logging.py:96:log_dist] [Rank 0] step=2140, skipped=40, 
lr=[0.0008100631547675416], mom=[(0.9, 0.95)] [2023-04-18 21:36:51,498] [INFO] [timer.py:199:stop] epoch=0/micro_step=4280/global_step=2140, RunningAvgSamplesPerSec=12.093993816568254, CurrSamplesPerSec=12.348368629791812, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:36:56,633] [INFO] [loss_scaler.py:188:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 [2023-04-18 21:36:59,182] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, reducing to 65536 [2023-04-18 21:37:17,415] [INFO] [logging.py:96:log_dist] [Rank 0] step=2150, skipped=42, lr=[0.0008087136386610414], mom=[(0.9, 0.95)] [2023-04-18 21:37:17,415] [INFO] [timer.py:199:stop] epoch=0/micro_step=4300/global_step=2150, RunningAvgSamplesPerSec=12.095203945534653, CurrSamplesPerSec=12.306687250901584, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:37:43,439] [INFO] [logging.py:96:log_dist] [Rank 0] step=2160, skipped=42, lr=[0.0008070216206925374], mom=[(0.9, 0.95)] [2023-04-18 21:37:43,439] [INFO] [timer.py:199:stop] epoch=0/micro_step=4320/global_step=2160, RunningAvgSamplesPerSec=12.096177035150957, CurrSamplesPerSec=12.316322505179835, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:38:11,076] [INFO] [logging.py:96:log_dist] [Rank 0] step=2170, skipped=42, lr=[0.0008053239398177191], mom=[(0.9, 0.95)] [2023-04-18 21:38:11,076] [INFO] [timer.py:199:stop] epoch=0/micro_step=4340/global_step=2170, RunningAvgSamplesPerSec=12.093739983479146, CurrSamplesPerSec=12.300239428919047, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:38:37,117] [INFO] [logging.py:96:log_dist] [Rank 0] step=2180, skipped=42, lr=[0.000803620627349716], mom=[(0.9, 0.95)] [2023-04-18 21:38:37,118] [INFO] [timer.py:199:stop] epoch=0/micro_step=4360/global_step=2180, RunningAvgSamplesPerSec=12.094673941506617, CurrSamplesPerSec=12.290122866681683, 
MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:39:03,120] [INFO] [logging.py:96:log_dist] [Rank 0] step=2190, skipped=42, lr=[0.00080191171470553], mom=[(0.9, 0.95)] [2023-04-18 21:39:03,120] [INFO] [timer.py:199:stop] epoch=0/micro_step=4380/global_step=2190, RunningAvgSamplesPerSec=12.09568094887143, CurrSamplesPerSec=12.313223168700436, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:39:29,167] [INFO] [logging.py:96:log_dist] [Rank 0] step=2200, skipped=42, lr=[0.0008001972334054571], mom=[(0.9, 0.95)] [2023-04-18 21:39:29,167] [INFO] [timer.py:199:stop] epoch=0/micro_step=4400/global_step=2200, RunningAvgSamplesPerSec=12.09658614709808, CurrSamplesPerSec=12.273252844718778, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:39:56,805] [INFO] [logging.py:96:log_dist] [Rank 0] step=2210, skipped=42, lr=[0.0007984772150725041], mom=[(0.9, 0.95)] [2023-04-18 21:39:56,806] [INFO] [timer.py:199:stop] epoch=0/micro_step=4420/global_step=2210, RunningAvgSamplesPerSec=12.094188347612798, CurrSamplesPerSec=12.33342228397308, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:40:22,804] [INFO] [logging.py:96:log_dist] [Rank 0] step=2220, skipped=42, lr=[0.0007967516914318074], mom=[(0.9, 0.95)] [2023-04-18 21:40:22,804] [INFO] [timer.py:199:stop] epoch=0/micro_step=4440/global_step=2220, RunningAvgSamplesPerSec=12.09519248927909, CurrSamplesPerSec=12.320221771417556, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:40:48,804] [INFO] [logging.py:96:log_dist] [Rank 0] step=2230, skipped=42, lr=[0.0007950206943100464], mom=[(0.9, 0.95)] [2023-04-18 21:40:48,805] [INFO] [timer.py:199:stop] epoch=0/micro_step=4460/global_step=2230, RunningAvgSamplesPerSec=12.096182064686987, CurrSamplesPerSec=12.317737668127071, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:41:14,855] [INFO] [logging.py:96:log_dist] [Rank 0] step=2240, skipped=42, lr=[0.0007932842556348571], mom=[(0.9, 0.95)] [2023-04-18 21:41:14,855] [INFO] 
[timer.py:199:stop] epoch=0/micro_step=4480/global_step=2240, RunningAvgSamplesPerSec=12.09706139215428, CurrSamplesPerSec=12.295421229280784, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:41:26,952] [INFO] [loss_scaler.py:188:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 [2023-04-18 21:41:29,500] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, reducing to 65536 [2023-04-18 21:41:42,533] [INFO] [logging.py:96:log_dist] [Rank 0] step=2250, skipped=44, lr=[0.0007918912082956984], mom=[(0.9, 0.95)] [2023-04-18 21:41:42,533] [INFO] [timer.py:199:stop] epoch=0/micro_step=4500/global_step=2250, RunningAvgSamplesPerSec=12.094623365965374, CurrSamplesPerSec=12.306867801444739, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:42:08,492] [INFO] [logging.py:96:log_dist] [Rank 0] step=2260, skipped=44, lr=[0.0007901450556019767], mom=[(0.9, 0.95)] [2023-04-18 21:42:08,493] [INFO] [timer.py:199:stop] epoch=0/micro_step=4520/global_step=2260, RunningAvgSamplesPerSec=12.095685951084679, CurrSamplesPerSec=12.318962069033248, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:42:34,547] [INFO] [logging.py:96:log_dist] [Rank 0] step=2270, skipped=44, lr=[0.0007883935512842759], mom=[(0.9, 0.95)] [2023-04-18 21:42:34,547] [INFO] [timer.py:199:stop] epoch=0/micro_step=4540/global_step=2270, RunningAvgSamplesPerSec=12.09654905094477, CurrSamplesPerSec=12.329793288278882, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:43:01,410] [INFO] [logging.py:96:log_dist] [Rank 0] step=2280, skipped=44, lr=[0.0007866367276484798], mom=[(0.9, 0.95)] [2023-04-18 21:43:01,410] [INFO] [timer.py:199:stop] epoch=0/micro_step=4560/global_step=2280, RunningAvgSamplesPerSec=12.095780788856018, CurrSamplesPerSec=12.333560552080908, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:43:28,330] [INFO] 
[logging.py:96:log_dist] [Rank 0] step=2290, skipped=44, lr=[0.0007848746170985854], mom=[(0.9, 0.95)] [2023-04-18 21:43:28,330] [INFO] [timer.py:199:stop] epoch=0/micro_step=4580/global_step=2290, RunningAvgSamplesPerSec=12.094905061632675, CurrSamplesPerSec=12.314369843063902, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:43:54,378] [INFO] [logging.py:96:log_dist] [Rank 0] step=2300, skipped=44, lr=[0.0007831072521361051], mom=[(0.9, 0.95)] [2023-04-18 21:43:54,378] [INFO] [timer.py:199:stop] epoch=0/micro_step=4600/global_step=2300, RunningAvgSamplesPerSec=12.095772390994432, CurrSamplesPerSec=12.329670961608437, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:44:20,389] [INFO] [logging.py:96:log_dist] [Rank 0] step=2310, skipped=44, lr=[0.0007813346653594667], mom=[(0.9, 0.95)] [2023-04-18 21:44:20,390] [INFO] [timer.py:199:stop] epoch=0/micro_step=4620/global_step=2310, RunningAvgSamplesPerSec=12.096705920788796, CurrSamplesPerSec=12.281251083732155, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:44:48,227] [INFO] [logging.py:96:log_dist] [Rank 0] step=2320, skipped=44, lr=[0.000779556889463413], mom=[(0.9, 0.95)] [2023-04-18 21:44:48,227] [INFO] [timer.py:199:stop] epoch=0/micro_step=4640/global_step=2320, RunningAvgSamplesPerSec=12.09402909430913, CurrSamplesPerSec=9.133663167910425, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:45:14,245] [INFO] [logging.py:96:log_dist] [Rank 0] step=2330, skipped=44, lr=[0.0007777739572383978], mom=[(0.9, 0.95)] [2023-04-18 21:45:14,245] [INFO] [timer.py:199:stop] epoch=0/micro_step=4660/global_step=2330, RunningAvgSamplesPerSec=12.094947326474676, CurrSamplesPerSec=12.309594758749595, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:45:40,259] [INFO] [logging.py:96:log_dist] [Rank 0] step=2340, skipped=44, lr=[0.000775985901569982], mom=[(0.9, 0.95)] [2023-04-18 21:45:40,260] [INFO] [timer.py:199:stop] epoch=0/micro_step=4680/global_step=2340, 
RunningAvgSamplesPerSec=12.09586481002567, CurrSamplesPerSec=12.272056588103876, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:45:55,801] [INFO] [loss_scaler.py:188:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 [2023-04-18 21:45:58,348] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, reducing to 65536 [2023-04-18 21:46:06,155] [INFO] [logging.py:96:log_dist] [Rank 0] step=2350, skipped=46, lr=[0.0007745517903154421], mom=[(0.9, 0.95)] [2023-04-18 21:46:06,155] [INFO] [timer.py:199:stop] epoch=0/micro_step=4700/global_step=2350, RunningAvgSamplesPerSec=12.09700614562638, CurrSamplesPerSec=12.370037569717429, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:46:33,793] [INFO] [logging.py:96:log_dist] [Rank 0] step=2360, skipped=46, lr=[0.0007727545956217742], mom=[(0.9, 0.95)] [2023-04-18 21:46:33,793] [INFO] [timer.py:199:stop] epoch=0/micro_step=4720/global_step=2360, RunningAvgSamplesPerSec=12.094759714147429, CurrSamplesPerSec=12.2966062734459, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:46:59,794] [INFO] [logging.py:96:log_dist] [Rank 0] step=2370, skipped=46, lr=[0.0007709523700650736], mom=[(0.9, 0.95)] [2023-04-18 21:46:59,794] [INFO] [timer.py:199:stop] epoch=0/micro_step=4740/global_step=2370, RunningAvgSamplesPerSec=12.095692102237864, CurrSamplesPerSec=12.298549924921863, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:47:25,808] [INFO] [logging.py:96:log_dist] [Rank 0] step=2380, skipped=46, lr=[0.0007691451468867596], mom=[(0.9, 0.95)] [2023-04-18 21:47:25,809] [INFO] [timer.py:199:stop] epoch=0/micro_step=4760/global_step=2380, RunningAvgSamplesPerSec=12.096591269648457, CurrSamplesPerSec=12.29584024921164, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:47:51,811] [INFO] [logging.py:96:log_dist] [Rank 0] step=2390, skipped=46, 
lr=[0.0007673329594204314], mom=[(0.9, 0.95)] [2023-04-18 21:47:51,812] [INFO] [timer.py:199:stop] epoch=0/micro_step=4780/global_step=2390, RunningAvgSamplesPerSec=12.097505407749873, CurrSamplesPerSec=12.317790799587014, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:48:19,424] [INFO] [logging.py:96:log_dist] [Rank 0] step=2400, skipped=46, lr=[0.0007655158410912519], mom=[(0.9, 0.95)] [2023-04-18 21:48:19,424] [INFO] [timer.py:199:stop] epoch=0/micro_step=4800/global_step=2400, RunningAvgSamplesPerSec=12.095343125977314, CurrSamplesPerSec=12.308933225153751, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:48:45,430] [INFO] [logging.py:96:log_dist] [Rank 0] step=2410, skipped=46, lr=[0.0007636938254153332], mom=[(0.9, 0.95)] [2023-04-18 21:48:45,431] [INFO] [timer.py:199:stop] epoch=0/micro_step=4820/global_step=2410, RunningAvgSamplesPerSec=12.096248020570936, CurrSamplesPerSec=12.313665996935752, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:49:11,459] [INFO] [logging.py:96:log_dist] [Rank 0] step=2420, skipped=46, lr=[0.0007618669459991161], mom=[(0.9, 0.95)] [2023-04-18 21:49:11,459] [INFO] [timer.py:199:stop] epoch=0/micro_step=4840/global_step=2420, RunningAvgSamplesPerSec=12.097103271237614, CurrSamplesPerSec=12.300884245271796, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:49:37,480] [INFO] [logging.py:96:log_dist] [Rank 0] step=2430, skipped=46, lr=[0.0007600352365387522], mom=[(0.9, 0.95)] [2023-04-18 21:49:37,480] [INFO] [timer.py:199:stop] epoch=0/micro_step=4860/global_step=2430, RunningAvgSamplesPerSec=12.097966737759501, CurrSamplesPerSec=12.281996184848362, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:50:05,112] [INFO] [logging.py:96:log_dist] [Rank 0] step=2440, skipped=46, lr=[0.000758198730819481], mom=[(0.9, 0.95)] [2023-04-18 21:50:05,113] [INFO] [timer.py:199:stop] epoch=0/micro_step=4880/global_step=2440, RunningAvgSamplesPerSec=12.095799158080322, 
CurrSamplesPerSec=12.293153160150386, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:50:25,845] [INFO] [loss_scaler.py:188:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 [2023-04-18 21:50:28,398] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, reducing to 65536 [2023-04-18 21:50:31,018] [INFO] [logging.py:96:log_dist] [Rank 0] step=2450, skipped=48, lr=[0.0007567260956978242], mom=[(0.9, 0.95)] [2023-04-18 21:50:31,019] [INFO] [timer.py:199:stop] epoch=0/micro_step=4900/global_step=2450, RunningAvgSamplesPerSec=12.096874765739255, CurrSamplesPerSec=12.224162994513424, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:50:57,027] [INFO] [logging.py:96:log_dist] [Rank 0] step=2460, skipped=48, lr=[0.000754881042133307], mom=[(0.9, 0.95)] [2023-04-18 21:50:57,027] [INFO] [timer.py:199:stop] epoch=0/micro_step=4920/global_step=2460, RunningAvgSamplesPerSec=12.097751595057375, CurrSamplesPerSec=12.319674436398856, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:51:23,049] [INFO] [logging.py:96:log_dist] [Rank 0] step=2470, skipped=48, lr=[0.0007530312873771939], mom=[(0.9, 0.95)] [2023-04-18 21:51:23,049] [INFO] [timer.py:199:stop] epoch=0/micro_step=4940/global_step=2470, RunningAvgSamplesPerSec=12.098597703735782, CurrSamplesPerSec=12.286765624010185, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:51:50,824] [INFO] [logging.py:96:log_dist] [Rank 0] step=2480, skipped=48, lr=[0.0007511768655475642], mom=[(0.9, 0.95)] [2023-04-18 21:51:50,825] [INFO] [timer.py:199:stop] epoch=0/micro_step=4960/global_step=2480, RunningAvgSamplesPerSec=12.09619853805542, CurrSamplesPerSec=12.285500383573691, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:52:16,859] [INFO] [logging.py:96:log_dist] [Rank 0] step=2490, skipped=48, lr=[0.000749317810848579], mom=[(0.9, 0.95)] 
[2023-04-18 21:52:16,859] [INFO] [timer.py:199:stop] epoch=0/micro_step=4980/global_step=2490, RunningAvgSamplesPerSec=12.097020085088424, CurrSamplesPerSec=12.356930387837599, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:52:42,844] [INFO] [logging.py:96:log_dist] [Rank 0] step=2500, skipped=48, lr=[0.000747454157569852], mom=[(0.9, 0.95)] [2023-04-18 21:52:42,845] [INFO] [timer.py:199:stop] epoch=0/micro_step=5000/global_step=2500, RunningAvgSamplesPerSec=12.097924605381774, CurrSamplesPerSec=12.334972877672348, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:53:08,848] [INFO] [logging.py:96:log_dist] [Rank 0] step=2510, skipped=48, lr=[0.000745585940085815], mom=[(0.9, 0.95)] [2023-04-18 21:53:08,848] [INFO] [timer.py:199:stop] epoch=0/micro_step=5020/global_step=2510, RunningAvgSamplesPerSec=12.098789121844082, CurrSamplesPerSec=12.273174284128155, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:53:36,463] [INFO] [logging.py:96:log_dist] [Rank 0] step=2520, skipped=48, lr=[0.0007437131928550854], mom=[(0.9, 0.95)] [2023-04-18 21:53:36,463] [INFO] [timer.py:199:stop] epoch=0/micro_step=5040/global_step=2520, RunningAvgSamplesPerSec=12.096721637799837, CurrSamplesPerSec=12.323031586864612, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:54:02,460] [INFO] [logging.py:96:log_dist] [Rank 0] step=2530, skipped=48, lr=[0.0007418359504198308], mom=[(0.9, 0.95)] [2023-04-18 21:54:02,460] [INFO] [timer.py:199:stop] epoch=0/micro_step=5060/global_step=2530, RunningAvgSamplesPerSec=12.097596040117761, CurrSamplesPerSec=12.371645279697752, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:54:28,467] [INFO] [logging.py:96:log_dist] [Rank 0] step=2540, skipped=48, lr=[0.0007399542474051303], mom=[(0.9, 0.95)] [2023-04-18 21:54:28,468] [INFO] [timer.py:199:stop] epoch=0/micro_step=5080/global_step=2540, RunningAvgSamplesPerSec=12.09844447777006, CurrSamplesPerSec=12.346168433030194, MemAllocated=2.0GB, 
MaxMemAllocated=13.66GB [2023-04-18 21:54:55,288] [INFO] [loss_scaler.py:188:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 [2023-04-18 21:54:55,288] [INFO] [logging.py:96:log_dist] [Rank 0] step=2550, skipped=49, lr=[0.0007382569295804118], mom=[(0.9, 0.95)] [2023-04-18 21:54:55,289] [INFO] [timer.py:199:stop] epoch=0/micro_step=5100/global_step=2550, RunningAvgSamplesPerSec=12.097826253048906, CurrSamplesPerSec=12.648037933070938, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:54:57,837] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, reducing to 65536 [2023-04-18 21:55:22,142] [INFO] [logging.py:96:log_dist] [Rank 0] step=2560, skipped=50, lr=[0.0007365560521564695], mom=[(0.9, 0.95)] [2023-04-18 21:55:22,143] [INFO] [timer.py:199:stop] epoch=0/micro_step=5120/global_step=2560, RunningAvgSamplesPerSec=12.09715407293771, CurrSamplesPerSec=12.313946170024511, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:55:48,167] [INFO] [logging.py:96:log_dist] [Rank 0] step=2570, skipped=50, lr=[0.000734662044422606], mom=[(0.9, 0.95)] [2023-04-18 21:55:48,167] [INFO] [timer.py:199:stop] epoch=0/micro_step=5140/global_step=2570, RunningAvgSamplesPerSec=12.097963777137299, CurrSamplesPerSec=12.326186803575187, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:56:14,177] [INFO] [logging.py:96:log_dist] [Rank 0] step=2580, skipped=50, lr=[0.0007327637084294817], mom=[(0.9, 0.95)] [2023-04-18 21:56:14,177] [INFO] [timer.py:199:stop] epoch=0/micro_step=5160/global_step=2580, RunningAvgSamplesPerSec=12.098794132776487, CurrSamplesPerSec=12.282036645415388, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:56:41,084] [INFO] [logging.py:96:log_dist] [Rank 0] step=2590, skipped=50, lr=[0.0007308610791912389], mom=[(0.9, 0.95)] [2023-04-18 21:56:41,084] [INFO] [timer.py:199:stop] 
epoch=0/micro_step=5180/global_step=2590, RunningAvgSamplesPerSec=12.09803236466467, CurrSamplesPerSec=12.374917861037332, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:57:07,988] [INFO] [logging.py:96:log_dist] [Rank 0] step=2600, skipped=50, lr=[0.0007289541918012085], mom=[(0.9, 0.95)] [2023-04-18 21:57:07,988] [INFO] [timer.py:199:stop] epoch=0/micro_step=5200/global_step=2600, RunningAvgSamplesPerSec=12.09728144457654, CurrSamplesPerSec=12.3162524336756, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:57:33,989] [INFO] [logging.py:96:log_dist] [Rank 0] step=2610, skipped=50, lr=[0.0007270430814312603], mom=[(0.9, 0.95)] [2023-04-18 21:57:33,990] [INFO] [timer.py:199:stop] epoch=0/micro_step=5220/global_step=2610, RunningAvgSamplesPerSec=12.09811845368897, CurrSamplesPerSec=12.34633310827095, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:57:59,988] [INFO] [logging.py:96:log_dist] [Rank 0] step=2620, skipped=50, lr=[0.0007251277833311564], mom=[(0.9, 0.95)] [2023-04-18 21:57:59,988] [INFO] [timer.py:199:stop] epoch=0/micro_step=5240/global_step=2620, RunningAvgSamplesPerSec=12.09895530660416, CurrSamplesPerSec=12.28904483883426, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:58:27,765] [INFO] [logging.py:96:log_dist] [Rank 0] step=2630, skipped=50, lr=[0.0007232083328278998], mom=[(0.9, 0.95)] [2023-04-18 21:58:27,765] [INFO] [timer.py:199:stop] epoch=0/micro_step=5260/global_step=2630, RunningAvgSamplesPerSec=12.096689818851262, CurrSamplesPerSec=12.33802533097759, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:58:53,779] [INFO] [logging.py:96:log_dist] [Rank 0] step=2640, skipped=50, lr=[0.0007212847653250828], mom=[(0.9, 0.95)] [2023-04-18 21:58:53,779] [INFO] [timer.py:199:stop] epoch=0/micro_step=5280/global_step=2640, RunningAvgSamplesPerSec=12.097498999530545, CurrSamplesPerSec=12.287501270242633, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:59:19,760] [INFO] 
[logging.py:96:log_dist] [Rank 0] step=2650, skipped=50, lr=[0.0007193571163022348], mom=[(0.9, 0.95)] [2023-04-18 21:59:19,760] [INFO] [timer.py:199:stop] epoch=0/micro_step=5300/global_step=2650, RunningAvgSamplesPerSec=12.098357580153264, CurrSamplesPerSec=12.338584506347603, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 21:59:24,888] [INFO] [loss_scaler.py:188:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 [2023-04-18 21:59:27,436] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, reducing to 65536 [2023-04-18 21:59:45,695] [INFO] [logging.py:96:log_dist] [Rank 0] step=2660, skipped=52, lr=[0.0007178120822798547], mom=[(0.9, 0.95)] [2023-04-18 21:59:45,695] [INFO] [timer.py:199:stop] epoch=0/micro_step=5320/global_step=2660, RunningAvgSamplesPerSec=12.099289935170088, CurrSamplesPerSec=12.302661221945709, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:00:13,336] [INFO] [logging.py:96:log_dist] [Rank 0] step=2670, skipped=52, lr=[0.0007158771761692464], mom=[(0.9, 0.95)] [2023-04-18 22:00:13,337] [INFO] [timer.py:199:stop] epoch=0/micro_step=5340/global_step=2670, RunningAvgSamplesPerSec=12.097290189451588, CurrSamplesPerSec=12.370086592993575, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:00:39,346] [INFO] [logging.py:96:log_dist] [Rank 0] step=2680, skipped=52, lr=[0.0007139382882796963], mom=[(0.9, 0.95)] [2023-04-18 22:00:39,346] [INFO] [timer.py:199:stop] epoch=0/micro_step=5360/global_step=2680, RunningAvgSamplesPerSec=12.09809199876214, CurrSamplesPerSec=12.319223260834827, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:01:05,406] [INFO] [logging.py:96:log_dist] [Rank 0] step=2690, skipped=52, lr=[0.0007119954543733125], mom=[(0.9, 0.95)] [2023-04-18 22:01:05,407] [INFO] [timer.py:199:stop] epoch=0/micro_step=5380/global_step=2690, 
RunningAvgSamplesPerSec=12.098801324932028, CurrSamplesPerSec=12.30902691927592, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:01:31,410] [INFO] [logging.py:96:log_dist] [Rank 0] step=2700, skipped=52, lr=[0.0007100487102849861], mom=[(0.9, 0.95)] [2023-04-18 22:01:31,411] [INFO] [timer.py:199:stop] epoch=0/micro_step=5400/global_step=2700, RunningAvgSamplesPerSec=12.099601105812914, CurrSamplesPerSec=12.315879485858842, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:01:59,869] [INFO] [logging.py:96:log_dist] [Rank 0] step=2710, skipped=52, lr=[0.0007080980919217304], mom=[(0.9, 0.95)] [2023-04-18 22:01:59,870] [INFO] [timer.py:199:stop] epoch=0/micro_step=5420/global_step=2710, RunningAvgSamplesPerSec=12.09624806633701, CurrSamplesPerSec=12.294493179289574, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:02:25,864] [INFO] [logging.py:96:log_dist] [Rank 0] step=2720, skipped=52, lr=[0.0007061436352620186], mom=[(0.9, 0.95)] [2023-04-18 22:02:25,864] [INFO] [timer.py:199:stop] epoch=0/micro_step=5440/global_step=2720, RunningAvgSamplesPerSec=12.097067518504637, CurrSamplesPerSec=12.335925189830512, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:02:51,900] [INFO] [logging.py:96:log_dist] [Rank 0] step=2730, skipped=52, lr=[0.000704185376355119], mom=[(0.9, 0.95)] [2023-04-18 22:02:51,900] [INFO] [timer.py:199:stop] epoch=0/micro_step=5460/global_step=2730, RunningAvgSamplesPerSec=12.097810881376436, CurrSamplesPerSec=12.282116443426133, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:03:17,896] [INFO] [logging.py:96:log_dist] [Rank 0] step=2740, skipped=52, lr=[0.0007022233513204321], mom=[(0.9, 0.95)] [2023-04-18 22:03:17,897] [INFO] [timer.py:199:stop] epoch=0/micro_step=5480/global_step=2740, RunningAvgSamplesPerSec=12.098614344849352, CurrSamplesPerSec=12.329702675697334, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:03:46,540] [INFO] [logging.py:96:log_dist] [Rank 0] step=2750, skipped=52, 
lr=[0.0007002575963468225], mom=[(0.9, 0.95)] [2023-04-18 22:03:46,541] [INFO] [timer.py:199:stop] epoch=0/micro_step=5500/global_step=2750, RunningAvgSamplesPerSec=12.095007210713083, CurrSamplesPerSec=12.273400990283255, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:03:56,868] [INFO] [loss_scaler.py:188:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 [2023-04-18 22:03:59,418] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, reducing to 65536 [2023-04-18 22:04:12,437] [INFO] [logging.py:96:log_dist] [Rank 0] step=2760, skipped=54, lr=[0.0006986823311747652], mom=[(0.9, 0.95)] [2023-04-18 22:04:12,437] [INFO] [timer.py:199:stop] epoch=0/micro_step=5520/global_step=2760, RunningAvgSamplesPerSec=12.09598129202422, CurrSamplesPerSec=12.365019870320788, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:04:38,440] [INFO] [logging.py:96:log_dist] [Rank 0] step=2770, skipped=54, lr=[0.0006967099537262091], mom=[(0.9, 0.95)] [2023-04-18 22:04:38,440] [INFO] [timer.py:199:stop] epoch=0/micro_step=5540/global_step=2770, RunningAvgSamplesPerSec=12.096772373798894, CurrSamplesPerSec=12.245890550067125, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:05:04,459] [INFO] [logging.py:96:log_dist] [Rank 0] step=2780, skipped=54, lr=[0.000694733948031419], mom=[(0.9, 0.95)] [2023-04-18 22:05:04,460] [INFO] [timer.py:199:stop] epoch=0/micro_step=5560/global_step=2780, RunningAvgSamplesPerSec=12.097530877052161, CurrSamplesPerSec=12.247746671211058, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:05:33,049] [INFO] [logging.py:96:log_dist] [Rank 0] step=2790, skipped=54, lr=[0.0006927543505371281], mom=[(0.9, 0.95)] [2023-04-18 22:05:33,049] [INFO] [timer.py:199:stop] epoch=0/micro_step=5580/global_step=2790, RunningAvgSamplesPerSec=12.094069688678747, CurrSamplesPerSec=12.300075980968742, 
MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:05:59,087] [INFO] [logging.py:96:log_dist] [Rank 0] step=2800, skipped=54, lr=[0.0006907711977563193], mom=[(0.9, 0.95)] [2023-04-18 22:05:59,088] [INFO] [timer.py:199:stop] epoch=0/micro_step=5600/global_step=2800, RunningAvgSamplesPerSec=12.094800887672694, CurrSamplesPerSec=12.263569418807345, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:06:25,122] [INFO] [logging.py:96:log_dist] [Rank 0] step=2810, skipped=54, lr=[0.0006887845262675514], mom=[(0.9, 0.95)] [2023-04-18 22:06:25,122] [INFO] [timer.py:199:stop] epoch=0/micro_step=5620/global_step=2810, RunningAvgSamplesPerSec=12.095533846042732, CurrSamplesPerSec=12.295460651981914, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:06:52,045] [INFO] [logging.py:96:log_dist] [Rank 0] step=2820, skipped=54, lr=[0.0006867943727142845], mom=[(0.9, 0.95)] [2023-04-18 22:06:52,046] [INFO] [timer.py:199:stop] epoch=0/micro_step=5640/global_step=2820, RunningAvgSamplesPerSec=12.094819299396498, CurrSamplesPerSec=12.330995163617676, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:07:19,609] [INFO] [logging.py:96:log_dist] [Rank 0] step=2830, skipped=54, lr=[0.0006848007738042039], mom=[(0.9, 0.95)] [2023-04-18 22:07:19,609] [INFO] [timer.py:199:stop] epoch=0/micro_step=5660/global_step=2830, RunningAvgSamplesPerSec=12.093076104941565, CurrSamplesPerSec=12.312269840501026, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:07:45,625] [INFO] [logging.py:96:log_dist] [Rank 0] step=2840, skipped=54, lr=[0.0006828037663085442], mom=[(0.9, 0.95)] [2023-04-18 22:07:45,626] [INFO] [timer.py:199:stop] epoch=0/micro_step=5680/global_step=2840, RunningAvgSamplesPerSec=12.093836445158745, CurrSamplesPerSec=12.28603006585797, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:08:11,673] [INFO] [logging.py:96:log_dist] [Rank 0] step=2850, skipped=54, lr=[0.0006808033870614091], mom=[(0.9, 0.95)] [2023-04-18 22:08:11,673] [INFO] 
[timer.py:199:stop] epoch=0/micro_step=5700/global_step=2850, RunningAvgSamplesPerSec=12.094540460223373, CurrSamplesPerSec=12.277152958517394, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:08:27,225] [INFO] [loss_scaler.py:188:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 [2023-04-18 22:08:29,773] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, reducing to 65536 [2023-04-18 22:08:38,496] [INFO] [logging.py:96:log_dist] [Rank 0] step=2860, skipped=56, lr=[0.0006792006807948769], mom=[(0.9, 0.95)] [2023-04-18 22:08:38,497] [INFO] [timer.py:199:stop] epoch=0/micro_step=5720/global_step=2860, RunningAvgSamplesPerSec=12.093999327292194, CurrSamplesPerSec=12.27884669835508, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:09:05,386] [INFO] [logging.py:96:log_dist] [Rank 0] step=2870, skipped=56, lr=[0.0006771943254151074], mom=[(0.9, 0.95)] [2023-04-18 22:09:05,386] [INFO] [timer.py:199:stop] epoch=0/micro_step=5740/global_step=2870, RunningAvgSamplesPerSec=12.09335749176107, CurrSamplesPerSec=12.321705701056516, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:09:31,381] [INFO] [logging.py:96:log_dist] [Rank 0] step=2880, skipped=56, lr=[0.0006751847017480361], mom=[(0.9, 0.95)] [2023-04-18 22:09:31,382] [INFO] [timer.py:199:stop] epoch=0/micro_step=5760/global_step=2880, RunningAvgSamplesPerSec=12.094138950902172, CurrSamplesPerSec=12.307822550509739, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:09:57,374] [INFO] [logging.py:96:log_dist] [Rank 0] step=2890, skipped=56, lr=[0.0006731718468604684], mom=[(0.9, 0.95)] [2023-04-18 22:09:57,375] [INFO] [timer.py:199:stop] epoch=0/micro_step=5780/global_step=2890, RunningAvgSamplesPerSec=12.094919057881949, CurrSamplesPerSec=12.374180837396436, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:10:25,185] [INFO] 
[logging.py:96:log_dist] [Rank 0] step=2900, skipped=56, lr=[0.0006711557978788075], mom=[(0.9, 0.95)] [2023-04-18 22:10:25,185] [INFO] [timer.py:199:stop] epoch=0/micro_step=5800/global_step=2900, RunningAvgSamplesPerSec=12.092827073768476, CurrSamplesPerSec=9.12083179697291, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:10:51,172] [INFO] [logging.py:96:log_dist] [Rank 0] step=2910, skipped=56, lr=[0.0006691365919883716], mom=[(0.9, 0.95)] [2023-04-18 22:10:51,173] [INFO] [timer.py:199:stop] epoch=0/micro_step=5820/global_step=2910, RunningAvgSamplesPerSec=12.093614641746187, CurrSamplesPerSec=12.270814568415108, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:11:17,175] [INFO] [logging.py:96:log_dist] [Rank 0] step=2920, skipped=56, lr=[0.0006671142664327063], mom=[(0.9, 0.95)] [2023-04-18 22:11:17,176] [INFO] [timer.py:199:stop] epoch=0/micro_step=5840/global_step=2920, RunningAvgSamplesPerSec=12.094374319325368, CurrSamplesPerSec=12.28362718166888, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:11:43,222] [INFO] [logging.py:96:log_dist] [Rank 0] step=2930, skipped=56, lr=[0.0006650888585128983], mom=[(0.9, 0.95)] [2023-04-18 22:11:43,223] [INFO] [timer.py:199:stop] epoch=0/micro_step=5860/global_step=2930, RunningAvgSamplesPerSec=12.095058863027537, CurrSamplesPerSec=12.285213631966773, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:12:10,886] [INFO] [logging.py:96:log_dist] [Rank 0] step=2940, skipped=56, lr=[0.0006630604055868882], mom=[(0.9, 0.95)] [2023-04-18 22:12:10,887] [INFO] [timer.py:199:stop] epoch=0/micro_step=5880/global_step=2940, RunningAvgSamplesPerSec=12.09322302072152, CurrSamplesPerSec=12.281535402751873, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:12:36,904] [INFO] [logging.py:96:log_dist] [Rank 0] step=2950, skipped=56, lr=[0.0006610289450687796], mom=[(0.9, 0.95)] [2023-04-18 22:12:36,904] [INFO] [timer.py:199:stop] epoch=0/micro_step=5900/global_step=2950, 
RunningAvgSamplesPerSec=12.093952414310664, CurrSamplesPerSec=12.273780349017647, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:12:57,633] [INFO] [loss_scaler.py:188:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 [2023-04-18 22:13:00,187] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, reducing to 65536 [2023-04-18 22:13:02,809] [INFO] [logging.py:96:log_dist] [Rank 0] step=2960, skipped=58, lr=[0.000659401636365692], mom=[(0.9, 0.95)] [2023-04-18 22:13:02,810] [INFO] [timer.py:199:stop] epoch=0/micro_step=5920/global_step=2960, RunningAvgSamplesPerSec=12.094849727248075, CurrSamplesPerSec=12.21531944574894, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:13:28,805] [INFO] [logging.py:96:log_dist] [Rank 0] step=2970, skipped=58, lr=[0.0006573648566419807], mom=[(0.9, 0.95)] [2023-04-18 22:13:28,805] [INFO] [timer.py:199:stop] epoch=0/micro_step=5940/global_step=2970, RunningAvgSamplesPerSec=12.095602614073696, CurrSamplesPerSec=12.297577459007382, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:13:56,297] [INFO] [logging.py:96:log_dist] [Rank 0] step=2980, skipped=58, lr=[0.0006553251743785773], mom=[(0.9, 0.95)] [2023-04-18 22:13:56,297] [INFO] [timer.py:199:stop] epoch=0/micro_step=5960/global_step=2980, RunningAvgSamplesPerSec=12.094053818509986, CurrSamplesPerSec=12.279664533135664, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:14:22,341] [INFO] [logging.py:96:log_dist] [Rank 0] step=2990, skipped=58, lr=[0.0006532826271967075], mom=[(0.9, 0.95)] [2023-04-18 22:14:22,342] [INFO] [timer.py:199:stop] epoch=0/micro_step=5980/global_step=2990, RunningAvgSamplesPerSec=12.094729377638641, CurrSamplesPerSec=12.2580879908524, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:14:48,368] [INFO] [logging.py:96:log_dist] [Rank 0] step=3000, skipped=58, 
lr=[0.0006512372527704386], mom=[(0.9, 0.95)] [2023-04-18 22:14:48,369] [INFO] [timer.py:199:stop] epoch=0/micro_step=6000/global_step=3000, RunningAvgSamplesPerSec=12.095427362410723, CurrSamplesPerSec=12.344998797395384, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:15:14,344] [INFO] [logging.py:96:log_dist] [Rank 0] step=3010, skipped=58, lr=[0.0006491890888259864], mom=[(0.9, 0.95)] [2023-04-18 22:15:14,344] [INFO] [timer.py:199:stop] epoch=0/micro_step=6020/global_step=3010, RunningAvgSamplesPerSec=12.096198692836987, CurrSamplesPerSec=12.35631949626754, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:15:42,137] [INFO] [logging.py:96:log_dist] [Rank 0] step=3020, skipped=58, lr=[0.0006471381731410174], mom=[(0.9, 0.95)] [2023-04-18 22:15:42,137] [INFO] [timer.py:199:stop] epoch=0/micro_step=6040/global_step=3020, RunningAvgSamplesPerSec=12.09421270076189, CurrSamplesPerSec=12.30894564236506, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:16:08,164] [INFO] [logging.py:96:log_dist] [Rank 0] step=3030, skipped=58, lr=[0.0006450845435439539], mom=[(0.9, 0.95)] [2023-04-18 22:16:08,164] [INFO] [timer.py:199:stop] epoch=0/micro_step=6060/global_step=3030, RunningAvgSamplesPerSec=12.094905219271437, CurrSamplesPerSec=12.364360339413055, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:16:34,170] [INFO] [logging.py:96:log_dist] [Rank 0] step=3040, skipped=58, lr=[0.0006430282379132743], mom=[(0.9, 0.95)] [2023-04-18 22:16:34,170] [INFO] [timer.py:199:stop] epoch=0/micro_step=6080/global_step=3040, RunningAvgSamplesPerSec=12.095624381760985, CurrSamplesPerSec=12.274448212053965, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:17:00,188] [INFO] [logging.py:96:log_dist] [Rank 0] step=3050, skipped=58, lr=[0.0006409692941768166], mom=[(0.9, 0.95)] [2023-04-18 22:17:00,188] [INFO] [timer.py:199:stop] epoch=0/micro_step=6100/global_step=3050, RunningAvgSamplesPerSec=12.096321380263937, 
CurrSamplesPerSec=12.303192385054825, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:17:27,905] [INFO] [loss_scaler.py:188:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 [2023-04-18 22:17:27,906] [INFO] [logging.py:96:log_dist] [Rank 0] step=3060, skipped=59, lr=[0.0006391140206201444], mom=[(0.9, 0.95)] [2023-04-18 22:17:27,906] [INFO] [timer.py:199:stop] epoch=0/micro_step=6120/global_step=3060, RunningAvgSamplesPerSec=12.094472941606742, CurrSamplesPerSec=12.642375412753887, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:17:30,455] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, reducing to 65536 [2023-04-18 22:17:53,889] [INFO] [logging.py:96:log_dist] [Rank 0] step=3070, skipped=60, lr=[0.0006372566686762426], mom=[(0.9, 0.95)] [2023-04-18 22:17:53,889] [INFO] [timer.py:199:stop] epoch=0/micro_step=6140/global_step=3070, RunningAvgSamplesPerSec=12.095220652662178, CurrSamplesPerSec=12.32616529555425, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:18:19,958] [INFO] [logging.py:96:log_dist] [Rank 0] step=3080, skipped=60, lr=[0.0006351905404312025], mom=[(0.9, 0.95)] [2023-04-18 22:18:19,959] [INFO] [timer.py:199:stop] epoch=0/micro_step=6160/global_step=3080, RunningAvgSamplesPerSec=12.095835031604611, CurrSamplesPerSec=12.250009195420343, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:18:46,857] [INFO] [logging.py:96:log_dist] [Rank 0] step=3090, skipped=60, lr=[0.0006331219186439704], mom=[(0.9, 0.95)] [2023-04-18 22:18:46,857] [INFO] [timer.py:199:stop] epoch=0/micro_step=6180/global_step=3090, RunningAvgSamplesPerSec=12.095218671740566, CurrSamplesPerSec=12.303189001705169, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:19:13,749] [INFO] [logging.py:96:log_dist] [Rank 0] step=3100, skipped=60, lr=[0.000631050841469551], mom=[(0.9, 0.95)] 
[2023-04-18 22:19:13,750] [INFO] [timer.py:199:stop] epoch=0/micro_step=6200/global_step=3100, RunningAvgSamplesPerSec=12.094615204383699, CurrSamplesPerSec=12.25176479243989, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:19:39,800] [INFO] [logging.py:96:log_dist] [Rank 0] step=3110, skipped=60, lr=[0.0006289773471082381], mom=[(0.9, 0.95)] [2023-04-18 22:19:39,800] [INFO] [timer.py:199:stop] epoch=0/micro_step=6220/global_step=3110, RunningAvgSamplesPerSec=12.095253628358334, CurrSamplesPerSec=12.305531852861048, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:20:05,888] [INFO] [logging.py:96:log_dist] [Rank 0] step=3120, skipped=60, lr=[0.0006269014738049087], mom=[(0.9, 0.95)] [2023-04-18 22:20:05,888] [INFO] [timer.py:199:stop] epoch=0/micro_step=6240/global_step=3120, RunningAvgSamplesPerSec=12.095833854963692, CurrSamplesPerSec=12.328790960717686, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:20:32,783] [INFO] [logging.py:96:log_dist] [Rank 0] step=3130, skipped=60, lr=[0.0006248232598483196], mom=[(0.9, 0.95)] [2023-04-18 22:20:32,784] [INFO] [timer.py:199:stop] epoch=0/micro_step=6260/global_step=3130, RunningAvgSamplesPerSec=12.09522990132726, CurrSamplesPerSec=12.24402828963939, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:20:59,555] [INFO] [logging.py:96:log_dist] [Rank 0] step=3140, skipped=60, lr=[0.0006227427435703996], mom=[(0.9, 0.95)] [2023-04-18 22:20:59,556] [INFO] [timer.py:199:stop] epoch=0/micro_step=6280/global_step=3140, RunningAvgSamplesPerSec=12.09481001431695, CurrSamplesPerSec=12.355179781863225, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:21:25,580] [INFO] [logging.py:96:log_dist] [Rank 0] step=3150, skipped=60, lr=[0.0006206599633455432], mom=[(0.9, 0.95)] [2023-04-18 22:21:25,580] [INFO] [timer.py:199:stop] epoch=0/micro_step=6300/global_step=3150, RunningAvgSamplesPerSec=12.095477358482576, CurrSamplesPerSec=12.298572463614732, MemAllocated=2.0GB, 
MaxMemAllocated=13.66GB [2023-04-18 22:21:51,597] [INFO] [logging.py:96:log_dist] [Rank 0] step=3160, skipped=60, lr=[0.0006185749575899022], mom=[(0.9, 0.95)] [2023-04-18 22:21:51,598] [INFO] [timer.py:199:stop] epoch=0/micro_step=6320/global_step=3160, RunningAvgSamplesPerSec=12.096151492786587, CurrSamplesPerSec=12.34542574549578, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:21:56,732] [INFO] [loss_scaler.py:188:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 [2023-04-18 22:21:59,292] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, reducing to 65536 [2023-04-18 22:22:19,285] [INFO] [logging.py:96:log_dist] [Rank 0] step=3170, skipped=62, lr=[0.000616905376445103], mom=[(0.9, 0.95)] [2023-04-18 22:22:19,285] [INFO] [timer.py:199:stop] epoch=0/micro_step=6340/global_step=3170, RunningAvgSamplesPerSec=12.094411494307558, CurrSamplesPerSec=9.182663942608132, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:22:45,308] [INFO] [logging.py:96:log_dist] [Rank 0] step=3180, skipped=62, lr=[0.0006148164616733297], mom=[(0.9, 0.95)] [2023-04-18 22:22:45,309] [INFO] [timer.py:199:stop] epoch=0/micro_step=6360/global_step=3180, RunningAvgSamplesPerSec=12.095076133651567, CurrSamplesPerSec=12.266994718211706, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:23:11,281] [INFO] [logging.py:96:log_dist] [Rank 0] step=3190, skipped=62, lr=[0.0006127254291521148], mom=[(0.9, 0.95)] [2023-04-18 22:23:11,282] [INFO] [timer.py:199:stop] epoch=0/micro_step=6380/global_step=3190, RunningAvgSamplesPerSec=12.095808518603299, CurrSamplesPerSec=12.364907095800632, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:23:37,298] [INFO] [logging.py:96:log_dist] [Rank 0] step=3200, skipped=62, lr=[0.0006106323174498212], mom=[(0.9, 0.95)] [2023-04-18 22:23:37,298] [INFO] [timer.py:199:stop] 
epoch=0/micro_step=6400/global_step=3200, RunningAvgSamplesPerSec=12.096474622730566, CurrSamplesPerSec=12.291697487247674, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:24:05,088] [INFO] [logging.py:96:log_dist] [Rank 0] step=3210, skipped=62, lr=[0.0006085371651731612], mom=[(0.9, 0.95)] [2023-04-18 22:24:05,088] [INFO] [timer.py:199:stop] epoch=0/micro_step=6420/global_step=3210, RunningAvgSamplesPerSec=12.094607910597492, CurrSamplesPerSec=12.31327626122165, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:24:31,100] [INFO] [logging.py:96:log_dist] [Rank 0] step=3220, skipped=62, lr=[0.0006064400109664854], mom=[(0.9, 0.95)] [2023-04-18 22:24:31,100] [INFO] [timer.py:199:stop] epoch=0/micro_step=6440/global_step=3220, RunningAvgSamplesPerSec=12.095279517210809, CurrSamplesPerSec=12.326039644932049, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:24:57,084] [INFO] [logging.py:96:log_dist] [Rank 0] step=3230, skipped=62, lr=[0.0006043408935110688], mom=[(0.9, 0.95)] [2023-04-18 22:24:57,084] [INFO] [timer.py:199:stop] epoch=0/micro_step=6460/global_step=3230, RunningAvgSamplesPerSec=12.09598678314621, CurrSamplesPerSec=12.31067526600204, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:25:23,143] [INFO] [logging.py:96:log_dist] [Rank 0] step=3240, skipped=62, lr=[0.000602239851524398], mom=[(0.9, 0.95)] [2023-04-18 22:25:23,144] [INFO] [timer.py:199:stop] epoch=0/micro_step=6480/global_step=3240, RunningAvgSamplesPerSec=12.096583636145708, CurrSamplesPerSec=12.271709874386106, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:25:50,772] [INFO] [logging.py:96:log_dist] [Rank 0] step=3250, skipped=62, lr=[0.0006001369237594575], mom=[(0.9, 0.95)] [2023-04-18 22:25:50,772] [INFO] [timer.py:199:stop] epoch=0/micro_step=6500/global_step=3250, RunningAvgSamplesPerSec=12.094967320624358, CurrSamplesPerSec=12.319777340717003, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:26:16,782] [INFO] 
[logging.py:96:log_dist] [Rank 0] step=3260, skipped=62, lr=[0.0005980321490040129], mom=[(0.9, 0.95)] [2023-04-18 22:26:16,782] [INFO] [timer.py:199:stop] epoch=0/micro_step=6520/global_step=3260, RunningAvgSamplesPerSec=12.095633050361704, CurrSamplesPerSec=12.342242513630078, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:26:27,089] [INFO] [loss_scaler.py:188:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 [2023-04-18 22:26:29,643] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, reducing to 65536 [2023-04-18 22:26:42,648] [INFO] [logging.py:96:log_dist] [Rank 0] step=3270, skipped=64, lr=[0.0005963470254536404], mom=[(0.9, 0.95)] [2023-04-18 22:26:42,649] [INFO] [timer.py:199:stop] epoch=0/micro_step=6540/global_step=3270, RunningAvgSamplesPerSec=12.096494771224695, CurrSamplesPerSec=12.309983132414535, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:27:08,639] [INFO] [logging.py:96:log_dist] [Rank 0] step=3280, skipped=64, lr=[0.0005942390239687457], mom=[(0.9, 0.95)] [2023-04-18 22:27:08,639] [INFO] [timer.py:199:stop] epoch=0/micro_step=6560/global_step=3280, RunningAvgSamplesPerSec=12.097178488683229, CurrSamplesPerSec=12.337348263536176, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:27:36,106] [INFO] [logging.py:96:log_dist] [Rank 0] step=3290, skipped=64, lr=[0.0005921292842780444], mom=[(0.9, 0.95)] [2023-04-18 22:27:36,107] [INFO] [timer.py:199:stop] epoch=0/micro_step=6580/global_step=3290, RunningAvgSamplesPerSec=12.095804256494391, CurrSamplesPerSec=12.323985450803171, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:28:02,081] [INFO] [logging.py:96:log_dist] [Rank 0] step=3300, skipped=64, lr=[0.0005900178452949463], mom=[(0.9, 0.95)] [2023-04-18 22:28:02,082] [INFO] [timer.py:199:stop] epoch=0/micro_step=6600/global_step=3300, 
RunningAvgSamplesPerSec=12.0965071454309, CurrSamplesPerSec=12.316957705831896, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:28:28,097] [INFO] [logging.py:96:log_dist] [Rank 0] step=3310, skipped=64, lr=[0.0005879047459642041], mom=[(0.9, 0.95)] [2023-04-18 22:28:28,097] [INFO] [timer.py:199:stop] epoch=0/micro_step=6620/global_step=3310, RunningAvgSamplesPerSec=12.097150060732012, CurrSamplesPerSec=12.320245520525416, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:28:54,125] [INFO] [logging.py:96:log_dist] [Rank 0] step=3320, skipped=64, lr=[0.0005857900252611959], mom=[(0.9, 0.95)] [2023-04-18 22:28:54,125] [INFO] [timer.py:199:stop] epoch=0/micro_step=6640/global_step=3320, RunningAvgSamplesPerSec=12.097771130183727, CurrSamplesPerSec=12.30722327580879, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:29:21,902] [INFO] [logging.py:96:log_dist] [Rank 0] step=3330, skipped=64, lr=[0.0005836737221912041], mom=[(0.9, 0.95)] [2023-04-18 22:29:21,903] [INFO] [timer.py:199:stop] epoch=0/micro_step=6660/global_step=3330, RunningAvgSamplesPerSec=12.095985879158798, CurrSamplesPerSec=12.373958378469576, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:29:47,896] [INFO] [logging.py:96:log_dist] [Rank 0] step=3340, skipped=64, lr=[0.0005815558757886985], mom=[(0.9, 0.95)] [2023-04-18 22:29:47,896] [INFO] [timer.py:199:stop] epoch=0/micro_step=6680/global_step=3340, RunningAvgSamplesPerSec=12.096654227916636, CurrSamplesPerSec=12.318677145153773, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:30:13,875] [INFO] [logging.py:96:log_dist] [Rank 0] step=3350, skipped=64, lr=[0.000579436525116614], mom=[(0.9, 0.95)] [2023-04-18 22:30:13,875] [INFO] [timer.py:199:stop] epoch=0/micro_step=6700/global_step=3350, RunningAvgSamplesPerSec=12.097339238345093, CurrSamplesPerSec=12.314707672590954, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:30:40,731] [INFO] [logging.py:96:log_dist] [Rank 0] step=3360, skipped=64, 
lr=[0.0005773157092656323], mom=[(0.9, 0.95)] [2023-04-18 22:30:40,731] [INFO] [timer.py:199:stop] epoch=0/micro_step=6720/global_step=3360, RunningAvgSamplesPerSec=12.096825049820989, CurrSamplesPerSec=12.322867532623972, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:30:57,187] [INFO] [loss_scaler.py:188:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 [2023-04-18 22:30:59,739] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, reducing to 65536 [2023-04-18 22:31:07,552] [INFO] [logging.py:96:log_dist] [Rank 0] step=3370, skipped=66, lr=[0.0005756180279422497], mom=[(0.9, 0.95)] [2023-04-18 22:31:07,552] [INFO] [timer.py:199:stop] epoch=0/micro_step=6740/global_step=3370, RunningAvgSamplesPerSec=12.096362595271396, CurrSamplesPerSec=12.342191440910414, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:31:33,539] [INFO] [logging.py:96:log_dist] [Rank 0] step=3380, skipped=66, lr=[0.0005734946733635717], mom=[(0.9, 0.95)] [2023-04-18 22:31:33,539] [INFO] [timer.py:199:stop] epoch=0/micro_step=6760/global_step=3380, RunningAvgSamplesPerSec=12.097030934588311, CurrSamplesPerSec=12.296673868401925, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:31:59,572] [INFO] [logging.py:96:log_dist] [Rank 0] step=3390, skipped=66, lr=[0.0005713699632013718], mom=[(0.9, 0.95)] [2023-04-18 22:31:59,572] [INFO] [timer.py:199:stop] epoch=0/micro_step=6780/global_step=3390, RunningAvgSamplesPerSec=12.097633748075753, CurrSamplesPerSec=12.28654629699727, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:32:26,474] [INFO] [logging.py:96:log_dist] [Rank 0] step=3400, skipped=66, lr=[0.0005692439366451853], mom=[(0.9, 0.95)] [2023-04-18 22:32:26,474] [INFO] [timer.py:199:stop] epoch=0/micro_step=6800/global_step=3400, RunningAvgSamplesPerSec=12.0970633129876, CurrSamplesPerSec=12.29404947673837, 
MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:32:53,362] [INFO] [logging.py:96:log_dist] [Rank 0] step=3410, skipped=66, lr=[0.0005671166329088278], mom=[(0.9, 0.95)] [2023-04-18 22:32:53,363] [INFO] [timer.py:199:stop] epoch=0/micro_step=6820/global_step=3410, RunningAvgSamplesPerSec=12.096514371266295, CurrSamplesPerSec=12.321498698564422, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:33:19,364] [INFO] [logging.py:96:log_dist] [Rank 0] step=3420, skipped=66, lr=[0.0005649880912296718], mom=[(0.9, 0.95)] [2023-04-18 22:33:19,364] [INFO] [timer.py:199:stop] epoch=0/micro_step=6840/global_step=3420, RunningAvgSamplesPerSec=12.097154969288791, CurrSamplesPerSec=12.291907992476368, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:33:45,373] [INFO] [logging.py:96:log_dist] [Rank 0] step=3430, skipped=66, lr=[0.0005628583508679233], mom=[(0.9, 0.95)] [2023-04-18 22:33:45,373] [INFO] [timer.py:199:stop] epoch=0/micro_step=6860/global_step=3430, RunningAvgSamplesPerSec=12.097782204674772, CurrSamplesPerSec=12.266455465350424, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:34:13,040] [INFO] [logging.py:96:log_dist] [Rank 0] step=3440, skipped=66, lr=[0.0005607274511058983], mom=[(0.9, 0.95)] [2023-04-18 22:34:13,040] [INFO] [timer.py:199:stop] epoch=0/micro_step=6880/global_step=3440, RunningAvgSamplesPerSec=12.096200650373401, CurrSamplesPerSec=9.54321267148495, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:34:39,060] [INFO] [logging.py:96:log_dist] [Rank 0] step=3450, skipped=66, lr=[0.0005585954312472963], mom=[(0.9, 0.95)] [2023-04-18 22:34:39,061] [INFO] [timer.py:199:stop] epoch=0/micro_step=6900/global_step=3450, RunningAvgSamplesPerSec=12.096812883359435, CurrSamplesPerSec=12.260287141479642, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:35:05,116] [INFO] [logging.py:96:log_dist] [Rank 0] step=3460, skipped=66, lr=[0.0005564623306164776], mom=[(0.9, 0.95)] [2023-04-18 22:35:05,117] [INFO] 
[timer.py:199:stop] epoch=0/micro_step=6920/global_step=3460, RunningAvgSamplesPerSec=12.097373376740583, CurrSamplesPerSec=12.31668643775155, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:35:25,836] [INFO] [loss_scaler.py:188:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 [2023-04-18 22:35:28,390] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, reducing to 65536 [2023-04-18 22:35:31,004] [INFO] [logging.py:96:log_dist] [Rank 0] step=3470, skipped=68, lr=[0.0005547550983945411], mom=[(0.9, 0.95)] [2023-04-18 22:35:31,004] [INFO] [timer.py:199:stop] epoch=0/micro_step=6940/global_step=3470, RunningAvgSamplesPerSec=12.098154840531942, CurrSamplesPerSec=12.256537638444282, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:35:58,769] [INFO] [logging.py:96:log_dist] [Rank 0] step=3480, skipped=68, lr=[0.0005526201515342868], mom=[(0.9, 0.95)] [2023-04-18 22:35:58,770] [INFO] [timer.py:199:stop] epoch=0/micro_step=6960/global_step=3480, RunningAvgSamplesPerSec=12.096460951137878, CurrSamplesPerSec=12.351131060491658, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:36:24,803] [INFO] [logging.py:96:log_dist] [Rank 0] step=3490, skipped=68, lr=[0.0005504842341137581], mom=[(0.9, 0.95)] [2023-04-18 22:36:24,804] [INFO] [timer.py:199:stop] epoch=0/micro_step=6980/global_step=3490, RunningAvgSamplesPerSec=12.097046499570448, CurrSamplesPerSec=12.312274358296001, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:36:50,801] [INFO] [logging.py:96:log_dist] [Rank 0] step=3500, skipped=68, lr=[0.0005483473855292043], mom=[(0.9, 0.95)] [2023-04-18 22:36:50,801] [INFO] [timer.py:199:stop] epoch=0/micro_step=7000/global_step=3500, RunningAvgSamplesPerSec=12.097677065365756, CurrSamplesPerSec=12.302059067014818, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:37:16,815] [INFO] 
[logging.py:96:log_dist] [Rank 0] step=3510, skipped=68, lr=[0.0005462096451940493], mom=[(0.9, 0.95)] [2023-04-18 22:37:16,815] [INFO] [timer.py:199:stop] epoch=0/micro_step=7020/global_step=3510, RunningAvgSamplesPerSec=12.098282550886656, CurrSamplesPerSec=12.326363398585245, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:37:44,432] [INFO] [logging.py:96:log_dist] [Rank 0] step=3520, skipped=68, lr=[0.0005440710525381657], mom=[(0.9, 0.95)] [2023-04-18 22:37:44,432] [INFO] [timer.py:199:stop] epoch=0/micro_step=7040/global_step=3520, RunningAvgSamplesPerSec=12.09680003467909, CurrSamplesPerSec=12.34261024968966, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:38:10,459] [INFO] [logging.py:96:log_dist] [Rank 0] step=3530, skipped=68, lr=[0.0005419316470071457], mom=[(0.9, 0.95)] [2023-04-18 22:38:10,459] [INFO] [timer.py:199:stop] epoch=0/micro_step=7060/global_step=3530, RunningAvgSamplesPerSec=12.097386913318063, CurrSamplesPerSec=12.316193664641675, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:38:36,457] [INFO] [logging.py:96:log_dist] [Rank 0] step=3540, skipped=68, lr=[0.0005397914680615759], mom=[(0.9, 0.95)] [2023-04-18 22:38:36,457] [INFO] [timer.py:199:stop] epoch=0/micro_step=7080/global_step=3540, RunningAvgSamplesPerSec=12.09800904706782, CurrSamplesPerSec=12.378179616863862, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:39:02,442] [INFO] [logging.py:96:log_dist] [Rank 0] step=3550, skipped=68, lr=[0.0005376505551763074], mom=[(0.9, 0.95)] [2023-04-18 22:39:02,442] [INFO] [timer.py:199:stop] epoch=0/micro_step=7100/global_step=3550, RunningAvgSamplesPerSec=12.098643288723585, CurrSamplesPerSec=12.310813024550683, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:39:30,053] [INFO] [logging.py:96:log_dist] [Rank 0] step=3560, skipped=68, lr=[0.0005355089478397293], mom=[(0.9, 0.95)] [2023-04-18 22:39:30,053] [INFO] [timer.py:199:stop] epoch=0/micro_step=7120/global_step=3560, 
RunningAvgSamplesPerSec=12.097184485256, CurrSamplesPerSec=12.33594333053682, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:39:55,994] [INFO] [loss_scaler.py:188:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 [2023-04-18 22:39:55,994] [INFO] [logging.py:96:log_dist] [Rank 0] step=3570, skipped=69, lr=[0.0005335809401284257], mom=[(0.9, 0.95)] [2023-04-18 22:39:55,995] [INFO] [timer.py:199:stop] epoch=0/micro_step=7140/global_step=3570, RunningAvgSamplesPerSec=12.097874765766317, CurrSamplesPerSec=12.662920685771692, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:39:58,544] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, reducing to 65536 [2023-04-18 22:40:21,959] [INFO] [logging.py:96:log_dist] [Rank 0] step=3580, skipped=70, lr=[0.0005316524307121408], mom=[(0.9, 0.95)] [2023-04-18 22:40:21,959] [INFO] [timer.py:199:stop] epoch=0/micro_step=7160/global_step=3580, RunningAvgSamplesPerSec=12.098530801084237, CurrSamplesPerSec=12.310642520571054, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:40:47,954] [INFO] [logging.py:96:log_dist] [Rank 0] step=3590, skipped=70, lr=[0.0005295090890963614], mom=[(0.9, 0.95)] [2023-04-18 22:40:47,955] [INFO] [timer.py:199:stop] epoch=0/micro_step=7180/global_step=3590, RunningAvgSamplesPerSec=12.099143914511032, CurrSamplesPerSec=12.296460946806874, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:41:15,624] [INFO] [logging.py:96:log_dist] [Rank 0] step=3600, skipped=70, lr=[0.0005273652031957638], mom=[(0.9, 0.95)] [2023-04-18 22:41:15,624] [INFO] [timer.py:199:stop] epoch=0/micro_step=7200/global_step=3600, RunningAvgSamplesPerSec=12.097625605374962, CurrSamplesPerSec=12.308537016382864, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:41:41,645] [INFO] [logging.py:96:log_dist] [Rank 0] step=3610, skipped=70, 
lr=[0.0005252208125535731], mom=[(0.9, 0.95)] [2023-04-18 22:41:41,645] [INFO] [timer.py:199:stop] epoch=0/micro_step=7220/global_step=3610, RunningAvgSamplesPerSec=12.098204778809434, CurrSamplesPerSec=12.297589853317927, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:42:07,649] [INFO] [logging.py:96:log_dist] [Rank 0] step=3620, skipped=70, lr=[0.0005230759567223238], mom=[(0.9, 0.95)] [2023-04-18 22:42:07,649] [INFO] [timer.py:199:stop] epoch=0/micro_step=7240/global_step=3620, RunningAvgSamplesPerSec=12.098802910454896, CurrSamplesPerSec=12.302283458684865, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:42:34,541] [INFO] [logging.py:96:log_dist] [Rank 0] step=3630, skipped=70, lr=[0.0005209306752631311], mom=[(0.9, 0.95)] [2023-04-18 22:42:34,541] [INFO] [timer.py:199:stop] epoch=0/micro_step=7260/global_step=3630, RunningAvgSamplesPerSec=12.098278144600329, CurrSamplesPerSec=12.356754053529533, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:43:01,491] [INFO] [logging.py:96:log_dist] [Rank 0] step=3640, skipped=70, lr=[0.0005187850077449604], mom=[(0.9, 0.95)] [2023-04-18 22:43:01,492] [INFO] [timer.py:199:stop] epoch=0/micro_step=7280/global_step=3640, RunningAvgSamplesPerSec=12.097683139200788, CurrSamplesPerSec=12.297499713447463, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:43:27,512] [INFO] [logging.py:96:log_dist] [Rank 0] step=3650, skipped=70, lr=[0.0005166389937438979], mom=[(0.9, 0.95)] [2023-04-18 22:43:27,512] [INFO] [timer.py:199:stop] epoch=0/micro_step=7300/global_step=3650, RunningAvgSamplesPerSec=12.098256613013051, CurrSamplesPerSec=12.292028445235394, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:43:53,525] [INFO] [logging.py:96:log_dist] [Rank 0] step=3660, skipped=70, lr=[0.0005144926728424205], mom=[(0.9, 0.95)] [2023-04-18 22:43:53,525] [INFO] [timer.py:199:stop] epoch=0/micro_step=7320/global_step=3660, RunningAvgSamplesPerSec=12.098836591685597, 
CurrSamplesPerSec=12.326639621440304, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:44:20,417] [INFO] [logging.py:96:log_dist] [Rank 0] step=3670, skipped=70, lr=[0.0005123460846286661], mom=[(0.9, 0.95)] [2023-04-18 22:44:20,418] [INFO] [timer.py:199:stop] epoch=0/micro_step=7340/global_step=3670, RunningAvgSamplesPerSec=12.098317002198439, CurrSamplesPerSec=12.379555365182773, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:44:26,453] [INFO] [loss_scaler.py:188:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 [2023-04-18 22:44:29,004] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, reducing to 65536 [2023-04-18 22:44:47,219] [INFO] [logging.py:96:log_dist] [Rank 0] step=3680, skipped=72, lr=[0.0005106286481992179], mom=[(0.9, 0.95)] [2023-04-18 22:44:47,220] [INFO] [timer.py:199:stop] epoch=0/micro_step=7360/global_step=3680, RunningAvgSamplesPerSec=12.097912611778044, CurrSamplesPerSec=12.3097731366341, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:45:13,233] [INFO] [logging.py:96:log_dist] [Rank 0] step=3690, skipped=72, lr=[0.0005084816786007435], mom=[(0.9, 0.95)] [2023-04-18 22:45:13,234] [INFO] [timer.py:199:stop] epoch=0/micro_step=7380/global_step=3690, RunningAvgSamplesPerSec=12.098488100064264, CurrSamplesPerSec=12.327778605440296, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:45:39,248] [INFO] [logging.py:96:log_dist] [Rank 0] step=3700, skipped=72, lr=[0.0005063345525606762], mom=[(0.9, 0.95)] [2023-04-18 22:45:39,248] [INFO] [timer.py:199:stop] epoch=0/micro_step=7400/global_step=3700, RunningAvgSamplesPerSec=12.099059571396815, CurrSamplesPerSec=12.37001362825867, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:46:06,185] [INFO] [logging.py:96:log_dist] [Rank 0] step=3710, skipped=72, lr=[0.000504187309682005], mom=[(0.9, 0.95)] [2023-04-18 
22:46:06,186] [INFO] [timer.py:199:stop] epoch=0/micro_step=7420/global_step=3710, RunningAvgSamplesPerSec=12.098489148777748, CurrSamplesPerSec=12.326489055808201, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:46:33,035] [INFO] [logging.py:96:log_dist] [Rank 0] step=3720, skipped=72, lr=[0.0005020399895698721], mom=[(0.9, 0.95)] [2023-04-18 22:46:33,035] [INFO] [timer.py:199:stop] epoch=0/micro_step=7440/global_step=3720, RunningAvgSamplesPerSec=12.098030274794011, CurrSamplesPerSec=12.349221885561755, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:46:59,025] [INFO] [logging.py:96:log_dist] [Rank 0] step=3730, skipped=72, lr=[0.000499892631830846], mom=[(0.9, 0.95)] [2023-04-18 22:46:59,025] [INFO] [timer.py:199:stop] epoch=0/micro_step=7460/global_step=3730, RunningAvgSamplesPerSec=12.098628497379496, CurrSamplesPerSec=12.31094175253836, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:47:25,040] [INFO] [logging.py:96:log_dist] [Rank 0] step=3740, skipped=72, lr=[0.0004977452760721882], mom=[(0.9, 0.95)] [2023-04-18 22:47:25,041] [INFO] [timer.py:199:stop] epoch=0/micro_step=7480/global_step=3740, RunningAvgSamplesPerSec=12.099192843762442, CurrSamplesPerSec=12.291172944089784, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:47:52,655] [INFO] [logging.py:96:log_dist] [Rank 0] step=3750, skipped=72, lr=[0.0004955979619011238], mom=[(0.9, 0.95)] [2023-04-18 22:47:52,655] [INFO] [timer.py:199:stop] epoch=0/micro_step=7500/global_step=3750, RunningAvgSamplesPerSec=12.097802668440055, CurrSamplesPerSec=12.339169823296457, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:48:18,633] [INFO] [logging.py:96:log_dist] [Rank 0] step=3760, skipped=72, lr=[0.0004934507289241112], mom=[(0.9, 0.95)] [2023-04-18 22:48:18,633] [INFO] [timer.py:199:stop] epoch=0/micro_step=7520/global_step=3760, RunningAvgSamplesPerSec=12.098410896060843, CurrSamplesPerSec=12.343536496864907, MemAllocated=2.0GB, MaxMemAllocated=13.66GB 
[2023-04-18 22:48:44,682] [INFO] [logging.py:96:log_dist] [Rank 0] step=3770, skipped=72, lr=[0.0004913036167461109], mom=[(0.9, 0.95)] [2023-04-18 22:48:44,683] [INFO] [timer.py:199:stop] epoch=0/micro_step=7540/global_step=3770, RunningAvgSamplesPerSec=12.098930029025139, CurrSamplesPerSec=12.324475452557037, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:48:55,001] [INFO] [loss_scaler.py:188:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 [2023-04-18 22:48:57,552] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, reducing to 65536 [2023-04-18 22:49:10,564] [INFO] [logging.py:96:log_dist] [Rank 0] step=3780, skipped=74, lr=[0.0004895860405921292], mom=[(0.9, 0.95)] [2023-04-18 22:49:10,564] [INFO] [timer.py:199:stop] epoch=0/micro_step=7560/global_step=3780, RunningAvgSamplesPerSec=12.099648578792081, CurrSamplesPerSec=12.355213901995775, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:49:38,189] [INFO] [logging.py:96:log_dist] [Rank 0] step=3790, skipped=74, lr=[0.0004874392456492731], mom=[(0.9, 0.95)] [2023-04-18 22:49:38,190] [INFO] [timer.py:199:stop] epoch=0/micro_step=7580/global_step=3790, RunningAvgSamplesPerSec=12.098258323439568, CurrSamplesPerSec=12.342662460906654, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:50:04,207] [INFO] [logging.py:96:log_dist] [Rank 0] step=3800, skipped=74, lr=[0.000485292682385134], mom=[(0.9, 0.95)] [2023-04-18 22:50:04,208] [INFO] [timer.py:199:stop] epoch=0/micro_step=7600/global_step=3800, RunningAvgSamplesPerSec=12.098810917727624, CurrSamplesPerSec=12.341370930494616, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:50:30,215] [INFO] [logging.py:96:log_dist] [Rank 0] step=3810, skipped=74, lr=[0.00048314639039231976], mom=[(0.9, 0.95)] [2023-04-18 22:50:30,215] [INFO] [timer.py:199:stop] 
epoch=0/micro_step=7620/global_step=3810, RunningAvgSamplesPerSec=12.099373550601717, CurrSamplesPerSec=12.302847292974874, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:50:56,215] [INFO] [logging.py:96:log_dist] [Rank 0] step=3820, skipped=74, lr=[0.00048100040925843447], mom=[(0.9, 0.95)] [2023-04-18 22:50:56,216] [INFO] [timer.py:199:stop] epoch=0/micro_step=7640/global_step=3820, RunningAvgSamplesPerSec=12.09994125763404, CurrSamplesPerSec=12.324784411335578, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:51:23,867] [INFO] [logging.py:96:log_dist] [Rank 0] step=3830, skipped=74, lr=[0.0004788547785653495], mom=[(0.9, 0.95)] [2023-04-18 22:51:23,867] [INFO] [timer.py:199:stop] epoch=0/micro_step=7660/global_step=3830, RunningAvgSamplesPerSec=12.098533261288942, CurrSamplesPerSec=12.288917693530154, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:51:49,867] [INFO] [logging.py:96:log_dist] [Rank 0] step=3840, skipped=74, lr=[0.0004767095378884715], mom=[(0.9, 0.95)] [2023-04-18 22:51:49,868] [INFO] [timer.py:199:stop] epoch=0/micro_step=7680/global_step=3840, RunningAvgSamplesPerSec=12.09910119162466, CurrSamplesPerSec=12.252535401224836, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:52:15,887] [INFO] [logging.py:96:log_dist] [Rank 0] step=3850, skipped=74, lr=[0.00047456472679601375], mom=[(0.9, 0.95)] [2023-04-18 22:52:15,888] [INFO] [timer.py:199:stop] epoch=0/micro_step=7700/global_step=3850, RunningAvgSamplesPerSec=12.09964227498864, CurrSamplesPerSec=12.307206347992471, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:52:41,855] [INFO] [logging.py:96:log_dist] [Rank 0] step=3860, skipped=74, lr=[0.00047242038484826614], mom=[(0.9, 0.95)] [2023-04-18 22:52:41,856] [INFO] [timer.py:199:stop] epoch=0/micro_step=7720/global_step=3860, RunningAvgSamplesPerSec=12.100242001842094, CurrSamplesPerSec=12.32561516853236, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:53:09,584] [INFO] 
[logging.py:96:log_dist] [Rank 0] step=3870, skipped=74, lr=[0.0004702765515968654], mom=[(0.9, 0.95)] [2023-04-18 22:53:09,585] [INFO] [timer.py:199:stop] epoch=0/micro_step=7740/global_step=3870, RunningAvgSamplesPerSec=12.098755791461898, CurrSamplesPerSec=12.305601802431696, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:53:25,112] [INFO] [loss_scaler.py:188:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 [2023-04-18 22:53:27,658] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, reducing to 65536 [2023-04-18 22:53:35,456] [INFO] [logging.py:96:log_dist] [Rank 0] step=3880, skipped=76, lr=[0.00046856187782984255], mom=[(0.9, 0.95)] [2023-04-18 22:53:35,456] [INFO] [timer.py:199:stop] epoch=0/micro_step=7760/global_step=3880, RunningAvgSamplesPerSec=12.099468415264555, CurrSamplesPerSec=12.389263933080965, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:54:01,446] [INFO] [logging.py:96:log_dist] [Rank 0] step=3890, skipped=76, lr=[0.0004664190598715743], mom=[(0.9, 0.95)] [2023-04-18 22:54:01,447] [INFO] [timer.py:199:stop] epoch=0/micro_step=7780/global_step=3890, RunningAvgSamplesPerSec=12.100037099441314, CurrSamplesPerSec=12.377831446900952, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:54:28,297] [INFO] [logging.py:96:log_dist] [Rank 0] step=3900, skipped=76, lr=[0.0004642768613019911], mom=[(0.9, 0.95)] [2023-04-18 22:54:28,297] [INFO] [timer.py:199:stop] epoch=0/micro_step=7800/global_step=3900, RunningAvgSamplesPerSec=12.099593708273657, CurrSamplesPerSec=12.35815349009112, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:54:55,036] [INFO] [logging.py:96:log_dist] [Rank 0] step=3910, skipped=76, lr=[0.00046213532163319574], mom=[(0.9, 0.95)] [2023-04-18 22:54:55,036] [INFO] [timer.py:199:stop] epoch=0/micro_step=7820/global_step=3910, 
RunningAvgSamplesPerSec=12.09928370047854, CurrSamplesPerSec=12.344496944036873, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:55:21,037] [INFO] [logging.py:96:log_dist] [Rank 0] step=3920, skipped=76, lr=[0.00045999448036513724], mom=[(0.9, 0.95)] [2023-04-18 22:55:21,037] [INFO] [timer.py:199:stop] epoch=0/micro_step=7840/global_step=3920, RunningAvgSamplesPerSec=12.09983590490671, CurrSamplesPerSec=12.327656318742635, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:55:47,032] [INFO] [logging.py:96:log_dist] [Rank 0] step=3930, skipped=76, lr=[0.00045785437698488374], mom=[(0.9, 0.95)] [2023-04-18 22:55:47,032] [INFO] [timer.py:199:stop] epoch=0/micro_step=7860/global_step=3930, RunningAvgSamplesPerSec=12.10039268664839, CurrSamplesPerSec=12.30582293856327, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:56:13,853] [INFO] [logging.py:96:log_dist] [Rank 0] step=3940, skipped=76, lr=[0.00045571505096589273], mom=[(0.9, 0.95)] [2023-04-18 22:56:13,853] [INFO] [timer.py:199:stop] epoch=0/micro_step=7880/global_step=3940, RunningAvgSamplesPerSec=12.099987764544254, CurrSamplesPerSec=12.342278832265665, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:56:40,742] [INFO] [logging.py:96:log_dist] [Rank 0] step=3950, skipped=76, lr=[0.00045357654176728384], mom=[(0.9, 0.95)] [2023-04-18 22:56:40,743] [INFO] [timer.py:199:stop] epoch=0/micro_step=7900/global_step=3950, RunningAvgSamplesPerSec=12.09950515192551, CurrSamplesPerSec=12.297847886002463, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:57:06,737] [INFO] [logging.py:96:log_dist] [Rank 0] step=3960, skipped=76, lr=[0.0004514388888331106], mom=[(0.9, 0.95)] [2023-04-18 22:57:06,737] [INFO] [timer.py:199:stop] epoch=0/micro_step=7920/global_step=3960, RunningAvgSamplesPerSec=12.100059515117932, CurrSamplesPerSec=12.330298477831867, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:57:32,749] [INFO] [logging.py:96:log_dist] [Rank 0] step=3970, skipped=76, 
lr=[0.0004493021315916328], mom=[(0.9, 0.95)] [2023-04-18 22:57:32,750] [INFO] [timer.py:199:stop] epoch=0/micro_step=7940/global_step=3970, RunningAvgSamplesPerSec=12.100589756876019, CurrSamplesPerSec=12.32478554308119, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:57:54,337] [INFO] [loss_scaler.py:188:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 [2023-04-18 22:57:56,882] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, reducing to 65536 [2023-04-18 22:57:59,508] [INFO] [logging.py:96:log_dist] [Rank 0] step=3980, skipped=78, lr=[0.000447593397182454], mom=[(0.9, 0.95)] [2023-04-18 22:57:59,508] [INFO] [timer.py:199:stop] epoch=0/micro_step=7960/global_step=3980, RunningAvgSamplesPerSec=12.10026012413572, CurrSamplesPerSec=12.198358001602118, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:58:26,426] [INFO] [logging.py:96:log_dist] [Rank 0] step=3990, skipped=78, lr=[0.0004454583514938455], mom=[(0.9, 0.95)] [2023-04-18 22:58:26,427] [INFO] [timer.py:199:stop] epoch=0/micro_step=7980/global_step=3990, RunningAvgSamplesPerSec=12.099748279503455, CurrSamplesPerSec=12.337747463489181, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:58:52,381] [INFO] [logging.py:96:log_dist] [Rank 0] step=4000, skipped=78, lr=[0.0004433243118068504], mom=[(0.9, 0.95)] [2023-04-18 22:58:52,382] [INFO] [timer.py:199:stop] epoch=0/micro_step=8000/global_step=4000, RunningAvgSamplesPerSec=12.100341074875406, CurrSamplesPerSec=12.294949302857612, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:59:18,354] [INFO] [logging.py:96:log_dist] [Rank 0] step=4010, skipped=78, lr=[0.0004411913174830832], mom=[(0.9, 0.95)] [2023-04-18 22:59:18,354] [INFO] [timer.py:199:stop] epoch=0/micro_step=8020/global_step=4010, RunningAvgSamplesPerSec=12.100911458859969, CurrSamplesPerSec=12.30799749104074, 
MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 22:59:46,056] [INFO] [logging.py:96:log_dist] [Rank 0] step=4020, skipped=78, lr=[0.0004390594078648781], mom=[(0.9, 0.95)] [2023-04-18 22:59:46,057] [INFO] [timer.py:199:stop] epoch=0/micro_step=8040/global_step=4020, RunningAvgSamplesPerSec=12.099509446821976, CurrSamplesPerSec=9.170821780216325, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:00:12,012] [INFO] [logging.py:96:log_dist] [Rank 0] step=4030, skipped=78, lr=[0.00043692862227456125], mom=[(0.9, 0.95)] [2023-04-18 23:00:12,013] [INFO] [timer.py:199:stop] epoch=0/micro_step=8060/global_step=4030, RunningAvgSamplesPerSec=12.10009713763309, CurrSamplesPerSec=12.314103208097462, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:00:37,983] [INFO] [logging.py:96:log_dist] [Rank 0] step=4040, skipped=78, lr=[0.000434799000013727], mom=[(0.9, 0.95)] [2023-04-18 23:00:37,983] [INFO] [timer.py:199:stop] epoch=0/micro_step=8080/global_step=4040, RunningAvgSamplesPerSec=12.100665844947837, CurrSamplesPerSec=12.330270158936584, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:01:03,937] [INFO] [logging.py:96:log_dist] [Rank 0] step=4050, skipped=78, lr=[0.0004326705803625126], mom=[(0.9, 0.95)] [2023-04-18 23:01:03,937] [INFO] [timer.py:199:stop] epoch=0/micro_step=8100/global_step=4050, RunningAvgSamplesPerSec=12.10125061150007, CurrSamplesPerSec=12.400606422277638, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:01:31,385] [INFO] [logging.py:96:log_dist] [Rank 0] step=4060, skipped=78, lr=[0.0004305434025788734], mom=[(0.9, 0.95)] [2023-04-18 23:01:31,386] [INFO] [timer.py:199:stop] epoch=0/micro_step=8120/global_step=4060, RunningAvgSamplesPerSec=12.100148053830562, CurrSamplesPerSec=12.351443630469324, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:01:57,335] [INFO] [logging.py:96:log_dist] [Rank 0] step=4070, skipped=78, lr=[0.00042841750589785876], mom=[(0.9, 0.95)] [2023-04-18 23:01:57,335] [INFO] 
[timer.py:199:stop] epoch=0/micro_step=8140/global_step=4070, RunningAvgSamplesPerSec=12.100736257987558, CurrSamplesPerSec=12.338371265136525, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:02:23,245] [INFO] [loss_scaler.py:188:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 [2023-04-18 23:02:23,245] [INFO] [logging.py:96:log_dist] [Rank 0] step=4080, skipped=79, lr=[0.0004265053266364285], mom=[(0.9, 0.95)] [2023-04-18 23:02:23,246] [INFO] [timer.py:199:stop] epoch=0/micro_step=8160/global_step=4080, RunningAvgSamplesPerSec=12.101365479687122, CurrSamplesPerSec=12.703197385114898, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:02:25,796] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, reducing to 65536 [2023-04-18 23:02:49,253] [INFO] [logging.py:96:log_dist] [Rank 0] step=4090, skipped=80, lr=[0.0004245942453979713], mom=[(0.9, 0.95)] [2023-04-18 23:02:49,253] [INFO] [timer.py:199:stop] epoch=0/micro_step=8180/global_step=4090, RunningAvgSamplesPerSec=12.101882582429491, CurrSamplesPerSec=12.263051755303481, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:03:16,847] [INFO] [logging.py:96:log_dist] [Rank 0] step=4100, skipped=80, lr=[0.0004224721443308654], mom=[(0.9, 0.95)] [2023-04-18 23:03:16,848] [INFO] [timer.py:199:stop] epoch=0/micro_step=8200/global_step=4100, RunningAvgSamplesPerSec=12.100625893570719, CurrSamplesPerSec=12.336836826347856, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:03:42,832] [INFO] [logging.py:96:log_dist] [Rank 0] step=4110, skipped=80, lr=[0.00042035147323791675], mom=[(0.9, 0.95)] [2023-04-18 23:03:42,832] [INFO] [timer.py:199:stop] epoch=0/micro_step=8220/global_step=4110, RunningAvgSamplesPerSec=12.10116786774111, CurrSamplesPerSec=12.333576419111248, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:04:08,830] [INFO] 
[logging.py:96:log_dist] [Rank 0] step=4120, skipped=80, lr=[0.00041823227123416186], mom=[(0.9, 0.95)] [2023-04-18 23:04:08,830] [INFO] [timer.py:199:stop] epoch=0/micro_step=8240/global_step=4120, RunningAvgSamplesPerSec=12.101692562658707, CurrSamplesPerSec=12.344033730997229, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:04:34,847] [INFO] [logging.py:96:log_dist] [Rank 0] step=4130, skipped=80, lr=[0.0004161145774075393], mom=[(0.9, 0.95)] [2023-04-18 23:04:34,847] [INFO] [timer.py:199:stop] epoch=0/micro_step=8260/global_step=4130, RunningAvgSamplesPerSec=12.102194012844178, CurrSamplesPerSec=12.312160284987947, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:05:02,606] [INFO] [logging.py:96:log_dist] [Rank 0] step=4140, skipped=80, lr=[0.00041399843081817085], mom=[(0.9, 0.95)] [2023-04-18 23:05:02,607] [INFO] [timer.py:199:stop] epoch=0/micro_step=8280/global_step=4140, RunningAvgSamplesPerSec=12.100765788838807, CurrSamplesPerSec=12.331024618735022, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:05:28,623] [INFO] [logging.py:96:log_dist] [Rank 0] step=4150, skipped=80, lr=[0.0004118838704976392], mom=[(0.9, 0.95)] [2023-04-18 23:05:28,624] [INFO] [timer.py:199:stop] epoch=0/micro_step=8300/global_step=4150, RunningAvgSamplesPerSec=12.101266952413106, CurrSamplesPerSec=12.282802074695745, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:05:54,574] [INFO] [logging.py:96:log_dist] [Rank 0] step=4160, skipped=80, lr=[0.00040977093544826925], mom=[(0.9, 0.95)] [2023-04-18 23:05:54,575] [INFO] [timer.py:199:stop] epoch=0/micro_step=8320/global_step=4160, RunningAvgSamplesPerSec=12.101838240282676, CurrSamplesPerSec=12.350151398193859, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:06:21,443] [INFO] [logging.py:96:log_dist] [Rank 0] step=4170, skipped=80, lr=[0.00040765966464240843], mom=[(0.9, 0.95)] [2023-04-18 23:06:21,444] [INFO] [timer.py:199:stop] epoch=0/micro_step=8340/global_step=4170, 
RunningAvgSamplesPerSec=12.101398892130533, CurrSamplesPerSec=12.29268140781519, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:06:48,320] [INFO] [logging.py:96:log_dist] [Rank 0] step=4180, skipped=80, lr=[0.00040555009702170774], mom=[(0.9, 0.95)] [2023-04-18 23:06:48,321] [INFO] [timer.py:199:stop] epoch=0/micro_step=8360/global_step=4180, RunningAvgSamplesPerSec=12.100952542552573, CurrSamplesPerSec=12.336509120595691, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:06:53,432] [INFO] [loss_scaler.py:188:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 [2023-04-18 23:06:55,981] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, reducing to 65536 [2023-04-18 23:07:14,180] [INFO] [logging.py:96:log_dist] [Rank 0] step=4190, skipped=82, lr=[0.00040386369536721484], mom=[(0.9, 0.95)] [2023-04-18 23:07:14,180] [INFO] [timer.py:199:stop] epoch=0/micro_step=8380/global_step=4190, RunningAvgSamplesPerSec=12.10162068358969, CurrSamplesPerSec=12.328935920047032, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:07:40,176] [INFO] [logging.py:96:log_dist] [Rank 0] step=4200, skipped=82, lr=[0.0004017572915120285], mom=[(0.9, 0.95)] [2023-04-18 23:07:40,177] [INFO] [timer.py:199:stop] epoch=0/micro_step=8400/global_step=4200, RunningAvgSamplesPerSec=12.102136416567202, CurrSamplesPerSec=12.318477027927585, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:08:07,031] [INFO] [logging.py:96:log_dist] [Rank 0] step=4210, skipped=82, lr=[0.00039965269970920834], mom=[(0.9, 0.95)] [2023-04-18 23:08:07,031] [INFO] [timer.py:199:stop] epoch=0/micro_step=8420/global_step=4210, RunningAvgSamplesPerSec=12.101715951406938, CurrSamplesPerSec=12.295688182225513, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:08:33,715] [INFO] [logging.py:96:log_dist] [Rank 0] step=4220, skipped=82, 
lr=[0.00039754995877721357], mom=[(0.9, 0.95)] [2023-04-18 23:08:33,715] [INFO] [timer.py:199:stop] epoch=0/micro_step=8440/global_step=4220, RunningAvgSamplesPerSec=12.101483375032426, CurrSamplesPerSec=12.305666111609568, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:08:59,717] [INFO] [logging.py:96:log_dist] [Rank 0] step=4230, skipped=82, lr=[0.0003954491075003641], mom=[(0.9, 0.95)] [2023-04-18 23:08:59,717] [INFO] [timer.py:199:stop] epoch=0/micro_step=8460/global_step=4230, RunningAvgSamplesPerSec=12.101989788919916, CurrSamplesPerSec=12.35793046921339, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:09:25,718] [INFO] [logging.py:96:log_dist] [Rank 0] step=4240, skipped=82, lr=[0.00039335018462812664], mom=[(0.9, 0.95)] [2023-04-18 23:09:25,718] [INFO] [timer.py:199:stop] epoch=0/micro_step=8480/global_step=4240, RunningAvgSamplesPerSec=12.102494166597822, CurrSamplesPerSec=12.323694636779074, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:09:52,554] [INFO] [logging.py:96:log_dist] [Rank 0] step=4250, skipped=82, lr=[0.00039125322887439875], mom=[(0.9, 0.95)] [2023-04-18 23:09:52,555] [INFO] [timer.py:199:stop] epoch=0/micro_step=8500/global_step=4250, RunningAvgSamplesPerSec=12.102096391945592, CurrSamplesPerSec=12.350748041763204, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:10:19,411] [INFO] [logging.py:96:log_dist] [Rank 0] step=4260, skipped=82, lr=[0.000389158278916795], mom=[(0.9, 0.95)] [2023-04-18 23:10:19,411] [INFO] [timer.py:199:stop] epoch=0/micro_step=8520/global_step=4260, RunningAvgSamplesPerSec=12.101679574360627, CurrSamplesPerSec=12.353404657612442, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:10:45,342] [INFO] [logging.py:96:log_dist] [Rank 0] step=4270, skipped=82, lr=[0.00038706537339593437], mom=[(0.9, 0.95)] [2023-04-18 23:10:45,343] [INFO] [timer.py:199:stop] epoch=0/micro_step=8540/global_step=4270, RunningAvgSamplesPerSec=12.102255750576159, 
CurrSamplesPerSec=12.34070257318394, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:11:11,329] [INFO] [logging.py:96:log_dist] [Rank 0] step=4280, skipped=82, lr=[0.0003849745509147261], mom=[(0.9, 0.95)] [2023-04-18 23:11:11,330] [INFO] [timer.py:199:stop] epoch=0/micro_step=8560/global_step=4280, RunningAvgSamplesPerSec=12.102770514703176, CurrSamplesPerSec=12.29534238463692, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:11:21,639] [INFO] [loss_scaler.py:188:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 [2023-04-18 23:11:25,053] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, reducing to 65536 [2023-04-18 23:11:38,985] [INFO] [logging.py:96:log_dist] [Rank 0] step=4290, skipped=84, lr=[0.00038330341863495503], mom=[(0.9, 0.95)] [2023-04-18 23:11:38,986] [INFO] [timer.py:199:stop] epoch=0/micro_step=8580/global_step=4290, RunningAvgSamplesPerSec=12.101501944593911, CurrSamplesPerSec=9.110895293323594, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:12:04,928] [INFO] [logging.py:96:log_dist] [Rank 0] step=4300, skipped=84, lr=[0.0003812164427813601], mom=[(0.9, 0.95)] [2023-04-18 23:12:04,928] [INFO] [timer.py:199:stop] epoch=0/micro_step=8600/global_step=4300, RunningAvgSamplesPerSec=12.102063061061854, CurrSamplesPerSec=12.373689156825106, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:12:30,896] [INFO] [logging.py:96:log_dist] [Rank 0] step=4310, skipped=84, lr=[0.00037913165784889553], mom=[(0.9, 0.95)] [2023-04-18 23:12:30,896] [INFO] [timer.py:199:stop] epoch=0/micro_step=8620/global_step=4310, RunningAvgSamplesPerSec=12.102594681673603, CurrSamplesPerSec=12.406229840806207, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:12:56,846] [INFO] [logging.py:96:log_dist] [Rank 0] step=4320, skipped=84, lr=[0.0003770491022906891], mom=[(0.9, 0.95)] 
[2023-04-18 23:12:56,846] [INFO] [timer.py:199:stop] epoch=0/micro_step=8640/global_step=4320, RunningAvgSamplesPerSec=12.103142979330963, CurrSamplesPerSec=12.327445719561384, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:13:24,390] [INFO] [logging.py:96:log_dist] [Rank 0] step=4330, skipped=84, lr=[0.0003749688145187497], mom=[(0.9, 0.95)] [2023-04-18 23:13:24,391] [INFO] [timer.py:199:stop] epoch=0/micro_step=8660/global_step=4330, RunningAvgSamplesPerSec=12.102002173964236, CurrSamplesPerSec=12.38016627263521, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:13:50,377] [INFO] [logging.py:96:log_dist] [Rank 0] step=4340, skipped=84, lr=[0.00037289083290325663], mom=[(0.9, 0.95)] [2023-04-18 23:13:50,378] [INFO] [timer.py:199:stop] epoch=0/micro_step=8680/global_step=4340, RunningAvgSamplesPerSec=12.102510515352076, CurrSamplesPerSec=12.325707984874258, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:14:16,346] [INFO] [logging.py:96:log_dist] [Rank 0] step=4350, skipped=84, lr=[0.0003708151957718533], mom=[(0.9, 0.95)] [2023-04-18 23:14:16,346] [INFO] [timer.py:199:stop] epoch=0/micro_step=8700/global_step=4350, RunningAvgSamplesPerSec=12.103035580547171, CurrSamplesPerSec=12.35044687161345, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:14:42,293] [INFO] [logging.py:96:log_dist] [Rank 0] step=4360, skipped=84, lr=[0.00036874194140894024], mom=[(0.9, 0.95)] [2023-04-18 23:14:42,294] [INFO] [timer.py:199:stop] epoch=0/micro_step=8720/global_step=4360, RunningAvgSamplesPerSec=12.103580269781101, CurrSamplesPerSec=12.341584275404237, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:15:09,721] [INFO] [logging.py:96:log_dist] [Rank 0] step=4370, skipped=84, lr=[0.0003666711080549682], mom=[(0.9, 0.95)] [2023-04-18 23:15:09,721] [INFO] [timer.py:199:stop] epoch=0/micro_step=8740/global_step=4370, RunningAvgSamplesPerSec=12.10257265291811, CurrSamplesPerSec=12.358008980747163, MemAllocated=2.0GB, 
MaxMemAllocated=13.66GB [2023-04-18 23:15:35,657] [INFO] [logging.py:96:log_dist] [Rank 0] step=4380, skipped=84, lr=[0.0003646027339057334], mom=[(0.9, 0.95)] [2023-04-18 23:15:35,658] [INFO] [timer.py:199:stop] epoch=0/micro_step=8760/global_step=4380, RunningAvgSamplesPerSec=12.103127370569874, CurrSamplesPerSec=12.362563218513616, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:15:51,142] [INFO] [loss_scaler.py:188:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 [2023-04-18 23:15:53,689] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, reducing to 65536 [2023-04-18 23:16:01,490] [INFO] [logging.py:96:log_dist] [Rank 0] step=4390, skipped=86, lr=[0.0003629498308523923], mom=[(0.9, 0.95)] [2023-04-18 23:16:01,490] [INFO] [timer.py:199:stop] epoch=0/micro_step=8780/global_step=4390, RunningAvgSamplesPerSec=12.103787980348647, CurrSamplesPerSec=12.336490978225292, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:16:28,194] [INFO] [logging.py:96:log_dist] [Rank 0] step=4400, skipped=86, lr=[0.0003608859793798557], mom=[(0.9, 0.95)] [2023-04-18 23:16:28,194] [INFO] [timer.py:199:stop] epoch=0/micro_step=8800/global_step=4400, RunningAvgSamplesPerSec=12.103538989422026, CurrSamplesPerSec=9.661227101705176, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:16:55,042] [INFO] [logging.py:96:log_dist] [Rank 0] step=4410, skipped=86, lr=[0.0003588246938167298], mom=[(0.9, 0.95)] [2023-04-18 23:16:55,042] [INFO] [timer.py:199:stop] epoch=0/micro_step=8820/global_step=4410, RunningAvgSamplesPerSec=12.103141070693022, CurrSamplesPerSec=12.325610640941685, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:17:20,984] [INFO] [logging.py:96:log_dist] [Rank 0] step=4420, skipped=86, lr=[0.0003567660121827048], mom=[(0.9, 0.95)] [2023-04-18 23:17:20,984] [INFO] [timer.py:199:stop] 
epoch=0/micro_step=8840/global_step=4420, RunningAvgSamplesPerSec=12.103684345870247, CurrSamplesPerSec=12.3769411341166, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:17:46,972] [INFO] [logging.py:96:log_dist] [Rank 0] step=4430, skipped=86, lr=[0.00035470997244944327], mom=[(0.9, 0.95)] [2023-04-18 23:17:46,973] [INFO] [timer.py:199:stop] epoch=0/micro_step=8860/global_step=4430, RunningAvgSamplesPerSec=12.104176973393038, CurrSamplesPerSec=12.322893554729282, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:18:13,766] [INFO] [logging.py:96:log_dist] [Rank 0] step=4440, skipped=86, lr=[0.00035265661253987794], mom=[(0.9, 0.95)] [2023-04-18 23:18:13,766] [INFO] [timer.py:199:stop] epoch=0/micro_step=8880/global_step=4440, RunningAvgSamplesPerSec=12.103836729951428, CurrSamplesPerSec=12.368944337610253, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:18:40,645] [INFO] [logging.py:96:log_dist] [Rank 0] step=4450, skipped=86, lr=[0.00035060597032751363], mom=[(0.9, 0.95)] [2023-04-18 23:18:40,645] [INFO] [timer.py:199:stop] epoch=0/micro_step=8900/global_step=4450, RunningAvgSamplesPerSec=12.103410392519764, CurrSamplesPerSec=12.270838127424874, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:19:06,584] [INFO] [logging.py:96:log_dist] [Rank 0] step=4460, skipped=86, lr=[0.0003485580836357282], mom=[(0.9, 0.95)] [2023-04-18 23:19:06,585] [INFO] [timer.py:199:stop] epoch=0/micro_step=8920/global_step=4460, RunningAvgSamplesPerSec=12.10395055415733, CurrSamplesPerSec=12.345214538755128, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:19:32,567] [INFO] [logging.py:96:log_dist] [Rank 0] step=4470, skipped=86, lr=[0.00034651299023707457], mom=[(0.9, 0.95)] [2023-04-18 23:19:32,568] [INFO] [timer.py:199:stop] epoch=0/micro_step=8940/global_step=4470, RunningAvgSamplesPerSec=12.104443829238383, CurrSamplesPerSec=12.385029418403125, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:19:59,404] [INFO] 
[logging.py:96:log_dist] [Rank 0] step=4480, skipped=86, lr=[0.0003444707278525847], mom=[(0.9, 0.95)] [2023-04-18 23:19:59,405] [INFO] [timer.py:199:stop] epoch=0/micro_step=8960/global_step=4480, RunningAvgSamplesPerSec=12.104061963069965, CurrSamplesPerSec=12.397256116203291, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:20:20,975] [INFO] [loss_scaler.py:188:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 [2023-04-18 23:20:23,524] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, reducing to 65536 [2023-04-18 23:20:26,150] [INFO] [logging.py:96:log_dist] [Rank 0] step=4490, skipped=88, lr=[0.0003428389815904065], mom=[(0.9, 0.95)] [2023-04-18 23:20:26,150] [INFO] [timer.py:199:stop] epoch=0/micro_step=8980/global_step=4490, RunningAvgSamplesPerSec=12.103774756469827, CurrSamplesPerSec=12.1993935648064, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:20:52,117] [INFO] [logging.py:96:log_dist] [Rank 0] step=4500, skipped=88, lr=[0.0003408019099212911], mom=[(0.9, 0.95)] [2023-04-18 23:20:52,118] [INFO] [timer.py:199:stop] epoch=0/micro_step=9000/global_step=4500, RunningAvgSamplesPerSec=12.104280887455921, CurrSamplesPerSec=12.369958905272362, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:21:18,123] [INFO] [logging.py:96:log_dist] [Rank 0] step=4510, skipped=88, lr=[0.00033876777460521647], mom=[(0.9, 0.95)] [2023-04-18 23:21:18,123] [INFO] [timer.py:199:stop] epoch=0/micro_step=9020/global_step=4510, RunningAvgSamplesPerSec=12.104746392924898, CurrSamplesPerSec=12.30146824569038, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:21:44,964] [INFO] [logging.py:96:log_dist] [Rank 0] step=4520, skipped=88, lr=[0.0003367366131610966], mom=[(0.9, 0.95)] [2023-04-18 23:21:44,964] [INFO] [timer.py:199:stop] epoch=0/micro_step=9040/global_step=4520, 
RunningAvgSamplesPerSec=12.104362768806006, CurrSamplesPerSec=12.358822601024634, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:22:11,685] [INFO] [logging.py:96:log_dist] [Rank 0] step=4530, skipped=88, lr=[0.0003347084630529934], mom=[(0.9, 0.95)] [2023-04-18 23:22:11,686] [INFO] [timer.py:199:stop] epoch=0/micro_step=9060/global_step=4530, RunningAvgSamplesPerSec=12.104101687222814, CurrSamplesPerSec=12.367484336281848, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:22:37,632] [INFO] [logging.py:96:log_dist] [Rank 0] step=4540, skipped=88, lr=[0.00033268336168942506], mom=[(0.9, 0.95)] [2023-04-18 23:22:37,632] [INFO] [timer.py:199:stop] epoch=0/micro_step=9080/global_step=4540, RunningAvgSamplesPerSec=12.104623772684874, CurrSamplesPerSec=12.333732824880634, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:23:03,638] [INFO] [logging.py:96:log_dist] [Rank 0] step=4550, skipped=88, lr=[0.0003306613464226778], mom=[(0.9, 0.95)] [2023-04-18 23:23:03,638] [INFO] [timer.py:199:stop] epoch=0/micro_step=9100/global_step=4550, RunningAvgSamplesPerSec=12.10508384720868, CurrSamplesPerSec=12.315296376358598, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:23:30,474] [INFO] [logging.py:96:log_dist] [Rank 0] step=4560, skipped=88, lr=[0.00032864245454811527], mom=[(0.9, 0.95)] [2023-04-18 23:23:30,474] [INFO] [timer.py:199:stop] epoch=0/micro_step=9120/global_step=4560, RunningAvgSamplesPerSec=12.104708358346487, CurrSamplesPerSec=12.28767675850324, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:23:57,326] [INFO] [logging.py:96:log_dist] [Rank 0] step=4570, skipped=88, lr=[0.0003266267233034911], mom=[(0.9, 0.95)] [2023-04-18 23:23:57,326] [INFO] [timer.py:199:stop] epoch=0/micro_step=9140/global_step=4570, RunningAvgSamplesPerSec=12.104318246444391, CurrSamplesPerSec=12.335934260176996, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:24:23,286] [INFO] [logging.py:96:log_dist] [Rank 0] step=4580, 
skipped=88, lr=[0.0003246141898682628], mom=[(0.9, 0.95)] [2023-04-18 23:24:23,286] [INFO] [timer.py:199:stop] epoch=0/micro_step=9160/global_step=4580, RunningAvgSamplesPerSec=12.10482194760068, CurrSamplesPerSec=12.373260251625227, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:24:49,173] [INFO] [loss_scaler.py:188:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 [2023-04-18 23:24:49,173] [INFO] [logging.py:96:log_dist] [Rank 0] step=4590, skipped=89, lr=[0.00032280567458489273], mom=[(0.9, 0.95)] [2023-04-18 23:24:49,174] [INFO] [timer.py:199:stop] epoch=0/micro_step=9180/global_step=4590, RunningAvgSamplesPerSec=12.105395969371491, CurrSamplesPerSec=12.664307881310913, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:24:51,716] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, reducing to 65536 [2023-04-18 23:25:16,715] [INFO] [logging.py:96:log_dist] [Rank 0] step=4600, skipped=90, lr=[0.00032099980661501017], mom=[(0.9, 0.95)] [2023-04-18 23:25:16,715] [INFO] [timer.py:199:stop] epoch=0/micro_step=9200/global_step=4600, RunningAvgSamplesPerSec=12.104319819880402, CurrSamplesPerSec=12.382762445292856, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:25:42,714] [INFO] [logging.py:96:log_dist] [Rank 0] step=4610, skipped=90, lr=[0.00031899642433612103], mom=[(0.9, 0.95)] [2023-04-18 23:25:42,714] [INFO] [timer.py:199:stop] epoch=0/micro_step=9220/global_step=4610, RunningAvgSamplesPerSec=12.104781026713805, CurrSamplesPerSec=12.279253355329384, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:26:08,735] [INFO] [logging.py:96:log_dist] [Rank 0] step=4620, skipped=90, lr=[0.00031699638060482115], mom=[(0.9, 0.95)] [2023-04-18 23:26:08,735] [INFO] [timer.py:199:stop] epoch=0/micro_step=9240/global_step=4620, RunningAvgSamplesPerSec=12.105218979383226, 
CurrSamplesPerSec=12.295858272214867, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:26:34,741] [INFO] [logging.py:96:log_dist] [Rank 0] step=4630, skipped=90, lr=[0.00031499971231121674], mom=[(0.9, 0.95)] [2023-04-18 23:26:34,742] [INFO] [timer.py:199:stop] epoch=0/micro_step=9260/global_step=4630, RunningAvgSamplesPerSec=12.105669238682193, CurrSamplesPerSec=12.29578167481605, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:27:02,388] [INFO] [logging.py:96:log_dist] [Rank 0] step=4640, skipped=90, lr=[0.00031300645628315525], mom=[(0.9, 0.95)] [2023-04-18 23:27:02,389] [INFO] [timer.py:199:stop] epoch=0/micro_step=9280/global_step=4640, RunningAvgSamplesPerSec=12.104498222733152, CurrSamplesPerSec=12.257882000865061, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:27:28,419] [INFO] [logging.py:96:log_dist] [Rank 0] step=4650, skipped=90, lr=[0.0003110166492855468], mom=[(0.9, 0.95)] [2023-04-18 23:27:28,419] [INFO] [timer.py:199:stop] epoch=0/micro_step=9300/global_step=4650, RunningAvgSamplesPerSec=12.104924497547914, CurrSamplesPerSec=12.338561820762028, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:27:54,438] [INFO] [logging.py:96:log_dist] [Rank 0] step=4660, skipped=90, lr=[0.00030903032801968467], mom=[(0.9, 0.95)] [2023-04-18 23:27:54,439] [INFO] [timer.py:199:stop] epoch=0/micro_step=9320/global_step=4660, RunningAvgSamplesPerSec=12.105359653919976, CurrSamplesPerSec=12.327245317344687, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:28:20,444] [INFO] [logging.py:96:log_dist] [Rank 0] step=4670, skipped=90, lr=[0.0003070475291225692], mom=[(0.9, 0.95)] [2023-04-18 23:28:20,444] [INFO] [timer.py:199:stop] epoch=0/micro_step=9340/global_step=4670, RunningAvgSamplesPerSec=12.10580741379736, CurrSamplesPerSec=12.374762690869563, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:28:48,083] [INFO] [logging.py:96:log_dist] [Rank 0] step=4680, skipped=90, lr=[0.00030506828916623195], mom=[(0.9, 
0.95)] [2023-04-18 23:28:48,083] [INFO] [timer.py:199:stop] epoch=0/micro_step=9360/global_step=4680, RunningAvgSamplesPerSec=12.104654210506173, CurrSamplesPerSec=12.329660767828797, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:29:14,120] [INFO] [logging.py:96:log_dist] [Rank 0] step=4690, skipped=90, lr=[0.0003030926446570611], mom=[(0.9, 0.95)] [2023-04-18 23:29:14,121] [INFO] [timer.py:199:stop] epoch=0/micro_step=9380/global_step=4690, RunningAvgSamplesPerSec=12.105069106519487, CurrSamplesPerSec=12.33782685292302, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:29:19,244] [INFO] [loss_scaler.py:188:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 [2023-04-18 23:29:21,793] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, reducing to 65536 [2023-04-18 23:29:40,035] [INFO] [logging.py:96:log_dist] [Rank 0] step=4700, skipped=92, lr=[0.00030151474226165786], mom=[(0.9, 0.95)] [2023-04-18 23:29:40,035] [INFO] [timer.py:199:stop] epoch=0/micro_step=9400/global_step=4700, RunningAvgSamplesPerSec=12.105602884857616, CurrSamplesPerSec=12.340539182748623, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:30:06,914] [INFO] [logging.py:96:log_dist] [Rank 0] step=4710, skipped=92, lr=[0.0002995456613413722], mom=[(0.9, 0.95)] [2023-04-18 23:30:06,915] [INFO] [timer.py:199:stop] epoch=0/micro_step=9420/global_step=4710, RunningAvgSamplesPerSec=12.105195625737313, CurrSamplesPerSec=12.36776012589491, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:30:33,811] [INFO] [logging.py:96:log_dist] [Rank 0] step=4720, skipped=92, lr=[0.00029758027773119024], mom=[(0.9, 0.95)] [2023-04-18 23:30:33,812] [INFO] [timer.py:199:stop] epoch=0/micro_step=9440/global_step=4720, RunningAvgSamplesPerSec=12.104773559609496, CurrSamplesPerSec=12.33407965182423, MemAllocated=2.0GB, 
MaxMemAllocated=13.66GB [2023-04-18 23:30:59,832] [INFO] [logging.py:96:log_dist] [Rank 0] step=4730, skipped=92, lr=[0.00029561862768192384], mom=[(0.9, 0.95)] [2023-04-18 23:30:59,833] [INFO] [timer.py:199:stop] epoch=0/micro_step=9460/global_step=4730, RunningAvgSamplesPerSec=12.105201319919848, CurrSamplesPerSec=12.324810441537243, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:31:25,851] [INFO] [logging.py:96:log_dist] [Rank 0] step=4740, skipped=92, lr=[0.00029366074737552215], mom=[(0.9, 0.95)] [2023-04-18 23:31:25,852] [INFO] [timer.py:199:stop] epoch=0/micro_step=9480/global_step=4740, RunningAvgSamplesPerSec=12.105628799059076, CurrSamplesPerSec=12.303920976358599, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:31:52,753] [INFO] [logging.py:96:log_dist] [Rank 0] step=4750, skipped=92, lr=[0.0002917066729244018], mom=[(0.9, 0.95)] [2023-04-18 23:31:52,754] [INFO] [timer.py:199:stop] epoch=0/micro_step=9500/global_step=4750, RunningAvgSamplesPerSec=12.105203357151808, CurrSamplesPerSec=12.300455863234065, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:32:19,649] [INFO] [logging.py:96:log_dist] [Rank 0] step=4760, skipped=92, lr=[0.0002897564403707814], mom=[(0.9, 0.95)] [2023-04-18 23:32:19,650] [INFO] [timer.py:199:stop] epoch=0/micro_step=9520/global_step=4760, RunningAvgSamplesPerSec=12.104785087163876, CurrSamplesPerSec=12.321041734081495, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:32:45,639] [INFO] [logging.py:96:log_dist] [Rank 0] step=4770, skipped=92, lr=[0.0002878100856860187], mom=[(0.9, 0.95)] [2023-04-18 23:32:45,639] [INFO] [timer.py:199:stop] epoch=0/micro_step=9540/global_step=4770, RunningAvgSamplesPerSec=12.105239018949531, CurrSamplesPerSec=12.29771943169684, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:33:11,659] [INFO] [logging.py:96:log_dist] [Rank 0] step=4780, skipped=92, lr=[0.0002858676447699439], mom=[(0.9, 0.95)] [2023-04-18 23:33:11,660] [INFO] [timer.py:199:stop] 
epoch=0/micro_step=9560/global_step=4780, RunningAvgSamplesPerSec=12.105661366054015, CurrSamplesPerSec=12.33988112618997, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:33:38,565] [INFO] [logging.py:96:log_dist] [Rank 0] step=4790, skipped=92, lr=[0.00028392915345019963], mom=[(0.9, 0.95)] [2023-04-18 23:33:38,565] [INFO] [timer.py:199:stop] epoch=0/micro_step=9580/global_step=4790, RunningAvgSamplesPerSec=12.105236078808176, CurrSamplesPerSec=12.285315961396252, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:33:49,792] [INFO] [loss_scaler.py:188:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 [2023-04-18 23:33:52,342] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, reducing to 65536 [2023-04-18 23:34:05,392] [INFO] [logging.py:96:log_dist] [Rank 0] step=4800, skipped=94, lr=[0.00028238122813344823], mom=[(0.9, 0.95)] [2023-04-18 23:34:05,392] [INFO] [timer.py:199:stop] epoch=0/micro_step=9600/global_step=4800, RunningAvgSamplesPerSec=12.104886890426476, CurrSamplesPerSec=12.2449096368395, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:34:31,417] [INFO] [logging.py:96:log_dist] [Rank 0] step=4810, skipped=94, lr=[0.00028044993613980424], mom=[(0.9, 0.95)] [2023-04-18 23:34:31,418] [INFO] [timer.py:199:stop] epoch=0/micro_step=9620/global_step=4810, RunningAvgSamplesPerSec=12.10530271543613, CurrSamplesPerSec=12.297856900440465, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:34:57,450] [INFO] [logging.py:96:log_dist] [Rank 0] step=4820, skipped=94, lr=[0.0002785226936702298], mom=[(0.9, 0.95)] [2023-04-18 23:34:57,451] [INFO] [timer.py:199:stop] epoch=0/micro_step=9640/global_step=4820, RunningAvgSamplesPerSec=12.105709698414818, CurrSamplesPerSec=12.351007173496583, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:35:24,271] [INFO] [logging.py:96:log_dist] 
[Rank 0] step=4830, skipped=94, lr=[0.00027659953627203783], mom=[(0.9, 0.95)] [2023-04-18 23:35:24,272] [INFO] [timer.py:199:stop] epoch=0/micro_step=9660/global_step=4830, RunningAvgSamplesPerSec=12.105367654396337, CurrSamplesPerSec=12.325106967246912, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:35:51,000] [INFO] [logging.py:96:log_dist] [Rank 0] step=4840, skipped=94, lr=[0.00027468049941719344], mom=[(0.9, 0.95)] [2023-04-18 23:35:51,001] [INFO] [timer.py:199:stop] epoch=0/micro_step=9680/global_step=4840, RunningAvgSamplesPerSec=12.105114529220602, CurrSamplesPerSec=12.269951922219876, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:36:17,000] [INFO] [logging.py:96:log_dist] [Rank 0] step=4850, skipped=94, lr=[0.0002727656185016595], mom=[(0.9, 0.95)] [2023-04-18 23:36:17,000] [INFO] [timer.py:199:stop] epoch=0/micro_step=9700/global_step=4850, RunningAvgSamplesPerSec=12.105550612542913, CurrSamplesPerSec=12.350937843327054, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:36:43,034] [INFO] [logging.py:96:log_dist] [Rank 0] step=4860, skipped=94, lr=[0.00027085492884474375], mom=[(0.9, 0.95)] [2023-04-18 23:36:43,034] [INFO] [timer.py:199:stop] epoch=0/micro_step=9720/global_step=4860, RunningAvgSamplesPerSec=12.105953470744124, CurrSamplesPerSec=12.262649531801339, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:37:10,642] [INFO] [logging.py:96:log_dist] [Rank 0] step=4870, skipped=94, lr=[0.00026894846568844877], mom=[(0.9, 0.95)] [2023-04-18 23:37:10,643] [INFO] [timer.py:199:stop] epoch=0/micro_step=9740/global_step=4870, RunningAvgSamplesPerSec=12.104872571401733, CurrSamplesPerSec=9.171481035445074, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:37:36,677] [INFO] [logging.py:96:log_dist] [Rank 0] step=4880, skipped=94, lr=[0.00026704626419681954], mom=[(0.9, 0.95)] [2023-04-18 23:37:36,677] [INFO] [timer.py:199:stop] epoch=0/micro_step=9760/global_step=4880, 
RunningAvgSamplesPerSec=12.105274130915166, CurrSamplesPerSec=12.336024964375472, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:38:02,665] [INFO] [logging.py:96:log_dist] [Rank 0] step=4890, skipped=94, lr=[0.000265148359455297], mom=[(0.9, 0.95)] [2023-04-18 23:38:02,665] [INFO] [timer.py:199:stop] epoch=0/micro_step=9780/global_step=4890, RunningAvgSamplesPerSec=12.105717531636026, CurrSamplesPerSec=12.369936104170966, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:38:18,201] [INFO] [loss_scaler.py:188:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 [2023-04-18 23:38:20,754] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, reducing to 65536 [2023-04-18 23:38:28,587] [INFO] [logging.py:96:log_dist] [Rank 0] step=4900, skipped=96, lr=[0.00026363315284899365], mom=[(0.9, 0.95)] [2023-04-18 23:38:28,587] [INFO] [timer.py:199:stop] epoch=0/micro_step=9800/global_step=4900, RunningAvgSamplesPerSec=12.106221257814404, CurrSamplesPerSec=12.374941821483603, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:38:56,248] [INFO] [logging.py:96:log_dist] [Rank 0] step=4910, skipped=96, lr=[0.00026174307041958845], mom=[(0.9, 0.95)] [2023-04-18 23:38:56,248] [INFO] [timer.py:199:stop] epoch=0/micro_step=9820/global_step=4910, RunningAvgSamplesPerSec=12.105100079961614, CurrSamplesPerSec=12.311680296703493, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:39:22,246] [INFO] [logging.py:96:log_dist] [Rank 0] step=4920, skipped=96, lr=[0.00025985738255584237], mom=[(0.9, 0.95)] [2023-04-18 23:39:22,246] [INFO] [timer.py:199:stop] epoch=0/micro_step=9840/global_step=4920, RunningAvgSamplesPerSec=12.10553185843132, CurrSamplesPerSec=12.292505776577224, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:39:48,256] [INFO] [logging.py:96:log_dist] [Rank 0] step=4930, skipped=96, 
lr=[0.000257976124038608], mom=[(0.9, 0.95)] [2023-04-18 23:39:48,256] [INFO] [timer.py:199:stop] epoch=0/micro_step=9860/global_step=4930, RunningAvgSamplesPerSec=12.10595030184326, CurrSamplesPerSec=12.30326230803098, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:40:14,286] [INFO] [logging.py:96:log_dist] [Rank 0] step=4940, skipped=96, lr=[0.0002560993295670404], mom=[(0.9, 0.95)] [2023-04-18 23:40:14,287] [INFO] [timer.py:199:stop] epoch=0/micro_step=9880/global_step=4940, RunningAvgSamplesPerSec=12.106348943801585, CurrSamplesPerSec=12.346986174029954, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:40:42,018] [INFO] [logging.py:96:log_dist] [Rank 0] step=4950, skipped=96, lr=[0.0002542270337579562], mom=[(0.9, 0.95)] [2023-04-18 23:40:42,018] [INFO] [timer.py:199:stop] epoch=0/micro_step=9900/global_step=4950, RunningAvgSamplesPerSec=12.105171300137055, CurrSamplesPerSec=12.31227887609429, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:41:08,040] [INFO] [logging.py:96:log_dist] [Rank 0] step=4960, skipped=96, lr=[0.0002523592711451964], mom=[(0.9, 0.95)] [2023-04-18 23:41:08,041] [INFO] [timer.py:199:stop] epoch=0/micro_step=9920/global_step=4960, RunningAvgSamplesPerSec=12.105576644299513, CurrSamplesPerSec=12.271787294196988, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:41:34,048] [INFO] [logging.py:96:log_dist] [Rank 0] step=4970, skipped=96, lr=[0.00025049607617898864], mom=[(0.9, 0.95)] [2023-04-18 23:41:34,048] [INFO] [timer.py:199:stop] epoch=0/micro_step=9940/global_step=4970, RunningAvgSamplesPerSec=12.105994694186295, CurrSamplesPerSec=12.366467895656276, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:42:00,941] [INFO] [logging.py:96:log_dist] [Rank 0] step=4980, skipped=96, lr=[0.00024863748322531145], mom=[(0.9, 0.95)] [2023-04-18 23:42:00,941] [INFO] [timer.py:199:stop] epoch=0/micro_step=9960/global_step=4980, RunningAvgSamplesPerSec=12.105596266048554, 
CurrSamplesPerSec=12.273489654869701, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:42:27,663] [INFO] [logging.py:96:log_dist] [Rank 0] step=4990, skipped=96, lr=[0.00024678352656526165], mom=[(0.9, 0.95)] [2023-04-18 23:42:27,663] [INFO] [timer.py:199:stop] epoch=0/micro_step=9980/global_step=4990, RunningAvgSamplesPerSec=12.105356465192436, CurrSamplesPerSec=12.314152918825679, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:42:48,403] [INFO] [loss_scaler.py:188:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 [2023-04-18 23:42:50,954] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, reducing to 65536 [2023-04-18 23:42:53,574] [INFO] [logging.py:96:log_dist] [Rank 0] step=5000, skipped=98, lr=[0.0002453037223509534], mom=[(0.9, 0.95)] [2023-04-18 23:42:53,574] [INFO] [timer.py:199:stop] epoch=0/micro_step=10000/global_step=5000, RunningAvgSamplesPerSec=12.105860479550257, CurrSamplesPerSec=12.22851882451168, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:43:19,590] [INFO] [logging.py:96:log_dist] [Rank 0] step=5010, skipped=98, lr=[0.0002434581971344294], mom=[(0.9, 0.95)] [2023-04-18 23:43:19,590] [INFO] [timer.py:199:stop] epoch=0/micro_step=10020/global_step=5010, RunningAvgSamplesPerSec=12.106266475142517, CurrSamplesPerSec=12.304427431985305, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:43:46,473] [INFO] [logging.py:96:log_dist] [Rank 0] step=5020, skipped=98, lr=[0.00024161740374165126], mom=[(0.9, 0.95)] [2023-04-18 23:43:46,473] [INFO] [timer.py:199:stop] epoch=0/micro_step=10040/global_step=5020, RunningAvgSamplesPerSec=12.105879707083766, CurrSamplesPerSec=12.295801950505215, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:44:13,385] [INFO] [logging.py:96:log_dist] [Rank 0] step=5030, skipped=98, lr=[0.00023978137612540884], mom=[(0.9, 0.95)] 
[2023-04-18 23:44:13,385] [INFO] [timer.py:199:stop] epoch=0/micro_step=10060/global_step=5030, RunningAvgSamplesPerSec=12.105467871041185, CurrSamplesPerSec=12.278378289864284, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:44:39,377] [INFO] [logging.py:96:log_dist] [Rank 0] step=5040, skipped=98, lr=[0.00023795014815058898], mom=[(0.9, 0.95)] [2023-04-18 23:44:39,378] [INFO] [timer.py:199:stop] epoch=0/micro_step=10080/global_step=5040, RunningAvgSamplesPerSec=12.105893732562755, CurrSamplesPerSec=12.331492520816752, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:45:05,388] [INFO] [logging.py:96:log_dist] [Rank 0] step=5050, skipped=98, lr=[0.0002361237535935502], mom=[(0.9, 0.95)] [2023-04-18 23:45:05,388] [INFO] [timer.py:199:stop] epoch=0/micro_step=10100/global_step=5050, RunningAvgSamplesPerSec=12.10630138931304, CurrSamplesPerSec=12.309118357083646, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:45:32,254] [INFO] [logging.py:96:log_dist] [Rank 0] step=5060, skipped=98, lr=[0.00023430222614150099], mom=[(0.9, 0.95)] [2023-04-18 23:45:32,255] [INFO] [timer.py:199:stop] epoch=0/micro_step=10120/global_step=5060, RunningAvgSamplesPerSec=12.105932372874221, CurrSamplesPerSec=12.349815030279274, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:45:59,201] [INFO] [logging.py:96:log_dist] [Rank 0] step=5070, skipped=98, lr=[0.00023248559939187746], mom=[(0.9, 0.95)] [2023-04-18 23:45:59,201] [INFO] [timer.py:199:stop] epoch=0/micro_step=10140/global_step=5070, RunningAvgSamplesPerSec=12.105492756823615, CurrSamplesPerSec=12.361150260328095, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:46:25,246] [INFO] [logging.py:96:log_dist] [Rank 0] step=5080, skipped=98, lr=[0.00023067390685172433], mom=[(0.9, 0.95)] [2023-04-18 23:46:25,247] [INFO] [timer.py:199:stop] epoch=0/micro_step=10160/global_step=5080, RunningAvgSamplesPerSec=12.105867540399986, CurrSamplesPerSec=12.301774924485144, MemAllocated=2.0GB, 
MaxMemAllocated=13.66GB [2023-04-18 23:46:51,240] [INFO] [logging.py:96:log_dist] [Rank 0] step=5090, skipped=98, lr=[0.0002288671819370761], mom=[(0.9, 0.95)] [2023-04-18 23:46:51,241] [INFO] [timer.py:199:stop] epoch=0/micro_step=10180/global_step=5090, RunningAvgSamplesPerSec=12.10628708401241, CurrSamplesPerSec=12.37539594653281, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:47:18,027] [INFO] [loss_scaler.py:188:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 [2023-04-18 23:47:18,028] [INFO] [logging.py:96:log_dist] [Rank 0] step=5100, skipped=99, lr=[0.00022724540437822594], mom=[(0.9, 0.95)] [2023-04-18 23:47:18,028] [INFO] [timer.py:199:stop] epoch=0/micro_step=10200/global_step=5100, RunningAvgSamplesPerSec=12.105992706840894, CurrSamplesPerSec=12.66434731508367, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:47:21,468] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 131072, reducing to 65536 [2023-04-18 23:47:44,874] [INFO] [logging.py:96:log_dist] [Rank 0] step=5110, skipped=100, lr=[0.00022562770181962238], mom=[(0.9, 0.95)] [2023-04-18 23:47:44,875] [INFO] [timer.py:199:stop] epoch=0/micro_step=10220/global_step=5110, RunningAvgSamplesPerSec=12.105645520926759, CurrSamplesPerSec=12.359901524303153, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:48:10,876] [INFO] [logging.py:96:log_dist] [Rank 0] step=5120, skipped=100, lr=[0.00022383506324741742], mom=[(0.9, 0.95)] [2023-04-18 23:48:10,876] [INFO] [timer.py:199:stop] epoch=0/micro_step=10240/global_step=5120, RunningAvgSamplesPerSec=12.106056151493824, CurrSamplesPerSec=12.345557469487643, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:48:36,829] [INFO] [logging.py:96:log_dist] [Rank 0] step=5130, skipped=100, lr=[0.0002220475184407933], mom=[(0.9, 0.95)] [2023-04-18 23:48:36,829] [INFO] [timer.py:199:stop] epoch=0/micro_step=10260/global_step=5130, RunningAvgSamplesPerSec=12.106508749128855, CurrSamplesPerSec=12.366920259471998, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:49:04,461] [INFO] [logging.py:96:log_dist] [Rank 0] step=5140, skipped=100, lr=[0.0002202651003703885], mom=[(0.9, 0.95)] [2023-04-18 23:49:04,462] [INFO] [timer.py:199:stop] epoch=0/micro_step=10280/global_step=5140, RunningAvgSamplesPerSec=12.105462483071726, CurrSamplesPerSec=9.53852825451288, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:49:30,464] [INFO] [logging.py:96:log_dist] [Rank 0] step=5150, skipped=100, lr=[0.00021848784191228038], mom=[(0.9, 0.95)] [2023-04-18 23:49:30,465] [INFO] [timer.py:199:stop] epoch=0/micro_step=10300/global_step=5150, RunningAvgSamplesPerSec=12.105869836399776, CurrSamplesPerSec=12.379187707891234, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:49:56,433] [INFO] [logging.py:96:log_dist] [Rank 0] step=5160, skipped=100, lr=[0.000216715775847379], mom=[(0.9, 0.95)] [2023-04-18 
23:49:56,433] [INFO] [timer.py:199:stop] epoch=0/micro_step=10320/global_step=5160, RunningAvgSamplesPerSec=12.106306884287621, CurrSamplesPerSec=12.340496066549703, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:50:22,409] [INFO] [logging.py:96:log_dist] [Rank 0] step=5170, skipped=100, lr=[0.00021494893486082213], mom=[(0.9, 0.95)] [2023-04-18 23:50:22,410] [INFO] [timer.py:199:stop] epoch=0/micro_step=10340/global_step=5170, RunningAvgSamplesPerSec=12.10673504789074, CurrSamplesPerSec=12.317908368702108, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:50:50,184] [INFO] [logging.py:96:log_dist] [Rank 0] step=5180, skipped=100, lr=[0.00021318735154137402], mom=[(0.9, 0.95)] [2023-04-18 23:50:50,184] [INFO] [timer.py:199:stop] epoch=0/micro_step=10360/global_step=5180, RunningAvgSamplesPerSec=12.105570527475907, CurrSamplesPerSec=12.328530495488188, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:51:16,204] [INFO] [logging.py:96:log_dist] [Rank 0] step=5190, skipped=100, lr=[0.0002114310583808219], mom=[(0.9, 0.95)] [2023-04-18 23:51:16,204] [INFO] [timer.py:199:stop] epoch=0/micro_step=10380/global_step=5190, RunningAvgSamplesPerSec=12.105959717644735, CurrSamplesPerSec=12.291222469855912, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:51:42,164] [INFO] [logging.py:96:log_dist] [Rank 0] step=5200, skipped=100, lr=[0.00020968008777337815], mom=[(0.9, 0.95)] [2023-04-18 23:51:42,165] [INFO] [timer.py:199:stop] epoch=0/micro_step=10400/global_step=5200, RunningAvgSamplesPerSec=12.10639955428335, CurrSamplesPerSec=12.371292916010662, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:51:47,302] [INFO] [loss_scaler.py:188:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 [2023-04-18 23:51:49,852] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 131072, reducing to 65536 [2023-04-18 23:52:08,088] [INFO] [logging.py:96:log_dist] [Rank 0] step=5210, skipped=102, lr=[0.00020828316523191387], mom=[(0.9, 0.95)] [2023-04-18 23:52:08,088] [INFO] [timer.py:199:stop] epoch=0/micro_step=10420/global_step=5210, RunningAvgSamplesPerSec=12.106870111313285, CurrSamplesPerSec=12.290971468382951, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:52:35,829] [INFO] [logging.py:96:log_dist] [Rank 0] step=5220, skipped=102, lr=[0.0002065418565397339], mom=[(0.9, 0.95)] [2023-04-18 23:52:35,830] [INFO] [timer.py:199:stop] epoch=0/micro_step=10440/global_step=5220, RunningAvgSamplesPerSec=12.105743679500053, CurrSamplesPerSec=12.362812597607956, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:53:01,818] [INFO] [logging.py:96:log_dist] [Rank 0] step=5230, skipped=102, lr=[0.0002048059605802786], mom=[(0.9, 0.95)] [2023-04-18 23:53:01,819] [INFO] [timer.py:199:stop] epoch=0/micro_step=10460/global_step=5230, RunningAvgSamplesPerSec=12.106156600907912, CurrSamplesPerSec=12.365499469790985, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:53:27,819] [INFO] [logging.py:96:log_dist] [Rank 0] step=5240, skipped=102, lr=[0.00020307550937154157], mom=[(0.9, 0.95)] [2023-04-18 23:53:27,819] [INFO] [timer.py:199:stop] epoch=0/micro_step=10480/global_step=5240, RunningAvgSamplesPerSec=12.106558496560119, CurrSamplesPerSec=12.318544863546999, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:53:54,721] [INFO] [logging.py:96:log_dist] [Rank 0] step=5250, skipped=102, lr=[0.00020135053483108972], mom=[(0.9, 0.95)] [2023-04-18 23:53:54,721] [INFO] [timer.py:199:stop] epoch=0/micro_step=10500/global_step=5250, RunningAvgSamplesPerSec=12.10617119982164, CurrSamplesPerSec=12.344442446580778, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:54:21,611] [INFO] [logging.py:96:log_dist] [Rank 0] step=5260, skipped=102, lr=[0.00019963106877547417], mom=[(0.9, 0.95)] [2023-04-18 
23:54:21,611] [INFO] [timer.py:199:stop] epoch=0/micro_step=10520/global_step=5260, RunningAvgSamplesPerSec=12.105796488144529, CurrSamplesPerSec=12.31933294207943, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:54:47,611] [INFO] [logging.py:96:log_dist] [Rank 0] step=5270, skipped=102, lr=[0.00019791714291964463], mom=[(0.9, 0.95)] [2023-04-18 23:54:47,611] [INFO] [timer.py:199:stop] epoch=0/micro_step=10540/global_step=5270, RunningAvgSamplesPerSec=12.106196475443124, CurrSamplesPerSec=12.278807382119806, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:55:13,638] [INFO] [logging.py:96:log_dist] [Rank 0] step=5280, skipped=102, lr=[0.0001962087888763636], mom=[(0.9, 0.95)] [2023-04-18 23:55:13,639] [INFO] [timer.py:199:stop] epoch=0/micro_step=10560/global_step=5280, RunningAvgSamplesPerSec=12.10657083847983, CurrSamplesPerSec=12.27222714702463, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:55:40,525] [INFO] [logging.py:96:log_dist] [Rank 0] step=5290, skipped=102, lr=[0.0001945060381556231], mom=[(0.9, 0.95)] [2023-04-18 23:55:40,526] [INFO] [timer.py:199:stop] epoch=0/micro_step=10580/global_step=5290, RunningAvgSamplesPerSec=12.106199688158165, CurrSamplesPerSec=12.326700754400978, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:56:07,268] [INFO] [logging.py:96:log_dist] [Rank 0] step=5300, skipped=102, lr=[0.00019280892216406442], mom=[(0.9, 0.95)] [2023-04-18 23:56:07,268] [INFO] [timer.py:199:stop] epoch=0/micro_step=10600/global_step=5300, RunningAvgSamplesPerSec=12.10595545942035, CurrSamplesPerSec=12.315770996189498, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:56:17,590] [INFO] [loss_scaler.py:188:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 [2023-04-18 23:56:20,139] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 131072, reducing to 65536 [2023-04-18 23:56:33,177] [INFO] [logging.py:96:log_dist] [Rank 0] step=5310, skipped=104, lr=[0.00019145530741475632], mom=[(0.9, 0.95)] [2023-04-18 23:56:33,178] [INFO] [timer.py:199:stop] epoch=0/micro_step=10620/global_step=5310, RunningAvgSamplesPerSec=12.10642995083265, CurrSamplesPerSec=12.330605463004526, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:56:59,225] [INFO] [logging.py:96:log_dist] [Rank 0] step=5320, skipped=104, lr=[0.0001897684127483446], mom=[(0.9, 0.95)] [2023-04-18 23:56:59,225] [INFO] [timer.py:199:stop] epoch=0/micro_step=10640/global_step=5320, RunningAvgSamplesPerSec=12.106784148489927, CurrSamplesPerSec=12.274245039303947, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:57:26,099] [INFO] [logging.py:96:log_dist] [Rank 0] step=5330, skipped=104, lr=[0.00018808724019495654], mom=[(0.9, 0.95)] [2023-04-18 23:57:26,100] [INFO] [timer.py:199:stop] epoch=0/micro_step=10660/global_step=5330, RunningAvgSamplesPerSec=12.10642602081735, CurrSamplesPerSec=12.306139989433888, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:57:53,035] [INFO] [logging.py:96:log_dist] [Rank 0] step=5340, skipped=104, lr=[0.00018641182076323148], mom=[(0.9, 0.95)] [2023-04-18 23:57:53,036] [INFO] [timer.py:199:stop] epoch=0/micro_step=10680/global_step=5340, RunningAvgSamplesPerSec=12.106016457613565, CurrSamplesPerSec=12.319758116702817, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:58:19,013] [INFO] [logging.py:96:log_dist] [Rank 0] step=5350, skipped=104, lr=[0.00018474218535569447], mom=[(0.9, 0.95)] [2023-04-18 23:58:19,014] [INFO] [timer.py:199:stop] epoch=0/micro_step=10700/global_step=5350, RunningAvgSamplesPerSec=12.10642875182231, CurrSamplesPerSec=12.341931544063197, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:58:44,989] [INFO] [logging.py:96:log_dist] [Rank 0] step=5360, skipped=104, lr=[0.0001830783647681858], mom=[(0.9, 0.95)] [2023-04-18 
23:58:44,989] [INFO] [timer.py:199:stop] epoch=0/micro_step=10720/global_step=5360, RunningAvgSamplesPerSec=12.106842254760826, CurrSamplesPerSec=12.365505165970239, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:59:11,709] [INFO] [logging.py:96:log_dist] [Rank 0] step=5370, skipped=104, lr=[0.00018142038968929386], mom=[(0.9, 0.95)] [2023-04-18 23:59:11,710] [INFO] [timer.py:199:stop] epoch=0/micro_step=10740/global_step=5370, RunningAvgSamplesPerSec=12.106617883362869, CurrSamplesPerSec=12.354123288358421, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-18 23:59:38,608] [INFO] [logging.py:96:log_dist] [Rank 0] step=5380, skipped=104, lr=[0.00017976829069978878], mom=[(0.9, 0.95)] [2023-04-18 23:59:38,609] [INFO] [timer.py:199:stop] epoch=0/micro_step=10760/global_step=5380, RunningAvgSamplesPerSec=12.106242756757547, CurrSamplesPerSec=12.35822972866588, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:00:04,591] [INFO] [logging.py:96:log_dist] [Rank 0] step=5390, skipped=104, lr=[0.00017812209827205771], mom=[(0.9, 0.95)] [2023-04-19 00:00:04,592] [INFO] [timer.py:199:stop] epoch=0/micro_step=10780/global_step=5390, RunningAvgSamplesPerSec=12.10664750392464, CurrSamplesPerSec=12.322831328138536, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:00:30,605] [INFO] [logging.py:96:log_dist] [Rank 0] step=5400, skipped=104, lr=[0.0001764818427695441], mom=[(0.9, 0.95)] [2023-04-19 00:00:30,605] [INFO] [timer.py:199:stop] epoch=0/micro_step=10800/global_step=5400, RunningAvgSamplesPerSec=12.10702482431388, CurrSamplesPerSec=12.32244214335534, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:00:46,848] [INFO] [loss_scaler.py:188:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 [2023-04-19 00:00:49,396] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 131072, reducing to 65536 [2023-04-19 00:00:58,090] [INFO] [logging.py:96:log_dist] [Rank 0] step=5410, skipped=106, lr=[0.00017517393328802977], mom=[(0.9, 0.95)] [2023-04-19 00:00:58,090] [INFO] [timer.py:199:stop] epoch=0/micro_step=10820/global_step=5410, RunningAvgSamplesPerSec=12.106154712482251, CurrSamplesPerSec=9.19299506849315, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:01:24,141] [INFO] [logging.py:96:log_dist] [Rank 0] step=5420, skipped=106, lr=[0.000173544440416889], mom=[(0.9, 0.95)] [2023-04-19 00:01:24,141] [INFO] [timer.py:199:stop] epoch=0/micro_step=10840/global_step=5420, RunningAvgSamplesPerSec=12.106500030712905, CurrSamplesPerSec=12.250761692273036, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:01:50,136] [INFO] [logging.py:96:log_dist] [Rank 0] step=5430, skipped=106, lr=[0.0001719209689042619], mom=[(0.9, 0.95)] [2023-04-19 00:01:50,136] [INFO] [timer.py:199:stop] epoch=0/micro_step=10860/global_step=5430, RunningAvgSamplesPerSec=12.106891099797613, CurrSamplesPerSec=12.37508330410544, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:02:16,158] [INFO] [logging.py:96:log_dist] [Rank 0] step=5440, skipped=106, lr=[0.00017030354869451242], mom=[(0.9, 0.95)] [2023-04-19 00:02:16,159] [INFO] [timer.py:199:stop] epoch=0/micro_step=10880/global_step=5440, RunningAvgSamplesPerSec=12.107257684798473, CurrSamplesPerSec=12.282554789019162, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:02:43,805] [INFO] [logging.py:96:log_dist] [Rank 0] step=5450, skipped=106, lr=[0.0001686922096203903], mom=[(0.9, 0.95)] [2023-04-19 00:02:43,805] [INFO] [timer.py:199:stop] epoch=0/micro_step=10900/global_step=5450, RunningAvgSamplesPerSec=12.106257780554863, CurrSamplesPerSec=12.329571289819738, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:03:09,820] [INFO] [logging.py:96:log_dist] [Rank 0] step=5460, skipped=106, lr=[0.00016708698140248024], mom=[(0.9, 0.95)] [2023-04-19 
00:03:09,820] [INFO] [timer.py:199:stop] epoch=0/micro_step=10920/global_step=5460, RunningAvgSamplesPerSec=12.106630541516004, CurrSamplesPerSec=12.281883795783093, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:03:35,798] [INFO] [logging.py:96:log_dist] [Rank 0] step=5470, skipped=106, lr=[0.00016548789364865518], mom=[(0.9, 0.95)] [2023-04-19 00:03:35,798] [INFO] [timer.py:199:stop] epoch=0/micro_step=10940/global_step=5470, RunningAvgSamplesPerSec=12.107032852397593, CurrSamplesPerSec=12.372906655338932, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:04:01,793] [INFO] [logging.py:96:log_dist] [Rank 0] step=5480, skipped=106, lr=[0.00016389497585352848], mom=[(0.9, 0.95)] [2023-04-19 00:04:01,794] [INFO] [timer.py:199:stop] epoch=0/micro_step=10960/global_step=5480, RunningAvgSamplesPerSec=12.10741908070644, CurrSamplesPerSec=12.324733482998155, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:04:29,508] [INFO] [logging.py:96:log_dist] [Rank 0] step=5490, skipped=106, lr=[0.00016230825739791188], mom=[(0.9, 0.95)] [2023-04-19 00:04:29,509] [INFO] [timer.py:199:stop] epoch=0/micro_step=10980/global_step=5490, RunningAvgSamplesPerSec=12.106369255967932, CurrSamplesPerSec=12.35426543250786, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:04:55,476] [INFO] [logging.py:96:log_dist] [Rank 0] step=5500, skipped=106, lr=[0.00016072776754827146], mom=[(0.9, 0.95)] [2023-04-19 00:04:55,477] [INFO] [timer.py:199:stop] epoch=0/micro_step=11000/global_step=5500, RunningAvgSamplesPerSec=12.106778284775126, CurrSamplesPerSec=12.330698354519367, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:05:16,219] [INFO] [loss_scaler.py:188:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 [2023-04-19 00:05:18,767] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 131072, reducing to 65536 [2023-04-19 00:05:21,383] [INFO] [logging.py:96:log_dist] [Rank 0] step=5510, skipped=108, lr=[0.0001594678798585988], mom=[(0.9, 0.95)] [2023-04-19 00:05:21,383] [INFO] [timer.py:199:stop] epoch=0/micro_step=11020/global_step=5510, RunningAvgSamplesPerSec=12.107236805738244, CurrSamplesPerSec=12.248619611410483, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:05:48,285] [INFO] [logging.py:96:log_dist] [Rank 0] step=5520, skipped=108, lr=[0.00015789867488415633], mom=[(0.9, 0.95)] [2023-04-19 00:05:48,285] [INFO] [timer.py:199:stop] epoch=0/micro_step=11040/global_step=5520, RunningAvgSamplesPerSec=12.10686817777906, CurrSamplesPerSec=12.346799901275444, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:06:15,148] [INFO] [logging.py:96:log_dist] [Rank 0] step=5530, skipped=108, lr=[0.00015633577984889598], mom=[(0.9, 0.95)] [2023-04-19 00:06:15,148] [INFO] [timer.py:199:stop] epoch=0/micro_step=11060/global_step=5530, RunningAvgSamplesPerSec=12.106532241785194, CurrSamplesPerSec=12.312563504070683, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:06:41,140] [INFO] [logging.py:96:log_dist] [Rank 0] step=5540, skipped=108, lr=[0.00015477922357986934], mom=[(0.9, 0.95)] [2023-04-19 00:06:41,141] [INFO] [timer.py:199:stop] epoch=0/micro_step=11080/global_step=5540, RunningAvgSamplesPerSec=12.106917863422208, CurrSamplesPerSec=12.314013955751738, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:07:07,142] [INFO] [logging.py:96:log_dist] [Rank 0] step=5550, skipped=108, lr=[0.00015322903478721196], mom=[(0.9, 0.95)] [2023-04-19 00:07:07,143] [INFO] [timer.py:199:stop] epoch=0/micro_step=11100/global_step=5550, RunningAvgSamplesPerSec=12.107294434706082, CurrSamplesPerSec=12.313608382276215, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:07:34,008] [INFO] [logging.py:96:log_dist] [Rank 0] step=5560, skipped=108, lr=[0.00015168524206361363], mom=[(0.9, 0.95)] [2023-04-19 
00:07:34,009] [INFO] [timer.py:199:stop] epoch=0/micro_step=11120/global_step=5560, RunningAvgSamplesPerSec=12.106957903578184, CurrSamplesPerSec=12.298259183190078, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:08:00,858] [INFO] [logging.py:96:log_dist] [Rank 0] step=5570, skipped=108, lr=[0.00015014787388379047], mom=[(0.9, 0.95)] [2023-04-19 00:08:00,859] [INFO] [timer.py:199:stop] epoch=0/micro_step=11140/global_step=5570, RunningAvgSamplesPerSec=12.1066355050176, CurrSamplesPerSec=12.335564654360304, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:08:26,781] [INFO] [logging.py:96:log_dist] [Rank 0] step=5580, skipped=108, lr=[0.0001486169586039602], mom=[(0.9, 0.95)] [2023-04-19 00:08:26,781] [INFO] [timer.py:199:stop] epoch=0/micro_step=11160/global_step=5580, RunningAvgSamplesPerSec=12.107075282331472, CurrSamplesPerSec=12.378340580746015, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:08:52,756] [INFO] [logging.py:96:log_dist] [Rank 0] step=5590, skipped=108, lr=[0.0001470925244613197], mom=[(0.9, 0.95)] [2023-04-19 00:08:52,757] [INFO] [timer.py:199:stop] epoch=0/micro_step=11180/global_step=5590, RunningAvgSamplesPerSec=12.107470395820563, CurrSamplesPerSec=12.32618340756689, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:09:19,564] [INFO] [logging.py:96:log_dist] [Rank 0] step=5600, skipped=108, lr=[0.00014557459957352287], mom=[(0.9, 0.95)] [2023-04-19 00:09:19,564] [INFO] [timer.py:199:stop] epoch=0/micro_step=11200/global_step=5600, RunningAvgSamplesPerSec=12.10718294920921, CurrSamplesPerSec=12.292352666259356, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:09:46,221] [INFO] [loss_scaler.py:188:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. 
Reducing hysteresis to 1 [2023-04-19 00:09:46,221] [INFO] [logging.py:96:log_dist] [Rank 0] step=5610, skipped=109, lr=[0.0001442140557298967], mom=[(0.9, 0.95)] [2023-04-19 00:09:46,221] [INFO] [timer.py:199:stop] epoch=0/micro_step=11220/global_step=5610, RunningAvgSamplesPerSec=12.107020032690667, CurrSamplesPerSec=12.69510619113032, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:09:48,770] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, reducing to 65536 [2023-04-19 00:10:12,174] [INFO] [logging.py:96:log_dist] [Rank 0] step=5620, skipped=110, lr=[0.00014285882738904822], mom=[(0.9, 0.95)] [2023-04-19 00:10:12,174] [INFO] [timer.py:199:stop] epoch=0/micro_step=11240/global_step=5620, RunningAvgSamplesPerSec=12.107431208099943, CurrSamplesPerSec=12.301640750630925, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:10:38,143] [INFO] [logging.py:96:log_dist] [Rank 0] step=5630, skipped=110, lr=[0.00014135927697679808], mom=[(0.9, 0.95)] [2023-04-19 00:10:38,144] [INFO] [timer.py:199:stop] epoch=0/micro_step=11260/global_step=5630, RunningAvgSamplesPerSec=12.107827745656373, CurrSamplesPerSec=12.336183699932464, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:11:04,815] [INFO] [logging.py:96:log_dist] [Rank 0] step=5640, skipped=110, lr=[0.0001398663415671342], mom=[(0.9, 0.95)] [2023-04-19 00:11:04,815] [INFO] [timer.py:199:stop] epoch=0/micro_step=11280/global_step=5640, RunningAvgSamplesPerSec=12.10765236797409, CurrSamplesPerSec=12.306696278302924, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:11:31,713] [INFO] [logging.py:96:log_dist] [Rank 0] step=5650, skipped=110, lr=[0.00013838004869672776], mom=[(0.9, 0.95)] [2023-04-19 00:11:31,713] [INFO] [timer.py:199:stop] epoch=0/micro_step=11300/global_step=5650, RunningAvgSamplesPerSec=12.107294065213903, CurrSamplesPerSec=12.320351827178026, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 
00:11:57,674] [INFO] [logging.py:96:log_dist] [Rank 0] step=5660, skipped=110, lr=[0.0001369004257797306], mom=[(0.9, 0.95)] [2023-04-19 00:11:57,675] [INFO] [timer.py:199:stop] epoch=0/micro_step=11320/global_step=5660, RunningAvgSamplesPerSec=12.10769517175004, CurrSamplesPerSec=12.385245418318238, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:12:23,657] [INFO] [logging.py:96:log_dist] [Rank 0] step=5670, skipped=110, lr=[0.00013542750010726918], mom=[(0.9, 0.95)] [2023-04-19 00:12:23,658] [INFO] [timer.py:199:stop] epoch=0/micro_step=11340/global_step=5670, RunningAvgSamplesPerSec=12.10807741396788, CurrSamplesPerSec=12.385843171641605, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:12:50,421] [INFO] [logging.py:96:log_dist] [Rank 0] step=5680, skipped=110, lr=[0.00013396129884694197], mom=[(0.9, 0.95)] [2023-04-19 00:12:50,421] [INFO] [timer.py:199:stop] epoch=0/micro_step=11360/global_step=5680, RunningAvgSamplesPerSec=12.107828575548767, CurrSamplesPerSec=12.287338160649, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:13:17,296] [INFO] [logging.py:96:log_dist] [Rank 0] step=5690, skipped=110, lr=[0.00013250184904231815], mom=[(0.9, 0.95)] [2023-04-19 00:13:17,296] [INFO] [timer.py:199:stop] epoch=0/micro_step=11380/global_step=5690, RunningAvgSamplesPerSec=12.10749057616973, CurrSamplesPerSec=12.333248886547535, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:13:43,271] [INFO] [logging.py:96:log_dist] [Rank 0] step=5700, skipped=110, lr=[0.00013104917761243818], mom=[(0.9, 0.95)] [2023-04-19 00:13:43,271] [INFO] [timer.py:199:stop] epoch=0/micro_step=11400/global_step=5700, RunningAvgSamplesPerSec=12.10787765270045, CurrSamplesPerSec=12.396933208013383, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:14:09,247] [INFO] [logging.py:96:log_dist] [Rank 0] step=5710, skipped=110, lr=[0.00012960331135131826], mom=[(0.9, 0.95)] [2023-04-19 00:14:09,247] [INFO] [timer.py:199:stop] 
epoch=0/micro_step=11420/global_step=5710, RunningAvgSamplesPerSec=12.108262444004078, CurrSamplesPerSec=12.334971744055022, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:14:15,252] [INFO] [loss_scaler.py:188:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 [2023-04-19 00:14:17,801] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, reducing to 65536 [2023-04-19 00:14:36,951] [INFO] [logging.py:96:log_dist] [Rank 0] step=5720, skipped=112, lr=[0.00012845153598938937], mom=[(0.9, 0.95)] [2023-04-19 00:14:36,951] [INFO] [timer.py:199:stop] epoch=0/micro_step=11440/global_step=5720, RunningAvgSamplesPerSec=12.107261317329485, CurrSamplesPerSec=12.262767171042642, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:15:02,932] [INFO] [logging.py:96:log_dist] [Rank 0] step=5730, skipped=112, lr=[0.00012701798615198708], mom=[(0.9, 0.95)] [2023-04-19 00:15:02,932] [INFO] [timer.py:199:stop] epoch=0/micro_step=11460/global_step=5730, RunningAvgSamplesPerSec=12.107642208079062, CurrSamplesPerSec=12.311771773936927, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:15:28,933] [INFO] [logging.py:96:log_dist] [Rank 0] step=5740, skipped=112, lr=[0.00012559131583725987], mom=[(0.9, 0.95)] [2023-04-19 00:15:28,934] [INFO] [timer.py:199:stop] epoch=0/micro_step=11480/global_step=5740, RunningAvgSamplesPerSec=12.108005082037543, CurrSamplesPerSec=12.272777007401555, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:15:54,910] [INFO] [logging.py:96:log_dist] [Rank 0] step=5750, skipped=112, lr=[0.00012417155135964236], mom=[(0.9, 0.95)] [2023-04-19 00:15:54,911] [INFO] [timer.py:199:stop] epoch=0/micro_step=11500/global_step=5750, RunningAvgSamplesPerSec=12.108386129518662, CurrSamplesPerSec=12.350152534603005, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:16:22,495] [INFO] 
[logging.py:96:log_dist] [Rank 0] step=5760, skipped=112, lr=[0.00012275871890619355], mom=[(0.9, 0.95)] [2023-04-19 00:16:22,495] [INFO] [timer.py:199:stop] epoch=0/micro_step=11520/global_step=5760, RunningAvgSamplesPerSec=12.107487303669197, CurrSamplesPerSec=12.382933810428485, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:16:48,434] [INFO] [logging.py:96:log_dist] [Rank 0] step=5770, skipped=112, lr=[0.00012135284453611317], mom=[(0.9, 0.95)] [2023-04-19 00:16:48,435] [INFO] [timer.py:199:stop] epoch=0/micro_step=11540/global_step=5770, RunningAvgSamplesPerSec=12.107898139808663, CurrSamplesPerSec=12.411079059042672, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:17:14,419] [INFO] [logging.py:96:log_dist] [Rank 0] step=5780, skipped=112, lr=[0.000119953954180262], mom=[(0.9, 0.95)] [2023-04-19 00:17:14,419] [INFO] [timer.py:199:stop] epoch=0/micro_step=11560/global_step=5780, RunningAvgSamplesPerSec=12.108271512830903, CurrSamplesPerSec=12.258339890191838, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:17:41,329] [INFO] [logging.py:96:log_dist] [Rank 0] step=5790, skipped=112, lr=[0.00011856207364068278], mom=[(0.9, 0.95)] [2023-04-19 00:17:41,329] [INFO] [timer.py:199:stop] epoch=0/micro_step=11580/global_step=5790, RunningAvgSamplesPerSec=12.107911303659925, CurrSamplesPerSec=12.262522932091771, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:18:08,201] [INFO] [logging.py:96:log_dist] [Rank 0] step=5800, skipped=112, lr=[0.00011717722859012486], mom=[(0.9, 0.95)] [2023-04-19 00:18:08,201] [INFO] [timer.py:199:stop] epoch=0/micro_step=11600/global_step=5800, RunningAvgSamplesPerSec=12.107582256066634, CurrSamplesPerSec=12.27735735059986, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:18:34,230] [INFO] [logging.py:96:log_dist] [Rank 0] step=5810, skipped=112, lr=[0.0001157994445715706], mom=[(0.9, 0.95)] [2023-04-19 00:18:34,231] [INFO] [timer.py:199:stop] epoch=0/micro_step=11620/global_step=5810, 
RunningAvgSamplesPerSec=12.107918859119206, CurrSamplesPerSec=12.30004441910632, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:18:44,559] [INFO] [loss_scaler.py:188:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 [2023-04-19 00:18:47,112] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, reducing to 65536 [2023-04-19 00:19:00,144] [INFO] [logging.py:96:log_dist] [Rank 0] step=5820, skipped=114, lr=[0.00011470231838152145], mom=[(0.9, 0.95)] [2023-04-19 00:19:00,145] [INFO] [timer.py:199:stop] epoch=0/micro_step=11640/global_step=5820, RunningAvgSamplesPerSec=12.108345271492336, CurrSamplesPerSec=12.326343021979687, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:19:27,034] [INFO] [logging.py:96:log_dist] [Rank 0] step=5830, skipped=114, lr=[0.0001133373081728687], mom=[(0.9, 0.95)] [2023-04-19 00:19:27,035] [INFO] [timer.py:199:stop] epoch=0/micro_step=11660/global_step=5830, RunningAvgSamplesPerSec=12.10800297986089, CurrSamplesPerSec=12.36826843153842, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:19:53,944] [INFO] [logging.py:96:log_dist] [Rank 0] step=5840, skipped=114, lr=[0.0001119794298222071], mom=[(0.9, 0.95)] [2023-04-19 00:19:53,945] [INFO] [timer.py:199:stop] epoch=0/micro_step=11680/global_step=5840, RunningAvgSamplesPerSec=12.107645990540977, CurrSamplesPerSec=12.280712824092053, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:20:19,968] [INFO] [logging.py:96:log_dist] [Rank 0] step=5850, skipped=114, lr=[0.00011062870837512773], mom=[(0.9, 0.95)] [2023-04-19 00:20:19,968] [INFO] [timer.py:199:stop] epoch=0/micro_step=11700/global_step=5850, RunningAvgSamplesPerSec=12.107985270189381, CurrSamplesPerSec=12.3045763311023, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:20:45,973] [INFO] [logging.py:96:log_dist] [Rank 0] step=5860, skipped=114, 
lr=[0.00010928516874521476], mom=[(0.9, 0.95)] [2023-04-19 00:20:45,974] [INFO] [timer.py:199:stop] epoch=0/micro_step=11720/global_step=5860, RunningAvgSamplesPerSec=12.10833726458041, CurrSamplesPerSec=12.291892232476934, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:21:12,854] [INFO] [logging.py:96:log_dist] [Rank 0] step=5870, skipped=114, lr=[0.00010794883571358649], mom=[(0.9, 0.95)] [2023-04-19 00:21:12,855] [INFO] [timer.py:199:stop] epoch=0/micro_step=11740/global_step=5870, RunningAvgSamplesPerSec=12.108004362386204, CurrSamplesPerSec=12.297632670219315, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:21:39,759] [INFO] [logging.py:96:log_dist] [Rank 0] step=5880, skipped=114, lr=[0.00010661973392843771], mom=[(0.9, 0.95)] [2023-04-19 00:21:39,759] [INFO] [timer.py:199:stop] epoch=0/micro_step=11760/global_step=5880, RunningAvgSamplesPerSec=12.107654251081868, CurrSamplesPerSec=12.31839562617047, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:22:05,769] [INFO] [logging.py:96:log_dist] [Rank 0] step=5890, skipped=114, lr=[0.00010529788790458534], mom=[(0.9, 0.95)] [2023-04-19 00:22:05,770] [INFO] [timer.py:199:stop] epoch=0/micro_step=11780/global_step=5890, RunningAvgSamplesPerSec=12.108000681554357, CurrSamplesPerSec=12.272808429482781, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:22:31,825] [INFO] [logging.py:96:log_dist] [Rank 0] step=5900, skipped=114, lr=[0.00010398332202301708], mom=[(0.9, 0.95)] [2023-04-19 00:22:31,825] [INFO] [timer.py:199:stop] epoch=0/micro_step=11800/global_step=5900, RunningAvgSamplesPerSec=12.108311411011043, CurrSamplesPerSec=12.293413258664224, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:22:58,598] [INFO] [logging.py:96:log_dist] [Rank 0] step=5910, skipped=114, lr=[0.0001026760605304401], mom=[(0.9, 0.95)] [2023-04-19 00:22:58,598] [INFO] [timer.py:199:stop] epoch=0/micro_step=11820/global_step=5910, RunningAvgSamplesPerSec=12.108065044609365, 
CurrSamplesPerSec=12.32363466516599, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:23:14,901] [INFO] [loss_scaler.py:188:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 [2023-04-19 00:23:17,457] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, reducing to 65536 [2023-04-19 00:23:25,284] [INFO] [logging.py:96:log_dist] [Rank 0] step=5920, skipped=116, lr=[0.00010163552670424131], mom=[(0.9, 0.95)] [2023-04-19 00:23:25,284] [INFO] [timer.py:199:stop] epoch=0/micro_step=11840/global_step=5920, RunningAvgSamplesPerSec=12.107886934641533, CurrSamplesPerSec=12.268482674154024, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:23:51,328] [INFO] [logging.py:96:log_dist] [Rank 0] step=5930, skipped=116, lr=[0.00010034147378321923], mom=[(0.9, 0.95)] [2023-04-19 00:23:51,329] [INFO] [timer.py:199:stop] epoch=0/micro_step=11860/global_step=5930, RunningAvgSamplesPerSec=12.108204630688423, CurrSamplesPerSec=12.314352895582575, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:24:17,368] [INFO] [logging.py:96:log_dist] [Rank 0] step=5940, skipped=116, lr=[9.90547924238045e-05], mom=[(0.9, 0.95)] [2023-04-19 00:24:17,369] [INFO] [timer.py:199:stop] epoch=0/micro_step=11880/global_step=5940, RunningAvgSamplesPerSec=12.108524851006722, CurrSamplesPerSec=12.307948958653046, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:24:44,129] [INFO] [logging.py:96:log_dist] [Rank 0] step=5950, skipped=116, lr=[9.777550635838406e-05], mom=[(0.9, 0.95)] [2023-04-19 00:24:44,129] [INFO] [timer.py:199:stop] epoch=0/micro_step=11900/global_step=5950, RunningAvgSamplesPerSec=12.10828944477431, CurrSamplesPerSec=12.279109561939789, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:25:11,116] [INFO] [logging.py:96:log_dist] [Rank 0] step=5960, skipped=116, lr=[9.650363918294169e-05], mom=[(0.9, 0.95)] 
[2023-04-19 00:25:11,117] [INFO] [timer.py:199:stop] epoch=0/micro_step=11920/global_step=5960, RunningAvgSamplesPerSec=12.10788000395691, CurrSamplesPerSec=12.305361495136815, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:25:37,149] [INFO] [logging.py:96:log_dist] [Rank 0] step=5970, skipped=116, lr=[9.523921435662236e-05], mom=[(0.9, 0.95)] [2023-04-19 00:25:37,150] [INFO] [timer.py:199:stop] epoch=0/micro_step=11940/global_step=5970, RunningAvgSamplesPerSec=12.108204332864558, CurrSamplesPerSec=12.293345699550564, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:26:03,161] [INFO] [logging.py:96:log_dist] [Rank 0] step=5980, skipped=116, lr=[9.398225520129922e-05], mom=[(0.9, 0.95)] [2023-04-19 00:26:03,161] [INFO] [timer.py:199:stop] epoch=0/micro_step=11960/global_step=5980, RunningAvgSamplesPerSec=12.108544375107407, CurrSamplesPerSec=12.327548753745651, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:26:30,992] [INFO] [logging.py:96:log_dist] [Rank 0] step=5990, skipped=116, lr=[9.273278490114356e-05], mom=[(0.9, 0.95)] [2023-04-19 00:26:30,992] [INFO] [timer.py:199:stop] epoch=0/micro_step=11980/global_step=5990, RunningAvgSamplesPerSec=12.10749091727013, CurrSamplesPerSec=9.126364481562794, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:26:56,990] [INFO] [logging.py:96:log_dist] [Rank 0] step=6000, skipped=116, lr=[9.14908265021982e-05], mom=[(0.9, 0.95)] [2023-04-19 00:26:56,991] [INFO] [timer.py:199:stop] epoch=0/micro_step=12000/global_step=6000, RunningAvgSamplesPerSec=12.107840552263982, CurrSamplesPerSec=12.297277749649867, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:27:23,012] [INFO] [logging.py:96:log_dist] [Rank 0] step=6010, skipped=116, lr=[9.02564029119507e-05], mom=[(0.9, 0.95)] [2023-04-19 00:27:23,012] [INFO] [timer.py:199:stop] epoch=0/micro_step=12020/global_step=6010, RunningAvgSamplesPerSec=12.108171445873758, CurrSamplesPerSec=12.376473200544494, MemAllocated=2.0GB, 
MaxMemAllocated=13.66GB [2023-04-19 00:27:43,788] [INFO] [loss_scaler.py:188:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 [2023-04-19 00:27:46,342] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, reducing to 65536 [2023-04-19 00:27:48,965] [INFO] [logging.py:96:log_dist] [Rank 0] step=6020, skipped=118, lr=[8.927430440713502e-05], mom=[(0.9, 0.95)] [2023-04-19 00:27:48,966] [INFO] [timer.py:199:stop] epoch=0/micro_step=12040/global_step=6020, RunningAvgSamplesPerSec=12.108553424365057, CurrSamplesPerSec=12.209831072644812, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:28:16,759] [INFO] [logging.py:96:log_dist] [Rank 0] step=6030, skipped=118, lr=[8.805350075553081e-05], mom=[(0.9, 0.95)] [2023-04-19 00:28:16,759] [INFO] [timer.py:199:stop] epoch=0/micro_step=12060/global_step=6030, RunningAvgSamplesPerSec=12.107535414243909, CurrSamplesPerSec=12.37075927718961, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:28:42,780] [INFO] [logging.py:96:log_dist] [Rank 0] step=6040, skipped=118, lr=[8.684029531289477e-05], mom=[(0.9, 0.95)] [2023-04-19 00:28:42,780] [INFO] [timer.py:199:stop] epoch=0/micro_step=12080/global_step=6040, RunningAvgSamplesPerSec=12.107865906515608, CurrSamplesPerSec=12.324324939789117, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:29:08,818] [INFO] [logging.py:96:log_dist] [Rank 0] step=6050, skipped=118, lr=[8.563471045637633e-05], mom=[(0.9, 0.95)] [2023-04-19 00:29:08,819] [INFO] [timer.py:199:stop] epoch=0/micro_step=12100/global_step=6050, RunningAvgSamplesPerSec=12.108181617511656, CurrSamplesPerSec=12.323480778414243, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:29:35,745] [INFO] [logging.py:96:log_dist] [Rank 0] step=6060, skipped=118, lr=[8.443676842256626e-05], mom=[(0.9, 0.95)] [2023-04-19 00:29:35,745] [INFO] [timer.py:199:stop] 
epoch=0/micro_step=12120/global_step=6060, RunningAvgSamplesPerSec=12.107824952831225, CurrSamplesPerSec=12.268690142047548, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:30:02,502] [INFO] [logging.py:96:log_dist] [Rank 0] step=6070, skipped=118, lr=[8.324649130708606e-05], mom=[(0.9, 0.95)] [2023-04-19 00:30:02,502] [INFO] [timer.py:199:stop] epoch=0/micro_step=12140/global_step=6070, RunningAvgSamplesPerSec=12.107597593101485, CurrSamplesPerSec=12.355983929768364, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:30:28,512] [INFO] [logging.py:96:log_dist] [Rank 0] step=6080, skipped=118, lr=[8.206390106418026e-05], mom=[(0.9, 0.95)] [2023-04-19 00:30:28,512] [INFO] [timer.py:199:stop] epoch=0/micro_step=12160/global_step=6080, RunningAvgSamplesPerSec=12.107934133324907, CurrSamplesPerSec=12.301213443322084, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:30:54,576] [INFO] [logging.py:96:log_dist] [Rank 0] step=6090, skipped=118, lr=[8.088901950631216e-05], mom=[(0.9, 0.95)] [2023-04-19 00:30:54,576] [INFO] [timer.py:199:stop] epoch=0/micro_step=12180/global_step=6090, RunningAvgSamplesPerSec=12.108228591828633, CurrSamplesPerSec=12.273450372932844, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:31:21,513] [INFO] [logging.py:96:log_dist] [Rank 0] step=6100, skipped=118, lr=[7.972186830376055e-05], mom=[(0.9, 0.95)] [2023-04-19 00:31:21,514] [INFO] [timer.py:199:stop] epoch=0/micro_step=12200/global_step=6100, RunningAvgSamplesPerSec=12.107865974492551, CurrSamplesPerSec=12.278933194958947, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:31:48,462] [INFO] [logging.py:96:log_dist] [Rank 0] step=6110, skipped=118, lr=[7.856246898422081e-05], mom=[(0.9, 0.95)] [2023-04-19 00:31:48,463] [INFO] [timer.py:199:stop] epoch=0/micro_step=12220/global_step=6110, RunningAvgSamplesPerSec=12.107496117008601, CurrSamplesPerSec=12.29809240731583, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:32:14,443] [INFO] 
[loss_scaler.py:188:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 [2023-04-19 00:32:14,444] [INFO] [logging.py:96:log_dist] [Rank 0] step=6120, skipped=119, lr=[7.752565513403776e-05], mom=[(0.9, 0.95)] [2023-04-19 00:32:14,444] [INFO] [timer.py:199:stop] epoch=0/micro_step=12240/global_step=6120, RunningAvgSamplesPerSec=12.107851755738437, CurrSamplesPerSec=12.660282149284816, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:32:16,991] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, reducing to 65536 [2023-04-19 00:32:40,461] [INFO] [logging.py:96:log_dist] [Rank 0] step=6130, skipped=120, lr=[7.649515312272781e-05], mom=[(0.9, 0.95)] [2023-04-19 00:32:40,462] [INFO] [timer.py:199:stop] epoch=0/micro_step=12260/global_step=6130, RunningAvgSamplesPerSec=12.108179282452058, CurrSamplesPerSec=12.276982262652654, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:33:07,355] [INFO] [logging.py:96:log_dist] [Rank 0] step=6140, skipped=120, lr=[7.535757238439939e-05], mom=[(0.9, 0.95)] [2023-04-19 00:33:07,356] [INFO] [timer.py:199:stop] epoch=0/micro_step=12280/global_step=6140, RunningAvgSamplesPerSec=12.1078513419366, CurrSamplesPerSec=12.333420017308615, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:33:34,354] [INFO] [logging.py:96:log_dist] [Rank 0] step=6150, skipped=120, lr=[7.422782402699319e-05], mom=[(0.9, 0.95)] [2023-04-19 00:33:34,355] [INFO] [timer.py:199:stop] epoch=0/micro_step=12300/global_step=6150, RunningAvgSamplesPerSec=12.107446537725721, CurrSamplesPerSec=12.292945989988159, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:34:00,430] [INFO] [logging.py:96:log_dist] [Rank 0] step=6160, skipped=120, lr=[7.310592888832235e-05], mom=[(0.9, 0.95)] [2023-04-19 00:34:00,431] [INFO] [timer.py:199:stop] epoch=0/micro_step=12320/global_step=6160, 
RunningAvgSamplesPerSec=12.107729865672335, CurrSamplesPerSec=12.264220481923857, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:34:26,493] [INFO] [logging.py:96:log_dist] [Rank 0] step=6170, skipped=120, lr=[7.199190766134999e-05], mom=[(0.9, 0.95)] [2023-04-19 00:34:26,493] [INFO] [timer.py:199:stop] epoch=0/micro_step=12340/global_step=6170, RunningAvgSamplesPerSec=12.108021930967487, CurrSamplesPerSec=12.313742817320486, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:34:53,248] [INFO] [logging.py:96:log_dist] [Rank 0] step=6180, skipped=120, lr=[7.088578089380754e-05], mom=[(0.9, 0.95)] [2023-04-19 00:34:53,249] [INFO] [timer.py:199:stop] epoch=0/micro_step=12360/global_step=6180, RunningAvgSamplesPerSec=12.107799500395375, CurrSamplesPerSec=12.320640221485203, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:35:20,219] [INFO] [logging.py:96:log_dist] [Rank 0] step=6190, skipped=120, lr=[6.978756898781613e-05], mom=[(0.9, 0.95)] [2023-04-19 00:35:20,219] [INFO] [timer.py:199:stop] epoch=0/micro_step=12380/global_step=6190, RunningAvgSamplesPerSec=12.107418514765584, CurrSamplesPerSec=12.254561367932405, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:35:46,278] [INFO] [logging.py:96:log_dist] [Rank 0] step=6200, skipped=120, lr=[6.86972921995096e-05], mom=[(0.9, 0.95)] [2023-04-19 00:35:46,278] [INFO] [timer.py:199:stop] epoch=0/micro_step=12400/global_step=6200, RunningAvgSamplesPerSec=12.107712469094507, CurrSamplesPerSec=12.322916182736277, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:36:12,378] [INFO] [logging.py:96:log_dist] [Rank 0] step=6210, skipped=120, lr=[6.761497063866206e-05], mom=[(0.9, 0.95)] [2023-04-19 00:36:12,378] [INFO] [timer.py:199:stop] epoch=0/micro_step=12420/global_step=6210, RunningAvgSamplesPerSec=12.107975365798005, CurrSamplesPerSec=12.255889629015192, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:36:39,160] [INFO] [logging.py:96:log_dist] [Rank 0] step=6220, 
skipped=120, lr=[6.65406242683157e-05], mom=[(0.9, 0.95)] [2023-04-19 00:36:39,161] [INFO] [timer.py:199:stop] epoch=0/micro_step=12440/global_step=6220, RunningAvgSamplesPerSec=12.107734700204562, CurrSamplesPerSec=12.267995993227718, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:36:45,065] [INFO] [loss_scaler.py:188:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 [2023-04-19 00:36:47,617] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, reducing to 65536 [2023-04-19 00:37:05,903] [INFO] [logging.py:96:log_dist] [Rank 0] step=6230, skipped=122, lr=[6.568690263046462e-05], mom=[(0.9, 0.95)] [2023-04-19 00:37:05,904] [INFO] [timer.py:199:stop] epoch=0/micro_step=12460/global_step=6230, RunningAvgSamplesPerSec=12.10752409630217, CurrSamplesPerSec=12.27294758634803, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:37:31,935] [INFO] [logging.py:96:log_dist] [Rank 0] step=6240, skipped=122, lr=[6.462696144011149e-05], mom=[(0.9, 0.95)] [2023-04-19 00:37:31,936] [INFO] [timer.py:199:stop] epoch=0/micro_step=12480/global_step=6240, RunningAvgSamplesPerSec=12.107835735371115, CurrSamplesPerSec=12.268151861487278, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:37:57,963] [INFO] [logging.py:96:log_dist] [Rank 0] step=6250, skipped=122, lr=[6.357505055304297e-05], mom=[(0.9, 0.95)] [2023-04-19 00:37:57,963] [INFO] [timer.py:199:stop] epoch=0/micro_step=12500/global_step=6250, RunningAvgSamplesPerSec=12.108149308230367, CurrSamplesPerSec=12.329200932633215, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:38:25,758] [INFO] [logging.py:96:log_dist] [Rank 0] step=6260, skipped=122, lr=[6.253118937138729e-05], mom=[(0.9, 0.95)] [2023-04-19 00:38:25,758] [INFO] [timer.py:199:stop] epoch=0/micro_step=12520/global_step=6260, RunningAvgSamplesPerSec=12.10716854932737, 
CurrSamplesPerSec=9.133858340591704, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:38:51,788] [INFO] [logging.py:96:log_dist] [Rank 0] step=6270, skipped=122, lr=[6.149539714879854e-05], mom=[(0.9, 0.95)] [2023-04-19 00:38:51,788] [INFO] [timer.py:199:stop] epoch=0/micro_step=12540/global_step=6270, RunningAvgSamplesPerSec=12.107480372688801, CurrSamplesPerSec=12.32171927523342, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:39:17,848] [INFO] [logging.py:96:log_dist] [Rank 0] step=6280, skipped=122, lr=[6.0467692990101686e-05], mom=[(0.9, 0.95)] [2023-04-19 00:39:17,849] [INFO] [timer.py:199:stop] epoch=0/micro_step=12560/global_step=6280, RunningAvgSamplesPerSec=12.10776941363748, CurrSamplesPerSec=12.322774759055928, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:39:43,957] [INFO] [logging.py:96:log_dist] [Rank 0] step=6290, skipped=122, lr=[5.944809585094013e-05], mom=[(0.9, 0.95)] [2023-04-19 00:39:43,958] [INFO] [timer.py:199:stop] epoch=0/micro_step=12580/global_step=6290, RunningAvgSamplesPerSec=12.108022156399379, CurrSamplesPerSec=12.331717987523641, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:40:11,772] [INFO] [logging.py:96:log_dist] [Rank 0] step=6300, skipped=122, lr=[5.843662453742593e-05], mom=[(0.9, 0.95)] [2023-04-19 00:40:11,772] [INFO] [timer.py:199:stop] epoch=0/micro_step=12600/global_step=6300, RunningAvgSamplesPerSec=12.107033555889979, CurrSamplesPerSec=12.292909961107513, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:40:37,854] [INFO] [logging.py:96:log_dist] [Rank 0] step=6310, skipped=122, lr=[5.7433297705793804e-05], mom=[(0.9, 0.95)] [2023-04-19 00:40:37,855] [INFO] [timer.py:199:stop] epoch=0/micro_step=12620/global_step=6310, RunningAvgSamplesPerSec=12.107305824521143, CurrSamplesPerSec=12.279566791654071, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:41:03,936] [INFO] [logging.py:96:log_dist] [Rank 0] step=6320, skipped=122, lr=[5.643813386205571e-05], 
mom=[(0.9, 0.95)] [2023-04-19 00:41:03,937] [INFO] [timer.py:199:stop] epoch=0/micro_step=12640/global_step=6320, RunningAvgSamplesPerSec=12.107577526103913, CurrSamplesPerSec=12.266680802255602, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:41:14,319] [INFO] [loss_scaler.py:188:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 [2023-04-19 00:41:16,871] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, reducing to 65536 [2023-04-19 00:41:30,788] [INFO] [logging.py:96:log_dist] [Rank 0] step=6330, skipped=124, lr=[5.564789247827795e-05], mom=[(0.9, 0.95)] [2023-04-19 00:41:30,788] [INFO] [timer.py:199:stop] epoch=0/micro_step=12660/global_step=6330, RunningAvgSamplesPerSec=12.107291107421652, CurrSamplesPerSec=12.269105098885799, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:41:57,744] [INFO] [logging.py:96:log_dist] [Rank 0] step=6340, skipped=124, lr=[5.466746816708496e-05], mom=[(0.9, 0.95)] [2023-04-19 00:41:57,744] [INFO] [timer.py:199:stop] epoch=0/micro_step=12680/global_step=6340, RunningAvgSamplesPerSec=12.106930322047898, CurrSamplesPerSec=12.254425984278425, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:42:23,749] [INFO] [logging.py:96:log_dist] [Rank 0] step=6350, skipped=124, lr=[5.369525785854368e-05], mom=[(0.9, 0.95)] [2023-04-19 00:42:23,749] [INFO] [timer.py:199:stop] epoch=0/micro_step=12700/global_step=6350, RunningAvgSamplesPerSec=12.107257090374247, CurrSamplesPerSec=12.347456424157672, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:42:49,801] [INFO] [logging.py:96:log_dist] [Rank 0] step=6360, skipped=124, lr=[5.2731279484732775e-05], mom=[(0.9, 0.95)] [2023-04-19 00:42:49,802] [INFO] [timer.py:199:stop] epoch=0/micro_step=12720/global_step=6360, RunningAvgSamplesPerSec=12.107548446853217, CurrSamplesPerSec=12.333321418210785, MemAllocated=2.0GB, 
MaxMemAllocated=13.66GB [2023-04-19 00:43:16,819] [INFO] [logging.py:96:log_dist] [Rank 0] step=6370, skipped=124, lr=[5.177555082589597e-05], mom=[(0.9, 0.95)] [2023-04-19 00:43:16,819] [INFO] [timer.py:199:stop] epoch=0/micro_step=12740/global_step=6370, RunningAvgSamplesPerSec=12.107145041224761, CurrSamplesPerSec=12.234467855902839, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:43:43,615] [INFO] [logging.py:96:log_dist] [Rank 0] step=6380, skipped=124, lr=[5.082808951011381e-05], mom=[(0.9, 0.95)] [2023-04-19 00:43:43,615] [INFO] [timer.py:199:stop] epoch=0/micro_step=12760/global_step=6380, RunningAvgSamplesPerSec=12.106901845608402, CurrSamplesPerSec=12.263802493700327, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:44:09,701] [INFO] [logging.py:96:log_dist] [Rank 0] step=6390, skipped=124, lr=[4.988891301297866e-05], mom=[(0.9, 0.95)] [2023-04-19 00:44:09,701] [INFO] [timer.py:199:stop] epoch=0/micro_step=12780/global_step=6390, RunningAvgSamplesPerSec=12.107168550718978, CurrSamplesPerSec=12.295211730024752, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:44:35,791] [INFO] [logging.py:96:log_dist] [Rank 0] step=6400, skipped=124, lr=[4.8958038657272e-05], mom=[(0.9, 0.95)] [2023-04-19 00:44:35,791] [INFO] [timer.py:199:stop] epoch=0/micro_step=12800/global_step=6400, RunningAvgSamplesPerSec=12.10743149270927, CurrSamplesPerSec=12.323326895505655, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:45:02,676] [INFO] [logging.py:96:log_dist] [Rank 0] step=6410, skipped=124, lr=[4.803548361264554e-05], mom=[(0.9, 0.95)] [2023-04-19 00:45:02,676] [INFO] [timer.py:199:stop] epoch=0/micro_step=12820/global_step=6410, RunningAvgSamplesPerSec=12.107125268702406, CurrSamplesPerSec=12.30568190694906, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:45:29,617] [INFO] [logging.py:96:log_dist] [Rank 0] step=6420, skipped=124, lr=[4.712126489530422e-05], mom=[(0.9, 0.95)] [2023-04-19 00:45:29,617] [INFO] 
[timer.py:199:stop] epoch=0/micro_step=12840/global_step=6420, RunningAvgSamplesPerSec=12.106779800558249, CurrSamplesPerSec=12.307544913166701, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:45:45,177] [INFO] [loss_scaler.py:188:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 [2023-04-19 00:45:47,733] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, reducing to 65536 [2023-04-19 00:45:55,577] [INFO] [logging.py:96:log_dist] [Rank 0] step=6430, skipped=126, lr=[4.6395903413813334e-05], mom=[(0.9, 0.95)] [2023-04-19 00:45:55,577] [INFO] [timer.py:199:stop] epoch=0/micro_step=12860/global_step=6430, RunningAvgSamplesPerSec=12.107134646730136, CurrSamplesPerSec=12.287338160649, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:46:21,601] [INFO] [logging.py:96:log_dist] [Rank 0] step=6440, skipped=126, lr=[4.5496732475418745e-05], mom=[(0.9, 0.95)] [2023-04-19 00:46:21,601] [INFO] [timer.py:199:stop] epoch=0/micro_step=12880/global_step=6440, RunningAvgSamplesPerSec=12.10744298561596, CurrSamplesPerSec=12.306644370926083, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:46:48,360] [INFO] [logging.py:96:log_dist] [Rank 0] step=6450, skipped=126, lr=[4.460594469068535e-05], mom=[(0.9, 0.95)] [2023-04-19 00:46:48,360] [INFO] [timer.py:199:stop] epoch=0/micro_step=12900/global_step=6450, RunningAvgSamplesPerSec=12.107228501752891, CurrSamplesPerSec=12.327967701684218, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:47:15,313] [INFO] [logging.py:96:log_dist] [Rank 0] step=6460, skipped=126, lr=[4.3723556489881745e-05], mom=[(0.9, 0.95)] [2023-04-19 00:47:15,313] [INFO] [timer.py:199:stop] epoch=0/micro_step=12920/global_step=6460, RunningAvgSamplesPerSec=12.106876782023496, CurrSamplesPerSec=12.248899067983727, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:47:41,335] 
[INFO] [logging.py:96:log_dist] [Rank 0] step=6470, skipped=126, lr=[4.2849584148349554e-05], mom=[(0.9, 0.95)] [2023-04-19 00:47:41,335] [INFO] [timer.py:199:stop] epoch=0/micro_step=12940/global_step=6470, RunningAvgSamplesPerSec=12.107185502639364, CurrSamplesPerSec=12.33747414482355, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:48:07,405] [INFO] [logging.py:96:log_dist] [Rank 0] step=6480, skipped=126, lr=[4.198404378620269e-05], mom=[(0.9, 0.95)] [2023-04-19 00:48:07,405] [INFO] [timer.py:199:stop] epoch=0/micro_step=12960/global_step=6480, RunningAvgSamplesPerSec=12.107459283307591, CurrSamplesPerSec=12.304637245416119, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:48:34,177] [INFO] [logging.py:96:log_dist] [Rank 0] step=6490, skipped=126, lr=[4.112695136803002e-05], mom=[(0.9, 0.95)] [2023-04-19 00:48:34,178] [INFO] [timer.py:199:stop] epoch=0/micro_step=12980/global_step=6490, RunningAvgSamplesPerSec=12.107235886496557, CurrSamplesPerSec=12.318585565277264, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:49:01,104] [INFO] [logging.py:96:log_dist] [Rank 0] step=6500, skipped=126, lr=[4.027832270260129e-05], mom=[(0.9, 0.95)] [2023-04-19 00:49:01,105] [INFO] [timer.py:199:stop] epoch=0/micro_step=13000/global_step=6500, RunningAvgSamplesPerSec=12.106904761055409, CurrSamplesPerSec=12.287702632196432, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:49:27,123] [INFO] [logging.py:96:log_dist] [Rank 0] step=6510, skipped=126, lr=[3.9438173442575e-05], mom=[(0.9, 0.95)] [2023-04-19 00:49:27,124] [INFO] [timer.py:199:stop] epoch=0/micro_step=13020/global_step=6510, RunningAvgSamplesPerSec=12.107213495091601, CurrSamplesPerSec=12.334647538051083, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:49:53,171] [INFO] [logging.py:96:log_dist] [Rank 0] step=6520, skipped=126, lr=[3.860651908421015e-05], mom=[(0.9, 0.95)] [2023-04-19 00:49:53,172] [INFO] [timer.py:199:stop] epoch=0/micro_step=13040/global_step=6520, 
RunningAvgSamplesPerSec=12.107500991844331, CurrSamplesPerSec=12.273646785131646, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:50:14,823] [INFO] [loss_scaler.py:188:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 [2023-04-19 00:50:17,378] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, reducing to 65536 [2023-04-19 00:50:20,759] [INFO] [logging.py:96:log_dist] [Rank 0] step=6530, skipped=128, lr=[3.7947322240179935e-05], mom=[(0.9, 0.95)] [2023-04-19 00:50:20,760] [INFO] [timer.py:199:stop] epoch=0/micro_step=13060/global_step=6530, RunningAvgSamplesPerSec=12.106706821359627, CurrSamplesPerSec=9.471720557072492, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:50:46,784] [INFO] [logging.py:96:log_dist] [Rank 0] step=6540, skipped=128, lr=[3.71309972550577e-05], mom=[(0.9, 0.95)] [2023-04-19 00:50:46,784] [INFO] [timer.py:199:stop] epoch=0/micro_step=13080/global_step=6540, RunningAvgSamplesPerSec=12.107011157789364, CurrSamplesPerSec=12.335019356162114, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:51:12,849] [INFO] [logging.py:96:log_dist] [Rank 0] step=6550, skipped=128, lr=[3.632320972665415e-05], mom=[(0.9, 0.95)] [2023-04-19 00:51:12,850] [INFO] [timer.py:199:stop] epoch=0/micro_step=13100/global_step=6550, RunningAvgSamplesPerSec=12.10728504197632, CurrSamplesPerSec=12.2925789556498, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:51:38,897] [INFO] [logging.py:96:log_dist] [Rank 0] step=6560, skipped=128, lr=[3.552397455432732e-05], mom=[(0.9, 0.95)] [2023-04-19 00:51:38,897] [INFO] [timer.py:199:stop] epoch=0/micro_step=13120/global_step=6560, RunningAvgSamplesPerSec=12.10757101981572, CurrSamplesPerSec=12.337247333548671, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:52:06,700] [INFO] [logging.py:96:log_dist] [Rank 0] step=6570, skipped=128, 
lr=[3.473330647969025e-05], mom=[(0.9, 0.95)] [2023-04-19 00:52:06,700] [INFO] [timer.py:199:stop] epoch=0/micro_step=13140/global_step=6570, RunningAvgSamplesPerSec=12.106631913101195, CurrSamplesPerSec=12.33055675215979, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:52:32,727] [INFO] [logging.py:96:log_dist] [Rank 0] step=6580, skipped=128, lr=[3.395122008633883e-05], mom=[(0.9, 0.95)] [2023-04-19 00:52:32,728] [INFO] [timer.py:199:stop] epoch=0/micro_step=13160/global_step=6580, RunningAvgSamplesPerSec=12.10693190693524, CurrSamplesPerSec=12.351318600579829, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:52:58,788] [INFO] [logging.py:96:log_dist] [Rank 0] step=6590, skipped=128, lr=[3.317772979958267e-05], mom=[(0.9, 0.95)] [2023-04-19 00:52:58,789] [INFO] [timer.py:199:stop] epoch=0/micro_step=13180/global_step=6590, RunningAvgSamplesPerSec=12.107207706318313, CurrSamplesPerSec=12.301514472029988, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:53:25,775] [INFO] [logging.py:96:log_dist] [Rank 0] step=6600, skipped=128, lr=[3.2412849886179486e-05], mom=[(0.9, 0.95)] [2023-04-19 00:53:25,775] [INFO] [timer.py:199:stop] epoch=0/micro_step=13200/global_step=6600, RunningAvgSamplesPerSec=12.106840120287597, CurrSamplesPerSec=12.234053008381954, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:53:52,738] [INFO] [logging.py:96:log_dist] [Rank 0] step=6610, skipped=128, lr=[3.165659445407132e-05], mom=[(0.9, 0.95)] [2023-04-19 00:53:52,738] [INFO] [timer.py:199:stop] epoch=0/micro_step=13220/global_step=6610, RunningAvgSamplesPerSec=12.10649028364977, CurrSamplesPerSec=12.267142712735067, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:54:18,853] [INFO] [logging.py:96:log_dist] [Rank 0] step=6620, skipped=128, lr=[3.090897745212512e-05], mom=[(0.9, 0.95)] [2023-04-19 00:54:18,854] [INFO] [timer.py:199:stop] epoch=0/micro_step=13240/global_step=6620, RunningAvgSamplesPerSec=12.106727817652574, 
CurrSamplesPerSec=12.206875018928617, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:54:44,804] [INFO] [loss_scaler.py:188:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 [2023-04-19 00:54:44,804] [INFO] [logging.py:96:log_dist] [Rank 0] step=6630, skipped=129, lr=[3.024351940850789e-05], mom=[(0.9, 0.95)] [2023-04-19 00:54:44,804] [INFO] [timer.py:199:stop] epoch=0/micro_step=13260/global_step=6630, RunningAvgSamplesPerSec=12.107078666754425, CurrSamplesPerSec=12.653241484065008, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:54:47,359] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, reducing to 65536 [2023-04-19 00:55:11,770] [INFO] [logging.py:96:log_dist] [Rank 0] step=6640, skipped=130, lr=[2.958507960694784e-05], mom=[(0.9, 0.95)] [2023-04-19 00:55:11,771] [INFO] [timer.py:199:stop] epoch=0/micro_step=13280/global_step=6640, RunningAvgSamplesPerSec=12.106727405320045, CurrSamplesPerSec=12.246298379149152, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:55:38,720] [INFO] [logging.py:96:log_dist] [Rank 0] step=6650, skipped=130, lr=[2.8861723060212175e-05], mom=[(0.9, 0.95)] [2023-04-19 00:55:38,721] [INFO] [timer.py:199:stop] epoch=0/micro_step=13300/global_step=6650, RunningAvgSamplesPerSec=12.10638845570879, CurrSamplesPerSec=12.28620776144105, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:56:04,754] [INFO] [logging.py:96:log_dist] [Rank 0] step=6660, skipped=130, lr=[2.814705649406285e-05], mom=[(0.9, 0.95)] [2023-04-19 00:56:04,755] [INFO] [timer.py:199:stop] epoch=0/micro_step=13320/global_step=6660, RunningAvgSamplesPerSec=12.106680930959081, CurrSamplesPerSec=12.250980861785566, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:56:30,797] [INFO] [logging.py:96:log_dist] [Rank 0] step=6670, skipped=130, lr=[2.744109309027443e-05], mom=[(0.9, 0.95)] 
[2023-04-19 00:56:30,797] [INFO] [timer.py:199:stop] epoch=0/micro_step=13340/global_step=6670, RunningAvgSamplesPerSec=12.106966216958831, CurrSamplesPerSec=12.291337281121525, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:56:57,749] [INFO] [logging.py:96:log_dist] [Rank 0] step=6680, skipped=130, lr=[2.6743845870094706e-05], mom=[(0.9, 0.95)] [2023-04-19 00:56:57,749] [INFO] [timer.py:199:stop] epoch=0/micro_step=13360/global_step=6680, RunningAvgSamplesPerSec=12.10662712459984, CurrSamplesPerSec=12.30987587561, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:57:24,548] [INFO] [logging.py:96:log_dist] [Rank 0] step=6690, skipped=130, lr=[2.605532769400476e-05], mom=[(0.9, 0.95)] [2023-04-19 00:57:24,548] [INFO] [timer.py:199:stop] epoch=0/micro_step=13380/global_step=6690, RunningAvgSamplesPerSec=12.106393955165029, CurrSamplesPerSec=12.310824316371711, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:57:50,615] [INFO] [logging.py:96:log_dist] [Rank 0] step=6700, skipped=130, lr=[2.5375551261481223e-05], mom=[(0.9, 0.95)] [2023-04-19 00:57:50,615] [INFO] [timer.py:199:stop] epoch=0/micro_step=13400/global_step=6700, RunningAvgSamplesPerSec=12.106662145751821, CurrSamplesPerSec=12.325023214264725, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:58:16,654] [INFO] [logging.py:96:log_dist] [Rank 0] step=6710, skipped=130, lr=[2.4704529110762263e-05], mom=[(0.9, 0.95)] [2023-04-19 00:58:16,654] [INFO] [timer.py:199:stop] epoch=0/micro_step=13420/global_step=6710, RunningAvgSamplesPerSec=12.106948276090849, CurrSamplesPerSec=12.322804174914067, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:58:43,452] [INFO] [logging.py:96:log_dist] [Rank 0] step=6720, skipped=130, lr=[2.4042273618616772e-05], mom=[(0.9, 0.95)] [2023-04-19 00:58:43,452] [INFO] [timer.py:199:stop] epoch=0/micro_step=13440/global_step=6720, RunningAvgSamplesPerSec=12.106716446571506, CurrSamplesPerSec=12.273063178724136, MemAllocated=2.0GB, 
MaxMemAllocated=13.66GB [2023-04-19 00:59:10,409] [INFO] [logging.py:96:log_dist] [Rank 0] step=6730, skipped=130, lr=[2.3388797000115424e-05], mom=[(0.9, 0.95)] [2023-04-19 00:59:10,409] [INFO] [timer.py:199:stop] epoch=0/micro_step=13460/global_step=6730, RunningAvgSamplesPerSec=12.106377211643567, CurrSamplesPerSec=12.254423746557208, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 00:59:15,552] [INFO] [loss_scaler.py:188:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 [2023-04-19 00:59:18,103] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, reducing to 65536 [2023-04-19 00:59:36,372] [INFO] [logging.py:96:log_dist] [Rank 0] step=6740, skipped=132, lr=[2.2872344599499384e-05], mom=[(0.9, 0.95)] [2023-04-19 00:59:36,373] [INFO] [timer.py:199:stop] epoch=0/micro_step=13480/global_step=6740, RunningAvgSamplesPerSec=12.106713831177517, CurrSamplesPerSec=12.305005000196195, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 01:00:02,435] [INFO] [logging.py:96:log_dist] [Rank 0] step=6750, skipped=132, lr=[2.223470021853452e-05], mom=[(0.9, 0.95)] [2023-04-19 01:00:02,436] [INFO] [timer.py:199:stop] epoch=0/micro_step=13500/global_step=6750, RunningAvgSamplesPerSec=12.106981849204093, CurrSamplesPerSec=12.261743225139721, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 01:00:29,179] [INFO] [logging.py:96:log_dist] [Rank 0] step=6760, skipped=132, lr=[2.1605868051272404e-05], mom=[(0.9, 0.95)] [2023-04-19 01:00:29,179] [INFO] [timer.py:199:stop] epoch=0/micro_step=13520/global_step=6760, RunningAvgSamplesPerSec=12.106788239856197, CurrSamplesPerSec=12.344872762511422, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 01:00:56,122] [INFO] [logging.py:96:log_dist] [Rank 0] step=6770, skipped=132, lr=[2.098585969630207e-05], mom=[(0.9, 0.95)] [2023-04-19 01:00:56,122] [INFO] [timer.py:199:stop] 
epoch=0/micro_step=13540/global_step=6770, RunningAvgSamplesPerSec=12.10645987343087, CurrSamplesPerSec=12.312398598957053, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 01:01:22,115] [INFO] [logging.py:96:log_dist] [Rank 0] step=6780, skipped=132, lr=[2.037468658946068e-05], mom=[(0.9, 0.95)] [2023-04-19 01:01:22,115] [INFO] [timer.py:199:stop] epoch=0/micro_step=13560/global_step=6780, RunningAvgSamplesPerSec=12.106774290695286, CurrSamplesPerSec=12.297595487103706, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 01:01:48,147] [INFO] [logging.py:96:log_dist] [Rank 0] step=6790, skipped=132, lr=[1.977236000362226e-05], mom=[(0.9, 0.95)] [2023-04-19 01:01:48,148] [INFO] [timer.py:199:stop] epoch=0/micro_step=13580/global_step=6790, RunningAvgSamplesPerSec=12.107061303570614, CurrSamplesPerSec=12.284443404706407, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 01:02:15,057] [INFO] [logging.py:96:log_dist] [Rank 0] step=6800, skipped=132, lr=[1.91788910484898e-05], mom=[(0.9, 0.95)] [2023-04-19 01:02:15,058] [INFO] [timer.py:199:stop] epoch=0/micro_step=13600/global_step=6800, RunningAvgSamplesPerSec=12.10675643179966, CurrSamplesPerSec=12.292251345352078, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 01:02:41,987] [INFO] [logging.py:96:log_dist] [Rank 0] step=6810, skipped=132, lr=[1.8594290670390425e-05], mom=[(0.9, 0.95)] [2023-04-19 01:02:41,987] [INFO] [timer.py:199:stop] epoch=0/micro_step=13620/global_step=6810, RunningAvgSamplesPerSec=12.106439360342007, CurrSamplesPerSec=12.270189726746421, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 01:03:08,007] [INFO] [logging.py:96:log_dist] [Rank 0] step=6820, skipped=132, lr=[1.801856965207338e-05], mom=[(0.9, 0.95)] [2023-04-19 01:03:08,008] [INFO] [timer.py:199:stop] epoch=0/micro_step=13640/global_step=6820, RunningAvgSamplesPerSec=12.106733870641586, CurrSamplesPerSec=12.356672145111142, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 01:03:34,055] [INFO] 
[logging.py:96:log_dist] [Rank 0] step=6830, skipped=132, lr=[1.745173861251137e-05], mom=[(0.9, 0.95)] [2023-04-19 01:03:34,055] [INFO] [timer.py:199:stop] epoch=0/micro_step=13660/global_step=6830, RunningAvgSamplesPerSec=12.107009188286623, CurrSamplesPerSec=12.317173598551355, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 01:03:45,274] [INFO] [loss_scaler.py:188:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 [2023-04-19 01:03:47,825] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, reducing to 65536 [2023-04-19 01:04:01,627] [INFO] [logging.py:96:log_dist] [Rank 0] step=6840, skipped=134, lr=[1.7004681596839877e-05], mom=[(0.9, 0.95)] [2023-04-19 01:04:01,628] [INFO] [timer.py:199:stop] epoch=0/micro_step=13680/global_step=6840, RunningAvgSamplesPerSec=12.10626248512698, CurrSamplesPerSec=12.291149306933015, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 01:04:27,659] [INFO] [logging.py:96:log_dist] [Rank 0] step=6850, skipped=134, lr=[1.6453878755324502e-05], mom=[(0.9, 0.95)] [2023-04-19 01:04:27,659] [INFO] [timer.py:199:stop] epoch=0/micro_step=13700/global_step=6850, RunningAvgSamplesPerSec=12.106548790444862, CurrSamplesPerSec=12.332338913331832, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 01:04:53,697] [INFO] [logging.py:96:log_dist] [Rank 0] step=6860, skipped=134, lr=[1.5911994752739965e-05], mom=[(0.9, 0.95)] [2023-04-19 01:04:53,698] [INFO] [timer.py:199:stop] epoch=0/micro_step=13720/global_step=6860, RunningAvgSamplesPerSec=12.106829066649526, CurrSamplesPerSec=12.337866548023076, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 01:05:20,595] [INFO] [logging.py:96:log_dist] [Rank 0] step=6870, skipped=134, lr=[1.537903958394704e-05], mom=[(0.9, 0.95)] [2023-04-19 01:05:20,595] [INFO] [timer.py:199:stop] epoch=0/micro_step=13740/global_step=6870, 
RunningAvgSamplesPerSec=12.10653613529871, CurrSamplesPerSec=12.322375396109289, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 01:05:47,503] [INFO] [logging.py:96:log_dist] [Rank 0] step=6880, skipped=134, lr=[1.485502307911718e-05], mom=[(0.9, 0.95)] [2023-04-19 01:05:47,504] [INFO] [timer.py:199:stop] epoch=0/micro_step=13760/global_step=6880, RunningAvgSamplesPerSec=12.106236193353809, CurrSamplesPerSec=12.334782432884488, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 01:06:13,551] [INFO] [logging.py:96:log_dist] [Rank 0] step=6890, skipped=134, lr=[1.43399549035515e-05], mom=[(0.9, 0.95)] [2023-04-19 01:06:13,551] [INFO] [timer.py:199:stop] epoch=0/micro_step=13780/global_step=6890, RunningAvgSamplesPerSec=12.106510098328474, CurrSamplesPerSec=12.32123741026743, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 01:06:39,618] [INFO] [logging.py:96:log_dist] [Rank 0] step=6900, skipped=134, lr=[1.3833844557502085e-05], mom=[(0.9, 0.95)] [2023-04-19 01:06:39,618] [INFO] [timer.py:199:stop] epoch=0/micro_step=13800/global_step=6900, RunningAvgSamplesPerSec=12.106769938324078, CurrSamplesPerSec=12.252074590125474, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 01:07:06,497] [INFO] [logging.py:96:log_dist] [Rank 0] step=6910, skipped=134, lr=[1.3336701375997129e-05], mom=[(0.9, 0.95)] [2023-04-19 01:07:06,498] [INFO] [timer.py:199:stop] epoch=0/micro_step=13820/global_step=6910, RunningAvgSamplesPerSec=12.10649044955526, CurrSamplesPerSec=12.333728291324249, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 01:07:33,437] [INFO] [logging.py:96:log_dist] [Rank 0] step=6920, skipped=134, lr=[1.2848534528668521e-05], mom=[(0.9, 0.95)] [2023-04-19 01:07:33,438] [INFO] [timer.py:199:stop] epoch=0/micro_step=13840/global_step=6920, RunningAvgSamplesPerSec=12.106171793281288, CurrSamplesPerSec=12.27034677252511, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 01:07:59,488] [INFO] [logging.py:96:log_dist] [Rank 0] step=6930, 
skipped=134, lr=[1.2369353019582973e-05], mom=[(0.9, 0.95)] [2023-04-19 01:07:59,489] [INFO] [timer.py:199:stop] epoch=0/micro_step=13860/global_step=6930, RunningAvgSamplesPerSec=12.106442068395006, CurrSamplesPerSec=12.330186335768913, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 01:08:15,048] [INFO] [loss_scaler.py:188:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 [2023-04-19 01:08:17,602] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, reducing to 65536 [2023-04-19 01:08:25,418] [INFO] [logging.py:96:log_dist] [Rank 0] step=6940, skipped=136, lr=[1.19924832007835e-05], mom=[(0.9, 0.95)] [2023-04-19 01:08:25,418] [INFO] [timer.py:199:stop] epoch=0/micro_step=13880/global_step=6940, RunningAvgSamplesPerSec=12.106791304452079, CurrSamplesPerSec=12.36452777842008, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 01:08:52,293] [INFO] [logging.py:96:log_dist] [Rank 0] step=6950, skipped=136, lr=[1.1529497461664373e-05], mom=[(0.9, 0.95)] [2023-04-19 01:08:52,293] [INFO] [timer.py:199:stop] epoch=0/micro_step=13900/global_step=6950, RunningAvgSamplesPerSec=12.106516500504544, CurrSamplesPerSec=12.313384706942998, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 01:09:19,247] [INFO] [logging.py:96:log_dist] [Rank 0] step=6960, skipped=136, lr=[1.1075521389965204e-05], mom=[(0.9, 0.95)] [2023-04-19 01:09:19,247] [INFO] [timer.py:199:stop] epoch=0/micro_step=13920/global_step=6960, RunningAvgSamplesPerSec=12.106190565298053, CurrSamplesPerSec=12.325089990199082, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 01:09:45,274] [INFO] [logging.py:96:log_dist] [Rank 0] step=6970, skipped=136, lr=[1.0630563359115664e-05], mom=[(0.9, 0.95)] [2023-04-19 01:09:45,274] [INFO] [timer.py:199:stop] epoch=0/micro_step=13940/global_step=6970, RunningAvgSamplesPerSec=12.106474802987217, 
CurrSamplesPerSec=12.315027441358804, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 01:10:11,331] [INFO] [logging.py:96:log_dist] [Rank 0] step=6980, skipped=136, lr=[1.0194631576210922e-05], mom=[(0.9, 0.95)] [2023-04-19 01:10:11,331] [INFO] [timer.py:199:stop] epoch=0/micro_step=13960/global_step=6980, RunningAvgSamplesPerSec=12.106738487440564, CurrSamplesPerSec=12.290874672347682, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 01:10:38,118] [INFO] [logging.py:96:log_dist] [Rank 0] step=6990, skipped=136, lr=[9.767734081860103e-06], mom=[(0.9, 0.95)] [2023-04-19 01:10:38,119] [INFO] [timer.py:199:stop] epoch=0/micro_step=13980/global_step=6990, RunningAvgSamplesPerSec=12.106522879233912, CurrSamplesPerSec=12.289680604821479, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 01:11:04,874] [INFO] [logging.py:96:log_dist] [Rank 0] step=7000, skipped=136, lr=[9.349878750038066e-06], mom=[(0.9, 0.95)] [2023-04-19 01:11:04,875] [INFO] [timer.py:199:stop] epoch=0/micro_step=14000/global_step=7000, RunningAvgSamplesPerSec=12.106328459125303, CurrSamplesPerSec=12.310186360959447, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 01:11:30,881] [INFO] [logging.py:96:log_dist] [Rank 0] step=7010, skipped=136, lr=[8.941073287940083e-06], mom=[(0.9, 0.95)] [2023-04-19 01:11:30,882] [INFO] [timer.py:199:stop] epoch=0/micro_step=14020/global_step=7010, RunningAvgSamplesPerSec=12.106623844176088, CurrSamplesPerSec=12.33911083535135, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 01:11:56,967] [INFO] [logging.py:96:log_dist] [Rank 0] step=7020, skipped=136, lr=[8.541325235839725e-06], mom=[(0.9, 0.95)] [2023-04-19 01:11:56,967] [INFO] [timer.py:199:stop] epoch=0/micro_step=14040/global_step=7020, RunningAvgSamplesPerSec=12.106867153575708, CurrSamplesPerSec=12.329016328644853, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 01:12:23,756] [INFO] [logging.py:96:log_dist] [Rank 0] step=7030, skipped=136, lr=[8.150641966950034e-06], 
mom=[(0.9, 0.95)] [2023-04-19 01:12:23,756] [INFO] [timer.py:199:stop] epoch=0/micro_step=14060/global_step=7030, RunningAvgSamplesPerSec=12.106651513794658, CurrSamplesPerSec=12.273211319709857, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 01:12:45,401] [INFO] [loss_scaler.py:188:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1 [2023-04-19 01:12:47,954] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, reducing to 65536 [2023-04-19 01:12:50,589] [INFO] [logging.py:96:log_dist] [Rank 0] step=7040, skipped=138, lr=[7.84462684381393e-06], mom=[(0.9, 0.95)] [2023-04-19 01:12:50,589] [INFO] [timer.py:199:stop] epoch=0/micro_step=14080/global_step=7040, RunningAvgSamplesPerSec=12.106407478931189, CurrSamplesPerSec=12.154819999300873, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 01:13:16,603] [INFO] [logging.py:96:log_dist] [Rank 0] step=7050, skipped=138, lr=[7.470278231422767e-06], mom=[(0.9, 0.95)] [2023-04-19 01:13:16,604] [INFO] [timer.py:199:stop] epoch=0/micro_step=14100/global_step=7050, RunningAvgSamplesPerSec=12.106696306483371, CurrSamplesPerSec=12.358922746309249, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 01:13:42,626] [INFO] [logging.py:96:log_dist] [Rank 0] step=7060, skipped=138, lr=[7.105014157329226e-06], mom=[(0.9, 0.95)] [2023-04-19 01:13:42,626] [INFO] [timer.py:199:stop] epoch=0/micro_step=14120/global_step=7060, RunningAvgSamplesPerSec=12.106979078486914, CurrSamplesPerSec=12.30997974532897, MemAllocated=2.0GB, MaxMemAllocated=13.66GB [2023-04-19 01:14:09,549] [INFO] [logging.py:96:log_dist] [Rank 0] step=7070, skipped=138, lr=[6.748841358701297e-06], mom=[(0.9, 0.95)] [2023-04-19 01:14:09,549] [INFO] [timer.py:199:stop] epoch=0/micro_step=14140/global_step=7070, RunningAvgSamplesPerSec=12.106677443630467, CurrSamplesPerSec=12.31389533121884, MemAllocated=2.0GB, 
MaxMemAllocated=13.66GB
[2023-04-19 01:14:36,488] [INFO] [logging.py:96:log_dist] [Rank 0] step=7080, skipped=138, lr=[6.401766405021603e-06], mom=[(0.9, 0.95)]
[2023-04-19 01:14:36,488] [INFO] [timer.py:199:stop] epoch=0/micro_step=14160/global_step=7080, RunningAvgSamplesPerSec=12.106366550399876, CurrSamplesPerSec=12.338016257555807, MemAllocated=2.0GB, MaxMemAllocated=13.66GB
[2023-04-19 01:15:02,558] [INFO] [logging.py:96:log_dist] [Rank 0] step=7090, skipped=138, lr=[6.0637956979660566e-06], mom=[(0.9, 0.95)]
[2023-04-19 01:15:02,559] [INFO] [timer.py:199:stop] epoch=0/micro_step=14180/global_step=7090, RunningAvgSamplesPerSec=12.106617380627489, CurrSamplesPerSec=12.296932989284574, MemAllocated=2.0GB, MaxMemAllocated=13.66GB
[2023-04-19 01:15:28,626] [INFO] [logging.py:96:log_dist] [Rank 0] step=7100, skipped=138, lr=[5.734935471286174e-06], mom=[(0.9, 0.95)]
[2023-04-19 01:15:28,626] [INFO] [timer.py:199:stop] epoch=0/micro_step=14200/global_step=7100, RunningAvgSamplesPerSec=12.106869674443162, CurrSamplesPerSec=12.299407580419437, MemAllocated=2.0GB, MaxMemAllocated=13.66GB
[2023-04-19 01:15:56,425] [INFO] [logging.py:96:log_dist] [Rank 0] step=7110, skipped=138, lr=[5.4151917906936636e-06], mom=[(0.9, 0.95)]
[2023-04-19 01:15:56,426] [INFO] [timer.py:199:stop] epoch=0/micro_step=14220/global_step=7110, RunningAvgSamplesPerSec=12.106005365069931, CurrSamplesPerSec=9.136688660123657, MemAllocated=2.0GB, MaxMemAllocated=13.66GB
[2023-04-19 01:16:22,429] [INFO] [logging.py:96:log_dist] [Rank 0] step=7120, skipped=138, lr=[5.1045705537487445e-06], mom=[(0.9, 0.95)]
[2023-04-19 01:16:22,429] [INFO] [timer.py:199:stop] epoch=0/micro_step=14240/global_step=7120, RunningAvgSamplesPerSec=12.106298795885323, CurrSamplesPerSec=12.32670415069435, MemAllocated=2.0GB, MaxMemAllocated=13.66GB
[2023-04-19 01:16:48,481] [INFO] [logging.py:96:log_dist] [Rank 0] step=7130, skipped=138, lr=[4.803077489751395e-06], mom=[(0.9, 0.95)]
[2023-04-19 01:16:48,482] [INFO] [timer.py:199:stop] epoch=0/micro_step=14260/global_step=7130, RunningAvgSamplesPerSec=12.106560012623095, CurrSamplesPerSec=12.250648755942994, MemAllocated=2.0GB, MaxMemAllocated=13.66GB
[2023-04-19 01:17:15,316] [INFO] [loss_scaler.py:188:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1
[2023-04-19 01:17:15,316] [INFO] [logging.py:96:log_dist] [Rank 0] step=7140, skipped=139, lr=[4.539542919616446e-06], mom=[(0.9, 0.95)]
[2023-04-19 01:17:15,316] [INFO] [timer.py:199:stop] epoch=0/micro_step=14280/global_step=7140, RunningAvgSamplesPerSec=12.106318693047513, CurrSamplesPerSec=12.646558969654528, MemAllocated=2.0GB, MaxMemAllocated=13.66GB
[2023-04-19 01:17:17,866] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, reducing to 65536
[2023-04-19 01:17:42,092] [INFO] [logging.py:96:log_dist] [Rank 0] step=7150, skipped=140, lr=[4.283410613337757e-06], mom=[(0.9, 0.95)]
[2023-04-19 01:17:42,093] [INFO] [timer.py:199:stop] epoch=0/micro_step=14300/global_step=7150, RunningAvgSamplesPerSec=12.106115469393984, CurrSamplesPerSec=12.227294512701452, MemAllocated=2.0GB, MaxMemAllocated=13.66GB
[2023-04-19 01:18:08,135] [INFO] [logging.py:96:log_dist] [Rank 0] step=7160, skipped=140, lr=[4.007505479947493e-06], mom=[(0.9, 0.95)]
[2023-04-19 01:18:08,135] [INFO] [timer.py:199:stop] epoch=0/micro_step=14320/global_step=7160, RunningAvgSamplesPerSec=12.106382505613308, CurrSamplesPerSec=12.292442730690293, MemAllocated=2.0GB, MaxMemAllocated=13.66GB
[2023-04-19 01:18:34,191] [INFO] [logging.py:96:log_dist] [Rank 0] step=7170, skipped=140, lr=[3.7407487544861563e-06], mom=[(0.9, 0.95)]
[2023-04-19 01:18:34,191] [INFO] [timer.py:199:stop] epoch=0/micro_step=14340/global_step=7170, RunningAvgSamplesPerSec=12.10664038876368, CurrSamplesPerSec=12.307098011070707, MemAllocated=2.0GB, MaxMemAllocated=13.66GB
[2023-04-19 01:19:01,120] [INFO] [logging.py:96:log_dist] [Rank 0] step=7180, skipped=140, lr=[3.4831453571879666e-06], mom=[(0.9, 0.95)]
[2023-04-19 01:19:01,121] [INFO] [timer.py:199:stop] epoch=0/micro_step=14360/global_step=7180, RunningAvgSamplesPerSec=12.106339773291129, CurrSamplesPerSec=12.268004963955384, MemAllocated=2.0GB, MaxMemAllocated=13.66GB
[2023-04-19 01:19:28,058] [INFO] [logging.py:96:log_dist] [Rank 0] step=7190, skipped=140, lr=[3.2347000394575278e-06], mom=[(0.9, 0.95)]
[2023-04-19 01:19:28,058] [INFO] [timer.py:199:stop] epoch=0/micro_step=14380/global_step=7190, RunningAvgSamplesPerSec=12.106034746811197, CurrSamplesPerSec=12.202789760247763, MemAllocated=2.0GB, MaxMemAllocated=13.66GB
[2023-04-19 01:19:54,150] [INFO] [logging.py:96:log_dist] [Rank 0] step=7200, skipped=140, lr=[2.9954173837817845e-06], mom=[(0.9, 0.95)]
[2023-04-19 01:19:54,151] [INFO] [timer.py:199:stop] epoch=0/micro_step=14400/global_step=7200, RunningAvgSamplesPerSec=12.106268617260596, CurrSamplesPerSec=12.331278391728587, MemAllocated=2.0GB, MaxMemAllocated=13.66GB
[2023-04-19 01:20:20,201] [INFO] [logging.py:96:log_dist] [Rank 0] step=7210, skipped=140, lr=[2.765301803645426e-06], mom=[(0.9, 0.95)]
[2023-04-19 01:20:20,201] [INFO] [timer.py:199:stop] epoch=0/micro_step=14420/global_step=7210, RunningAvgSamplesPerSec=12.10652849321684, CurrSamplesPerSec=12.307730003336953, MemAllocated=2.0GB, MaxMemAllocated=13.66GB
[2023-04-19 01:20:47,118] [INFO] [logging.py:96:log_dist] [Rank 0] step=7220, skipped=140, lr=[2.5443575434498935e-06], mom=[(0.9, 0.95)]
[2023-04-19 01:20:47,119] [INFO] [timer.py:199:stop] epoch=0/micro_step=14440/global_step=7220, RunningAvgSamplesPerSec=12.106237489814143, CurrSamplesPerSec=12.312435871644308, MemAllocated=2.0GB, MaxMemAllocated=13.66GB
[2023-04-19 01:21:14,085] [INFO] [logging.py:96:log_dist] [Rank 0] step=7230, skipped=140, lr=[2.3325886784346106e-06], mom=[(0.9, 0.95)]
[2023-04-19 01:21:14,085] [INFO] [timer.py:199:stop] epoch=0/micro_step=14460/global_step=7230, RunningAvgSamplesPerSec=12.105915909839451, CurrSamplesPerSec=12.350707127220327, MemAllocated=2.0GB, MaxMemAllocated=13.66GB
[2023-04-19 01:21:40,087] [INFO] [logging.py:96:log_dist] [Rank 0] step=7240, skipped=140, lr=[2.1299991146022657e-06], mom=[(0.9, 0.95)]
[2023-04-19 01:21:40,088] [INFO] [timer.py:199:stop] epoch=0/micro_step=14480/global_step=7240, RunningAvgSamplesPerSec=12.10620540246003, CurrSamplesPerSec=12.354669139728776, MemAllocated=2.0GB, MaxMemAllocated=13.66GB
[2023-04-19 01:21:45,226] [INFO] [loss_scaler.py:188:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, but hysteresis is 2. Reducing hysteresis to 1
[2023-04-19 01:21:47,785] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 131072, reducing to 65536
[2023-04-19 01:22:06,049] [INFO] [logging.py:96:log_dist] [Rank 0] step=7250, skipped=142, lr=[1.9745390771366833e-06], mom=[(0.9, 0.95)]
[2023-04-19 01:22:06,050] [INFO] [timer.py:199:stop] epoch=0/micro_step=14500/global_step=7250, RunningAvgSamplesPerSec=12.106519806254568, CurrSamplesPerSec=12.304181530727696, MemAllocated=2.0GB, MaxMemAllocated=13.66GB
[2023-04-19 01:22:32,984] [INFO] [logging.py:96:log_dist] [Rank 0] step=7260, skipped=142, lr=[1.7884815580815893e-06], mom=[(0.9, 0.95)]
[2023-04-19 01:22:32,984] [INFO] [timer.py:199:stop] epoch=0/micro_step=14520/global_step=7260, RunningAvgSamplesPerSec=12.106219517545991, CurrSamplesPerSec=12.324828549568467, MemAllocated=2.0GB, MaxMemAllocated=13.66GB
[2023-04-19 01:22:59,941] [INFO] [logging.py:96:log_dist] [Rank 0] step=7270, skipped=142, lr=[1.6116133760747943e-06], mom=[(0.9, 0.95)]
[2023-04-19 01:22:59,942] [INFO] [timer.py:199:stop] epoch=0/micro_step=14540/global_step=7270, RunningAvgSamplesPerSec=12.10590562283263, CurrSamplesPerSec=12.286433824643385, MemAllocated=2.0GB, MaxMemAllocated=13.66GB
[2023-04-19 01:23:26,001] [INFO] [logging.py:96:log_dist] [Rank 0] step=7280, skipped=142, lr=[1.4439377933879172e-06], mom=[(0.9, 0.95)]
[2023-04-19 01:23:26,001] [INFO] [timer.py:199:stop] epoch=0/micro_step=14560/global_step=7280, RunningAvgSamplesPerSec=12.10615754240652, CurrSamplesPerSec=12.24880293348373, MemAllocated=2.0GB, MaxMemAllocated=13.66GB
[2023-04-19 01:23:52,064] [INFO] [logging.py:96:log_dist] [Rank 0] step=7290, skipped=142, lr=[1.2854579027384295e-06], mom=[(0.9, 0.95)]
[2023-04-19 01:23:52,065] [INFO] [timer.py:199:stop] epoch=0/micro_step=14580/global_step=7290, RunningAvgSamplesPerSec=12.106406341786647, CurrSamplesPerSec=12.315783427200486, MemAllocated=2.0GB, MaxMemAllocated=13.66GB
[2023-04-19 01:24:18,832] [INFO] [logging.py:96:log_dist] [Rank 0] step=7300, skipped=142, lr=[1.136176627232366e-06], mom=[(0.9, 0.95)]
[2023-04-19 01:24:18,832] [INFO] [timer.py:199:stop] epoch=0/micro_step=14600/global_step=7300, RunningAvgSamplesPerSec=12.106212704124548, CurrSamplesPerSec=12.309391549737542, MemAllocated=2.0GB, MaxMemAllocated=13.66GB
[2023-04-19 01:24:45,602] [INFO] [logging.py:96:log_dist] [Rank 0] step=7310, skipped=142, lr=[9.96096720310702e-07], mom=[(0.9, 0.95)]
[2023-04-19 01:24:45,602] [INFO] [timer.py:199:stop] epoch=0/micro_step=14620/global_step=7310, RunningAvgSamplesPerSec=12.106018279345193, CurrSamplesPerSec=12.267896194767255, MemAllocated=2.0GB, MaxMemAllocated=13.66GB
***** Evaluating perplexity, Epoch 1/1 *****
Invalidate trace cache @ step 0: expected module 0, but got module 16
ppl: 2.1580886840820312
saving the final model ...
[2023-04-19 01:41:42,561] [INFO] [launch.py:460:main] Process 16300 exits successfully.
[2023-04-19 01:41:43,562] [INFO] [launch.py:460:main] Process 16298 exits successfully.
[2023-04-19 01:41:43,562] [INFO] [launch.py:460:main] Process 16299 exits successfully.
[2023-04-19 01:41:54,575] [INFO] [launch.py:460:main] Process 16297 exits successfully.