2023-04-24 13:45:55.136747: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
[2023-04-24 13:45:56,684] [WARNING] [runner.py:190:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2023-04-24 13:45:56,704] [INFO] [runner.py:540:main] cmd = /root/miniconda3/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMCwgMV19 --master_addr=127.0.0.1 --master_port=12346 --enable_each_rank_log=None main.py --data_path Dahoas/rm-static --data_split 2,4,4 --actor_model_name_or_path /root/DeepSpeedExamples/applications/DeepSpeed-Chat/training/step1_supervised_finetuning/deepspeed-chat_step1_output --critic_model_name_or_path /root/DeepSpeedExamples/applications/DeepSpeed-Chat/training/step2_reward_model_finetuning/deepspeed-chat_step2_output-accum16 --num_padding_at_beginning 1 --per_device_train_batch_size 4 --per_device_mini_train_batch_size 4 --generation_batch_numbers 1 --ppo_epochs 1 --max_answer_seq_len 256 --max_prompt_seq_len 256 --actor_learning_rate 9.65e-6 --critic_learning_rate 5e-6 --num_train_epochs 1 --lr_scheduler_type cosine --gradient_accumulation_steps 8 --disable_actor_dropout --num_warmup_steps 100 --deepspeed --seed 1234 --enable_hybrid_engine --actor_zero_stage 2 --critic_zero_stage 2 --enable_ema --output_dir ./output
2023-04-24 13:45:58.975645: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
[2023-04-24 13:46:00,527] [INFO] [launch.py:229:main] WORLD INFO DICT: {'localhost': [0, 1]}
[2023-04-24 13:46:00,527] [INFO] [launch.py:235:main] nnodes=1, num_local_procs=2, node_rank=0
[2023-04-24 13:46:00,527] [INFO] [launch.py:246:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1]})
[2023-04-24 13:46:00,528] [INFO] [launch.py:247:main] dist_world_size=2
[2023-04-24 13:46:00,528] [INFO] [launch.py:249:main] Setting CUDA_VISIBLE_DEVICES=0,1
2023-04-24 13:46:02.546559: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2023-04-24 13:46:02.546554: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
[2023-04-24 13:46:05,204] [INFO] [comm.py:586:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
Found cached dataset parquet (/root/.cache/huggingface/datasets/Dahoas___parquet/default-b9d2c4937d617106/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec)
0%| | 0/2 [00:00
[2023-04-24 13:46:58,555] [INFO] [logging.py:96:log_dist] [Rank 0] Creating torch.float16 ZeRO stage 2 optimizer
[2023-04-24 13:46:58,555] [INFO] [stage_1_and_2.py:133:__init__] Reduce bucket size 500,000,000
[2023-04-24 13:46:58,555] [INFO] [stage_1_and_2.py:134:__init__] Allgather bucket size 500,000,000
[2023-04-24 13:46:58,555] [INFO] [stage_1_and_2.py:135:__init__] CPU Offload: False
[2023-04-24 13:46:58,555] [INFO] [stage_1_and_2.py:136:__init__] Round robin gradient partitioning: False
Using /root/.cache/torch_extensions/py38_cu117 as PyTorch extensions root...
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
Emitting ninja build file /root/.cache/torch_extensions/py38_cu117/utils/build.ninja...
Building extension module utils...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module utils...
Time to load utils op: 0.4158661365509033 seconds
Loading extension module utils...
Time to load utils op: 0.5032038688659668 seconds
Rank: 0 partition count [2, 2] and sizes[(657607680, False), (271360, False)]
Rank: 1 partition count [2, 2] and sizes[(657607680, False), (271360, False)]
Using /root/.cache/torch_extensions/py38_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0006771087646484375 seconds
[2023-04-24 13:47:01,741] [INFO] [utils.py:785:see_memory_usage] Before initializing optimizer states
[2023-04-24 13:47:01,742] [INFO] [utils.py:786:see_memory_usage] MA 4.9 GB Max_MA 4.9 GB CA 4.9 GB Max_CA 5 GB
[2023-04-24 13:47:01,742] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory: used = 129.82 GB, percent = 20.6%
[2023-04-24 13:47:01,941] [INFO] [utils.py:785:see_memory_usage] After initializing optimizer states
[2023-04-24 13:47:01,942] [INFO] [utils.py:786:see_memory_usage] MA 9.8 GB Max_MA 12.26 GB CA 12.26 GB Max_CA 12 GB
[2023-04-24 13:47:01,942] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory: used = 129.77 GB, percent = 20.6%
[2023-04-24 13:47:01,942] [INFO] [stage_1_and_2.py:489:__init__] optimizer state initialized
[2023-04-24 13:47:02,147] [INFO] [utils.py:785:see_memory_usage] After initializing ZeRO optimizer
[2023-04-24 13:47:02,148] [INFO] [utils.py:786:see_memory_usage] MA 9.8 GB Max_MA 9.8 GB CA 12.26 GB Max_CA 12 GB
[2023-04-24 13:47:02,148] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory: used = 129.88 GB, percent = 20.6%
[2023-04-24 13:47:02,154] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam
[2023-04-24 13:47:02,154] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed using client LR scheduler
[2023-04-24 13:47:02,154] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed LR Scheduler =
[2023-04-24 13:47:02,154] [INFO] [logging.py:96:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-04-24 13:47:02,155] [INFO] [config.py:953:print] DeepSpeedEngine configuration:
[2023-04-24 13:47:02,155] [INFO] [config.py:957:print] activation_checkpointing_config { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false }
[2023-04-24 13:47:02,155] [INFO] [config.py:957:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2023-04-24 13:47:02,155] [INFO] [config.py:957:print] amp_enabled .................. False
[2023-04-24 13:47:02,155] [INFO] [config.py:957:print] amp_params ................... False
[2023-04-24 13:47:02,155] [INFO] [config.py:957:print] autotuning_config ............ { "enabled": false, "start_step": null, "end_step": null, "metric_path": null, "arg_mappings": null, "metric": "throughput", "model_info": null, "results_dir": "autotuning_results", "exps_dir": "autotuning_exps", "overwrite": true, "fast": true, "start_profile_step": 3, "end_profile_step": 5, "tuner_type": "gridsearch", "tuner_early_stopping": 5, "tuner_num_trials": 50, "model_info_path": null, "mp_size": 1, "max_train_batch_size": null, "min_train_batch_size": 1, "max_train_micro_batch_size_per_gpu": 1.024000e+03, "min_train_micro_batch_size_per_gpu": 1, "num_tuning_micro_batch_sizes": 3 }
[2023-04-24 13:47:02,155] [INFO] [config.py:957:print] bfloat16_enabled ............. False
[2023-04-24 13:47:02,155] [INFO] [config.py:957:print] checkpoint_parallel_write_pipeline False
[2023-04-24 13:47:02,155] [INFO] [config.py:957:print] checkpoint_tag_validation_enabled True
[2023-04-24 13:47:02,155] [INFO] [config.py:957:print] checkpoint_tag_validation_fail False
[2023-04-24 13:47:02,155] [INFO] [config.py:957:print] comms_config .................
[2023-04-24 13:47:02,156] [INFO] [config.py:957:print] communication_data_type ...... None
[2023-04-24 13:47:02,156] [INFO] [config.py:957:print] compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}}
[2023-04-24 13:47:02,156] [INFO] [config.py:957:print] curriculum_enabled_legacy .... False
[2023-04-24 13:47:02,156] [INFO] [config.py:957:print] curriculum_params_legacy ..... False
[2023-04-24 13:47:02,156] [INFO] [config.py:957:print] data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}}
[2023-04-24 13:47:02,156] [INFO] [config.py:957:print] data_efficiency_enabled ...... False
[2023-04-24 13:47:02,156] [INFO] [config.py:957:print] dataloader_drop_last ......... False
[2023-04-24 13:47:02,156] [INFO] [config.py:957:print] disable_allgather ............ False
[2023-04-24 13:47:02,156] [INFO] [config.py:957:print] dump_state ................... False
[2023-04-24 13:47:02,156] [INFO] [config.py:957:print] dynamic_loss_scale_args ...... {'init_scale': 65536, 'scale_window': 100, 'delayed_shift': 2, 'min_scale': 1}
[2023-04-24 13:47:02,156] [INFO] [config.py:957:print] eigenvalue_enabled ........... False
[2023-04-24 13:47:02,156] [INFO] [config.py:957:print] eigenvalue_gas_boundary_resolution 1
[2023-04-24 13:47:02,156] [INFO] [config.py:957:print] eigenvalue_layer_name ........ bert.encoder.layer
[2023-04-24 13:47:02,156] [INFO] [config.py:957:print] eigenvalue_layer_num ......... 0
[2023-04-24 13:47:02,156] [INFO] [config.py:957:print] eigenvalue_max_iter .......... 100
[2023-04-24 13:47:02,156] [INFO] [config.py:957:print] eigenvalue_stability ......... 1e-06
[2023-04-24 13:47:02,156] [INFO] [config.py:957:print] eigenvalue_tol ............... 0.01
[2023-04-24 13:47:02,156] [INFO] [config.py:957:print] eigenvalue_verbose ........... False
[2023-04-24 13:47:02,156] [INFO] [config.py:957:print] elasticity_enabled ........... False
[2023-04-24 13:47:02,156] [INFO] [config.py:957:print] flops_profiler_config ........ { "enabled": false, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null }
[2023-04-24 13:47:02,156] [INFO] [config.py:957:print] fp16_auto_cast ............... False
[2023-04-24 13:47:02,156] [INFO] [config.py:957:print] fp16_enabled ................. True
[2023-04-24 13:47:02,156] [INFO] [config.py:957:print] fp16_master_weights_and_gradients False
[2023-04-24 13:47:02,156] [INFO] [config.py:957:print] global_rank .................. 0
[2023-04-24 13:47:02,156] [INFO] [config.py:957:print] grad_accum_dtype ............. None
[2023-04-24 13:47:02,156] [INFO] [config.py:957:print] gradient_accumulation_steps .. 8
[2023-04-24 13:47:02,156] [INFO] [config.py:957:print] gradient_clipping ............ 1.0
[2023-04-24 13:47:02,156] [INFO] [config.py:957:print] gradient_predivide_factor .... 1.0
[2023-04-24 13:47:02,157] [INFO] [config.py:957:print] hybrid_engine ................ enabled=True max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8
[2023-04-24 13:47:02,158] [INFO] [config.py:957:print] initial_dynamic_scale ........ 65536
[2023-04-24 13:47:02,158] [INFO] [config.py:957:print] load_universal_checkpoint .... False
[2023-04-24 13:47:02,158] [INFO] [config.py:957:print] loss_scale ................... 0
[2023-04-24 13:47:02,158] [INFO] [config.py:957:print] memory_breakdown ............. False
[2023-04-24 13:47:02,159] [INFO] [config.py:957:print] monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False
[2023-04-24 13:47:02,159] [INFO] [config.py:957:print] nebula_config ................ { "enabled": false, "persistent_storage_path": null, "persistent_time_interval": 100, "num_of_version_in_retention": 2, "enable_nebula_load": true, "load_path": null }
[2023-04-24 13:47:02,159] [INFO] [config.py:957:print] optimizer_legacy_fusion ...... False
[2023-04-24 13:47:02,159] [INFO] [config.py:957:print] optimizer_name ............... None
[2023-04-24 13:47:02,159] [INFO] [config.py:957:print] optimizer_params ............. None
[2023-04-24 13:47:02,159] [INFO] [config.py:957:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2023-04-24 13:47:02,159] [INFO] [config.py:957:print] pld_enabled .................. False
[2023-04-24 13:47:02,159] [INFO] [config.py:957:print] pld_params ................... False
[2023-04-24 13:47:02,159] [INFO] [config.py:957:print] prescale_gradients ........... False
[2023-04-24 13:47:02,159] [INFO] [config.py:957:print] scheduler_name ............... None
[2023-04-24 13:47:02,159] [INFO] [config.py:957:print] scheduler_params ............. None
[2023-04-24 13:47:02,159] [INFO] [config.py:957:print] sparse_attention ............. None
[2023-04-24 13:47:02,159] [INFO] [config.py:957:print] sparse_gradients_enabled ..... False
[2023-04-24 13:47:02,159] [INFO] [config.py:957:print] steps_per_print .............. 10
[2023-04-24 13:47:02,159] [INFO] [config.py:957:print] train_batch_size ............. 64
[2023-04-24 13:47:02,159] [INFO] [config.py:957:print] train_micro_batch_size_per_gpu 4
[2023-04-24 13:47:02,159] [INFO] [config.py:957:print] use_node_local_storage ....... False
[2023-04-24 13:47:02,159] [INFO] [config.py:957:print] wall_clock_breakdown ......... False
[2023-04-24 13:47:02,159] [INFO] [config.py:957:print] world_size ................... 2
[2023-04-24 13:47:02,159] [INFO] [config.py:957:print] zero_allow_untested_optimizer False
[2023-04-24 13:47:02,159] [INFO] [config.py:957:print] zero_config .................. stage=2 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500,000,000 allgather_partitions=True allgather_bucket_size=500,000,000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=DeepSpeedZeroOffloadParamConfig(device='none', nvme_path=None, buffer_count=5, buffer_size=100,000,000, max_in_cpu=1,000,000,000, pin_memory=False) offload_optimizer=DeepSpeedZeroOffloadOptimizerConfig(device='none', nvme_path=None, buffer_count=4, pin_memory=False, pipeline=False, pipeline_read=False, pipeline_write=False, fast_init=False) sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=30000000 param_persistence_threshold=10000 model_persistence_threshold=sys.maxsize max_live_parameters=30000000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False memory_efficient_linear=False
[2023-04-24 13:47:02,159] [INFO] [config.py:957:print] zero_enabled ................. True
[2023-04-24 13:47:02,159] [INFO] [config.py:957:print] zero_force_ds_cpu_optimizer .. True
[2023-04-24 13:47:02,159] [INFO] [config.py:957:print] zero_optimization_stage ...... 2
[2023-04-24 13:47:02,160] [INFO] [config.py:943:print_user_config] json = { "train_batch_size": 64, "train_micro_batch_size_per_gpu": 4, "steps_per_print": 10, "zero_optimization": { "stage": 2, "offload_param": { "device": "none" }, "offload_optimizer": { "device": "none" }, "stage3_param_persistence_threshold": 1.000000e+04, "stage3_max_live_parameters": 3.000000e+07, "stage3_prefetch_bucket_size": 3.000000e+07, "memory_efficient_linear": false }, "fp16": { "enabled": true, "loss_scale_window": 100 }, "gradient_clipping": 1.0, "prescale_gradients": false, "wall_clock_breakdown": false, "hybrid_engine": { "enabled": true, "max_out_tokens": 512, "inference_tp_size": 1, "release_inference_cache": false, "pin_parameters": true, "tp_gather_partition_size": 8 } }
Using /root/.cache/torch_extensions/py38_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.000347137451171875 seconds
Installed CUDA version 11.0 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Installed CUDA version 11.0 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Using /root/.cache/torch_extensions/py38_cu117 as PyTorch extensions root...
Using /root/.cache/torch_extensions/py38_cu117 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /root/.cache/torch_extensions/py38_cu117/transformer_inference/build.ninja...
Building extension module transformer_inference...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module transformer_inference...
Time to load transformer_inference op: 0.4992704391479492 seconds
Loading extension module transformer_inference...
Time to load transformer_inference op: 0.5179216861724854 seconds
[2023-04-24 13:47:02,906] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed-Inference config: {'layer_id': 0, 'hidden_size': 2048, 'intermediate_size': 8192, 'heads': 32, 'num_hidden_layers': -1, 'fp16': True, 'pre_layer_norm': True, 'local_rank': -1, 'stochastic_mode': False, 'epsilon': 1e-05, 'mp_size': 1, 'q_int8': False, 'scale_attention': True, 'triangular_masking': True, 'local_attention': False, 'window_size': 1, 'rotary_dim': -1, 'rotate_half': False, 'rotate_every_two': True, 'return_tuple': True, 'mlp_after_attn': True, 'mlp_act_func_type': , 'specialized_mode': False, 'training_mp_size': 1, 'bigscience_bloom': False, 'max_out_tokens': 512, 'min_out_tokens': 512, 'scale_attn_by_inverse_layer_idx': False, 'enable_qkv_quantization': False, 'use_mup': False, 'return_single_tuple': False, 'set_empty_params': True, 'transposed_mode': True}
Installed CUDA version 11.0 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Using /root/.cache/torch_extensions/py38_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module transformer_inference, skipping build step...
Loading extension module transformer_inference...
Time to load transformer_inference op: 0.08913207054138184 seconds
Installed CUDA version 11.0 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Using /root/.cache/torch_extensions/py38_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module transformer_inference, skipping build step...
Loading extension module transformer_inference...
Time to load transformer_inference op: 0.09765148162841797 seconds
Installed CUDA version 11.0 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Using /root/.cache/torch_extensions/py38_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module transformer_inference, skipping build step...
Loading extension module transformer_inference...
Time to load transformer_inference op: 0.07995152473449707 seconds
Installed CUDA version 11.0 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Using /root/.cache/torch_extensions/py38_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module transformer_inference, skipping build step...
Loading extension module transformer_inference...
Time to load transformer_inference op: 0.12397336959838867 seconds
******************[end] Initialized Actor Model [end] (duration: 18.59s)******************
*************************[start] Initializing Ref Model [start] **************************
[2023-04-24 13:47:14,890] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed info: version=0.9.0, git-hash=unknown, git-branch=unknown
Using /root/.cache/torch_extensions/py38_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0008111000061035156 seconds
[2023-04-24 13:47:15,953] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False
[2023-04-24 13:47:15,955] [INFO] [config.py:953:print] DeepSpeedEngine configuration:
[2023-04-24 13:47:15,956] [INFO] [config.py:957:print] activation_checkpointing_config { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false }
[2023-04-24 13:47:15,956] [INFO] [config.py:957:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2023-04-24 13:47:15,956] [INFO] [config.py:957:print] amp_enabled .................. False
[2023-04-24 13:47:15,956] [INFO] [config.py:957:print] amp_params ................... False
[2023-04-24 13:47:15,956] [INFO] [config.py:957:print] autotuning_config ............ { "enabled": false, "start_step": null, "end_step": null, "metric_path": null, "arg_mappings": null, "metric": "throughput", "model_info": null, "results_dir": "autotuning_results", "exps_dir": "autotuning_exps", "overwrite": true, "fast": true, "start_profile_step": 3, "end_profile_step": 5, "tuner_type": "gridsearch", "tuner_early_stopping": 5, "tuner_num_trials": 50, "model_info_path": null, "mp_size": 1, "max_train_batch_size": null, "min_train_batch_size": 1, "max_train_micro_batch_size_per_gpu": 1.024000e+03, "min_train_micro_batch_size_per_gpu": 1, "num_tuning_micro_batch_sizes": 3 }
[2023-04-24 13:47:15,956] [INFO] [config.py:957:print] bfloat16_enabled ............. False
[2023-04-24 13:47:15,956] [INFO] [config.py:957:print] checkpoint_parallel_write_pipeline False
[2023-04-24 13:47:15,956] [INFO] [config.py:957:print] checkpoint_tag_validation_enabled True
[2023-04-24 13:47:15,956] [INFO] [config.py:957:print] checkpoint_tag_validation_fail False
[2023-04-24 13:47:15,956] [INFO] [config.py:957:print] comms_config .................
[2023-04-24 13:47:15,956] [INFO] [config.py:957:print] communication_data_type ...... None
[2023-04-24 13:47:15,957] [INFO] [config.py:957:print] compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}}
[2023-04-24 13:47:15,957] [INFO] [config.py:957:print] curriculum_enabled_legacy .... False
[2023-04-24 13:47:15,957] [INFO] [config.py:957:print] curriculum_params_legacy ..... False
[2023-04-24 13:47:15,957] [INFO] [config.py:957:print] data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}}
[2023-04-24 13:47:15,957] [INFO] [config.py:957:print] data_efficiency_enabled ...... False
[2023-04-24 13:47:15,957] [INFO] [config.py:957:print] dataloader_drop_last ......... False
[2023-04-24 13:47:15,957] [INFO] [config.py:957:print] disable_allgather ............ False
[2023-04-24 13:47:15,957] [INFO] [config.py:957:print] dump_state ................... False
[2023-04-24 13:47:15,957] [INFO] [config.py:957:print] dynamic_loss_scale_args ...... None
[2023-04-24 13:47:15,957] [INFO] [config.py:957:print] eigenvalue_enabled ........... False
[2023-04-24 13:47:15,957] [INFO] [config.py:957:print] eigenvalue_gas_boundary_resolution 1
[2023-04-24 13:47:15,957] [INFO] [config.py:957:print] eigenvalue_layer_name ........ bert.encoder.layer
[2023-04-24 13:47:15,957] [INFO] [config.py:957:print] eigenvalue_layer_num ......... 0
[2023-04-24 13:47:15,957] [INFO] [config.py:957:print] eigenvalue_max_iter .......... 100
[2023-04-24 13:47:15,957] [INFO] [config.py:957:print] eigenvalue_stability ......... 1e-06
[2023-04-24 13:47:15,957] [INFO] [config.py:957:print] eigenvalue_tol ............... 0.01
[2023-04-24 13:47:15,957] [INFO] [config.py:957:print] eigenvalue_verbose ........... False
[2023-04-24 13:47:15,957] [INFO] [config.py:957:print] elasticity_enabled ........... False
[2023-04-24 13:47:15,957] [INFO] [config.py:957:print] flops_profiler_config ........ { "enabled": false, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null }
[2023-04-24 13:47:15,957] [INFO] [config.py:957:print] fp16_auto_cast ............... False
[2023-04-24 13:47:15,957] [INFO] [config.py:957:print] fp16_enabled ................. True
[2023-04-24 13:47:15,957] [INFO] [config.py:957:print] fp16_master_weights_and_gradients False
[2023-04-24 13:47:15,957] [INFO] [config.py:957:print] global_rank .................. 0
[2023-04-24 13:47:15,957] [INFO] [config.py:957:print] grad_accum_dtype ............. None
[2023-04-24 13:47:15,957] [INFO] [config.py:957:print] gradient_accumulation_steps .. 8
[2023-04-24 13:47:15,957] [INFO] [config.py:957:print] gradient_clipping ............ 1.0
[2023-04-24 13:47:15,957] [INFO] [config.py:957:print] gradient_predivide_factor .... 1.0
[2023-04-24 13:47:15,957] [INFO] [config.py:957:print] hybrid_engine ................ enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8
[2023-04-24 13:47:15,957] [INFO] [config.py:957:print] initial_dynamic_scale ........ 65536
[2023-04-24 13:47:15,957] [INFO] [config.py:957:print] load_universal_checkpoint .... False
[2023-04-24 13:47:15,957] [INFO] [config.py:957:print] loss_scale ................... 0
[2023-04-24 13:47:15,957] [INFO] [config.py:957:print] memory_breakdown ............. False
[2023-04-24 13:47:15,958] [INFO] [config.py:957:print] monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False
[2023-04-24 13:47:15,958] [INFO] [config.py:957:print] nebula_config ................ { "enabled": false, "persistent_storage_path": null, "persistent_time_interval": 100, "num_of_version_in_retention": 2, "enable_nebula_load": true, "load_path": null }
[2023-04-24 13:47:15,958] [INFO] [config.py:957:print] optimizer_legacy_fusion ...... False
[2023-04-24 13:47:15,958] [INFO] [config.py:957:print] optimizer_name ............... None
[2023-04-24 13:47:15,958] [INFO] [config.py:957:print] optimizer_params ............. None
[2023-04-24 13:47:15,958] [INFO] [config.py:957:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2023-04-24 13:47:15,958] [INFO] [config.py:957:print] pld_enabled .................. False
[2023-04-24 13:47:15,958] [INFO] [config.py:957:print] pld_params ................... False
[2023-04-24 13:47:15,958] [INFO] [config.py:957:print] prescale_gradients ........... False
[2023-04-24 13:47:15,958] [INFO] [config.py:957:print] scheduler_name ...............
None [2023-04-24 13:47:15,958] [INFO] [config.py:957:print] scheduler_params ............. None [2023-04-24 13:47:15,958] [INFO] [config.py:957:print] sparse_attention ............. None [2023-04-24 13:47:15,958] [INFO] [config.py:957:print] sparse_gradients_enabled ..... False [2023-04-24 13:47:15,958] [INFO] [config.py:957:print] steps_per_print .............. 10 [2023-04-24 13:47:15,958] [INFO] [config.py:957:print] train_batch_size ............. 64 [2023-04-24 13:47:15,958] [INFO] [config.py:957:print] train_micro_batch_size_per_gpu 4 [2023-04-24 13:47:15,958] [INFO] [config.py:957:print] use_node_local_storage ....... False [2023-04-24 13:47:15,958] [INFO] [config.py:957:print] wall_clock_breakdown ......... False [2023-04-24 13:47:15,958] [INFO] [config.py:957:print] world_size ................... 2 [2023-04-24 13:47:15,958] [INFO] [config.py:957:print] zero_allow_untested_optimizer False [2023-04-24 13:47:15,958] [INFO] [config.py:957:print] zero_config .................. stage=0 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500,000,000 allgather_partitions=True allgather_bucket_size=500,000,000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=DeepSpeedZeroOffloadParamConfig(device='none', nvme_path=None, buffer_count=5, buffer_size=100,000,000, max_in_cpu=1,000,000,000, pin_memory=False) offload_optimizer=None sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50,000,000 param_persistence_threshold=10000 model_persistence_threshold=sys.maxsize max_live_parameters=1,000,000,000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False memory_efficient_linear=False [2023-04-24 13:47:15,958] [INFO] [config.py:957:print] zero_enabled ................. 
False [2023-04-24 13:47:15,958] [INFO] [config.py:957:print] zero_force_ds_cpu_optimizer .. True [2023-04-24 13:47:15,958] [INFO] [config.py:957:print] zero_optimization_stage ...... 0 [2023-04-24 13:47:15,958] [INFO] [config.py:943:print_user_config] json = { "train_batch_size": 64, "train_micro_batch_size_per_gpu": 4, "steps_per_print": 10, "zero_optimization": { "stage": 0, "stage3_param_persistence_threshold": 1.000000e+04, "offload_param": { "device": "none" }, "memory_efficient_linear": false }, "fp16": { "enabled": true }, "gradient_clipping": 1.0, "prescale_gradients": false, "wall_clock_breakdown": false } Using /root/.cache/torch_extensions/py38_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module utils, skipping build step... Loading extension module utils... Time to load utils op: 0.001409769058227539 seconds *******************[end] Initialized Ref Model [end] (duration: 12.36s)******************* *************************[start] Initializing EMA Model [start] ************************** [2023-04-24 13:47:26,004] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed info: version=0.9.0, git-hash=unknown, git-branch=unknown Using /root/.cache/torch_extensions/py38_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module utils, skipping build step... Loading extension module utils... Time to load utils op: 0.0011205673217773438 seconds [2023-04-24 13:47:28,816] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False [2023-04-24 13:47:28,818] [INFO] [config.py:953:print] DeepSpeedEngine configuration: [2023-04-24 13:47:28,818] [INFO] [config.py:957:print] activation_checkpointing_config { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false } [2023-04-24 13:47:28,818] [INFO] [config.py:957:print] aio_config ................... 
{'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} [2023-04-24 13:47:28,818] [INFO] [config.py:957:print] amp_enabled .................. False [2023-04-24 13:47:28,818] [INFO] [config.py:957:print] amp_params ................... False [2023-04-24 13:47:28,818] [INFO] [config.py:957:print] autotuning_config ............ { "enabled": false, "start_step": null, "end_step": null, "metric_path": null, "arg_mappings": null, "metric": "throughput", "model_info": null, "results_dir": "autotuning_results", "exps_dir": "autotuning_exps", "overwrite": true, "fast": true, "start_profile_step": 3, "end_profile_step": 5, "tuner_type": "gridsearch", "tuner_early_stopping": 5, "tuner_num_trials": 50, "model_info_path": null, "mp_size": 1, "max_train_batch_size": null, "min_train_batch_size": 1, "max_train_micro_batch_size_per_gpu": 1.024000e+03, "min_train_micro_batch_size_per_gpu": 1, "num_tuning_micro_batch_sizes": 3 } [2023-04-24 13:47:28,818] [INFO] [config.py:957:print] bfloat16_enabled ............. False [2023-04-24 13:47:28,818] [INFO] [config.py:957:print] checkpoint_parallel_write_pipeline False [2023-04-24 13:47:28,818] [INFO] [config.py:957:print] checkpoint_tag_validation_enabled True [2023-04-24 13:47:28,818] [INFO] [config.py:957:print] checkpoint_tag_validation_fail False [2023-04-24 13:47:28,818] [INFO] [config.py:957:print] comms_config ................. [2023-04-24 13:47:28,818] [INFO] [config.py:957:print] communication_data_type ...... None [2023-04-24 13:47:28,818] [INFO] [config.py:957:print] compression_config ........... 
{'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}} [2023-04-24 13:47:28,819] [INFO] [config.py:957:print] curriculum_enabled_legacy .... False [2023-04-24 13:47:28,819] [INFO] [config.py:957:print] curriculum_params_legacy ..... False [2023-04-24 13:47:28,819] [INFO] [config.py:957:print] data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}} [2023-04-24 13:47:28,819] [INFO] [config.py:957:print] data_efficiency_enabled ...... False [2023-04-24 13:47:28,819] [INFO] [config.py:957:print] dataloader_drop_last ......... False [2023-04-24 13:47:28,819] [INFO] [config.py:957:print] disable_allgather ............ False [2023-04-24 13:47:28,819] [INFO] [config.py:957:print] dump_state ................... 
False [2023-04-24 13:47:28,819] [INFO] [config.py:957:print] dynamic_loss_scale_args ...... None [2023-04-24 13:47:28,819] [INFO] [config.py:957:print] eigenvalue_enabled ........... False [2023-04-24 13:47:28,819] [INFO] [config.py:957:print] eigenvalue_gas_boundary_resolution 1 [2023-04-24 13:47:28,819] [INFO] [config.py:957:print] eigenvalue_layer_name ........ bert.encoder.layer [2023-04-24 13:47:28,819] [INFO] [config.py:957:print] eigenvalue_layer_num ......... 0 [2023-04-24 13:47:28,819] [INFO] [config.py:957:print] eigenvalue_max_iter .......... 100 [2023-04-24 13:47:28,819] [INFO] [config.py:957:print] eigenvalue_stability ......... 1e-06 [2023-04-24 13:47:28,819] [INFO] [config.py:957:print] eigenvalue_tol ............... 0.01 [2023-04-24 13:47:28,819] [INFO] [config.py:957:print] eigenvalue_verbose ........... False [2023-04-24 13:47:28,819] [INFO] [config.py:957:print] elasticity_enabled ........... False [2023-04-24 13:47:28,819] [INFO] [config.py:957:print] flops_profiler_config ........ { "enabled": false, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null } [2023-04-24 13:47:28,819] [INFO] [config.py:957:print] fp16_auto_cast ............... False [2023-04-24 13:47:28,819] [INFO] [config.py:957:print] fp16_enabled ................. True [2023-04-24 13:47:28,819] [INFO] [config.py:957:print] fp16_master_weights_and_gradients False [2023-04-24 13:47:28,819] [INFO] [config.py:957:print] global_rank .................. 0 [2023-04-24 13:47:28,819] [INFO] [config.py:957:print] grad_accum_dtype ............. None [2023-04-24 13:47:28,819] [INFO] [config.py:957:print] gradient_accumulation_steps .. 8 [2023-04-24 13:47:28,819] [INFO] [config.py:957:print] gradient_clipping ............ 1.0 [2023-04-24 13:47:28,819] [INFO] [config.py:957:print] gradient_predivide_factor .... 1.0 [2023-04-24 13:47:28,819] [INFO] [config.py:957:print] hybrid_engine ................ 
enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8 [2023-04-24 13:47:28,819] [INFO] [config.py:957:print] initial_dynamic_scale ........ 65536 [2023-04-24 13:47:28,819] [INFO] [config.py:957:print] load_universal_checkpoint .... False [2023-04-24 13:47:28,819] [INFO] [config.py:957:print] loss_scale ................... 0 [2023-04-24 13:47:28,819] [INFO] [config.py:957:print] memory_breakdown ............. False [2023-04-24 13:47:28,819] [INFO] [config.py:957:print] monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False [2023-04-24 13:47:28,820] [INFO] [config.py:957:print] nebula_config ................ { "enabled": false, "persistent_storage_path": null, "persistent_time_interval": 100, "num_of_version_in_retention": 2, "enable_nebula_load": true, "load_path": null } [2023-04-24 13:47:28,820] [INFO] [config.py:957:print] optimizer_legacy_fusion ...... False [2023-04-24 13:47:28,820] [INFO] [config.py:957:print] optimizer_name ............... None [2023-04-24 13:47:28,820] [INFO] [config.py:957:print] optimizer_params ............. None [2023-04-24 13:47:28,820] [INFO] [config.py:957:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0} [2023-04-24 13:47:28,820] [INFO] [config.py:957:print] pld_enabled .................. False [2023-04-24 13:47:28,820] [INFO] [config.py:957:print] pld_params ................... False [2023-04-24 13:47:28,820] [INFO] [config.py:957:print] prescale_gradients ........... False [2023-04-24 13:47:28,820] [INFO] [config.py:957:print] scheduler_name ............... 
None [2023-04-24 13:47:28,820] [INFO] [config.py:957:print] scheduler_params ............. None [2023-04-24 13:47:28,820] [INFO] [config.py:957:print] sparse_attention ............. None [2023-04-24 13:47:28,820] [INFO] [config.py:957:print] sparse_gradients_enabled ..... False [2023-04-24 13:47:28,820] [INFO] [config.py:957:print] steps_per_print .............. 10 [2023-04-24 13:47:28,820] [INFO] [config.py:957:print] train_batch_size ............. 64 [2023-04-24 13:47:28,820] [INFO] [config.py:957:print] train_micro_batch_size_per_gpu 4 [2023-04-24 13:47:28,820] [INFO] [config.py:957:print] use_node_local_storage ....... False [2023-04-24 13:47:28,820] [INFO] [config.py:957:print] wall_clock_breakdown ......... False [2023-04-24 13:47:28,820] [INFO] [config.py:957:print] world_size ................... 2 [2023-04-24 13:47:28,820] [INFO] [config.py:957:print] zero_allow_untested_optimizer False [2023-04-24 13:47:28,820] [INFO] [config.py:957:print] zero_config .................. stage=0 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500,000,000 allgather_partitions=True allgather_bucket_size=500,000,000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=DeepSpeedZeroOffloadParamConfig(device='none', nvme_path=None, buffer_count=5, buffer_size=100,000,000, max_in_cpu=1,000,000,000, pin_memory=False) offload_optimizer=None sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50,000,000 param_persistence_threshold=10000 model_persistence_threshold=sys.maxsize max_live_parameters=1,000,000,000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False memory_efficient_linear=False [2023-04-24 13:47:28,820] [INFO] [config.py:957:print] zero_enabled ................. 
False [2023-04-24 13:47:28,820] [INFO] [config.py:957:print] zero_force_ds_cpu_optimizer .. True [2023-04-24 13:47:28,820] [INFO] [config.py:957:print] zero_optimization_stage ...... 0 [2023-04-24 13:47:28,820] [INFO] [config.py:943:print_user_config] json = { "train_batch_size": 64, "train_micro_batch_size_per_gpu": 4, "steps_per_print": 10, "zero_optimization": { "stage": 0, "stage3_param_persistence_threshold": 1.000000e+04, "offload_param": { "device": "none" }, "memory_efficient_linear": false }, "fp16": { "enabled": true }, "gradient_clipping": 1.0, "prescale_gradients": false, "wall_clock_breakdown": false } Using /root/.cache/torch_extensions/py38_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module utils, skipping build step... Loading extension module utils... Time to load utils op: 0.0008184909820556641 seconds *******************[end] Initialized EMA Model [end] (duration: 12.86s)******************* ************************[start] Initializing Critic Model [start] ************************ huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... To disable this warning, you can either: - Avoid using `tokenizers` before the fork if possible - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) Installed CUDA version 11.0 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination Using /root/.cache/torch_extensions/py38_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module fused_adam, skipping build step... Loading extension module fused_adam... Time to load fused_adam op: 0.0017685890197753906 seconds huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... 
To disable this warning, you can either: - Avoid using `tokenizers` before the fork if possible - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) Installed CUDA version 11.0 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination Using /root/.cache/torch_extensions/py38_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module fused_adam, skipping build step... Loading extension module fused_adam... Time to load fused_adam op: 0.0017170906066894531 seconds [2023-04-24 13:47:34,230] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed info: version=0.9.0, git-hash=unknown, git-branch=unknown Using /root/.cache/torch_extensions/py38_cu117 as PyTorch extensions root... No modifications detected for re-loaded extension module utils, skipping build step... Loading extension module utils... Time to load utils op: 0.0005674362182617188 seconds [2023-04-24 13:47:34,716] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False [2023-04-24 13:47:34,717] [INFO] [logging.py:96:log_dist] [Rank 0] Removing param_group that has no 'params' in the client Optimizer [2023-04-24 13:47:34,718] [INFO] [logging.py:96:log_dist] [Rank 0] Using client Optimizer as basic optimizer [2023-04-24 13:47:34,733] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Basic Optimizer = FusedAdam [2023-04-24 13:47:34,733] [INFO] [utils.py:51:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type= [2023-04-24 13:47:34,733] [INFO] [logging.py:96:log_dist] [Rank 0] Creating torch.float16 ZeRO stage 2 optimizer [2023-04-24 13:47:34,733] [INFO] [stage_1_and_2.py:133:__init__] Reduce bucket size 500,000,000 [2023-04-24 13:47:34,733] [INFO] [stage_1_and_2.py:134:__init__] Allgather bucket size 500,000,000 [2023-04-24 13:47:34,733] [INFO] [stage_1_and_2.py:135:__init__] CPU Offload: False [2023-04-24 13:47:34,733] [INFO] 
[stage_1_and_2.py:136:__init__] Round robin gradient partitioning: False
Using /root/.cache/torch_extensions/py38_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0004725456237792969 seconds
Rank: 1 partition count [2, 2] and sizes[(165463296, False), (135168, False)]
Rank: 0 partition count [2, 2] and sizes[(165463296, False), (135168, False)]
Using /root/.cache/torch_extensions/py38_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.001280069351196289 seconds
[2023-04-24 13:47:36,306] [INFO] [utils.py:785:see_memory_usage] Before initializing optimizer states
[2023-04-24 13:47:36,307] [INFO] [utils.py:786:see_memory_usage] MA 16.53 GB Max_MA 16.53 GB CA 16.77 GB Max_CA 17 GB
[2023-04-24 13:47:36,307] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory: used = 140.12 GB, percent = 22.3%
[2023-04-24 13:47:36,531] [INFO] [utils.py:785:see_memory_usage] After initializing optimizer states
[2023-04-24 13:47:36,531] [INFO] [utils.py:786:see_memory_usage] MA 17.76 GB Max_MA 18.38 GB CA 18.62 GB Max_CA 19 GB
[2023-04-24 13:47:36,532] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory: used = 140.23 GB, percent = 22.3%
[2023-04-24 13:47:36,532] [INFO] [stage_1_and_2.py:489:__init__] optimizer state initialized
[2023-04-24 13:47:36,753] [INFO] [utils.py:785:see_memory_usage] After initializing ZeRO optimizer
[2023-04-24 13:47:36,754] [INFO] [utils.py:786:see_memory_usage] MA 17.76 GB Max_MA 17.76 GB CA 18.62 GB Max_CA 19 GB
[2023-04-24 13:47:36,754] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory: used = 140.35 GB, percent = 22.3%
[2023-04-24 13:47:36,760] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam
[2023-04-24 13:47:36,760] [INFO] [logging.py:96:log_dist]
[Rank 0] DeepSpeed using client LR scheduler [2023-04-24 13:47:36,760] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed LR Scheduler = [2023-04-24 13:47:36,760] [INFO] [logging.py:96:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-04-24 13:47:36,761] [INFO] [config.py:953:print] DeepSpeedEngine configuration: [2023-04-24 13:47:36,761] [INFO] [config.py:957:print] activation_checkpointing_config { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false } [2023-04-24 13:47:36,761] [INFO] [config.py:957:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} [2023-04-24 13:47:36,761] [INFO] [config.py:957:print] amp_enabled .................. False [2023-04-24 13:47:36,761] [INFO] [config.py:957:print] amp_params ................... False [2023-04-24 13:47:36,761] [INFO] [config.py:957:print] autotuning_config ............ { "enabled": false, "start_step": null, "end_step": null, "metric_path": null, "arg_mappings": null, "metric": "throughput", "model_info": null, "results_dir": "autotuning_results", "exps_dir": "autotuning_exps", "overwrite": true, "fast": true, "start_profile_step": 3, "end_profile_step": 5, "tuner_type": "gridsearch", "tuner_early_stopping": 5, "tuner_num_trials": 50, "model_info_path": null, "mp_size": 1, "max_train_batch_size": null, "min_train_batch_size": 1, "max_train_micro_batch_size_per_gpu": 1.024000e+03, "min_train_micro_batch_size_per_gpu": 1, "num_tuning_micro_batch_sizes": 3 } [2023-04-24 13:47:36,761] [INFO] [config.py:957:print] bfloat16_enabled ............. 
False [2023-04-24 13:47:36,761] [INFO] [config.py:957:print] checkpoint_parallel_write_pipeline False [2023-04-24 13:47:36,761] [INFO] [config.py:957:print] checkpoint_tag_validation_enabled True [2023-04-24 13:47:36,761] [INFO] [config.py:957:print] checkpoint_tag_validation_fail False [2023-04-24 13:47:36,762] [INFO] [config.py:957:print] comms_config ................. [2023-04-24 13:47:36,762] [INFO] [config.py:957:print] communication_data_type ...... None [2023-04-24 13:47:36,762] [INFO] [config.py:957:print] compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}} [2023-04-24 13:47:36,762] [INFO] [config.py:957:print] curriculum_enabled_legacy .... False [2023-04-24 13:47:36,762] [INFO] [config.py:957:print] curriculum_params_legacy ..... False [2023-04-24 13:47:36,762] [INFO] [config.py:957:print] data_efficiency_config ....... 
{'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}} [2023-04-24 13:47:36,762] [INFO] [config.py:957:print] data_efficiency_enabled ...... False [2023-04-24 13:47:36,762] [INFO] [config.py:957:print] dataloader_drop_last ......... False [2023-04-24 13:47:36,762] [INFO] [config.py:957:print] disable_allgather ............ False [2023-04-24 13:47:36,762] [INFO] [config.py:957:print] dump_state ................... False [2023-04-24 13:47:36,762] [INFO] [config.py:957:print] dynamic_loss_scale_args ...... {'init_scale': 65536, 'scale_window': 100, 'delayed_shift': 2, 'min_scale': 1} [2023-04-24 13:47:36,762] [INFO] [config.py:957:print] eigenvalue_enabled ........... False [2023-04-24 13:47:36,762] [INFO] [config.py:957:print] eigenvalue_gas_boundary_resolution 1 [2023-04-24 13:47:36,762] [INFO] [config.py:957:print] eigenvalue_layer_name ........ bert.encoder.layer [2023-04-24 13:47:36,762] [INFO] [config.py:957:print] eigenvalue_layer_num ......... 0 [2023-04-24 13:47:36,762] [INFO] [config.py:957:print] eigenvalue_max_iter .......... 100 [2023-04-24 13:47:36,762] [INFO] [config.py:957:print] eigenvalue_stability ......... 1e-06 [2023-04-24 13:47:36,762] [INFO] [config.py:957:print] eigenvalue_tol ............... 0.01 [2023-04-24 13:47:36,762] [INFO] [config.py:957:print] eigenvalue_verbose ........... False [2023-04-24 13:47:36,762] [INFO] [config.py:957:print] elasticity_enabled ........... False [2023-04-24 13:47:36,762] [INFO] [config.py:957:print] flops_profiler_config ........ { "enabled": false, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null } [2023-04-24 13:47:36,762] [INFO] [config.py:957:print] fp16_auto_cast ............... 
False
[2023-04-24 13:47:36,762] [INFO] [config.py:957:print] fp16_enabled ................. True
[2023-04-24 13:47:36,762] [INFO] [config.py:957:print] fp16_master_weights_and_gradients False
[2023-04-24 13:47:36,762] [INFO] [config.py:957:print] global_rank .................. 0
[2023-04-24 13:47:36,762] [INFO] [config.py:957:print] grad_accum_dtype ............. None
[2023-04-24 13:47:36,762] [INFO] [config.py:957:print] gradient_accumulation_steps .. 8
[2023-04-24 13:47:36,762] [INFO] [config.py:957:print] gradient_clipping ............ 1.0
[2023-04-24 13:47:36,762] [INFO] [config.py:957:print] gradient_predivide_factor .... 1.0
[2023-04-24 13:47:36,762] [INFO] [config.py:957:print] hybrid_engine ................ enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8
[2023-04-24 13:47:36,762] [INFO] [config.py:957:print] initial_dynamic_scale ........ 65536
[2023-04-24 13:47:36,762] [INFO] [config.py:957:print] load_universal_checkpoint .... False
[2023-04-24 13:47:36,762] [INFO] [config.py:957:print] loss_scale ................... 0
[2023-04-24 13:47:36,762] [INFO] [config.py:957:print] memory_breakdown ............. False
[2023-04-24 13:47:36,763] [INFO] [config.py:957:print] monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False
[2023-04-24 13:47:36,763] [INFO] [config.py:957:print] nebula_config ................ { "enabled": false, "persistent_storage_path": null, "persistent_time_interval": 100, "num_of_version_in_retention": 2, "enable_nebula_load": true, "load_path": null }
[2023-04-24 13:47:36,763] [INFO] [config.py:957:print] optimizer_legacy_fusion ...... False
[2023-04-24 13:47:36,763] [INFO] [config.py:957:print] optimizer_name ............... None
[2023-04-24 13:47:36,763] [INFO] [config.py:957:print] optimizer_params ............. None
[2023-04-24 13:47:36,763] [INFO] [config.py:957:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2023-04-24 13:47:36,763] [INFO] [config.py:957:print] pld_enabled .................. False
[2023-04-24 13:47:36,763] [INFO] [config.py:957:print] pld_params ................... False
[2023-04-24 13:47:36,763] [INFO] [config.py:957:print] prescale_gradients ........... False
[2023-04-24 13:47:36,763] [INFO] [config.py:957:print] scheduler_name ............... None
[2023-04-24 13:47:36,763] [INFO] [config.py:957:print] scheduler_params ............. None
[2023-04-24 13:47:36,763] [INFO] [config.py:957:print] sparse_attention ............. None
[2023-04-24 13:47:36,763] [INFO] [config.py:957:print] sparse_gradients_enabled ..... False
[2023-04-24 13:47:36,763] [INFO] [config.py:957:print] steps_per_print .............. 10
[2023-04-24 13:47:36,763] [INFO] [config.py:957:print] train_batch_size ............. 64
[2023-04-24 13:47:36,763] [INFO] [config.py:957:print] train_micro_batch_size_per_gpu 4
[2023-04-24 13:47:36,763] [INFO] [config.py:957:print] use_node_local_storage ....... False
[2023-04-24 13:47:36,763] [INFO] [config.py:957:print] wall_clock_breakdown ......... False
[2023-04-24 13:47:36,763] [INFO] [config.py:957:print] world_size ................... 2
[2023-04-24 13:47:36,763] [INFO] [config.py:957:print] zero_allow_untested_optimizer False
[2023-04-24 13:47:36,763] [INFO] [config.py:957:print] zero_config .................. stage=2 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500,000,000 allgather_partitions=True allgather_bucket_size=500,000,000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=DeepSpeedZeroOffloadParamConfig(device='none', nvme_path=None, buffer_count=5, buffer_size=100,000,000, max_in_cpu=1,000,000,000, pin_memory=False) offload_optimizer=DeepSpeedZeroOffloadOptimizerConfig(device='none', nvme_path=None, buffer_count=4, pin_memory=False, pipeline=False, pipeline_read=False, pipeline_write=False, fast_init=False) sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=30000000 param_persistence_threshold=10000 model_persistence_threshold=sys.maxsize max_live_parameters=30000000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False memory_efficient_linear=False
[2023-04-24 13:47:36,763] [INFO] [config.py:957:print] zero_enabled ................. True
[2023-04-24 13:47:36,763] [INFO] [config.py:957:print] zero_force_ds_cpu_optimizer .. True
[2023-04-24 13:47:36,763] [INFO] [config.py:957:print] zero_optimization_stage ...... 2
[2023-04-24 13:47:36,763] [INFO] [config.py:943:print_user_config] json = { "train_batch_size": 64, "train_micro_batch_size_per_gpu": 4, "steps_per_print": 10, "zero_optimization": { "stage": 2, "offload_param": { "device": "none" }, "offload_optimizer": { "device": "none" }, "stage3_param_persistence_threshold": 1.000000e+04, "stage3_max_live_parameters": 3.000000e+07, "stage3_prefetch_bucket_size": 3.000000e+07, "memory_efficient_linear": false }, "fp16": { "enabled": true, "loss_scale_window": 100 }, "gradient_clipping": 1.0, "prescale_gradients": false, "wall_clock_breakdown": false, "hybrid_engine": { "enabled": false, "max_out_tokens": 512, "inference_tp_size": 1, "release_inference_cache": false, "pin_parameters": true, "tp_gather_partition_size": 8 } }
Using /root/.cache/torch_extensions/py38_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.00041937828063964844 seconds
******************[end] Initialized Critic Model [end] (duration: 7.94s)******************
************************[start] Initializing Reward Model [start] ************************
Using /root/.cache/torch_extensions/py38_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0018873214721679688 seconds
[2023-04-24 13:47:41,376] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed info: version=0.9.0, git-hash=unknown, git-branch=unknown
[2023-04-24 13:47:41,885] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False
[2023-04-24 13:47:41,887] [INFO] [config.py:953:print] DeepSpeedEngine configuration:
[2023-04-24 13:47:41,887] [INFO] [config.py:957:print] activation_checkpointing_config { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false }
[2023-04-24 13:47:41,887] [INFO] [config.py:957:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2023-04-24 13:47:41,887] [INFO] [config.py:957:print] amp_enabled .................. False
[2023-04-24 13:47:41,887] [INFO] [config.py:957:print] amp_params ................... False
[2023-04-24 13:47:41,887] [INFO] [config.py:957:print] autotuning_config ............ { "enabled": false, "start_step": null, "end_step": null, "metric_path": null, "arg_mappings": null, "metric": "throughput", "model_info": null, "results_dir": "autotuning_results", "exps_dir": "autotuning_exps", "overwrite": true, "fast": true, "start_profile_step": 3, "end_profile_step": 5, "tuner_type": "gridsearch", "tuner_early_stopping": 5, "tuner_num_trials": 50, "model_info_path": null, "mp_size": 1, "max_train_batch_size": null, "min_train_batch_size": 1, "max_train_micro_batch_size_per_gpu": 1.024000e+03, "min_train_micro_batch_size_per_gpu": 1, "num_tuning_micro_batch_sizes": 3 }
[2023-04-24 13:47:41,887] [INFO] [config.py:957:print] bfloat16_enabled ............. False
[2023-04-24 13:47:41,887] [INFO] [config.py:957:print] checkpoint_parallel_write_pipeline False
[2023-04-24 13:47:41,887] [INFO] [config.py:957:print] checkpoint_tag_validation_enabled True
[2023-04-24 13:47:41,887] [INFO] [config.py:957:print] checkpoint_tag_validation_fail False
[2023-04-24 13:47:41,887] [INFO] [config.py:957:print] comms_config .................
[2023-04-24 13:47:41,887] [INFO] [config.py:957:print] communication_data_type ...... None
[2023-04-24 13:47:41,887] [INFO] [config.py:957:print] compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}}
[2023-04-24 13:47:41,887] [INFO] [config.py:957:print] curriculum_enabled_legacy .... False
[2023-04-24 13:47:41,887] [INFO] [config.py:957:print] curriculum_params_legacy ..... False
[2023-04-24 13:47:41,887] [INFO] [config.py:957:print] data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}}
[2023-04-24 13:47:41,887] [INFO] [config.py:957:print] data_efficiency_enabled ...... False
[2023-04-24 13:47:41,887] [INFO] [config.py:957:print] dataloader_drop_last ......... False
[2023-04-24 13:47:41,887] [INFO] [config.py:957:print] disable_allgather ............ False
[2023-04-24 13:47:41,888] [INFO] [config.py:957:print] dump_state ................... False
[2023-04-24 13:47:41,888] [INFO] [config.py:957:print] dynamic_loss_scale_args ...... None
[2023-04-24 13:47:41,888] [INFO] [config.py:957:print] eigenvalue_enabled ........... False
[2023-04-24 13:47:41,888] [INFO] [config.py:957:print] eigenvalue_gas_boundary_resolution 1
[2023-04-24 13:47:41,888] [INFO] [config.py:957:print] eigenvalue_layer_name ........ bert.encoder.layer
[2023-04-24 13:47:41,888] [INFO] [config.py:957:print] eigenvalue_layer_num ......... 0
[2023-04-24 13:47:41,888] [INFO] [config.py:957:print] eigenvalue_max_iter .......... 100
[2023-04-24 13:47:41,888] [INFO] [config.py:957:print] eigenvalue_stability ......... 1e-06
[2023-04-24 13:47:41,888] [INFO] [config.py:957:print] eigenvalue_tol ............... 0.01
[2023-04-24 13:47:41,888] [INFO] [config.py:957:print] eigenvalue_verbose ........... False
[2023-04-24 13:47:41,888] [INFO] [config.py:957:print] elasticity_enabled ........... False
[2023-04-24 13:47:41,888] [INFO] [config.py:957:print] flops_profiler_config ........ { "enabled": false, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null }
[2023-04-24 13:47:41,888] [INFO] [config.py:957:print] fp16_auto_cast ............... False
[2023-04-24 13:47:41,888] [INFO] [config.py:957:print] fp16_enabled ................. True
[2023-04-24 13:47:41,888] [INFO] [config.py:957:print] fp16_master_weights_and_gradients False
[2023-04-24 13:47:41,888] [INFO] [config.py:957:print] global_rank .................. 0
[2023-04-24 13:47:41,888] [INFO] [config.py:957:print] grad_accum_dtype ............. None
[2023-04-24 13:47:41,888] [INFO] [config.py:957:print] gradient_accumulation_steps .. 8
[2023-04-24 13:47:41,888] [INFO] [config.py:957:print] gradient_clipping ............ 1.0
[2023-04-24 13:47:41,888] [INFO] [config.py:957:print] gradient_predivide_factor .... 1.0
[2023-04-24 13:47:41,888] [INFO] [config.py:957:print] hybrid_engine ................ enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8
[2023-04-24 13:47:41,888] [INFO] [config.py:957:print] initial_dynamic_scale ........ 65536
[2023-04-24 13:47:41,888] [INFO] [config.py:957:print] load_universal_checkpoint .... False
[2023-04-24 13:47:41,888] [INFO] [config.py:957:print] loss_scale ................... 0
[2023-04-24 13:47:41,888] [INFO] [config.py:957:print] memory_breakdown ............. False
[2023-04-24 13:47:41,888] [INFO] [config.py:957:print] monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False
[2023-04-24 13:47:41,888] [INFO] [config.py:957:print] nebula_config ................ { "enabled": false, "persistent_storage_path": null, "persistent_time_interval": 100, "num_of_version_in_retention": 2, "enable_nebula_load": true, "load_path": null }
[2023-04-24 13:47:41,888] [INFO] [config.py:957:print] optimizer_legacy_fusion ...... False
[2023-04-24 13:47:41,888] [INFO] [config.py:957:print] optimizer_name ............... None
[2023-04-24 13:47:41,888] [INFO] [config.py:957:print] optimizer_params ............. None
[2023-04-24 13:47:41,888] [INFO] [config.py:957:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2023-04-24 13:47:41,888] [INFO] [config.py:957:print] pld_enabled .................. False
[2023-04-24 13:47:41,888] [INFO] [config.py:957:print] pld_params ................... False
[2023-04-24 13:47:41,888] [INFO] [config.py:957:print] prescale_gradients ........... False
[2023-04-24 13:47:41,888] [INFO] [config.py:957:print] scheduler_name ............... None
[2023-04-24 13:47:41,889] [INFO] [config.py:957:print] scheduler_params ............. None
[2023-04-24 13:47:41,889] [INFO] [config.py:957:print] sparse_attention ............. None
[2023-04-24 13:47:41,889] [INFO] [config.py:957:print] sparse_gradients_enabled ..... False
[2023-04-24 13:47:41,889] [INFO] [config.py:957:print] steps_per_print .............. 10
[2023-04-24 13:47:41,889] [INFO] [config.py:957:print] train_batch_size ............. 64
[2023-04-24 13:47:41,889] [INFO] [config.py:957:print] train_micro_batch_size_per_gpu 4
[2023-04-24 13:47:41,889] [INFO] [config.py:957:print] use_node_local_storage ....... False
[2023-04-24 13:47:41,889] [INFO] [config.py:957:print] wall_clock_breakdown ......... False
[2023-04-24 13:47:41,889] [INFO] [config.py:957:print] world_size ................... 2
[2023-04-24 13:47:41,889] [INFO] [config.py:957:print] zero_allow_untested_optimizer False
[2023-04-24 13:47:41,889] [INFO] [config.py:957:print] zero_config .................. stage=0 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500,000,000 allgather_partitions=True allgather_bucket_size=500,000,000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=DeepSpeedZeroOffloadParamConfig(device='none', nvme_path=None, buffer_count=5, buffer_size=100,000,000, max_in_cpu=1,000,000,000, pin_memory=False) offload_optimizer=None sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50,000,000 param_persistence_threshold=10000 model_persistence_threshold=sys.maxsize max_live_parameters=1,000,000,000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False memory_efficient_linear=False
[2023-04-24 13:47:41,889] [INFO] [config.py:957:print] zero_enabled ................. False
[2023-04-24 13:47:41,889] [INFO] [config.py:957:print] zero_force_ds_cpu_optimizer .. True
[2023-04-24 13:47:41,889] [INFO] [config.py:957:print] zero_optimization_stage ...... 0
[2023-04-24 13:47:41,889] [INFO] [config.py:943:print_user_config] json = { "train_batch_size": 64, "train_micro_batch_size_per_gpu": 4, "steps_per_print": 10, "zero_optimization": { "stage": 0, "stage3_param_persistence_threshold": 1.000000e+04, "offload_param": { "device": "none" }, "memory_efficient_linear": false }, "fp16": { "enabled": true }, "gradient_clipping": 1.0, "prescale_gradients": false, "wall_clock_breakdown": false }
Using /root/.cache/torch_extensions/py38_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.0007071495056152344 seconds
******************[end] Initialized Reward Model [end] (duration: 5.13s)******************
***** Running training *****
Beginning of Epoch 1/1, Total Generation Batches 3813
------------------------------------------------------
Free memory : 20.200195 (GigaBytes)
Total memory: 39.423828 (GigaBytes)
Requested memory: 0.515625 (GigaBytes)
Setting maximum total tokens (input + output) to 512
WorkSpace: 0x7fe12a000000
------------------------------------------------------
epoch: 0|step: 0|ppo_ep: 1|act_loss: -0.1026611328125|cri_loss: -0.013671875|unsuper_loss: 0.0
average reward score: 1.009765625
-------------------------------------------------------------------------------------
|E2E latency=3.88s |Gather latency=0.00s (0.00%) |Generate time=3.02s (77.76%) |Training time=0.42s (10.80%) |Others=0.44 (11.44%)|CurSamplesPerSec=2.06 |AvgSamplesPerSec=2.06
epoch: 0|step: 1|ppo_ep: 1|act_loss: 0.2254638671875|cri_loss: 0.197998046875|unsuper_loss: 0.0
average reward score: -1.08984375
-------------------------------------------------------------------------------------
|E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.28s (69.91%) |Training time=0.55s (16.97%) |Others=0.43 (13.13%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.24
epoch: 0|step: 2|ppo_ep: 1|act_loss: 0.515625|cri_loss: 0.326416015625|unsuper_loss: 0.0
average reward score: -1.33203125
-------------------------------------------------------------------------------------
|E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.27%) |Training time=0.41s (12.54%) |Others=0.43 (13.19%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.31
epoch: 0|step: 3|ppo_ep: 1|act_loss: 0.03265380859375|cri_loss: 0.037841796875|unsuper_loss: 0.0
average reward score: -2.294921875
-------------------------------------------------------------------------------------
|E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.39s (73.06%) |Training time=0.46s (14.03%) |Others=0.42 (12.91%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.34
epoch: 0|step: 4|ppo_ep: 1|act_loss: 0.4609375|cri_loss: 0.26806640625|unsuper_loss: 0.0
average reward score: -3.42578125
-------------------------------------------------------------------------------------
|E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.15%) |Training time=0.41s (12.72%) |Others=0.42 (13.13%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.37
epoch: 0|step: 5|ppo_ep: 1|act_loss: 0.26708984375|cri_loss: 0.1539306640625|unsuper_loss: 0.0
average reward score: -2.1015625
-------------------------------------------------------------------------------------
|E2E latency=3.16s |Gather latency=0.00s (0.00%) |Generate time=2.30s (72.80%) |Training time=0.43s (13.75%) |Others=0.42 (13.46%)|CurSamplesPerSec=2.53 |AvgSamplesPerSec=2.40
epoch: 0|step: 6|ppo_ep: 1|act_loss: 0.2822265625|cri_loss: 0.15966796875|unsuper_loss: 0.0
average reward score: -1.7958984375
-------------------------------------------------------------------------------------
|E2E latency=3.16s |Gather latency=0.00s (0.00%) |Generate time=2.30s (72.87%) |Training time=0.43s (13.73%) |Others=0.42 (13.40%)|CurSamplesPerSec=2.53 |AvgSamplesPerSec=2.41
[2023-04-24 13:48:08,669] [INFO] [loss_scaler.py:188:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, but hysteresis is 2. Reducing hysteresis to 1
epoch: 0|step: 7|ppo_ep: 1|act_loss: 0.270263671875|cri_loss: 0.157958984375|unsuper_loss: 0.0
average reward score: -2.025390625
-------------------------------------------------------------------------------------
|E2E latency=3.54s |Gather latency=0.00s (0.00%) |Generate time=2.30s (64.94%) |Training time=0.78s (21.90%) |Others=0.47 (13.16%)|CurSamplesPerSec=2.26 |AvgSamplesPerSec=2.39
epoch: 0|step: 8|ppo_ep: 1|act_loss: 0.36181640625|cri_loss: 0.2052001953125|unsuper_loss: 0.0
average reward score: -1.4013671875
-------------------------------------------------------------------------------------
|E2E latency=3.18s |Gather latency=0.00s (0.00%) |Generate time=2.31s (72.86%) |Training time=0.43s (13.51%) |Others=0.43 (13.63%)|CurSamplesPerSec=2.52 |AvgSamplesPerSec=2.41
epoch: 0|step: 9|ppo_ep: 1|act_loss: 0.35693359375|cri_loss: 0.25244140625|unsuper_loss: 0.0
average reward score: -1.5322265625
-------------------------------------------------------------------------------------
|E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.36%) |Training time=0.40s (12.53%) |Others=0.42 (13.11%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.41
epoch: 0|step: 10|ppo_ep: 1|act_loss: 0.20751953125|cri_loss: 0.12066650390625|unsuper_loss: 0.0
average reward score: -1.716796875
-------------------------------------------------------------------------------------
|E2E latency=3.15s |Gather latency=0.00s (0.00%) |Generate time=2.31s (73.39%) |Training time=0.41s (12.93%) |Others=0.43 (13.68%)|CurSamplesPerSec=2.54 |AvgSamplesPerSec=2.42
epoch: 0|step: 11|ppo_ep: 1|act_loss: 0.142578125|cri_loss: 0.11029052734375|unsuper_loss: 0.0
average reward score: 0.05224609375
-------------------------------------------------------------------------------------
|E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.40s (73.41%) |Training time=0.41s (12.43%) |Others=0.46 (14.16%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.43
epoch: 0|step: 12|ppo_ep: 1|act_loss: 0.188720703125|cri_loss: 0.127197265625|unsuper_loss: 0.0
average reward score: -0.81884765625
-------------------------------------------------------------------------------------
|E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.47s (74.78%) |Training time=0.41s (12.28%) |Others=0.43 (12.94%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.43
epoch: 0|step: 13|ppo_ep: 1|act_loss: 0.095458984375|cri_loss: 0.10296630859375|unsuper_loss: 0.0
average reward score: 0.8662109375
-------------------------------------------------------------------------------------
|E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.34s (72.55%) |Training time=0.46s (14.24%) |Others=0.43 (13.22%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.43
epoch: 0|step: 14|ppo_ep: 1|act_loss: 0.2081298828125|cri_loss: 0.1314697265625|unsuper_loss: 0.0
average reward score: 1.025390625
-------------------------------------------------------------------------------------
|E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.41%) |Training time=0.41s (12.43%) |Others=0.44 (13.16%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.43
[2023-04-24 13:48:34,517] [INFO] [loss_scaler.py:188:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, but hysteresis is 2. Reducing hysteresis to 1
[2023-04-24 13:48:34,704] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, reducing to 32768
epoch: 0|step: 15|ppo_ep: 1|act_loss: 0.6826171875|cri_loss: 0.4306640625|unsuper_loss: 0.0
average reward score: -2.521484375
-------------------------------------------------------------------------------------
|E2E latency=3.36s |Gather latency=0.00s (0.00%) |Generate time=2.48s (73.81%) |Training time=0.67s (20.04%) |Others=0.21 (6.15%)|CurSamplesPerSec=2.38 |AvgSamplesPerSec=2.43
epoch: 0|step: 16|ppo_ep: 1|act_loss: 0.3896484375|cri_loss: 0.259033203125|unsuper_loss: 0.0
average reward score: -0.041015625
-------------------------------------------------------------------------------------
|E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.32%) |Training time=0.64s (19.77%) |Others=0.19 (5.91%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.43
epoch: 0|step: 17|ppo_ep: 1|act_loss: 0.19189453125|cri_loss: 0.1219482421875|unsuper_loss: 0.0
average reward score: -1.650390625
-------------------------------------------------------------------------------------
|E2E latency=3.35s |Gather latency=0.00s (0.00%) |Generate time=2.40s (71.85%) |Training time=0.75s (22.27%) |Others=0.20 (5.88%)|CurSamplesPerSec=2.39 |AvgSamplesPerSec=2.43
epoch: 0|step: 18|ppo_ep: 1|act_loss: 0.50927734375|cri_loss: 0.29638671875|unsuper_loss: 0.0
average reward score: -1.5625
-------------------------------------------------------------------------------------
|E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.28%) |Training time=0.64s (19.79%) |Others=0.19 (5.93%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.43
epoch: 0|step: 19|ppo_ep: 1|act_loss: 0.322509765625|cri_loss: 0.19921875|unsuper_loss: 0.0
average reward score: -2.24609375
-------------------------------------------------------------------------------------
|E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.05%) |Training time=0.64s (19.72%) |Others=0.20 (6.23%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.43
epoch: 0|step: 20|ppo_ep: 1|act_loss: 0.3232421875|cri_loss: 0.224609375|unsuper_loss: 0.0
average reward score: -1.021484375
-------------------------------------------------------------------------------------
|E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.41%) |Training time=0.64s (19.61%) |Others=0.20 (5.98%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.43
epoch: 0|step: 21|ppo_ep: 1|act_loss: 0.481201171875|cri_loss: 0.280517578125|unsuper_loss: 0.0
average reward score: -2.18359375
-------------------------------------------------------------------------------------
|E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.56%) |Training time=0.64s (19.44%) |Others=0.20 (6.00%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.43
epoch: 0|step: 22|ppo_ep: 1|act_loss: 0.149169921875|cri_loss: 0.102294921875|unsuper_loss: 0.0
average reward score: -0.18798828125
-------------------------------------------------------------------------------------
|E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.30%) |Training time=0.64s (19.78%) |Others=0.19 (5.92%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.43
epoch: 0|step: 23|ppo_ep: 1|act_loss: 0.329833984375|cri_loss: 0.209228515625|unsuper_loss: 0.0
average reward score: 0.2353515625
-------------------------------------------------------------------------------------
|E2E latency=3.58s |Gather latency=0.00s (0.00%) |Generate time=2.37s (66.33%) |Training time=0.93s (25.98%) |Others=0.28 (7.69%)|CurSamplesPerSec=2.24 |AvgSamplesPerSec=2.42
epoch: 0|step: 24|ppo_ep: 1|act_loss: 0.2724609375|cri_loss: 0.15869140625|unsuper_loss: 0.0
average reward score: -2.400390625
-------------------------------------------------------------------------------------
|E2E latency=3.37s |Gather latency=0.00s (0.00%) |Generate time=2.53s (75.28%) |Training time=0.64s (18.94%) |Others=0.19 (5.77%)|CurSamplesPerSec=2.38 |AvgSamplesPerSec=2.42
epoch: 0|step: 25|ppo_ep: 1|act_loss: 0.48828125|cri_loss: 0.2744140625|unsuper_loss: 0.0
average reward score: -1.833984375
-------------------------------------------------------------------------------------
|E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.30%) |Training time=0.64s (19.66%) |Others=0.20 (6.04%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42
epoch: 0|step: 26|ppo_ep: 1|act_loss: 0.3310546875|cri_loss: 0.205810546875|unsuper_loss: 0.0
average reward score: -1.357421875
-------------------------------------------------------------------------------------
|E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.65%) |Training time=0.64s (19.34%) |Others=0.20 (6.01%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.42
epoch: 0|step: 27|ppo_ep: 1|act_loss: 0.292236328125|cri_loss: 0.201171875|unsuper_loss: 0.0
average reward score: -2.30859375
-------------------------------------------------------------------------------------
|E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.38s (74.13%) |Training time=0.64s (19.88%) |Others=0.19 (5.99%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.43
epoch: 0|step: 28|ppo_ep: 1|act_loss: 0.357421875|cri_loss: 0.2344970703125|unsuper_loss: 0.0
average reward score: -1.1953125
-------------------------------------------------------------------------------------
|E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.39s (73.63%) |Training time=0.65s (20.00%) |Others=0.21 (6.37%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.43
epoch: 0|step: 29|ppo_ep: 1|act_loss: 0.1160888671875|cri_loss: 0.092041015625|unsuper_loss: 0.0
average reward score: -1.5625
-------------------------------------------------------------------------------------
|E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.08%) |Training time=0.65s (20.06%) |Others=0.19 (5.86%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.43
epoch: 0|step: 30|ppo_ep: 1|act_loss: 0.10723876953125|cri_loss: 0.09051513671875|unsuper_loss: 0.0
average reward score: 0.38720703125
-------------------------------------------------------------------------------------
|E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.37s (73.94%) |Training time=0.64s (20.04%) |Others=0.19 (6.02%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.43
[2023-04-24 13:49:27,135] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, reducing to 32768
epoch: 0|step: 31|ppo_ep: 1|act_loss: 0.466552734375|cri_loss: 0.35888671875|unsuper_loss: 0.0
average reward score: -0.49462890625
-------------------------------------------------------------------------------------
|E2E latency=3.42s |Gather latency=0.00s (0.00%) |Generate time=2.49s (72.71%) |Training time=0.65s (19.12%) |Others=0.28 (8.17%)|CurSamplesPerSec=2.34 |AvgSamplesPerSec=2.43
epoch: 0|step: 32|ppo_ep: 1|act_loss: 0.282470703125|cri_loss: 0.166015625|unsuper_loss: 0.0
average reward score: -1.283203125
-------------------------------------------------------------------------------------
|E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.34s (72.69%) |Training time=0.64s (20.03%) |Others=0.23 (7.28%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.43
epoch: 0|step: 33|ppo_ep: 1|act_loss: 0.4033203125|cri_loss: 0.296142578125|unsuper_loss: 0.0
average reward score: -0.029541015625
-------------------------------------------------------------------------------------
|E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.35%) |Training time=0.64s (19.72%) |Others=0.19 (5.94%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.43
epoch: 0|step: 34|ppo_ep: 1|act_loss: 0.20166015625|cri_loss: 0.1737060546875|unsuper_loss: 0.0
average reward score: -0.6552734375
-------------------------------------------------------------------------------------
|E2E latency=3.16s |Gather latency=0.00s (0.00%) |Generate time=2.32s (73.54%) |Training time=0.64s (20.17%) |Others=0.20 (6.29%)|CurSamplesPerSec=2.54 |AvgSamplesPerSec=2.43
epoch: 0|step: 35|ppo_ep: 1|act_loss: 0.18994140625|cri_loss: 0.1446533203125|unsuper_loss: 0.0
average reward score: 1.1455078125
-------------------------------------------------------------------------------------
|E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.41%) |Training time=0.64s (19.65%) |Others=0.19 (5.94%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.43
epoch: 0|step: 36|ppo_ep: 1|act_loss: 0.277099609375|cri_loss: 0.190673828125|unsuper_loss: 0.0
average reward score: -0.132080078125
-------------------------------------------------------------------------------------
|E2E latency=3.34s |Gather latency=0.00s (0.00%) |Generate time=2.48s (74.27%) |Training time=0.66s (19.89%) |Others=0.20 (5.84%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.43
epoch: 0|step: 37|ppo_ep: 1|act_loss: 0.3974609375|cri_loss: 0.25|unsuper_loss: 0.0
average reward score: -2.375
-------------------------------------------------------------------------------------
|E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.37s (73.90%) |Training time=0.64s (20.04%) |Others=0.19 (6.06%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.43
epoch: 0|step: 38|ppo_ep: 1|act_loss: 0.3408203125|cri_loss: 0.1827392578125|unsuper_loss: 0.0
average reward score: -2.771484375
-------------------------------------------------------------------------------------
|E2E latency=3.17s |Gather latency=0.00s (0.00%) |Generate time=2.33s (73.35%) |Training time=0.65s (20.43%) |Others=0.20 (6.23%)|CurSamplesPerSec=2.52 |AvgSamplesPerSec=2.44
epoch: 0|step: 39|ppo_ep: 1|act_loss: 0.3193359375|cri_loss: 0.218994140625|unsuper_loss: 0.0
average reward score: -0.4091796875
-------------------------------------------------------------------------------------
|E2E latency=3.55s |Gather latency=0.00s (0.00%) |Generate time=2.34s (65.88%) |Training time=0.93s (26.21%) |Others=0.28 (7.91%)|CurSamplesPerSec=2.26 |AvgSamplesPerSec=2.43
epoch: 0|step: 40|ppo_ep: 1|act_loss: 0.35595703125|cri_loss: 0.2388916015625|unsuper_loss: 0.0
average reward score: -1.6884765625
-------------------------------------------------------------------------------------
|E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.38s (74.04%) |Training time=0.64s (19.92%) |Others=0.19 (6.03%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.43
epoch: 0|step: 41|ppo_ep: 1|act_loss: 0.3408203125|cri_loss: 0.1982421875|unsuper_loss: 0.0
average reward score: -0.82373046875
-------------------------------------------------------------------------------------
|E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.12%) |Training time=0.64s (19.75%) |Others=0.20 (6.13%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.43
epoch: 0|step: 42|ppo_ep: 1|act_loss: 0.48388671875|cri_loss: 0.27392578125|unsuper_loss: 0.0
average reward score: -0.67041015625
-------------------------------------------------------------------------------------
|E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.18%) |Training time=0.64s (19.84%) |Others=0.19 (5.98%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.43
epoch: 0|step: 43|ppo_ep: 1|act_loss: 0.39990234375|cri_loss: 0.235107421875|unsuper_loss: 0.0
average reward score: -0.459228515625
-------------------------------------------------------------------------------------
|E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.55%) |Training time=0.64s (19.57%) |Others=0.19 (5.88%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.43
epoch: 0|step: 44|ppo_ep: 1|act_loss: 0.25830078125|cri_loss: 0.153076171875|unsuper_loss: 0.0
average reward score: -2.810546875
-------------------------------------------------------------------------------------
|E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.40s (73.36%) |Training time=0.66s (20.32%) |Others=0.21 (6.32%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.44
epoch: 0|step: 45|ppo_ep: 1|act_loss: 0.252685546875|cri_loss: 0.2120361328125|unsuper_loss: 0.0
average reward score: -1.412109375
-------------------------------------------------------------------------------------
|E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.30%) |Training time=0.64s (19.70%) |Others=0.20 (6.00%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.44
epoch: 0|step: 46|ppo_ep: 1|act_loss: 0.4296875|cri_loss: 0.265869140625|unsuper_loss: 0.0
average reward score: -2.18359375
-------------------------------------------------------------------------------------
|E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.44%) |Training time=0.64s (19.73%) |Others=0.19 (5.83%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.44
epoch: 0|step: 47|ppo_ep: 1|act_loss: 0.11328125|cri_loss: 0.11639404296875|unsuper_loss: 0.0
average reward score: 1.162109375
-------------------------------------------------------------------------------------
|E2E latency=3.64s |Gather latency=0.00s (0.00%) |Generate time=2.42s (66.29%) |Training time=0.95s (25.97%) |Others=0.28 (7.74%)|CurSamplesPerSec=2.20 |AvgSamplesPerSec=2.43
epoch: 0|step: 48|ppo_ep: 1|act_loss: 0.431640625|cri_loss: 0.27001953125|unsuper_loss: 0.0
average reward score: -2.359375
-------------------------------------------------------------------------------------
|E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.77%) |Training time=0.64s (19.38%) |Others=0.19 (5.85%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.43
epoch: 0|step: 49|ppo_ep: 1|act_loss: 0.2174072265625|cri_loss: 0.133056640625|unsuper_loss: 0.0
average reward score: -0.146240234375
-------------------------------------------------------------------------------------
|E2E latency=3.36s |Gather latency=0.00s (0.00%) |Generate time=2.51s (74.77%) |Training time=0.65s (19.45%) |Others=0.19 (5.78%)|CurSamplesPerSec=2.38 |AvgSamplesPerSec=2.43
epoch: 0|step: 50|ppo_ep: 1|act_loss: 0.208251953125|cri_loss: 0.119873046875|unsuper_loss: 0.0
average reward score: -2.7265625
-------------------------------------------------------------------------------------
|E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.53%) |Training time=0.64s (19.59%) |Others=0.19 (5.88%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.43
epoch: 0|step: 51|ppo_ep: 1|act_loss: 0.490234375|cri_loss: 0.28076171875|unsuper_loss: 0.0
average reward score: -2.005859375
-------------------------------------------------------------------------------------
|E2E latency=3.45s |Gather latency=0.00s (0.00%) |Generate time=2.60s (75.17%) |Training time=0.64s (18.56%) |Others=0.22 (6.28%)|CurSamplesPerSec=2.32 |AvgSamplesPerSec=2.43
epoch: 0|step: 52|ppo_ep: 1|act_loss: 0.2333984375|cri_loss: 0.162841796875|unsuper_loss: 0.0
average reward score: -0.7734375
-------------------------------------------------------------------------------------
|E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.59%) |Training time=0.64s (19.52%) |Others=0.19 (5.90%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.43
epoch: 0|step: 53|ppo_ep: 1|act_loss: 0.33740234375|cri_loss: 0.281494140625|unsuper_loss: 0.0
average reward score: 0.48486328125
-------------------------------------------------------------------------------------
|E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.52%) |Training time=0.64s (19.67%) |Others=0.19 (5.81%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.43
epoch: 0|step: 54|ppo_ep: 1|act_loss: 0.409912109375|cri_loss: 0.2431640625|unsuper_loss: 0.0
average reward score: -1.576171875
-------------------------------------------------------------------------------------
|E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.28%) |Training time=0.64s (19.67%) |Others=0.20 (6.05%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.43
[2023-04-24 13:50:46,150] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step.
Attempted loss scale: 32768, reducing to 16384 epoch: 0|step: 55|ppo_ep: 1|act_loss: 0.299560546875|cri_loss: 0.184326171875|unsuper_loss: 0.0 average reward score: -0.03076171875 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.39s (71.94%) |Training time=0.66s (19.77%) |Others=0.28 (8.29%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.43 epoch: 0|step: 56|ppo_ep: 1|act_loss: -0.210693359375|cri_loss: -0.040283203125|unsuper_loss: 0.0 average reward score: 0.161376953125 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.38s (74.22%) |Training time=0.64s (19.89%) |Others=0.19 (5.89%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.43 epoch: 0|step: 57|ppo_ep: 1|act_loss: 0.12408447265625|cri_loss: 0.08526611328125|unsuper_loss: 0.0 average reward score: 0.2900390625 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.39s (73.93%) |Training time=0.65s (20.08%) |Others=0.19 (5.99%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.43 epoch: 0|step: 58|ppo_ep: 1|act_loss: 0.196533203125|cri_loss: 0.1134033203125|unsuper_loss: 0.0 average reward score: -1.5810546875 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.38s (73.88%) |Training time=0.64s (19.99%) |Others=0.20 (6.14%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.43 epoch: 0|step: 59|ppo_ep: 1|act_loss: 0.1123046875|cri_loss: 0.090576171875|unsuper_loss: 0.0 average reward score: -1.1875 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.22%) |Training time=0.64s (19.90%) |Others=0.19 (5.87%)|CurSamplesPerSec=2.48 
|AvgSamplesPerSec=2.43 epoch: 0|step: 60|ppo_ep: 1|act_loss: 0.345947265625|cri_loss: 0.2099609375|unsuper_loss: 0.0 average reward score: 0.1337890625 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.38s (72.58%) |Training time=0.70s (21.47%) |Others=0.20 (5.95%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.43 epoch: 0|step: 61|ppo_ep: 1|act_loss: 0.46728515625|cri_loss: 0.28857421875|unsuper_loss: 0.0 average reward score: -1.177734375 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.38s (73.42%) |Training time=0.66s (20.45%) |Others=0.20 (6.13%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.43 epoch: 0|step: 62|ppo_ep: 1|act_loss: 0.3271484375|cri_loss: 0.19482421875|unsuper_loss: 0.0 average reward score: -0.890625 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.38s (73.90%) |Training time=0.65s (20.10%) |Others=0.19 (6.00%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.43 epoch: 0|step: 63|ppo_ep: 1|act_loss: 0.23291015625|cri_loss: 0.15673828125|unsuper_loss: 0.0 average reward score: -0.82080078125 ------------------------------------------------------------------------------------- |E2E latency=3.59s |Gather latency=0.00s (0.00%) |Generate time=2.39s (66.59%) |Training time=0.92s (25.70%) |Others=0.28 (7.71%)|CurSamplesPerSec=2.23 |AvgSamplesPerSec=2.43 epoch: 0|step: 64|ppo_ep: 1|act_loss: 0.22216796875|cri_loss: 0.181884765625|unsuper_loss: 0.0 average reward score: -1.720703125 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.38s (74.12%) |Training time=0.63s (19.80%) |Others=0.19 (6.08%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.43 epoch: 0|step: 65|ppo_ep: 
1|act_loss: 0.34912109375|cri_loss: 0.261962890625|unsuper_loss: 0.0 average reward score: -1.814453125 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.39s (73.70%) |Training time=0.64s (19.77%) |Others=0.21 (6.52%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.43 epoch: 0|step: 66|ppo_ep: 1|act_loss: 0.2410888671875|cri_loss: 0.1336669921875|unsuper_loss: 0.0 average reward score: -0.98681640625 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.43%) |Training time=0.64s (19.59%) |Others=0.20 (5.99%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.43 epoch: 0|step: 67|ppo_ep: 1|act_loss: 0.28271484375|cri_loss: 0.188720703125|unsuper_loss: 0.0 average reward score: -2.6484375 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.24%) |Training time=0.64s (19.84%) |Others=0.19 (5.92%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.43 epoch: 0|step: 68|ppo_ep: 1|act_loss: 0.4365234375|cri_loss: 0.328857421875|unsuper_loss: 0.0 average reward score: -0.2666015625 ------------------------------------------------------------------------------------- |E2E latency=3.19s |Gather latency=0.00s (0.00%) |Generate time=2.36s (73.82%) |Training time=0.64s (20.03%) |Others=0.20 (6.15%)|CurSamplesPerSec=2.51 |AvgSamplesPerSec=2.43 epoch: 0|step: 69|ppo_ep: 1|act_loss: 0.396240234375|cri_loss: 0.230712890625|unsuper_loss: 0.0 average reward score: -1.9462890625 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.37s (73.85%) |Training time=0.64s (20.03%) |Others=0.20 (6.12%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.43 epoch: 0|step: 70|ppo_ep: 1|act_loss: 0.37841796875|cri_loss: 
0.205810546875|unsuper_loss: 0.0 average reward score: -2.21875 ------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.36s (73.90%) |Training time=0.64s (19.95%) |Others=0.20 (6.16%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.44 epoch: 0|step: 71|ppo_ep: 1|act_loss: 0.2880859375|cri_loss: 0.1964111328125|unsuper_loss: 0.0 average reward score: -1.322265625 ------------------------------------------------------------------------------------- |E2E latency=3.62s |Gather latency=0.00s (0.00%) |Generate time=2.37s (65.64%) |Training time=0.94s (25.91%) |Others=0.31 (8.45%)|CurSamplesPerSec=2.21 |AvgSamplesPerSec=2.43 epoch: 0|step: 72|ppo_ep: 1|act_loss: 0.288330078125|cri_loss: 0.1658935546875|unsuper_loss: 0.0 average reward score: -0.5546875 ------------------------------------------------------------------------------------- |E2E latency=3.34s |Gather latency=0.00s (0.00%) |Generate time=2.35s (70.51%) |Training time=0.79s (23.74%) |Others=0.19 (5.75%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.43 epoch: 0|step: 73|ppo_ep: 1|act_loss: 0.006744384765625|cri_loss: 0.03021240234375|unsuper_loss: 0.0 average reward score: -2.177734375 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.37%) |Training time=0.64s (19.73%) |Others=0.19 (5.90%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.43 epoch: 0|step: 74|ppo_ep: 1|act_loss: 0.19677734375|cri_loss: 0.1204833984375|unsuper_loss: 0.0 average reward score: -1.6181640625 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.21%) |Training time=0.64s (19.82%) |Others=0.19 (5.97%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.43 epoch: 0|step: 75|ppo_ep: 1|act_loss: 0.2286376953125|cri_loss: 0.12225341796875|unsuper_loss: 0.0 
average reward score: -0.91162109375 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.29%) |Training time=0.64s (19.70%) |Others=0.19 (6.01%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.43 epoch: 0|step: 76|ppo_ep: 1|act_loss: 0.29736328125|cri_loss: 0.169677734375|unsuper_loss: 0.0 average reward score: -0.91748046875 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.41s (73.93%) |Training time=0.65s (20.02%) |Others=0.20 (6.05%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.43 epoch: 0|step: 77|ppo_ep: 1|act_loss: 0.119140625|cri_loss: 0.077880859375|unsuper_loss: 0.0 average reward score: -1.93359375 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.33s (71.63%) |Training time=0.73s (22.43%) |Others=0.19 (5.94%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.43 epoch: 0|step: 78|ppo_ep: 1|act_loss: 0.384521484375|cri_loss: 0.215576171875|unsuper_loss: 0.0 average reward score: -3.107421875 ------------------------------------------------------------------------------------- |E2E latency=3.33s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.03%) |Training time=0.64s (19.35%) |Others=0.22 (6.62%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.43 [2023-04-24 13:52:04,756] [INFO] [logging.py:96:log_dist] [Rank 0] step=10, skipped=3, lr=[6.755000000000001e-07, 6.755000000000001e-07], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-04-24 13:52:05,001] [INFO] [timer.py:199:stop] epoch=0/micro_step=80/global_step=10, RunningAvgSamplesPerSec=15.87852486249518, CurrSamplesPerSec=14.913515866434993, MemAllocated=20.44GB, MaxMemAllocated=31.45GB [2023-04-24 13:52:05,204] [INFO] [logging.py:96:log_dist] [Rank 0] step=10, skipped=2, lr=[4.0000000000000003e-07, 4.0000000000000003e-07], 
mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 79|ppo_ep: 1|act_loss: 0.36474609375|cri_loss: 0.212890625|unsuper_loss: 0.0 average reward score: -1.8115234375 ------------------------------------------------------------------------------------- |E2E latency=3.60s |Gather latency=0.00s (0.00%) |Generate time=2.40s (66.55%) |Training time=0.93s (25.72%) |Others=0.28 (7.73%)|CurSamplesPerSec=2.22 |AvgSamplesPerSec=2.43 epoch: 0|step: 80|ppo_ep: 1|act_loss: 0.436279296875|cri_loss: 0.27978515625|unsuper_loss: 0.0 average reward score: -1.1806640625 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.38s (73.36%) |Training time=0.68s (20.83%) |Others=0.19 (5.81%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.43 epoch: 0|step: 81|ppo_ep: 1|act_loss: 0.1502685546875|cri_loss: 0.0931396484375|unsuper_loss: 0.0 average reward score: -3.20703125 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.38s (74.18%) |Training time=0.64s (19.88%) |Others=0.19 (5.94%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.43 epoch: 0|step: 82|ppo_ep: 1|act_loss: 0.2135009765625|cri_loss: 0.162109375|unsuper_loss: 0.0 average reward score: -0.62890625 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.17%) |Training time=0.64s (19.80%) |Others=0.19 (6.03%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.43 epoch: 0|step: 83|ppo_ep: 1|act_loss: 0.117919921875|cri_loss: 0.083984375|unsuper_loss: 0.0 average reward score: -1.5126953125 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.37s (73.78%) |Training time=0.65s (20.28%) |Others=0.19 (5.94%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.43 epoch: 0|step: 
84|ppo_ep: 1|act_loss: 0.26708984375|cri_loss: 0.1605224609375|unsuper_loss: 0.0 average reward score: -2.171875 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.37s (74.02%) |Training time=0.64s (19.88%) |Others=0.20 (6.10%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.43 epoch: 0|step: 85|ppo_ep: 1|act_loss: 0.15283203125|cri_loss: 0.1292724609375|unsuper_loss: 0.0 average reward score: -0.414794921875 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.34%) |Training time=0.64s (19.80%) |Others=0.19 (5.85%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.43 epoch: 0|step: 86|ppo_ep: 1|act_loss: 0.27294921875|cri_loss: 0.1805419921875|unsuper_loss: 0.0 average reward score: -2.59375 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.38s (73.84%) |Training time=0.65s (20.14%) |Others=0.19 (6.01%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.43 epoch: 0|step: 87|ppo_ep: 1|act_loss: 0.35498046875|cri_loss: 0.2587890625|unsuper_loss: 0.0 average reward score: -1.3369140625 ------------------------------------------------------------------------------------- |E2E latency=5.99s |Gather latency=0.00s (0.00%) |Generate time=2.36s (39.42%) |Training time=3.19s (53.32%) |Others=0.43 (7.25%)|CurSamplesPerSec=1.34 |AvgSamplesPerSec=2.41 epoch: 0|step: 88|ppo_ep: 1|act_loss: 0.04296875|cri_loss: 0.06475830078125|unsuper_loss: 0.0 average reward score: 1.119140625 ------------------------------------------------------------------------------------- |E2E latency=3.35s |Gather latency=0.00s (0.00%) |Generate time=2.52s (75.14%) |Training time=0.64s (19.10%) |Others=0.19 (5.76%)|CurSamplesPerSec=2.38 |AvgSamplesPerSec=2.41 epoch: 0|step: 89|ppo_ep: 1|act_loss: 
0.00115966796875|cri_loss: 0.039031982421875|unsuper_loss: 0.0 average reward score: -0.517578125 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.29%) |Training time=0.64s (19.71%) |Others=0.19 (6.00%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 90|ppo_ep: 1|act_loss: -0.2276611328125|cri_loss: -0.00244140625|unsuper_loss: 0.0 average reward score: -1.873046875 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.27%) |Training time=0.64s (19.73%) |Others=0.20 (6.00%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 91|ppo_ep: 1|act_loss: -0.2476806640625|cri_loss: -0.0487060546875|unsuper_loss: 0.0 average reward score: -1.9609375 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.12%) |Training time=0.64s (19.92%) |Others=0.19 (5.96%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 92|ppo_ep: 1|act_loss: -0.5341796875|cri_loss: -0.1044921875|unsuper_loss: 0.0 average reward score: -2.37890625 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.39s (73.98%) |Training time=0.64s (19.87%) |Others=0.20 (6.15%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 93|ppo_ep: 1|act_loss: -0.1387939453125|cri_loss: -0.02069091796875|unsuper_loss: 0.0 average reward score: -2.11328125 ------------------------------------------------------------------------------------- |E2E latency=3.17s |Gather latency=0.00s (0.00%) |Generate time=2.34s (73.73%) |Training time=0.64s (20.18%) |Others=0.19 (6.09%)|CurSamplesPerSec=2.52 |AvgSamplesPerSec=2.42 epoch: 0|step: 94|ppo_ep: 1|act_loss: -0.1142578125|cri_loss: 
0.00653076171875|unsuper_loss: 0.0 average reward score: -1.57421875 ------------------------------------------------------------------------------------- |E2E latency=3.35s |Gather latency=0.00s (0.00%) |Generate time=2.36s (70.40%) |Training time=0.79s (23.66%) |Others=0.20 (5.95%)|CurSamplesPerSec=2.39 |AvgSamplesPerSec=2.42 [2023-04-24 13:52:59,848] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 16384, reducing to 8192 epoch: 0|step: 95|ppo_ep: 1|act_loss: 0.22265625|cri_loss: 0.145751953125|unsuper_loss: 0.0 average reward score: -1.837890625 ------------------------------------------------------------------------------------- |E2E latency=3.47s |Gather latency=0.00s (0.00%) |Generate time=2.32s (67.05%) |Training time=0.86s (24.93%) |Others=0.28 (8.01%)|CurSamplesPerSec=2.31 |AvgSamplesPerSec=2.41 epoch: 0|step: 96|ppo_ep: 1|act_loss: -0.56884765625|cri_loss: -0.1773681640625|unsuper_loss: 0.0 average reward score: -3.82421875 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.47%) |Training time=0.64s (19.68%) |Others=0.19 (5.85%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.41 epoch: 0|step: 97|ppo_ep: 1|act_loss: -0.177490234375|cri_loss: -0.0399169921875|unsuper_loss: 0.0 average reward score: -1.869140625 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.39s (73.75%) |Training time=0.66s (20.36%) |Others=0.19 (5.88%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42 epoch: 0|step: 98|ppo_ep: 1|act_loss: 0.000213623046875|cri_loss: 0.0198516845703125|unsuper_loss: 0.0 average reward score: -1.9580078125 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.30%) |Training time=0.64s (19.79%) 
|Others=0.19 (5.92%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42 epoch: 0|step: 99|ppo_ep: 1|act_loss: -0.166259765625|cri_loss: -0.02337646484375|unsuper_loss: 0.0 average reward score: -2.35546875 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.11%) |Training time=0.64s (19.69%) |Others=0.20 (6.20%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.42 epoch: 0|step: 100|ppo_ep: 1|act_loss: -0.2080078125|cri_loss: -0.072265625|unsuper_loss: 0.0 average reward score: -2.40625 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.03%) |Training time=0.64s (19.72%) |Others=0.20 (6.25%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42 epoch: 0|step: 101|ppo_ep: 1|act_loss: -0.28173828125|cri_loss: -0.08636474609375|unsuper_loss: 0.0 average reward score: -0.666015625 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.12%) |Training time=0.64s (19.87%) |Others=0.19 (6.01%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42 epoch: 0|step: 102|ppo_ep: 1|act_loss: 0.18408203125|cri_loss: 0.14794921875|unsuper_loss: 0.0 average reward score: 0.16015625 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.52%) |Training time=0.64s (19.49%) |Others=0.20 (6.00%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42 epoch: 0|step: 103|ppo_ep: 1|act_loss: -0.51123046875|cri_loss: -0.1610107421875|unsuper_loss: 0.0 average reward score: -1.3798828125 ------------------------------------------------------------------------------------- |E2E latency=3.66s |Gather latency=0.00s (0.00%) |Generate time=2.43s (66.45%) |Training time=0.93s (25.33%) |Others=0.30 
(8.22%)|CurSamplesPerSec=2.19 |AvgSamplesPerSec=2.42 epoch: 0|step: 104|ppo_ep: 1|act_loss: 0.07763671875|cri_loss: 0.10894775390625|unsuper_loss: 0.0 average reward score: -1.453125 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.46%) |Training time=0.64s (19.21%) |Others=0.21 (6.33%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.42 epoch: 0|step: 105|ppo_ep: 1|act_loss: -0.183349609375|cri_loss: -0.07952880859375|unsuper_loss: 0.0 average reward score: -1.6181640625 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.38s (73.31%) |Training time=0.67s (20.66%) |Others=0.20 (6.02%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42 epoch: 0|step: 106|ppo_ep: 1|act_loss: 0.1873779296875|cri_loss: 0.1256103515625|unsuper_loss: 0.0 average reward score: -2.875 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.27%) |Training time=0.64s (19.73%) |Others=0.19 (6.00%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42 epoch: 0|step: 107|ppo_ep: 1|act_loss: -0.396484375|cri_loss: -0.09228515625|unsuper_loss: 0.0 average reward score: -0.110107421875 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.41s (72.70%) |Training time=0.71s (21.36%) |Others=0.20 (5.94%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.42 epoch: 0|step: 108|ppo_ep: 1|act_loss: -0.40283203125|cri_loss: -0.144287109375|unsuper_loss: 0.0 average reward score: -0.6533203125 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.39s (72.88%) |Training time=0.69s (21.13%) |Others=0.20 (5.99%)|CurSamplesPerSec=2.44 
|AvgSamplesPerSec=2.42 epoch: 0|step: 109|ppo_ep: 1|act_loss: 0.0316162109375|cri_loss: 0.055999755859375|unsuper_loss: 0.0 average reward score: -0.59228515625 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.17%) |Training time=0.64s (19.80%) |Others=0.20 (6.03%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42 epoch: 0|step: 110|ppo_ep: 1|act_loss: 0.107177734375|cri_loss: 0.10064697265625|unsuper_loss: 0.0 average reward score: -0.625 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.38s (74.00%) |Training time=0.64s (19.90%) |Others=0.20 (6.10%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.42 epoch: 0|step: 111|ppo_ep: 1|act_loss: -0.11151123046875|cri_loss: -0.002197265625|unsuper_loss: 0.0 average reward score: -1.4189453125 ------------------------------------------------------------------------------------- |E2E latency=3.60s |Gather latency=0.00s (0.00%) |Generate time=2.39s (66.40%) |Training time=0.92s (25.69%) |Others=0.28 (7.91%)|CurSamplesPerSec=2.22 |AvgSamplesPerSec=2.42 epoch: 0|step: 112|ppo_ep: 1|act_loss: -0.08746337890625|cri_loss: -0.0050048828125|unsuper_loss: 0.0 average reward score: -0.94873046875 ------------------------------------------------------------------------------------- |E2E latency=3.35s |Gather latency=0.00s (0.00%) |Generate time=2.38s (71.00%) |Training time=0.78s (23.34%) |Others=0.19 (5.66%)|CurSamplesPerSec=2.39 |AvgSamplesPerSec=2.42 epoch: 0|step: 113|ppo_ep: 1|act_loss: -0.289794921875|cri_loss: -0.10382080078125|unsuper_loss: 0.0 average reward score: -2.904296875 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.38s (73.71%) |Training time=0.66s (20.39%) |Others=0.19 (5.90%)|CurSamplesPerSec=2.48 
|AvgSamplesPerSec=2.42 epoch: 0|step: 114|ppo_ep: 1|act_loss: -0.39599609375|cri_loss: -0.119384765625|unsuper_loss: 0.0 average reward score: -0.32275390625 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.11%) |Training time=0.64s (19.76%) |Others=0.20 (6.13%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42 epoch: 0|step: 115|ppo_ep: 1|act_loss: -0.0130615234375|cri_loss: 0.0491943359375|unsuper_loss: 0.0 average reward score: -2.669921875 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.38s (74.19%) |Training time=0.64s (19.88%) |Others=0.19 (5.93%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.42 epoch: 0|step: 116|ppo_ep: 1|act_loss: -0.14697265625|cri_loss: -0.02520751953125|unsuper_loss: 0.0 average reward score: -1.173828125 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.38s (74.05%) |Training time=0.64s (19.91%) |Others=0.19 (6.04%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.42 epoch: 0|step: 117|ppo_ep: 1|act_loss: 0.0123291015625|cri_loss: 0.04827880859375|unsuper_loss: 0.0 average reward score: -0.419921875 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.39s (73.58%) |Training time=0.67s (20.57%) |Others=0.19 (5.85%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42 epoch: 0|step: 118|ppo_ep: 1|act_loss: -0.2198486328125|cri_loss: -0.0615234375|unsuper_loss: 0.0 average reward score: -0.28076171875 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.37s (73.92%) |Training time=0.64s (19.98%) |Others=0.20 (6.11%)|CurSamplesPerSec=2.49 
|AvgSamplesPerSec=2.42 epoch: 0|step: 119|ppo_ep: 1|act_loss: 0.236572265625|cri_loss: 0.14208984375|unsuper_loss: 0.0 average reward score: -2.12890625 ------------------------------------------------------------------------------------- |E2E latency=3.64s |Gather latency=0.00s (0.00%) |Generate time=2.38s (65.20%) |Training time=0.99s (27.08%) |Others=0.28 (7.72%)|CurSamplesPerSec=2.19 |AvgSamplesPerSec=2.42 epoch: 0|step: 120|ppo_ep: 1|act_loss: -0.267578125|cri_loss: -0.0972900390625|unsuper_loss: 0.0 average reward score: -1.490234375 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.47s (74.61%) |Training time=0.64s (19.39%) |Others=0.20 (6.00%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.42 epoch: 0|step: 121|ppo_ep: 1|act_loss: 0.0848388671875|cri_loss: 0.1334228515625|unsuper_loss: 0.0 average reward score: 1.1513671875 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.58%) |Training time=0.64s (19.58%) |Others=0.19 (5.84%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42 epoch: 0|step: 122|ppo_ep: 1|act_loss: -0.4248046875|cri_loss: -0.1495361328125|unsuper_loss: 0.0 average reward score: 0.0616455078125 ------------------------------------------------------------------------------------- |E2E latency=3.33s |Gather latency=0.00s (0.00%) |Generate time=2.49s (74.96%) |Training time=0.64s (19.21%) |Others=0.19 (5.83%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.42 epoch: 0|step: 123|ppo_ep: 1|act_loss: -0.043609619140625|cri_loss: 0.003448486328125|unsuper_loss: 0.0 average reward score: -1.7333984375 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.51%) |Training time=0.64s (19.60%) |Others=0.19 (5.89%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42 
epoch: 0|step: 124|ppo_ep: 1|act_loss: -0.215087890625|cri_loss: -0.044189453125|unsuper_loss: 0.0
average reward score: -0.4248046875
-------------------------------------------------------------------------------------
|E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.60%) |Training time=0.64s (19.57%) |Others=0.19 (5.83%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42
epoch: 0|step: 125|ppo_ep: 1|act_loss: -0.44384765625|cri_loss: -0.1405029296875|unsuper_loss: 0.0
average reward score: 0.7802734375
-------------------------------------------------------------------------------------
|E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.49%) |Training time=0.64s (19.63%) |Others=0.19 (5.87%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42
epoch: 0|step: 126|ppo_ep: 1|act_loss: -0.422119140625|cri_loss: -0.13623046875|unsuper_loss: 0.0
average reward score: -2.5390625
-------------------------------------------------------------------------------------
|E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.55%) |Training time=0.64s (19.48%) |Others=0.20 (5.97%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.42
[2023-04-24 13:54:45,778] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 32768, reducing to 16384
epoch: 0|step: 127|ppo_ep: 1|act_loss: -0.1170654296875|cri_loss: 0.0079345703125|unsuper_loss: 0.0
average reward score: -0.8603515625
-------------------------------------------------------------------------------------
|E2E latency=3.58s |Gather latency=0.00s (0.00%) |Generate time=2.45s (68.40%) |Training time=0.93s (25.99%) |Others=0.20 (5.61%)|CurSamplesPerSec=2.23 |AvgSamplesPerSec=2.42
epoch: 0|step: 128|ppo_ep: 1|act_loss: -0.2120361328125|cri_loss: -0.06494140625|unsuper_loss: 0.0
average reward score: 0.13623046875
-------------------------------------------------------------------------------------
|E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.58%) |Training time=0.64s (19.58%) |Others=0.19 (5.84%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42
epoch: 0|step: 129|ppo_ep: 1|act_loss: -0.420166015625|cri_loss: -0.147705078125|unsuper_loss: 0.0
average reward score: 0.104736328125
-------------------------------------------------------------------------------------
|E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.49%) |Training time=0.64s (19.69%) |Others=0.19 (5.82%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42
epoch: 0|step: 130|ppo_ep: 1|act_loss: 0.2890625|cri_loss: 0.23291015625|unsuper_loss: 0.0
average reward score: 2.38671875
-------------------------------------------------------------------------------------
|E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.78%) |Training time=0.64s (19.43%) |Others=0.19 (5.80%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.42
epoch: 0|step: 131|ppo_ep: 1|act_loss: -0.114013671875|cri_loss: 0.0164794921875|unsuper_loss: 0.0
average reward score: -0.64892578125
-------------------------------------------------------------------------------------
|E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.51%) |Training time=0.64s (19.66%) |Others=0.19 (5.83%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42
epoch: 0|step: 132|ppo_ep: 1|act_loss: -0.1591796875|cri_loss: -0.0338134765625|unsuper_loss: 0.0
average reward score: -0.154296875
-------------------------------------------------------------------------------------
|E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.11%) |Training time=0.65s (19.78%) |Others=0.20 (6.11%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.42
epoch: 0|step: 133|ppo_ep: 1|act_loss: 0.0562744140625|cri_loss: 0.10205078125|unsuper_loss: 0.0
average reward score: -0.5146484375
-------------------------------------------------------------------------------------
|E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.34%) |Training time=0.65s (19.85%) |Others=0.19 (5.80%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.42
epoch: 0|step: 134|ppo_ep: 1|act_loss: -0.343505859375|cri_loss: -0.12335205078125|unsuper_loss: 0.0
average reward score: -1.80859375
-------------------------------------------------------------------------------------
|E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.37%) |Training time=0.64s (19.68%) |Others=0.19 (5.95%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42
epoch: 0|step: 135|ppo_ep: 1|act_loss: -0.0316162109375|cri_loss: -0.0007781982421875|unsuper_loss: 0.0
average reward score: -0.38525390625
-------------------------------------------------------------------------------------
|E2E latency=3.63s |Gather latency=0.00s (0.00%) |Generate time=2.43s (66.93%) |Training time=0.93s (25.52%) |Others=0.27 (7.54%)|CurSamplesPerSec=2.20 |AvgSamplesPerSec=2.42
epoch: 0|step: 136|ppo_ep: 1|act_loss: 0.1356201171875|cri_loss: 0.08160400390625|unsuper_loss: 0.0
average reward score: -2.5859375
-------------------------------------------------------------------------------------
|E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.39%) |Training time=0.64s (19.67%) |Others=0.19 (5.94%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42
epoch: 0|step: 137|ppo_ep: 1|act_loss: -0.04376220703125|cri_loss: 0.01422119140625|unsuper_loss: 0.0
average reward score: -2.27734375
-------------------------------------------------------------------------------------
|E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.33%) |Training time=0.64s (19.51%) |Others=0.20 (6.16%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.42
epoch: 0|step: 138|ppo_ep: 1|act_loss: -0.310302734375|cri_loss: -0.09844970703125|unsuper_loss: 0.0
average reward score: -0.923828125
-------------------------------------------------------------------------------------
|E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.38s (74.12%) |Training time=0.64s (19.83%) |Others=0.19 (6.05%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.42
epoch: 0|step: 139|ppo_ep: 1|act_loss: 0.113037109375|cri_loss: 0.06793212890625|unsuper_loss: 0.0
average reward score: -1.50390625
-------------------------------------------------------------------------------------
|E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.38s (73.06%) |Training time=0.68s (20.91%) |Others=0.20 (6.03%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42
epoch: 0|step: 140|ppo_ep: 1|act_loss: 0.0029296875|cri_loss: 0.077392578125|unsuper_loss: 0.0
average reward score: -0.9453125
-------------------------------------------------------------------------------------
|E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.17%) |Training time=0.64s (19.81%) |Others=0.19 (6.02%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.42
epoch: 0|step: 141|ppo_ep: 1|act_loss: -0.116455078125|cri_loss: 0.00347900390625|unsuper_loss: 0.0
average reward score: -2.552734375
-------------------------------------------------------------------------------------
|E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.48%) |Training time=0.64s (19.58%) |Others=0.20 (5.94%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.42
epoch: 0|step: 142|ppo_ep: 1|act_loss: -0.49951171875|cri_loss: -0.1353759765625|unsuper_loss: 0.0
average reward score: 0.3125
-------------------------------------------------------------------------------------
|E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.38s (73.70%) |Training time=0.64s (19.94%) |Others=0.21 (6.36%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42
epoch: 0|step: 143|ppo_ep: 1|act_loss: -0.212646484375|cri_loss: -0.05352783203125|unsuper_loss: 0.0
average reward score: -0.32080078125
-------------------------------------------------------------------------------------
|E2E latency=3.65s |Gather latency=0.00s (0.00%) |Generate time=2.44s (66.79%) |Training time=0.93s (25.44%) |Others=0.28 (7.76%)|CurSamplesPerSec=2.19 |AvgSamplesPerSec=2.42
epoch: 0|step: 144|ppo_ep: 1|act_loss: 0.145751953125|cri_loss: 0.08221435546875|unsuper_loss: 0.0
average reward score: -1.134765625
-------------------------------------------------------------------------------------
|E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.49%) |Training time=0.63s (19.57%) |Others=0.19 (5.94%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42
epoch: 0|step: 145|ppo_ep: 1|act_loss: 0.1236572265625|cri_loss: 0.090576171875|unsuper_loss: 0.0
average reward score: -2.08984375
-------------------------------------------------------------------------------------
|E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.25%) |Training time=0.64s (19.81%) |Others=0.19 (5.94%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42
epoch: 0|step: 146|ppo_ep: 1|act_loss: -0.18896484375|cri_loss: -0.0633544921875|unsuper_loss: 0.0
average reward score: -0.98583984375
-------------------------------------------------------------------------------------
|E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.38s (73.08%) |Training time=0.68s (20.98%) |Others=0.19 (5.94%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42
epoch: 0|step: 147|ppo_ep: 1|act_loss: 0.069091796875|cri_loss: 0.0595703125|unsuper_loss: 0.0
average reward score: 0.269775390625
-------------------------------------------------------------------------------------
|E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.38s (73.21%) |Training time=0.68s (20.90%) |Others=0.19 (5.89%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42
epoch: 0|step: 148|ppo_ep: 1|act_loss: -0.2364501953125|cri_loss: -0.0283203125|unsuper_loss: 0.0
average reward score: -0.79833984375
-------------------------------------------------------------------------------------
|E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.37s (73.13%) |Training time=0.68s (20.87%) |Others=0.19 (6.00%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42
epoch: 0|step: 149|ppo_ep: 1|act_loss: -0.08172607421875|cri_loss: -0.00299072265625|unsuper_loss: 0.0
average reward score: -2.86328125
-------------------------------------------------------------------------------------
|E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.38s (73.68%) |Training time=0.66s (20.36%) |Others=0.19 (5.95%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.42
epoch: 0|step: 150|ppo_ep: 1|act_loss: -0.0985107421875|cri_loss: -0.00323486328125|unsuper_loss: 0.0
average reward score: -1.9541015625
-------------------------------------------------------------------------------------
|E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.36s (73.23%) |Training time=0.67s (20.85%) |Others=0.19 (5.91%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.42
epoch: 0|step: 151|ppo_ep: 1|act_loss: 0.01739501953125|cri_loss: 0.055633544921875|unsuper_loss: 0.0
average reward score: -1.2939453125
-------------------------------------------------------------------------------------
|E2E latency=3.59s |Gather latency=0.00s (0.00%) |Generate time=2.37s (65.93%) |Training time=0.95s (26.40%) |Others=0.28 (7.67%)|CurSamplesPerSec=2.23 |AvgSamplesPerSec=2.42
epoch: 0|step: 152|ppo_ep: 1|act_loss: 0.298583984375|cri_loss: 0.20751953125|unsuper_loss: 0.0
average reward score: -1.869140625
-------------------------------------------------------------------------------------
|E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.16%) |Training time=0.64s (19.87%) |Others=0.19 (5.97%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.42
epoch: 0|step: 153|ppo_ep: 1|act_loss: 0.004852294921875|cri_loss: 0.0251617431640625|unsuper_loss: 0.0
average reward score: -3.08203125
-------------------------------------------------------------------------------------
|E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.39s (73.74%) |Training time=0.66s (20.25%) |Others=0.19 (6.01%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42
epoch: 0|step: 154|ppo_ep: 1|act_loss: 0.07958984375|cri_loss: 0.1014404296875|unsuper_loss: 0.0
average reward score: -0.263671875
-------------------------------------------------------------------------------------
|E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.36s (72.96%) |Training time=0.68s (20.88%) |Others=0.20 (6.17%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42
epoch: 0|step: 155|ppo_ep: 1|act_loss: 0.181396484375|cri_loss: 0.1124267578125|unsuper_loss: 0.0
average reward score: -0.9111328125
-------------------------------------------------------------------------------------
|E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.38s (73.64%) |Training time=0.66s (20.41%) |Others=0.19 (5.95%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42
epoch: 0|step: 156|ppo_ep: 1|act_loss: 0.330810546875|cri_loss: 0.25927734375|unsuper_loss: 0.0
average reward score: -1.412109375
-------------------------------------------------------------------------------------
|E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.39s (73.98%) |Training time=0.64s (19.93%) |Others=0.20 (6.09%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.42
epoch: 0|step: 157|ppo_ep: 1|act_loss: -0.1376953125|cri_loss: -0.02423095703125|unsuper_loss: 0.0
average reward score: 0.451171875
-------------------------------------------------------------------------------------
|E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.23%) |Training time=0.65s (19.65%) |Others=0.20 (6.12%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.42
epoch: 0|step: 158|ppo_ep: 1|act_loss: 0.09613037109375|cri_loss: 0.0765380859375|unsuper_loss: 0.0
average reward score: -1.0390625
-------------------------------------------------------------------------------------
|E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.35%) |Training time=0.64s (19.72%) |Others=0.19 (5.93%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42
[2023-04-24 13:56:30,752] [INFO] [logging.py:96:log_dist] [Rank 0] step=20, skipped=4, lr=[1.5440000000000002e-06, 1.5440000000000002e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-04-24 13:56:30,997] [INFO] [timer.py:199:stop] epoch=0/micro_step=160/global_step=20, RunningAvgSamplesPerSec=15.626584111901188, CurrSamplesPerSec=15.65801075033613, MemAllocated=20.44GB, MaxMemAllocated=31.45GB
[2023-04-24 13:56:31,198] [INFO] [logging.py:96:log_dist] [Rank 0] step=20, skipped=3, lr=[8.500000000000001e-07, 8.500000000000001e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 159|ppo_ep: 1|act_loss: -0.0870361328125|cri_loss: 0.00689697265625|unsuper_loss: 0.0
average reward score: -0.0712890625
-------------------------------------------------------------------------------------
|E2E latency=3.58s |Gather latency=0.00s (0.00%) |Generate time=2.38s (66.59%) |Training time=0.92s (25.71%) |Others=0.28 (7.70%)|CurSamplesPerSec=2.23 |AvgSamplesPerSec=2.42
epoch: 0|step: 160|ppo_ep: 1|act_loss: 0.1484375|cri_loss: 0.08837890625|unsuper_loss: 0.0
average reward score: -2.646484375
-------------------------------------------------------------------------------------
|E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.37s (74.05%) |Training time=0.64s (19.88%) |Others=0.19 (6.07%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.42
epoch: 0|step: 161|ppo_ep: 1|act_loss: 0.06768798828125|cri_loss: 0.04998779296875|unsuper_loss: 0.0
average reward score: -2.3515625
-------------------------------------------------------------------------------------
|E2E latency=3.18s |Gather latency=0.00s (0.00%) |Generate time=2.33s (73.34%) |Training time=0.64s (20.14%) |Others=0.21 (6.52%)|CurSamplesPerSec=2.52 |AvgSamplesPerSec=2.42
epoch: 0|step: 162|ppo_ep: 1|act_loss: 0.2724609375|cri_loss: 0.211181640625|unsuper_loss: 0.0
average reward score: -2.62890625
-------------------------------------------------------------------------------------
|E2E latency=3.17s |Gather latency=0.00s (0.00%) |Generate time=2.33s (73.59%) |Training time=0.64s (20.21%) |Others=0.20 (6.20%)|CurSamplesPerSec=2.52 |AvgSamplesPerSec=2.42
epoch: 0|step: 163|ppo_ep: 1|act_loss: -0.06671142578125|cri_loss: 0.00750732421875|unsuper_loss: 0.0
average reward score: 0.14892578125
-------------------------------------------------------------------------------------
|E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.39%) |Training time=0.64s (19.73%) |Others=0.19 (5.88%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42
epoch: 0|step: 164|ppo_ep: 1|act_loss: 0.189697265625|cri_loss: 0.1748046875|unsuper_loss: 0.0
average reward score: 0.265380859375
-------------------------------------------------------------------------------------
|E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.23%) |Training time=0.64s (19.76%) |Others=0.19 (6.01%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42
epoch: 0|step: 165|ppo_ep: 1|act_loss: 0.54052734375|cri_loss: 0.383544921875|unsuper_loss: 0.0
average reward score: -0.7783203125
-------------------------------------------------------------------------------------
|E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.40s (73.59%) |Training time=0.67s (20.44%) |Others=0.19 (5.97%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42
epoch: 0|step: 166|ppo_ep: 1|act_loss: 0.18408203125|cri_loss: 0.10345458984375|unsuper_loss: 0.0
average reward score: -3.015625
-------------------------------------------------------------------------------------
|E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.00%) |Training time=0.64s (19.72%) |Others=0.20 (6.28%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42
epoch: 0|step: 167|ppo_ep: 1|act_loss: 0.10491943359375|cri_loss: 0.098388671875|unsuper_loss: 0.0
average reward score: -0.7958984375
-------------------------------------------------------------------------------------
|E2E latency=3.64s |Gather latency=0.00s (0.00%) |Generate time=2.44s (66.90%) |Training time=0.93s (25.43%) |Others=0.28 (7.67%)|CurSamplesPerSec=2.20 |AvgSamplesPerSec=2.42
epoch: 0|step: 168|ppo_ep: 1|act_loss: 0.18994140625|cri_loss: 0.1297607421875|unsuper_loss: 0.0
average reward score: -1.3310546875
-------------------------------------------------------------------------------------
|E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.30%) |Training time=0.64s (19.78%) |Others=0.19 (5.92%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42
epoch: 0|step: 169|ppo_ep: 1|act_loss: 0.55859375|cri_loss: 0.36865234375|unsuper_loss: 0.0
average reward score: -1.455078125
-------------------------------------------------------------------------------------
|E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.32s (71.72%) |Training time=0.72s (22.27%) |Others=0.19 (6.01%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.42
epoch: 0|step: 170|ppo_ep: 1|act_loss: 0.0665283203125|cri_loss: 0.049530029296875|unsuper_loss: 0.0
average reward score: -3.45703125
-------------------------------------------------------------------------------------
|E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.38%) |Training time=0.64s (19.74%) |Others=0.19 (5.87%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.42
epoch: 0|step: 171|ppo_ep: 1|act_loss: 0.129150390625|cri_loss: 0.093505859375|unsuper_loss: 0.0
average reward score: -0.48486328125
-------------------------------------------------------------------------------------
|E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.15%) |Training time=0.64s (19.86%) |Others=0.19 (5.99%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42
epoch: 0|step: 172|ppo_ep: 1|act_loss: 0.26171875|cri_loss: 0.18310546875|unsuper_loss: 0.0
average reward score: -0.1630859375
-------------------------------------------------------------------------------------
|E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.20%) |Training time=0.64s (19.75%) |Others=0.20 (6.05%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42
epoch: 0|step: 173|ppo_ep: 1|act_loss: 0.478515625|cri_loss: 0.283935546875|unsuper_loss: 0.0
average reward score: -2.515625
-------------------------------------------------------------------------------------
|E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.00%) |Training time=0.65s (19.94%) |Others=0.20 (6.06%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42
epoch: 0|step: 174|ppo_ep: 1|act_loss: 0.08642578125|cri_loss: 0.08642578125|unsuper_loss: 0.0
average reward score: -1.23828125
-------------------------------------------------------------------------------------
|E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.22%) |Training time=0.64s (19.75%) |Others=0.19 (6.02%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.42
epoch: 0|step: 175|ppo_ep: 1|act_loss: -0.050537109375|cri_loss: -0.006134033203125|unsuper_loss: 0.0
average reward score: -2.728515625
-------------------------------------------------------------------------------------
|E2E latency=3.66s |Gather latency=0.00s (0.00%) |Generate time=2.46s (67.25%) |Training time=0.92s (25.19%) |Others=0.28 (7.56%)|CurSamplesPerSec=2.19 |AvgSamplesPerSec=2.42
epoch: 0|step: 176|ppo_ep: 1|act_loss: -0.00897216796875|cri_loss: 0.032470703125|unsuper_loss: 0.0
average reward score: -1.376953125
-------------------------------------------------------------------------------------
|E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.44s (73.35%) |Training time=0.66s (20.01%) |Others=0.22 (6.65%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.42
epoch: 0|step: 177|ppo_ep: 1|act_loss: 0.0283203125|cri_loss: 0.1163330078125|unsuper_loss: 0.0
average reward score: -0.38623046875
-------------------------------------------------------------------------------------
|E2E latency=5.30s |Gather latency=0.00s (0.00%) |Generate time=2.41s (45.50%) |Training time=0.64s (12.01%) |Others=2.25 (42.49%)|CurSamplesPerSec=1.51 |AvgSamplesPerSec=2.41
epoch: 0|step: 178|ppo_ep: 1|act_loss: -0.00726318359375|cri_loss: 0.0416259765625|unsuper_loss: 0.0
average reward score: 0.15576171875
-------------------------------------------------------------------------------------
|E2E latency=4.53s |Gather latency=0.00s (0.00%) |Generate time=3.10s (68.47%) |Training time=0.69s (15.23%) |Others=0.74 (16.30%)|CurSamplesPerSec=1.76 |AvgSamplesPerSec=2.41
epoch: 0|step: 179|ppo_ep: 1|act_loss: 0.0716552734375|cri_loss: 0.07244873046875|unsuper_loss: 0.0
average reward score: -0.498291015625
-------------------------------------------------------------------------------------
|E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.41s (73.93%) |Training time=0.65s (19.82%) |Others=0.20 (6.25%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41
epoch: 0|step: 180|ppo_ep: 1|act_loss: -0.10223388671875|cri_loss: -0.00518798828125|unsuper_loss: 0.0
average reward score: 0.49609375
-------------------------------------------------------------------------------------
|E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.15%) |Training time=0.65s (19.95%) |Others=0.19 (5.90%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.41
epoch: 0|step: 181|ppo_ep: 1|act_loss: -0.2080078125|cri_loss: -0.06640625|unsuper_loss: 0.0
average reward score: -0.3125
-------------------------------------------------------------------------------------
|E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.16%) |Training time=0.64s (19.82%) |Others=0.19 (6.02%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41
epoch: 0|step: 182|ppo_ep: 1|act_loss: 0.0406494140625|cri_loss: 0.0614013671875|unsuper_loss: 0.0
average reward score: -1.740234375
-------------------------------------------------------------------------------------
|E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.40s (73.69%) |Training time=0.67s (20.47%) |Others=0.19 (5.83%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41
epoch: 0|step: 183|ppo_ep: 1|act_loss: 0.171630859375|cri_loss: 0.1126708984375|unsuper_loss: 0.0
average reward score: -0.68505859375
-------------------------------------------------------------------------------------
|E2E latency=3.62s |Gather latency=0.00s (0.00%) |Generate time=2.40s (66.23%) |Training time=0.93s (25.62%) |Others=0.29 (8.15%)|CurSamplesPerSec=2.21 |AvgSamplesPerSec=2.41
epoch: 0|step: 184|ppo_ep: 1|act_loss: -0.276123046875|cri_loss: -0.08349609375|unsuper_loss: 0.0
average reward score: -0.8515625
-------------------------------------------------------------------------------------
|E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (73.86%) |Training time=0.65s (20.20%) |Others=0.19 (5.94%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41
epoch: 0|step: 185|ppo_ep: 1|act_loss: -0.05279541015625|cri_loss: 0.005462646484375|unsuper_loss: 0.0
average reward score: 0.87744140625
-------------------------------------------------------------------------------------
|E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.39s (73.49%) |Training time=0.67s (20.61%) |Others=0.19 (5.90%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41
epoch: 0|step: 186|ppo_ep: 1|act_loss: 0.21630859375|cri_loss: 0.12054443359375|unsuper_loss: 0.0
average reward score: -2.58203125
-------------------------------------------------------------------------------------
|E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.39s (73.53%) |Training time=0.66s (20.30%) |Others=0.20 (6.16%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41
epoch: 0|step: 187|ppo_ep: 1|act_loss: 0.034088134765625|cri_loss: 0.042633056640625|unsuper_loss: 0.0
average reward score: -1.185546875
-------------------------------------------------------------------------------------
|E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.13%) |Training time=0.64s (19.88%) |Others=0.19 (6.00%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.41
epoch: 0|step: 188|ppo_ep: 1|act_loss: 0.0968017578125|cri_loss: 0.0924072265625|unsuper_loss: 0.0
average reward score: -2.0859375
-------------------------------------------------------------------------------------
|E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.38s (74.09%) |Training time=0.64s (19.86%) |Others=0.19 (6.05%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.41
epoch: 0|step: 189|ppo_ep: 1|act_loss: 0.2890625|cri_loss: 0.22021484375|unsuper_loss: 0.0
average reward score: -2.154296875
-------------------------------------------------------------------------------------
|E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.37s (74.07%) |Training time=0.64s (19.86%) |Others=0.19 (6.06%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.41
epoch: 0|step: 190|ppo_ep: 1|act_loss: 0.043609619140625|cri_loss: 0.0709228515625|unsuper_loss: 0.0
average reward score: -1.376953125
-------------------------------------------------------------------------------------
|E2E latency=3.19s |Gather latency=0.00s (0.00%) |Generate time=2.36s (74.01%) |Training time=0.64s (20.07%) |Others=0.19 (5.91%)|CurSamplesPerSec=2.51 |AvgSamplesPerSec=2.41
epoch: 0|step: 191|ppo_ep: 1|act_loss: 0.197998046875|cri_loss: 0.1453857421875|unsuper_loss: 0.0
average reward score: -0.7998046875
-------------------------------------------------------------------------------------
|E2E latency=3.56s |Gather latency=0.00s (0.00%) |Generate time=2.36s (66.27%) |Training time=0.92s (25.91%) |Others=0.28 (7.82%)|CurSamplesPerSec=2.25 |AvgSamplesPerSec=2.41
epoch: 0|step: 192|ppo_ep: 1|act_loss: 0.1702880859375|cri_loss: 0.1217041015625|unsuper_loss: 0.0
average reward score: -0.9287109375
-------------------------------------------------------------------------------------
|E2E latency=3.18s |Gather latency=0.00s (0.00%) |Generate time=2.36s (74.14%) |Training time=0.63s (19.88%) |Others=0.19 (5.98%)|CurSamplesPerSec=2.51 |AvgSamplesPerSec=2.41
epoch: 0|step: 193|ppo_ep: 1|act_loss: -0.0963134765625|cri_loss: 0.01605224609375|unsuper_loss: 0.0
average reward score: 1.1953125
-------------------------------------------------------------------------------------
|E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.36s (73.77%) |Training time=0.65s (20.24%) |Others=0.19 (5.99%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.41
epoch: 0|step: 194|ppo_ep: 1|act_loss: 0.030548095703125|cri_loss: 0.0679931640625|unsuper_loss: 0.0
average reward score: 0.087890625
-------------------------------------------------------------------------------------
|E2E latency=3.19s |Gather latency=0.00s (0.00%) |Generate time=2.36s (73.88%) |Training time=0.64s (20.07%) |Others=0.19 (6.05%)|CurSamplesPerSec=2.51 |AvgSamplesPerSec=2.41
epoch: 0|step: 195|ppo_ep: 1|act_loss: -0.0679931640625|cri_loss: 0.0390625|unsuper_loss: 0.0
average reward score: 0.4306640625
-------------------------------------------------------------------------------------
|E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.36s (73.88%) |Training time=0.64s (20.13%) |Others=0.19 (5.99%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.41
epoch: 0|step: 196|ppo_ep: 1|act_loss: 0.134521484375|cri_loss: 0.13037109375|unsuper_loss: 0.0
average reward score: -1.9990234375
-------------------------------------------------------------------------------------
|E2E latency=3.19s |Gather latency=0.00s (0.00%) |Generate time=2.35s (73.73%) |Training time=0.64s (20.00%) |Others=0.20 (6.26%)|CurSamplesPerSec=2.51 |AvgSamplesPerSec=2.41
epoch: 0|step: 197|ppo_ep: 1|act_loss: -0.0072174072265625|cri_loss: 0.0101470947265625|unsuper_loss: 0.0
average reward score: -0.321533203125
-------------------------------------------------------------------------------------
|E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.33%) |Training time=0.64s (19.75%) |Others=0.19 (5.92%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41
epoch: 0|step: 198|ppo_ep: 1|act_loss: 0.05908203125|cri_loss: 0.1268310546875|unsuper_loss: 0.0
average reward score: -0.54541015625
-------------------------------------------------------------------------------------
|E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.13%) |Training time=0.64s (19.77%) |Others=0.20 (6.10%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.41
epoch: 0|step: 199|ppo_ep: 1|act_loss: 0.051910400390625|cri_loss: 0.048309326171875|unsuper_loss: 0.0
average reward score: -1.2919921875
-------------------------------------------------------------------------------------
|E2E latency=3.63s |Gather latency=0.00s (0.00%) |Generate time=2.41s (66.45%) |Training time=0.94s (25.86%) |Others=0.28 (7.70%)|CurSamplesPerSec=2.21 |AvgSamplesPerSec=2.41
epoch: 0|step: 200|ppo_ep: 1|act_loss: -0.00396728515625|cri_loss: 0.049560546875|unsuper_loss: 0.0
average reward score: 0.3486328125
-------------------------------------------------------------------------------------
|E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.38s (72.51%) |Training time=0.71s (21.62%) |Others=0.19 (5.88%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.41
epoch: 0|step: 201|ppo_ep: 1|act_loss: 0.0267486572265625|cri_loss: 0.04522705078125|unsuper_loss: 0.0
average reward score: -1.29296875
-------------------------------------------------------------------------------------
|E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.37s (73.98%) |Training time=0.64s (19.91%) |Others=0.20 (6.11%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.41
epoch: 0|step: 202|ppo_ep: 1|act_loss: -0.0191650390625|cri_loss: 0.0255126953125|unsuper_loss: 0.0
average reward score: -0.73046875
-------------------------------------------------------------------------------------
|E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.39s (73.34%) |Training time=0.64s (19.67%) |Others=0.23 (6.99%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.41
epoch: 0|step: 203|ppo_ep: 1|act_loss: -0.0316162109375|cri_loss: 0.022430419921875|unsuper_loss: 0.0
average reward score: 1.0556640625
-------------------------------------------------------------------------------------
|E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.40s (73.66%) |Training time=0.66s (20.39%) |Others=0.19 (5.94%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41
epoch: 0|step: 204|ppo_ep: 1|act_loss: 0.009735107421875|cri_loss: 0.0274810791015625|unsuper_loss: 0.0
average reward score: -1.3671875
-------------------------------------------------------------------------------------
|E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.31%) |Training time=0.64s (19.54%) |Others=0.20 (6.15%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.41
epoch: 0|step: 205|ppo_ep: 1|act_loss: 0.06585693359375|cri_loss: 0.082763671875|unsuper_loss: 0.0
average reward score: -0.92333984375
-------------------------------------------------------------------------------------
|E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.42s (73.81%) |Training time=0.66s (19.98%) |Others=0.20 (6.20%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.41
epoch: 0|step: 206|ppo_ep: 1|act_loss: 0.197021484375|cri_loss: 0.10791015625|unsuper_loss: 0.0
average reward score: -0.9833984375
-------------------------------------------------------------------------------------
|E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.35s (71.53%) |Training time=0.73s (22.29%) |Others=0.20 (6.18%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.41
epoch: 0|step: 207|ppo_ep: 1|act_loss: -0.1688232421875|cri_loss: -0.054962158203125|unsuper_loss: 0.0
average reward score: -1.814453125
-------------------------------------------------------------------------------------
|E2E latency=3.60s |Gather latency=0.00s (0.00%) |Generate time=2.38s (66.04%) |Training time=0.94s (25.99%) |Others=0.29 (7.96%)|CurSamplesPerSec=2.22 |AvgSamplesPerSec=2.41
epoch: 0|step: 208|ppo_ep: 1|act_loss: 0.421875|cri_loss: 0.31005859375|unsuper_loss: 0.0
average reward score: -1.05078125
-------------------------------------------------------------------------------------
|E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.47s (74.68%) |Training time=0.64s (19.32%) |Others=0.20 (5.99%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.41
epoch: 0|step: 209|ppo_ep: 1|act_loss: -0.212646484375|cri_loss: -0.070068359375|unsuper_loss: 0.0
average reward score: 0.1016845703125
-------------------------------------------------------------------------------------
|E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.33s (72.71%) |Training time=0.67s (21.04%) |Others=0.20 (6.25%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.41
epoch: 0|step: 210|ppo_ep: 1|act_loss: 0.245361328125|cri_loss: 0.14501953125|unsuper_loss: 0.0
average reward score: -0.0283203125
-------------------------------------------------------------------------------------
|E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.29s (70.20%) |Training time=0.77s (23.60%) |Others=0.20 (6.21%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41
epoch: 0|step: 211|ppo_ep: 1|act_loss: -0.046600341796875|cri_loss: 0.0068359375|unsuper_loss: 0.0
average reward score: -1.3115234375
------------------------------------------------------------------------------------- |E2E latency=3.19s |Gather latency=0.00s (0.00%) |Generate time=2.32s (72.74%) |Training time=0.67s (21.17%) |Others=0.19 (6.09%)|CurSamplesPerSec=2.51 |AvgSamplesPerSec=2.41 epoch: 0|step: 212|ppo_ep: 1|act_loss: -0.00762939453125|cri_loss: 0.04388427734375|unsuper_loss: 0.0 average reward score: 0.244140625 ------------------------------------------------------------------------------------- |E2E latency=3.17s |Gather latency=0.00s (0.00%) |Generate time=2.31s (72.87%) |Training time=0.66s (20.87%) |Others=0.20 (6.26%)|CurSamplesPerSec=2.52 |AvgSamplesPerSec=2.41 epoch: 0|step: 213|ppo_ep: 1|act_loss: -0.0408935546875|cri_loss: 0.02545166015625|unsuper_loss: 0.0 average reward score: 1.3876953125 ------------------------------------------------------------------------------------- |E2E latency=4.62s |Gather latency=0.00s (0.00%) |Generate time=2.91s (62.92%) |Training time=1.37s (29.72%) |Others=0.34 (7.36%)|CurSamplesPerSec=1.73 |AvgSamplesPerSec=2.41 epoch: 0|step: 214|ppo_ep: 1|act_loss: 0.3916015625|cri_loss: 0.2578125|unsuper_loss: 0.0 average reward score: -0.40625 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.37s (73.81%) |Training time=0.64s (20.04%) |Others=0.20 (6.15%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.41 epoch: 0|step: 215|ppo_ep: 1|act_loss: -0.079833984375|cri_loss: 0.03167724609375|unsuper_loss: 0.0 average reward score: 1.19140625 ------------------------------------------------------------------------------------- |E2E latency=3.62s |Gather latency=0.00s (0.00%) |Generate time=2.41s (66.68%) |Training time=0.93s (25.71%) |Others=0.28 (7.61%)|CurSamplesPerSec=2.21 |AvgSamplesPerSec=2.41 epoch: 0|step: 216|ppo_ep: 1|act_loss: 0.20703125|cri_loss: 0.131591796875|unsuper_loss: 0.0 average reward score: -0.7216796875 
------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.14%) |Training time=0.64s (19.83%) |Others=0.20 (6.02%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 217|ppo_ep: 1|act_loss: 0.1685791015625|cri_loss: 0.104736328125|unsuper_loss: 0.0 average reward score: -0.3125 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.39%) |Training time=0.64s (19.53%) |Others=0.20 (6.09%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.41 epoch: 0|step: 218|ppo_ep: 1|act_loss: 0.218994140625|cri_loss: 0.1671142578125|unsuper_loss: 0.0 average reward score: 0.5859375 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.38s (73.98%) |Training time=0.64s (20.07%) |Others=0.19 (5.95%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.41 epoch: 0|step: 219|ppo_ep: 1|act_loss: -0.02490234375|cri_loss: 0.078125|unsuper_loss: 0.0 average reward score: 0.29248046875 ------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.30%) |Training time=0.64s (19.40%) |Others=0.21 (6.29%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.41 epoch: 0|step: 220|ppo_ep: 1|act_loss: 0.08447265625|cri_loss: 0.094482421875|unsuper_loss: 0.0 average reward score: 0.08172607421875 ------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.35s (73.23%) |Training time=0.65s (20.41%) |Others=0.20 (6.35%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.41 epoch: 0|step: 221|ppo_ep: 1|act_loss: -0.13623046875|cri_loss: -0.02606201171875|unsuper_loss: 0.0 average reward score: -0.6240234375 
------------------------------------------------------------------------------------- |E2E latency=3.18s |Gather latency=0.00s (0.00%) |Generate time=2.29s (72.13%) |Training time=0.69s (21.82%) |Others=0.19 (6.06%)|CurSamplesPerSec=2.52 |AvgSamplesPerSec=2.41 epoch: 0|step: 222|ppo_ep: 1|act_loss: 0.01361083984375|cri_loss: 0.060882568359375|unsuper_loss: 0.0 average reward score: 0.396484375 ------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.29s (71.62%) |Training time=0.71s (22.29%) |Others=0.19 (6.10%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.41 epoch: 0|step: 223|ppo_ep: 1|act_loss: 0.223388671875|cri_loss: 0.1678466796875|unsuper_loss: 0.0 average reward score: 0.171875 ------------------------------------------------------------------------------------- |E2E latency=3.61s |Gather latency=0.00s (0.00%) |Generate time=2.29s (63.57%) |Training time=1.04s (28.75%) |Others=0.28 (7.68%)|CurSamplesPerSec=2.22 |AvgSamplesPerSec=2.41 epoch: 0|step: 224|ppo_ep: 1|act_loss: 0.93798828125|cri_loss: 0.5693359375|unsuper_loss: 0.0 average reward score: -1.109375 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.30s (70.94%) |Training time=0.74s (22.95%) |Others=0.20 (6.11%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 225|ppo_ep: 1|act_loss: -0.1162109375|cri_loss: -0.02740478515625|unsuper_loss: 0.0 average reward score: 1.5859375 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.18%) |Training time=0.64s (19.60%) |Others=0.20 (6.22%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.41 epoch: 0|step: 226|ppo_ep: 1|act_loss: -0.162109375|cri_loss: -0.062469482421875|unsuper_loss: 0.0 average reward score: -1.07421875 
------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.18%) |Training time=0.64s (19.58%) |Others=0.20 (6.23%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.41 epoch: 0|step: 227|ppo_ep: 1|act_loss: -0.125244140625|cri_loss: -0.0184326171875|unsuper_loss: 0.0 average reward score: -0.109619140625 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.44%) |Training time=0.64s (19.68%) |Others=0.19 (5.88%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.41 epoch: 0|step: 228|ppo_ep: 1|act_loss: -0.0309295654296875|cri_loss: -0.006256103515625|unsuper_loss: 0.0 average reward score: -0.64501953125 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.28s (70.89%) |Training time=0.74s (22.87%) |Others=0.20 (6.24%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.41 epoch: 0|step: 229|ppo_ep: 1|act_loss: -0.0458984375|cri_loss: -0.011871337890625|unsuper_loss: 0.0 average reward score: -0.316650390625 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.33s (72.42%) |Training time=0.70s (21.65%) |Others=0.19 (5.93%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.41 epoch: 0|step: 230|ppo_ep: 1|act_loss: -0.12493896484375|cri_loss: -0.03387451171875|unsuper_loss: 0.0 average reward score: 1.080078125 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.35s (72.47%) |Training time=0.69s (21.26%) |Others=0.20 (6.26%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 231|ppo_ep: 1|act_loss: -0.003662109375|cri_loss: 0.10089111328125|unsuper_loss: 0.0 average reward score: -1.2197265625 
------------------------------------------------------------------------------------- |E2E latency=3.63s |Gather latency=0.00s (0.00%) |Generate time=2.40s (66.03%) |Training time=0.95s (26.19%) |Others=0.28 (7.78%)|CurSamplesPerSec=2.20 |AvgSamplesPerSec=2.41 epoch: 0|step: 232|ppo_ep: 1|act_loss: -0.0673828125|cri_loss: 0.0211181640625|unsuper_loss: 0.0 average reward score: 1.47265625 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.38%) |Training time=0.64s (19.66%) |Others=0.20 (5.96%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.41 epoch: 0|step: 233|ppo_ep: 1|act_loss: 0.297119140625|cri_loss: 0.18408203125|unsuper_loss: 0.0 average reward score: -1.09765625 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.02%) |Training time=0.65s (19.72%) |Others=0.21 (6.26%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.41 epoch: 0|step: 234|ppo_ep: 1|act_loss: -0.1925048828125|cri_loss: -0.064453125|unsuper_loss: 0.0 average reward score: -0.498291015625 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.38%) |Training time=0.64s (19.67%) |Others=0.19 (5.94%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 235|ppo_ep: 1|act_loss: -0.18408203125|cri_loss: -0.072509765625|unsuper_loss: 0.0 average reward score: -2.0546875 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.42%) |Training time=0.64s (19.59%) |Others=0.20 (5.99%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.41 epoch: 0|step: 236|ppo_ep: 1|act_loss: 0.266357421875|cri_loss: 0.188720703125|unsuper_loss: 0.0 average reward score: 0.1689453125 
------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.31s (71.75%) |Training time=0.72s (22.37%) |Others=0.19 (5.87%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.41 epoch: 0|step: 237|ppo_ep: 1|act_loss: 0.193603515625|cri_loss: 0.1441650390625|unsuper_loss: 0.0 average reward score: -2.17578125 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.27s (70.35%) |Training time=0.77s (23.75%) |Others=0.19 (5.90%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.41 epoch: 0|step: 238|ppo_ep: 1|act_loss: -0.330322265625|cri_loss: -0.1104736328125|unsuper_loss: 0.0 average reward score: -0.06689453125 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.47s (74.48%) |Training time=0.65s (19.62%) |Others=0.20 (5.91%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.41 [2023-04-24 14:00:58,373] [INFO] [logging.py:96:log_dist] [Rank 0] step=30, skipped=4, lr=[2.5090000000000005e-06, 2.5090000000000005e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-04-24 14:00:58,621] [INFO] [timer.py:199:stop] epoch=0/micro_step=240/global_step=30, RunningAvgSamplesPerSec=15.462015129370691, CurrSamplesPerSec=14.978535424411854, MemAllocated=20.44GB, MaxMemAllocated=31.45GB [2023-04-24 14:00:58,835] [INFO] [logging.py:96:log_dist] [Rank 0] step=30, skipped=3, lr=[1.3500000000000002e-06, 1.3500000000000002e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 239|ppo_ep: 1|act_loss: -0.01336669921875|cri_loss: 0.040191650390625|unsuper_loss: 0.0 average reward score: 1.1220703125 ------------------------------------------------------------------------------------- |E2E latency=3.63s |Gather latency=0.00s (0.00%) |Generate time=2.39s (65.87%) |Training time=0.95s (26.13%) |Others=0.29 (8.00%)|CurSamplesPerSec=2.20 
|AvgSamplesPerSec=2.41 epoch: 0|step: 240|ppo_ep: 1|act_loss: -0.11962890625|cri_loss: 0.01611328125|unsuper_loss: 0.0 average reward score: -0.423828125 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.23%) |Training time=0.64s (19.82%) |Others=0.19 (5.94%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 241|ppo_ep: 1|act_loss: -0.40576171875|cri_loss: -0.1114501953125|unsuper_loss: 0.0 average reward score: -0.853515625 ------------------------------------------------------------------------------------- |E2E latency=3.17s |Gather latency=0.00s (0.00%) |Generate time=2.33s (73.57%) |Training time=0.64s (20.28%) |Others=0.19 (6.15%)|CurSamplesPerSec=2.53 |AvgSamplesPerSec=2.41 epoch: 0|step: 242|ppo_ep: 1|act_loss: -0.21435546875|cri_loss: -0.06103515625|unsuper_loss: 0.0 average reward score: -1.484375 ------------------------------------------------------------------------------------- |E2E latency=3.17s |Gather latency=0.00s (0.00%) |Generate time=2.31s (72.93%) |Training time=0.67s (21.03%) |Others=0.19 (6.04%)|CurSamplesPerSec=2.53 |AvgSamplesPerSec=2.41 epoch: 0|step: 243|ppo_ep: 1|act_loss: 0.36669921875|cri_loss: 0.2164306640625|unsuper_loss: 0.0 average reward score: -1.486328125 ------------------------------------------------------------------------------------- |E2E latency=3.17s |Gather latency=0.00s (0.00%) |Generate time=2.30s (72.36%) |Training time=0.68s (21.56%) |Others=0.19 (6.07%)|CurSamplesPerSec=2.52 |AvgSamplesPerSec=2.41 epoch: 0|step: 244|ppo_ep: 1|act_loss: -0.04498291015625|cri_loss: 0.01904296875|unsuper_loss: 0.0 average reward score: -1.548828125 ------------------------------------------------------------------------------------- |E2E latency=3.17s |Gather latency=0.00s (0.00%) |Generate time=2.31s (72.90%) |Training time=0.66s (20.87%) |Others=0.20 (6.23%)|CurSamplesPerSec=2.52 |AvgSamplesPerSec=2.41 epoch: 
0|step: 245|ppo_ep: 1|act_loss: 0.0958251953125|cri_loss: 0.0689697265625|unsuper_loss: 0.0 average reward score: -1.482421875 ------------------------------------------------------------------------------------- |E2E latency=3.17s |Gather latency=0.00s (0.00%) |Generate time=2.31s (72.88%) |Training time=0.66s (20.97%) |Others=0.20 (6.16%)|CurSamplesPerSec=2.52 |AvgSamplesPerSec=2.41 epoch: 0|step: 246|ppo_ep: 1|act_loss: -0.1650390625|cri_loss: -0.0126953125|unsuper_loss: 0.0 average reward score: 0.873046875 ------------------------------------------------------------------------------------- |E2E latency=3.16s |Gather latency=0.00s (0.00%) |Generate time=2.25s (71.28%) |Training time=0.71s (22.62%) |Others=0.19 (6.10%)|CurSamplesPerSec=2.53 |AvgSamplesPerSec=2.41 epoch: 0|step: 247|ppo_ep: 1|act_loss: 0.0090789794921875|cri_loss: 0.031982421875|unsuper_loss: 0.0 average reward score: -1.8662109375 ------------------------------------------------------------------------------------- |E2E latency=3.61s |Gather latency=0.00s (0.00%) |Generate time=2.38s (65.98%) |Training time=0.95s (26.29%) |Others=0.28 (7.73%)|CurSamplesPerSec=2.21 |AvgSamplesPerSec=2.41 epoch: 0|step: 248|ppo_ep: 1|act_loss: 0.0550537109375|cri_loss: 0.0450439453125|unsuper_loss: 0.0 average reward score: -0.11572265625 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.41s (72.99%) |Training time=0.69s (20.98%) |Others=0.20 (6.03%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.41 epoch: 0|step: 249|ppo_ep: 1|act_loss: -0.267822265625|cri_loss: -0.0570068359375|unsuper_loss: 0.0 average reward score: 0.3720703125 ------------------------------------------------------------------------------------- |E2E latency=4.30s |Gather latency=0.00s (0.00%) |Generate time=2.81s (65.31%) |Training time=1.20s (27.99%) |Others=0.29 (6.70%)|CurSamplesPerSec=1.86 |AvgSamplesPerSec=2.41 epoch: 0|step: 250|ppo_ep: 
1|act_loss: -0.10992431640625|cri_loss: -0.01385498046875|unsuper_loss: 0.0 average reward score: -0.338134765625 ------------------------------------------------------------------------------------- |E2E latency=3.19s |Gather latency=0.00s (0.00%) |Generate time=2.34s (73.49%) |Training time=0.66s (20.64%) |Others=0.19 (5.88%)|CurSamplesPerSec=2.51 |AvgSamplesPerSec=2.41 epoch: 0|step: 251|ppo_ep: 1|act_loss: -0.2958984375|cri_loss: -0.03759765625|unsuper_loss: 0.0 average reward score: 0.8720703125 ------------------------------------------------------------------------------------- |E2E latency=3.17s |Gather latency=0.00s (0.00%) |Generate time=2.27s (71.66%) |Training time=0.71s (22.32%) |Others=0.19 (6.02%)|CurSamplesPerSec=2.52 |AvgSamplesPerSec=2.41 epoch: 0|step: 252|ppo_ep: 1|act_loss: -0.14208984375|cri_loss: -0.03558349609375|unsuper_loss: 0.0 average reward score: 0.53173828125 ------------------------------------------------------------------------------------- |E2E latency=3.17s |Gather latency=0.00s (0.00%) |Generate time=2.30s (72.42%) |Training time=0.68s (21.29%) |Others=0.20 (6.29%)|CurSamplesPerSec=2.52 |AvgSamplesPerSec=2.41 epoch: 0|step: 253|ppo_ep: 1|act_loss: 0.05908203125|cri_loss: 0.084228515625|unsuper_loss: 0.0 average reward score: -0.1693115234375 ------------------------------------------------------------------------------------- |E2E latency=3.16s |Gather latency=0.00s (0.00%) |Generate time=2.27s (71.89%) |Training time=0.70s (22.01%) |Others=0.19 (6.09%)|CurSamplesPerSec=2.53 |AvgSamplesPerSec=2.41 epoch: 0|step: 254|ppo_ep: 1|act_loss: 0.4033203125|cri_loss: 0.252685546875|unsuper_loss: 0.0 average reward score: -1.6728515625 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.57%) |Training time=0.64s (19.55%) |Others=0.19 (5.88%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.41 epoch: 0|step: 255|ppo_ep: 1|act_loss: 
0.1490478515625|cri_loss: 0.09100341796875|unsuper_loss: 0.0 average reward score: -0.85302734375 ------------------------------------------------------------------------------------- |E2E latency=3.56s |Gather latency=0.00s (0.00%) |Generate time=2.35s (66.15%) |Training time=0.93s (26.10%) |Others=0.28 (7.76%)|CurSamplesPerSec=2.25 |AvgSamplesPerSec=2.41 epoch: 0|step: 256|ppo_ep: 1|act_loss: 0.146484375|cri_loss: 0.0968017578125|unsuper_loss: 0.0 average reward score: -1.1953125 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.46%) |Training time=0.64s (19.64%) |Others=0.19 (5.91%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 257|ppo_ep: 1|act_loss: 0.0950927734375|cri_loss: 0.09442138671875|unsuper_loss: 0.0 average reward score: -1.91796875 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.11%) |Training time=0.65s (19.91%) |Others=0.19 (5.98%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 258|ppo_ep: 1|act_loss: 0.06243896484375|cri_loss: 0.047332763671875|unsuper_loss: 0.0 average reward score: -1.0166015625 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.47s (74.55%) |Training time=0.65s (19.47%) |Others=0.20 (5.98%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.41 epoch: 0|step: 259|ppo_ep: 1|act_loss: 0.297607421875|cri_loss: 0.183837890625|unsuper_loss: 0.0 average reward score: -0.55126953125 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.27%) |Training time=0.64s (19.90%) |Others=0.19 (5.83%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.41 epoch: 0|step: 260|ppo_ep: 1|act_loss: -0.048828125|cri_loss: 
0.00799560546875|unsuper_loss: 0.0 average reward score: -0.6884765625 ------------------------------------------------------------------------------------- |E2E latency=3.16s |Gather latency=0.00s (0.00%) |Generate time=2.28s (72.29%) |Training time=0.68s (21.52%) |Others=0.20 (6.20%)|CurSamplesPerSec=2.53 |AvgSamplesPerSec=2.41 epoch: 0|step: 261|ppo_ep: 1|act_loss: -0.1109619140625|cri_loss: -0.024871826171875|unsuper_loss: 0.0 average reward score: 2.1171875 ------------------------------------------------------------------------------------- |E2E latency=3.17s |Gather latency=0.00s (0.00%) |Generate time=2.29s (72.31%) |Training time=0.69s (21.75%) |Others=0.19 (5.94%)|CurSamplesPerSec=2.52 |AvgSamplesPerSec=2.41 epoch: 0|step: 262|ppo_ep: 1|act_loss: -0.17626953125|cri_loss: -0.0592041015625|unsuper_loss: 0.0 average reward score: 0.265625 ------------------------------------------------------------------------------------- |E2E latency=3.18s |Gather latency=0.00s (0.00%) |Generate time=2.35s (73.93%) |Training time=0.64s (20.03%) |Others=0.19 (6.04%)|CurSamplesPerSec=2.51 |AvgSamplesPerSec=2.41 epoch: 0|step: 263|ppo_ep: 1|act_loss: 0.1923828125|cri_loss: 0.151611328125|unsuper_loss: 0.0 average reward score: -1.9208984375 ------------------------------------------------------------------------------------- |E2E latency=3.54s |Gather latency=0.00s (0.00%) |Generate time=2.28s (64.42%) |Training time=0.98s (27.81%) |Others=0.28 (7.77%)|CurSamplesPerSec=2.26 |AvgSamplesPerSec=2.41 epoch: 0|step: 264|ppo_ep: 1|act_loss: 0.1231689453125|cri_loss: 0.1292724609375|unsuper_loss: 0.0 average reward score: -0.5830078125 ------------------------------------------------------------------------------------- |E2E latency=3.18s |Gather latency=0.00s (0.00%) |Generate time=2.29s (72.03%) |Training time=0.70s (21.97%) |Others=0.19 (6.00%)|CurSamplesPerSec=2.52 |AvgSamplesPerSec=2.41 epoch: 0|step: 265|ppo_ep: 1|act_loss: -0.097900390625|cri_loss: 
-0.017852783203125|unsuper_loss: 0.0 average reward score: 0.529296875 ------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.28s (71.28%) |Training time=0.73s (22.73%) |Others=0.19 (5.99%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.41 epoch: 0|step: 266|ppo_ep: 1|act_loss: -0.062744140625|cri_loss: -0.013153076171875|unsuper_loss: 0.0 average reward score: -0.363525390625 ------------------------------------------------------------------------------------- |E2E latency=3.18s |Gather latency=0.00s (0.00%) |Generate time=2.27s (71.38%) |Training time=0.72s (22.61%) |Others=0.19 (6.01%)|CurSamplesPerSec=2.51 |AvgSamplesPerSec=2.41 epoch: 0|step: 267|ppo_ep: 1|act_loss: 0.23193359375|cri_loss: 0.13720703125|unsuper_loss: 0.0 average reward score: -2.2578125 ------------------------------------------------------------------------------------- |E2E latency=4.76s |Gather latency=0.00s (0.00%) |Generate time=2.30s (48.39%) |Training time=1.20s (25.22%) |Others=1.26 (26.39%)|CurSamplesPerSec=1.68 |AvgSamplesPerSec=2.41 epoch: 0|step: 268|ppo_ep: 1|act_loss: 0.1251220703125|cri_loss: 0.0804443359375|unsuper_loss: 0.0 average reward score: -0.70654296875 ------------------------------------------------------------------------------------- |E2E latency=3.47s |Gather latency=0.00s (0.00%) |Generate time=2.53s (73.01%) |Training time=0.64s (18.43%) |Others=0.30 (8.57%)|CurSamplesPerSec=2.30 |AvgSamplesPerSec=2.41 epoch: 0|step: 269|ppo_ep: 1|act_loss: -0.01373291015625|cri_loss: 0.00677490234375|unsuper_loss: 0.0 average reward score: -0.257568359375 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.71%) |Training time=0.64s (19.44%) |Others=0.19 (5.85%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.41 epoch: 0|step: 270|ppo_ep: 1|act_loss: 0.2486572265625|cri_loss: 
0.153564453125|unsuper_loss: 0.0 average reward score: -0.97021484375 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.37s (73.94%) |Training time=0.64s (19.94%) |Others=0.20 (6.13%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.41 epoch: 0|step: 271|ppo_ep: 1|act_loss: -0.038818359375|cri_loss: 0.0458984375|unsuper_loss: 0.0 average reward score: 1.62890625 ------------------------------------------------------------------------------------- |E2E latency=3.57s |Gather latency=0.00s (0.00%) |Generate time=2.36s (66.17%) |Training time=0.93s (25.94%) |Others=0.28 (7.88%)|CurSamplesPerSec=2.24 |AvgSamplesPerSec=2.41 epoch: 0|step: 272|ppo_ep: 1|act_loss: -0.18408203125|cri_loss: -0.040283203125|unsuper_loss: 0.0 average reward score: -0.55859375 ------------------------------------------------------------------------------------- |E2E latency=3.19s |Gather latency=0.00s (0.00%) |Generate time=2.36s (73.85%) |Training time=0.64s (20.02%) |Others=0.20 (6.13%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.41 epoch: 0|step: 273|ppo_ep: 1|act_loss: 0.364013671875|cri_loss: 0.227783203125|unsuper_loss: 0.0 average reward score: -2.138671875 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.27%) |Training time=0.64s (19.75%) |Others=0.19 (5.98%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.41 epoch: 0|step: 274|ppo_ep: 1|act_loss: -0.1171875|cri_loss: 0.015625|unsuper_loss: 0.0 average reward score: 1.16015625 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.33s (72.59%) |Training time=0.69s (21.32%) |Others=0.20 (6.09%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.41 epoch: 0|step: 275|ppo_ep: 1|act_loss: -0.0606689453125|cri_loss: 6.103515625e-05|unsuper_loss: 0.0 average 
reward score: -0.5625 ------------------------------------------------------------------------------------- |E2E latency=3.19s |Gather latency=0.00s (0.00%) |Generate time=2.34s (73.46%) |Training time=0.66s (20.53%) |Others=0.19 (6.01%)|CurSamplesPerSec=2.51 |AvgSamplesPerSec=2.41 epoch: 0|step: 276|ppo_ep: 1|act_loss: 0.214111328125|cri_loss: 0.146240234375|unsuper_loss: 0.0 average reward score: 1.857421875 ------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.35s (73.38%) |Training time=0.65s (20.28%) |Others=0.20 (6.34%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.41 epoch: 0|step: 277|ppo_ep: 1|act_loss: 0.34228515625|cri_loss: 0.24560546875|unsuper_loss: 0.0 average reward score: 0.17529296875 ------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.32s (72.63%) |Training time=0.68s (21.35%) |Others=0.19 (6.02%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.41 epoch: 0|step: 278|ppo_ep: 1|act_loss: -0.108154296875|cri_loss: -0.02545166015625|unsuper_loss: 0.0 average reward score: -0.712890625 ------------------------------------------------------------------------------------- |E2E latency=3.17s |Gather latency=0.00s (0.00%) |Generate time=2.32s (73.07%) |Training time=0.65s (20.67%) |Others=0.20 (6.26%)|CurSamplesPerSec=2.52 |AvgSamplesPerSec=2.41 epoch: 0|step: 279|ppo_ep: 1|act_loss: 0.08599853515625|cri_loss: 0.09912109375|unsuper_loss: 0.0 average reward score: 1.787109375 ------------------------------------------------------------------------------------- |E2E latency=3.64s |Gather latency=0.00s (0.00%) |Generate time=2.44s (66.96%) |Training time=0.93s (25.54%) |Others=0.27 (7.51%)|CurSamplesPerSec=2.20 |AvgSamplesPerSec=2.41 epoch: 0|step: 280|ppo_ep: 1|act_loss: 0.0406494140625|cri_loss: 0.06243896484375|unsuper_loss: 0.0 average reward score: -0.124755859375 
------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.39s (73.91%) |Training time=0.65s (20.06%) |Others=0.19 (6.03%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.41 epoch: 0|step: 281|ppo_ep: 1|act_loss: 0.0341796875|cri_loss: 0.10321044921875|unsuper_loss: 0.0 average reward score: 0.1544189453125 ------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.34s (73.17%) |Training time=0.66s (20.55%) |Others=0.20 (6.28%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.41 epoch: 0|step: 282|ppo_ep: 1|act_loss: 0.11383056640625|cri_loss: 0.11456298828125|unsuper_loss: 0.0 average reward score: 0.3408203125 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.47s (74.49%) |Training time=0.65s (19.52%) |Others=0.20 (5.99%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.41 epoch: 0|step: 283|ppo_ep: 1|act_loss: 0.31591796875|cri_loss: 0.2193603515625|unsuper_loss: 0.0 average reward score: 2.029296875 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.59%) |Training time=0.64s (19.62%) |Others=0.19 (5.79%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.41 epoch: 0|step: 284|ppo_ep: 1|act_loss: 0.12548828125|cri_loss: 0.11492919921875|unsuper_loss: 0.0 average reward score: 0.1802978515625 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.43s (73.74%) |Training time=0.66s (20.12%) |Others=0.20 (6.13%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.41 epoch: 0|step: 285|ppo_ep: 1|act_loss: 0.01409912109375|cri_loss: 0.06732177734375|unsuper_loss: 0.0 average reward score: 0.7216796875 
------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.47%) |Training time=0.64s (19.50%) |Others=0.20 (6.03%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.41 epoch: 0|step: 286|ppo_ep: 1|act_loss: 0.29296875|cri_loss: 0.15673828125|unsuper_loss: 0.0 average reward score: -1.50390625 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.38%) |Training time=0.64s (19.72%) |Others=0.19 (5.90%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 287|ppo_ep: 1|act_loss: -0.05621337890625|cri_loss: -0.00244140625|unsuper_loss: 0.0 average reward score: -2.265625 ------------------------------------------------------------------------------------- |E2E latency=3.65s |Gather latency=0.00s (0.00%) |Generate time=2.44s (66.82%) |Training time=0.93s (25.43%) |Others=0.28 (7.76%)|CurSamplesPerSec=2.19 |AvgSamplesPerSec=2.41 epoch: 0|step: 288|ppo_ep: 1|act_loss: 0.45458984375|cri_loss: 0.284423828125|unsuper_loss: 0.0 average reward score: 2.109375 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.80%) |Training time=0.64s (19.37%) |Others=0.19 (5.82%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.41 epoch: 0|step: 289|ppo_ep: 1|act_loss: -0.1297607421875|cri_loss: 0.0084228515625|unsuper_loss: 0.0 average reward score: -1.421875 ------------------------------------------------------------------------------------- |E2E latency=3.19s |Gather latency=0.00s (0.00%) |Generate time=2.32s (72.83%) |Training time=0.67s (21.08%) |Others=0.19 (6.09%)|CurSamplesPerSec=2.51 |AvgSamplesPerSec=2.41 epoch: 0|step: 290|ppo_ep: 1|act_loss: 0.06597900390625|cri_loss: 0.09429931640625|unsuper_loss: 0.0 average reward score: 0.0888671875 
------------------------------------------------------------------------------------- |E2E latency=3.18s |Gather latency=0.00s (0.00%) |Generate time=2.31s (72.83%) |Training time=0.67s (21.13%) |Others=0.19 (6.04%)|CurSamplesPerSec=2.52 |AvgSamplesPerSec=2.41 epoch: 0|step: 291|ppo_ep: 1|act_loss: 0.154541015625|cri_loss: 0.1092529296875|unsuper_loss: 0.0 average reward score: -0.849609375 ------------------------------------------------------------------------------------- |E2E latency=3.19s |Gather latency=0.00s (0.00%) |Generate time=2.31s (72.56%) |Training time=0.68s (21.37%) |Others=0.19 (6.07%)|CurSamplesPerSec=2.51 |AvgSamplesPerSec=2.41 epoch: 0|step: 292|ppo_ep: 1|act_loss: 0.1043701171875|cri_loss: 0.1141357421875|unsuper_loss: 0.0 average reward score: 0.9052734375 ------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.36s (73.85%) |Training time=0.64s (20.00%) |Others=0.20 (6.15%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.41 epoch: 0|step: 293|ppo_ep: 1|act_loss: -0.194580078125|cri_loss: -0.0272216796875|unsuper_loss: 0.0 average reward score: 1.5576171875 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.35s (73.02%) |Training time=0.65s (20.25%) |Others=0.22 (6.73%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.41 epoch: 0|step: 294|ppo_ep: 1|act_loss: -0.09912109375|cri_loss: -0.031341552734375|unsuper_loss: 0.0 average reward score: -1.203125 ------------------------------------------------------------------------------------- |E2E latency=3.19s |Gather latency=0.00s (0.00%) |Generate time=2.35s (73.69%) |Training time=0.65s (20.28%) |Others=0.19 (6.03%)|CurSamplesPerSec=2.51 |AvgSamplesPerSec=2.41 epoch: 0|step: 295|ppo_ep: 1|act_loss: -0.1529541015625|cri_loss: -0.0006103515625|unsuper_loss: 0.0 average reward score: -0.39697265625 
------------------------------------------------------------------------------------- |E2E latency=3.56s |Gather latency=0.00s (0.00%) |Generate time=2.34s (65.68%) |Training time=0.95s (26.58%) |Others=0.28 (7.74%)|CurSamplesPerSec=2.25 |AvgSamplesPerSec=2.41 epoch: 0|step: 296|ppo_ep: 1|act_loss: 0.058746337890625|cri_loss: 0.058868408203125|unsuper_loss: 0.0 average reward score: -2.001953125 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.47s (74.74%) |Training time=0.64s (19.25%) |Others=0.20 (6.00%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.41 epoch: 0|step: 297|ppo_ep: 1|act_loss: 0.00347900390625|cri_loss: 0.059478759765625|unsuper_loss: 0.0 average reward score: 1.037109375 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.04%) |Training time=0.65s (19.93%) |Others=0.20 (6.03%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.41 epoch: 0|step: 298|ppo_ep: 1|act_loss: 0.095703125|cri_loss: 0.1549072265625|unsuper_loss: 0.0 average reward score: 0.081298828125 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.38%) |Training time=0.64s (19.72%) |Others=0.19 (5.89%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 299|ppo_ep: 1|act_loss: 0.018829345703125|cri_loss: 0.0740966796875|unsuper_loss: 0.0 average reward score: -0.8154296875 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.34%) |Training time=0.64s (19.76%) |Others=0.19 (5.91%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 300|ppo_ep: 1|act_loss: -0.04132080078125|cri_loss: 0.01776123046875|unsuper_loss: 0.0 average reward score: -0.8994140625 
------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.16%) |Training time=0.65s (19.85%) |Others=0.20 (5.99%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.41 epoch: 0|step: 301|ppo_ep: 1|act_loss: -0.0982666015625|cri_loss: 0.01593017578125|unsuper_loss: 0.0 average reward score: 0.5400390625 ------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.31s (72.20%) |Training time=0.70s (21.89%) |Others=0.19 (5.92%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.41 epoch: 0|step: 302|ppo_ep: 1|act_loss: -0.4736328125|cri_loss: -0.1510009765625|unsuper_loss: 0.0 average reward score: -0.0908203125 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.10%) |Training time=0.64s (19.63%) |Others=0.20 (6.26%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 303|ppo_ep: 1|act_loss: 0.259033203125|cri_loss: 0.1495361328125|unsuper_loss: 0.0 average reward score: -2.185546875 ------------------------------------------------------------------------------------- |E2E latency=3.72s |Gather latency=0.00s (0.00%) |Generate time=2.51s (67.54%) |Training time=0.93s (24.96%) |Others=0.28 (7.50%)|CurSamplesPerSec=2.15 |AvgSamplesPerSec=2.41 epoch: 0|step: 304|ppo_ep: 1|act_loss: 0.303955078125|cri_loss: 0.216064453125|unsuper_loss: 0.0 average reward score: 0.144775390625 ------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.43s (73.53%) |Training time=0.68s (20.65%) |Others=0.19 (5.82%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.41 epoch: 0|step: 305|ppo_ep: 1|act_loss: 0.2154541015625|cri_loss: 0.1768798828125|unsuper_loss: 0.0 average reward score: -0.79150390625 
------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.43s (73.95%) |Training time=0.66s (19.97%) |Others=0.20 (6.08%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.41 epoch: 0|step: 306|ppo_ep: 1|act_loss: 0.33349609375|cri_loss: 0.2000732421875|unsuper_loss: 0.0 average reward score: 0.7099609375 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.38s (73.92%) |Training time=0.64s (20.00%) |Others=0.20 (6.07%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.41 epoch: 0|step: 307|ppo_ep: 1|act_loss: 0.0596923828125|cri_loss: 0.1370849609375|unsuper_loss: 0.0 average reward score: 1.8291015625 ------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.35s (73.53%) |Training time=0.65s (20.36%) |Others=0.20 (6.11%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.41 epoch: 0|step: 308|ppo_ep: 1|act_loss: 0.374755859375|cri_loss: 0.267333984375|unsuper_loss: 0.0 average reward score: -0.3623046875 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.38s (73.99%) |Training time=0.64s (19.89%) |Others=0.20 (6.12%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.41 epoch: 0|step: 309|ppo_ep: 1|act_loss: -0.248046875|cri_loss: -0.0101318359375|unsuper_loss: 0.0 average reward score: 1.0244140625 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.29s (71.45%) |Training time=0.72s (22.39%) |Others=0.20 (6.17%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.41 epoch: 0|step: 310|ppo_ep: 1|act_loss: -0.053466796875|cri_loss: 0.06817626953125|unsuper_loss: 0.0 average reward score: -0.49560546875 
------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.38s (73.55%) |Training time=0.65s (20.23%) |Others=0.20 (6.21%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 311|ppo_ep: 1|act_loss: 0.36181640625|cri_loss: 0.20654296875|unsuper_loss: 0.0 average reward score: -1.228515625 ------------------------------------------------------------------------------------- |E2E latency=3.59s |Gather latency=0.00s (0.00%) |Generate time=2.34s (65.14%) |Training time=0.95s (26.59%) |Others=0.30 (8.26%)|CurSamplesPerSec=2.23 |AvgSamplesPerSec=2.41 epoch: 0|step: 312|ppo_ep: 1|act_loss: 0.2822265625|cri_loss: 0.1915283203125|unsuper_loss: 0.0 average reward score: -0.372314453125 ------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.76%) |Training time=0.64s (19.44%) |Others=0.19 (5.80%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.41 epoch: 0|step: 313|ppo_ep: 1|act_loss: -0.1455078125|cri_loss: -0.0123291015625|unsuper_loss: 0.0 average reward score: -0.2119140625 ------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.36s (73.67%) |Training time=0.65s (20.21%) |Others=0.20 (6.12%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.41 epoch: 0|step: 314|ppo_ep: 1|act_loss: 0.27685546875|cri_loss: 0.2137451171875|unsuper_loss: 0.0 average reward score: -0.077392578125 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.37%) |Training time=0.65s (19.65%) |Others=0.20 (5.98%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.41 epoch: 0|step: 315|ppo_ep: 1|act_loss: -0.2254638671875|cri_loss: -0.0390625|unsuper_loss: 0.0 average reward score: 0.5 
------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.33%) |Training time=0.65s (19.73%) |Others=0.19 (5.95%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.41 epoch: 0|step: 316|ppo_ep: 1|act_loss: 0.2001953125|cri_loss: 0.178466796875|unsuper_loss: 0.0 average reward score: -0.06195068359375 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.43s (73.02%) |Training time=0.68s (20.50%) |Others=0.22 (6.47%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.41 epoch: 0|step: 317|ppo_ep: 1|act_loss: 0.300537109375|cri_loss: 0.1773681640625|unsuper_loss: 0.0 average reward score: -0.10302734375 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.13%) |Training time=0.65s (19.86%) |Others=0.20 (6.01%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 318|ppo_ep: 1|act_loss: 0.31298828125|cri_loss: 0.237548828125|unsuper_loss: 0.0 average reward score: -0.33544921875 ------------------------------------------------------------------------------------- |E2E latency=3.16s |Gather latency=0.00s (0.00%) |Generate time=2.31s (72.93%) |Training time=0.66s (20.73%) |Others=0.20 (6.34%)|CurSamplesPerSec=2.53 |AvgSamplesPerSec=2.41 [2023-04-24 14:05:23,039] [INFO] [logging.py:96:log_dist] [Rank 0] step=40, skipped=4, lr=[3.474e-06, 3.474e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-04-24 14:05:23,282] [INFO] [timer.py:199:stop] epoch=0/micro_step=320/global_step=40, RunningAvgSamplesPerSec=15.406867733677492, CurrSamplesPerSec=15.730976295776427, MemAllocated=20.44GB, MaxMemAllocated=31.45GB [2023-04-24 14:05:23,498] [INFO] [logging.py:96:log_dist] [Rank 0] step=40, skipped=3, lr=[1.85e-06, 1.85e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 319|ppo_ep: 1|act_loss: 0.302001953125|cri_loss: 0.2236328125|unsuper_loss: 0.0 average reward score: 2.15625
------------------------------------------------------------------------------------- |E2E latency=3.64s |Gather latency=0.00s (0.00%) |Generate time=2.41s (66.40%) |Training time=0.93s (25.59%) |Others=0.29 (8.01%)|CurSamplesPerSec=2.20 |AvgSamplesPerSec=2.41 epoch: 0|step: 320|ppo_ep: 1|act_loss: 0.2005615234375|cri_loss: 0.164306640625|unsuper_loss: 0.0 average reward score: 1.482421875 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.17%) |Training time=0.64s (19.90%) |Others=0.19 (5.92%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 321|ppo_ep: 1|act_loss: 0.703125|cri_loss: 0.419921875|unsuper_loss: 0.0 average reward score: -0.8515625 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.39s (72.98%) |Training time=0.69s (20.98%) |Others=0.20 (6.04%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.41 epoch: 0|step: 322|ppo_ep: 1|act_loss: 0.50634765625|cri_loss: 0.30126953125|unsuper_loss: 0.0 average reward score: 0.24462890625 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.47s (74.67%) |Training time=0.64s (19.35%) |Others=0.20 (5.98%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.41 epoch: 0|step: 323|ppo_ep: 1|act_loss: 0.38623046875|cri_loss: 0.263427734375|unsuper_loss: 0.0 average reward score: 0.2381591796875 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.38s (72.41%) |Training time=0.71s (21.59%) |Others=0.20 (6.00%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.41 epoch: 0|step: 324|ppo_ep: 1|act_loss: 0.328857421875|cri_loss: 0.205322265625|unsuper_loss: 0.0 average reward score: -0.76171875
------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.39%) |Training time=0.64s (19.57%) |Others=0.20 (6.04%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.41 epoch: 0|step: 325|ppo_ep: 1|act_loss: 0.1793212890625|cri_loss: 0.17578125|unsuper_loss: 0.0 average reward score: 1.146484375 ------------------------------------------------------------------------------------- |E2E latency=3.40s |Gather latency=0.00s (0.00%) |Generate time=2.44s (71.90%) |Training time=0.74s (21.79%) |Others=0.21 (6.31%)|CurSamplesPerSec=2.35 |AvgSamplesPerSec=2.41 epoch: 0|step: 326|ppo_ep: 1|act_loss: 0.439453125|cri_loss: 0.267333984375|unsuper_loss: 0.0 average reward score: 0.66552734375 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.44s (73.38%) |Training time=0.68s (20.50%) |Others=0.20 (6.13%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.41 epoch: 0|step: 327|ppo_ep: 1|act_loss: 0.483154296875|cri_loss: 0.2958984375|unsuper_loss: 0.0 average reward score: -0.359130859375 ------------------------------------------------------------------------------------- |E2E latency=3.59s |Gather latency=0.00s (0.00%) |Generate time=2.39s (66.50%) |Training time=0.92s (25.74%) |Others=0.28 (7.76%)|CurSamplesPerSec=2.23 |AvgSamplesPerSec=2.41 epoch: 0|step: 328|ppo_ep: 1|act_loss: 0.4951171875|cri_loss: 0.322265625|unsuper_loss: 0.0 average reward score: -0.46728515625 ------------------------------------------------------------------------------------- |E2E latency=3.19s |Gather latency=0.00s (0.00%) |Generate time=2.35s (73.82%) |Training time=0.64s (20.06%) |Others=0.20 (6.12%)|CurSamplesPerSec=2.51 |AvgSamplesPerSec=2.41 epoch: 0|step: 329|ppo_ep: 1|act_loss: 0.27734375|cri_loss: 0.2142333984375|unsuper_loss: 0.0 average reward score: -0.05413818359375
------------------------------------------------------------------------------------- |E2E latency=3.18s |Gather latency=0.00s (0.00%) |Generate time=2.33s (73.32%) |Training time=0.65s (20.53%) |Others=0.20 (6.14%)|CurSamplesPerSec=2.52 |AvgSamplesPerSec=2.41 epoch: 0|step: 330|ppo_ep: 1|act_loss: 0.347412109375|cri_loss: 0.2451171875|unsuper_loss: 0.0 average reward score: 0.367919921875 ------------------------------------------------------------------------------------- |E2E latency=3.18s |Gather latency=0.00s (0.00%) |Generate time=2.32s (73.02%) |Training time=0.66s (20.83%) |Others=0.20 (6.15%)|CurSamplesPerSec=2.52 |AvgSamplesPerSec=2.41 epoch: 0|step: 331|ppo_ep: 1|act_loss: 0.3515625|cri_loss: 0.346435546875|unsuper_loss: 0.0 average reward score: 0.461669921875 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.38s (73.85%) |Training time=0.65s (20.04%) |Others=0.20 (6.11%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.41 epoch: 0|step: 332|ppo_ep: 1|act_loss: 0.36767578125|cri_loss: 0.269775390625|unsuper_loss: 0.0 average reward score: 0.99658203125 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.47%) |Training time=0.64s (19.52%) |Others=0.20 (6.01%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.41 epoch: 0|step: 333|ppo_ep: 1|act_loss: 0.223876953125|cri_loss: 0.229736328125|unsuper_loss: 0.0 average reward score: 1.537109375 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.39s (73.80%) |Training time=0.65s (20.21%) |Others=0.19 (6.00%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 334|ppo_ep: 1|act_loss: 0.380859375|cri_loss: 0.25439453125|unsuper_loss: 0.0 average reward score: 0.31396484375
------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (73.99%) |Training time=0.65s (19.92%) |Others=0.20 (6.10%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 335|ppo_ep: 1|act_loss: 0.361328125|cri_loss: 0.2093505859375|unsuper_loss: 0.0 average reward score: -1.37890625 ------------------------------------------------------------------------------------- |E2E latency=3.62s |Gather latency=0.00s (0.00%) |Generate time=2.38s (65.57%) |Training time=0.95s (26.17%) |Others=0.30 (8.26%)|CurSamplesPerSec=2.21 |AvgSamplesPerSec=2.41 epoch: 0|step: 336|ppo_ep: 1|act_loss: -0.01611328125|cri_loss: 0.150634765625|unsuper_loss: 0.0 average reward score: 0.2421875 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.47%) |Training time=0.64s (19.45%) |Others=0.20 (6.08%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.41 epoch: 0|step: 337|ppo_ep: 1|act_loss: 0.405029296875|cri_loss: 0.34765625|unsuper_loss: 0.0 average reward score: 1.61328125 ------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.16%) |Training time=0.65s (19.79%) |Others=0.20 (6.05%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.41 epoch: 0|step: 338|ppo_ep: 1|act_loss: 0.17578125|cri_loss: 0.156982421875|unsuper_loss: 0.0 average reward score: 0.94921875 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.43s (73.85%) |Training time=0.66s (19.94%) |Others=0.20 (6.21%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.41 epoch: 0|step: 339|ppo_ep: 1|act_loss: -0.056884765625|cri_loss: 0.0968017578125|unsuper_loss: 0.0 average reward score: 1.94140625 
------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.02%) |Training time=0.65s (19.95%) |Others=0.20 (6.02%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.41 epoch: 0|step: 340|ppo_ep: 1|act_loss: 0.15087890625|cri_loss: 0.13525390625|unsuper_loss: 0.0 average reward score: 1.3046875 ------------------------------------------------------------------------------------- |E2E latency=3.43s |Gather latency=0.00s (0.00%) |Generate time=2.58s (75.04%) |Training time=0.65s (18.95%) |Others=0.21 (6.01%)|CurSamplesPerSec=2.33 |AvgSamplesPerSec=2.41 epoch: 0|step: 341|ppo_ep: 1|act_loss: 0.19384765625|cri_loss: 0.1690673828125|unsuper_loss: 0.0 average reward score: 3.18359375 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.45s (73.73%) |Training time=0.68s (20.33%) |Others=0.20 (5.95%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.41 epoch: 0|step: 342|ppo_ep: 1|act_loss: 0.3671875|cri_loss: 0.21533203125|unsuper_loss: 0.0 average reward score: 0.1953125 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.12%) |Training time=0.65s (19.43%) |Others=0.21 (6.45%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.41 epoch: 0|step: 343|ppo_ep: 1|act_loss: 0.421875|cri_loss: 0.27294921875|unsuper_loss: 0.0 average reward score: -1.94921875 ------------------------------------------------------------------------------------- |E2E latency=3.66s |Gather latency=0.00s (0.00%) |Generate time=2.44s (66.78%) |Training time=0.93s (25.40%) |Others=0.29 (7.82%)|CurSamplesPerSec=2.19 |AvgSamplesPerSec=2.41 epoch: 0|step: 344|ppo_ep: 1|act_loss: 0.194091796875|cri_loss: 0.1856689453125|unsuper_loss: 0.0 average reward score: 0.175537109375 
------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.06%) |Training time=0.65s (19.95%) |Others=0.19 (5.99%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 345|ppo_ep: 1|act_loss: -0.474609375|cri_loss: -0.1151123046875|unsuper_loss: 0.0 average reward score: 0.41455078125 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.42s (73.92%) |Training time=0.66s (20.00%) |Others=0.20 (6.07%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.41 epoch: 0|step: 346|ppo_ep: 1|act_loss: 0.055633544921875|cri_loss: 0.085205078125|unsuper_loss: 0.0 average reward score: 1.5205078125 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.40s (72.59%) |Training time=0.70s (21.17%) |Others=0.21 (6.23%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.41 epoch: 0|step: 347|ppo_ep: 1|act_loss: 0.1319580078125|cri_loss: 0.148193359375|unsuper_loss: 0.0 average reward score: 1.4755859375 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.37s (73.95%) |Training time=0.64s (19.98%) |Others=0.19 (6.07%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.41 epoch: 0|step: 348|ppo_ep: 1|act_loss: 0.21142578125|cri_loss: 0.140625|unsuper_loss: 0.0 average reward score: 2.01171875 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.40s (72.14%) |Training time=0.72s (21.81%) |Others=0.20 (6.05%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.41 epoch: 0|step: 349|ppo_ep: 1|act_loss: 0.0401611328125|cri_loss: 0.0888671875|unsuper_loss: 0.0 average reward score: 0.6845703125 
------------------------------------------------------------------------------------- |E2E latency=3.33s |Gather latency=0.00s (0.00%) |Generate time=2.39s (71.66%) |Training time=0.75s (22.48%) |Others=0.20 (5.86%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.41 epoch: 0|step: 350|ppo_ep: 1|act_loss: -0.08709716796875|cri_loss: 0.015380859375|unsuper_loss: 0.0 average reward score: -0.26513671875 ------------------------------------------------------------------------------------- |E2E latency=3.34s |Gather latency=0.00s (0.00%) |Generate time=2.42s (72.49%) |Training time=0.71s (21.39%) |Others=0.20 (6.12%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.41 epoch: 0|step: 351|ppo_ep: 1|act_loss: 0.0577392578125|cri_loss: 0.11431884765625|unsuper_loss: 0.0 average reward score: 1.7958984375 ------------------------------------------------------------------------------------- |E2E latency=3.75s |Gather latency=0.00s (0.00%) |Generate time=2.52s (67.32%) |Training time=0.95s (25.27%) |Others=0.28 (7.41%)|CurSamplesPerSec=2.13 |AvgSamplesPerSec=2.41 epoch: 0|step: 352|ppo_ep: 1|act_loss: 0.48046875|cri_loss: 0.342529296875|unsuper_loss: 0.0 average reward score: -0.0325927734375 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.30s (69.40%) |Training time=0.82s (24.64%) |Others=0.20 (5.96%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.41 epoch: 0|step: 353|ppo_ep: 1|act_loss: 0.043212890625|cri_loss: 0.11376953125|unsuper_loss: 0.0 average reward score: 0.1806640625 ------------------------------------------------------------------------------------- |E2E latency=3.35s |Gather latency=0.00s (0.00%) |Generate time=2.42s (72.25%) |Training time=0.72s (21.43%) |Others=0.21 (6.32%)|CurSamplesPerSec=2.39 |AvgSamplesPerSec=2.41 epoch: 0|step: 354|ppo_ep: 1|act_loss: 0.036834716796875|cri_loss: 0.0848388671875|unsuper_loss: 0.0 average reward score: 0.2685546875 
------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.39%) |Training time=0.64s (19.61%) |Others=0.20 (6.00%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.41 epoch: 0|step: 355|ppo_ep: 1|act_loss: 0.0517578125|cri_loss: 0.11767578125|unsuper_loss: 0.0 average reward score: 2.98828125 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.47s (74.26%) |Training time=0.65s (19.50%) |Others=0.21 (6.24%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.41 epoch: 0|step: 356|ppo_ep: 1|act_loss: -0.1380615234375|cri_loss: -0.01470947265625|unsuper_loss: 0.0 average reward score: 1.23828125 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.33%) |Training time=0.65s (19.70%) |Others=0.20 (5.97%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.41 epoch: 0|step: 357|ppo_ep: 1|act_loss: -0.06201171875|cri_loss: 0.02886962890625|unsuper_loss: 0.0 average reward score: 2.38671875 ------------------------------------------------------------------------------------- |E2E latency=3.35s |Gather latency=0.00s (0.00%) |Generate time=2.51s (74.89%) |Training time=0.64s (19.22%) |Others=0.20 (5.89%)|CurSamplesPerSec=2.39 |AvgSamplesPerSec=2.41 epoch: 0|step: 358|ppo_ep: 1|act_loss: 0.42333984375|cri_loss: 0.321533203125|unsuper_loss: 0.0 average reward score: 0.521484375 ------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.31s (72.38%) |Training time=0.69s (21.48%) |Others=0.20 (6.14%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.41 epoch: 0|step: 359|ppo_ep: 1|act_loss: 0.00830078125|cri_loss: 0.10076904296875|unsuper_loss: 0.0 average reward score: 0.640625 
------------------------------------------------------------------------------------- |E2E latency=3.58s |Gather latency=0.00s (0.00%) |Generate time=2.32s (64.82%) |Training time=0.98s (27.27%) |Others=0.28 (7.92%)|CurSamplesPerSec=2.23 |AvgSamplesPerSec=2.41 epoch: 0|step: 360|ppo_ep: 1|act_loss: 0.27001953125|cri_loss: 0.2496337890625|unsuper_loss: 0.0 average reward score: 2.1328125 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.42s (73.74%) |Training time=0.64s (19.51%) |Others=0.22 (6.76%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.41 epoch: 0|step: 361|ppo_ep: 1|act_loss: 0.157958984375|cri_loss: 0.12298583984375|unsuper_loss: 0.0 average reward score: 2.693359375 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.20%) |Training time=0.65s (19.92%) |Others=0.19 (5.88%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 362|ppo_ep: 1|act_loss: 0.03564453125|cri_loss: 0.10443115234375|unsuper_loss: 0.0 average reward score: 1.537109375 ------------------------------------------------------------------------------------- |E2E latency=3.18s |Gather latency=0.00s (0.00%) |Generate time=2.32s (72.99%) |Training time=0.66s (20.88%) |Others=0.19 (6.13%)|CurSamplesPerSec=2.51 |AvgSamplesPerSec=2.41 epoch: 0|step: 363|ppo_ep: 1|act_loss: 0.34912109375|cri_loss: 0.244384765625|unsuper_loss: 0.0 average reward score: 2.0390625 ------------------------------------------------------------------------------------- |E2E latency=3.18s |Gather latency=0.00s (0.00%) |Generate time=2.34s (73.52%) |Training time=0.65s (20.39%) |Others=0.19 (6.09%)|CurSamplesPerSec=2.51 |AvgSamplesPerSec=2.41 epoch: 0|step: 364|ppo_ep: 1|act_loss: -0.2235107421875|cri_loss: -0.0562744140625|unsuper_loss: 0.0 average reward score: 2.705078125 
------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.35%) |Training time=0.65s (19.65%) |Others=0.20 (6.00%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.41 epoch: 0|step: 365|ppo_ep: 1|act_loss: -0.0640869140625|cri_loss: 0.03948974609375|unsuper_loss: 0.0 average reward score: 1.65625 ------------------------------------------------------------------------------------- |E2E latency=3.16s |Gather latency=0.00s (0.00%) |Generate time=2.32s (73.44%) |Training time=0.65s (20.40%) |Others=0.19 (6.16%)|CurSamplesPerSec=2.53 |AvgSamplesPerSec=2.41 epoch: 0|step: 366|ppo_ep: 1|act_loss: 0.2861328125|cri_loss: 0.1968994140625|unsuper_loss: 0.0 average reward score: 1.0302734375 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.35s (71.56%) |Training time=0.74s (22.58%) |Others=0.19 (5.87%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.41 epoch: 0|step: 367|ppo_ep: 1|act_loss: 0.13427734375|cri_loss: 0.1380615234375|unsuper_loss: 0.0 average reward score: -0.3291015625 ------------------------------------------------------------------------------------- |E2E latency=3.60s |Gather latency=0.00s (0.00%) |Generate time=2.35s (65.24%) |Training time=0.97s (27.06%) |Others=0.28 (7.71%)|CurSamplesPerSec=2.22 |AvgSamplesPerSec=2.41 epoch: 0|step: 368|ppo_ep: 1|act_loss: 0.26953125|cri_loss: 0.230712890625|unsuper_loss: 0.0 average reward score: 1.158203125 ------------------------------------------------------------------------------------- |E2E latency=3.19s |Gather latency=0.00s (0.00%) |Generate time=2.33s (72.97%) |Training time=0.67s (21.06%) |Others=0.19 (5.97%)|CurSamplesPerSec=2.51 |AvgSamplesPerSec=2.41 epoch: 0|step: 369|ppo_ep: 1|act_loss: 0.07525634765625|cri_loss: 0.09942626953125|unsuper_loss: 0.0 average reward score: 1.376953125 
------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.16%) |Training time=0.64s (19.84%) |Others=0.19 (6.00%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 370|ppo_ep: 1|act_loss: 0.0352783203125|cri_loss: 0.08038330078125|unsuper_loss: 0.0 average reward score: 1.451171875 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.22%) |Training time=0.64s (19.69%) |Others=0.20 (6.09%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 371|ppo_ep: 1|act_loss: -0.0491943359375|cri_loss: 0.02215576171875|unsuper_loss: 0.0 average reward score: 2.4921875 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.50%) |Training time=0.64s (19.65%) |Others=0.19 (5.85%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.41 epoch: 0|step: 372|ppo_ep: 1|act_loss: 0.222412109375|cri_loss: 0.2230224609375|unsuper_loss: 0.0 average reward score: 3.201171875 ------------------------------------------------------------------------------------- |E2E latency=3.35s |Gather latency=0.00s (0.00%) |Generate time=2.51s (75.13%) |Training time=0.64s (19.16%) |Others=0.19 (5.71%)|CurSamplesPerSec=2.39 |AvgSamplesPerSec=2.41 epoch: 0|step: 373|ppo_ep: 1|act_loss: 0.564453125|cri_loss: 0.341064453125|unsuper_loss: 0.0 average reward score: 2.7734375 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.59%) |Training time=0.64s (19.62%) |Others=0.19 (5.79%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.41 epoch: 0|step: 374|ppo_ep: 1|act_loss: -0.010986328125|cri_loss: 0.107666015625|unsuper_loss: 0.0 average reward score: 2.578125 
------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.47s (74.50%) |Training time=0.64s (19.44%) |Others=0.20 (6.06%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.41 epoch: 0|step: 375|ppo_ep: 1|act_loss: 0.07794189453125|cri_loss: 0.147216796875|unsuper_loss: 0.0 average reward score: 4.23828125 ------------------------------------------------------------------------------------- |E2E latency=3.69s |Gather latency=0.00s (0.00%) |Generate time=2.48s (67.16%) |Training time=0.93s (25.21%) |Others=0.28 (7.63%)|CurSamplesPerSec=2.17 |AvgSamplesPerSec=2.41 epoch: 0|step: 376|ppo_ep: 1|act_loss: -0.16650390625|cri_loss: -0.04217529296875|unsuper_loss: 0.0 average reward score: -0.038818359375 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.48s (74.80%) |Training time=0.64s (19.21%) |Others=0.20 (5.99%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.41 epoch: 0|step: 377|ppo_ep: 1|act_loss: -0.013916015625|cri_loss: 0.033416748046875|unsuper_loss: 0.0 average reward score: 1.427734375 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.47s (74.67%) |Training time=0.64s (19.48%) |Others=0.19 (5.85%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.41 epoch: 0|step: 378|ppo_ep: 1|act_loss: 0.367919921875|cri_loss: 0.230224609375|unsuper_loss: 0.0 average reward score: 1.0068359375 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.54%) |Training time=0.64s (19.45%) |Others=0.20 (6.01%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.41 epoch: 0|step: 379|ppo_ep: 1|act_loss: -0.1016845703125|cri_loss: 0.00286865234375|unsuper_loss: 0.0 average reward score: 1.046875 
------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.07%) |Training time=0.67s (20.17%) |Others=0.19 (5.76%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.41 epoch: 0|step: 380|ppo_ep: 1|act_loss: 0.083984375|cri_loss: 0.118408203125|unsuper_loss: 0.0 average reward score: 0.079345703125 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (73.91%) |Training time=0.65s (19.90%) |Others=0.20 (6.19%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 381|ppo_ep: 1|act_loss: 0.4189453125|cri_loss: 0.2626953125|unsuper_loss: 0.0 average reward score: -0.115234375 ------------------------------------------------------------------------------------- |E2E latency=3.19s |Gather latency=0.00s (0.00%) |Generate time=2.36s (73.92%) |Training time=0.64s (20.03%) |Others=0.19 (6.05%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.41 epoch: 0|step: 382|ppo_ep: 1|act_loss: 0.108642578125|cri_loss: 0.117431640625|unsuper_loss: 0.0 average reward score: 2.5 ------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.36s (73.70%) |Training time=0.64s (20.00%) |Others=0.20 (6.30%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.41 epoch: 0|step: 383|ppo_ep: 1|act_loss: -0.00408935546875|cri_loss: 0.056304931640625|unsuper_loss: 0.0 average reward score: 1.3984375 ------------------------------------------------------------------------------------- |E2E latency=3.65s |Gather latency=0.00s (0.00%) |Generate time=2.44s (66.90%) |Training time=0.93s (25.34%) |Others=0.28 (7.76%)|CurSamplesPerSec=2.19 |AvgSamplesPerSec=2.41 epoch: 0|step: 384|ppo_ep: 1|act_loss: 0.040618896484375|cri_loss: 0.0828857421875|unsuper_loss: 0.0 average reward score: 2.84765625 
------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.08%) |Training time=0.65s (19.64%) |Others=0.21 (6.27%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.41 epoch: 0|step: 385|ppo_ep: 1|act_loss: -0.0146484375|cri_loss: 0.04083251953125|unsuper_loss: 0.0 average reward score: 2.9375 ------------------------------------------------------------------------------------- |E2E latency=3.33s |Gather latency=0.00s (0.00%) |Generate time=2.45s (73.51%) |Training time=0.68s (20.50%) |Others=0.20 (5.99%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.41 epoch: 0|step: 386|ppo_ep: 1|act_loss: 0.72705078125|cri_loss: 0.541015625|unsuper_loss: 0.0 average reward score: 1.79296875 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.32%) |Training time=0.64s (19.69%) |Others=0.19 (5.99%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 387|ppo_ep: 1|act_loss: 0.3837890625|cri_loss: 0.247314453125|unsuper_loss: 0.0 average reward score: 2.21875 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.31%) |Training time=0.64s (19.78%) |Others=0.19 (5.91%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 388|ppo_ep: 1|act_loss: 0.38720703125|cri_loss: 0.29736328125|unsuper_loss: 0.0 average reward score: 1.8759765625 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.37%) |Training time=0.64s (19.58%) |Others=0.20 (6.04%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.41 epoch: 0|step: 389|ppo_ep: 1|act_loss: 0.1793212890625|cri_loss: 0.1365966796875|unsuper_loss: 0.0 average reward score: 2.072265625 
------------------------------------------------------------------------------------- |E2E latency=3.34s |Gather latency=0.00s (0.00%) |Generate time=2.43s (72.72%) |Training time=0.71s (21.39%) |Others=0.20 (5.88%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.41 epoch: 0|step: 390|ppo_ep: 1|act_loss: 0.43017578125|cri_loss: 0.29443359375|unsuper_loss: 0.0 average reward score: 1.62890625 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.38s (73.56%) |Training time=0.65s (19.95%) |Others=0.21 (6.49%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 391|ppo_ep: 1|act_loss: 0.189208984375|cri_loss: 0.14697265625|unsuper_loss: 0.0 average reward score: 3.25 ------------------------------------------------------------------------------------- |E2E latency=3.65s |Gather latency=0.00s (0.00%) |Generate time=2.43s (66.59%) |Training time=0.93s (25.58%) |Others=0.29 (7.83%)|CurSamplesPerSec=2.19 |AvgSamplesPerSec=2.41 epoch: 0|step: 392|ppo_ep: 1|act_loss: 0.177490234375|cri_loss: 0.156494140625|unsuper_loss: 0.0 average reward score: 0.5712890625 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.42%) |Training time=0.63s (19.63%) |Others=0.19 (5.95%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.41 epoch: 0|step: 393|ppo_ep: 1|act_loss: 0.171875|cri_loss: 0.1309814453125|unsuper_loss: 0.0 average reward score: -0.098388671875 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.39s (73.97%) |Training time=0.64s (19.97%) |Others=0.20 (6.06%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.41 epoch: 0|step: 394|ppo_ep: 1|act_loss: 0.196044921875|cri_loss: 0.1529541015625|unsuper_loss: 0.0 average reward score: 0.818359375 
------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.57%) |Training time=0.64s (19.54%) |Others=0.19 (5.89%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.41 epoch: 0|step: 395|ppo_ep: 1|act_loss: 0.2880859375|cri_loss: 0.174072265625|unsuper_loss: 0.0 average reward score: 0.229736328125 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.40%) |Training time=0.64s (19.68%) |Others=0.19 (5.92%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 396|ppo_ep: 1|act_loss: -0.1395263671875|cri_loss: -0.0223388671875|unsuper_loss: 0.0 average reward score: 1.607421875 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.28%) |Training time=0.64s (19.70%) |Others=0.20 (6.02%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 397|ppo_ep: 1|act_loss: -0.052734375|cri_loss: 0.03240966796875|unsuper_loss: 0.0 average reward score: 1.626953125 ------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.34s (73.19%) |Training time=0.64s (20.11%) |Others=0.21 (6.71%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.41 epoch: 0|step: 398|ppo_ep: 1|act_loss: 0.29638671875|cri_loss: 0.213623046875|unsuper_loss: 0.0 average reward score: 1.6669921875 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.44%) |Training time=0.64s (19.64%) |Others=0.19 (5.92%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41
[2023-04-24 14:09:48,547] [INFO] [logging.py:96:log_dist] [Rank 0] step=50, skipped=4, lr=[4.439e-06, 4.439e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-04-24 14:09:48,792] [INFO] [timer.py:199:stop] epoch=0/micro_step=400/global_step=50, RunningAvgSamplesPerSec=15.414199032797283, CurrSamplesPerSec=15.869793131108088, MemAllocated=20.44GB, MaxMemAllocated=31.45GB
[2023-04-24 14:09:49,002] [INFO] [logging.py:96:log_dist] [Rank 0] step=50, skipped=3, lr=[2.35e-06, 2.35e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 399|ppo_ep: 1|act_loss: 0.300048828125|cri_loss: 0.205810546875|unsuper_loss: 0.0 average reward score: 1.0673828125
------------------------------------------------------------------------------------- |E2E latency=3.64s |Gather latency=0.00s (0.00%) |Generate time=2.43s (66.70%) |Training time=0.93s (25.46%) |Others=0.29 (7.84%)|CurSamplesPerSec=2.20 |AvgSamplesPerSec=2.41 epoch: 0|step: 400|ppo_ep: 1|act_loss: 0.150390625|cri_loss: 0.129150390625|unsuper_loss: 0.0 average reward score: 2.11328125 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.65%) |Training time=0.64s (19.50%) |Others=0.19 (5.86%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.41 epoch: 0|step: 401|ppo_ep: 1|act_loss: 0.0643310546875|cri_loss: 0.06378173828125|unsuper_loss: 0.0 average reward score: 0.5576171875 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.41%) |Training time=0.64s (19.59%) |Others=0.20 (6.00%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.41 epoch: 0|step: 402|ppo_ep: 1|act_loss: -0.0911865234375|cri_loss: 0.019287109375|unsuper_loss: 0.0 average reward score: 2.048828125 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.57%) |Training time=0.64s (19.58%) |Others=0.19 (5.86%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.41 epoch: 0|step: 403|ppo_ep: 1|act_loss: -0.0206298828125|cri_loss: 0.0322265625|unsuper_loss: 0.0 average reward score: 0.734375
------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.46%) |Training time=0.64s (19.56%) |Others=0.19 (5.98%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.41 epoch: 0|step: 404|ppo_ep: 1|act_loss: 0.11279296875|cri_loss: 0.1063232421875|unsuper_loss: 0.0 average reward score: 0.52587890625 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.21%) |Training time=0.64s (19.71%) |Others=0.20 (6.08%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 405|ppo_ep: 1|act_loss: 0.11920166015625|cri_loss: 0.151611328125|unsuper_loss: 0.0 average reward score: 3.525390625 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.14%) |Training time=0.65s (19.89%) |Others=0.19 (5.97%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 406|ppo_ep: 1|act_loss: -0.0198974609375|cri_loss: 0.06781005859375|unsuper_loss: 0.0 average reward score: 2.12890625 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.38s (73.90%) |Training time=0.64s (19.94%) |Others=0.20 (6.16%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.41 epoch: 0|step: 407|ppo_ep: 1|act_loss: 0.28564453125|cri_loss: 0.211669921875|unsuper_loss: 0.0 average reward score: 1.1796875 ------------------------------------------------------------------------------------- |E2E latency=3.63s |Gather latency=0.00s (0.00%) |Generate time=2.42s (66.57%) |Training time=0.94s (25.81%) |Others=0.28 (7.62%)|CurSamplesPerSec=2.20 |AvgSamplesPerSec=2.41 epoch: 0|step: 408|ppo_ep: 1|act_loss: 0.28466796875|cri_loss: 0.1920166015625|unsuper_loss: 0.0 average reward score: 2.0546875
------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.49s (74.85%) |Training time=0.64s (19.17%) |Others=0.20 (5.99%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.41 epoch: 0|step: 409|ppo_ep: 1|act_loss: 0.298828125|cri_loss: 0.2017822265625|unsuper_loss: 0.0 average reward score: 0.6611328125 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.40%) |Training time=0.64s (19.67%) |Others=0.19 (5.93%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 410|ppo_ep: 1|act_loss: 0.0621337890625|cri_loss: 0.09783935546875|unsuper_loss: 0.0 average reward score: 2.05859375 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.10%) |Training time=0.65s (19.83%) |Others=0.20 (6.07%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.41 epoch: 0|step: 411|ppo_ep: 1|act_loss: 0.227783203125|cri_loss: 0.158203125|unsuper_loss: 0.0 average reward score: 2.31640625 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.58%) |Training time=0.64s (19.52%) |Others=0.19 (5.90%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.41 epoch: 0|step: 412|ppo_ep: 1|act_loss: 0.14111328125|cri_loss: 0.149658203125|unsuper_loss: 0.0 average reward score: 2.41015625 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.44%) |Training time=0.64s (19.53%) |Others=0.20 (6.03%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.41 epoch: 0|step: 413|ppo_ep: 1|act_loss: 0.18359375|cri_loss: 0.1165771484375|unsuper_loss: 0.0 average reward score: 0.31982421875
------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.66%) |Training time=0.64s (19.51%) |Others=0.19 (5.83%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.41 epoch: 0|step: 414|ppo_ep: 1|act_loss: 0.32763671875|cri_loss: 0.23291015625|unsuper_loss: 0.0 average reward score: 1.390625 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.38s (73.71%) |Training time=0.64s (19.94%) |Others=0.20 (6.34%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.41 epoch: 0|step: 415|ppo_ep: 1|act_loss: 0.185546875|cri_loss: 0.1583251953125|unsuper_loss: 0.0 average reward score: 1.7294921875 ------------------------------------------------------------------------------------- |E2E latency=3.65s |Gather latency=0.00s (0.00%) |Generate time=2.45s (66.98%) |Training time=0.93s (25.35%) |Others=0.28 (7.67%)|CurSamplesPerSec=2.19 |AvgSamplesPerSec=2.41 epoch: 0|step: 416|ppo_ep: 1|act_loss: -0.1787109375|cri_loss: -0.02716064453125|unsuper_loss: 0.0 average reward score: 2.84765625 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.48%) |Training time=0.64s (19.63%) |Others=0.19 (5.90%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 417|ppo_ep: 1|act_loss: -0.096435546875|cri_loss: -0.020111083984375|unsuper_loss: 0.0 average reward score: 3.291015625 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.28%) |Training time=0.64s (19.77%) |Others=0.19 (5.95%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 418|ppo_ep: 1|act_loss: -0.021240234375|cri_loss: 0.040740966796875|unsuper_loss: 0.0 average reward score: 0.123046875 
------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.23%) |Training time=0.64s (19.74%) |Others=0.20 (6.03%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 419|ppo_ep: 1|act_loss: -0.06640625|cri_loss: 0.0003662109375|unsuper_loss: 0.0 average reward score: 1.91015625 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.46%) |Training time=0.64s (19.71%) |Others=0.19 (5.83%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 420|ppo_ep: 1|act_loss: -0.1754150390625|cri_loss: -0.0263671875|unsuper_loss: 0.0 average reward score: 3.033203125 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.38%) |Training time=0.64s (19.65%) |Others=0.19 (5.98%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 421|ppo_ep: 1|act_loss: 0.02471923828125|cri_loss: 0.04437255859375|unsuper_loss: 0.0 average reward score: 1.90625 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.14%) |Training time=0.64s (19.81%) |Others=0.20 (6.05%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 422|ppo_ep: 1|act_loss: -0.204345703125|cri_loss: -0.04132080078125|unsuper_loss: 0.0 average reward score: 1.5126953125 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.34%) |Training time=0.64s (19.66%) |Others=0.19 (6.00%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 423|ppo_ep: 1|act_loss: 0.11407470703125|cri_loss: 0.08245849609375|unsuper_loss: 0.0 average reward score: 3.158203125 
------------------------------------------------------------------------------------- |E2E latency=3.61s |Gather latency=0.00s (0.00%) |Generate time=2.41s (66.72%) |Training time=0.92s (25.54%) |Others=0.28 (7.74%)|CurSamplesPerSec=2.22 |AvgSamplesPerSec=2.41 epoch: 0|step: 424|ppo_ep: 1|act_loss: 0.49609375|cri_loss: 0.306884765625|unsuper_loss: 0.0 average reward score: 1.7578125 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.38%) |Training time=0.64s (19.55%) |Others=0.20 (6.07%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 425|ppo_ep: 1|act_loss: 0.142333984375|cri_loss: 0.09912109375|unsuper_loss: 0.0 average reward score: 2.203125 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.52%) |Training time=0.64s (19.65%) |Others=0.19 (5.83%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 426|ppo_ep: 1|act_loss: 0.0693359375|cri_loss: 0.0799560546875|unsuper_loss: 0.0 average reward score: 2.234375 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.47s (74.27%) |Training time=0.65s (19.69%) |Others=0.20 (6.04%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.41 epoch: 0|step: 427|ppo_ep: 1|act_loss: 0.1669921875|cri_loss: 0.1484375|unsuper_loss: 0.0 average reward score: 1.669921875 ------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.59%) |Training time=0.64s (19.54%) |Others=0.19 (5.87%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.41 epoch: 0|step: 428|ppo_ep: 1|act_loss: 0.4658203125|cri_loss: 0.3095703125|unsuper_loss: 0.0 average reward score: 2.1015625 
------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.22%) |Training time=0.64s (19.81%) |Others=0.19 (5.98%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 429|ppo_ep: 1|act_loss: 0.16064453125|cri_loss: 0.13525390625|unsuper_loss: 0.0 average reward score: 2.171875 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.35%) |Training time=0.64s (19.72%) |Others=0.19 (5.93%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 430|ppo_ep: 1|act_loss: 0.130126953125|cri_loss: 0.137451171875|unsuper_loss: 0.0 average reward score: 3.435546875 ------------------------------------------------------------------------------------- |E2E latency=3.19s |Gather latency=0.00s (0.00%) |Generate time=2.36s (73.78%) |Training time=0.64s (20.09%) |Others=0.20 (6.13%)|CurSamplesPerSec=2.51 |AvgSamplesPerSec=2.41 epoch: 0|step: 431|ppo_ep: 1|act_loss: 0.6484375|cri_loss: 0.4296875|unsuper_loss: 0.0 average reward score: 2.322265625 ------------------------------------------------------------------------------------- |E2E latency=3.63s |Gather latency=0.00s (0.00%) |Generate time=2.43s (66.83%) |Training time=0.93s (25.50%) |Others=0.28 (7.67%)|CurSamplesPerSec=2.20 |AvgSamplesPerSec=2.41 epoch: 0|step: 432|ppo_ep: 1|act_loss: 0.402099609375|cri_loss: 0.242919921875|unsuper_loss: 0.0 average reward score: 1.556640625 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.52%) |Training time=0.64s (19.64%) |Others=0.19 (5.85%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 433|ppo_ep: 1|act_loss: 0.16162109375|cri_loss: 0.1163330078125|unsuper_loss: 0.0 average reward score: 1.78515625 
------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.37%) |Training time=0.64s (19.72%) |Others=0.19 (5.91%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 434|ppo_ep: 1|act_loss: 0.1373291015625|cri_loss: 0.11083984375|unsuper_loss: 0.0 average reward score: 2.00390625 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.48%) |Training time=0.64s (19.67%) |Others=0.19 (5.85%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.41 epoch: 0|step: 435|ppo_ep: 1|act_loss: 0.50927734375|cri_loss: 0.300048828125|unsuper_loss: 0.0 average reward score: 1.076171875 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.30%) |Training time=0.64s (19.74%) |Others=0.19 (5.95%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 436|ppo_ep: 1|act_loss: 0.0831298828125|cri_loss: 0.07281494140625|unsuper_loss: 0.0 average reward score: 1.044921875 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.14%) |Training time=0.64s (19.82%) |Others=0.20 (6.04%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 437|ppo_ep: 1|act_loss: 0.301025390625|cri_loss: 0.2005615234375|unsuper_loss: 0.0 average reward score: 1.595703125 ------------------------------------------------------------------------------------- |E2E latency=3.19s |Gather latency=0.00s (0.00%) |Generate time=2.35s (73.55%) |Training time=0.64s (20.12%) |Others=0.20 (6.33%)|CurSamplesPerSec=2.51 |AvgSamplesPerSec=2.41 epoch: 0|step: 438|ppo_ep: 1|act_loss: 0.302734375|cri_loss: 0.2213134765625|unsuper_loss: 0.0 average reward score: 2.65234375 
------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.33%) |Training time=0.64s (19.72%) |Others=0.19 (5.95%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 439|ppo_ep: 1|act_loss: 0.2705078125|cri_loss: 0.166259765625|unsuper_loss: 0.0 average reward score: 3.76171875 ------------------------------------------------------------------------------------- |E2E latency=3.63s |Gather latency=0.00s (0.00%) |Generate time=2.41s (66.47%) |Training time=0.94s (25.83%) |Others=0.28 (7.70%)|CurSamplesPerSec=2.20 |AvgSamplesPerSec=2.41 epoch: 0|step: 440|ppo_ep: 1|act_loss: -0.2335205078125|cri_loss: -0.090087890625|unsuper_loss: 0.0 average reward score: 0.74609375 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.38s (74.09%) |Training time=0.64s (19.82%) |Others=0.20 (6.09%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.41 epoch: 0|step: 441|ppo_ep: 1|act_loss: -0.014923095703125|cri_loss: 0.0090484619140625|unsuper_loss: 0.0 average reward score: 2.0546875 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.29%) |Training time=0.64s (19.69%) |Others=0.20 (6.02%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 442|ppo_ep: 1|act_loss: -0.0439453125|cri_loss: -0.000946044921875|unsuper_loss: 0.0 average reward score: 3.07421875 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.38s (73.98%) |Training time=0.64s (19.98%) |Others=0.19 (6.05%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.41 epoch: 0|step: 443|ppo_ep: 1|act_loss: 0.2132568359375|cri_loss: 0.1708984375|unsuper_loss: 0.0 average reward score: 0.8662109375 
------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.47s (74.93%) |Training time=0.64s (19.30%) |Others=0.19 (5.78%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.41 epoch: 0|step: 444|ppo_ep: 1|act_loss: 0.06842041015625|cri_loss: 0.08612060546875|unsuper_loss: 0.0 average reward score: 4.4921875 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.33s (72.72%) |Training time=0.65s (20.14%) |Others=0.23 (7.14%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.41 epoch: 0|step: 445|ppo_ep: 1|act_loss: 0.139404296875|cri_loss: 0.1104736328125|unsuper_loss: 0.0 average reward score: 1.892578125 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.24%) |Training time=0.65s (19.79%) |Others=0.19 (5.97%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.41 epoch: 0|step: 446|ppo_ep: 1|act_loss: -0.0172119140625|cri_loss: 0.0487060546875|unsuper_loss: 0.0 average reward score: 2.23046875 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.06%) |Training time=0.64s (19.82%) |Others=0.20 (6.12%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 447|ppo_ep: 1|act_loss: -0.1790771484375|cri_loss: -0.0445556640625|unsuper_loss: 0.0 average reward score: 1.505859375 ------------------------------------------------------------------------------------- |E2E latency=3.64s |Gather latency=0.00s (0.00%) |Generate time=2.44s (67.00%) |Training time=0.92s (25.35%) |Others=0.28 (7.66%)|CurSamplesPerSec=2.20 |AvgSamplesPerSec=2.41 epoch: 0|step: 448|ppo_ep: 1|act_loss: -0.011077880859375|cri_loss: 0.0230712890625|unsuper_loss: 0.0 average reward score: 3.1953125 
------------------------------------------------------------------------------------- |E2E latency=3.86s |Gather latency=0.00s (0.00%) |Generate time=2.42s (62.75%) |Training time=0.76s (19.82%) |Others=0.67 (17.43%)|CurSamplesPerSec=2.07 |AvgSamplesPerSec=2.41 epoch: 0|step: 449|ppo_ep: 1|act_loss: 0.01708984375|cri_loss: 0.04107666015625|unsuper_loss: 0.0 average reward score: 2.865234375 ------------------------------------------------------------------------------------- |E2E latency=3.33s |Gather latency=0.00s (0.00%) |Generate time=2.44s (73.41%) |Training time=0.64s (19.26%) |Others=0.24 (7.32%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.41 epoch: 0|step: 450|ppo_ep: 1|act_loss: 0.1728515625|cri_loss: 0.126220703125|unsuper_loss: 0.0 average reward score: 2.80078125 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.08%) |Training time=0.64s (19.95%) |Others=0.19 (5.97%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.41 epoch: 0|step: 451|ppo_ep: 1|act_loss: -0.10247802734375|cri_loss: -0.01959228515625|unsuper_loss: 0.0 average reward score: 2.65625 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.80%) |Training time=0.64s (19.39%) |Others=0.19 (5.81%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.41 epoch: 0|step: 452|ppo_ep: 1|act_loss: 0.0638427734375|cri_loss: 0.050201416015625|unsuper_loss: 0.0 average reward score: 1.408203125 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.18%) |Training time=0.64s (19.85%) |Others=0.19 (5.98%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.41 epoch: 0|step: 453|ppo_ep: 1|act_loss: 0.11328125|cri_loss: 0.10638427734375|unsuper_loss: 0.0 average reward score: 2.794921875 
------------------------------------------------------------------------------------- |E2E latency=3.41s |Gather latency=0.00s (0.00%) |Generate time=2.40s (70.27%) |Training time=0.82s (24.10%) |Others=0.19 (5.63%)|CurSamplesPerSec=2.35 |AvgSamplesPerSec=2.41 epoch: 0|step: 454|ppo_ep: 1|act_loss: 0.269775390625|cri_loss: 0.17724609375|unsuper_loss: 0.0 average reward score: 0.88623046875 ------------------------------------------------------------------------------------- |E2E latency=3.41s |Gather latency=0.00s (0.00%) |Generate time=2.39s (70.06%) |Training time=0.83s (24.36%) |Others=0.19 (5.58%)|CurSamplesPerSec=2.35 |AvgSamplesPerSec=2.41 epoch: 0|step: 455|ppo_ep: 1|act_loss: -0.170166015625|cri_loss: 0.0364990234375|unsuper_loss: 0.0 average reward score: 3.6796875 ------------------------------------------------------------------------------------- |E2E latency=3.58s |Gather latency=0.00s (0.00%) |Generate time=2.38s (66.51%) |Training time=0.93s (25.89%) |Others=0.27 (7.60%)|CurSamplesPerSec=2.23 |AvgSamplesPerSec=2.41 epoch: 0|step: 456|ppo_ep: 1|act_loss: 0.11041259765625|cri_loss: 0.08648681640625|unsuper_loss: 0.0 average reward score: 3.0859375 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.24%) |Training time=0.64s (19.80%) |Others=0.19 (5.96%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.41 epoch: 0|step: 457|ppo_ep: 1|act_loss: 0.134521484375|cri_loss: 0.10272216796875|unsuper_loss: 0.0 average reward score: 3.25390625 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.38s (74.06%) |Training time=0.64s (19.99%) |Others=0.19 (5.95%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.41 epoch: 0|step: 458|ppo_ep: 1|act_loss: 0.1038818359375|cri_loss: 0.0777587890625|unsuper_loss: 0.0 average reward score: 3.220703125 
------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.28%) |Training time=0.64s (19.81%) |Others=0.19 (5.90%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.41 epoch: 0|step: 459|ppo_ep: 1|act_loss: 0.418212890625|cri_loss: 0.24853515625|unsuper_loss: 0.0 average reward score: 2.703125 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.27%) |Training time=0.64s (19.87%) |Others=0.19 (5.86%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.41 epoch: 0|step: 460|ppo_ep: 1|act_loss: 0.58203125|cri_loss: 0.32958984375|unsuper_loss: 0.0 average reward score: 1.029296875 ------------------------------------------------------------------------------------- |E2E latency=3.19s |Gather latency=0.00s (0.00%) |Generate time=2.36s (74.08%) |Training time=0.64s (19.94%) |Others=0.19 (5.99%)|CurSamplesPerSec=2.51 |AvgSamplesPerSec=2.41 epoch: 0|step: 461|ppo_ep: 1|act_loss: 0.353759765625|cri_loss: 0.2236328125|unsuper_loss: 0.0 average reward score: 2.29296875 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.34s (72.97%) |Training time=0.64s (19.91%) |Others=0.23 (7.12%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.41 epoch: 0|step: 462|ppo_ep: 1|act_loss: 0.16796875|cri_loss: 0.1368408203125|unsuper_loss: 0.0 average reward score: 4.09375 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.38s (73.86%) |Training time=0.64s (19.97%) |Others=0.20 (6.16%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.41 epoch: 0|step: 463|ppo_ep: 1|act_loss: 0.33837890625|cri_loss: 0.195556640625|unsuper_loss: 0.0 average reward score: 1.833984375 
------------------------------------------------------------------------------------- |E2E latency=3.57s |Gather latency=0.00s (0.00%) |Generate time=2.37s (66.26%) |Training time=0.93s (25.92%) |Others=0.28 (7.82%)|CurSamplesPerSec=2.24 |AvgSamplesPerSec=2.41 epoch: 0|step: 464|ppo_ep: 1|act_loss: 0.16943359375|cri_loss: 0.132080078125|unsuper_loss: 0.0 average reward score: 2.29296875 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.38s (74.16%) |Training time=0.64s (19.83%) |Others=0.19 (6.01%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.41 epoch: 0|step: 465|ppo_ep: 1|act_loss: 0.347900390625|cri_loss: 0.215087890625|unsuper_loss: 0.0 average reward score: 2.994140625 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.38s (73.88%) |Training time=0.65s (20.06%) |Others=0.19 (6.06%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.41 epoch: 0|step: 466|ppo_ep: 1|act_loss: 0.11676025390625|cri_loss: 0.1429443359375|unsuper_loss: 0.0 average reward score: 3.412109375 ------------------------------------------------------------------------------------- |E2E latency=3.78s |Gather latency=0.00s (0.00%) |Generate time=2.38s (62.84%) |Training time=0.64s (16.94%) |Others=0.76 (20.21%)|CurSamplesPerSec=2.12 |AvgSamplesPerSec=2.41 epoch: 0|step: 467|ppo_ep: 1|act_loss: 0.0379638671875|cri_loss: 0.0523681640625|unsuper_loss: 0.0 average reward score: 1.318359375 ------------------------------------------------------------------------------------- |E2E latency=3.59s |Gather latency=0.00s (0.00%) |Generate time=2.59s (72.16%) |Training time=0.64s (17.79%) |Others=0.36 (10.06%)|CurSamplesPerSec=2.23 |AvgSamplesPerSec=2.41 epoch: 0|step: 468|ppo_ep: 1|act_loss: 0.1046142578125|cri_loss: 0.088134765625|unsuper_loss: 0.0 average reward score: 3.546875 
------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.37s (73.95%) |Training time=0.64s (19.92%) |Others=0.20 (6.12%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.41 epoch: 0|step: 469|ppo_ep: 1|act_loss: -0.15673828125|cri_loss: -0.03619384765625|unsuper_loss: 0.0 average reward score: 2.44921875 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.37s (73.93%) |Training time=0.64s (20.01%) |Others=0.19 (6.06%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.41 epoch: 0|step: 470|ppo_ep: 1|act_loss: 0.1767578125|cri_loss: 0.1468505859375|unsuper_loss: 0.0 average reward score: 2.76953125 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.09%) |Training time=0.64s (19.85%) |Others=0.19 (6.05%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.41 epoch: 0|step: 471|ppo_ep: 1|act_loss: 0.8291015625|cri_loss: 0.51953125|unsuper_loss: 0.0 average reward score: 1.2744140625 ------------------------------------------------------------------------------------- |E2E latency=3.58s |Gather latency=0.00s (0.00%) |Generate time=2.38s (66.44%) |Training time=0.93s (25.89%) |Others=0.27 (7.67%)|CurSamplesPerSec=2.24 |AvgSamplesPerSec=2.41 epoch: 0|step: 472|ppo_ep: 1|act_loss: -0.030029296875|cri_loss: 0.05792236328125|unsuper_loss: 0.0 average reward score: 1.89453125 ------------------------------------------------------------------------------------- |E2E latency=3.16s |Gather latency=0.00s (0.00%) |Generate time=2.33s (73.63%) |Training time=0.63s (20.09%) |Others=0.20 (6.28%)|CurSamplesPerSec=2.53 |AvgSamplesPerSec=2.41 epoch: 0|step: 473|ppo_ep: 1|act_loss: 0.0609130859375|cri_loss: 0.07830810546875|unsuper_loss: 0.0 average reward score: 1.349609375 
------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.59%) |Training time=0.64s (19.47%) |Others=0.19 (5.95%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.41 epoch: 0|step: 474|ppo_ep: 1|act_loss: -0.259033203125|cri_loss: -0.07818603515625|unsuper_loss: 0.0 average reward score: 2.62890625 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.37s (73.78%) |Training time=0.64s (19.99%) |Others=0.20 (6.23%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.41 epoch: 0|step: 475|ppo_ep: 1|act_loss: 0.046356201171875|cri_loss: 0.0941162109375|unsuper_loss: 0.0 average reward score: 1.5732421875 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.36s (73.55%) |Training time=0.64s (20.07%) |Others=0.21 (6.38%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.41 epoch: 0|step: 476|ppo_ep: 1|act_loss: -0.1787109375|cri_loss: -0.03826904296875|unsuper_loss: 0.0 average reward score: 3.0390625 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.17%) |Training time=0.64s (19.77%) |Others=0.20 (6.06%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 477|ppo_ep: 1|act_loss: -0.05712890625|cri_loss: 0.00604248046875|unsuper_loss: 0.0 average reward score: 2.48046875 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.07%) |Training time=0.64s (19.73%) |Others=0.20 (6.19%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.41 epoch: 0|step: 478|ppo_ep: 1|act_loss: -0.255615234375|cri_loss: -0.0711669921875|unsuper_loss: 0.0 average reward score: 2.91015625 
------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.11%) |Training time=0.64s (19.76%) |Others=0.20 (6.13%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.41 [2023-04-24 14:14:13,593] [INFO] [logging.py:96:log_dist] [Rank 0] step=60, skipped=4, lr=[5.404000000000001e-06, 5.404000000000001e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-04-24 14:14:13,838] [INFO] [timer.py:199:stop] epoch=0/micro_step=480/global_step=60, RunningAvgSamplesPerSec=15.463003624707108, CurrSamplesPerSec=15.878707430987026, MemAllocated=20.44GB, MaxMemAllocated=31.45GB [2023-04-24 14:14:14,048] [INFO] [logging.py:96:log_dist] [Rank 0] step=60, skipped=3, lr=[2.85e-06, 2.85e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 479|ppo_ep: 1|act_loss: 0.05584716796875|cri_loss: 0.069580078125|unsuper_loss: 0.0 average reward score: 2.66015625 ------------------------------------------------------------------------------------- |E2E latency=3.63s |Gather latency=0.00s (0.00%) |Generate time=2.41s (66.38%) |Training time=0.94s (25.81%) |Others=0.28 (7.81%)|CurSamplesPerSec=2.21 |AvgSamplesPerSec=2.41 epoch: 0|step: 480|ppo_ep: 1|act_loss: -0.08502197265625|cri_loss: -0.00054931640625|unsuper_loss: 0.0 average reward score: 3.32421875 ------------------------------------------------------------------------------------- |E2E latency=3.18s |Gather latency=0.00s (0.00%) |Generate time=2.35s (73.76%) |Training time=0.64s (20.19%) |Others=0.19 (6.05%)|CurSamplesPerSec=2.52 |AvgSamplesPerSec=2.41 epoch: 0|step: 481|ppo_ep: 1|act_loss: 0.05780029296875|cri_loss: 0.084228515625|unsuper_loss: 0.0 average reward score: 2.375 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.27%) |Training time=0.64s (19.65%) |Others=0.20 (6.08%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.41 epoch: 0|step: 482|ppo_ep: 1|act_loss: -0.09332275390625|cri_loss: -0.010009765625|unsuper_loss: 0.0 average reward score: 0.6875 ------------------------------------------------------------------------------------- |E2E latency=3.35s |Gather latency=0.00s (0.00%) |Generate time=2.49s (74.41%) |Training time=0.64s (19.23%) |Others=0.21 (6.36%)|CurSamplesPerSec=2.39 |AvgSamplesPerSec=2.41 epoch: 0|step: 483|ppo_ep: 1|act_loss: -0.153076171875|cri_loss: -0.0224609375|unsuper_loss: 0.0 average reward score: 2.20703125
------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.26%) |Training time=0.65s (19.74%) |Others=0.20 (6.00%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.41 epoch: 0|step: 484|ppo_ep: 1|act_loss: -0.02313232421875|cri_loss: 0.029998779296875|unsuper_loss: 0.0 average reward score: 3.0546875 ------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.47s (74.65%) |Training time=0.64s (19.43%) |Others=0.20 (5.92%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.41 epoch: 0|step: 485|ppo_ep: 1|act_loss: -0.0174560546875|cri_loss: 0.04840087890625|unsuper_loss: 0.0 average reward score: 3.62890625 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.30%) |Training time=0.64s (19.67%) |Others=0.20 (6.03%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.41 epoch: 0|step: 486|ppo_ep: 1|act_loss: -0.06231689453125|cri_loss: 0.026611328125|unsuper_loss: 0.0 average reward score: 3.51953125 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.22%) |Training time=0.64s (19.64%) |Others=0.20 (6.14%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 487|ppo_ep: 1|act_loss: -0.0770263671875|cri_loss: 0.02740478515625|unsuper_loss: 0.0 average reward score: 2.384765625 ------------------------------------------------------------------------------------- |E2E latency=3.66s |Gather latency=0.00s (0.00%) |Generate time=2.46s (67.21%) |Training time=0.93s (25.36%) |Others=0.27 (7.43%)|CurSamplesPerSec=2.19 |AvgSamplesPerSec=2.41 epoch: 0|step: 488|ppo_ep: 1|act_loss: -0.015625|cri_loss: 0.033050537109375|unsuper_loss: 0.0 average reward score: 1.662109375
------------------------------------------------------------------------------------- |E2E latency=3.16s |Gather latency=0.00s (0.00%) |Generate time=2.33s (73.73%) |Training time=0.63s (20.11%) |Others=0.19 (6.16%)|CurSamplesPerSec=2.53 |AvgSamplesPerSec=2.41 epoch: 0|step: 489|ppo_ep: 1|act_loss: 0.0865478515625|cri_loss: 0.07635498046875|unsuper_loss: 0.0 average reward score: 3.10546875 ------------------------------------------------------------------------------------- |E2E latency=3.16s |Gather latency=0.00s (0.00%) |Generate time=2.33s (73.57%) |Training time=0.64s (20.34%) |Others=0.19 (6.09%)|CurSamplesPerSec=2.53 |AvgSamplesPerSec=2.41 epoch: 0|step: 490|ppo_ep: 1|act_loss: 0.09246826171875|cri_loss: 0.07647705078125|unsuper_loss: 0.0 average reward score: 2.2265625 ------------------------------------------------------------------------------------- |E2E latency=3.18s |Gather latency=0.00s (0.00%) |Generate time=2.33s (73.43%) |Training time=0.64s (20.24%) |Others=0.20 (6.33%)|CurSamplesPerSec=2.52 |AvgSamplesPerSec=2.41 epoch: 0|step: 491|ppo_ep: 1|act_loss: 0.2025146484375|cri_loss: 0.1402587890625|unsuper_loss: 0.0 average reward score: 0.818359375 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.04%) |Training time=0.64s (20.01%) |Others=0.19 (5.96%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.41 epoch: 0|step: 492|ppo_ep: 1|act_loss: -0.09051513671875|cri_loss: -0.010498046875|unsuper_loss: 0.0 average reward score: 2.1953125 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.38s (74.08%) |Training time=0.64s (19.85%) |Others=0.20 (6.08%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.41 epoch: 0|step: 493|ppo_ep: 1|act_loss: 0.098388671875|cri_loss: 0.116455078125|unsuper_loss: 0.0 average reward score: 2.91796875
------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.37s (73.87%) |Training time=0.64s (20.06%) |Others=0.19 (6.07%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.41 epoch: 0|step: 494|ppo_ep: 1|act_loss: 0.059906005859375|cri_loss: 0.0679931640625|unsuper_loss: 0.0 average reward score: 2.73828125 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.16%) |Training time=0.64s (19.80%) |Others=0.20 (6.04%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.41 epoch: 0|step: 495|ppo_ep: 1|act_loss: -0.234130859375|cri_loss: -0.052001953125|unsuper_loss: 0.0 average reward score: 2.8125 ------------------------------------------------------------------------------------- |E2E latency=3.58s |Gather latency=0.00s (0.00%) |Generate time=2.38s (66.41%) |Training time=0.93s (25.96%) |Others=0.27 (7.63%)|CurSamplesPerSec=2.24 |AvgSamplesPerSec=2.41 epoch: 0|step: 496|ppo_ep: 1|act_loss: 0.129638671875|cri_loss: 0.112548828125|unsuper_loss: 0.0 average reward score: 2.97265625 ------------------------------------------------------------------------------------- |E2E latency=3.18s |Gather latency=0.00s (0.00%) |Generate time=2.36s (74.13%) |Training time=0.63s (19.88%) |Others=0.19 (5.98%)|CurSamplesPerSec=2.52 |AvgSamplesPerSec=2.41 epoch: 0|step: 497|ppo_ep: 1|act_loss: 0.3310546875|cri_loss: 0.1851806640625|unsuper_loss: 0.0 average reward score: 2.10546875 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.04%) |Training time=0.64s (19.86%) |Others=0.20 (6.10%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.41 epoch: 0|step: 498|ppo_ep: 1|act_loss: 0.2021484375|cri_loss: 0.132080078125|unsuper_loss: 0.0 average reward score: 2.865234375
------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.38%) |Training time=0.64s (19.63%) |Others=0.19 (5.99%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 499|ppo_ep: 1|act_loss: 0.40771484375|cri_loss: 0.2646484375|unsuper_loss: 0.0 average reward score: 3.30859375 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.45%) |Training time=0.64s (19.73%) |Others=0.19 (5.82%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 500|ppo_ep: 1|act_loss: 0.30712890625|cri_loss: 0.20263671875|unsuper_loss: 0.0 average reward score: 2.02734375 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.56%) |Training time=0.64s (19.62%) |Others=0.19 (5.83%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 501|ppo_ep: 1|act_loss: 0.1522216796875|cri_loss: 0.09930419921875|unsuper_loss: 0.0 average reward score: 2.60546875 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.34%) |Training time=0.64s (19.70%) |Others=0.19 (5.96%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 502|ppo_ep: 1|act_loss: 0.2191162109375|cri_loss: 0.1351318359375|unsuper_loss: 0.0 average reward score: 3.005859375
------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.14%) |Training time=0.64s (19.81%) |Others=0.20 (6.05%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 503|ppo_ep: 1|act_loss: 0.27685546875|cri_loss: 0.172119140625|unsuper_loss: 0.0 average reward score: 1.4072265625 ------------------------------------------------------------------------------------- |E2E latency=3.66s |Gather latency=0.00s (0.00%) |Generate time=2.43s (66.54%) |Training time=0.93s (25.40%) |Others=0.29 (8.06%)|CurSamplesPerSec=2.19 |AvgSamplesPerSec=2.41 epoch: 0|step: 504|ppo_ep: 1|act_loss: 0.25146484375|cri_loss: 0.1746826171875|unsuper_loss: 0.0 average reward score: 2.55078125 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.41%) |Training time=0.64s (19.65%) |Others=0.19 (5.94%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 505|ppo_ep: 1|act_loss: 0.177001953125|cri_loss: 0.1199951171875|unsuper_loss: 0.0 average reward score: 2.74609375 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.27%) |Training time=0.64s (19.87%) |Others=0.19 (5.87%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.41 epoch: 0|step: 506|ppo_ep: 1|act_loss: 0.225341796875|cri_loss: 0.1483154296875|unsuper_loss: 0.0 average reward score: 3.828125 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.11%) |Training time=0.64s (19.92%) |Others=0.19 (5.97%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.41 epoch: 0|step: 507|ppo_ep: 1|act_loss: 0.374267578125|cri_loss: 0.220947265625|unsuper_loss: 0.0 average reward score: 3.314453125 
------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.40%) |Training time=0.64s (19.76%) |Others=0.19 (5.84%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.41 epoch: 0|step: 508|ppo_ep: 1|act_loss: 0.1810302734375|cri_loss: 0.12451171875|unsuper_loss: 0.0 average reward score: 2.5546875 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.34%) |Training time=0.65s (19.67%) |Others=0.20 (5.99%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.41 epoch: 0|step: 509|ppo_ep: 1|act_loss: 0.34765625|cri_loss: 0.2166748046875|unsuper_loss: 0.0 average reward score: 2.189453125 ------------------------------------------------------------------------------------- |E2E latency=3.39s |Gather latency=0.00s (0.00%) |Generate time=2.56s (75.42%) |Training time=0.64s (18.88%) |Others=0.19 (5.70%)|CurSamplesPerSec=2.36 |AvgSamplesPerSec=2.41 epoch: 0|step: 510|ppo_ep: 1|act_loss: 0.1663818359375|cri_loss: 0.1356201171875|unsuper_loss: 0.0 average reward score: 2.75 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.47%) |Training time=0.64s (19.69%) |Others=0.19 (5.84%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.41 epoch: 0|step: 511|ppo_ep: 1|act_loss: 0.33251953125|cri_loss: 0.20458984375|unsuper_loss: 0.0 average reward score: 0.8837890625 ------------------------------------------------------------------------------------- |E2E latency=3.61s |Gather latency=0.00s (0.00%) |Generate time=2.41s (66.86%) |Training time=0.92s (25.51%) |Others=0.28 (7.64%)|CurSamplesPerSec=2.22 |AvgSamplesPerSec=2.41 epoch: 0|step: 512|ppo_ep: 1|act_loss: 0.09613037109375|cri_loss: 0.07305908203125|unsuper_loss: 0.0 average reward score: 3.4140625 
------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.38s (74.15%) |Training time=0.64s (19.88%) |Others=0.19 (5.97%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.41 epoch: 0|step: 513|ppo_ep: 1|act_loss: 0.0872802734375|cri_loss: 0.08349609375|unsuper_loss: 0.0 average reward score: 3.4765625 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.40s (73.82%) |Training time=0.66s (20.27%) |Others=0.19 (5.90%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 514|ppo_ep: 1|act_loss: -0.0372314453125|cri_loss: 0.02593994140625|unsuper_loss: 0.0 average reward score: 3.197265625 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.38s (73.09%) |Training time=0.69s (21.10%) |Others=0.19 (5.81%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 515|ppo_ep: 1|act_loss: 0.43994140625|cri_loss: 0.2724609375|unsuper_loss: 0.0 average reward score: 1.501953125 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.39s (73.35%) |Training time=0.68s (20.82%) |Others=0.19 (5.83%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 516|ppo_ep: 1|act_loss: 0.1959228515625|cri_loss: 0.131103515625|unsuper_loss: 0.0 average reward score: 0.03466796875 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.36s (72.91%) |Training time=0.68s (21.12%) |Others=0.19 (5.97%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 517|ppo_ep: 1|act_loss: 0.051300048828125|cri_loss: 0.042938232421875|unsuper_loss: 0.0 average reward score: 1.3359375 
------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.30s (71.25%) |Training time=0.73s (22.73%) |Others=0.19 (6.02%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 518|ppo_ep: 1|act_loss: -0.006500244140625|cri_loss: 0.0160980224609375|unsuper_loss: 0.0 average reward score: 2.31640625 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.34s (72.35%) |Training time=0.70s (21.59%) |Others=0.20 (6.06%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 519|ppo_ep: 1|act_loss: 0.05133056640625|cri_loss: 0.08197021484375|unsuper_loss: 0.0 average reward score: 2.880859375 ------------------------------------------------------------------------------------- |E2E latency=3.61s |Gather latency=0.00s (0.00%) |Generate time=2.41s (66.78%) |Training time=0.93s (25.64%) |Others=0.27 (7.58%)|CurSamplesPerSec=2.21 |AvgSamplesPerSec=2.41 epoch: 0|step: 520|ppo_ep: 1|act_loss: -0.11749267578125|cri_loss: -0.01953125|unsuper_loss: 0.0 average reward score: 2.76953125 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.46s (75.08%) |Training time=0.63s (19.19%) |Others=0.19 (5.73%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.41 epoch: 0|step: 521|ppo_ep: 1|act_loss: -0.01434326171875|cri_loss: 0.038360595703125|unsuper_loss: 0.0 average reward score: 2.7421875 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.56%) |Training time=0.64s (19.63%) |Others=0.19 (5.81%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 522|ppo_ep: 1|act_loss: 0.24609375|cri_loss: 0.1611328125|unsuper_loss: 0.0 average reward score: 1.9638671875 
------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.46%) |Training time=0.63s (19.60%) |Others=0.19 (5.94%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 523|ppo_ep: 1|act_loss: 0.1466064453125|cri_loss: 0.1107177734375|unsuper_loss: 0.0 average reward score: 2.5234375 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.25%) |Training time=0.64s (19.85%) |Others=0.19 (5.90%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 524|ppo_ep: 1|act_loss: -0.06640625|cri_loss: 0.0146484375|unsuper_loss: 0.0 average reward score: 3.03515625 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.19%) |Training time=0.64s (19.64%) |Others=0.20 (6.17%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 525|ppo_ep: 1|act_loss: 0.0582275390625|cri_loss: 0.08721923828125|unsuper_loss: 0.0 average reward score: 4.21875 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.30%) |Training time=0.64s (19.81%) |Others=0.19 (5.89%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.41 epoch: 0|step: 526|ppo_ep: 1|act_loss: 0.2056884765625|cri_loss: 0.1533203125|unsuper_loss: 0.0 average reward score: 2.373046875 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.17%) |Training time=0.64s (19.79%) |Others=0.20 (6.04%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 527|ppo_ep: 1|act_loss: -0.035308837890625|cri_loss: 0.01177978515625|unsuper_loss: 0.0 average reward score: 2.91796875 
------------------------------------------------------------------------------------- |E2E latency=3.61s |Gather latency=0.00s (0.00%) |Generate time=2.41s (66.64%) |Training time=0.93s (25.60%) |Others=0.28 (7.77%)|CurSamplesPerSec=2.21 |AvgSamplesPerSec=2.41 epoch: 0|step: 528|ppo_ep: 1|act_loss: 0.237060546875|cri_loss: 0.160888671875|unsuper_loss: 0.0 average reward score: 2.37890625 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.38s (73.76%) |Training time=0.64s (19.87%) |Others=0.21 (6.37%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.41 epoch: 0|step: 529|ppo_ep: 1|act_loss: 0.2958984375|cri_loss: 0.180908203125|unsuper_loss: 0.0 average reward score: 2.03515625 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.37s (73.91%) |Training time=0.64s (20.07%) |Others=0.19 (6.02%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.41 epoch: 0|step: 530|ppo_ep: 1|act_loss: 0.26806640625|cri_loss: 0.1966552734375|unsuper_loss: 0.0 average reward score: 3.734375 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.38s (74.06%) |Training time=0.64s (19.81%) |Others=0.20 (6.14%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.41 epoch: 0|step: 531|ppo_ep: 1|act_loss: 0.27197265625|cri_loss: 0.1695556640625|unsuper_loss: 0.0 average reward score: 3.36328125 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.38s (74.13%) |Training time=0.64s (19.91%) |Others=0.19 (5.96%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.41 epoch: 0|step: 532|ppo_ep: 1|act_loss: 0.29736328125|cri_loss: 0.2265625|unsuper_loss: 0.0 average reward score: 4.37890625 
------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.37s (74.01%) |Training time=0.64s (19.96%) |Others=0.19 (6.03%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.41 epoch: 0|step: 533|ppo_ep: 1|act_loss: 0.20263671875|cri_loss: 0.162109375|unsuper_loss: 0.0 average reward score: 2.6875 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.38s (74.12%) |Training time=0.64s (19.97%) |Others=0.19 (5.92%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.41 epoch: 0|step: 534|ppo_ep: 1|act_loss: 0.1358642578125|cri_loss: 0.10595703125|unsuper_loss: 0.0 average reward score: 3.00390625 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.38s (73.95%) |Training time=0.64s (19.97%) |Others=0.20 (6.08%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.41 epoch: 0|step: 535|ppo_ep: 1|act_loss: 0.42333984375|cri_loss: 0.288330078125|unsuper_loss: 0.0 average reward score: 4.4765625 ------------------------------------------------------------------------------------- |E2E latency=3.57s |Gather latency=0.00s (0.00%) |Generate time=2.38s (66.61%) |Training time=0.92s (25.76%) |Others=0.27 (7.63%)|CurSamplesPerSec=2.24 |AvgSamplesPerSec=2.41 epoch: 0|step: 536|ppo_ep: 1|act_loss: 0.19482421875|cri_loss: 0.164306640625|unsuper_loss: 0.0 average reward score: 5.203125 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.38s (74.11%) |Training time=0.64s (19.98%) |Others=0.19 (5.91%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.41 epoch: 0|step: 537|ppo_ep: 1|act_loss: 0.43408203125|cri_loss: 0.27001953125|unsuper_loss: 0.0 average reward score: 2.61328125 
------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.38s (74.24%) |Training time=0.64s (19.89%) |Others=0.19 (5.88%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.41 epoch: 0|step: 538|ppo_ep: 1|act_loss: 0.1689453125|cri_loss: 0.140625|unsuper_loss: 0.0 average reward score: 2.126953125 ------------------------------------------------------------------------------------- |E2E latency=3.36s |Gather latency=0.00s (0.00%) |Generate time=2.38s (70.83%) |Training time=0.77s (23.02%) |Others=0.21 (6.14%)|CurSamplesPerSec=2.38 |AvgSamplesPerSec=2.41 epoch: 0|step: 539|ppo_ep: 1|act_loss: 0.2274169921875|cri_loss: 0.140869140625|unsuper_loss: 0.0 average reward score: 1.76171875 ------------------------------------------------------------------------------------- |E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.42s (68.95%) |Training time=0.84s (23.99%) |Others=0.25 (7.06%)|CurSamplesPerSec=2.28 |AvgSamplesPerSec=2.41 epoch: 0|step: 540|ppo_ep: 1|act_loss: 0.093994140625|cri_loss: 0.073486328125|unsuper_loss: 0.0 average reward score: 3.12890625 ------------------------------------------------------------------------------------- |E2E latency=3.34s |Gather latency=0.00s (0.00%) |Generate time=2.40s (71.93%) |Training time=0.74s (22.10%) |Others=0.20 (5.97%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.41 epoch: 0|step: 541|ppo_ep: 1|act_loss: 0.0850830078125|cri_loss: 0.093017578125|unsuper_loss: 0.0 average reward score: 1.7431640625 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.38s (73.16%) |Training time=0.69s (21.03%) |Others=0.19 (5.80%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 542|ppo_ep: 1|act_loss: 0.404541015625|cri_loss: 0.2705078125|unsuper_loss: 0.0 average reward score: 2.607421875 
------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.37s (73.39%) |Training time=0.67s (20.66%) |Others=0.19 (5.95%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.41 epoch: 0|step: 543|ppo_ep: 1|act_loss: 0.44140625|cri_loss: 0.322021484375|unsuper_loss: 0.0 average reward score: 2.703125 ------------------------------------------------------------------------------------- |E2E latency=3.57s |Gather latency=0.00s (0.00%) |Generate time=2.38s (66.68%) |Training time=0.92s (25.71%) |Others=0.27 (7.61%)|CurSamplesPerSec=2.24 |AvgSamplesPerSec=2.41 epoch: 0|step: 544|ppo_ep: 1|act_loss: 0.6240234375|cri_loss: 0.39501953125|unsuper_loss: 0.0 average reward score: 2.76953125 ------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.36s (73.71%) |Training time=0.64s (19.89%) |Others=0.20 (6.40%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.41 epoch: 0|step: 545|ppo_ep: 1|act_loss: 0.248779296875|cri_loss: 0.1885986328125|unsuper_loss: 0.0 average reward score: 1.998046875 ------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.37s (74.15%) |Training time=0.64s (19.90%) |Others=0.19 (5.94%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.41 epoch: 0|step: 546|ppo_ep: 1|act_loss: 0.1771240234375|cri_loss: 0.163330078125|unsuper_loss: 0.0 average reward score: 3.826171875 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.38s (74.05%) |Training time=0.64s (20.05%) |Others=0.19 (5.90%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.41 epoch: 0|step: 547|ppo_ep: 1|act_loss: 0.2095947265625|cri_loss: 0.142822265625|unsuper_loss: 0.0 average reward score: 3.25390625 
------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.37s (74.14%) |Training time=0.64s (19.90%) |Others=0.19 (5.96%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.41 epoch: 0|step: 548|ppo_ep: 1|act_loss: 0.057098388671875|cri_loss: 0.06396484375|unsuper_loss: 0.0 average reward score: 2.314453125 ------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.37s (74.22%) |Training time=0.64s (19.89%) |Others=0.19 (5.89%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.41 epoch: 0|step: 549|ppo_ep: 1|act_loss: 0.18994140625|cri_loss: 0.1220703125|unsuper_loss: 0.0 average reward score: 2.4609375 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.24%) |Training time=0.64s (19.92%) |Others=0.19 (5.84%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.41 epoch: 0|step: 550|ppo_ep: 1|act_loss: 0.30712890625|cri_loss: 0.21142578125|unsuper_loss: 0.0 average reward score: 1.7490234375 ------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.37s (74.17%) |Training time=0.64s (19.94%) |Others=0.19 (5.89%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.42 epoch: 0|step: 551|ppo_ep: 1|act_loss: 0.107177734375|cri_loss: 0.1197509765625|unsuper_loss: 0.0 average reward score: 3.015625 ------------------------------------------------------------------------------------- |E2E latency=3.58s |Gather latency=0.00s (0.00%) |Generate time=2.38s (66.47%) |Training time=0.93s (25.88%) |Others=0.27 (7.65%)|CurSamplesPerSec=2.23 |AvgSamplesPerSec=2.41 epoch: 0|step: 552|ppo_ep: 1|act_loss: 0.0126953125|cri_loss: 0.06231689453125|unsuper_loss: 0.0 average reward score: 3.267578125 
------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.38s (74.10%) |Training time=0.64s (20.01%) |Others=0.19 (5.89%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.41 epoch: 0|step: 553|ppo_ep: 1|act_loss: -0.01690673828125|cri_loss: 0.037506103515625|unsuper_loss: 0.0 average reward score: 2.689453125 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.38s (73.73%) |Training time=0.64s (19.93%) |Others=0.20 (6.34%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.41 epoch: 0|step: 554|ppo_ep: 1|act_loss: -0.0076904296875|cri_loss: 0.05450439453125|unsuper_loss: 0.0 average reward score: 1.8994140625 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.38s (73.92%) |Training time=0.64s (20.00%) |Others=0.20 (6.08%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.42 epoch: 0|step: 555|ppo_ep: 1|act_loss: -0.033935546875|cri_loss: 0.04443359375|unsuper_loss: 0.0 average reward score: 1.69140625 ------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.38s (74.12%) |Training time=0.64s (19.98%) |Others=0.19 (5.90%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.42 epoch: 0|step: 556|ppo_ep: 1|act_loss: 0.023193359375|cri_loss: 0.08984375|unsuper_loss: 0.0 average reward score: 2.64453125 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.53%) |Training time=0.64s (19.57%) |Others=0.19 (5.90%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42 epoch: 0|step: 557|ppo_ep: 1|act_loss: -0.22607421875|cri_loss: -0.046630859375|unsuper_loss: 0.0 average reward score: 3.0234375 
-------------------------------------------------------------------------------------
|E2E latency=3.14s |Gather latency=0.00s (0.00%) |Generate time=2.31s (73.57%) |Training time=0.64s (20.36%) |Others=0.19 (6.07%)|CurSamplesPerSec=2.55 |AvgSamplesPerSec=2.42
epoch: 0|step: 558|ppo_ep: 1|act_loss: -0.0484619140625|cri_loss: 0.0443115234375|unsuper_loss: 0.0
average reward score: 2.494140625
-------------------------------------------------------------------------------------
|E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.37s (73.72%) |Training time=0.65s (20.06%) |Others=0.20 (6.22%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.42
[2023-04-24 14:18:36,365] [INFO] [logging.py:96:log_dist] [Rank 0] step=70, skipped=4, lr=[6.369000000000001e-06, 6.369000000000001e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-04-24 14:18:36,612] [INFO] [timer.py:199:stop] epoch=0/micro_step=560/global_step=70, RunningAvgSamplesPerSec=15.481377386414133, CurrSamplesPerSec=15.826877303777065, MemAllocated=20.44GB, MaxMemAllocated=31.45GB
[2023-04-24 14:18:36,835] [INFO] [logging.py:96:log_dist] [Rank 0] step=70, skipped=3, lr=[3.3500000000000005e-06, 3.3500000000000005e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 559|ppo_ep: 1|act_loss: -0.02484130859375|cri_loss: 0.030181884765625|unsuper_loss: 0.0
average reward score: 1.9814453125
-------------------------------------------------------------------------------------
|E2E latency=3.62s |Gather latency=0.00s (0.00%) |Generate time=2.39s (66.11%) |Training time=0.92s (25.52%) |Others=0.30 (8.37%)|CurSamplesPerSec=2.21 |AvgSamplesPerSec=2.42
epoch: 0|step: 560|ppo_ep: 1|act_loss: 0.197509765625|cri_loss: 0.17236328125|unsuper_loss: 0.0
average reward score: 2.5703125
-------------------------------------------------------------------------------------
|E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.37%) |Training time=0.64s (19.74%) |Others=0.19 (5.89%)|CurSamplesPerSec=2.48
|AvgSamplesPerSec=2.42 epoch: 0|step: 561|ppo_ep: 1|act_loss: 0.0443115234375|cri_loss: 0.06365966796875|unsuper_loss: 0.0 average reward score: 3.0625 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.26%) |Training time=0.64s (19.83%) |Others=0.19 (5.91%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42 epoch: 0|step: 562|ppo_ep: 1|act_loss: -0.1038818359375|cri_loss: -0.00335693359375|unsuper_loss: 0.0 average reward score: 4.25390625 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.22%) |Training time=0.64s (19.82%) |Others=0.19 (5.96%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.42 epoch: 0|step: 563|ppo_ep: 1|act_loss: -0.07867431640625|cri_loss: -0.01507568359375|unsuper_loss: 0.0 average reward score: 1.7333984375 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.27%) |Training time=0.64s (19.85%) |Others=0.19 (5.88%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42 epoch: 0|step: 564|ppo_ep: 1|act_loss: -0.263916015625|cri_loss: -0.08367919921875|unsuper_loss: 0.0 average reward score: 2.794921875 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.12%) |Training time=0.64s (19.85%) |Others=0.20 (6.03%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42 epoch: 0|step: 565|ppo_ep: 1|act_loss: 0.1600341796875|cri_loss: 0.1424560546875|unsuper_loss: 0.0 average reward score: 2.84765625 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.40%) |Training time=0.64s (19.74%) |Others=0.19 (5.86%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.42 
epoch: 0|step: 566|ppo_ep: 1|act_loss: -0.1199951171875|cri_loss: -0.0145263671875|unsuper_loss: 0.0 average reward score: 3.3046875 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.17%) |Training time=0.65s (19.87%) |Others=0.19 (5.95%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42 epoch: 0|step: 567|ppo_ep: 1|act_loss: 0.0279388427734375|cri_loss: 0.052734375|unsuper_loss: 0.0 average reward score: 0.7392578125 ------------------------------------------------------------------------------------- |E2E latency=3.62s |Gather latency=0.00s (0.00%) |Generate time=2.42s (66.75%) |Training time=0.93s (25.60%) |Others=0.28 (7.65%)|CurSamplesPerSec=2.21 |AvgSamplesPerSec=2.42 epoch: 0|step: 568|ppo_ep: 1|act_loss: 0.00225830078125|cri_loss: 0.05108642578125|unsuper_loss: 0.0 average reward score: 3.548828125 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.20%) |Training time=0.64s (19.72%) |Others=0.20 (6.08%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42 epoch: 0|step: 569|ppo_ep: 1|act_loss: 0.08306884765625|cri_loss: 0.09637451171875|unsuper_loss: 0.0 average reward score: 2.681640625 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.13%) |Training time=0.64s (19.76%) |Others=0.20 (6.10%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42 epoch: 0|step: 570|ppo_ep: 1|act_loss: 0.11181640625|cri_loss: 0.08447265625|unsuper_loss: 0.0 average reward score: 2.138671875 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.33%) |Training time=0.64s (19.64%) |Others=0.19 (6.03%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42 epoch: 0|step: 571|ppo_ep: 
1|act_loss: 0.1771240234375|cri_loss: 0.1417236328125|unsuper_loss: 0.0 average reward score: 3.65234375 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.10%) |Training time=0.65s (19.93%) |Others=0.19 (5.98%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42 epoch: 0|step: 572|ppo_ep: 1|act_loss: 0.192138671875|cri_loss: 0.149169921875|unsuper_loss: 0.0 average reward score: 2.875 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.21%) |Training time=0.64s (19.74%) |Others=0.20 (6.05%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42 epoch: 0|step: 573|ppo_ep: 1|act_loss: 0.125244140625|cri_loss: 0.0926513671875|unsuper_loss: 0.0 average reward score: 2.861328125 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.29%) |Training time=0.64s (19.79%) |Others=0.19 (5.92%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42 epoch: 0|step: 574|ppo_ep: 1|act_loss: 0.089599609375|cri_loss: 0.11273193359375|unsuper_loss: 0.0 average reward score: 2.8203125 ------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.40s (72.73%) |Training time=0.70s (21.17%) |Others=0.20 (6.11%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.42 epoch: 0|step: 575|ppo_ep: 1|act_loss: 0.33642578125|cri_loss: 0.2113037109375|unsuper_loss: 0.0 average reward score: 2.26171875 ------------------------------------------------------------------------------------- |E2E latency=3.62s |Gather latency=0.00s (0.00%) |Generate time=2.41s (66.56%) |Training time=0.93s (25.75%) |Others=0.28 (7.69%)|CurSamplesPerSec=2.21 |AvgSamplesPerSec=2.42 epoch: 0|step: 576|ppo_ep: 1|act_loss: -0.2197265625|cri_loss: 
-0.067138671875|unsuper_loss: 0.0 average reward score: 1.91796875 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.23%) |Training time=0.64s (19.61%) |Others=0.20 (6.16%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42 epoch: 0|step: 577|ppo_ep: 1|act_loss: 0.10980224609375|cri_loss: 0.0955810546875|unsuper_loss: 0.0 average reward score: 0.9921875 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.10%) |Training time=0.64s (19.83%) |Others=0.20 (6.08%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.42 epoch: 0|step: 578|ppo_ep: 1|act_loss: -0.007568359375|cri_loss: 0.0284423828125|unsuper_loss: 0.0 average reward score: 2.41015625 ------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.33s (70.79%) |Training time=0.76s (23.07%) |Others=0.20 (6.14%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.42 epoch: 0|step: 579|ppo_ep: 1|act_loss: 0.11639404296875|cri_loss: 0.09832763671875|unsuper_loss: 0.0 average reward score: 1.5625 ------------------------------------------------------------------------------------- |E2E latency=3.17s |Gather latency=0.00s (0.00%) |Generate time=2.33s (73.40%) |Training time=0.64s (20.26%) |Others=0.20 (6.34%)|CurSamplesPerSec=2.52 |AvgSamplesPerSec=2.42 epoch: 0|step: 580|ppo_ep: 1|act_loss: -0.144287109375|cri_loss: -0.04656982421875|unsuper_loss: 0.0 average reward score: 1.970703125 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.36s (73.59%) |Training time=0.64s (20.04%) |Others=0.20 (6.36%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.42 epoch: 0|step: 581|ppo_ep: 1|act_loss: 0.177001953125|cri_loss: 0.1441650390625|unsuper_loss: 0.0 
average reward score: 1.56640625 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.39s (73.92%) |Training time=0.64s (19.82%) |Others=0.20 (6.26%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.42 epoch: 0|step: 582|ppo_ep: 1|act_loss: 0.2197265625|cri_loss: 0.1680908203125|unsuper_loss: 0.0 average reward score: 0.962890625 ------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.39s (72.52%) |Training time=0.71s (21.41%) |Others=0.20 (6.06%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.42 epoch: 0|step: 583|ppo_ep: 1|act_loss: 0.03399658203125|cri_loss: 0.0570068359375|unsuper_loss: 0.0 average reward score: 3.08984375 ------------------------------------------------------------------------------------- |E2E latency=3.60s |Gather latency=0.00s (0.00%) |Generate time=2.39s (66.44%) |Training time=0.93s (25.72%) |Others=0.28 (7.84%)|CurSamplesPerSec=2.22 |AvgSamplesPerSec=2.42 epoch: 0|step: 584|ppo_ep: 1|act_loss: -0.04449462890625|cri_loss: 0.0341796875|unsuper_loss: 0.0 average reward score: 1.8251953125 ------------------------------------------------------------------------------------- |E2E latency=3.15s |Gather latency=0.00s (0.00%) |Generate time=2.32s (73.70%) |Training time=0.64s (20.16%) |Others=0.19 (6.14%)|CurSamplesPerSec=2.54 |AvgSamplesPerSec=2.42 epoch: 0|step: 585|ppo_ep: 1|act_loss: -0.23095703125|cri_loss: -0.07830810546875|unsuper_loss: 0.0 average reward score: 1.8876953125 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.22%) |Training time=0.65s (19.89%) |Others=0.19 (5.90%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.42 epoch: 0|step: 586|ppo_ep: 1|act_loss: -0.18798828125|cri_loss: -0.04180908203125|unsuper_loss: 0.0 average reward score: 
1.5517578125 ------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.40s (72.76%) |Training time=0.70s (21.18%) |Others=0.20 (6.06%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.42 epoch: 0|step: 587|ppo_ep: 1|act_loss: -0.10980224609375|cri_loss: -0.0028076171875|unsuper_loss: 0.0 average reward score: 2.376953125 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.42s (73.96%) |Training time=0.65s (19.72%) |Others=0.21 (6.32%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.42 epoch: 0|step: 588|ppo_ep: 1|act_loss: -0.131591796875|cri_loss: -0.01513671875|unsuper_loss: 0.0 average reward score: 1.521484375 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.01%) |Training time=0.65s (19.79%) |Others=0.20 (6.20%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.42 epoch: 0|step: 589|ppo_ep: 1|act_loss: -0.23681640625|cri_loss: -0.02880859375|unsuper_loss: 0.0 average reward score: 3.1640625 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.30%) |Training time=0.64s (19.78%) |Others=0.19 (5.92%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42 epoch: 0|step: 590|ppo_ep: 1|act_loss: -0.1064453125|cri_loss: 0.0203857421875|unsuper_loss: 0.0 average reward score: 0.89306640625 ------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.36s (73.59%) |Training time=0.64s (20.10%) |Others=0.20 (6.31%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.42 epoch: 0|step: 591|ppo_ep: 1|act_loss: -0.16748046875|cri_loss: 0.0059814453125|unsuper_loss: 0.0 average reward score: 2.19140625 
------------------------------------------------------------------------------------- |E2E latency=3.61s |Gather latency=0.00s (0.00%) |Generate time=2.38s (65.99%) |Training time=0.95s (26.22%) |Others=0.28 (7.79%)|CurSamplesPerSec=2.22 |AvgSamplesPerSec=2.42 epoch: 0|step: 592|ppo_ep: 1|act_loss: -0.2646484375|cri_loss: -0.0352783203125|unsuper_loss: 0.0 average reward score: 2.6875 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.37%) |Training time=0.64s (19.66%) |Others=0.19 (5.96%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42 epoch: 0|step: 593|ppo_ep: 1|act_loss: -0.1973876953125|cri_loss: -0.003173828125|unsuper_loss: 0.0 average reward score: 2.76171875 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.17%) |Training time=0.64s (19.53%) |Others=0.21 (6.29%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42 epoch: 0|step: 594|ppo_ep: 1|act_loss: -0.0987548828125|cri_loss: -0.00439453125|unsuper_loss: 0.0 average reward score: 2.41796875 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.70%) |Training time=0.64s (19.41%) |Others=0.19 (5.89%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.42 epoch: 0|step: 595|ppo_ep: 1|act_loss: -0.06658935546875|cri_loss: 0.0167236328125|unsuper_loss: 0.0 average reward score: 1.244140625 ------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.47s (74.62%) |Training time=0.64s (19.41%) |Others=0.20 (5.97%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.42 epoch: 0|step: 596|ppo_ep: 1|act_loss: 0.046295166015625|cri_loss: 0.081298828125|unsuper_loss: 0.0 average reward score: 2.84765625 
------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.56%) |Training time=0.64s (19.54%) |Others=0.19 (5.90%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42 epoch: 0|step: 597|ppo_ep: 1|act_loss: 0.08074951171875|cri_loss: 0.07891845703125|unsuper_loss: 0.0 average reward score: 2.240234375 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.35s (73.02%) |Training time=0.65s (20.29%) |Others=0.22 (6.70%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.42 epoch: 0|step: 598|ppo_ep: 1|act_loss: 0.2294921875|cri_loss: 0.23388671875|unsuper_loss: 0.0 average reward score: 0.525390625 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.26%) |Training time=0.64s (19.72%) |Others=0.20 (6.02%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42 epoch: 0|step: 599|ppo_ep: 1|act_loss: -0.11492919921875|cri_loss: 0.00457763671875|unsuper_loss: 0.0 average reward score: 2.0703125 ------------------------------------------------------------------------------------- |E2E latency=3.64s |Gather latency=0.00s (0.00%) |Generate time=2.33s (64.09%) |Training time=1.03s (28.36%) |Others=0.28 (7.55%)|CurSamplesPerSec=2.20 |AvgSamplesPerSec=2.42 epoch: 0|step: 600|ppo_ep: 1|act_loss: 0.74609375|cri_loss: 0.48583984375|unsuper_loss: 0.0 average reward score: 2.33203125 ------------------------------------------------------------------------------------- |E2E latency=3.15s |Gather latency=0.00s (0.00%) |Generate time=2.30s (72.91%) |Training time=0.66s (20.90%) |Others=0.20 (6.19%)|CurSamplesPerSec=2.54 |AvgSamplesPerSec=2.42 epoch: 0|step: 601|ppo_ep: 1|act_loss: 0.6318359375|cri_loss: 0.45703125|unsuper_loss: 0.0 average reward score: 2.3046875 
------------------------------------------------------------------------------------- |E2E latency=3.16s |Gather latency=0.00s (0.00%) |Generate time=2.32s (73.55%) |Training time=0.65s (20.42%) |Others=0.19 (6.03%)|CurSamplesPerSec=2.53 |AvgSamplesPerSec=2.42 epoch: 0|step: 602|ppo_ep: 1|act_loss: 0.8974609375|cri_loss: 0.54931640625|unsuper_loss: 0.0 average reward score: 1.099609375 ------------------------------------------------------------------------------------- |E2E latency=3.17s |Gather latency=0.00s (0.00%) |Generate time=2.28s (71.78%) |Training time=0.70s (22.13%) |Others=0.19 (6.10%)|CurSamplesPerSec=2.52 |AvgSamplesPerSec=2.42 epoch: 0|step: 603|ppo_ep: 1|act_loss: 0.4326171875|cri_loss: 0.31982421875|unsuper_loss: 0.0 average reward score: 0.89453125 ------------------------------------------------------------------------------------- |E2E latency=3.16s |Gather latency=0.00s (0.00%) |Generate time=2.28s (72.12%) |Training time=0.69s (21.81%) |Others=0.19 (6.08%)|CurSamplesPerSec=2.53 |AvgSamplesPerSec=2.42 epoch: 0|step: 604|ppo_ep: 1|act_loss: 0.30029296875|cri_loss: 0.2332763671875|unsuper_loss: 0.0 average reward score: 2.2734375 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.30s (71.05%) |Training time=0.74s (22.91%) |Others=0.20 (6.04%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.42 epoch: 0|step: 605|ppo_ep: 1|act_loss: 0.54833984375|cri_loss: 0.36376953125|unsuper_loss: 0.0 average reward score: 2.1953125 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.28s (69.29%) |Training time=0.81s (24.72%) |Others=0.20 (5.99%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.42 epoch: 0|step: 606|ppo_ep: 1|act_loss: 0.755859375|cri_loss: 0.44677734375|unsuper_loss: 0.0 average reward score: 1.384765625 
-------------------------------------------------------------------------------------
|E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.27%) |Training time=0.65s (19.65%) |Others=0.20 (6.07%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.42
[2023-04-24 14:21:14,170] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 8192, reducing to 4096
epoch: 0|step: 607|ppo_ep: 1|act_loss: 0.008758544921875|cri_loss: 0.065673828125|unsuper_loss: 0.0
average reward score: 1.96875
-------------------------------------------------------------------------------------
|E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.34s (70.76%) |Training time=0.69s (20.76%) |Others=0.28 (8.48%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.42
epoch: 0|step: 608|ppo_ep: 1|act_loss: 0.49853515625|cri_loss: 0.3115234375|unsuper_loss: 0.0
average reward score: 1.833984375
-------------------------------------------------------------------------------------
|E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.30s (71.70%) |Training time=0.72s (22.37%) |Others=0.19 (5.92%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.42
epoch: 0|step: 609|ppo_ep: 1|act_loss: 0.900390625|cri_loss: 0.5869140625|unsuper_loss: 0.0
average reward score: 2.0703125
-------------------------------------------------------------------------------------
|E2E latency=3.16s |Gather latency=0.00s (0.00%) |Generate time=2.31s (73.03%) |Training time=0.66s (21.01%) |Others=0.19 (5.96%)|CurSamplesPerSec=2.53 |AvgSamplesPerSec=2.42
epoch: 0|step: 610|ppo_ep: 1|act_loss: 0.5556640625|cri_loss: 0.356201171875|unsuper_loss: 0.0
average reward score: 2.00390625
-------------------------------------------------------------------------------------
|E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.31s (72.06%) |Training time=0.70s (21.98%) |Others=0.19 (5.97%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.42
epoch: 0|step: 611|ppo_ep: 
1|act_loss: 0.875|cri_loss: 0.52587890625|unsuper_loss: 0.0 average reward score: 1.943359375 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.37s (73.34%) |Training time=0.67s (20.64%) |Others=0.19 (6.02%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.42 epoch: 0|step: 612|ppo_ep: 1|act_loss: 0.7861328125|cri_loss: 0.48974609375|unsuper_loss: 0.0 average reward score: 1.9501953125 ------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.33s (72.70%) |Training time=0.68s (21.07%) |Others=0.20 (6.22%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.42 epoch: 0|step: 613|ppo_ep: 1|act_loss: 0.962890625|cri_loss: 0.625|unsuper_loss: 0.0 average reward score: 1.216796875 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.29%) |Training time=0.64s (19.57%) |Others=0.20 (6.14%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.42 epoch: 0|step: 614|ppo_ep: 1|act_loss: 0.457275390625|cri_loss: 0.29931640625|unsuper_loss: 0.0 average reward score: 1.759765625 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.40s (73.74%) |Training time=0.65s (20.09%) |Others=0.20 (6.16%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42 epoch: 0|step: 615|ppo_ep: 1|act_loss: 0.43603515625|cri_loss: 0.283203125|unsuper_loss: 0.0 average reward score: 1.5986328125 ------------------------------------------------------------------------------------- |E2E latency=3.63s |Gather latency=0.00s (0.00%) |Generate time=2.35s (64.71%) |Training time=1.00s (27.63%) |Others=0.28 (7.65%)|CurSamplesPerSec=2.21 |AvgSamplesPerSec=2.42 epoch: 0|step: 616|ppo_ep: 1|act_loss: 0.240478515625|cri_loss: 0.164306640625|unsuper_loss: 
0.0 average reward score: 1.921875 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.48s (74.87%) |Training time=0.64s (19.26%) |Others=0.19 (5.87%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.42 epoch: 0|step: 617|ppo_ep: 1|act_loss: 0.220458984375|cri_loss: 0.162353515625|unsuper_loss: 0.0 average reward score: 3.20703125 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.23%) |Training time=0.65s (19.78%) |Others=0.20 (5.99%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.42 epoch: 0|step: 618|ppo_ep: 1|act_loss: -0.08734130859375|cri_loss: -0.00823974609375|unsuper_loss: 0.0 average reward score: 1.1953125 ------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.60%) |Training time=0.65s (19.56%) |Others=0.19 (5.84%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.42 epoch: 0|step: 619|ppo_ep: 1|act_loss: 0.1015625|cri_loss: 0.10382080078125|unsuper_loss: 0.0 average reward score: 1.85546875 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.38s (73.85%) |Training time=0.65s (20.22%) |Others=0.19 (5.93%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.42 epoch: 0|step: 620|ppo_ep: 1|act_loss: 0.78564453125|cri_loss: 0.470703125|unsuper_loss: 0.0 average reward score: 0.5732421875 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.34%) |Training time=0.64s (19.69%) |Others=0.19 (5.97%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42 epoch: 0|step: 621|ppo_ep: 1|act_loss: 0.52099609375|cri_loss: 0.328125|unsuper_loss: 0.0 average reward score: 0.5068359375 
------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.15%) |Training time=0.64s (19.54%) |Others=0.21 (6.31%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.42 epoch: 0|step: 622|ppo_ep: 1|act_loss: 0.51611328125|cri_loss: 0.3388671875|unsuper_loss: 0.0 average reward score: 2.6484375 ------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.32s (72.56%) |Training time=0.68s (21.31%) |Others=0.20 (6.13%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.42 epoch: 0|step: 623|ppo_ep: 1|act_loss: 0.10089111328125|cri_loss: 0.0955810546875|unsuper_loss: 0.0 average reward score: 1.0244140625 ------------------------------------------------------------------------------------- |E2E latency=3.56s |Gather latency=0.00s (0.00%) |Generate time=2.29s (64.40%) |Training time=0.99s (27.86%) |Others=0.28 (7.74%)|CurSamplesPerSec=2.25 |AvgSamplesPerSec=2.42 epoch: 0|step: 624|ppo_ep: 1|act_loss: -0.326171875|cri_loss: -0.10430908203125|unsuper_loss: 0.0 average reward score: 1.033203125 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.32s (71.21%) |Training time=0.71s (21.66%) |Others=0.23 (7.13%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42 epoch: 0|step: 625|ppo_ep: 1|act_loss: -0.44970703125|cri_loss: -0.1422119140625|unsuper_loss: 0.0 average reward score: 1.328125 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.40s (73.55%) |Training time=0.66s (20.38%) |Others=0.20 (6.07%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42 epoch: 0|step: 626|ppo_ep: 1|act_loss: -0.4990234375|cri_loss: -0.1427001953125|unsuper_loss: 0.0 average reward score: 2.392578125 
------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.35s (72.69%) |Training time=0.65s (19.98%) |Others=0.24 (7.33%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42 epoch: 0|step: 627|ppo_ep: 1|act_loss: -0.245849609375|cri_loss: -0.0394287109375|unsuper_loss: 0.0 average reward score: 0.010009765625 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.36s (73.45%) |Training time=0.65s (20.11%) |Others=0.21 (6.44%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.42 epoch: 0|step: 628|ppo_ep: 1|act_loss: -0.4580078125|cri_loss: -0.08935546875|unsuper_loss: 0.0 average reward score: 2.03515625 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.35s (73.38%) |Training time=0.66s (20.42%) |Others=0.20 (6.20%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.42 epoch: 0|step: 629|ppo_ep: 1|act_loss: -0.37060546875|cri_loss: -0.1119384765625|unsuper_loss: 0.0 average reward score: 1.052734375 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.34s (72.99%) |Training time=0.67s (21.02%) |Others=0.19 (5.99%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.42 epoch: 0|step: 630|ppo_ep: 1|act_loss: -0.51513671875|cri_loss: -0.1475830078125|unsuper_loss: 0.0 average reward score: 1.1728515625 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.26%) |Training time=0.65s (19.53%) |Others=0.21 (6.20%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.42 epoch: 0|step: 631|ppo_ep: 1|act_loss: 0.0601806640625|cri_loss: 0.074462890625|unsuper_loss: 0.0 average reward score: 0.21533203125 
-------------------------------------------------------------------------------------
|E2E latency=3.57s |Gather latency=0.00s (0.00%) |Generate time=2.34s (65.43%) |Training time=0.95s (26.72%) |Others=0.28 (7.85%)|CurSamplesPerSec=2.24 |AvgSamplesPerSec=2.42
epoch: 0|step: 632|ppo_ep: 1|act_loss: 0.1815185546875|cri_loss: 0.161376953125|unsuper_loss: 0.0
average reward score: 0.55078125
-------------------------------------------------------------------------------------
|E2E latency=3.34s |Gather latency=0.00s (0.00%) |Generate time=2.50s (74.98%) |Training time=0.64s (19.09%) |Others=0.20 (5.93%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.42
epoch: 0|step: 633|ppo_ep: 1|act_loss: 0.1319580078125|cri_loss: 0.1129150390625|unsuper_loss: 0.0
average reward score: -0.029296875
-------------------------------------------------------------------------------------
|E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.26%) |Training time=0.65s (19.89%) |Others=0.19 (5.85%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42
epoch: 0|step: 634|ppo_ep: 1|act_loss: 0.235107421875|cri_loss: 0.16748046875|unsuper_loss: 0.0
average reward score: 1.5361328125
-------------------------------------------------------------------------------------
|E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.35%) |Training time=0.64s (19.54%) |Others=0.20 (6.11%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42
epoch: 0|step: 635|ppo_ep: 1|act_loss: 0.1380615234375|cri_loss: 0.12420654296875|unsuper_loss: 0.0
average reward score: 2.37109375
-------------------------------------------------------------------------------------
|E2E latency=3.19s |Gather latency=0.00s (0.00%) |Generate time=2.36s (73.95%) |Training time=0.64s (20.11%) |Others=0.19 (5.94%)|CurSamplesPerSec=2.51 |AvgSamplesPerSec=2.42
epoch: 0|step: 636|ppo_ep: 1|act_loss: 0.189208984375|cri_loss: 0.11944580078125|unsuper_loss: 0.0
average reward score: 1.955078125
-------------------------------------------------------------------------------------
|E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.02%) |Training time=0.65s (19.89%) |Others=0.20 (6.08%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42
epoch: 0|step: 637|ppo_ep: 1|act_loss: 0.034698486328125|cri_loss: 0.0753173828125|unsuper_loss: 0.0
average reward score: 2.408203125
-------------------------------------------------------------------------------------
|E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.38%) |Training time=0.64s (19.73%) |Others=0.19 (5.89%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42
epoch: 0|step: 638|ppo_ep: 1|act_loss: 0.08013916015625|cri_loss: 0.091796875|unsuper_loss: 0.0
average reward score: 1.501953125
-------------------------------------------------------------------------------------
|E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.39s (73.84%) |Training time=0.64s (19.81%) |Others=0.21 (6.34%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42
[2023-04-24 14:22:59,198] [INFO] [logging.py:96:log_dist] [Rank 0] step=80, skipped=5, lr=[7.237500000000001e-06, 7.237500000000001e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-04-24 14:22:59,446] [INFO] [timer.py:199:stop] epoch=0/micro_step=640/global_step=80, RunningAvgSamplesPerSec=15.471969475552612, CurrSamplesPerSec=15.82306724422272, MemAllocated=20.44GB, MaxMemAllocated=31.45GB
[2023-04-24 14:22:59,652] [INFO] [logging.py:96:log_dist] [Rank 0] step=80, skipped=3, lr=[3.85e-06, 3.85e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 639|ppo_ep: 1|act_loss: 0.017333984375|cri_loss: 0.0927734375|unsuper_loss: 0.0
average reward score: 1.658203125
-------------------------------------------------------------------------------------
|E2E latency=3.68s |Gather latency=0.00s (0.00%) |Generate time=2.44s (66.20%) |Training time=0.96s (26.19%) |Others=0.28 (7.61%)|CurSamplesPerSec=2.17 |AvgSamplesPerSec=2.42
epoch: 0|step: 640|ppo_ep: 1|act_loss: 0.7978515625|cri_loss: 0.491455078125|unsuper_loss: 0.0
average reward score: 0.755859375
-------------------------------------------------------------------------------------
|E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.42s (73.96%) |Training time=0.66s (20.01%) |Others=0.20 (6.02%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.42
epoch: 0|step: 641|ppo_ep: 1|act_loss: 0.7724609375|cri_loss: 0.50341796875|unsuper_loss: 0.0
average reward score: 1.126953125
-------------------------------------------------------------------------------------
|E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.41s (73.76%) |Training time=0.66s (20.15%) |Others=0.20 (6.09%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.42
epoch: 0|step: 642|ppo_ep: 1|act_loss: -0.1124267578125|cri_loss: 0.0311279296875|unsuper_loss: 0.0
average reward score: 2.361328125
-------------------------------------------------------------------------------------
|E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.05%) |Training time=0.64s (19.71%) |Others=0.20 (6.24%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42
epoch: 0|step: 643|ppo_ep: 1|act_loss: 0.41943359375|cri_loss: 0.26806640625|unsuper_loss: 0.0
average reward score: 2.05859375
-------------------------------------------------------------------------------------
|E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.02%) |Training time=0.65s (19.85%) |Others=0.20 (6.13%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42
epoch: 0|step: 644|ppo_ep: 1|act_loss: 0.14892578125|cri_loss: 0.2025146484375|unsuper_loss: 0.0
average reward score: 2.224609375
-------------------------------------------------------------------------------------
|E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.36s (73.04%) |Training time=0.67s (20.91%) |Others=0.19 (6.05%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.42
epoch: 0|step: 645|ppo_ep: 1|act_loss: 0.4560546875|cri_loss: 0.34326171875|unsuper_loss: 0.0
average reward score: 1.091796875
-------------------------------------------------------------------------------------
|E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.32s (71.80%) |Training time=0.71s (21.87%) |Others=0.20 (6.33%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.42
epoch: 0|step: 646|ppo_ep: 1|act_loss: 0.546875|cri_loss: 0.35546875|unsuper_loss: 0.0
average reward score: 2.20703125
-------------------------------------------------------------------------------------
|E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.16%) |Training time=0.64s (19.56%) |Others=0.21 (6.28%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.42
epoch: 0|step: 647|ppo_ep: 1|act_loss: 0.38720703125|cri_loss: 0.24658203125|unsuper_loss: 0.0
average reward score: 2.62109375
-------------------------------------------------------------------------------------
|E2E latency=3.73s |Gather latency=0.00s (0.00%) |Generate time=2.50s (67.03%) |Training time=0.95s (25.32%) |Others=0.29 (7.65%)|CurSamplesPerSec=2.14 |AvgSamplesPerSec=2.42
epoch: 0|step: 648|ppo_ep: 1|act_loss: 1.310546875|cri_loss: 0.8310546875|unsuper_loss: 0.0
average reward score: 2.1953125
-------------------------------------------------------------------------------------
|E2E latency=3.34s |Gather latency=0.00s (0.00%) |Generate time=2.48s (74.34%) |Training time=0.64s (19.13%) |Others=0.22 (6.53%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.42
epoch: 0|step: 649|ppo_ep: 1|act_loss: 1.7880859375|cri_loss: 1.091796875|unsuper_loss: 0.0
average reward score: -0.378662109375
-------------------------------------------------------------------------------------
|E2E latency=3.33s |Gather latency=0.00s (0.00%) |Generate time=2.44s (73.23%) |Training time=0.68s (20.31%) |Others=0.22 (6.46%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.42
epoch: 0|step: 650|ppo_ep: 1|act_loss: 1.494140625|cri_loss: 0.9375|unsuper_loss: 0.0
average reward score: 1.294921875
-------------------------------------------------------------------------------------
|E2E latency=3.38s |Gather latency=0.00s (0.00%) |Generate time=2.54s (75.16%) |Training time=0.64s (19.04%) |Others=0.20 (5.79%)|CurSamplesPerSec=2.37 |AvgSamplesPerSec=2.42
epoch: 0|step: 651|ppo_ep: 1|act_loss: 1.0224609375|cri_loss: 0.642578125|unsuper_loss: 0.0
average reward score: 2.287109375
-------------------------------------------------------------------------------------
|E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.36s (73.73%) |Training time=0.65s (20.27%) |Others=0.19 (6.01%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.42
epoch: 0|step: 652|ppo_ep: 1|act_loss: 1.275390625|cri_loss: 0.78564453125|unsuper_loss: 0.0
average reward score: 1.4228515625
-------------------------------------------------------------------------------------
|E2E latency=3.19s |Gather latency=0.00s (0.00%) |Generate time=2.32s (72.59%) |Training time=0.68s (21.31%) |Others=0.19 (6.10%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.42
epoch: 0|step: 653|ppo_ep: 1|act_loss: 1.830078125|cri_loss: 1.1279296875|unsuper_loss: 0.0
average reward score: 2.84375
-------------------------------------------------------------------------------------
|E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.34s (73.03%) |Training time=0.67s (21.05%) |Others=0.19 (5.92%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.42
epoch: 0|step: 654|ppo_ep: 1|act_loss: 1.1748046875|cri_loss: 0.7412109375|unsuper_loss: 0.0
average reward score: 2.99609375
-------------------------------------------------------------------------------------
|E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.33s (72.21%) |Training time=0.67s (20.73%) |Others=0.23 (7.06%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.42
[2023-04-24 14:23:52,724] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 16384, reducing to 8192
epoch: 0|step: 655|ppo_ep: 1|act_loss: 1.1025390625|cri_loss: 0.69091796875|unsuper_loss: 0.0
average reward score: 1.93359375
-------------------------------------------------------------------------------------
|E2E latency=3.64s |Gather latency=0.00s (0.00%) |Generate time=2.46s (67.54%) |Training time=0.93s (25.55%) |Others=0.25 (6.92%)|CurSamplesPerSec=2.20 |AvgSamplesPerSec=2.42
epoch: 0|step: 656|ppo_ep: 1|act_loss: 1.296875|cri_loss: 0.7763671875|unsuper_loss: 0.0
average reward score: 0.525390625
-------------------------------------------------------------------------------------
|E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.06%) |Training time=0.65s (19.87%) |Others=0.20 (6.07%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.42
epoch: 0|step: 657|ppo_ep: 1|act_loss: 1.0107421875|cri_loss: 0.63623046875|unsuper_loss: 0.0
average reward score: 1.484375
-------------------------------------------------------------------------------------
|E2E latency=3.59s |Gather latency=0.00s (0.00%) |Generate time=2.70s (75.27%) |Training time=0.68s (19.02%) |Others=0.21 (5.71%)|CurSamplesPerSec=2.23 |AvgSamplesPerSec=2.42
epoch: 0|step: 658|ppo_ep: 1|act_loss: 1.65625|cri_loss: 1.009765625|unsuper_loss: 0.0
average reward score: 0.8974609375
-------------------------------------------------------------------------------------
|E2E latency=3.44s |Gather latency=0.00s (0.00%) |Generate time=2.59s (75.23%) |Training time=0.65s (18.93%) |Others=0.20 (5.84%)|CurSamplesPerSec=2.32 |AvgSamplesPerSec=2.42
epoch: 0|step: 659|ppo_ep: 1|act_loss: 1.541015625|cri_loss: 0.95947265625|unsuper_loss: 0.0
average reward score: 1.525390625
-------------------------------------------------------------------------------------
|E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.62%) |Training time=0.65s (19.59%) |Others=0.19 (5.79%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.42
epoch: 0|step: 660|ppo_ep: 1|act_loss: 1.302734375|cri_loss: 0.78515625|unsuper_loss: 0.0
average reward score: 2.28515625
-------------------------------------------------------------------------------------
|E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.34s (73.31%) |Training time=0.66s (20.53%) |Others=0.20 (6.16%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.42
epoch: 0|step: 661|ppo_ep: 1|act_loss: 1.9169921875|cri_loss: 1.1982421875|unsuper_loss: 0.0
average reward score: 1.849609375
-------------------------------------------------------------------------------------
|E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.59s (73.95%) |Training time=0.72s (20.50%) |Others=0.19 (5.55%)|CurSamplesPerSec=2.28 |AvgSamplesPerSec=2.42
epoch: 0|step: 662|ppo_ep: 1|act_loss: 1.591796875|cri_loss: 0.96240234375|unsuper_loss: 0.0
average reward score: 1.5927734375
-------------------------------------------------------------------------------------
|E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.38s (73.84%) |Training time=0.65s (20.08%) |Others=0.20 (6.08%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.42
epoch: 0|step: 663|ppo_ep: 1|act_loss: 2.1015625|cri_loss: 1.2958984375|unsuper_loss: 0.0
average reward score: 0.88916015625
-------------------------------------------------------------------------------------
|E2E latency=3.54s |Gather latency=0.00s (0.00%) |Generate time=2.32s (65.66%) |Training time=0.93s (26.16%) |Others=0.29 (8.19%)|CurSamplesPerSec=2.26 |AvgSamplesPerSec=2.42
epoch: 0|step: 664|ppo_ep: 1|act_loss: 1.251953125|cri_loss: 0.7509765625|unsuper_loss: 0.0
average reward score: 2.021484375
-------------------------------------------------------------------------------------
|E2E latency=3.14s |Gather latency=0.00s (0.00%) |Generate time=2.29s (73.05%) |Training time=0.65s (20.69%) |Others=0.20 (6.25%)|CurSamplesPerSec=2.55 |AvgSamplesPerSec=2.42
epoch: 0|step: 665|ppo_ep: 1|act_loss: 0.63671875|cri_loss: 0.412109375|unsuper_loss: 0.0
average reward score: 2.083984375
-------------------------------------------------------------------------------------
|E2E latency=3.19s |Gather latency=0.00s (0.00%) |Generate time=2.34s (73.39%) |Training time=0.65s (20.40%) |Others=0.20 (6.22%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.42
epoch: 0|step: 666|ppo_ep: 1|act_loss: 0.765625|cri_loss: 0.48681640625|unsuper_loss: 0.0
average reward score: 2.34375
-------------------------------------------------------------------------------------
|E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.35s (73.04%) |Training time=0.67s (20.80%) |Others=0.20 (6.16%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.42
epoch: 0|step: 667|ppo_ep: 1|act_loss: 1.0439453125|cri_loss: 0.6123046875|unsuper_loss: 0.0
average reward score: 0.369384765625
-------------------------------------------------------------------------------------
|E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.38s (73.94%) |Training time=0.65s (20.14%) |Others=0.19 (5.92%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.42
epoch: 0|step: 668|ppo_ep: 1|act_loss: 1.099609375|cri_loss: 0.6669921875|unsuper_loss: 0.0
average reward score: 0.71728515625
-------------------------------------------------------------------------------------
|E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.36s (73.30%) |Training time=0.67s (20.72%) |Others=0.19 (5.98%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.42
epoch: 0|step: 669|ppo_ep: 1|act_loss: 1.1025390625|cri_loss: 0.697265625|unsuper_loss: 0.0
average reward score: 0.427734375
-------------------------------------------------------------------------------------
|E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.12%) |Training time=0.64s (19.73%) |Others=0.20 (6.16%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.42
epoch: 0|step: 670|ppo_ep: 1|act_loss: 0.947265625|cri_loss: 0.5908203125|unsuper_loss: 0.0
average reward score: 3.3359375
-------------------------------------------------------------------------------------
|E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.43%) |Training time=0.64s (19.50%) |Others=0.20 (6.07%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42
epoch: 0|step: 671|ppo_ep: 1|act_loss: 0.921875|cri_loss: 0.55078125|unsuper_loss: 0.0
average reward score: 0.68505859375
-------------------------------------------------------------------------------------
|E2E latency=3.54s |Gather latency=0.00s (0.00%) |Generate time=2.33s (65.89%) |Training time=0.93s (26.22%) |Others=0.28 (7.89%)|CurSamplesPerSec=2.26 |AvgSamplesPerSec=2.42
epoch: 0|step: 672|ppo_ep: 1|act_loss: 0.372314453125|cri_loss: 0.336181640625|unsuper_loss: 0.0
average reward score: 1.6328125
-------------------------------------------------------------------------------------
|E2E latency=3.18s |Gather latency=0.00s (0.00%) |Generate time=2.32s (72.82%) |Training time=0.67s (21.15%) |Others=0.19 (6.03%)|CurSamplesPerSec=2.51 |AvgSamplesPerSec=2.42
epoch: 0|step: 673|ppo_ep: 1|act_loss: 0.53515625|cri_loss: 0.350341796875|unsuper_loss: 0.0
average reward score: -0.86962890625
-------------------------------------------------------------------------------------
|E2E latency=3.18s |Gather latency=0.00s (0.00%) |Generate time=2.33s (73.35%) |Training time=0.65s (20.50%) |Others=0.20 (6.16%)|CurSamplesPerSec=2.52 |AvgSamplesPerSec=2.42
epoch: 0|step: 674|ppo_ep: 1|act_loss: 0.77685546875|cri_loss: 0.455078125|unsuper_loss: 0.0
average reward score: 0.419677734375
-------------------------------------------------------------------------------------
|E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (73.91%) |Training time=0.64s (19.82%) |Others=0.20 (6.28%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42
epoch: 0|step: 675|ppo_ep: 1|act_loss: 0.5361328125|cri_loss: 0.35205078125|unsuper_loss: 0.0
average reward score: 0.164794921875
-------------------------------------------------------------------------------------
|E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.33s (72.89%) |Training time=0.67s (21.11%) |Others=0.19 (6.00%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.42
epoch: 0|step: 676|ppo_ep: 1|act_loss: 0.0411376953125|cri_loss: 0.055328369140625|unsuper_loss: 0.0
average reward score: 0.54931640625
-------------------------------------------------------------------------------------
|E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.31s (72.33%) |Training time=0.69s (21.63%) |Others=0.19 (6.04%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.42
epoch: 0|step: 677|ppo_ep: 1|act_loss: -0.0550537109375|cri_loss: 0.03021240234375|unsuper_loss: 0.0
average reward score: 1.30078125
-------------------------------------------------------------------------------------
|E2E latency=3.19s |Gather latency=0.00s (0.00%) |Generate time=2.34s (73.24%) |Training time=0.66s (20.78%) |Others=0.19 (5.98%)|CurSamplesPerSec=2.51 |AvgSamplesPerSec=2.42
epoch: 0|step: 678|ppo_ep: 1|act_loss: 0.74169921875|cri_loss: 0.482177734375|unsuper_loss: 0.0
average reward score: 0.552734375
-------------------------------------------------------------------------------------
|E2E latency=3.15s |Gather latency=0.00s (0.00%) |Generate time=2.32s (73.66%) |Training time=0.64s (20.25%) |Others=0.19 (6.09%)|CurSamplesPerSec=2.54 |AvgSamplesPerSec=2.42
epoch: 0|step: 679|ppo_ep: 1|act_loss: 0.1807861328125|cri_loss: 0.176513671875|unsuper_loss: 0.0
average reward score: 1.4921875
-------------------------------------------------------------------------------------
|E2E latency=3.58s |Gather latency=0.00s (0.00%) |Generate time=2.34s (65.27%) |Training time=0.97s (26.95%) |Others=0.28 (7.78%)|CurSamplesPerSec=2.23 |AvgSamplesPerSec=2.42
epoch: 0|step: 680|ppo_ep: 1|act_loss: -0.78564453125|cri_loss: -0.2421875|unsuper_loss: 0.0
average reward score: 1.03515625
-------------------------------------------------------------------------------------
|E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.36%) |Training time=0.64s (19.60%) |Others=0.20 (6.04%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42
epoch: 0|step: 681|ppo_ep: 1|act_loss: -0.403076171875|cri_loss: -0.07470703125|unsuper_loss: 0.0
average reward score: 0.734375
-------------------------------------------------------------------------------------
|E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.36s (73.76%) |Training time=0.64s (20.09%) |Others=0.20 (6.15%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.42
epoch: 0|step: 682|ppo_ep: 1|act_loss: -0.46484375|cri_loss: -0.1287841796875|unsuper_loss: 0.0
average reward score: 1.4345703125
-------------------------------------------------------------------------------------
|E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.42%) |Training time=0.64s (19.66%) |Others=0.19 (5.92%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42
epoch: 0|step: 683|ppo_ep: 1|act_loss: -0.7109375|cri_loss: -0.124267578125|unsuper_loss: 0.0
average reward score: 0.73095703125
-------------------------------------------------------------------------------------
|E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.59%) |Training time=0.64s (19.52%) |Others=0.19 (5.88%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42
epoch: 0|step: 684|ppo_ep: 1|act_loss: -0.0242919921875|cri_loss: 0.06671142578125|unsuper_loss: 0.0
average reward score: 1.146484375
-------------------------------------------------------------------------------------
|E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.09%) |Training time=0.64s (19.78%) |Others=0.20 (6.13%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42
epoch: 0|step: 685|ppo_ep: 1|act_loss: -0.8408203125|cri_loss: -0.26171875|unsuper_loss: 0.0
average reward score: 1.62109375
-------------------------------------------------------------------------------------
|E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.42s (73.91%) |Training time=0.65s (19.90%) |Others=0.20 (6.20%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.42
epoch: 0|step: 686|ppo_ep: 1|act_loss: -0.25048828125|cri_loss: 0.0126953125|unsuper_loss: 0.0
average reward score: 1.61328125
-------------------------------------------------------------------------------------
|E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.48%) |Training time=0.64s (19.55%) |Others=0.20 (5.97%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42
epoch: 0|step: 687|ppo_ep: 1|act_loss: -0.59765625|cri_loss: -0.19970703125|unsuper_loss: 0.0
average reward score: 1.966796875
-------------------------------------------------------------------------------------
|E2E latency=3.55s |Gather latency=0.00s (0.00%) |Generate time=2.35s (66.19%) |Training time=0.93s (26.07%) |Others=0.27 (7.74%)|CurSamplesPerSec=2.25 |AvgSamplesPerSec=2.42
epoch: 0|step: 688|ppo_ep: 1|act_loss: -0.322265625|cri_loss: -0.04833984375|unsuper_loss: 0.0
average reward score: -0.74560546875
-------------------------------------------------------------------------------------
|E2E latency=3.18s |Gather latency=0.00s (0.00%) |Generate time=2.35s (73.70%) |Training time=0.64s (20.11%) |Others=0.20 (6.19%)|CurSamplesPerSec=2.51 |AvgSamplesPerSec=2.42
epoch: 0|step: 689|ppo_ep: 1|act_loss: -0.8232421875|cri_loss: -0.263671875|unsuper_loss: 0.0
average reward score: 0.59130859375
-------------------------------------------------------------------------------------
|E2E latency=3.19s |Gather latency=0.00s (0.00%) |Generate time=2.36s (74.01%) |Training time=0.64s (20.06%) |Others=0.19 (5.92%)|CurSamplesPerSec=2.51 |AvgSamplesPerSec=2.42
epoch: 0|step: 690|ppo_ep: 1|act_loss: -0.626953125|cri_loss: -0.182373046875|unsuper_loss: 0.0
average reward score: 1.01953125
-------------------------------------------------------------------------------------
|E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.39s (73.64%) |Training time=0.66s (20.44%) |Others=0.19 (5.91%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42
epoch: 0|step: 691|ppo_ep: 1|act_loss: -0.4091796875|cri_loss: -0.07421875|unsuper_loss: 0.0
average reward score: -0.179443359375
-------------------------------------------------------------------------------------
|E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.24%) |Training time=0.64s (19.80%) |Others=0.19 (5.96%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.42
epoch: 0|step: 692|ppo_ep: 1|act_loss: -0.244140625|cri_loss: -0.0224609375|unsuper_loss: 0.0
average reward score: 0.765625
-------------------------------------------------------------------------------------
|E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.33s (72.78%) |Training time=0.67s (21.01%) |Others=0.20 (6.21%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.42
epoch: 0|step: 693|ppo_ep: 1|act_loss: -0.173828125|cri_loss: -0.0157470703125|unsuper_loss: 0.0
average reward score: 0.35888671875
-------------------------------------------------------------------------------------
|E2E latency=3.18s |Gather latency=0.00s (0.00%) |Generate time=2.31s (72.52%) |Training time=0.68s (21.49%) |Others=0.19 (5.99%)|CurSamplesPerSec=2.51 |AvgSamplesPerSec=2.42
epoch: 0|step: 694|ppo_ep: 1|act_loss: -0.66015625|cri_loss: -0.171875|unsuper_loss: 0.0
average reward score: 2.318359375
-------------------------------------------------------------------------------------
|E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.28s (70.80%) |Training time=0.74s (23.00%) |Others=0.20 (6.21%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.42
epoch: 0|step: 695|ppo_ep: 1|act_loss: -0.96533203125|cri_loss: -0.3486328125|unsuper_loss: 0.0
average reward score: 0.58056640625
-------------------------------------------------------------------------------------
|E2E latency=3.67s |Gather latency=0.00s (0.00%) |Generate time=2.46s (66.91%) |Training time=0.93s (25.46%) |Others=0.28 (7.63%)|CurSamplesPerSec=2.18 |AvgSamplesPerSec=2.42
epoch: 0|step: 696|ppo_ep: 1|act_loss: -0.1356201171875|cri_loss: 0.00439453125|unsuper_loss: 0.0
average reward score: 1.42578125
-------------------------------------------------------------------------------------
|E2E latency=3.18s |Gather latency=0.00s (0.00%) |Generate time=2.32s (73.04%) |Training time=0.66s (20.89%) |Others=0.19 (6.07%)|CurSamplesPerSec=2.51 |AvgSamplesPerSec=2.42
epoch: 0|step: 697|ppo_ep: 1|act_loss: -0.03704833984375|cri_loss: 0.03802490234375|unsuper_loss: 0.0
average reward score: 0.619140625
-------------------------------------------------------------------------------------
|E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.31s (72.08%) |Training time=0.69s (21.58%) |Others=0.20 (6.34%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.42
epoch: 0|step: 698|ppo_ep: 1|act_loss: 0.03277587890625|cri_loss: 0.06658935546875|unsuper_loss: 0.0
average reward score: 0.0029296875
-------------------------------------------------------------------------------------
|E2E latency=3.19s |Gather latency=0.00s (0.00%) |Generate time=2.35s (73.54%) |Training time=0.65s (20.51%) |Others=0.19 (5.95%)|CurSamplesPerSec=2.51 |AvgSamplesPerSec=2.42
epoch: 0|step: 699|ppo_ep: 1|act_loss: -0.256591796875|cri_loss: -0.006103515625|unsuper_loss: 0.0
average reward score: -0.329833984375
-------------------------------------------------------------------------------------
|E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.40s (73.34%) |Training time=0.68s (20.73%) |Others=0.19 (5.94%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42
epoch: 0|step: 700|ppo_ep: 1|act_loss: 0.02435302734375|cri_loss: 0.049652099609375|unsuper_loss: 0.0
average reward score: 1.4658203125
-------------------------------------------------------------------------------------
|E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.31s (72.05%) |Training time=0.70s (21.78%) |Others=0.20 (6.18%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.42
epoch: 0|step: 701|ppo_ep: 1|act_loss: -0.082763671875|cri_loss: -0.010986328125|unsuper_loss: 0.0
average reward score: 0.2359619140625
-------------------------------------------------------------------------------------
|E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.48%) |Training time=0.64s (19.56%) |Others=0.20 (5.96%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.42
epoch: 0|step: 702|ppo_ep: 1|act_loss: 0.093505859375|cri_loss: 0.11480712890625|unsuper_loss: 0.0
average reward score: 0.257568359375
-------------------------------------------------------------------------------------
|E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.14%) |Training time=0.64s (19.78%) |Others=0.20 (6.09%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42
epoch: 0|step: 703|ppo_ep: 1|act_loss: 0.1824951171875|cri_loss: 0.192626953125|unsuper_loss: 0.0
average reward score: 1.056640625
-------------------------------------------------------------------------------------
|E2E latency=3.66s |Gather latency=0.00s (0.00%) |Generate time=2.45s (66.95%) |Training time=0.92s (25.30%) |Others=0.28 (7.75%)|CurSamplesPerSec=2.19 |AvgSamplesPerSec=2.42
epoch: 0|step: 704|ppo_ep: 1|act_loss: 1.068359375|cri_loss: 0.640625|unsuper_loss: 0.0
average reward score: 0.57421875
-------------------------------------------------------------------------------------
|E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.33%) |Training time=0.64s (19.68%) |Others=0.20 (5.99%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.42
epoch: 0|step: 705|ppo_ep: 1|act_loss: 0.5400390625|cri_loss: 0.3447265625|unsuper_loss: 0.0
average reward score: 1.4130859375
-------------------------------------------------------------------------------------
|E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.42s (73.65%) |Training time=0.66s (20.00%) |Others=0.21 (6.35%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.42
epoch: 0|step: 706|ppo_ep: 1|act_loss: 0.7822265625|cri_loss: 0.46435546875|unsuper_loss: 0.0
average reward score: 0.69287109375
-------------------------------------------------------------------------------------
|E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.13%) |Training time=0.64s (19.67%) |Others=0.20 (6.20%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42
epoch: 0|step: 707|ppo_ep: 1|act_loss: 1.056640625|cri_loss: 0.6298828125|unsuper_loss: 0.0
average reward score: -0.78564453125
-------------------------------------------------------------------------------------
|E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.48s (74.80%) |Training time=0.64s (19.41%) |Others=0.19 (5.78%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.42
epoch: 0|step: 708|ppo_ep: 1|act_loss: 0.72509765625|cri_loss: 0.4736328125|unsuper_loss: 0.0
average reward score: 1.064453125
-------------------------------------------------------------------------------------
|E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.36s (73.18%) |Training time=0.66s (20.50%) |Others=0.20 (6.33%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.42
epoch: 0|step: 709|ppo_ep: 1|act_loss: 1.1337890625|cri_loss: 0.6865234375|unsuper_loss: 0.0
average reward score: 1.068359375
-------------------------------------------------------------------------------------
|E2E latency=3.19s |Gather latency=0.00s (0.00%) |Generate time=2.34s (73.46%) |Training time=0.65s (20.48%) |Others=0.19 (6.06%)|CurSamplesPerSec=2.51 |AvgSamplesPerSec=2.42
epoch: 0|step: 710|ppo_ep: 1|act_loss: 0.83544921875|cri_loss: 0.5390625|unsuper_loss: 0.0
average reward score: 1.1416015625
-------------------------------------------------------------------------------------
|E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.38s (73.82%) |Training time=0.64s (19.79%) |Others=0.21 (6.39%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.42
epoch: 0|step: 711|ppo_ep: 1|act_loss: 1.4853515625|cri_loss: 0.9130859375|unsuper_loss: 0.0
average reward score: -0.08935546875
-------------------------------------------------------------------------------------
|E2E latency=3.63s |Gather latency=0.00s (0.00%) |Generate time=2.42s (66.69%) |Training time=0.93s (25.61%) |Others=0.28 (7.70%)|CurSamplesPerSec=2.20 |AvgSamplesPerSec=2.42
epoch: 0|step: 712|ppo_ep: 1|act_loss: 1.1220703125|cri_loss: 0.66796875|unsuper_loss: 0.0
average reward score: 0.145751953125
-------------------------------------------------------------------------------------
|E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.37s (74.18%) |Training time=0.63s (19.81%) |Others=0.19 (6.01%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.42
epoch: 0|step: 713|ppo_ep: 1|act_loss: 0.80517578125|cri_loss: 0.51025390625|unsuper_loss: 0.0
average reward score: 1.322265625
-------------------------------------------------------------------------------------
|E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.39s (73.91%) |Training time=0.65s (19.97%) |Others=0.20 (6.12%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.42
epoch: 0|step: 714|ppo_ep: 1|act_loss: 1.052734375|cri_loss: 0.69873046875|unsuper_loss: 0.0
average reward score: 0.171875
-------------------------------------------------------------------------------------
|E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.47s (74.58%) |Training time=0.65s (19.49%) |Others=0.20 (5.93%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.42
epoch: 0|step: 715|ppo_ep: 1|act_loss: 1.4296875|cri_loss: 0.8916015625|unsuper_loss: 0.0
average reward score: -0.202880859375
------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.37s (74.03%) |Training time=0.64s (20.04%) |Others=0.19 (5.93%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.42 epoch: 0|step: 716|ppo_ep: 1|act_loss: 1.1474609375|cri_loss: 0.6953125|unsuper_loss: 0.0 average reward score: 0.244140625 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.30s (71.62%) |Training time=0.71s (22.22%) |Others=0.20 (6.17%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.42 epoch: 0|step: 717|ppo_ep: 1|act_loss: 1.591796875|cri_loss: 0.97412109375|unsuper_loss: 0.0 average reward score: 1.62109375 ------------------------------------------------------------------------------------- |E2E latency=3.18s |Gather latency=0.00s (0.00%) |Generate time=2.35s (73.70%) |Training time=0.64s (20.05%) |Others=0.20 (6.25%)|CurSamplesPerSec=2.51 |AvgSamplesPerSec=2.42 epoch: 0|step: 718|ppo_ep: 1|act_loss: 0.19482421875|cri_loss: 0.1697998046875|unsuper_loss: 0.0 average reward score: 0.40283203125 ------------------------------------------------------------------------------------- |E2E latency=3.16s |Gather latency=0.00s (0.00%) |Generate time=2.31s (73.03%) |Training time=0.65s (20.57%) |Others=0.20 (6.40%)|CurSamplesPerSec=2.53 |AvgSamplesPerSec=2.42 [2023-04-24 14:27:22,407] [INFO] [logging.py:96:log_dist] [Rank 0] step=90, skipped=5, lr=[8.2025e-06, 8.2025e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-04-24 14:27:22,651] [INFO] [timer.py:199:stop] epoch=0/micro_step=720/global_step=90, RunningAvgSamplesPerSec=15.474863804717721, CurrSamplesPerSec=15.283450693098914, MemAllocated=20.44GB, MaxMemAllocated=31.45GB [2023-04-24 14:27:22,861] [INFO] [logging.py:96:log_dist] [Rank 0] step=90, skipped=4, lr=[4.3e-06, 4.3e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 719|ppo_ep: 1|act_loss: 1.740234375|cri_loss: 
1.0537109375|unsuper_loss: 0.0 average reward score: -0.8203125 ------------------------------------------------------------------------------------- |E2E latency=3.56s |Gather latency=0.00s (0.00%) |Generate time=2.28s (63.99%) |Training time=0.99s (27.81%) |Others=0.29 (8.20%)|CurSamplesPerSec=2.25 |AvgSamplesPerSec=2.42 epoch: 0|step: 720|ppo_ep: 1|act_loss: 0.28564453125|cri_loss: 0.2476806640625|unsuper_loss: 0.0 average reward score: 1.5771484375 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.31s (70.36%) |Training time=0.78s (23.80%) |Others=0.19 (5.84%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.42 epoch: 0|step: 721|ppo_ep: 1|act_loss: 1.1337890625|cri_loss: 0.677734375|unsuper_loss: 0.0 average reward score: -0.08740234375 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.38s (73.88%) |Training time=0.64s (19.91%) |Others=0.20 (6.21%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.42 epoch: 0|step: 722|ppo_ep: 1|act_loss: 0.916015625|cri_loss: 0.552734375|unsuper_loss: 0.0 average reward score: 0.978515625 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.11%) |Training time=0.64s (19.75%) |Others=0.20 (6.14%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42 epoch: 0|step: 723|ppo_ep: 1|act_loss: 0.861328125|cri_loss: 0.5029296875|unsuper_loss: 0.0 average reward score: 0.485595703125 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.43%) |Training time=0.64s (19.68%) |Others=0.19 (5.89%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42 epoch: 0|step: 724|ppo_ep: 1|act_loss: 1.052734375|cri_loss: 0.625|unsuper_loss: 0.0 average reward score: 
1.3271484375 ------------------------------------------------------------------------------------- |E2E latency=3.16s |Gather latency=0.00s (0.00%) |Generate time=2.31s (73.27%) |Training time=0.64s (20.32%) |Others=0.20 (6.40%)|CurSamplesPerSec=2.53 |AvgSamplesPerSec=2.42 epoch: 0|step: 725|ppo_ep: 1|act_loss: 0.470947265625|cri_loss: 0.31298828125|unsuper_loss: 0.0 average reward score: 2.185546875 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.38s (73.92%) |Training time=0.65s (20.07%) |Others=0.19 (6.01%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.42 epoch: 0|step: 726|ppo_ep: 1|act_loss: 1.1171875|cri_loss: 0.6611328125|unsuper_loss: 0.0 average reward score: 0.4951171875 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.01%) |Training time=0.65s (19.79%) |Others=0.20 (6.20%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42 epoch: 0|step: 727|ppo_ep: 1|act_loss: 1.2109375|cri_loss: 0.701171875|unsuper_loss: 0.0 average reward score: 0.52978515625 ------------------------------------------------------------------------------------- |E2E latency=3.72s |Gather latency=0.00s (0.00%) |Generate time=2.49s (67.00%) |Training time=0.93s (25.10%) |Others=0.29 (7.91%)|CurSamplesPerSec=2.15 |AvgSamplesPerSec=2.42 epoch: 0|step: 728|ppo_ep: 1|act_loss: 0.2191162109375|cri_loss: 0.2020263671875|unsuper_loss: 0.0 average reward score: 0.2239990234375 ------------------------------------------------------------------------------------- |E2E latency=3.35s |Gather latency=0.00s (0.00%) |Generate time=2.47s (73.82%) |Training time=0.67s (20.06%) |Others=0.20 (6.12%)|CurSamplesPerSec=2.39 |AvgSamplesPerSec=2.42 epoch: 0|step: 729|ppo_ep: 1|act_loss: 0.880859375|cri_loss: 0.5166015625|unsuper_loss: 0.0 average reward score: -0.01220703125 
------------------------------------------------------------------------------------- |E2E latency=3.35s |Gather latency=0.00s (0.00%) |Generate time=2.51s (74.84%) |Training time=0.64s (19.23%) |Others=0.20 (5.93%)|CurSamplesPerSec=2.39 |AvgSamplesPerSec=2.42 epoch: 0|step: 730|ppo_ep: 1|act_loss: 0.81640625|cri_loss: 0.47998046875|unsuper_loss: 0.0 average reward score: -0.6591796875 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.44s (73.65%) |Training time=0.64s (19.37%) |Others=0.23 (6.98%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.42 epoch: 0|step: 731|ppo_ep: 1|act_loss: 0.232177734375|cri_loss: 0.21142578125|unsuper_loss: 0.0 average reward score: 0.078369140625 ------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.23%) |Training time=0.66s (19.90%) |Others=0.19 (5.88%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.42 epoch: 0|step: 732|ppo_ep: 1|act_loss: 0.26220703125|cri_loss: 0.170166015625|unsuper_loss: 0.0 average reward score: 0.23388671875 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.48%) |Training time=0.64s (19.67%) |Others=0.19 (5.85%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42 epoch: 0|step: 733|ppo_ep: 1|act_loss: 0.703125|cri_loss: 0.41455078125|unsuper_loss: 0.0 average reward score: 0.859375 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.44s (73.67%) |Training time=0.64s (19.39%) |Others=0.23 (6.94%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.42 epoch: 0|step: 734|ppo_ep: 1|act_loss: 0.50048828125|cri_loss: 0.2900390625|unsuper_loss: 0.0 average reward score: 0.051025390625 
------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.39s (73.86%) |Training time=0.64s (19.85%) |Others=0.20 (6.30%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42 epoch: 0|step: 735|ppo_ep: 1|act_loss: 0.51123046875|cri_loss: 0.325439453125|unsuper_loss: 0.0 average reward score: 0.50439453125 ------------------------------------------------------------------------------------- |E2E latency=3.56s |Gather latency=0.00s (0.00%) |Generate time=2.30s (64.83%) |Training time=0.97s (27.25%) |Others=0.28 (7.92%)|CurSamplesPerSec=2.25 |AvgSamplesPerSec=2.42 epoch: 0|step: 736|ppo_ep: 1|act_loss: 0.2154541015625|cri_loss: 0.216064453125|unsuper_loss: 0.0 average reward score: -0.322509765625 ------------------------------------------------------------------------------------- |E2E latency=3.19s |Gather latency=0.00s (0.00%) |Generate time=2.31s (72.52%) |Training time=0.68s (21.43%) |Others=0.19 (6.05%)|CurSamplesPerSec=2.51 |AvgSamplesPerSec=2.42 epoch: 0|step: 737|ppo_ep: 1|act_loss: 0.1544189453125|cri_loss: 0.1573486328125|unsuper_loss: 0.0 average reward score: 0.76904296875 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.42s (73.98%) |Training time=0.65s (19.76%) |Others=0.20 (6.26%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42 epoch: 0|step: 738|ppo_ep: 1|act_loss: 0.1898193359375|cri_loss: 0.15478515625|unsuper_loss: 0.0 average reward score: 1.177734375 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.35s (70.85%) |Training time=0.77s (23.31%) |Others=0.19 (5.83%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.42 epoch: 0|step: 739|ppo_ep: 1|act_loss: 0.173828125|cri_loss: 0.15234375|unsuper_loss: 0.0 average reward score: 0.4658203125 
------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.36s (73.40%) |Training time=0.66s (20.46%) |Others=0.20 (6.14%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.42 epoch: 0|step: 740|ppo_ep: 1|act_loss: 0.49072265625|cri_loss: 0.387939453125|unsuper_loss: 0.0 average reward score: 0.71142578125 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.35s (72.14%) |Training time=0.70s (21.40%) |Others=0.21 (6.46%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42 epoch: 0|step: 741|ppo_ep: 1|act_loss: 0.051788330078125|cri_loss: 0.1116943359375|unsuper_loss: 0.0 average reward score: 0.771484375 ------------------------------------------------------------------------------------- |E2E latency=3.19s |Gather latency=0.00s (0.00%) |Generate time=2.35s (73.65%) |Training time=0.65s (20.26%) |Others=0.19 (6.08%)|CurSamplesPerSec=2.51 |AvgSamplesPerSec=2.42 epoch: 0|step: 742|ppo_ep: 1|act_loss: -0.11981201171875|cri_loss: -0.02081298828125|unsuper_loss: 0.0 average reward score: -0.302001953125 ------------------------------------------------------------------------------------- |E2E latency=3.19s |Gather latency=0.00s (0.00%) |Generate time=2.35s (73.71%) |Training time=0.64s (20.06%) |Others=0.20 (6.23%)|CurSamplesPerSec=2.51 |AvgSamplesPerSec=2.42 epoch: 0|step: 743|ppo_ep: 1|act_loss: 0.2281494140625|cri_loss: 0.232666015625|unsuper_loss: 0.0 average reward score: 1.3720703125 ------------------------------------------------------------------------------------- |E2E latency=3.63s |Gather latency=0.00s (0.00%) |Generate time=2.31s (63.61%) |Training time=1.03s (28.26%) |Others=0.30 (8.13%)|CurSamplesPerSec=2.20 |AvgSamplesPerSec=2.42 epoch: 0|step: 744|ppo_ep: 1|act_loss: 1.9150390625|cri_loss: 1.2197265625|unsuper_loss: 0.0 average reward score: 0.57177734375 
------------------------------------------------------------------------------------- |E2E latency=3.43s |Gather latency=0.00s (0.00%) |Generate time=2.32s (67.65%) |Training time=0.91s (26.65%) |Others=0.20 (5.70%)|CurSamplesPerSec=2.34 |AvgSamplesPerSec=2.42 epoch: 0|step: 745|ppo_ep: 1|act_loss: 1.330078125|cri_loss: 0.89306640625|unsuper_loss: 0.0 average reward score: 1.3671875 ------------------------------------------------------------------------------------- |E2E latency=3.36s |Gather latency=0.00s (0.00%) |Generate time=2.51s (74.78%) |Training time=0.64s (19.15%) |Others=0.20 (6.06%)|CurSamplesPerSec=2.38 |AvgSamplesPerSec=2.42 epoch: 0|step: 746|ppo_ep: 1|act_loss: 1.5859375|cri_loss: 1.0087890625|unsuper_loss: 0.0 average reward score: 1.369140625 ------------------------------------------------------------------------------------- |E2E latency=3.42s |Gather latency=0.00s (0.00%) |Generate time=2.57s (75.26%) |Training time=0.64s (18.82%) |Others=0.20 (5.92%)|CurSamplesPerSec=2.34 |AvgSamplesPerSec=2.42 epoch: 0|step: 747|ppo_ep: 1|act_loss: 0.939453125|cri_loss: 0.6171875|unsuper_loss: 0.0 average reward score: 0.6884765625 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.10%) |Training time=0.64s (19.88%) |Others=0.19 (6.02%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.42 epoch: 0|step: 748|ppo_ep: 1|act_loss: 1.4609375|cri_loss: 0.95166015625|unsuper_loss: 0.0 average reward score: 1.470703125 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.40s (73.39%) |Training time=0.66s (20.32%) |Others=0.21 (6.30%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42 epoch: 0|step: 749|ppo_ep: 1|act_loss: 1.0771484375|cri_loss: 0.712890625|unsuper_loss: 0.0 average reward score: 1.8515625 
------------------------------------------------------------------------------------- |E2E latency=3.44s |Gather latency=0.00s (0.00%) |Generate time=2.35s (68.49%) |Training time=0.87s (25.31%) |Others=0.21 (6.21%)|CurSamplesPerSec=2.33 |AvgSamplesPerSec=2.42 epoch: 0|step: 750|ppo_ep: 1|act_loss: 0.833984375|cri_loss: 0.537109375|unsuper_loss: 0.0 average reward score: 0.7578125 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.34s (72.51%) |Training time=0.68s (21.01%) |Others=0.21 (6.48%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.42 epoch: 0|step: 751|ppo_ep: 1|act_loss: 1.0927734375|cri_loss: 0.6650390625|unsuper_loss: 0.0 average reward score: -0.13134765625 ------------------------------------------------------------------------------------- |E2E latency=3.71s |Gather latency=0.00s (0.00%) |Generate time=2.49s (67.21%) |Training time=0.94s (25.32%) |Others=0.28 (7.47%)|CurSamplesPerSec=2.16 |AvgSamplesPerSec=2.42 epoch: 0|step: 752|ppo_ep: 1|act_loss: 1.8212890625|cri_loss: 1.162109375|unsuper_loss: 0.0 average reward score: -0.8525390625 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.15%) |Training time=0.64s (19.78%) |Others=0.20 (6.08%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42 epoch: 0|step: 753|ppo_ep: 1|act_loss: 0.7548828125|cri_loss: 0.463134765625|unsuper_loss: 0.0 average reward score: 0.375 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.47%) |Training time=0.64s (19.72%) |Others=0.19 (5.81%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42 epoch: 0|step: 754|ppo_ep: 1|act_loss: 0.853515625|cri_loss: 0.544921875|unsuper_loss: 0.0 average reward score: 0.57177734375 
------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.41s (73.86%) |Training time=0.66s (20.15%) |Others=0.20 (5.99%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42 epoch: 0|step: 755|ppo_ep: 1|act_loss: 0.8828125|cri_loss: 0.58154296875|unsuper_loss: 0.0 average reward score: 2.00390625 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.42s (73.84%) |Training time=0.66s (20.30%) |Others=0.19 (5.85%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42 epoch: 0|step: 756|ppo_ep: 1|act_loss: 0.794921875|cri_loss: 0.49560546875|unsuper_loss: 0.0 average reward score: 0.0166015625 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.24%) |Training time=0.64s (19.81%) |Others=0.19 (5.95%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42 epoch: 0|step: 757|ppo_ep: 1|act_loss: 0.5263671875|cri_loss: 0.34130859375|unsuper_loss: 0.0 average reward score: -0.04510498046875 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.22%) |Training time=0.64s (19.84%) |Others=0.19 (5.94%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.42 epoch: 0|step: 758|ppo_ep: 1|act_loss: 0.359375|cri_loss: 0.2822265625|unsuper_loss: 0.0 average reward score: 1.611328125 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.43s (73.05%) |Training time=0.70s (20.94%) |Others=0.20 (6.00%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.42 epoch: 0|step: 759|ppo_ep: 1|act_loss: 1.0439453125|cri_loss: 0.69384765625|unsuper_loss: 0.0 average reward score: 0.61767578125 
------------------------------------------------------------------------------------- |E2E latency=3.62s |Gather latency=0.00s (0.00%) |Generate time=2.41s (66.66%) |Training time=0.93s (25.68%) |Others=0.28 (7.66%)|CurSamplesPerSec=2.21 |AvgSamplesPerSec=2.42 epoch: 0|step: 760|ppo_ep: 1|act_loss: -0.00872802734375|cri_loss: 0.0562744140625|unsuper_loss: 0.0 average reward score: -0.97705078125 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.42s (73.93%) |Training time=0.65s (19.97%) |Others=0.20 (6.10%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42 epoch: 0|step: 761|ppo_ep: 1|act_loss: 0.35400390625|cri_loss: 0.248291015625|unsuper_loss: 0.0 average reward score: 0.0908203125 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.46%) |Training time=0.64s (19.62%) |Others=0.19 (5.92%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42 epoch: 0|step: 762|ppo_ep: 1|act_loss: -0.222412109375|cri_loss: 0.01416015625|unsuper_loss: 0.0 average reward score: 1.0283203125 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.48s (74.77%) |Training time=0.64s (19.31%) |Others=0.20 (5.92%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.42 epoch: 0|step: 763|ppo_ep: 1|act_loss: 0.0831298828125|cri_loss: 0.09027099609375|unsuper_loss: 0.0 average reward score: 0.08740234375 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.04%) |Training time=0.65s (19.87%) |Others=0.20 (6.09%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.42 epoch: 0|step: 764|ppo_ep: 1|act_loss: 0.248779296875|cri_loss: 0.205322265625|unsuper_loss: 0.0 average reward score: 0.5986328125 
------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.44%) |Training time=0.64s (19.55%) |Others=0.20 (6.01%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.42 epoch: 0|step: 765|ppo_ep: 1|act_loss: 0.1533203125|cri_loss: 0.177490234375|unsuper_loss: 0.0 average reward score: 0.50830078125 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.28%) |Training time=0.65s (19.80%) |Others=0.19 (5.91%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42 epoch: 0|step: 766|ppo_ep: 1|act_loss: 0.31884765625|cri_loss: 0.285400390625|unsuper_loss: 0.0 average reward score: 0.28857421875 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.28%) |Training time=0.64s (19.69%) |Others=0.20 (6.03%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42 epoch: 0|step: 767|ppo_ep: 1|act_loss: 0.058685302734375|cri_loss: 0.100830078125|unsuper_loss: 0.0 average reward score: 0.3203125 ------------------------------------------------------------------------------------- |E2E latency=3.63s |Gather latency=0.00s (0.00%) |Generate time=2.36s (64.88%) |Training time=1.00s (27.39%) |Others=0.28 (7.73%)|CurSamplesPerSec=2.20 |AvgSamplesPerSec=2.42 epoch: 0|step: 768|ppo_ep: 1|act_loss: -0.433349609375|cri_loss: -0.02978515625|unsuper_loss: 0.0 average reward score: 1.189453125 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.30%) |Training time=0.64s (19.44%) |Others=0.20 (6.26%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.42 epoch: 0|step: 769|ppo_ep: 1|act_loss: -0.736328125|cri_loss: -0.213134765625|unsuper_loss: 0.0 average reward score: 1.482421875 
------------------------------------------------------------------------------------- |E2E latency=3.33s |Gather latency=0.00s (0.00%) |Generate time=2.50s (74.89%) |Training time=0.64s (19.30%) |Others=0.19 (5.81%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.42 epoch: 0|step: 770|ppo_ep: 1|act_loss: -0.75732421875|cri_loss: -0.21630859375|unsuper_loss: 0.0 average reward score: 0.595703125 ------------------------------------------------------------------------------------- |E2E latency=3.33s |Gather latency=0.00s (0.00%) |Generate time=2.48s (74.59%) |Training time=0.64s (19.26%) |Others=0.20 (6.15%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.42 epoch: 0|step: 771|ppo_ep: 1|act_loss: -0.78564453125|cri_loss: -0.22119140625|unsuper_loss: 0.0 average reward score: 1.7294921875 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.04%) |Training time=0.65s (19.88%) |Others=0.20 (6.08%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42 epoch: 0|step: 772|ppo_ep: 1|act_loss: -0.3994140625|cri_loss: -0.1151123046875|unsuper_loss: 0.0 average reward score: 0.307861328125 ------------------------------------------------------------------------------------- |E2E latency=3.33s |Gather latency=0.00s (0.00%) |Generate time=2.46s (73.93%) |Training time=0.66s (19.72%) |Others=0.21 (6.35%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.42 epoch: 0|step: 773|ppo_ep: 1|act_loss: -0.0247802734375|cri_loss: 0.1021728515625|unsuper_loss: 0.0 average reward score: 0.85546875 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.43s (73.84%) |Training time=0.66s (20.10%) |Others=0.20 (6.06%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.42 epoch: 0|step: 774|ppo_ep: 1|act_loss: -0.3916015625|cri_loss: -0.05126953125|unsuper_loss: 0.0 average reward score: 1.865234375 
------------------------------------------------------------------------------------- |E2E latency=3.36s |Gather latency=0.00s (0.00%) |Generate time=2.49s (74.33%) |Training time=0.65s (19.46%) |Others=0.21 (6.21%)|CurSamplesPerSec=2.38 |AvgSamplesPerSec=2.42 epoch: 0|step: 775|ppo_ep: 1|act_loss: 0.05718994140625|cri_loss: 0.171630859375|unsuper_loss: 0.0 average reward score: -0.427734375 ------------------------------------------------------------------------------------- |E2E latency=3.62s |Gather latency=0.00s (0.00%) |Generate time=2.42s (66.79%) |Training time=0.92s (25.54%) |Others=0.28 (7.66%)|CurSamplesPerSec=2.21 |AvgSamplesPerSec=2.42 epoch: 0|step: 776|ppo_ep: 1|act_loss: 0.0748291015625|cri_loss: 0.1285400390625|unsuper_loss: 0.0 average reward score: 1.51171875 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.37s (74.09%) |Training time=0.64s (19.89%) |Others=0.19 (6.02%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.42 epoch: 0|step: 777|ppo_ep: 1|act_loss: 0.71875|cri_loss: 0.4501953125|unsuper_loss: 0.0 average reward score: 0.68896484375 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.27%) |Training time=0.64s (19.66%) |Others=0.20 (6.07%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42 epoch: 0|step: 778|ppo_ep: 1|act_loss: 0.01318359375|cri_loss: 0.062744140625|unsuper_loss: 0.0 average reward score: 0.859375 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.14%) |Training time=0.64s (19.73%) |Others=0.20 (6.13%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42 epoch: 0|step: 779|ppo_ep: 1|act_loss: -0.1585693359375|cri_loss: 0.013916015625|unsuper_loss: 0.0 average reward score: 1.3427734375 
------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.17%) |Training time=0.64s (19.79%) |Others=0.19 (6.03%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.42 epoch: 0|step: 780|ppo_ep: 1|act_loss: -0.1455078125|cri_loss: 0.0618896484375|unsuper_loss: 0.0 average reward score: 0.7294921875 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.39s (73.89%) |Training time=0.64s (19.94%) |Others=0.20 (6.17%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.42 epoch: 0|step: 781|ppo_ep: 1|act_loss: -0.33935546875|cri_loss: -0.0977783203125|unsuper_loss: 0.0 average reward score: 0.4130859375 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.05%) |Training time=0.64s (19.89%) |Others=0.20 (6.06%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.42 epoch: 0|step: 782|ppo_ep: 1|act_loss: -0.30712890625|cri_loss: -0.0323486328125|unsuper_loss: 0.0 average reward score: 1.337890625 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.39s (73.94%) |Training time=0.64s (19.92%) |Others=0.20 (6.14%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.42 epoch: 0|step: 783|ppo_ep: 1|act_loss: -0.2646484375|cri_loss: -0.04150390625|unsuper_loss: 0.0 average reward score: 0.640625 ------------------------------------------------------------------------------------- |E2E latency=3.70s |Gather latency=0.00s (0.00%) |Generate time=2.42s (65.32%) |Training time=0.95s (25.55%) |Others=0.34 (9.13%)|CurSamplesPerSec=2.16 |AvgSamplesPerSec=2.42 epoch: 0|step: 784|ppo_ep: 1|act_loss: 1.2255859375|cri_loss: 0.7626953125|unsuper_loss: 0.0 average reward score: -0.71240234375 
------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.41s (73.97%) |Training time=0.65s (19.89%) |Others=0.20 (6.14%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42 epoch: 0|step: 785|ppo_ep: 1|act_loss: 1.119140625|cri_loss: 0.7578125|unsuper_loss: 0.0 average reward score: 1.2255859375 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.21%) |Training time=0.64s (19.72%) |Others=0.20 (6.07%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42 epoch: 0|step: 786|ppo_ep: 1|act_loss: 1.0361328125|cri_loss: 0.6845703125|unsuper_loss: 0.0 average reward score: 1.65625 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.40s (73.12%) |Training time=0.64s (19.48%) |Others=0.24 (7.40%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.42 epoch: 0|step: 787|ppo_ep: 1|act_loss: 1.373046875|cri_loss: 0.8935546875|unsuper_loss: 0.0 average reward score: 1.3515625 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.14%) |Training time=0.64s (19.81%) |Others=0.20 (6.06%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.42 epoch: 0|step: 788|ppo_ep: 1|act_loss: 1.185546875|cri_loss: 0.77392578125|unsuper_loss: 0.0 average reward score: 1.0615234375 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.21%) |Training time=0.64s (19.69%) |Others=0.20 (6.10%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42 epoch: 0|step: 789|ppo_ep: 1|act_loss: 1.533203125|cri_loss: 1.00390625|unsuper_loss: 0.0 average reward score: 0.2225341796875 
------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.23%) |Training time=0.64s (19.86%) |Others=0.19 (5.91%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42 epoch: 0|step: 790|ppo_ep: 1|act_loss: 1.1982421875|cri_loss: 0.791015625|unsuper_loss: 0.0 average reward score: -0.323974609375 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.30%) |Training time=0.64s (19.70%) |Others=0.19 (6.00%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42 epoch: 0|step: 791|ppo_ep: 1|act_loss: 0.8955078125|cri_loss: 0.60546875|unsuper_loss: 0.0 average reward score: 1.08984375 ------------------------------------------------------------------------------------- |E2E latency=3.60s |Gather latency=0.00s (0.00%) |Generate time=2.39s (66.50%) |Training time=0.93s (25.89%) |Others=0.27 (7.61%)|CurSamplesPerSec=2.22 |AvgSamplesPerSec=2.42 epoch: 0|step: 792|ppo_ep: 1|act_loss: 1.666015625|cri_loss: 1.099609375|unsuper_loss: 0.0 average reward score: 0.033447265625 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.44%) |Training time=0.63s (19.64%) |Others=0.19 (5.91%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.42 epoch: 0|step: 793|ppo_ep: 1|act_loss: 1.0224609375|cri_loss: 0.68359375|unsuper_loss: 0.0 average reward score: -0.15380859375 ------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.47s (74.84%) |Training time=0.64s (19.34%) |Others=0.19 (5.82%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.42 epoch: 0|step: 794|ppo_ep: 1|act_loss: 1.52734375|cri_loss: 1.05078125|unsuper_loss: 0.0 average reward score: -0.28857421875 
------------------------------------------------------------------------------------- |E2E latency=3.41s |Gather latency=0.00s (0.00%) |Generate time=2.55s (74.95%) |Training time=0.64s (18.92%) |Others=0.21 (6.13%)|CurSamplesPerSec=2.35 |AvgSamplesPerSec=2.42 epoch: 0|step: 795|ppo_ep: 1|act_loss: 1.3349609375|cri_loss: 0.87109375|unsuper_loss: 0.0 average reward score: 0.155029296875 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.48%) |Training time=0.64s (19.59%) |Others=0.19 (5.92%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42 epoch: 0|step: 796|ppo_ep: 1|act_loss: 1.04296875|cri_loss: 0.6591796875|unsuper_loss: 0.0 average reward score: 0.52490234375 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.31%) |Training time=0.64s (19.74%) |Others=0.19 (5.95%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42 epoch: 0|step: 797|ppo_ep: 1|act_loss: 1.0654296875|cri_loss: 0.7001953125|unsuper_loss: 0.0 average reward score: 1.21875 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.25%) |Training time=0.64s (19.88%) |Others=0.19 (5.86%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42 epoch: 0|step: 798|ppo_ep: 1|act_loss: 1.201171875|cri_loss: 0.7880859375|unsuper_loss: 0.0 average reward score: 0.2276611328125 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.20%) |Training time=0.64s (19.79%) |Others=0.19 (6.01%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42 [2023-04-24 14:31:47,742] [INFO] [logging.py:96:log_dist] [Rank 0] step=100, skipped=5, lr=[9.1675e-06, 9.1675e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-04-24 14:31:47,988] [INFO] 
[timer.py:199:stop] epoch=0/micro_step=800/global_step=100, RunningAvgSamplesPerSec=15.46629525087213, CurrSamplesPerSec=15.906962957730697, MemAllocated=20.44GB, MaxMemAllocated=31.45GB [2023-04-24 14:31:48,189] [INFO] [logging.py:96:log_dist] [Rank 0] step=100, skipped=4, lr=[4.800000000000001e-06, 4.800000000000001e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 799|ppo_ep: 1|act_loss: 0.93359375|cri_loss: 0.6533203125|unsuper_loss: 0.0 average reward score: 0.323974609375 ------------------------------------------------------------------------------------- |E2E latency=3.61s |Gather latency=0.00s (0.00%) |Generate time=2.41s (66.71%) |Training time=0.92s (25.61%) |Others=0.28 (7.68%)|CurSamplesPerSec=2.22 |AvgSamplesPerSec=2.42 epoch: 0|step: 800|ppo_ep: 1|act_loss: 0.59716796875|cri_loss: 0.409423828125|unsuper_loss: 0.0 average reward score: 0.23046875 ------------------------------------------------------------------------------------- |E2E latency=3.17s |Gather latency=0.00s (0.00%) |Generate time=2.35s (73.97%) |Training time=0.63s (20.01%) |Others=0.19 (6.02%)|CurSamplesPerSec=2.52 |AvgSamplesPerSec=2.42 epoch: 0|step: 801|ppo_ep: 1|act_loss: 1.087890625|cri_loss: 0.68359375|unsuper_loss: 0.0 average reward score: 0.5380859375 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.38s (73.90%) |Training time=0.64s (20.00%) |Others=0.20 (6.09%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.42 epoch: 0|step: 802|ppo_ep: 1|act_loss: 0.45751953125|cri_loss: 0.34326171875|unsuper_loss: 0.0 average reward score: 1.611328125 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.39s (73.05%) |Training time=0.67s (20.62%) |Others=0.21 (6.33%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42 epoch: 0|step: 803|ppo_ep: 1|act_loss: 0.75|cri_loss: 0.51953125|unsuper_loss: 0.0 
average reward score: 1.9326171875 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.15%) |Training time=0.65s (19.89%) |Others=0.19 (5.96%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42 epoch: 0|step: 804|ppo_ep: 1|act_loss: 0.9267578125|cri_loss: 0.57421875|unsuper_loss: 0.0 average reward score: 0.9052734375 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.28%) |Training time=0.64s (19.70%) |Others=0.20 (6.02%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42 epoch: 0|step: 805|ppo_ep: 1|act_loss: 1.2353515625|cri_loss: 0.8271484375|unsuper_loss: 0.0 average reward score: 1.1552734375 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.41s (73.83%) |Training time=0.64s (19.72%) |Others=0.21 (6.46%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42 epoch: 0|step: 806|ppo_ep: 1|act_loss: 0.6025390625|cri_loss: 0.419189453125|unsuper_loss: 0.0 average reward score: -0.6083984375 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.38s (73.95%) |Training time=0.64s (19.99%) |Others=0.20 (6.06%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.42 epoch: 0|step: 807|ppo_ep: 1|act_loss: 0.9814453125|cri_loss: 0.61376953125|unsuper_loss: 0.0 average reward score: 0.7431640625 ------------------------------------------------------------------------------------- |E2E latency=3.61s |Gather latency=0.00s (0.00%) |Generate time=2.39s (66.33%) |Training time=0.94s (25.99%) |Others=0.28 (7.68%)|CurSamplesPerSec=2.22 |AvgSamplesPerSec=2.42 epoch: 0|step: 808|ppo_ep: 1|act_loss: -0.118896484375|cri_loss: 0.0562744140625|unsuper_loss: 0.0 average reward score: -0.2578125 
------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.37s (74.13%) |Training time=0.64s (19.83%) |Others=0.19 (6.04%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.42 epoch: 0|step: 809|ppo_ep: 1|act_loss: -0.137939453125|cri_loss: -0.0040283203125|unsuper_loss: 0.0 average reward score: -0.260009765625 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.21%) |Training time=0.64s (19.67%) |Others=0.20 (6.12%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42 epoch: 0|step: 810|ppo_ep: 1|act_loss: -0.06689453125|cri_loss: 0.05169677734375|unsuper_loss: 0.0 average reward score: 1.890625 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.24%) |Training time=0.64s (19.77%) |Others=0.19 (5.99%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42 epoch: 0|step: 811|ppo_ep: 1|act_loss: -0.33251953125|cri_loss: -0.0858154296875|unsuper_loss: 0.0 average reward score: 1.705078125 ------------------------------------------------------------------------------------- |E2E latency=3.17s |Gather latency=0.00s (0.00%) |Generate time=2.34s (73.80%) |Training time=0.64s (20.12%) |Others=0.19 (6.08%)|CurSamplesPerSec=2.52 |AvgSamplesPerSec=2.42 epoch: 0|step: 812|ppo_ep: 1|act_loss: -0.098876953125|cri_loss: 0.0072021484375|unsuper_loss: 0.0 average reward score: 0.77734375 ------------------------------------------------------------------------------------- |E2E latency=3.18s |Gather latency=0.00s (0.00%) |Generate time=2.33s (73.38%) |Training time=0.64s (20.28%) |Others=0.20 (6.34%)|CurSamplesPerSec=2.52 |AvgSamplesPerSec=2.42 epoch: 0|step: 813|ppo_ep: 1|act_loss: -0.1802978515625|cri_loss: -0.0025634765625|unsuper_loss: 0.0 average reward score: 1.0029296875 
------------------------------------------------------------------------------------- |E2E latency=3.37s |Gather latency=0.00s (0.00%) |Generate time=2.49s (73.96%) |Training time=0.67s (20.01%) |Others=0.20 (6.03%)|CurSamplesPerSec=2.38 |AvgSamplesPerSec=2.42 epoch: 0|step: 814|ppo_ep: 1|act_loss: -0.3603515625|cri_loss: -0.08056640625|unsuper_loss: 0.0 average reward score: -0.26953125 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.47s (74.66%) |Training time=0.64s (19.30%) |Others=0.20 (6.04%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.42 epoch: 0|step: 815|ppo_ep: 1|act_loss: -0.271728515625|cri_loss: -0.0107421875|unsuper_loss: 0.0 average reward score: 0.478759765625 ------------------------------------------------------------------------------------- |E2E latency=3.79s |Gather latency=0.00s (0.00%) |Generate time=2.58s (68.20%) |Training time=0.92s (24.37%) |Others=0.28 (7.43%)|CurSamplesPerSec=2.11 |AvgSamplesPerSec=2.42 epoch: 0|step: 816|ppo_ep: 1|act_loss: -0.2188720703125|cri_loss: 0.0074462890625|unsuper_loss: 0.0 average reward score: 0.034912109375 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.61%) |Training time=0.63s (19.53%) |Others=0.19 (5.86%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42 epoch: 0|step: 817|ppo_ep: 1|act_loss: -0.25830078125|cri_loss: -0.049560546875|unsuper_loss: 0.0 average reward score: 0.962890625 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.42%) |Training time=0.64s (19.72%) |Others=0.19 (5.87%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42 epoch: 0|step: 818|ppo_ep: 1|act_loss: -0.177001953125|cri_loss: -0.025390625|unsuper_loss: 0.0 average reward score: 1.5341796875 
------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.33%) |Training time=0.64s (19.65%) |Others=0.20 (6.01%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42 epoch: 0|step: 819|ppo_ep: 1|act_loss: -0.427734375|cri_loss: -0.1351318359375|unsuper_loss: 0.0 average reward score: 0.0516357421875 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.71%) |Training time=0.64s (19.40%) |Others=0.19 (5.89%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.42 epoch: 0|step: 820|ppo_ep: 1|act_loss: -0.30224609375|cri_loss: 0.057373046875|unsuper_loss: 0.0 average reward score: 1.1767578125 ------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.66%) |Training time=0.64s (19.41%) |Others=0.20 (5.93%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.42 epoch: 0|step: 821|ppo_ep: 1|act_loss: -0.2705078125|cri_loss: -0.0301513671875|unsuper_loss: 0.0 average reward score: 0.56787109375 ------------------------------------------------------------------------------------- |E2E latency=3.39s |Gather latency=0.00s (0.00%) |Generate time=2.56s (75.37%) |Training time=0.64s (18.88%) |Others=0.19 (5.74%)|CurSamplesPerSec=2.36 |AvgSamplesPerSec=2.42 epoch: 0|step: 822|ppo_ep: 1|act_loss: -0.384765625|cri_loss: -0.018798828125|unsuper_loss: 0.0 average reward score: 0.7138671875 ------------------------------------------------------------------------------------- |E2E latency=3.35s |Gather latency=0.00s (0.00%) |Generate time=2.51s (74.87%) |Training time=0.64s (19.14%) |Others=0.20 (5.99%)|CurSamplesPerSec=2.39 |AvgSamplesPerSec=2.42 epoch: 0|step: 823|ppo_ep: 1|act_loss: -0.434814453125|cri_loss: -0.14306640625|unsuper_loss: 0.0 average reward score: 0.292724609375 
------------------------------------------------------------------------------------- |E2E latency=3.73s |Gather latency=0.00s (0.00%) |Generate time=2.50s (67.00%) |Training time=0.95s (25.43%) |Others=0.28 (7.57%)|CurSamplesPerSec=2.15 |AvgSamplesPerSec=2.42 epoch: 0|step: 824|ppo_ep: 1|act_loss: -0.012939453125|cri_loss: 0.0699462890625|unsuper_loss: 0.0 average reward score: -0.107421875 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.29%) |Training time=0.64s (19.74%) |Others=0.19 (5.98%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42 epoch: 0|step: 825|ppo_ep: 1|act_loss: 0.026458740234375|cri_loss: 0.0792236328125|unsuper_loss: 0.0 average reward score: 0.047119140625 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.48s (74.84%) |Training time=0.65s (19.45%) |Others=0.19 (5.71%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.42 epoch: 0|step: 826|ppo_ep: 1|act_loss: 0.17333984375|cri_loss: 0.2265625|unsuper_loss: 0.0 average reward score: 0.8564453125 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.42s (73.88%) |Training time=0.66s (20.15%) |Others=0.20 (5.97%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.42 epoch: 0|step: 827|ppo_ep: 1|act_loss: -0.266845703125|cri_loss: -0.0325927734375|unsuper_loss: 0.0 average reward score: 0.6748046875 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.38%) |Training time=0.65s (19.68%) |Others=0.19 (5.94%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.42 epoch: 0|step: 828|ppo_ep: 1|act_loss: -0.1402587890625|cri_loss: 0.00634765625|unsuper_loss: 0.0 average reward score: 0.84814453125 
------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.41%) |Training time=0.64s (19.59%) |Others=0.20 (6.00%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.42 epoch: 0|step: 829|ppo_ep: 1|act_loss: 0.4873046875|cri_loss: 0.4521484375|unsuper_loss: 0.0 average reward score: 2.47265625 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.43s (73.28%) |Training time=0.69s (20.75%) |Others=0.20 (5.97%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.42 epoch: 0|step: 830|ppo_ep: 1|act_loss: 0.06732177734375|cri_loss: 0.181884765625|unsuper_loss: 0.0 average reward score: 0.82861328125 ------------------------------------------------------------------------------------- |E2E latency=3.57s |Gather latency=0.00s (0.00%) |Generate time=2.41s (67.38%) |Training time=0.64s (17.94%) |Others=0.52 (14.67%)|CurSamplesPerSec=2.24 |AvgSamplesPerSec=2.42 epoch: 0|step: 831|ppo_ep: 1|act_loss: -0.149169921875|cri_loss: -0.012939453125|unsuper_loss: 0.0 average reward score: -0.7578125 ------------------------------------------------------------------------------------- |E2E latency=3.66s |Gather latency=0.00s (0.00%) |Generate time=2.39s (65.31%) |Training time=0.99s (26.94%) |Others=0.28 (7.75%)|CurSamplesPerSec=2.18 |AvgSamplesPerSec=2.42 epoch: 0|step: 832|ppo_ep: 1|act_loss: -0.275390625|cri_loss: -0.0845947265625|unsuper_loss: 0.0 average reward score: 0.68359375 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.15%) |Training time=0.63s (19.65%) |Others=0.20 (6.20%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.42 epoch: 0|step: 833|ppo_ep: 1|act_loss: 0.11676025390625|cri_loss: 0.216552734375|unsuper_loss: 0.0 average reward score: 1.046875 
------------------------------------------------------------------------------------- |E2E latency=3.40s |Gather latency=0.00s (0.00%) |Generate time=2.40s (70.64%) |Training time=0.80s (23.67%) |Others=0.19 (5.69%)|CurSamplesPerSec=2.36 |AvgSamplesPerSec=2.42 epoch: 0|step: 834|ppo_ep: 1|act_loss: -0.278564453125|cri_loss: -0.068603515625|unsuper_loss: 0.0 average reward score: 0.2939453125 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.07%) |Training time=0.64s (19.82%) |Others=0.20 (6.11%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.42 epoch: 0|step: 835|ppo_ep: 1|act_loss: -0.0078125|cri_loss: 0.06011962890625|unsuper_loss: 0.0 average reward score: 0.615234375 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.38s (74.14%) |Training time=0.64s (19.80%) |Others=0.19 (6.06%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.42 epoch: 0|step: 836|ppo_ep: 1|act_loss: -0.2164306640625|cri_loss: -0.0230712890625|unsuper_loss: 0.0 average reward score: 1.298828125 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.48%) |Training time=0.64s (19.46%) |Others=0.20 (6.05%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.42 epoch: 0|step: 837|ppo_ep: 1|act_loss: 0.0810546875|cri_loss: 0.1435546875|unsuper_loss: 0.0 average reward score: 0.47802734375 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (73.99%) |Training time=0.65s (20.00%) |Others=0.19 (6.01%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42 epoch: 0|step: 838|ppo_ep: 1|act_loss: -0.240966796875|cri_loss: 0.00146484375|unsuper_loss: 0.0 average reward score: -0.55615234375 
------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.40s (73.46%) |Training time=0.67s (20.55%) |Others=0.20 (6.00%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.42 epoch: 0|step: 839|ppo_ep: 1|act_loss: -0.351318359375|cri_loss: -0.062744140625|unsuper_loss: 0.0 average reward score: 0.642578125 ------------------------------------------------------------------------------------- |E2E latency=3.63s |Gather latency=0.00s (0.00%) |Generate time=2.40s (66.12%) |Training time=0.93s (25.62%) |Others=0.30 (8.25%)|CurSamplesPerSec=2.20 |AvgSamplesPerSec=2.42 epoch: 0|step: 840|ppo_ep: 1|act_loss: -0.445556640625|cri_loss: -0.110595703125|unsuper_loss: 0.0 average reward score: 0.8701171875 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.39s (73.51%) |Training time=0.65s (19.84%) |Others=0.22 (6.65%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42 epoch: 0|step: 841|ppo_ep: 1|act_loss: -0.5908203125|cri_loss: -0.17431640625|unsuper_loss: 0.0 average reward score: 1.130859375 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.65%) |Training time=0.64s (19.46%) |Others=0.19 (5.89%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.42 epoch: 0|step: 842|ppo_ep: 1|act_loss: -0.38232421875|cri_loss: -0.1099853515625|unsuper_loss: 0.0 average reward score: 0.79150390625 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.38s (73.49%) |Training time=0.66s (20.49%) |Others=0.19 (6.02%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42 epoch: 0|step: 843|ppo_ep: 1|act_loss: -0.490234375|cri_loss: -0.1741943359375|unsuper_loss: 0.0 average reward score: 1.505859375 
------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.13%) |Training time=0.64s (19.88%) |Others=0.19 (5.99%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42 epoch: 0|step: 844|ppo_ep: 1|act_loss: -0.333984375|cri_loss: -0.1197509765625|unsuper_loss: 0.0 average reward score: 0.3984375 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.32s (71.94%) |Training time=0.71s (22.01%) |Others=0.20 (6.05%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.42 epoch: 0|step: 845|ppo_ep: 1|act_loss: -0.5048828125|cri_loss: -0.166748046875|unsuper_loss: 0.0 average reward score: 1.3369140625 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.16%) |Training time=0.64s (19.81%) |Others=0.20 (6.03%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42 epoch: 0|step: 846|ppo_ep: 1|act_loss: -0.39794921875|cri_loss: -0.06689453125|unsuper_loss: 0.0 average reward score: 0.76171875 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.31%) |Training time=0.64s (19.69%) |Others=0.19 (6.00%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42 epoch: 0|step: 847|ppo_ep: 1|act_loss: -0.32568359375|cri_loss: -0.0875244140625|unsuper_loss: 0.0 average reward score: -0.1298828125 ------------------------------------------------------------------------------------- |E2E latency=3.62s |Gather latency=0.00s (0.00%) |Generate time=2.42s (66.75%) |Training time=0.93s (25.59%) |Others=0.28 (7.67%)|CurSamplesPerSec=2.21 |AvgSamplesPerSec=2.42 epoch: 0|step: 848|ppo_ep: 1|act_loss: 0.238037109375|cri_loss: 0.189453125|unsuper_loss: 0.0 average reward score: 0.2186279296875 
------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.42%) |Training time=0.64s (19.53%) |Others=0.20 (6.04%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42 epoch: 0|step: 849|ppo_ep: 1|act_loss: 0.427978515625|cri_loss: 0.34716796875|unsuper_loss: 0.0 average reward score: -0.11328125 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.41%) |Training time=0.64s (19.61%) |Others=0.20 (5.98%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.42 epoch: 0|step: 850|ppo_ep: 1|act_loss: 0.13916015625|cri_loss: 0.14892578125|unsuper_loss: 0.0 average reward score: 0.95751953125 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.26%) |Training time=0.64s (19.84%) |Others=0.19 (5.90%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42 epoch: 0|step: 851|ppo_ep: 1|act_loss: 0.2357177734375|cri_loss: 0.164306640625|unsuper_loss: 0.0 average reward score: 0.81201171875 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.46%) |Training time=0.64s (19.70%) |Others=0.19 (5.84%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.42 epoch: 0|step: 852|ppo_ep: 1|act_loss: -0.158935546875|cri_loss: 0.024169921875|unsuper_loss: 0.0 average reward score: -0.79248046875 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.19%) |Training time=0.64s (19.74%) |Others=0.20 (6.07%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42 epoch: 0|step: 853|ppo_ep: 1|act_loss: 0.2357177734375|cri_loss: 0.2122802734375|unsuper_loss: 0.0 average reward score: 0.8095703125 
------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.72%) |Training time=0.64s (19.41%) |Others=0.19 (5.87%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.42 epoch: 0|step: 854|ppo_ep: 1|act_loss: -0.10015869140625|cri_loss: 0.00274658203125|unsuper_loss: 0.0 average reward score: -0.07623291015625 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.37s (73.98%) |Training time=0.64s (19.99%) |Others=0.19 (6.03%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.42 epoch: 0|step: 855|ppo_ep: 1|act_loss: -0.19482421875|cri_loss: -0.0595703125|unsuper_loss: 0.0 average reward score: 0.365478515625 ------------------------------------------------------------------------------------- |E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.30s (65.62%) |Training time=0.93s (26.54%) |Others=0.27 (7.84%)|CurSamplesPerSec=2.28 |AvgSamplesPerSec=2.42 epoch: 0|step: 856|ppo_ep: 1|act_loss: 0.7861328125|cri_loss: 0.489501953125|unsuper_loss: 0.0 average reward score: 0.351806640625 ------------------------------------------------------------------------------------- |E2E latency=3.10s |Gather latency=0.00s (0.00%) |Generate time=2.27s (73.29%) |Training time=0.64s (20.58%) |Others=0.19 (6.13%)|CurSamplesPerSec=2.58 |AvgSamplesPerSec=2.42 epoch: 0|step: 857|ppo_ep: 1|act_loss: 0.8544921875|cri_loss: 0.52880859375|unsuper_loss: 0.0 average reward score: 0.469482421875 ------------------------------------------------------------------------------------- |E2E latency=3.12s |Gather latency=0.00s (0.00%) |Generate time=2.27s (72.89%) |Training time=0.65s (20.92%) |Others=0.19 (6.19%)|CurSamplesPerSec=2.57 |AvgSamplesPerSec=2.42 epoch: 0|step: 858|ppo_ep: 1|act_loss: 0.69140625|cri_loss: 0.418212890625|unsuper_loss: 0.0 average reward score: 0.04345703125 
------------------------------------------------------------------------------------- |E2E latency=3.19s |Gather latency=0.00s (0.00%) |Generate time=2.35s (73.72%) |Training time=0.64s (20.03%) |Others=0.20 (6.25%)|CurSamplesPerSec=2.51 |AvgSamplesPerSec=2.42 epoch: 0|step: 859|ppo_ep: 1|act_loss: 0.91015625|cri_loss: 0.5693359375|unsuper_loss: 0.0 average reward score: 0.1912841796875 ------------------------------------------------------------------------------------- |E2E latency=3.19s |Gather latency=0.00s (0.00%) |Generate time=2.34s (73.43%) |Training time=0.65s (20.27%) |Others=0.20 (6.30%)|CurSamplesPerSec=2.51 |AvgSamplesPerSec=2.42 epoch: 0|step: 860|ppo_ep: 1|act_loss: 0.42724609375|cri_loss: 0.27392578125|unsuper_loss: 0.0 average reward score: 0.419921875 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.65%) |Training time=0.64s (19.42%) |Others=0.20 (5.94%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.42 epoch: 0|step: 861|ppo_ep: 1|act_loss: 0.6240234375|cri_loss: 0.390625|unsuper_loss: 0.0 average reward score: -0.1484375 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.25%) |Training time=0.64s (19.77%) |Others=0.19 (5.98%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42 epoch: 0|step: 862|ppo_ep: 1|act_loss: 0.4267578125|cri_loss: 0.261474609375|unsuper_loss: 0.0 average reward score: -0.72998046875 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.12%) |Training time=0.64s (19.89%) |Others=0.19 (6.00%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42 epoch: 0|step: 863|ppo_ep: 1|act_loss: 0.490478515625|cri_loss: 0.29833984375|unsuper_loss: 0.0 average reward score: 0.59375 
------------------------------------------------------------------------------------- |E2E latency=3.59s |Gather latency=0.00s (0.00%) |Generate time=2.39s (66.64%) |Training time=0.92s (25.64%) |Others=0.28 (7.73%)|CurSamplesPerSec=2.23 |AvgSamplesPerSec=2.42 epoch: 0|step: 864|ppo_ep: 1|act_loss: 0.7197265625|cri_loss: 0.45263671875|unsuper_loss: 0.0 average reward score: 1.287109375 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.37%) |Training time=0.64s (19.72%) |Others=0.19 (5.91%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.42 epoch: 0|step: 865|ppo_ep: 1|act_loss: 0.513671875|cri_loss: 0.29736328125|unsuper_loss: 0.0 average reward score: -0.72119140625 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.41s (73.42%) |Training time=0.68s (20.72%) |Others=0.19 (5.86%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.42 epoch: 0|step: 866|ppo_ep: 1|act_loss: 0.74365234375|cri_loss: 0.44091796875|unsuper_loss: 0.0 average reward score: -0.53857421875 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.20%) |Training time=0.64s (19.83%) |Others=0.19 (5.97%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.42 epoch: 0|step: 867|ppo_ep: 1|act_loss: 0.826171875|cri_loss: 0.513671875|unsuper_loss: 0.0 average reward score: 1.78515625 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.41s (73.68%) |Training time=0.64s (19.52%) |Others=0.22 (6.80%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42 epoch: 0|step: 868|ppo_ep: 1|act_loss: 0.46875|cri_loss: 0.27978515625|unsuper_loss: 0.0 average reward score: -0.72900390625 
------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.40s (73.30%) |Training time=0.67s (20.49%) |Others=0.20 (6.21%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.42 epoch: 0|step: 869|ppo_ep: 1|act_loss: 0.5703125|cri_loss: 0.345703125|unsuper_loss: 0.0 average reward score: -0.1494140625 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.16%) |Training time=0.64s (19.75%) |Others=0.20 (6.09%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.42 epoch: 0|step: 870|ppo_ep: 1|act_loss: 0.64697265625|cri_loss: 0.407958984375|unsuper_loss: 0.0 average reward score: 1.0693359375 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.39s (72.35%) |Training time=0.72s (21.73%) |Others=0.20 (5.92%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.42 epoch: 0|step: 871|ppo_ep: 1|act_loss: 0.9501953125|cri_loss: 0.60791015625|unsuper_loss: 0.0 average reward score: 1.0087890625 ------------------------------------------------------------------------------------- |E2E latency=3.60s |Gather latency=0.00s (0.00%) |Generate time=2.39s (66.24%) |Training time=0.93s (25.87%) |Others=0.28 (7.90%)|CurSamplesPerSec=2.22 |AvgSamplesPerSec=2.42 epoch: 0|step: 872|ppo_ep: 1|act_loss: 0.446533203125|cri_loss: 0.2744140625|unsuper_loss: 0.0 average reward score: -0.1107177734375 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.29%) |Training time=0.64s (19.86%) |Others=0.19 (5.85%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.42 epoch: 0|step: 873|ppo_ep: 1|act_loss: 0.7900390625|cri_loss: 0.501953125|unsuper_loss: 0.0 average reward score: 0.8583984375 
------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.34s (70.88%) |Training time=0.77s (23.26%) |Others=0.19 (5.86%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.42 epoch: 0|step: 874|ppo_ep: 1|act_loss: 0.63037109375|cri_loss: 0.40576171875|unsuper_loss: 0.0 average reward score: 1.013671875 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.00%) |Training time=0.64s (19.68%) |Others=0.21 (6.32%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.42 epoch: 0|step: 875|ppo_ep: 1|act_loss: 0.71923828125|cri_loss: 0.4599609375|unsuper_loss: 0.0 average reward score: 0.04638671875 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.71%) |Training time=0.64s (19.44%) |Others=0.19 (5.85%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.42 epoch: 0|step: 876|ppo_ep: 1|act_loss: 0.70556640625|cri_loss: 0.43359375|unsuper_loss: 0.0 average reward score: 0.368896484375 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.09%) |Training time=0.65s (19.73%) |Others=0.20 (6.18%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.42 epoch: 0|step: 877|ppo_ep: 1|act_loss: 0.666015625|cri_loss: 0.41357421875|unsuper_loss: 0.0 average reward score: 0.73974609375 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.17%) |Training time=0.65s (19.83%) |Others=0.20 (6.00%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.42 epoch: 0|step: 878|ppo_ep: 1|act_loss: 0.79296875|cri_loss: 0.48291015625|unsuper_loss: 0.0 average reward score: -0.196533203125 
------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.41s (73.46%) |Training time=0.65s (19.93%) |Others=0.22 (6.61%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.42 [2023-04-24 14:36:12,336] [INFO] [logging.py:96:log_dist] [Rank 0] step=110, skipped=5, lr=[9.645790134492441e-06, 9.645790134492441e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-04-24 14:36:12,581] [INFO] [timer.py:199:stop] epoch=0/micro_step=880/global_step=110, RunningAvgSamplesPerSec=15.472432310673302, CurrSamplesPerSec=15.321293779708554, MemAllocated=20.44GB, MaxMemAllocated=31.45GB [2023-04-24 14:36:12,788] [INFO] [logging.py:96:log_dist] [Rank 0] step=110, skipped=4, lr=[4.996859161456965e-06, 4.996859161456965e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 879|ppo_ep: 1|act_loss: 0.92822265625|cri_loss: 0.5517578125|unsuper_loss: 0.0 average reward score: 0.034912109375 ------------------------------------------------------------------------------------- |E2E latency=3.66s |Gather latency=0.00s (0.00%) |Generate time=2.45s (66.96%) |Training time=0.93s (25.31%) |Others=0.28 (7.73%)|CurSamplesPerSec=2.18 |AvgSamplesPerSec=2.42 epoch: 0|step: 880|ppo_ep: 1|act_loss: 0.0042724609375|cri_loss: 0.0496826171875|unsuper_loss: 0.0 average reward score: 1.525390625 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.42%) |Training time=0.64s (19.58%) |Others=0.20 (6.00%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42 epoch: 0|step: 881|ppo_ep: 1|act_loss: 0.17626953125|cri_loss: 0.1602783203125|unsuper_loss: 0.0 average reward score: 1.1201171875 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.45s (73.99%) |Training time=0.66s (19.88%) |Others=0.20 (6.13%)|CurSamplesPerSec=2.42 
|AvgSamplesPerSec=2.42 epoch: 0|step: 882|ppo_ep: 1|act_loss: -0.2406005859375|cri_loss: -0.0408935546875|unsuper_loss: 0.0 average reward score: 0.916015625 ------------------------------------------------------------------------------------- |E2E latency=3.40s |Gather latency=0.00s (0.00%) |Generate time=2.54s (74.68%) |Training time=0.66s (19.35%) |Others=0.20 (5.97%)|CurSamplesPerSec=2.35 |AvgSamplesPerSec=2.42 epoch: 0|step: 883|ppo_ep: 1|act_loss: 0.18505859375|cri_loss: 0.157958984375|unsuper_loss: 0.0 average reward score: 0.765625 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.50%) |Training time=0.64s (19.59%) |Others=0.19 (5.91%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.42 epoch: 0|step: 884|ppo_ep: 1|act_loss: 0.1258544921875|cri_loss: 0.11767578125|unsuper_loss: 0.0 average reward score: 0.50390625 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.38s (73.83%) |Training time=0.64s (19.93%) |Others=0.20 (6.24%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.42 epoch: 0|step: 885|ppo_ep: 1|act_loss: 0.38720703125|cri_loss: 0.2452392578125|unsuper_loss: 0.0 average reward score: 0.33935546875 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.42s (73.86%) |Training time=0.65s (19.85%) |Others=0.21 (6.29%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.42 epoch: 0|step: 886|ppo_ep: 1|act_loss: 0.01165771484375|cri_loss: 0.03448486328125|unsuper_loss: 0.0 average reward score: 0.68310546875 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.44%) |Training time=0.64s (19.33%) |Others=0.21 (6.24%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.42 epoch: 
0|step: 887|ppo_ep: 1|act_loss: 0.277099609375|cri_loss: 0.2017822265625|unsuper_loss: 0.0 average reward score: 1.056640625 ------------------------------------------------------------------------------------- |E2E latency=3.67s |Gather latency=0.00s (0.00%) |Generate time=2.45s (66.80%) |Training time=0.94s (25.59%) |Others=0.28 (7.62%)|CurSamplesPerSec=2.18 |AvgSamplesPerSec=2.42 epoch: 0|step: 888|ppo_ep: 1|act_loss: -0.62890625|cri_loss: -0.224365234375|unsuper_loss: 0.0 average reward score: 1.2548828125 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.39%) |Training time=0.64s (19.65%) |Others=0.20 (5.97%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42 epoch: 0|step: 889|ppo_ep: 1|act_loss: -0.380859375|cri_loss: -0.1185302734375|unsuper_loss: 0.0 average reward score: 0.36572265625 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.46%) |Training time=0.64s (19.72%) |Others=0.19 (5.82%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42 epoch: 0|step: 890|ppo_ep: 1|act_loss: -0.376708984375|cri_loss: -0.058837890625|unsuper_loss: 0.0 average reward score: -0.59716796875 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.34%) |Training time=0.64s (19.70%) |Others=0.19 (5.96%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42 epoch: 0|step: 891|ppo_ep: 1|act_loss: -0.37353515625|cri_loss: -0.1016845703125|unsuper_loss: 0.0 average reward score: -1.0078125 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.34%) |Training time=0.64s (19.68%) |Others=0.19 (5.97%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42 epoch: 0|step: 892|ppo_ep: 1|act_loss: 
-0.39306640625|cri_loss: -0.144775390625|unsuper_loss: 0.0 average reward score: 0.329345703125 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.20%) |Training time=0.64s (19.62%) |Others=0.20 (6.18%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42 epoch: 0|step: 893|ppo_ep: 1|act_loss: -0.28857421875|cri_loss: -0.071533203125|unsuper_loss: 0.0 average reward score: 0.2880859375 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.40%) |Training time=0.64s (19.69%) |Others=0.19 (5.92%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42 epoch: 0|step: 894|ppo_ep: 1|act_loss: -0.38623046875|cri_loss: -0.140869140625|unsuper_loss: 0.0 average reward score: 0.152587890625 ------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.36s (73.92%) |Training time=0.64s (20.06%) |Others=0.19 (6.02%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.42 epoch: 0|step: 895|ppo_ep: 1|act_loss: -0.14404296875|cri_loss: -0.0015869140625|unsuper_loss: 0.0 average reward score: 0.75634765625 ------------------------------------------------------------------------------------- |E2E latency=3.61s |Gather latency=0.00s (0.00%) |Generate time=2.41s (66.66%) |Training time=0.93s (25.71%) |Others=0.28 (7.62%)|CurSamplesPerSec=2.22 |AvgSamplesPerSec=2.42 epoch: 0|step: 896|ppo_ep: 1|act_loss: -0.10205078125|cri_loss: -0.01434326171875|unsuper_loss: 0.0 average reward score: -0.7626953125 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.38s (73.94%) |Training time=0.64s (19.98%) |Others=0.20 (6.08%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.42 epoch: 0|step: 897|ppo_ep: 1|act_loss: 
-0.036376953125|cri_loss: 0.0670166015625|unsuper_loss: 0.0 average reward score: -0.8173828125 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.38s (73.01%) |Training time=0.65s (19.89%) |Others=0.23 (7.10%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42 epoch: 0|step: 898|ppo_ep: 1|act_loss: -0.418212890625|cri_loss: -0.135009765625|unsuper_loss: 0.0 average reward score: 0.00732421875 ------------------------------------------------------------------------------------- |E2E latency=3.38s |Gather latency=0.00s (0.00%) |Generate time=2.53s (74.84%) |Training time=0.65s (19.23%) |Others=0.20 (5.93%)|CurSamplesPerSec=2.37 |AvgSamplesPerSec=2.42 epoch: 0|step: 899|ppo_ep: 1|act_loss: -0.0806884765625|cri_loss: 0.01617431640625|unsuper_loss: 0.0 average reward score: -1.083984375 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.23%) |Training time=0.65s (19.80%) |Others=0.19 (5.97%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42 epoch: 0|step: 900|ppo_ep: 1|act_loss: -0.26025390625|cri_loss: -0.08038330078125|unsuper_loss: 0.0 average reward score: 1.3212890625 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.15%) |Training time=0.64s (19.79%) |Others=0.20 (6.06%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42 epoch: 0|step: 901|ppo_ep: 1|act_loss: -0.275146484375|cri_loss: -0.08819580078125|unsuper_loss: 0.0 average reward score: -0.54150390625 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.03%) |Training time=0.65s (19.92%) |Others=0.20 (6.06%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42 epoch: 0|step: 902|ppo_ep: 1|act_loss: 
0.0816650390625|cri_loss: 0.1375732421875|unsuper_loss: 0.0 average reward score: 0.04931640625 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.14%) |Training time=0.64s (19.85%) |Others=0.19 (6.01%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42 epoch: 0|step: 903|ppo_ep: 1|act_loss: 0.1357421875|cri_loss: 0.128662109375|unsuper_loss: 0.0 average reward score: 0.6591796875 ------------------------------------------------------------------------------------- |E2E latency=3.61s |Gather latency=0.00s (0.00%) |Generate time=2.40s (66.47%) |Training time=0.93s (25.73%) |Others=0.28 (7.80%)|CurSamplesPerSec=2.21 |AvgSamplesPerSec=2.42 epoch: 0|step: 904|ppo_ep: 1|act_loss: 0.00726318359375|cri_loss: 0.060546875|unsuper_loss: 0.0 average reward score: 1.5712890625 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.38%) |Training time=0.64s (19.77%) |Others=0.19 (5.85%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.42 epoch: 0|step: 905|ppo_ep: 1|act_loss: 0.154541015625|cri_loss: 0.1767578125|unsuper_loss: 0.0 average reward score: 0.066162109375 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.39s (73.88%) |Training time=0.65s (20.06%) |Others=0.20 (6.06%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.42 epoch: 0|step: 906|ppo_ep: 1|act_loss: 0.5537109375|cri_loss: 0.32373046875|unsuper_loss: 0.0 average reward score: -0.291015625 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.20%) |Training time=0.64s (19.88%) |Others=0.19 (5.92%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.42 epoch: 0|step: 907|ppo_ep: 1|act_loss: 0.0975341796875|cri_loss: 
0.116943359375|unsuper_loss: 0.0 average reward score: 0.013671875 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.38s (74.17%) |Training time=0.64s (19.92%) |Others=0.19 (5.91%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.42 epoch: 0|step: 908|ppo_ep: 1|act_loss: 0.06890869140625|cri_loss: 0.06915283203125|unsuper_loss: 0.0 average reward score: 0.99853515625 ------------------------------------------------------------------------------------- |E2E latency=3.36s |Gather latency=0.00s (0.00%) |Generate time=2.53s (75.22%) |Training time=0.64s (19.03%) |Others=0.19 (5.75%)|CurSamplesPerSec=2.38 |AvgSamplesPerSec=2.42 epoch: 0|step: 909|ppo_ep: 1|act_loss: 0.25244140625|cri_loss: 0.178955078125|unsuper_loss: 0.0 average reward score: -0.176513671875 ------------------------------------------------------------------------------------- |E2E latency=3.19s |Gather latency=0.00s (0.00%) |Generate time=2.36s (73.89%) |Training time=0.64s (20.08%) |Others=0.19 (6.03%)|CurSamplesPerSec=2.51 |AvgSamplesPerSec=2.42 epoch: 0|step: 910|ppo_ep: 1|act_loss: 0.1478271484375|cri_loss: 0.128662109375|unsuper_loss: 0.0 average reward score: 0.219970703125 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.40s (73.91%) |Training time=0.64s (19.67%) |Others=0.21 (6.42%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42 epoch: 0|step: 911|ppo_ep: 1|act_loss: 0.18310546875|cri_loss: 0.1456298828125|unsuper_loss: 0.0 average reward score: 0.64404296875 ------------------------------------------------------------------------------------- |E2E latency=3.60s |Gather latency=0.00s (0.00%) |Generate time=2.40s (66.58%) |Training time=0.93s (25.70%) |Others=0.28 (7.72%)|CurSamplesPerSec=2.22 |AvgSamplesPerSec=2.42 epoch: 0|step: 912|ppo_ep: 1|act_loss: 0.23095703125|cri_loss: 
0.15087890625|unsuper_loss: 0.0 average reward score: 0.8154296875 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.45%) |Training time=0.64s (19.58%) |Others=0.19 (5.97%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42 epoch: 0|step: 913|ppo_ep: 1|act_loss: 0.6796875|cri_loss: 0.42578125|unsuper_loss: 0.0 average reward score: 0.0810546875 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.24%) |Training time=0.64s (19.75%) |Others=0.19 (6.01%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42 epoch: 0|step: 914|ppo_ep: 1|act_loss: 0.32763671875|cri_loss: 0.2080078125|unsuper_loss: 0.0 average reward score: 1.0693359375 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.47%) |Training time=0.64s (19.54%) |Others=0.20 (5.99%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42 epoch: 0|step: 915|ppo_ep: 1|act_loss: 0.5224609375|cri_loss: 0.306396484375|unsuper_loss: 0.0 average reward score: 0.86962890625 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.33%) |Training time=0.63s (19.67%) |Others=0.19 (6.00%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.42 epoch: 0|step: 916|ppo_ep: 1|act_loss: 0.57568359375|cri_loss: 0.35009765625|unsuper_loss: 0.0 average reward score: 1.935546875 ------------------------------------------------------------------------------------- |E2E latency=3.17s |Gather latency=0.00s (0.00%) |Generate time=2.32s (73.28%) |Training time=0.65s (20.63%) |Others=0.19 (6.09%)|CurSamplesPerSec=2.53 |AvgSamplesPerSec=2.42 epoch: 0|step: 917|ppo_ep: 1|act_loss: 0.3017578125|cri_loss: 0.1898193359375|unsuper_loss: 0.0 average reward 
score: 0.29443359375 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.46%) |Training time=0.64s (19.62%) |Others=0.19 (5.91%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42 epoch: 0|step: 918|ppo_ep: 1|act_loss: 0.4130859375|cri_loss: 0.261962890625|unsuper_loss: 0.0 average reward score: 0.7451171875 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.30%) |Training time=0.65s (19.72%) |Others=0.20 (5.98%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42 epoch: 0|step: 919|ppo_ep: 1|act_loss: 0.3359375|cri_loss: 0.2139892578125|unsuper_loss: 0.0 average reward score: 0.529296875 ------------------------------------------------------------------------------------- |E2E latency=3.68s |Gather latency=0.00s (0.00%) |Generate time=2.48s (67.29%) |Training time=0.92s (25.05%) |Others=0.28 (7.67%)|CurSamplesPerSec=2.17 |AvgSamplesPerSec=2.42 epoch: 0|step: 920|ppo_ep: 1|act_loss: 0.076904296875|cri_loss: 0.100341796875|unsuper_loss: 0.0 average reward score: -0.765625 ------------------------------------------------------------------------------------- |E2E latency=3.17s |Gather latency=0.00s (0.00%) |Generate time=2.32s (73.32%) |Training time=0.64s (20.33%) |Others=0.20 (6.34%)|CurSamplesPerSec=2.52 |AvgSamplesPerSec=2.42 epoch: 0|step: 921|ppo_ep: 1|act_loss: -0.1986083984375|cri_loss: -0.0526123046875|unsuper_loss: 0.0 average reward score: 1.05859375 ------------------------------------------------------------------------------------- |E2E latency=3.15s |Gather latency=0.00s (0.00%) |Generate time=2.31s (73.33%) |Training time=0.64s (20.40%) |Others=0.20 (6.27%)|CurSamplesPerSec=2.54 |AvgSamplesPerSec=2.42 epoch: 0|step: 922|ppo_ep: 1|act_loss: -0.330078125|cri_loss: -0.0977783203125|unsuper_loss: 0.0 average reward score: 0.76953125 
------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.29s (70.22%) |Training time=0.78s (23.75%) |Others=0.20 (6.04%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42 epoch: 0|step: 923|ppo_ep: 1|act_loss: -0.094970703125|cri_loss: 0.005126953125|unsuper_loss: 0.0 average reward score: 2.091796875 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.32s (71.57%) |Training time=0.73s (22.40%) |Others=0.20 (6.03%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42 epoch: 0|step: 924|ppo_ep: 1|act_loss: -0.30126953125|cri_loss: -0.085205078125|unsuper_loss: 0.0 average reward score: 1.5009765625 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.31s (71.12%) |Training time=0.74s (22.74%) |Others=0.20 (6.13%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42 epoch: 0|step: 925|ppo_ep: 1|act_loss: -0.09808349609375|cri_loss: -0.00738525390625|unsuper_loss: 0.0 average reward score: 0.75244140625 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.38s (72.32%) |Training time=0.72s (21.75%) |Others=0.19 (5.93%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.42 epoch: 0|step: 926|ppo_ep: 1|act_loss: 0.113525390625|cri_loss: 0.10223388671875|unsuper_loss: 0.0 average reward score: -0.72705078125 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.39s (73.70%) |Training time=0.65s (20.00%) |Others=0.20 (6.30%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42 epoch: 0|step: 927|ppo_ep: 1|act_loss: -0.1953125|cri_loss: -0.05059814453125|unsuper_loss: 0.0 average reward score: 0.56396484375 
------------------------------------------------------------------------------------- |E2E latency=3.62s |Gather latency=0.00s (0.00%) |Generate time=2.38s (65.65%) |Training time=0.95s (26.27%) |Others=0.29 (8.08%)|CurSamplesPerSec=2.21 |AvgSamplesPerSec=2.42 epoch: 0|step: 928|ppo_ep: 1|act_loss: 0.14111328125|cri_loss: 0.1326904296875|unsuper_loss: 0.0 average reward score: 1.314453125 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.59%) |Training time=0.63s (19.51%) |Others=0.19 (5.90%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42 epoch: 0|step: 929|ppo_ep: 1|act_loss: -0.279541015625|cri_loss: -0.08404541015625|unsuper_loss: 0.0 average reward score: 0.4443359375 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.42s (73.97%) |Training time=0.64s (19.53%) |Others=0.21 (6.51%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.42 epoch: 0|step: 930|ppo_ep: 1|act_loss: 0.0853271484375|cri_loss: 0.111572265625|unsuper_loss: 0.0 average reward score: 0.9619140625 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.37s (72.72%) |Training time=0.69s (21.17%) |Others=0.20 (6.11%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42 epoch: 0|step: 931|ppo_ep: 1|act_loss: -0.310546875|cri_loss: -0.10321044921875|unsuper_loss: 0.0 average reward score: 1.5869140625 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.43s (73.97%) |Training time=0.66s (20.14%) |Others=0.19 (5.90%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.42 epoch: 0|step: 932|ppo_ep: 1|act_loss: -0.292236328125|cri_loss: -0.09698486328125|unsuper_loss: 0.0 average reward score: 1.2578125 
------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.15%) |Training time=0.64s (19.77%) |Others=0.20 (6.07%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42 epoch: 0|step: 933|ppo_ep: 1|act_loss: 0.12005615234375|cri_loss: 0.11590576171875|unsuper_loss: 0.0 average reward score: 0.7080078125 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.25%) |Training time=0.64s (19.75%) |Others=0.20 (6.01%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42 epoch: 0|step: 934|ppo_ep: 1|act_loss: -0.1619873046875|cri_loss: -0.03173828125|unsuper_loss: 0.0 average reward score: 0.576171875 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.39s (73.48%) |Training time=0.65s (19.83%) |Others=0.22 (6.69%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42 epoch: 0|step: 935|ppo_ep: 1|act_loss: 0.0906982421875|cri_loss: 0.1109619140625|unsuper_loss: 0.0 average reward score: 1.64453125 ------------------------------------------------------------------------------------- |E2E latency=3.62s |Gather latency=0.00s (0.00%) |Generate time=2.42s (66.77%) |Training time=0.93s (25.60%) |Others=0.28 (7.63%)|CurSamplesPerSec=2.21 |AvgSamplesPerSec=2.42 epoch: 0|step: 936|ppo_ep: 1|act_loss: 0.36474609375|cri_loss: 0.2232666015625|unsuper_loss: 0.0 average reward score: 0.9599609375 ------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.43s (73.49%) |Training time=0.68s (20.71%) |Others=0.19 (5.81%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.42 epoch: 0|step: 937|ppo_ep: 1|act_loss: 0.572265625|cri_loss: 0.37060546875|unsuper_loss: 0.0 average reward score: 0.62890625 
------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.32%) |Training time=0.65s (19.69%) |Others=0.20 (5.99%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.42 epoch: 0|step: 938|ppo_ep: 1|act_loss: 0.525390625|cri_loss: 0.3203125|unsuper_loss: 0.0 average reward score: 1.486328125 ------------------------------------------------------------------------------------- |E2E latency=3.34s |Gather latency=0.00s (0.00%) |Generate time=2.50s (74.92%) |Training time=0.64s (19.19%) |Others=0.20 (5.89%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.42 epoch: 0|step: 939|ppo_ep: 1|act_loss: 1.1044921875|cri_loss: 0.67919921875|unsuper_loss: 0.0 average reward score: 1.748046875 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.42%) |Training time=0.64s (19.56%) |Others=0.20 (6.02%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.42 epoch: 0|step: 940|ppo_ep: 1|act_loss: 0.6201171875|cri_loss: 0.367919921875|unsuper_loss: 0.0 average reward score: 0.021728515625 ------------------------------------------------------------------------------------- |E2E latency=3.34s |Gather latency=0.00s (0.00%) |Generate time=2.48s (74.28%) |Training time=0.67s (19.92%) |Others=0.19 (5.80%)|CurSamplesPerSec=2.39 |AvgSamplesPerSec=2.42 epoch: 0|step: 941|ppo_ep: 1|act_loss: 0.5830078125|cri_loss: 0.35009765625|unsuper_loss: 0.0 average reward score: 1.001953125 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.58%) |Training time=0.64s (19.63%) |Others=0.19 (5.79%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42 epoch: 0|step: 942|ppo_ep: 1|act_loss: 0.56396484375|cri_loss: 0.361328125|unsuper_loss: 0.0 average reward score: 1.330078125 
------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.41s (73.22%) |Training time=0.68s (20.69%) |Others=0.20 (6.09%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.42 epoch: 0|step: 943|ppo_ep: 1|act_loss: 0.43896484375|cri_loss: 0.274658203125|unsuper_loss: 0.0 average reward score: 1.0283203125 ------------------------------------------------------------------------------------- |E2E latency=3.56s |Gather latency=0.00s (0.00%) |Generate time=2.34s (65.74%) |Training time=0.92s (25.89%) |Others=0.30 (8.37%)|CurSamplesPerSec=2.24 |AvgSamplesPerSec=2.42 epoch: 0|step: 944|ppo_ep: 1|act_loss: 0.2060546875|cri_loss: 0.1287841796875|unsuper_loss: 0.0 average reward score: -0.2626953125 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.45%) |Training time=0.64s (19.52%) |Others=0.20 (6.03%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42 epoch: 0|step: 945|ppo_ep: 1|act_loss: 0.5263671875|cri_loss: 0.316162109375|unsuper_loss: 0.0 average reward score: 0.9990234375 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.34%) |Training time=0.64s (19.69%) |Others=0.19 (5.97%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42 epoch: 0|step: 946|ppo_ep: 1|act_loss: 0.3505859375|cri_loss: 0.211669921875|unsuper_loss: 0.0 average reward score: 1.01953125 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.14%) |Training time=0.64s (19.81%) |Others=0.20 (6.05%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42 epoch: 0|step: 947|ppo_ep: 1|act_loss: 0.197021484375|cri_loss: 0.1727294921875|unsuper_loss: 0.0 average reward score: 2.23046875 
------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.37%) |Training time=0.64s (19.77%) |Others=0.19 (5.86%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42 epoch: 0|step: 948|ppo_ep: 1|act_loss: 0.45654296875|cri_loss: 0.29345703125|unsuper_loss: 0.0 average reward score: 1.09765625 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.08%) |Training time=0.64s (19.88%) |Others=0.19 (6.04%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.42 epoch: 0|step: 949|ppo_ep: 1|act_loss: -0.0361328125|cri_loss: 0.017791748046875|unsuper_loss: 0.0 average reward score: -0.4287109375 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.62%) |Training time=0.64s (19.54%) |Others=0.19 (5.84%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.42 epoch: 0|step: 950|ppo_ep: 1|act_loss: 0.6513671875|cri_loss: 0.37646484375|unsuper_loss: 0.0 average reward score: 0.329345703125 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.29%) |Training time=0.64s (19.70%) |Others=0.20 (6.01%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42 epoch: 0|step: 951|ppo_ep: 1|act_loss: 0.7333984375|cri_loss: 0.42919921875|unsuper_loss: 0.0 average reward score: 0.140380859375 ------------------------------------------------------------------------------------- |E2E latency=3.62s |Gather latency=0.00s (0.00%) |Generate time=2.42s (66.82%) |Training time=0.93s (25.58%) |Others=0.28 (7.60%)|CurSamplesPerSec=2.21 |AvgSamplesPerSec=2.42 epoch: 0|step: 952|ppo_ep: 1|act_loss: -0.145751953125|cri_loss: -0.04864501953125|unsuper_loss: 0.0 average reward score: 0.13427734375 
------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.41s (73.93%) |Training time=0.64s (19.48%) |Others=0.22 (6.59%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42 epoch: 0|step: 953|ppo_ep: 1|act_loss: 0.26220703125|cri_loss: 0.163818359375|unsuper_loss: 0.0 average reward score: 0.41845703125 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.51%) |Training time=0.64s (19.48%) |Others=0.20 (6.01%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.42 epoch: 0|step: 954|ppo_ep: 1|act_loss: -0.1419677734375|cri_loss: -0.053375244140625|unsuper_loss: 0.0 average reward score: 1.99609375 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.39s (73.35%) |Training time=0.64s (19.69%) |Others=0.23 (6.96%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42 epoch: 0|step: 955|ppo_ep: 1|act_loss: -0.33837890625|cri_loss: -0.1002197265625|unsuper_loss: 0.0 average reward score: 0.833984375 ------------------------------------------------------------------------------------- |E2E latency=3.18s |Gather latency=0.00s (0.00%) |Generate time=2.34s (73.58%) |Training time=0.64s (20.24%) |Others=0.20 (6.19%)|CurSamplesPerSec=2.52 |AvgSamplesPerSec=2.42 epoch: 0|step: 956|ppo_ep: 1|act_loss: 0.2353515625|cri_loss: 0.156982421875|unsuper_loss: 0.0 average reward score: 0.5546875 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.37s (73.78%) |Training time=0.64s (19.90%) |Others=0.20 (6.31%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.42 epoch: 0|step: 957|ppo_ep: 1|act_loss: -0.077392578125|cri_loss: -0.010223388671875|unsuper_loss: 0.0 average reward score: 0.6181640625 
------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.37s (73.83%) |Training time=0.65s (20.09%) |Others=0.20 (6.09%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.42 epoch: 0|step: 958|ppo_ep: 1|act_loss: -0.1832275390625|cri_loss: -0.054443359375|unsuper_loss: 0.0 average reward score: 1.48046875 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.39s (73.88%) |Training time=0.64s (19.91%) |Others=0.20 (6.21%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.42 [2023-04-24 14:40:36,449] [INFO] [logging.py:96:log_dist] [Rank 0] step=120, skipped=5, lr=[9.612155275459368e-06, 9.612155275459368e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-04-24 14:40:36,696] [INFO] [timer.py:199:stop] epoch=0/micro_step=960/global_step=120, RunningAvgSamplesPerSec=15.487365186778234, CurrSamplesPerSec=15.820720921742922, MemAllocated=20.44GB, MaxMemAllocated=31.45GB [2023-04-24 14:40:36,896] [INFO] [logging.py:96:log_dist] [Rank 0] step=120, skipped=4, lr=[4.977693720386951e-06, 4.977693720386951e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 959|ppo_ep: 1|act_loss: -0.223876953125|cri_loss: -0.08709716796875|unsuper_loss: 0.0 average reward score: 0.24658203125 ------------------------------------------------------------------------------------- |E2E latency=3.60s |Gather latency=0.00s (0.00%) |Generate time=2.40s (66.68%) |Training time=0.93s (25.67%) |Others=0.28 (7.65%)|CurSamplesPerSec=2.22 |AvgSamplesPerSec=2.42 epoch: 0|step: 960|ppo_ep: 1|act_loss: 0.2294921875|cri_loss: 0.140625|unsuper_loss: 0.0 average reward score: 1.1494140625 ------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.37s (74.15%) |Training time=0.63s (19.81%) |Others=0.19 (6.04%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.42 
epoch: 0|step: 961|ppo_ep: 1|act_loss: 0.025299072265625|cri_loss: 0.03961181640625|unsuper_loss: 0.0 average reward score: 1.708984375 ------------------------------------------------------------------------------------- |E2E latency=3.12s |Gather latency=0.00s (0.00%) |Generate time=2.29s (73.24%) |Training time=0.64s (20.54%) |Others=0.19 (6.21%)|CurSamplesPerSec=2.56 |AvgSamplesPerSec=2.42 epoch: 0|step: 962|ppo_ep: 1|act_loss: 0.259765625|cri_loss: 0.16455078125|unsuper_loss: 0.0 average reward score: -0.29541015625 ------------------------------------------------------------------------------------- |E2E latency=3.14s |Gather latency=0.00s (0.00%) |Generate time=2.29s (72.72%) |Training time=0.65s (20.76%) |Others=0.20 (6.52%)|CurSamplesPerSec=2.55 |AvgSamplesPerSec=2.42 epoch: 0|step: 963|ppo_ep: 1|act_loss: 0.2197265625|cri_loss: 0.136962890625|unsuper_loss: 0.0 average reward score: 1.125 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.24%) |Training time=0.64s (19.86%) |Others=0.19 (5.90%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42 epoch: 0|step: 964|ppo_ep: 1|act_loss: -0.164794921875|cri_loss: -0.06695556640625|unsuper_loss: 0.0 average reward score: 0.607421875 ------------------------------------------------------------------------------------- |E2E latency=3.19s |Gather latency=0.00s (0.00%) |Generate time=2.36s (73.78%) |Training time=0.64s (19.97%) |Others=0.20 (6.25%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.42 epoch: 0|step: 965|ppo_ep: 1|act_loss: -0.014312744140625|cri_loss: 0.02288818359375|unsuper_loss: 0.0 average reward score: 0.22900390625 ------------------------------------------------------------------------------------- |E2E latency=3.17s |Gather latency=0.00s (0.00%) |Generate time=2.34s (73.81%) |Training time=0.64s (20.20%) |Others=0.19 (5.99%)|CurSamplesPerSec=2.52 |AvgSamplesPerSec=2.42 epoch: 0|step: 966|ppo_ep: 
1|act_loss: 0.3623046875|cri_loss: 0.214599609375|unsuper_loss: 0.0 average reward score: -0.51123046875 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.40%) |Training time=0.63s (19.58%) |Others=0.19 (6.02%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42 epoch: 0|step: 967|ppo_ep: 1|act_loss: 0.6484375|cri_loss: 0.387451171875|unsuper_loss: 0.0 average reward score: 0.55517578125 ------------------------------------------------------------------------------------- |E2E latency=3.68s |Gather latency=0.00s (0.00%) |Generate time=2.42s (65.73%) |Training time=0.92s (25.10%) |Others=0.34 (9.18%)|CurSamplesPerSec=2.17 |AvgSamplesPerSec=2.42 epoch: 0|step: 968|ppo_ep: 1|act_loss: -0.34130859375|cri_loss: -0.125|unsuper_loss: 0.0 average reward score: -0.16259765625 ------------------------------------------------------------------------------------- |E2E latency=3.16s |Gather latency=0.00s (0.00%) |Generate time=2.33s (73.73%) |Training time=0.64s (20.25%) |Others=0.19 (6.02%)|CurSamplesPerSec=2.53 |AvgSamplesPerSec=2.42 epoch: 0|step: 969|ppo_ep: 1|act_loss: -0.2406005859375|cri_loss: -0.0628662109375|unsuper_loss: 0.0 average reward score: 0.3232421875 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.16%) |Training time=0.64s (19.83%) |Others=0.19 (6.00%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42 epoch: 0|step: 970|ppo_ep: 1|act_loss: -0.282958984375|cri_loss: -0.111328125|unsuper_loss: 0.0 average reward score: -0.35986328125 ------------------------------------------------------------------------------------- |E2E latency=3.14s |Gather latency=0.00s (0.00%) |Generate time=2.31s (73.42%) |Training time=0.64s (20.36%) |Others=0.20 (6.23%)|CurSamplesPerSec=2.55 |AvgSamplesPerSec=2.42 epoch: 0|step: 971|ppo_ep: 1|act_loss: 
0.0184173583984375|cri_loss: 0.03765869140625|unsuper_loss: 0.0 average reward score: -0.265869140625 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.39%) |Training time=0.64s (19.73%) |Others=0.19 (5.87%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42 epoch: 0|step: 972|ppo_ep: 1|act_loss: -0.292236328125|cri_loss: -0.09771728515625|unsuper_loss: 0.0 average reward score: -0.462890625 ------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.42s (73.34%) |Training time=0.69s (20.82%) |Others=0.19 (5.84%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.42 epoch: 0|step: 973|ppo_ep: 1|act_loss: 0.158935546875|cri_loss: 0.1246337890625|unsuper_loss: 0.0 average reward score: 0.11083984375 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.83%) |Training time=0.64s (19.39%) |Others=0.19 (5.78%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.42 epoch: 0|step: 974|ppo_ep: 1|act_loss: -0.003021240234375|cri_loss: 0.0231170654296875|unsuper_loss: 0.0 average reward score: 0.1097412109375 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.29%) |Training time=0.64s (19.62%) |Others=0.20 (6.09%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42 epoch: 0|step: 975|ppo_ep: 1|act_loss: -0.11212158203125|cri_loss: -0.0169677734375|unsuper_loss: 0.0 average reward score: -0.421875 ------------------------------------------------------------------------------------- |E2E latency=3.54s |Gather latency=0.00s (0.00%) |Generate time=2.34s (66.04%) |Training time=0.92s (26.13%) |Others=0.28 (7.83%)|CurSamplesPerSec=2.26 |AvgSamplesPerSec=2.42 epoch: 0|step: 976|ppo_ep: 1|act_loss: 
0.1287841796875|cri_loss: 0.12200927734375|unsuper_loss: 0.0 average reward score: -0.95263671875 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.47%) |Training time=0.63s (19.67%) |Others=0.19 (5.87%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.42 epoch: 0|step: 977|ppo_ep: 1|act_loss: -0.0841064453125|cri_loss: 0.0001220703125|unsuper_loss: 0.0 average reward score: 0.5 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.35%) |Training time=0.64s (19.83%) |Others=0.19 (5.82%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42 epoch: 0|step: 978|ppo_ep: 1|act_loss: 0.0750732421875|cri_loss: 0.08074951171875|unsuper_loss: 0.0 average reward score: -1.525390625 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.40s (73.85%) |Training time=0.64s (19.56%) |Others=0.21 (6.59%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42 epoch: 0|step: 979|ppo_ep: 1|act_loss: 0.0223388671875|cri_loss: 0.058074951171875|unsuper_loss: 0.0 average reward score: -0.081298828125 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.31%) |Training time=0.64s (19.80%) |Others=0.19 (5.89%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42 epoch: 0|step: 980|ppo_ep: 1|act_loss: -0.27685546875|cri_loss: -0.09130859375|unsuper_loss: 0.0 average reward score: -0.6494140625 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.54%) |Training time=0.64s (19.59%) |Others=0.19 (5.87%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42 epoch: 0|step: 981|ppo_ep: 1|act_loss: 
-0.019775390625|cri_loss: 0.025054931640625|unsuper_loss: 0.0 average reward score: -0.026611328125 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.52%) |Training time=0.64s (19.64%) |Others=0.19 (5.84%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42 epoch: 0|step: 982|ppo_ep: 1|act_loss: 0.00128173828125|cri_loss: 0.048583984375|unsuper_loss: 0.0 average reward score: 0.6953125 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.37%) |Training time=0.64s (19.67%) |Others=0.19 (5.96%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42 epoch: 0|step: 983|ppo_ep: 1|act_loss: 0.12939453125|cri_loss: 0.0985107421875|unsuper_loss: 0.0 average reward score: -0.05810546875 ------------------------------------------------------------------------------------- |E2E latency=3.61s |Gather latency=0.00s (0.00%) |Generate time=2.41s (66.79%) |Training time=0.92s (25.61%) |Others=0.27 (7.60%)|CurSamplesPerSec=2.22 |AvgSamplesPerSec=2.42 epoch: 0|step: 984|ppo_ep: 1|act_loss: 0.20263671875|cri_loss: 0.14990234375|unsuper_loss: 0.0 average reward score: 0.26904296875 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.35%) |Training time=0.64s (19.74%) |Others=0.19 (5.91%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42 epoch: 0|step: 985|ppo_ep: 1|act_loss: 0.5126953125|cri_loss: 0.299072265625|unsuper_loss: 0.0 average reward score: -0.9365234375 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.58%) |Training time=0.64s (19.58%) |Others=0.19 (5.83%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42 epoch: 0|step: 986|ppo_ep: 1|act_loss: 0.1097412109375|cri_loss: 
0.0810546875|unsuper_loss: 0.0 average reward score: -1.0478515625 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.55%) |Training time=0.64s (19.46%) |Others=0.20 (5.99%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.42 epoch: 0|step: 987|ppo_ep: 1|act_loss: 0.55517578125|cri_loss: 0.35302734375|unsuper_loss: 0.0 average reward score: 0.75 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.72%) |Training time=0.64s (19.48%) |Others=0.19 (5.80%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.42 epoch: 0|step: 988|ppo_ep: 1|act_loss: 0.40869140625|cri_loss: 0.2490234375|unsuper_loss: 0.0 average reward score: -0.108642578125 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.48%) |Training time=0.64s (19.61%) |Others=0.19 (5.92%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.42 epoch: 0|step: 989|ppo_ep: 1|act_loss: 0.458984375|cri_loss: 0.293212890625|unsuper_loss: 0.0 average reward score: 0.66015625 ------------------------------------------------------------------------------------- |E2E latency=3.36s |Gather latency=0.00s (0.00%) |Generate time=2.45s (72.92%) |Training time=0.72s (21.41%) |Others=0.19 (5.68%)|CurSamplesPerSec=2.38 |AvgSamplesPerSec=2.42 epoch: 0|step: 990|ppo_ep: 1|act_loss: 0.32958984375|cri_loss: 0.1983642578125|unsuper_loss: 0.0 average reward score: 0.76953125 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.40%) |Training time=0.64s (19.69%) |Others=0.19 (5.91%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42 epoch: 0|step: 991|ppo_ep: 1|act_loss: 0.301513671875|cri_loss: 0.1854248046875|unsuper_loss: 0.0 average reward 
score: 0.55712890625 ------------------------------------------------------------------------------------- |E2E latency=3.62s |Gather latency=0.00s (0.00%) |Generate time=2.42s (66.86%) |Training time=0.92s (25.52%) |Others=0.28 (7.62%)|CurSamplesPerSec=2.21 |AvgSamplesPerSec=2.42 epoch: 0|step: 992|ppo_ep: 1|act_loss: 0.490234375|cri_loss: 0.30908203125|unsuper_loss: 0.0 average reward score: 0.9990234375 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.53%) |Training time=0.63s (19.57%) |Others=0.19 (5.90%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42 epoch: 0|step: 993|ppo_ep: 1|act_loss: 0.2083740234375|cri_loss: 0.13720703125|unsuper_loss: 0.0 average reward score: 0.13427734375 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.48s (74.82%) |Training time=0.64s (19.30%) |Others=0.20 (5.88%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.42 epoch: 0|step: 994|ppo_ep: 1|act_loss: 0.153564453125|cri_loss: 0.11676025390625|unsuper_loss: 0.0 average reward score: -0.63330078125 ------------------------------------------------------------------------------------- |E2E latency=3.93s |Gather latency=0.00s (0.00%) |Generate time=2.68s (68.27%) |Training time=0.99s (25.29%) |Others=0.25 (6.44%)|CurSamplesPerSec=2.04 |AvgSamplesPerSec=2.42 epoch: 0|step: 995|ppo_ep: 1|act_loss: 0.7001953125|cri_loss: 0.41259765625|unsuper_loss: 0.0 average reward score: 1.4375 ------------------------------------------------------------------------------------- |E2E latency=3.38s |Gather latency=0.00s (0.00%) |Generate time=2.51s (74.20%) |Training time=0.64s (18.91%) |Others=0.23 (6.89%)|CurSamplesPerSec=2.36 |AvgSamplesPerSec=2.42 epoch: 0|step: 996|ppo_ep: 1|act_loss: 0.1982421875|cri_loss: 0.132568359375|unsuper_loss: 0.0 average reward score: 0.1943359375 
------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.41s (73.54%) |Training time=0.67s (20.45%) |Others=0.20 (6.01%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.42 epoch: 0|step: 997|ppo_ep: 1|act_loss: 0.5595703125|cri_loss: 0.33544921875|unsuper_loss: 0.0 average reward score: -0.129638671875 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.16%) |Training time=0.64s (19.81%) |Others=0.20 (6.03%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42 epoch: 0|step: 998|ppo_ep: 1|act_loss: 0.3193359375|cri_loss: 0.202392578125|unsuper_loss: 0.0 average reward score: 0.025634765625 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.72%) |Training time=0.64s (19.51%) |Others=0.19 (5.77%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.42 epoch: 0|step: 999|ppo_ep: 1|act_loss: 0.3291015625|cri_loss: 0.216064453125|unsuper_loss: 0.0 average reward score: 0.2330322265625 ------------------------------------------------------------------------------------- |E2E latency=3.61s |Gather latency=0.00s (0.00%) |Generate time=2.41s (66.70%) |Training time=0.93s (25.73%) |Others=0.27 (7.57%)|CurSamplesPerSec=2.22 |AvgSamplesPerSec=2.42 epoch: 0|step: 1000|ppo_ep: 1|act_loss: -0.09832763671875|cri_loss: -0.01495361328125|unsuper_loss: 0.0 average reward score: 0.334228515625 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.39%) |Training time=0.64s (19.72%) |Others=0.19 (5.90%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42 epoch: 0|step: 1001|ppo_ep: 1|act_loss: -0.202392578125|cri_loss: -0.0491943359375|unsuper_loss: 0.0 average reward score: 0.48388671875 
------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.21%) |Training time=0.64s (19.80%) |Others=0.19 (5.99%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42 epoch: 0|step: 1002|ppo_ep: 1|act_loss: 0.0115814208984375|cri_loss: 0.03570556640625|unsuper_loss: 0.0 average reward score: -0.1484375 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.32%) |Training time=0.64s (19.79%) |Others=0.19 (5.89%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42 epoch: 0|step: 1003|ppo_ep: 1|act_loss: -0.259765625|cri_loss: -0.08984375|unsuper_loss: 0.0 average reward score: 1.013671875 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.34%) |Training time=0.64s (19.77%) |Others=0.19 (5.89%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42 epoch: 0|step: 1004|ppo_ep: 1|act_loss: -0.196044921875|cri_loss: -0.0341796875|unsuper_loss: 0.0 average reward score: 1.375 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.41%) |Training time=0.64s (19.72%) |Others=0.19 (5.87%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42 epoch: 0|step: 1005|ppo_ep: 1|act_loss: -0.38671875|cri_loss: -0.1295166015625|unsuper_loss: 0.0 average reward score: 1.875 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.86%) |Training time=0.64s (19.42%) |Others=0.19 (5.72%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.42 epoch: 0|step: 1006|ppo_ep: 1|act_loss: -0.08209228515625|cri_loss: 0.01519775390625|unsuper_loss: 0.0 average reward score: 0.25927734375 
------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (73.99%) |Training time=0.65s (20.01%) |Others=0.19 (6.01%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42 epoch: 0|step: 1007|ppo_ep: 1|act_loss: -0.199462890625|cri_loss: -0.06915283203125|unsuper_loss: 0.0 average reward score: -0.7314453125 ------------------------------------------------------------------------------------- |E2E latency=3.61s |Gather latency=0.00s (0.00%) |Generate time=2.41s (66.79%) |Training time=0.92s (25.54%) |Others=0.28 (7.67%)|CurSamplesPerSec=2.22 |AvgSamplesPerSec=2.42 epoch: 0|step: 1008|ppo_ep: 1|act_loss: 0.03521728515625|cri_loss: 0.06976318359375|unsuper_loss: 0.0 average reward score: 1.9892578125 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.41%) |Training time=0.64s (19.68%) |Others=0.19 (5.91%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.42 epoch: 0|step: 1009|ppo_ep: 1|act_loss: -0.29296875|cri_loss: -0.09698486328125|unsuper_loss: 0.0 average reward score: 1.2734375 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.23%) |Training time=0.64s (19.81%) |Others=0.19 (5.96%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42 epoch: 0|step: 1010|ppo_ep: 1|act_loss: -0.25537109375|cri_loss: -0.09478759765625|unsuper_loss: 0.0 average reward score: -1.1259765625 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.39s (72.37%) |Training time=0.68s (20.58%) |Others=0.23 (7.04%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.42 epoch: 0|step: 1011|ppo_ep: 1|act_loss: -0.159423828125|cri_loss: -0.04010009765625|unsuper_loss: 0.0 average reward score: 1.45703125 
------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.39s (72.80%) |Training time=0.70s (21.27%) |Others=0.19 (5.93%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.42 epoch: 0|step: 1012|ppo_ep: 1|act_loss: -0.11944580078125|cri_loss: -0.02557373046875|unsuper_loss: 0.0 average reward score: -0.49072265625 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.69%) |Training time=0.64s (19.49%) |Others=0.19 (5.82%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.42 epoch: 0|step: 1013|ppo_ep: 1|act_loss: 0.143798828125|cri_loss: 0.116943359375|unsuper_loss: 0.0 average reward score: -1.220703125 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.40%) |Training time=0.64s (19.67%) |Others=0.19 (5.93%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.42 epoch: 0|step: 1014|ppo_ep: 1|act_loss: -0.2435302734375|cri_loss: -0.0528564453125|unsuper_loss: 0.0 average reward score: 1.03125 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.37s (73.78%) |Training time=0.64s (20.05%) |Others=0.20 (6.18%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.42 epoch: 0|step: 1015|ppo_ep: 1|act_loss: -0.146240234375|cri_loss: -0.029541015625|unsuper_loss: 0.0 average reward score: 0.456298828125 ------------------------------------------------------------------------------------- |E2E latency=3.69s |Gather latency=0.00s (0.00%) |Generate time=2.48s (67.05%) |Training time=0.94s (25.36%) |Others=0.28 (7.59%)|CurSamplesPerSec=2.17 |AvgSamplesPerSec=2.42 epoch: 0|step: 1016|ppo_ep: 1|act_loss: 0.09881591796875|cri_loss: 0.0916748046875|unsuper_loss: 0.0 average reward score: 1.515625 
------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.25%) |Training time=0.64s (19.28%) |Others=0.21 (6.46%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.42 epoch: 0|step: 1017|ppo_ep: 1|act_loss: -0.00811767578125|cri_loss: 0.03021240234375|unsuper_loss: 0.0 average reward score: 0.265869140625 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.60%) |Training time=0.64s (19.47%) |Others=0.19 (5.94%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.42 epoch: 0|step: 1018|ppo_ep: 1|act_loss: 0.217041015625|cri_loss: 0.1298828125|unsuper_loss: 0.0 average reward score: -0.705078125 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.54%) |Training time=0.64s (19.34%) |Others=0.20 (6.12%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.42 epoch: 0|step: 1019|ppo_ep: 1|act_loss: 0.1038818359375|cri_loss: 0.07794189453125|unsuper_loss: 0.0 average reward score: 0.5146484375 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.39%) |Training time=0.64s (19.65%) |Others=0.19 (5.95%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42 epoch: 0|step: 1020|ppo_ep: 1|act_loss: 0.240234375|cri_loss: 0.165283203125|unsuper_loss: 0.0 average reward score: -0.794921875 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.34s (72.81%) |Training time=0.67s (20.96%) |Others=0.20 (6.24%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.42 epoch: 0|step: 1021|ppo_ep: 1|act_loss: 0.427734375|cri_loss: 0.25634765625|unsuper_loss: 0.0 average reward score: -1.14453125 
------------------------------------------------------------------------------------- |E2E latency=3.17s |Gather latency=0.00s (0.00%) |Generate time=2.31s (72.99%) |Training time=0.66s (20.86%) |Others=0.19 (6.15%)|CurSamplesPerSec=2.52 |AvgSamplesPerSec=2.42 epoch: 0|step: 1022|ppo_ep: 1|act_loss: 0.347412109375|cri_loss: 0.213623046875|unsuper_loss: 0.0 average reward score: -0.935546875 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.34s (71.99%) |Training time=0.71s (21.84%) |Others=0.20 (6.18%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42 epoch: 0|step: 1023|ppo_ep: 1|act_loss: 0.45947265625|cri_loss: 0.276611328125|unsuper_loss: 0.0 average reward score: 0.132568359375 ------------------------------------------------------------------------------------- |E2E latency=3.69s |Gather latency=0.00s (0.00%) |Generate time=2.47s (66.83%) |Training time=0.93s (25.24%) |Others=0.29 (7.93%)|CurSamplesPerSec=2.17 |AvgSamplesPerSec=2.42 epoch: 0|step: 1024|ppo_ep: 1|act_loss: 0.66259765625|cri_loss: 0.38720703125|unsuper_loss: 0.0 average reward score: -0.239501953125 ------------------------------------------------------------------------------------- |E2E latency=3.39s |Gather latency=0.00s (0.00%) |Generate time=2.56s (75.39%) |Training time=0.64s (18.82%) |Others=0.20 (5.79%)|CurSamplesPerSec=2.36 |AvgSamplesPerSec=2.42 epoch: 0|step: 1025|ppo_ep: 1|act_loss: 0.7587890625|cri_loss: 0.439453125|unsuper_loss: 0.0 average reward score: 0.1053466796875 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.42s (73.72%) |Training time=0.67s (20.23%) |Others=0.20 (6.05%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.42 epoch: 0|step: 1026|ppo_ep: 1|act_loss: 0.408447265625|cri_loss: 0.2333984375|unsuper_loss: 0.0 average reward score: -1.3251953125 
------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.31s (71.11%) |Training time=0.69s (21.09%) |Others=0.25 (7.80%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42 epoch: 0|step: 1027|ppo_ep: 1|act_loss: 0.53564453125|cri_loss: 0.33349609375|unsuper_loss: 0.0 average reward score: 0.67041015625 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.22%) |Training time=0.65s (19.78%) |Others=0.20 (6.00%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42 epoch: 0|step: 1028|ppo_ep: 1|act_loss: 0.37060546875|cri_loss: 0.2083740234375|unsuper_loss: 0.0 average reward score: 0.7412109375 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.11%) |Training time=0.65s (19.69%) |Others=0.20 (6.20%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.42 epoch: 0|step: 1029|ppo_ep: 1|act_loss: 0.366455078125|cri_loss: 0.2237548828125|unsuper_loss: 0.0 average reward score: -0.45947265625 ------------------------------------------------------------------------------------- |E2E latency=3.34s |Gather latency=0.00s (0.00%) |Generate time=2.45s (73.29%) |Training time=0.70s (20.84%) |Others=0.20 (5.87%)|CurSamplesPerSec=2.39 |AvgSamplesPerSec=2.42 epoch: 0|step: 1030|ppo_ep: 1|act_loss: 0.6650390625|cri_loss: 0.40234375|unsuper_loss: 0.0 average reward score: 0.087646484375 ------------------------------------------------------------------------------------- |E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.54s (72.83%) |Training time=0.70s (20.14%) |Others=0.24 (7.02%)|CurSamplesPerSec=2.30 |AvgSamplesPerSec=2.42 epoch: 0|step: 1031|ppo_ep: 1|act_loss: 0.4580078125|cri_loss: 0.2646484375|unsuper_loss: 0.0 average reward score: -0.6748046875 
------------------------------------------------------------------------------------- |E2E latency=3.63s |Gather latency=0.00s (0.00%) |Generate time=2.42s (66.64%) |Training time=0.93s (25.54%) |Others=0.28 (7.82%)|CurSamplesPerSec=2.20 |AvgSamplesPerSec=2.42 epoch: 0|step: 1032|ppo_ep: 1|act_loss: 0.494140625|cri_loss: 0.295166015625|unsuper_loss: 0.0 average reward score: -1.822265625 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.53%) |Training time=0.64s (19.58%) |Others=0.19 (5.90%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42 epoch: 0|step: 1033|ppo_ep: 1|act_loss: 0.439453125|cri_loss: 0.277587890625|unsuper_loss: 0.0 average reward score: -0.02294921875 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.24%) |Training time=0.65s (19.84%) |Others=0.19 (5.91%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42 epoch: 0|step: 1034|ppo_ep: 1|act_loss: 0.48388671875|cri_loss: 0.27685546875|unsuper_loss: 0.0 average reward score: -0.404541015625 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.30%) |Training time=0.64s (19.77%) |Others=0.19 (5.93%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42 epoch: 0|step: 1035|ppo_ep: 1|act_loss: 0.3837890625|cri_loss: 0.234130859375|unsuper_loss: 0.0 average reward score: -0.7275390625 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.18%) |Training time=0.64s (19.67%) |Others=0.20 (6.15%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42 epoch: 0|step: 1036|ppo_ep: 1|act_loss: 0.59521484375|cri_loss: 0.35546875|unsuper_loss: 0.0 average reward score: 1.07421875 
------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.16%) |Training time=0.64s (19.82%) |Others=0.20 (6.02%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42 epoch: 0|step: 1037|ppo_ep: 1|act_loss: 0.4873046875|cri_loss: 0.28125|unsuper_loss: 0.0 average reward score: -0.83349609375 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.15%) |Training time=0.65s (19.92%) |Others=0.19 (5.93%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42 epoch: 0|step: 1038|ppo_ep: 1|act_loss: 0.46630859375|cri_loss: 0.298095703125|unsuper_loss: 0.0 average reward score: 0.5322265625 ------------------------------------------------------------------------------------- |E2E latency=3.34s |Gather latency=0.00s (0.00%) |Generate time=2.44s (73.14%) |Training time=0.70s (20.93%) |Others=0.20 (5.94%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.42 [2023-04-24 14:45:01,489] [INFO] [logging.py:96:log_dist] [Rank 0] step=130, skipped=5, lr=[9.545120229243806e-06, 9.545120229243806e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-04-24 14:45:01,731] [INFO] [timer.py:199:stop] epoch=0/micro_step=1040/global_step=130, RunningAvgSamplesPerSec=15.491575413577998, CurrSamplesPerSec=15.580935169117883, MemAllocated=20.44GB, MaxMemAllocated=31.45GB [2023-04-24 14:45:01,932] [INFO] [logging.py:96:log_dist] [Rank 0] step=130, skipped=4, lr=[4.941241304217962e-06, 4.941241304217962e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 1039|ppo_ep: 1|act_loss: 0.251708984375|cri_loss: 0.165771484375|unsuper_loss: 0.0 average reward score: 0.72021484375 ------------------------------------------------------------------------------------- |E2E latency=3.62s |Gather latency=0.00s (0.00%) |Generate time=2.41s (66.68%) |Training time=0.93s (25.61%) |Others=0.28 (7.71%)|CurSamplesPerSec=2.21 |AvgSamplesPerSec=2.42 
epoch: 0|step: 1040|ppo_ep: 1|act_loss: -0.17529296875|cri_loss: -0.03668212890625|unsuper_loss: 0.0 average reward score: 0.380859375 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.58%) |Training time=0.63s (19.48%) |Others=0.19 (5.94%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42 epoch: 0|step: 1041|ppo_ep: 1|act_loss: -0.39599609375|cri_loss: -0.1473388671875|unsuper_loss: 0.0 average reward score: 1.8056640625 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.33%) |Training time=0.64s (19.68%) |Others=0.20 (5.99%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42 epoch: 0|step: 1042|ppo_ep: 1|act_loss: -0.3671875|cri_loss: -0.10986328125|unsuper_loss: 0.0 average reward score: 0.146728515625 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.41s (73.92%) |Training time=0.66s (20.09%) |Others=0.20 (5.98%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42 epoch: 0|step: 1043|ppo_ep: 1|act_loss: 0.36767578125|cri_loss: 0.22802734375|unsuper_loss: 0.0 average reward score: -0.20947265625 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.41%) |Training time=0.64s (19.57%) |Others=0.20 (6.02%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.42 epoch: 0|step: 1044|ppo_ep: 1|act_loss: -0.07513427734375|cri_loss: -0.00482177734375|unsuper_loss: 0.0 average reward score: 1.025390625 ------------------------------------------------------------------------------------- |E2E latency=3.33s |Gather latency=0.00s (0.00%) |Generate time=2.49s (74.65%) |Training time=0.64s (19.35%) |Others=0.20 (6.01%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.42 epoch: 0|step: 
1045|ppo_ep: 1|act_loss: 0.010650634765625|cri_loss: 0.0291595458984375|unsuper_loss: 0.0 average reward score: -0.4443359375 ------------------------------------------------------------------------------------- |E2E latency=3.45s |Gather latency=0.00s (0.00%) |Generate time=2.48s (71.90%) |Training time=0.78s (22.52%) |Others=0.19 (5.57%)|CurSamplesPerSec=2.32 |AvgSamplesPerSec=2.42 epoch: 0|step: 1046|ppo_ep: 1|act_loss: 0.23681640625|cri_loss: 0.171630859375|unsuper_loss: 0.0 average reward score: -1.328125 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.26%) |Training time=0.64s (19.68%) |Others=0.20 (6.06%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42 epoch: 0|step: 1047|ppo_ep: 1|act_loss: -0.225341796875|cri_loss: -0.0672607421875|unsuper_loss: 0.0 average reward score: -0.943359375 ------------------------------------------------------------------------------------- |E2E latency=3.62s |Gather latency=0.00s (0.00%) |Generate time=2.40s (66.44%) |Training time=0.93s (25.73%) |Others=0.28 (7.83%)|CurSamplesPerSec=2.21 |AvgSamplesPerSec=2.42 epoch: 0|step: 1048|ppo_ep: 1|act_loss: 0.12255859375|cri_loss: 0.1290283203125|unsuper_loss: 0.0 average reward score: -0.9921875 ------------------------------------------------------------------------------------- |E2E latency=3.38s |Gather latency=0.00s (0.00%) |Generate time=2.50s (73.92%) |Training time=0.67s (19.83%) |Others=0.21 (6.25%)|CurSamplesPerSec=2.37 |AvgSamplesPerSec=2.42 epoch: 0|step: 1049|ppo_ep: 1|act_loss: 0.1632080078125|cri_loss: 0.12255859375|unsuper_loss: 0.0 average reward score: -0.6474609375 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.50%) |Training time=0.64s (19.69%) |Others=0.19 (5.82%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42 epoch: 0|step: 1050|ppo_ep: 
1|act_loss: 0.1365966796875|cri_loss: 0.1063232421875|unsuper_loss: 0.0 average reward score: -0.28955078125 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.33%) |Training time=0.64s (19.69%) |Others=0.19 (5.98%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42 epoch: 0|step: 1051|ppo_ep: 1|act_loss: 0.188232421875|cri_loss: 0.135498046875|unsuper_loss: 0.0 average reward score: -0.61669921875 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.43%) |Training time=0.64s (19.46%) |Others=0.20 (6.10%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.42 epoch: 0|step: 1052|ppo_ep: 1|act_loss: -0.1796875|cri_loss: -0.0345458984375|unsuper_loss: 0.0 average reward score: 0.25390625 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.25%) |Training time=0.64s (19.64%) |Others=0.20 (6.11%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42 epoch: 0|step: 1053|ppo_ep: 1|act_loss: -0.045654296875|cri_loss: 0.04248046875|unsuper_loss: 0.0 average reward score: -0.41943359375 ------------------------------------------------------------------------------------- |E2E latency=3.35s |Gather latency=0.00s (0.00%) |Generate time=2.42s (72.16%) |Training time=0.74s (21.96%) |Others=0.20 (5.87%)|CurSamplesPerSec=2.38 |AvgSamplesPerSec=2.42 epoch: 0|step: 1054|ppo_ep: 1|act_loss: 0.7138671875|cri_loss: 0.462890625|unsuper_loss: 0.0 average reward score: -0.4755859375 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.33s (71.00%) |Training time=0.75s (22.91%) |Others=0.20 (6.08%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.42 epoch: 0|step: 1055|ppo_ep: 1|act_loss: 
0.06597900390625|cri_loss: 0.09503173828125|unsuper_loss: 0.0 average reward score: -0.8779296875 ------------------------------------------------------------------------------------- |E2E latency=3.64s |Gather latency=0.00s (0.00%) |Generate time=2.43s (66.58%) |Training time=0.93s (25.39%) |Others=0.29 (8.03%)|CurSamplesPerSec=2.20 |AvgSamplesPerSec=2.42 epoch: 0|step: 1056|ppo_ep: 1|act_loss: -0.0633544921875|cri_loss: 0.032958984375|unsuper_loss: 0.0 average reward score: -0.18115234375 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.39s (72.66%) |Training time=0.71s (21.52%) |Others=0.19 (5.83%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.42 epoch: 0|step: 1057|ppo_ep: 1|act_loss: 0.403076171875|cri_loss: 0.252197265625|unsuper_loss: 0.0 average reward score: -1.064453125 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.39s (72.03%) |Training time=0.73s (22.09%) |Others=0.19 (5.88%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.42 epoch: 0|step: 1058|ppo_ep: 1|act_loss: 0.50830078125|cri_loss: 0.306640625|unsuper_loss: 0.0 average reward score: -2.37890625 ------------------------------------------------------------------------------------- |E2E latency=3.33s |Gather latency=0.00s (0.00%) |Generate time=2.42s (72.65%) |Training time=0.72s (21.54%) |Others=0.19 (5.81%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.42 epoch: 0|step: 1059|ppo_ep: 1|act_loss: 0.228271484375|cri_loss: 0.193115234375|unsuper_loss: 0.0 average reward score: -1.115234375 ------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.39s (72.38%) |Training time=0.72s (21.70%) |Others=0.20 (5.91%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.42 epoch: 0|step: 1060|ppo_ep: 1|act_loss: 0.5517578125|cri_loss: 
0.363525390625|unsuper_loss: 0.0 average reward score: -0.98046875 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.37s (73.22%) |Training time=0.67s (20.62%) |Others=0.20 (6.16%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42 epoch: 0|step: 1061|ppo_ep: 1|act_loss: 0.466064453125|cri_loss: 0.29345703125|unsuper_loss: 0.0 average reward score: -0.2109375 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.21%) |Training time=0.65s (19.73%) |Others=0.20 (6.05%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.42 epoch: 0|step: 1062|ppo_ep: 1|act_loss: 0.034423828125|cri_loss: 0.08319091796875|unsuper_loss: 0.0 average reward score: -0.388671875 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.22%) |Training time=0.64s (19.66%) |Others=0.20 (6.11%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42 epoch: 0|step: 1063|ppo_ep: 1|act_loss: 0.1324462890625|cri_loss: 0.1075439453125|unsuper_loss: 0.0 average reward score: -1.171875 ------------------------------------------------------------------------------------- |E2E latency=3.63s |Gather latency=0.00s (0.00%) |Generate time=2.41s (66.26%) |Training time=0.94s (25.81%) |Others=0.29 (7.92%)|CurSamplesPerSec=2.20 |AvgSamplesPerSec=2.42 epoch: 0|step: 1064|ppo_ep: 1|act_loss: 0.102783203125|cri_loss: 0.0875244140625|unsuper_loss: 0.0 average reward score: -0.287109375 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.69%) |Training time=0.63s (19.32%) |Others=0.20 (5.98%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.42 epoch: 0|step: 1065|ppo_ep: 1|act_loss: -0.19677734375|cri_loss: -0.001220703125|unsuper_loss: 
0.0 average reward score: 0.517578125 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.46%) |Training time=0.64s (19.49%) |Others=0.20 (6.05%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.42 epoch: 0|step: 1066|ppo_ep: 1|act_loss: 0.2081298828125|cri_loss: 0.1317138671875|unsuper_loss: 0.0 average reward score: 0.775390625 ------------------------------------------------------------------------------------- |E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.61s (74.43%) |Training time=0.66s (18.72%) |Others=0.24 (6.85%)|CurSamplesPerSec=2.28 |AvgSamplesPerSec=2.42 epoch: 0|step: 1067|ppo_ep: 1|act_loss: 0.8076171875|cri_loss: 0.55224609375|unsuper_loss: 0.0 average reward score: -0.11962890625 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.44%) |Training time=0.64s (19.67%) |Others=0.19 (5.88%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42 epoch: 0|step: 1068|ppo_ep: 1|act_loss: 0.6298828125|cri_loss: 0.408203125|unsuper_loss: 0.0 average reward score: -0.48583984375 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.22%) |Training time=0.64s (19.75%) |Others=0.20 (6.03%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42 epoch: 0|step: 1069|ppo_ep: 1|act_loss: -0.158447265625|cri_loss: -0.04486083984375|unsuper_loss: 0.0 average reward score: -0.595703125 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.39%) |Training time=0.64s (19.75%) |Others=0.19 (5.85%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42 epoch: 0|step: 1070|ppo_ep: 1|act_loss: 0.8662109375|cri_loss: 0.5458984375|unsuper_loss: 0.0 average reward score: 
-0.64794921875 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.30%) |Training time=0.65s (19.83%) |Others=0.19 (5.87%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.42 epoch: 0|step: 1071|ppo_ep: 1|act_loss: 0.1593017578125|cri_loss: 0.176025390625|unsuper_loss: 0.0 average reward score: -0.58935546875 ------------------------------------------------------------------------------------- |E2E latency=3.62s |Gather latency=0.00s (0.00%) |Generate time=2.41s (66.66%) |Training time=0.93s (25.66%) |Others=0.28 (7.68%)|CurSamplesPerSec=2.21 |AvgSamplesPerSec=2.42 epoch: 0|step: 1072|ppo_ep: 1|act_loss: 0.2646484375|cri_loss: 0.2100830078125|unsuper_loss: 0.0 average reward score: -0.72314453125 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.38s (73.97%) |Training time=0.64s (19.82%) |Others=0.20 (6.20%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.42 epoch: 0|step: 1073|ppo_ep: 1|act_loss: -0.0361328125|cri_loss: 0.02679443359375|unsuper_loss: 0.0 average reward score: 0.51123046875 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.36%) |Training time=0.65s (19.69%) |Others=0.19 (5.94%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.42 epoch: 0|step: 1074|ppo_ep: 1|act_loss: 0.197509765625|cri_loss: 0.151123046875|unsuper_loss: 0.0 average reward score: 0.01247406005859375 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.82%) |Training time=0.64s (19.43%) |Others=0.19 (5.75%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.42 epoch: 0|step: 1075|ppo_ep: 1|act_loss: 0.169921875|cri_loss: 0.11505126953125|unsuper_loss: 0.0 average reward score: -0.822265625 
------------------------------------------------------------------------------------- |E2E latency=3.19s |Gather latency=0.00s (0.00%) |Generate time=2.36s (73.89%) |Training time=0.64s (20.04%) |Others=0.19 (6.07%)|CurSamplesPerSec=2.51 |AvgSamplesPerSec=2.42 epoch: 0|step: 1076|ppo_ep: 1|act_loss: -0.0633544921875|cri_loss: 0.020751953125|unsuper_loss: 0.0 average reward score: 0.875 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.37%) |Training time=0.64s (19.69%) |Others=0.19 (5.95%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42 epoch: 0|step: 1077|ppo_ep: 1|act_loss: -0.0679931640625|cri_loss: 0.0140380859375|unsuper_loss: 0.0 average reward score: 0.595703125 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.04%) |Training time=0.66s (20.14%) |Others=0.19 (5.82%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42 epoch: 0|step: 1078|ppo_ep: 1|act_loss: 0.08673095703125|cri_loss: 0.09515380859375|unsuper_loss: 0.0 average reward score: 0.04443359375 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.37s (73.80%) |Training time=0.65s (20.07%) |Others=0.20 (6.13%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.42 epoch: 0|step: 1079|ppo_ep: 1|act_loss: 0.06939697265625|cri_loss: 0.076171875|unsuper_loss: 0.0 average reward score: 0.062255859375 ------------------------------------------------------------------------------------- |E2E latency=3.58s |Gather latency=0.00s (0.00%) |Generate time=2.38s (66.41%) |Training time=0.93s (25.89%) |Others=0.28 (7.70%)|CurSamplesPerSec=2.23 |AvgSamplesPerSec=2.42 epoch: 0|step: 1080|ppo_ep: 1|act_loss: -0.1629638671875|cri_loss: 0.003173828125|unsuper_loss: 0.0 average reward score: 0.0980224609375 
------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.37s (73.75%) |Training time=0.64s (19.83%) |Others=0.21 (6.42%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.42 epoch: 0|step: 1081|ppo_ep: 1|act_loss: -0.3193359375|cri_loss: -0.078125|unsuper_loss: 0.0 average reward score: -0.070068359375 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.37s (73.91%) |Training time=0.64s (20.08%) |Others=0.19 (6.01%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.42 epoch: 0|step: 1082|ppo_ep: 1|act_loss: 0.2626953125|cri_loss: 0.1929931640625|unsuper_loss: 0.0 average reward score: 0.2132568359375 ------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.40s (72.72%) |Training time=0.71s (21.53%) |Others=0.19 (5.75%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.42 epoch: 0|step: 1083|ppo_ep: 1|act_loss: 0.04071044921875|cri_loss: 0.06036376953125|unsuper_loss: 0.0 average reward score: -0.76953125 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.40s (72.40%) |Training time=0.72s (21.73%) |Others=0.19 (5.87%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.42 epoch: 0|step: 1084|ppo_ep: 1|act_loss: -0.0311279296875|cri_loss: 0.015594482421875|unsuper_loss: 0.0 average reward score: -0.0050811767578125 ------------------------------------------------------------------------------------- |E2E latency=3.75s |Gather latency=0.00s (0.00%) |Generate time=2.42s (64.61%) |Training time=1.03s (27.38%) |Others=0.30 (8.01%)|CurSamplesPerSec=2.13 |AvgSamplesPerSec=2.42 epoch: 0|step: 1085|ppo_ep: 1|act_loss: 0.08294677734375|cri_loss: 0.0736083984375|unsuper_loss: 0.0 average reward score: 0.2425537109375 
------------------------------------------------------------------------------------- |E2E latency=3.36s |Gather latency=0.00s (0.00%) |Generate time=2.45s (72.80%) |Training time=0.71s (21.10%) |Others=0.21 (6.10%)|CurSamplesPerSec=2.38 |AvgSamplesPerSec=2.42 epoch: 0|step: 1086|ppo_ep: 1|act_loss: 0.29833984375|cri_loss: 0.220947265625|unsuper_loss: 0.0 average reward score: -1.2060546875 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.47%) |Training time=0.64s (19.65%) |Others=0.19 (5.87%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42 epoch: 0|step: 1087|ppo_ep: 1|act_loss: 0.059112548828125|cri_loss: 0.049224853515625|unsuper_loss: 0.0 average reward score: -0.951171875 ------------------------------------------------------------------------------------- |E2E latency=3.62s |Gather latency=0.00s (0.00%) |Generate time=2.42s (66.83%) |Training time=0.92s (25.57%) |Others=0.28 (7.61%)|CurSamplesPerSec=2.21 |AvgSamplesPerSec=2.42 epoch: 0|step: 1088|ppo_ep: 1|act_loss: -0.15576171875|cri_loss: -0.02386474609375|unsuper_loss: 0.0 average reward score: 0.779296875 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.60%) |Training time=0.63s (19.42%) |Others=0.20 (5.98%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42 epoch: 0|step: 1089|ppo_ep: 1|act_loss: -0.13037109375|cri_loss: -0.0040283203125|unsuper_loss: 0.0 average reward score: -0.5439453125 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.45%) |Training time=0.64s (19.67%) |Others=0.19 (5.88%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42 epoch: 0|step: 1090|ppo_ep: 1|act_loss: 0.062255859375|cri_loss: 0.12005615234375|unsuper_loss: 0.0 average reward score: -1.3076171875 
------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.19%) |Training time=0.64s (19.59%) |Others=0.20 (6.22%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42 epoch: 0|step: 1091|ppo_ep: 1|act_loss: -0.095947265625|cri_loss: 0.0361328125|unsuper_loss: 0.0 average reward score: -0.5732421875 ------------------------------------------------------------------------------------- |E2E latency=3.34s |Gather latency=0.00s (0.00%) |Generate time=2.43s (72.75%) |Training time=0.72s (21.55%) |Others=0.19 (5.70%)|CurSamplesPerSec=2.39 |AvgSamplesPerSec=2.42 epoch: 0|step: 1092|ppo_ep: 1|act_loss: 0.2080078125|cri_loss: 0.142333984375|unsuper_loss: 0.0 average reward score: -0.17041015625 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.03%) |Training time=0.64s (19.58%) |Others=0.21 (6.39%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42 epoch: 0|step: 1093|ppo_ep: 1|act_loss: -0.21142578125|cri_loss: -0.0223388671875|unsuper_loss: 0.0 average reward score: -0.05029296875 ------------------------------------------------------------------------------------- |E2E latency=3.40s |Gather latency=0.00s (0.00%) |Generate time=2.44s (71.75%) |Training time=0.77s (22.61%) |Others=0.19 (5.64%)|CurSamplesPerSec=2.35 |AvgSamplesPerSec=2.42 epoch: 0|step: 1094|ppo_ep: 1|act_loss: 0.3984375|cri_loss: 0.251953125|unsuper_loss: 0.0 average reward score: -0.7587890625 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.19%) |Training time=0.64s (19.78%) |Others=0.20 (6.02%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42 epoch: 0|step: 1095|ppo_ep: 1|act_loss: -0.0120849609375|cri_loss: 0.04718017578125|unsuper_loss: 0.0 average reward score: -0.08172607421875 
------------------------------------------------------------------------------------- |E2E latency=3.63s |Gather latency=0.00s (0.00%) |Generate time=2.42s (66.75%) |Training time=0.93s (25.55%) |Others=0.28 (7.70%)|CurSamplesPerSec=2.20 |AvgSamplesPerSec=2.42 epoch: 0|step: 1096|ppo_ep: 1|act_loss: 0.0819091796875|cri_loss: 0.1474609375|unsuper_loss: 0.0 average reward score: -0.38818359375 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.63%) |Training time=0.64s (19.44%) |Others=0.19 (5.93%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.42 epoch: 0|step: 1097|ppo_ep: 1|act_loss: -0.26513671875|cri_loss: -0.07379150390625|unsuper_loss: 0.0 average reward score: -1.177734375 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.25%) |Training time=0.65s (19.81%) |Others=0.19 (5.94%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.42 epoch: 0|step: 1098|ppo_ep: 1|act_loss: -0.1573486328125|cri_loss: 0.0074462890625|unsuper_loss: 0.0 average reward score: 0.301513671875 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.11%) |Training time=0.65s (19.88%) |Others=0.20 (6.01%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42 epoch: 0|step: 1099|ppo_ep: 1|act_loss: -0.071044921875|cri_loss: 0.03851318359375|unsuper_loss: 0.0 average reward score: -0.159423828125 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.35%) |Training time=0.64s (19.76%) |Others=0.19 (5.89%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42 epoch: 0|step: 1100|ppo_ep: 1|act_loss: -0.0557861328125|cri_loss: 0.021240234375|unsuper_loss: 0.0 average reward score: 0.5224609375 
------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.40%) |Training time=0.64s (19.67%) |Others=0.19 (5.93%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42 epoch: 0|step: 1101|ppo_ep: 1|act_loss: 0.12890625|cri_loss: 0.0909423828125|unsuper_loss: 0.0 average reward score: -0.26220703125 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.42%) |Training time=0.64s (19.67%) |Others=0.19 (5.91%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42 epoch: 0|step: 1102|ppo_ep: 1|act_loss: -0.20703125|cri_loss: -0.05889892578125|unsuper_loss: 0.0 average reward score: 0.30029296875 ------------------------------------------------------------------------------------- |E2E latency=6.23s |Gather latency=0.00s (0.00%) |Generate time=2.41s (38.71%) |Training time=3.35s (53.70%) |Others=0.47 (7.59%)|CurSamplesPerSec=1.28 |AvgSamplesPerSec=2.42 epoch: 0|step: 1103|ppo_ep: 1|act_loss: 0.017059326171875|cri_loss: 0.0350341796875|unsuper_loss: 0.0 average reward score: -0.5205078125 ------------------------------------------------------------------------------------- |E2E latency=3.77s |Gather latency=0.00s (0.00%) |Generate time=2.55s (67.69%) |Training time=0.93s (24.72%) |Others=0.29 (7.59%)|CurSamplesPerSec=2.12 |AvgSamplesPerSec=2.42 epoch: 0|step: 1104|ppo_ep: 1|act_loss: 0.254150390625|cri_loss: 0.1864013671875|unsuper_loss: 0.0 average reward score: -0.93212890625 ------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.41s (73.07%) |Training time=0.70s (21.20%) |Others=0.19 (5.73%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.42 epoch: 0|step: 1105|ppo_ep: 1|act_loss: 0.2275390625|cri_loss: 0.1416015625|unsuper_loss: 0.0 average reward score: -0.630859375 
------------------------------------------------------------------------------------- |E2E latency=3.34s |Gather latency=0.00s (0.00%) |Generate time=2.41s (72.23%) |Training time=0.73s (21.80%) |Others=0.20 (5.97%)|CurSamplesPerSec=2.39 |AvgSamplesPerSec=2.42 epoch: 0|step: 1106|ppo_ep: 1|act_loss: 0.40673828125|cri_loss: 0.2822265625|unsuper_loss: 0.0 average reward score: -0.89111328125 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.67%) |Training time=0.64s (19.52%) |Others=0.19 (5.82%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.42 epoch: 0|step: 1107|ppo_ep: 1|act_loss: 0.52294921875|cri_loss: 0.3193359375|unsuper_loss: 0.0 average reward score: -0.220458984375 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.25%) |Training time=0.64s (19.76%) |Others=0.19 (6.00%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42 epoch: 0|step: 1108|ppo_ep: 1|act_loss: 0.4296875|cri_loss: 0.26513671875|unsuper_loss: 0.0 average reward score: -0.79248046875 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.38s (73.56%) |Training time=0.66s (20.37%) |Others=0.20 (6.07%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42 epoch: 0|step: 1109|ppo_ep: 1|act_loss: 0.27587890625|cri_loss: 0.19970703125|unsuper_loss: 0.0 average reward score: -0.712890625 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.39s (73.86%) |Training time=0.65s (20.17%) |Others=0.19 (5.97%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42 epoch: 0|step: 1110|ppo_ep: 1|act_loss: 0.160888671875|cri_loss: 0.1080322265625|unsuper_loss: 0.0 average reward score: 0.29443359375 
------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.39s (73.71%) |Training time=0.65s (20.16%) |Others=0.20 (6.13%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42 epoch: 0|step: 1111|ppo_ep: 1|act_loss: 0.42724609375|cri_loss: 0.29150390625|unsuper_loss: 0.0 average reward score: -0.66943359375 ------------------------------------------------------------------------------------- |E2E latency=3.65s |Gather latency=0.00s (0.00%) |Generate time=2.45s (67.12%) |Training time=0.93s (25.43%) |Others=0.27 (7.45%)|CurSamplesPerSec=2.19 |AvgSamplesPerSec=2.42 epoch: 0|step: 1112|ppo_ep: 1|act_loss: 0.2041015625|cri_loss: 0.1195068359375|unsuper_loss: 0.0 average reward score: -0.413330078125 ------------------------------------------------------------------------------------- |E2E latency=3.15s |Gather latency=0.00s (0.00%) |Generate time=2.28s (72.54%) |Training time=0.66s (21.13%) |Others=0.20 (6.33%)|CurSamplesPerSec=2.54 |AvgSamplesPerSec=2.42 epoch: 0|step: 1113|ppo_ep: 1|act_loss: 0.442626953125|cri_loss: 0.269287109375|unsuper_loss: 0.0 average reward score: -0.7890625 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.37s (72.93%) |Training time=0.64s (19.70%) |Others=0.24 (7.37%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42 epoch: 0|step: 1114|ppo_ep: 1|act_loss: 0.912109375|cri_loss: 0.5791015625|unsuper_loss: 0.0 average reward score: -0.1129150390625 ------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.37s (74.02%) |Training time=0.64s (20.03%) |Others=0.19 (5.95%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.42 epoch: 0|step: 1115|ppo_ep: 1|act_loss: 0.1849365234375|cri_loss: 0.134521484375|unsuper_loss: 0.0 average reward score: 0.442138671875 
------------------------------------------------------------------------------------- |E2E latency=3.33s |Gather latency=0.00s (0.00%) |Generate time=2.50s (74.95%) |Training time=0.64s (19.28%) |Others=0.19 (5.77%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.42 epoch: 0|step: 1116|ppo_ep: 1|act_loss: 0.65478515625|cri_loss: 0.41162109375|unsuper_loss: 0.0 average reward score: -0.153564453125 ------------------------------------------------------------------------------------- |E2E latency=3.45s |Gather latency=0.00s (0.00%) |Generate time=2.61s (75.68%) |Training time=0.64s (18.53%) |Others=0.20 (5.78%)|CurSamplesPerSec=2.32 |AvgSamplesPerSec=2.42 epoch: 0|step: 1117|ppo_ep: 1|act_loss: 0.4931640625|cri_loss: 0.306884765625|unsuper_loss: 0.0 average reward score: -0.2161865234375 ------------------------------------------------------------------------------------- |E2E latency=3.45s |Gather latency=0.00s (0.00%) |Generate time=2.38s (69.00%) |Training time=0.88s (25.42%) |Others=0.19 (5.58%)|CurSamplesPerSec=2.32 |AvgSamplesPerSec=2.42 epoch: 0|step: 1118|ppo_ep: 1|act_loss: -0.02996826171875|cri_loss: 0.009521484375|unsuper_loss: 0.0 average reward score: -1.0966796875 ------------------------------------------------------------------------------------- |E2E latency=3.33s |Gather latency=0.00s (0.00%) |Generate time=2.45s (73.79%) |Training time=0.64s (19.29%) |Others=0.23 (6.92%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.42 [2023-04-24 14:49:31,105] [INFO] [logging.py:96:log_dist] [Rank 0] step=140, skipped=5, lr=[9.445152702231402e-06, 9.445152702231402e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-04-24 14:49:31,352] [INFO] [timer.py:199:stop] epoch=0/micro_step=1120/global_step=140, RunningAvgSamplesPerSec=15.457409668682383, CurrSamplesPerSec=14.882384317635507, MemAllocated=20.44GB, MaxMemAllocated=31.45GB [2023-04-24 14:49:31,556] [INFO] [logging.py:96:log_dist] [Rank 0] step=140, skipped=4, lr=[4.887756243017282e-06, 4.887756243017282e-06], mom=[(0.9, 0.95), 
(0.9, 0.95)] epoch: 0|step: 1119|ppo_ep: 1|act_loss: 0.1455078125|cri_loss: 0.096923828125|unsuper_loss: 0.0 average reward score: -0.34130859375 ------------------------------------------------------------------------------------- |E2E latency=3.52s |Gather latency=0.00s (0.00%) |Generate time=2.31s (65.72%) |Training time=0.93s (26.35%) |Others=0.28 (7.93%)|CurSamplesPerSec=2.27 |AvgSamplesPerSec=2.42 epoch: 0|step: 1120|ppo_ep: 1|act_loss: 0.1654052734375|cri_loss: 0.136474609375|unsuper_loss: 0.0 average reward score: 0.33056640625 ------------------------------------------------------------------------------------- |E2E latency=3.18s |Gather latency=0.00s (0.00%) |Generate time=2.34s (73.79%) |Training time=0.64s (20.10%) |Others=0.19 (6.12%)|CurSamplesPerSec=2.52 |AvgSamplesPerSec=2.42 epoch: 0|step: 1121|ppo_ep: 1|act_loss: 0.1552734375|cri_loss: 0.187255859375|unsuper_loss: 0.0 average reward score: -0.73876953125 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.38s (72.36%) |Training time=0.65s (19.85%) |Others=0.26 (7.79%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.42 epoch: 0|step: 1122|ppo_ep: 1|act_loss: 0.282470703125|cri_loss: 0.1728515625|unsuper_loss: 0.0 average reward score: 0.044189453125 ------------------------------------------------------------------------------------- |E2E latency=3.38s |Gather latency=0.00s (0.00%) |Generate time=2.38s (70.53%) |Training time=0.80s (23.67%) |Others=0.20 (5.80%)|CurSamplesPerSec=2.37 |AvgSamplesPerSec=2.42 epoch: 0|step: 1123|ppo_ep: 1|act_loss: 0.1177978515625|cri_loss: 0.14990234375|unsuper_loss: 0.0 average reward score: -0.22705078125 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.39s (72.08%) |Training time=0.74s (22.15%) |Others=0.19 (5.77%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.42 epoch: 
0|step: 1124|ppo_ep: 1|act_loss: 0.39013671875|cri_loss: 0.26416015625|unsuper_loss: 0.0 average reward score: 0.4501953125 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.32s (71.09%) |Training time=0.75s (22.91%) |Others=0.20 (6.00%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42 epoch: 0|step: 1125|ppo_ep: 1|act_loss: 0.708984375|cri_loss: 0.46875|unsuper_loss: 0.0 average reward score: -0.9462890625 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.30%) |Training time=0.64s (19.69%) |Others=0.20 (6.01%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42 epoch: 0|step: 1126|ppo_ep: 1|act_loss: 0.775390625|cri_loss: 0.515625|unsuper_loss: 0.0 average reward score: -0.8271484375 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.21%) |Training time=0.64s (19.79%) |Others=0.19 (5.99%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42 epoch: 0|step: 1127|ppo_ep: 1|act_loss: 0.2861328125|cri_loss: 0.2021484375|unsuper_loss: 0.0 average reward score: -0.6201171875 ------------------------------------------------------------------------------------- |E2E latency=3.61s |Gather latency=0.00s (0.00%) |Generate time=2.40s (66.41%) |Training time=0.93s (25.59%) |Others=0.29 (8.00%)|CurSamplesPerSec=2.21 |AvgSamplesPerSec=2.42 epoch: 0|step: 1128|ppo_ep: 1|act_loss: 0.1746826171875|cri_loss: 0.148193359375|unsuper_loss: 0.0 average reward score: -0.99462890625 ------------------------------------------------------------------------------------- |E2E latency=3.19s |Gather latency=0.00s (0.00%) |Generate time=2.36s (73.86%) |Training time=0.64s (20.13%) |Others=0.19 (6.01%)|CurSamplesPerSec=2.51 |AvgSamplesPerSec=2.42 epoch: 0|step: 1129|ppo_ep: 1|act_loss: 
0.1387939453125|cri_loss: 0.088134765625|unsuper_loss: 0.0 average reward score: 0.4912109375 ------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.41s (73.12%) |Training time=0.64s (19.48%) |Others=0.24 (7.39%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.42 epoch: 0|step: 1130|ppo_ep: 1|act_loss: 0.271484375|cri_loss: 0.1982421875|unsuper_loss: 0.0 average reward score: -0.447509765625 ------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.36s (73.56%) |Training time=0.65s (20.31%) |Others=0.20 (6.13%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.42 epoch: 0|step: 1131|ppo_ep: 1|act_loss: -0.27294921875|cri_loss: -0.0482177734375|unsuper_loss: 0.0 average reward score: -1.0078125 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.23%) |Training time=0.64s (19.86%) |Others=0.19 (5.91%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.42 epoch: 0|step: 1132|ppo_ep: 1|act_loss: 0.05291748046875|cri_loss: 0.06982421875|unsuper_loss: 0.0 average reward score: -0.314453125 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.06%) |Training time=0.64s (19.89%) |Others=0.19 (6.05%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.42 epoch: 0|step: 1133|ppo_ep: 1|act_loss: 0.08380126953125|cri_loss: 0.0916748046875|unsuper_loss: 0.0 average reward score: -0.34716796875 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.38s (74.18%) |Training time=0.64s (19.84%) |Others=0.19 (5.98%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.42 epoch: 0|step: 1134|ppo_ep: 1|act_loss: -0.19384765625|cri_loss: 
-0.04364013671875|unsuper_loss: 0.0 average reward score: 0.2203369140625 ------------------------------------------------------------------------------------- |E2E latency=3.18s |Gather latency=0.00s (0.00%) |Generate time=2.34s (73.68%) |Training time=0.65s (20.31%) |Others=0.19 (6.01%)|CurSamplesPerSec=2.51 |AvgSamplesPerSec=2.42 epoch: 0|step: 1135|ppo_ep: 1|act_loss: 0.2293701171875|cri_loss: 0.159423828125|unsuper_loss: 0.0 average reward score: -0.17041015625 ------------------------------------------------------------------------------------- |E2E latency=3.58s |Gather latency=0.00s (0.00%) |Generate time=2.36s (66.10%) |Training time=0.93s (25.95%) |Others=0.28 (7.95%)|CurSamplesPerSec=2.24 |AvgSamplesPerSec=2.42 epoch: 0|step: 1136|ppo_ep: 1|act_loss: -0.15380859375|cri_loss: -0.02508544921875|unsuper_loss: 0.0 average reward score: 0.135986328125 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.37s (73.66%) |Training time=0.65s (20.24%) |Others=0.20 (6.10%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.42 epoch: 0|step: 1137|ppo_ep: 1|act_loss: -0.06292724609375|cri_loss: 0.00262451171875|unsuper_loss: 0.0 average reward score: 0.8798828125 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.44s (73.61%) |Training time=0.68s (20.47%) |Others=0.20 (5.93%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.42 epoch: 0|step: 1138|ppo_ep: 1|act_loss: -0.41015625|cri_loss: -0.12109375|unsuper_loss: 0.0 average reward score: -0.219482421875 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.23%) |Training time=0.64s (19.84%) |Others=0.19 (5.93%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42 epoch: 0|step: 1139|ppo_ep: 1|act_loss: 0.01141357421875|cri_loss: 
0.05853271484375|unsuper_loss: 0.0 average reward score: 0.2303466796875 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.05%) |Training time=0.64s (19.27%) |Others=0.22 (6.68%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.42 epoch: 0|step: 1140|ppo_ep: 1|act_loss: 0.1318359375|cri_loss: 0.07958984375|unsuper_loss: 0.0 average reward score: -0.0609130859375 ------------------------------------------------------------------------------------- |E2E latency=3.34s |Gather latency=0.00s (0.00%) |Generate time=2.50s (74.84%) |Training time=0.64s (19.09%) |Others=0.20 (6.07%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.42 epoch: 0|step: 1141|ppo_ep: 1|act_loss: -0.136962890625|cri_loss: -0.02166748046875|unsuper_loss: 0.0 average reward score: -0.6513671875 ------------------------------------------------------------------------------------- |E2E latency=3.19s |Gather latency=0.00s (0.00%) |Generate time=2.30s (71.92%) |Training time=0.69s (21.58%) |Others=0.21 (6.51%)|CurSamplesPerSec=2.51 |AvgSamplesPerSec=2.42 epoch: 0|step: 1142|ppo_ep: 1|act_loss: -0.2056884765625|cri_loss: -0.03125|unsuper_loss: 0.0 average reward score: -1.4052734375 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.12%) |Training time=0.64s (19.85%) |Others=0.19 (6.03%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.42 epoch: 0|step: 1143|ppo_ep: 1|act_loss: -0.217041015625|cri_loss: -0.0130615234375|unsuper_loss: 0.0 average reward score: -0.0411376953125 ------------------------------------------------------------------------------------- |E2E latency=3.59s |Gather latency=0.00s (0.00%) |Generate time=2.39s (66.45%) |Training time=0.93s (25.81%) |Others=0.28 (7.74%)|CurSamplesPerSec=2.23 |AvgSamplesPerSec=2.42 epoch: 0|step: 1144|ppo_ep: 1|act_loss: -0.099365234375|cri_loss: 
0.077392578125|unsuper_loss: 0.0 average reward score: -1.048828125 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.38s (74.10%) |Training time=0.64s (19.96%) |Others=0.19 (5.93%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.42 epoch: 0|step: 1145|ppo_ep: 1|act_loss: 0.56201171875|cri_loss: 0.33984375|unsuper_loss: 0.0 average reward score: -0.38623046875 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.40s (73.09%) |Training time=0.69s (21.08%) |Others=0.19 (5.83%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.42 epoch: 0|step: 1146|ppo_ep: 1|act_loss: 0.11474609375|cri_loss: 0.11676025390625|unsuper_loss: 0.0 average reward score: 1.0517578125 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.38s (72.72%) |Training time=0.70s (21.37%) |Others=0.19 (5.91%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42 epoch: 0|step: 1147|ppo_ep: 1|act_loss: 0.0550537109375|cri_loss: 0.06304931640625|unsuper_loss: 0.0 average reward score: 0.96044921875 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.38s (74.07%) |Training time=0.64s (19.93%) |Others=0.19 (6.00%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.42 epoch: 0|step: 1148|ppo_ep: 1|act_loss: 0.1282958984375|cri_loss: 0.095458984375|unsuper_loss: 0.0 average reward score: -1.0 ------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.36s (73.71%) |Training time=0.64s (20.14%) |Others=0.20 (6.15%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.42 epoch: 0|step: 1149|ppo_ep: 1|act_loss: 0.7177734375|cri_loss: 0.41162109375|unsuper_loss: 0.0 
average reward score: -0.435302734375 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.16%) |Training time=0.64s (19.87%) |Others=0.19 (5.98%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.42 epoch: 0|step: 1150|ppo_ep: 1|act_loss: 0.260009765625|cri_loss: 0.1728515625|unsuper_loss: 0.0 average reward score: -0.7958984375 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.16%) |Training time=0.65s (19.81%) |Others=0.20 (6.03%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42 epoch: 0|step: 1151|ppo_ep: 1|act_loss: -0.171875|cri_loss: 0.002197265625|unsuper_loss: 0.0 average reward score: -0.71484375 ------------------------------------------------------------------------------------- |E2E latency=3.64s |Gather latency=0.00s (0.00%) |Generate time=2.41s (66.18%) |Training time=0.95s (26.09%) |Others=0.28 (7.73%)|CurSamplesPerSec=2.20 |AvgSamplesPerSec=2.42 epoch: 0|step: 1152|ppo_ep: 1|act_loss: 0.357421875|cri_loss: 0.2198486328125|unsuper_loss: 0.0 average reward score: 0.295166015625 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.30%) |Training time=0.64s (19.62%) |Others=0.20 (6.09%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42 epoch: 0|step: 1153|ppo_ep: 1|act_loss: 0.73779296875|cri_loss: 0.457275390625|unsuper_loss: 0.0 average reward score: 0.75390625 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.35s (72.14%) |Training time=0.71s (21.95%) |Others=0.19 (5.91%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42 epoch: 0|step: 1154|ppo_ep: 1|act_loss: 0.47265625|cri_loss: 0.27783203125|unsuper_loss: 0.0 average reward score: 0.34814453125 
------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.37s (71.77%) |Training time=0.71s (21.34%) |Others=0.23 (6.89%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.42 epoch: 0|step: 1155|ppo_ep: 1|act_loss: 0.52783203125|cri_loss: 0.332275390625|unsuper_loss: 0.0 average reward score: 0.3193359375 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.40s (73.36%) |Training time=0.68s (20.74%) |Others=0.19 (5.90%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.42 epoch: 0|step: 1156|ppo_ep: 1|act_loss: 0.50244140625|cri_loss: 0.3095703125|unsuper_loss: 0.0 average reward score: 0.1566162109375 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.30s (70.24%) |Training time=0.78s (23.71%) |Others=0.20 (6.06%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.42 epoch: 0|step: 1157|ppo_ep: 1|act_loss: 0.29931640625|cri_loss: 0.186279296875|unsuper_loss: 0.0 average reward score: -0.4677734375 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.60%) |Training time=0.64s (19.55%) |Others=0.19 (5.84%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.42 epoch: 0|step: 1158|ppo_ep: 1|act_loss: 0.216796875|cri_loss: 0.1478271484375|unsuper_loss: 0.0 average reward score: -0.1005859375 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.36s (72.45%) |Training time=0.70s (21.45%) |Others=0.20 (6.10%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42 epoch: 0|step: 1159|ppo_ep: 1|act_loss: 0.5556640625|cri_loss: 0.328857421875|unsuper_loss: 0.0 average reward score: -0.8720703125 
------------------------------------------------------------------------------------- |E2E latency=3.61s |Gather latency=0.00s (0.00%) |Generate time=2.37s (65.81%) |Training time=0.96s (26.57%) |Others=0.28 (7.62%)|CurSamplesPerSec=2.22 |AvgSamplesPerSec=2.42 epoch: 0|step: 1160|ppo_ep: 1|act_loss: -0.0172119140625|cri_loss: 0.020111083984375|unsuper_loss: 0.0 average reward score: 1.12109375 ------------------------------------------------------------------------------------- |E2E latency=3.37s |Gather latency=0.00s (0.00%) |Generate time=2.53s (75.07%) |Training time=0.64s (18.96%) |Others=0.20 (5.97%)|CurSamplesPerSec=2.38 |AvgSamplesPerSec=2.42 epoch: 0|step: 1161|ppo_ep: 1|act_loss: 0.921875|cri_loss: 0.5390625|unsuper_loss: 0.0 average reward score: -0.10400390625 ------------------------------------------------------------------------------------- |E2E latency=3.36s |Gather latency=0.00s (0.00%) |Generate time=2.53s (75.21%) |Training time=0.64s (19.00%) |Others=0.19 (5.79%)|CurSamplesPerSec=2.38 |AvgSamplesPerSec=2.42 epoch: 0|step: 1162|ppo_ep: 1|act_loss: 0.8203125|cri_loss: 0.51416015625|unsuper_loss: 0.0 average reward score: 0.3076171875 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.26%) |Training time=0.64s (19.71%) |Others=0.20 (6.03%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42 epoch: 0|step: 1163|ppo_ep: 1|act_loss: 0.7587890625|cri_loss: 0.4677734375|unsuper_loss: 0.0 average reward score: -0.310546875 ------------------------------------------------------------------------------------- |E2E latency=3.17s |Gather latency=0.00s (0.00%) |Generate time=2.34s (73.78%) |Training time=0.64s (20.13%) |Others=0.19 (6.09%)|CurSamplesPerSec=2.52 |AvgSamplesPerSec=2.42 epoch: 0|step: 1164|ppo_ep: 1|act_loss: 0.5341796875|cri_loss: 0.3427734375|unsuper_loss: 0.0 average reward score: 0.6142578125 
------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.10%) |Training time=0.64s (19.87%) |Others=0.19 (6.03%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.42 epoch: 0|step: 1165|ppo_ep: 1|act_loss: 0.206787109375|cri_loss: 0.16064453125|unsuper_loss: 0.0 average reward score: 0.9189453125 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.34%) |Training time=0.64s (19.73%) |Others=0.19 (5.93%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.42 epoch: 0|step: 1166|ppo_ep: 1|act_loss: 0.46240234375|cri_loss: 0.31787109375|unsuper_loss: 0.0 average reward score: 0.71630859375 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.38s (74.11%) |Training time=0.64s (19.79%) |Others=0.20 (6.10%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.42 epoch: 0|step: 1167|ppo_ep: 1|act_loss: 0.75|cri_loss: 0.447265625|unsuper_loss: 0.0 average reward score: -0.42724609375 ------------------------------------------------------------------------------------- |E2E latency=3.67s |Gather latency=0.00s (0.00%) |Generate time=2.40s (65.41%) |Training time=0.99s (26.99%) |Others=0.28 (7.60%)|CurSamplesPerSec=2.18 |AvgSamplesPerSec=2.42 epoch: 0|step: 1168|ppo_ep: 1|act_loss: 0.5693359375|cri_loss: 0.3505859375|unsuper_loss: 0.0 average reward score: -0.05712890625 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.74%) |Training time=0.64s (19.39%) |Others=0.19 (5.86%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.42 epoch: 0|step: 1169|ppo_ep: 1|act_loss: 0.44091796875|cri_loss: 0.275390625|unsuper_loss: 0.0 average reward score: -0.70361328125 
------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.32s (72.02%) |Training time=0.71s (21.99%) |Others=0.19 (5.99%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.42 epoch: 0|step: 1170|ppo_ep: 1|act_loss: 0.11273193359375|cri_loss: 0.136474609375|unsuper_loss: 0.0 average reward score: -0.440185546875 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.38s (73.95%) |Training time=0.64s (20.02%) |Others=0.19 (6.03%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.42 epoch: 0|step: 1171|ppo_ep: 1|act_loss: 0.078125|cri_loss: 0.1209716796875|unsuper_loss: 0.0 average reward score: 0.72412109375 ------------------------------------------------------------------------------------- |E2E latency=3.16s |Gather latency=0.00s (0.00%) |Generate time=2.32s (73.38%) |Training time=0.65s (20.55%) |Others=0.19 (6.07%)|CurSamplesPerSec=2.54 |AvgSamplesPerSec=2.42 epoch: 0|step: 1172|ppo_ep: 1|act_loss: 0.0416259765625|cri_loss: 0.0404052734375|unsuper_loss: 0.0 average reward score: 1.615234375 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.48s (74.59%) |Training time=0.64s (19.39%) |Others=0.20 (6.02%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.42 epoch: 0|step: 1173|ppo_ep: 1|act_loss: 0.51318359375|cri_loss: 0.320068359375|unsuper_loss: 0.0 average reward score: -0.137939453125 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.38s (73.91%) |Training time=0.64s (19.86%) |Others=0.20 (6.23%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.42 epoch: 0|step: 1174|ppo_ep: 1|act_loss: 0.47509765625|cri_loss: 0.2958984375|unsuper_loss: 0.0 average reward score: -0.105712890625 
------------------------------------------------------------------------------------- |E2E latency=6.57s |Gather latency=0.00s (0.00%) |Generate time=3.82s (58.25%) |Training time=2.28s (34.78%) |Others=0.46 (6.98%)|CurSamplesPerSec=1.22 |AvgSamplesPerSec=2.41 epoch: 0|step: 1175|ppo_ep: 1|act_loss: 0.1522216796875|cri_loss: 0.1036376953125|unsuper_loss: 0.0 average reward score: 0.6435546875 ------------------------------------------------------------------------------------- |E2E latency=3.74s |Gather latency=0.00s (0.00%) |Generate time=2.53s (67.69%) |Training time=0.93s (24.89%) |Others=0.28 (7.42%)|CurSamplesPerSec=2.14 |AvgSamplesPerSec=2.41 epoch: 0|step: 1176|ppo_ep: 1|act_loss: 0.2105712890625|cri_loss: 0.1212158203125|unsuper_loss: 0.0 average reward score: 0.64501953125 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.32%) |Training time=0.64s (19.74%) |Others=0.19 (5.95%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.41 epoch: 0|step: 1177|ppo_ep: 1|act_loss: -0.07806396484375|cri_loss: -0.00738525390625|unsuper_loss: 0.0 average reward score: 1.08984375 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.15%) |Training time=0.64s (19.75%) |Others=0.20 (6.10%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 1178|ppo_ep: 1|act_loss: -0.211181640625|cri_loss: -0.0025634765625|unsuper_loss: 0.0 average reward score: 1.2412109375 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.43%) |Training time=0.64s (19.60%) |Others=0.19 (5.97%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.41 epoch: 0|step: 1179|ppo_ep: 1|act_loss: 0.1905517578125|cri_loss: 0.2137451171875|unsuper_loss: 0.0 average reward score: -0.59716796875 
------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.12%) |Training time=0.65s (19.92%) |Others=0.19 (5.96%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 1180|ppo_ep: 1|act_loss: 0.1766357421875|cri_loss: 0.1278076171875|unsuper_loss: 0.0 average reward score: -0.1767578125 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.39s (73.81%) |Training time=0.65s (19.95%) |Others=0.20 (6.24%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 1181|ppo_ep: 1|act_loss: 0.05340576171875|cri_loss: 0.0731201171875|unsuper_loss: 0.0 average reward score: 0.955078125 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.17%) |Training time=0.64s (19.82%) |Others=0.19 (6.00%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 1182|ppo_ep: 1|act_loss: 0.359130859375|cri_loss: 0.23828125|unsuper_loss: 0.0 average reward score: 1.7109375 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.14%) |Training time=0.64s (19.82%) |Others=0.20 (6.04%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 1183|ppo_ep: 1|act_loss: 0.1776123046875|cri_loss: 0.15478515625|unsuper_loss: 0.0 average reward score: 1.796875 ------------------------------------------------------------------------------------- |E2E latency=3.62s |Gather latency=0.00s (0.00%) |Generate time=2.41s (66.67%) |Training time=0.93s (25.58%) |Others=0.28 (7.74%)|CurSamplesPerSec=2.21 |AvgSamplesPerSec=2.41 epoch: 0|step: 1184|ppo_ep: 1|act_loss: 0.121826171875|cri_loss: 0.1298828125|unsuper_loss: 0.0 average reward score: 1.234375 
------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (73.98%) |Training time=0.65s (19.99%) |Others=0.20 (6.03%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 1185|ppo_ep: 1|act_loss: -0.062255859375|cri_loss: 0.03314208984375|unsuper_loss: 0.0 average reward score: 1.1005859375 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.06%) |Training time=0.65s (19.88%) |Others=0.20 (6.06%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 1186|ppo_ep: 1|act_loss: -0.09228515625|cri_loss: 0.044677734375|unsuper_loss: 0.0 average reward score: 1.87109375 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.40s (73.77%) |Training time=0.66s (20.17%) |Others=0.20 (6.06%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 1187|ppo_ep: 1|act_loss: 0.08746337890625|cri_loss: 0.0673828125|unsuper_loss: 0.0 average reward score: -0.2373046875 ------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.37s (74.03%) |Training time=0.64s (19.98%) |Others=0.19 (5.99%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.41 epoch: 0|step: 1188|ppo_ep: 1|act_loss: -0.09918212890625|cri_loss: 0.00860595703125|unsuper_loss: 0.0 average reward score: 0.97607421875 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.12%) |Training time=0.64s (19.76%) |Others=0.20 (6.11%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 1189|ppo_ep: 1|act_loss: -0.153076171875|cri_loss: -0.02484130859375|unsuper_loss: 0.0 average reward score: 1.662109375 
------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.18%) |Training time=0.64s (19.71%) |Others=0.20 (6.11%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 1190|ppo_ep: 1|act_loss: 0.138427734375|cri_loss: 0.1448974609375|unsuper_loss: 0.0 average reward score: 0.56689453125 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.24%) |Training time=0.64s (19.66%) |Others=0.20 (6.10%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 1191|ppo_ep: 1|act_loss: -0.111572265625|cri_loss: 0.01904296875|unsuper_loss: 0.0 average reward score: -0.6240234375 ------------------------------------------------------------------------------------- |E2E latency=5.62s |Gather latency=0.00s (0.00%) |Generate time=3.33s (59.17%) |Training time=1.76s (31.32%) |Others=0.53 (9.51%)|CurSamplesPerSec=1.42 |AvgSamplesPerSec=2.41 epoch: 0|step: 1192|ppo_ep: 1|act_loss: 0.124755859375|cri_loss: 0.11614990234375|unsuper_loss: 0.0 average reward score: 0.7373046875 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.80%) |Training time=0.64s (19.41%) |Others=0.19 (5.79%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.41 epoch: 0|step: 1193|ppo_ep: 1|act_loss: -0.023681640625|cri_loss: 0.05108642578125|unsuper_loss: 0.0 average reward score: -0.921875 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.51%) |Training time=0.64s (19.57%) |Others=0.19 (5.92%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.41 epoch: 0|step: 1194|ppo_ep: 1|act_loss: -0.0020751953125|cri_loss: 0.10888671875|unsuper_loss: 0.0 average reward score: 2.83984375 
-------------------------------------------------------------------------------------
|E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.48%) |Training time=0.64s (19.47%) |Others=0.20 (6.05%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.41
epoch: 0|step: 1195|ppo_ep: 1|act_loss: -0.45263671875|cri_loss: -0.1241455078125|unsuper_loss: 0.0
average reward score: 1.7197265625
-------------------------------------------------------------------------------------
|E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.20%) |Training time=0.64s (19.76%) |Others=0.20 (6.05%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41
epoch: 0|step: 1196|ppo_ep: 1|act_loss: -0.0831298828125|cri_loss: 0.031494140625|unsuper_loss: 0.0
average reward score: 1.94140625
-------------------------------------------------------------------------------------
|E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.40s (72.69%) |Training time=0.65s (19.59%) |Others=0.25 (7.71%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.41
epoch: 0|step: 1197|ppo_ep: 1|act_loss: 0.146728515625|cri_loss: 0.1553955078125|unsuper_loss: 0.0
average reward score: 2.51171875
-------------------------------------------------------------------------------------
|E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.30%) |Training time=0.64s (19.79%) |Others=0.19 (5.91%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41
epoch: 0|step: 1198|ppo_ep: 1|act_loss: 0.10418701171875|cri_loss: 0.11004638671875|unsuper_loss: 0.0
average reward score: 2.076171875
-------------------------------------------------------------------------------------
|E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.01%) |Training time=0.64s (19.84%) |Others=0.20 (6.15%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.41
[2023-04-24 14:54:00,244] [INFO] [logging.py:96:log_dist] [Rank 0] step=150, skipped=5, lr=[9.312950172135383e-06, 9.312950172135383e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-04-24 14:54:00,490] [INFO] [timer.py:199:stop] epoch=0/micro_step=1200/global_step=150, RunningAvgSamplesPerSec=15.419399550768803, CurrSamplesPerSec=15.855440166613704, MemAllocated=20.44GB, MaxMemAllocated=31.45GB
[2023-04-24 14:54:00,694] [INFO] [logging.py:96:log_dist] [Rank 0] step=150, skipped=4, lr=[4.817611704345344e-06, 4.817611704345344e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 1199|ppo_ep: 1|act_loss: 0.04278564453125|cri_loss: 0.042816162109375|unsuper_loss: 0.0
average reward score: 0.22216796875
-------------------------------------------------------------------------------------
|E2E latency=3.60s |Gather latency=0.00s (0.00%) |Generate time=2.40s (66.63%) |Training time=0.92s (25.63%) |Others=0.28 (7.74%)|CurSamplesPerSec=2.22 |AvgSamplesPerSec=2.41
epoch: 0|step: 1200|ppo_ep: 1|act_loss: 0.1865234375|cri_loss: 0.1552734375|unsuper_loss: 0.0
average reward score: 1.4609375
-------------------------------------------------------------------------------------
|E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.35%) |Training time=0.63s (19.73%) |Others=0.19 (5.92%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.41
epoch: 0|step: 1201|ppo_ep: 1|act_loss: -0.2279052734375|cri_loss: -0.021240234375|unsuper_loss: 0.0
average reward score: 1.6806640625
-------------------------------------------------------------------------------------
|E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.10%) |Training time=0.65s (19.99%) |Others=0.19 (5.91%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41
epoch: 0|step: 1202|ppo_ep: 1|act_loss: 0.0743408203125|cri_loss: 0.10906982421875|unsuper_loss: 0.0
average reward score: 1.02734375
-------------------------------------------------------------------------------------
|E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.16%) |Training time=0.64s (19.70%) |Others=0.20 (6.13%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41
epoch: 0|step:
1203|ppo_ep: 1|act_loss: 0.0897216796875|cri_loss: 0.09466552734375|unsuper_loss: 0.0 average reward score: 0.83349609375 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.38s (74.11%) |Training time=0.64s (19.84%) |Others=0.19 (6.05%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.41 epoch: 0|step: 1204|ppo_ep: 1|act_loss: 0.1131591796875|cri_loss: 0.15478515625|unsuper_loss: 0.0 average reward score: 3.1328125 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.48%) |Training time=0.64s (19.56%) |Others=0.19 (5.96%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.41 epoch: 0|step: 1205|ppo_ep: 1|act_loss: 0.133544921875|cri_loss: 0.11004638671875|unsuper_loss: 0.0 average reward score: 2.765625 ------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.37s (74.08%) |Training time=0.64s (19.96%) |Others=0.19 (5.96%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.41 epoch: 0|step: 1206|ppo_ep: 1|act_loss: -0.0712890625|cri_loss: 0.0631103515625|unsuper_loss: 0.0 average reward score: 3.056640625 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.37s (73.78%) |Training time=0.65s (20.14%) |Others=0.20 (6.08%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.41 epoch: 0|step: 1207|ppo_ep: 1|act_loss: 0.1953125|cri_loss: 0.147705078125|unsuper_loss: 0.0 average reward score: 1.19921875 ------------------------------------------------------------------------------------- |E2E latency=3.63s |Gather latency=0.00s (0.00%) |Generate time=2.43s (66.89%) |Training time=0.92s (25.43%) |Others=0.28 (7.68%)|CurSamplesPerSec=2.20 |AvgSamplesPerSec=2.41 epoch: 0|step: 1208|ppo_ep: 1|act_loss: 
0.62060546875|cri_loss: 0.3896484375|unsuper_loss: 0.0 average reward score: 4.0703125 ------------------------------------------------------------------------------------- |E2E latency=3.13s |Gather latency=0.00s (0.00%) |Generate time=2.28s (72.86%) |Training time=0.66s (21.00%) |Others=0.19 (6.14%)|CurSamplesPerSec=2.56 |AvgSamplesPerSec=2.41 epoch: 0|step: 1209|ppo_ep: 1|act_loss: 0.3349609375|cri_loss: 0.2261962890625|unsuper_loss: 0.0 average reward score: 0.66796875 ------------------------------------------------------------------------------------- |E2E latency=7.96s |Gather latency=0.00s (0.00%) |Generate time=5.30s (66.54%) |Training time=1.97s (24.81%) |Others=0.69 (8.64%)|CurSamplesPerSec=1.01 |AvgSamplesPerSec=2.41 epoch: 0|step: 1210|ppo_ep: 1|act_loss: 0.533203125|cri_loss: 0.325927734375|unsuper_loss: 0.0 average reward score: 1.1728515625 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.34s (73.00%) |Training time=0.66s (20.54%) |Others=0.21 (6.46%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.41 epoch: 0|step: 1211|ppo_ep: 1|act_loss: 0.47021484375|cri_loss: 0.267822265625|unsuper_loss: 0.0 average reward score: -0.0029296875 ------------------------------------------------------------------------------------- |E2E latency=3.19s |Gather latency=0.00s (0.00%) |Generate time=2.36s (74.01%) |Training time=0.64s (19.98%) |Others=0.19 (6.01%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.41 epoch: 0|step: 1212|ppo_ep: 1|act_loss: 0.37939453125|cri_loss: 0.271240234375|unsuper_loss: 0.0 average reward score: 0.7861328125 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.30%) |Training time=0.64s (19.68%) |Others=0.20 (6.02%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 1213|ppo_ep: 1|act_loss: 0.3681640625|cri_loss: 
0.263916015625|unsuper_loss: 0.0 average reward score: 1.794921875 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.33%) |Training time=0.64s (19.81%) |Others=0.19 (5.86%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 1214|ppo_ep: 1|act_loss: 0.5166015625|cri_loss: 0.304931640625|unsuper_loss: 0.0 average reward score: -1.51953125 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.31%) |Training time=0.64s (19.67%) |Others=0.20 (6.02%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 1215|ppo_ep: 1|act_loss: 0.54150390625|cri_loss: 0.354736328125|unsuper_loss: 0.0 average reward score: 0.7421875 ------------------------------------------------------------------------------------- |E2E latency=3.62s |Gather latency=0.00s (0.00%) |Generate time=2.41s (66.66%) |Training time=0.93s (25.64%) |Others=0.28 (7.70%)|CurSamplesPerSec=2.21 |AvgSamplesPerSec=2.41 epoch: 0|step: 1216|ppo_ep: 1|act_loss: 0.74365234375|cri_loss: 0.5087890625|unsuper_loss: 0.0 average reward score: 2.708984375 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.42%) |Training time=0.63s (19.65%) |Others=0.19 (5.93%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.41 epoch: 0|step: 1217|ppo_ep: 1|act_loss: 0.247314453125|cri_loss: 0.290771484375|unsuper_loss: 0.0 average reward score: 2.765625 ------------------------------------------------------------------------------------- |E2E latency=3.33s |Gather latency=0.00s (0.00%) |Generate time=2.50s (75.04%) |Training time=0.64s (19.26%) |Others=0.19 (5.70%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.41 epoch: 0|step: 1218|ppo_ep: 1|act_loss: 0.29345703125|cri_loss: 0.260498046875|unsuper_loss: 0.0 average 
reward score: 1.6689453125 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.48s (74.97%) |Training time=0.64s (19.31%) |Others=0.19 (5.72%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.41 epoch: 0|step: 1219|ppo_ep: 1|act_loss: 0.1710205078125|cri_loss: 0.1348876953125|unsuper_loss: 0.0 average reward score: 2.58984375 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.37s (73.80%) |Training time=0.64s (19.98%) |Others=0.20 (6.22%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.41 epoch: 0|step: 1220|ppo_ep: 1|act_loss: 0.1768798828125|cri_loss: 0.11639404296875|unsuper_loss: 0.0 average reward score: 1.986328125 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.40s (73.90%) |Training time=0.65s (19.90%) |Others=0.20 (6.20%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 1221|ppo_ep: 1|act_loss: -0.0068359375|cri_loss: 0.066162109375|unsuper_loss: 0.0 average reward score: 1.51171875 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.19%) |Training time=0.64s (19.85%) |Others=0.19 (5.96%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 1222|ppo_ep: 1|act_loss: 0.958984375|cri_loss: 0.6298828125|unsuper_loss: 0.0 average reward score: 3.93359375 ------------------------------------------------------------------------------------- |E2E latency=3.14s |Gather latency=0.00s (0.00%) |Generate time=2.30s (73.15%) |Training time=0.64s (20.40%) |Others=0.20 (6.45%)|CurSamplesPerSec=2.55 |AvgSamplesPerSec=2.41 epoch: 0|step: 1223|ppo_ep: 1|act_loss: 0.13134765625|cri_loss: 0.105224609375|unsuper_loss: 0.0 average reward score: 0.8857421875 
------------------------------------------------------------------------------------- |E2E latency=3.60s |Gather latency=0.00s (0.00%) |Generate time=2.40s (66.58%) |Training time=0.93s (25.83%) |Others=0.27 (7.59%)|CurSamplesPerSec=2.22 |AvgSamplesPerSec=2.41 epoch: 0|step: 1224|ppo_ep: 1|act_loss: 0.6904296875|cri_loss: 0.49072265625|unsuper_loss: 0.0 average reward score: 3.375 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.40%) |Training time=0.64s (19.70%) |Others=0.19 (5.90%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.41 epoch: 0|step: 1225|ppo_ep: 1|act_loss: 0.02020263671875|cri_loss: 0.0693359375|unsuper_loss: 0.0 average reward score: 0.9609375 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.62%) |Training time=0.64s (19.58%) |Others=0.19 (5.80%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.41 epoch: 0|step: 1226|ppo_ep: 1|act_loss: 0.043243408203125|cri_loss: 0.1031494140625|unsuper_loss: 0.0 average reward score: 2.56640625 ------------------------------------------------------------------------------------- |E2E latency=4.71s |Gather latency=0.00s (0.00%) |Generate time=3.60s (76.53%) |Training time=0.85s (18.03%) |Others=0.26 (5.44%)|CurSamplesPerSec=1.70 |AvgSamplesPerSec=2.41 epoch: 0|step: 1227|ppo_ep: 1|act_loss: 0.329833984375|cri_loss: 0.259521484375|unsuper_loss: 0.0 average reward score: 0.86279296875 ------------------------------------------------------------------------------------- |E2E latency=3.19s |Gather latency=0.00s (0.00%) |Generate time=2.35s (73.49%) |Training time=0.65s (20.31%) |Others=0.20 (6.19%)|CurSamplesPerSec=2.51 |AvgSamplesPerSec=2.41 epoch: 0|step: 1228|ppo_ep: 1|act_loss: 0.232177734375|cri_loss: 0.26806640625|unsuper_loss: 0.0 average reward score: 2.4609375 
------------------------------------------------------------------------------------- |E2E latency=3.34s |Gather latency=0.00s (0.00%) |Generate time=2.46s (73.61%) |Training time=0.66s (19.81%) |Others=0.22 (6.58%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.41 epoch: 0|step: 1229|ppo_ep: 1|act_loss: 0.11956787109375|cri_loss: 0.093017578125|unsuper_loss: 0.0 average reward score: -0.318115234375 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.36%) |Training time=0.65s (19.68%) |Others=0.20 (5.96%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.41 epoch: 0|step: 1230|ppo_ep: 1|act_loss: -0.1724853515625|cri_loss: -0.0274658203125|unsuper_loss: 0.0 average reward score: 0.927734375 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (73.94%) |Training time=0.65s (19.96%) |Others=0.20 (6.11%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 1231|ppo_ep: 1|act_loss: 0.078857421875|cri_loss: 0.11566162109375|unsuper_loss: 0.0 average reward score: 2.626953125 ------------------------------------------------------------------------------------- |E2E latency=3.60s |Gather latency=0.00s (0.00%) |Generate time=2.38s (66.09%) |Training time=0.94s (26.02%) |Others=0.28 (7.89%)|CurSamplesPerSec=2.22 |AvgSamplesPerSec=2.41 epoch: 0|step: 1232|ppo_ep: 1|act_loss: -0.482421875|cri_loss: -0.11865234375|unsuper_loss: 0.0 average reward score: 2.26953125 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.23%) |Training time=0.66s (19.78%) |Others=0.20 (5.99%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.41 epoch: 0|step: 1233|ppo_ep: 1|act_loss: -0.48095703125|cri_loss: -0.111083984375|unsuper_loss: 0.0 average reward score: 3.33203125 
------------------------------------------------------------------------------------- |E2E latency=3.16s |Gather latency=0.00s (0.00%) |Generate time=2.30s (72.88%) |Training time=0.66s (20.86%) |Others=0.20 (6.27%)|CurSamplesPerSec=2.54 |AvgSamplesPerSec=2.41 epoch: 0|step: 1234|ppo_ep: 1|act_loss: -0.05572509765625|cri_loss: 0.01983642578125|unsuper_loss: 0.0 average reward score: 2.171875 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.48s (74.53%) |Training time=0.65s (19.47%) |Others=0.20 (6.00%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.41 epoch: 0|step: 1235|ppo_ep: 1|act_loss: 0.09197998046875|cri_loss: 0.15234375|unsuper_loss: 0.0 average reward score: 2.6640625 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.46%) |Training time=0.64s (19.64%) |Others=0.19 (5.90%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 1236|ppo_ep: 1|act_loss: -0.30859375|cri_loss: -0.085205078125|unsuper_loss: 0.0 average reward score: 2.8046875 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.39s (73.95%) |Training time=0.64s (19.93%) |Others=0.20 (6.12%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 1237|ppo_ep: 1|act_loss: 0.0277099609375|cri_loss: 0.0911865234375|unsuper_loss: 0.0 average reward score: 2.580078125 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.11%) |Training time=0.64s (19.68%) |Others=0.20 (6.21%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.41 epoch: 0|step: 1238|ppo_ep: 1|act_loss: -0.345703125|cri_loss: -0.0704345703125|unsuper_loss: 0.0 average reward score: 3.70703125 
------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.06%) |Training time=0.64s (19.58%) |Others=0.21 (6.36%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.41 epoch: 0|step: 1239|ppo_ep: 1|act_loss: -0.06689453125|cri_loss: 0.0692138671875|unsuper_loss: 0.0 average reward score: 2.68359375 ------------------------------------------------------------------------------------- |E2E latency=3.62s |Gather latency=0.00s (0.00%) |Generate time=2.42s (66.87%) |Training time=0.93s (25.58%) |Others=0.27 (7.55%)|CurSamplesPerSec=2.21 |AvgSamplesPerSec=2.41 epoch: 0|step: 1240|ppo_ep: 1|act_loss: -0.0849609375|cri_loss: 0.0701904296875|unsuper_loss: 0.0 average reward score: 4.3125 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.37s (73.68%) |Training time=0.63s (19.63%) |Others=0.21 (6.69%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.41 epoch: 0|step: 1241|ppo_ep: 1|act_loss: -0.4501953125|cri_loss: -0.1348876953125|unsuper_loss: 0.0 average reward score: 2.83984375 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.37s (73.93%) |Training time=0.64s (19.96%) |Others=0.20 (6.12%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.41 epoch: 0|step: 1242|ppo_ep: 1|act_loss: 0.1427001953125|cri_loss: 0.1143798828125|unsuper_loss: 0.0 average reward score: 1.7734375 ------------------------------------------------------------------------------------- |E2E latency=3.17s |Gather latency=0.00s (0.00%) |Generate time=2.33s (73.29%) |Training time=0.64s (20.15%) |Others=0.21 (6.57%)|CurSamplesPerSec=2.52 |AvgSamplesPerSec=2.41 epoch: 0|step: 1243|ppo_ep: 1|act_loss: -0.599609375|cri_loss: -0.14599609375|unsuper_loss: 0.0 average reward score: 3.6953125 
------------------------------------------------------------------------------------- |E2E latency=3.15s |Gather latency=0.00s (0.00%) |Generate time=2.32s (73.63%) |Training time=0.64s (20.21%) |Others=0.19 (6.17%)|CurSamplesPerSec=2.54 |AvgSamplesPerSec=2.41 epoch: 0|step: 1244|ppo_ep: 1|act_loss: -0.5068359375|cri_loss: -0.12939453125|unsuper_loss: 0.0 average reward score: 3.6484375 ------------------------------------------------------------------------------------- |E2E latency=3.19s |Gather latency=0.00s (0.00%) |Generate time=2.34s (73.24%) |Training time=0.65s (20.30%) |Others=0.21 (6.46%)|CurSamplesPerSec=2.51 |AvgSamplesPerSec=2.41 epoch: 0|step: 1245|ppo_ep: 1|act_loss: -0.0205078125|cri_loss: 0.03094482421875|unsuper_loss: 0.0 average reward score: 1.36328125 ------------------------------------------------------------------------------------- |E2E latency=3.33s |Gather latency=0.00s (0.00%) |Generate time=2.49s (74.91%) |Training time=0.64s (19.28%) |Others=0.19 (5.81%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.41 epoch: 0|step: 1246|ppo_ep: 1|act_loss: -0.21435546875|cri_loss: -0.044189453125|unsuper_loss: 0.0 average reward score: 1.984375 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.04%) |Training time=0.65s (19.81%) |Others=0.20 (6.15%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.41 epoch: 0|step: 1247|ppo_ep: 1|act_loss: -0.53466796875|cri_loss: -0.121826171875|unsuper_loss: 0.0 average reward score: 2.439453125 ------------------------------------------------------------------------------------- |E2E latency=3.64s |Gather latency=0.00s (0.00%) |Generate time=2.37s (65.18%) |Training time=0.99s (27.17%) |Others=0.28 (7.65%)|CurSamplesPerSec=2.20 |AvgSamplesPerSec=2.41 epoch: 0|step: 1248|ppo_ep: 1|act_loss: -0.1634521484375|cri_loss: 0.0169677734375|unsuper_loss: 0.0 average reward score: 2.4609375 
------------------------------------------------------------------------------------- |E2E latency=3.14s |Gather latency=0.00s (0.00%) |Generate time=2.30s (73.41%) |Training time=0.64s (20.34%) |Others=0.20 (6.25%)|CurSamplesPerSec=2.55 |AvgSamplesPerSec=2.41 epoch: 0|step: 1249|ppo_ep: 1|act_loss: -0.32275390625|cri_loss: -0.08154296875|unsuper_loss: 0.0 average reward score: 3.123046875 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.37s (73.75%) |Training time=0.65s (20.14%) |Others=0.20 (6.11%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.41 epoch: 0|step: 1250|ppo_ep: 1|act_loss: -0.326171875|cri_loss: -0.0740966796875|unsuper_loss: 0.0 average reward score: 3.9765625 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.26%) |Training time=0.64s (19.79%) |Others=0.19 (5.95%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 1251|ppo_ep: 1|act_loss: -0.49609375|cri_loss: -0.1636962890625|unsuper_loss: 0.0 average reward score: 2.74609375 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.41s (73.47%) |Training time=0.64s (19.55%) |Others=0.23 (6.98%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.41 epoch: 0|step: 1252|ppo_ep: 1|act_loss: 0.0101318359375|cri_loss: 0.08148193359375|unsuper_loss: 0.0 average reward score: 1.5791015625 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.38%) |Training time=0.65s (19.80%) |Others=0.19 (5.82%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.41 epoch: 0|step: 1253|ppo_ep: 1|act_loss: -0.420654296875|cri_loss: -0.122314453125|unsuper_loss: 0.0 average reward score: 3.373046875 
------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.39s (73.63%) |Training time=0.66s (20.49%) |Others=0.19 (5.88%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 1254|ppo_ep: 1|act_loss: -0.073486328125|cri_loss: 0.013916015625|unsuper_loss: 0.0 average reward score: 1.8212890625 ------------------------------------------------------------------------------------- |E2E latency=3.15s |Gather latency=0.00s (0.00%) |Generate time=2.31s (73.29%) |Training time=0.64s (20.33%) |Others=0.20 (6.37%)|CurSamplesPerSec=2.54 |AvgSamplesPerSec=2.41 epoch: 0|step: 1255|ppo_ep: 1|act_loss: -0.2412109375|cri_loss: -0.058349609375|unsuper_loss: 0.0 average reward score: 1.7734375 ------------------------------------------------------------------------------------- |E2E latency=3.53s |Gather latency=0.00s (0.00%) |Generate time=2.31s (65.64%) |Training time=0.93s (26.48%) |Others=0.28 (7.88%)|CurSamplesPerSec=2.27 |AvgSamplesPerSec=2.41 epoch: 0|step: 1256|ppo_ep: 1|act_loss: 0.238037109375|cri_loss: 0.2073974609375|unsuper_loss: 0.0 average reward score: 3.0390625 ------------------------------------------------------------------------------------- |E2E latency=3.17s |Gather latency=0.00s (0.00%) |Generate time=2.34s (73.90%) |Training time=0.64s (20.09%) |Others=0.19 (6.01%)|CurSamplesPerSec=2.52 |AvgSamplesPerSec=2.41 epoch: 0|step: 1257|ppo_ep: 1|act_loss: -0.33056640625|cri_loss: -0.0784912109375|unsuper_loss: 0.0 average reward score: 2.896484375 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.63%) |Training time=0.64s (19.43%) |Others=0.19 (5.93%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.41 epoch: 0|step: 1258|ppo_ep: 1|act_loss: -0.0921630859375|cri_loss: 0.0506591796875|unsuper_loss: 0.0 average reward score: 2.828125 
------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.34s (73.30%) |Training time=0.65s (20.30%) |Others=0.20 (6.40%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.41 epoch: 0|step: 1259|ppo_ep: 1|act_loss: -0.1304931640625|cri_loss: 0.01171875|unsuper_loss: 0.0 average reward score: 2.283203125 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.40s (73.69%) |Training time=0.65s (19.93%) |Others=0.21 (6.37%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 1260|ppo_ep: 1|act_loss: 0.1636962890625|cri_loss: 0.155029296875|unsuper_loss: 0.0 average reward score: 3.82421875 ------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.42s (73.51%) |Training time=0.65s (19.83%) |Others=0.22 (6.66%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.41 epoch: 0|step: 1261|ppo_ep: 1|act_loss: -0.2103271484375|cri_loss: -0.016845703125|unsuper_loss: 0.0 average reward score: 3.13671875 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.37s (73.87%) |Training time=0.64s (19.98%) |Others=0.20 (6.15%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.41 epoch: 0|step: 1262|ppo_ep: 1|act_loss: -0.286376953125|cri_loss: -0.0535888671875|unsuper_loss: 0.0 average reward score: 3.802734375 ------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.37s (73.87%) |Training time=0.64s (19.91%) |Others=0.20 (6.22%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.41 epoch: 0|step: 1263|ppo_ep: 1|act_loss: 0.20947265625|cri_loss: 0.189697265625|unsuper_loss: 0.0 average reward score: 3.84375 
------------------------------------------------------------------------------------- |E2E latency=3.66s |Gather latency=0.00s (0.00%) |Generate time=2.46s (67.21%) |Training time=0.92s (25.29%) |Others=0.27 (7.49%)|CurSamplesPerSec=2.19 |AvgSamplesPerSec=2.41 epoch: 0|step: 1264|ppo_ep: 1|act_loss: 0.1578369140625|cri_loss: 0.1531982421875|unsuper_loss: 0.0 average reward score: 2.89453125 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.32%) |Training time=0.64s (19.72%) |Others=0.19 (5.96%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.41 epoch: 0|step: 1265|ppo_ep: 1|act_loss: 0.015106201171875|cri_loss: 0.0655517578125|unsuper_loss: 0.0 average reward score: 3.3203125 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.56%) |Training time=0.64s (19.56%) |Others=0.19 (5.88%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.41 epoch: 0|step: 1266|ppo_ep: 1|act_loss: 0.264892578125|cri_loss: 0.18359375|unsuper_loss: 0.0 average reward score: 3.375 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.38s (72.64%) |Training time=0.70s (21.48%) |Others=0.19 (5.87%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.41 epoch: 0|step: 1267|ppo_ep: 1|act_loss: -0.1148681640625|cri_loss: 0.011962890625|unsuper_loss: 0.0 average reward score: 2.4453125 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.37s (73.85%) |Training time=0.65s (20.23%) |Others=0.19 (5.91%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.41 epoch: 0|step: 1268|ppo_ep: 1|act_loss: -0.020751953125|cri_loss: 0.0958251953125|unsuper_loss: 0.0 average reward score: 3.5703125 
------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.36s (73.62%) |Training time=0.65s (20.27%) |Others=0.20 (6.11%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.41 epoch: 0|step: 1269|ppo_ep: 1|act_loss: 0.75390625|cri_loss: 0.5859375|unsuper_loss: 0.0 average reward score: 3.74609375 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.38s (73.84%) |Training time=0.65s (20.30%) |Others=0.19 (5.86%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.41 epoch: 0|step: 1270|ppo_ep: 1|act_loss: 0.41455078125|cri_loss: 0.27392578125|unsuper_loss: 0.0 average reward score: 2.3671875 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.36s (73.38%) |Training time=0.66s (20.43%) |Others=0.20 (6.20%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.41 epoch: 0|step: 1271|ppo_ep: 1|act_loss: 0.513671875|cri_loss: 0.362060546875|unsuper_loss: 0.0 average reward score: 3.728515625 ------------------------------------------------------------------------------------- |E2E latency=3.58s |Gather latency=0.00s (0.00%) |Generate time=2.37s (66.27%) |Training time=0.93s (25.92%) |Others=0.28 (7.82%)|CurSamplesPerSec=2.23 |AvgSamplesPerSec=2.41 epoch: 0|step: 1272|ppo_ep: 1|act_loss: 0.2568359375|cri_loss: 0.1820068359375|unsuper_loss: 0.0 average reward score: 2.06640625 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.38s (73.33%) |Training time=0.67s (20.66%) |Others=0.19 (6.01%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 1273|ppo_ep: 1|act_loss: 0.7783203125|cri_loss: 0.52099609375|unsuper_loss: 0.0 average reward score: 3.009765625 
------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.39s (73.25%) |Training time=0.68s (20.81%) |Others=0.19 (5.94%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 1274|ppo_ep: 1|act_loss: -0.05194091796875|cri_loss: 0.02581787109375|unsuper_loss: 0.0 average reward score: 1.50390625 ------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.37s (74.07%) |Training time=0.64s (19.92%) |Others=0.19 (6.01%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.41 epoch: 0|step: 1275|ppo_ep: 1|act_loss: 0.400146484375|cri_loss: 0.331298828125|unsuper_loss: 0.0 average reward score: 2.49609375 ------------------------------------------------------------------------------------- |E2E latency=3.18s |Gather latency=0.00s (0.00%) |Generate time=2.35s (73.92%) |Training time=0.64s (20.05%) |Others=0.19 (6.03%)|CurSamplesPerSec=2.51 |AvgSamplesPerSec=2.41 epoch: 0|step: 1276|ppo_ep: 1|act_loss: 0.017333984375|cri_loss: 0.102294921875|unsuper_loss: 0.0 average reward score: 3.984375 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.13%) |Training time=0.64s (19.88%) |Others=0.19 (5.99%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.41 epoch: 0|step: 1277|ppo_ep: 1|act_loss: 0.53076171875|cri_loss: 0.3662109375|unsuper_loss: 0.0 average reward score: 2.53515625 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.33%) |Training time=0.64s (19.74%) |Others=0.19 (5.93%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 1278|ppo_ep: 1|act_loss: 0.39697265625|cri_loss: 0.2734375|unsuper_loss: 0.0 average reward score: 2.443359375 
-------------------------------------------------------------------------------------
|E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.10%) |Training time=0.64s (19.86%) |Others=0.20 (6.04%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41
[2023-04-24 14:58:28,774] [INFO] [logging.py:96:log_dist] [Rank 0] step=160, skipped=5, lr=[9.149435021664706e-06, 9.149435021664706e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-04-24 14:58:29,021] [INFO] [timer.py:199:stop] epoch=0/micro_step=1280/global_step=160, RunningAvgSamplesPerSec=15.431357125373724, CurrSamplesPerSec=15.868760222958555, MemAllocated=20.44GB, MaxMemAllocated=31.45GB
[2023-04-24 14:58:29,222] [INFO] [logging.py:96:log_dist] [Rank 0] step=160, skipped=4, lr=[4.731297089649704e-06, 4.731297089649704e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 1279|ppo_ep: 1|act_loss: 0.49658203125|cri_loss: 0.32568359375|unsuper_loss: 0.0
average reward score: 0.8037109375
-------------------------------------------------------------------------------------
|E2E latency=3.61s |Gather latency=0.00s (0.00%) |Generate time=2.41s (66.73%) |Training time=0.92s (25.61%) |Others=0.28 (7.66%)|CurSamplesPerSec=2.22 |AvgSamplesPerSec=2.41
epoch: 0|step: 1280|ppo_ep: 1|act_loss: -0.26708984375|cri_loss: -0.0509033203125|unsuper_loss: 0.0
average reward score: 3.640625
-------------------------------------------------------------------------------------
|E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.37%) |Training time=0.64s (19.73%) |Others=0.19 (5.90%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41
epoch: 0|step: 1281|ppo_ep: 1|act_loss: -0.2138671875|cri_loss: -0.028564453125|unsuper_loss: 0.0
average reward score: 4.1484375
-------------------------------------------------------------------------------------
|E2E latency=3.18s |Gather latency=0.00s (0.00%) |Generate time=2.34s (73.68%) |Training time=0.64s (20.10%) |Others=0.20 (6.22%)|CurSamplesPerSec=2.52 |AvgSamplesPerSec=2.41
epoch: 0|step: 1282|ppo_ep: 1|act_loss: 0.1337890625|cri_loss: 0.11810302734375|unsuper_loss: 0.0 average reward score: 2.54296875 ------------------------------------------------------------------------------------- |E2E latency=3.14s |Gather latency=0.00s (0.00%) |Generate time=2.31s (73.42%) |Training time=0.64s (20.38%) |Others=0.19 (6.20%)|CurSamplesPerSec=2.54 |AvgSamplesPerSec=2.41 epoch: 0|step: 1283|ppo_ep: 1|act_loss: 0.344482421875|cri_loss: 0.27099609375|unsuper_loss: 0.0 average reward score: 2.560546875 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.53%) |Training time=0.64s (19.51%) |Others=0.20 (5.95%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.41 epoch: 0|step: 1284|ppo_ep: 1|act_loss: 0.345458984375|cri_loss: 0.2392578125|unsuper_loss: 0.0 average reward score: 1.5517578125 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.08%) |Training time=0.64s (19.96%) |Others=0.19 (5.95%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.41 epoch: 0|step: 1285|ppo_ep: 1|act_loss: 0.1328125|cri_loss: 0.11297607421875|unsuper_loss: 0.0 average reward score: 2.09375 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.39s (73.79%) |Training time=0.65s (20.24%) |Others=0.19 (5.98%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 1286|ppo_ep: 1|act_loss: -0.01318359375|cri_loss: 0.06805419921875|unsuper_loss: 0.0 average reward score: 2.70703125 ------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.37s (74.06%) |Training time=0.64s (20.01%) |Others=0.19 (5.93%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.41 epoch: 0|step: 1287|ppo_ep: 1|act_loss: 
0.0982666015625|cri_loss: 0.123046875|unsuper_loss: 0.0 average reward score: 4.50390625 ------------------------------------------------------------------------------------- |E2E latency=3.64s |Gather latency=0.00s (0.00%) |Generate time=2.44s (66.93%) |Training time=0.93s (25.53%) |Others=0.27 (7.54%)|CurSamplesPerSec=2.20 |AvgSamplesPerSec=2.41 epoch: 0|step: 1288|ppo_ep: 1|act_loss: -0.0672607421875|cri_loss: 0.0791015625|unsuper_loss: 0.0 average reward score: 1.2392578125 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.59%) |Training time=0.63s (19.60%) |Others=0.19 (5.82%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 1289|ppo_ep: 1|act_loss: -0.24658203125|cri_loss: -0.0537109375|unsuper_loss: 0.0 average reward score: 3.328125 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.34%) |Training time=0.64s (19.72%) |Others=0.19 (5.94%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 1290|ppo_ep: 1|act_loss: -0.157470703125|cri_loss: -0.035400390625|unsuper_loss: 0.0 average reward score: 2.27734375 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.29%) |Training time=0.64s (19.71%) |Others=0.19 (6.00%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.41 epoch: 0|step: 1291|ppo_ep: 1|act_loss: -0.229248046875|cri_loss: -0.035888671875|unsuper_loss: 0.0 average reward score: 3.09765625 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.28%) |Training time=0.64s (19.81%) |Others=0.19 (5.91%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 1292|ppo_ep: 1|act_loss: -0.07550048828125|cri_loss: 
0.00360107421875|unsuper_loss: 0.0 average reward score: 3.375 ------------------------------------------------------------------------------------- |E2E latency=3.18s |Gather latency=0.00s (0.00%) |Generate time=2.35s (73.76%) |Training time=0.64s (20.05%) |Others=0.20 (6.19%)|CurSamplesPerSec=2.51 |AvgSamplesPerSec=2.41 epoch: 0|step: 1293|ppo_ep: 1|act_loss: -0.151611328125|cri_loss: 0.05322265625|unsuper_loss: 0.0 average reward score: 3.09375 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.12%) |Training time=0.64s (19.91%) |Others=0.19 (5.97%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.41 epoch: 0|step: 1294|ppo_ep: 1|act_loss: -0.02069091796875|cri_loss: 0.040435791015625|unsuper_loss: 0.0 average reward score: 2.96484375 ------------------------------------------------------------------------------------- |E2E latency=3.34s |Gather latency=0.00s (0.00%) |Generate time=2.49s (74.51%) |Training time=0.65s (19.35%) |Others=0.21 (6.14%)|CurSamplesPerSec=2.39 |AvgSamplesPerSec=2.41 epoch: 0|step: 1295|ppo_ep: 1|act_loss: -0.046142578125|cri_loss: 0.05950927734375|unsuper_loss: 0.0 average reward score: 2.9453125 ------------------------------------------------------------------------------------- |E2E latency=3.66s |Gather latency=0.00s (0.00%) |Generate time=2.46s (67.18%) |Training time=0.92s (25.20%) |Others=0.28 (7.62%)|CurSamplesPerSec=2.19 |AvgSamplesPerSec=2.41 epoch: 0|step: 1296|ppo_ep: 1|act_loss: -0.066650390625|cri_loss: 0.048828125|unsuper_loss: 0.0 average reward score: 2.859375 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.38s (74.16%) |Training time=0.64s (19.85%) |Others=0.19 (6.00%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.41 epoch: 0|step: 1297|ppo_ep: 1|act_loss: -0.146728515625|cri_loss: 0.0048828125|unsuper_loss: 0.0 average 
reward score: 2.76171875 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.44%) |Training time=0.64s (19.69%) |Others=0.19 (5.87%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.41 epoch: 0|step: 1298|ppo_ep: 1|act_loss: -0.210205078125|cri_loss: -0.0416259765625|unsuper_loss: 0.0 average reward score: 3.0 ------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.44%) |Training time=0.64s (19.49%) |Others=0.20 (6.07%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.41 epoch: 0|step: 1299|ppo_ep: 1|act_loss: -0.07318115234375|cri_loss: 0.0118408203125|unsuper_loss: 0.0 average reward score: 3.3125 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.48s (74.77%) |Training time=0.64s (19.40%) |Others=0.19 (5.82%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.41 epoch: 0|step: 1300|ppo_ep: 1|act_loss: -0.0948486328125|cri_loss: 0.026123046875|unsuper_loss: 0.0 average reward score: 2.859375 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.32%) |Training time=0.64s (19.66%) |Others=0.19 (6.02%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 1301|ppo_ep: 1|act_loss: -0.29345703125|cri_loss: -0.08843994140625|unsuper_loss: 0.0 average reward score: 2.505859375 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.23%) |Training time=0.65s (19.86%) |Others=0.19 (5.90%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 1302|ppo_ep: 1|act_loss: -0.090576171875|cri_loss: -0.02130126953125|unsuper_loss: 0.0 average reward score: 1.822265625 
------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.19%) |Training time=0.64s (19.71%) |Others=0.20 (6.09%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 1303|ppo_ep: 1|act_loss: 0.0574951171875|cri_loss: 0.06182861328125|unsuper_loss: 0.0 average reward score: 0.03955078125 ------------------------------------------------------------------------------------- |E2E latency=3.62s |Gather latency=0.00s (0.00%) |Generate time=2.42s (66.72%) |Training time=0.93s (25.64%) |Others=0.28 (7.64%)|CurSamplesPerSec=2.21 |AvgSamplesPerSec=2.41 epoch: 0|step: 1304|ppo_ep: 1|act_loss: 0.1375732421875|cri_loss: 0.10321044921875|unsuper_loss: 0.0 average reward score: 2.84765625 ------------------------------------------------------------------------------------- |E2E latency=3.41s |Gather latency=0.00s (0.00%) |Generate time=2.40s (70.35%) |Training time=0.82s (23.96%) |Others=0.19 (5.69%)|CurSamplesPerSec=2.35 |AvgSamplesPerSec=2.41 epoch: 0|step: 1305|ppo_ep: 1|act_loss: 0.3251953125|cri_loss: 0.2333984375|unsuper_loss: 0.0 average reward score: 2.52734375 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.31%) |Training time=0.64s (19.85%) |Others=0.19 (5.84%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 1306|ppo_ep: 1|act_loss: 0.08349609375|cri_loss: 0.087158203125|unsuper_loss: 0.0 average reward score: 2.42578125 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.22%) |Training time=0.64s (19.88%) |Others=0.19 (5.90%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.41 epoch: 0|step: 1307|ppo_ep: 1|act_loss: -0.1168212890625|cri_loss: 0.0286865234375|unsuper_loss: 0.0 average reward score: 3.68359375 
------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.38s (74.16%) |Training time=0.64s (19.92%) |Others=0.19 (5.92%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.41 epoch: 0|step: 1308|ppo_ep: 1|act_loss: 0.317626953125|cri_loss: 0.20654296875|unsuper_loss: 0.0 average reward score: 3.41015625 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.38s (73.96%) |Training time=0.64s (19.95%) |Others=0.20 (6.09%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.41 epoch: 0|step: 1309|ppo_ep: 1|act_loss: -0.0787353515625|cri_loss: -0.014373779296875|unsuper_loss: 0.0 average reward score: 3.28515625 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.38s (74.21%) |Training time=0.64s (19.85%) |Others=0.19 (5.94%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.41 epoch: 0|step: 1310|ppo_ep: 1|act_loss: 0.05596923828125|cri_loss: 0.069580078125|unsuper_loss: 0.0 average reward score: 3.265625 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (73.95%) |Training time=0.64s (19.79%) |Others=0.20 (6.26%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 1311|ppo_ep: 1|act_loss: -0.024444580078125|cri_loss: 0.011932373046875|unsuper_loss: 0.0 average reward score: 1.498046875 ------------------------------------------------------------------------------------- |E2E latency=3.59s |Gather latency=0.00s (0.00%) |Generate time=2.38s (66.43%) |Training time=0.93s (25.79%) |Others=0.28 (7.78%)|CurSamplesPerSec=2.23 |AvgSamplesPerSec=2.41 epoch: 0|step: 1312|ppo_ep: 1|act_loss: 0.10546875|cri_loss: 0.0927734375|unsuper_loss: 0.0 average reward score: 4.5234375 
------------------------------------------------------------------------------------- |E2E latency=3.11s |Gather latency=0.00s (0.00%) |Generate time=2.27s (72.90%) |Training time=0.65s (21.00%) |Others=0.19 (6.10%)|CurSamplesPerSec=2.57 |AvgSamplesPerSec=2.41 epoch: 0|step: 1313|ppo_ep: 1|act_loss: -0.03973388671875|cri_loss: 0.016021728515625|unsuper_loss: 0.0 average reward score: 2.576171875 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.29%) |Training time=0.64s (19.84%) |Others=0.19 (5.87%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.41 epoch: 0|step: 1314|ppo_ep: 1|act_loss: 0.1571044921875|cri_loss: 0.1307373046875|unsuper_loss: 0.0 average reward score: 2.505859375 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.14%) |Training time=0.64s (19.83%) |Others=0.19 (6.03%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.41 epoch: 0|step: 1315|ppo_ep: 1|act_loss: 0.005859375|cri_loss: 0.0635986328125|unsuper_loss: 0.0 average reward score: 2.625 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.24%) |Training time=0.64s (19.81%) |Others=0.19 (5.95%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.41 epoch: 0|step: 1316|ppo_ep: 1|act_loss: 0.0859375|cri_loss: 0.09259033203125|unsuper_loss: 0.0 average reward score: 3.09375 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.37s (73.27%) |Training time=0.67s (20.68%) |Others=0.20 (6.04%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.41 epoch: 0|step: 1317|ppo_ep: 1|act_loss: -0.08544921875|cri_loss: -0.002685546875|unsuper_loss: 0.0 average reward score: 4.08984375 
------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.30s (71.50%) |Training time=0.72s (22.42%) |Others=0.20 (6.08%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.41 epoch: 0|step: 1318|ppo_ep: 1|act_loss: 0.24365234375|cri_loss: 0.162841796875|unsuper_loss: 0.0 average reward score: 2.02734375 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.10%) |Training time=0.65s (19.88%) |Others=0.20 (6.02%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 1319|ppo_ep: 1|act_loss: 0.105224609375|cri_loss: 0.08917236328125|unsuper_loss: 0.0 average reward score: 3.69140625 ------------------------------------------------------------------------------------- |E2E latency=3.61s |Gather latency=0.00s (0.00%) |Generate time=2.41s (66.73%) |Training time=0.93s (25.71%) |Others=0.27 (7.56%)|CurSamplesPerSec=2.22 |AvgSamplesPerSec=2.41 epoch: 0|step: 1320|ppo_ep: 1|act_loss: 0.314208984375|cri_loss: 0.205322265625|unsuper_loss: 0.0 average reward score: 2.478515625 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.04%) |Training time=0.64s (19.81%) |Others=0.20 (6.15%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 1321|ppo_ep: 1|act_loss: 0.2490234375|cri_loss: 0.173828125|unsuper_loss: 0.0 average reward score: 2.46875 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.39s (73.72%) |Training time=0.66s (20.28%) |Others=0.19 (6.01%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 1322|ppo_ep: 1|act_loss: 0.04229736328125|cri_loss: 0.07110595703125|unsuper_loss: 0.0 average reward score: 3.369140625 
------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.39s (72.99%) |Training time=0.69s (21.19%) |Others=0.19 (5.82%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.41 epoch: 0|step: 1323|ppo_ep: 1|act_loss: 0.34375|cri_loss: 0.2078857421875|unsuper_loss: 0.0 average reward score: 0.64013671875 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.37s (73.32%) |Training time=0.66s (20.38%) |Others=0.20 (6.30%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.41 epoch: 0|step: 1324|ppo_ep: 1|act_loss: 0.1651611328125|cri_loss: 0.0941162109375|unsuper_loss: 0.0 average reward score: 1.048828125 ------------------------------------------------------------------------------------- |E2E latency=3.16s |Gather latency=0.00s (0.00%) |Generate time=2.32s (73.46%) |Training time=0.64s (20.34%) |Others=0.20 (6.20%)|CurSamplesPerSec=2.53 |AvgSamplesPerSec=2.41 epoch: 0|step: 1325|ppo_ep: 1|act_loss: 0.17333984375|cri_loss: 0.1168212890625|unsuper_loss: 0.0 average reward score: 4.42578125 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.36%) |Training time=0.64s (19.77%) |Others=0.19 (5.87%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 1326|ppo_ep: 1|act_loss: -0.251220703125|cri_loss: -0.07781982421875|unsuper_loss: 0.0 average reward score: 4.0859375 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.31%) |Training time=0.64s (19.72%) |Others=0.19 (5.97%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 1327|ppo_ep: 1|act_loss: -0.095703125|cri_loss: -0.004150390625|unsuper_loss: 0.0 average reward score: 1.79296875 
------------------------------------------------------------------------------------- |E2E latency=3.61s |Gather latency=0.00s (0.00%) |Generate time=2.41s (66.69%) |Training time=0.93s (25.71%) |Others=0.27 (7.60%)|CurSamplesPerSec=2.21 |AvgSamplesPerSec=2.41 epoch: 0|step: 1328|ppo_ep: 1|act_loss: -0.2403564453125|cri_loss: -0.0701904296875|unsuper_loss: 0.0 average reward score: 3.50390625 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.46%) |Training time=0.64s (19.72%) |Others=0.19 (5.82%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 1329|ppo_ep: 1|act_loss: 0.051055908203125|cri_loss: 0.0743408203125|unsuper_loss: 0.0 average reward score: 3.125 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.04%) |Training time=0.64s (19.84%) |Others=0.20 (6.12%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.41 epoch: 0|step: 1330|ppo_ep: 1|act_loss: -0.0107421875|cri_loss: 0.01141357421875|unsuper_loss: 0.0 average reward score: 2.833984375 ------------------------------------------------------------------------------------- |E2E latency=3.19s |Gather latency=0.00s (0.00%) |Generate time=2.36s (73.95%) |Training time=0.64s (20.11%) |Others=0.19 (5.94%)|CurSamplesPerSec=2.51 |AvgSamplesPerSec=2.41 epoch: 0|step: 1331|ppo_ep: 1|act_loss: 0.1925048828125|cri_loss: 0.1556396484375|unsuper_loss: 0.0 average reward score: 2.0859375 ------------------------------------------------------------------------------------- |E2E latency=3.12s |Gather latency=0.00s (0.00%) |Generate time=2.28s (73.07%) |Training time=0.65s (20.77%) |Others=0.19 (6.16%)|CurSamplesPerSec=2.56 |AvgSamplesPerSec=2.41 epoch: 0|step: 1332|ppo_ep: 1|act_loss: 0.1033935546875|cri_loss: 0.11181640625|unsuper_loss: 0.0 average reward score: 1.068359375 
------------------------------------------------------------------------------------- |E2E latency=3.18s |Gather latency=0.00s (0.00%) |Generate time=2.35s (73.75%) |Training time=0.64s (20.08%) |Others=0.20 (6.17%)|CurSamplesPerSec=2.51 |AvgSamplesPerSec=2.41 epoch: 0|step: 1333|ppo_ep: 1|act_loss: 0.015838623046875|cri_loss: 0.03839111328125|unsuper_loss: 0.0 average reward score: 2.78125 ------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.36s (73.98%) |Training time=0.64s (20.08%) |Others=0.19 (5.93%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.41 epoch: 0|step: 1334|ppo_ep: 1|act_loss: 0.1015625|cri_loss: 0.0855712890625|unsuper_loss: 0.0 average reward score: 2.421875 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.35%) |Training time=0.64s (19.65%) |Others=0.19 (6.00%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 1335|ppo_ep: 1|act_loss: -0.1224365234375|cri_loss: -0.02484130859375|unsuper_loss: 0.0 average reward score: 3.53125 ------------------------------------------------------------------------------------- |E2E latency=3.58s |Gather latency=0.00s (0.00%) |Generate time=2.39s (66.62%) |Training time=0.92s (25.66%) |Others=0.28 (7.72%)|CurSamplesPerSec=2.23 |AvgSamplesPerSec=2.41 epoch: 0|step: 1336|ppo_ep: 1|act_loss: 0.039642333984375|cri_loss: 0.046875|unsuper_loss: 0.0 average reward score: 4.34375 ------------------------------------------------------------------------------------- |E2E latency=3.17s |Gather latency=0.00s (0.00%) |Generate time=2.35s (74.06%) |Training time=0.63s (19.90%) |Others=0.19 (6.04%)|CurSamplesPerSec=2.52 |AvgSamplesPerSec=2.41 epoch: 0|step: 1337|ppo_ep: 1|act_loss: 0.30419921875|cri_loss: 0.192626953125|unsuper_loss: 0.0 average reward score: 3.46484375 
------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.36s (73.86%) |Training time=0.64s (20.13%) |Others=0.19 (6.01%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.41 epoch: 0|step: 1338|ppo_ep: 1|act_loss: -0.0494384765625|cri_loss: -0.0003662109375|unsuper_loss: 0.0 average reward score: 2.833984375 ------------------------------------------------------------------------------------- |E2E latency=3.19s |Gather latency=0.00s (0.00%) |Generate time=2.36s (74.00%) |Training time=0.64s (19.99%) |Others=0.19 (6.00%)|CurSamplesPerSec=2.51 |AvgSamplesPerSec=2.41 epoch: 0|step: 1339|ppo_ep: 1|act_loss: 0.09222412109375|cri_loss: 0.06842041015625|unsuper_loss: 0.0 average reward score: 3.4609375 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.73%) |Training time=0.64s (19.45%) |Others=0.19 (5.82%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.41 epoch: 0|step: 1340|ppo_ep: 1|act_loss: -0.166259765625|cri_loss: -0.053619384765625|unsuper_loss: 0.0 average reward score: 1.8466796875 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.18%) |Training time=0.64s (19.86%) |Others=0.19 (5.96%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.41 epoch: 0|step: 1341|ppo_ep: 1|act_loss: 0.220947265625|cri_loss: 0.1424560546875|unsuper_loss: 0.0 average reward score: 2.8359375 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.59%) |Training time=0.64s (19.55%) |Others=0.19 (5.86%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.41 epoch: 0|step: 1342|ppo_ep: 1|act_loss: 0.45068359375|cri_loss: 0.29296875|unsuper_loss: 0.0 average reward score: 3.0625 
------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.38s (73.91%) |Training time=0.64s (19.94%) |Others=0.20 (6.15%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.41 epoch: 0|step: 1343|ppo_ep: 1|act_loss: 0.177734375|cri_loss: 0.1168212890625|unsuper_loss: 0.0 average reward score: 2.82421875 ------------------------------------------------------------------------------------- |E2E latency=3.64s |Gather latency=0.00s (0.00%) |Generate time=2.44s (67.11%) |Training time=0.92s (25.39%) |Others=0.27 (7.50%)|CurSamplesPerSec=2.20 |AvgSamplesPerSec=2.41 epoch: 0|step: 1344|ppo_ep: 1|act_loss: 0.216796875|cri_loss: 0.152099609375|unsuper_loss: 0.0 average reward score: 2.86328125 ------------------------------------------------------------------------------------- |E2E latency=3.15s |Gather latency=0.00s (0.00%) |Generate time=2.31s (73.48%) |Training time=0.64s (20.24%) |Others=0.20 (6.28%)|CurSamplesPerSec=2.54 |AvgSamplesPerSec=2.41 epoch: 0|step: 1345|ppo_ep: 1|act_loss: 0.01983642578125|cri_loss: 0.047210693359375|unsuper_loss: 0.0 average reward score: 3.60546875 ------------------------------------------------------------------------------------- |E2E latency=3.13s |Gather latency=0.00s (0.00%) |Generate time=2.29s (73.12%) |Training time=0.64s (20.53%) |Others=0.20 (6.35%)|CurSamplesPerSec=2.56 |AvgSamplesPerSec=2.41 epoch: 0|step: 1346|ppo_ep: 1|act_loss: 0.4638671875|cri_loss: 0.2861328125|unsuper_loss: 0.0 average reward score: 2.546875 ------------------------------------------------------------------------------------- |E2E latency=3.13s |Gather latency=0.00s (0.00%) |Generate time=2.29s (73.16%) |Training time=0.64s (20.48%) |Others=0.20 (6.36%)|CurSamplesPerSec=2.55 |AvgSamplesPerSec=2.41 epoch: 0|step: 1347|ppo_ep: 1|act_loss: 0.5029296875|cri_loss: 0.363037109375|unsuper_loss: 0.0 average reward score: 2.138671875 
------------------------------------------------------------------------------------- |E2E latency=3.11s |Gather latency=0.00s (0.00%) |Generate time=2.28s (73.35%) |Training time=0.64s (20.50%) |Others=0.19 (6.14%)|CurSamplesPerSec=2.58 |AvgSamplesPerSec=2.41 epoch: 0|step: 1348|ppo_ep: 1|act_loss: -0.05072021484375|cri_loss: 0.010284423828125|unsuper_loss: 0.0 average reward score: 2.341796875 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.33%) |Training time=0.64s (19.67%) |Others=0.20 (6.00%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.41 epoch: 0|step: 1349|ppo_ep: 1|act_loss: 0.0139617919921875|cri_loss: 0.04351806640625|unsuper_loss: 0.0 average reward score: 0.9775390625 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.38s (74.13%) |Training time=0.64s (19.94%) |Others=0.19 (5.93%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.41 epoch: 0|step: 1350|ppo_ep: 1|act_loss: 0.1256103515625|cri_loss: 0.1090087890625|unsuper_loss: 0.0 average reward score: 4.1484375 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.07%) |Training time=0.65s (20.04%) |Others=0.19 (5.89%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 1351|ppo_ep: 1|act_loss: -0.0460205078125|cri_loss: -0.001068115234375|unsuper_loss: 0.0 average reward score: 3.396484375 ------------------------------------------------------------------------------------- |E2E latency=3.63s |Gather latency=0.00s (0.00%) |Generate time=2.39s (65.66%) |Training time=0.97s (26.71%) |Others=0.28 (7.63%)|CurSamplesPerSec=2.20 |AvgSamplesPerSec=2.41 epoch: 0|step: 1352|ppo_ep: 1|act_loss: 0.32958984375|cri_loss: 0.233154296875|unsuper_loss: 0.0 average reward score: 2.5390625 
------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.37%) |Training time=0.64s (19.78%) |Others=0.19 (5.85%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.41 epoch: 0|step: 1353|ppo_ep: 1|act_loss: 0.11859130859375|cri_loss: 0.09552001953125|unsuper_loss: 0.0 average reward score: 3.939453125 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.38s (74.10%) |Training time=0.64s (19.95%) |Others=0.19 (5.95%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.41 epoch: 0|step: 1354|ppo_ep: 1|act_loss: 0.09552001953125|cri_loss: 0.08795166015625|unsuper_loss: 0.0 average reward score: 2.33203125 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.34%) |Training time=0.64s (19.80%) |Others=0.19 (5.87%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 1355|ppo_ep: 1|act_loss: 0.05328369140625|cri_loss: 0.051300048828125|unsuper_loss: 0.0 average reward score: 2.810546875 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.20%) |Training time=0.64s (19.94%) |Others=0.19 (5.87%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.41 epoch: 0|step: 1356|ppo_ep: 1|act_loss: -0.10626220703125|cri_loss: -0.03057861328125|unsuper_loss: 0.0 average reward score: 3.37109375 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.37%) |Training time=0.64s (19.69%) |Others=0.19 (5.94%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 1357|ppo_ep: 1|act_loss: 0.2479248046875|cri_loss: 0.1912841796875|unsuper_loss: 0.0 average reward score: 1.8408203125 
------------------------------------------------------------------------------------- |E2E latency=3.18s |Gather latency=0.00s (0.00%) |Generate time=2.32s (72.95%) |Training time=0.67s (21.06%) |Others=0.19 (5.99%)|CurSamplesPerSec=2.52 |AvgSamplesPerSec=2.41 epoch: 0|step: 1358|ppo_ep: 1|act_loss: 0.106689453125|cri_loss: 0.0694580078125|unsuper_loss: 0.0 average reward score: 1.66015625 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.17%) |Training time=0.65s (19.86%) |Others=0.19 (5.97%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 [2023-04-24 15:02:50,613] [INFO] [logging.py:96:log_dist] [Rank 0] step=170, skipped=5, lr=[8.95574810302045e-06, 8.95574810302045e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-04-24 15:02:50,861] [INFO] [timer.py:199:stop] epoch=0/micro_step=1360/global_step=170, RunningAvgSamplesPerSec=15.445146591400649, CurrSamplesPerSec=15.698944771400862, MemAllocated=20.44GB, MaxMemAllocated=31.45GB [2023-04-24 15:02:51,060] [INFO] [logging.py:96:log_dist] [Rank 0] step=170, skipped=4, lr=[4.62941461969019e-06, 4.62941461969019e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 1359|ppo_ep: 1|act_loss: -0.029327392578125|cri_loss: 0.0020751953125|unsuper_loss: 0.0 average reward score: 2.158203125 ------------------------------------------------------------------------------------- |E2E latency=3.60s |Gather latency=0.00s (0.00%) |Generate time=2.39s (66.43%) |Training time=0.94s (25.96%) |Others=0.27 (7.61%)|CurSamplesPerSec=2.22 |AvgSamplesPerSec=2.41 epoch: 0|step: 1360|ppo_ep: 1|act_loss: 0.2164306640625|cri_loss: 0.12548828125|unsuper_loss: 0.0 average reward score: 2.796875 ------------------------------------------------------------------------------------- |E2E latency=3.17s |Gather latency=0.00s (0.00%) |Generate time=2.35s (74.04%) |Training time=0.63s (20.00%) |Others=0.19 (5.96%)|CurSamplesPerSec=2.52 |AvgSamplesPerSec=2.41 
epoch: 0|step: 1361|ppo_ep: 1|act_loss: 0.031463623046875|cri_loss: 0.040740966796875|unsuper_loss: 0.0 average reward score: 2.681640625 ------------------------------------------------------------------------------------- |E2E latency=3.17s |Gather latency=0.00s (0.00%) |Generate time=2.33s (73.38%) |Training time=0.64s (20.19%) |Others=0.20 (6.44%)|CurSamplesPerSec=2.52 |AvgSamplesPerSec=2.41 epoch: 0|step: 1362|ppo_ep: 1|act_loss: -0.04156494140625|cri_loss: -0.001739501953125|unsuper_loss: 0.0 average reward score: 3.41796875 ------------------------------------------------------------------------------------- |E2E latency=3.18s |Gather latency=0.00s (0.00%) |Generate time=2.32s (72.97%) |Training time=0.64s (20.11%) |Others=0.22 (6.92%)|CurSamplesPerSec=2.51 |AvgSamplesPerSec=2.41 epoch: 0|step: 1363|ppo_ep: 1|act_loss: 0.1522216796875|cri_loss: 0.12164306640625|unsuper_loss: 0.0 average reward score: 2.48046875 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.59%) |Training time=0.64s (19.47%) |Others=0.19 (5.94%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.41 epoch: 0|step: 1364|ppo_ep: 1|act_loss: 0.7060546875|cri_loss: 0.47900390625|unsuper_loss: 0.0 average reward score: 1.763671875 ------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.37s (74.08%) |Training time=0.64s (19.86%) |Others=0.19 (6.06%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.41 epoch: 0|step: 1365|ppo_ep: 1|act_loss: 0.23681640625|cri_loss: 0.1448974609375|unsuper_loss: 0.0 average reward score: 3.18359375 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.38s (73.89%) |Training time=0.64s (19.96%) |Others=0.20 (6.15%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.41 epoch: 0|step: 
1366|ppo_ep: 1|act_loss: 0.326171875|cri_loss: 0.20703125|unsuper_loss: 0.0 average reward score: 2.42578125 ------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.36s (73.82%) |Training time=0.64s (20.07%) |Others=0.20 (6.12%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.41 epoch: 0|step: 1367|ppo_ep: 1|act_loss: 1.2333984375|cri_loss: 0.83349609375|unsuper_loss: 0.0 average reward score: 3.3359375 ------------------------------------------------------------------------------------- |E2E latency=3.59s |Gather latency=0.00s (0.00%) |Generate time=2.39s (66.65%) |Training time=0.92s (25.76%) |Others=0.27 (7.59%)|CurSamplesPerSec=2.23 |AvgSamplesPerSec=2.41 epoch: 0|step: 1368|ppo_ep: 1|act_loss: 0.59228515625|cri_loss: 0.38916015625|unsuper_loss: 0.0 average reward score: 3.279296875 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.57%) |Training time=0.63s (19.62%) |Others=0.19 (5.81%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 1369|ppo_ep: 1|act_loss: 0.2421875|cri_loss: 0.145751953125|unsuper_loss: 0.0 average reward score: 1.17578125 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.36s (73.07%) |Training time=0.67s (20.71%) |Others=0.20 (6.22%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 1370|ppo_ep: 1|act_loss: 0.91845703125|cri_loss: 0.6337890625|unsuper_loss: 0.0 average reward score: 3.31640625 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.37s (71.47%) |Training time=0.75s (22.76%) |Others=0.19 (5.77%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.41 epoch: 0|step: 1371|ppo_ep: 1|act_loss: 0.6875|cri_loss: 
0.4560546875|unsuper_loss: 0.0 average reward score: 3.646484375 ------------------------------------------------------------------------------------- |E2E latency=3.34s |Gather latency=0.00s (0.00%) |Generate time=2.37s (71.09%) |Training time=0.77s (23.11%) |Others=0.19 (5.80%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.41 epoch: 0|step: 1372|ppo_ep: 1|act_loss: 0.4228515625|cri_loss: 0.29736328125|unsuper_loss: 0.0 average reward score: 2.8984375 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.38s (72.93%) |Training time=0.69s (21.16%) |Others=0.19 (5.91%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.41 epoch: 0|step: 1373|ppo_ep: 1|act_loss: -0.1766357421875|cri_loss: -0.05987548828125|unsuper_loss: 0.0 average reward score: 4.17578125 ------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.37s (74.02%) |Training time=0.64s (20.02%) |Others=0.19 (5.96%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.41 epoch: 0|step: 1374|ppo_ep: 1|act_loss: 0.4853515625|cri_loss: 0.31640625|unsuper_loss: 0.0 average reward score: 4.2109375 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.37s (73.38%) |Training time=0.64s (19.98%) |Others=0.21 (6.64%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.41 epoch: 0|step: 1375|ppo_ep: 1|act_loss: 0.0230712890625|cri_loss: 0.027313232421875|unsuper_loss: 0.0 average reward score: 3.029296875 ------------------------------------------------------------------------------------- |E2E latency=3.57s |Gather latency=0.00s (0.00%) |Generate time=2.37s (66.45%) |Training time=0.92s (25.84%) |Others=0.28 (7.71%)|CurSamplesPerSec=2.24 |AvgSamplesPerSec=2.41 epoch: 0|step: 1376|ppo_ep: 1|act_loss: -0.1005859375|cri_loss: -0.021209716796875|unsuper_loss: 0.0 
average reward score: 3.79296875 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.35s (73.18%) |Training time=0.66s (20.71%) |Others=0.20 (6.11%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.41 epoch: 0|step: 1377|ppo_ep: 1|act_loss: 0.578125|cri_loss: 0.36083984375|unsuper_loss: 0.0 average reward score: 2.724609375 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.15%) |Training time=0.64s (19.89%) |Others=0.19 (5.96%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.41 epoch: 0|step: 1378|ppo_ep: 1|act_loss: -0.2044677734375|cri_loss: -0.07666015625|unsuper_loss: 0.0 average reward score: 2.60546875 ------------------------------------------------------------------------------------- |E2E latency=3.17s |Gather latency=0.00s (0.00%) |Generate time=2.33s (73.63%) |Training time=0.64s (20.27%) |Others=0.19 (6.10%)|CurSamplesPerSec=2.52 |AvgSamplesPerSec=2.41 epoch: 0|step: 1379|ppo_ep: 1|act_loss: -0.1295166015625|cri_loss: -0.0499267578125|unsuper_loss: 0.0 average reward score: 3.41796875 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.30%) |Training time=0.64s (19.75%) |Others=0.19 (5.95%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 1380|ppo_ep: 1|act_loss: 0.0692138671875|cri_loss: 0.053924560546875|unsuper_loss: 0.0 average reward score: 3.98828125 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.20%) |Training time=0.64s (19.86%) |Others=0.19 (5.94%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 1381|ppo_ep: 1|act_loss: 0.09405517578125|cri_loss: 0.10040283203125|unsuper_loss: 0.0 average reward score: 3.6640625 
------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.32%) |Training time=0.64s (19.78%) |Others=0.19 (5.90%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 1382|ppo_ep: 1|act_loss: 0.48974609375|cri_loss: 0.32568359375|unsuper_loss: 0.0 average reward score: 3.11328125 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.32%) |Training time=0.64s (19.76%) |Others=0.19 (5.92%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 1383|ppo_ep: 1|act_loss: -0.0377197265625|cri_loss: 0.02557373046875|unsuper_loss: 0.0 average reward score: 2.373046875 ------------------------------------------------------------------------------------- |E2E latency=3.59s |Gather latency=0.00s (0.00%) |Generate time=2.40s (66.73%) |Training time=0.92s (25.71%) |Others=0.27 (7.57%)|CurSamplesPerSec=2.23 |AvgSamplesPerSec=2.41 epoch: 0|step: 1384|ppo_ep: 1|act_loss: 0.099609375|cri_loss: 0.0792236328125|unsuper_loss: 0.0 average reward score: 1.744140625 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.24%) |Training time=0.64s (19.77%) |Others=0.19 (5.99%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 1385|ppo_ep: 1|act_loss: -0.16357421875|cri_loss: -0.05908203125|unsuper_loss: 0.0 average reward score: 1.9990234375 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.30%) |Training time=0.64s (19.70%) |Others=0.19 (6.00%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.41 epoch: 0|step: 1386|ppo_ep: 1|act_loss: -0.0589599609375|cri_loss: -0.012298583984375|unsuper_loss: 0.0 average reward score: 3.8359375 
------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.24%) |Training time=0.64s (19.81%) |Others=0.19 (5.95%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.41 epoch: 0|step: 1387|ppo_ep: 1|act_loss: -0.050750732421875|cri_loss: -0.001068115234375|unsuper_loss: 0.0 average reward score: 2.85546875 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.34%) |Training time=0.63s (19.63%) |Others=0.19 (6.02%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.41 epoch: 0|step: 1388|ppo_ep: 1|act_loss: 0.0157623291015625|cri_loss: 0.04681396484375|unsuper_loss: 0.0 average reward score: 2.3515625 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.09%) |Training time=0.64s (19.81%) |Others=0.20 (6.11%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.41 epoch: 0|step: 1389|ppo_ep: 1|act_loss: 0.0040130615234375|cri_loss: 0.032958984375|unsuper_loss: 0.0 average reward score: 3.14453125 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.39s (73.99%) |Training time=0.64s (19.99%) |Others=0.19 (6.02%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.41 epoch: 0|step: 1390|ppo_ep: 1|act_loss: -0.1544189453125|cri_loss: -0.0484619140625|unsuper_loss: 0.0 average reward score: 3.66015625 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.19%) |Training time=0.64s (19.66%) |Others=0.20 (6.15%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.41 epoch: 0|step: 1391|ppo_ep: 1|act_loss: 0.041748046875|cri_loss: 0.07269287109375|unsuper_loss: 0.0 average reward score: 3.98828125 
------------------------------------------------------------------------------------- |E2E latency=3.61s |Gather latency=0.00s (0.00%) |Generate time=2.41s (66.78%) |Training time=0.92s (25.57%) |Others=0.28 (7.65%)|CurSamplesPerSec=2.21 |AvgSamplesPerSec=2.41 epoch: 0|step: 1392|ppo_ep: 1|act_loss: -0.1710205078125|cri_loss: -0.057891845703125|unsuper_loss: 0.0 average reward score: 3.537109375 ------------------------------------------------------------------------------------- |E2E latency=3.36s |Gather latency=0.00s (0.00%) |Generate time=2.51s (74.84%) |Training time=0.64s (19.13%) |Others=0.20 (6.03%)|CurSamplesPerSec=2.38 |AvgSamplesPerSec=2.41 epoch: 0|step: 1393|ppo_ep: 1|act_loss: -0.101318359375|cri_loss: -0.01605224609375|unsuper_loss: 0.0 average reward score: 4.0859375 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.24%) |Training time=0.64s (19.76%) |Others=0.19 (6.00%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.41 epoch: 0|step: 1394|ppo_ep: 1|act_loss: -0.2305908203125|cri_loss: -0.0848388671875|unsuper_loss: 0.0 average reward score: 3.642578125 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.11%) |Training time=0.65s (19.94%) |Others=0.19 (5.95%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 1395|ppo_ep: 1|act_loss: -0.173828125|cri_loss: -0.052978515625|unsuper_loss: 0.0 average reward score: 3.154296875 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.07%) |Training time=0.65s (19.91%) |Others=0.19 (6.01%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 1396|ppo_ep: 1|act_loss: -0.126220703125|cri_loss: -0.045166015625|unsuper_loss: 0.0 average reward score: 2.67578125 
------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.37s (73.98%) |Training time=0.64s (19.91%) |Others=0.20 (6.11%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.41 epoch: 0|step: 1397|ppo_ep: 1|act_loss: -0.228759765625|cri_loss: -0.087646484375|unsuper_loss: 0.0 average reward score: 2.337890625 ------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.35s (73.42%) |Training time=0.66s (20.49%) |Others=0.20 (6.09%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.41 epoch: 0|step: 1398|ppo_ep: 1|act_loss: -0.1949462890625|cri_loss: -0.0467529296875|unsuper_loss: 0.0 average reward score: 3.123046875 ------------------------------------------------------------------------------------- |E2E latency=3.15s |Gather latency=0.00s (0.00%) |Generate time=2.31s (73.42%) |Training time=0.64s (20.30%) |Others=0.20 (6.28%)|CurSamplesPerSec=2.54 |AvgSamplesPerSec=2.41 epoch: 0|step: 1399|ppo_ep: 1|act_loss: -0.19921875|cri_loss: -0.05877685546875|unsuper_loss: 0.0 average reward score: 3.34375 ------------------------------------------------------------------------------------- |E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.30s (65.73%) |Training time=0.92s (26.33%) |Others=0.28 (7.95%)|CurSamplesPerSec=2.28 |AvgSamplesPerSec=2.41 epoch: 0|step: 1400|ppo_ep: 1|act_loss: -0.2357177734375|cri_loss: -0.06396484375|unsuper_loss: 0.0 average reward score: 3.564453125 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.19%) |Training time=0.64s (19.77%) |Others=0.19 (6.03%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.41 epoch: 0|step: 1401|ppo_ep: 1|act_loss: -0.228271484375|cri_loss: -0.05987548828125|unsuper_loss: 0.0 average reward score: 2.99609375 
------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.16%) |Training time=0.64s (19.88%) |Others=0.19 (5.97%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.41 epoch: 0|step: 1402|ppo_ep: 1|act_loss: -0.096435546875|cri_loss: -0.027313232421875|unsuper_loss: 0.0 average reward score: 2.796875 ------------------------------------------------------------------------------------- |E2E latency=3.19s |Gather latency=0.00s (0.00%) |Generate time=2.36s (73.86%) |Training time=0.64s (19.99%) |Others=0.20 (6.15%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.41 epoch: 0|step: 1403|ppo_ep: 1|act_loss: -0.127197265625|cri_loss: -0.03192138671875|unsuper_loss: 0.0 average reward score: 2.80859375 ------------------------------------------------------------------------------------- |E2E latency=3.19s |Gather latency=0.00s (0.00%) |Generate time=2.35s (73.77%) |Training time=0.64s (20.09%) |Others=0.20 (6.14%)|CurSamplesPerSec=2.51 |AvgSamplesPerSec=2.41 epoch: 0|step: 1404|ppo_ep: 1|act_loss: -0.079833984375|cri_loss: -0.00830078125|unsuper_loss: 0.0 average reward score: 3.06640625 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.27%) |Training time=0.64s (19.70%) |Others=0.20 (6.02%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 1405|ppo_ep: 1|act_loss: -0.0858154296875|cri_loss: 0.004638671875|unsuper_loss: 0.0 average reward score: 4.7578125 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.06%) |Training time=0.65s (19.91%) |Others=0.20 (6.04%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 1406|ppo_ep: 1|act_loss: -0.187744140625|cri_loss: -0.06170654296875|unsuper_loss: 0.0 average reward score: 2.8984375 
------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.14%) |Training time=0.64s (19.76%) |Others=0.20 (6.11%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 1407|ppo_ep: 1|act_loss: -0.1385498046875|cri_loss: -0.04290771484375|unsuper_loss: 0.0 average reward score: 3.75390625 ------------------------------------------------------------------------------------- |E2E latency=3.60s |Gather latency=0.00s (0.00%) |Generate time=2.40s (66.47%) |Training time=0.93s (25.80%) |Others=0.28 (7.73%)|CurSamplesPerSec=2.22 |AvgSamplesPerSec=2.41 epoch: 0|step: 1408|ppo_ep: 1|act_loss: -0.18798828125|cri_loss: -0.05145263671875|unsuper_loss: 0.0 average reward score: 3.73828125 ------------------------------------------------------------------------------------- |E2E latency=3.53s |Gather latency=0.00s (0.00%) |Generate time=2.38s (67.52%) |Training time=0.73s (20.64%) |Others=0.42 (11.84%)|CurSamplesPerSec=2.27 |AvgSamplesPerSec=2.41 epoch: 0|step: 1409|ppo_ep: 1|act_loss: -0.1973876953125|cri_loss: -0.059814453125|unsuper_loss: 0.0 average reward score: 4.41015625 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.47s (74.31%) |Training time=0.64s (19.39%) |Others=0.21 (6.30%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.41 epoch: 0|step: 1410|ppo_ep: 1|act_loss: -0.1300048828125|cri_loss: -0.02685546875|unsuper_loss: 0.0 average reward score: 3.4921875 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.37%) |Training time=0.64s (19.76%) |Others=0.19 (5.87%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.41 epoch: 0|step: 1411|ppo_ep: 1|act_loss: -0.1416015625|cri_loss: -0.04571533203125|unsuper_loss: 0.0 average reward score: 4.3515625 
------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.39%) |Training time=0.64s (19.74%) |Others=0.19 (5.87%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.41 epoch: 0|step: 1412|ppo_ep: 1|act_loss: -0.1427001953125|cri_loss: -0.04620361328125|unsuper_loss: 0.0 average reward score: 2.8359375 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.11%) |Training time=0.64s (19.86%) |Others=0.19 (6.03%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.41 epoch: 0|step: 1413|ppo_ep: 1|act_loss: -0.0732421875|cri_loss: -0.015289306640625|unsuper_loss: 0.0 average reward score: 4.18359375 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.12%) |Training time=0.64s (19.82%) |Others=0.20 (6.06%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.41 epoch: 0|step: 1414|ppo_ep: 1|act_loss: -0.239501953125|cri_loss: -0.06427001953125|unsuper_loss: 0.0 average reward score: 4.32421875 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.14%) |Training time=0.64s (19.97%) |Others=0.19 (5.89%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.41 epoch: 0|step: 1415|ppo_ep: 1|act_loss: -0.2493896484375|cri_loss: -0.0980224609375|unsuper_loss: 0.0 average reward score: 3.025390625 ------------------------------------------------------------------------------------- |E2E latency=3.59s |Gather latency=0.00s (0.00%) |Generate time=2.39s (66.66%) |Training time=0.92s (25.73%) |Others=0.27 (7.61%)|CurSamplesPerSec=2.23 |AvgSamplesPerSec=2.41 epoch: 0|step: 1416|ppo_ep: 1|act_loss: -0.093017578125|cri_loss: -0.0208740234375|unsuper_loss: 0.0 average reward score: 2.830078125 
------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.38s (74.17%) |Training time=0.64s (19.93%) |Others=0.19 (5.90%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.41 epoch: 0|step: 1417|ppo_ep: 1|act_loss: -0.160888671875|cri_loss: -0.0477294921875|unsuper_loss: 0.0 average reward score: 3.140625 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.37s (74.04%) |Training time=0.64s (19.91%) |Others=0.19 (6.05%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.41 epoch: 0|step: 1418|ppo_ep: 1|act_loss: -0.1580810546875|cri_loss: -0.04608154296875|unsuper_loss: 0.0 average reward score: 4.890625 ------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.36s (73.80%) |Training time=0.64s (20.14%) |Others=0.19 (6.06%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.41 epoch: 0|step: 1419|ppo_ep: 1|act_loss: -0.1585693359375|cri_loss: -0.0516357421875|unsuper_loss: 0.0 average reward score: 3.306640625 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.16%) |Training time=0.64s (19.95%) |Others=0.19 (5.89%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.41 epoch: 0|step: 1420|ppo_ep: 1|act_loss: -0.071533203125|cri_loss: -0.0232086181640625|unsuper_loss: 0.0 average reward score: 3.85546875 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.50%) |Training time=0.65s (19.68%) |Others=0.19 (5.81%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.41 epoch: 0|step: 1421|ppo_ep: 1|act_loss: -0.190185546875|cri_loss: -0.06866455078125|unsuper_loss: 0.0 average reward score: 3.74609375 
------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.37%) |Training time=0.64s (19.74%) |Others=0.19 (5.88%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 1422|ppo_ep: 1|act_loss: -0.102783203125|cri_loss: -0.02374267578125|unsuper_loss: 0.0 average reward score: 3.556640625 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.24%) |Training time=0.64s (19.75%) |Others=0.19 (6.01%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 1423|ppo_ep: 1|act_loss: -0.146728515625|cri_loss: -0.0286865234375|unsuper_loss: 0.0 average reward score: 4.25390625 ------------------------------------------------------------------------------------- |E2E latency=3.62s |Gather latency=0.00s (0.00%) |Generate time=2.42s (66.91%) |Training time=0.92s (25.48%) |Others=0.28 (7.61%)|CurSamplesPerSec=2.21 |AvgSamplesPerSec=2.41 epoch: 0|step: 1424|ppo_ep: 1|act_loss: -0.03466796875|cri_loss: -0.005615234375|unsuper_loss: 0.0 average reward score: 2.671875 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.56%) |Training time=0.64s (19.60%) |Others=0.19 (5.84%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 1425|ppo_ep: 1|act_loss: 0.083740234375|cri_loss: 0.0552978515625|unsuper_loss: 0.0 average reward score: 3.1328125 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.24%) |Training time=0.65s (19.81%) |Others=0.19 (5.95%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 1426|ppo_ep: 1|act_loss: -0.142578125|cri_loss: -0.05718994140625|unsuper_loss: 0.0 average reward score: 4.3515625 
------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.38%) |Training time=0.64s (19.63%) |Others=0.20 (5.99%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.41 epoch: 0|step: 1427|ppo_ep: 1|act_loss: -0.021820068359375|cri_loss: 0.009246826171875|unsuper_loss: 0.0 average reward score: 3.671875 ------------------------------------------------------------------------------------- |E2E latency=3.95s |Gather latency=0.00s (0.00%) |Generate time=3.10s (78.41%) |Training time=0.65s (16.37%) |Others=0.21 (5.22%)|CurSamplesPerSec=2.02 |AvgSamplesPerSec=2.41 epoch: 0|step: 1428|ppo_ep: 1|act_loss: -0.07672119140625|cri_loss: -0.00897216796875|unsuper_loss: 0.0 average reward score: 2.509765625 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.17%) |Training time=0.64s (19.79%) |Others=0.20 (6.03%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 1429|ppo_ep: 1|act_loss: 0.002410888671875|cri_loss: 0.0229644775390625|unsuper_loss: 0.0 average reward score: 3.83984375 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.38s (73.94%) |Training time=0.64s (20.00%) |Others=0.19 (6.06%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.41 epoch: 0|step: 1430|ppo_ep: 1|act_loss: -0.144287109375|cri_loss: -0.04632568359375|unsuper_loss: 0.0 average reward score: 3.490234375 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.28%) |Training time=0.64s (19.74%) |Others=0.19 (5.98%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 1431|ppo_ep: 1|act_loss: -0.06585693359375|cri_loss: -0.0183868408203125|unsuper_loss: 0.0 average reward score: 3.6015625 
------------------------------------------------------------------------------------- |E2E latency=3.61s |Gather latency=0.00s (0.00%) |Generate time=2.40s (66.56%) |Training time=0.93s (25.84%) |Others=0.27 (7.60%)|CurSamplesPerSec=2.22 |AvgSamplesPerSec=2.41 epoch: 0|step: 1432|ppo_ep: 1|act_loss: -0.0096435546875|cri_loss: 0.0186004638671875|unsuper_loss: 0.0 average reward score: 4.171875 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.62%) |Training time=0.64s (19.54%) |Others=0.19 (5.84%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.41 epoch: 0|step: 1433|ppo_ep: 1|act_loss: -0.09759521484375|cri_loss: -0.03692626953125|unsuper_loss: 0.0 average reward score: 3.890625 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.38s (73.86%) |Training time=0.64s (19.99%) |Others=0.20 (6.15%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.41 epoch: 0|step: 1434|ppo_ep: 1|act_loss: 0.10479736328125|cri_loss: 0.06121826171875|unsuper_loss: 0.0 average reward score: 2.83203125 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.66%) |Training time=0.64s (19.54%) |Others=0.19 (5.80%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.41 epoch: 0|step: 1435|ppo_ep: 1|act_loss: 0.01959228515625|cri_loss: 0.030426025390625|unsuper_loss: 0.0 average reward score: 3.5859375 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.24%) |Training time=0.64s (19.78%) |Others=0.19 (5.99%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 1436|ppo_ep: 1|act_loss: -0.1041259765625|cri_loss: -0.03009033203125|unsuper_loss: 0.0 average reward score: 4.0078125 
------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.02%) |Training time=0.64s (19.66%) |Others=0.21 (6.33%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.41 epoch: 0|step: 1437|ppo_ep: 1|act_loss: -0.0173797607421875|cri_loss: 0.0016326904296875|unsuper_loss: 0.0 average reward score: 3.91796875 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.42s (73.93%) |Training time=0.65s (20.01%) |Others=0.20 (6.06%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.41 epoch: 0|step: 1438|ppo_ep: 1|act_loss: 0.0513916015625|cri_loss: 0.04205322265625|unsuper_loss: 0.0 average reward score: 4.4296875 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.30%) |Training time=0.64s (19.69%) |Others=0.20 (6.00%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 [2023-04-24 15:07:13,969] [INFO] [logging.py:96:log_dist] [Rank 0] step=180, skipped=5, lr=[8.73324077812118e-06, 8.73324077812118e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-04-24 15:07:14,217] [INFO] [timer.py:199:stop] epoch=0/micro_step=1440/global_step=180, RunningAvgSamplesPerSec=15.45828454461189, CurrSamplesPerSec=15.77624852748995, MemAllocated=20.44GB, MaxMemAllocated=31.45GB [2023-04-24 15:07:14,468] [INFO] [logging.py:96:log_dist] [Rank 0] step=180, skipped=4, lr=[4.512675132818908e-06, 4.512675132818908e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 1439|ppo_ep: 1|act_loss: -0.0740966796875|cri_loss: -0.0211181640625|unsuper_loss: 0.0 average reward score: 3.55859375 ------------------------------------------------------------------------------------- |E2E latency=3.66s |Gather latency=0.00s (0.00%) |Generate time=2.41s (65.74%) |Training time=0.93s (25.37%) |Others=0.33 (8.89%)|CurSamplesPerSec=2.18 
|AvgSamplesPerSec=2.41 epoch: 0|step: 1440|ppo_ep: 1|act_loss: 0.0217742919921875|cri_loss: 0.01806640625|unsuper_loss: 0.0 average reward score: 3.21484375 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.40s (73.48%) |Training time=0.67s (20.61%) |Others=0.19 (5.91%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.41 epoch: 0|step: 1441|ppo_ep: 1|act_loss: 0.033538818359375|cri_loss: 0.02520751953125|unsuper_loss: 0.0 average reward score: 2.953125 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.41s (73.51%) |Training time=0.68s (20.63%) |Others=0.19 (5.86%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.41 epoch: 0|step: 1442|ppo_ep: 1|act_loss: -0.0005645751953125|cri_loss: 0.0095367431640625|unsuper_loss: 0.0 average reward score: 2.74609375 ------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.41s (72.86%) |Training time=0.70s (21.20%) |Others=0.20 (5.94%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.41 epoch: 0|step: 1443|ppo_ep: 1|act_loss: -0.07879638671875|cri_loss: -0.02978515625|unsuper_loss: 0.0 average reward score: 3.775390625 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.41s (73.31%) |Training time=0.66s (20.19%) |Others=0.21 (6.50%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.41 epoch: 0|step: 1444|ppo_ep: 1|act_loss: 0.043731689453125|cri_loss: 0.036468505859375|unsuper_loss: 0.0 average reward score: 4.11328125 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.40s (72.37%) |Training time=0.70s (21.16%) |Others=0.21 (6.47%)|CurSamplesPerSec=2.41 
|AvgSamplesPerSec=2.41 epoch: 0|step: 1445|ppo_ep: 1|act_loss: 0.04718017578125|cri_loss: 0.0289459228515625|unsuper_loss: 0.0 average reward score: 4.1015625 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.42s (73.95%) |Training time=0.66s (20.22%) |Others=0.19 (5.83%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.41 epoch: 0|step: 1446|ppo_ep: 1|act_loss: 0.04901123046875|cri_loss: 0.036834716796875|unsuper_loss: 0.0 average reward score: 2.818359375 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.01%) |Training time=0.64s (19.94%) |Others=0.20 (6.05%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.41 epoch: 0|step: 1447|ppo_ep: 1|act_loss: 0.05078125|cri_loss: 0.0299530029296875|unsuper_loss: 0.0 average reward score: 4.69921875 ------------------------------------------------------------------------------------- |E2E latency=3.63s |Gather latency=0.00s (0.00%) |Generate time=2.39s (65.99%) |Training time=0.95s (26.31%) |Others=0.28 (7.70%)|CurSamplesPerSec=2.21 |AvgSamplesPerSec=2.41 epoch: 0|step: 1448|ppo_ep: 1|act_loss: -0.00469970703125|cri_loss: 0.0073394775390625|unsuper_loss: 0.0 average reward score: 2.869140625 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.48%) |Training time=0.64s (19.58%) |Others=0.20 (5.94%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.41 epoch: 0|step: 1449|ppo_ep: 1|act_loss: 0.1416015625|cri_loss: 0.08599853515625|unsuper_loss: 0.0 average reward score: 3.63671875 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.34s (72.21%) |Training time=0.71s (21.83%) |Others=0.19 (5.96%)|CurSamplesPerSec=2.47 
|AvgSamplesPerSec=2.41 epoch: 0|step: 1450|ppo_ep: 1|act_loss: 0.405029296875|cri_loss: 0.246337890625|unsuper_loss: 0.0 average reward score: 2.49609375 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.40s (73.78%) |Training time=0.66s (20.20%) |Others=0.20 (6.02%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 1451|ppo_ep: 1|act_loss: 0.145263671875|cri_loss: 0.08392333984375|unsuper_loss: 0.0 average reward score: 2.48046875 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.38%) |Training time=0.65s (19.59%) |Others=0.20 (6.02%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.41 epoch: 0|step: 1452|ppo_ep: 1|act_loss: 0.3544921875|cri_loss: 0.2255859375|unsuper_loss: 0.0 average reward score: 3.19921875 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.36s (73.31%) |Training time=0.64s (19.95%) |Others=0.22 (6.74%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.41 epoch: 0|step: 1453|ppo_ep: 1|act_loss: 0.2269287109375|cri_loss: 0.1337890625|unsuper_loss: 0.0 average reward score: 2.466796875 ------------------------------------------------------------------------------------- |E2E latency=3.18s |Gather latency=0.00s (0.00%) |Generate time=2.33s (73.34%) |Training time=0.65s (20.57%) |Others=0.19 (6.08%)|CurSamplesPerSec=2.51 |AvgSamplesPerSec=2.41 epoch: 0|step: 1454|ppo_ep: 1|act_loss: -0.00318145751953125|cri_loss: 0.00511932373046875|unsuper_loss: 0.0 average reward score: 3.65234375 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.40s (73.95%) |Training time=0.64s (19.78%) |Others=0.20 (6.27%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 
0|step: 1455|ppo_ep: 1|act_loss: 0.02783203125|cri_loss: 0.0237274169921875|unsuper_loss: 0.0 average reward score: 3.734375 ------------------------------------------------------------------------------------- |E2E latency=3.66s |Gather latency=0.00s (0.00%) |Generate time=2.43s (66.44%) |Training time=0.95s (26.06%) |Others=0.27 (7.50%)|CurSamplesPerSec=2.19 |AvgSamplesPerSec=2.41 epoch: 0|step: 1456|ppo_ep: 1|act_loss: 0.1287841796875|cri_loss: 0.07928466796875|unsuper_loss: 0.0 average reward score: 3.828125 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.39s (72.80%) |Training time=0.67s (20.41%) |Others=0.22 (6.79%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.41 epoch: 0|step: 1457|ppo_ep: 1|act_loss: 0.3232421875|cri_loss: 0.188232421875|unsuper_loss: 0.0 average reward score: 4.0078125 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.39s (73.77%) |Training time=0.66s (20.25%) |Others=0.19 (5.99%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 1458|ppo_ep: 1|act_loss: 0.0924072265625|cri_loss: 0.05010986328125|unsuper_loss: 0.0 average reward score: 3.634765625 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.42s (73.98%) |Training time=0.65s (19.93%) |Others=0.20 (6.09%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.41 epoch: 0|step: 1459|ppo_ep: 1|act_loss: -0.04205322265625|cri_loss: -0.0125885009765625|unsuper_loss: 0.0 average reward score: 3.990234375 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.48%) |Training time=0.64s (19.66%) |Others=0.19 (5.86%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.41 epoch: 0|step: 1460|ppo_ep: 
1|act_loss: 0.05145263671875|cri_loss: 0.0364990234375|unsuper_loss: 0.0 average reward score: 3.15625 ------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.06%) |Training time=0.64s (19.48%) |Others=0.21 (6.46%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.41 epoch: 0|step: 1461|ppo_ep: 1|act_loss: 0.07794189453125|cri_loss: 0.046600341796875|unsuper_loss: 0.0 average reward score: 3.8828125 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.31s (71.53%) |Training time=0.72s (22.15%) |Others=0.20 (6.32%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 1462|ppo_ep: 1|act_loss: 0.12432861328125|cri_loss: 0.06927490234375|unsuper_loss: 0.0 average reward score: 3.34765625 ------------------------------------------------------------------------------------- |E2E latency=3.19s |Gather latency=0.00s (0.00%) |Generate time=2.32s (72.70%) |Training time=0.67s (21.05%) |Others=0.20 (6.24%)|CurSamplesPerSec=2.51 |AvgSamplesPerSec=2.41 epoch: 0|step: 1463|ppo_ep: 1|act_loss: 0.089599609375|cri_loss: 0.05120849609375|unsuper_loss: 0.0 average reward score: 3.939453125 ------------------------------------------------------------------------------------- |E2E latency=3.75s |Gather latency=0.00s (0.00%) |Generate time=2.54s (67.59%) |Training time=0.93s (24.79%) |Others=0.29 (7.62%)|CurSamplesPerSec=2.13 |AvgSamplesPerSec=2.41 epoch: 0|step: 1464|ppo_ep: 1|act_loss: -0.04791259765625|cri_loss: -0.0141448974609375|unsuper_loss: 0.0 average reward score: 3.68359375 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.38s (73.57%) |Training time=0.66s (20.29%) |Others=0.20 (6.14%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 1465|ppo_ep: 1|act_loss: 
0.045745849609375|cri_loss: 0.02935791015625|unsuper_loss: 0.0 average reward score: 2.908203125 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.38s (73.18%) |Training time=0.67s (20.50%) |Others=0.21 (6.32%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 1466|ppo_ep: 1|act_loss: 0.146728515625|cri_loss: 0.07952880859375|unsuper_loss: 0.0 average reward score: 4.0859375 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (73.99%) |Training time=0.65s (19.95%) |Others=0.20 (6.06%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 1467|ppo_ep: 1|act_loss: 0.1444091796875|cri_loss: 0.08245849609375|unsuper_loss: 0.0 average reward score: 3.26953125 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.31%) |Training time=0.65s (19.79%) |Others=0.19 (5.90%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.41 epoch: 0|step: 1468|ppo_ep: 1|act_loss: 0.058990478515625|cri_loss: 0.038360595703125|unsuper_loss: 0.0 average reward score: 2.83203125 ------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.65%) |Training time=0.64s (19.44%) |Others=0.19 (5.91%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.41 epoch: 0|step: 1469|ppo_ep: 1|act_loss: 0.019866943359375|cri_loss: 0.021484375|unsuper_loss: 0.0 average reward score: 5.0703125 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.17%) |Training time=0.64s (19.93%) |Others=0.19 (5.90%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.41 epoch: 0|step: 1470|ppo_ep: 1|act_loss: 
0.035919189453125|cri_loss: 0.0309600830078125|unsuper_loss: 0.0 average reward score: 4.234375 ------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.36s (73.67%) |Training time=0.64s (20.07%) |Others=0.20 (6.26%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.41 epoch: 0|step: 1471|ppo_ep: 1|act_loss: -0.0166015625|cri_loss: -0.0015869140625|unsuper_loss: 0.0 average reward score: 4.578125 ------------------------------------------------------------------------------------- |E2E latency=3.61s |Gather latency=0.00s (0.00%) |Generate time=2.41s (66.84%) |Training time=0.92s (25.52%) |Others=0.28 (7.64%)|CurSamplesPerSec=2.22 |AvgSamplesPerSec=2.41 epoch: 0|step: 1472|ppo_ep: 1|act_loss: 0.11669921875|cri_loss: 0.0694580078125|unsuper_loss: 0.0 average reward score: 4.8828125 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.38s (73.80%) |Training time=0.65s (20.09%) |Others=0.20 (6.10%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.41 epoch: 0|step: 1473|ppo_ep: 1|act_loss: -0.08990478515625|cri_loss: -0.0297088623046875|unsuper_loss: 0.0 average reward score: 4.2734375 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.35s (72.63%) |Training time=0.69s (21.34%) |Others=0.20 (6.04%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 1474|ppo_ep: 1|act_loss: 0.248291015625|cri_loss: 0.155517578125|unsuper_loss: 0.0 average reward score: 2.755859375 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.35s (72.23%) |Training time=0.70s (21.69%) |Others=0.20 (6.08%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 1475|ppo_ep: 1|act_loss: 0.0267181396484375|cri_loss: 
0.022308349609375|unsuper_loss: 0.0 average reward score: 4.328125 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.33s (72.11%) |Training time=0.71s (21.97%) |Others=0.19 (5.93%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 1476|ppo_ep: 1|act_loss: 0.0049285888671875|cri_loss: 0.007068634033203125|unsuper_loss: 0.0 average reward score: 3.177734375 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.45s (73.91%) |Training time=0.67s (20.12%) |Others=0.20 (5.97%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.41 epoch: 0|step: 1477|ppo_ep: 1|act_loss: -0.00030517578125|cri_loss: 0.0075531005859375|unsuper_loss: 0.0 average reward score: 4.30078125 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.40s (73.06%) |Training time=0.67s (20.27%) |Others=0.22 (6.67%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.41 epoch: 0|step: 1478|ppo_ep: 1|act_loss: -0.08154296875|cri_loss: -0.017852783203125|unsuper_loss: 0.0 average reward score: 4.328125 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.33%) |Training time=0.65s (19.67%) |Others=0.20 (6.00%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.41 epoch: 0|step: 1479|ppo_ep: 1|act_loss: 0.11285400390625|cri_loss: 0.07183837890625|unsuper_loss: 0.0 average reward score: 3.23828125 ------------------------------------------------------------------------------------- |E2E latency=3.62s |Gather latency=0.00s (0.00%) |Generate time=2.42s (66.89%) |Training time=0.92s (25.52%) |Others=0.28 (7.60%)|CurSamplesPerSec=2.21 |AvgSamplesPerSec=2.41 epoch: 0|step: 1480|ppo_ep: 1|act_loss: 0.030609130859375|cri_loss: 
0.02764892578125|unsuper_loss: 0.0 average reward score: 4.6171875 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.46%) |Training time=0.64s (19.63%) |Others=0.19 (5.91%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 1481|ppo_ep: 1|act_loss: 0.02337646484375|cri_loss: 0.0191497802734375|unsuper_loss: 0.0 average reward score: 4.46875 ------------------------------------------------------------------------------------- |E2E latency=3.54s |Gather latency=0.00s (0.00%) |Generate time=2.70s (76.14%) |Training time=0.65s (18.30%) |Others=0.20 (5.56%)|CurSamplesPerSec=2.26 |AvgSamplesPerSec=2.41 epoch: 0|step: 1482|ppo_ep: 1|act_loss: 0.1429443359375|cri_loss: 0.08551025390625|unsuper_loss: 0.0 average reward score: 3.66015625 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.17%) |Training time=0.64s (19.62%) |Others=0.20 (6.21%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.41 epoch: 0|step: 1483|ppo_ep: 1|act_loss: -0.03692626953125|cri_loss: -0.009979248046875|unsuper_loss: 0.0 average reward score: 3.6484375 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.27%) |Training time=0.64s (19.69%) |Others=0.20 (6.04%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 1484|ppo_ep: 1|act_loss: 0.160400390625|cri_loss: 0.1043701171875|unsuper_loss: 0.0 average reward score: 2.41796875 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.35s (73.38%) |Training time=0.65s (20.41%) |Others=0.20 (6.21%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.41 epoch: 0|step: 1485|ppo_ep: 1|act_loss: -0.0054473876953125|cri_loss: 
0.006988525390625|unsuper_loss: 0.0 average reward score: 3.322265625 ------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.24%) |Training time=0.64s (19.55%) |Others=0.20 (6.21%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.41 epoch: 0|step: 1486|ppo_ep: 1|act_loss: 0.0677490234375|cri_loss: 0.045013427734375|unsuper_loss: 0.0 average reward score: 2.71875 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.39s (73.94%) |Training time=0.64s (19.94%) |Others=0.20 (6.13%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 1487|ppo_ep: 1|act_loss: 0.09716796875|cri_loss: 0.057861328125|unsuper_loss: 0.0 average reward score: 3.189453125 ------------------------------------------------------------------------------------- |E2E latency=3.58s |Gather latency=0.00s (0.00%) |Generate time=2.38s (66.29%) |Training time=0.92s (25.80%) |Others=0.28 (7.91%)|CurSamplesPerSec=2.23 |AvgSamplesPerSec=2.41 epoch: 0|step: 1488|ppo_ep: 1|act_loss: 0.03118896484375|cri_loss: 0.02166748046875|unsuper_loss: 0.0 average reward score: 3.83203125 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.43%) |Training time=0.64s (19.63%) |Others=0.19 (5.94%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 1489|ppo_ep: 1|act_loss: -0.029541015625|cri_loss: -0.0030975341796875|unsuper_loss: 0.0 average reward score: 4.45703125 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.28%) |Training time=0.65s (19.73%) |Others=0.20 (5.99%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.41 epoch: 0|step: 1490|ppo_ep: 1|act_loss: -0.1566162109375|cri_loss: 
-0.06536865234375|unsuper_loss: 0.0 average reward score: 3.142578125 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.47s (74.61%) |Training time=0.64s (19.41%) |Others=0.20 (5.97%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.41 epoch: 0|step: 1491|ppo_ep: 1|act_loss: -0.0867919921875|cri_loss: -0.0291290283203125|unsuper_loss: 0.0 average reward score: 3.75390625 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.47s (74.72%) |Training time=0.64s (19.40%) |Others=0.19 (5.88%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.41 epoch: 0|step: 1492|ppo_ep: 1|act_loss: 0.0107574462890625|cri_loss: 0.0171356201171875|unsuper_loss: 0.0 average reward score: 3.578125 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.39s (73.97%) |Training time=0.64s (19.92%) |Others=0.20 (6.11%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.41 epoch: 0|step: 1493|ppo_ep: 1|act_loss: -0.00042724609375|cri_loss: 0.0140533447265625|unsuper_loss: 0.0 average reward score: 3.869140625 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.01%) |Training time=0.65s (19.48%) |Others=0.22 (6.51%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.41 epoch: 0|step: 1494|ppo_ep: 1|act_loss: 0.140625|cri_loss: 0.0863037109375|unsuper_loss: 0.0 average reward score: 3.26953125 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.39s (73.50%) |Training time=0.67s (20.63%) |Others=0.19 (5.87%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 1495|ppo_ep: 1|act_loss: 0.0153045654296875|cri_loss: 
0.016143798828125|unsuper_loss: 0.0 average reward score: 4.4140625 ------------------------------------------------------------------------------------- |E2E latency=3.67s |Gather latency=0.00s (0.00%) |Generate time=2.46s (67.08%) |Training time=0.93s (25.40%) |Others=0.28 (7.52%)|CurSamplesPerSec=2.18 |AvgSamplesPerSec=2.41 epoch: 0|step: 1496|ppo_ep: 1|act_loss: -0.06640625|cri_loss: -0.02545166015625|unsuper_loss: 0.0 average reward score: 3.6171875 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.42s (73.97%) |Training time=0.65s (19.77%) |Others=0.20 (6.26%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.41 epoch: 0|step: 1497|ppo_ep: 1|act_loss: 0.160888671875|cri_loss: 0.0985107421875|unsuper_loss: 0.0 average reward score: 3.6171875 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.39s (73.61%) |Training time=0.65s (20.03%) |Others=0.21 (6.36%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 1498|ppo_ep: 1|act_loss: -0.11474609375|cri_loss: -0.044708251953125|unsuper_loss: 0.0 average reward score: 2.365234375 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.35s (71.84%) |Training time=0.72s (22.10%) |Others=0.20 (6.05%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.41 epoch: 0|step: 1499|ppo_ep: 1|act_loss: -0.0125732421875|cri_loss: 0.00276947021484375|unsuper_loss: 0.0 average reward score: 3.7109375 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.39s (72.71%) |Training time=0.65s (19.74%) |Others=0.25 (7.55%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.41 epoch: 0|step: 1500|ppo_ep: 1|act_loss: -0.040130615234375|cri_loss: 
-0.00189208984375|unsuper_loss: 0.0 average reward score: 3.8203125 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.33s (71.91%) |Training time=0.71s (21.77%) |Others=0.20 (6.32%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 1501|ppo_ep: 1|act_loss: 0.042999267578125|cri_loss: 0.03387451171875|unsuper_loss: 0.0 average reward score: 4.453125 ------------------------------------------------------------------------------------- |E2E latency=3.17s |Gather latency=0.00s (0.00%) |Generate time=2.28s (71.89%) |Training time=0.70s (22.02%) |Others=0.19 (6.09%)|CurSamplesPerSec=2.52 |AvgSamplesPerSec=2.41 epoch: 0|step: 1502|ppo_ep: 1|act_loss: 0.0186309814453125|cri_loss: 0.0174102783203125|unsuper_loss: 0.0 average reward score: 4.14453125 ------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.02%) |Training time=0.65s (19.79%) |Others=0.20 (6.19%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.41 epoch: 0|step: 1503|ppo_ep: 1|act_loss: -0.0018310546875|cri_loss: 0.0193023681640625|unsuper_loss: 0.0 average reward score: 4.0078125 ------------------------------------------------------------------------------------- |E2E latency=3.67s |Gather latency=0.00s (0.00%) |Generate time=2.46s (67.02%) |Training time=0.93s (25.35%) |Others=0.28 (7.63%)|CurSamplesPerSec=2.18 |AvgSamplesPerSec=2.41 epoch: 0|step: 1504|ppo_ep: 1|act_loss: 0.240234375|cri_loss: 0.14697265625|unsuper_loss: 0.0 average reward score: 3.2734375 ------------------------------------------------------------------------------------- |E2E latency=3.17s |Gather latency=0.00s (0.00%) |Generate time=2.31s (72.98%) |Training time=0.66s (20.92%) |Others=0.19 (6.10%)|CurSamplesPerSec=2.53 |AvgSamplesPerSec=2.41 epoch: 0|step: 1505|ppo_ep: 1|act_loss: -0.041259765625|cri_loss: 
0.002288818359375|unsuper_loss: 0.0 average reward score: 2.58984375 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.31s (71.76%) |Training time=0.71s (22.22%) |Others=0.19 (6.02%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.41 epoch: 0|step: 1506|ppo_ep: 1|act_loss: 0.2445068359375|cri_loss: 0.1466064453125|unsuper_loss: 0.0 average reward score: 2.85546875 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.31s (71.98%) |Training time=0.70s (21.87%) |Others=0.20 (6.15%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.41 epoch: 0|step: 1507|ppo_ep: 1|act_loss: -0.0869140625|cri_loss: -0.022186279296875|unsuper_loss: 0.0 average reward score: 3.96484375 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.30s (70.50%) |Training time=0.77s (23.63%) |Others=0.19 (5.88%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.41 epoch: 0|step: 1508|ppo_ep: 1|act_loss: -0.02996826171875|cri_loss: 0.000579833984375|unsuper_loss: 0.0 average reward score: 3.11328125 ------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.33s (72.71%) |Training time=0.68s (21.16%) |Others=0.20 (6.13%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.41 epoch: 0|step: 1509|ppo_ep: 1|act_loss: 0.1806640625|cri_loss: 0.10009765625|unsuper_loss: 0.0 average reward score: 3.177734375 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.41%) |Training time=0.64s (19.60%) |Others=0.19 (5.98%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 1510|ppo_ep: 1|act_loss: -0.0836181640625|cri_loss: 
-0.020904541015625|unsuper_loss: 0.0 average reward score: 4.203125 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.31s (71.32%) |Training time=0.73s (22.64%) |Others=0.20 (6.04%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 1511|ppo_ep: 1|act_loss: -0.04534912109375|cri_loss: -0.0152740478515625|unsuper_loss: 0.0 average reward score: 4.26171875 ------------------------------------------------------------------------------------- |E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.30s (65.80%) |Training time=0.92s (26.36%) |Others=0.27 (7.83%)|CurSamplesPerSec=2.29 |AvgSamplesPerSec=2.41 epoch: 0|step: 1512|ppo_ep: 1|act_loss: -0.1158447265625|cri_loss: -0.04443359375|unsuper_loss: 0.0 average reward score: 2.953125 ------------------------------------------------------------------------------------- |E2E latency=3.19s |Gather latency=0.00s (0.00%) |Generate time=2.30s (72.15%) |Training time=0.70s (21.94%) |Others=0.19 (5.91%)|CurSamplesPerSec=2.51 |AvgSamplesPerSec=2.41 epoch: 0|step: 1513|ppo_ep: 1|act_loss: -0.037567138671875|cri_loss: -0.01119232177734375|unsuper_loss: 0.0 average reward score: 2.71484375 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.34s (71.96%) |Training time=0.72s (22.08%) |Others=0.19 (5.96%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 1514|ppo_ep: 1|act_loss: -0.046722412109375|cri_loss: -0.014190673828125|unsuper_loss: 0.0 average reward score: 3.732421875 ------------------------------------------------------------------------------------- |E2E latency=3.17s |Gather latency=0.00s (0.00%) |Generate time=2.33s (73.67%) |Training time=0.64s (20.13%) |Others=0.20 (6.20%)|CurSamplesPerSec=2.52 |AvgSamplesPerSec=2.41 epoch: 0|step: 1515|ppo_ep: 1|act_loss: -0.029327392578125|cri_loss: 
-0.0016326904296875|unsuper_loss: 0.0 average reward score: 3.18359375 ------------------------------------------------------------------------------------- |E2E latency=3.18s |Gather latency=0.00s (0.00%) |Generate time=2.35s (73.76%) |Training time=0.64s (20.11%) |Others=0.19 (6.13%)|CurSamplesPerSec=2.52 |AvgSamplesPerSec=2.41 epoch: 0|step: 1516|ppo_ep: 1|act_loss: -0.10467529296875|cri_loss: -0.0098876953125|unsuper_loss: 0.0 average reward score: 2.39453125 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.38s (73.22%) |Training time=0.66s (20.44%) |Others=0.21 (6.34%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 1517|ppo_ep: 1|act_loss: 0.06781005859375|cri_loss: 0.053558349609375|unsuper_loss: 0.0 average reward score: 3.853515625 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.43s (73.24%) |Training time=0.69s (20.82%) |Others=0.20 (5.94%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.41 epoch: 0|step: 1518|ppo_ep: 1|act_loss: -0.1121826171875|cri_loss: -0.04302978515625|unsuper_loss: 0.0 average reward score: 3.033203125 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.35s (72.27%) |Training time=0.70s (21.61%) |Others=0.20 (6.13%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 [2023-04-24 15:11:38,270] [INFO] [logging.py:96:log_dist] [Rank 0] step=190, skipped=5, lr=[8.483465490093025e-06, 8.483465490093025e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-04-24 15:11:38,514] [INFO] [timer.py:199:stop] epoch=0/micro_step=1520/global_step=190, RunningAvgSamplesPerSec=15.44711632582419, CurrSamplesPerSec=14.710117229190399, MemAllocated=20.44GB, MaxMemAllocated=31.45GB [2023-04-24 15:11:38,715] [INFO] [logging.py:96:log_dist] [Rank 0] step=190, 
skipped=4, lr=[4.381893125430629e-06, 4.381893125430629e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 1519|ppo_ep: 1|act_loss: -0.09625244140625|cri_loss: -0.00439453125|unsuper_loss: 0.0 average reward score: 3.01953125 ------------------------------------------------------------------------------------- |E2E latency=3.58s |Gather latency=0.00s (0.00%) |Generate time=2.29s (64.00%) |Training time=1.01s (28.24%) |Others=0.28 (7.76%)|CurSamplesPerSec=2.24 |AvgSamplesPerSec=2.41 epoch: 0|step: 1520|ppo_ep: 1|act_loss: -0.123291015625|cri_loss: -0.041900634765625|unsuper_loss: 0.0 average reward score: 4.15625 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.29s (70.98%) |Training time=0.74s (23.05%) |Others=0.19 (5.97%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.41 epoch: 0|step: 1521|ppo_ep: 1|act_loss: -0.137451171875|cri_loss: -0.042755126953125|unsuper_loss: 0.0 average reward score: 2.19921875 ------------------------------------------------------------------------------------- |E2E latency=3.19s |Gather latency=0.00s (0.00%) |Generate time=2.31s (72.52%) |Training time=0.68s (21.38%) |Others=0.19 (6.10%)|CurSamplesPerSec=2.51 |AvgSamplesPerSec=2.41 epoch: 0|step: 1522|ppo_ep: 1|act_loss: -0.0736083984375|cri_loss: -0.0179443359375|unsuper_loss: 0.0 average reward score: 2.65625 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.48s (74.82%) |Training time=0.64s (19.40%) |Others=0.19 (5.77%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.41 epoch: 0|step: 1523|ppo_ep: 1|act_loss: -0.047821044921875|cri_loss: -0.013671875|unsuper_loss: 0.0 average reward score: 3.853515625 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.19%) |Training time=0.65s 
(19.92%) |Others=0.19 (5.89%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.41 epoch: 0|step: 1524|ppo_ep: 1|act_loss: 0.001434326171875|cri_loss: 0.0269775390625|unsuper_loss: 0.0 average reward score: 3.3125 ------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.34s (72.92%) |Training time=0.67s (20.85%) |Others=0.20 (6.23%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.41 epoch: 0|step: 1525|ppo_ep: 1|act_loss: -0.1351318359375|cri_loss: -0.003173828125|unsuper_loss: 0.0 average reward score: 3.375 ------------------------------------------------------------------------------------- |E2E latency=3.18s |Gather latency=0.00s (0.00%) |Generate time=2.34s (73.48%) |Training time=0.65s (20.48%) |Others=0.19 (6.04%)|CurSamplesPerSec=2.51 |AvgSamplesPerSec=2.41 epoch: 0|step: 1526|ppo_ep: 1|act_loss: -0.006805419921875|cri_loss: 0.020660400390625|unsuper_loss: 0.0 average reward score: 2.462890625 ------------------------------------------------------------------------------------- |E2E latency=3.17s |Gather latency=0.00s (0.00%) |Generate time=2.32s (73.18%) |Training time=0.65s (20.47%) |Others=0.20 (6.35%)|CurSamplesPerSec=2.52 |AvgSamplesPerSec=2.41 epoch: 0|step: 1527|ppo_ep: 1|act_loss: -0.160400390625|cri_loss: -0.0430908203125|unsuper_loss: 0.0 average reward score: 3.228515625 ------------------------------------------------------------------------------------- |E2E latency=3.62s |Gather latency=0.00s (0.00%) |Generate time=2.30s (63.49%) |Training time=1.04s (28.70%) |Others=0.28 (7.81%)|CurSamplesPerSec=2.21 |AvgSamplesPerSec=2.41 epoch: 0|step: 1528|ppo_ep: 1|act_loss: 0.199462890625|cri_loss: 0.11419677734375|unsuper_loss: 0.0 average reward score: 4.37109375 ------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.30s (71.91%) |Training time=0.71s (22.08%) |Others=0.19 
(6.01%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.41 epoch: 0|step: 1529|ppo_ep: 1|act_loss: -0.101806640625|cri_loss: -0.042724609375|unsuper_loss: 0.0 average reward score: 4.1015625 ------------------------------------------------------------------------------------- |E2E latency=3.18s |Gather latency=0.00s (0.00%) |Generate time=2.30s (72.48%) |Training time=0.67s (21.13%) |Others=0.20 (6.39%)|CurSamplesPerSec=2.52 |AvgSamplesPerSec=2.41 epoch: 0|step: 1530|ppo_ep: 1|act_loss: -0.1146240234375|cri_loss: -0.044677734375|unsuper_loss: 0.0 average reward score: 3.431640625 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.30s (71.64%) |Training time=0.71s (22.06%) |Others=0.20 (6.30%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.41 epoch: 0|step: 1531|ppo_ep: 1|act_loss: 0.03668212890625|cri_loss: 0.0316162109375|unsuper_loss: 0.0 average reward score: 2.318359375 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.37s (73.18%) |Training time=0.68s (20.82%) |Others=0.19 (6.00%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 1532|ppo_ep: 1|act_loss: 0.073974609375|cri_loss: 0.0482177734375|unsuper_loss: 0.0 average reward score: 3.21484375 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.32s (72.01%) |Training time=0.70s (21.79%) |Others=0.20 (6.20%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.41 epoch: 0|step: 1533|ppo_ep: 1|act_loss: -0.129638671875|cri_loss: -0.04986572265625|unsuper_loss: 0.0 average reward score: 3.576171875 ------------------------------------------------------------------------------------- |E2E latency=3.18s |Gather latency=0.00s (0.00%) |Generate time=2.31s (72.44%) |Training time=0.68s (21.41%) |Others=0.20 
(6.15%)|CurSamplesPerSec=2.51 |AvgSamplesPerSec=2.41 epoch: 0|step: 1534|ppo_ep: 1|act_loss: -0.009368896484375|cri_loss: 0.0145263671875|unsuper_loss: 0.0 average reward score: 3.087890625 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.30s (71.03%) |Training time=0.74s (22.77%) |Others=0.20 (6.20%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 1535|ppo_ep: 1|act_loss: 0.14306640625|cri_loss: 0.0909423828125|unsuper_loss: 0.0 average reward score: 2.50390625 ------------------------------------------------------------------------------------- |E2E latency=3.52s |Gather latency=0.00s (0.00%) |Generate time=2.30s (65.38%) |Training time=0.94s (26.67%) |Others=0.28 (7.95%)|CurSamplesPerSec=2.28 |AvgSamplesPerSec=2.41 epoch: 0|step: 1536|ppo_ep: 1|act_loss: 0.434814453125|cri_loss: 0.259033203125|unsuper_loss: 0.0 average reward score: 3.521484375 ------------------------------------------------------------------------------------- |E2E latency=3.33s |Gather latency=0.00s (0.00%) |Generate time=2.51s (75.12%) |Training time=0.63s (18.99%) |Others=0.20 (5.88%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.41 epoch: 0|step: 1537|ppo_ep: 1|act_loss: -0.0084228515625|cri_loss: 0.00424957275390625|unsuper_loss: 0.0 average reward score: 3.64453125 ------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.31%) |Training time=0.65s (19.82%) |Others=0.19 (5.87%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.41 epoch: 0|step: 1538|ppo_ep: 1|act_loss: 0.380859375|cri_loss: 0.233642578125|unsuper_loss: 0.0 average reward score: 3.591796875 ------------------------------------------------------------------------------------- |E2E latency=3.19s |Gather latency=0.00s (0.00%) |Generate time=2.33s (73.11%) |Training time=0.67s (20.89%) |Others=0.19 
(6.00%)|CurSamplesPerSec=2.51 |AvgSamplesPerSec=2.41 epoch: 0|step: 1539|ppo_ep: 1|act_loss: 0.10394287109375|cri_loss: 0.06396484375|unsuper_loss: 0.0 average reward score: 3.94140625 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.76%) |Training time=0.64s (19.42%) |Others=0.19 (5.82%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.41 epoch: 0|step: 1540|ppo_ep: 1|act_loss: 0.7509765625|cri_loss: 0.466064453125|unsuper_loss: 0.0 average reward score: 3.021484375 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.27s (69.20%) |Training time=0.81s (24.83%) |Others=0.20 (5.97%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.41 epoch: 0|step: 1541|ppo_ep: 1|act_loss: 0.2252197265625|cri_loss: 0.125244140625|unsuper_loss: 0.0 average reward score: 2.6484375 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.27s (70.14%) |Training time=0.77s (23.76%) |Others=0.20 (6.09%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 1542|ppo_ep: 1|act_loss: 0.0592041015625|cri_loss: 0.041778564453125|unsuper_loss: 0.0 average reward score: 2.982421875 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.30s (70.53%) |Training time=0.76s (23.33%) |Others=0.20 (6.15%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.41 epoch: 0|step: 1543|ppo_ep: 1|act_loss: 0.26708984375|cri_loss: 0.160400390625|unsuper_loss: 0.0 average reward score: 3.703125 ------------------------------------------------------------------------------------- |E2E latency=3.61s |Gather latency=0.00s (0.00%) |Generate time=2.39s (66.20%) |Training time=0.94s (26.04%) |Others=0.28 (7.75%)|CurSamplesPerSec=2.22 
|AvgSamplesPerSec=2.41 epoch: 0|step: 1544|ppo_ep: 1|act_loss: 0.4423828125|cri_loss: 0.271484375|unsuper_loss: 0.0 average reward score: 2.69921875 ------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.34s (72.99%) |Training time=0.67s (20.91%) |Others=0.20 (6.11%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.41 epoch: 0|step: 1545|ppo_ep: 1|act_loss: 0.20751953125|cri_loss: 0.129150390625|unsuper_loss: 0.0 average reward score: 2.435546875 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.32%) |Training time=0.65s (19.75%) |Others=0.20 (5.93%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.41 epoch: 0|step: 1546|ppo_ep: 1|act_loss: 0.3642578125|cri_loss: 0.21240234375|unsuper_loss: 0.0 average reward score: 2.95703125 ------------------------------------------------------------------------------------- |E2E latency=3.17s |Gather latency=0.00s (0.00%) |Generate time=2.28s (72.10%) |Training time=0.69s (21.69%) |Others=0.20 (6.20%)|CurSamplesPerSec=2.53 |AvgSamplesPerSec=2.41 epoch: 0|step: 1547|ppo_ep: 1|act_loss: 0.217041015625|cri_loss: 0.12841796875|unsuper_loss: 0.0 average reward score: 3.79296875 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.24%) |Training time=0.65s (19.80%) |Others=0.20 (5.95%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.41 epoch: 0|step: 1548|ppo_ep: 1|act_loss: 0.299560546875|cri_loss: 0.185302734375|unsuper_loss: 0.0 average reward score: 2.681640625 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.40s (73.40%) |Training time=0.67s (20.32%) |Others=0.21 (6.28%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.41 epoch: 0|step: 
1549|ppo_ep: 1|act_loss: 0.125|cri_loss: 0.07550048828125|unsuper_loss: 0.0 average reward score: 2.84375 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.24%) |Training time=0.65s (19.66%) |Others=0.20 (6.10%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.41 epoch: 0|step: 1550|ppo_ep: 1|act_loss: 0.56787109375|cri_loss: 0.341796875|unsuper_loss: 0.0 average reward score: 3.57421875 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.42s (73.86%) |Training time=0.66s (20.04%) |Others=0.20 (6.10%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.41 epoch: 0|step: 1551|ppo_ep: 1|act_loss: 0.2144775390625|cri_loss: 0.11785888671875|unsuper_loss: 0.0 average reward score: 2.875 ------------------------------------------------------------------------------------- |E2E latency=3.64s |Gather latency=0.00s (0.00%) |Generate time=2.43s (66.66%) |Training time=0.92s (25.32%) |Others=0.29 (8.01%)|CurSamplesPerSec=2.20 |AvgSamplesPerSec=2.41 epoch: 0|step: 1552|ppo_ep: 1|act_loss: -0.05450439453125|cri_loss: -0.021331787109375|unsuper_loss: 0.0 average reward score: 4.00390625 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.48s (74.82%) |Training time=0.64s (19.33%) |Others=0.19 (5.84%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.41 epoch: 0|step: 1553|ppo_ep: 1|act_loss: 0.18896484375|cri_loss: 0.11077880859375|unsuper_loss: 0.0 average reward score: 3.57421875 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.15%) |Training time=0.65s (19.79%) |Others=0.20 (6.06%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.41 epoch: 0|step: 1554|ppo_ep: 1|act_loss: 0.2646484375|cri_loss: 
0.148681640625|unsuper_loss: 0.0 average reward score: 2.533203125 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.39%) |Training time=0.64s (19.58%) |Others=0.20 (6.04%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.41 epoch: 0|step: 1555|ppo_ep: 1|act_loss: 0.171630859375|cri_loss: 0.1077880859375|unsuper_loss: 0.0 average reward score: 4.00390625 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.29%) |Training time=0.64s (19.84%) |Others=0.19 (5.87%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 1556|ppo_ep: 1|act_loss: 0.09515380859375|cri_loss: 0.0731201171875|unsuper_loss: 0.0 average reward score: 3.90234375 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.23%) |Training time=0.64s (19.75%) |Others=0.20 (6.02%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 1557|ppo_ep: 1|act_loss: 0.033294677734375|cri_loss: 0.04486083984375|unsuper_loss: 0.0 average reward score: 2.853515625 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.37%) |Training time=0.64s (19.63%) |Others=0.20 (6.00%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 1558|ppo_ep: 1|act_loss: 0.0938720703125|cri_loss: 0.0745849609375|unsuper_loss: 0.0 average reward score: 3.353515625 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.36%) |Training time=0.64s (19.66%) |Others=0.20 (5.98%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.41 epoch: 0|step: 1559|ppo_ep: 1|act_loss: 0.031463623046875|cri_loss: 
0.0286712646484375|unsuper_loss: 0.0 average reward score: 4.2734375 ------------------------------------------------------------------------------------- |E2E latency=3.68s |Gather latency=0.00s (0.00%) |Generate time=2.48s (67.34%) |Training time=0.93s (25.24%) |Others=0.27 (7.42%)|CurSamplesPerSec=2.17 |AvgSamplesPerSec=2.41 epoch: 0|step: 1560|ppo_ep: 1|act_loss: -0.064453125|cri_loss: -0.018951416015625|unsuper_loss: 0.0 average reward score: 2.91015625 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.78%) |Training time=0.63s (19.36%) |Others=0.19 (5.86%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.41 epoch: 0|step: 1561|ppo_ep: 1|act_loss: -0.14990234375|cri_loss: -0.03826904296875|unsuper_loss: 0.0 average reward score: 4.10546875 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.31%) |Training time=0.64s (19.63%) |Others=0.20 (6.06%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 1562|ppo_ep: 1|act_loss: 0.11669921875|cri_loss: 0.0736083984375|unsuper_loss: 0.0 average reward score: 2.384765625 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.11%) |Training time=0.64s (19.65%) |Others=0.20 (6.24%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 1563|ppo_ep: 1|act_loss: -0.020751953125|cri_loss: 0.027130126953125|unsuper_loss: 0.0 average reward score: 3.017578125 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.59%) |Training time=0.64s (19.49%) |Others=0.19 (5.91%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.41 epoch: 0|step: 1564|ppo_ep: 1|act_loss: 0.21435546875|cri_loss: 
0.1207275390625|unsuper_loss: 0.0 average reward score: 2.126953125 ------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.47s (74.89%) |Training time=0.64s (19.37%) |Others=0.19 (5.74%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.41 epoch: 0|step: 1565|ppo_ep: 1|act_loss: 0.134765625|cri_loss: 0.0869140625|unsuper_loss: 0.0 average reward score: 3.1328125 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.43%) |Training time=0.64s (19.66%) |Others=0.19 (5.91%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 1566|ppo_ep: 1|act_loss: 0.231201171875|cri_loss: 0.1361083984375|unsuper_loss: 0.0 average reward score: 1.833984375 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.47s (74.63%) |Training time=0.64s (19.24%) |Others=0.20 (6.13%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.41 epoch: 0|step: 1567|ppo_ep: 1|act_loss: -0.26708984375|cri_loss: -0.09490966796875|unsuper_loss: 0.0 average reward score: 4.125 ------------------------------------------------------------------------------------- |E2E latency=3.64s |Gather latency=0.00s (0.00%) |Generate time=2.44s (67.12%) |Training time=0.92s (25.31%) |Others=0.28 (7.57%)|CurSamplesPerSec=2.20 |AvgSamplesPerSec=2.41 epoch: 0|step: 1568|ppo_ep: 1|act_loss: -0.14794921875|cri_loss: -0.048431396484375|unsuper_loss: 0.0 average reward score: 2.728515625 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.30%) |Training time=0.64s (19.67%) |Others=0.20 (6.03%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 1569|ppo_ep: 1|act_loss: -0.0751953125|cri_loss: -0.00250244140625|unsuper_loss: 0.0 
average reward score: 2.984375 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.25%) |Training time=0.64s (19.73%) |Others=0.20 (6.02%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 1570|ppo_ep: 1|act_loss: -0.0740966796875|cri_loss: -0.02069091796875|unsuper_loss: 0.0 average reward score: 2.890625 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.06%) |Training time=0.64s (19.67%) |Others=0.20 (6.27%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.41 epoch: 0|step: 1571|ppo_ep: 1|act_loss: 0.4013671875|cri_loss: 0.238525390625|unsuper_loss: 0.0 average reward score: 2.201171875 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.25%) |Training time=0.64s (19.73%) |Others=0.20 (6.02%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 1572|ppo_ep: 1|act_loss: 0.2763671875|cri_loss: 0.1639404296875|unsuper_loss: 0.0 average reward score: 2.4296875 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.45%) |Training time=0.64s (19.58%) |Others=0.20 (5.97%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.41 epoch: 0|step: 1573|ppo_ep: 1|act_loss: -0.18896484375|cri_loss: -0.067138671875|unsuper_loss: 0.0 average reward score: 3.291015625 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.30%) |Training time=0.64s (19.79%) |Others=0.19 (5.91%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 1574|ppo_ep: 1|act_loss: -0.059356689453125|cri_loss: -0.00543212890625|unsuper_loss: 0.0 average reward score: 3.37109375 
------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.37%) |Training time=0.64s (19.79%) |Others=0.19 (5.84%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 1575|ppo_ep: 1|act_loss: 0.269287109375|cri_loss: 0.157470703125|unsuper_loss: 0.0 average reward score: 2.080078125 ------------------------------------------------------------------------------------- |E2E latency=3.58s |Gather latency=0.00s (0.00%) |Generate time=2.37s (66.33%) |Training time=0.92s (25.82%) |Others=0.28 (7.85%)|CurSamplesPerSec=2.24 |AvgSamplesPerSec=2.41 epoch: 0|step: 1576|ppo_ep: 1|act_loss: 0.27197265625|cri_loss: 0.166259765625|unsuper_loss: 0.0 average reward score: 2.36328125 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.25%) |Training time=0.64s (19.77%) |Others=0.19 (5.99%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 1577|ppo_ep: 1|act_loss: 0.212646484375|cri_loss: 0.1240234375|unsuper_loss: 0.0 average reward score: 3.091796875 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.33%) |Training time=0.64s (19.78%) |Others=0.19 (5.89%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.41 epoch: 0|step: 1578|ppo_ep: 1|act_loss: 0.556640625|cri_loss: 0.326171875|unsuper_loss: 0.0 average reward score: 3.3515625 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.43%) |Training time=0.64s (19.72%) |Others=0.19 (5.85%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 1579|ppo_ep: 1|act_loss: 0.43017578125|cri_loss: 0.246826171875|unsuper_loss: 0.0 average reward score: 1.787109375 
------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.29%) |Training time=0.64s (19.76%) |Others=0.19 (5.95%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 1580|ppo_ep: 1|act_loss: 0.30322265625|cri_loss: 0.199462890625|unsuper_loss: 0.0 average reward score: 1.5087890625 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.16%) |Training time=0.64s (19.81%) |Others=0.20 (6.03%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 1581|ppo_ep: 1|act_loss: 0.3359375|cri_loss: 0.215087890625|unsuper_loss: 0.0 average reward score: 3.57421875 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.33%) |Training time=0.64s (19.80%) |Others=0.19 (5.87%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 1582|ppo_ep: 1|act_loss: -0.000274658203125|cri_loss: 0.014495849609375|unsuper_loss: 0.0 average reward score: 3.650390625 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.07%) |Training time=0.64s (19.71%) |Others=0.20 (6.21%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 1583|ppo_ep: 1|act_loss: 0.2294921875|cri_loss: 0.12890625|unsuper_loss: 0.0 average reward score: 2.396484375 ------------------------------------------------------------------------------------- |E2E latency=3.61s |Gather latency=0.00s (0.00%) |Generate time=2.42s (66.91%) |Training time=0.92s (25.51%) |Others=0.27 (7.58%)|CurSamplesPerSec=2.22 |AvgSamplesPerSec=2.41 epoch: 0|step: 1584|ppo_ep: 1|act_loss: 0.5439453125|cri_loss: 0.33251953125|unsuper_loss: 0.0 average reward score: 2.7578125 
------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.40s (73.09%) |Training time=0.70s (21.20%) |Others=0.19 (5.71%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.41 epoch: 0|step: 1585|ppo_ep: 1|act_loss: 0.3916015625|cri_loss: 0.2236328125|unsuper_loss: 0.0 average reward score: 3.01953125 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.43s (73.38%) |Training time=0.69s (20.76%) |Others=0.19 (5.86%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.41 epoch: 0|step: 1586|ppo_ep: 1|act_loss: 0.59619140625|cri_loss: 0.343017578125|unsuper_loss: 0.0 average reward score: 3.796875 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.12%) |Training time=0.64s (19.83%) |Others=0.20 (6.04%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 1587|ppo_ep: 1|act_loss: 0.4599609375|cri_loss: 0.30224609375|unsuper_loss: 0.0 average reward score: 2.919921875 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.21%) |Training time=0.64s (19.66%) |Others=0.20 (6.13%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 1588|ppo_ep: 1|act_loss: 0.1412353515625|cri_loss: 0.0899658203125|unsuper_loss: 0.0 average reward score: 3.29296875 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.45%) |Training time=0.64s (19.67%) |Others=0.19 (5.88%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.41 epoch: 0|step: 1589|ppo_ep: 1|act_loss: 0.5341796875|cri_loss: 0.319580078125|unsuper_loss: 0.0 average reward score: 2.7734375 
------------------------------------------------------------------------------------- |E2E latency=3.35s |Gather latency=0.00s (0.00%) |Generate time=2.42s (72.19%) |Training time=0.74s (22.04%) |Others=0.19 (5.77%)|CurSamplesPerSec=2.39 |AvgSamplesPerSec=2.41 epoch: 0|step: 1590|ppo_ep: 1|act_loss: 0.01094818115234375|cri_loss: 0.023895263671875|unsuper_loss: 0.0 average reward score: 3.0546875 ------------------------------------------------------------------------------------- |E2E latency=3.39s |Gather latency=0.00s (0.00%) |Generate time=2.47s (72.99%) |Training time=0.72s (21.25%) |Others=0.20 (5.77%)|CurSamplesPerSec=2.36 |AvgSamplesPerSec=2.41 epoch: 0|step: 1591|ppo_ep: 1|act_loss: 0.62353515625|cri_loss: 0.363037109375|unsuper_loss: 0.0 average reward score: 1.75390625 ------------------------------------------------------------------------------------- |E2E latency=3.66s |Gather latency=0.00s (0.00%) |Generate time=2.35s (64.25%) |Training time=1.03s (28.16%) |Others=0.28 (7.60%)|CurSamplesPerSec=2.18 |AvgSamplesPerSec=2.41 epoch: 0|step: 1592|ppo_ep: 1|act_loss: 1.0322265625|cri_loss: 0.6533203125|unsuper_loss: 0.0 average reward score: 2.298828125 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.55%) |Training time=0.63s (19.52%) |Others=0.19 (5.93%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 1593|ppo_ep: 1|act_loss: 0.392578125|cri_loss: 0.2445068359375|unsuper_loss: 0.0 average reward score: 3.15625 ------------------------------------------------------------------------------------- |E2E latency=3.40s |Gather latency=0.00s (0.00%) |Generate time=2.33s (68.54%) |Training time=0.87s (25.48%) |Others=0.20 (5.98%)|CurSamplesPerSec=2.35 |AvgSamplesPerSec=2.41 epoch: 0|step: 1594|ppo_ep: 1|act_loss: 0.1531982421875|cri_loss: 0.09881591796875|unsuper_loss: 0.0 average reward score: 2.515625 
------------------------------------------------------------------------------------- |E2E latency=3.18s |Gather latency=0.00s (0.00%) |Generate time=2.34s (73.60%) |Training time=0.64s (20.10%) |Others=0.20 (6.30%)|CurSamplesPerSec=2.51 |AvgSamplesPerSec=2.41 epoch: 0|step: 1595|ppo_ep: 1|act_loss: 0.23291015625|cri_loss: 0.1405029296875|unsuper_loss: 0.0 average reward score: 2.54296875 ------------------------------------------------------------------------------------- |E2E latency=3.16s |Gather latency=0.00s (0.00%) |Generate time=2.32s (73.27%) |Training time=0.65s (20.52%) |Others=0.20 (6.21%)|CurSamplesPerSec=2.53 |AvgSamplesPerSec=2.41 epoch: 0|step: 1596|ppo_ep: 1|act_loss: 0.51416015625|cri_loss: 0.31103515625|unsuper_loss: 0.0 average reward score: 1.111328125 ------------------------------------------------------------------------------------- |E2E latency=3.15s |Gather latency=0.00s (0.00%) |Generate time=2.30s (73.18%) |Training time=0.64s (20.46%) |Others=0.20 (6.36%)|CurSamplesPerSec=2.54 |AvgSamplesPerSec=2.41 epoch: 0|step: 1597|ppo_ep: 1|act_loss: 0.37890625|cri_loss: 0.21875|unsuper_loss: 0.0 average reward score: 2.388671875 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.05%) |Training time=0.64s (19.92%) |Others=0.20 (6.03%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 1598|ppo_ep: 1|act_loss: 0.093505859375|cri_loss: 0.06890869140625|unsuper_loss: 0.0 average reward score: 2.26171875 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.35%) |Training time=0.64s (19.68%) |Others=0.19 (5.97%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42 [2023-04-24 15:16:02,099] [INFO] [logging.py:96:log_dist] [Rank 0] step=200, skipped=5, lr=[8.208164931807571e-06, 8.208164931807571e-06], mom=[(0.9, 0.95), (0.9, 0.95)] 
[2023-04-24 15:16:02,349] [INFO] [timer.py:199:stop] epoch=0/micro_step=1600/global_step=200, RunningAvgSamplesPerSec=15.432047276479876, CurrSamplesPerSec=14.963864086971157, MemAllocated=20.44GB, MaxMemAllocated=31.45GB [2023-04-24 15:16:02,549] [INFO] [logging.py:96:log_dist] [Rank 0] step=200, skipped=4, lr=[4.237981069186606e-06, 4.237981069186606e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 1599|ppo_ep: 1|act_loss: -0.1539306640625|cri_loss: -0.05517578125|unsuper_loss: 0.0 average reward score: 4.7734375 ------------------------------------------------------------------------------------- |E2E latency=3.61s |Gather latency=0.00s (0.00%) |Generate time=2.41s (66.75%) |Training time=0.92s (25.64%) |Others=0.27 (7.61%)|CurSamplesPerSec=2.22 |AvgSamplesPerSec=2.41 epoch: 0|step: 1600|ppo_ep: 1|act_loss: 0.4384765625|cri_loss: 0.287109375|unsuper_loss: 0.0 average reward score: 3.26171875 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.59%) |Training time=0.63s (19.45%) |Others=0.19 (5.96%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.41 epoch: 0|step: 1601|ppo_ep: 1|act_loss: 0.17041015625|cri_loss: 0.1080322265625|unsuper_loss: 0.0 average reward score: 1.6767578125 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.22%) |Training time=0.64s (19.78%) |Others=0.19 (5.99%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 1602|ppo_ep: 1|act_loss: -0.0899658203125|cri_loss: -0.00982666015625|unsuper_loss: 0.0 average reward score: 1.7666015625 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.38%) |Training time=0.64s (19.63%) |Others=0.20 (5.99%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.41 epoch: 0|step: 1603|ppo_ep: 
1|act_loss: -0.08380126953125|cri_loss: -0.00701904296875|unsuper_loss: 0.0 average reward score: 2.50390625 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.25%) |Training time=0.64s (19.81%) |Others=0.19 (5.94%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 1604|ppo_ep: 1|act_loss: -0.19921875|cri_loss: -0.0391845703125|unsuper_loss: 0.0 average reward score: 1.59375 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.38%) |Training time=0.64s (19.67%) |Others=0.19 (5.95%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42 epoch: 0|step: 1605|ppo_ep: 1|act_loss: 0.03936767578125|cri_loss: 0.0716552734375|unsuper_loss: 0.0 average reward score: 3.6484375 ------------------------------------------------------------------------------------- |E2E latency=3.36s |Gather latency=0.00s (0.00%) |Generate time=2.53s (75.26%) |Training time=0.64s (19.06%) |Others=0.19 (5.67%)|CurSamplesPerSec=2.38 |AvgSamplesPerSec=2.41 epoch: 0|step: 1606|ppo_ep: 1|act_loss: -0.059356689453125|cri_loss: 0.00146484375|unsuper_loss: 0.0 average reward score: 3.23828125 ------------------------------------------------------------------------------------- |E2E latency=3.35s |Gather latency=0.00s (0.00%) |Generate time=2.48s (74.26%) |Training time=0.64s (19.21%) |Others=0.22 (6.52%)|CurSamplesPerSec=2.39 |AvgSamplesPerSec=2.41 epoch: 0|step: 1607|ppo_ep: 1|act_loss: -0.001251220703125|cri_loss: 0.01029205322265625|unsuper_loss: 0.0 average reward score: 2.841796875 ------------------------------------------------------------------------------------- |E2E latency=3.60s |Gather latency=0.00s (0.00%) |Generate time=2.40s (66.52%) |Training time=0.93s (25.83%) |Others=0.28 (7.64%)|CurSamplesPerSec=2.22 |AvgSamplesPerSec=2.41 epoch: 0|step: 1608|ppo_ep: 1|act_loss: 
-0.203857421875|cri_loss: -0.05145263671875|unsuper_loss: 0.0 average reward score: 3.037109375 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.36%) |Training time=0.64s (19.83%) |Others=0.19 (5.82%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.41 epoch: 0|step: 1609|ppo_ep: 1|act_loss: -0.18701171875|cri_loss: -0.05242919921875|unsuper_loss: 0.0 average reward score: 1.849609375 ------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.40s (72.89%) |Training time=0.70s (21.22%) |Others=0.19 (5.90%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.41 epoch: 0|step: 1610|ppo_ep: 1|act_loss: -0.359130859375|cri_loss: -0.1239013671875|unsuper_loss: 0.0 average reward score: 2.83984375 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.36%) |Training time=0.64s (19.72%) |Others=0.19 (5.92%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 1611|ppo_ep: 1|act_loss: -0.10394287109375|cri_loss: 0.00384521484375|unsuper_loss: 0.0 average reward score: 3.67578125 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.43%) |Training time=0.64s (19.64%) |Others=0.19 (5.92%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 1612|ppo_ep: 1|act_loss: -0.5830078125|cri_loss: -0.20263671875|unsuper_loss: 0.0 average reward score: 3.54296875 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.34%) |Training time=0.64s (19.70%) |Others=0.19 (5.96%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 1613|ppo_ep: 1|act_loss: 
-0.16650390625|cri_loss: -0.06787109375|unsuper_loss: 0.0 average reward score: 2.837890625 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.24%) |Training time=0.65s (19.87%) |Others=0.19 (5.88%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42 epoch: 0|step: 1614|ppo_ep: 1|act_loss: 0.291015625|cri_loss: 0.174560546875|unsuper_loss: 0.0 average reward score: 2.63671875 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.27%) |Training time=0.64s (19.76%) |Others=0.19 (5.96%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42 epoch: 0|step: 1615|ppo_ep: 1|act_loss: 0.29052734375|cri_loss: 0.1650390625|unsuper_loss: 0.0 average reward score: 2.28515625 ------------------------------------------------------------------------------------- |E2E latency=3.67s |Gather latency=0.00s (0.00%) |Generate time=2.47s (67.27%) |Training time=0.92s (25.20%) |Others=0.28 (7.53%)|CurSamplesPerSec=2.18 |AvgSamplesPerSec=2.41 epoch: 0|step: 1616|ppo_ep: 1|act_loss: -0.456787109375|cri_loss: -0.13720703125|unsuper_loss: 0.0 average reward score: 0.9287109375 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.26%) |Training time=0.64s (19.85%) |Others=0.19 (5.89%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.41 epoch: 0|step: 1617|ppo_ep: 1|act_loss: -0.016693115234375|cri_loss: 0.016265869140625|unsuper_loss: 0.0 average reward score: 2.302734375 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.23%) |Training time=0.64s (19.62%) |Others=0.20 (6.15%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.41 epoch: 0|step: 1618|ppo_ep: 1|act_loss: 0.197021484375|cri_loss: 
0.1258544921875|unsuper_loss: 0.0 average reward score: 4.1171875 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.18%) |Training time=0.64s (19.68%) |Others=0.20 (6.14%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 1619|ppo_ep: 1|act_loss: -0.2998046875|cri_loss: -0.11846923828125|unsuper_loss: 0.0 average reward score: 3.525390625 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.27%) |Training time=0.65s (19.90%) |Others=0.19 (5.84%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.41 epoch: 0|step: 1620|ppo_ep: 1|act_loss: -0.0927734375|cri_loss: -0.0260009765625|unsuper_loss: 0.0 average reward score: 3.486328125 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.38s (74.07%) |Training time=0.64s (19.89%) |Others=0.19 (6.03%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.42 epoch: 0|step: 1621|ppo_ep: 1|act_loss: 0.002197265625|cri_loss: 0.0706787109375|unsuper_loss: 0.0 average reward score: 2.25 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.37s (73.95%) |Training time=0.65s (20.14%) |Others=0.19 (5.90%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.42 epoch: 0|step: 1622|ppo_ep: 1|act_loss: -0.0594482421875|cri_loss: -0.0160064697265625|unsuper_loss: 0.0 average reward score: 1.650390625 ------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.38s (74.18%) |Training time=0.64s (19.93%) |Others=0.19 (5.89%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.42 epoch: 0|step: 1623|ppo_ep: 1|act_loss: -0.321533203125|cri_loss: -0.08935546875|unsuper_loss: 
0.0 average reward score: 3.013671875 ------------------------------------------------------------------------------------- |E2E latency=3.57s |Gather latency=0.00s (0.00%) |Generate time=2.37s (66.38%) |Training time=0.93s (25.94%) |Others=0.27 (7.68%)|CurSamplesPerSec=2.24 |AvgSamplesPerSec=2.42 epoch: 0|step: 1624|ppo_ep: 1|act_loss: 0.4853515625|cri_loss: 0.3203125|unsuper_loss: 0.0 average reward score: 3.208984375 ------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.37s (74.25%) |Training time=0.63s (19.83%) |Others=0.19 (5.93%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.42 epoch: 0|step: 1625|ppo_ep: 1|act_loss: -0.040374755859375|cri_loss: 0.0067138671875|unsuper_loss: 0.0 average reward score: 2.22265625 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.37s (74.02%) |Training time=0.64s (20.05%) |Others=0.19 (5.92%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.42 epoch: 0|step: 1626|ppo_ep: 1|act_loss: -0.1884765625|cri_loss: -0.0657958984375|unsuper_loss: 0.0 average reward score: 3.541015625 ------------------------------------------------------------------------------------- |E2E latency=3.53s |Gather latency=0.00s (0.00%) |Generate time=2.36s (66.98%) |Training time=0.68s (19.37%) |Others=0.48 (13.65%)|CurSamplesPerSec=2.27 |AvgSamplesPerSec=2.42 epoch: 0|step: 1627|ppo_ep: 1|act_loss: 0.11871337890625|cri_loss: 0.07501220703125|unsuper_loss: 0.0 average reward score: 2.921875 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.42%) |Training time=0.64s (19.68%) |Others=0.19 (5.90%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.42 epoch: 0|step: 1628|ppo_ep: 1|act_loss: -0.184814453125|cri_loss: -0.0487060546875|unsuper_loss: 0.0 average reward score: 
2.99609375 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.10%) |Training time=0.64s (19.87%) |Others=0.19 (6.03%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.42 epoch: 0|step: 1629|ppo_ep: 1|act_loss: 0.255859375|cri_loss: 0.1591796875|unsuper_loss: 0.0 average reward score: 2.91015625 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.27%) |Training time=0.64s (19.75%) |Others=0.19 (5.98%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.42 epoch: 0|step: 1630|ppo_ep: 1|act_loss: -0.1396484375|cri_loss: -0.031982421875|unsuper_loss: 0.0 average reward score: 2.9375 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.09%) |Training time=0.64s (19.90%) |Others=0.19 (6.01%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.42 epoch: 0|step: 1631|ppo_ep: 1|act_loss: 0.6435546875|cri_loss: 0.39111328125|unsuper_loss: 0.0 average reward score: 2.6796875 ------------------------------------------------------------------------------------- |E2E latency=3.58s |Gather latency=0.00s (0.00%) |Generate time=2.39s (66.80%) |Training time=0.92s (25.62%) |Others=0.27 (7.58%)|CurSamplesPerSec=2.24 |AvgSamplesPerSec=2.42 epoch: 0|step: 1632|ppo_ep: 1|act_loss: 0.0|cri_loss: 0.01605224609375|unsuper_loss: 0.0 average reward score: 1.9580078125 ------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.38s (74.36%) |Training time=0.63s (19.78%) |Others=0.19 (5.85%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.42 epoch: 0|step: 1633|ppo_ep: 1|act_loss: 0.358154296875|cri_loss: 0.2181396484375|unsuper_loss: 0.0 average reward score: 1.791015625 
------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.32%) |Training time=0.64s (19.81%) |Others=0.19 (5.87%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.42 epoch: 0|step: 1634|ppo_ep: 1|act_loss: 0.1854248046875|cri_loss: 0.1207275390625|unsuper_loss: 0.0 average reward score: 2.90234375 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.28%) |Training time=0.64s (19.84%) |Others=0.19 (5.89%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.42 epoch: 0|step: 1635|ppo_ep: 1|act_loss: 0.135498046875|cri_loss: 0.10113525390625|unsuper_loss: 0.0 average reward score: 1.92578125 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.28%) |Training time=0.63s (19.70%) |Others=0.19 (6.01%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.42 epoch: 0|step: 1636|ppo_ep: 1|act_loss: 0.5322265625|cri_loss: 0.331787109375|unsuper_loss: 0.0 average reward score: 3.1015625 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.48s (74.87%) |Training time=0.64s (19.27%) |Others=0.19 (5.87%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.42 epoch: 0|step: 1637|ppo_ep: 1|act_loss: 0.2366943359375|cri_loss: 0.155517578125|unsuper_loss: 0.0 average reward score: 1.6142578125 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.41s (73.67%) |Training time=0.64s (19.58%) |Others=0.22 (6.74%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42 epoch: 0|step: 1638|ppo_ep: 1|act_loss: 0.060150146484375|cri_loss: 0.04791259765625|unsuper_loss: 0.0 average reward score: 2.8359375 
------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.41s (73.59%) |Training time=0.67s (20.50%) |Others=0.19 (5.91%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.42 epoch: 0|step: 1639|ppo_ep: 1|act_loss: 0.143310546875|cri_loss: 0.0880126953125|unsuper_loss: 0.0 average reward score: 1.4580078125 ------------------------------------------------------------------------------------- |E2E latency=3.61s |Gather latency=0.00s (0.00%) |Generate time=2.40s (66.48%) |Training time=0.94s (25.94%) |Others=0.27 (7.58%)|CurSamplesPerSec=2.22 |AvgSamplesPerSec=2.42 epoch: 0|step: 1640|ppo_ep: 1|act_loss: 0.4345703125|cri_loss: 0.2744140625|unsuper_loss: 0.0 average reward score: 2.318359375 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.41%) |Training time=0.64s (19.61%) |Others=0.20 (5.98%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42 epoch: 0|step: 1641|ppo_ep: 1|act_loss: 0.262451171875|cri_loss: 0.15234375|unsuper_loss: 0.0 average reward score: 3.087890625 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.60%) |Training time=0.64s (19.52%) |Others=0.19 (5.89%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42 epoch: 0|step: 1642|ppo_ep: 1|act_loss: -0.0911865234375|cri_loss: -0.029571533203125|unsuper_loss: 0.0 average reward score: 3.708984375 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.24%) |Training time=0.64s (19.56%) |Others=0.20 (6.20%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42 epoch: 0|step: 1643|ppo_ep: 1|act_loss: 0.0560302734375|cri_loss: 0.08612060546875|unsuper_loss: 0.0 average reward score: 2.81640625 
------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.30%) |Training time=0.64s (19.58%) |Others=0.20 (6.12%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42 epoch: 0|step: 1644|ppo_ep: 1|act_loss: 0.2305908203125|cri_loss: 0.148193359375|unsuper_loss: 0.0 average reward score: 3.4921875 ------------------------------------------------------------------------------------- |E2E latency=3.34s |Gather latency=0.00s (0.00%) |Generate time=2.50s (74.81%) |Training time=0.65s (19.34%) |Others=0.20 (5.85%)|CurSamplesPerSec=2.39 |AvgSamplesPerSec=2.42 epoch: 0|step: 1645|ppo_ep: 1|act_loss: 0.237060546875|cri_loss: 0.163818359375|unsuper_loss: 0.0 average reward score: 2.5703125 ------------------------------------------------------------------------------------- |E2E latency=3.42s |Gather latency=0.00s (0.00%) |Generate time=2.59s (75.68%) |Training time=0.64s (18.73%) |Others=0.19 (5.58%)|CurSamplesPerSec=2.34 |AvgSamplesPerSec=2.42 epoch: 0|step: 1646|ppo_ep: 1|act_loss: -0.0184326171875|cri_loss: 0.0291748046875|unsuper_loss: 0.0 average reward score: 4.296875 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.46%) |Training time=0.64s (19.66%) |Others=0.19 (5.88%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42 epoch: 0|step: 1647|ppo_ep: 1|act_loss: 0.1063232421875|cri_loss: 0.07080078125|unsuper_loss: 0.0 average reward score: 3.19921875 ------------------------------------------------------------------------------------- |E2E latency=3.60s |Gather latency=0.00s (0.00%) |Generate time=2.40s (66.71%) |Training time=0.92s (25.62%) |Others=0.28 (7.68%)|CurSamplesPerSec=2.22 |AvgSamplesPerSec=2.42 epoch: 0|step: 1648|ppo_ep: 1|act_loss: -0.40478515625|cri_loss: -0.157470703125|unsuper_loss: 0.0 average reward score: 3.017578125 
------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.23%) |Training time=0.64s (19.79%) |Others=0.19 (5.98%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.42 epoch: 0|step: 1649|ppo_ep: 1|act_loss: 0.013092041015625|cri_loss: 0.030303955078125|unsuper_loss: 0.0 average reward score: 3.48828125 ------------------------------------------------------------------------------------- |E2E latency=3.18s |Gather latency=0.00s (0.00%) |Generate time=2.35s (73.84%) |Training time=0.64s (20.19%) |Others=0.19 (5.97%)|CurSamplesPerSec=2.52 |AvgSamplesPerSec=2.42 epoch: 0|step: 1650|ppo_ep: 1|act_loss: -0.309814453125|cri_loss: -0.1231689453125|unsuper_loss: 0.0 average reward score: 3.81640625 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.19%) |Training time=0.64s (19.85%) |Others=0.19 (5.96%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42 epoch: 0|step: 1651|ppo_ep: 1|act_loss: -0.146728515625|cri_loss: -0.0025634765625|unsuper_loss: 0.0 average reward score: 2.232421875 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.15%) |Training time=0.64s (19.90%) |Others=0.19 (5.95%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.42 epoch: 0|step: 1652|ppo_ep: 1|act_loss: -0.340576171875|cri_loss: -0.128173828125|unsuper_loss: 0.0 average reward score: 2.03125 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.41s (73.91%) |Training time=0.66s (20.12%) |Others=0.19 (5.96%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42 epoch: 0|step: 1653|ppo_ep: 1|act_loss: -0.4443359375|cri_loss: -0.1708984375|unsuper_loss: 0.0 average reward score: 3.486328125 
------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.23%) |Training time=0.64s (19.82%) |Others=0.19 (5.95%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42 epoch: 0|step: 1654|ppo_ep: 1|act_loss: 0.07861328125|cri_loss: 0.11865234375|unsuper_loss: 0.0 average reward score: 2.271484375 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.09%) |Training time=0.64s (19.77%) |Others=0.20 (6.14%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42 epoch: 0|step: 1655|ppo_ep: 1|act_loss: -0.0523681640625|cri_loss: -0.005615234375|unsuper_loss: 0.0 average reward score: 2.603515625 ------------------------------------------------------------------------------------- |E2E latency=3.60s |Gather latency=0.00s (0.00%) |Generate time=2.40s (66.54%) |Training time=0.92s (25.66%) |Others=0.28 (7.79%)|CurSamplesPerSec=2.22 |AvgSamplesPerSec=2.42 epoch: 0|step: 1656|ppo_ep: 1|act_loss: -0.29248046875|cri_loss: -0.08782958984375|unsuper_loss: 0.0 average reward score: 3.771484375 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.35%) |Training time=0.64s (19.80%) |Others=0.19 (5.85%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.42 epoch: 0|step: 1657|ppo_ep: 1|act_loss: -0.40478515625|cri_loss: -0.14501953125|unsuper_loss: 0.0 average reward score: 2.669921875 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.38s (74.01%) |Training time=0.64s (20.07%) |Others=0.19 (5.93%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.42 epoch: 0|step: 1658|ppo_ep: 1|act_loss: -0.11029052734375|cri_loss: -0.011962890625|unsuper_loss: 0.0 average reward score: 1.99609375 
------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.11%) |Training time=0.66s (19.92%) |Others=0.20 (5.97%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.42 epoch: 0|step: 1659|ppo_ep: 1|act_loss: -0.11181640625|cri_loss: -0.0291748046875|unsuper_loss: 0.0 average reward score: 2.84765625 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.35%) |Training time=0.64s (19.73%) |Others=0.19 (5.93%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42 epoch: 0|step: 1660|ppo_ep: 1|act_loss: -0.443359375|cri_loss: -0.169189453125|unsuper_loss: 0.0 average reward score: 2.83984375 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.36%) |Training time=0.64s (19.76%) |Others=0.19 (5.88%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42 epoch: 0|step: 1661|ppo_ep: 1|act_loss: -0.1201171875|cri_loss: -0.030181884765625|unsuper_loss: 0.0 average reward score: 1.880859375 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.31%) |Training time=0.64s (19.74%) |Others=0.19 (5.95%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42 epoch: 0|step: 1662|ppo_ep: 1|act_loss: -0.3603515625|cri_loss: -0.1197509765625|unsuper_loss: 0.0 average reward score: 2.072265625 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.37s (74.07%) |Training time=0.64s (20.01%) |Others=0.19 (5.92%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.42 epoch: 0|step: 1663|ppo_ep: 1|act_loss: -0.4541015625|cri_loss: -0.1451416015625|unsuper_loss: 0.0 average reward score: 2.126953125 
------------------------------------------------------------------------------------- |E2E latency=3.67s |Gather latency=0.00s (0.00%) |Generate time=2.44s (66.61%) |Training time=0.94s (25.61%) |Others=0.29 (7.78%)|CurSamplesPerSec=2.18 |AvgSamplesPerSec=2.42 epoch: 0|step: 1664|ppo_ep: 1|act_loss: -0.169189453125|cri_loss: -0.058441162109375|unsuper_loss: 0.0 average reward score: 3.79296875 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.46%) |Training time=0.64s (19.57%) |Others=0.19 (5.97%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42 epoch: 0|step: 1665|ppo_ep: 1|act_loss: -0.337158203125|cri_loss: -0.1370849609375|unsuper_loss: 0.0 average reward score: 4.1015625 ------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.37s (73.98%) |Training time=0.64s (19.97%) |Others=0.19 (6.06%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.42 epoch: 0|step: 1666|ppo_ep: 1|act_loss: -0.1405029296875|cri_loss: -0.053955078125|unsuper_loss: 0.0 average reward score: 2.70703125 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.38s (73.13%) |Training time=0.64s (19.66%) |Others=0.23 (7.21%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42 epoch: 0|step: 1667|ppo_ep: 1|act_loss: -0.18798828125|cri_loss: -0.0565185546875|unsuper_loss: 0.0 average reward score: 2.67578125 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.22%) |Training time=0.65s (19.93%) |Others=0.19 (5.85%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42 epoch: 0|step: 1668|ppo_ep: 1|act_loss: -0.07958984375|cri_loss: -0.026092529296875|unsuper_loss: 0.0 average reward score: 3.40625 
------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.43%) |Training time=0.64s (19.73%) |Others=0.19 (5.84%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42 epoch: 0|step: 1669|ppo_ep: 1|act_loss: -0.293701171875|cri_loss: -0.11029052734375|unsuper_loss: 0.0 average reward score: 3.5 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.32%) |Training time=0.64s (19.77%) |Others=0.19 (5.91%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.42 epoch: 0|step: 1670|ppo_ep: 1|act_loss: -0.139892578125|cri_loss: -0.037109375|unsuper_loss: 0.0 average reward score: 3.578125 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.25%) |Training time=0.64s (19.72%) |Others=0.20 (6.03%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42 epoch: 0|step: 1671|ppo_ep: 1|act_loss: -0.166259765625|cri_loss: -0.04351806640625|unsuper_loss: 0.0 average reward score: 2.16796875 ------------------------------------------------------------------------------------- |E2E latency=3.62s |Gather latency=0.00s (0.00%) |Generate time=2.42s (66.83%) |Training time=0.93s (25.65%) |Others=0.27 (7.53%)|CurSamplesPerSec=2.21 |AvgSamplesPerSec=2.42 epoch: 0|step: 1672|ppo_ep: 1|act_loss: -0.2039794921875|cri_loss: -0.0797119140625|unsuper_loss: 0.0 average reward score: 2.951171875 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.41%) |Training time=0.64s (19.64%) |Others=0.19 (5.95%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42 epoch: 0|step: 1673|ppo_ep: 1|act_loss: 0.12548828125|cri_loss: 0.094482421875|unsuper_loss: 0.0 average reward score: 2.734375 
------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.45%) |Training time=0.64s (19.72%) |Others=0.19 (5.82%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42 epoch: 0|step: 1674|ppo_ep: 1|act_loss: -0.1656494140625|cri_loss: -0.0665283203125|unsuper_loss: 0.0 average reward score: 3.205078125 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.52%) |Training time=0.64s (19.64%) |Others=0.19 (5.84%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42 epoch: 0|step: 1675|ppo_ep: 1|act_loss: 0.1090087890625|cri_loss: 0.0753173828125|unsuper_loss: 0.0 average reward score: 2.396484375 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.66%) |Training time=0.64s (19.55%) |Others=0.19 (5.79%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42 epoch: 0|step: 1676|ppo_ep: 1|act_loss: -0.20166015625|cri_loss: -0.07720947265625|unsuper_loss: 0.0 average reward score: 2.22265625 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.13%) |Training time=0.64s (19.74%) |Others=0.20 (6.12%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42 epoch: 0|step: 1677|ppo_ep: 1|act_loss: 0.293701171875|cri_loss: 0.18505859375|unsuper_loss: 0.0 average reward score: 3.44140625 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.38s (73.99%) |Training time=0.64s (19.90%) |Others=0.20 (6.11%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.42 epoch: 0|step: 1678|ppo_ep: 1|act_loss: 0.029052734375|cri_loss: 0.051910400390625|unsuper_loss: 0.0 average reward score: 3.56640625 
-------------------------------------------------------------------------------------
|E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.36s (73.82%) |Training time=0.64s (20.11%) |Others=0.19 (6.07%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.42
[2023-04-24 15:20:25,817] [INFO] [logging.py:96:log_dist] [Rank 0] step=210, skipped=5, lr=[7.909259887039118e-06, 7.909259887039118e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-04-24 15:20:26,060] [INFO] [timer.py:199:stop] epoch=0/micro_step=1680/global_step=210, RunningAvgSamplesPerSec=15.45121965656154, CurrSamplesPerSec=15.871445500817797, MemAllocated=20.44GB, MaxMemAllocated=31.45GB
[2023-04-24 15:20:26,260] [INFO] [logging.py:96:log_dist] [Rank 0] step=210, skipped=4, lr=[4.081943044660746e-06, 4.081943044660746e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 1679|ppo_ep: 1|act_loss: 0.078125|cri_loss: 0.07098388671875|unsuper_loss: 0.0
average reward score: 3.228515625
-------------------------------------------------------------------------------------
|E2E latency=3.60s |Gather latency=0.00s (0.00%) |Generate time=2.40s (66.77%) |Training time=0.92s (25.60%) |Others=0.27 (7.63%)|CurSamplesPerSec=2.22 |AvgSamplesPerSec=2.42
epoch: 0|step: 1680|ppo_ep: 1|act_loss: 0.330078125|cri_loss: 0.199951171875|unsuper_loss: 0.0
average reward score: 2.59765625
-------------------------------------------------------------------------------------
|E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.32%) |Training time=0.64s (19.71%) |Others=0.19 (5.96%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.42
epoch: 0|step: 1681|ppo_ep: 1|act_loss: 0.412109375|cri_loss: 0.2568359375|unsuper_loss: 0.0
average reward score: 2.99609375
-------------------------------------------------------------------------------------
|E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.27%) |Training time=0.64s (19.73%) |Others=0.19 (6.00%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42
epoch: 0|step: 1682|ppo_ep: 1|act_loss: 0.3671875|cri_loss: 0.223388671875|unsuper_loss: 0.0
average reward score: 3.5
-------------------------------------------------------------------------------------
|E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.37s (73.85%) |Training time=0.64s (20.02%) |Others=0.20 (6.13%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.42
epoch: 0|step: 1683|ppo_ep: 1|act_loss: 0.1630859375|cri_loss: 0.11944580078125|unsuper_loss: 0.0
average reward score: 2.560546875
-------------------------------------------------------------------------------------
|E2E latency=3.16s |Gather latency=0.00s (0.00%) |Generate time=2.32s (73.43%) |Training time=0.64s (20.40%) |Others=0.19 (6.17%)|CurSamplesPerSec=2.53 |AvgSamplesPerSec=2.42
epoch: 0|step: 1684|ppo_ep: 1|act_loss: 0.14453125|cri_loss: 0.0926513671875|unsuper_loss: 0.0
average reward score: 3.58984375
-------------------------------------------------------------------------------------
|E2E latency=3.15s |Gather latency=0.00s (0.00%) |Generate time=2.29s (72.85%) |Training time=0.65s (20.64%) |Others=0.20 (6.52%)|CurSamplesPerSec=2.54 |AvgSamplesPerSec=2.42
epoch: 0|step: 1685|ppo_ep: 1|act_loss: 0.0125274658203125|cri_loss: 0.0318603515625|unsuper_loss: 0.0
average reward score: 2.91015625
-------------------------------------------------------------------------------------
|E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.21%) |Training time=0.64s (19.83%) |Others=0.19 (5.96%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.42
epoch: 0|step: 1686|ppo_ep: 1|act_loss: 0.0919189453125|cri_loss: 0.05902099609375|unsuper_loss: 0.0
average reward score: 3.248046875
-------------------------------------------------------------------------------------
|E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.31s (71.50%) |Training time=0.72s (22.32%) |Others=0.20 (6.18%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42
epoch: 0|step: 1687|ppo_ep: 1|act_loss: 0.040130615234375|cri_loss: 0.063232421875|unsuper_loss: 0.0
average reward score: 3.357421875
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.30s (65.67%) |Training time=0.93s (26.48%) |Others=0.27 (7.85%)|CurSamplesPerSec=2.29 |AvgSamplesPerSec=2.42
epoch: 0|step: 1688|ppo_ep: 1|act_loss: 0.0556640625|cri_loss: 0.035614013671875|unsuper_loss: 0.0
average reward score: 2.625
-------------------------------------------------------------------------------------
|E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.35s (73.62%) |Training time=0.65s (20.31%) |Others=0.19 (6.07%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.42
epoch: 0|step: 1689|ppo_ep: 1|act_loss: 0.1318359375|cri_loss: 0.0877685546875|unsuper_loss: 0.0
average reward score: 2.84765625
-------------------------------------------------------------------------------------
|E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.36s (73.87%) |Training time=0.64s (20.08%) |Others=0.19 (6.05%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.42
epoch: 0|step: 1690|ppo_ep: 1|act_loss: 0.31005859375|cri_loss: 0.18701171875|unsuper_loss: 0.0
average reward score: 3.30859375
-------------------------------------------------------------------------------------
|E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.07%) |Training time=0.65s (19.80%) |Others=0.20 (6.13%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42
epoch: 0|step: 1691|ppo_ep: 1|act_loss: 0.02789306640625|cri_loss: 0.041961669921875|unsuper_loss: 0.0
average reward score: 1.9501953125
-------------------------------------------------------------------------------------
|E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.40s (73.65%) |Training time=0.67s (20.47%) |Others=0.19 (5.88%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42
epoch: 0|step: 1692|ppo_ep: 1|act_loss: 0.02825927734375|cri_loss: 0.032257080078125|unsuper_loss: 0.0
average reward score: 3.033203125
-------------------------------------------------------------------------------------
|E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.37s (73.61%) |Training time=0.65s (20.15%) |Others=0.20 (6.24%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.42
epoch: 0|step: 1693|ppo_ep: 1|act_loss: 0.26953125|cri_loss: 0.1490478515625|unsuper_loss: 0.0
average reward score: 1.8076171875
-------------------------------------------------------------------------------------
|E2E latency=3.15s |Gather latency=0.00s (0.00%) |Generate time=2.31s (73.52%) |Training time=0.64s (20.35%) |Others=0.19 (6.13%)|CurSamplesPerSec=2.54 |AvgSamplesPerSec=2.42
epoch: 0|step: 1694|ppo_ep: 1|act_loss: -0.03448486328125|cri_loss: 0.0062255859375|unsuper_loss: 0.0
average reward score: 3.107421875
-------------------------------------------------------------------------------------
|E2E latency=3.18s |Gather latency=0.00s (0.00%) |Generate time=2.34s (73.62%) |Training time=0.64s (20.13%) |Others=0.20 (6.24%)|CurSamplesPerSec=2.51 |AvgSamplesPerSec=2.42
epoch: 0|step: 1695|ppo_ep: 1|act_loss: -0.004730224609375|cri_loss: 0.01690673828125|unsuper_loss: 0.0
average reward score: 2.0546875
-------------------------------------------------------------------------------------
|E2E latency=3.58s |Gather latency=0.00s (0.00%) |Generate time=2.38s (66.39%) |Training time=0.92s (25.82%) |Others=0.28 (7.79%)|CurSamplesPerSec=2.23 |AvgSamplesPerSec=2.42
epoch: 0|step: 1696|ppo_ep: 1|act_loss: 0.1014404296875|cri_loss: 0.074951171875|unsuper_loss: 0.0
average reward score: 4.71875
-------------------------------------------------------------------------------------
|E2E latency=3.42s |Gather latency=0.00s (0.00%) |Generate time=2.57s (75.11%) |Training time=0.65s (18.88%) |Others=0.21 (6.01%)|CurSamplesPerSec=2.34 |AvgSamplesPerSec=2.42
epoch: 0|step: 1697|ppo_ep: 1|act_loss: 0.2734375|cri_loss: 0.1707763671875|unsuper_loss: 0.0
average reward score: 3.9765625
-------------------------------------------------------------------------------------
|E2E latency=3.39s |Gather latency=0.00s (0.00%) |Generate time=2.51s (73.94%) |Training time=0.69s (20.31%) |Others=0.20 (5.75%)|CurSamplesPerSec=2.36 |AvgSamplesPerSec=2.42
epoch: 0|step: 1698|ppo_ep: 1|act_loss: 0.10552978515625|cri_loss: 0.0872802734375|unsuper_loss: 0.0
average reward score: 2.1796875
-------------------------------------------------------------------------------------
|E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.45%) |Training time=0.64s (19.62%) |Others=0.19 (5.93%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.42
epoch: 0|step: 1699|ppo_ep: 1|act_loss: 0.003936767578125|cri_loss: 0.02996826171875|unsuper_loss: 0.0
average reward score: 3.65625
-------------------------------------------------------------------------------------
|E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.07%) |Training time=0.65s (19.96%) |Others=0.20 (5.97%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.42
epoch: 0|step: 1700|ppo_ep: 1|act_loss: 0.001556396484375|cri_loss: 0.0281219482421875|unsuper_loss: 0.0
average reward score: 3.267578125
-------------------------------------------------------------------------------------
|E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.40%) |Training time=0.64s (19.62%) |Others=0.20 (5.99%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42
epoch: 0|step: 1701|ppo_ep: 1|act_loss: 0.3193359375|cri_loss: 0.181884765625|unsuper_loss: 0.0
average reward score: 2.931640625
-------------------------------------------------------------------------------------
|E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.27%) |Training time=0.65s (19.87%) |Others=0.19 (5.86%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42
epoch: 0|step: 1702|ppo_ep: 1|act_loss: -0.0882568359375|cri_loss: -0.025177001953125|unsuper_loss: 0.0
average reward score: 3.9453125
-------------------------------------------------------------------------------------
|E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.28%) |Training time=0.64s (19.72%) |Others=0.20 (6.01%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42
epoch: 0|step: 1703|ppo_ep: 1|act_loss: -0.01727294921875|cri_loss: 0.00927734375|unsuper_loss: 0.0
average reward score: 3.982421875
-------------------------------------------------------------------------------------
|E2E latency=3.63s |Gather latency=0.00s (0.00%) |Generate time=2.43s (66.82%) |Training time=0.93s (25.59%) |Others=0.28 (7.59%)|CurSamplesPerSec=2.20 |AvgSamplesPerSec=2.42
epoch: 0|step: 1704|ppo_ep: 1|act_loss: 0.180908203125|cri_loss: 0.1142578125|unsuper_loss: 0.0
average reward score: 1.8916015625
-------------------------------------------------------------------------------------
|E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.19%) |Training time=0.64s (19.76%) |Others=0.20 (6.05%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42
epoch: 0|step: 1705|ppo_ep: 1|act_loss: 0.15869140625|cri_loss: 0.09423828125|unsuper_loss: 0.0
average reward score: 2.72265625
-------------------------------------------------------------------------------------
|E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.36s (73.92%) |Training time=0.64s (20.04%) |Others=0.19 (6.04%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.42
epoch: 0|step: 1706|ppo_ep: 1|act_loss: 0.192138671875|cri_loss: 0.12359619140625|unsuper_loss: 0.0
average reward score: 3.80078125
-------------------------------------------------------------------------------------
|E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.35s (73.60%) |Training time=0.64s (20.13%) |Others=0.20 (6.27%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.42
epoch: 0|step: 1707|ppo_ep: 1|act_loss: 0.302001953125|cri_loss: 0.1893310546875|unsuper_loss: 0.0
average reward score: 3.06640625
------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.39%) |Training time=0.64s (19.60%) |Others=0.20 (6.01%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42 epoch: 0|step: 1708|ppo_ep: 1|act_loss: -0.0029144287109375|cri_loss: 0.01079559326171875|unsuper_loss: 0.0 average reward score: 2.4453125 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.27%) |Training time=0.64s (19.60%) |Others=0.20 (6.13%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42 epoch: 0|step: 1709|ppo_ep: 1|act_loss: 0.2484130859375|cri_loss: 0.162841796875|unsuper_loss: 0.0 average reward score: 3.6015625 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.23%) |Training time=0.64s (19.80%) |Others=0.19 (5.98%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.42 epoch: 0|step: 1710|ppo_ep: 1|act_loss: 0.269287109375|cri_loss: 0.1494140625|unsuper_loss: 0.0 average reward score: 3.96484375 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.10%) |Training time=0.64s (19.85%) |Others=0.19 (6.05%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.42 epoch: 0|step: 1711|ppo_ep: 1|act_loss: 0.08563232421875|cri_loss: 0.054351806640625|unsuper_loss: 0.0 average reward score: 3.623046875 ------------------------------------------------------------------------------------- |E2E latency=3.62s |Gather latency=0.00s (0.00%) |Generate time=2.41s (66.64%) |Training time=0.93s (25.73%) |Others=0.28 (7.63%)|CurSamplesPerSec=2.21 |AvgSamplesPerSec=2.42 epoch: 0|step: 1712|ppo_ep: 1|act_loss: 0.10845947265625|cri_loss: 0.05975341796875|unsuper_loss: 0.0 average reward score: 2.61328125 
------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.24%) |Training time=0.64s (19.82%) |Others=0.19 (5.94%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42 epoch: 0|step: 1713|ppo_ep: 1|act_loss: -0.156494140625|cri_loss: -0.051513671875|unsuper_loss: 0.0 average reward score: 2.36328125 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.38s (74.22%) |Training time=0.64s (19.83%) |Others=0.19 (5.95%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.42 epoch: 0|step: 1714|ppo_ep: 1|act_loss: 0.0006103515625|cri_loss: 0.047088623046875|unsuper_loss: 0.0 average reward score: 1.84765625 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (73.99%) |Training time=0.65s (19.94%) |Others=0.20 (6.07%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42 epoch: 0|step: 1715|ppo_ep: 1|act_loss: -0.06390380859375|cri_loss: -0.0165557861328125|unsuper_loss: 0.0 average reward score: 4.359375 ------------------------------------------------------------------------------------- |E2E latency=3.15s |Gather latency=0.00s (0.00%) |Generate time=2.31s (73.40%) |Training time=0.64s (20.43%) |Others=0.19 (6.17%)|CurSamplesPerSec=2.54 |AvgSamplesPerSec=2.42 epoch: 0|step: 1716|ppo_ep: 1|act_loss: -0.05194091796875|cri_loss: 0.002044677734375|unsuper_loss: 0.0 average reward score: 2.8984375 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.07%) |Training time=0.64s (19.90%) |Others=0.19 (6.03%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.42 epoch: 0|step: 1717|ppo_ep: 1|act_loss: -0.0582275390625|cri_loss: 0.00750732421875|unsuper_loss: 0.0 average reward score: 2.77734375 
------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.03%) |Training time=0.65s (20.04%) |Others=0.19 (5.94%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.42 epoch: 0|step: 1718|ppo_ep: 1|act_loss: -0.1658935546875|cri_loss: -0.056976318359375|unsuper_loss: 0.0 average reward score: 2.9140625 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.15%) |Training time=0.64s (19.80%) |Others=0.19 (6.05%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.42 epoch: 0|step: 1719|ppo_ep: 1|act_loss: -0.21728515625|cri_loss: -0.09130859375|unsuper_loss: 0.0 average reward score: 2.595703125 ------------------------------------------------------------------------------------- |E2E latency=3.60s |Gather latency=0.00s (0.00%) |Generate time=2.40s (66.59%) |Training time=0.93s (25.88%) |Others=0.27 (7.53%)|CurSamplesPerSec=2.22 |AvgSamplesPerSec=2.42 epoch: 0|step: 1720|ppo_ep: 1|act_loss: -0.0110321044921875|cri_loss: 0.00179290771484375|unsuper_loss: 0.0 average reward score: 3.50390625 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.38s (74.15%) |Training time=0.64s (19.92%) |Others=0.19 (5.93%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.42 epoch: 0|step: 1721|ppo_ep: 1|act_loss: 0.04608154296875|cri_loss: 0.035247802734375|unsuper_loss: 0.0 average reward score: 2.56640625 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.30%) |Training time=0.64s (19.80%) |Others=0.19 (5.91%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42 epoch: 0|step: 1722|ppo_ep: 1|act_loss: -0.05609130859375|cri_loss: -0.0052490234375|unsuper_loss: 0.0 average reward score: 4.171875 
------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.18%) |Training time=0.64s (19.77%) |Others=0.20 (6.05%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42 epoch: 0|step: 1723|ppo_ep: 1|act_loss: -0.18701171875|cri_loss: -0.068359375|unsuper_loss: 0.0 average reward score: 3.900390625 ------------------------------------------------------------------------------------- |E2E latency=3.12s |Gather latency=0.00s (0.00%) |Generate time=2.27s (72.77%) |Training time=0.65s (20.97%) |Others=0.20 (6.27%)|CurSamplesPerSec=2.56 |AvgSamplesPerSec=2.42 epoch: 0|step: 1724|ppo_ep: 1|act_loss: 0.05670166015625|cri_loss: 0.0443115234375|unsuper_loss: 0.0 average reward score: 2.318359375 ------------------------------------------------------------------------------------- |E2E latency=3.16s |Gather latency=0.00s (0.00%) |Generate time=2.33s (73.67%) |Training time=0.64s (20.22%) |Others=0.19 (6.12%)|CurSamplesPerSec=2.53 |AvgSamplesPerSec=2.42 epoch: 0|step: 1725|ppo_ep: 1|act_loss: 0.1116943359375|cri_loss: 0.06866455078125|unsuper_loss: 0.0 average reward score: 3.296875 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.39s (73.83%) |Training time=0.66s (20.31%) |Others=0.19 (5.86%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42 epoch: 0|step: 1726|ppo_ep: 1|act_loss: 0.019500732421875|cri_loss: 0.03118896484375|unsuper_loss: 0.0 average reward score: 2.7109375 ------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.37s (74.05%) |Training time=0.64s (19.98%) |Others=0.19 (5.97%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.42 epoch: 0|step: 1727|ppo_ep: 1|act_loss: -0.070556640625|cri_loss: -0.007659912109375|unsuper_loss: 0.0 average reward score: 2.7265625 
------------------------------------------------------------------------------------- |E2E latency=3.64s |Gather latency=0.00s (0.00%) |Generate time=2.43s (66.88%) |Training time=0.92s (25.40%) |Others=0.28 (7.71%)|CurSamplesPerSec=2.20 |AvgSamplesPerSec=2.42 epoch: 0|step: 1728|ppo_ep: 1|act_loss: 0.1453857421875|cri_loss: 0.08734130859375|unsuper_loss: 0.0 average reward score: 2.5078125 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.33%) |Training time=0.64s (19.65%) |Others=0.20 (6.01%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42 epoch: 0|step: 1729|ppo_ep: 1|act_loss: 0.048187255859375|cri_loss: 0.0343017578125|unsuper_loss: 0.0 average reward score: 2.3515625 ------------------------------------------------------------------------------------- |E2E latency=3.16s |Gather latency=0.00s (0.00%) |Generate time=2.32s (73.54%) |Training time=0.64s (20.36%) |Others=0.19 (6.10%)|CurSamplesPerSec=2.53 |AvgSamplesPerSec=2.42 epoch: 0|step: 1730|ppo_ep: 1|act_loss: 0.1463623046875|cri_loss: 0.0938720703125|unsuper_loss: 0.0 average reward score: 2.544921875 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.41s (73.79%) |Training time=0.66s (20.21%) |Others=0.20 (5.99%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42 epoch: 0|step: 1731|ppo_ep: 1|act_loss: 0.035919189453125|cri_loss: 0.032806396484375|unsuper_loss: 0.0 average reward score: 1.6220703125 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.27%) |Training time=0.64s (19.79%) |Others=0.19 (5.94%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42 epoch: 0|step: 1732|ppo_ep: 1|act_loss: 0.125|cri_loss: 0.0859375|unsuper_loss: 0.0 average reward score: 2.3671875 
------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.02%) |Training time=0.64s (19.80%) |Others=0.20 (6.18%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42 epoch: 0|step: 1733|ppo_ep: 1|act_loss: 0.264404296875|cri_loss: 0.2047119140625|unsuper_loss: 0.0 average reward score: 3.162109375 ------------------------------------------------------------------------------------- |E2E latency=3.18s |Gather latency=0.00s (0.00%) |Generate time=2.35s (73.94%) |Training time=0.64s (20.10%) |Others=0.19 (5.96%)|CurSamplesPerSec=2.51 |AvgSamplesPerSec=2.42 epoch: 0|step: 1734|ppo_ep: 1|act_loss: 0.2138671875|cri_loss: 0.1346435546875|unsuper_loss: 0.0 average reward score: 3.28125 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.39s (73.69%) |Training time=0.65s (20.14%) |Others=0.20 (6.17%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42 epoch: 0|step: 1735|ppo_ep: 1|act_loss: 0.2069091796875|cri_loss: 0.154052734375|unsuper_loss: 0.0 average reward score: 2.080078125 ------------------------------------------------------------------------------------- |E2E latency=3.62s |Gather latency=0.00s (0.00%) |Generate time=2.42s (66.76%) |Training time=0.93s (25.58%) |Others=0.28 (7.66%)|CurSamplesPerSec=2.21 |AvgSamplesPerSec=2.42 epoch: 0|step: 1736|ppo_ep: 1|act_loss: -0.038665771484375|cri_loss: 0.00506591796875|unsuper_loss: 0.0 average reward score: 2.76171875 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.51%) |Training time=0.64s (19.53%) |Others=0.20 (5.96%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.42 epoch: 0|step: 1737|ppo_ep: 1|act_loss: -0.04486083984375|cri_loss: -0.01458740234375|unsuper_loss: 0.0 average reward score: 2.99609375 
------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.37%) |Training time=0.64s (19.71%) |Others=0.19 (5.93%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42 epoch: 0|step: 1738|ppo_ep: 1|act_loss: 0.48779296875|cri_loss: 0.3212890625|unsuper_loss: 0.0 average reward score: 2.404296875 ------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.37%) |Training time=0.65s (19.64%) |Others=0.20 (5.99%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.42 epoch: 0|step: 1739|ppo_ep: 1|act_loss: 0.0341796875|cri_loss: 0.055419921875|unsuper_loss: 0.0 average reward score: 2.71484375 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.26%) |Training time=0.64s (19.82%) |Others=0.19 (5.92%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42 epoch: 0|step: 1740|ppo_ep: 1|act_loss: 0.08929443359375|cri_loss: 0.059722900390625|unsuper_loss: 0.0 average reward score: 3.3984375 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.11%) |Training time=0.64s (19.91%) |Others=0.19 (5.98%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.42 epoch: 0|step: 1741|ppo_ep: 1|act_loss: 0.2432861328125|cri_loss: 0.154541015625|unsuper_loss: 0.0 average reward score: 3.51171875 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.19%) |Training time=0.64s (19.90%) |Others=0.19 (5.91%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42 epoch: 0|step: 1742|ppo_ep: 1|act_loss: 0.03753662109375|cri_loss: 0.03216552734375|unsuper_loss: 0.0 average reward score: 2.97265625 
------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.40s (73.79%) |Training time=0.66s (20.26%) |Others=0.19 (5.95%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.42 epoch: 0|step: 1743|ppo_ep: 1|act_loss: -0.035369873046875|cri_loss: 0.006500244140625|unsuper_loss: 0.0 average reward score: 2.32421875 ------------------------------------------------------------------------------------- |E2E latency=3.61s |Gather latency=0.00s (0.00%) |Generate time=2.40s (66.59%) |Training time=0.93s (25.84%) |Others=0.27 (7.58%)|CurSamplesPerSec=2.22 |AvgSamplesPerSec=2.42 epoch: 0|step: 1744|ppo_ep: 1|act_loss: -0.0533447265625|cri_loss: -0.00982666015625|unsuper_loss: 0.0 average reward score: 3.708984375 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.38s (74.14%) |Training time=0.64s (19.98%) |Others=0.19 (5.88%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.42 epoch: 0|step: 1745|ppo_ep: 1|act_loss: 0.0927734375|cri_loss: 0.06231689453125|unsuper_loss: 0.0 average reward score: 2.455078125 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.38s (74.08%) |Training time=0.64s (19.95%) |Others=0.19 (5.97%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.42 epoch: 0|step: 1746|ppo_ep: 1|act_loss: 0.084228515625|cri_loss: 0.1082763671875|unsuper_loss: 0.0 average reward score: 2.71484375 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.56%) |Training time=0.64s (19.64%) |Others=0.19 (5.79%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.42 epoch: 0|step: 1747|ppo_ep: 1|act_loss: -0.224853515625|cri_loss: -0.07354736328125|unsuper_loss: 0.0 average reward score: 2.072265625 
------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.10%) |Training time=0.64s (19.91%) |Others=0.19 (5.99%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.42 epoch: 0|step: 1748|ppo_ep: 1|act_loss: -0.06298828125|cri_loss: -0.0178375244140625|unsuper_loss: 0.0 average reward score: 3.55078125 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.38s (74.09%) |Training time=0.64s (19.92%) |Others=0.19 (5.99%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.42 epoch: 0|step: 1749|ppo_ep: 1|act_loss: -0.023468017578125|cri_loss: 0.0077362060546875|unsuper_loss: 0.0 average reward score: 3.9453125 ------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.36s (73.88%) |Training time=0.64s (20.06%) |Others=0.19 (6.06%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.42 epoch: 0|step: 1750|ppo_ep: 1|act_loss: -0.0634765625|cri_loss: -0.011962890625|unsuper_loss: 0.0 average reward score: 2.91015625 ------------------------------------------------------------------------------------- |E2E latency=3.14s |Gather latency=0.00s (0.00%) |Generate time=2.30s (73.22%) |Training time=0.64s (20.38%) |Others=0.20 (6.40%)|CurSamplesPerSec=2.55 |AvgSamplesPerSec=2.42 epoch: 0|step: 1751|ppo_ep: 1|act_loss: -0.11859130859375|cri_loss: -0.0352783203125|unsuper_loss: 0.0 average reward score: 3.08203125 ------------------------------------------------------------------------------------- |E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.28s (65.54%) |Training time=0.92s (26.54%) |Others=0.28 (7.93%)|CurSamplesPerSec=2.30 |AvgSamplesPerSec=2.42 epoch: 0|step: 1752|ppo_ep: 1|act_loss: 0.434814453125|cri_loss: 0.258544921875|unsuper_loss: 0.0 average reward score: 1.703125 
------------------------------------------------------------------------------------- |E2E latency=3.15s |Gather latency=0.00s (0.00%) |Generate time=2.32s (73.74%) |Training time=0.64s (20.19%) |Others=0.19 (6.07%)|CurSamplesPerSec=2.54 |AvgSamplesPerSec=2.42 epoch: 0|step: 1753|ppo_ep: 1|act_loss: -0.35791015625|cri_loss: -0.1241455078125|unsuper_loss: 0.0 average reward score: 2.93359375 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.38s (74.16%) |Training time=0.64s (19.96%) |Others=0.19 (5.88%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.42 epoch: 0|step: 1754|ppo_ep: 1|act_loss: 0.065185546875|cri_loss: 0.04888916015625|unsuper_loss: 0.0 average reward score: 2.435546875 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.38s (74.06%) |Training time=0.64s (19.98%) |Others=0.19 (5.96%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.42 epoch: 0|step: 1755|ppo_ep: 1|act_loss: 0.2200927734375|cri_loss: 0.156494140625|unsuper_loss: 0.0 average reward score: 1.8046875 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.27%) |Training time=0.64s (19.88%) |Others=0.19 (5.85%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.42 epoch: 0|step: 1756|ppo_ep: 1|act_loss: 0.11663818359375|cri_loss: 0.10113525390625|unsuper_loss: 0.0 average reward score: 2.427734375 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.38s (74.05%) |Training time=0.64s (19.86%) |Others=0.20 (6.09%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.42 epoch: 0|step: 1757|ppo_ep: 1|act_loss: -0.0809326171875|cri_loss: -0.030242919921875|unsuper_loss: 0.0 average reward score: 3.55859375 
------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.21%) |Training time=0.64s (19.77%) |Others=0.19 (6.03%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.42 epoch: 0|step: 1758|ppo_ep: 1|act_loss: 0.0633544921875|cri_loss: 0.0440673828125|unsuper_loss: 0.0 average reward score: 2.98828125 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.39%) |Training time=0.64s (19.64%) |Others=0.19 (5.97%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.42 [2023-04-24 15:24:47,626] [INFO] [logging.py:96:log_dist] [Rank 0] step=220, skipped=5, lr=[7.5888358290741174e-06, 7.5888358290741174e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-04-24 15:24:47,869] [INFO] [timer.py:199:stop] epoch=0/micro_step=1760/global_step=220, RunningAvgSamplesPerSec=15.467994538606904, CurrSamplesPerSec=15.87933770612092, MemAllocated=20.44GB, MaxMemAllocated=31.45GB [2023-04-24 15:24:48,068] [INFO] [logging.py:96:log_dist] [Rank 0] step=220, skipped=4, lr=[3.914867735826489e-06, 3.914867735826489e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 1759|ppo_ep: 1|act_loss: -0.013580322265625|cri_loss: 0.0100860595703125|unsuper_loss: 0.0 average reward score: 1.8046875 ------------------------------------------------------------------------------------- |E2E latency=3.58s |Gather latency=0.00s (0.00%) |Generate time=2.38s (66.51%) |Training time=0.93s (25.86%) |Others=0.27 (7.63%)|CurSamplesPerSec=2.24 |AvgSamplesPerSec=2.42 epoch: 0|step: 1760|ppo_ep: 1|act_loss: 0.20654296875|cri_loss: 0.1201171875|unsuper_loss: 0.0 average reward score: 2.69921875 ------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.37s (74.31%) |Training time=0.63s (19.79%) |Others=0.19 (5.89%)|CurSamplesPerSec=2.50 
|AvgSamplesPerSec=2.42 epoch: 0|step: 1761|ppo_ep: 1|act_loss: 0.116943359375|cri_loss: 0.072998046875|unsuper_loss: 0.0 average reward score: 3.06640625 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.38s (74.05%) |Training time=0.64s (19.90%) |Others=0.19 (6.04%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.42 epoch: 0|step: 1762|ppo_ep: 1|act_loss: -0.01171875|cri_loss: 0.026947021484375|unsuper_loss: 0.0 average reward score: 3.1875 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.17%) |Training time=0.64s (19.85%) |Others=0.19 (5.98%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.42 epoch: 0|step: 1763|ppo_ep: 1|act_loss: 0.0067138671875|cri_loss: 0.0252532958984375|unsuper_loss: 0.0 average reward score: 3.91796875 ------------------------------------------------------------------------------------- |E2E latency=3.18s |Gather latency=0.00s (0.00%) |Generate time=2.34s (73.78%) |Training time=0.64s (20.24%) |Others=0.19 (5.98%)|CurSamplesPerSec=2.52 |AvgSamplesPerSec=2.42 epoch: 0|step: 1764|ppo_ep: 1|act_loss: -0.177490234375|cri_loss: -0.03131103515625|unsuper_loss: 0.0 average reward score: 2.828125 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.42s (73.68%) |Training time=0.65s (19.67%) |Others=0.22 (6.65%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.42 epoch: 0|step: 1765|ppo_ep: 1|act_loss: 0.035736083984375|cri_loss: 0.03033447265625|unsuper_loss: 0.0 average reward score: 2.591796875 ------------------------------------------------------------------------------------- |E2E latency=3.86s |Gather latency=0.00s (0.00%) |Generate time=2.97s (76.96%) |Training time=0.65s (16.74%) |Others=0.24 (6.30%)|CurSamplesPerSec=2.07 |AvgSamplesPerSec=2.42 epoch: 
0|step: 1766|ppo_ep: 1|act_loss: 0.03582763671875|cri_loss: 0.041290283203125|unsuper_loss: 0.0 average reward score: 3.3203125 ------------------------------------------------------------------------------------- |E2E latency=3.74s |Gather latency=0.00s (0.00%) |Generate time=2.87s (76.59%) |Training time=0.65s (17.44%) |Others=0.22 (5.97%)|CurSamplesPerSec=2.14 |AvgSamplesPerSec=2.42 epoch: 0|step: 1767|ppo_ep: 1|act_loss: -0.056488037109375|cri_loss: -0.006561279296875|unsuper_loss: 0.0 average reward score: 1.958984375 ------------------------------------------------------------------------------------- |E2E latency=4.20s |Gather latency=0.00s (0.00%) |Generate time=2.91s (69.34%) |Training time=0.97s (23.16%) |Others=0.31 (7.50%)|CurSamplesPerSec=1.91 |AvgSamplesPerSec=2.42 epoch: 0|step: 1768|ppo_ep: 1|act_loss: 0.49267578125|cri_loss: 0.295166015625|unsuper_loss: 0.0 average reward score: 1.91015625 ------------------------------------------------------------------------------------- |E2E latency=3.88s |Gather latency=0.00s (0.00%) |Generate time=2.97s (76.55%) |Training time=0.65s (16.73%) |Others=0.26 (6.72%)|CurSamplesPerSec=2.06 |AvgSamplesPerSec=2.42 epoch: 0|step: 1769|ppo_ep: 1|act_loss: 0.1641845703125|cri_loss: 0.10772705078125|unsuper_loss: 0.0 average reward score: 3.189453125 ------------------------------------------------------------------------------------- |E2E latency=3.92s |Gather latency=0.00s (0.00%) |Generate time=3.00s (76.48%) |Training time=0.69s (17.63%) |Others=0.23 (5.89%)|CurSamplesPerSec=2.04 |AvgSamplesPerSec=2.42 epoch: 0|step: 1770|ppo_ep: 1|act_loss: 0.113525390625|cri_loss: 0.12127685546875|unsuper_loss: 0.0 average reward score: 2.05859375 ------------------------------------------------------------------------------------- |E2E latency=3.89s |Gather latency=0.00s (0.00%) |Generate time=3.03s (77.75%) |Training time=0.65s (16.61%) |Others=0.22 (5.65%)|CurSamplesPerSec=2.05 |AvgSamplesPerSec=2.42 epoch: 0|step: 1771|ppo_ep: 
1|act_loss: 0.24169921875|cri_loss: 0.1644287109375|unsuper_loss: 0.0 average reward score: 2.890625 ------------------------------------------------------------------------------------- |E2E latency=3.80s |Gather latency=0.00s (0.00%) |Generate time=2.89s (76.12%) |Training time=0.66s (17.36%) |Others=0.25 (6.52%)|CurSamplesPerSec=2.11 |AvgSamplesPerSec=2.42 epoch: 0|step: 1772|ppo_ep: 1|act_loss: 0.1290283203125|cri_loss: 0.0751953125|unsuper_loss: 0.0 average reward score: 2.62890625 ------------------------------------------------------------------------------------- |E2E latency=3.90s |Gather latency=0.00s (0.00%) |Generate time=3.01s (77.11%) |Training time=0.66s (17.03%) |Others=0.23 (5.86%)|CurSamplesPerSec=2.05 |AvgSamplesPerSec=2.42 epoch: 0|step: 1773|ppo_ep: 1|act_loss: 0.316650390625|cri_loss: 0.1904296875|unsuper_loss: 0.0 average reward score: 3.6015625 ------------------------------------------------------------------------------------- |E2E latency=3.87s |Gather latency=0.00s (0.00%) |Generate time=2.98s (77.15%) |Training time=0.65s (16.82%) |Others=0.23 (6.03%)|CurSamplesPerSec=2.07 |AvgSamplesPerSec=2.41 epoch: 0|step: 1774|ppo_ep: 1|act_loss: 0.50146484375|cri_loss: 0.316162109375|unsuper_loss: 0.0 average reward score: 2.37109375 ------------------------------------------------------------------------------------- |E2E latency=3.79s |Gather latency=0.00s (0.00%) |Generate time=2.91s (76.77%) |Training time=0.66s (17.30%) |Others=0.23 (5.94%)|CurSamplesPerSec=2.11 |AvgSamplesPerSec=2.41 epoch: 0|step: 1775|ppo_ep: 1|act_loss: 0.265625|cri_loss: 0.1549072265625|unsuper_loss: 0.0 average reward score: 2.5859375 ------------------------------------------------------------------------------------- |E2E latency=4.34s |Gather latency=0.00s (0.00%) |Generate time=3.08s (70.85%) |Training time=0.96s (22.20%) |Others=0.30 (6.95%)|CurSamplesPerSec=1.84 |AvgSamplesPerSec=2.41 epoch: 0|step: 1776|ppo_ep: 1|act_loss: 0.57470703125|cri_loss: 
0.345703125|unsuper_loss: 0.0 average reward score: 2.869140625 ------------------------------------------------------------------------------------- |E2E latency=3.95s |Gather latency=0.00s (0.00%) |Generate time=3.05s (77.32%) |Training time=0.66s (16.67%) |Others=0.24 (6.01%)|CurSamplesPerSec=2.03 |AvgSamplesPerSec=2.41 epoch: 0|step: 1777|ppo_ep: 1|act_loss: 0.48046875|cri_loss: 0.30419921875|unsuper_loss: 0.0 average reward score: 3.28125 ------------------------------------------------------------------------------------- |E2E latency=4.03s |Gather latency=0.00s (0.00%) |Generate time=3.14s (77.95%) |Training time=0.67s (16.61%) |Others=0.22 (5.43%)|CurSamplesPerSec=1.98 |AvgSamplesPerSec=2.41 epoch: 0|step: 1778|ppo_ep: 1|act_loss: 0.176513671875|cri_loss: 0.111083984375|unsuper_loss: 0.0 average reward score: 1.9677734375 ------------------------------------------------------------------------------------- |E2E latency=3.39s |Gather latency=0.00s (0.00%) |Generate time=2.56s (75.28%) |Training time=0.64s (19.00%) |Others=0.19 (5.73%)|CurSamplesPerSec=2.36 |AvgSamplesPerSec=2.41 epoch: 0|step: 1779|ppo_ep: 1|act_loss: 0.31201171875|cri_loss: 0.20556640625|unsuper_loss: 0.0 average reward score: 2.6328125 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.39s (73.43%) |Training time=0.67s (20.64%) |Others=0.19 (5.94%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.41 epoch: 0|step: 1780|ppo_ep: 1|act_loss: 0.0259246826171875|cri_loss: 0.04315185546875|unsuper_loss: 0.0 average reward score: 2.91015625 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.10%) |Training time=0.64s (19.77%) |Others=0.20 (6.14%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.41 epoch: 0|step: 1781|ppo_ep: 1|act_loss: 0.3525390625|cri_loss: 0.2373046875|unsuper_loss: 0.0 average reward 
score: 2.7109375 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.27%) |Training time=0.64s (19.77%) |Others=0.19 (5.96%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 1782|ppo_ep: 1|act_loss: 0.1866455078125|cri_loss: 0.1121826171875|unsuper_loss: 0.0 average reward score: 3.7890625 ------------------------------------------------------------------------------------- |E2E latency=3.36s |Gather latency=0.00s (0.00%) |Generate time=2.52s (75.07%) |Training time=0.64s (19.14%) |Others=0.19 (5.79%)|CurSamplesPerSec=2.38 |AvgSamplesPerSec=2.41 epoch: 0|step: 1783|ppo_ep: 1|act_loss: 0.275634765625|cri_loss: 0.169921875|unsuper_loss: 0.0 average reward score: 2.1640625 ------------------------------------------------------------------------------------- |E2E latency=3.61s |Gather latency=0.00s (0.00%) |Generate time=2.36s (65.35%) |Training time=0.98s (27.05%) |Others=0.27 (7.59%)|CurSamplesPerSec=2.21 |AvgSamplesPerSec=2.41 epoch: 0|step: 1784|ppo_ep: 1|act_loss: 0.062042236328125|cri_loss: 0.0762939453125|unsuper_loss: 0.0 average reward score: 1.9296875 ------------------------------------------------------------------------------------- |E2E latency=3.16s |Gather latency=0.00s (0.00%) |Generate time=2.33s (73.78%) |Training time=0.64s (20.11%) |Others=0.19 (6.11%)|CurSamplesPerSec=2.53 |AvgSamplesPerSec=2.41 epoch: 0|step: 1785|ppo_ep: 1|act_loss: 0.1461181640625|cri_loss: 0.0865478515625|unsuper_loss: 0.0 average reward score: 3.55078125 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.32%) |Training time=0.64s (19.68%) |Others=0.20 (5.99%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.41 epoch: 0|step: 1786|ppo_ep: 1|act_loss: 0.234619140625|cri_loss: 0.1397705078125|unsuper_loss: 0.0 average reward score: 2.828125 
------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.41%) |Training time=0.65s (19.66%) |Others=0.20 (5.93%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.41 epoch: 0|step: 1787|ppo_ep: 1|act_loss: 0.443359375|cri_loss: 0.27783203125|unsuper_loss: 0.0 average reward score: 1.703125 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.56%) |Training time=0.64s (19.58%) |Others=0.19 (5.87%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.41 epoch: 0|step: 1788|ppo_ep: 1|act_loss: 0.01447296142578125|cri_loss: 0.027191162109375|unsuper_loss: 0.0 average reward score: 3.76171875 ------------------------------------------------------------------------------------- |E2E latency=3.88s |Gather latency=0.00s (0.00%) |Generate time=2.43s (62.62%) |Training time=0.85s (21.88%) |Others=0.60 (15.50%)|CurSamplesPerSec=2.06 |AvgSamplesPerSec=2.41 epoch: 0|step: 1789|ppo_ep: 1|act_loss: -0.11083984375|cri_loss: -0.037353515625|unsuper_loss: 0.0 average reward score: 1.982421875 ------------------------------------------------------------------------------------- |E2E latency=3.35s |Gather latency=0.00s (0.00%) |Generate time=2.51s (74.73%) |Training time=0.65s (19.28%) |Others=0.20 (6.00%)|CurSamplesPerSec=2.39 |AvgSamplesPerSec=2.41 epoch: 0|step: 1790|ppo_ep: 1|act_loss: -0.0404052734375|cri_loss: -0.00628662109375|unsuper_loss: 0.0 average reward score: 1.9443359375 ------------------------------------------------------------------------------------- |E2E latency=3.33s |Gather latency=0.00s (0.00%) |Generate time=2.49s (74.93%) |Training time=0.64s (19.15%) |Others=0.20 (5.93%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.41 epoch: 0|step: 1791|ppo_ep: 1|act_loss: 0.333251953125|cri_loss: 0.201904296875|unsuper_loss: 0.0 average reward score: 2.685546875 
------------------------------------------------------------------------------------- |E2E latency=3.65s |Gather latency=0.00s (0.00%) |Generate time=2.45s (67.06%) |Training time=0.93s (25.37%) |Others=0.28 (7.57%)|CurSamplesPerSec=2.19 |AvgSamplesPerSec=2.41 epoch: 0|step: 1792|ppo_ep: 1|act_loss: 0.06231689453125|cri_loss: 0.0484619140625|unsuper_loss: 0.0 average reward score: 1.712890625 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.41s (73.87%) |Training time=0.66s (20.28%) |Others=0.19 (5.84%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.41 epoch: 0|step: 1793|ppo_ep: 1|act_loss: 0.004974365234375|cri_loss: 0.0192413330078125|unsuper_loss: 0.0 average reward score: 2.39453125 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.40%) |Training time=0.64s (19.68%) |Others=0.19 (5.92%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.41 epoch: 0|step: 1794|ppo_ep: 1|act_loss: 0.00844573974609375|cri_loss: 0.019195556640625|unsuper_loss: 0.0 average reward score: 1.9443359375 ------------------------------------------------------------------------------------- |E2E latency=3.35s |Gather latency=0.00s (0.00%) |Generate time=2.51s (74.87%) |Training time=0.64s (19.15%) |Others=0.20 (5.98%)|CurSamplesPerSec=2.39 |AvgSamplesPerSec=2.41 epoch: 0|step: 1795|ppo_ep: 1|act_loss: 0.1231689453125|cri_loss: 0.076416015625|unsuper_loss: 0.0 average reward score: 2.43359375 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.43s (73.88%) |Training time=0.64s (19.59%) |Others=0.21 (6.54%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.41 epoch: 0|step: 1796|ppo_ep: 1|act_loss: -0.1419677734375|cri_loss: -0.04193115234375|unsuper_loss: 0.0 average reward score: 2.71875 
------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.40s (73.11%) |Training time=0.69s (20.88%) |Others=0.20 (6.01%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.41 epoch: 0|step: 1797|ppo_ep: 1|act_loss: -0.00811767578125|cri_loss: 0.0186309814453125|unsuper_loss: 0.0 average reward score: 3.123046875 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.37%) |Training time=0.65s (19.74%) |Others=0.19 (5.89%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.41 epoch: 0|step: 1798|ppo_ep: 1|act_loss: 0.19287109375|cri_loss: 0.12335205078125|unsuper_loss: 0.0 average reward score: 2.6171875 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.25%) |Training time=0.64s (19.74%) |Others=0.20 (6.01%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.41 epoch: 0|step: 1799|ppo_ep: 1|act_loss: -0.00046539306640625|cri_loss: 0.0074615478515625|unsuper_loss: 0.0 average reward score: 2.626953125 ------------------------------------------------------------------------------------- |E2E latency=3.71s |Gather latency=0.00s (0.00%) |Generate time=2.48s (66.79%) |Training time=0.94s (25.35%) |Others=0.29 (7.86%)|CurSamplesPerSec=2.16 |AvgSamplesPerSec=2.41 epoch: 0|step: 1800|ppo_ep: 1|act_loss: -0.008209228515625|cri_loss: 0.0210418701171875|unsuper_loss: 0.0 average reward score: 3.005859375 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.48s (74.70%) |Training time=0.64s (19.29%) |Others=0.20 (6.02%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.41 epoch: 0|step: 1801|ppo_ep: 1|act_loss: -0.07177734375|cri_loss: 0.001708984375|unsuper_loss: 0.0 average reward score: 3.263671875 
------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.21%) |Training time=0.65s (19.89%) |Others=0.19 (5.91%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.41 epoch: 0|step: 1802|ppo_ep: 1|act_loss: -0.0933837890625|cri_loss: -0.030242919921875|unsuper_loss: 0.0 average reward score: 1.671875 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.58%) |Training time=0.63s (19.42%) |Others=0.20 (6.00%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.41 epoch: 0|step: 1803|ppo_ep: 1|act_loss: -0.18310546875|cri_loss: -0.052001953125|unsuper_loss: 0.0 average reward score: 2.08203125 ------------------------------------------------------------------------------------- |E2E latency=3.36s |Gather latency=0.00s (0.00%) |Generate time=2.51s (74.91%) |Training time=0.64s (19.21%) |Others=0.20 (5.88%)|CurSamplesPerSec=2.38 |AvgSamplesPerSec=2.41 epoch: 0|step: 1804|ppo_ep: 1|act_loss: 0.046478271484375|cri_loss: 0.049285888671875|unsuper_loss: 0.0 average reward score: 1.279296875 ------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.53%) |Training time=0.64s (19.43%) |Others=0.20 (6.04%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.41 epoch: 0|step: 1805|ppo_ep: 1|act_loss: -0.1175537109375|cri_loss: -0.042510986328125|unsuper_loss: 0.0 average reward score: 2.8515625 ------------------------------------------------------------------------------------- |E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.63s (75.48%) |Training time=0.65s (18.52%) |Others=0.21 (5.99%)|CurSamplesPerSec=2.29 |AvgSamplesPerSec=2.41 epoch: 0|step: 1806|ppo_ep: 1|act_loss: 0.0919189453125|cri_loss: 0.056396484375|unsuper_loss: 0.0 average reward score: 2.52734375 
------------------------------------------------------------------------------------- |E2E latency=3.33s |Gather latency=0.00s (0.00%) |Generate time=2.48s (74.53%) |Training time=0.65s (19.40%) |Others=0.20 (6.07%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.41 epoch: 0|step: 1807|ppo_ep: 1|act_loss: -0.205078125|cri_loss: -0.077392578125|unsuper_loss: 0.0 average reward score: 2.94140625 ------------------------------------------------------------------------------------- |E2E latency=3.67s |Gather latency=0.00s (0.00%) |Generate time=2.46s (66.92%) |Training time=0.93s (25.32%) |Others=0.28 (7.76%)|CurSamplesPerSec=2.18 |AvgSamplesPerSec=2.41 epoch: 0|step: 1808|ppo_ep: 1|act_loss: 0.5439453125|cri_loss: 0.338134765625|unsuper_loss: 0.0 average reward score: 2.51953125 ------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.34%) |Training time=0.64s (19.49%) |Others=0.20 (6.17%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.41 epoch: 0|step: 1809|ppo_ep: 1|act_loss: 0.152587890625|cri_loss: 0.10052490234375|unsuper_loss: 0.0 average reward score: 2.80078125 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.47s (74.82%) |Training time=0.64s (19.35%) |Others=0.19 (5.83%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.41 epoch: 0|step: 1810|ppo_ep: 1|act_loss: 0.0088958740234375|cri_loss: 0.034912109375|unsuper_loss: 0.0 average reward score: 2.98046875 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.42%) |Training time=0.64s (19.30%) |Others=0.21 (6.28%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.41 epoch: 0|step: 1811|ppo_ep: 1|act_loss: -0.04620361328125|cri_loss: -0.0108795166015625|unsuper_loss: 0.0 average reward score: 3.88671875 
------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.17%) |Training time=0.65s (19.63%) |Others=0.20 (6.20%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.41 epoch: 0|step: 1812|ppo_ep: 1|act_loss: -0.196044921875|cri_loss: -0.07763671875|unsuper_loss: 0.0 average reward score: 3.32421875 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.41%) |Training time=0.64s (19.59%) |Others=0.20 (5.99%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.41 epoch: 0|step: 1813|ppo_ep: 1|act_loss: 0.33935546875|cri_loss: 0.2020263671875|unsuper_loss: 0.0 average reward score: 2.34375 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.47s (74.58%) |Training time=0.64s (19.39%) |Others=0.20 (6.04%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.41 epoch: 0|step: 1814|ppo_ep: 1|act_loss: -0.12005615234375|cri_loss: -0.03375244140625|unsuper_loss: 0.0 average reward score: 3.66015625 ------------------------------------------------------------------------------------- |E2E latency=3.42s |Gather latency=0.00s (0.00%) |Generate time=2.58s (75.47%) |Training time=0.64s (18.76%) |Others=0.20 (5.77%)|CurSamplesPerSec=2.34 |AvgSamplesPerSec=2.41 epoch: 0|step: 1815|ppo_ep: 1|act_loss: -0.240234375|cri_loss: -0.09326171875|unsuper_loss: 0.0 average reward score: 1.1455078125 ------------------------------------------------------------------------------------- |E2E latency=3.68s |Gather latency=0.00s (0.00%) |Generate time=2.48s (67.18%) |Training time=0.93s (25.15%) |Others=0.28 (7.66%)|CurSamplesPerSec=2.17 |AvgSamplesPerSec=2.41 epoch: 0|step: 1816|ppo_ep: 1|act_loss: 0.040313720703125|cri_loss: 0.0753173828125|unsuper_loss: 0.0 average reward score: 1.796875 
------------------------------------------------------------------------------------- |E2E latency=3.37s |Gather latency=0.00s (0.00%) |Generate time=2.52s (74.91%) |Training time=0.65s (19.17%) |Others=0.20 (5.93%)|CurSamplesPerSec=2.38 |AvgSamplesPerSec=2.41 epoch: 0|step: 1817|ppo_ep: 1|act_loss: -0.21728515625|cri_loss: -0.05841064453125|unsuper_loss: 0.0 average reward score: 4.05859375 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.48s (74.71%) |Training time=0.64s (19.39%) |Others=0.20 (5.90%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.41 epoch: 0|step: 1818|ppo_ep: 1|act_loss: -0.039398193359375|cri_loss: 0.001556396484375|unsuper_loss: 0.0 average reward score: 3.6796875 ------------------------------------------------------------------------------------- |E2E latency=3.34s |Gather latency=0.00s (0.00%) |Generate time=2.47s (74.03%) |Training time=0.64s (19.22%) |Others=0.23 (6.75%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.41 epoch: 0|step: 1819|ppo_ep: 1|act_loss: -0.169677734375|cri_loss: -0.055023193359375|unsuper_loss: 0.0 average reward score: 2.91015625 ------------------------------------------------------------------------------------- |E2E latency=3.44s |Gather latency=0.00s (0.00%) |Generate time=2.58s (75.01%) |Training time=0.66s (19.09%) |Others=0.20 (5.90%)|CurSamplesPerSec=2.33 |AvgSamplesPerSec=2.41 epoch: 0|step: 1820|ppo_ep: 1|act_loss: -0.0709228515625|cri_loss: 0.01641845703125|unsuper_loss: 0.0 average reward score: 1.72265625 ------------------------------------------------------------------------------------- |E2E latency=3.38s |Gather latency=0.00s (0.00%) |Generate time=2.52s (74.62%) |Training time=0.65s (19.30%) |Others=0.21 (6.08%)|CurSamplesPerSec=2.37 |AvgSamplesPerSec=2.41 epoch: 0|step: 1821|ppo_ep: 1|act_loss: 0.433349609375|cri_loss: 0.2529296875|unsuper_loss: 0.0 average reward score: 3.3671875 
------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.44%) |Training time=0.64s (19.45%) |Others=0.20 (6.11%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.41 epoch: 0|step: 1822|ppo_ep: 1|act_loss: 0.21728515625|cri_loss: 0.1416015625|unsuper_loss: 0.0 average reward score: 2.8125 ------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.18%) |Training time=0.64s (19.42%) |Others=0.21 (6.40%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.41 epoch: 0|step: 1823|ppo_ep: 1|act_loss: 0.15478515625|cri_loss: 0.110595703125|unsuper_loss: 0.0 average reward score: 1.517578125 ------------------------------------------------------------------------------------- |E2E latency=3.69s |Gather latency=0.00s (0.00%) |Generate time=2.45s (66.33%) |Training time=0.96s (25.93%) |Others=0.29 (7.74%)|CurSamplesPerSec=2.17 |AvgSamplesPerSec=2.41 epoch: 0|step: 1824|ppo_ep: 1|act_loss: 0.002349853515625|cri_loss: 0.0275115966796875|unsuper_loss: 0.0 average reward score: 1.611328125 ------------------------------------------------------------------------------------- |E2E latency=3.34s |Gather latency=0.00s (0.00%) |Generate time=2.47s (74.12%) |Training time=0.65s (19.37%) |Others=0.22 (6.51%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.41 epoch: 0|step: 1825|ppo_ep: 1|act_loss: -0.25244140625|cri_loss: -0.09527587890625|unsuper_loss: 0.0 average reward score: 3.4921875 ------------------------------------------------------------------------------------- |E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.63s (75.49%) |Training time=0.64s (18.36%) |Others=0.21 (6.16%)|CurSamplesPerSec=2.30 |AvgSamplesPerSec=2.41 epoch: 0|step: 1826|ppo_ep: 1|act_loss: 0.059356689453125|cri_loss: 0.0540771484375|unsuper_loss: 0.0 average reward score: 2.412109375 
------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.14%) |Training time=0.65s (19.67%) |Others=0.20 (6.18%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.41 epoch: 0|step: 1827|ppo_ep: 1|act_loss: -0.160400390625|cri_loss: -0.04779052734375|unsuper_loss: 0.0 average reward score: 2.783203125 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.47s (74.44%) |Training time=0.64s (19.33%) |Others=0.21 (6.23%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.41 epoch: 0|step: 1828|ppo_ep: 1|act_loss: -0.204345703125|cri_loss: -0.05462646484375|unsuper_loss: 0.0 average reward score: 4.09375 ------------------------------------------------------------------------------------- |E2E latency=3.35s |Gather latency=0.00s (0.00%) |Generate time=2.50s (74.64%) |Training time=0.65s (19.35%) |Others=0.20 (6.00%)|CurSamplesPerSec=2.39 |AvgSamplesPerSec=2.41 epoch: 0|step: 1829|ppo_ep: 1|act_loss: -0.1962890625|cri_loss: -0.078369140625|unsuper_loss: 0.0 average reward score: 2.021484375 ------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.30%) |Training time=0.64s (19.41%) |Others=0.21 (6.29%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.41 epoch: 0|step: 1830|ppo_ep: 1|act_loss: 0.10369873046875|cri_loss: 0.08447265625|unsuper_loss: 0.0 average reward score: 1.919921875 ------------------------------------------------------------------------------------- |E2E latency=3.37s |Gather latency=0.00s (0.00%) |Generate time=2.50s (73.99%) |Training time=0.64s (18.93%) |Others=0.24 (7.08%)|CurSamplesPerSec=2.37 |AvgSamplesPerSec=2.41 epoch: 0|step: 1831|ppo_ep: 1|act_loss: -0.1767578125|cri_loss: -0.068603515625|unsuper_loss: 0.0 average reward score: 2.88671875 
------------------------------------------------------------------------------------- |E2E latency=3.89s |Gather latency=0.00s (0.00%) |Generate time=2.68s (68.78%) |Training time=0.93s (23.90%) |Others=0.28 (7.32%)|CurSamplesPerSec=2.06 |AvgSamplesPerSec=2.41 epoch: 0|step: 1832|ppo_ep: 1|act_loss: -0.005828857421875|cri_loss: 0.0159454345703125|unsuper_loss: 0.0 average reward score: 3.578125 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.54%) |Training time=0.64s (19.34%) |Others=0.20 (6.12%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.41 epoch: 0|step: 1833|ppo_ep: 1|act_loss: -0.08306884765625|cri_loss: -0.0201416015625|unsuper_loss: 0.0 average reward score: 1.8232421875 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.36%) |Training time=0.64s (19.52%) |Others=0.20 (6.12%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.41 epoch: 0|step: 1834|ppo_ep: 1|act_loss: 0.11993408203125|cri_loss: 0.102783203125|unsuper_loss: 0.0 average reward score: 2.69921875 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.42%) |Training time=0.64s (19.30%) |Others=0.21 (6.29%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.41 epoch: 0|step: 1835|ppo_ep: 1|act_loss: 0.103271484375|cri_loss: 0.0831298828125|unsuper_loss: 0.0 average reward score: 3.3203125 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.48s (74.70%) |Training time=0.65s (19.52%) |Others=0.19 (5.79%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.41 epoch: 0|step: 1836|ppo_ep: 1|act_loss: -0.08587646484375|cri_loss: -0.00970458984375|unsuper_loss: 0.0 average reward score: 3.208984375 
------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.09%) |Training time=0.64s (19.49%) |Others=0.21 (6.42%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.41 epoch: 0|step: 1837|ppo_ep: 1|act_loss: -0.26806640625|cri_loss: -0.0902099609375|unsuper_loss: 0.0 average reward score: 2.12109375 ------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.53%) |Training time=0.65s (19.62%) |Others=0.19 (5.85%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.41 epoch: 0|step: 1838|ppo_ep: 1|act_loss: -0.299560546875|cri_loss: -0.1214599609375|unsuper_loss: 0.0 average reward score: 2.490234375 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.43%) |Training time=0.64s (19.64%) |Others=0.19 (5.94%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.41 [2023-04-24 15:29:24,041] [INFO] [logging.py:96:log_dist] [Rank 0] step=230, skipped=5, lr=[7.249128370274815e-06, 7.249128370274815e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-04-24 15:29:24,280] [INFO] [timer.py:199:stop] epoch=0/micro_step=1840/global_step=230, RunningAvgSamplesPerSec=15.475585967999457, CurrSamplesPerSec=15.840676588442118, MemAllocated=20.44GB, MaxMemAllocated=31.45GB [2023-04-24 15:29:24,494] [INFO] [logging.py:96:log_dist] [Rank 0] step=230, skipped=4, lr=[3.737920834262134e-06, 3.737920834262134e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 1839|ppo_ep: 1|act_loss: -0.04254150390625|cri_loss: 0.01123046875|unsuper_loss: 0.0 average reward score: 2.998046875 ------------------------------------------------------------------------------------- |E2E latency=3.67s |Gather latency=0.00s (0.00%) |Generate time=2.46s (66.86%) |Training time=0.92s (25.17%) |Others=0.29 (7.97%)|CurSamplesPerSec=2.18 
|AvgSamplesPerSec=2.41 epoch: 0|step: 1840|ppo_ep: 1|act_loss: 0.0229949951171875|cri_loss: 0.0189361572265625|unsuper_loss: 0.0 average reward score: 1.841796875 ------------------------------------------------------------------------------------- |E2E latency=3.38s |Gather latency=0.00s (0.00%) |Generate time=2.55s (75.32%) |Training time=0.64s (18.89%) |Others=0.20 (5.79%)|CurSamplesPerSec=2.36 |AvgSamplesPerSec=2.41 epoch: 0|step: 1841|ppo_ep: 1|act_loss: 0.08087158203125|cri_loss: 0.06243896484375|unsuper_loss: 0.0 average reward score: 1.34375 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.48s (74.58%) |Training time=0.64s (19.35%) |Others=0.20 (6.07%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.41 epoch: 0|step: 1842|ppo_ep: 1|act_loss: 0.35107421875|cri_loss: 0.203125|unsuper_loss: 0.0 average reward score: 1.216796875 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.47s (74.53%) |Training time=0.64s (19.37%) |Others=0.20 (6.10%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.41 epoch: 0|step: 1843|ppo_ep: 1|act_loss: 0.00830078125|cri_loss: 0.0241546630859375|unsuper_loss: 0.0 average reward score: 2.169921875 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.55%) |Training time=0.64s (19.56%) |Others=0.19 (5.89%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.41 epoch: 0|step: 1844|ppo_ep: 1|act_loss: 0.53369140625|cri_loss: 0.306640625|unsuper_loss: 0.0 average reward score: 1.6533203125 ------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.43%) |Training time=0.64s (19.52%) |Others=0.20 (6.05%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.41 epoch: 
0|step: 1845|ppo_ep: 1|act_loss: 0.08856201171875|cri_loss: 0.07342529296875|unsuper_loss: 0.0 average reward score: 3.16796875 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.34%) |Training time=0.65s (19.62%) |Others=0.20 (6.04%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.41 epoch: 0|step: 1846|ppo_ep: 1|act_loss: -0.113525390625|cri_loss: -0.0142822265625|unsuper_loss: 0.0 average reward score: 2.80859375 ------------------------------------------------------------------------------------- |E2E latency=3.40s |Gather latency=0.00s (0.00%) |Generate time=2.56s (75.24%) |Training time=0.64s (18.87%) |Others=0.20 (5.89%)|CurSamplesPerSec=2.35 |AvgSamplesPerSec=2.41 epoch: 0|step: 1847|ppo_ep: 1|act_loss: 0.196533203125|cri_loss: 0.1156005859375|unsuper_loss: 0.0 average reward score: 2.078125 ------------------------------------------------------------------------------------- |E2E latency=3.67s |Gather latency=0.00s (0.00%) |Generate time=2.46s (66.98%) |Training time=0.93s (25.27%) |Others=0.28 (7.75%)|CurSamplesPerSec=2.18 |AvgSamplesPerSec=2.41 epoch: 0|step: 1848|ppo_ep: 1|act_loss: 0.20654296875|cri_loss: 0.1231689453125|unsuper_loss: 0.0 average reward score: 1.845703125 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.35%) |Training time=0.64s (19.49%) |Others=0.20 (6.16%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.41 epoch: 0|step: 1849|ppo_ep: 1|act_loss: 0.011749267578125|cri_loss: 0.02850341796875|unsuper_loss: 0.0 average reward score: 2.72265625 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.16%) |Training time=0.65s (19.61%) |Others=0.21 (6.24%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.41 epoch: 0|step: 1850|ppo_ep: 
1|act_loss: 0.1412353515625|cri_loss: 0.09674072265625|unsuper_loss: 0.0 average reward score: 2.74609375 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.47s (74.71%) |Training time=0.64s (19.40%) |Others=0.20 (5.90%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.41 epoch: 0|step: 1851|ppo_ep: 1|act_loss: -0.0372314453125|cri_loss: 0.0233154296875|unsuper_loss: 0.0 average reward score: 2.64453125 ------------------------------------------------------------------------------------- |E2E latency=3.33s |Gather latency=0.00s (0.00%) |Generate time=2.49s (74.76%) |Training time=0.64s (19.35%) |Others=0.20 (5.89%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.41 epoch: 0|step: 1852|ppo_ep: 1|act_loss: 0.318115234375|cri_loss: 0.203857421875|unsuper_loss: 0.0 average reward score: 2.775390625 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.18%) |Training time=0.65s (19.55%) |Others=0.21 (6.27%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.41 epoch: 0|step: 1853|ppo_ep: 1|act_loss: -0.3193359375|cri_loss: -0.1209716796875|unsuper_loss: 0.0 average reward score: 3.68359375 ------------------------------------------------------------------------------------- |E2E latency=3.42s |Gather latency=0.00s (0.00%) |Generate time=2.57s (75.28%) |Training time=0.64s (18.84%) |Others=0.20 (5.88%)|CurSamplesPerSec=2.34 |AvgSamplesPerSec=2.41 epoch: 0|step: 1854|ppo_ep: 1|act_loss: 0.386962890625|cri_loss: 0.2227783203125|unsuper_loss: 0.0 average reward score: 2.751953125 ------------------------------------------------------------------------------------- |E2E latency=3.42s |Gather latency=0.00s (0.00%) |Generate time=2.57s (75.16%) |Training time=0.64s (18.84%) |Others=0.21 (6.00%)|CurSamplesPerSec=2.34 |AvgSamplesPerSec=2.41 epoch: 0|step: 1855|ppo_ep: 1|act_loss: 
0.35888671875|cri_loss: 0.208740234375|unsuper_loss: 0.0 average reward score: 3.58203125 ------------------------------------------------------------------------------------- |E2E latency=3.69s |Gather latency=0.00s (0.00%) |Generate time=2.49s (67.32%) |Training time=0.93s (25.13%) |Others=0.28 (7.54%)|CurSamplesPerSec=2.17 |AvgSamplesPerSec=2.41 epoch: 0|step: 1856|ppo_ep: 1|act_loss: 0.04388427734375|cri_loss: 0.0364990234375|unsuper_loss: 0.0 average reward score: 3.576171875 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.35%) |Training time=0.64s (19.31%) |Others=0.21 (6.34%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.41 epoch: 0|step: 1857|ppo_ep: 1|act_loss: -0.057952880859375|cri_loss: -0.013763427734375|unsuper_loss: 0.0 average reward score: 3.22265625 ------------------------------------------------------------------------------------- |E2E latency=3.35s |Gather latency=0.00s (0.00%) |Generate time=2.51s (75.02%) |Training time=0.64s (19.16%) |Others=0.19 (5.82%)|CurSamplesPerSec=2.39 |AvgSamplesPerSec=2.41 epoch: 0|step: 1858|ppo_ep: 1|act_loss: 0.1988525390625|cri_loss: 0.157470703125|unsuper_loss: 0.0 average reward score: 3.05078125 ------------------------------------------------------------------------------------- |E2E latency=3.38s |Gather latency=0.00s (0.00%) |Generate time=2.53s (74.99%) |Training time=0.64s (19.03%) |Others=0.20 (5.98%)|CurSamplesPerSec=2.37 |AvgSamplesPerSec=2.41 epoch: 0|step: 1859|ppo_ep: 1|act_loss: 0.02447509765625|cri_loss: 0.032806396484375|unsuper_loss: 0.0 average reward score: 2.5390625 ------------------------------------------------------------------------------------- |E2E latency=3.33s |Gather latency=0.00s (0.00%) |Generate time=2.49s (74.83%) |Training time=0.64s (19.14%) |Others=0.20 (6.04%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.41 epoch: 0|step: 1860|ppo_ep: 1|act_loss: 0.47412109375|cri_loss: 
0.28173828125|unsuper_loss: 0.0 average reward score: 2.611328125 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.47s (74.64%) |Training time=0.64s (19.31%) |Others=0.20 (6.04%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.41 epoch: 0|step: 1861|ppo_ep: 1|act_loss: 0.364990234375|cri_loss: 0.248291015625|unsuper_loss: 0.0 average reward score: 3.21875 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.47s (74.39%) |Training time=0.65s (19.51%) |Others=0.20 (6.11%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.41 epoch: 0|step: 1862|ppo_ep: 1|act_loss: 0.03253173828125|cri_loss: 0.027984619140625|unsuper_loss: 0.0 average reward score: 3.740234375 ------------------------------------------------------------------------------------- |E2E latency=3.39s |Gather latency=0.00s (0.00%) |Generate time=2.55s (75.28%) |Training time=0.64s (18.90%) |Others=0.20 (5.83%)|CurSamplesPerSec=2.36 |AvgSamplesPerSec=2.41 epoch: 0|step: 1863|ppo_ep: 1|act_loss: -0.027374267578125|cri_loss: 0.017425537109375|unsuper_loss: 0.0 average reward score: 2.8828125 ------------------------------------------------------------------------------------- |E2E latency=3.69s |Gather latency=0.00s (0.00%) |Generate time=2.48s (67.17%) |Training time=0.93s (25.19%) |Others=0.28 (7.64%)|CurSamplesPerSec=2.17 |AvgSamplesPerSec=2.41 epoch: 0|step: 1864|ppo_ep: 1|act_loss: 0.0526123046875|cri_loss: 0.04425048828125|unsuper_loss: 0.0 average reward score: 2.203125 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.40%) |Training time=0.65s (19.64%) |Others=0.20 (5.96%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.41 epoch: 0|step: 1865|ppo_ep: 1|act_loss: 0.169189453125|cri_loss: 
0.10333251953125|unsuper_loss: 0.0 average reward score: 2.9609375 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.47s (74.63%) |Training time=0.64s (19.45%) |Others=0.20 (5.92%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.41 epoch: 0|step: 1866|ppo_ep: 1|act_loss: 0.078857421875|cri_loss: 0.0587158203125|unsuper_loss: 0.0 average reward score: 4.171875 ------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.61%) |Training time=0.64s (19.48%) |Others=0.19 (5.90%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.41 epoch: 0|step: 1867|ppo_ep: 1|act_loss: 0.21337890625|cri_loss: 0.1376953125|unsuper_loss: 0.0 average reward score: 2.87890625 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.55%) |Training time=0.64s (19.54%) |Others=0.19 (5.91%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.41 epoch: 0|step: 1868|ppo_ep: 1|act_loss: 0.280517578125|cri_loss: 0.17333984375|unsuper_loss: 0.0 average reward score: 1.8671875 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.38%) |Training time=0.64s (19.53%) |Others=0.20 (6.09%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.41 epoch: 0|step: 1869|ppo_ep: 1|act_loss: 0.07476806640625|cri_loss: 0.06280517578125|unsuper_loss: 0.0 average reward score: 2.228515625 ------------------------------------------------------------------------------------- |E2E latency=3.34s |Gather latency=0.00s (0.00%) |Generate time=2.51s (74.99%) |Training time=0.64s (19.10%) |Others=0.20 (5.91%)|CurSamplesPerSec=2.39 |AvgSamplesPerSec=2.41 epoch: 0|step: 1870|ppo_ep: 1|act_loss: 0.340576171875|cri_loss: 0.2078857421875|unsuper_loss: 0.0 
average reward score: 2.123046875 ------------------------------------------------------------------------------------- |E2E latency=3.41s |Gather latency=0.00s (0.00%) |Generate time=2.56s (75.06%) |Training time=0.64s (18.84%) |Others=0.21 (6.10%)|CurSamplesPerSec=2.35 |AvgSamplesPerSec=2.41 epoch: 0|step: 1871|ppo_ep: 1|act_loss: 0.1541748046875|cri_loss: 0.09417724609375|unsuper_loss: 0.0 average reward score: 2.689453125 ------------------------------------------------------------------------------------- |E2E latency=4.11s |Gather latency=0.00s (0.00%) |Generate time=2.83s (68.82%) |Training time=0.94s (22.97%) |Others=0.34 (8.20%)|CurSamplesPerSec=1.95 |AvgSamplesPerSec=2.41 epoch: 0|step: 1872|ppo_ep: 1|act_loss: 0.21240234375|cri_loss: 0.1241455078125|unsuper_loss: 0.0 average reward score: 3.29296875 ------------------------------------------------------------------------------------- |E2E latency=4.30s |Gather latency=0.00s (0.00%) |Generate time=3.28s (76.36%) |Training time=0.78s (18.15%) |Others=0.24 (5.49%)|CurSamplesPerSec=1.86 |AvgSamplesPerSec=2.41 epoch: 0|step: 1873|ppo_ep: 1|act_loss: 0.2685546875|cri_loss: 0.15771484375|unsuper_loss: 0.0 average reward score: 3.3359375 ------------------------------------------------------------------------------------- |E2E latency=3.86s |Gather latency=0.00s (0.00%) |Generate time=2.98s (77.22%) |Training time=0.66s (16.99%) |Others=0.22 (5.79%)|CurSamplesPerSec=2.07 |AvgSamplesPerSec=2.41 epoch: 0|step: 1874|ppo_ep: 1|act_loss: 0.02154541015625|cri_loss: 0.027679443359375|unsuper_loss: 0.0 average reward score: 2.759765625 ------------------------------------------------------------------------------------- |E2E latency=3.90s |Gather latency=0.00s (0.00%) |Generate time=2.88s (73.69%) |Training time=0.76s (19.46%) |Others=0.27 (6.85%)|CurSamplesPerSec=2.05 |AvgSamplesPerSec=2.41 epoch: 0|step: 1875|ppo_ep: 1|act_loss: 0.09893798828125|cri_loss: 0.0721435546875|unsuper_loss: 0.0 average reward score: 
2.1328125 ------------------------------------------------------------------------------------- |E2E latency=3.98s |Gather latency=0.00s (0.00%) |Generate time=3.10s (77.87%) |Training time=0.65s (16.38%) |Others=0.23 (5.76%)|CurSamplesPerSec=2.01 |AvgSamplesPerSec=2.41 epoch: 0|step: 1876|ppo_ep: 1|act_loss: 0.226318359375|cri_loss: 0.1517333984375|unsuper_loss: 0.0 average reward score: 2.71875 ------------------------------------------------------------------------------------- |E2E latency=5.77s |Gather latency=0.00s (0.00%) |Generate time=3.28s (56.82%) |Training time=1.94s (33.64%) |Others=0.55 (9.54%)|CurSamplesPerSec=1.39 |AvgSamplesPerSec=2.41 epoch: 0|step: 1877|ppo_ep: 1|act_loss: 0.390625|cri_loss: 0.2252197265625|unsuper_loss: 0.0 average reward score: 3.626953125 ------------------------------------------------------------------------------------- |E2E latency=3.57s |Gather latency=0.00s (0.00%) |Generate time=2.71s (75.96%) |Training time=0.64s (18.05%) |Others=0.21 (5.98%)|CurSamplesPerSec=2.24 |AvgSamplesPerSec=2.41 epoch: 0|step: 1878|ppo_ep: 1|act_loss: 0.24462890625|cri_loss: 0.147705078125|unsuper_loss: 0.0 average reward score: 2.865234375 ------------------------------------------------------------------------------------- |E2E latency=3.90s |Gather latency=0.00s (0.00%) |Generate time=3.01s (77.14%) |Training time=0.65s (16.61%) |Others=0.24 (6.25%)|CurSamplesPerSec=2.05 |AvgSamplesPerSec=2.41 epoch: 0|step: 1879|ppo_ep: 1|act_loss: 0.309326171875|cri_loss: 0.186767578125|unsuper_loss: 0.0 average reward score: 3.3359375 ------------------------------------------------------------------------------------- |E2E latency=3.93s |Gather latency=0.00s (0.00%) |Generate time=2.68s (68.12%) |Training time=0.98s (24.97%) |Others=0.27 (6.92%)|CurSamplesPerSec=2.04 |AvgSamplesPerSec=2.41 epoch: 0|step: 1880|ppo_ep: 1|act_loss: 0.158203125|cri_loss: 0.10931396484375|unsuper_loss: 0.0 average reward score: 2.65234375 
------------------------------------------------------------------------------------- |E2E latency=3.64s |Gather latency=0.00s (0.00%) |Generate time=2.63s (72.43%) |Training time=0.77s (21.17%) |Others=0.23 (6.40%)|CurSamplesPerSec=2.20 |AvgSamplesPerSec=2.41 epoch: 0|step: 1881|ppo_ep: 1|act_loss: 0.02691650390625|cri_loss: 0.035919189453125|unsuper_loss: 0.0 average reward score: 2.609375 ------------------------------------------------------------------------------------- |E2E latency=4.00s |Gather latency=0.00s (0.00%) |Generate time=3.13s (78.29%) |Training time=0.64s (16.09%) |Others=0.22 (5.62%)|CurSamplesPerSec=2.00 |AvgSamplesPerSec=2.41 epoch: 0|step: 1882|ppo_ep: 1|act_loss: 0.09332275390625|cri_loss: 0.0687255859375|unsuper_loss: 0.0 average reward score: 3.09375 ------------------------------------------------------------------------------------- |E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.63s (75.73%) |Training time=0.65s (18.58%) |Others=0.20 (5.70%)|CurSamplesPerSec=2.30 |AvgSamplesPerSec=2.41 epoch: 0|step: 1883|ppo_ep: 1|act_loss: 0.144775390625|cri_loss: 0.0914306640625|unsuper_loss: 0.0 average reward score: 3.08203125 ------------------------------------------------------------------------------------- |E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.61s (74.60%) |Training time=0.65s (18.60%) |Others=0.24 (6.80%)|CurSamplesPerSec=2.29 |AvgSamplesPerSec=2.41 epoch: 0|step: 1884|ppo_ep: 1|act_loss: 0.257080078125|cri_loss: 0.1541748046875|unsuper_loss: 0.0 average reward score: 3.26953125 ------------------------------------------------------------------------------------- |E2E latency=3.90s |Gather latency=0.00s (0.00%) |Generate time=2.71s (69.42%) |Training time=0.95s (24.41%) |Others=0.24 (6.17%)|CurSamplesPerSec=2.05 |AvgSamplesPerSec=2.41 epoch: 0|step: 1885|ppo_ep: 1|act_loss: -0.133544921875|cri_loss: -0.02923583984375|unsuper_loss: 0.0 average reward score: 2.533203125 
------------------------------------------------------------------------------------- |E2E latency=3.99s |Gather latency=0.00s (0.00%) |Generate time=3.11s (77.97%) |Training time=0.66s (16.53%) |Others=0.22 (5.50%)|CurSamplesPerSec=2.00 |AvgSamplesPerSec=2.41 epoch: 0|step: 1886|ppo_ep: 1|act_loss: -0.02984619140625|cri_loss: 0.00457763671875|unsuper_loss: 0.0 average reward score: 3.2578125 ------------------------------------------------------------------------------------- |E2E latency=3.97s |Gather latency=0.00s (0.00%) |Generate time=2.84s (71.40%) |Training time=0.91s (22.96%) |Others=0.22 (5.64%)|CurSamplesPerSec=2.01 |AvgSamplesPerSec=2.41 epoch: 0|step: 1887|ppo_ep: 1|act_loss: 0.04345703125|cri_loss: 0.07061767578125|unsuper_loss: 0.0 average reward score: 3.76171875 ------------------------------------------------------------------------------------- |E2E latency=4.11s |Gather latency=0.00s (0.00%) |Generate time=2.85s (69.34%) |Training time=0.94s (22.85%) |Others=0.32 (7.81%)|CurSamplesPerSec=1.95 |AvgSamplesPerSec=2.41 epoch: 0|step: 1888|ppo_ep: 1|act_loss: -0.0909423828125|cri_loss: -0.023223876953125|unsuper_loss: 0.0 average reward score: 2.228515625 ------------------------------------------------------------------------------------- |E2E latency=4.01s |Gather latency=0.00s (0.00%) |Generate time=3.14s (78.35%) |Training time=0.64s (15.92%) |Others=0.23 (5.73%)|CurSamplesPerSec=2.00 |AvgSamplesPerSec=2.41 epoch: 0|step: 1889|ppo_ep: 1|act_loss: -0.3427734375|cri_loss: -0.130615234375|unsuper_loss: 0.0 average reward score: 3.583984375 ------------------------------------------------------------------------------------- |E2E latency=3.53s |Gather latency=0.00s (0.00%) |Generate time=2.66s (75.23%) |Training time=0.65s (18.39%) |Others=0.23 (6.38%)|CurSamplesPerSec=2.27 |AvgSamplesPerSec=2.41 epoch: 0|step: 1890|ppo_ep: 1|act_loss: 0.13623046875|cri_loss: 0.11492919921875|unsuper_loss: 0.0 average reward score: 3.015625 
------------------------------------------------------------------------------------- |E2E latency=3.76s |Gather latency=0.00s (0.00%) |Generate time=2.79s (74.18%) |Training time=0.75s (19.79%) |Others=0.23 (6.03%)|CurSamplesPerSec=2.13 |AvgSamplesPerSec=2.41 epoch: 0|step: 1891|ppo_ep: 1|act_loss: 0.08612060546875|cri_loss: 0.06915283203125|unsuper_loss: 0.0 average reward score: 2.82421875 ------------------------------------------------------------------------------------- |E2E latency=3.72s |Gather latency=0.00s (0.00%) |Generate time=2.83s (76.01%) |Training time=0.65s (17.45%) |Others=0.24 (6.53%)|CurSamplesPerSec=2.15 |AvgSamplesPerSec=2.41 epoch: 0|step: 1892|ppo_ep: 1|act_loss: -0.099853515625|cri_loss: -0.01348876953125|unsuper_loss: 0.0 average reward score: 1.939453125 ------------------------------------------------------------------------------------- |E2E latency=3.67s |Gather latency=0.00s (0.00%) |Generate time=2.77s (75.41%) |Training time=0.67s (18.20%) |Others=0.23 (6.39%)|CurSamplesPerSec=2.18 |AvgSamplesPerSec=2.41 epoch: 0|step: 1893|ppo_ep: 1|act_loss: -0.09649658203125|cri_loss: -0.01678466796875|unsuper_loss: 0.0 average reward score: 3.5390625 ------------------------------------------------------------------------------------- |E2E latency=4.00s |Gather latency=0.00s (0.00%) |Generate time=3.14s (78.51%) |Training time=0.64s (15.90%) |Others=0.22 (5.59%)|CurSamplesPerSec=2.00 |AvgSamplesPerSec=2.41 epoch: 0|step: 1894|ppo_ep: 1|act_loss: -0.1787109375|cri_loss: -0.063232421875|unsuper_loss: 0.0 average reward score: 3.98828125 ------------------------------------------------------------------------------------- |E2E latency=3.77s |Gather latency=0.00s (0.00%) |Generate time=2.90s (76.87%) |Training time=0.65s (17.32%) |Others=0.22 (5.81%)|CurSamplesPerSec=2.12 |AvgSamplesPerSec=2.41 epoch: 0|step: 1895|ppo_ep: 1|act_loss: -0.2265625|cri_loss: -0.0843505859375|unsuper_loss: 0.0 average reward score: 4.046875 
------------------------------------------------------------------------------------- |E2E latency=4.35s |Gather latency=0.00s (0.00%) |Generate time=2.74s (62.96%) |Training time=1.30s (29.93%) |Others=0.31 (7.10%)|CurSamplesPerSec=1.84 |AvgSamplesPerSec=2.41 epoch: 0|step: 1896|ppo_ep: 1|act_loss: -0.023284912109375|cri_loss: 0.0048675537109375|unsuper_loss: 0.0 average reward score: 2.412109375 ------------------------------------------------------------------------------------- |E2E latency=3.64s |Gather latency=0.00s (0.00%) |Generate time=2.76s (75.89%) |Training time=0.64s (17.68%) |Others=0.23 (6.42%)|CurSamplesPerSec=2.20 |AvgSamplesPerSec=2.41 epoch: 0|step: 1897|ppo_ep: 1|act_loss: 0.0251312255859375|cri_loss: 0.04522705078125|unsuper_loss: 0.0 average reward score: 3.267578125 ------------------------------------------------------------------------------------- |E2E latency=4.07s |Gather latency=0.00s (0.00%) |Generate time=3.12s (76.79%) |Training time=0.72s (17.73%) |Others=0.22 (5.49%)|CurSamplesPerSec=1.97 |AvgSamplesPerSec=2.41 epoch: 0|step: 1898|ppo_ep: 1|act_loss: -0.16845703125|cri_loss: -0.065673828125|unsuper_loss: 0.0 average reward score: 2.435546875 ------------------------------------------------------------------------------------- |E2E latency=3.83s |Gather latency=0.00s (0.00%) |Generate time=2.92s (76.40%) |Training time=0.68s (17.68%) |Others=0.23 (5.92%)|CurSamplesPerSec=2.09 |AvgSamplesPerSec=2.40 epoch: 0|step: 1899|ppo_ep: 1|act_loss: -0.07769775390625|cri_loss: 0.013427734375|unsuper_loss: 0.0 average reward score: 3.95703125 ------------------------------------------------------------------------------------- |E2E latency=3.89s |Gather latency=0.00s (0.00%) |Generate time=3.02s (77.62%) |Training time=0.64s (16.48%) |Others=0.23 (5.89%)|CurSamplesPerSec=2.06 |AvgSamplesPerSec=2.40 epoch: 0|step: 1900|ppo_ep: 1|act_loss: -0.12176513671875|cri_loss: -0.02813720703125|unsuper_loss: 0.0 average reward score: 3.515625 
------------------------------------------------------------------------------------- |E2E latency=3.96s |Gather latency=0.00s (0.00%) |Generate time=3.07s (77.59%) |Training time=0.64s (16.28%) |Others=0.24 (6.13%)|CurSamplesPerSec=2.02 |AvgSamplesPerSec=2.40 epoch: 0|step: 1901|ppo_ep: 1|act_loss: -0.2802734375|cri_loss: -0.1072998046875|unsuper_loss: 0.0 average reward score: 3.322265625 ------------------------------------------------------------------------------------- |E2E latency=3.55s |Gather latency=0.00s (0.00%) |Generate time=2.61s (73.48%) |Training time=0.70s (19.83%) |Others=0.24 (6.69%)|CurSamplesPerSec=2.25 |AvgSamplesPerSec=2.40 epoch: 0|step: 1902|ppo_ep: 1|act_loss: 0.06268310546875|cri_loss: 0.0618896484375|unsuper_loss: 0.0 average reward score: 2.88671875 ------------------------------------------------------------------------------------- |E2E latency=4.02s |Gather latency=0.00s (0.00%) |Generate time=2.69s (67.00%) |Training time=1.10s (27.33%) |Others=0.23 (5.67%)|CurSamplesPerSec=1.99 |AvgSamplesPerSec=2.40 epoch: 0|step: 1903|ppo_ep: 1|act_loss: 0.34130859375|cri_loss: 0.22314453125|unsuper_loss: 0.0 average reward score: 2.92578125 ------------------------------------------------------------------------------------- |E2E latency=4.11s |Gather latency=0.00s (0.00%) |Generate time=2.86s (69.62%) |Training time=0.93s (22.61%) |Others=0.32 (7.78%)|CurSamplesPerSec=1.95 |AvgSamplesPerSec=2.40 epoch: 0|step: 1904|ppo_ep: 1|act_loss: -0.134033203125|cri_loss: -0.037933349609375|unsuper_loss: 0.0 average reward score: 2.41015625 ------------------------------------------------------------------------------------- |E2E latency=3.82s |Gather latency=0.00s (0.00%) |Generate time=2.95s (77.02%) |Training time=0.66s (17.29%) |Others=0.22 (5.69%)|CurSamplesPerSec=2.09 |AvgSamplesPerSec=2.40 epoch: 0|step: 1905|ppo_ep: 1|act_loss: 0.03570556640625|cri_loss: 0.04937744140625|unsuper_loss: 0.0 average reward score: 3.74609375 
------------------------------------------------------------------------------------- |E2E latency=3.80s |Gather latency=0.00s (0.00%) |Generate time=2.91s (76.62%) |Training time=0.66s (17.35%) |Others=0.23 (6.03%)|CurSamplesPerSec=2.10 |AvgSamplesPerSec=2.40 epoch: 0|step: 1906|ppo_ep: 1|act_loss: -0.172119140625|cri_loss: -0.055633544921875|unsuper_loss: 0.0 average reward score: 4.265625 ------------------------------------------------------------------------------------- |E2E latency=3.81s |Gather latency=0.00s (0.00%) |Generate time=2.93s (76.82%) |Training time=0.66s (17.21%) |Others=0.23 (5.96%)|CurSamplesPerSec=2.10 |AvgSamplesPerSec=2.40 epoch: 0|step: 1907|ppo_ep: 1|act_loss: -0.072265625|cri_loss: -0.0201416015625|unsuper_loss: 0.0 average reward score: 3.19140625 ------------------------------------------------------------------------------------- |E2E latency=3.65s |Gather latency=0.00s (0.00%) |Generate time=2.69s (73.87%) |Training time=0.75s (20.63%) |Others=0.20 (5.51%)|CurSamplesPerSec=2.19 |AvgSamplesPerSec=2.40 epoch: 0|step: 1908|ppo_ep: 1|act_loss: 0.249267578125|cri_loss: 0.160888671875|unsuper_loss: 0.0 average reward score: 3.2734375 ------------------------------------------------------------------------------------- |E2E latency=3.46s |Gather latency=0.00s (0.00%) |Generate time=2.59s (74.88%) |Training time=0.65s (18.74%) |Others=0.22 (6.37%)|CurSamplesPerSec=2.31 |AvgSamplesPerSec=2.40 epoch: 0|step: 1909|ppo_ep: 1|act_loss: -0.25830078125|cri_loss: -0.09051513671875|unsuper_loss: 0.0 average reward score: 3.22265625 ------------------------------------------------------------------------------------- |E2E latency=3.79s |Gather latency=0.00s (0.00%) |Generate time=2.90s (76.65%) |Training time=0.65s (17.26%) |Others=0.23 (6.09%)|CurSamplesPerSec=2.11 |AvgSamplesPerSec=2.40 epoch: 0|step: 1910|ppo_ep: 1|act_loss: -0.24951171875|cri_loss: -0.104736328125|unsuper_loss: 0.0 average reward score: 3.90234375 
------------------------------------------------------------------------------------- |E2E latency=3.75s |Gather latency=0.00s (0.00%) |Generate time=2.88s (76.71%) |Training time=0.66s (17.54%) |Others=0.22 (5.74%)|CurSamplesPerSec=2.13 |AvgSamplesPerSec=2.40 epoch: 0|step: 1911|ppo_ep: 1|act_loss: -0.134765625|cri_loss: -0.040374755859375|unsuper_loss: 0.0 average reward score: 2.583984375 ------------------------------------------------------------------------------------- |E2E latency=4.26s |Gather latency=0.00s (0.00%) |Generate time=3.01s (70.70%) |Training time=0.94s (22.06%) |Others=0.31 (7.24%)|CurSamplesPerSec=1.88 |AvgSamplesPerSec=2.40 epoch: 0|step: 1912|ppo_ep: 1|act_loss: 0.5908203125|cri_loss: 0.376953125|unsuper_loss: 0.0 average reward score: 2.93359375 ------------------------------------------------------------------------------------- |E2E latency=3.85s |Gather latency=0.00s (0.00%) |Generate time=2.98s (77.56%) |Training time=0.64s (16.56%) |Others=0.23 (5.88%)|CurSamplesPerSec=2.08 |AvgSamplesPerSec=2.40 epoch: 0|step: 1913|ppo_ep: 1|act_loss: -0.08978271484375|cri_loss: -0.01104736328125|unsuper_loss: 0.0 average reward score: 3.36328125 ------------------------------------------------------------------------------------- |E2E latency=3.67s |Gather latency=0.00s (0.00%) |Generate time=2.67s (72.86%) |Training time=0.77s (21.03%) |Others=0.22 (6.11%)|CurSamplesPerSec=2.18 |AvgSamplesPerSec=2.40 epoch: 0|step: 1914|ppo_ep: 1|act_loss: -0.0955810546875|cri_loss: -0.01678466796875|unsuper_loss: 0.0 average reward score: 2.6640625 ------------------------------------------------------------------------------------- |E2E latency=4.05s |Gather latency=0.00s (0.00%) |Generate time=2.95s (72.95%) |Training time=0.87s (21.48%) |Others=0.23 (5.57%)|CurSamplesPerSec=1.98 |AvgSamplesPerSec=2.40 epoch: 0|step: 1915|ppo_ep: 1|act_loss: -0.033538818359375|cri_loss: 0.000701904296875|unsuper_loss: 0.0 average reward score: 2.49609375 
------------------------------------------------------------------------------------- |E2E latency=3.90s |Gather latency=0.00s (0.00%) |Generate time=3.00s (76.83%) |Training time=0.68s (17.41%) |Others=0.22 (5.76%)|CurSamplesPerSec=2.05 |AvgSamplesPerSec=2.40 epoch: 0|step: 1916|ppo_ep: 1|act_loss: -0.2115478515625|cri_loss: -0.067626953125|unsuper_loss: 0.0 average reward score: 2.966796875 ------------------------------------------------------------------------------------- |E2E latency=3.94s |Gather latency=0.00s (0.00%) |Generate time=3.07s (77.83%) |Training time=0.65s (16.42%) |Others=0.23 (5.76%)|CurSamplesPerSec=2.03 |AvgSamplesPerSec=2.40 epoch: 0|step: 1917|ppo_ep: 1|act_loss: -0.1981201171875|cri_loss: -0.0821533203125|unsuper_loss: 0.0 average reward score: 4.4609375 ------------------------------------------------------------------------------------- |E2E latency=3.84s |Gather latency=0.00s (0.00%) |Generate time=2.97s (77.27%) |Training time=0.65s (16.83%) |Others=0.23 (5.90%)|CurSamplesPerSec=2.08 |AvgSamplesPerSec=2.40 epoch: 0|step: 1918|ppo_ep: 1|act_loss: -0.044891357421875|cri_loss: 0.002044677734375|unsuper_loss: 0.0 average reward score: 3.5859375 ------------------------------------------------------------------------------------- |E2E latency=4.10s |Gather latency=0.00s (0.00%) |Generate time=3.21s (78.47%) |Training time=0.66s (16.20%) |Others=0.22 (5.33%)|CurSamplesPerSec=1.95 |AvgSamplesPerSec=2.40 [2023-04-24 15:34:19,941] [INFO] [logging.py:96:log_dist] [Rank 0] step=240, skipped=5, lr=[6.892507664116141e-06, 6.892507664116141e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-04-24 15:34:20,192] [INFO] [timer.py:199:stop] epoch=0/micro_step=1920/global_step=240, RunningAvgSamplesPerSec=15.42836931129126, CurrSamplesPerSec=13.948336449297427, MemAllocated=20.44GB, MaxMemAllocated=31.45GB [2023-04-24 15:34:20,431] [INFO] [logging.py:96:log_dist] [Rank 0] step=240, skipped=4, lr=[3.552336906070838e-06, 3.552336906070838e-06], mom=[(0.9, 0.95), 
(0.9, 0.95)] epoch: 0|step: 1919|ppo_ep: 1|act_loss: 0.185791015625|cri_loss: 0.1304931640625|unsuper_loss: 0.0 average reward score: 2.21484375 ------------------------------------------------------------------------------------- |E2E latency=4.12s |Gather latency=0.00s (0.00%) |Generate time=2.71s (65.91%) |Training time=1.09s (26.41%) |Others=0.32 (7.68%)|CurSamplesPerSec=1.94 |AvgSamplesPerSec=2.40 epoch: 0|step: 1920|ppo_ep: 1|act_loss: -0.03155517578125|cri_loss: 0.003570556640625|unsuper_loss: 0.0 average reward score: 1.685546875 ------------------------------------------------------------------------------------- |E2E latency=3.65s |Gather latency=0.00s (0.00%) |Generate time=2.67s (73.09%) |Training time=0.76s (20.90%) |Others=0.22 (6.01%)|CurSamplesPerSec=2.19 |AvgSamplesPerSec=2.40 epoch: 0|step: 1921|ppo_ep: 1|act_loss: 0.077392578125|cri_loss: 0.0543212890625|unsuper_loss: 0.0 average reward score: 2.560546875 ------------------------------------------------------------------------------------- |E2E latency=4.06s |Gather latency=0.00s (0.00%) |Generate time=3.19s (78.71%) |Training time=0.65s (16.02%) |Others=0.21 (5.26%)|CurSamplesPerSec=1.97 |AvgSamplesPerSec=2.40 epoch: 0|step: 1922|ppo_ep: 1|act_loss: -0.054534912109375|cri_loss: -0.007537841796875|unsuper_loss: 0.0 average reward score: 3.2109375 ------------------------------------------------------------------------------------- |E2E latency=3.84s |Gather latency=0.00s (0.00%) |Generate time=2.92s (76.17%) |Training time=0.64s (16.73%) |Others=0.27 (7.09%)|CurSamplesPerSec=2.09 |AvgSamplesPerSec=2.40 epoch: 0|step: 1923|ppo_ep: 1|act_loss: -0.08587646484375|cri_loss: -0.014404296875|unsuper_loss: 0.0 average reward score: 3.08203125 ------------------------------------------------------------------------------------- |E2E latency=3.87s |Gather latency=0.00s (0.00%) |Generate time=2.75s (71.19%) |Training time=0.86s (22.28%) |Others=0.25 (6.54%)|CurSamplesPerSec=2.07 |AvgSamplesPerSec=2.40 
epoch: 0|step: 1924|ppo_ep: 1|act_loss: 0.2469482421875|cri_loss: 0.145263671875|unsuper_loss: 0.0 average reward score: 2.525390625 ------------------------------------------------------------------------------------- |E2E latency=3.81s |Gather latency=0.00s (0.00%) |Generate time=2.92s (76.62%) |Training time=0.64s (16.90%) |Others=0.25 (6.48%)|CurSamplesPerSec=2.10 |AvgSamplesPerSec=2.40 epoch: 0|step: 1925|ppo_ep: 1|act_loss: 0.11309814453125|cri_loss: 0.0946044921875|unsuper_loss: 0.0 average reward score: 3.630859375 ------------------------------------------------------------------------------------- |E2E latency=3.96s |Gather latency=0.00s (0.00%) |Generate time=2.71s (68.37%) |Training time=1.00s (25.33%) |Others=0.25 (6.30%)|CurSamplesPerSec=2.02 |AvgSamplesPerSec=2.40 epoch: 0|step: 1926|ppo_ep: 1|act_loss: 0.03131103515625|cri_loss: 0.0511474609375|unsuper_loss: 0.0 average reward score: 4.140625 ------------------------------------------------------------------------------------- |E2E latency=4.03s |Gather latency=0.00s (0.00%) |Generate time=3.16s (78.35%) |Training time=0.65s (16.14%) |Others=0.22 (5.51%)|CurSamplesPerSec=1.98 |AvgSamplesPerSec=2.40 epoch: 0|step: 1927|ppo_ep: 1|act_loss: 0.01959228515625|cri_loss: 0.042083740234375|unsuper_loss: 0.0 average reward score: 2.287109375 ------------------------------------------------------------------------------------- |E2E latency=4.44s |Gather latency=0.00s (0.00%) |Generate time=3.17s (71.31%) |Training time=0.96s (21.56%) |Others=0.32 (7.12%)|CurSamplesPerSec=1.80 |AvgSamplesPerSec=2.40 epoch: 0|step: 1928|ppo_ep: 1|act_loss: -0.121826171875|cri_loss: -0.01873779296875|unsuper_loss: 0.0 average reward score: 2.96484375 ------------------------------------------------------------------------------------- |E2E latency=4.22s |Gather latency=0.00s (0.00%) |Generate time=3.34s (79.27%) |Training time=0.65s (15.48%) |Others=0.22 (5.25%)|CurSamplesPerSec=1.90 |AvgSamplesPerSec=2.40 epoch: 0|step: 
1929|ppo_ep: 1|act_loss: -0.0867919921875|cri_loss: -0.019805908203125|unsuper_loss: 0.0 average reward score: 3.921875 ------------------------------------------------------------------------------------- |E2E latency=3.84s |Gather latency=0.00s (0.00%) |Generate time=2.97s (77.27%) |Training time=0.65s (17.01%) |Others=0.22 (5.72%)|CurSamplesPerSec=2.08 |AvgSamplesPerSec=2.40 epoch: 0|step: 1930|ppo_ep: 1|act_loss: 0.009918212890625|cri_loss: 0.0140380859375|unsuper_loss: 0.0 average reward score: 4.15625 ------------------------------------------------------------------------------------- |E2E latency=3.72s |Gather latency=0.00s (0.00%) |Generate time=2.79s (74.96%) |Training time=0.70s (18.82%) |Others=0.23 (6.21%)|CurSamplesPerSec=2.15 |AvgSamplesPerSec=2.40 epoch: 0|step: 1931|ppo_ep: 1|act_loss: 0.3505859375|cri_loss: 0.210693359375|unsuper_loss: 0.0 average reward score: 3.45703125 ------------------------------------------------------------------------------------- |E2E latency=3.71s |Gather latency=0.00s (0.00%) |Generate time=2.78s (74.88%) |Training time=0.72s (19.35%) |Others=0.21 (5.77%)|CurSamplesPerSec=2.16 |AvgSamplesPerSec=2.40 epoch: 0|step: 1932|ppo_ep: 1|act_loss: 0.014923095703125|cri_loss: 0.0299224853515625|unsuper_loss: 0.0 average reward score: 2.4375 ------------------------------------------------------------------------------------- |E2E latency=3.98s |Gather latency=0.00s (0.00%) |Generate time=2.69s (67.48%) |Training time=1.07s (26.92%) |Others=0.22 (5.60%)|CurSamplesPerSec=2.01 |AvgSamplesPerSec=2.40 epoch: 0|step: 1933|ppo_ep: 1|act_loss: -0.05255126953125|cri_loss: -0.0159149169921875|unsuper_loss: 0.0 average reward score: 2.66015625 ------------------------------------------------------------------------------------- |E2E latency=3.76s |Gather latency=0.00s (0.00%) |Generate time=2.91s (77.31%) |Training time=0.64s (17.11%) |Others=0.21 (5.58%)|CurSamplesPerSec=2.13 |AvgSamplesPerSec=2.40 epoch: 0|step: 1934|ppo_ep: 1|act_loss: 
0.274169921875|cri_loss: 0.171630859375|unsuper_loss: 0.0 average reward score: 0.6494140625 ------------------------------------------------------------------------------------- |E2E latency=3.93s |Gather latency=0.00s (0.00%) |Generate time=3.02s (76.90%) |Training time=0.66s (16.79%) |Others=0.25 (6.31%)|CurSamplesPerSec=2.03 |AvgSamplesPerSec=2.40 epoch: 0|step: 1935|ppo_ep: 1|act_loss: 0.054443359375|cri_loss: 0.041900634765625|unsuper_loss: 0.0 average reward score: 2.81640625 ------------------------------------------------------------------------------------- |E2E latency=4.22s |Gather latency=0.00s (0.00%) |Generate time=2.97s (70.29%) |Training time=0.94s (22.35%) |Others=0.31 (7.36%)|CurSamplesPerSec=1.89 |AvgSamplesPerSec=2.40 epoch: 0|step: 1936|ppo_ep: 1|act_loss: -0.08331298828125|cri_loss: -0.03607177734375|unsuper_loss: 0.0 average reward score: 2.515625 ------------------------------------------------------------------------------------- |E2E latency=4.06s |Gather latency=0.00s (0.00%) |Generate time=3.18s (78.31%) |Training time=0.65s (16.13%) |Others=0.23 (5.55%)|CurSamplesPerSec=1.97 |AvgSamplesPerSec=2.40 epoch: 0|step: 1937|ppo_ep: 1|act_loss: 0.00506591796875|cri_loss: 0.0249176025390625|unsuper_loss: 0.0 average reward score: 3.96484375 ------------------------------------------------------------------------------------- |E2E latency=3.81s |Gather latency=0.00s (0.00%) |Generate time=2.84s (74.61%) |Training time=0.74s (19.54%) |Others=0.22 (5.85%)|CurSamplesPerSec=2.10 |AvgSamplesPerSec=2.40 epoch: 0|step: 1938|ppo_ep: 1|act_loss: -0.294921875|cri_loss: -0.119873046875|unsuper_loss: 0.0 average reward score: 4.484375 ------------------------------------------------------------------------------------- |E2E latency=3.87s |Gather latency=0.00s (0.00%) |Generate time=2.97s (76.80%) |Training time=0.67s (17.35%) |Others=0.23 (5.85%)|CurSamplesPerSec=2.07 |AvgSamplesPerSec=2.40 epoch: 0|step: 1939|ppo_ep: 1|act_loss: 0.037109375|cri_loss: 
0.0310821533203125|unsuper_loss: 0.0 average reward score: 2.60546875 ------------------------------------------------------------------------------------- |E2E latency=3.87s |Gather latency=0.00s (0.00%) |Generate time=3.00s (77.53%) |Training time=0.66s (17.12%) |Others=0.21 (5.35%)|CurSamplesPerSec=2.06 |AvgSamplesPerSec=2.40 epoch: 0|step: 1940|ppo_ep: 1|act_loss: 0.02142333984375|cri_loss: 0.030242919921875|unsuper_loss: 0.0 average reward score: 4.484375 ------------------------------------------------------------------------------------- |E2E latency=4.16s |Gather latency=0.00s (0.00%) |Generate time=3.26s (78.41%) |Training time=0.66s (15.76%) |Others=0.24 (5.83%)|CurSamplesPerSec=1.92 |AvgSamplesPerSec=2.40 epoch: 0|step: 1941|ppo_ep: 1|act_loss: -0.019500732421875|cri_loss: 0.000762939453125|unsuper_loss: 0.0 average reward score: 3.44921875 ------------------------------------------------------------------------------------- |E2E latency=3.78s |Gather latency=0.00s (0.00%) |Generate time=2.92s (77.14%) |Training time=0.64s (17.03%) |Others=0.22 (5.82%)|CurSamplesPerSec=2.12 |AvgSamplesPerSec=2.40 epoch: 0|step: 1942|ppo_ep: 1|act_loss: 0.06719970703125|cri_loss: 0.045623779296875|unsuper_loss: 0.0 average reward score: 2.2890625 ------------------------------------------------------------------------------------- |E2E latency=3.96s |Gather latency=0.00s (0.00%) |Generate time=3.05s (77.11%) |Training time=0.67s (16.86%) |Others=0.24 (6.03%)|CurSamplesPerSec=2.02 |AvgSamplesPerSec=2.40 epoch: 0|step: 1943|ppo_ep: 1|act_loss: -0.03387451171875|cri_loss: -0.01085662841796875|unsuper_loss: 0.0 average reward score: 3.388671875 ------------------------------------------------------------------------------------- |E2E latency=4.46s |Gather latency=0.00s (0.00%) |Generate time=2.89s (64.97%) |Training time=1.25s (28.01%) |Others=0.31 (7.01%)|CurSamplesPerSec=1.80 |AvgSamplesPerSec=2.40 epoch: 0|step: 1944|ppo_ep: 1|act_loss: 0.0092315673828125|cri_loss: 
0.0141143798828125|unsuper_loss: 0.0 average reward score: 3.27734375 ------------------------------------------------------------------------------------- |E2E latency=3.64s |Gather latency=0.00s (0.00%) |Generate time=2.78s (76.52%) |Training time=0.64s (17.57%) |Others=0.21 (5.91%)|CurSamplesPerSec=2.20 |AvgSamplesPerSec=2.39 epoch: 0|step: 1945|ppo_ep: 1|act_loss: 0.0965576171875|cri_loss: 0.05816650390625|unsuper_loss: 0.0 average reward score: 1.787109375 ------------------------------------------------------------------------------------- |E2E latency=3.66s |Gather latency=0.00s (0.00%) |Generate time=2.77s (75.86%) |Training time=0.65s (17.68%) |Others=0.24 (6.46%)|CurSamplesPerSec=2.19 |AvgSamplesPerSec=2.39 epoch: 0|step: 1946|ppo_ep: 1|act_loss: -0.075927734375|cri_loss: -0.0295257568359375|unsuper_loss: 0.0 average reward score: 3.982421875 ------------------------------------------------------------------------------------- |E2E latency=3.70s |Gather latency=0.00s (0.00%) |Generate time=2.64s (71.37%) |Training time=0.82s (22.24%) |Others=0.24 (6.39%)|CurSamplesPerSec=2.16 |AvgSamplesPerSec=2.39 epoch: 0|step: 1947|ppo_ep: 1|act_loss: 0.052154541015625|cri_loss: 0.0413818359375|unsuper_loss: 0.0 average reward score: 3.642578125 ------------------------------------------------------------------------------------- |E2E latency=3.78s |Gather latency=0.00s (0.00%) |Generate time=2.87s (75.92%) |Training time=0.70s (18.45%) |Others=0.21 (5.63%)|CurSamplesPerSec=2.11 |AvgSamplesPerSec=2.39 epoch: 0|step: 1948|ppo_ep: 1|act_loss: 0.09259033203125|cri_loss: 0.08917236328125|unsuper_loss: 0.0 average reward score: 1.072265625 ------------------------------------------------------------------------------------- |E2E latency=3.57s |Gather latency=0.00s (0.00%) |Generate time=2.68s (75.19%) |Training time=0.66s (18.61%) |Others=0.22 (6.21%)|CurSamplesPerSec=2.24 |AvgSamplesPerSec=2.39 epoch: 0|step: 1949|ppo_ep: 1|act_loss: -0.0222015380859375|cri_loss: 
-0.0030364990234375|unsuper_loss: 0.0 average reward score: 3.890625 ------------------------------------------------------------------------------------- |E2E latency=3.79s |Gather latency=0.00s (0.00%) |Generate time=2.74s (72.32%) |Training time=0.82s (21.65%) |Others=0.23 (6.03%)|CurSamplesPerSec=2.11 |AvgSamplesPerSec=2.39 epoch: 0|step: 1950|ppo_ep: 1|act_loss: 0.0161895751953125|cri_loss: 0.0158538818359375|unsuper_loss: 0.0 average reward score: 4.1640625 ------------------------------------------------------------------------------------- |E2E latency=3.79s |Gather latency=0.00s (0.00%) |Generate time=2.92s (77.02%) |Training time=0.66s (17.30%) |Others=0.22 (5.68%)|CurSamplesPerSec=2.11 |AvgSamplesPerSec=2.39 epoch: 0|step: 1951|ppo_ep: 1|act_loss: 0.08416748046875|cri_loss: 0.05023193359375|unsuper_loss: 0.0 average reward score: 2.96484375 ------------------------------------------------------------------------------------- |E2E latency=4.04s |Gather latency=0.00s (0.00%) |Generate time=2.80s (69.31%) |Training time=0.94s (23.15%) |Others=0.30 (7.54%)|CurSamplesPerSec=1.98 |AvgSamplesPerSec=2.39 epoch: 0|step: 1952|ppo_ep: 1|act_loss: -0.04290771484375|cri_loss: -0.014129638671875|unsuper_loss: 0.0 average reward score: 3.66015625 ------------------------------------------------------------------------------------- |E2E latency=3.86s |Gather latency=0.00s (0.00%) |Generate time=3.00s (77.84%) |Training time=0.64s (16.61%) |Others=0.21 (5.55%)|CurSamplesPerSec=2.07 |AvgSamplesPerSec=2.39 epoch: 0|step: 1953|ppo_ep: 1|act_loss: -0.03759765625|cri_loss: -0.009124755859375|unsuper_loss: 0.0 average reward score: 3.392578125 ------------------------------------------------------------------------------------- |E2E latency=3.64s |Gather latency=0.00s (0.00%) |Generate time=2.78s (76.51%) |Training time=0.65s (17.74%) |Others=0.21 (5.75%)|CurSamplesPerSec=2.20 |AvgSamplesPerSec=2.39 epoch: 0|step: 1954|ppo_ep: 1|act_loss: 0.1585693359375|cri_loss: 
0.0968017578125|unsuper_loss: 0.0 average reward score: 2.083984375 ------------------------------------------------------------------------------------- |E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.65s (75.43%) |Training time=0.65s (18.38%) |Others=0.22 (6.19%)|CurSamplesPerSec=2.28 |AvgSamplesPerSec=2.39 epoch: 0|step: 1955|ppo_ep: 1|act_loss: 0.20556640625|cri_loss: 0.11322021484375|unsuper_loss: 0.0 average reward score: 2.5546875 ------------------------------------------------------------------------------------- |E2E latency=3.46s |Gather latency=0.00s (0.00%) |Generate time=2.56s (74.04%) |Training time=0.68s (19.62%) |Others=0.22 (6.33%)|CurSamplesPerSec=2.31 |AvgSamplesPerSec=2.39 epoch: 0|step: 1956|ppo_ep: 1|act_loss: 0.1151123046875|cri_loss: 0.0694580078125|unsuper_loss: 0.0 average reward score: 2.59375 ------------------------------------------------------------------------------------- |E2E latency=3.33s |Gather latency=0.00s (0.00%) |Generate time=2.50s (75.03%) |Training time=0.64s (19.20%) |Others=0.19 (5.76%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.39 epoch: 0|step: 1957|ppo_ep: 1|act_loss: -0.0284271240234375|cri_loss: -0.0081024169921875|unsuper_loss: 0.0 average reward score: 2.84765625 ------------------------------------------------------------------------------------- |E2E latency=3.35s |Gather latency=0.00s (0.00%) |Generate time=2.52s (75.24%) |Training time=0.64s (19.13%) |Others=0.19 (5.63%)|CurSamplesPerSec=2.38 |AvgSamplesPerSec=2.39 epoch: 0|step: 1958|ppo_ep: 1|act_loss: 0.0648193359375|cri_loss: 0.043182373046875|unsuper_loss: 0.0 average reward score: 3.15234375 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.09%) |Training time=0.64s (19.93%) |Others=0.19 (5.98%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.39 epoch: 0|step: 1959|ppo_ep: 1|act_loss: 0.154296875|cri_loss: 
0.09625244140625|unsuper_loss: 0.0 average reward score: 3.486328125 ------------------------------------------------------------------------------------- |E2E latency=3.62s |Gather latency=0.00s (0.00%) |Generate time=2.42s (66.86%) |Training time=0.93s (25.59%) |Others=0.27 (7.56%)|CurSamplesPerSec=2.21 |AvgSamplesPerSec=2.39 epoch: 0|step: 1960|ppo_ep: 1|act_loss: 0.227783203125|cri_loss: 0.1455078125|unsuper_loss: 0.0 average reward score: 2.20703125 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.38s (74.17%) |Training time=0.64s (19.88%) |Others=0.19 (5.95%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.39 epoch: 0|step: 1961|ppo_ep: 1|act_loss: 0.02606201171875|cri_loss: 0.033294677734375|unsuper_loss: 0.0 average reward score: 2.94921875 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.22%) |Training time=0.64s (19.89%) |Others=0.19 (5.88%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.39 epoch: 0|step: 1962|ppo_ep: 1|act_loss: 0.2392578125|cri_loss: 0.1373291015625|unsuper_loss: 0.0 average reward score: 2.71875 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.58%) |Training time=0.64s (19.49%) |Others=0.19 (5.93%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.39 epoch: 0|step: 1963|ppo_ep: 1|act_loss: 0.0113677978515625|cri_loss: 0.0155181884765625|unsuper_loss: 0.0 average reward score: 3.53515625 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.55%) |Training time=0.64s (19.60%) |Others=0.19 (5.85%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.39 epoch: 0|step: 1964|ppo_ep: 1|act_loss: -0.021484375|cri_loss: 0.006622314453125|unsuper_loss: 
0.0 average reward score: 2.67578125 ------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.55%) |Training time=0.65s (19.61%) |Others=0.19 (5.84%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.39 epoch: 0|step: 1965|ppo_ep: 1|act_loss: 0.13720703125|cri_loss: 0.07958984375|unsuper_loss: 0.0 average reward score: 4.69140625 ------------------------------------------------------------------------------------- |E2E latency=3.35s |Gather latency=0.00s (0.00%) |Generate time=2.40s (71.72%) |Training time=0.75s (22.56%) |Others=0.19 (5.72%)|CurSamplesPerSec=2.39 |AvgSamplesPerSec=2.39 epoch: 0|step: 1966|ppo_ep: 1|act_loss: 0.10400390625|cri_loss: 0.07073974609375|unsuper_loss: 0.0 average reward score: 3.7109375 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.27%) |Training time=0.64s (19.68%) |Others=0.20 (6.05%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.39 epoch: 0|step: 1967|ppo_ep: 1|act_loss: -0.023101806640625|cri_loss: 0.004730224609375|unsuper_loss: 0.0 average reward score: 3.71484375 ------------------------------------------------------------------------------------- |E2E latency=3.68s |Gather latency=0.00s (0.00%) |Generate time=2.48s (67.20%) |Training time=0.93s (25.22%) |Others=0.28 (7.58%)|CurSamplesPerSec=2.17 |AvgSamplesPerSec=2.39 epoch: 0|step: 1968|ppo_ep: 1|act_loss: -0.00012969970703125|cri_loss: 0.005153656005859375|unsuper_loss: 0.0 average reward score: 3.669921875 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.39s (73.93%) |Training time=0.65s (20.00%) |Others=0.20 (6.07%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.39 epoch: 0|step: 1969|ppo_ep: 1|act_loss: 0.0494384765625|cri_loss: 0.037139892578125|unsuper_loss: 0.0 average 
reward score: 3.888671875 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.47s (74.81%) |Training time=0.64s (19.33%) |Others=0.19 (5.86%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.39 epoch: 0|step: 1970|ppo_ep: 1|act_loss: 0.134521484375|cri_loss: 0.07745361328125|unsuper_loss: 0.0 average reward score: 4.40625 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.29%) |Training time=0.64s (19.59%) |Others=0.20 (6.12%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.39 epoch: 0|step: 1971|ppo_ep: 1|act_loss: 0.03839111328125|cri_loss: 0.0286102294921875|unsuper_loss: 0.0 average reward score: 4.140625 ------------------------------------------------------------------------------------- |E2E latency=3.45s |Gather latency=0.00s (0.00%) |Generate time=2.58s (74.86%) |Training time=0.64s (18.51%) |Others=0.23 (6.63%)|CurSamplesPerSec=2.32 |AvgSamplesPerSec=2.39 epoch: 0|step: 1972|ppo_ep: 1|act_loss: 0.1605224609375|cri_loss: 0.0926513671875|unsuper_loss: 0.0 average reward score: 2.390625 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.43s (73.23%) |Training time=0.64s (19.35%) |Others=0.25 (7.42%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.39 epoch: 0|step: 1973|ppo_ep: 1|act_loss: 0.18115234375|cri_loss: 0.1126708984375|unsuper_loss: 0.0 average reward score: 4.1328125 ------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.36s (73.84%) |Training time=0.64s (20.07%) |Others=0.19 (6.08%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.39 epoch: 0|step: 1974|ppo_ep: 1|act_loss: -0.0933837890625|cri_loss: -0.037872314453125|unsuper_loss: 0.0 average reward score: 4.484375 
------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.38s (73.81%) |Training time=0.65s (20.05%) |Others=0.20 (6.13%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.39 epoch: 0|step: 1975|ppo_ep: 1|act_loss: -0.0362548828125|cri_loss: -0.0110015869140625|unsuper_loss: 0.0 average reward score: 3.654296875 ------------------------------------------------------------------------------------- |E2E latency=3.59s |Gather latency=0.00s (0.00%) |Generate time=2.39s (66.59%) |Training time=0.93s (25.80%) |Others=0.27 (7.61%)|CurSamplesPerSec=2.23 |AvgSamplesPerSec=2.39 epoch: 0|step: 1976|ppo_ep: 1|act_loss: 0.240234375|cri_loss: 0.14697265625|unsuper_loss: 0.0 average reward score: 2.65625 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.14%) |Training time=0.64s (19.90%) |Others=0.19 (5.95%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.39 epoch: 0|step: 1977|ppo_ep: 1|act_loss: 0.0946044921875|cri_loss: 0.054534912109375|unsuper_loss: 0.0 average reward score: 4.9921875 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.37s (73.87%) |Training time=0.65s (20.11%) |Others=0.19 (6.01%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.39 epoch: 0|step: 1978|ppo_ep: 1|act_loss: 0.058929443359375|cri_loss: 0.05267333984375|unsuper_loss: 0.0 average reward score: 2.921875 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.25%) |Training time=0.64s (19.82%) |Others=0.19 (5.93%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.39 epoch: 0|step: 1979|ppo_ep: 1|act_loss: -0.142333984375|cri_loss: -0.05621337890625|unsuper_loss: 0.0 average reward score: 3.3515625 
------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.28%) |Training time=0.64s (19.85%) |Others=0.19 (5.88%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.39 epoch: 0|step: 1980|ppo_ep: 1|act_loss: 0.09088134765625|cri_loss: 0.05755615234375|unsuper_loss: 0.0 average reward score: 2.91015625 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.38s (74.06%) |Training time=0.64s (19.93%) |Others=0.19 (6.01%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.39 epoch: 0|step: 1981|ppo_ep: 1|act_loss: 0.05718994140625|cri_loss: 0.055755615234375|unsuper_loss: 0.0 average reward score: 2.49609375 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.33%) |Training time=0.64s (19.76%) |Others=0.19 (5.91%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.39 epoch: 0|step: 1982|ppo_ep: 1|act_loss: 0.1728515625|cri_loss: 0.1177978515625|unsuper_loss: 0.0 average reward score: 4.37890625 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.38s (74.09%) |Training time=0.64s (19.79%) |Others=0.20 (6.12%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.39 epoch: 0|step: 1983|ppo_ep: 1|act_loss: 0.11767578125|cri_loss: 0.07476806640625|unsuper_loss: 0.0 average reward score: 2.8046875 ------------------------------------------------------------------------------------- |E2E latency=3.54s |Gather latency=0.00s (0.00%) |Generate time=2.34s (66.05%) |Training time=0.92s (26.09%) |Others=0.28 (7.86%)|CurSamplesPerSec=2.26 |AvgSamplesPerSec=2.39 epoch: 0|step: 1984|ppo_ep: 1|act_loss: 0.0770263671875|cri_loss: 0.052581787109375|unsuper_loss: 0.0 average reward score: 3.796875 
------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.45%) |Training time=0.64s (19.63%) |Others=0.19 (5.92%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.39 epoch: 0|step: 1985|ppo_ep: 1|act_loss: 0.042022705078125|cri_loss: 0.03515625|unsuper_loss: 0.0 average reward score: 3.271484375 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.30%) |Training time=0.64s (19.78%) |Others=0.19 (5.92%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.39 epoch: 0|step: 1986|ppo_ep: 1|act_loss: -0.230224609375|cri_loss: -0.06573486328125|unsuper_loss: 0.0 average reward score: 3.12890625 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.30%) |Training time=0.64s (19.71%) |Others=0.20 (5.99%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.39 epoch: 0|step: 1987|ppo_ep: 1|act_loss: -0.0628662109375|cri_loss: -0.023651123046875|unsuper_loss: 0.0 average reward score: 3.9453125 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.50%) |Training time=0.64s (19.62%) |Others=0.19 (5.88%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.39 epoch: 0|step: 1988|ppo_ep: 1|act_loss: -0.05206298828125|cri_loss: -0.002960205078125|unsuper_loss: 0.0 average reward score: 1.78125 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.43%) |Training time=0.64s (19.62%) |Others=0.19 (5.95%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.39 epoch: 0|step: 1989|ppo_ep: 1|act_loss: 0.07281494140625|cri_loss: 0.0572509765625|unsuper_loss: 0.0 average reward score: 2.537109375 
------------------------------------------------------------------------------------- |E2E latency=3.90s |Gather latency=0.00s (0.00%) |Generate time=2.45s (62.97%) |Training time=1.24s (31.71%) |Others=0.21 (5.32%)|CurSamplesPerSec=2.05 |AvgSamplesPerSec=2.39 epoch: 0|step: 1990|ppo_ep: 1|act_loss: -0.140869140625|cri_loss: -0.0576171875|unsuper_loss: 0.0 average reward score: 3.39453125 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.37%) |Training time=0.64s (19.61%) |Others=0.20 (6.02%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.39 epoch: 0|step: 1991|ppo_ep: 1|act_loss: -0.068603515625|cri_loss: -0.02935791015625|unsuper_loss: 0.0 average reward score: 3.779296875 ------------------------------------------------------------------------------------- |E2E latency=3.62s |Gather latency=0.00s (0.00%) |Generate time=2.41s (66.56%) |Training time=0.93s (25.80%) |Others=0.28 (7.64%)|CurSamplesPerSec=2.21 |AvgSamplesPerSec=2.39 epoch: 0|step: 1992|ppo_ep: 1|act_loss: -0.1092529296875|cri_loss: -0.0467529296875|unsuper_loss: 0.0 average reward score: 2.78125 ------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.37s (74.19%) |Training time=0.63s (19.84%) |Others=0.19 (5.97%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.39 epoch: 0|step: 1993|ppo_ep: 1|act_loss: -0.119873046875|cri_loss: -0.0426025390625|unsuper_loss: 0.0 average reward score: 3.58984375 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.04%) |Training time=0.65s (20.05%) |Others=0.19 (5.90%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.39 epoch: 0|step: 1994|ppo_ep: 1|act_loss: 0.043243408203125|cri_loss: 0.037689208984375|unsuper_loss: 0.0 average reward score: 2.087890625 
------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.22%) |Training time=0.64s (19.88%) |Others=0.19 (5.89%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.39 epoch: 0|step: 1995|ppo_ep: 1|act_loss: 0.2138671875|cri_loss: 0.1396484375|unsuper_loss: 0.0 average reward score: 2.919921875 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.38s (74.14%) |Training time=0.64s (19.87%) |Others=0.19 (5.99%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.39 epoch: 0|step: 1996|ppo_ep: 1|act_loss: -0.00016021728515625|cri_loss: 0.0076141357421875|unsuper_loss: 0.0 average reward score: 2.875 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.12%) |Training time=0.64s (19.88%) |Others=0.19 (6.01%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.39 epoch: 0|step: 1997|ppo_ep: 1|act_loss: -0.06591796875|cri_loss: -0.0203094482421875|unsuper_loss: 0.0 average reward score: 3.80078125 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.41s (73.39%) |Training time=0.64s (19.57%) |Others=0.23 (7.05%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.39 epoch: 0|step: 1998|ppo_ep: 1|act_loss: -0.07806396484375|cri_loss: -0.032806396484375|unsuper_loss: 0.0 average reward score: 2.9453125 ------------------------------------------------------------------------------------- |E2E latency=3.36s |Gather latency=0.00s (0.00%) |Generate time=2.52s (75.00%) |Training time=0.64s (18.97%) |Others=0.20 (6.04%)|CurSamplesPerSec=2.38 |AvgSamplesPerSec=2.39 [2023-04-24 15:39:05,574] [INFO] [logging.py:96:log_dist] [Rank 0] step=250, skipped=5, lr=[6.521461868523521e-06, 6.521461868523521e-06], mom=[(0.9, 0.95), 
(0.9, 0.95)] [2023-04-24 15:39:05,818] [INFO] [timer.py:199:stop] epoch=0/micro_step=2000/global_step=250, RunningAvgSamplesPerSec=15.408167169708241, CurrSamplesPerSec=15.86393988715515, MemAllocated=20.44GB, MaxMemAllocated=31.45GB [2023-04-24 15:39:06,021] [INFO] [logging.py:96:log_dist] [Rank 0] step=250, skipped=4, lr=[3.3594107782600754e-06, 3.3594107782600754e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 1999|ppo_ep: 1|act_loss: -0.04443359375|cri_loss: -0.018280029296875|unsuper_loss: 0.0 average reward score: 4.09375 ------------------------------------------------------------------------------------- |E2E latency=3.64s |Gather latency=0.00s (0.00%) |Generate time=2.43s (66.86%) |Training time=0.93s (25.51%) |Others=0.28 (7.63%)|CurSamplesPerSec=2.20 |AvgSamplesPerSec=2.39 epoch: 0|step: 2000|ppo_ep: 1|act_loss: 0.1861572265625|cri_loss: 0.111328125|unsuper_loss: 0.0 average reward score: 3.3671875 ------------------------------------------------------------------------------------- |E2E latency=3.18s |Gather latency=0.00s (0.00%) |Generate time=2.35s (73.84%) |Training time=0.64s (20.10%) |Others=0.19 (6.06%)|CurSamplesPerSec=2.51 |AvgSamplesPerSec=2.39 epoch: 0|step: 2001|ppo_ep: 1|act_loss: -0.056304931640625|cri_loss: -0.00506591796875|unsuper_loss: 0.0 average reward score: 3.884765625 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.38s (73.92%) |Training time=0.65s (20.04%) |Others=0.19 (6.03%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.39 epoch: 0|step: 2002|ppo_ep: 1|act_loss: -0.0247650146484375|cri_loss: -0.002105712890625|unsuper_loss: 0.0 average reward score: 3.0625 ------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.37s (74.12%) |Training time=0.64s (19.91%) |Others=0.19 (5.97%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.39 epoch: 0|step: 
2003|ppo_ep: 1|act_loss: 0.0947265625|cri_loss: 0.07476806640625|unsuper_loss: 0.0 average reward score: 3.23046875 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.36%) |Training time=0.64s (19.76%) |Others=0.19 (5.89%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.39 epoch: 0|step: 2004|ppo_ep: 1|act_loss: 0.093017578125|cri_loss: 0.0572509765625|unsuper_loss: 0.0 average reward score: 2.99609375 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.24%) |Training time=0.64s (19.75%) |Others=0.19 (6.01%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.39 epoch: 0|step: 2005|ppo_ep: 1|act_loss: -0.0194854736328125|cri_loss: -0.003631591796875|unsuper_loss: 0.0 average reward score: 1.869140625 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.45%) |Training time=0.64s (19.68%) |Others=0.19 (5.87%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.39 epoch: 0|step: 2006|ppo_ep: 1|act_loss: 0.2164306640625|cri_loss: 0.144775390625|unsuper_loss: 0.0 average reward score: 3.921875 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.18%) |Training time=0.64s (19.83%) |Others=0.19 (5.99%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.39 epoch: 0|step: 2007|ppo_ep: 1|act_loss: -0.11907958984375|cri_loss: -0.052978515625|unsuper_loss: 0.0 average reward score: 4.21484375 ------------------------------------------------------------------------------------- |E2E latency=4.47s |Gather latency=0.00s (0.00%) |Generate time=2.41s (54.00%) |Training time=1.30s (29.02%) |Others=0.76 (16.98%)|CurSamplesPerSec=1.79 |AvgSamplesPerSec=2.39 epoch: 0|step: 2008|ppo_ep: 1|act_loss: 
0.04693603515625|cri_loss: 0.03802490234375|unsuper_loss: 0.0 average reward score: 3.33203125 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.42s (73.91%) |Training time=0.66s (20.04%) |Others=0.20 (6.05%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.39 epoch: 0|step: 2009|ppo_ep: 1|act_loss: 0.031829833984375|cri_loss: 0.030242919921875|unsuper_loss: 0.0 average reward score: 3.76953125 ------------------------------------------------------------------------------------- |E2E latency=3.18s |Gather latency=0.00s (0.00%) |Generate time=2.32s (73.05%) |Training time=0.65s (20.54%) |Others=0.20 (6.41%)|CurSamplesPerSec=2.52 |AvgSamplesPerSec=2.39 epoch: 0|step: 2010|ppo_ep: 1|act_loss: 0.025238037109375|cri_loss: 0.0307464599609375|unsuper_loss: 0.0 average reward score: 4.31640625 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.27%) |Training time=0.64s (19.66%) |Others=0.20 (6.07%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.39 epoch: 0|step: 2011|ppo_ep: 1|act_loss: -0.04986572265625|cri_loss: -0.0219879150390625|unsuper_loss: 0.0 average reward score: 3.80859375 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.40s (73.36%) |Training time=0.66s (20.30%) |Others=0.21 (6.34%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.39 epoch: 0|step: 2012|ppo_ep: 1|act_loss: -0.069091796875|cri_loss: -0.02783203125|unsuper_loss: 0.0 average reward score: 4.359375 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.38s (73.35%) |Training time=0.67s (20.49%) |Others=0.20 (6.17%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.39 epoch: 0|step: 2013|ppo_ep: 1|act_loss: 
0.0440673828125|cri_loss: 0.0280609130859375|unsuper_loss: 0.0 average reward score: 4.3671875 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.47s (74.30%) |Training time=0.65s (19.64%) |Others=0.20 (6.06%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.39 epoch: 0|step: 2014|ppo_ep: 1|act_loss: 0.207275390625|cri_loss: 0.1256103515625|unsuper_loss: 0.0 average reward score: 4.5390625 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.45s (73.76%) |Training time=0.66s (19.93%) |Others=0.21 (6.30%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.39 epoch: 0|step: 2015|ppo_ep: 1|act_loss: 0.0816650390625|cri_loss: 0.048309326171875|unsuper_loss: 0.0 average reward score: 4.6328125 ------------------------------------------------------------------------------------- |E2E latency=3.66s |Gather latency=0.00s (0.00%) |Generate time=2.45s (66.75%) |Training time=0.93s (25.41%) |Others=0.29 (7.83%)|CurSamplesPerSec=2.18 |AvgSamplesPerSec=2.39 epoch: 0|step: 2016|ppo_ep: 1|act_loss: 0.224365234375|cri_loss: 0.142333984375|unsuper_loss: 0.0 average reward score: 4.0546875 ------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.49%) |Training time=0.64s (19.55%) |Others=0.20 (5.96%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.39 epoch: 0|step: 2017|ppo_ep: 1|act_loss: 0.0972900390625|cri_loss: 0.06024169921875|unsuper_loss: 0.0 average reward score: 3.578125 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.14%) |Training time=0.64s (19.83%) |Others=0.20 (6.03%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.39 epoch: 0|step: 2018|ppo_ep: 1|act_loss: 0.058990478515625|cri_loss: 
0.03875732421875|unsuper_loss: 0.0 average reward score: 4.734375 ------------------------------------------------------------------------------------- |E2E latency=3.35s |Gather latency=0.00s (0.00%) |Generate time=2.50s (74.77%) |Training time=0.65s (19.36%) |Others=0.20 (5.87%)|CurSamplesPerSec=2.39 |AvgSamplesPerSec=2.39 epoch: 0|step: 2019|ppo_ep: 1|act_loss: 0.042755126953125|cri_loss: 0.0343017578125|unsuper_loss: 0.0 average reward score: 3.78515625 ------------------------------------------------------------------------------------- |E2E latency=3.35s |Gather latency=0.00s (0.00%) |Generate time=2.51s (75.00%) |Training time=0.64s (19.17%) |Others=0.20 (5.83%)|CurSamplesPerSec=2.39 |AvgSamplesPerSec=2.39 epoch: 0|step: 2020|ppo_ep: 1|act_loss: 0.053070068359375|cri_loss: 0.03271484375|unsuper_loss: 0.0 average reward score: 3.080078125 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.43%) |Training time=0.64s (19.56%) |Others=0.20 (6.01%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.39 epoch: 0|step: 2021|ppo_ep: 1|act_loss: -0.0830078125|cri_loss: -0.00634765625|unsuper_loss: 0.0 average reward score: 3.455078125 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.38s (74.06%) |Training time=0.64s (19.92%) |Others=0.19 (6.02%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.39 epoch: 0|step: 2022|ppo_ep: 1|act_loss: 0.064208984375|cri_loss: 0.03717041015625|unsuper_loss: 0.0 average reward score: 3.17578125 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.40%) |Training time=0.64s (19.62%) |Others=0.19 (5.98%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.39 epoch: 0|step: 2023|ppo_ep: 1|act_loss: -0.0294647216796875|cri_loss: 
-0.00323486328125|unsuper_loss: 0.0 average reward score: 3.01953125 ------------------------------------------------------------------------------------- |E2E latency=3.64s |Gather latency=0.00s (0.00%) |Generate time=2.43s (66.84%) |Training time=0.93s (25.42%) |Others=0.28 (7.74%)|CurSamplesPerSec=2.20 |AvgSamplesPerSec=2.39 epoch: 0|step: 2024|ppo_ep: 1|act_loss: -0.02081298828125|cri_loss: -0.00384521484375|unsuper_loss: 0.0 average reward score: 3.013671875 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.47%) |Training time=0.64s (19.51%) |Others=0.20 (6.03%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.39 epoch: 0|step: 2025|ppo_ep: 1|act_loss: 0.160400390625|cri_loss: 0.0904541015625|unsuper_loss: 0.0 average reward score: 4.4921875 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.46%) |Training time=0.64s (19.63%) |Others=0.19 (5.91%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.39 epoch: 0|step: 2026|ppo_ep: 1|act_loss: -0.003753662109375|cri_loss: 0.003673553466796875|unsuper_loss: 0.0 average reward score: 3.29296875 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.46%) |Training time=0.64s (19.60%) |Others=0.20 (5.94%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.39 epoch: 0|step: 2027|ppo_ep: 1|act_loss: 0.113525390625|cri_loss: 0.06964111328125|unsuper_loss: 0.0 average reward score: 3.75 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.12%) |Training time=0.64s (19.70%) |Others=0.20 (6.18%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.39 epoch: 0|step: 2028|ppo_ep: 1|act_loss: 0.00119781494140625|cri_loss: 
0.00505828857421875|unsuper_loss: 0.0 average reward score: 2.662109375 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.47%) |Training time=0.64s (19.61%) |Others=0.19 (5.91%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.39 epoch: 0|step: 2029|ppo_ep: 1|act_loss: 0.10101318359375|cri_loss: 0.066162109375|unsuper_loss: 0.0 average reward score: 1.984375 ------------------------------------------------------------------------------------- |E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.62s (74.85%) |Training time=0.67s (19.04%) |Others=0.21 (6.11%)|CurSamplesPerSec=2.28 |AvgSamplesPerSec=2.39 epoch: 0|step: 2030|ppo_ep: 1|act_loss: -0.050323486328125|cri_loss: -0.01904296875|unsuper_loss: 0.0 average reward score: 3.3125 ------------------------------------------------------------------------------------- |E2E latency=3.91s |Gather latency=0.00s (0.00%) |Generate time=2.73s (69.83%) |Training time=0.94s (24.07%) |Others=0.24 (6.09%)|CurSamplesPerSec=2.05 |AvgSamplesPerSec=2.39 epoch: 0|step: 2031|ppo_ep: 1|act_loss: 0.189697265625|cri_loss: 0.1026611328125|unsuper_loss: 0.0 average reward score: 4.2578125 ------------------------------------------------------------------------------------- |E2E latency=3.84s |Gather latency=0.00s (0.00%) |Generate time=2.57s (66.80%) |Training time=0.96s (25.04%) |Others=0.31 (8.16%)|CurSamplesPerSec=2.08 |AvgSamplesPerSec=2.39 epoch: 0|step: 2032|ppo_ep: 1|act_loss: -0.0298004150390625|cri_loss: -0.0095367431640625|unsuper_loss: 0.0 average reward score: 2.400390625 ------------------------------------------------------------------------------------- |E2E latency=3.85s |Gather latency=0.00s (0.00%) |Generate time=2.94s (76.35%) |Training time=0.66s (17.24%) |Others=0.25 (6.41%)|CurSamplesPerSec=2.08 |AvgSamplesPerSec=2.39 epoch: 0|step: 2033|ppo_ep: 1|act_loss: 0.2003173828125|cri_loss: 
0.10791015625|unsuper_loss: 0.0 average reward score: 3.884765625 ------------------------------------------------------------------------------------- |E2E latency=3.73s |Gather latency=0.00s (0.00%) |Generate time=2.86s (76.68%) |Training time=0.65s (17.46%) |Others=0.22 (5.86%)|CurSamplesPerSec=2.15 |AvgSamplesPerSec=2.39 epoch: 0|step: 2034|ppo_ep: 1|act_loss: 0.125|cri_loss: 0.074462890625|unsuper_loss: 0.0 average reward score: 2.4375 ------------------------------------------------------------------------------------- |E2E latency=3.72s |Gather latency=0.00s (0.00%) |Generate time=2.87s (77.15%) |Training time=0.65s (17.46%) |Others=0.20 (5.40%)|CurSamplesPerSec=2.15 |AvgSamplesPerSec=2.39 epoch: 0|step: 2035|ppo_ep: 1|act_loss: -0.0059051513671875|cri_loss: 0.011199951171875|unsuper_loss: 0.0 average reward score: 4.51171875 ------------------------------------------------------------------------------------- |E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.65s (75.56%) |Training time=0.64s (18.27%) |Others=0.22 (6.17%)|CurSamplesPerSec=2.28 |AvgSamplesPerSec=2.39 epoch: 0|step: 2036|ppo_ep: 1|act_loss: 0.1688232421875|cri_loss: 0.0914306640625|unsuper_loss: 0.0 average reward score: 3.65625 ------------------------------------------------------------------------------------- |E2E latency=3.42s |Gather latency=0.00s (0.00%) |Generate time=2.52s (73.66%) |Training time=0.64s (18.82%) |Others=0.26 (7.52%)|CurSamplesPerSec=2.34 |AvgSamplesPerSec=2.39 epoch: 0|step: 2037|ppo_ep: 1|act_loss: -0.0179443359375|cri_loss: 0.0016326904296875|unsuper_loss: 0.0 average reward score: 4.69140625 ------------------------------------------------------------------------------------- |E2E latency=3.68s |Gather latency=0.00s (0.00%) |Generate time=2.68s (72.83%) |Training time=0.79s (21.58%) |Others=0.21 (5.60%)|CurSamplesPerSec=2.17 |AvgSamplesPerSec=2.39 epoch: 0|step: 2038|ppo_ep: 1|act_loss: 0.100830078125|cri_loss: 0.0616455078125|unsuper_loss: 0.0 
average reward score: 3.802734375 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.48%) |Training time=0.64s (19.57%) |Others=0.20 (5.95%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.39 epoch: 0|step: 2039|ppo_ep: 1|act_loss: -0.0799560546875|cri_loss: -0.02130126953125|unsuper_loss: 0.0 average reward score: 3.1640625 ------------------------------------------------------------------------------------- |E2E latency=3.68s |Gather latency=0.00s (0.00%) |Generate time=2.47s (67.13%) |Training time=0.93s (25.20%) |Others=0.28 (7.67%)|CurSamplesPerSec=2.17 |AvgSamplesPerSec=2.39 epoch: 0|step: 2040|ppo_ep: 1|act_loss: 0.0218353271484375|cri_loss: 0.0181732177734375|unsuper_loss: 0.0 average reward score: 4.1953125 ------------------------------------------------------------------------------------- |E2E latency=3.35s |Gather latency=0.00s (0.00%) |Generate time=2.48s (73.87%) |Training time=0.64s (19.03%) |Others=0.24 (7.11%)|CurSamplesPerSec=2.39 |AvgSamplesPerSec=2.39 epoch: 0|step: 2041|ppo_ep: 1|act_loss: -0.046966552734375|cri_loss: -0.018157958984375|unsuper_loss: 0.0 average reward score: 2.875 ------------------------------------------------------------------------------------- |E2E latency=3.42s |Gather latency=0.00s (0.00%) |Generate time=2.57s (75.33%) |Training time=0.64s (18.88%) |Others=0.20 (5.79%)|CurSamplesPerSec=2.34 |AvgSamplesPerSec=2.39 epoch: 0|step: 2042|ppo_ep: 1|act_loss: 0.1177978515625|cri_loss: 0.07061767578125|unsuper_loss: 0.0 average reward score: 3.94921875 ------------------------------------------------------------------------------------- |E2E latency=7.86s |Gather latency=0.00s (0.00%) |Generate time=5.64s (71.73%) |Training time=1.60s (20.37%) |Others=0.62 (7.89%)|CurSamplesPerSec=1.02 |AvgSamplesPerSec=2.39 epoch: 0|step: 2043|ppo_ep: 1|act_loss: -0.08251953125|cri_loss: -0.0325927734375|unsuper_loss: 0.0 average reward 
score: 3.80859375 ------------------------------------------------------------------------------------- |E2E latency=3.42s |Gather latency=0.00s (0.00%) |Generate time=2.54s (74.24%) |Training time=0.65s (19.15%) |Others=0.23 (6.61%)|CurSamplesPerSec=2.34 |AvgSamplesPerSec=2.39 epoch: 0|step: 2044|ppo_ep: 1|act_loss: 0.0828857421875|cri_loss: 0.04833984375|unsuper_loss: 0.0 average reward score: 2.71875 ------------------------------------------------------------------------------------- |E2E latency=3.39s |Gather latency=0.00s (0.00%) |Generate time=2.52s (74.52%) |Training time=0.65s (19.08%) |Others=0.22 (6.41%)|CurSamplesPerSec=2.36 |AvgSamplesPerSec=2.39 epoch: 0|step: 2045|ppo_ep: 1|act_loss: 0.09210205078125|cri_loss: 0.0645751953125|unsuper_loss: 0.0 average reward score: 2.7421875 ------------------------------------------------------------------------------------- |E2E latency=3.45s |Gather latency=0.00s (0.00%) |Generate time=2.57s (74.35%) |Training time=0.66s (19.02%) |Others=0.23 (6.63%)|CurSamplesPerSec=2.32 |AvgSamplesPerSec=2.39 epoch: 0|step: 2046|ppo_ep: 1|act_loss: 0.02532958984375|cri_loss: 0.016998291015625|unsuper_loss: 0.0 average reward score: 4.015625 ------------------------------------------------------------------------------------- |E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.57s (73.32%) |Training time=0.70s (19.94%) |Others=0.24 (6.74%)|CurSamplesPerSec=2.28 |AvgSamplesPerSec=2.39 epoch: 0|step: 2047|ppo_ep: 1|act_loss: 0.12890625|cri_loss: 0.07513427734375|unsuper_loss: 0.0 average reward score: 1.681640625 ------------------------------------------------------------------------------------- |E2E latency=3.97s |Gather latency=0.00s (0.00%) |Generate time=2.73s (68.75%) |Training time=0.95s (23.84%) |Others=0.29 (7.42%)|CurSamplesPerSec=2.02 |AvgSamplesPerSec=2.39 epoch: 0|step: 2048|ppo_ep: 1|act_loss: 0.052764892578125|cri_loss: 0.0450439453125|unsuper_loss: 0.0 average reward score: 4.54296875 
------------------------------------------------------------------------------------- |E2E latency=3.42s |Gather latency=0.00s (0.00%) |Generate time=2.53s (74.08%) |Training time=0.68s (19.90%) |Others=0.21 (6.02%)|CurSamplesPerSec=2.34 |AvgSamplesPerSec=2.39 epoch: 0|step: 2049|ppo_ep: 1|act_loss: 0.050689697265625|cri_loss: 0.032562255859375|unsuper_loss: 0.0 average reward score: 4.578125 ------------------------------------------------------------------------------------- |E2E latency=3.83s |Gather latency=0.00s (0.00%) |Generate time=2.96s (77.25%) |Training time=0.65s (16.86%) |Others=0.23 (5.89%)|CurSamplesPerSec=2.09 |AvgSamplesPerSec=2.39 epoch: 0|step: 2050|ppo_ep: 1|act_loss: 0.01312255859375|cri_loss: 0.01708984375|unsuper_loss: 0.0 average reward score: 4.14453125 ------------------------------------------------------------------------------------- |E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.63s (75.02%) |Training time=0.64s (18.36%) |Others=0.23 (6.61%)|CurSamplesPerSec=2.28 |AvgSamplesPerSec=2.39 epoch: 0|step: 2051|ppo_ep: 1|act_loss: -0.004638671875|cri_loss: 0.01062774658203125|unsuper_loss: 0.0 average reward score: 4.34375 ------------------------------------------------------------------------------------- |E2E latency=3.61s |Gather latency=0.00s (0.00%) |Generate time=2.75s (76.15%) |Training time=0.64s (17.85%) |Others=0.22 (6.00%)|CurSamplesPerSec=2.22 |AvgSamplesPerSec=2.39 epoch: 0|step: 2052|ppo_ep: 1|act_loss: 0.0576171875|cri_loss: 0.041259765625|unsuper_loss: 0.0 average reward score: 3.921875 ------------------------------------------------------------------------------------- |E2E latency=3.47s |Gather latency=0.00s (0.00%) |Generate time=2.60s (75.07%) |Training time=0.65s (18.68%) |Others=0.22 (6.25%)|CurSamplesPerSec=2.31 |AvgSamplesPerSec=2.39 epoch: 0|step: 2053|ppo_ep: 1|act_loss: -0.0283355712890625|cri_loss: -0.003204345703125|unsuper_loss: 0.0 average reward score: 4.578125 
------------------------------------------------------------------------------------- |E2E latency=3.87s |Gather latency=0.00s (0.00%) |Generate time=2.99s (77.09%) |Training time=0.66s (17.01%) |Others=0.23 (5.89%)|CurSamplesPerSec=2.06 |AvgSamplesPerSec=2.39 epoch: 0|step: 2054|ppo_ep: 1|act_loss: 0.01528167724609375|cri_loss: 0.015899658203125|unsuper_loss: 0.0 average reward score: 3.203125 ------------------------------------------------------------------------------------- |E2E latency=3.43s |Gather latency=0.00s (0.00%) |Generate time=2.52s (73.27%) |Training time=0.70s (20.41%) |Others=0.22 (6.32%)|CurSamplesPerSec=2.33 |AvgSamplesPerSec=2.39 epoch: 0|step: 2055|ppo_ep: 1|act_loss: -0.119140625|cri_loss: -0.047760009765625|unsuper_loss: 0.0 average reward score: 4.73046875 ------------------------------------------------------------------------------------- |E2E latency=4.36s |Gather latency=0.00s (0.00%) |Generate time=3.12s (71.49%) |Training time=0.93s (21.27%) |Others=0.32 (7.24%)|CurSamplesPerSec=1.84 |AvgSamplesPerSec=2.39 epoch: 0|step: 2056|ppo_ep: 1|act_loss: 0.00335693359375|cri_loss: 0.01192474365234375|unsuper_loss: 0.0 average reward score: 4.0078125 ------------------------------------------------------------------------------------- |E2E latency=3.63s |Gather latency=0.00s (0.00%) |Generate time=2.79s (76.85%) |Training time=0.63s (17.47%) |Others=0.21 (5.68%)|CurSamplesPerSec=2.20 |AvgSamplesPerSec=2.39 epoch: 0|step: 2057|ppo_ep: 1|act_loss: -0.0303955078125|cri_loss: -0.0012054443359375|unsuper_loss: 0.0 average reward score: 4.359375 ------------------------------------------------------------------------------------- |E2E latency=3.62s |Gather latency=0.00s (0.00%) |Generate time=2.75s (76.10%) |Training time=0.64s (17.75%) |Others=0.22 (6.15%)|CurSamplesPerSec=2.21 |AvgSamplesPerSec=2.39 epoch: 0|step: 2058|ppo_ep: 1|act_loss: 0.0066375732421875|cri_loss: 0.0109710693359375|unsuper_loss: 0.0 average reward score: 2.90625 
------------------------------------------------------------------------------------- |E2E latency=3.67s |Gather latency=0.00s (0.00%) |Generate time=2.58s (70.10%) |Training time=0.86s (23.53%) |Others=0.23 (6.37%)|CurSamplesPerSec=2.18 |AvgSamplesPerSec=2.39 epoch: 0|step: 2059|ppo_ep: 1|act_loss: 0.26318359375|cri_loss: 0.147705078125|unsuper_loss: 0.0 average reward score: 2.2890625 ------------------------------------------------------------------------------------- |E2E latency=3.60s |Gather latency=0.00s (0.00%) |Generate time=2.76s (76.59%) |Training time=0.64s (17.81%) |Others=0.20 (5.60%)|CurSamplesPerSec=2.22 |AvgSamplesPerSec=2.39 epoch: 0|step: 2060|ppo_ep: 1|act_loss: 0.1112060546875|cri_loss: 0.0670166015625|unsuper_loss: 0.0 average reward score: 2.263671875 ------------------------------------------------------------------------------------- |E2E latency=3.90s |Gather latency=0.00s (0.00%) |Generate time=3.03s (77.70%) |Training time=0.66s (16.86%) |Others=0.21 (5.44%)|CurSamplesPerSec=2.05 |AvgSamplesPerSec=2.39 epoch: 0|step: 2061|ppo_ep: 1|act_loss: 0.11480712890625|cri_loss: 0.07012939453125|unsuper_loss: 0.0 average reward score: 3.556640625 ------------------------------------------------------------------------------------- |E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.59s (74.52%) |Training time=0.67s (19.20%) |Others=0.22 (6.27%)|CurSamplesPerSec=2.30 |AvgSamplesPerSec=2.39 epoch: 0|step: 2062|ppo_ep: 1|act_loss: 0.12481689453125|cri_loss: 0.0758056640625|unsuper_loss: 0.0 average reward score: 4.0546875 ------------------------------------------------------------------------------------- |E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.52s (72.10%) |Training time=0.76s (21.59%) |Others=0.22 (6.31%)|CurSamplesPerSec=2.29 |AvgSamplesPerSec=2.39 epoch: 0|step: 2063|ppo_ep: 1|act_loss: 0.0157012939453125|cri_loss: 0.0221710205078125|unsuper_loss: 0.0 average reward score: 3.05859375 
------------------------------------------------------------------------------------- |E2E latency=3.77s |Gather latency=0.00s (0.00%) |Generate time=2.50s (66.29%) |Training time=0.94s (24.86%) |Others=0.33 (8.85%)|CurSamplesPerSec=2.12 |AvgSamplesPerSec=2.39 epoch: 0|step: 2064|ppo_ep: 1|act_loss: 0.00389862060546875|cri_loss: 0.00974273681640625|unsuper_loss: 0.0 average reward score: 3.625 ------------------------------------------------------------------------------------- |E2E latency=3.85s |Gather latency=0.00s (0.00%) |Generate time=2.98s (77.53%) |Training time=0.66s (17.09%) |Others=0.21 (5.38%)|CurSamplesPerSec=2.08 |AvgSamplesPerSec=2.39 epoch: 0|step: 2065|ppo_ep: 1|act_loss: 0.0882568359375|cri_loss: 0.05072021484375|unsuper_loss: 0.0 average reward score: 2.7890625 ------------------------------------------------------------------------------------- |E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.59s (74.33%) |Training time=0.66s (19.06%) |Others=0.23 (6.61%)|CurSamplesPerSec=2.30 |AvgSamplesPerSec=2.39 epoch: 0|step: 2066|ppo_ep: 1|act_loss: -0.01047515869140625|cri_loss: 0.00254058837890625|unsuper_loss: 0.0 average reward score: 4.22265625 ------------------------------------------------------------------------------------- |E2E latency=3.62s |Gather latency=0.00s (0.00%) |Generate time=2.74s (75.66%) |Training time=0.64s (17.81%) |Others=0.24 (6.53%)|CurSamplesPerSec=2.21 |AvgSamplesPerSec=2.39 epoch: 0|step: 2067|ppo_ep: 1|act_loss: 0.0386962890625|cri_loss: 0.02685546875|unsuper_loss: 0.0 average reward score: 3.04296875 ------------------------------------------------------------------------------------- |E2E latency=3.45s |Gather latency=0.00s (0.00%) |Generate time=2.51s (72.75%) |Training time=0.71s (20.56%) |Others=0.23 (6.69%)|CurSamplesPerSec=2.32 |AvgSamplesPerSec=2.39 epoch: 0|step: 2068|ppo_ep: 1|act_loss: -0.0594482421875|cri_loss: -0.0165863037109375|unsuper_loss: 0.0 average reward score: 5.02734375 
------------------------------------------------------------------------------------- |E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.48s (71.01%) |Training time=0.78s (22.36%) |Others=0.23 (6.63%)|CurSamplesPerSec=2.29 |AvgSamplesPerSec=2.39 epoch: 0|step: 2069|ppo_ep: 1|act_loss: 0.03253173828125|cri_loss: 0.024932861328125|unsuper_loss: 0.0 average reward score: 2.828125 ------------------------------------------------------------------------------------- |E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.60s (74.85%) |Training time=0.65s (18.77%) |Others=0.22 (6.38%)|CurSamplesPerSec=2.30 |AvgSamplesPerSec=2.39 epoch: 0|step: 2070|ppo_ep: 1|act_loss: -0.0161895751953125|cri_loss: 0.0005645751953125|unsuper_loss: 0.0 average reward score: 3.58984375 ------------------------------------------------------------------------------------- |E2E latency=3.89s |Gather latency=0.00s (0.00%) |Generate time=2.98s (76.57%) |Training time=0.68s (17.38%) |Others=0.24 (6.05%)|CurSamplesPerSec=2.06 |AvgSamplesPerSec=2.39 epoch: 0|step: 2071|ppo_ep: 1|act_loss: -0.048553466796875|cri_loss: -0.01519775390625|unsuper_loss: 0.0 average reward score: 3.890625 ------------------------------------------------------------------------------------- |E2E latency=4.13s |Gather latency=0.00s (0.00%) |Generate time=2.87s (69.62%) |Training time=0.95s (22.96%) |Others=0.31 (7.42%)|CurSamplesPerSec=1.94 |AvgSamplesPerSec=2.39 epoch: 0|step: 2072|ppo_ep: 1|act_loss: 0.0384521484375|cri_loss: 0.03228759765625|unsuper_loss: 0.0 average reward score: 4.13671875 ------------------------------------------------------------------------------------- |E2E latency=3.63s |Gather latency=0.00s (0.00%) |Generate time=2.44s (67.22%) |Training time=0.94s (25.95%) |Others=0.25 (6.83%)|CurSamplesPerSec=2.20 |AvgSamplesPerSec=2.39 epoch: 0|step: 2073|ppo_ep: 1|act_loss: 0.00145721435546875|cri_loss: 0.006679534912109375|unsuper_loss: 0.0 average reward score: 3.08984375 
------------------------------------------------------------------------------------- |E2E latency=3.45s |Gather latency=0.00s (0.00%) |Generate time=2.57s (74.50%) |Training time=0.68s (19.61%) |Others=0.20 (5.89%)|CurSamplesPerSec=2.32 |AvgSamplesPerSec=2.39 epoch: 0|step: 2074|ppo_ep: 1|act_loss: 0.22900390625|cri_loss: 0.1324462890625|unsuper_loss: 0.0 average reward score: 2.62890625 ------------------------------------------------------------------------------------- |E2E latency=3.70s |Gather latency=0.00s (0.00%) |Generate time=2.82s (76.16%) |Training time=0.65s (17.55%) |Others=0.23 (6.29%)|CurSamplesPerSec=2.16 |AvgSamplesPerSec=2.39 epoch: 0|step: 2075|ppo_ep: 1|act_loss: 0.04473876953125|cri_loss: 0.041534423828125|unsuper_loss: 0.0 average reward score: 2.62109375 ------------------------------------------------------------------------------------- |E2E latency=3.69s |Gather latency=0.00s (0.00%) |Generate time=2.82s (76.52%) |Training time=0.64s (17.47%) |Others=0.22 (6.00%)|CurSamplesPerSec=2.17 |AvgSamplesPerSec=2.39 epoch: 0|step: 2076|ppo_ep: 1|act_loss: 0.08935546875|cri_loss: 0.054901123046875|unsuper_loss: 0.0 average reward score: 1.53125 ------------------------------------------------------------------------------------- |E2E latency=3.84s |Gather latency=0.00s (0.00%) |Generate time=2.91s (75.79%) |Training time=0.68s (17.73%) |Others=0.25 (6.48%)|CurSamplesPerSec=2.08 |AvgSamplesPerSec=2.39 epoch: 0|step: 2077|ppo_ep: 1|act_loss: -0.06011962890625|cri_loss: -0.016387939453125|unsuper_loss: 0.0 average reward score: 3.435546875 ------------------------------------------------------------------------------------- |E2E latency=3.68s |Gather latency=0.00s (0.00%) |Generate time=2.76s (74.98%) |Training time=0.68s (18.54%) |Others=0.24 (6.48%)|CurSamplesPerSec=2.18 |AvgSamplesPerSec=2.39 epoch: 0|step: 2078|ppo_ep: 1|act_loss: -0.02862548828125|cri_loss: -0.0059356689453125|unsuper_loss: 0.0 average reward score: 3.5078125 
-------------------------------------------------------------------------------------
|E2E latency=3.77s |Gather latency=0.00s (0.00%) |Generate time=2.89s (76.57%) |Training time=0.64s (17.10%) |Others=0.24 (6.33%)|CurSamplesPerSec=2.12 |AvgSamplesPerSec=2.39
[2023-04-24 15:43:52,315] [INFO] [logging.py:96:log_dist] [Rank 0] step=260, skipped=5, lr=[6.138579785888551e-06, 6.138579785888551e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-04-24 15:43:52,559] [INFO] [timer.py:199:stop] epoch=0/micro_step=2080/global_step=260, RunningAvgSamplesPerSec=15.397151045312219, CurrSamplesPerSec=14.60592319278567, MemAllocated=20.44GB, MaxMemAllocated=31.45GB
[2023-04-24 15:43:52,797] [INFO] [logging.py:96:log_dist] [Rank 0] step=260, skipped=4, lr=[3.1604885046782158e-06, 3.1604885046782158e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 2079|ppo_ep: 1|act_loss: 0.09283447265625|cri_loss: 0.06585693359375|unsuper_loss: 0.0
average reward score: 3.30078125
-------------------------------------------------------------------------------------
|E2E latency=3.96s |Gather latency=0.00s (0.00%) |Generate time=2.70s (68.00%) |Training time=0.95s (24.03%) |Others=0.32 (7.97%)|CurSamplesPerSec=2.02 |AvgSamplesPerSec=2.39
epoch: 0|step: 2080|ppo_ep: 1|act_loss: -0.0074462890625|cri_loss: 0.00994873046875|unsuper_loss: 0.0
average reward score: 4.0
-------------------------------------------------------------------------------------
|E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.64s (75.10%) |Training time=0.64s (18.31%) |Others=0.23 (6.60%)|CurSamplesPerSec=2.28 |AvgSamplesPerSec=2.39
epoch: 0|step: 2081|ppo_ep: 1|act_loss: 0.03436279296875|cri_loss: 0.03790283203125|unsuper_loss: 0.0
average reward score: 2.57421875
-------------------------------------------------------------------------------------
|E2E latency=3.43s |Gather latency=0.00s (0.00%) |Generate time=2.56s (74.70%) |Training time=0.64s (18.72%) |Others=0.23 (6.58%)|CurSamplesPerSec=2.33
|AvgSamplesPerSec=2.39 epoch: 0|step: 2082|ppo_ep: 1|act_loss: -0.03131103515625|cri_loss: 0.004058837890625|unsuper_loss: 0.0 average reward score: 3.48828125 ------------------------------------------------------------------------------------- |E2E latency=3.68s |Gather latency=0.00s (0.00%) |Generate time=2.79s (75.98%) |Training time=0.64s (17.31%) |Others=0.25 (6.71%)|CurSamplesPerSec=2.18 |AvgSamplesPerSec=2.39 epoch: 0|step: 2083|ppo_ep: 1|act_loss: -0.031829833984375|cri_loss: 0.0030517578125|unsuper_loss: 0.0 average reward score: 3.244140625 ------------------------------------------------------------------------------------- |E2E latency=3.64s |Gather latency=0.00s (0.00%) |Generate time=2.76s (75.86%) |Training time=0.64s (17.60%) |Others=0.24 (6.54%)|CurSamplesPerSec=2.20 |AvgSamplesPerSec=2.39 epoch: 0|step: 2084|ppo_ep: 1|act_loss: 0.225341796875|cri_loss: 0.1348876953125|unsuper_loss: 0.0 average reward score: 3.23046875 ------------------------------------------------------------------------------------- |E2E latency=3.58s |Gather latency=0.00s (0.00%) |Generate time=2.71s (75.59%) |Training time=0.64s (17.93%) |Others=0.23 (6.48%)|CurSamplesPerSec=2.23 |AvgSamplesPerSec=2.39 epoch: 0|step: 2085|ppo_ep: 1|act_loss: -0.037200927734375|cri_loss: -0.010162353515625|unsuper_loss: 0.0 average reward score: 2.693359375 ------------------------------------------------------------------------------------- |E2E latency=3.61s |Gather latency=0.00s (0.00%) |Generate time=2.71s (75.13%) |Training time=0.67s (18.44%) |Others=0.23 (6.44%)|CurSamplesPerSec=2.22 |AvgSamplesPerSec=2.39 epoch: 0|step: 2086|ppo_ep: 1|act_loss: 0.217529296875|cri_loss: 0.1356201171875|unsuper_loss: 0.0 average reward score: 2.54296875 ------------------------------------------------------------------------------------- |E2E latency=3.81s |Gather latency=0.00s (0.00%) |Generate time=2.93s (76.86%) |Training time=0.65s (17.14%) |Others=0.23 (6.00%)|CurSamplesPerSec=2.10 
|AvgSamplesPerSec=2.39 epoch: 0|step: 2087|ppo_ep: 1|act_loss: 0.025115966796875|cri_loss: 0.0206451416015625|unsuper_loss: 0.0 average reward score: 2.31640625 ------------------------------------------------------------------------------------- |E2E latency=4.33s |Gather latency=0.00s (0.00%) |Generate time=3.05s (70.48%) |Training time=0.97s (22.34%) |Others=0.31 (7.18%)|CurSamplesPerSec=1.85 |AvgSamplesPerSec=2.39 epoch: 0|step: 2088|ppo_ep: 1|act_loss: 0.06744384765625|cri_loss: 0.051483154296875|unsuper_loss: 0.0 average reward score: 3.494140625 ------------------------------------------------------------------------------------- |E2E latency=3.46s |Gather latency=0.00s (0.00%) |Generate time=2.60s (75.16%) |Training time=0.64s (18.53%) |Others=0.22 (6.31%)|CurSamplesPerSec=2.31 |AvgSamplesPerSec=2.39 epoch: 0|step: 2089|ppo_ep: 1|act_loss: 0.36767578125|cri_loss: 0.2509765625|unsuper_loss: 0.0 average reward score: 2.25390625 ------------------------------------------------------------------------------------- |E2E latency=3.58s |Gather latency=0.00s (0.00%) |Generate time=2.70s (75.38%) |Training time=0.67s (18.75%) |Others=0.21 (5.87%)|CurSamplesPerSec=2.24 |AvgSamplesPerSec=2.39 epoch: 0|step: 2090|ppo_ep: 1|act_loss: 0.002349853515625|cri_loss: 0.01078033447265625|unsuper_loss: 0.0 average reward score: 3.7734375 ------------------------------------------------------------------------------------- |E2E latency=7.34s |Gather latency=0.00s (0.00%) |Generate time=4.45s (60.53%) |Training time=2.11s (28.68%) |Others=0.79 (10.79%)|CurSamplesPerSec=1.09 |AvgSamplesPerSec=2.38 epoch: 0|step: 2091|ppo_ep: 1|act_loss: 0.0202789306640625|cri_loss: 0.0197296142578125|unsuper_loss: 0.0 average reward score: 4.2890625 ------------------------------------------------------------------------------------- |E2E latency=3.57s |Gather latency=0.00s (0.00%) |Generate time=2.69s (75.17%) |Training time=0.65s (18.25%) |Others=0.24 (6.58%)|CurSamplesPerSec=2.24 
|AvgSamplesPerSec=2.38 epoch: 0|step: 2092|ppo_ep: 1|act_loss: -0.165771484375|cri_loss: -0.0572509765625|unsuper_loss: 0.0 average reward score: 3.3046875 ------------------------------------------------------------------------------------- |E2E latency=3.52s |Gather latency=0.00s (0.00%) |Generate time=2.66s (75.56%) |Training time=0.64s (18.31%) |Others=0.22 (6.14%)|CurSamplesPerSec=2.28 |AvgSamplesPerSec=2.38 epoch: 0|step: 2093|ppo_ep: 1|act_loss: 0.004337310791015625|cri_loss: 0.009033203125|unsuper_loss: 0.0 average reward score: 3.548828125 ------------------------------------------------------------------------------------- |E2E latency=3.60s |Gather latency=0.00s (0.00%) |Generate time=2.74s (75.91%) |Training time=0.64s (17.87%) |Others=0.22 (6.23%)|CurSamplesPerSec=2.22 |AvgSamplesPerSec=2.38 epoch: 0|step: 2094|ppo_ep: 1|act_loss: -0.011962890625|cri_loss: 0.00579833984375|unsuper_loss: 0.0 average reward score: 4.125 ------------------------------------------------------------------------------------- |E2E latency=3.56s |Gather latency=0.00s (0.00%) |Generate time=2.69s (75.50%) |Training time=0.65s (18.17%) |Others=0.22 (6.33%)|CurSamplesPerSec=2.25 |AvgSamplesPerSec=2.38 epoch: 0|step: 2095|ppo_ep: 1|act_loss: 0.1064453125|cri_loss: 0.0621337890625|unsuper_loss: 0.0 average reward score: 2.36328125 ------------------------------------------------------------------------------------- |E2E latency=3.89s |Gather latency=0.00s (0.00%) |Generate time=2.58s (66.37%) |Training time=0.98s (25.16%) |Others=0.33 (8.47%)|CurSamplesPerSec=2.05 |AvgSamplesPerSec=2.38 epoch: 0|step: 2096|ppo_ep: 1|act_loss: 0.09136962890625|cri_loss: 0.07196044921875|unsuper_loss: 0.0 average reward score: 2.40234375 ------------------------------------------------------------------------------------- |E2E latency=3.67s |Gather latency=0.00s (0.00%) |Generate time=2.81s (76.56%) |Training time=0.64s (17.47%) |Others=0.22 (5.96%)|CurSamplesPerSec=2.18 |AvgSamplesPerSec=2.38 epoch: 
0|step: 2097|ppo_ep: 1|act_loss: -0.020294189453125|cri_loss: 0.0053253173828125|unsuper_loss: 0.0 average reward score: 3.650390625 ------------------------------------------------------------------------------------- |E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.61s (75.00%) |Training time=0.65s (18.67%) |Others=0.22 (6.33%)|CurSamplesPerSec=2.30 |AvgSamplesPerSec=2.38 epoch: 0|step: 2098|ppo_ep: 1|act_loss: -0.15478515625|cri_loss: -0.065673828125|unsuper_loss: 0.0 average reward score: 3.80078125 ------------------------------------------------------------------------------------- |E2E latency=3.74s |Gather latency=0.00s (0.00%) |Generate time=2.86s (76.46%) |Training time=0.65s (17.30%) |Others=0.23 (6.24%)|CurSamplesPerSec=2.14 |AvgSamplesPerSec=2.38 epoch: 0|step: 2099|ppo_ep: 1|act_loss: -0.12200927734375|cri_loss: -0.042236328125|unsuper_loss: 0.0 average reward score: 4.7265625 ------------------------------------------------------------------------------------- |E2E latency=3.39s |Gather latency=0.00s (0.00%) |Generate time=2.53s (74.58%) |Training time=0.64s (18.91%) |Others=0.22 (6.51%)|CurSamplesPerSec=2.36 |AvgSamplesPerSec=2.38 epoch: 0|step: 2100|ppo_ep: 1|act_loss: 0.1290283203125|cri_loss: 0.08087158203125|unsuper_loss: 0.0 average reward score: 3.126953125 ------------------------------------------------------------------------------------- |E2E latency=3.54s |Gather latency=0.00s (0.00%) |Generate time=2.67s (75.47%) |Training time=0.65s (18.26%) |Others=0.22 (6.27%)|CurSamplesPerSec=2.26 |AvgSamplesPerSec=2.38 epoch: 0|step: 2101|ppo_ep: 1|act_loss: -0.0029754638671875|cri_loss: 0.01134490966796875|unsuper_loss: 0.0 average reward score: 3.078125 ------------------------------------------------------------------------------------- |E2E latency=3.40s |Gather latency=0.00s (0.00%) |Generate time=2.52s (74.28%) |Training time=0.66s (19.32%) |Others=0.22 (6.40%)|CurSamplesPerSec=2.35 |AvgSamplesPerSec=2.38 epoch: 0|step: 
2102|ppo_ep: 1|act_loss: -0.04595947265625|cri_loss: -0.0091552734375|unsuper_loss: 0.0 average reward score: 3.23046875 ------------------------------------------------------------------------------------- |E2E latency=3.79s |Gather latency=0.00s (0.00%) |Generate time=2.90s (76.54%) |Training time=0.66s (17.28%) |Others=0.23 (6.18%)|CurSamplesPerSec=2.11 |AvgSamplesPerSec=2.38 epoch: 0|step: 2103|ppo_ep: 1|act_loss: -0.011810302734375|cri_loss: 0.007781982421875|unsuper_loss: 0.0 average reward score: 4.234375 ------------------------------------------------------------------------------------- |E2E latency=3.91s |Gather latency=0.00s (0.00%) |Generate time=2.67s (68.17%) |Training time=0.94s (24.00%) |Others=0.31 (7.84%)|CurSamplesPerSec=2.04 |AvgSamplesPerSec=2.38 epoch: 0|step: 2104|ppo_ep: 1|act_loss: 0.0180816650390625|cri_loss: 0.02423095703125|unsuper_loss: 0.0 average reward score: 2.8046875 ------------------------------------------------------------------------------------- |E2E latency=3.58s |Gather latency=0.00s (0.00%) |Generate time=2.71s (75.70%) |Training time=0.64s (17.97%) |Others=0.23 (6.33%)|CurSamplesPerSec=2.24 |AvgSamplesPerSec=2.38 epoch: 0|step: 2105|ppo_ep: 1|act_loss: -0.0830078125|cri_loss: -0.025390625|unsuper_loss: 0.0 average reward score: 4.703125 ------------------------------------------------------------------------------------- |E2E latency=3.55s |Gather latency=0.00s (0.00%) |Generate time=2.53s (71.23%) |Training time=0.79s (22.34%) |Others=0.23 (6.43%)|CurSamplesPerSec=2.25 |AvgSamplesPerSec=2.38 epoch: 0|step: 2106|ppo_ep: 1|act_loss: 0.043487548828125|cri_loss: 0.03338623046875|unsuper_loss: 0.0 average reward score: 4.48046875 ------------------------------------------------------------------------------------- |E2E latency=5.40s |Gather latency=0.00s (0.00%) |Generate time=4.03s (74.59%) |Training time=0.94s (17.48%) |Others=0.43 (7.93%)|CurSamplesPerSec=1.48 |AvgSamplesPerSec=2.38 epoch: 0|step: 2107|ppo_ep: 1|act_loss: 
-0.065185546875|cri_loss: -0.02630615234375|unsuper_loss: 0.0 average reward score: 3.63671875 ------------------------------------------------------------------------------------- |E2E latency=3.73s |Gather latency=0.00s (0.00%) |Generate time=2.88s (77.19%) |Training time=0.64s (17.27%) |Others=0.21 (5.54%)|CurSamplesPerSec=2.15 |AvgSamplesPerSec=2.38 epoch: 0|step: 2108|ppo_ep: 1|act_loss: -0.12298583984375|cri_loss: -0.0472412109375|unsuper_loss: 0.0 average reward score: 4.76171875 ------------------------------------------------------------------------------------- |E2E latency=3.55s |Gather latency=0.00s (0.00%) |Generate time=2.67s (75.35%) |Training time=0.66s (18.65%) |Others=0.21 (6.00%)|CurSamplesPerSec=2.25 |AvgSamplesPerSec=2.38 epoch: 0|step: 2109|ppo_ep: 1|act_loss: -0.07275390625|cri_loss: -0.014251708984375|unsuper_loss: 0.0 average reward score: 4.796875 ------------------------------------------------------------------------------------- |E2E latency=3.34s |Gather latency=0.00s (0.00%) |Generate time=2.48s (74.27%) |Training time=0.64s (19.23%) |Others=0.22 (6.49%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.38 epoch: 0|step: 2110|ppo_ep: 1|act_loss: -0.13671875|cri_loss: -0.042755126953125|unsuper_loss: 0.0 average reward score: 4.3203125 ------------------------------------------------------------------------------------- |E2E latency=3.64s |Gather latency=0.00s (0.00%) |Generate time=2.75s (75.69%) |Training time=0.64s (17.62%) |Others=0.24 (6.69%)|CurSamplesPerSec=2.20 |AvgSamplesPerSec=2.38 epoch: 0|step: 2111|ppo_ep: 1|act_loss: -0.138916015625|cri_loss: -0.050201416015625|unsuper_loss: 0.0 average reward score: 4.0078125 ------------------------------------------------------------------------------------- |E2E latency=3.84s |Gather latency=0.00s (0.00%) |Generate time=2.58s (67.13%) |Training time=0.94s (24.58%) |Others=0.32 (8.29%)|CurSamplesPerSec=2.08 |AvgSamplesPerSec=2.38 epoch: 0|step: 2112|ppo_ep: 1|act_loss: 
0.0687255859375|cri_loss: 0.04046630859375|unsuper_loss: 0.0 average reward score: 5.65625 ------------------------------------------------------------------------------------- |E2E latency=3.58s |Gather latency=0.00s (0.00%) |Generate time=2.67s (74.48%) |Training time=0.67s (18.75%) |Others=0.24 (6.77%)|CurSamplesPerSec=2.23 |AvgSamplesPerSec=2.38 epoch: 0|step: 2113|ppo_ep: 1|act_loss: 0.06219482421875|cri_loss: 0.04693603515625|unsuper_loss: 0.0 average reward score: 3.70703125 ------------------------------------------------------------------------------------- |E2E latency=3.45s |Gather latency=0.00s (0.00%) |Generate time=2.56s (74.30%) |Training time=0.66s (19.17%) |Others=0.22 (6.52%)|CurSamplesPerSec=2.32 |AvgSamplesPerSec=2.38 epoch: 0|step: 2114|ppo_ep: 1|act_loss: -0.053955078125|cri_loss: -0.017608642578125|unsuper_loss: 0.0 average reward score: 2.98828125 ------------------------------------------------------------------------------------- |E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.65s (75.71%) |Training time=0.64s (18.25%) |Others=0.21 (6.04%)|CurSamplesPerSec=2.29 |AvgSamplesPerSec=2.38 epoch: 0|step: 2115|ppo_ep: 1|act_loss: 0.0853271484375|cri_loss: 0.05255126953125|unsuper_loss: 0.0 average reward score: 3.986328125 ------------------------------------------------------------------------------------- |E2E latency=3.41s |Gather latency=0.00s (0.00%) |Generate time=2.52s (73.96%) |Training time=0.65s (18.93%) |Others=0.24 (7.11%)|CurSamplesPerSec=2.35 |AvgSamplesPerSec=2.38 epoch: 0|step: 2116|ppo_ep: 1|act_loss: 0.13037109375|cri_loss: 0.0926513671875|unsuper_loss: 0.0 average reward score: 1.91015625 ------------------------------------------------------------------------------------- |E2E latency=3.55s |Gather latency=0.00s (0.00%) |Generate time=2.67s (75.25%) |Training time=0.67s (18.83%) |Others=0.21 (5.92%)|CurSamplesPerSec=2.25 |AvgSamplesPerSec=2.38 epoch: 0|step: 2117|ppo_ep: 1|act_loss: 0.0347900390625|cri_loss: 
0.02825927734375|unsuper_loss: 0.0 average reward score: 4.15625 ------------------------------------------------------------------------------------- |E2E latency=3.76s |Gather latency=0.00s (0.00%) |Generate time=2.89s (76.94%) |Training time=0.65s (17.23%) |Others=0.22 (5.83%)|CurSamplesPerSec=2.13 |AvgSamplesPerSec=2.38 epoch: 0|step: 2118|ppo_ep: 1|act_loss: 0.1729736328125|cri_loss: 0.10791015625|unsuper_loss: 0.0 average reward score: 2.865234375 ------------------------------------------------------------------------------------- |E2E latency=3.55s |Gather latency=0.00s (0.00%) |Generate time=2.66s (74.81%) |Training time=0.66s (18.63%) |Others=0.23 (6.56%)|CurSamplesPerSec=2.25 |AvgSamplesPerSec=2.38 epoch: 0|step: 2119|ppo_ep: 1|act_loss: 0.0211181640625|cri_loss: 0.021392822265625|unsuper_loss: 0.0 average reward score: 4.01953125 ------------------------------------------------------------------------------------- |E2E latency=3.95s |Gather latency=0.00s (0.00%) |Generate time=2.45s (62.03%) |Training time=1.20s (30.40%) |Others=0.30 (7.57%)|CurSamplesPerSec=2.03 |AvgSamplesPerSec=2.38 epoch: 0|step: 2120|ppo_ep: 1|act_loss: -0.07696533203125|cri_loss: -0.021759033203125|unsuper_loss: 0.0 average reward score: 3.57421875 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.44s (73.42%) |Training time=0.65s (19.55%) |Others=0.23 (7.03%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.38 epoch: 0|step: 2121|ppo_ep: 1|act_loss: -0.052734375|cri_loss: -0.010711669921875|unsuper_loss: 0.0 average reward score: 4.52734375 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.47s (74.43%) |Training time=0.65s (19.49%) |Others=0.20 (6.08%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.38 epoch: 0|step: 2122|ppo_ep: 1|act_loss: 0.043365478515625|cri_loss: 
0.037139892578125|unsuper_loss: 0.0 average reward score: 3.87109375 ------------------------------------------------------------------------------------- |E2E latency=9.67s |Gather latency=0.00s (0.00%) |Generate time=5.85s (60.50%) |Training time=3.01s (31.13%) |Others=0.81 (8.37%)|CurSamplesPerSec=0.83 |AvgSamplesPerSec=2.38 epoch: 0|step: 2123|ppo_ep: 1|act_loss: -0.104248046875|cri_loss: -0.038543701171875|unsuper_loss: 0.0 average reward score: 3.11328125 ------------------------------------------------------------------------------------- |E2E latency=3.42s |Gather latency=0.00s (0.00%) |Generate time=2.59s (75.61%) |Training time=0.63s (18.49%) |Others=0.20 (5.91%)|CurSamplesPerSec=2.34 |AvgSamplesPerSec=2.38 epoch: 0|step: 2124|ppo_ep: 1|act_loss: 0.048583984375|cri_loss: 0.031646728515625|unsuper_loss: 0.0 average reward score: 3.5703125 ------------------------------------------------------------------------------------- |E2E latency=3.41s |Gather latency=0.00s (0.00%) |Generate time=2.55s (74.92%) |Training time=0.64s (18.77%) |Others=0.22 (6.31%)|CurSamplesPerSec=2.35 |AvgSamplesPerSec=2.38 epoch: 0|step: 2125|ppo_ep: 1|act_loss: -0.04461669921875|cri_loss: -0.0135955810546875|unsuper_loss: 0.0 average reward score: 3.787109375 ------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.45%) |Training time=0.64s (19.48%) |Others=0.20 (6.06%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.38 epoch: 0|step: 2126|ppo_ep: 1|act_loss: -0.0875244140625|cri_loss: -0.022247314453125|unsuper_loss: 0.0 average reward score: 3.8515625 ------------------------------------------------------------------------------------- |E2E latency=3.34s |Gather latency=0.00s (0.00%) |Generate time=2.49s (74.68%) |Training time=0.64s (19.21%) |Others=0.20 (6.11%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.38 epoch: 0|step: 2127|ppo_ep: 1|act_loss: -0.0401611328125|cri_loss: 
-0.0069732666015625|unsuper_loss: 0.0 average reward score: 3.634765625 ------------------------------------------------------------------------------------- |E2E latency=3.68s |Gather latency=0.00s (0.00%) |Generate time=2.46s (66.79%) |Training time=0.93s (25.19%) |Others=0.29 (8.02%)|CurSamplesPerSec=2.18 |AvgSamplesPerSec=2.38 epoch: 0|step: 2128|ppo_ep: 1|act_loss: -0.00628662109375|cri_loss: 0.0146484375|unsuper_loss: 0.0 average reward score: 3.46484375 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.27%) |Training time=0.64s (19.73%) |Others=0.19 (6.01%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.38 epoch: 0|step: 2129|ppo_ep: 1|act_loss: 0.14794921875|cri_loss: 0.08056640625|unsuper_loss: 0.0 average reward score: 4.56640625 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.38s (73.80%) |Training time=0.65s (20.02%) |Others=0.20 (6.17%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.38 epoch: 0|step: 2130|ppo_ep: 1|act_loss: -0.0271759033203125|cri_loss: -0.0047149658203125|unsuper_loss: 0.0 average reward score: 3.79296875 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.56%) |Training time=0.64s (19.44%) |Others=0.20 (6.00%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.38 epoch: 0|step: 2131|ppo_ep: 1|act_loss: -0.04351806640625|cri_loss: -0.0093994140625|unsuper_loss: 0.0 average reward score: 3.77734375 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.50%) |Training time=0.64s (19.55%) |Others=0.19 (5.94%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.38 epoch: 0|step: 2132|ppo_ep: 1|act_loss: 0.1513671875|cri_loss: 
0.08148193359375|unsuper_loss: 0.0 average reward score: 4.3203125 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.43s (73.99%) |Training time=0.65s (19.66%) |Others=0.21 (6.35%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.38 epoch: 0|step: 2133|ppo_ep: 1|act_loss: 0.09521484375|cri_loss: 0.059295654296875|unsuper_loss: 0.0 average reward score: 2.69921875 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.41s (73.98%) |Training time=0.66s (20.15%) |Others=0.19 (5.87%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.38 epoch: 0|step: 2134|ppo_ep: 1|act_loss: -0.0128173828125|cri_loss: 0.01007080078125|unsuper_loss: 0.0 average reward score: 2.91015625 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.07%) |Training time=0.65s (19.75%) |Others=0.20 (6.17%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.38 epoch: 0|step: 2135|ppo_ep: 1|act_loss: 0.031982421875|cri_loss: 0.02593994140625|unsuper_loss: 0.0 average reward score: 3.712890625 ------------------------------------------------------------------------------------- |E2E latency=3.63s |Gather latency=0.00s (0.00%) |Generate time=2.43s (66.95%) |Training time=0.92s (25.44%) |Others=0.28 (7.61%)|CurSamplesPerSec=2.20 |AvgSamplesPerSec=2.38 epoch: 0|step: 2136|ppo_ep: 1|act_loss: -0.1217041015625|cri_loss: -0.0462646484375|unsuper_loss: 0.0 average reward score: 3.61328125 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.58%) |Training time=0.64s (19.51%) |Others=0.19 (5.91%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.38 epoch: 0|step: 2137|ppo_ep: 1|act_loss: -0.060516357421875|cri_loss: 
-0.021636962890625|unsuper_loss: 0.0 average reward score: 3.328125 ------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.47s (74.86%) |Training time=0.64s (19.34%) |Others=0.19 (5.80%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.38 epoch: 0|step: 2138|ppo_ep: 1|act_loss: 0.087158203125|cri_loss: 0.05889892578125|unsuper_loss: 0.0 average reward score: 3.572265625 ------------------------------------------------------------------------------------- |E2E latency=6.15s |Gather latency=0.00s (0.00%) |Generate time=4.31s (70.10%) |Training time=1.37s (22.31%) |Others=0.47 (7.59%)|CurSamplesPerSec=1.30 |AvgSamplesPerSec=2.38 epoch: 0|step: 2139|ppo_ep: 1|act_loss: 0.03460693359375|cri_loss: 0.0224609375|unsuper_loss: 0.0 average reward score: 3.5078125 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.40%) |Training time=0.64s (19.67%) |Others=0.19 (5.93%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.38 epoch: 0|step: 2140|ppo_ep: 1|act_loss: -0.04498291015625|cri_loss: -0.0096893310546875|unsuper_loss: 0.0 average reward score: 3.60546875 ------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.50%) |Training time=0.64s (19.40%) |Others=0.20 (6.10%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.38 epoch: 0|step: 2141|ppo_ep: 1|act_loss: -0.0877685546875|cri_loss: -0.033447265625|unsuper_loss: 0.0 average reward score: 3.62890625 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.36%) |Training time=0.64s (19.71%) |Others=0.19 (5.93%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.38 epoch: 0|step: 2142|ppo_ep: 1|act_loss: -0.0225982666015625|cri_loss: 
-0.00726318359375|unsuper_loss: 0.0 average reward score: 4.41796875 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.24%) |Training time=0.64s (19.79%) |Others=0.19 (5.96%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.38 epoch: 0|step: 2143|ppo_ep: 1|act_loss: -0.028045654296875|cri_loss: -0.00669097900390625|unsuper_loss: 0.0 average reward score: 3.54296875 ------------------------------------------------------------------------------------- |E2E latency=3.60s |Gather latency=0.00s (0.00%) |Generate time=2.40s (66.78%) |Training time=0.92s (25.59%) |Others=0.27 (7.63%)|CurSamplesPerSec=2.22 |AvgSamplesPerSec=2.38 epoch: 0|step: 2144|ppo_ep: 1|act_loss: -0.0146026611328125|cri_loss: 0.00128173828125|unsuper_loss: 0.0 average reward score: 3.763671875 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.43%) |Training time=0.64s (19.61%) |Others=0.19 (5.96%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.38 epoch: 0|step: 2145|ppo_ep: 1|act_loss: 0.1102294921875|cri_loss: 0.065185546875|unsuper_loss: 0.0 average reward score: 2.673828125 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.23%) |Training time=0.64s (19.76%) |Others=0.20 (6.01%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.38 epoch: 0|step: 2146|ppo_ep: 1|act_loss: 0.13525390625|cri_loss: 0.076416015625|unsuper_loss: 0.0 average reward score: 3.40234375 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.40%) |Training time=0.64s (19.71%) |Others=0.19 (5.89%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.38 epoch: 0|step: 2147|ppo_ep: 1|act_loss: 0.046142578125|cri_loss: 
0.03033447265625|unsuper_loss: 0.0 average reward score: 3.44140625 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.38s (74.06%) |Training time=0.64s (20.02%) |Others=0.19 (5.93%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.38 epoch: 0|step: 2148|ppo_ep: 1|act_loss: 0.0289764404296875|cri_loss: 0.019439697265625|unsuper_loss: 0.0 average reward score: 4.5 ------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.37s (74.13%) |Training time=0.64s (19.94%) |Others=0.19 (5.93%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.38 epoch: 0|step: 2149|ppo_ep: 1|act_loss: 0.0941162109375|cri_loss: 0.0560302734375|unsuper_loss: 0.0 average reward score: 4.234375 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.62%) |Training time=0.64s (19.50%) |Others=0.19 (5.88%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.38 epoch: 0|step: 2150|ppo_ep: 1|act_loss: 0.0753173828125|cri_loss: 0.0478515625|unsuper_loss: 0.0 average reward score: 4.15234375 ------------------------------------------------------------------------------------- |E2E latency=3.18s |Gather latency=0.00s (0.00%) |Generate time=2.34s (73.63%) |Training time=0.64s (20.23%) |Others=0.20 (6.15%)|CurSamplesPerSec=2.52 |AvgSamplesPerSec=2.38 epoch: 0|step: 2151|ppo_ep: 1|act_loss: -0.0548095703125|cri_loss: -0.0196533203125|unsuper_loss: 0.0 average reward score: 3.455078125 ------------------------------------------------------------------------------------- |E2E latency=3.60s |Gather latency=0.00s (0.00%) |Generate time=2.40s (66.63%) |Training time=0.93s (25.75%) |Others=0.27 (7.62%)|CurSamplesPerSec=2.22 |AvgSamplesPerSec=2.38 epoch: 0|step: 2152|ppo_ep: 1|act_loss: 0.06512451171875|cri_loss: 0.038818359375|unsuper_loss: 
0.0 average reward score: 3.013671875 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.45%) |Training time=0.63s (19.68%) |Others=0.19 (5.87%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.38 epoch: 0|step: 2153|ppo_ep: 1|act_loss: 0.026214599609375|cri_loss: 0.021728515625|unsuper_loss: 0.0 average reward score: 3.890625 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.07%) |Training time=0.65s (20.01%) |Others=0.19 (5.93%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.38 epoch: 0|step: 2154|ppo_ep: 1|act_loss: 0.226806640625|cri_loss: 0.12176513671875|unsuper_loss: 0.0 average reward score: 3.97265625 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.39s (73.52%) |Training time=0.67s (20.50%) |Others=0.19 (5.98%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.38 epoch: 0|step: 2155|ppo_ep: 1|act_loss: -0.015167236328125|cri_loss: 0.0023193359375|unsuper_loss: 0.0 average reward score: 3.048828125 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.40s (73.95%) |Training time=0.65s (20.05%) |Others=0.19 (5.99%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.38 epoch: 0|step: 2156|ppo_ep: 1|act_loss: 0.1895751953125|cri_loss: 0.130126953125|unsuper_loss: 0.0 average reward score: 3.51953125 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.08%) |Training time=0.65s (19.90%) |Others=0.20 (6.02%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.38 epoch: 0|step: 2157|ppo_ep: 1|act_loss: 0.02703857421875|cri_loss: 0.016998291015625|unsuper_loss: 0.0 average reward 
score: 2.869140625 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.02%) |Training time=0.65s (20.07%) |Others=0.19 (5.91%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.38 epoch: 0|step: 2158|ppo_ep: 1|act_loss: -0.00102996826171875|cri_loss: 0.003875732421875|unsuper_loss: 0.0 average reward score: 4.08984375 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.39s (73.87%) |Training time=0.65s (20.10%) |Others=0.19 (6.02%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.38 [2023-04-24 15:48:44,578] [INFO] [logging.py:96:log_dist] [Rank 0] step=270, skipped=5, lr=[5.746532800883961e-06, 5.746532800883961e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-04-24 15:48:44,823] [INFO] [timer.py:199:stop] epoch=0/micro_step=2160/global_step=270, RunningAvgSamplesPerSec=15.394355071006759, CurrSamplesPerSec=15.544141964357891, MemAllocated=20.44GB, MaxMemAllocated=31.45GB [2023-04-24 15:48:45,026] [INFO] [logging.py:96:log_dist] [Rank 0] step=270, skipped=4, lr=[2.9569579745392263e-06, 2.9569579745392263e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 2159|ppo_ep: 1|act_loss: 0.0135345458984375|cri_loss: 0.01163482666015625|unsuper_loss: 0.0 average reward score: 4.21875 ------------------------------------------------------------------------------------- |E2E latency=3.61s |Gather latency=0.00s (0.00%) |Generate time=2.39s (66.30%) |Training time=0.94s (26.03%) |Others=0.28 (7.67%)|CurSamplesPerSec=2.22 |AvgSamplesPerSec=2.38 epoch: 0|step: 2160|ppo_ep: 1|act_loss: 0.1116943359375|cri_loss: 0.0692138671875|unsuper_loss: 0.0 average reward score: 2.857421875 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.36s (73.08%) |Training time=0.68s (21.11%) |Others=0.19 
(5.81%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.38 epoch: 0|step: 2161|ppo_ep: 1|act_loss: 0.033050537109375|cri_loss: 0.031524658203125|unsuper_loss: 0.0 average reward score: 1.7275390625 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.36s (73.42%) |Training time=0.66s (20.55%) |Others=0.19 (6.03%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.38 epoch: 0|step: 2162|ppo_ep: 1|act_loss: 0.0035552978515625|cri_loss: 0.011138916015625|unsuper_loss: 0.0 average reward score: 3.625 ------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.36s (73.80%) |Training time=0.64s (20.12%) |Others=0.19 (6.09%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.38 epoch: 0|step: 2163|ppo_ep: 1|act_loss: 0.043121337890625|cri_loss: 0.0280914306640625|unsuper_loss: 0.0 average reward score: 3.080078125 ------------------------------------------------------------------------------------- |E2E latency=3.36s |Gather latency=0.00s (0.00%) |Generate time=2.53s (75.12%) |Training time=0.64s (19.07%) |Others=0.20 (5.81%)|CurSamplesPerSec=2.38 |AvgSamplesPerSec=2.38 epoch: 0|step: 2164|ppo_ep: 1|act_loss: 0.161865234375|cri_loss: 0.09930419921875|unsuper_loss: 0.0 average reward score: 1.970703125 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.17%) |Training time=0.64s (19.78%) |Others=0.20 (6.05%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.38 epoch: 0|step: 2165|ppo_ep: 1|act_loss: -0.046478271484375|cri_loss: -0.0169525146484375|unsuper_loss: 0.0 average reward score: 3.37109375 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.54%) |Training time=0.64s (19.54%) |Others=0.19 
(5.92%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.38 epoch: 0|step: 2166|ppo_ep: 1|act_loss: 0.034210205078125|cri_loss: 0.0241546630859375|unsuper_loss: 0.0 average reward score: 3.65625 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.24%) |Training time=0.64s (19.77%) |Others=0.19 (5.98%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.38 epoch: 0|step: 2167|ppo_ep: 1|act_loss: 0.00603485107421875|cri_loss: 0.006305694580078125|unsuper_loss: 0.0 average reward score: 4.1953125 ------------------------------------------------------------------------------------- |E2E latency=3.63s |Gather latency=0.00s (0.00%) |Generate time=2.41s (66.39%) |Training time=0.95s (26.07%) |Others=0.27 (7.54%)|CurSamplesPerSec=2.20 |AvgSamplesPerSec=2.38 epoch: 0|step: 2168|ppo_ep: 1|act_loss: 0.35888671875|cri_loss: 0.2227783203125|unsuper_loss: 0.0 average reward score: 3.046875 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.49%) |Training time=0.64s (19.69%) |Others=0.19 (5.81%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.38 epoch: 0|step: 2169|ppo_ep: 1|act_loss: -0.06658935546875|cri_loss: -0.0251922607421875|unsuper_loss: 0.0 average reward score: 4.12109375 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.40s (73.85%) |Training time=0.66s (20.20%) |Others=0.19 (5.96%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.38 epoch: 0|step: 2170|ppo_ep: 1|act_loss: -0.067626953125|cri_loss: -0.025970458984375|unsuper_loss: 0.0 average reward score: 4.8046875 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.31s (71.25%) |Training time=0.74s (22.75%) |Others=0.19 
(6.01%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.38 epoch: 0|step: 2171|ppo_ep: 1|act_loss: -0.0303802490234375|cri_loss: -0.0030364990234375|unsuper_loss: 0.0 average reward score: 3.0625 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.38s (73.38%) |Training time=0.67s (20.72%) |Others=0.19 (5.91%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.38 epoch: 0|step: 2172|ppo_ep: 1|act_loss: -0.1287841796875|cri_loss: -0.04986572265625|unsuper_loss: 0.0 average reward score: 4.14453125 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.36s (72.55%) |Training time=0.70s (21.38%) |Others=0.20 (6.07%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.38 epoch: 0|step: 2173|ppo_ep: 1|act_loss: -0.045074462890625|cri_loss: -0.01480865478515625|unsuper_loss: 0.0 average reward score: 3.59765625 ------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.47s (74.77%) |Training time=0.64s (19.45%) |Others=0.19 (5.78%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.38 epoch: 0|step: 2174|ppo_ep: 1|act_loss: -0.06781005859375|cri_loss: -0.0115966796875|unsuper_loss: 0.0 average reward score: 4.97265625 ------------------------------------------------------------------------------------- |E2E latency=3.39s |Gather latency=0.00s (0.00%) |Generate time=2.55s (75.32%) |Training time=0.64s (18.91%) |Others=0.20 (5.77%)|CurSamplesPerSec=2.36 |AvgSamplesPerSec=2.38 epoch: 0|step: 2175|ppo_ep: 1|act_loss: -0.0751953125|cri_loss: -0.020751953125|unsuper_loss: 0.0 average reward score: 3.08203125 ------------------------------------------------------------------------------------- |E2E latency=3.69s |Gather latency=0.00s (0.00%) |Generate time=2.49s (67.46%) |Training time=0.92s (25.02%) |Others=0.28 
(7.52%)|CurSamplesPerSec=2.17 |AvgSamplesPerSec=2.38 epoch: 0|step: 2176|ppo_ep: 1|act_loss: -0.0850830078125|cri_loss: -0.029693603515625|unsuper_loss: 0.0 average reward score: 2.380859375 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.39s (73.77%) |Training time=0.65s (20.17%) |Others=0.20 (6.06%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.38 epoch: 0|step: 2177|ppo_ep: 1|act_loss: 0.0621337890625|cri_loss: 0.05499267578125|unsuper_loss: 0.0 average reward score: 3.470703125 ------------------------------------------------------------------------------------- |E2E latency=3.33s |Gather latency=0.00s (0.00%) |Generate time=2.46s (73.83%) |Training time=0.64s (19.26%) |Others=0.23 (6.91%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.38 epoch: 0|step: 2178|ppo_ep: 1|act_loss: -0.0728759765625|cri_loss: -0.032562255859375|unsuper_loss: 0.0 average reward score: 3.783203125 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.41%) |Training time=0.64s (19.76%) |Others=0.19 (5.84%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.38 epoch: 0|step: 2179|ppo_ep: 1|act_loss: 0.039154052734375|cri_loss: 0.025787353515625|unsuper_loss: 0.0 average reward score: 2.77734375 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.38s (74.10%) |Training time=0.64s (19.90%) |Others=0.19 (5.99%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.38 epoch: 0|step: 2180|ppo_ep: 1|act_loss: -0.096923828125|cri_loss: -0.039886474609375|unsuper_loss: 0.0 average reward score: 4.21875 ------------------------------------------------------------------------------------- |E2E latency=3.15s |Gather latency=0.00s (0.00%) |Generate time=2.31s (73.32%) |Training time=0.64s (20.28%) |Others=0.20 
(6.40%)|CurSamplesPerSec=2.54 |AvgSamplesPerSec=2.38 epoch: 0|step: 2181|ppo_ep: 1|act_loss: -0.30029296875|cri_loss: -0.0892333984375|unsuper_loss: 0.0 average reward score: 3.03515625 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.24%) |Training time=0.64s (19.80%) |Others=0.19 (5.96%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.38 epoch: 0|step: 2182|ppo_ep: 1|act_loss: -0.1541748046875|cri_loss: -0.059173583984375|unsuper_loss: 0.0 average reward score: 3.765625 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.20%) |Training time=0.64s (19.82%) |Others=0.19 (5.98%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.38 epoch: 0|step: 2183|ppo_ep: 1|act_loss: 0.017364501953125|cri_loss: 0.0165863037109375|unsuper_loss: 0.0 average reward score: 1.4873046875 ------------------------------------------------------------------------------------- |E2E latency=3.56s |Gather latency=0.00s (0.00%) |Generate time=2.36s (66.44%) |Training time=0.92s (25.91%) |Others=0.27 (7.66%)|CurSamplesPerSec=2.25 |AvgSamplesPerSec=2.38 epoch: 0|step: 2184|ppo_ep: 1|act_loss: -0.0216522216796875|cri_loss: 0.0019989013671875|unsuper_loss: 0.0 average reward score: 2.15234375 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.01%) |Training time=0.64s (19.67%) |Others=0.20 (6.32%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.38 epoch: 0|step: 2185|ppo_ep: 1|act_loss: -0.29052734375|cri_loss: -0.065673828125|unsuper_loss: 0.0 average reward score: 2.62890625 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.29%) |Training time=0.64s (19.73%) |Others=0.19 
(5.98%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.38 epoch: 0|step: 2186|ppo_ep: 1|act_loss: -0.04052734375|cri_loss: -0.00262451171875|unsuper_loss: 0.0 average reward score: 3.11328125 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.27%) |Training time=0.64s (19.81%) |Others=0.19 (5.91%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.38 epoch: 0|step: 2187|ppo_ep: 1|act_loss: 0.0277862548828125|cri_loss: 0.03790283203125|unsuper_loss: 0.0 average reward score: 2.5 ------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.48s (74.91%) |Training time=0.64s (19.35%) |Others=0.19 (5.74%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.38 epoch: 0|step: 2188|ppo_ep: 1|act_loss: -0.08392333984375|cri_loss: -0.03582763671875|unsuper_loss: 0.0 average reward score: 3.900390625 ------------------------------------------------------------------------------------- |E2E latency=3.34s |Gather latency=0.00s (0.00%) |Generate time=2.50s (74.80%) |Training time=0.64s (19.24%) |Others=0.20 (5.96%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.38 epoch: 0|step: 2189|ppo_ep: 1|act_loss: -0.051910400390625|cri_loss: -0.012054443359375|unsuper_loss: 0.0 average reward score: 3.26171875 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.33%) |Training time=0.65s (19.81%) |Others=0.19 (5.86%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.38 epoch: 0|step: 2190|ppo_ep: 1|act_loss: -0.10675048828125|cri_loss: 0.00048828125|unsuper_loss: 0.0 average reward score: 3.3359375 ------------------------------------------------------------------------------------- |E2E latency=3.16s |Gather latency=0.00s (0.00%) |Generate time=2.32s (73.40%) |Training time=0.64s (20.28%) |Others=0.20 
(6.33%)|CurSamplesPerSec=2.53 |AvgSamplesPerSec=2.38 epoch: 0|step: 2191|ppo_ep: 1|act_loss: -0.2340087890625|cri_loss: -0.0986328125|unsuper_loss: 0.0 average reward score: 3.248046875 ------------------------------------------------------------------------------------- |E2E latency=3.66s |Gather latency=0.00s (0.00%) |Generate time=2.43s (66.41%) |Training time=0.93s (25.54%) |Others=0.29 (8.05%)|CurSamplesPerSec=2.19 |AvgSamplesPerSec=2.38 epoch: 0|step: 2192|ppo_ep: 1|act_loss: 0.031982421875|cri_loss: 0.033660888671875|unsuper_loss: 0.0 average reward score: 4.1796875 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.07%) |Training time=0.65s (19.96%) |Others=0.20 (5.97%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.38 epoch: 0|step: 2193|ppo_ep: 1|act_loss: -0.199951171875|cri_loss: -0.0728759765625|unsuper_loss: 0.0 average reward score: 2.8359375 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.51%) |Training time=0.64s (19.63%) |Others=0.19 (5.86%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.38 epoch: 0|step: 2194|ppo_ep: 1|act_loss: 0.05572509765625|cri_loss: 0.0538330078125|unsuper_loss: 0.0 average reward score: 2.408203125 ------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.35s (73.49%) |Training time=0.65s (20.36%) |Others=0.20 (6.15%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.38 epoch: 0|step: 2195|ppo_ep: 1|act_loss: -0.155517578125|cri_loss: -0.06036376953125|unsuper_loss: 0.0 average reward score: 3.708984375 ------------------------------------------------------------------------------------- |E2E latency=3.37s |Gather latency=0.00s (0.00%) |Generate time=2.53s (75.04%) |Training time=0.64s (19.08%) |Others=0.20 
(5.87%)|CurSamplesPerSec=2.37 |AvgSamplesPerSec=2.38 epoch: 0|step: 2196|ppo_ep: 1|act_loss: -0.053070068359375|cri_loss: -0.020782470703125|unsuper_loss: 0.0 average reward score: 4.01171875 ------------------------------------------------------------------------------------- |E2E latency=3.36s |Gather latency=0.00s (0.00%) |Generate time=2.49s (74.12%) |Training time=0.65s (19.44%) |Others=0.22 (6.45%)|CurSamplesPerSec=2.38 |AvgSamplesPerSec=2.38 epoch: 0|step: 2197|ppo_ep: 1|act_loss: -0.13720703125|cri_loss: -0.0537109375|unsuper_loss: 0.0 average reward score: 3.396484375 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.22%) |Training time=0.64s (19.56%) |Others=0.20 (6.22%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.38 epoch: 0|step: 2198|ppo_ep: 1|act_loss: 0.253173828125|cri_loss: 0.1484375|unsuper_loss: 0.0 average reward score: 3.1484375 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.03%) |Training time=0.66s (19.93%) |Others=0.20 (6.04%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.38 epoch: 0|step: 2199|ppo_ep: 1|act_loss: 0.013916015625|cri_loss: 0.032073974609375|unsuper_loss: 0.0 average reward score: 2.271484375 ------------------------------------------------------------------------------------- |E2E latency=3.67s |Gather latency=0.00s (0.00%) |Generate time=2.46s (67.05%) |Training time=0.93s (25.29%) |Others=0.28 (7.67%)|CurSamplesPerSec=2.18 |AvgSamplesPerSec=2.38 epoch: 0|step: 2200|ppo_ep: 1|act_loss: 0.1241455078125|cri_loss: 0.07275390625|unsuper_loss: 0.0 average reward score: 1.17578125 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.37s (73.66%) |Training time=0.64s (19.89%) |Others=0.21 (6.45%)|CurSamplesPerSec=2.48 
|AvgSamplesPerSec=2.38 epoch: 0|step: 2201|ppo_ep: 1|act_loss: -0.05181884765625|cri_loss: -0.0161590576171875|unsuper_loss: 0.0 average reward score: 2.25390625 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.42s (73.71%) |Training time=0.67s (20.42%) |Others=0.19 (5.87%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.38 epoch: 0|step: 2202|ppo_ep: 1|act_loss: 0.0721435546875|cri_loss: 0.04681396484375|unsuper_loss: 0.0 average reward score: 3.1640625 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.31%) |Training time=0.64s (19.45%) |Others=0.21 (6.23%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.38 epoch: 0|step: 2203|ppo_ep: 1|act_loss: -0.020263671875|cri_loss: 0.004547119140625|unsuper_loss: 0.0 average reward score: 4.0 ------------------------------------------------------------------------------------- |E2E latency=3.17s |Gather latency=0.00s (0.00%) |Generate time=2.33s (73.72%) |Training time=0.64s (20.21%) |Others=0.19 (6.08%)|CurSamplesPerSec=2.53 |AvgSamplesPerSec=2.38 epoch: 0|step: 2204|ppo_ep: 1|act_loss: 0.0119781494140625|cri_loss: 0.01110076904296875|unsuper_loss: 0.0 average reward score: 2.900390625 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.29%) |Training time=0.65s (19.72%) |Others=0.20 (5.99%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.38 epoch: 0|step: 2205|ppo_ep: 1|act_loss: -0.0172119140625|cri_loss: 0.0146484375|unsuper_loss: 0.0 average reward score: 2.21484375 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.42s (73.97%) |Training time=0.64s (19.50%) |Others=0.21 (6.52%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.38 
epoch: 0|step: 2206|ppo_ep: 1|act_loss: 0.041595458984375|cri_loss: 0.02606201171875|unsuper_loss: 0.0 average reward score: 3.306640625 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.20%) |Training time=0.65s (19.74%) |Others=0.20 (6.06%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.38 epoch: 0|step: 2207|ppo_ep: 1|act_loss: 0.1650390625|cri_loss: 0.09600830078125|unsuper_loss: 0.0 average reward score: 3.00390625 ------------------------------------------------------------------------------------- |E2E latency=3.64s |Gather latency=0.00s (0.00%) |Generate time=2.43s (66.67%) |Training time=0.94s (25.69%) |Others=0.28 (7.64%)|CurSamplesPerSec=2.20 |AvgSamplesPerSec=2.38 epoch: 0|step: 2208|ppo_ep: 1|act_loss: 0.241455078125|cri_loss: 0.1514892578125|unsuper_loss: 0.0 average reward score: 3.55078125 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.18%) |Training time=0.64s (19.66%) |Others=0.20 (6.16%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.38 epoch: 0|step: 2209|ppo_ep: 1|act_loss: 0.06817626953125|cri_loss: 0.054046630859375|unsuper_loss: 0.0 average reward score: 2.193359375 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.43s (73.82%) |Training time=0.66s (20.13%) |Others=0.20 (6.05%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.38 epoch: 0|step: 2210|ppo_ep: 1|act_loss: -0.0318603515625|cri_loss: -0.00042724609375|unsuper_loss: 0.0 average reward score: 2.74609375 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.22%) |Training time=0.65s (19.73%) |Others=0.20 (6.05%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.38 epoch: 0|step: 
2211|ppo_ep: 1|act_loss: -0.0037689208984375|cri_loss: 0.00759124755859375|unsuper_loss: 0.0 average reward score: 3.921875 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.35%) |Training time=0.65s (19.72%) |Others=0.19 (5.93%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.38 epoch: 0|step: 2212|ppo_ep: 1|act_loss: -0.0701904296875|cri_loss: -0.010772705078125|unsuper_loss: 0.0 average reward score: 3.833984375 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.42s (73.81%) |Training time=0.66s (20.06%) |Others=0.20 (6.12%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.38 epoch: 0|step: 2213|ppo_ep: 1|act_loss: 0.136962890625|cri_loss: 0.0743408203125|unsuper_loss: 0.0 average reward score: 4.2421875 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.44s (73.92%) |Training time=0.65s (19.84%) |Others=0.21 (6.24%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.38 epoch: 0|step: 2214|ppo_ep: 1|act_loss: -0.255859375|cri_loss: -0.085693359375|unsuper_loss: 0.0 average reward score: 2.97265625 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.42s (73.82%) |Training time=0.66s (20.17%) |Others=0.20 (6.01%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.38 epoch: 0|step: 2215|ppo_ep: 1|act_loss: 0.083740234375|cri_loss: 0.07855224609375|unsuper_loss: 0.0 average reward score: 2.923828125 ------------------------------------------------------------------------------------- |E2E latency=3.68s |Gather latency=0.00s (0.00%) |Generate time=2.45s (66.66%) |Training time=0.93s (25.30%) |Others=0.30 (8.04%)|CurSamplesPerSec=2.18 |AvgSamplesPerSec=2.38 epoch: 0|step: 2216|ppo_ep: 
1|act_loss: 0.039306640625|cri_loss: 0.04541015625|unsuper_loss: 0.0 average reward score: 3.65625 ------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.42s (73.48%) |Training time=0.68s (20.65%) |Others=0.19 (5.87%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.38 epoch: 0|step: 2217|ppo_ep: 1|act_loss: 0.0345458984375|cri_loss: 0.02435302734375|unsuper_loss: 0.0 average reward score: 2.984375 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.47s (74.61%) |Training time=0.64s (19.43%) |Others=0.20 (5.96%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.38 epoch: 0|step: 2218|ppo_ep: 1|act_loss: -0.020782470703125|cri_loss: -0.00260162353515625|unsuper_loss: 0.0 average reward score: 3.376953125 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.30%) |Training time=0.64s (19.73%) |Others=0.20 (5.97%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.38 epoch: 0|step: 2219|ppo_ep: 1|act_loss: 0.054412841796875|cri_loss: 0.04827880859375|unsuper_loss: 0.0 average reward score: 2.6328125 ------------------------------------------------------------------------------------- |E2E latency=3.40s |Gather latency=0.00s (0.00%) |Generate time=2.53s (74.41%) |Training time=0.65s (19.18%) |Others=0.22 (6.41%)|CurSamplesPerSec=2.35 |AvgSamplesPerSec=2.38 epoch: 0|step: 2220|ppo_ep: 1|act_loss: 0.1302490234375|cri_loss: 0.0867919921875|unsuper_loss: 0.0 average reward score: 1.923828125 ------------------------------------------------------------------------------------- |E2E latency=3.34s |Gather latency=0.00s (0.00%) |Generate time=2.44s (73.00%) |Training time=0.69s (20.76%) |Others=0.21 (6.24%)|CurSamplesPerSec=2.39 |AvgSamplesPerSec=2.38 epoch: 0|step: 2221|ppo_ep: 1|act_loss: 
0.05938720703125|cri_loss: 0.04376220703125|unsuper_loss: 0.0 average reward score: 4.25 ------------------------------------------------------------------------------------- |E2E latency=3.34s |Gather latency=0.00s (0.00%) |Generate time=2.49s (74.62%) |Training time=0.65s (19.49%) |Others=0.20 (5.88%)|CurSamplesPerSec=2.39 |AvgSamplesPerSec=2.38 epoch: 0|step: 2222|ppo_ep: 1|act_loss: 0.1302490234375|cri_loss: 0.08538818359375|unsuper_loss: 0.0 average reward score: 3.5546875 ------------------------------------------------------------------------------------- |E2E latency=3.34s |Gather latency=0.00s (0.00%) |Generate time=2.47s (74.05%) |Training time=0.66s (19.65%) |Others=0.21 (6.30%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.38 epoch: 0|step: 2223|ppo_ep: 1|act_loss: 0.214599609375|cri_loss: 0.140869140625|unsuper_loss: 0.0 average reward score: 2.5078125 ------------------------------------------------------------------------------------- |E2E latency=3.74s |Gather latency=0.00s (0.00%) |Generate time=2.49s (66.53%) |Training time=0.96s (25.81%) |Others=0.29 (7.66%)|CurSamplesPerSec=2.14 |AvgSamplesPerSec=2.38 epoch: 0|step: 2224|ppo_ep: 1|act_loss: -0.1556396484375|cri_loss: -0.047760009765625|unsuper_loss: 0.0 average reward score: 1.677734375 ------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.47s (74.64%) |Training time=0.64s (19.47%) |Others=0.19 (5.89%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.38 epoch: 0|step: 2225|ppo_ep: 1|act_loss: 0.1669921875|cri_loss: 0.1036376953125|unsuper_loss: 0.0 average reward score: 4.19921875 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.40s (73.60%) |Training time=0.66s (20.10%) |Others=0.21 (6.30%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.38 epoch: 0|step: 2226|ppo_ep: 1|act_loss: 0.147216796875|cri_loss: 
0.091064453125|unsuper_loss: 0.0 average reward score: 1.921875 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.03%) |Training time=0.65s (19.98%) |Others=0.20 (5.99%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.38 epoch: 0|step: 2227|ppo_ep: 1|act_loss: 0.173095703125|cri_loss: 0.102294921875|unsuper_loss: 0.0 average reward score: 3.0390625 ------------------------------------------------------------------------------------- |E2E latency=3.34s |Gather latency=0.00s (0.00%) |Generate time=2.50s (74.84%) |Training time=0.65s (19.32%) |Others=0.20 (5.85%)|CurSamplesPerSec=2.39 |AvgSamplesPerSec=2.38 epoch: 0|step: 2228|ppo_ep: 1|act_loss: 0.35205078125|cri_loss: 0.2073974609375|unsuper_loss: 0.0 average reward score: 3.923828125 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.48s (74.62%) |Training time=0.64s (19.40%) |Others=0.20 (5.98%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.38 epoch: 0|step: 2229|ppo_ep: 1|act_loss: 0.21533203125|cri_loss: 0.12890625|unsuper_loss: 0.0 average reward score: 0.6201171875 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.39s (73.95%) |Training time=0.64s (19.93%) |Others=0.20 (6.12%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.38 epoch: 0|step: 2230|ppo_ep: 1|act_loss: 0.1005859375|cri_loss: 0.07220458984375|unsuper_loss: 0.0 average reward score: 2.203125 ------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.37s (73.92%) |Training time=0.64s (20.06%) |Others=0.19 (6.01%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.38 epoch: 0|step: 2231|ppo_ep: 1|act_loss: 0.048583984375|cri_loss: 0.053131103515625|unsuper_loss: 0.0 average 
reward score: 1.7919921875 ------------------------------------------------------------------------------------- |E2E latency=3.66s |Gather latency=0.00s (0.00%) |Generate time=2.44s (66.64%) |Training time=0.93s (25.53%) |Others=0.29 (7.84%)|CurSamplesPerSec=2.19 |AvgSamplesPerSec=2.38 epoch: 0|step: 2232|ppo_ep: 1|act_loss: 0.16162109375|cri_loss: 0.09710693359375|unsuper_loss: 0.0 average reward score: 3.380859375 ------------------------------------------------------------------------------------- |E2E latency=3.40s |Gather latency=0.00s (0.00%) |Generate time=2.56s (75.25%) |Training time=0.64s (18.90%) |Others=0.20 (5.85%)|CurSamplesPerSec=2.35 |AvgSamplesPerSec=2.38 epoch: 0|step: 2233|ppo_ep: 1|act_loss: 0.08013916015625|cri_loss: 0.0562744140625|unsuper_loss: 0.0 average reward score: 4.0390625 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.47s (74.38%) |Training time=0.65s (19.57%) |Others=0.20 (6.05%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.38 epoch: 0|step: 2234|ppo_ep: 1|act_loss: 0.04949951171875|cri_loss: 0.044677734375|unsuper_loss: 0.0 average reward score: 2.63671875 ------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.36s (73.79%) |Training time=0.64s (20.14%) |Others=0.19 (6.07%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.38 epoch: 0|step: 2235|ppo_ep: 1|act_loss: 0.178466796875|cri_loss: 0.1380615234375|unsuper_loss: 0.0 average reward score: 4.1796875 ------------------------------------------------------------------------------------- |E2E latency=3.35s |Gather latency=0.00s (0.00%) |Generate time=2.47s (73.89%) |Training time=0.67s (20.02%) |Others=0.20 (6.09%)|CurSamplesPerSec=2.39 |AvgSamplesPerSec=2.38 epoch: 0|step: 2236|ppo_ep: 1|act_loss: 0.135009765625|cri_loss: 0.08184814453125|unsuper_loss: 0.0 average reward score: 2.67578125 
------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.40s (73.31%) |Training time=0.65s (19.93%) |Others=0.22 (6.76%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.38 epoch: 0|step: 2237|ppo_ep: 1|act_loss: 0.12005615234375|cri_loss: 0.08294677734375|unsuper_loss: 0.0 average reward score: 3.166015625 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.41s (73.45%) |Training time=0.66s (20.07%) |Others=0.21 (6.48%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.38 epoch: 0|step: 2238|ppo_ep: 1|act_loss: 0.0019989013671875|cri_loss: 0.01169586181640625|unsuper_loss: 0.0 average reward score: 2.111328125 ------------------------------------------------------------------------------------- |E2E latency=3.34s |Gather latency=0.00s (0.00%) |Generate time=2.49s (74.63%) |Training time=0.64s (19.31%) |Others=0.20 (6.06%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.38 [2023-04-24 15:53:10,556] [INFO] [logging.py:96:log_dist] [Rank 0] step=280, skipped=5, lr=[5.348056242098441e-06, 5.348056242098441e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-04-24 15:53:10,805] [INFO] [timer.py:199:stop] epoch=0/micro_step=2240/global_step=280, RunningAvgSamplesPerSec=15.403539921341178, CurrSamplesPerSec=15.664748630332518, MemAllocated=20.44GB, MaxMemAllocated=31.45GB [2023-04-24 15:53:11,014] [INFO] [logging.py:96:log_dist] [Rank 0] step=280, skipped=4, lr=[2.7502392290602463e-06, 2.7502392290602463e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 2239|ppo_ep: 1|act_loss: 0.05169677734375|cri_loss: 0.044647216796875|unsuper_loss: 0.0 average reward score: 2.53125 ------------------------------------------------------------------------------------- |E2E latency=3.69s |Gather latency=0.00s (0.00%) |Generate time=2.44s (66.22%) |Training time=0.96s (26.04%) |Others=0.29 (7.74%)|CurSamplesPerSec=2.17 
|AvgSamplesPerSec=2.38 epoch: 0|step: 2240|ppo_ep: 1|act_loss: 0.08203125|cri_loss: 0.05230712890625|unsuper_loss: 0.0 average reward score: 2.87890625 ------------------------------------------------------------------------------------- |E2E latency=3.35s |Gather latency=0.00s (0.00%) |Generate time=2.49s (74.30%) |Training time=0.66s (19.70%) |Others=0.20 (6.00%)|CurSamplesPerSec=2.39 |AvgSamplesPerSec=2.38 epoch: 0|step: 2241|ppo_ep: 1|act_loss: -0.028778076171875|cri_loss: 0.0111083984375|unsuper_loss: 0.0 average reward score: 3.791015625 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.43s (73.79%) |Training time=0.64s (19.45%) |Others=0.22 (6.76%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.38 epoch: 0|step: 2242|ppo_ep: 1|act_loss: 0.138427734375|cri_loss: 0.09521484375|unsuper_loss: 0.0 average reward score: 2.15234375 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.41s (72.46%) |Training time=0.72s (21.73%) |Others=0.19 (5.80%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.38 epoch: 0|step: 2243|ppo_ep: 1|act_loss: 0.0289306640625|cri_loss: 0.03692626953125|unsuper_loss: 0.0 average reward score: 2.572265625 ------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.31%) |Training time=0.65s (19.69%) |Others=0.20 (6.00%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.38 epoch: 0|step: 2244|ppo_ep: 1|act_loss: 0.06414794921875|cri_loss: 0.0484619140625|unsuper_loss: 0.0 average reward score: 2.046875 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.13%) |Training time=0.64s (19.82%) |Others=0.20 (6.04%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.38 epoch: 
0|step: 2245|ppo_ep: 1|act_loss: 0.1494140625|cri_loss: 0.08935546875|unsuper_loss: 0.0 average reward score: 2.7265625 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.45%) |Training time=0.64s (19.65%) |Others=0.19 (5.90%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.38 epoch: 0|step: 2246|ppo_ep: 1|act_loss: 0.19775390625|cri_loss: 0.1282958984375|unsuper_loss: 0.0 average reward score: 1.4921875 ------------------------------------------------------------------------------------- |E2E latency=4.29s |Gather latency=0.00s (0.00%) |Generate time=3.41s (79.34%) |Training time=0.68s (15.77%) |Others=0.21 (4.88%)|CurSamplesPerSec=1.86 |AvgSamplesPerSec=2.38 epoch: 0|step: 2247|ppo_ep: 1|act_loss: 0.167236328125|cri_loss: 0.10498046875|unsuper_loss: 0.0 average reward score: 2.109375 ------------------------------------------------------------------------------------- |E2E latency=3.63s |Gather latency=0.00s (0.00%) |Generate time=2.36s (64.89%) |Training time=0.97s (26.62%) |Others=0.31 (8.48%)|CurSamplesPerSec=2.20 |AvgSamplesPerSec=2.38 epoch: 0|step: 2248|ppo_ep: 1|act_loss: 0.0350341796875|cri_loss: 0.04071044921875|unsuper_loss: 0.0 average reward score: 1.669921875 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.58%) |Training time=0.64s (19.41%) |Others=0.20 (6.00%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.38 epoch: 0|step: 2249|ppo_ep: 1|act_loss: 0.0213470458984375|cri_loss: 0.0207061767578125|unsuper_loss: 0.0 average reward score: 2.625 ------------------------------------------------------------------------------------- |E2E latency=3.36s |Gather latency=0.00s (0.00%) |Generate time=2.52s (75.03%) |Training time=0.64s (19.19%) |Others=0.19 (5.78%)|CurSamplesPerSec=2.38 |AvgSamplesPerSec=2.38 epoch: 0|step: 2250|ppo_ep: 1|act_loss: 
0.1318359375|cri_loss: 0.08013916015625|unsuper_loss: 0.0 average reward score: 3.453125 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.15%) |Training time=0.64s (19.71%) |Others=0.20 (6.14%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.38 epoch: 0|step: 2251|ppo_ep: 1|act_loss: -0.038299560546875|cri_loss: -0.003265380859375|unsuper_loss: 0.0 average reward score: 1.970703125 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.52%) |Training time=0.64s (19.64%) |Others=0.19 (5.84%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.38 epoch: 0|step: 2252|ppo_ep: 1|act_loss: 0.28759765625|cri_loss: 0.17724609375|unsuper_loss: 0.0 average reward score: 2.708984375 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.22%) |Training time=0.64s (19.71%) |Others=0.20 (6.07%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.38 epoch: 0|step: 2253|ppo_ep: 1|act_loss: 0.300048828125|cri_loss: 0.183837890625|unsuper_loss: 0.0 average reward score: 2.458984375 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.39%) |Training time=0.64s (19.74%) |Others=0.19 (5.87%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.38 epoch: 0|step: 2254|ppo_ep: 1|act_loss: 0.24609375|cri_loss: 0.146240234375|unsuper_loss: 0.0 average reward score: 3.8203125 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.24%) |Training time=0.64s (19.78%) |Others=0.19 (5.98%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.38 epoch: 0|step: 2255|ppo_ep: 1|act_loss: 0.04193115234375|cri_loss: 
0.039703369140625|unsuper_loss: 0.0 average reward score: 2.14453125 ------------------------------------------------------------------------------------- |E2E latency=3.61s |Gather latency=0.00s (0.00%) |Generate time=2.41s (66.77%) |Training time=0.92s (25.53%) |Others=0.28 (7.70%)|CurSamplesPerSec=2.22 |AvgSamplesPerSec=2.38 epoch: 0|step: 2256|ppo_ep: 1|act_loss: 0.1767578125|cri_loss: 0.1063232421875|unsuper_loss: 0.0 average reward score: 3.494140625 ------------------------------------------------------------------------------------- |E2E latency=3.19s |Gather latency=0.00s (0.00%) |Generate time=2.35s (73.76%) |Training time=0.65s (20.24%) |Others=0.19 (6.00%)|CurSamplesPerSec=2.51 |AvgSamplesPerSec=2.38 epoch: 0|step: 2257|ppo_ep: 1|act_loss: 0.42236328125|cri_loss: 0.26611328125|unsuper_loss: 0.0 average reward score: 1.5673828125 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.38s (74.09%) |Training time=0.64s (19.99%) |Others=0.19 (5.92%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.38 epoch: 0|step: 2258|ppo_ep: 1|act_loss: 0.241943359375|cri_loss: 0.149658203125|unsuper_loss: 0.0 average reward score: 3.052734375 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.17%) |Training time=0.65s (19.82%) |Others=0.20 (6.01%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.38 epoch: 0|step: 2259|ppo_ep: 1|act_loss: 0.4892578125|cri_loss: 0.3095703125|unsuper_loss: 0.0 average reward score: 1.794921875 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.55%) |Training time=0.64s (19.62%) |Others=0.19 (5.83%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.38 epoch: 0|step: 2260|ppo_ep: 1|act_loss: 0.12890625|cri_loss: 0.08758544921875|unsuper_loss: 0.0 average 
reward score: 2.240234375 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.11%) |Training time=0.64s (19.70%) |Others=0.20 (6.18%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.38 epoch: 0|step: 2261|ppo_ep: 1|act_loss: 0.025390625|cri_loss: 0.0418701171875|unsuper_loss: 0.0 average reward score: 3.2890625 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.25%) |Training time=0.65s (19.86%) |Others=0.19 (5.89%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.38 epoch: 0|step: 2262|ppo_ep: 1|act_loss: 0.26123046875|cri_loss: 0.1552734375|unsuper_loss: 0.0 average reward score: 1.5361328125 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.37%) |Training time=0.65s (19.63%) |Others=0.20 (6.00%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.38 epoch: 0|step: 2263|ppo_ep: 1|act_loss: 0.13330078125|cri_loss: 0.0889892578125|unsuper_loss: 0.0 average reward score: 2.6796875 ------------------------------------------------------------------------------------- |E2E latency=3.53s |Gather latency=0.00s (0.00%) |Generate time=2.31s (65.64%) |Training time=0.93s (26.50%) |Others=0.28 (7.85%)|CurSamplesPerSec=2.27 |AvgSamplesPerSec=2.38 epoch: 0|step: 2264|ppo_ep: 1|act_loss: 0.18017578125|cri_loss: 0.1112060546875|unsuper_loss: 0.0 average reward score: 1.548828125 ------------------------------------------------------------------------------------- |E2E latency=4.73s |Gather latency=0.00s (0.00%) |Generate time=3.82s (80.71%) |Training time=0.70s (14.85%) |Others=0.21 (4.44%)|CurSamplesPerSec=1.69 |AvgSamplesPerSec=2.38 epoch: 0|step: 2265|ppo_ep: 1|act_loss: 0.1689453125|cri_loss: 0.10809326171875|unsuper_loss: 0.0 average reward score: 2.96875 
------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.35%) |Training time=0.64s (19.78%) |Others=0.19 (5.87%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.38 epoch: 0|step: 2266|ppo_ep: 1|act_loss: 0.1256103515625|cri_loss: 0.08599853515625|unsuper_loss: 0.0 average reward score: 3.12890625 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.24%) |Training time=0.64s (19.71%) |Others=0.20 (6.05%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.38 epoch: 0|step: 2267|ppo_ep: 1|act_loss: 0.336669921875|cri_loss: 0.2158203125|unsuper_loss: 0.0 average reward score: 0.931640625 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.38%) |Training time=0.64s (19.68%) |Others=0.19 (5.94%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.38 epoch: 0|step: 2268|ppo_ep: 1|act_loss: 0.54833984375|cri_loss: 0.331787109375|unsuper_loss: 0.0 average reward score: 2.80078125 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.37s (73.58%) |Training time=0.66s (20.33%) |Others=0.20 (6.09%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.38 epoch: 0|step: 2269|ppo_ep: 1|act_loss: -0.01019287109375|cri_loss: 0.0089263916015625|unsuper_loss: 0.0 average reward score: 2.125 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.37s (73.87%) |Training time=0.65s (20.18%) |Others=0.19 (5.95%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.38 epoch: 0|step: 2270|ppo_ep: 1|act_loss: 0.46923828125|cri_loss: 0.2890625|unsuper_loss: 0.0 average reward score: 1.4609375 
------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.37s (74.15%) |Training time=0.64s (19.96%) |Others=0.19 (5.89%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.38 epoch: 0|step: 2271|ppo_ep: 1|act_loss: -0.0372314453125|cri_loss: -0.008880615234375|unsuper_loss: 0.0 average reward score: 2.1484375 ------------------------------------------------------------------------------------- |E2E latency=3.58s |Gather latency=0.00s (0.00%) |Generate time=2.38s (66.29%) |Training time=0.93s (25.96%) |Others=0.28 (7.74%)|CurSamplesPerSec=2.23 |AvgSamplesPerSec=2.38 epoch: 0|step: 2272|ppo_ep: 1|act_loss: -0.06011962890625|cri_loss: -0.0081787109375|unsuper_loss: 0.0 average reward score: 3.62890625 ------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.38s (74.25%) |Training time=0.64s (19.89%) |Others=0.19 (5.85%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.38 epoch: 0|step: 2273|ppo_ep: 1|act_loss: 0.03131103515625|cri_loss: 0.047210693359375|unsuper_loss: 0.0 average reward score: 0.85107421875 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.38s (72.76%) |Training time=0.70s (21.43%) |Others=0.19 (5.80%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.38 epoch: 0|step: 2274|ppo_ep: 1|act_loss: 0.0755615234375|cri_loss: 0.062164306640625|unsuper_loss: 0.0 average reward score: 1.0302734375 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.39s (73.96%) |Training time=0.65s (20.12%) |Others=0.19 (5.92%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.38 epoch: 0|step: 2275|ppo_ep: 1|act_loss: -0.1573486328125|cri_loss: -0.0550537109375|unsuper_loss: 0.0 average reward score: 3.701171875 
------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.38s (73.73%) |Training time=0.66s (20.45%) |Others=0.19 (5.82%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.38 epoch: 0|step: 2276|ppo_ep: 1|act_loss: 0.0361328125|cri_loss: 0.036468505859375|unsuper_loss: 0.0 average reward score: 1.5546875 ------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.36s (73.94%) |Training time=0.64s (20.00%) |Others=0.19 (6.06%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.38 epoch: 0|step: 2277|ppo_ep: 1|act_loss: -0.08074951171875|cri_loss: -0.0289306640625|unsuper_loss: 0.0 average reward score: 4.4921875 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.38s (74.00%) |Training time=0.64s (19.86%) |Others=0.20 (6.14%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.38 epoch: 0|step: 2278|ppo_ep: 1|act_loss: -0.3076171875|cri_loss: -0.130859375|unsuper_loss: 0.0 average reward score: 3.1953125 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.23%) |Training time=0.64s (19.71%) |Others=0.20 (6.06%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.38 epoch: 0|step: 2279|ppo_ep: 1|act_loss: 0.023284912109375|cri_loss: 0.0250091552734375|unsuper_loss: 0.0 average reward score: 2.41796875 ------------------------------------------------------------------------------------- |E2E latency=3.62s |Gather latency=0.00s (0.00%) |Generate time=2.42s (66.86%) |Training time=0.92s (25.54%) |Others=0.27 (7.60%)|CurSamplesPerSec=2.21 |AvgSamplesPerSec=2.38 epoch: 0|step: 2280|ppo_ep: 1|act_loss: -0.25732421875|cri_loss: -0.1097412109375|unsuper_loss: 0.0 average reward score: 2.400390625 
------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.53%) |Training time=0.64s (19.50%) |Others=0.19 (5.97%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.38 epoch: 0|step: 2281|ppo_ep: 1|act_loss: 0.060760498046875|cri_loss: 0.05633544921875|unsuper_loss: 0.0 average reward score: 2.255859375 ------------------------------------------------------------------------------------- |E2E latency=3.34s |Gather latency=0.00s (0.00%) |Generate time=2.41s (72.21%) |Training time=0.64s (19.17%) |Others=0.29 (8.63%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.38 epoch: 0|step: 2282|ppo_ep: 1|act_loss: 0.04840087890625|cri_loss: 0.056976318359375|unsuper_loss: 0.0 average reward score: 2.3828125 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.23%) |Training time=0.64s (19.55%) |Others=0.20 (6.21%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.38 epoch: 0|step: 2283|ppo_ep: 1|act_loss: -0.150146484375|cri_loss: -0.061279296875|unsuper_loss: 0.0 average reward score: 3.70703125 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.19%) |Training time=0.64s (19.84%) |Others=0.19 (5.96%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.38 epoch: 0|step: 2284|ppo_ep: 1|act_loss: -0.17138671875|cri_loss: -0.0606689453125|unsuper_loss: 0.0 average reward score: 1.498046875 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.38%) |Training time=0.64s (19.65%) |Others=0.20 (5.98%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.38 epoch: 0|step: 2285|ppo_ep: 1|act_loss: -0.099853515625|cri_loss: -0.036041259765625|unsuper_loss: 0.0 average reward score: 1.61328125 
------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.15%) |Training time=0.64s (19.73%) |Others=0.20 (6.13%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.38 epoch: 0|step: 2286|ppo_ep: 1|act_loss: 0.0640869140625|cri_loss: 0.058685302734375|unsuper_loss: 0.0 average reward score: 1.06640625 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.65%) |Training time=0.64s (19.46%) |Others=0.19 (5.89%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.38 epoch: 0|step: 2287|ppo_ep: 1|act_loss: 0.1434326171875|cri_loss: 0.101806640625|unsuper_loss: 0.0 average reward score: 1.28515625 ------------------------------------------------------------------------------------- |E2E latency=3.61s |Gather latency=0.00s (0.00%) |Generate time=2.41s (66.72%) |Training time=0.93s (25.69%) |Others=0.27 (7.59%)|CurSamplesPerSec=2.22 |AvgSamplesPerSec=2.38 epoch: 0|step: 2288|ppo_ep: 1|act_loss: -0.183349609375|cri_loss: -0.07867431640625|unsuper_loss: 0.0 average reward score: 3.22265625 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.15%) |Training time=0.64s (19.65%) |Others=0.20 (6.21%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.38 epoch: 0|step: 2289|ppo_ep: 1|act_loss: -0.5234375|cri_loss: -0.1610107421875|unsuper_loss: 0.0 average reward score: 1.8671875 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.18%) |Training time=0.64s (19.88%) |Others=0.19 (5.94%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.38 epoch: 0|step: 2290|ppo_ep: 1|act_loss: -0.280517578125|cri_loss: -0.11944580078125|unsuper_loss: 0.0 average reward score: 2.2109375 
------------------------------------------------------------------------------------- |E2E latency=3.39s |Gather latency=0.00s (0.00%) |Generate time=2.54s (75.07%) |Training time=0.64s (19.03%) |Others=0.20 (5.89%)|CurSamplesPerSec=2.36 |AvgSamplesPerSec=2.38 epoch: 0|step: 2291|ppo_ep: 1|act_loss: -0.12127685546875|cri_loss: -0.048828125|unsuper_loss: 0.0 average reward score: 3.2421875 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.31%) |Training time=0.65s (19.73%) |Others=0.20 (5.96%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.38 epoch: 0|step: 2292|ppo_ep: 1|act_loss: -0.19140625|cri_loss: -0.07965087890625|unsuper_loss: 0.0 average reward score: 2.8046875 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.48s (74.68%) |Training time=0.64s (19.44%) |Others=0.20 (5.89%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.38 epoch: 0|step: 2293|ppo_ep: 1|act_loss: -0.27001953125|cri_loss: -0.112060546875|unsuper_loss: 0.0 average reward score: 2.525390625 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.47s (74.58%) |Training time=0.65s (19.50%) |Others=0.20 (5.92%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.38 epoch: 0|step: 2294|ppo_ep: 1|act_loss: -0.011810302734375|cri_loss: 0.022857666015625|unsuper_loss: 0.0 average reward score: 2.69921875 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.38s (73.80%) |Training time=0.65s (20.02%) |Others=0.20 (6.18%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.38 epoch: 0|step: 2295|ppo_ep: 1|act_loss: -0.14453125|cri_loss: -0.049591064453125|unsuper_loss: 0.0 average reward score: 4.265625 
------------------------------------------------------------------------------------- |E2E latency=3.62s |Gather latency=0.00s (0.00%) |Generate time=2.41s (66.68%) |Training time=0.93s (25.74%) |Others=0.27 (7.57%)|CurSamplesPerSec=2.21 |AvgSamplesPerSec=2.38 epoch: 0|step: 2296|ppo_ep: 1|act_loss: -0.16064453125|cri_loss: -0.0645751953125|unsuper_loss: 0.0 average reward score: 2.3125 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.43s (73.88%) |Training time=0.64s (19.40%) |Others=0.22 (6.72%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.38 epoch: 0|step: 2297|ppo_ep: 1|act_loss: 0.0345458984375|cri_loss: 0.041748046875|unsuper_loss: 0.0 average reward score: 2.564453125 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.55%) |Training time=0.64s (19.59%) |Others=0.19 (5.87%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.38 epoch: 0|step: 2298|ppo_ep: 1|act_loss: -0.197265625|cri_loss: -0.08544921875|unsuper_loss: 0.0 average reward score: 3.390625 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.18%) |Training time=0.65s (19.72%) |Others=0.20 (6.10%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.38 epoch: 0|step: 2299|ppo_ep: 1|act_loss: -0.0059814453125|cri_loss: 0.0114593505859375|unsuper_loss: 0.0 average reward score: 2.6796875 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.48s (74.73%) |Training time=0.64s (19.34%) |Others=0.20 (5.93%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.38 epoch: 0|step: 2300|ppo_ep: 1|act_loss: -0.18359375|cri_loss: -0.0736083984375|unsuper_loss: 0.0 average reward score: 2.619140625 
------------------------------------------------------------------------------------- |E2E latency=3.89s |Gather latency=0.00s (0.00%) |Generate time=3.04s (78.03%) |Training time=0.65s (16.82%) |Others=0.20 (5.15%)|CurSamplesPerSec=2.05 |AvgSamplesPerSec=2.38 epoch: 0|step: 2301|ppo_ep: 1|act_loss: -0.05078125|cri_loss: -0.00714111328125|unsuper_loss: 0.0 average reward score: 1.6142578125 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.25%) |Training time=0.65s (19.80%) |Others=0.19 (5.94%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.38 epoch: 0|step: 2302|ppo_ep: 1|act_loss: -0.218505859375|cri_loss: -0.08837890625|unsuper_loss: 0.0 average reward score: 2.443359375 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.03%) |Training time=0.64s (19.92%) |Others=0.20 (6.05%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.38 epoch: 0|step: 2303|ppo_ep: 1|act_loss: -0.0806884765625|cri_loss: -0.02386474609375|unsuper_loss: 0.0 average reward score: 2.79296875 ------------------------------------------------------------------------------------- |E2E latency=3.72s |Gather latency=0.00s (0.00%) |Generate time=2.53s (67.81%) |Training time=0.92s (24.76%) |Others=0.28 (7.43%)|CurSamplesPerSec=2.15 |AvgSamplesPerSec=2.38 epoch: 0|step: 2304|ppo_ep: 1|act_loss: -0.073486328125|cri_loss: -0.015777587890625|unsuper_loss: 0.0 average reward score: 2.85546875 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.44%) |Training time=0.63s (19.64%) |Others=0.19 (5.92%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.38 epoch: 0|step: 2305|ppo_ep: 1|act_loss: -0.0919189453125|cri_loss: -0.013427734375|unsuper_loss: 0.0 average reward score: 3.87890625 
------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.39s (73.38%) |Training time=0.67s (20.68%) |Others=0.19 (5.94%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.38 epoch: 0|step: 2306|ppo_ep: 1|act_loss: -0.0733642578125|cri_loss: -0.0277099609375|unsuper_loss: 0.0 average reward score: 3.212890625 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.47s (74.58%) |Training time=0.65s (19.56%) |Others=0.19 (5.86%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.38 epoch: 0|step: 2307|ppo_ep: 1|act_loss: -0.03790283203125|cri_loss: 0.003021240234375|unsuper_loss: 0.0 average reward score: 1.8603515625 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.36%) |Training time=0.65s (19.67%) |Others=0.20 (5.98%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.38 epoch: 0|step: 2308|ppo_ep: 1|act_loss: -0.0858154296875|cri_loss: -0.016571044921875|unsuper_loss: 0.0 average reward score: 2.45703125 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.39s (73.34%) |Training time=0.67s (20.55%) |Others=0.20 (6.11%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.38 epoch: 0|step: 2309|ppo_ep: 1|act_loss: 0.197265625|cri_loss: 0.12451171875|unsuper_loss: 0.0 average reward score: 0.62109375 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.48s (74.80%) |Training time=0.64s (19.43%) |Others=0.19 (5.77%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.38 epoch: 0|step: 2310|ppo_ep: 1|act_loss: -0.09228515625|cri_loss: -0.022491455078125|unsuper_loss: 0.0 average reward score: 1.416015625 
------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.42s (73.80%) |Training time=0.64s (19.58%) |Others=0.22 (6.62%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.38 epoch: 0|step: 2311|ppo_ep: 1|act_loss: 0.04718017578125|cri_loss: 0.06524658203125|unsuper_loss: 0.0 average reward score: 2.5859375 ------------------------------------------------------------------------------------- |E2E latency=3.64s |Gather latency=0.00s (0.00%) |Generate time=2.43s (66.85%) |Training time=0.93s (25.55%) |Others=0.28 (7.61%)|CurSamplesPerSec=2.20 |AvgSamplesPerSec=2.38 epoch: 0|step: 2312|ppo_ep: 1|act_loss: 0.01068115234375|cri_loss: 0.01824951171875|unsuper_loss: 0.0 average reward score: 2.138671875 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.48%) |Training time=0.65s (19.61%) |Others=0.19 (5.91%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.38 epoch: 0|step: 2313|ppo_ep: 1|act_loss: -0.06488037109375|cri_loss: -0.0198516845703125|unsuper_loss: 0.0 average reward score: 3.255859375 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.37s (73.90%) |Training time=0.64s (20.02%) |Others=0.20 (6.08%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.38 epoch: 0|step: 2314|ppo_ep: 1|act_loss: -0.0859375|cri_loss: -0.0286865234375|unsuper_loss: 0.0 average reward score: 3.22265625 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.35s (73.40%) |Training time=0.65s (20.21%) |Others=0.21 (6.39%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.38 epoch: 0|step: 2315|ppo_ep: 1|act_loss: 0.166015625|cri_loss: 0.1148681640625|unsuper_loss: 0.0 average reward score: 2.076171875 
-------------------------------------------------------------------------------------
|E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.56%) |Training time=0.64s (19.46%) |Others=0.20 (5.98%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.38
epoch: 0|step: 2316|ppo_ep: 1|act_loss: 0.05206298828125|cri_loss: 0.051666259765625|unsuper_loss: 0.0
average reward score: 3.48046875
-------------------------------------------------------------------------------------
|E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.40s (73.67%) |Training time=0.65s (20.00%) |Others=0.21 (6.33%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.38
epoch: 0|step: 2317|ppo_ep: 1|act_loss: 0.0633544921875|cri_loss: 0.06341552734375|unsuper_loss: 0.0
average reward score: 3.5859375
-------------------------------------------------------------------------------------
|E2E latency=3.85s |Gather latency=0.00s (0.00%) |Generate time=2.43s (63.19%) |Training time=0.71s (18.40%) |Others=0.71 (18.41%)|CurSamplesPerSec=2.08 |AvgSamplesPerSec=2.38
epoch: 0|step: 2318|ppo_ep: 1|act_loss: 0.1314697265625|cri_loss: 0.07867431640625|unsuper_loss: 0.0
average reward score: 2.09375
-------------------------------------------------------------------------------------
|E2E latency=3.39s |Gather latency=0.00s (0.00%) |Generate time=2.52s (74.34%) |Training time=0.65s (19.03%) |Others=0.22 (6.63%)|CurSamplesPerSec=2.36 |AvgSamplesPerSec=2.38
[2023-04-24 15:57:39,160] [INFO] [logging.py:96:log_dist] [Rank 0] step=290, skipped=5, lr=[4.945930297532058e-06, 4.945930297532058e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-04-24 15:57:39,409] [INFO] [timer.py:199:stop] epoch=0/micro_step=2320/global_step=290, RunningAvgSamplesPerSec=15.413885620714408, CurrSamplesPerSec=15.633471243803347, MemAllocated=20.44GB, MaxMemAllocated=31.45GB
[2023-04-24 15:57:39,620] [INFO] [logging.py:96:log_dist] [Rank 0] step=290, skipped=4, lr=[2.5417745537732524e-06, 2.5417745537732524e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 2319|ppo_ep: 1|act_loss: 0.09088134765625|cri_loss: 0.06622314453125|unsuper_loss: 0.0
average reward score: 2.70703125
-------------------------------------------------------------------------------------
|E2E latency=3.69s |Gather latency=0.00s (0.00%) |Generate time=2.46s (66.82%) |Training time=0.93s (25.32%) |Others=0.29 (7.86%)|CurSamplesPerSec=2.17 |AvgSamplesPerSec=2.38
epoch: 0|step: 2320|ppo_ep: 1|act_loss: 0.06207275390625|cri_loss: 0.040618896484375|unsuper_loss: 0.0
average reward score: 3.4296875
-------------------------------------------------------------------------------------
|E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.48s (74.74%) |Training time=0.64s (19.27%) |Others=0.20 (5.99%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.38
epoch: 0|step: 2321|ppo_ep: 1|act_loss: -0.0706787109375|cri_loss: -0.0270538330078125|unsuper_loss: 0.0
average reward score: 4.62890625
-------------------------------------------------------------------------------------
|E2E latency=3.35s |Gather latency=0.00s (0.00%) |Generate time=2.47s (73.61%) |Training time=0.64s (19.23%) |Others=0.24 (7.16%)|CurSamplesPerSec=2.39 |AvgSamplesPerSec=2.38
epoch: 0|step: 2322|ppo_ep: 1|act_loss: -0.136474609375|cri_loss: -0.0562744140625|unsuper_loss: 0.0
average reward score: 3.6171875
-------------------------------------------------------------------------------------
|E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.40%) |Training time=0.65s (19.55%) |Others=0.20 (6.05%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.38
epoch: 0|step: 2323|ppo_ep: 1|act_loss: -0.05572509765625|cri_loss: -0.02032470703125|unsuper_loss: 0.0
average reward score: 2.994140625
-------------------------------------------------------------------------------------
|E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.48s (74.84%) |Training time=0.64s (19.37%) |Others=0.19 (5.79%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.38
epoch: 0|step: 2324|ppo_ep: 1|act_loss: -0.139404296875|cri_loss: -0.062469482421875|unsuper_loss: 0.0 average reward score: 3.314453125 ------------------------------------------------------------------------------------- |E2E latency=3.45s |Gather latency=0.00s (0.00%) |Generate time=2.61s (75.62%) |Training time=0.64s (18.55%) |Others=0.20 (5.83%)|CurSamplesPerSec=2.32 |AvgSamplesPerSec=2.38 epoch: 0|step: 2325|ppo_ep: 1|act_loss: -0.042938232421875|cri_loss: 0.0093994140625|unsuper_loss: 0.0 average reward score: 3.732421875 ------------------------------------------------------------------------------------- |E2E latency=3.38s |Gather latency=0.00s (0.00%) |Generate time=2.47s (73.25%) |Training time=0.71s (20.95%) |Others=0.20 (5.80%)|CurSamplesPerSec=2.37 |AvgSamplesPerSec=2.38 epoch: 0|step: 2326|ppo_ep: 1|act_loss: -0.1121826171875|cri_loss: -0.04376220703125|unsuper_loss: 0.0 average reward score: 3.900390625 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.38s (73.14%) |Training time=0.65s (19.93%) |Others=0.23 (6.94%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.38 epoch: 0|step: 2327|ppo_ep: 1|act_loss: -0.0865478515625|cri_loss: -0.03271484375|unsuper_loss: 0.0 average reward score: 4.0546875 ------------------------------------------------------------------------------------- |E2E latency=3.70s |Gather latency=0.00s (0.00%) |Generate time=2.49s (67.31%) |Training time=0.92s (25.00%) |Others=0.28 (7.69%)|CurSamplesPerSec=2.16 |AvgSamplesPerSec=2.38 epoch: 0|step: 2328|ppo_ep: 1|act_loss: -0.04290771484375|cri_loss: -0.01788330078125|unsuper_loss: 0.0 average reward score: 3.79296875 ------------------------------------------------------------------------------------- |E2E latency=3.16s |Gather latency=0.00s (0.00%) |Generate time=2.32s (73.32%) |Training time=0.65s (20.58%) |Others=0.19 (6.10%)|CurSamplesPerSec=2.53 |AvgSamplesPerSec=2.38 epoch: 0|step: 
2329|ppo_ep: 1|act_loss: 0.07281494140625|cri_loss: 0.0404052734375|unsuper_loss: 0.0 average reward score: 3.87890625 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.38s (73.48%) |Training time=0.66s (20.44%) |Others=0.20 (6.08%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.38 epoch: 0|step: 2330|ppo_ep: 1|act_loss: 0.045135498046875|cri_loss: 0.02813720703125|unsuper_loss: 0.0 average reward score: 3.8203125 ------------------------------------------------------------------------------------- |E2E latency=3.33s |Gather latency=0.00s (0.00%) |Generate time=2.49s (74.77%) |Training time=0.64s (19.25%) |Others=0.20 (5.98%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.38 epoch: 0|step: 2331|ppo_ep: 1|act_loss: 0.08349609375|cri_loss: 0.0595703125|unsuper_loss: 0.0 average reward score: 4.15625 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.38%) |Training time=0.65s (19.61%) |Others=0.20 (6.01%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.38 epoch: 0|step: 2332|ppo_ep: 1|act_loss: 0.09039306640625|cri_loss: 0.05364990234375|unsuper_loss: 0.0 average reward score: 4.8515625 ------------------------------------------------------------------------------------- |E2E latency=3.36s |Gather latency=0.00s (0.00%) |Generate time=2.51s (74.79%) |Training time=0.65s (19.24%) |Others=0.20 (5.97%)|CurSamplesPerSec=2.38 |AvgSamplesPerSec=2.38 epoch: 0|step: 2333|ppo_ep: 1|act_loss: -0.01678466796875|cri_loss: -0.00356292724609375|unsuper_loss: 0.0 average reward score: 3.73828125 ------------------------------------------------------------------------------------- |E2E latency=3.42s |Gather latency=0.00s (0.00%) |Generate time=2.56s (74.92%) |Training time=0.66s (19.21%) |Others=0.20 (5.87%)|CurSamplesPerSec=2.34 |AvgSamplesPerSec=2.38 epoch: 0|step: 2334|ppo_ep: 1|act_loss: 
0.06640625|cri_loss: 0.039581298828125|unsuper_loss: 0.0 average reward score: 3.65234375 ------------------------------------------------------------------------------------- |E2E latency=3.36s |Gather latency=0.00s (0.00%) |Generate time=2.51s (74.69%) |Training time=0.65s (19.25%) |Others=0.20 (6.06%)|CurSamplesPerSec=2.38 |AvgSamplesPerSec=2.38 epoch: 0|step: 2335|ppo_ep: 1|act_loss: -0.05120849609375|cri_loss: -0.0200653076171875|unsuper_loss: 0.0 average reward score: 3.859375 ------------------------------------------------------------------------------------- |E2E latency=3.74s |Gather latency=0.00s (0.00%) |Generate time=2.51s (67.11%) |Training time=0.95s (25.31%) |Others=0.28 (7.58%)|CurSamplesPerSec=2.14 |AvgSamplesPerSec=2.38 epoch: 0|step: 2336|ppo_ep: 1|act_loss: 0.0001220703125|cri_loss: 0.00902557373046875|unsuper_loss: 0.0 average reward score: 3.259765625 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.33%) |Training time=0.64s (19.54%) |Others=0.20 (6.13%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.38 epoch: 0|step: 2337|ppo_ep: 1|act_loss: 0.10302734375|cri_loss: 0.063232421875|unsuper_loss: 0.0 average reward score: 4.3984375 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.41s (73.34%) |Training time=0.66s (20.06%) |Others=0.22 (6.61%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.38 epoch: 0|step: 2338|ppo_ep: 1|act_loss: 0.1494140625|cri_loss: 0.08111572265625|unsuper_loss: 0.0 average reward score: 4.1171875 ------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.35s (73.42%) |Training time=0.65s (20.17%) |Others=0.20 (6.41%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.38 epoch: 0|step: 2339|ppo_ep: 1|act_loss: 0.1268310546875|cri_loss: 
0.07025146484375|unsuper_loss: 0.0 average reward score: 4.15625 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.20%) |Training time=0.66s (19.83%) |Others=0.20 (5.97%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.38 epoch: 0|step: 2340|ppo_ep: 1|act_loss: 0.056396484375|cri_loss: 0.033050537109375|unsuper_loss: 0.0 average reward score: 3.2265625 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.37s (73.84%) |Training time=0.64s (19.99%) |Others=0.20 (6.17%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.38 epoch: 0|step: 2341|ppo_ep: 1|act_loss: 0.06024169921875|cri_loss: 0.03802490234375|unsuper_loss: 0.0 average reward score: 3.859375 ------------------------------------------------------------------------------------- |E2E latency=3.19s |Gather latency=0.00s (0.00%) |Generate time=2.35s (73.61%) |Training time=0.64s (20.13%) |Others=0.20 (6.26%)|CurSamplesPerSec=2.51 |AvgSamplesPerSec=2.38 epoch: 0|step: 2342|ppo_ep: 1|act_loss: -0.00032806396484375|cri_loss: 0.002819061279296875|unsuper_loss: 0.0 average reward score: 4.796875 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.01%) |Training time=0.65s (19.88%) |Others=0.20 (6.11%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.38 epoch: 0|step: 2343|ppo_ep: 1|act_loss: 0.01947021484375|cri_loss: 0.01250457763671875|unsuper_loss: 0.0 average reward score: 3.73046875 ------------------------------------------------------------------------------------- |E2E latency=3.61s |Gather latency=0.00s (0.00%) |Generate time=2.38s (65.76%) |Training time=0.94s (26.05%) |Others=0.30 (8.19%)|CurSamplesPerSec=2.21 |AvgSamplesPerSec=2.38 epoch: 0|step: 2344|ppo_ep: 1|act_loss: 0.107421875|cri_loss: 
0.06390380859375|unsuper_loss: 0.0 average reward score: 4.23828125 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.15%) |Training time=0.64s (19.78%) |Others=0.20 (6.07%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.38 epoch: 0|step: 2345|ppo_ep: 1|act_loss: 0.07861328125|cri_loss: 0.04473876953125|unsuper_loss: 0.0 average reward score: 3.25390625 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.38s (73.77%) |Training time=0.65s (20.11%) |Others=0.20 (6.12%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.38 epoch: 0|step: 2346|ppo_ep: 1|act_loss: 0.00095367431640625|cri_loss: 0.0069122314453125|unsuper_loss: 0.0 average reward score: 3.3828125 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.37s (73.77%) |Training time=0.64s (20.04%) |Others=0.20 (6.19%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.38 epoch: 0|step: 2347|ppo_ep: 1|act_loss: 0.00513458251953125|cri_loss: 0.00852203369140625|unsuper_loss: 0.0 average reward score: 3.7734375 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.09%) |Training time=0.65s (19.59%) |Others=0.21 (6.32%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.38 epoch: 0|step: 2348|ppo_ep: 1|act_loss: 0.13916015625|cri_loss: 0.075927734375|unsuper_loss: 0.0 average reward score: 3.09765625 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.40s (73.90%) |Training time=0.64s (19.83%) |Others=0.20 (6.26%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.38 epoch: 0|step: 2349|ppo_ep: 1|act_loss: 0.1429443359375|cri_loss: 
0.08184814453125|unsuper_loss: 0.0 average reward score: 3.4140625 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.36s (71.14%) |Training time=0.76s (22.90%) |Others=0.20 (5.96%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.38 epoch: 0|step: 2350|ppo_ep: 1|act_loss: 0.0577392578125|cri_loss: 0.037200927734375|unsuper_loss: 0.0 average reward score: 3.279296875 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.38s (72.72%) |Training time=0.69s (21.06%) |Others=0.20 (6.22%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.38 epoch: 0|step: 2351|ppo_ep: 1|act_loss: 0.1796875|cri_loss: 0.1043701171875|unsuper_loss: 0.0 average reward score: 3.8671875 ------------------------------------------------------------------------------------- |E2E latency=3.65s |Gather latency=0.00s (0.00%) |Generate time=2.37s (64.96%) |Training time=1.00s (27.38%) |Others=0.28 (7.66%)|CurSamplesPerSec=2.19 |AvgSamplesPerSec=2.38 epoch: 0|step: 2352|ppo_ep: 1|act_loss: -0.00714111328125|cri_loss: 0.00478363037109375|unsuper_loss: 0.0 average reward score: 3.412109375 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.37s (72.64%) |Training time=0.69s (21.27%) |Others=0.20 (6.08%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.38 epoch: 0|step: 2353|ppo_ep: 1|act_loss: 0.0858154296875|cri_loss: 0.048248291015625|unsuper_loss: 0.0 average reward score: 3.40625 ------------------------------------------------------------------------------------- |E2E latency=3.33s |Gather latency=0.00s (0.00%) |Generate time=2.38s (71.50%) |Training time=0.75s (22.51%) |Others=0.20 (5.99%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.38 epoch: 0|step: 2354|ppo_ep: 1|act_loss: 0.1575927734375|cri_loss: 
0.086181640625|unsuper_loss: 0.0 average reward score: 3.08203125 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.14%) |Training time=0.64s (19.60%) |Others=0.20 (6.26%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.38 epoch: 0|step: 2355|ppo_ep: 1|act_loss: 0.173095703125|cri_loss: 0.094482421875|unsuper_loss: 0.0 average reward score: 3.19140625 ------------------------------------------------------------------------------------- |E2E latency=3.18s |Gather latency=0.00s (0.00%) |Generate time=2.33s (73.27%) |Training time=0.65s (20.38%) |Others=0.20 (6.35%)|CurSamplesPerSec=2.52 |AvgSamplesPerSec=2.38 epoch: 0|step: 2356|ppo_ep: 1|act_loss: -0.049468994140625|cri_loss: -0.0181884765625|unsuper_loss: 0.0 average reward score: 4.25 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.07%) |Training time=0.64s (19.73%) |Others=0.20 (6.20%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.38 epoch: 0|step: 2357|ppo_ep: 1|act_loss: 0.013214111328125|cri_loss: 0.01357269287109375|unsuper_loss: 0.0 average reward score: 3.8046875 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.37s (73.12%) |Training time=0.67s (20.80%) |Others=0.20 (6.09%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.38 epoch: 0|step: 2358|ppo_ep: 1|act_loss: 0.12481689453125|cri_loss: 0.06878662109375|unsuper_loss: 0.0 average reward score: 4.125 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.39s (73.74%) |Training time=0.65s (19.94%) |Others=0.21 (6.33%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.38 epoch: 0|step: 2359|ppo_ep: 1|act_loss: -0.03973388671875|cri_loss: 
-0.0148468017578125|unsuper_loss: 0.0 average reward score: 4.578125 ------------------------------------------------------------------------------------- |E2E latency=3.60s |Gather latency=0.00s (0.00%) |Generate time=2.39s (66.32%) |Training time=0.93s (25.79%) |Others=0.28 (7.90%)|CurSamplesPerSec=2.22 |AvgSamplesPerSec=2.38 epoch: 0|step: 2360|ppo_ep: 1|act_loss: -0.0174560546875|cri_loss: -0.00287628173828125|unsuper_loss: 0.0 average reward score: 4.421875 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.39s (73.96%) |Training time=0.64s (19.96%) |Others=0.20 (6.08%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.38 epoch: 0|step: 2361|ppo_ep: 1|act_loss: 0.25146484375|cri_loss: 0.14404296875|unsuper_loss: 0.0 average reward score: 2.515625 ------------------------------------------------------------------------------------- |E2E latency=3.19s |Gather latency=0.00s (0.00%) |Generate time=2.35s (73.61%) |Training time=0.64s (20.12%) |Others=0.20 (6.27%)|CurSamplesPerSec=2.51 |AvgSamplesPerSec=2.38 epoch: 0|step: 2362|ppo_ep: 1|act_loss: 0.0421142578125|cri_loss: 0.0251007080078125|unsuper_loss: 0.0 average reward score: 4.23046875 ------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.33%) |Training time=0.64s (19.43%) |Others=0.21 (6.24%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.38 epoch: 0|step: 2363|ppo_ep: 1|act_loss: 0.1297607421875|cri_loss: 0.07147216796875|unsuper_loss: 0.0 average reward score: 4.4921875 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.21%) |Training time=0.65s (19.76%) |Others=0.20 (6.02%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.38 epoch: 0|step: 2364|ppo_ep: 1|act_loss: -0.068603515625|cri_loss: 
-0.02435302734375|unsuper_loss: 0.0 average reward score: 3.59375 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.38s (73.85%) |Training time=0.64s (19.95%) |Others=0.20 (6.20%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.38 epoch: 0|step: 2365|ppo_ep: 1|act_loss: 0.07879638671875|cri_loss: 0.047607421875|unsuper_loss: 0.0 average reward score: 3.51953125 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.00%) |Training time=0.65s (19.92%) |Others=0.20 (6.08%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.38 epoch: 0|step: 2366|ppo_ep: 1|act_loss: -0.0277252197265625|cri_loss: -0.007843017578125|unsuper_loss: 0.0 average reward score: 3.826171875 ------------------------------------------------------------------------------------- |E2E latency=3.15s |Gather latency=0.00s (0.00%) |Generate time=2.30s (72.97%) |Training time=0.65s (20.73%) |Others=0.20 (6.30%)|CurSamplesPerSec=2.54 |AvgSamplesPerSec=2.38 epoch: 0|step: 2367|ppo_ep: 1|act_loss: -0.004913330078125|cri_loss: 0.0026397705078125|unsuper_loss: 0.0 average reward score: 3.216796875 ------------------------------------------------------------------------------------- |E2E latency=3.62s |Gather latency=0.00s (0.00%) |Generate time=2.40s (66.42%) |Training time=0.93s (25.70%) |Others=0.29 (7.89%)|CurSamplesPerSec=2.21 |AvgSamplesPerSec=2.38 epoch: 0|step: 2368|ppo_ep: 1|act_loss: 0.033477783203125|cri_loss: 0.02667236328125|unsuper_loss: 0.0 average reward score: 3.17578125 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.45%) |Training time=0.65s (19.78%) |Others=0.19 (5.76%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.38 epoch: 0|step: 2369|ppo_ep: 1|act_loss: 0.0162506103515625|cri_loss: 
0.01399993896484375|unsuper_loss: 0.0 average reward score: 4.6953125 ------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.37s (73.89%) |Training time=0.64s (19.95%) |Others=0.20 (6.16%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.38 epoch: 0|step: 2370|ppo_ep: 1|act_loss: -0.01416015625|cri_loss: 0.002593994140625|unsuper_loss: 0.0 average reward score: 3.970703125 ------------------------------------------------------------------------------------- |E2E latency=3.33s |Gather latency=0.00s (0.00%) |Generate time=2.46s (73.74%) |Training time=0.68s (20.30%) |Others=0.20 (5.96%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.38 epoch: 0|step: 2371|ppo_ep: 1|act_loss: 0.09918212890625|cri_loss: 0.0726318359375|unsuper_loss: 0.0 average reward score: 3.513671875 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.37s (73.14%) |Training time=0.65s (20.18%) |Others=0.22 (6.68%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.38 epoch: 0|step: 2372|ppo_ep: 1|act_loss: 0.002269744873046875|cri_loss: 0.0098114013671875|unsuper_loss: 0.0 average reward score: 3.25 ------------------------------------------------------------------------------------- |E2E latency=3.39s |Gather latency=0.00s (0.00%) |Generate time=2.53s (74.67%) |Training time=0.65s (19.25%) |Others=0.21 (6.08%)|CurSamplesPerSec=2.36 |AvgSamplesPerSec=2.38 epoch: 0|step: 2373|ppo_ep: 1|act_loss: 0.006122589111328125|cri_loss: 0.011383056640625|unsuper_loss: 0.0 average reward score: 2.841796875 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.15%) |Training time=0.66s (19.90%) |Others=0.20 (5.94%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.38 epoch: 0|step: 2374|ppo_ep: 1|act_loss: 0.08245849609375|cri_loss: 
0.046905517578125|unsuper_loss: 0.0 average reward score: 3.95703125 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.39s (73.59%) |Training time=0.65s (20.03%) |Others=0.21 (6.38%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.38 epoch: 0|step: 2375|ppo_ep: 1|act_loss: -0.062103271484375|cri_loss: -0.019256591796875|unsuper_loss: 0.0 average reward score: 3.53515625 ------------------------------------------------------------------------------------- |E2E latency=3.60s |Gather latency=0.00s (0.00%) |Generate time=2.40s (66.50%) |Training time=0.93s (25.80%) |Others=0.28 (7.70%)|CurSamplesPerSec=2.22 |AvgSamplesPerSec=2.38 epoch: 0|step: 2376|ppo_ep: 1|act_loss: -0.09149169921875|cri_loss: -0.037994384765625|unsuper_loss: 0.0 average reward score: 3.490234375 ------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.36s (73.57%) |Training time=0.65s (20.14%) |Others=0.20 (6.29%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.38 epoch: 0|step: 2377|ppo_ep: 1|act_loss: 0.074951171875|cri_loss: 0.042236328125|unsuper_loss: 0.0 average reward score: 3.40625 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.42s (73.50%) |Training time=0.68s (20.62%) |Others=0.19 (5.88%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.38 epoch: 0|step: 2378|ppo_ep: 1|act_loss: -0.078125|cri_loss: -0.030029296875|unsuper_loss: 0.0 average reward score: 4.3359375 ------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.35s (73.65%) |Training time=0.64s (20.10%) |Others=0.20 (6.25%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.38 epoch: 0|step: 2379|ppo_ep: 1|act_loss: 0.03173828125|cri_loss: 
0.026580810546875|unsuper_loss: 0.0 average reward score: 3.46484375 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.37s (73.57%) |Training time=0.65s (20.16%) |Others=0.20 (6.27%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.38 epoch: 0|step: 2380|ppo_ep: 1|act_loss: -0.00022125244140625|cri_loss: 0.00665283203125|unsuper_loss: 0.0 average reward score: 4.0859375 ------------------------------------------------------------------------------------- |E2E latency=3.17s |Gather latency=0.00s (0.00%) |Generate time=2.33s (73.59%) |Training time=0.64s (20.28%) |Others=0.19 (6.13%)|CurSamplesPerSec=2.53 |AvgSamplesPerSec=2.38 epoch: 0|step: 2381|ppo_ep: 1|act_loss: 0.049163818359375|cri_loss: 0.03265380859375|unsuper_loss: 0.0 average reward score: 3.23046875 ------------------------------------------------------------------------------------- |E2E latency=3.16s |Gather latency=0.00s (0.00%) |Generate time=2.33s (73.61%) |Training time=0.64s (20.39%) |Others=0.19 (6.00%)|CurSamplesPerSec=2.53 |AvgSamplesPerSec=2.38 epoch: 0|step: 2382|ppo_ep: 1|act_loss: -0.000946044921875|cri_loss: 0.00391387939453125|unsuper_loss: 0.0 average reward score: 4.2265625 ------------------------------------------------------------------------------------- |E2E latency=3.16s |Gather latency=0.00s (0.00%) |Generate time=2.32s (73.63%) |Training time=0.64s (20.26%) |Others=0.19 (6.11%)|CurSamplesPerSec=2.53 |AvgSamplesPerSec=2.38 epoch: 0|step: 2383|ppo_ep: 1|act_loss: -0.0965576171875|cri_loss: -0.033355712890625|unsuper_loss: 0.0 average reward score: 3.720703125 ------------------------------------------------------------------------------------- |E2E latency=3.53s |Gather latency=0.00s (0.00%) |Generate time=2.33s (65.93%) |Training time=0.93s (26.28%) |Others=0.27 (7.79%)|CurSamplesPerSec=2.27 |AvgSamplesPerSec=2.38 epoch: 0|step: 2384|ppo_ep: 1|act_loss: -0.03692626953125|cri_loss: 
-0.0107574462890625|unsuper_loss: 0.0 average reward score: 3.59765625 ------------------------------------------------------------------------------------- |E2E latency=3.15s |Gather latency=0.00s (0.00%) |Generate time=2.33s (73.81%) |Training time=0.64s (20.18%) |Others=0.19 (6.01%)|CurSamplesPerSec=2.54 |AvgSamplesPerSec=2.38 epoch: 0|step: 2385|ppo_ep: 1|act_loss: 0.038726806640625|cri_loss: 0.024993896484375|unsuper_loss: 0.0 average reward score: 3.625 ------------------------------------------------------------------------------------- |E2E latency=3.19s |Gather latency=0.00s (0.00%) |Generate time=2.35s (73.66%) |Training time=0.64s (20.20%) |Others=0.20 (6.14%)|CurSamplesPerSec=2.51 |AvgSamplesPerSec=2.38 epoch: 0|step: 2386|ppo_ep: 1|act_loss: -0.0222625732421875|cri_loss: -0.0051727294921875|unsuper_loss: 0.0 average reward score: 3.701171875 ------------------------------------------------------------------------------------- |E2E latency=3.15s |Gather latency=0.00s (0.00%) |Generate time=2.32s (73.61%) |Training time=0.64s (20.25%) |Others=0.19 (6.14%)|CurSamplesPerSec=2.54 |AvgSamplesPerSec=2.38 epoch: 0|step: 2387|ppo_ep: 1|act_loss: -0.09185791015625|cri_loss: -0.03778076171875|unsuper_loss: 0.0 average reward score: 3.8046875 ------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.35s (73.59%) |Training time=0.65s (20.24%) |Others=0.20 (6.17%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.38 epoch: 0|step: 2388|ppo_ep: 1|act_loss: -0.0252685546875|cri_loss: -0.0019989013671875|unsuper_loss: 0.0 average reward score: 3.33203125 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.47s (74.69%) |Training time=0.64s (19.45%) |Others=0.19 (5.86%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.38 epoch: 0|step: 2389|ppo_ep: 1|act_loss: -0.08447265625|cri_loss: 
-0.03271484375|unsuper_loss: 0.0 average reward score: 4.140625 ------------------------------------------------------------------------------------- |E2E latency=3.36s |Gather latency=0.00s (0.00%) |Generate time=2.51s (74.81%) |Training time=0.65s (19.36%) |Others=0.20 (5.83%)|CurSamplesPerSec=2.38 |AvgSamplesPerSec=2.38 epoch: 0|step: 2390|ppo_ep: 1|act_loss: -0.0947265625|cri_loss: -0.0394287109375|unsuper_loss: 0.0 average reward score: 3.625 ------------------------------------------------------------------------------------- |E2E latency=3.39s |Gather latency=0.00s (0.00%) |Generate time=2.51s (74.15%) |Training time=0.66s (19.57%) |Others=0.21 (6.28%)|CurSamplesPerSec=2.36 |AvgSamplesPerSec=2.38 epoch: 0|step: 2391|ppo_ep: 1|act_loss: -0.0106048583984375|cri_loss: 0.0053863525390625|unsuper_loss: 0.0 average reward score: 3.208984375 ------------------------------------------------------------------------------------- |E2E latency=3.74s |Gather latency=0.00s (0.00%) |Generate time=2.48s (66.37%) |Training time=0.98s (26.13%) |Others=0.28 (7.51%)|CurSamplesPerSec=2.14 |AvgSamplesPerSec=2.38 epoch: 0|step: 2392|ppo_ep: 1|act_loss: -0.03228759765625|cri_loss: -0.0094451904296875|unsuper_loss: 0.0 average reward score: 3.49609375 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.42s (73.60%) |Training time=0.68s (20.64%) |Others=0.19 (5.76%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.38 epoch: 0|step: 2393|ppo_ep: 1|act_loss: -0.066650390625|cri_loss: -0.0233001708984375|unsuper_loss: 0.0 average reward score: 4.8359375 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.11%) |Training time=0.65s (19.67%) |Others=0.20 (6.21%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.38 epoch: 0|step: 2394|ppo_ep: 1|act_loss: 0.03338623046875|cri_loss: 
0.0243988037109375|unsuper_loss: 0.0 average reward score: 2.607421875 ------------------------------------------------------------------------------------- |E2E latency=3.33s |Gather latency=0.00s (0.00%) |Generate time=2.49s (74.72%) |Training time=0.64s (19.31%) |Others=0.20 (5.97%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.38 epoch: 0|step: 2395|ppo_ep: 1|act_loss: -0.0396728515625|cri_loss: -0.0161895751953125|unsuper_loss: 0.0 average reward score: 3.5703125 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.40%) |Training time=0.65s (19.66%) |Others=0.20 (5.95%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.38 epoch: 0|step: 2396|ppo_ep: 1|act_loss: 0.10247802734375|cri_loss: 0.05914306640625|unsuper_loss: 0.0 average reward score: 4.09375 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.21%) |Training time=0.65s (19.71%) |Others=0.20 (6.08%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.38 epoch: 0|step: 2397|ppo_ep: 1|act_loss: -0.026336669921875|cri_loss: -0.0001220703125|unsuper_loss: 0.0 average reward score: 3.453125 ------------------------------------------------------------------------------------- |E2E latency=3.35s |Gather latency=0.00s (0.00%) |Generate time=2.43s (72.37%) |Training time=0.71s (21.31%) |Others=0.21 (6.32%)|CurSamplesPerSec=2.38 |AvgSamplesPerSec=2.38 epoch: 0|step: 2398|ppo_ep: 1|act_loss: -0.09039306640625|cri_loss: -0.03509521484375|unsuper_loss: 0.0 average reward score: 4.015625 ------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.36s (73.72%) |Training time=0.65s (20.17%) |Others=0.20 (6.12%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.38 [2023-04-24 16:02:04,165] [INFO] [logging.py:96:log_dist] [Rank 0] step=300, 
skipped=5, lr=[4.542960617105609e-06, 4.542960617105609e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-04-24 16:02:04,409] [INFO] [timer.py:199:stop] epoch=0/micro_step=2400/global_step=300, RunningAvgSamplesPerSec=15.419514816878827, CurrSamplesPerSec=15.650022174738103, MemAllocated=20.44GB, MaxMemAllocated=31.45GB [2023-04-24 16:02:04,612] [INFO] [logging.py:96:log_dist] [Rank 0] step=300, skipped=4, lr=[2.333018415637196e-06, 2.333018415637196e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 2399|ppo_ep: 1|act_loss: 0.07501220703125|cri_loss: 0.0421142578125|unsuper_loss: 0.0 average reward score: 4.41015625 ------------------------------------------------------------------------------------- |E2E latency=3.55s |Gather latency=0.00s (0.00%) |Generate time=2.35s (66.28%) |Training time=0.92s (25.89%) |Others=0.28 (7.83%)|CurSamplesPerSec=2.25 |AvgSamplesPerSec=2.38 epoch: 0|step: 2400|ppo_ep: 1|act_loss: -0.042694091796875|cri_loss: -0.01454925537109375|unsuper_loss: 0.0 average reward score: 4.37109375 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.30%) |Training time=0.64s (19.70%) |Others=0.20 (6.00%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.38 epoch: 0|step: 2401|ppo_ep: 1|act_loss: -0.0086822509765625|cri_loss: 0.00124359130859375|unsuper_loss: 0.0 average reward score: 3.3046875 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.30%) |Training time=0.64s (19.65%) |Others=0.20 (6.05%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.38 epoch: 0|step: 2402|ppo_ep: 1|act_loss: 0.01332855224609375|cri_loss: 0.017303466796875|unsuper_loss: 0.0 average reward score: 3.734375 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.41%) |Training 
time=0.64s (19.71%) |Others=0.19 (5.89%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.38 epoch: 0|step: 2403|ppo_ep: 1|act_loss: -0.059814453125|cri_loss: -0.01715087890625|unsuper_loss: 0.0 average reward score: 4.234375 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.51%) |Training time=0.64s (19.54%) |Others=0.20 (5.95%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.38 epoch: 0|step: 2404|ppo_ep: 1|act_loss: 0.171630859375|cri_loss: 0.105224609375|unsuper_loss: 0.0 average reward score: 2.5234375 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.54%) |Training time=0.64s (19.61%) |Others=0.19 (5.85%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.38 epoch: 0|step: 2405|ppo_ep: 1|act_loss: 0.0660400390625|cri_loss: 0.0411376953125|unsuper_loss: 0.0 average reward score: 3.17578125 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.38s (74.06%) |Training time=0.64s (19.91%) |Others=0.19 (6.03%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.38 epoch: 0|step: 2406|ppo_ep: 1|act_loss: 0.006649017333984375|cri_loss: 0.0110626220703125|unsuper_loss: 0.0 average reward score: 3.86328125 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.05%) |Training time=0.64s (19.59%) |Others=0.21 (6.37%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.38 epoch: 0|step: 2407|ppo_ep: 1|act_loss: 0.0972900390625|cri_loss: 0.05645751953125|unsuper_loss: 0.0 average reward score: 2.548828125 ------------------------------------------------------------------------------------- |E2E latency=3.65s |Gather latency=0.00s (0.00%) |Generate time=2.44s (66.87%) |Training time=0.93s (25.56%) 
|Others=0.28 (7.58%)|CurSamplesPerSec=2.19 |AvgSamplesPerSec=2.38 epoch: 0|step: 2408|ppo_ep: 1|act_loss: -0.014373779296875|cri_loss: 0.0006256103515625|unsuper_loss: 0.0 average reward score: 3.939453125 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.41s (73.99%) |Training time=0.64s (19.60%) |Others=0.21 (6.41%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.38 epoch: 0|step: 2409|ppo_ep: 1|act_loss: 0.0291748046875|cri_loss: 0.019866943359375|unsuper_loss: 0.0 average reward score: 3.4453125 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.39s (73.75%) |Training time=0.65s (20.14%) |Others=0.20 (6.11%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.38 epoch: 0|step: 2410|ppo_ep: 1|act_loss: -0.0321044921875|cri_loss: -0.0074310302734375|unsuper_loss: 0.0 average reward score: 3.39453125 ------------------------------------------------------------------------------------- |E2E latency=3.33s |Gather latency=0.00s (0.00%) |Generate time=2.48s (74.61%) |Training time=0.65s (19.49%) |Others=0.20 (5.90%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.38 epoch: 0|step: 2411|ppo_ep: 1|act_loss: 0.0704345703125|cri_loss: 0.038604736328125|unsuper_loss: 0.0 average reward score: 3.234375 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.44%) |Training time=0.64s (19.70%) |Others=0.19 (5.86%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.38 epoch: 0|step: 2412|ppo_ep: 1|act_loss: 0.0121917724609375|cri_loss: 0.014007568359375|unsuper_loss: 0.0 average reward score: 3.099609375 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.10%) |Training time=0.64s (19.71%) 
|Others=0.20 (6.19%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.38 epoch: 0|step: 2413|ppo_ep: 1|act_loss: 0.026336669921875|cri_loss: 0.0243682861328125|unsuper_loss: 0.0 average reward score: 2.44140625 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.51%) |Training time=0.64s (19.54%) |Others=0.19 (5.94%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.38 epoch: 0|step: 2414|ppo_ep: 1|act_loss: 0.0654296875|cri_loss: 0.040283203125|unsuper_loss: 0.0 average reward score: 2.31640625 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.36%) |Training time=0.64s (19.53%) |Others=0.20 (6.10%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.38 epoch: 0|step: 2415|ppo_ep: 1|act_loss: 0.1072998046875|cri_loss: 0.0689697265625|unsuper_loss: 0.0 average reward score: 3.796875 ------------------------------------------------------------------------------------- |E2E latency=3.67s |Gather latency=0.00s (0.00%) |Generate time=2.46s (67.10%) |Training time=0.92s (25.23%) |Others=0.28 (7.67%)|CurSamplesPerSec=2.18 |AvgSamplesPerSec=2.38 epoch: 0|step: 2416|ppo_ep: 1|act_loss: 0.0814208984375|cri_loss: 0.048004150390625|unsuper_loss: 0.0 average reward score: 2.5078125 ------------------------------------------------------------------------------------- |E2E latency=3.15s |Gather latency=0.00s (0.00%) |Generate time=2.30s (73.14%) |Training time=0.65s (20.79%) |Others=0.19 (6.06%)|CurSamplesPerSec=2.54 |AvgSamplesPerSec=2.38 epoch: 0|step: 2417|ppo_ep: 1|act_loss: -0.010040283203125|cri_loss: 0.0|unsuper_loss: 0.0 average reward score: 3.67578125 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.05%) |Training time=0.65s (19.95%) |Others=0.20 
(6.01%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.38 epoch: 0|step: 2418|ppo_ep: 1|act_loss: 0.156494140625|cri_loss: 0.0941162109375|unsuper_loss: 0.0 average reward score: 1.572265625 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.08%) |Training time=0.64s (19.75%) |Others=0.20 (6.17%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.38 epoch: 0|step: 2419|ppo_ep: 1|act_loss: 0.01403045654296875|cri_loss: 0.01239013671875|unsuper_loss: 0.0 average reward score: 2.671875 ------------------------------------------------------------------------------------- |E2E latency=3.15s |Gather latency=0.00s (0.00%) |Generate time=2.31s (73.42%) |Training time=0.64s (20.37%) |Others=0.20 (6.20%)|CurSamplesPerSec=2.54 |AvgSamplesPerSec=2.38 epoch: 0|step: 2420|ppo_ep: 1|act_loss: 0.0137939453125|cri_loss: 0.01081085205078125|unsuper_loss: 0.0 average reward score: 2.74609375 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.69%) |Training time=0.64s (19.43%) |Others=0.19 (5.88%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.38 epoch: 0|step: 2421|ppo_ep: 1|act_loss: 0.03173828125|cri_loss: 0.0271453857421875|unsuper_loss: 0.0 average reward score: 3.525390625 ------------------------------------------------------------------------------------- |E2E latency=3.33s |Gather latency=0.00s (0.00%) |Generate time=2.50s (74.99%) |Training time=0.64s (19.18%) |Others=0.19 (5.83%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.38 epoch: 0|step: 2422|ppo_ep: 1|act_loss: -0.06280517578125|cri_loss: -0.02294921875|unsuper_loss: 0.0 average reward score: 2.54296875 ------------------------------------------------------------------------------------- |E2E latency=3.44s |Gather latency=0.00s (0.00%) |Generate time=2.47s (71.64%) |Training time=0.78s (22.73%) |Others=0.19 
(5.63%)|CurSamplesPerSec=2.32 |AvgSamplesPerSec=2.38 epoch: 0|step: 2423|ppo_ep: 1|act_loss: -0.0243072509765625|cri_loss: -0.0079803466796875|unsuper_loss: 0.0 average reward score: 3.22265625 ------------------------------------------------------------------------------------- |E2E latency=3.64s |Gather latency=0.00s (0.00%) |Generate time=2.42s (66.58%) |Training time=0.94s (25.71%) |Others=0.28 (7.71%)|CurSamplesPerSec=2.20 |AvgSamplesPerSec=2.38 epoch: 0|step: 2424|ppo_ep: 1|act_loss: 0.099853515625|cri_loss: 0.055389404296875|unsuper_loss: 0.0 average reward score: 3.99609375 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.40s (73.89%) |Training time=0.66s (20.20%) |Others=0.19 (5.91%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.38 epoch: 0|step: 2425|ppo_ep: 1|act_loss: 0.0247650146484375|cri_loss: 0.01727294921875|unsuper_loss: 0.0 average reward score: 2.8828125 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.19%) |Training time=0.64s (19.86%) |Others=0.19 (5.95%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.38 epoch: 0|step: 2426|ppo_ep: 1|act_loss: -0.02850341796875|cri_loss: 0.0027313232421875|unsuper_loss: 0.0 average reward score: 3.482421875 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.28%) |Training time=0.64s (19.70%) |Others=0.20 (6.01%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.38 epoch: 0|step: 2427|ppo_ep: 1|act_loss: 0.1190185546875|cri_loss: 0.06640625|unsuper_loss: 0.0 average reward score: 3.0546875 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.27%) |Training time=0.64s (19.78%) |Others=0.19 
(5.95%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.38 epoch: 0|step: 2428|ppo_ep: 1|act_loss: 0.041900634765625|cri_loss: 0.02655029296875|unsuper_loss: 0.0 average reward score: 4.1171875 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.29%) |Training time=0.64s (19.74%) |Others=0.19 (5.97%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.38 epoch: 0|step: 2429|ppo_ep: 1|act_loss: 0.01373291015625|cri_loss: 0.01230621337890625|unsuper_loss: 0.0 average reward score: 4.18359375 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.29%) |Training time=0.64s (19.78%) |Others=0.19 (5.94%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.38 epoch: 0|step: 2430|ppo_ep: 1|act_loss: 0.066650390625|cri_loss: 0.044891357421875|unsuper_loss: 0.0 average reward score: 3.064453125 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.19%) |Training time=0.64s (19.70%) |Others=0.20 (6.11%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.38 epoch: 0|step: 2431|ppo_ep: 1|act_loss: -0.0428466796875|cri_loss: -0.0110931396484375|unsuper_loss: 0.0 average reward score: 3.609375 ------------------------------------------------------------------------------------- |E2E latency=3.58s |Gather latency=0.00s (0.00%) |Generate time=2.38s (66.41%) |Training time=0.93s (25.87%) |Others=0.28 (7.72%)|CurSamplesPerSec=2.24 |AvgSamplesPerSec=2.38 epoch: 0|step: 2432|ppo_ep: 1|act_loss: -0.0189971923828125|cri_loss: -0.0045928955078125|unsuper_loss: 0.0 average reward score: 3.822265625 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.49%) |Training time=0.64s (19.63%) |Others=0.19 
(5.88%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.38 epoch: 0|step: 2433|ppo_ep: 1|act_loss: -0.0457763671875|cri_loss: -0.011505126953125|unsuper_loss: 0.0 average reward score: 3.38671875 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.34%) |Training time=0.64s (19.72%) |Others=0.19 (5.95%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.38 epoch: 0|step: 2434|ppo_ep: 1|act_loss: 0.1461181640625|cri_loss: 0.08380126953125|unsuper_loss: 0.0 average reward score: 2.4453125 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.36%) |Training time=0.64s (19.72%) |Others=0.19 (5.92%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.38 epoch: 0|step: 2435|ppo_ep: 1|act_loss: 0.07958984375|cri_loss: 0.0457763671875|unsuper_loss: 0.0 average reward score: 1.021484375 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.43%) |Training time=0.64s (19.70%) |Others=0.19 (5.87%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.38 epoch: 0|step: 2436|ppo_ep: 1|act_loss: 0.04815673828125|cri_loss: 0.031494140625|unsuper_loss: 0.0 average reward score: 3.615234375 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.19%) |Training time=0.65s (19.85%) |Others=0.19 (5.96%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.38 epoch: 0|step: 2437|ppo_ep: 1|act_loss: -0.014495849609375|cri_loss: -0.00069427490234375|unsuper_loss: 0.0 average reward score: 2.353515625 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.34%) |Training time=0.64s (19.81%) |Others=0.19 
(5.86%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.38 epoch: 0|step: 2438|ppo_ep: 1|act_loss: 0.078125|cri_loss: 0.045196533203125|unsuper_loss: 0.0 average reward score: 2.89453125 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.39s (73.99%) |Training time=0.64s (19.96%) |Others=0.19 (6.05%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.38 epoch: 0|step: 2439|ppo_ep: 1|act_loss: -0.010833740234375|cri_loss: 0.0082550048828125|unsuper_loss: 0.0 average reward score: 4.5 ------------------------------------------------------------------------------------- |E2E latency=3.66s |Gather latency=0.00s (0.00%) |Generate time=2.46s (67.12%) |Training time=0.93s (25.27%) |Others=0.28 (7.61%)|CurSamplesPerSec=2.18 |AvgSamplesPerSec=2.38 epoch: 0|step: 2440|ppo_ep: 1|act_loss: 0.04791259765625|cri_loss: 0.031158447265625|unsuper_loss: 0.0 average reward score: 3.09765625 ------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.60%) |Training time=0.64s (19.43%) |Others=0.20 (5.97%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.38 epoch: 0|step: 2441|ppo_ep: 1|act_loss: -0.032806396484375|cri_loss: -0.00969696044921875|unsuper_loss: 0.0 average reward score: 2.83984375 ------------------------------------------------------------------------------------- |E2E latency=3.35s |Gather latency=0.00s (0.00%) |Generate time=2.44s (72.81%) |Training time=0.72s (21.46%) |Others=0.19 (5.73%)|CurSamplesPerSec=2.39 |AvgSamplesPerSec=2.38 epoch: 0|step: 2442|ppo_ep: 1|act_loss: -0.012969970703125|cri_loss: -0.00286102294921875|unsuper_loss: 0.0 average reward score: 3.6171875 ------------------------------------------------------------------------------------- |E2E latency=3.34s |Gather latency=0.00s (0.00%) |Generate time=2.44s (72.98%) |Training time=0.70s (21.05%) |Others=0.20 
(5.96%)|CurSamplesPerSec=2.39 |AvgSamplesPerSec=2.38 epoch: 0|step: 2443|ppo_ep: 1|act_loss: 0.005092620849609375|cri_loss: 0.005924224853515625|unsuper_loss: 0.0 average reward score: 4.59375 ------------------------------------------------------------------------------------- |E2E latency=3.39s |Gather latency=0.00s (0.00%) |Generate time=2.49s (73.37%) |Training time=0.71s (20.87%) |Others=0.20 (5.76%)|CurSamplesPerSec=2.36 |AvgSamplesPerSec=2.38 epoch: 0|step: 2444|ppo_ep: 1|act_loss: 0.0220947265625|cri_loss: 0.019744873046875|unsuper_loss: 0.0 average reward score: 3.80859375 ------------------------------------------------------------------------------------- |E2E latency=4.27s |Gather latency=0.00s (0.00%) |Generate time=2.67s (62.61%) |Training time=1.37s (32.18%) |Others=0.22 (5.21%)|CurSamplesPerSec=1.88 |AvgSamplesPerSec=2.38 epoch: 0|step: 2445|ppo_ep: 1|act_loss: 0.2060546875|cri_loss: 0.1168212890625|unsuper_loss: 0.0 average reward score: 2.71875 ------------------------------------------------------------------------------------- |E2E latency=3.39s |Gather latency=0.00s (0.00%) |Generate time=2.54s (74.76%) |Training time=0.66s (19.42%) |Others=0.20 (5.82%)|CurSamplesPerSec=2.36 |AvgSamplesPerSec=2.38 epoch: 0|step: 2446|ppo_ep: 1|act_loss: -0.0040740966796875|cri_loss: 0.002712249755859375|unsuper_loss: 0.0 average reward score: 3.671875 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.47s (74.51%) |Training time=0.64s (19.41%) |Others=0.20 (6.08%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.38 epoch: 0|step: 2447|ppo_ep: 1|act_loss: 0.0076446533203125|cri_loss: 0.010040283203125|unsuper_loss: 0.0 average reward score: 3.26953125 ------------------------------------------------------------------------------------- |E2E latency=3.63s |Gather latency=0.00s (0.00%) |Generate time=2.42s (66.69%) |Training time=0.93s (25.52%) |Others=0.28 
(7.79%)|CurSamplesPerSec=2.20 |AvgSamplesPerSec=2.38 epoch: 0|step: 2448|ppo_ep: 1|act_loss: 0.032470703125|cri_loss: 0.02490234375|unsuper_loss: 0.0 average reward score: 2.666015625 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.25%) |Training time=0.64s (19.64%) |Others=0.20 (6.12%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.38 epoch: 0|step: 2449|ppo_ep: 1|act_loss: 0.032318115234375|cri_loss: 0.025360107421875|unsuper_loss: 0.0 average reward score: 3.7734375 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.47s (74.55%) |Training time=0.65s (19.56%) |Others=0.19 (5.89%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.38 epoch: 0|step: 2450|ppo_ep: 1|act_loss: 0.034454345703125|cri_loss: 0.023193359375|unsuper_loss: 0.0 average reward score: 3.921875 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.32%) |Training time=0.64s (19.60%) |Others=0.20 (6.09%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.38 epoch: 0|step: 2451|ppo_ep: 1|act_loss: -0.043975830078125|cri_loss: -0.01171875|unsuper_loss: 0.0 average reward score: 3.314453125 ------------------------------------------------------------------------------------- |E2E latency=3.35s |Gather latency=0.00s (0.00%) |Generate time=2.50s (74.77%) |Training time=0.64s (19.23%) |Others=0.20 (6.00%)|CurSamplesPerSec=2.39 |AvgSamplesPerSec=2.38 epoch: 0|step: 2452|ppo_ep: 1|act_loss: -0.09521484375|cri_loss: -0.0335693359375|unsuper_loss: 0.0 average reward score: 3.400390625 ------------------------------------------------------------------------------------- |E2E latency=3.40s |Gather latency=0.00s (0.00%) |Generate time=2.54s (74.84%) |Training time=0.65s (19.12%) |Others=0.21 
(6.03%)|CurSamplesPerSec=2.35 |AvgSamplesPerSec=2.38 epoch: 0|step: 2453|ppo_ep: 1|act_loss: 0.1336669921875|cri_loss: 0.0732421875|unsuper_loss: 0.0 average reward score: 1.6240234375 ------------------------------------------------------------------------------------- |E2E latency=3.36s |Gather latency=0.00s (0.00%) |Generate time=2.50s (74.44%) |Training time=0.64s (19.15%) |Others=0.21 (6.41%)|CurSamplesPerSec=2.38 |AvgSamplesPerSec=2.38 epoch: 0|step: 2454|ppo_ep: 1|act_loss: 0.04052734375|cri_loss: 0.030242919921875|unsuper_loss: 0.0 average reward score: 2.70703125 ------------------------------------------------------------------------------------- |E2E latency=3.37s |Gather latency=0.00s (0.00%) |Generate time=2.51s (74.27%) |Training time=0.64s (19.01%) |Others=0.23 (6.72%)|CurSamplesPerSec=2.37 |AvgSamplesPerSec=2.38 epoch: 0|step: 2455|ppo_ep: 1|act_loss: -0.130126953125|cri_loss: -0.052978515625|unsuper_loss: 0.0 average reward score: 4.6953125 ------------------------------------------------------------------------------------- |E2E latency=3.79s |Gather latency=0.00s (0.00%) |Generate time=2.57s (67.73%) |Training time=0.94s (24.77%) |Others=0.28 (7.50%)|CurSamplesPerSec=2.11 |AvgSamplesPerSec=2.38 epoch: 0|step: 2456|ppo_ep: 1|act_loss: -0.0496826171875|cri_loss: -0.017913818359375|unsuper_loss: 0.0 average reward score: 3.154296875 ------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.40%) |Training time=0.64s (19.44%) |Others=0.20 (6.16%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.38 epoch: 0|step: 2457|ppo_ep: 1|act_loss: 0.181640625|cri_loss: 0.10205078125|unsuper_loss: 0.0 average reward score: 3.576171875 ------------------------------------------------------------------------------------- |E2E latency=3.34s |Gather latency=0.00s (0.00%) |Generate time=2.48s (74.35%) |Training time=0.64s (19.25%) |Others=0.21 (6.41%)|CurSamplesPerSec=2.40 
|AvgSamplesPerSec=2.38 epoch: 0|step: 2458|ppo_ep: 1|act_loss: 0.061065673828125|cri_loss: 0.033935546875|unsuper_loss: 0.0 average reward score: 4.109375 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.27%) |Training time=0.64s (19.41%) |Others=0.21 (6.32%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.38 epoch: 0|step: 2459|ppo_ep: 1|act_loss: 0.063232421875|cri_loss: 0.04266357421875|unsuper_loss: 0.0 average reward score: 1.65625 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.32%) |Training time=0.64s (19.40%) |Others=0.21 (6.28%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.38 epoch: 0|step: 2460|ppo_ep: 1|act_loss: -0.051025390625|cri_loss: -0.019012451171875|unsuper_loss: 0.0 average reward score: 3.30078125 ------------------------------------------------------------------------------------- |E2E latency=3.33s |Gather latency=0.00s (0.00%) |Generate time=2.47s (74.20%) |Training time=0.65s (19.48%) |Others=0.21 (6.32%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.38 epoch: 0|step: 2461|ppo_ep: 1|act_loss: -0.051025390625|cri_loss: -0.008544921875|unsuper_loss: 0.0 average reward score: 3.03125 ------------------------------------------------------------------------------------- |E2E latency=3.36s |Gather latency=0.00s (0.00%) |Generate time=2.49s (74.00%) |Training time=0.66s (19.51%) |Others=0.22 (6.49%)|CurSamplesPerSec=2.38 |AvgSamplesPerSec=2.38 epoch: 0|step: 2462|ppo_ep: 1|act_loss: 0.03338623046875|cri_loss: 0.0254058837890625|unsuper_loss: 0.0 average reward score: 2.64453125 ------------------------------------------------------------------------------------- |E2E latency=3.62s |Gather latency=0.00s (0.00%) |Generate time=2.61s (72.09%) |Training time=0.80s (22.08%) |Others=0.21 (5.83%)|CurSamplesPerSec=2.21 |AvgSamplesPerSec=2.38 epoch: 
0|step: 2463|ppo_ep: 1|act_loss: 0.097412109375|cri_loss: 0.058074951171875|unsuper_loss: 0.0 average reward score: 2.828125 ------------------------------------------------------------------------------------- |E2E latency=3.68s |Gather latency=0.00s (0.00%) |Generate time=2.46s (66.82%) |Training time=0.94s (25.48%) |Others=0.28 (7.70%)|CurSamplesPerSec=2.17 |AvgSamplesPerSec=2.38 epoch: 0|step: 2464|ppo_ep: 1|act_loss: -0.0635986328125|cri_loss: -0.02685546875|unsuper_loss: 0.0 average reward score: 4.1171875 ------------------------------------------------------------------------------------- |E2E latency=3.35s |Gather latency=0.00s (0.00%) |Generate time=2.51s (74.92%) |Training time=0.64s (19.14%) |Others=0.20 (5.94%)|CurSamplesPerSec=2.39 |AvgSamplesPerSec=2.38 epoch: 0|step: 2465|ppo_ep: 1|act_loss: -0.0292816162109375|cri_loss: -0.0014495849609375|unsuper_loss: 0.0 average reward score: 2.939453125 ------------------------------------------------------------------------------------- |E2E latency=3.38s |Gather latency=0.00s (0.00%) |Generate time=2.53s (74.98%) |Training time=0.64s (19.09%) |Others=0.20 (5.93%)|CurSamplesPerSec=2.37 |AvgSamplesPerSec=2.38 epoch: 0|step: 2466|ppo_ep: 1|act_loss: -0.08441162109375|cri_loss: -0.0296783447265625|unsuper_loss: 0.0 average reward score: 3.7578125 ------------------------------------------------------------------------------------- |E2E latency=3.35s |Gather latency=0.00s (0.00%) |Generate time=2.50s (74.52%) |Training time=0.65s (19.34%) |Others=0.21 (6.14%)|CurSamplesPerSec=2.38 |AvgSamplesPerSec=2.38 epoch: 0|step: 2467|ppo_ep: 1|act_loss: -0.06671142578125|cri_loss: -0.025421142578125|unsuper_loss: 0.0 average reward score: 3.44140625 ------------------------------------------------------------------------------------- |E2E latency=3.36s |Gather latency=0.00s (0.00%) |Generate time=2.51s (74.79%) |Training time=0.65s (19.33%) |Others=0.20 (5.88%)|CurSamplesPerSec=2.38 |AvgSamplesPerSec=2.38 epoch: 0|step: 
2468|ppo_ep: 1|act_loss: -0.06243896484375|cri_loss: -0.0204010009765625|unsuper_loss: 0.0 average reward score: 3.6796875 ------------------------------------------------------------------------------------- |E2E latency=3.33s |Gather latency=0.00s (0.00%) |Generate time=2.47s (74.20%) |Training time=0.66s (19.85%) |Others=0.20 (5.94%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.38 epoch: 0|step: 2469|ppo_ep: 1|act_loss: -0.04705810546875|cri_loss: -0.0159912109375|unsuper_loss: 0.0 average reward score: 2.81640625 ------------------------------------------------------------------------------------- |E2E latency=3.33s |Gather latency=0.00s (0.00%) |Generate time=2.49s (74.87%) |Training time=0.64s (19.28%) |Others=0.19 (5.85%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.38 epoch: 0|step: 2470|ppo_ep: 1|act_loss: -0.046295166015625|cri_loss: -0.0161895751953125|unsuper_loss: 0.0 average reward score: 4.078125 ------------------------------------------------------------------------------------- |E2E latency=3.37s |Gather latency=0.00s (0.00%) |Generate time=2.52s (74.74%) |Training time=0.65s (19.26%) |Others=0.20 (6.01%)|CurSamplesPerSec=2.37 |AvgSamplesPerSec=2.38 epoch: 0|step: 2471|ppo_ep: 1|act_loss: -0.068115234375|cri_loss: -0.02239990234375|unsuper_loss: 0.0 average reward score: 2.96484375 ------------------------------------------------------------------------------------- |E2E latency=3.76s |Gather latency=0.00s (0.00%) |Generate time=2.55s (67.63%) |Training time=0.93s (24.79%) |Others=0.29 (7.58%)|CurSamplesPerSec=2.13 |AvgSamplesPerSec=2.38 epoch: 0|step: 2472|ppo_ep: 1|act_loss: -0.09228515625|cri_loss: -0.03826904296875|unsuper_loss: 0.0 average reward score: 3.728515625 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.21%) |Training time=0.65s (19.69%) |Others=0.20 (6.10%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.38 epoch: 0|step: 2473|ppo_ep: 
1|act_loss: -0.04376220703125|cri_loss: -0.014556884765625|unsuper_loss: 0.0 average reward score: 2.7109375 ------------------------------------------------------------------------------------- |E2E latency=3.38s |Gather latency=0.00s (0.00%) |Generate time=2.51s (74.02%) |Training time=0.68s (20.12%) |Others=0.20 (5.87%)|CurSamplesPerSec=2.36 |AvgSamplesPerSec=2.38 epoch: 0|step: 2474|ppo_ep: 1|act_loss: -0.1177978515625|cri_loss: -0.050628662109375|unsuper_loss: 0.0 average reward score: 2.974609375 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.36s (72.81%) |Training time=0.68s (20.99%) |Others=0.20 (6.20%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.38 epoch: 0|step: 2475|ppo_ep: 1|act_loss: -0.0189666748046875|cri_loss: -0.002532958984375|unsuper_loss: 0.0 average reward score: 3.31640625 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.01%) |Training time=0.64s (19.80%) |Others=0.20 (6.19%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.38 epoch: 0|step: 2476|ppo_ep: 1|act_loss: -0.100830078125|cri_loss: -0.03839111328125|unsuper_loss: 0.0 average reward score: 3.4296875 ------------------------------------------------------------------------------------- |E2E latency=3.19s |Gather latency=0.00s (0.00%) |Generate time=2.35s (73.80%) |Training time=0.64s (20.11%) |Others=0.19 (6.09%)|CurSamplesPerSec=2.51 |AvgSamplesPerSec=2.38 epoch: 0|step: 2477|ppo_ep: 1|act_loss: -0.0545654296875|cri_loss: -0.0160369873046875|unsuper_loss: 0.0 average reward score: 3.07421875 ------------------------------------------------------------------------------------- |E2E latency=3.16s |Gather latency=0.00s (0.00%) |Generate time=2.33s (73.68%) |Training time=0.64s (20.20%) |Others=0.19 (6.13%)|CurSamplesPerSec=2.53 |AvgSamplesPerSec=2.38 epoch: 0|step: 2478|ppo_ep: 
1|act_loss: -0.0183563232421875|cri_loss: -0.0028076171875|unsuper_loss: 0.0 average reward score: 4.7109375 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.39s (73.90%) |Training time=0.64s (19.85%) |Others=0.20 (6.25%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.38 [2023-04-24 16:06:32,377] [INFO] [logging.py:96:log_dist] [Rank 0] step=310, skipped=5, lr=[4.141958737521091e-06, 4.141958737521091e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-04-24 16:06:32,562] [INFO] [timer.py:199:stop] epoch=0/micro_step=2480/global_step=310, RunningAvgSamplesPerSec=15.416504366898998, CurrSamplesPerSec=15.390142826199552, MemAllocated=20.44GB, MaxMemAllocated=31.45GB [2023-04-24 16:06:32,788] [INFO] [logging.py:96:log_dist] [Rank 0] step=310, skipped=4, lr=[2.1254273151597967e-06, 2.1254273151597967e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 2479|ppo_ep: 1|act_loss: -0.03515625|cri_loss: -0.01092529296875|unsuper_loss: 0.0 average reward score: 3.5625 ------------------------------------------------------------------------------------- |E2E latency=3.65s |Gather latency=0.00s (0.00%) |Generate time=2.33s (63.87%) |Training time=1.02s (27.86%) |Others=0.30 (8.27%)|CurSamplesPerSec=2.19 |AvgSamplesPerSec=2.38 epoch: 0|step: 2480|ppo_ep: 1|act_loss: -0.045440673828125|cri_loss: -0.0194854736328125|unsuper_loss: 0.0 average reward score: 3.9765625 ------------------------------------------------------------------------------------- |E2E latency=3.35s |Gather latency=0.00s (0.00%) |Generate time=2.51s (74.93%) |Training time=0.64s (19.09%) |Others=0.20 (5.98%)|CurSamplesPerSec=2.39 |AvgSamplesPerSec=2.38 epoch: 0|step: 2481|ppo_ep: 1|act_loss: 0.04351806640625|cri_loss: 0.036956787109375|unsuper_loss: 0.0 average reward score: 3.654296875 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) 
|Generate time=2.39s (73.93%) |Training time=0.64s (19.85%) |Others=0.20 (6.22%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.38 epoch: 0|step: 2482|ppo_ep: 1|act_loss: -0.0823974609375|cri_loss: -0.03533935546875|unsuper_loss: 0.0 average reward score: 4.03125 ------------------------------------------------------------------------------------- |E2E latency=3.39s |Gather latency=0.00s (0.00%) |Generate time=2.54s (74.73%) |Training time=0.65s (19.22%) |Others=0.21 (6.06%)|CurSamplesPerSec=2.36 |AvgSamplesPerSec=2.38 epoch: 0|step: 2483|ppo_ep: 1|act_loss: -0.14892578125|cri_loss: -0.056640625|unsuper_loss: 0.0 average reward score: 2.66796875 ------------------------------------------------------------------------------------- |E2E latency=4.10s |Gather latency=0.00s (0.00%) |Generate time=3.22s (78.52%) |Training time=0.66s (16.13%) |Others=0.22 (5.35%)|CurSamplesPerSec=1.95 |AvgSamplesPerSec=2.38 epoch: 0|step: 2484|ppo_ep: 1|act_loss: 0.00604248046875|cri_loss: 0.0119476318359375|unsuper_loss: 0.0 average reward score: 2.6796875 ------------------------------------------------------------------------------------- |E2E latency=3.74s |Gather latency=0.00s (0.00%) |Generate time=2.84s (75.79%) |Training time=0.66s (17.71%) |Others=0.24 (6.50%)|CurSamplesPerSec=2.14 |AvgSamplesPerSec=2.38 epoch: 0|step: 2485|ppo_ep: 1|act_loss: 0.0164794921875|cri_loss: 0.021942138671875|unsuper_loss: 0.0 average reward score: 4.0078125 ------------------------------------------------------------------------------------- |E2E latency=3.66s |Gather latency=0.00s (0.00%) |Generate time=2.59s (70.73%) |Training time=0.87s (23.75%) |Others=0.20 (5.52%)|CurSamplesPerSec=2.18 |AvgSamplesPerSec=2.38 epoch: 0|step: 2486|ppo_ep: 1|act_loss: 0.10400390625|cri_loss: 0.057891845703125|unsuper_loss: 0.0 average reward score: 3.087890625 ------------------------------------------------------------------------------------- |E2E latency=3.71s |Gather latency=0.00s (0.00%) |Generate time=2.54s 
(68.35%) |Training time=0.95s (25.54%) |Others=0.23 (6.11%)|CurSamplesPerSec=2.16 |AvgSamplesPerSec=2.38 epoch: 0|step: 2487|ppo_ep: 1|act_loss: -0.0006561279296875|cri_loss: 0.006916046142578125|unsuper_loss: 0.0 average reward score: 2.560546875 ------------------------------------------------------------------------------------- |E2E latency=4.17s |Gather latency=0.00s (0.00%) |Generate time=2.93s (70.27%) |Training time=0.93s (22.29%) |Others=0.31 (7.44%)|CurSamplesPerSec=1.92 |AvgSamplesPerSec=2.38 epoch: 0|step: 2488|ppo_ep: 1|act_loss: 0.04736328125|cri_loss: 0.0282135009765625|unsuper_loss: 0.0 average reward score: 4.421875 ------------------------------------------------------------------------------------- |E2E latency=3.89s |Gather latency=0.00s (0.00%) |Generate time=3.02s (77.61%) |Training time=0.65s (16.68%) |Others=0.22 (5.72%)|CurSamplesPerSec=2.05 |AvgSamplesPerSec=2.38 epoch: 0|step: 2489|ppo_ep: 1|act_loss: -0.11676025390625|cri_loss: -0.0472412109375|unsuper_loss: 0.0 average reward score: 2.7734375 ------------------------------------------------------------------------------------- |E2E latency=3.60s |Gather latency=0.00s (0.00%) |Generate time=2.73s (75.83%) |Training time=0.65s (18.02%) |Others=0.22 (6.15%)|CurSamplesPerSec=2.22 |AvgSamplesPerSec=2.38 epoch: 0|step: 2490|ppo_ep: 1|act_loss: -0.038848876953125|cri_loss: -0.0126800537109375|unsuper_loss: 0.0 average reward score: 2.7734375 ------------------------------------------------------------------------------------- |E2E latency=3.43s |Gather latency=0.00s (0.00%) |Generate time=2.55s (74.47%) |Training time=0.64s (18.73%) |Others=0.23 (6.80%)|CurSamplesPerSec=2.33 |AvgSamplesPerSec=2.38 epoch: 0|step: 2491|ppo_ep: 1|act_loss: -0.10107421875|cri_loss: -0.039520263671875|unsuper_loss: 0.0 average reward score: 3.64453125 ------------------------------------------------------------------------------------- |E2E latency=3.78s |Gather latency=0.00s (0.00%) |Generate time=2.46s (65.11%) 
|Training time=1.09s (28.84%) |Others=0.23 (6.05%)|CurSamplesPerSec=2.12 |AvgSamplesPerSec=2.38 epoch: 0|step: 2492|ppo_ep: 1|act_loss: 0.03143310546875|cri_loss: 0.0195159912109375|unsuper_loss: 0.0 average reward score: 3.654296875 ------------------------------------------------------------------------------------- |E2E latency=3.38s |Gather latency=0.00s (0.00%) |Generate time=2.46s (72.76%) |Training time=0.67s (19.94%) |Others=0.25 (7.30%)|CurSamplesPerSec=2.37 |AvgSamplesPerSec=2.38 epoch: 0|step: 2493|ppo_ep: 1|act_loss: -0.056671142578125|cri_loss: -0.021087646484375|unsuper_loss: 0.0 average reward score: 2.853515625 ------------------------------------------------------------------------------------- |E2E latency=3.95s |Gather latency=0.00s (0.00%) |Generate time=2.87s (72.77%) |Training time=0.86s (21.77%) |Others=0.22 (5.46%)|CurSamplesPerSec=2.03 |AvgSamplesPerSec=2.38 epoch: 0|step: 2494|ppo_ep: 1|act_loss: 0.053466796875|cri_loss: 0.03302001953125|unsuper_loss: 0.0 average reward score: 3.7109375 ------------------------------------------------------------------------------------- |E2E latency=3.55s |Gather latency=0.00s (0.00%) |Generate time=2.67s (75.35%) |Training time=0.64s (18.15%) |Others=0.23 (6.50%)|CurSamplesPerSec=2.25 |AvgSamplesPerSec=2.38 epoch: 0|step: 2495|ppo_ep: 1|act_loss: 0.1551513671875|cri_loss: 0.09100341796875|unsuper_loss: 0.0 average reward score: 2.83203125 ------------------------------------------------------------------------------------- |E2E latency=4.13s |Gather latency=0.00s (0.00%) |Generate time=2.54s (61.41%) |Training time=1.26s (30.62%) |Others=0.33 (7.97%)|CurSamplesPerSec=1.94 |AvgSamplesPerSec=2.38 epoch: 0|step: 2496|ppo_ep: 1|act_loss: 0.1053466796875|cri_loss: 0.06103515625|unsuper_loss: 0.0 average reward score: 3.125 ------------------------------------------------------------------------------------- |E2E latency=3.70s |Gather latency=0.00s (0.00%) |Generate time=2.83s (76.41%) |Training time=0.66s 
(17.75%) |Others=0.22 (5.85%)|CurSamplesPerSec=2.16 |AvgSamplesPerSec=2.38 epoch: 0|step: 2497|ppo_ep: 1|act_loss: 0.06146240234375|cri_loss: 0.037750244140625|unsuper_loss: 0.0 average reward score: 2.72265625 ------------------------------------------------------------------------------------- |E2E latency=3.70s |Gather latency=0.00s (0.00%) |Generate time=2.82s (76.28%) |Training time=0.64s (17.29%) |Others=0.24 (6.43%)|CurSamplesPerSec=2.16 |AvgSamplesPerSec=2.38 epoch: 0|step: 2498|ppo_ep: 1|act_loss: -0.0750732421875|cri_loss: -0.02996826171875|unsuper_loss: 0.0 average reward score: 4.3125 ------------------------------------------------------------------------------------- |E2E latency=3.58s |Gather latency=0.00s (0.00%) |Generate time=2.72s (75.94%) |Training time=0.64s (18.02%) |Others=0.22 (6.04%)|CurSamplesPerSec=2.24 |AvgSamplesPerSec=2.38 epoch: 0|step: 2499|ppo_ep: 1|act_loss: 0.047210693359375|cri_loss: 0.033416748046875|unsuper_loss: 0.0 average reward score: 4.5 ------------------------------------------------------------------------------------- |E2E latency=3.81s |Gather latency=0.00s (0.00%) |Generate time=2.92s (76.63%) |Training time=0.68s (17.86%) |Others=0.21 (5.50%)|CurSamplesPerSec=2.10 |AvgSamplesPerSec=2.38 epoch: 0|step: 2500|ppo_ep: 1|act_loss: 0.0545654296875|cri_loss: 0.03375244140625|unsuper_loss: 0.0 average reward score: 3.626953125 ------------------------------------------------------------------------------------- |E2E latency=3.60s |Gather latency=0.00s (0.00%) |Generate time=2.73s (75.87%) |Training time=0.64s (17.89%) |Others=0.22 (6.24%)|CurSamplesPerSec=2.22 |AvgSamplesPerSec=2.38 epoch: 0|step: 2501|ppo_ep: 1|act_loss: -0.04315185546875|cri_loss: -0.015228271484375|unsuper_loss: 0.0 average reward score: 3.63671875 ------------------------------------------------------------------------------------- |E2E latency=3.77s |Gather latency=0.00s (0.00%) |Generate time=2.91s (77.06%) |Training time=0.64s (17.05%) |Others=0.22 
(5.89%)|CurSamplesPerSec=2.12 |AvgSamplesPerSec=2.38 epoch: 0|step: 2502|ppo_ep: 1|act_loss: 0.02203369140625|cri_loss: 0.0163421630859375|unsuper_loss: 0.0 average reward score: 4.15625 ------------------------------------------------------------------------------------- |E2E latency=3.71s |Gather latency=0.00s (0.00%) |Generate time=2.86s (77.00%) |Training time=0.65s (17.39%) |Others=0.21 (5.60%)|CurSamplesPerSec=2.15 |AvgSamplesPerSec=2.38 epoch: 0|step: 2503|ppo_ep: 1|act_loss: 0.058135986328125|cri_loss: 0.0338134765625|unsuper_loss: 0.0 average reward score: 3.083984375 ------------------------------------------------------------------------------------- |E2E latency=3.99s |Gather latency=0.00s (0.00%) |Generate time=2.75s (69.05%) |Training time=0.94s (23.64%) |Others=0.29 (7.31%)|CurSamplesPerSec=2.01 |AvgSamplesPerSec=2.38 epoch: 0|step: 2504|ppo_ep: 1|act_loss: 0.057861328125|cri_loss: 0.0340576171875|unsuper_loss: 0.0 average reward score: 3.8125 ------------------------------------------------------------------------------------- |E2E latency=3.61s |Gather latency=0.00s (0.00%) |Generate time=2.77s (76.85%) |Training time=0.64s (17.86%) |Others=0.19 (5.29%)|CurSamplesPerSec=2.22 |AvgSamplesPerSec=2.38 epoch: 0|step: 2505|ppo_ep: 1|act_loss: 0.1201171875|cri_loss: 0.0704345703125|unsuper_loss: 0.0 average reward score: 2.716796875 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.23%) |Training time=0.64s (19.45%) |Others=0.21 (6.32%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.38 epoch: 0|step: 2506|ppo_ep: 1|act_loss: 0.068115234375|cri_loss: 0.041748046875|unsuper_loss: 0.0 average reward score: 4.0625 ------------------------------------------------------------------------------------- |E2E latency=3.72s |Gather latency=0.00s (0.00%) |Generate time=2.84s (76.29%) |Training time=0.67s (17.94%) |Others=0.21 (5.77%)|CurSamplesPerSec=2.15 
|AvgSamplesPerSec=2.38 epoch: 0|step: 2507|ppo_ep: 1|act_loss: 0.03857421875|cri_loss: 0.0245208740234375|unsuper_loss: 0.0 average reward score: 3.62890625 ------------------------------------------------------------------------------------- |E2E latency=3.85s |Gather latency=0.00s (0.00%) |Generate time=2.99s (77.59%) |Training time=0.66s (17.04%) |Others=0.21 (5.38%)|CurSamplesPerSec=2.08 |AvgSamplesPerSec=2.38 epoch: 0|step: 2508|ppo_ep: 1|act_loss: 0.130615234375|cri_loss: 0.081787109375|unsuper_loss: 0.0 average reward score: 2.17578125 ------------------------------------------------------------------------------------- |E2E latency=3.55s |Gather latency=0.00s (0.00%) |Generate time=2.66s (74.84%) |Training time=0.67s (18.75%) |Others=0.23 (6.41%)|CurSamplesPerSec=2.25 |AvgSamplesPerSec=2.38 epoch: 0|step: 2509|ppo_ep: 1|act_loss: -0.05828857421875|cri_loss: -0.02069091796875|unsuper_loss: 0.0 average reward score: 4.53125 ------------------------------------------------------------------------------------- |E2E latency=3.53s |Gather latency=0.00s (0.00%) |Generate time=2.65s (75.12%) |Training time=0.65s (18.39%) |Others=0.23 (6.49%)|CurSamplesPerSec=2.26 |AvgSamplesPerSec=2.38 epoch: 0|step: 2510|ppo_ep: 1|act_loss: 0.01800537109375|cri_loss: 0.0115203857421875|unsuper_loss: 0.0 average reward score: 4.09375 ------------------------------------------------------------------------------------- |E2E latency=3.67s |Gather latency=0.00s (0.00%) |Generate time=2.79s (75.81%) |Training time=0.66s (17.95%) |Others=0.23 (6.24%)|CurSamplesPerSec=2.18 |AvgSamplesPerSec=2.38 epoch: 0|step: 2511|ppo_ep: 1|act_loss: 0.164794921875|cri_loss: 0.093017578125|unsuper_loss: 0.0 average reward score: 2.794921875 ------------------------------------------------------------------------------------- |E2E latency=4.06s |Gather latency=0.00s (0.00%) |Generate time=2.51s (61.78%) |Training time=1.22s (30.09%) |Others=0.33 (8.13%)|CurSamplesPerSec=1.97 |AvgSamplesPerSec=2.38 epoch: 
0|step: 2512|ppo_ep: 1|act_loss: -0.032867431640625|cri_loss: -0.010467529296875|unsuper_loss: 0.0 average reward score: 4.55859375 ------------------------------------------------------------------------------------- |E2E latency=3.76s |Gather latency=0.00s (0.00%) |Generate time=2.92s (77.61%) |Training time=0.63s (16.84%) |Others=0.21 (5.55%)|CurSamplesPerSec=2.13 |AvgSamplesPerSec=2.38 epoch: 0|step: 2513|ppo_ep: 1|act_loss: -0.05853271484375|cri_loss: -0.026031494140625|unsuper_loss: 0.0 average reward score: 4.0859375 ------------------------------------------------------------------------------------- |E2E latency=3.87s |Gather latency=0.00s (0.00%) |Generate time=2.92s (75.55%) |Training time=0.72s (18.65%) |Others=0.22 (5.80%)|CurSamplesPerSec=2.07 |AvgSamplesPerSec=2.38 epoch: 0|step: 2514|ppo_ep: 1|act_loss: 0.07958984375|cri_loss: 0.052978515625|unsuper_loss: 0.0 average reward score: 3.03515625 ------------------------------------------------------------------------------------- |E2E latency=3.59s |Gather latency=0.00s (0.00%) |Generate time=2.72s (75.74%) |Training time=0.65s (18.12%) |Others=0.22 (6.14%)|CurSamplesPerSec=2.23 |AvgSamplesPerSec=2.38 epoch: 0|step: 2515|ppo_ep: 1|act_loss: 0.0096588134765625|cri_loss: 0.01268768310546875|unsuper_loss: 0.0 average reward score: 3.8671875 ------------------------------------------------------------------------------------- |E2E latency=3.63s |Gather latency=0.00s (0.00%) |Generate time=2.75s (75.75%) |Training time=0.67s (18.30%) |Others=0.22 (5.95%)|CurSamplesPerSec=2.20 |AvgSamplesPerSec=2.38 epoch: 0|step: 2516|ppo_ep: 1|act_loss: 0.18798828125|cri_loss: 0.1068115234375|unsuper_loss: 0.0 average reward score: 3.91796875 ------------------------------------------------------------------------------------- |E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.51s (71.67%) |Training time=0.74s (21.00%) |Others=0.26 (7.33%)|CurSamplesPerSec=2.28 |AvgSamplesPerSec=2.38 epoch: 0|step: 
2517|ppo_ep: 1|act_loss: -0.020111083984375|cri_loss: 0.00213623046875|unsuper_loss: 0.0 average reward score: 3.88671875 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.42s (73.70%) |Training time=0.64s (19.50%) |Others=0.22 (6.80%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.38 epoch: 0|step: 2518|ppo_ep: 1|act_loss: 0.03131103515625|cri_loss: 0.020355224609375|unsuper_loss: 0.0 average reward score: 4.1875 ------------------------------------------------------------------------------------- |E2E latency=3.46s |Gather latency=0.00s (0.00%) |Generate time=2.44s (70.51%) |Training time=0.78s (22.53%) |Others=0.24 (6.96%)|CurSamplesPerSec=2.31 |AvgSamplesPerSec=2.38 epoch: 0|step: 2519|ppo_ep: 1|act_loss: -0.0169830322265625|cri_loss: -0.00174713134765625|unsuper_loss: 0.0 average reward score: 4.24609375 ------------------------------------------------------------------------------------- |E2E latency=4.13s |Gather latency=0.00s (0.00%) |Generate time=2.89s (69.98%) |Training time=0.93s (22.59%) |Others=0.31 (7.43%)|CurSamplesPerSec=1.94 |AvgSamplesPerSec=2.38 epoch: 0|step: 2520|ppo_ep: 1|act_loss: 0.13232421875|cri_loss: 0.07147216796875|unsuper_loss: 0.0 average reward score: 3.60546875 ------------------------------------------------------------------------------------- |E2E latency=3.80s |Gather latency=0.00s (0.00%) |Generate time=2.92s (76.91%) |Training time=0.65s (17.03%) |Others=0.23 (6.06%)|CurSamplesPerSec=2.11 |AvgSamplesPerSec=2.38 epoch: 0|step: 2521|ppo_ep: 1|act_loss: 0.048187255859375|cri_loss: 0.028717041015625|unsuper_loss: 0.0 average reward score: 3.966796875 ------------------------------------------------------------------------------------- |E2E latency=3.76s |Gather latency=0.00s (0.00%) |Generate time=2.91s (77.26%) |Training time=0.65s (17.19%) |Others=0.21 (5.54%)|CurSamplesPerSec=2.13 |AvgSamplesPerSec=2.38 epoch: 0|step: 2522|ppo_ep: 
1|act_loss: -0.0518798828125|cri_loss: -0.012847900390625|unsuper_loss: 0.0 average reward score: 3.4375 ------------------------------------------------------------------------------------- |E2E latency=3.67s |Gather latency=0.00s (0.00%) |Generate time=2.81s (76.41%) |Training time=0.65s (17.62%) |Others=0.22 (5.97%)|CurSamplesPerSec=2.18 |AvgSamplesPerSec=2.38 epoch: 0|step: 2523|ppo_ep: 1|act_loss: 0.0443115234375|cri_loss: 0.027069091796875|unsuper_loss: 0.0 average reward score: 4.30078125 ------------------------------------------------------------------------------------- |E2E latency=3.56s |Gather latency=0.00s (0.00%) |Generate time=2.60s (72.98%) |Training time=0.74s (20.76%) |Others=0.22 (6.25%)|CurSamplesPerSec=2.25 |AvgSamplesPerSec=2.38 epoch: 0|step: 2524|ppo_ep: 1|act_loss: 0.07135009765625|cri_loss: 0.04132080078125|unsuper_loss: 0.0 average reward score: 3.744140625 ------------------------------------------------------------------------------------- |E2E latency=3.60s |Gather latency=0.00s (0.00%) |Generate time=2.73s (75.94%) |Training time=0.65s (18.00%) |Others=0.22 (6.06%)|CurSamplesPerSec=2.22 |AvgSamplesPerSec=2.38 epoch: 0|step: 2525|ppo_ep: 1|act_loss: 0.050445556640625|cri_loss: 0.03179931640625|unsuper_loss: 0.0 average reward score: 3.796875 ------------------------------------------------------------------------------------- |E2E latency=4.06s |Gather latency=0.00s (0.00%) |Generate time=3.18s (78.28%) |Training time=0.65s (16.00%) |Others=0.23 (5.71%)|CurSamplesPerSec=1.97 |AvgSamplesPerSec=2.38 epoch: 0|step: 2526|ppo_ep: 1|act_loss: 0.162109375|cri_loss: 0.0943603515625|unsuper_loss: 0.0 average reward score: 3.7890625 ------------------------------------------------------------------------------------- |E2E latency=3.96s |Gather latency=0.00s (0.00%) |Generate time=2.91s (73.41%) |Training time=0.82s (20.80%) |Others=0.23 (5.79%)|CurSamplesPerSec=2.02 |AvgSamplesPerSec=2.38 epoch: 0|step: 2527|ppo_ep: 1|act_loss: 
0.208984375|cri_loss: 0.140380859375|unsuper_loss: 0.0 average reward score: 2.33203125 ------------------------------------------------------------------------------------- |E2E latency=3.91s |Gather latency=0.00s (0.00%) |Generate time=2.55s (65.22%) |Training time=1.03s (26.41%) |Others=0.33 (8.37%)|CurSamplesPerSec=2.04 |AvgSamplesPerSec=2.38 epoch: 0|step: 2528|ppo_ep: 1|act_loss: -0.03131103515625|cri_loss: -0.009552001953125|unsuper_loss: 0.0 average reward score: 4.01953125 ------------------------------------------------------------------------------------- |E2E latency=3.77s |Gather latency=0.00s (0.00%) |Generate time=2.86s (75.80%) |Training time=0.68s (18.00%) |Others=0.23 (6.20%)|CurSamplesPerSec=2.12 |AvgSamplesPerSec=2.38 epoch: 0|step: 2529|ppo_ep: 1|act_loss: -0.035614013671875|cri_loss: -0.015228271484375|unsuper_loss: 0.0 average reward score: 3.83203125 ------------------------------------------------------------------------------------- |E2E latency=4.01s |Gather latency=0.00s (0.00%) |Generate time=3.10s (77.40%) |Training time=0.69s (17.14%) |Others=0.22 (5.46%)|CurSamplesPerSec=1.99 |AvgSamplesPerSec=2.38 epoch: 0|step: 2530|ppo_ep: 1|act_loss: 0.060791015625|cri_loss: 0.03790283203125|unsuper_loss: 0.0 average reward score: 3.29296875 ------------------------------------------------------------------------------------- |E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.64s (75.34%) |Training time=0.64s (18.40%) |Others=0.22 (6.26%)|CurSamplesPerSec=2.29 |AvgSamplesPerSec=2.38 epoch: 0|step: 2531|ppo_ep: 1|act_loss: 0.2410888671875|cri_loss: 0.1343994140625|unsuper_loss: 0.0 average reward score: 2.9375 ------------------------------------------------------------------------------------- |E2E latency=3.77s |Gather latency=0.00s (0.00%) |Generate time=2.61s (69.06%) |Training time=0.94s (25.04%) |Others=0.22 (5.90%)|CurSamplesPerSec=2.12 |AvgSamplesPerSec=2.38 epoch: 0|step: 2532|ppo_ep: 1|act_loss: 0.034576416015625|cri_loss: 
0.02520751953125|unsuper_loss: 0.0 average reward score: 4.4453125 ------------------------------------------------------------------------------------- |E2E latency=3.64s |Gather latency=0.00s (0.00%) |Generate time=2.76s (75.91%) |Training time=0.65s (17.79%) |Others=0.23 (6.30%)|CurSamplesPerSec=2.20 |AvgSamplesPerSec=2.38 epoch: 0|step: 2533|ppo_ep: 1|act_loss: 0.02935791015625|cri_loss: 0.020355224609375|unsuper_loss: 0.0 average reward score: 4.2578125 ------------------------------------------------------------------------------------- |E2E latency=3.86s |Gather latency=0.00s (0.00%) |Generate time=2.62s (67.88%) |Training time=1.02s (26.53%) |Others=0.22 (5.58%)|CurSamplesPerSec=2.07 |AvgSamplesPerSec=2.38 epoch: 0|step: 2534|ppo_ep: 1|act_loss: 0.1300048828125|cri_loss: 0.080322265625|unsuper_loss: 0.0 average reward score: 3.609375 ------------------------------------------------------------------------------------- |E2E latency=3.52s |Gather latency=0.00s (0.00%) |Generate time=2.66s (75.49%) |Training time=0.65s (18.47%) |Others=0.21 (6.05%)|CurSamplesPerSec=2.27 |AvgSamplesPerSec=2.38 epoch: 0|step: 2535|ppo_ep: 1|act_loss: -0.0104522705078125|cri_loss: 0.001678466796875|unsuper_loss: 0.0 average reward score: 3.9296875 ------------------------------------------------------------------------------------- |E2E latency=4.08s |Gather latency=0.00s (0.00%) |Generate time=2.81s (68.86%) |Training time=0.94s (23.00%) |Others=0.33 (8.14%)|CurSamplesPerSec=1.96 |AvgSamplesPerSec=2.38 epoch: 0|step: 2536|ppo_ep: 1|act_loss: -0.03094482421875|cri_loss: -0.01015472412109375|unsuper_loss: 0.0 average reward score: 3.80859375 ------------------------------------------------------------------------------------- |E2E latency=3.60s |Gather latency=0.00s (0.00%) |Generate time=2.72s (75.58%) |Training time=0.64s (17.79%) |Others=0.24 (6.62%)|CurSamplesPerSec=2.22 |AvgSamplesPerSec=2.38 epoch: 0|step: 2537|ppo_ep: 1|act_loss: 0.09881591796875|cri_loss: 
0.0596923828125|unsuper_loss: 0.0 average reward score: 3.0390625 ------------------------------------------------------------------------------------- |E2E latency=3.93s |Gather latency=0.00s (0.00%) |Generate time=3.07s (78.13%) |Training time=0.65s (16.46%) |Others=0.21 (5.40%)|CurSamplesPerSec=2.04 |AvgSamplesPerSec=2.38 epoch: 0|step: 2538|ppo_ep: 1|act_loss: 0.0266571044921875|cri_loss: 0.0178680419921875|unsuper_loss: 0.0 average reward score: 3.80078125 ------------------------------------------------------------------------------------- |E2E latency=4.31s |Gather latency=0.00s (0.00%) |Generate time=2.78s (64.60%) |Training time=1.28s (29.84%) |Others=0.24 (5.57%)|CurSamplesPerSec=1.86 |AvgSamplesPerSec=2.38 epoch: 0|step: 2539|ppo_ep: 1|act_loss: 0.042022705078125|cri_loss: 0.0286407470703125|unsuper_loss: 0.0 average reward score: 3.28515625 ------------------------------------------------------------------------------------- |E2E latency=3.57s |Gather latency=0.00s (0.00%) |Generate time=2.68s (74.87%) |Training time=0.67s (18.88%) |Others=0.22 (6.25%)|CurSamplesPerSec=2.24 |AvgSamplesPerSec=2.38 epoch: 0|step: 2540|ppo_ep: 1|act_loss: -0.1102294921875|cri_loss: -0.04498291015625|unsuper_loss: 0.0 average reward score: 4.21875 ------------------------------------------------------------------------------------- |E2E latency=3.56s |Gather latency=0.00s (0.00%) |Generate time=2.46s (69.28%) |Training time=0.87s (24.41%) |Others=0.22 (6.31%)|CurSamplesPerSec=2.25 |AvgSamplesPerSec=2.38 epoch: 0|step: 2541|ppo_ep: 1|act_loss: 0.145263671875|cri_loss: 0.0859375|unsuper_loss: 0.0 average reward score: 3.96875 ------------------------------------------------------------------------------------- |E2E latency=3.71s |Gather latency=0.00s (0.00%) |Generate time=2.84s (76.37%) |Training time=0.67s (17.92%) |Others=0.21 (5.70%)|CurSamplesPerSec=2.15 |AvgSamplesPerSec=2.38 epoch: 0|step: 2542|ppo_ep: 1|act_loss: 0.0200653076171875|cri_loss: 
0.01708984375|unsuper_loss: 0.0 average reward score: 4.078125 ------------------------------------------------------------------------------------- |E2E latency=3.86s |Gather latency=0.00s (0.00%) |Generate time=2.65s (68.57%) |Training time=0.96s (24.94%) |Others=0.25 (6.48%)|CurSamplesPerSec=2.07 |AvgSamplesPerSec=2.38 epoch: 0|step: 2543|ppo_ep: 1|act_loss: 0.02703857421875|cri_loss: 0.0195770263671875|unsuper_loss: 0.0 average reward score: 3.89453125 ------------------------------------------------------------------------------------- |E2E latency=4.12s |Gather latency=0.00s (0.00%) |Generate time=2.85s (69.29%) |Training time=0.94s (22.77%) |Others=0.33 (7.94%)|CurSamplesPerSec=1.94 |AvgSamplesPerSec=2.37 epoch: 0|step: 2544|ppo_ep: 1|act_loss: 0.227783203125|cri_loss: 0.1380615234375|unsuper_loss: 0.0 average reward score: 3.236328125 ------------------------------------------------------------------------------------- |E2E latency=3.52s |Gather latency=0.00s (0.00%) |Generate time=2.61s (74.30%) |Training time=0.68s (19.42%) |Others=0.22 (6.29%)|CurSamplesPerSec=2.27 |AvgSamplesPerSec=2.37 epoch: 0|step: 2545|ppo_ep: 1|act_loss: 0.07720947265625|cri_loss: 0.04852294921875|unsuper_loss: 0.0 average reward score: 3.97265625 ------------------------------------------------------------------------------------- |E2E latency=3.56s |Gather latency=0.00s (0.00%) |Generate time=2.64s (74.09%) |Training time=0.72s (20.19%) |Others=0.20 (5.71%)|CurSamplesPerSec=2.24 |AvgSamplesPerSec=2.37 epoch: 0|step: 2546|ppo_ep: 1|act_loss: 0.06829833984375|cri_loss: 0.04034423828125|unsuper_loss: 0.0 average reward score: 3.1015625 ------------------------------------------------------------------------------------- |E2E latency=3.55s |Gather latency=0.00s (0.00%) |Generate time=2.68s (75.39%) |Training time=0.64s (18.12%) |Others=0.23 (6.48%)|CurSamplesPerSec=2.25 |AvgSamplesPerSec=2.37 epoch: 0|step: 2547|ppo_ep: 1|act_loss: -0.014190673828125|cri_loss: 
0.015350341796875|unsuper_loss: 0.0 average reward score: 3.4765625 ------------------------------------------------------------------------------------- |E2E latency=3.61s |Gather latency=0.00s (0.00%) |Generate time=2.75s (76.08%) |Training time=0.65s (17.89%) |Others=0.22 (6.03%)|CurSamplesPerSec=2.21 |AvgSamplesPerSec=2.37 epoch: 0|step: 2548|ppo_ep: 1|act_loss: 0.0986328125|cri_loss: 0.06146240234375|unsuper_loss: 0.0 average reward score: 3.7890625 ------------------------------------------------------------------------------------- |E2E latency=3.41s |Gather latency=0.00s (0.00%) |Generate time=2.55s (74.90%) |Training time=0.64s (18.85%) |Others=0.21 (6.24%)|CurSamplesPerSec=2.35 |AvgSamplesPerSec=2.37 epoch: 0|step: 2549|ppo_ep: 1|act_loss: 0.0089569091796875|cri_loss: 0.00801849365234375|unsuper_loss: 0.0 average reward score: 4.875 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.42s (73.74%) |Training time=0.67s (20.44%) |Others=0.19 (5.82%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.37 epoch: 0|step: 2550|ppo_ep: 1|act_loss: 0.1484375|cri_loss: 0.07977294921875|unsuper_loss: 0.0 average reward score: 2.376953125 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.75%) |Training time=0.64s (19.35%) |Others=0.19 (5.90%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.37 epoch: 0|step: 2551|ppo_ep: 1|act_loss: 0.0494384765625|cri_loss: 0.031097412109375|unsuper_loss: 0.0 average reward score: 2.83203125 ------------------------------------------------------------------------------------- |E2E latency=3.55s |Gather latency=0.00s (0.00%) |Generate time=2.35s (66.13%) |Training time=0.93s (26.12%) |Others=0.27 (7.75%)|CurSamplesPerSec=2.25 |AvgSamplesPerSec=2.37 epoch: 0|step: 2552|ppo_ep: 1|act_loss: -0.028717041015625|cri_loss: 
-0.00850677490234375|unsuper_loss: 0.0 average reward score: 4.03515625 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.35%) |Training time=0.63s (19.65%) |Others=0.19 (6.01%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.37 epoch: 0|step: 2553|ppo_ep: 1|act_loss: 0.03985595703125|cri_loss: 0.030517578125|unsuper_loss: 0.0 average reward score: 3.884765625 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.39s (72.92%) |Training time=0.69s (21.06%) |Others=0.20 (6.02%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.37 epoch: 0|step: 2554|ppo_ep: 1|act_loss: 0.03125|cri_loss: 0.023834228515625|unsuper_loss: 0.0 average reward score: 3.36328125 ------------------------------------------------------------------------------------- |E2E latency=3.11s |Gather latency=0.00s (0.00%) |Generate time=2.27s (72.94%) |Training time=0.65s (20.89%) |Others=0.19 (6.18%)|CurSamplesPerSec=2.57 |AvgSamplesPerSec=2.37 epoch: 0|step: 2555|ppo_ep: 1|act_loss: 0.07086181640625|cri_loss: 0.040069580078125|unsuper_loss: 0.0 average reward score: 3.994140625 ------------------------------------------------------------------------------------- |E2E latency=3.18s |Gather latency=0.00s (0.00%) |Generate time=2.34s (73.49%) |Training time=0.65s (20.31%) |Others=0.20 (6.20%)|CurSamplesPerSec=2.52 |AvgSamplesPerSec=2.37 epoch: 0|step: 2556|ppo_ep: 1|act_loss: 0.0648193359375|cri_loss: 0.0406494140625|unsuper_loss: 0.0 average reward score: 2.1484375 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.06%) |Training time=0.64s (19.83%) |Others=0.20 (6.11%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.37 epoch: 0|step: 2557|ppo_ep: 1|act_loss: 0.06182861328125|cri_loss: 
0.039398193359375|unsuper_loss: 0.0
average reward score: 3.96484375
-------------------------------------------------------------------------------------
|E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.39s (73.96%) |Training time=0.65s (20.07%) |Others=0.19 (5.97%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.38
epoch: 0|step: 2558|ppo_ep: 1|act_loss: -0.041046142578125|cri_loss: -0.011016845703125|unsuper_loss: 0.0
average reward score: 4.5625
-------------------------------------------------------------------------------------
|E2E latency=3.17s |Gather latency=0.00s (0.00%) |Generate time=2.33s (73.50%) |Training time=0.64s (20.31%) |Others=0.20 (6.19%)|CurSamplesPerSec=2.52 |AvgSamplesPerSec=2.38
[2023-04-24 16:11:24,716] [INFO] [logging.py:96:log_dist] [Rank 0] step=320, skipped=5, lr=[3.7457224660498317e-06, 3.7457224660498317e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-04-24 16:11:24,963] [INFO] [timer.py:199:stop] epoch=0/micro_step=2560/global_step=320, RunningAvgSamplesPerSec=15.37740654472043, CurrSamplesPerSec=15.634115891039713, MemAllocated=20.44GB, MaxMemAllocated=31.45GB
[2023-04-24 16:11:25,165] [INFO] [logging.py:96:log_dist] [Rank 0] step=320, skipped=4, lr=[1.920449624331179e-06, 1.920449624331179e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 2559|ppo_ep: 1|act_loss: 0.175048828125|cri_loss: 0.0965576171875|unsuper_loss: 0.0
average reward score: 3.91015625
-------------------------------------------------------------------------------------
|E2E latency=3.56s |Gather latency=0.00s (0.00%) |Generate time=2.31s (64.96%) |Training time=0.97s (27.24%) |Others=0.28 (7.80%)|CurSamplesPerSec=2.25 |AvgSamplesPerSec=2.38
epoch: 0|step: 2560|ppo_ep: 1|act_loss: 0.16552734375|cri_loss: 0.09552001953125|unsuper_loss: 0.0
average reward score: 2.57421875
-------------------------------------------------------------------------------------
|E2E latency=3.12s |Gather latency=0.00s (0.00%) |Generate time=2.27s (72.68%) |Training time=0.66s (21.22%) |Others=0.19 (6.10%)|CurSamplesPerSec=2.57 |AvgSamplesPerSec=2.38
epoch: 0|step: 2561|ppo_ep: 1|act_loss: 0.10565185546875|cri_loss: 0.061279296875|unsuper_loss: 0.0
average reward score: 3.51171875
-------------------------------------------------------------------------------------
|E2E latency=3.19s |Gather latency=0.00s (0.00%) |Generate time=2.35s (73.67%) |Training time=0.64s (20.15%) |Others=0.20 (6.19%)|CurSamplesPerSec=2.51 |AvgSamplesPerSec=2.38
epoch: 0|step: 2562|ppo_ep: 1|act_loss: -0.0040435791015625|cri_loss: 0.00292205810546875|unsuper_loss: 0.0
average reward score: 3.82421875
-------------------------------------------------------------------------------------
|E2E latency=3.18s |Gather latency=0.00s (0.00%) |Generate time=2.34s (73.73%) |Training time=0.64s (20.15%) |Others=0.19 (6.13%)|CurSamplesPerSec=2.52 |AvgSamplesPerSec=2.38
epoch: 0|step: 2563|ppo_ep: 1|act_loss: -0.015777587890625|cri_loss: 0.0021209716796875|unsuper_loss: 0.0
average reward score: 2.85546875
-------------------------------------------------------------------------------------
|E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.47s (74.81%) |Training time=0.64s (19.33%) |Others=0.19 (5.85%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.38
epoch: 0|step: 2564|ppo_ep: 1|act_loss: 0.09661865234375|cri_loss: 0.0634765625|unsuper_loss: 0.0
average reward score: 2.875
-------------------------------------------------------------------------------------
|E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.18%) |Training time=0.65s (19.79%) |Others=0.20 (6.03%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.38
epoch: 0|step: 2565|ppo_ep: 1|act_loss: -0.009796142578125|cri_loss: 0.002349853515625|unsuper_loss: 0.0
average reward score: 3.5390625
-------------------------------------------------------------------------------------
|E2E latency=3.13s |Gather latency=0.00s (0.00%) |Generate time=2.27s (72.58%) |Training time=0.67s (21.31%) |Others=0.19 (6.11%)|CurSamplesPerSec=2.55 |AvgSamplesPerSec=2.38
epoch: 0|step: 2566|ppo_ep: 1|act_loss: -0.0142974853515625|cri_loss: 0.008392333984375|unsuper_loss: 0.0
average reward score: 3.587890625
-------------------------------------------------------------------------------------
|E2E latency=3.17s |Gather latency=0.00s (0.00%) |Generate time=2.33s (73.55%) |Training time=0.64s (20.32%) |Others=0.19 (6.14%)|CurSamplesPerSec=2.52 |AvgSamplesPerSec=2.38
epoch: 0|step: 2567|ppo_ep: 1|act_loss: 0.0157623291015625|cri_loss: 0.0157318115234375|unsuper_loss: 0.0
average reward score: 2.7109375
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.29s (65.43%) |Training time=0.94s (26.78%) |Others=0.27 (7.79%)|CurSamplesPerSec=2.29 |AvgSamplesPerSec=2.38
epoch: 0|step: 2568|ppo_ep: 1|act_loss: -0.107177734375|cri_loss: -0.043701171875|unsuper_loss: 0.0
average reward score: 2.5703125
-------------------------------------------------------------------------------------
|E2E latency=3.17s |Gather latency=0.00s (0.00%) |Generate time=2.33s (73.38%) |Training time=0.64s (20.35%) |Others=0.20 (6.27%)|CurSamplesPerSec=2.52 |AvgSamplesPerSec=2.38
epoch: 0|step: 2569|ppo_ep: 1|act_loss: -0.018402099609375|cri_loss: -0.001739501953125|unsuper_loss: 0.0
average reward score: 3.859375
-------------------------------------------------------------------------------------
|E2E latency=3.18s |Gather latency=0.00s (0.00%) |Generate time=2.34s (73.54%) |Training time=0.65s (20.30%) |Others=0.20 (6.16%)|CurSamplesPerSec=2.51 |AvgSamplesPerSec=2.38
epoch: 0|step: 2570|ppo_ep: 1|act_loss: -0.024383544921875|cri_loss: -0.0067138671875|unsuper_loss: 0.0
average reward score: 4.359375
-------------------------------------------------------------------------------------
|E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.31s (71.71%) |Training time=0.71s (22.02%) |Others=0.20 (6.27%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.38
epoch: 0|step: 2571|ppo_ep: 1|act_loss: 0.0184326171875|cri_loss: 0.01556396484375|unsuper_loss: 0.0
average reward score: 4.4453125
-------------------------------------------------------------------------------------
|E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.34s (72.43%) |Training time=0.69s (21.49%) |Others=0.20 (6.08%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.38
epoch: 0|step: 2572|ppo_ep: 1|act_loss: 0.0999755859375|cri_loss: 0.064208984375|unsuper_loss: 0.0
average reward score: 4.1171875
-------------------------------------------------------------------------------------
|E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.32s (72.05%) |Training time=0.71s (21.91%) |Others=0.19 (6.04%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.38
epoch: 0|step: 2573|ppo_ep: 1|act_loss: -0.083740234375|cri_loss: -0.0338134765625|unsuper_loss: 0.0
average reward score: 3.79296875
-------------------------------------------------------------------------------------
|E2E latency=3.19s |Gather latency=0.00s (0.00%) |Generate time=2.31s (72.38%) |Training time=0.69s (21.55%) |Others=0.19 (6.07%)|CurSamplesPerSec=2.51 |AvgSamplesPerSec=2.38
epoch: 0|step: 2574|ppo_ep: 1|act_loss: -0.06427001953125|cri_loss: -0.0259246826171875|unsuper_loss: 0.0
average reward score: 4.9921875
-------------------------------------------------------------------------------------
|E2E latency=3.18s |Gather latency=0.00s (0.00%) |Generate time=2.35s (73.93%) |Training time=0.64s (20.04%) |Others=0.19 (6.04%)|CurSamplesPerSec=2.51 |AvgSamplesPerSec=2.38
epoch: 0|step: 2575|ppo_ep: 1|act_loss: 0.0214691162109375|cri_loss: 0.01800537109375|unsuper_loss: 0.0
average reward score: 4.48828125
-------------------------------------------------------------------------------------
|E2E latency=3.62s |Gather latency=0.00s (0.00%) |Generate time=2.42s (66.70%) |Training time=0.93s (25.57%) |Others=0.28 (7.73%)|CurSamplesPerSec=2.21 |AvgSamplesPerSec=2.38
epoch: 0|step: 2576|ppo_ep: 1|act_loss: -0.0819091796875|cri_loss: -0.0306243896484375|unsuper_loss: 0.0
average reward score: 4.4609375
-------------------------------------------------------------------------------------
|E2E latency=3.15s |Gather latency=0.00s (0.00%) |Generate time=2.29s (72.79%) |Training time=0.67s (21.17%) |Others=0.19 (6.03%)|CurSamplesPerSec=2.54 |AvgSamplesPerSec=2.38
epoch: 0|step: 2577|ppo_ep: 1|act_loss: 0.0153045654296875|cri_loss: 0.01556396484375|unsuper_loss: 0.0
average reward score: 4.64453125
-------------------------------------------------------------------------------------
|E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.10%) |Training time=0.65s (19.84%) |Others=0.20 (6.06%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.38
epoch: 0|step: 2578|ppo_ep: 1|act_loss: -0.052490234375|cri_loss: -0.0182647705078125|unsuper_loss: 0.0
average reward score: 4.4765625
-------------------------------------------------------------------------------------
|E2E latency=3.17s |Gather latency=0.00s (0.00%) |Generate time=2.26s (71.35%) |Training time=0.72s (22.66%) |Others=0.19 (5.98%)|CurSamplesPerSec=2.53 |AvgSamplesPerSec=2.38
epoch: 0|step: 2579|ppo_ep: 1|act_loss: -0.10809326171875|cri_loss: -0.045684814453125|unsuper_loss: 0.0
average reward score: 4.8828125
-------------------------------------------------------------------------------------
|E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.35s (73.09%) |Training time=0.68s (20.99%) |Others=0.19 (5.92%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.38
epoch: 0|step: 2580|ppo_ep: 1|act_loss: -0.00372314453125|cri_loss: 0.00574493408203125|unsuper_loss: 0.0
average reward score: 4.2578125
-------------------------------------------------------------------------------------
|E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.33s (72.38%) |Training time=0.69s (21.47%) |Others=0.20 (6.15%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.38
epoch: 0|step: 2581|ppo_ep: 1|act_loss: -0.1466064453125|cri_loss: -0.044586181640625|unsuper_loss: 0.0
average reward score: 4.26953125
-------------------------------------------------------------------------------------
|E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.00%) |Training time=0.64s (19.84%) |Others=0.20 (6.16%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.38
epoch: 0|step: 2582|ppo_ep: 1|act_loss: -0.0517578125|cri_loss: -0.0127105712890625|unsuper_loss: 0.0
average reward score: 3.48046875
-------------------------------------------------------------------------------------
|E2E latency=3.18s |Gather latency=0.00s (0.00%) |Generate time=2.28s (71.72%) |Training time=0.70s (22.11%) |Others=0.20 (6.17%)|CurSamplesPerSec=2.52 |AvgSamplesPerSec=2.38
epoch: 0|step: 2583|ppo_ep: 1|act_loss: -0.0245819091796875|cri_loss: -0.0040740966796875|unsuper_loss: 0.0
average reward score: 3.935546875
-------------------------------------------------------------------------------------
|E2E latency=3.58s |Gather latency=0.00s (0.00%) |Generate time=2.37s (66.17%) |Training time=0.93s (25.99%) |Others=0.28 (7.84%)|CurSamplesPerSec=2.24 |AvgSamplesPerSec=2.38
epoch: 0|step: 2584|ppo_ep: 1|act_loss: -0.123779296875|cri_loss: -0.047119140625|unsuper_loss: 0.0
average reward score: 5.03515625
-------------------------------------------------------------------------------------
|E2E latency=3.16s |Gather latency=0.00s (0.00%) |Generate time=2.33s (73.71%) |Training time=0.64s (20.31%) |Others=0.19 (5.98%)|CurSamplesPerSec=2.53 |AvgSamplesPerSec=2.38
epoch: 0|step: 2585|ppo_ep: 1|act_loss: -0.1546630859375|cri_loss: -0.06396484375|unsuper_loss: 0.0
average reward score: 3.9140625
-------------------------------------------------------------------------------------
|E2E latency=3.14s |Gather latency=0.00s (0.00%) |Generate time=2.31s (73.49%) |Training time=0.64s (20.43%) |Others=0.19 (6.09%)|CurSamplesPerSec=2.55 |AvgSamplesPerSec=2.38
epoch: 0|step: 2586|ppo_ep: 1|act_loss: 0.04583740234375|cri_loss: 0.026611328125|unsuper_loss: 0.0
average reward score: 5.0625
-------------------------------------------------------------------------------------
|E2E latency=3.13s |Gather latency=0.00s (0.00%) |Generate time=2.29s (73.16%) |Training time=0.65s (20.66%) |Others=0.19 (6.18%)|CurSamplesPerSec=2.55 |AvgSamplesPerSec=2.38
epoch: 0|step: 2587|ppo_ep: 1|act_loss: 0.0053253173828125|cri_loss: 0.0105743408203125|unsuper_loss: 0.0
average reward score: 4.43359375
-------------------------------------------------------------------------------------
|E2E latency=3.17s |Gather latency=0.00s (0.00%) |Generate time=2.34s (73.81%) |Training time=0.64s (20.17%) |Others=0.19 (6.01%)|CurSamplesPerSec=2.52 |AvgSamplesPerSec=2.38
epoch: 0|step: 2588|ppo_ep: 1|act_loss: -0.0732421875|cri_loss: -0.03106689453125|unsuper_loss: 0.0
average reward score: 3.77734375
-------------------------------------------------------------------------------------
|E2E latency=3.12s |Gather latency=0.00s (0.00%) |Generate time=2.26s (72.45%) |Training time=0.67s (21.33%) |Others=0.19 (6.22%)|CurSamplesPerSec=2.56 |AvgSamplesPerSec=2.38
epoch: 0|step: 2589|ppo_ep: 1|act_loss: -0.080078125|cri_loss: -0.03302001953125|unsuper_loss: 0.0
average reward score: 3.671875
-------------------------------------------------------------------------------------
|E2E latency=3.13s |Gather latency=0.00s (0.00%) |Generate time=2.26s (72.44%) |Training time=0.67s (21.47%) |Others=0.19 (6.09%)|CurSamplesPerSec=2.56 |AvgSamplesPerSec=2.38
epoch: 0|step: 2590|ppo_ep: 1|act_loss: -0.0137786865234375|cri_loss: -0.0012969970703125|unsuper_loss: 0.0
average reward score: 4.1875
-------------------------------------------------------------------------------------
|E2E latency=3.13s |Gather latency=0.00s (0.00%) |Generate time=2.26s (72.23%) |Training time=0.67s (21.30%) |Others=0.20 (6.47%)|CurSamplesPerSec=2.56 |AvgSamplesPerSec=2.38
epoch: 0|step: 2591|ppo_ep: 1|act_loss: -0.036041259765625|cri_loss: -0.01198577880859375|unsuper_loss: 0.0
average reward score: 3.185546875
-------------------------------------------------------------------------------------
|E2E latency=3.58s |Gather latency=0.00s (0.00%) |Generate time=2.37s (66.27%) |Training time=0.92s (25.84%) |Others=0.28 (7.89%)|CurSamplesPerSec=2.24 |AvgSamplesPerSec=2.38
epoch: 0|step: 2592|ppo_ep: 1|act_loss: 0.091064453125|cri_loss: 0.053619384765625|unsuper_loss: 0.0
average reward score: 2.751953125
-------------------------------------------------------------------------------------
|E2E latency=3.13s |Gather latency=0.00s (0.00%) |Generate time=2.30s (73.42%) |Training time=0.64s (20.51%) |Others=0.19 (6.07%)|CurSamplesPerSec=2.55 |AvgSamplesPerSec=2.38
epoch: 0|step: 2593|ppo_ep: 1|act_loss: -0.08056640625|cri_loss: -0.03411865234375|unsuper_loss: 0.0
average reward score: 3.197265625
-------------------------------------------------------------------------------------
|E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.29s (71.33%) |Training time=0.73s (22.64%) |Others=0.19 (6.03%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.38
epoch: 0|step: 2594|ppo_ep: 1|act_loss: -0.08477783203125|cri_loss: -0.03643798828125|unsuper_loss: 0.0
average reward score: 5.0
-------------------------------------------------------------------------------------
|E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.42s (73.43%) |Training time=0.64s (19.43%) |Others=0.24 (7.14%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.38
epoch: 0|step: 2595|ppo_ep: 1|act_loss: 0.06353759765625|cri_loss: 0.035858154296875|unsuper_loss: 0.0
average reward score: 4.4921875
-------------------------------------------------------------------------------------
|E2E latency=3.17s |Gather latency=0.00s (0.00%) |Generate time=2.33s (73.54%) |Training time=0.65s (20.36%) |Others=0.19 (6.11%)|CurSamplesPerSec=2.52 |AvgSamplesPerSec=2.38
epoch: 0|step: 2596|ppo_ep: 1|act_loss: -0.1287841796875|cri_loss: -0.05718994140625|unsuper_loss: 0.0
average reward score: 4.75
-------------------------------------------------------------------------------------
|E2E latency=3.55s |Gather latency=0.00s (0.00%) |Generate time=2.54s (71.56%) |Training time=0.76s (21.29%) |Others=0.25 (7.14%)|CurSamplesPerSec=2.26 |AvgSamplesPerSec=2.38
epoch: 0|step: 2597|ppo_ep: 1|act_loss: -0.03387451171875|cri_loss: -0.0089111328125|unsuper_loss: 0.0
average reward score: 4.671875
-------------------------------------------------------------------------------------
|E2E latency=4.00s |Gather latency=0.00s (0.00%) |Generate time=2.83s (70.78%) |Training time=0.89s (22.27%) |Others=0.28 (6.95%)|CurSamplesPerSec=2.00 |AvgSamplesPerSec=2.38
epoch: 0|step: 2598|ppo_ep: 1|act_loss: -0.03216552734375|cri_loss: -0.0055999755859375|unsuper_loss: 0.0
average reward score: 2.884765625
-------------------------------------------------------------------------------------
|E2E latency=3.56s |Gather latency=0.00s (0.00%) |Generate time=2.69s (75.38%) |Training time=0.65s (18.28%) |Others=0.23 (6.35%)|CurSamplesPerSec=2.24 |AvgSamplesPerSec=2.38
epoch: 0|step: 2599|ppo_ep: 1|act_loss: -0.046417236328125|cri_loss: -9.1552734375e-05|unsuper_loss: 0.0
average reward score: 4.08984375
-------------------------------------------------------------------------------------
|E2E latency=3.97s |Gather latency=0.00s (0.00%) |Generate time=2.69s (67.85%) |Training time=0.96s (24.24%) |Others=0.31 (7.92%)|CurSamplesPerSec=2.02 |AvgSamplesPerSec=2.38
epoch: 0|step: 2600|ppo_ep: 1|act_loss: -9.918212890625e-05|cri_loss: 0.0050201416015625|unsuper_loss: 0.0
average reward score: 4.6015625
-------------------------------------------------------------------------------------
|E2E latency=3.69s |Gather latency=0.00s (0.00%) |Generate time=2.87s (77.77%) |Training time=0.63s (17.11%) |Others=0.19 (5.12%)|CurSamplesPerSec=2.17 |AvgSamplesPerSec=2.38
epoch: 0|step: 2601|ppo_ep: 1|act_loss: 0.0640869140625|cri_loss: 0.0404052734375|unsuper_loss: 0.0
average reward score: 3.263671875
-------------------------------------------------------------------------------------
|E2E latency=3.52s |Gather latency=0.00s (0.00%) |Generate time=2.60s (73.98%) |Training time=0.68s (19.34%) |Others=0.23 (6.68%)|CurSamplesPerSec=2.28 |AvgSamplesPerSec=2.38
epoch: 0|step: 2602|ppo_ep: 1|act_loss: 0.01751708984375|cri_loss: 0.0166473388671875|unsuper_loss: 0.0
average reward score: 4.05078125
-------------------------------------------------------------------------------------
|E2E latency=3.70s |Gather latency=0.00s (0.00%) |Generate time=2.82s (76.08%) |Training time=0.65s (17.67%) |Others=0.23 (6.25%)|CurSamplesPerSec=2.16 |AvgSamplesPerSec=2.38
epoch: 0|step: 2603|ppo_ep: 1|act_loss: -0.03741455078125|cri_loss: -0.0069580078125|unsuper_loss: 0.0
average reward score: 3.916015625
-------------------------------------------------------------------------------------
|E2E latency=3.83s |Gather latency=0.00s (0.00%) |Generate time=2.94s (76.92%) |Training time=0.66s (17.30%) |Others=0.22 (5.78%)|CurSamplesPerSec=2.09 |AvgSamplesPerSec=2.38
epoch: 0|step: 2604|ppo_ep: 1|act_loss: 0.1153564453125|cri_loss: 0.0635986328125|unsuper_loss: 0.0
average reward score: 3.02734375
-------------------------------------------------------------------------------------
|E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.47s (74.45%) |Training time=0.64s (19.47%) |Others=0.20 (6.07%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.38
epoch: 0|step: 2605|ppo_ep: 1|act_loss: 0.0333251953125|cri_loss: 0.030609130859375|unsuper_loss: 0.0
average reward score: 3.849609375
-------------------------------------------------------------------------------------
|E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.32%) |Training time=0.65s (19.66%) |Others=0.20 (6.02%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.38
epoch: 0|step: 2606|ppo_ep: 1|act_loss: 0.0213775634765625|cri_loss: 0.0166473388671875|unsuper_loss: 0.0
average reward score: 5.546875
-------------------------------------------------------------------------------------
|E2E latency=3.44s |Gather latency=0.00s (0.00%) |Generate time=2.56s (74.46%) |Training time=0.65s (18.81%) |Others=0.23 (6.73%)|CurSamplesPerSec=2.32 |AvgSamplesPerSec=2.38
epoch: 0|step: 2607|ppo_ep: 1|act_loss: -0.0634765625|cri_loss: -0.022064208984375|unsuper_loss: 0.0
average reward score: 4.4296875
-------------------------------------------------------------------------------------
|E2E latency=3.80s |Gather latency=0.00s (0.00%) |Generate time=2.56s (67.20%) |Training time=0.94s (24.64%) |Others=0.31 (8.15%)|CurSamplesPerSec=2.10 |AvgSamplesPerSec=2.38
epoch: 0|step: 2608|ppo_ep: 1|act_loss: 0.07989501953125|cri_loss: 0.05084228515625|unsuper_loss: 0.0
average reward score: 3.802734375
-------------------------------------------------------------------------------------
|E2E latency=3.69s |Gather latency=0.00s (0.00%) |Generate time=2.51s (67.89%) |Training time=0.97s (26.28%) |Others=0.22 (5.83%)|CurSamplesPerSec=2.17 |AvgSamplesPerSec=2.38
epoch: 0|step: 2609|ppo_ep: 1|act_loss: 0.1046142578125|cri_loss: 0.0626220703125|unsuper_loss: 0.0
average reward score: 3.880859375
-------------------------------------------------------------------------------------
|E2E latency=3.83s |Gather latency=0.00s (0.00%) |Generate time=2.96s (77.15%) |Training time=0.64s (16.75%) |Others=0.23 (6.10%)|CurSamplesPerSec=2.09 |AvgSamplesPerSec=2.38
epoch: 0|step: 2610|ppo_ep: 1|act_loss: -0.06488037109375|cri_loss: -0.0244598388671875|unsuper_loss: 0.0
average reward score: 3.56640625
-------------------------------------------------------------------------------------
|E2E latency=3.52s |Gather latency=0.00s (0.00%) |Generate time=2.56s (72.74%) |Training time=0.73s (20.85%) |Others=0.23 (6.41%)|CurSamplesPerSec=2.27 |AvgSamplesPerSec=2.38
epoch: 0|step: 2611|ppo_ep: 1|act_loss: -0.015106201171875|cri_loss: 0.005645751953125|unsuper_loss: 0.0
average reward score: 3.94921875
-------------------------------------------------------------------------------------
|E2E latency=3.43s |Gather latency=0.00s (0.00%) |Generate time=2.50s (73.04%) |Training time=0.72s (20.95%) |Others=0.21 (6.01%)|CurSamplesPerSec=2.33 |AvgSamplesPerSec=2.38
epoch: 0|step: 2612|ppo_ep: 1|act_loss: -0.0131988525390625|cri_loss: -0.003002166748046875|unsuper_loss: 0.0
average reward score: 4.5625
-------------------------------------------------------------------------------------
|E2E latency=3.47s |Gather latency=0.00s (0.00%) |Generate time=2.59s (74.67%) |Training time=0.65s (18.59%) |Others=0.23 (6.74%)|CurSamplesPerSec=2.30 |AvgSamplesPerSec=2.38
epoch: 0|step: 2613|ppo_ep: 1|act_loss: 0.0015716552734375|cri_loss: 0.01183319091796875|unsuper_loss: 0.0
average reward score: 3.515625
-------------------------------------------------------------------------------------
|E2E latency=3.70s |Gather latency=0.00s (0.00%) |Generate time=2.84s (76.57%) |Training time=0.64s (17.28%) |Others=0.23 (6.15%)|CurSamplesPerSec=2.16 |AvgSamplesPerSec=2.38
epoch: 0|step: 2614|ppo_ep: 1|act_loss: -0.0311126708984375|cri_loss: -0.004852294921875|unsuper_loss: 0.0
average reward score: 5.1171875
-------------------------------------------------------------------------------------
|E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.61%) |Training time=0.64s (19.43%) |Others=0.20 (5.96%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.38
epoch: 0|step: 2615|ppo_ep: 1|act_loss: 0.06024169921875|cri_loss: 0.045318603515625|unsuper_loss: 0.0
average reward score: 4.6015625
-------------------------------------------------------------------------------------
|E2E latency=3.60s |Gather latency=0.00s (0.00%) |Generate time=2.39s (66.55%) |Training time=0.93s (25.75%) |Others=0.28 (7.70%)|CurSamplesPerSec=2.22 |AvgSamplesPerSec=2.38
epoch: 0|step: 2616|ppo_ep: 1|act_loss: -0.034423828125|cri_loss: -0.01277923583984375|unsuper_loss: 0.0
average reward score: 4.4296875
-------------------------------------------------------------------------------------
|E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.17%) |Training time=0.64s (19.54%) |Others=0.21 (6.30%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.38
epoch: 0|step: 2617|ppo_ep: 1|act_loss: 0.139404296875|cri_loss: 0.078125|unsuper_loss: 0.0
average reward score: 4.0859375
-------------------------------------------------------------------------------------
|E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.45%) |Training time=0.64s (19.60%) |Others=0.19 (5.95%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.38
epoch: 0|step: 2618|ppo_ep: 1|act_loss: 0.036590576171875|cri_loss: 0.029937744140625|unsuper_loss: 0.0
average reward score: 5.0390625
-------------------------------------------------------------------------------------
|E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.48s (74.78%) |Training time=0.64s (19.41%) |Others=0.19 (5.81%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.38
epoch: 0|step: 2619|ppo_ep: 1|act_loss: 0.136474609375|cri_loss: 0.081298828125|unsuper_loss: 0.0
average reward score: 3.34375
-------------------------------------------------------------------------------------
|E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.52%) |Training time=0.64s (19.58%) |Others=0.19 (5.90%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.38
epoch: 0|step: 2620|ppo_ep: 1|act_loss: 0.058135986328125|cri_loss: 0.0423583984375|unsuper_loss: 0.0
average reward score: 3.734375
-------------------------------------------------------------------------------------
|E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.38%) |Training time=0.64s (19.56%) |Others=0.20 (6.06%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.38
epoch: 0|step: 2621|ppo_ep: 1|act_loss: 0.0411376953125|cri_loss: 0.02838134765625|unsuper_loss: 0.0
average reward score: 3.70703125
-------------------------------------------------------------------------------------
|E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.34%) |Training time=0.65s (19.75%) |Others=0.19 (5.91%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.38
epoch: 0|step: 2622|ppo_ep: 1|act_loss: 0.0389404296875|cri_loss: 0.02459716796875|unsuper_loss: 0.0
average reward score: 4.0234375
-------------------------------------------------------------------------------------
|E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.39s (73.91%) |Training time=0.65s (19.95%) |Others=0.20 (6.13%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.38
epoch: 0|step: 2623|ppo_ep: 1|act_loss: 0.0745849609375|cri_loss: 0.047698974609375|unsuper_loss: 0.0
average reward score: 3.689453125
-------------------------------------------------------------------------------------
|E2E latency=3.67s |Gather latency=0.00s (0.00%) |Generate time=2.46s (67.03%) |Training time=0.93s (25.36%) |Others=0.28 (7.61%)|CurSamplesPerSec=2.18 |AvgSamplesPerSec=2.38
epoch: 0|step: 2624|ppo_ep: 1|act_loss: 0.046875|cri_loss: 0.034912109375|unsuper_loss: 0.0
average reward score: 4.078125
-------------------------------------------------------------------------------------
|E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.59%) |Training time=0.64s (19.38%) |Others=0.20 (6.03%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.38
epoch: 0|step: 2625|ppo_ep: 1|act_loss: 0.0192413330078125|cri_loss: 0.0167694091796875|unsuper_loss: 0.0
average reward score: 4.74609375
-------------------------------------------------------------------------------------
|E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.53%) |Training time=0.64s (19.50%) |Others=0.20 (5.97%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.38
epoch: 0|step: 2626|ppo_ep: 1|act_loss: -0.065185546875|cri_loss: -0.012939453125|unsuper_loss: 0.0
average reward score: 4.609375
-------------------------------------------------------------------------------------
|E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.38%) |Training time=0.64s (19.57%) |Others=0.20 (6.06%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.38
epoch: 0|step: 2627|ppo_ep: 1|act_loss: 0.12841796875|cri_loss: 0.080078125|unsuper_loss: 0.0
average reward score: 4.57421875
-------------------------------------------------------------------------------------
|E2E latency=3.33s |Gather latency=0.00s (0.00%) |Generate time=2.48s (74.56%) |Training time=0.65s (19.46%) |Others=0.20 (5.98%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.38
epoch: 0|step: 2628|ppo_ep: 1|act_loss: 0.0072021484375|cri_loss: 0.0153656005859375|unsuper_loss: 0.0
average reward score: 3.236328125
-------------------------------------------------------------------------------------
|E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.12%) |Training time=0.65s (19.86%) |Others=0.20 (6.01%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.38
epoch: 0|step: 2629|ppo_ep: 1|act_loss: 0.1312255859375|cri_loss: 0.07373046875|unsuper_loss: 0.0
average reward score: 4.046875
-------------------------------------------------------------------------------------
|E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.28%) |Training time=0.65s (19.70%) |Others=0.20 (6.01%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.38
epoch: 0|step: 2630|ppo_ep: 1|act_loss: 0.1651611328125|cri_loss: 0.0904541015625|unsuper_loss: 0.0
average reward score: 4.78515625
-------------------------------------------------------------------------------------
|E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.43s (73.78%) |Training time=0.66s (20.08%) |Others=0.20 (6.15%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.38
epoch: 0|step: 2631|ppo_ep: 1|act_loss: 0.13427734375|cri_loss: 0.0789794921875|unsuper_loss: 0.0
average reward score: 4.8203125
-------------------------------------------------------------------------------------
|E2E latency=3.66s |Gather latency=0.00s (0.00%) |Generate time=2.45s (66.89%) |Training time=0.93s (25.51%) |Others=0.28 (7.60%)|CurSamplesPerSec=2.19 |AvgSamplesPerSec=2.38
epoch: 0|step: 2632|ppo_ep: 1|act_loss: 0.1329345703125|cri_loss: 0.0711669921875|unsuper_loss: 0.0
average reward score: 3.328125
-------------------------------------------------------------------------------------
|E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.54%) |Training time=0.64s (19.57%) |Others=0.19 (5.89%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.38
epoch: 0|step: 2633|ppo_ep: 1|act_loss: -0.04144287109375|cri_loss: -0.0086822509765625|unsuper_loss: 0.0
average reward score: 4.51171875
-------------------------------------------------------------------------------------
|E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.49%) |Training time=0.65s (19.65%) |Others=0.19 (5.86%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.38
epoch: 0|step: 2634|ppo_ep: 1|act_loss: 0.0950927734375|cri_loss: 0.056243896484375|unsuper_loss: 0.0
average reward score: 3.404296875
-------------------------------------------------------------------------------------
|E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.28%) |Training time=0.65s (19.64%) |Others=0.20 (6.08%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.38
epoch: 0|step: 2635|ppo_ep: 1|act_loss: 0.018280029296875|cri_loss: 0.032745361328125|unsuper_loss: 0.0
average reward score: 5.01953125
-------------------------------------------------------------------------------------
|E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.34%) |Training time=0.64s (19.71%) |Others=0.19 (5.95%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.38
epoch: 0|step: 2636|ppo_ep: 1|act_loss: 0.1197509765625|cri_loss: 0.06549072265625|unsuper_loss: 0.0
average reward score: 5.0859375
-------------------------------------------------------------------------------------
|E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.36s (73.59%) |Training time=0.65s (20.10%) |Others=0.20 (6.31%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.38
epoch: 0|step: 2637|ppo_ep: 1|act_loss: 0.12890625|cri_loss: 0.073974609375|unsuper_loss: 0.0
average reward score: 4.00390625
-------------------------------------------------------------------------------------
|E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.00%) |Training time=0.65s (20.09%) |Others=0.19 (5.91%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.38
epoch: 0|step: 2638|ppo_ep: 1|act_loss: 0.155029296875|cri_loss: 0.08355712890625|unsuper_loss: 0.0
average reward score: 4.125
-------------------------------------------------------------------------------------
|E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.34%) |Training time=0.65s (19.62%) |Others=0.20 (6.04%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.38
[2023-04-24 16:15:53,015] [INFO] [logging.py:96:log_dist] [Rank 0] step=330, skipped=5, lr=[3.3570163601114465e-06, 3.3570163601114465e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-04-24 16:15:53,262] [INFO] [timer.py:199:stop] epoch=0/micro_step=2640/global_step=330, RunningAvgSamplesPerSec=15.373201199170959, CurrSamplesPerSec=15.766728350835967, MemAllocated=20.44GB, MaxMemAllocated=31.45GB
[2023-04-24 16:15:53,463] [INFO] [logging.py:96:log_dist] [Rank 0] step=330, skipped=4, lr=[1.7195154812705344e-06, 1.7195154812705344e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 2639|ppo_ep: 1|act_loss: 0.07598876953125|cri_loss: 0.0458984375|unsuper_loss: 0.0
average reward score: 3.890625
-------------------------------------------------------------------------------------
|E2E latency=3.68s |Gather latency=0.00s (0.00%) |Generate time=2.46s (66.80%) |Training time=0.94s (25.69%) |Others=0.28 (7.51%)|CurSamplesPerSec=2.18 |AvgSamplesPerSec=2.38
epoch: 0|step: 2640|ppo_ep: 1|act_loss: 0.07220458984375|cri_loss: 0.046417236328125|unsuper_loss: 0.0
average reward score: 4.66015625
-------------------------------------------------------------------------------------
|E2E latency=3.34s |Gather latency=0.00s (0.00%) |Generate time=2.49s (74.38%) |Training time=0.65s (19.30%) |Others=0.21 (6.32%)|CurSamplesPerSec=2.39 |AvgSamplesPerSec=2.38
epoch: 0|step: 2641|ppo_ep: 1|act_loss: 0.0989990234375|cri_loss: 0.05682373046875|unsuper_loss: 0.0
average reward score: 4.7890625
-------------------------------------------------------------------------------------
|E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.27%) |Training time=0.64s (19.70%) |Others=0.20 (6.03%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.38
epoch: 0|step: 2642|ppo_ep: 1|act_loss: -0.12030029296875|cri_loss: -0.05377197265625|unsuper_loss: 0.0
average reward score: 4.19140625
-------------------------------------------------------------------------------------
|E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.44s (73.99%) |Training time=0.65s (19.69%) |Others=0.21 (6.32%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.38
epoch: 0|step: 2643|ppo_ep: 1|act_loss: -0.03118896484375|cri_loss: -0.00778961181640625|unsuper_loss: 0.0
average reward score: 4.1328125
-------------------------------------------------------------------------------------
|E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.00%) |Training time=0.65s (20.06%) |Others=0.19 (5.94%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.38
epoch: 0|step: 2644|ppo_ep: 1|act_loss: 0.240478515625|cri_loss: 0.1292724609375|unsuper_loss: 0.0
average reward score: 4.16796875
-------------------------------------------------------------------------------------
|E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.23%) |Training time=0.64s (19.62%) |Others=0.20 (6.14%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.38
epoch: 0|step: 2645|ppo_ep: 1|act_loss: -0.04083251953125|cri_loss: -0.0100860595703125|unsuper_loss: 0.0
average reward score: 4.78515625
-------------------------------------------------------------------------------------
|E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.38%) |Training time=0.64s (19.66%) |Others=0.19 (5.95%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.38
epoch: 0|step: 2646|ppo_ep: 1|act_loss: 0.1025390625|cri_loss: 0.057464599609375|unsuper_loss: 0.0
average reward score: 5.484375
-------------------------------------------------------------------------------------
|E2E latency=3.19s |Gather latency=0.00s (0.00%) |Generate time=2.34s (73.36%) |Training time=0.65s (20.32%) |Others=0.20 (6.32%)|CurSamplesPerSec=2.51 |AvgSamplesPerSec=2.38
epoch: 0|step: 2647|ppo_ep: 1|act_loss: 0.0745849609375|cri_loss: 0.0491943359375|unsuper_loss: 0.0
average reward score: 4.2109375
-------------------------------------------------------------------------------------
|E2E latency=3.64s |Gather latency=0.00s (0.00%) |Generate time=2.43s (66.83%) |Training time=0.93s (25.51%) |Others=0.28 (7.66%)|CurSamplesPerSec=2.20 |AvgSamplesPerSec=2.38
epoch: 0|step: 2648|ppo_ep: 1|act_loss: 0.092529296875|cri_loss: 0.057159423828125|unsuper_loss: 0.0
average reward score: 4.4609375
-------------------------------------------------------------------------------------
|E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.39s (73.47%) |Training time=0.64s (19.74%) |Others=0.22 (6.78%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.38
epoch: 0|step: 2649|ppo_ep: 1|act_loss: 0.1058349609375|cri_loss: 0.068115234375|unsuper_loss: 0.0
average reward score: 3.6171875
-------------------------------------------------------------------------------------
|E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.40s (73.78%) |Training time=0.65s (20.11%) |Others=0.20 (6.11%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.38
epoch: 0|step: 2650|ppo_ep: 1|act_loss: 0.07867431640625|cri_loss: 0.048095703125|unsuper_loss: 0.0
average reward score: 4.015625
-------------------------------------------------------------------------------------
|E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.24%) |Training time=0.64s (19.66%) |Others=0.20 (6.09%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.38
epoch: 0|step: 2651|ppo_ep: 1|act_loss: 0.0004730224609375|cri_loss: 0.009552001953125|unsuper_loss: 0.0
average reward score: 3.2890625
-------------------------------------------------------------------------------------
|E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.36s (73.63%) |Training time=0.65s (20.41%) |Others=0.19 (5.96%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.38
epoch: 0|step: 2652|ppo_ep: 1|act_loss: 0.131103515625|cri_loss: 0.07086181640625|unsuper_loss: 0.0
average reward score: 2.80859375
-------------------------------------------------------------------------------------
|E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.35s (73.62%) |Training time=0.64s (20.03%) |Others=0.20 (6.35%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.38
epoch: 0|step: 2653|ppo_ep: 1|act_loss: 0.1763916015625|cri_loss: 0.1043701171875|unsuper_loss: 0.0
average reward score: 3.109375
-------------------------------------------------------------------------------------
|E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.39s (73.53%) |Training time=0.66s (20.26%) |Others=0.20 (6.21%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.38
epoch: 0|step: 2654|ppo_ep: 1|act_loss: 0.0677490234375|cri_loss: 0.058258056640625|unsuper_loss: 0.0
average reward score: 3.13671875
-------------------------------------------------------------------------------------
|E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.37s (73.65%) |Training time=0.65s (20.31%) |Others=0.19 
(6.04%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.38 epoch: 0|step: 2655|ppo_ep: 1|act_loss: 0.1533203125|cri_loss: 0.08441162109375|unsuper_loss: 0.0 average reward score: 5.703125 ------------------------------------------------------------------------------------- |E2E latency=3.61s |Gather latency=0.00s (0.00%) |Generate time=2.40s (66.31%) |Training time=0.94s (25.96%) |Others=0.28 (7.73%)|CurSamplesPerSec=2.21 |AvgSamplesPerSec=2.38 epoch: 0|step: 2656|ppo_ep: 1|act_loss: -0.0042572021484375|cri_loss: 0.00921630859375|unsuper_loss: 0.0 average reward score: 3.59765625 ------------------------------------------------------------------------------------- |E2E latency=3.17s |Gather latency=0.00s (0.00%) |Generate time=2.35s (73.91%) |Training time=0.64s (20.22%) |Others=0.19 (5.86%)|CurSamplesPerSec=2.52 |AvgSamplesPerSec=2.38 epoch: 0|step: 2657|ppo_ep: 1|act_loss: 0.0162353515625|cri_loss: 0.01285552978515625|unsuper_loss: 0.0 average reward score: 3.609375 ------------------------------------------------------------------------------------- |E2E latency=3.17s |Gather latency=0.00s (0.00%) |Generate time=2.33s (73.63%) |Training time=0.64s (20.30%) |Others=0.19 (6.07%)|CurSamplesPerSec=2.53 |AvgSamplesPerSec=2.38 epoch: 0|step: 2658|ppo_ep: 1|act_loss: 0.08966064453125|cri_loss: 0.05352783203125|unsuper_loss: 0.0 average reward score: 4.6015625 ------------------------------------------------------------------------------------- |E2E latency=3.19s |Gather latency=0.00s (0.00%) |Generate time=2.35s (73.48%) |Training time=0.65s (20.40%) |Others=0.20 (6.13%)|CurSamplesPerSec=2.51 |AvgSamplesPerSec=2.38 epoch: 0|step: 2659|ppo_ep: 1|act_loss: 0.085693359375|cri_loss: 0.0506591796875|unsuper_loss: 0.0 average reward score: 4.26953125 ------------------------------------------------------------------------------------- |E2E latency=3.13s |Gather latency=0.00s (0.00%) |Generate time=2.30s (73.42%) |Training time=0.64s (20.50%) |Others=0.19 
(6.08%)|CurSamplesPerSec=2.56 |AvgSamplesPerSec=2.38 epoch: 0|step: 2660|ppo_ep: 1|act_loss: -0.0028839111328125|cri_loss: 0.0134735107421875|unsuper_loss: 0.0 average reward score: 3.607421875 ------------------------------------------------------------------------------------- |E2E latency=3.18s |Gather latency=0.00s (0.00%) |Generate time=2.34s (73.66%) |Training time=0.64s (20.26%) |Others=0.19 (6.08%)|CurSamplesPerSec=2.52 |AvgSamplesPerSec=2.38 epoch: 0|step: 2661|ppo_ep: 1|act_loss: 0.0771484375|cri_loss: 0.05450439453125|unsuper_loss: 0.0 average reward score: 4.5625 ------------------------------------------------------------------------------------- |E2E latency=3.14s |Gather latency=0.00s (0.00%) |Generate time=2.29s (73.14%) |Training time=0.65s (20.64%) |Others=0.19 (6.22%)|CurSamplesPerSec=2.55 |AvgSamplesPerSec=2.38 epoch: 0|step: 2662|ppo_ep: 1|act_loss: -0.0245208740234375|cri_loss: -0.0023193359375|unsuper_loss: 0.0 average reward score: 3.8359375 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.48%) |Training time=0.64s (19.47%) |Others=0.20 (6.05%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.38 epoch: 0|step: 2663|ppo_ep: 1|act_loss: 0.1298828125|cri_loss: 0.071533203125|unsuper_loss: 0.0 average reward score: 5.27734375 ------------------------------------------------------------------------------------- |E2E latency=3.53s |Gather latency=0.00s (0.00%) |Generate time=2.33s (65.94%) |Training time=0.93s (26.34%) |Others=0.27 (7.72%)|CurSamplesPerSec=2.27 |AvgSamplesPerSec=2.38 epoch: 0|step: 2664|ppo_ep: 1|act_loss: 0.0966796875|cri_loss: 0.0576171875|unsuper_loss: 0.0 average reward score: 4.4375 ------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.44s (73.81%) |Training time=0.66s (20.11%) |Others=0.20 (6.08%)|CurSamplesPerSec=2.42 
|AvgSamplesPerSec=2.38 epoch: 0|step: 2665|ppo_ep: 1|act_loss: 0.126953125|cri_loss: 0.07135009765625|unsuper_loss: 0.0 average reward score: 3.0 ------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.33s (72.75%) |Training time=0.67s (21.02%) |Others=0.20 (6.23%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.38 epoch: 0|step: 2666|ppo_ep: 1|act_loss: 0.047393798828125|cri_loss: 0.046661376953125|unsuper_loss: 0.0 average reward score: 3.966796875 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.10%) |Training time=0.65s (19.71%) |Others=0.20 (6.19%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.38 epoch: 0|step: 2667|ppo_ep: 1|act_loss: -0.041412353515625|cri_loss: -0.01497650146484375|unsuper_loss: 0.0 average reward score: 3.890625 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.12%) |Training time=0.65s (19.90%) |Others=0.19 (5.98%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.38 epoch: 0|step: 2668|ppo_ep: 1|act_loss: -0.08807373046875|cri_loss: -0.03570556640625|unsuper_loss: 0.0 average reward score: 3.9375 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.41s (73.97%) |Training time=0.65s (19.85%) |Others=0.20 (6.18%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.38 epoch: 0|step: 2669|ppo_ep: 1|act_loss: 0.03668212890625|cri_loss: 0.02752685546875|unsuper_loss: 0.0 average reward score: 3.408203125 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.44s (73.55%) |Training time=0.65s (19.59%) |Others=0.23 (6.86%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.38 
epoch: 0|step: 2670|ppo_ep: 1|act_loss: 0.12213134765625|cri_loss: 0.0716552734375|unsuper_loss: 0.0 average reward score: 4.109375 ------------------------------------------------------------------------------------- |E2E latency=3.34s |Gather latency=0.00s (0.00%) |Generate time=2.44s (73.10%) |Training time=0.69s (20.67%) |Others=0.21 (6.24%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.38 epoch: 0|step: 2671|ppo_ep: 1|act_loss: -0.07568359375|cri_loss: -0.030853271484375|unsuper_loss: 0.0 average reward score: 3.26953125 ------------------------------------------------------------------------------------- |E2E latency=3.67s |Gather latency=0.00s (0.00%) |Generate time=2.46s (67.01%) |Training time=0.93s (25.42%) |Others=0.28 (7.57%)|CurSamplesPerSec=2.18 |AvgSamplesPerSec=2.38 epoch: 0|step: 2672|ppo_ep: 1|act_loss: -0.0196533203125|cri_loss: -0.0016021728515625|unsuper_loss: 0.0 average reward score: 4.27734375 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.39%) |Training time=0.64s (19.63%) |Others=0.19 (5.98%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.38 epoch: 0|step: 2673|ppo_ep: 1|act_loss: 0.08349609375|cri_loss: 0.046722412109375|unsuper_loss: 0.0 average reward score: 4.0859375 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.35s (72.07%) |Training time=0.70s (21.44%) |Others=0.21 (6.49%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.38 epoch: 0|step: 2674|ppo_ep: 1|act_loss: 0.0521240234375|cri_loss: 0.03485107421875|unsuper_loss: 0.0 average reward score: 4.56640625 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.47s (74.71%) |Training time=0.64s (19.32%) |Others=0.20 (5.97%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.38 epoch: 0|step: 
2675|ppo_ep: 1|act_loss: 0.173095703125|cri_loss: 0.09979248046875|unsuper_loss: 0.0 average reward score: 3.4453125 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.41s (73.90%) |Training time=0.65s (19.94%) |Others=0.20 (6.16%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.38 epoch: 0|step: 2676|ppo_ep: 1|act_loss: -0.03985595703125|cri_loss: -0.0107269287109375|unsuper_loss: 0.0 average reward score: 4.09375 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.41s (73.47%) |Training time=0.67s (20.37%) |Others=0.20 (6.16%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.38 epoch: 0|step: 2677|ppo_ep: 1|act_loss: -0.047882080078125|cri_loss: -0.02032470703125|unsuper_loss: 0.0 average reward score: 4.09375 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.01%) |Training time=0.64s (19.91%) |Others=0.20 (6.08%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.38 epoch: 0|step: 2678|ppo_ep: 1|act_loss: -0.0022125244140625|cri_loss: 0.004734039306640625|unsuper_loss: 0.0 average reward score: 4.0078125 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.20%) |Training time=0.64s (19.88%) |Others=0.19 (5.92%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.38 epoch: 0|step: 2679|ppo_ep: 1|act_loss: 0.184814453125|cri_loss: 0.1021728515625|unsuper_loss: 0.0 average reward score: 4.296875 ------------------------------------------------------------------------------------- |E2E latency=3.60s |Gather latency=0.00s (0.00%) |Generate time=2.40s (66.60%) |Training time=0.92s (25.66%) |Others=0.28 (7.73%)|CurSamplesPerSec=2.22 |AvgSamplesPerSec=2.38 epoch: 0|step: 2680|ppo_ep: 
1|act_loss: -0.03887939453125|cri_loss: -0.0113525390625|unsuper_loss: 0.0 average reward score: 3.44140625 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.32%) |Training time=0.64s (19.75%) |Others=0.19 (5.93%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.38 epoch: 0|step: 2681|ppo_ep: 1|act_loss: -0.01287841796875|cri_loss: 0.0019683837890625|unsuper_loss: 0.0 average reward score: 4.33984375 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.10%) |Training time=0.65s (19.84%) |Others=0.20 (6.06%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.38 epoch: 0|step: 2682|ppo_ep: 1|act_loss: -0.0303192138671875|cri_loss: -0.0038604736328125|unsuper_loss: 0.0 average reward score: 4.0234375 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.21%) |Training time=0.64s (19.77%) |Others=0.19 (6.01%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.38 epoch: 0|step: 2683|ppo_ep: 1|act_loss: 0.1480712890625|cri_loss: 0.0885009765625|unsuper_loss: 0.0 average reward score: 4.28125 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.41s (73.78%) |Training time=0.66s (20.28%) |Others=0.19 (5.94%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.38 epoch: 0|step: 2684|ppo_ep: 1|act_loss: 0.02532958984375|cri_loss: 0.020843505859375|unsuper_loss: 0.0 average reward score: 4.71875 ------------------------------------------------------------------------------------- |E2E latency=3.34s |Gather latency=0.00s (0.00%) |Generate time=2.42s (72.60%) |Training time=0.72s (21.58%) |Others=0.19 (5.83%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.38 epoch: 0|step: 2685|ppo_ep: 1|act_loss: 
0.1162109375|cri_loss: 0.06689453125|unsuper_loss: 0.0 average reward score: 2.83984375 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.10%) |Training time=0.64s (19.66%) |Others=0.20 (6.23%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.38 epoch: 0|step: 2686|ppo_ep: 1|act_loss: 0.031494140625|cri_loss: 0.02325439453125|unsuper_loss: 0.0 average reward score: 3.462890625 ------------------------------------------------------------------------------------- |E2E latency=3.45s |Gather latency=0.00s (0.00%) |Generate time=2.40s (69.50%) |Training time=0.85s (24.56%) |Others=0.21 (5.94%)|CurSamplesPerSec=2.32 |AvgSamplesPerSec=2.38 epoch: 0|step: 2687|ppo_ep: 1|act_loss: -0.1632080078125|cri_loss: -0.070068359375|unsuper_loss: 0.0 average reward score: 4.2265625 ------------------------------------------------------------------------------------- |E2E latency=3.62s |Gather latency=0.00s (0.00%) |Generate time=2.41s (66.63%) |Training time=0.93s (25.63%) |Others=0.28 (7.74%)|CurSamplesPerSec=2.21 |AvgSamplesPerSec=2.38 epoch: 0|step: 2688|ppo_ep: 1|act_loss: 0.05908203125|cri_loss: 0.0394287109375|unsuper_loss: 0.0 average reward score: 3.28515625 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.42s (73.80%) |Training time=0.66s (20.16%) |Others=0.20 (6.04%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.38 epoch: 0|step: 2689|ppo_ep: 1|act_loss: 0.1409912109375|cri_loss: 0.0916748046875|unsuper_loss: 0.0 average reward score: 4.34375 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.04%) |Training time=0.65s (19.82%) |Others=0.20 (6.14%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.38 epoch: 0|step: 2690|ppo_ep: 1|act_loss: -0.0694580078125|cri_loss: 
-0.022003173828125|unsuper_loss: 0.0 average reward score: 4.0625 ------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.40s (72.90%) |Training time=0.68s (20.77%) |Others=0.21 (6.33%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.38 epoch: 0|step: 2691|ppo_ep: 1|act_loss: -0.0592041015625|cri_loss: -0.0205841064453125|unsuper_loss: 0.0 average reward score: 4.875 ------------------------------------------------------------------------------------- |E2E latency=3.33s |Gather latency=0.00s (0.00%) |Generate time=2.44s (73.33%) |Training time=0.68s (20.48%) |Others=0.21 (6.20%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.38 epoch: 0|step: 2692|ppo_ep: 1|act_loss: 0.002254486083984375|cri_loss: 0.009124755859375|unsuper_loss: 0.0 average reward score: 4.6484375 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.43s (73.84%) |Training time=0.66s (20.00%) |Others=0.20 (6.16%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.38 epoch: 0|step: 2693|ppo_ep: 1|act_loss: 0.1929931640625|cri_loss: 0.11138916015625|unsuper_loss: 0.0 average reward score: 4.25390625 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.11%) |Training time=0.66s (19.84%) |Others=0.20 (6.05%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.38 epoch: 0|step: 2694|ppo_ep: 1|act_loss: -0.0117034912109375|cri_loss: 0.0025482177734375|unsuper_loss: 0.0 average reward score: 4.25390625 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.41s (73.59%) |Training time=0.66s (20.19%) |Others=0.20 (6.22%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.38 epoch: 0|step: 2695|ppo_ep: 1|act_loss: -0.06207275390625|cri_loss: 
-0.0207366943359375|unsuper_loss: 0.0 average reward score: 4.1484375 ------------------------------------------------------------------------------------- |E2E latency=3.72s |Gather latency=0.00s (0.00%) |Generate time=2.43s (65.39%) |Training time=0.99s (26.51%) |Others=0.30 (8.11%)|CurSamplesPerSec=2.15 |AvgSamplesPerSec=2.38 epoch: 0|step: 2696|ppo_ep: 1|act_loss: 0.0138397216796875|cri_loss: 0.0188140869140625|unsuper_loss: 0.0 average reward score: 4.50390625 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.32%) |Training time=0.64s (19.78%) |Others=0.19 (5.89%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.38 epoch: 0|step: 2697|ppo_ep: 1|act_loss: -0.04974365234375|cri_loss: -0.0174102783203125|unsuper_loss: 0.0 average reward score: 3.75390625 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.38s (74.06%) |Training time=0.64s (19.97%) |Others=0.19 (5.98%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.38 epoch: 0|step: 2698|ppo_ep: 1|act_loss: -0.09063720703125|cri_loss: -0.03350830078125|unsuper_loss: 0.0 average reward score: 3.92578125 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.40s (72.16%) |Training time=0.73s (21.99%) |Others=0.19 (5.85%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.38 epoch: 0|step: 2699|ppo_ep: 1|act_loss: -0.1533203125|cri_loss: -0.06414794921875|unsuper_loss: 0.0 average reward score: 3.76953125 ------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.37s (71.74%) |Training time=0.74s (22.40%) |Others=0.19 (5.87%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.38 epoch: 0|step: 2700|ppo_ep: 1|act_loss: 0.034271240234375|cri_loss: 
0.0255126953125|unsuper_loss: 0.0 average reward score: 5.5859375 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.08%) |Training time=0.64s (19.72%) |Others=0.20 (6.20%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.38 epoch: 0|step: 2701|ppo_ep: 1|act_loss: -0.006622314453125|cri_loss: 0.01080322265625|unsuper_loss: 0.0 average reward score: 3.1328125 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.34%) |Training time=0.64s (19.81%) |Others=0.19 (5.85%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.38 epoch: 0|step: 2702|ppo_ep: 1|act_loss: 0.000457763671875|cri_loss: 0.020965576171875|unsuper_loss: 0.0 average reward score: 3.3203125 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.42s (73.55%) |Training time=0.67s (20.37%) |Others=0.20 (6.09%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.38 epoch: 0|step: 2703|ppo_ep: 1|act_loss: -0.041229248046875|cri_loss: -0.009979248046875|unsuper_loss: 0.0 average reward score: 4.7890625 ------------------------------------------------------------------------------------- |E2E latency=3.61s |Gather latency=0.00s (0.00%) |Generate time=2.40s (66.26%) |Training time=0.94s (26.05%) |Others=0.28 (7.69%)|CurSamplesPerSec=2.21 |AvgSamplesPerSec=2.38 epoch: 0|step: 2704|ppo_ep: 1|act_loss: -0.04736328125|cri_loss: -0.0140838623046875|unsuper_loss: 0.0 average reward score: 3.869140625 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.41%) |Training time=0.64s (19.52%) |Others=0.20 (6.07%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.38 epoch: 0|step: 2705|ppo_ep: 1|act_loss: -0.057891845703125|cri_loss: 
-0.01629638671875|unsuper_loss: 0.0 average reward score: 3.83984375 ------------------------------------------------------------------------------------- |E2E latency=3.34s |Gather latency=0.00s (0.00%) |Generate time=2.47s (74.13%) |Training time=0.64s (19.30%) |Others=0.22 (6.57%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.38 epoch: 0|step: 2706|ppo_ep: 1|act_loss: -0.07208251953125|cri_loss: -0.025726318359375|unsuper_loss: 0.0 average reward score: 3.73046875 ------------------------------------------------------------------------------------- |E2E latency=3.41s |Gather latency=0.00s (0.00%) |Generate time=2.53s (74.09%) |Training time=0.66s (19.47%) |Others=0.22 (6.44%)|CurSamplesPerSec=2.35 |AvgSamplesPerSec=2.38 epoch: 0|step: 2707|ppo_ep: 1|act_loss: 0.0877685546875|cri_loss: 0.05316162109375|unsuper_loss: 0.0 average reward score: 4.2578125 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.41s (72.48%) |Training time=0.70s (21.03%) |Others=0.22 (6.49%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.38 epoch: 0|step: 2708|ppo_ep: 1|act_loss: -0.083740234375|cri_loss: -0.034393310546875|unsuper_loss: 0.0 average reward score: 3.33203125 ------------------------------------------------------------------------------------- |E2E latency=3.36s |Gather latency=0.00s (0.00%) |Generate time=2.47s (73.48%) |Training time=0.67s (19.94%) |Others=0.22 (6.57%)|CurSamplesPerSec=2.38 |AvgSamplesPerSec=2.38 epoch: 0|step: 2709|ppo_ep: 1|act_loss: -0.0936279296875|cri_loss: -0.037841796875|unsuper_loss: 0.0 average reward score: 4.1328125 ------------------------------------------------------------------------------------- |E2E latency=3.43s |Gather latency=0.00s (0.00%) |Generate time=2.54s (74.16%) |Training time=0.68s (19.92%) |Others=0.20 (5.92%)|CurSamplesPerSec=2.33 |AvgSamplesPerSec=2.38 epoch: 0|step: 2710|ppo_ep: 1|act_loss: -0.00994873046875|cri_loss: 
8.392333984375e-05|unsuper_loss: 0.0 average reward score: 4.4375 ------------------------------------------------------------------------------------- |E2E latency=3.45s |Gather latency=0.00s (0.00%) |Generate time=2.46s (71.39%) |Training time=0.77s (22.43%) |Others=0.21 (6.18%)|CurSamplesPerSec=2.32 |AvgSamplesPerSec=2.38 epoch: 0|step: 2711|ppo_ep: 1|act_loss: -0.0550537109375|cri_loss: -0.018890380859375|unsuper_loss: 0.0 average reward score: 4.40234375 ------------------------------------------------------------------------------------- |E2E latency=3.68s |Gather latency=0.00s (0.00%) |Generate time=2.46s (66.71%) |Training time=0.93s (25.26%) |Others=0.30 (8.04%)|CurSamplesPerSec=2.17 |AvgSamplesPerSec=2.38 epoch: 0|step: 2712|ppo_ep: 1|act_loss: -0.04150390625|cri_loss: -0.012115478515625|unsuper_loss: 0.0 average reward score: 3.802734375 ------------------------------------------------------------------------------------- |E2E latency=3.34s |Gather latency=0.00s (0.00%) |Generate time=2.50s (74.88%) |Training time=0.64s (19.13%) |Others=0.20 (5.99%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.38 epoch: 0|step: 2713|ppo_ep: 1|act_loss: -0.004608154296875|cri_loss: 0.005706787109375|unsuper_loss: 0.0 average reward score: 4.5546875 ------------------------------------------------------------------------------------- |E2E latency=3.38s |Gather latency=0.00s (0.00%) |Generate time=2.54s (75.12%) |Training time=0.64s (19.07%) |Others=0.20 (5.81%)|CurSamplesPerSec=2.37 |AvgSamplesPerSec=2.38 epoch: 0|step: 2714|ppo_ep: 1|act_loss: -0.15673828125|cri_loss: -0.07220458984375|unsuper_loss: 0.0 average reward score: 4.63671875 ------------------------------------------------------------------------------------- |E2E latency=3.37s |Gather latency=0.00s (0.00%) |Generate time=2.52s (74.73%) |Training time=0.64s (19.06%) |Others=0.21 (6.21%)|CurSamplesPerSec=2.38 |AvgSamplesPerSec=2.38 epoch: 0|step: 2715|ppo_ep: 1|act_loss: -0.022369384765625|cri_loss: 
-0.00054931640625|unsuper_loss: 0.0 average reward score: 4.18359375 ------------------------------------------------------------------------------------- |E2E latency=3.35s |Gather latency=0.00s (0.00%) |Generate time=2.48s (73.97%) |Training time=0.64s (19.23%) |Others=0.23 (6.80%)|CurSamplesPerSec=2.39 |AvgSamplesPerSec=2.38 epoch: 0|step: 2716|ppo_ep: 1|act_loss: -0.021331787109375|cri_loss: -0.00048828125|unsuper_loss: 0.0 average reward score: 4.01171875 ------------------------------------------------------------------------------------- |E2E latency=3.41s |Gather latency=0.00s (0.00%) |Generate time=2.54s (74.49%) |Training time=0.65s (19.19%) |Others=0.22 (6.32%)|CurSamplesPerSec=2.35 |AvgSamplesPerSec=2.38 epoch: 0|step: 2717|ppo_ep: 1|act_loss: 0.0224761962890625|cri_loss: 0.0208892822265625|unsuper_loss: 0.0 average reward score: 3.05859375 ------------------------------------------------------------------------------------- |E2E latency=3.36s |Gather latency=0.00s (0.00%) |Generate time=2.51s (74.74%) |Training time=0.65s (19.22%) |Others=0.20 (6.05%)|CurSamplesPerSec=2.38 |AvgSamplesPerSec=2.38 epoch: 0|step: 2718|ppo_ep: 1|act_loss: -0.0361328125|cri_loss: -0.008453369140625|unsuper_loss: 0.0 average reward score: 4.5 ------------------------------------------------------------------------------------- |E2E latency=3.36s |Gather latency=0.00s (0.00%) |Generate time=2.50s (74.34%) |Training time=0.65s (19.20%) |Others=0.22 (6.46%)|CurSamplesPerSec=2.38 |AvgSamplesPerSec=2.38 [2023-04-24 16:20:19,012] [INFO] [logging.py:96:log_dist] [Rank 0] step=340, skipped=5, lr=[2.978552438838442e-06, 2.978552438838442e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-04-24 16:20:19,257] [INFO] [timer.py:199:stop] epoch=0/micro_step=2720/global_step=340, RunningAvgSamplesPerSec=15.374213325005206, CurrSamplesPerSec=15.794919120866421, MemAllocated=20.44GB, MaxMemAllocated=31.45GB [2023-04-24 16:20:19,469] [INFO] [logging.py:96:log_dist] [Rank 0] step=340, skipped=4, 
lr=[1.5240268120912631e-06, 1.5240268120912631e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 2719|ppo_ep: 1|act_loss: -0.048065185546875|cri_loss: -0.014007568359375|unsuper_loss: 0.0 average reward score: 5.2890625 ------------------------------------------------------------------------------------- |E2E latency=3.75s |Gather latency=0.00s (0.00%) |Generate time=2.52s (67.29%) |Training time=0.94s (25.03%) |Others=0.29 (7.69%)|CurSamplesPerSec=2.13 |AvgSamplesPerSec=2.38 epoch: 0|step: 2720|ppo_ep: 1|act_loss: 0.0001220703125|cri_loss: 0.00981903076171875|unsuper_loss: 0.0 average reward score: 4.7109375 ------------------------------------------------------------------------------------- |E2E latency=3.38s |Gather latency=0.00s (0.00%) |Generate time=2.52s (74.59%) |Training time=0.65s (19.14%) |Others=0.21 (6.26%)|CurSamplesPerSec=2.37 |AvgSamplesPerSec=2.38 epoch: 0|step: 2721|ppo_ep: 1|act_loss: -0.012481689453125|cri_loss: 0.00162506103515625|unsuper_loss: 0.0 average reward score: 4.75 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.37%) |Training time=0.65s (19.64%) |Others=0.20 (5.99%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.38 epoch: 0|step: 2722|ppo_ep: 1|act_loss: -0.0183563232421875|cri_loss: -0.0034332275390625|unsuper_loss: 0.0 average reward score: 4.828125 ------------------------------------------------------------------------------------- |E2E latency=3.34s |Gather latency=0.00s (0.00%) |Generate time=2.48s (74.32%) |Training time=0.64s (19.15%) |Others=0.22 (6.53%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.38 epoch: 0|step: 2723|ppo_ep: 1|act_loss: 0.078857421875|cri_loss: 0.04913330078125|unsuper_loss: 0.0 average reward score: 4.65234375 ------------------------------------------------------------------------------------- |E2E latency=3.66s |Gather latency=0.00s (0.00%) |Generate time=2.53s (69.27%) |Training time=0.90s 
(24.53%) |Others=0.23 (6.20%)|CurSamplesPerSec=2.19 |AvgSamplesPerSec=2.38 epoch: 0|step: 2724|ppo_ep: 1|act_loss: 0.048614501953125|cri_loss: 0.037445068359375|unsuper_loss: 0.0 average reward score: 4.4375 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.07%) |Training time=0.64s (19.30%) |Others=0.22 (6.62%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.38 epoch: 0|step: 2725|ppo_ep: 1|act_loss: 0.14794921875|cri_loss: 0.09185791015625|unsuper_loss: 0.0 average reward score: 4.21875 ------------------------------------------------------------------------------------- |E2E latency=3.43s |Gather latency=0.00s (0.00%) |Generate time=2.56s (74.68%) |Training time=0.65s (18.93%) |Others=0.22 (6.39%)|CurSamplesPerSec=2.33 |AvgSamplesPerSec=2.38 epoch: 0|step: 2726|ppo_ep: 1|act_loss: -0.065185546875|cri_loss: -0.02386474609375|unsuper_loss: 0.0 average reward score: 4.515625 ------------------------------------------------------------------------------------- |E2E latency=3.36s |Gather latency=0.00s (0.00%) |Generate time=2.49s (74.08%) |Training time=0.64s (19.13%) |Others=0.23 (6.80%)|CurSamplesPerSec=2.38 |AvgSamplesPerSec=2.38 epoch: 0|step: 2727|ppo_ep: 1|act_loss: 0.0138702392578125|cri_loss: 0.01139068603515625|unsuper_loss: 0.0 average reward score: 4.1171875 ------------------------------------------------------------------------------------- |E2E latency=3.82s |Gather latency=0.00s (0.00%) |Generate time=2.56s (67.09%) |Training time=0.97s (25.46%) |Others=0.28 (7.45%)|CurSamplesPerSec=2.09 |AvgSamplesPerSec=2.38 epoch: 0|step: 2728|ppo_ep: 1|act_loss: 0.02947998046875|cri_loss: 0.0299530029296875|unsuper_loss: 0.0 average reward score: 4.1015625 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.45s (73.86%) |Training time=0.66s (19.79%) |Others=0.21 
(6.34%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.38 epoch: 0|step: 2729|ppo_ep: 1|act_loss: -0.00742340087890625|cri_loss: 0.00307464599609375|unsuper_loss: 0.0 average reward score: 4.15234375 ------------------------------------------------------------------------------------- |E2E latency=3.37s |Gather latency=0.00s (0.00%) |Generate time=2.52s (74.64%) |Training time=0.65s (19.29%) |Others=0.20 (6.07%)|CurSamplesPerSec=2.37 |AvgSamplesPerSec=2.38 epoch: 0|step: 2730|ppo_ep: 1|act_loss: -0.0283203125|cri_loss: -0.009552001953125|unsuper_loss: 0.0 average reward score: 3.546875 ------------------------------------------------------------------------------------- |E2E latency=3.36s |Gather latency=0.00s (0.00%) |Generate time=2.50s (74.41%) |Training time=0.65s (19.32%) |Others=0.21 (6.28%)|CurSamplesPerSec=2.38 |AvgSamplesPerSec=2.38 epoch: 0|step: 2731|ppo_ep: 1|act_loss: -0.0268402099609375|cri_loss: -0.004852294921875|unsuper_loss: 0.0 average reward score: 4.12890625 ------------------------------------------------------------------------------------- |E2E latency=3.43s |Gather latency=0.00s (0.00%) |Generate time=2.56s (74.42%) |Training time=0.66s (19.09%) |Others=0.22 (6.49%)|CurSamplesPerSec=2.33 |AvgSamplesPerSec=2.38 epoch: 0|step: 2732|ppo_ep: 1|act_loss: 0.011749267578125|cri_loss: 0.0218505859375|unsuper_loss: 0.0 average reward score: 3.8203125 ------------------------------------------------------------------------------------- |E2E latency=3.34s |Gather latency=0.00s (0.00%) |Generate time=2.48s (74.04%) |Training time=0.66s (19.72%) |Others=0.21 (6.24%)|CurSamplesPerSec=2.39 |AvgSamplesPerSec=2.38 epoch: 0|step: 2733|ppo_ep: 1|act_loss: 0.1219482421875|cri_loss: 0.06939697265625|unsuper_loss: 0.0 average reward score: 3.953125 ------------------------------------------------------------------------------------- |E2E latency=3.40s |Gather latency=0.00s (0.00%) |Generate time=2.47s (72.57%) |Training time=0.73s (21.37%) |Others=0.21 
(6.06%)|CurSamplesPerSec=2.35 |AvgSamplesPerSec=2.38 epoch: 0|step: 2734|ppo_ep: 1|act_loss: 0.016143798828125|cri_loss: 0.012664794921875|unsuper_loss: 0.0 average reward score: 4.265625 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.06%) |Training time=0.65s (19.59%) |Others=0.21 (6.36%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.38 epoch: 0|step: 2735|ppo_ep: 1|act_loss: 0.09619140625|cri_loss: 0.0526123046875|unsuper_loss: 0.0 average reward score: 2.69921875 ------------------------------------------------------------------------------------- |E2E latency=3.69s |Gather latency=0.00s (0.00%) |Generate time=2.48s (67.08%) |Training time=0.93s (25.19%) |Others=0.29 (7.73%)|CurSamplesPerSec=2.17 |AvgSamplesPerSec=2.38 epoch: 0|step: 2736|ppo_ep: 1|act_loss: 0.05499267578125|cri_loss: 0.03662109375|unsuper_loss: 0.0 average reward score: 4.46875 ------------------------------------------------------------------------------------- |E2E latency=3.37s |Gather latency=0.00s (0.00%) |Generate time=2.52s (74.77%) |Training time=0.64s (19.03%) |Others=0.21 (6.20%)|CurSamplesPerSec=2.38 |AvgSamplesPerSec=2.38 epoch: 0|step: 2737|ppo_ep: 1|act_loss: 0.033233642578125|cri_loss: 0.02728271484375|unsuper_loss: 0.0 average reward score: 4.2734375 ------------------------------------------------------------------------------------- |E2E latency=3.42s |Gather latency=0.00s (0.00%) |Generate time=2.56s (74.94%) |Training time=0.65s (19.14%) |Others=0.20 (5.92%)|CurSamplesPerSec=2.34 |AvgSamplesPerSec=2.38 epoch: 0|step: 2738|ppo_ep: 1|act_loss: 0.028076171875|cri_loss: 0.0213775634765625|unsuper_loss: 0.0 average reward score: 4.6328125 ------------------------------------------------------------------------------------- |E2E latency=3.45s |Gather latency=0.00s (0.00%) |Generate time=2.60s (75.29%) |Training time=0.65s (18.89%) |Others=0.20 (5.81%)|CurSamplesPerSec=2.32 
|AvgSamplesPerSec=2.38 epoch: 0|step: 2739|ppo_ep: 1|act_loss: 0.0914306640625|cri_loss: 0.05718994140625|unsuper_loss: 0.0 average reward score: 3.2421875 ------------------------------------------------------------------------------------- |E2E latency=3.36s |Gather latency=0.00s (0.00%) |Generate time=2.51s (74.72%) |Training time=0.65s (19.22%) |Others=0.20 (6.06%)|CurSamplesPerSec=2.38 |AvgSamplesPerSec=2.38 epoch: 0|step: 2740|ppo_ep: 1|act_loss: 0.08282470703125|cri_loss: 0.04718017578125|unsuper_loss: 0.0 average reward score: 5.265625 ------------------------------------------------------------------------------------- |E2E latency=3.38s |Gather latency=0.00s (0.00%) |Generate time=2.50s (73.94%) |Training time=0.67s (19.69%) |Others=0.22 (6.37%)|CurSamplesPerSec=2.37 |AvgSamplesPerSec=2.38 epoch: 0|step: 2741|ppo_ep: 1|act_loss: 0.0097198486328125|cri_loss: 0.0110626220703125|unsuper_loss: 0.0 average reward score: 4.3984375 ------------------------------------------------------------------------------------- |E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.61s (74.96%) |Training time=0.65s (18.83%) |Others=0.22 (6.21%)|CurSamplesPerSec=2.30 |AvgSamplesPerSec=2.38 epoch: 0|step: 2742|ppo_ep: 1|act_loss: 0.0035381317138671875|cri_loss: 0.0065765380859375|unsuper_loss: 0.0 average reward score: 5.26953125 ------------------------------------------------------------------------------------- |E2E latency=3.35s |Gather latency=0.00s (0.00%) |Generate time=2.49s (74.17%) |Training time=0.64s (19.21%) |Others=0.22 (6.62%)|CurSamplesPerSec=2.39 |AvgSamplesPerSec=2.38 epoch: 0|step: 2743|ppo_ep: 1|act_loss: -0.022552490234375|cri_loss: -0.00385284423828125|unsuper_loss: 0.0 average reward score: 4.7265625 ------------------------------------------------------------------------------------- |E2E latency=3.76s |Gather latency=0.00s (0.00%) |Generate time=2.51s (66.92%) |Training time=0.94s (25.04%) |Others=0.30 (8.05%)|CurSamplesPerSec=2.13 
|AvgSamplesPerSec=2.38 epoch: 0|step: 2744|ppo_ep: 1|act_loss: 0.004180908203125|cri_loss: 0.0160675048828125|unsuper_loss: 0.0 average reward score: 4.6875 ------------------------------------------------------------------------------------- |E2E latency=3.37s |Gather latency=0.00s (0.00%) |Generate time=2.45s (72.67%) |Training time=0.68s (20.21%) |Others=0.24 (7.12%)|CurSamplesPerSec=2.37 |AvgSamplesPerSec=2.38 epoch: 0|step: 2745|ppo_ep: 1|act_loss: 0.081298828125|cri_loss: 0.051422119140625|unsuper_loss: 0.0 average reward score: 4.390625 ------------------------------------------------------------------------------------- |E2E latency=3.34s |Gather latency=0.00s (0.00%) |Generate time=2.49s (74.55%) |Training time=0.65s (19.38%) |Others=0.20 (6.07%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.38 epoch: 0|step: 2746|ppo_ep: 1|act_loss: 0.00598907470703125|cri_loss: 0.0106201171875|unsuper_loss: 0.0 average reward score: 3.60546875 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.42s (73.17%) |Training time=0.67s (20.26%) |Others=0.22 (6.56%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.38 epoch: 0|step: 2747|ppo_ep: 1|act_loss: 0.02783203125|cri_loss: 0.0197906494140625|unsuper_loss: 0.0 average reward score: 3.9296875 ------------------------------------------------------------------------------------- |E2E latency=3.43s |Gather latency=0.00s (0.00%) |Generate time=2.57s (74.99%) |Training time=0.65s (18.90%) |Others=0.21 (6.11%)|CurSamplesPerSec=2.33 |AvgSamplesPerSec=2.38 epoch: 0|step: 2748|ppo_ep: 1|act_loss: 0.12255859375|cri_loss: 0.07122802734375|unsuper_loss: 0.0 average reward score: 4.7890625 ------------------------------------------------------------------------------------- |E2E latency=3.34s |Gather latency=0.00s (0.00%) |Generate time=2.49s (74.75%) |Training time=0.64s (19.23%) |Others=0.20 (6.02%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.38 
epoch: 0|step: 2749|ppo_ep: 1|act_loss: 0.07958984375|cri_loss: 0.04681396484375|unsuper_loss: 0.0 average reward score: 3.140625 ------------------------------------------------------------------------------------- |E2E latency=3.33s |Gather latency=0.00s (0.00%) |Generate time=2.46s (73.75%) |Training time=0.67s (20.03%) |Others=0.21 (6.22%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.38 epoch: 0|step: 2750|ppo_ep: 1|act_loss: 0.087646484375|cri_loss: 0.04974365234375|unsuper_loss: 0.0 average reward score: 5.140625 ------------------------------------------------------------------------------------- |E2E latency=3.39s |Gather latency=0.00s (0.00%) |Generate time=2.54s (74.91%) |Training time=0.64s (18.93%) |Others=0.21 (6.17%)|CurSamplesPerSec=2.36 |AvgSamplesPerSec=2.38 epoch: 0|step: 2751|ppo_ep: 1|act_loss: 0.016693115234375|cri_loss: 0.0121917724609375|unsuper_loss: 0.0 average reward score: 5.140625 ------------------------------------------------------------------------------------- |E2E latency=3.75s |Gather latency=0.00s (0.00%) |Generate time=2.50s (66.65%) |Training time=0.95s (25.28%) |Others=0.30 (8.07%)|CurSamplesPerSec=2.13 |AvgSamplesPerSec=2.38 epoch: 0|step: 2752|ppo_ep: 1|act_loss: 0.050262451171875|cri_loss: 0.02996826171875|unsuper_loss: 0.0 average reward score: 4.45703125 ------------------------------------------------------------------------------------- |E2E latency=3.34s |Gather latency=0.00s (0.00%) |Generate time=2.48s (74.48%) |Training time=0.64s (19.20%) |Others=0.21 (6.32%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.38 epoch: 0|step: 2753|ppo_ep: 1|act_loss: -0.02813720703125|cri_loss: -0.0089569091796875|unsuper_loss: 0.0 average reward score: 3.955078125 ------------------------------------------------------------------------------------- |E2E latency=3.33s |Gather latency=0.00s (0.00%) |Generate time=2.49s (74.84%) |Training time=0.65s (19.40%) |Others=0.19 (5.76%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.38 epoch: 0|step: 
2754|ppo_ep: 1|act_loss: -0.004425048828125|cri_loss: 0.009246826171875|unsuper_loss: 0.0 average reward score: 3.248046875 ------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.29%) |Training time=0.64s (19.43%) |Others=0.21 (6.28%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.38 epoch: 0|step: 2755|ppo_ep: 1|act_loss: -0.0704345703125|cri_loss: -0.0286865234375|unsuper_loss: 0.0 average reward score: 4.3671875 ------------------------------------------------------------------------------------- |E2E latency=3.39s |Gather latency=0.00s (0.00%) |Generate time=2.54s (74.86%) |Training time=0.65s (19.17%) |Others=0.20 (5.97%)|CurSamplesPerSec=2.36 |AvgSamplesPerSec=2.38 epoch: 0|step: 2756|ppo_ep: 1|act_loss: -0.0699462890625|cri_loss: -0.0243988037109375|unsuper_loss: 0.0 average reward score: 3.34375 ------------------------------------------------------------------------------------- |E2E latency=3.35s |Gather latency=0.00s (0.00%) |Generate time=2.48s (74.04%) |Training time=0.65s (19.47%) |Others=0.22 (6.49%)|CurSamplesPerSec=2.39 |AvgSamplesPerSec=2.38 epoch: 0|step: 2757|ppo_ep: 1|act_loss: 0.014617919921875|cri_loss: 0.01092529296875|unsuper_loss: 0.0 average reward score: 3.90234375 ------------------------------------------------------------------------------------- |E2E latency=3.37s |Gather latency=0.00s (0.00%) |Generate time=2.45s (72.67%) |Training time=0.71s (21.17%) |Others=0.21 (6.16%)|CurSamplesPerSec=2.37 |AvgSamplesPerSec=2.38 epoch: 0|step: 2758|ppo_ep: 1|act_loss: 0.226806640625|cri_loss: 0.12744140625|unsuper_loss: 0.0 average reward score: 3.4921875 ------------------------------------------------------------------------------------- |E2E latency=3.37s |Gather latency=0.00s (0.00%) |Generate time=2.50s (74.41%) |Training time=0.64s (19.16%) |Others=0.22 (6.43%)|CurSamplesPerSec=2.38 |AvgSamplesPerSec=2.38 epoch: 0|step: 2759|ppo_ep: 
1|act_loss: 0.150146484375|cri_loss: 0.0921630859375|unsuper_loss: 0.0 average reward score: 3.603515625 ------------------------------------------------------------------------------------- |E2E latency=3.78s |Gather latency=0.00s (0.00%) |Generate time=2.54s (67.32%) |Training time=0.94s (24.87%) |Others=0.29 (7.81%)|CurSamplesPerSec=2.12 |AvgSamplesPerSec=2.38 epoch: 0|step: 2760|ppo_ep: 1|act_loss: 0.09075927734375|cri_loss: 0.0523681640625|unsuper_loss: 0.0 average reward score: 4.109375 ------------------------------------------------------------------------------------- |E2E latency=3.33s |Gather latency=0.00s (0.00%) |Generate time=2.47s (74.34%) |Training time=0.65s (19.46%) |Others=0.21 (6.20%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.38 epoch: 0|step: 2761|ppo_ep: 1|act_loss: 0.173583984375|cri_loss: 0.1060791015625|unsuper_loss: 0.0 average reward score: 3.16015625 ------------------------------------------------------------------------------------- |E2E latency=3.35s |Gather latency=0.00s (0.00%) |Generate time=2.50s (74.76%) |Training time=0.64s (19.15%) |Others=0.20 (6.09%)|CurSamplesPerSec=2.39 |AvgSamplesPerSec=2.38 epoch: 0|step: 2762|ppo_ep: 1|act_loss: -0.04144287109375|cri_loss: -0.0111236572265625|unsuper_loss: 0.0 average reward score: 4.0625 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.36%) |Training time=0.65s (19.53%) |Others=0.20 (6.11%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.38 epoch: 0|step: 2763|ppo_ep: 1|act_loss: -0.00998687744140625|cri_loss: 0.0013275146484375|unsuper_loss: 0.0 average reward score: 3.69140625 ------------------------------------------------------------------------------------- |E2E latency=3.34s |Gather latency=0.00s (0.00%) |Generate time=2.44s (73.06%) |Training time=0.70s (20.90%) |Others=0.20 (6.04%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.38 epoch: 0|step: 2764|ppo_ep: 1|act_loss: 
0.0709228515625|cri_loss: 0.04156494140625|unsuper_loss: 0.0 average reward score: 3.89453125 ------------------------------------------------------------------------------------- |E2E latency=3.40s |Gather latency=0.00s (0.00%) |Generate time=2.51s (73.92%) |Training time=0.67s (19.86%) |Others=0.21 (6.21%)|CurSamplesPerSec=2.36 |AvgSamplesPerSec=2.38 epoch: 0|step: 2765|ppo_ep: 1|act_loss: 0.019012451171875|cri_loss: 0.025360107421875|unsuper_loss: 0.0 average reward score: 3.75390625 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.47s (74.32%) |Training time=0.65s (19.46%) |Others=0.21 (6.22%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.38 epoch: 0|step: 2766|ppo_ep: 1|act_loss: -0.0179595947265625|cri_loss: -0.0034027099609375|unsuper_loss: 0.0 average reward score: 5.11328125 ------------------------------------------------------------------------------------- |E2E latency=3.46s |Gather latency=0.00s (0.00%) |Generate time=2.59s (74.77%) |Training time=0.66s (19.01%) |Others=0.22 (6.22%)|CurSamplesPerSec=2.31 |AvgSamplesPerSec=2.38 epoch: 0|step: 2767|ppo_ep: 1|act_loss: -0.0355224609375|cri_loss: -0.01013946533203125|unsuper_loss: 0.0 average reward score: 4.453125 ------------------------------------------------------------------------------------- |E2E latency=3.83s |Gather latency=0.00s (0.00%) |Generate time=2.57s (67.24%) |Training time=0.95s (24.93%) |Others=0.30 (7.83%)|CurSamplesPerSec=2.09 |AvgSamplesPerSec=2.38 epoch: 0|step: 2768|ppo_ep: 1|act_loss: 0.0096893310546875|cri_loss: 0.0104827880859375|unsuper_loss: 0.0 average reward score: 5.0859375 ------------------------------------------------------------------------------------- |E2E latency=3.35s |Gather latency=0.00s (0.00%) |Generate time=2.50s (74.72%) |Training time=0.65s (19.31%) |Others=0.20 (5.96%)|CurSamplesPerSec=2.39 |AvgSamplesPerSec=2.38 epoch: 0|step: 2769|ppo_ep: 1|act_loss: 
-0.056488037109375|cri_loss: -0.018341064453125|unsuper_loss: 0.0 average reward score: 4.5390625 ------------------------------------------------------------------------------------- |E2E latency=3.37s |Gather latency=0.00s (0.00%) |Generate time=2.50s (74.26%) |Training time=0.66s (19.57%) |Others=0.21 (6.17%)|CurSamplesPerSec=2.37 |AvgSamplesPerSec=2.38 epoch: 0|step: 2770|ppo_ep: 1|act_loss: 0.1072998046875|cri_loss: 0.05865478515625|unsuper_loss: 0.0 average reward score: 4.25390625 ------------------------------------------------------------------------------------- |E2E latency=3.37s |Gather latency=0.00s (0.00%) |Generate time=2.51s (74.56%) |Training time=0.65s (19.19%) |Others=0.21 (6.25%)|CurSamplesPerSec=2.38 |AvgSamplesPerSec=2.38 epoch: 0|step: 2771|ppo_ep: 1|act_loss: -0.0126953125|cri_loss: -0.00103759765625|unsuper_loss: 0.0 average reward score: 4.01171875 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.61%) |Training time=0.64s (19.58%) |Others=0.19 (5.81%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.38 epoch: 0|step: 2772|ppo_ep: 1|act_loss: 0.07763671875|cri_loss: 0.054046630859375|unsuper_loss: 0.0 average reward score: 4.16015625 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.24%) |Training time=0.64s (19.66%) |Others=0.20 (6.09%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.38 epoch: 0|step: 2773|ppo_ep: 1|act_loss: -0.07049560546875|cri_loss: -0.025390625|unsuper_loss: 0.0 average reward score: 4.7578125 ------------------------------------------------------------------------------------- |E2E latency=3.37s |Gather latency=0.00s (0.00%) |Generate time=2.49s (74.05%) |Training time=0.66s (19.73%) |Others=0.21 (6.22%)|CurSamplesPerSec=2.37 |AvgSamplesPerSec=2.38 epoch: 0|step: 2774|ppo_ep: 1|act_loss: 
-0.01445770263671875|cri_loss: -0.00226593017578125|unsuper_loss: 0.0 average reward score: 5.0390625 ------------------------------------------------------------------------------------- |E2E latency=3.41s |Gather latency=0.00s (0.00%) |Generate time=2.56s (74.99%) |Training time=0.64s (18.78%) |Others=0.21 (6.23%)|CurSamplesPerSec=2.35 |AvgSamplesPerSec=2.38 epoch: 0|step: 2775|ppo_ep: 1|act_loss: -0.057373046875|cri_loss: -0.0166015625|unsuper_loss: 0.0 average reward score: 4.6875 ------------------------------------------------------------------------------------- |E2E latency=3.71s |Gather latency=0.00s (0.00%) |Generate time=2.47s (66.69%) |Training time=0.94s (25.35%) |Others=0.30 (7.96%)|CurSamplesPerSec=2.16 |AvgSamplesPerSec=2.38 epoch: 0|step: 2776|ppo_ep: 1|act_loss: -0.02825927734375|cri_loss: -0.0065155029296875|unsuper_loss: 0.0 average reward score: 4.1796875 ------------------------------------------------------------------------------------- |E2E latency=3.42s |Gather latency=0.00s (0.00%) |Generate time=2.57s (75.13%) |Training time=0.65s (18.86%) |Others=0.21 (6.01%)|CurSamplesPerSec=2.34 |AvgSamplesPerSec=2.38 epoch: 0|step: 2777|ppo_ep: 1|act_loss: 0.07275390625|cri_loss: 0.04180908203125|unsuper_loss: 0.0 average reward score: 2.61328125 ------------------------------------------------------------------------------------- |E2E latency=3.35s |Gather latency=0.00s (0.00%) |Generate time=2.51s (74.96%) |Training time=0.64s (19.20%) |Others=0.20 (5.84%)|CurSamplesPerSec=2.39 |AvgSamplesPerSec=2.38 epoch: 0|step: 2778|ppo_ep: 1|act_loss: -0.092529296875|cri_loss: -0.0391845703125|unsuper_loss: 0.0 average reward score: 4.46875 ------------------------------------------------------------------------------------- |E2E latency=3.35s |Gather latency=0.00s (0.00%) |Generate time=2.50s (74.74%) |Training time=0.65s (19.26%) |Others=0.20 (6.00%)|CurSamplesPerSec=2.39 |AvgSamplesPerSec=2.38 epoch: 0|step: 2779|ppo_ep: 1|act_loss: 
-0.06219482421875|cri_loss: -0.0216522216796875|unsuper_loss: 0.0 average reward score: 4.7109375 ------------------------------------------------------------------------------------- |E2E latency=3.34s |Gather latency=0.00s (0.00%) |Generate time=2.47s (74.07%) |Training time=0.65s (19.50%) |Others=0.21 (6.43%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.38 epoch: 0|step: 2780|ppo_ep: 1|act_loss: 0.119384765625|cri_loss: 0.07421875|unsuper_loss: 0.0 average reward score: 3.8046875 ------------------------------------------------------------------------------------- |E2E latency=3.33s |Gather latency=0.00s (0.00%) |Generate time=2.48s (74.30%) |Training time=0.65s (19.37%) |Others=0.21 (6.33%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.38 epoch: 0|step: 2781|ppo_ep: 1|act_loss: 0.0986328125|cri_loss: 0.05987548828125|unsuper_loss: 0.0 average reward score: 4.453125 ------------------------------------------------------------------------------------- |E2E latency=3.33s |Gather latency=0.00s (0.00%) |Generate time=2.49s (74.76%) |Training time=0.64s (19.33%) |Others=0.20 (5.91%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.38 epoch: 0|step: 2782|ppo_ep: 1|act_loss: 0.06292724609375|cri_loss: 0.0384521484375|unsuper_loss: 0.0 average reward score: 4.4296875 ------------------------------------------------------------------------------------- |E2E latency=3.38s |Gather latency=0.00s (0.00%) |Generate time=2.53s (74.72%) |Training time=0.64s (19.04%) |Others=0.21 (6.24%)|CurSamplesPerSec=2.37 |AvgSamplesPerSec=2.38 epoch: 0|step: 2783|ppo_ep: 1|act_loss: -0.03863525390625|cri_loss: -0.013397216796875|unsuper_loss: 0.0 average reward score: 4.4453125 ------------------------------------------------------------------------------------- |E2E latency=3.71s |Gather latency=0.00s (0.00%) |Generate time=2.47s (66.46%) |Training time=0.95s (25.72%) |Others=0.29 (7.83%)|CurSamplesPerSec=2.16 |AvgSamplesPerSec=2.38 epoch: 0|step: 2784|ppo_ep: 1|act_loss: 0.049591064453125|cri_loss: 
0.0292816162109375|unsuper_loss: 0.0 average reward score: 3.513671875 ------------------------------------------------------------------------------------- |E2E latency=3.40s |Gather latency=0.00s (0.00%) |Generate time=2.54s (74.88%) |Training time=0.64s (18.89%) |Others=0.21 (6.23%)|CurSamplesPerSec=2.36 |AvgSamplesPerSec=2.38 epoch: 0|step: 2785|ppo_ep: 1|act_loss: -0.03546142578125|cri_loss: 9.1552734375e-05|unsuper_loss: 0.0 average reward score: 3.69140625 ------------------------------------------------------------------------------------- |E2E latency=3.39s |Gather latency=0.00s (0.00%) |Generate time=2.53s (74.67%) |Training time=0.66s (19.37%) |Others=0.20 (5.96%)|CurSamplesPerSec=2.36 |AvgSamplesPerSec=2.38 epoch: 0|step: 2786|ppo_ep: 1|act_loss: -0.06414794921875|cri_loss: -0.02471923828125|unsuper_loss: 0.0 average reward score: 4.625 ------------------------------------------------------------------------------------- |E2E latency=3.40s |Gather latency=0.00s (0.00%) |Generate time=2.52s (74.20%) |Training time=0.65s (19.12%) |Others=0.23 (6.68%)|CurSamplesPerSec=2.35 |AvgSamplesPerSec=2.38 epoch: 0|step: 2787|ppo_ep: 1|act_loss: -0.0039825439453125|cri_loss: 0.00936126708984375|unsuper_loss: 0.0 average reward score: 4.32421875 ------------------------------------------------------------------------------------- |E2E latency=3.37s |Gather latency=0.00s (0.00%) |Generate time=2.48s (73.74%) |Training time=0.67s (20.01%) |Others=0.21 (6.25%)|CurSamplesPerSec=2.37 |AvgSamplesPerSec=2.38 epoch: 0|step: 2788|ppo_ep: 1|act_loss: 0.00739288330078125|cri_loss: 0.00984954833984375|unsuper_loss: 0.0 average reward score: 4.13671875 ------------------------------------------------------------------------------------- |E2E latency=3.35s |Gather latency=0.00s (0.00%) |Generate time=2.49s (74.24%) |Training time=0.65s (19.27%) |Others=0.22 (6.49%)|CurSamplesPerSec=2.39 |AvgSamplesPerSec=2.38 epoch: 0|step: 2789|ppo_ep: 1|act_loss: -0.0215606689453125|cri_loss: 
-0.0027923583984375|unsuper_loss: 0.0 average reward score: 3.3671875 ------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.43%) |Training time=0.65s (19.62%) |Others=0.20 (5.95%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.38 epoch: 0|step: 2790|ppo_ep: 1|act_loss: -0.046783447265625|cri_loss: -0.016815185546875|unsuper_loss: 0.0 average reward score: 4.6796875 ------------------------------------------------------------------------------------- |E2E latency=3.33s |Gather latency=0.00s (0.00%) |Generate time=2.48s (74.35%) |Training time=0.64s (19.24%) |Others=0.21 (6.42%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.38 epoch: 0|step: 2791|ppo_ep: 1|act_loss: 0.0543212890625|cri_loss: 0.033782958984375|unsuper_loss: 0.0 average reward score: 3.427734375 ------------------------------------------------------------------------------------- |E2E latency=3.72s |Gather latency=0.00s (0.00%) |Generate time=2.49s (66.95%) |Training time=0.93s (25.07%) |Others=0.30 (7.98%)|CurSamplesPerSec=2.15 |AvgSamplesPerSec=2.38 epoch: 0|step: 2792|ppo_ep: 1|act_loss: 0.036468505859375|cri_loss: 0.0246734619140625|unsuper_loss: 0.0 average reward score: 4.7890625 ------------------------------------------------------------------------------------- |E2E latency=3.33s |Gather latency=0.00s (0.00%) |Generate time=2.47s (74.38%) |Training time=0.64s (19.30%) |Others=0.21 (6.32%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.38 epoch: 0|step: 2793|ppo_ep: 1|act_loss: -0.0198211669921875|cri_loss: -0.00396728515625|unsuper_loss: 0.0 average reward score: 5.1015625 ------------------------------------------------------------------------------------- |E2E latency=3.40s |Gather latency=0.00s (0.00%) |Generate time=2.49s (73.33%) |Training time=0.71s (20.77%) |Others=0.20 (5.90%)|CurSamplesPerSec=2.35 |AvgSamplesPerSec=2.38 epoch: 0|step: 2794|ppo_ep: 1|act_loss: 0.07061767578125|cri_loss: 
0.0543212890625|unsuper_loss: 0.0 average reward score: 3.3203125 ------------------------------------------------------------------------------------- |E2E latency=3.35s |Gather latency=0.00s (0.00%) |Generate time=2.50s (74.80%) |Training time=0.64s (19.20%) |Others=0.20 (6.00%)|CurSamplesPerSec=2.39 |AvgSamplesPerSec=2.38 epoch: 0|step: 2795|ppo_ep: 1|act_loss: -0.029266357421875|cri_loss: -0.0075225830078125|unsuper_loss: 0.0 average reward score: 4.6640625 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.37%) |Training time=0.66s (19.82%) |Others=0.19 (5.81%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.38 epoch: 0|step: 2796|ppo_ep: 1|act_loss: -0.0455322265625|cri_loss: -0.0161285400390625|unsuper_loss: 0.0 average reward score: 4.37109375 ------------------------------------------------------------------------------------- |E2E latency=3.58s |Gather latency=0.00s (0.00%) |Generate time=2.49s (69.50%) |Training time=0.87s (24.25%) |Others=0.22 (6.24%)|CurSamplesPerSec=2.24 |AvgSamplesPerSec=2.38 epoch: 0|step: 2797|ppo_ep: 1|act_loss: -0.060546875|cri_loss: -0.0213470458984375|unsuper_loss: 0.0 average reward score: 4.12890625 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.48s (74.62%) |Training time=0.64s (19.30%) |Others=0.20 (6.08%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.38 epoch: 0|step: 2798|ppo_ep: 1|act_loss: 0.01206207275390625|cri_loss: 0.0124053955078125|unsuper_loss: 0.0 average reward score: 3.716796875 ------------------------------------------------------------------------------------- |E2E latency=3.34s |Gather latency=0.00s (0.00%) |Generate time=2.48s (74.32%) |Training time=0.64s (19.20%) |Others=0.22 (6.48%)|CurSamplesPerSec=2.39 |AvgSamplesPerSec=2.38 [2023-04-24 16:24:52,099] [INFO] [logging.py:96:log_dist] [Rank 0] step=350, 
skipped=5, lr=[2.612971261202685e-06, 2.612971261202685e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-04-24 16:24:52,344] [INFO] [timer.py:199:stop] epoch=0/micro_step=2800/global_step=350, RunningAvgSamplesPerSec=15.378393177759856, CurrSamplesPerSec=14.733078681592165, MemAllocated=20.44GB, MaxMemAllocated=31.45GB [2023-04-24 16:24:52,563] [INFO] [logging.py:96:log_dist] [Rank 0] step=350, skipped=4, lr=[1.3353475496025049e-06, 1.3353475496025049e-06], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 2799|ppo_ep: 1|act_loss: 0.136474609375|cri_loss: 0.07275390625|unsuper_loss: 0.0 average reward score: 5.234375 ------------------------------------------------------------------------------------- |E2E latency=3.68s |Gather latency=0.00s (0.00%) |Generate time=2.46s (66.83%) |Training time=0.93s (25.19%) |Others=0.29 (7.97%)|CurSamplesPerSec=2.18 |AvgSamplesPerSec=2.38 epoch: 0|step: 2800|ppo_ep: 1|act_loss: 0.003204345703125|cri_loss: 0.0135040283203125|unsuper_loss: 0.0 average reward score: 4.40625 ------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.51%) |Training time=0.64s (19.39%) |Others=0.20 (6.10%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.38 epoch: 0|step: 2801|ppo_ep: 1|act_loss: 0.0506591796875|cri_loss: 0.032440185546875|unsuper_loss: 0.0 average reward score: 4.99609375 ------------------------------------------------------------------------------------- |E2E latency=3.34s |Gather latency=0.00s (0.00%) |Generate time=2.50s (74.76%) |Training time=0.65s (19.32%) |Others=0.20 (5.93%)|CurSamplesPerSec=2.39 |AvgSamplesPerSec=2.38 epoch: 0|step: 2802|ppo_ep: 1|act_loss: 0.000274658203125|cri_loss: 0.017669677734375|unsuper_loss: 0.0 average reward score: 5.1953125 ------------------------------------------------------------------------------------- |E2E latency=3.37s |Gather latency=0.00s (0.00%) |Generate time=2.52s (74.82%) |Training time=0.64s (19.10%) 
|Others=0.20 (6.08%)|CurSamplesPerSec=2.38 |AvgSamplesPerSec=2.38
epoch: 0|step: 2803|ppo_ep: 1|act_loss: 0.13818359375|cri_loss: 0.08221435546875|unsuper_loss: 0.0
average reward score: 3.48828125
-------------------------------------------------------------------------------------
|E2E latency=3.42s |Gather latency=0.00s (0.00%) |Generate time=2.47s (72.19%) |Training time=0.75s (21.77%) |Others=0.21 (6.04%)|CurSamplesPerSec=2.34 |AvgSamplesPerSec=2.38
epoch: 0|step: 2804|ppo_ep: 1|act_loss: 0.0280609130859375|cri_loss: 0.021636962890625|unsuper_loss: 0.0
average reward score: 4.6640625
-------------------------------------------------------------------------------------
|E2E latency=3.40s |Gather latency=0.00s (0.00%) |Generate time=2.45s (71.83%) |Training time=0.75s (22.13%) |Others=0.21 (6.04%)|CurSamplesPerSec=2.35 |AvgSamplesPerSec=2.38
epoch: 0|step: 2805|ppo_ep: 1|act_loss: 0.009857177734375|cri_loss: 0.0107879638671875|unsuper_loss: 0.0
average reward score: 4.640625
-------------------------------------------------------------------------------------
|E2E latency=3.39s |Gather latency=0.00s (0.00%) |Generate time=2.53s (74.80%) |Training time=0.64s (18.96%) |Others=0.21 (6.23%)|CurSamplesPerSec=2.36 |AvgSamplesPerSec=2.38
epoch: 0|step: 2806|ppo_ep: 1|act_loss: -0.09228515625|cri_loss: -0.03729248046875|unsuper_loss: 0.0
average reward score: 3.640625
-------------------------------------------------------------------------------------
|E2E latency=3.38s |Gather latency=0.00s (0.00%) |Generate time=2.54s (75.02%) |Training time=0.65s (19.15%) |Others=0.20 (5.83%)|CurSamplesPerSec=2.37 |AvgSamplesPerSec=2.38
epoch: 0|step: 2807|ppo_ep: 1|act_loss: -0.003326416015625|cri_loss: 0.00664520263671875|unsuper_loss: 0.0
average reward score: 4.625
-------------------------------------------------------------------------------------
|E2E latency=4.26s |Gather latency=0.00s (0.00%) |Generate time=2.97s (69.63%) |Training time=0.96s (22.42%) |Others=0.34 (7.95%)|CurSamplesPerSec=1.88 |AvgSamplesPerSec=2.38
epoch: 0|step: 2808|ppo_ep: 1|act_loss: 0.15087890625|cri_loss: 0.086669921875|unsuper_loss: 0.0
average reward score: 3.873046875
-------------------------------------------------------------------------------------
|E2E latency=4.21s |Gather latency=0.00s (0.00%) |Generate time=2.94s (69.84%) |Training time=1.02s (24.17%) |Others=0.25 (5.99%)|CurSamplesPerSec=1.90 |AvgSamplesPerSec=2.37
epoch: 0|step: 2809|ppo_ep: 1|act_loss: -0.043182373046875|cri_loss: -0.01324462890625|unsuper_loss: 0.0
average reward score: 3.91796875
-------------------------------------------------------------------------------------
|E2E latency=4.12s |Gather latency=0.00s (0.00%) |Generate time=3.23s (78.34%) |Training time=0.70s (16.96%) |Others=0.19 (4.70%)|CurSamplesPerSec=1.94 |AvgSamplesPerSec=2.37
epoch: 0|step: 2810|ppo_ep: 1|act_loss: 0.137451171875|cri_loss: 0.07684326171875|unsuper_loss: 0.0
average reward score: 3.068359375
-------------------------------------------------------------------------------------
|E2E latency=4.09s |Gather latency=0.00s (0.00%) |Generate time=3.18s (77.84%) |Training time=0.66s (16.24%) |Others=0.24 (5.91%)|CurSamplesPerSec=1.96 |AvgSamplesPerSec=2.37
epoch: 0|step: 2811|ppo_ep: 1|act_loss: -0.055755615234375|cri_loss: -0.0223388671875|unsuper_loss: 0.0
average reward score: 3.71875
-------------------------------------------------------------------------------------
|E2E latency=3.60s |Gather latency=0.00s (0.00%) |Generate time=2.60s (72.12%) |Training time=0.78s (21.52%) |Others=0.23 (6.36%)|CurSamplesPerSec=2.22 |AvgSamplesPerSec=2.37
epoch: 0|step: 2812|ppo_ep: 1|act_loss: 0.1075439453125|cri_loss: 0.0693359375|unsuper_loss: 0.0
average reward score: 2.615234375
-------------------------------------------------------------------------------------
|E2E latency=3.58s |Gather latency=0.00s (0.00%) |Generate time=2.64s (73.86%) |Training time=0.69s (19.30%) |Others=0.24 (6.85%)|CurSamplesPerSec=2.24 |AvgSamplesPerSec=2.37
epoch: 0|step: 2813|ppo_ep: 1|act_loss: -0.00628662109375|cri_loss: 0.00702667236328125|unsuper_loss: 0.0
average reward score: 3.646484375
-------------------------------------------------------------------------------------
|E2E latency=3.99s |Gather latency=0.00s (0.00%) |Generate time=3.11s (77.84%) |Training time=0.66s (16.59%) |Others=0.22 (5.57%)|CurSamplesPerSec=2.00 |AvgSamplesPerSec=2.37
epoch: 0|step: 2814|ppo_ep: 1|act_loss: -0.0352783203125|cri_loss: -0.01036834716796875|unsuper_loss: 0.0
average reward score: 3.52734375
-------------------------------------------------------------------------------------
|E2E latency=3.77s |Gather latency=0.00s (0.00%) |Generate time=2.87s (76.07%) |Training time=0.65s (17.28%) |Others=0.25 (6.64%)|CurSamplesPerSec=2.12 |AvgSamplesPerSec=2.37
epoch: 0|step: 2815|ppo_ep: 1|act_loss: -0.015655517578125|cri_loss: 0.0014801025390625|unsuper_loss: 0.0
average reward score: 4.84375
-------------------------------------------------------------------------------------
|E2E latency=3.97s |Gather latency=0.00s (0.00%) |Generate time=2.66s (66.95%) |Training time=0.98s (24.60%) |Others=0.33 (8.45%)|CurSamplesPerSec=2.02 |AvgSamplesPerSec=2.37
epoch: 0|step: 2816|ppo_ep: 1|act_loss: 0.1065673828125|cri_loss: 0.08502197265625|unsuper_loss: 0.0
average reward score: 3.03125
-------------------------------------------------------------------------------------
|E2E latency=3.78s |Gather latency=0.00s (0.00%) |Generate time=2.70s (71.52%) |Training time=0.84s (22.22%) |Others=0.24 (6.27%)|CurSamplesPerSec=2.12 |AvgSamplesPerSec=2.37
epoch: 0|step: 2817|ppo_ep: 1|act_loss: 0.1378173828125|cri_loss: 0.0919189453125|unsuper_loss: 0.0
average reward score: 3.197265625
-------------------------------------------------------------------------------------
|E2E latency=3.61s |Gather latency=0.00s (0.00%) |Generate time=2.73s (75.76%) |Training time=0.65s (17.93%) |Others=0.23 (6.31%)|CurSamplesPerSec=2.22 |AvgSamplesPerSec=2.37
epoch: 0|step: 2818|ppo_ep: 1|act_loss: -0.017333984375|cri_loss: -0.0029144287109375|unsuper_loss: 0.0
average reward score: 4.453125
-------------------------------------------------------------------------------------
|E2E latency=3.67s |Gather latency=0.00s (0.00%) |Generate time=2.77s (75.47%) |Training time=0.69s (18.72%) |Others=0.21 (5.81%)|CurSamplesPerSec=2.18 |AvgSamplesPerSec=2.37
epoch: 0|step: 2819|ppo_ep: 1|act_loss: 0.014923095703125|cri_loss: 0.0159912109375|unsuper_loss: 0.0
average reward score: 4.328125
-------------------------------------------------------------------------------------
|E2E latency=3.68s |Gather latency=0.00s (0.00%) |Generate time=2.81s (76.32%) |Training time=0.64s (17.48%) |Others=0.23 (6.21%)|CurSamplesPerSec=2.17 |AvgSamplesPerSec=2.37
epoch: 0|step: 2820|ppo_ep: 1|act_loss: -0.0016021728515625|cri_loss: 0.011199951171875|unsuper_loss: 0.0
average reward score: 3.21875
-------------------------------------------------------------------------------------
|E2E latency=3.46s |Gather latency=0.00s (0.00%) |Generate time=2.60s (75.17%) |Training time=0.65s (18.85%) |Others=0.21 (5.98%)|CurSamplesPerSec=2.31 |AvgSamplesPerSec=2.37
epoch: 0|step: 2821|ppo_ep: 1|act_loss: 0.057281494140625|cri_loss: 0.04364013671875|unsuper_loss: 0.0
average reward score: 2.9140625
-------------------------------------------------------------------------------------
|E2E latency=3.61s |Gather latency=0.00s (0.00%) |Generate time=2.71s (75.16%) |Training time=0.68s (18.71%) |Others=0.22 (6.13%)|CurSamplesPerSec=2.22 |AvgSamplesPerSec=2.37
epoch: 0|step: 2822|ppo_ep: 1|act_loss: -0.0010223388671875|cri_loss: 0.0087432861328125|unsuper_loss: 0.0
average reward score: 4.2421875
-------------------------------------------------------------------------------------
|E2E latency=3.65s |Gather latency=0.00s (0.00%) |Generate time=2.76s (75.66%) |Training time=0.66s (17.99%) |Others=0.23 (6.35%)|CurSamplesPerSec=2.19 |AvgSamplesPerSec=2.37
epoch: 0|step: 2823|ppo_ep: 1|act_loss: -0.025146484375|cri_loss: -0.00018310546875|unsuper_loss: 0.0
average reward score: 4.38671875
-------------------------------------------------------------------------------------
|E2E latency=4.17s |Gather latency=0.00s (0.00%) |Generate time=2.70s (64.90%) |Training time=1.13s (27.19%) |Others=0.33 (7.91%)|CurSamplesPerSec=1.92 |AvgSamplesPerSec=2.37
epoch: 0|step: 2824|ppo_ep: 1|act_loss: -0.1385498046875|cri_loss: -0.060516357421875|unsuper_loss: 0.0
average reward score: 4.3515625
-------------------------------------------------------------------------------------
|E2E latency=3.90s |Gather latency=0.00s (0.00%) |Generate time=3.03s (77.66%) |Training time=0.64s (16.37%) |Others=0.23 (5.97%)|CurSamplesPerSec=2.05 |AvgSamplesPerSec=2.37
epoch: 0|step: 2825|ppo_ep: 1|act_loss: -0.12164306640625|cri_loss: -0.05438232421875|unsuper_loss: 0.0
average reward score: 5.25390625
-------------------------------------------------------------------------------------
|E2E latency=3.82s |Gather latency=0.00s (0.00%) |Generate time=2.63s (68.74%) |Training time=0.97s (25.35%) |Others=0.23 (5.91%)|CurSamplesPerSec=2.09 |AvgSamplesPerSec=2.37
epoch: 0|step: 2826|ppo_ep: 1|act_loss: -0.086181640625|cri_loss: -0.0301361083984375|unsuper_loss: 0.0
average reward score: 3.380859375
-------------------------------------------------------------------------------------
|E2E latency=3.93s |Gather latency=0.00s (0.00%) |Generate time=2.91s (74.05%) |Training time=0.77s (19.64%) |Others=0.25 (6.32%)|CurSamplesPerSec=2.04 |AvgSamplesPerSec=2.37
epoch: 0|step: 2827|ppo_ep: 1|act_loss: -0.04718017578125|cri_loss: -0.0151519775390625|unsuper_loss: 0.0
average reward score: 4.7421875
-------------------------------------------------------------------------------------
|E2E latency=3.96s |Gather latency=0.00s (0.00%) |Generate time=2.98s (75.20%) |Training time=0.75s (18.83%) |Others=0.24 (5.97%)|CurSamplesPerSec=2.02 |AvgSamplesPerSec=2.37
epoch: 0|step: 2828|ppo_ep: 1|act_loss: -0.0433349609375|cri_loss: -0.0107879638671875|unsuper_loss: 0.0
average reward score: 3.69921875
-------------------------------------------------------------------------------------
|E2E latency=4.00s |Gather latency=0.00s (0.00%) |Generate time=2.79s (69.80%) |Training time=0.98s (24.43%) |Others=0.23 (5.78%)|CurSamplesPerSec=2.00 |AvgSamplesPerSec=2.37
epoch: 0|step: 2829|ppo_ep: 1|act_loss: -0.1708984375|cri_loss: -0.063720703125|unsuper_loss: 0.0
average reward score: 4.79296875
-------------------------------------------------------------------------------------
|E2E latency=3.69s |Gather latency=0.00s (0.00%) |Generate time=2.79s (75.59%) |Training time=0.66s (18.00%) |Others=0.24 (6.41%)|CurSamplesPerSec=2.17 |AvgSamplesPerSec=2.37
epoch: 0|step: 2830|ppo_ep: 1|act_loss: -0.03106689453125|cri_loss: 0.00677490234375|unsuper_loss: 0.0
average reward score: 4.3828125
-------------------------------------------------------------------------------------
|E2E latency=3.89s |Gather latency=0.00s (0.00%) |Generate time=3.00s (77.18%) |Training time=0.65s (16.69%) |Others=0.24 (6.14%)|CurSamplesPerSec=2.06 |AvgSamplesPerSec=2.37
epoch: 0|step: 2831|ppo_ep: 1|act_loss: 0.0040740966796875|cri_loss: 0.01247406005859375|unsuper_loss: 0.0
average reward score: 4.4140625
-------------------------------------------------------------------------------------
|E2E latency=4.06s |Gather latency=0.00s (0.00%) |Generate time=2.78s (68.65%) |Training time=0.93s (22.96%) |Others=0.34 (8.39%)|CurSamplesPerSec=1.97 |AvgSamplesPerSec=2.37
epoch: 0|step: 2832|ppo_ep: 1|act_loss: 0.0287322998046875|cri_loss: 0.0202178955078125|unsuper_loss: 0.0
average reward score: 3.92578125
-------------------------------------------------------------------------------------
|E2E latency=3.78s |Gather latency=0.00s (0.00%) |Generate time=2.90s (76.67%) |Training time=0.66s (17.43%) |Others=0.22 (5.91%)|CurSamplesPerSec=2.11 |AvgSamplesPerSec=2.37
epoch: 0|step: 2833|ppo_ep: 1|act_loss: 0.00888824462890625|cri_loss: 0.009735107421875|unsuper_loss: 0.0
average reward score: 4.15234375
-------------------------------------------------------------------------------------
|E2E latency=4.09s |Gather latency=0.00s (0.00%) |Generate time=3.18s (77.69%) |Training time=0.68s (16.63%) |Others=0.23 (5.68%)|CurSamplesPerSec=1.96 |AvgSamplesPerSec=2.37
epoch: 0|step: 2834|ppo_ep: 1|act_loss: 0.033355712890625|cri_loss: 0.025787353515625|unsuper_loss: 0.0
average reward score: 4.12109375
-------------------------------------------------------------------------------------
|E2E latency=3.69s |Gather latency=0.00s (0.00%) |Generate time=2.81s (76.03%) |Training time=0.65s (17.68%) |Others=0.23 (6.29%)|CurSamplesPerSec=2.17 |AvgSamplesPerSec=2.37
epoch: 0|step: 2835|ppo_ep: 1|act_loss: -0.0909423828125|cri_loss: -0.038055419921875|unsuper_loss: 0.0
average reward score: 4.04296875
-------------------------------------------------------------------------------------
|E2E latency=3.86s |Gather latency=0.00s (0.00%) |Generate time=2.98s (77.17%) |Training time=0.66s (17.07%) |Others=0.22 (5.76%)|CurSamplesPerSec=2.07 |AvgSamplesPerSec=2.37
epoch: 0|step: 2836|ppo_ep: 1|act_loss: -0.011871337890625|cri_loss: 0.00750732421875|unsuper_loss: 0.0
average reward score: 3.65234375
-------------------------------------------------------------------------------------
|E2E latency=3.78s |Gather latency=0.00s (0.00%) |Generate time=2.89s (76.39%) |Training time=0.68s (17.92%) |Others=0.22 (5.69%)|CurSamplesPerSec=2.12 |AvgSamplesPerSec=2.37
epoch: 0|step: 2837|ppo_ep: 1|act_loss: -0.1092529296875|cri_loss: -0.042694091796875|unsuper_loss: 0.0
average reward score: 3.51171875
-------------------------------------------------------------------------------------
|E2E latency=3.70s |Gather latency=0.00s (0.00%) |Generate time=2.76s (74.73%) |Training time=0.71s (19.27%) |Others=0.22 (6.00%)|CurSamplesPerSec=2.16 |AvgSamplesPerSec=2.37
epoch: 0|step: 2838|ppo_ep: 1|act_loss: 0.086669921875|cri_loss: 0.05377197265625|unsuper_loss: 0.0
average reward score: 2.88671875
-------------------------------------------------------------------------------------
|E2E latency=4.08s |Gather latency=0.00s (0.00%) |Generate time=3.19s (78.17%) |Training time=0.65s (15.99%) |Others=0.24 (5.84%)|CurSamplesPerSec=1.96 |AvgSamplesPerSec=2.37
epoch: 0|step: 2839|ppo_ep: 1|act_loss: -0.057708740234375|cri_loss: -0.017364501953125|unsuper_loss: 0.0
average reward score: 4.15234375
-------------------------------------------------------------------------------------
|E2E latency=4.34s |Gather latency=0.00s (0.00%) |Generate time=2.64s (60.83%) |Training time=1.39s (31.97%) |Others=0.31 (7.21%)|CurSamplesPerSec=1.84 |AvgSamplesPerSec=2.37
epoch: 0|step: 2840|ppo_ep: 1|act_loss: 0.06146240234375|cri_loss: 0.03973388671875|unsuper_loss: 0.0
average reward score: 3.85546875
-------------------------------------------------------------------------------------
|E2E latency=4.11s |Gather latency=0.00s (0.00%) |Generate time=2.70s (65.64%) |Training time=1.19s (28.97%) |Others=0.22 (5.39%)|CurSamplesPerSec=1.95 |AvgSamplesPerSec=2.37
epoch: 0|step: 2841|ppo_ep: 1|act_loss: 0.229248046875|cri_loss: 0.13232421875|unsuper_loss: 0.0
average reward score: 4.23828125
-------------------------------------------------------------------------------------
|E2E latency=3.53s |Gather latency=0.00s (0.00%) |Generate time=2.61s (73.86%) |Training time=0.70s (19.82%) |Others=0.22 (6.32%)|CurSamplesPerSec=2.26 |AvgSamplesPerSec=2.37
epoch: 0|step: 2842|ppo_ep: 1|act_loss: -0.010345458984375|cri_loss: 0.0055084228515625|unsuper_loss: 0.0
average reward score: 3.9375
-------------------------------------------------------------------------------------
|E2E latency=3.84s |Gather latency=0.00s (0.00%) |Generate time=2.97s (77.28%) |Training time=0.65s (16.91%) |Others=0.22 (5.81%)|CurSamplesPerSec=2.08 |AvgSamplesPerSec=2.37
epoch: 0|step: 2843|ppo_ep: 1|act_loss: -0.03668212890625|cri_loss: -0.01073455810546875|unsuper_loss: 0.0
average reward score: 3.6953125
-------------------------------------------------------------------------------------
|E2E latency=3.53s |Gather latency=0.00s (0.00%) |Generate time=2.67s (75.56%) |Training time=0.65s (18.40%) |Others=0.21 (6.04%)|CurSamplesPerSec=2.26 |AvgSamplesPerSec=2.37
epoch: 0|step: 2844|ppo_ep: 1|act_loss: -0.1978759765625|cri_loss: -0.07452392578125|unsuper_loss: 0.0
average reward score: 4.046875
-------------------------------------------------------------------------------------
|E2E latency=3.59s |Gather latency=0.00s (0.00%) |Generate time=2.73s (76.04%) |Training time=0.65s (17.99%) |Others=0.21 (5.97%)|CurSamplesPerSec=2.23 |AvgSamplesPerSec=2.37
epoch: 0|step: 2845|ppo_ep: 1|act_loss: -0.046630859375|cri_loss: -0.01641845703125|unsuper_loss: 0.0
average reward score: 3.705078125
-------------------------------------------------------------------------------------
|E2E latency=3.93s |Gather latency=0.00s (0.00%) |Generate time=3.06s (77.91%) |Training time=0.65s (16.54%) |Others=0.22 (5.55%)|CurSamplesPerSec=2.04 |AvgSamplesPerSec=2.37
epoch: 0|step: 2846|ppo_ep: 1|act_loss: 0.1309814453125|cri_loss: 0.07470703125|unsuper_loss: 0.0
average reward score: 5.46875
-------------------------------------------------------------------------------------
|E2E latency=3.61s |Gather latency=0.00s (0.00%) |Generate time=2.74s (75.98%) |Training time=0.66s (18.26%) |Others=0.21 (5.76%)|CurSamplesPerSec=2.22 |AvgSamplesPerSec=2.37
epoch: 0|step: 2847|ppo_ep: 1|act_loss: -0.040557861328125|cri_loss: -0.01346588134765625|unsuper_loss: 0.0
average reward score: 3.640625
-------------------------------------------------------------------------------------
|E2E latency=4.08s |Gather latency=0.00s (0.00%) |Generate time=2.86s (69.99%) |Training time=0.93s (22.84%) |Others=0.29 (7.17%)|CurSamplesPerSec=1.96 |AvgSamplesPerSec=2.37
epoch: 0|step: 2848|ppo_ep: 1|act_loss: 0.0068359375|cri_loss: 0.01148223876953125|unsuper_loss: 0.0
average reward score: 4.3984375
-------------------------------------------------------------------------------------
|E2E latency=3.76s |Gather latency=0.00s (0.00%) |Generate time=2.90s (76.99%) |Training time=0.65s (17.18%) |Others=0.22 (5.84%)|CurSamplesPerSec=2.13 |AvgSamplesPerSec=2.37
epoch: 0|step: 2849|ppo_ep: 1|act_loss: 0.173095703125|cri_loss: 0.10614013671875|unsuper_loss: 0.0
average reward score: 4.265625
-------------------------------------------------------------------------------------
|E2E latency=3.81s |Gather latency=0.00s (0.00%) |Generate time=2.96s (77.59%) |Training time=0.65s (17.16%) |Others=0.20 (5.25%)|CurSamplesPerSec=2.10 |AvgSamplesPerSec=2.37
epoch: 0|step: 2850|ppo_ep: 1|act_loss: 0.062286376953125|cri_loss: 0.0438232421875|unsuper_loss: 0.0
average reward score: 2.984375
-------------------------------------------------------------------------------------
|E2E latency=3.62s |Gather latency=0.00s (0.00%) |Generate time=2.71s (74.65%) |Training time=0.71s (19.64%) |Others=0.21 (5.71%)|CurSamplesPerSec=2.21 |AvgSamplesPerSec=2.37
epoch: 0|step: 2851|ppo_ep: 1|act_loss: -0.023040771484375|cri_loss: -0.002197265625|unsuper_loss: 0.0
average reward score: 3.515625
-------------------------------------------------------------------------------------
|E2E latency=3.56s |Gather latency=0.00s (0.00%) |Generate time=2.69s (75.45%) |Training time=0.66s (18.41%) |Others=0.22 (6.14%)|CurSamplesPerSec=2.25 |AvgSamplesPerSec=2.37
epoch: 0|step: 2852|ppo_ep: 1|act_loss: 0.0101470947265625|cri_loss: 0.01544189453125|unsuper_loss: 0.0
average reward score: 2.845703125
-------------------------------------------------------------------------------------
|E2E latency=3.76s |Gather latency=0.00s (0.00%) |Generate time=2.69s (71.55%) |Training time=0.84s (22.48%) |Others=0.22 (5.98%)|CurSamplesPerSec=2.13 |AvgSamplesPerSec=2.37
epoch: 0|step: 2853|ppo_ep: 1|act_loss: 0.029083251953125|cri_loss: 0.02899169921875|unsuper_loss: 0.0
average reward score: 3.8515625
-------------------------------------------------------------------------------------
|E2E latency=3.69s |Gather latency=0.00s (0.00%) |Generate time=2.80s (76.01%) |Training time=0.65s (17.64%) |Others=0.23 (6.35%)|CurSamplesPerSec=2.17 |AvgSamplesPerSec=2.37
epoch: 0|step: 2854|ppo_ep: 1|act_loss: 0.001434326171875|cri_loss: 0.01367950439453125|unsuper_loss: 0.0
average reward score: 4.359375
-------------------------------------------------------------------------------------
|E2E latency=3.58s |Gather latency=0.00s (0.00%) |Generate time=2.68s (74.85%) |Training time=0.67s (18.58%) |Others=0.24 (6.57%)|CurSamplesPerSec=2.23 |AvgSamplesPerSec=2.37
epoch: 0|step: 2855|ppo_ep: 1|act_loss: -0.00057220458984375|cri_loss: 0.004314422607421875|unsuper_loss: 0.0
average reward score: 4.01953125
-------------------------------------------------------------------------------------
|E2E latency=3.96s |Gather latency=0.00s (0.00%) |Generate time=2.67s (67.51%) |Training time=0.98s (24.67%) |Others=0.31 (7.82%)|CurSamplesPerSec=2.02 |AvgSamplesPerSec=2.37
epoch: 0|step: 2856|ppo_ep: 1|act_loss: -0.053802490234375|cri_loss: -0.02020263671875|unsuper_loss: 0.0
average reward score: 4.390625
-------------------------------------------------------------------------------------
|E2E latency=3.73s |Gather latency=0.00s (0.00%) |Generate time=2.85s (76.43%) |Training time=0.64s (17.28%) |Others=0.23 (6.29%)|CurSamplesPerSec=2.15 |AvgSamplesPerSec=2.37
epoch: 0|step: 2857|ppo_ep: 1|act_loss: 0.05218505859375|cri_loss: 0.036468505859375|unsuper_loss: 0.0
average reward score: 3.94921875
-------------------------------------------------------------------------------------
|E2E latency=3.77s |Gather latency=0.00s (0.00%) |Generate time=2.90s (76.92%) |Training time=0.65s (17.22%) |Others=0.22 (5.86%)|CurSamplesPerSec=2.12 |AvgSamplesPerSec=2.37
epoch: 0|step: 2858|ppo_ep: 1|act_loss: 0.065185546875|cri_loss: 0.0462646484375|unsuper_loss: 0.0
average reward score: 3.271484375
-------------------------------------------------------------------------------------
|E2E latency=3.63s |Gather latency=0.00s (0.00%) |Generate time=2.72s (75.03%) |Training time=0.71s (19.46%) |Others=0.20 (5.51%)|CurSamplesPerSec=2.21 |AvgSamplesPerSec=2.37
epoch: 0|step: 2859|ppo_ep: 1|act_loss: -0.1522216796875|cri_loss: -0.063720703125|unsuper_loss: 0.0
average reward score: 5.046875
-------------------------------------------------------------------------------------
|E2E latency=3.67s |Gather latency=0.00s (0.00%) |Generate time=2.66s (72.44%) |Training time=0.78s (21.14%) |Others=0.24 (6.42%)|CurSamplesPerSec=2.18 |AvgSamplesPerSec=2.37
epoch: 0|step: 2860|ppo_ep: 1|act_loss: -0.133544921875|cri_loss: -0.059722900390625|unsuper_loss: 0.0
average reward score: 3.41015625
-------------------------------------------------------------------------------------
|E2E latency=3.92s |Gather latency=0.00s (0.00%) |Generate time=3.05s (77.68%) |Training time=0.65s (16.47%) |Others=0.23 (5.85%)|CurSamplesPerSec=2.04 |AvgSamplesPerSec=2.37
epoch: 0|step: 2861|ppo_ep: 1|act_loss: -0.016204833984375|cri_loss: 0.0092620849609375|unsuper_loss: 0.0
average reward score: 4.4140625
-------------------------------------------------------------------------------------
|E2E latency=3.74s |Gather latency=0.00s (0.00%) |Generate time=2.87s (76.65%) |Training time=0.65s (17.36%) |Others=0.22 (5.99%)|CurSamplesPerSec=2.14 |AvgSamplesPerSec=2.37
epoch: 0|step: 2862|ppo_ep: 1|act_loss: -0.109619140625|cri_loss: -0.038787841796875|unsuper_loss: 0.0
average reward score: 3.236328125
-------------------------------------------------------------------------------------
|E2E latency=3.72s |Gather latency=0.00s (0.00%) |Generate time=2.83s (76.29%) |Training time=0.66s (17.72%) |Others=0.22 (6.00%)|CurSamplesPerSec=2.15 |AvgSamplesPerSec=2.37
epoch: 0|step: 2863|ppo_ep: 1|act_loss: 0.01776123046875|cri_loss: 0.025543212890625|unsuper_loss: 0.0
average reward score: 3.6015625
-------------------------------------------------------------------------------------
|E2E latency=4.08s |Gather latency=0.00s (0.00%) |Generate time=2.82s (69.03%) |Training time=0.95s (23.30%) |Others=0.31 (7.67%)|CurSamplesPerSec=1.96 |AvgSamplesPerSec=2.37
epoch: 0|step: 2864|ppo_ep: 1|act_loss: 0.039520263671875|cri_loss: 0.028411865234375|unsuper_loss: 0.0
average reward score: 3.65625
-------------------------------------------------------------------------------------
|E2E latency=3.99s |Gather latency=0.00s (0.00%) |Generate time=3.11s (77.87%) |Training time=0.67s (16.67%) |Others=0.22 (5.46%)|CurSamplesPerSec=2.00 |AvgSamplesPerSec=2.37
epoch: 0|step: 2865|ppo_ep: 1|act_loss: 0.0860595703125|cri_loss: 0.05792236328125|unsuper_loss: 0.0
average reward score: 2.9453125
-------------------------------------------------------------------------------------
|E2E latency=4.03s |Gather latency=0.00s (0.00%) |Generate time=3.16s (78.34%) |Training time=0.65s (16.13%) |Others=0.22 (5.53%)|CurSamplesPerSec=1.98 |AvgSamplesPerSec=2.37
epoch: 0|step: 2866|ppo_ep: 1|act_loss: -0.016021728515625|cri_loss: 0.0058135986328125|unsuper_loss: 0.0
average reward score: 3.23046875
-------------------------------------------------------------------------------------
|E2E latency=4.00s |Gather latency=0.00s (0.00%) |Generate time=3.13s (78.25%) |Training time=0.64s (16.11%) |Others=0.23 (5.63%)|CurSamplesPerSec=2.00 |AvgSamplesPerSec=2.37
epoch: 0|step: 2867|ppo_ep: 1|act_loss: -0.01035308837890625|cri_loss: 0.0016632080078125|unsuper_loss: 0.0
average reward score: 3.677734375
-------------------------------------------------------------------------------------
|E2E latency=3.63s |Gather latency=0.00s (0.00%) |Generate time=2.75s (75.75%) |Training time=0.65s (17.85%) |Others=0.23 (6.40%)|CurSamplesPerSec=2.21 |AvgSamplesPerSec=2.37
epoch: 0|step: 2868|ppo_ep: 1|act_loss: 0.0762939453125|cri_loss: 0.0545654296875|unsuper_loss: 0.0
average reward score: 3.8984375
-------------------------------------------------------------------------------------
|E2E latency=4.05s |Gather latency=0.00s (0.00%) |Generate time=3.18s (78.47%) |Training time=0.65s (16.03%) |Others=0.22 (5.50%)|CurSamplesPerSec=1.98 |AvgSamplesPerSec=2.37
epoch: 0|step: 2869|ppo_ep: 1|act_loss: -0.039520263671875|cri_loss: -0.010833740234375|unsuper_loss: 0.0
average reward score: 4.75
-------------------------------------------------------------------------------------
|E2E latency=3.80s |Gather latency=0.00s (0.00%) |Generate time=2.92s (76.84%) |Training time=0.66s (17.36%) |Others=0.22 (5.79%)|CurSamplesPerSec=2.10 |AvgSamplesPerSec=2.37
epoch: 0|step: 2870|ppo_ep: 1|act_loss: -0.01605224609375|cri_loss: -0.0023345947265625|unsuper_loss: 0.0
average reward score: 3.1328125
-------------------------------------------------------------------------------------
|E2E latency=3.62s |Gather latency=0.00s (0.00%) |Generate time=2.74s (75.81%) |Training time=0.65s (17.99%) |Others=0.22 (6.20%)|CurSamplesPerSec=2.21 |AvgSamplesPerSec=2.37
epoch: 0|step: 2871|ppo_ep: 1|act_loss: 0.040069580078125|cri_loss: 0.036651611328125|unsuper_loss: 0.0
average reward score: 2.81640625
-------------------------------------------------------------------------------------
|E2E latency=4.10s |Gather latency=0.00s (0.00%) |Generate time=2.85s (69.43%) |Training time=0.94s (22.86%) |Others=0.32 (7.70%)|CurSamplesPerSec=1.95 |AvgSamplesPerSec=2.37
epoch: 0|step: 2872|ppo_ep: 1|act_loss: 0.030059814453125|cri_loss: 0.020111083984375|unsuper_loss: 0.0
average reward score: 4.41015625
-------------------------------------------------------------------------------------
|E2E latency=3.52s |Gather latency=0.00s (0.00%) |Generate time=2.67s (75.83%) |Training time=0.64s (18.20%) |Others=0.21 (5.96%)|CurSamplesPerSec=2.28 |AvgSamplesPerSec=2.37
epoch: 0|step: 2873|ppo_ep: 1|act_loss: 0.030731201171875|cri_loss: 0.0263519287109375|unsuper_loss: 0.0
average reward score: 3.7734375
-------------------------------------------------------------------------------------
|E2E latency=3.66s |Gather latency=0.00s (0.00%) |Generate time=2.78s (75.96%) |Training time=0.66s (18.12%) |Others=0.22 (5.92%)|CurSamplesPerSec=2.18 |AvgSamplesPerSec=2.37
epoch: 0|step: 2874|ppo_ep: 1|act_loss: 0.142578125|cri_loss: 0.0819091796875|unsuper_loss: 0.0
average reward score: 3.9921875
-------------------------------------------------------------------------------------
|E2E latency=3.89s |Gather latency=0.00s (0.00%) |Generate time=3.02s (77.69%) |Training time=0.65s (16.58%) |Others=0.22 (5.74%)|CurSamplesPerSec=2.05 |AvgSamplesPerSec=2.37
epoch: 0|step: 2875|ppo_ep: 1|act_loss: -0.0075836181640625|cri_loss: 0.009674072265625|unsuper_loss: 0.0
average reward score: 4.4140625
-------------------------------------------------------------------------------------
|E2E latency=3.85s |Gather latency=0.00s (0.00%) |Generate time=2.99s (77.69%) |Training time=0.65s (16.88%) |Others=0.21 (5.43%)|CurSamplesPerSec=2.08 |AvgSamplesPerSec=2.37
epoch: 0|step: 2876|ppo_ep: 1|act_loss: 0.02960205078125|cri_loss: 0.0191497802734375|unsuper_loss: 0.0
average reward score: 3.19921875
-------------------------------------------------------------------------------------
|E2E latency=3.58s |Gather latency=0.00s (0.00%) |Generate time=2.63s (73.39%) |Training time=0.73s (20.36%) |Others=0.22 (6.25%)|CurSamplesPerSec=2.24 |AvgSamplesPerSec=2.37
epoch: 0|step: 2877|ppo_ep: 1|act_loss: 0.191162109375|cri_loss: 0.10662841796875|unsuper_loss: 0.0
average reward score: 3.40234375
-------------------------------------------------------------------------------------
|E2E latency=4.02s |Gather latency=0.00s (0.00%) |Generate time=2.66s (66.13%) |Training time=1.13s (28.23%) |Others=0.23 (5.63%)|CurSamplesPerSec=1.99 |AvgSamplesPerSec=2.37
epoch: 0|step: 2878|ppo_ep: 1|act_loss: 0.11553955078125|cri_loss: 0.07574462890625|unsuper_loss: 0.0
average reward score: 3.4140625
-------------------------------------------------------------------------------------
|E2E latency=3.60s |Gather latency=0.00s (0.00%) |Generate time=2.56s (71.21%) |Training time=0.82s (22.68%) |Others=0.22 (6.11%)|CurSamplesPerSec=2.22 |AvgSamplesPerSec=2.37
[2023-04-24 16:29:55,092] [INFO] [logging.py:96:log_dist] [Rank 0] step=360, skipped=5, lr=[2.2628235027224777e-06, 2.2628235027224777e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-04-24 16:29:55,340] [INFO] [timer.py:199:stop] epoch=0/micro_step=2880/global_step=360, RunningAvgSamplesPerSec=15.341536557828125, CurrSamplesPerSec=13.127428932112721, MemAllocated=20.44GB, MaxMemAllocated=31.45GB
[2023-04-24 16:29:55,594] [INFO] [logging.py:96:log_dist] [Rank 0] step=360, skipped=4, lr=[1.1547941170915686e-06, 1.1547941170915686e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 2879|ppo_ep: 1|act_loss: -0.060791015625|cri_loss: -0.01885986328125|unsuper_loss: 0.0
average reward score: 3.99609375
-------------------------------------------------------------------------------------
|E2E latency=4.05s |Gather latency=0.00s (0.00%) |Generate time=2.72s (67.23%) |Training time=1.00s (24.59%) |Others=0.33 (8.18%)|CurSamplesPerSec=1.98 |AvgSamplesPerSec=2.37
epoch: 0|step: 2880|ppo_ep: 1|act_loss: 0.122802734375|cri_loss: 0.07342529296875|unsuper_loss: 0.0
average reward score: 4.5078125
-------------------------------------------------------------------------------------
|E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.62s (74.54%) |Training time=0.67s (18.95%) |Others=0.23 (6.50%)|CurSamplesPerSec=2.28 |AvgSamplesPerSec=2.37
epoch: 0|step: 2881|ppo_ep: 1|act_loss: -0.05279541015625|cri_loss: -0.021270751953125|unsuper_loss: 0.0
average reward score: 4.640625
-------------------------------------------------------------------------------------
|E2E latency=3.74s |Gather latency=0.00s (0.00%) |Generate time=2.74s (73.17%) |Training time=0.79s (21.15%) |Others=0.21 (5.68%)|CurSamplesPerSec=2.14 |AvgSamplesPerSec=2.37
epoch: 0|step: 2882|ppo_ep: 1|act_loss: 0.19580078125|cri_loss: 0.1278076171875|unsuper_loss: 0.0
average reward score: 2.1015625
-------------------------------------------------------------------------------------
|E2E latency=3.82s |Gather latency=0.00s (0.00%) |Generate time=2.96s (77.44%) |Training time=0.64s (16.88%) |Others=0.22 (5.68%)|CurSamplesPerSec=2.10 |AvgSamplesPerSec=2.37
epoch: 0|step: 2883|ppo_ep: 1|act_loss: -0.05242919921875|cri_loss: -0.0179595947265625|unsuper_loss: 0.0
average reward score: 4.85546875
-------------------------------------------------------------------------------------
|E2E latency=3.91s |Gather latency=0.00s (0.00%) |Generate time=3.03s (77.59%) |Training time=0.66s (16.87%) |Others=0.22 (5.54%)|CurSamplesPerSec=2.05 |AvgSamplesPerSec=2.37
epoch: 0|step: 2884|ppo_ep: 1|act_loss: 0.01224517822265625|cri_loss: 0.0257568359375|unsuper_loss: 0.0
average reward score: 3.703125
-------------------------------------------------------------------------------------
|E2E latency=3.44s |Gather latency=0.00s (0.00%) |Generate time=2.56s (74.31%) |Training time=0.65s (18.89%) |Others=0.23 (6.80%)|CurSamplesPerSec=2.32 |AvgSamplesPerSec=2.37
epoch: 0|step: 2885|ppo_ep: 1|act_loss: -0.02569580078125|cri_loss: -0.0045623779296875|unsuper_loss: 0.0
average reward score: 4.1796875
-------------------------------------------------------------------------------------
|E2E latency=3.95s |Gather latency=0.00s (0.00%) |Generate time=3.07s (77.73%) |Training time=0.66s (16.68%) |Others=0.22 (5.59%)|CurSamplesPerSec=2.02 |AvgSamplesPerSec=2.37
epoch: 0|step: 2886|ppo_ep: 1|act_loss: 0.136962890625|cri_loss: 0.0894775390625|unsuper_loss: 0.0
average reward score: 1.935546875
-------------------------------------------------------------------------------------
|E2E latency=3.54s |Gather latency=0.00s (0.00%) |Generate time=2.67s (75.48%) |Training time=0.64s (17.93%) |Others=0.23 (6.59%)|CurSamplesPerSec=2.26 |AvgSamplesPerSec=2.37
epoch: 0|step: 2887|ppo_ep: 1|act_loss: -0.1070556640625|cri_loss: -0.034820556640625|unsuper_loss: 0.0
average reward score: 4.15625
-------------------------------------------------------------------------------------
|E2E latency=3.75s |Gather latency=0.00s (0.00%) |Generate time=2.52s (67.19%) |Training time=0.94s (25.07%) |Others=0.29 (7.74%)|CurSamplesPerSec=2.13 |AvgSamplesPerSec=2.37
epoch: 0|step: 2888|ppo_ep: 1|act_loss: 0.1121826171875|cri_loss: 0.06768798828125|unsuper_loss: 0.0
average reward score: 2.90234375
-------------------------------------------------------------------------------------
|E2E latency=3.46s |Gather latency=0.00s (0.00%) |Generate time=2.63s (75.99%) |Training time=0.64s (18.42%) |Others=0.19 (5.59%)|CurSamplesPerSec=2.31 |AvgSamplesPerSec=2.37
epoch: 0|step: 2889|ppo_ep: 1|act_loss: 0.2005615234375|cri_loss: 0.121826171875|unsuper_loss: 0.0
average reward score: 2.52734375
-------------------------------------------------------------------------------------
|E2E latency=3.92s |Gather latency=0.00s (0.00%) |Generate time=2.70s (69.05%) |Training time=0.64s (16.40%) |Others=0.57 (14.55%)|CurSamplesPerSec=2.04 |AvgSamplesPerSec=2.37
epoch: 0|step: 2890|ppo_ep: 1|act_loss: 0.0295257568359375|cri_loss: 0.0204925537109375|unsuper_loss: 0.0
average reward score: 3.27734375
-------------------------------------------------------------------------------------
|E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.51s (71.82%) |Training time=0.74s (21.20%) |Others=0.24 (6.98%)|CurSamplesPerSec=2.29 |AvgSamplesPerSec=2.37
epoch: 0|step: 2891|ppo_ep: 1|act_loss: 0.0943603515625|cri_loss: 0.057464599609375|unsuper_loss: 0.0
average reward score: 3.6796875
-------------------------------------------------------------------------------------
|E2E latency=3.35s |Gather latency=0.00s (0.00%) |Generate time=2.49s (74.31%) |Training time=0.67s (19.87%) |Others=0.19 (5.82%)|CurSamplesPerSec=2.39 |AvgSamplesPerSec=2.37
epoch: 0|step: 2892|ppo_ep: 1|act_loss: 0.1834716796875|cri_loss: 0.10693359375|unsuper_loss: 0.0
average reward score: 2.78515625
-------------------------------------------------------------------------------------
|E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.15%) |Training time=0.64s (19.42%) |Others=0.21 (6.43%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.37
epoch: 0|step: 2893|ppo_ep: 1|act_loss: -0.0316162109375|cri_loss: -0.0089111328125|unsuper_loss: 0.0
average reward score: 3.51171875
-------------------------------------------------------------------------------------
|E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.41s (72.63%) |Training time=0.71s (21.49%) |Others=0.20 (5.88%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.37
epoch: 0|step: 2894|ppo_ep: 1|act_loss: 0.22314453125|cri_loss: 0.134765625|unsuper_loss: 0.0
average reward score: 2.623046875
-------------------------------------------------------------------------------------
|E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.47s (74.53%) |Training time=0.65s (19.49%) |Others=0.20 (5.98%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.37
epoch: 0|step: 2895|ppo_ep: 1|act_loss: -0.07830810546875|cri_loss: -0.0240325927734375|unsuper_loss: 0.0
average reward score: 4.5625
-------------------------------------------------------------------------------------
|E2E latency=3.69s |Gather latency=0.00s (0.00%) |Generate time=2.47s (66.90%) |Training time=0.94s (25.34%) |Others=0.29 (7.76%)|CurSamplesPerSec=2.17 |AvgSamplesPerSec=2.37
epoch: 0|step: 2896|ppo_ep: 1|act_loss: 0.37646484375|cri_loss: 0.2476806640625|unsuper_loss: 0.0
average reward score: 3.228515625
------------------------------------------------------------------------------------- |E2E latency=3.43s |Gather latency=0.00s (0.00%) |Generate time=2.59s (75.56%) |Training time=0.64s (18.73%) |Others=0.20 (5.71%)|CurSamplesPerSec=2.33 |AvgSamplesPerSec=2.37 epoch: 0|step: 2897|ppo_ep: 1|act_loss: 0.025543212890625|cri_loss: 0.031005859375|unsuper_loss: 0.0 average reward score: 2.93359375 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.48s (74.81%) |Training time=0.64s (19.32%) |Others=0.19 (5.87%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.37 epoch: 0|step: 2898|ppo_ep: 1|act_loss: 0.09234619140625|cri_loss: 0.05462646484375|unsuper_loss: 0.0 average reward score: 3.830078125 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.43s (73.84%) |Training time=0.66s (20.09%) |Others=0.20 (6.07%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.37 epoch: 0|step: 2899|ppo_ep: 1|act_loss: -0.00360107421875|cri_loss: 0.029266357421875|unsuper_loss: 0.0 average reward score: 3.3671875 ------------------------------------------------------------------------------------- |E2E latency=3.35s |Gather latency=0.00s (0.00%) |Generate time=2.49s (74.54%) |Training time=0.65s (19.40%) |Others=0.20 (6.06%)|CurSamplesPerSec=2.39 |AvgSamplesPerSec=2.37 epoch: 0|step: 2900|ppo_ep: 1|act_loss: 0.193115234375|cri_loss: 0.1435546875|unsuper_loss: 0.0 average reward score: 2.59765625 ------------------------------------------------------------------------------------- |E2E latency=3.33s |Gather latency=0.00s (0.00%) |Generate time=2.47s (73.98%) |Training time=0.66s (19.85%) |Others=0.21 (6.17%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.37 epoch: 0|step: 2901|ppo_ep: 1|act_loss: 0.0167999267578125|cri_loss: 0.0153350830078125|unsuper_loss: 0.0 average reward score: 3.6328125 
------------------------------------------------------------------------------------- |E2E latency=3.33s |Gather latency=0.00s (0.00%) |Generate time=2.49s (74.73%) |Training time=0.65s (19.39%) |Others=0.20 (5.88%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.37 epoch: 0|step: 2902|ppo_ep: 1|act_loss: 0.159912109375|cri_loss: 0.124755859375|unsuper_loss: 0.0 average reward score: 4.359375 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.12%) |Training time=0.64s (19.70%) |Others=0.20 (6.18%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.37 epoch: 0|step: 2903|ppo_ep: 1|act_loss: 0.286865234375|cri_loss: 0.18798828125|unsuper_loss: 0.0 average reward score: 2.8359375 ------------------------------------------------------------------------------------- |E2E latency=3.69s |Gather latency=0.00s (0.00%) |Generate time=2.47s (66.98%) |Training time=0.94s (25.41%) |Others=0.28 (7.61%)|CurSamplesPerSec=2.17 |AvgSamplesPerSec=2.37 epoch: 0|step: 2904|ppo_ep: 1|act_loss: 0.120849609375|cri_loss: 0.1033935546875|unsuper_loss: 0.0 average reward score: 3.6484375 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.48s (74.86%) |Training time=0.64s (19.25%) |Others=0.20 (5.89%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.37 epoch: 0|step: 2905|ppo_ep: 1|act_loss: 0.1533203125|cri_loss: 0.09332275390625|unsuper_loss: 0.0 average reward score: 3.55078125 ------------------------------------------------------------------------------------- |E2E latency=3.35s |Gather latency=0.00s (0.00%) |Generate time=2.49s (74.15%) |Training time=0.67s (19.86%) |Others=0.20 (5.99%)|CurSamplesPerSec=2.39 |AvgSamplesPerSec=2.37 epoch: 0|step: 2906|ppo_ep: 1|act_loss: 0.1527099609375|cri_loss: 0.1212158203125|unsuper_loss: 0.0 average reward score: 3.01953125 
------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.53%) |Training time=0.64s (19.57%) |Others=0.19 (5.90%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.37 epoch: 0|step: 2907|ppo_ep: 1|act_loss: 0.48583984375|cri_loss: 0.295654296875|unsuper_loss: 0.0 average reward score: 3.37109375 ------------------------------------------------------------------------------------- |E2E latency=3.38s |Gather latency=0.00s (0.00%) |Generate time=2.47s (73.04%) |Training time=0.68s (20.25%) |Others=0.23 (6.71%)|CurSamplesPerSec=2.37 |AvgSamplesPerSec=2.37 epoch: 0|step: 2908|ppo_ep: 1|act_loss: 0.27392578125|cri_loss: 0.1689453125|unsuper_loss: 0.0 average reward score: 3.7265625 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.42s (73.70%) |Training time=0.66s (20.06%) |Others=0.20 (6.24%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.37 epoch: 0|step: 2909|ppo_ep: 1|act_loss: 0.203125|cri_loss: 0.11785888671875|unsuper_loss: 0.0 average reward score: 2.34765625 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.39s (73.93%) |Training time=0.64s (19.98%) |Others=0.20 (6.09%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.37 epoch: 0|step: 2910|ppo_ep: 1|act_loss: 0.2236328125|cri_loss: 0.150146484375|unsuper_loss: 0.0 average reward score: 2.25 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.18%) |Training time=0.64s (19.69%) |Others=0.20 (6.13%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.37 epoch: 0|step: 2911|ppo_ep: 1|act_loss: 0.45166015625|cri_loss: 0.28125|unsuper_loss: 0.0 average reward score: 3.265625 
------------------------------------------------------------------------------------- |E2E latency=3.67s |Gather latency=0.00s (0.00%) |Generate time=2.44s (66.54%) |Training time=0.94s (25.49%) |Others=0.29 (7.96%)|CurSamplesPerSec=2.18 |AvgSamplesPerSec=2.37 epoch: 0|step: 2912|ppo_ep: 1|act_loss: 0.35595703125|cri_loss: 0.2220458984375|unsuper_loss: 0.0 average reward score: 3.74609375 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.37s (73.29%) |Training time=0.67s (20.75%) |Others=0.19 (5.96%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.37 epoch: 0|step: 2913|ppo_ep: 1|act_loss: 0.447265625|cri_loss: 0.2822265625|unsuper_loss: 0.0 average reward score: 3.46484375 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.05%) |Training time=0.64s (19.94%) |Others=0.19 (6.02%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.37 epoch: 0|step: 2914|ppo_ep: 1|act_loss: -0.01171875|cri_loss: 0.00121307373046875|unsuper_loss: 0.0 average reward score: 3.7109375 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.37s (73.54%) |Training time=0.65s (20.26%) |Others=0.20 (6.20%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.37 epoch: 0|step: 2915|ppo_ep: 1|act_loss: 0.19482421875|cri_loss: 0.130859375|unsuper_loss: 0.0 average reward score: 3.61328125 ------------------------------------------------------------------------------------- |E2E latency=3.36s |Gather latency=0.00s (0.00%) |Generate time=2.52s (74.83%) |Training time=0.65s (19.22%) |Others=0.20 (5.95%)|CurSamplesPerSec=2.38 |AvgSamplesPerSec=2.37 epoch: 0|step: 2916|ppo_ep: 1|act_loss: 0.330078125|cri_loss: 0.2041015625|unsuper_loss: 0.0 average reward score: 2.5 
------------------------------------------------------------------------------------- |E2E latency=3.42s |Gather latency=0.00s (0.00%) |Generate time=2.56s (74.92%) |Training time=0.65s (18.91%) |Others=0.21 (6.17%)|CurSamplesPerSec=2.34 |AvgSamplesPerSec=2.37 epoch: 0|step: 2917|ppo_ep: 1|act_loss: 0.126220703125|cri_loss: 0.08648681640625|unsuper_loss: 0.0 average reward score: 3.62109375 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.47%) |Training time=0.64s (19.61%) |Others=0.19 (5.92%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.37 epoch: 0|step: 2918|ppo_ep: 1|act_loss: 0.51416015625|cri_loss: 0.30322265625|unsuper_loss: 0.0 average reward score: 3.666015625 ------------------------------------------------------------------------------------- |E2E latency=3.37s |Gather latency=0.00s (0.00%) |Generate time=2.52s (74.91%) |Training time=0.65s (19.19%) |Others=0.20 (5.90%)|CurSamplesPerSec=2.37 |AvgSamplesPerSec=2.37 epoch: 0|step: 2919|ppo_ep: 1|act_loss: 0.2490234375|cri_loss: 0.1533203125|unsuper_loss: 0.0 average reward score: 3.109375 ------------------------------------------------------------------------------------- |E2E latency=3.69s |Gather latency=0.00s (0.00%) |Generate time=2.46s (66.71%) |Training time=0.94s (25.32%) |Others=0.29 (7.97%)|CurSamplesPerSec=2.17 |AvgSamplesPerSec=2.37 epoch: 0|step: 2920|ppo_ep: 1|act_loss: 0.17333984375|cri_loss: 0.11004638671875|unsuper_loss: 0.0 average reward score: 2.88671875 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.44s (73.81%) |Training time=0.67s (20.39%) |Others=0.19 (5.80%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.37 epoch: 0|step: 2921|ppo_ep: 1|act_loss: 0.1689453125|cri_loss: 0.130859375|unsuper_loss: 0.0 average reward score: 2.666015625 
------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.17%) |Training time=0.64s (19.82%) |Others=0.19 (6.01%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.37 epoch: 0|step: 2922|ppo_ep: 1|act_loss: 0.2548828125|cri_loss: 0.1766357421875|unsuper_loss: 0.0 average reward score: 3.3828125 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.38s (74.24%) |Training time=0.64s (19.83%) |Others=0.19 (5.92%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.37 epoch: 0|step: 2923|ppo_ep: 1|act_loss: 0.040130615234375|cri_loss: 0.02825927734375|unsuper_loss: 0.0 average reward score: 2.83203125 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.32s (70.99%) |Training time=0.75s (22.89%) |Others=0.20 (6.12%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.37 epoch: 0|step: 2924|ppo_ep: 1|act_loss: 0.11822509765625|cri_loss: 0.09521484375|unsuper_loss: 0.0 average reward score: 3.09765625 ------------------------------------------------------------------------------------- |E2E latency=3.19s |Gather latency=0.00s (0.00%) |Generate time=2.31s (72.45%) |Training time=0.68s (21.40%) |Others=0.20 (6.15%)|CurSamplesPerSec=2.51 |AvgSamplesPerSec=2.37 epoch: 0|step: 2925|ppo_ep: 1|act_loss: 0.2125244140625|cri_loss: 0.1435546875|unsuper_loss: 0.0 average reward score: 3.693359375 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.30s (71.72%) |Training time=0.71s (22.19%) |Others=0.20 (6.09%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.37 epoch: 0|step: 2926|ppo_ep: 1|act_loss: 0.34375|cri_loss: 0.2257080078125|unsuper_loss: 0.0 average reward score: 3.791015625 
------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.30s (71.53%) |Training time=0.72s (22.52%) |Others=0.19 (5.94%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.37 epoch: 0|step: 2927|ppo_ep: 1|act_loss: 0.356689453125|cri_loss: 0.2242431640625|unsuper_loss: 0.0 average reward score: 3.2421875 ------------------------------------------------------------------------------------- |E2E latency=3.56s |Gather latency=0.00s (0.00%) |Generate time=2.30s (64.52%) |Training time=0.98s (27.67%) |Others=0.28 (7.82%)|CurSamplesPerSec=2.25 |AvgSamplesPerSec=2.37 epoch: 0|step: 2928|ppo_ep: 1|act_loss: 0.08154296875|cri_loss: 0.0966796875|unsuper_loss: 0.0 average reward score: 3.25 ------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.30s (71.89%) |Training time=0.71s (22.22%) |Others=0.19 (5.89%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.37 epoch: 0|step: 2929|ppo_ep: 1|act_loss: 0.255859375|cri_loss: 0.190185546875|unsuper_loss: 0.0 average reward score: 2.8984375 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.16%) |Training time=0.64s (19.74%) |Others=0.20 (6.10%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.37 epoch: 0|step: 2930|ppo_ep: 1|act_loss: 0.02447509765625|cri_loss: 0.046844482421875|unsuper_loss: 0.0 average reward score: 2.5390625 ------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.49%) |Training time=0.64s (19.41%) |Others=0.20 (6.10%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.37 epoch: 0|step: 2931|ppo_ep: 1|act_loss: 0.1337890625|cri_loss: 0.0909423828125|unsuper_loss: 0.0 average reward score: 4.1953125 
------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.17%) |Training time=0.65s (19.84%) |Others=0.20 (5.99%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.37 epoch: 0|step: 2932|ppo_ep: 1|act_loss: 0.11669921875|cri_loss: 0.11224365234375|unsuper_loss: 0.0 average reward score: 4.0078125 ------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.36s (73.57%) |Training time=0.65s (20.16%) |Others=0.20 (6.27%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.37 epoch: 0|step: 2933|ppo_ep: 1|act_loss: 0.1590576171875|cri_loss: 0.1082763671875|unsuper_loss: 0.0 average reward score: 3.0703125 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.17%) |Training time=0.64s (19.83%) |Others=0.19 (6.00%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.37 epoch: 0|step: 2934|ppo_ep: 1|act_loss: 0.1304931640625|cri_loss: 0.10833740234375|unsuper_loss: 0.0 average reward score: 3.58203125 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.21%) |Training time=0.64s (19.81%) |Others=0.19 (5.98%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.37 epoch: 0|step: 2935|ppo_ep: 1|act_loss: -0.0181884765625|cri_loss: 0.029937744140625|unsuper_loss: 0.0 average reward score: 3.4296875 ------------------------------------------------------------------------------------- |E2E latency=3.62s |Gather latency=0.00s (0.00%) |Generate time=2.41s (66.69%) |Training time=0.92s (25.40%) |Others=0.29 (7.90%)|CurSamplesPerSec=2.21 |AvgSamplesPerSec=2.37 epoch: 0|step: 2936|ppo_ep: 1|act_loss: -0.10394287109375|cri_loss: -0.02105712890625|unsuper_loss: 0.0 average reward score: 3.6484375 
------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.31s (71.80%) |Training time=0.71s (22.24%) |Others=0.19 (5.96%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.37 epoch: 0|step: 2937|ppo_ep: 1|act_loss: -0.268310546875|cri_loss: -0.08447265625|unsuper_loss: 0.0 average reward score: 3.546875 ------------------------------------------------------------------------------------- |E2E latency=3.40s |Gather latency=0.00s (0.00%) |Generate time=2.50s (73.68%) |Training time=0.66s (19.44%) |Others=0.23 (6.88%)|CurSamplesPerSec=2.36 |AvgSamplesPerSec=2.37 epoch: 0|step: 2938|ppo_ep: 1|act_loss: -0.646484375|cri_loss: -0.26123046875|unsuper_loss: 0.0 average reward score: 2.244140625 ------------------------------------------------------------------------------------- |E2E latency=3.65s |Gather latency=0.00s (0.00%) |Generate time=2.75s (75.26%) |Training time=0.67s (18.29%) |Others=0.24 (6.45%)|CurSamplesPerSec=2.19 |AvgSamplesPerSec=2.37 epoch: 0|step: 2939|ppo_ep: 1|act_loss: -0.1688232421875|cri_loss: -0.0545654296875|unsuper_loss: 0.0 average reward score: 3.765625 ------------------------------------------------------------------------------------- |E2E latency=3.95s |Gather latency=0.00s (0.00%) |Generate time=3.04s (77.09%) |Training time=0.68s (17.13%) |Others=0.23 (5.78%)|CurSamplesPerSec=2.03 |AvgSamplesPerSec=2.37 epoch: 0|step: 2940|ppo_ep: 1|act_loss: 0.058074951171875|cri_loss: 0.07073974609375|unsuper_loss: 0.0 average reward score: 2.01171875 ------------------------------------------------------------------------------------- |E2E latency=3.71s |Gather latency=0.00s (0.00%) |Generate time=2.80s (75.43%) |Training time=0.69s (18.70%) |Others=0.22 (5.87%)|CurSamplesPerSec=2.16 |AvgSamplesPerSec=2.37 epoch: 0|step: 2941|ppo_ep: 1|act_loss: -0.014801025390625|cri_loss: 0.00921630859375|unsuper_loss: 0.0 average reward score: 4.32421875 
------------------------------------------------------------------------------------- |E2E latency=3.59s |Gather latency=0.00s (0.00%) |Generate time=2.69s (75.09%) |Training time=0.67s (18.63%) |Others=0.23 (6.28%)|CurSamplesPerSec=2.23 |AvgSamplesPerSec=2.37 epoch: 0|step: 2942|ppo_ep: 1|act_loss: -0.2305908203125|cri_loss: -0.0908203125|unsuper_loss: 0.0 average reward score: 3.72265625 ------------------------------------------------------------------------------------- |E2E latency=3.83s |Gather latency=0.00s (0.00%) |Generate time=2.95s (77.07%) |Training time=0.65s (16.88%) |Others=0.23 (6.04%)|CurSamplesPerSec=2.09 |AvgSamplesPerSec=2.37 epoch: 0|step: 2943|ppo_ep: 1|act_loss: 0.1448974609375|cri_loss: 0.09930419921875|unsuper_loss: 0.0 average reward score: 2.78515625 ------------------------------------------------------------------------------------- |E2E latency=3.93s |Gather latency=0.00s (0.00%) |Generate time=2.61s (66.42%) |Training time=1.00s (25.53%) |Others=0.32 (8.05%)|CurSamplesPerSec=2.03 |AvgSamplesPerSec=2.37 epoch: 0|step: 2944|ppo_ep: 1|act_loss: -0.07501220703125|cri_loss: 0.00177001953125|unsuper_loss: 0.0 average reward score: 4.3046875 ------------------------------------------------------------------------------------- |E2E latency=3.62s |Gather latency=0.00s (0.00%) |Generate time=2.44s (67.28%) |Training time=0.97s (26.85%) |Others=0.21 (5.88%)|CurSamplesPerSec=2.21 |AvgSamplesPerSec=2.37 epoch: 0|step: 2945|ppo_ep: 1|act_loss: 0.0323486328125|cri_loss: 0.0826416015625|unsuper_loss: 0.0 average reward score: 3.591796875 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.47s (74.57%) |Training time=0.65s (19.57%) |Others=0.19 (5.86%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.37 epoch: 0|step: 2946|ppo_ep: 1|act_loss: -0.185302734375|cri_loss: -0.05877685546875|unsuper_loss: 0.0 average reward score: 3.2109375 
------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.30%) |Training time=0.64s (19.75%) |Others=0.19 (5.94%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.37 epoch: 0|step: 2947|ppo_ep: 1|act_loss: -0.090087890625|cri_loss: -0.020538330078125|unsuper_loss: 0.0 average reward score: 3.66796875 ------------------------------------------------------------------------------------- |E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.61s (74.94%) |Training time=0.65s (18.52%) |Others=0.23 (6.54%)|CurSamplesPerSec=2.29 |AvgSamplesPerSec=2.37 epoch: 0|step: 2948|ppo_ep: 1|act_loss: -0.0689697265625|cri_loss: -0.004486083984375|unsuper_loss: 0.0 average reward score: 3.599609375 ------------------------------------------------------------------------------------- |E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.64s (75.12%) |Training time=0.64s (18.24%) |Others=0.23 (6.64%)|CurSamplesPerSec=2.28 |AvgSamplesPerSec=2.37 epoch: 0|step: 2949|ppo_ep: 1|act_loss: -0.3076171875|cri_loss: -0.108642578125|unsuper_loss: 0.0 average reward score: 3.703125 ------------------------------------------------------------------------------------- |E2E latency=3.75s |Gather latency=0.00s (0.00%) |Generate time=2.80s (74.79%) |Training time=0.70s (18.71%) |Others=0.24 (6.50%)|CurSamplesPerSec=2.13 |AvgSamplesPerSec=2.37 epoch: 0|step: 2950|ppo_ep: 1|act_loss: -0.04620361328125|cri_loss: 0.004638671875|unsuper_loss: 0.0 average reward score: 3.61328125 ------------------------------------------------------------------------------------- |E2E latency=3.55s |Gather latency=0.00s (0.00%) |Generate time=2.66s (75.04%) |Training time=0.65s (18.24%) |Others=0.24 (6.72%)|CurSamplesPerSec=2.26 |AvgSamplesPerSec=2.37 epoch: 0|step: 2951|ppo_ep: 1|act_loss: -0.09869384765625|cri_loss: -0.0225830078125|unsuper_loss: 0.0 average reward score: 3.234375 
------------------------------------------------------------------------------------- |E2E latency=4.25s |Gather latency=0.00s (0.00%) |Generate time=2.99s (70.36%) |Training time=0.94s (22.13%) |Others=0.32 (7.51%)|CurSamplesPerSec=1.88 |AvgSamplesPerSec=2.37 epoch: 0|step: 2952|ppo_ep: 1|act_loss: -0.12310791015625|cri_loss: -0.029296875|unsuper_loss: 0.0 average reward score: 3.54296875 ------------------------------------------------------------------------------------- |E2E latency=3.54s |Gather latency=0.00s (0.00%) |Generate time=2.59s (73.10%) |Training time=0.72s (20.48%) |Others=0.23 (6.42%)|CurSamplesPerSec=2.26 |AvgSamplesPerSec=2.37 epoch: 0|step: 2953|ppo_ep: 1|act_loss: -0.016571044921875|cri_loss: 0.015869140625|unsuper_loss: 0.0 average reward score: 4.1484375 ------------------------------------------------------------------------------------- |E2E latency=3.41s |Gather latency=0.00s (0.00%) |Generate time=2.55s (74.77%) |Training time=0.64s (18.88%) |Others=0.22 (6.35%)|CurSamplesPerSec=2.35 |AvgSamplesPerSec=2.37 epoch: 0|step: 2954|ppo_ep: 1|act_loss: 0.0640869140625|cri_loss: 0.080078125|unsuper_loss: 0.0 average reward score: 2.896484375 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.48s (74.75%) |Training time=0.64s (19.26%) |Others=0.20 (5.99%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.37 epoch: 0|step: 2955|ppo_ep: 1|act_loss: 0.166748046875|cri_loss: 0.1168212890625|unsuper_loss: 0.0 average reward score: 3.419921875 ------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.42%) |Training time=0.65s (19.61%) |Others=0.20 (5.97%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.37 epoch: 0|step: 2956|ppo_ep: 1|act_loss: -0.163330078125|cri_loss: -0.0621337890625|unsuper_loss: 0.0 average reward score: 4.390625 
------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.36s (73.44%) |Training time=0.65s (20.15%) |Others=0.21 (6.42%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.37 epoch: 0|step: 2957|ppo_ep: 1|act_loss: -0.02545166015625|cri_loss: 0.023834228515625|unsuper_loss: 0.0 average reward score: 4.2578125 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.22%) |Training time=0.64s (19.63%) |Others=0.20 (6.15%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.37 epoch: 0|step: 2958|ppo_ep: 1|act_loss: -0.0560302734375|cri_loss: -0.00103759765625|unsuper_loss: 0.0 average reward score: 3.12109375 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.17%) |Training time=0.64s (19.81%) |Others=0.20 (6.02%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.37 [2023-04-24 16:34:30,612] [INFO] [logging.py:96:log_dist] [Rank 0] step=370, skipped=5, lr=[1.9305521592903324e-06, 1.9305521592903324e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-04-24 16:34:30,855] [INFO] [timer.py:199:stop] epoch=0/micro_step=2960/global_step=370, RunningAvgSamplesPerSec=15.340254456435, CurrSamplesPerSec=15.490980523415184, MemAllocated=20.44GB, MaxMemAllocated=31.45GB [2023-04-24 16:34:31,060] [INFO] [logging.py:96:log_dist] [Rank 0] step=370, skipped=4, lr=[9.836262435823316e-07, 9.836262435823316e-07], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 2959|ppo_ep: 1|act_loss: -0.040191650390625|cri_loss: 0.006561279296875|unsuper_loss: 0.0 average reward score: 3.6875 ------------------------------------------------------------------------------------- |E2E latency=3.59s |Gather latency=0.00s (0.00%) |Generate time=2.39s (66.58%) |Training time=0.92s (25.61%) |Others=0.28 (7.81%)|CurSamplesPerSec=2.23 
|AvgSamplesPerSec=2.37 epoch: 0|step: 2960|ppo_ep: 1|act_loss: 0.27880859375|cri_loss: 0.168212890625|unsuper_loss: 0.0 average reward score: 2.787109375 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.66%) |Training time=0.64s (19.44%) |Others=0.19 (5.91%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.37 epoch: 0|step: 2961|ppo_ep: 1|act_loss: -0.25634765625|cri_loss: -0.08709716796875|unsuper_loss: 0.0 average reward score: 3.748046875 ------------------------------------------------------------------------------------- |E2E latency=3.33s |Gather latency=0.00s (0.00%) |Generate time=2.48s (74.30%) |Training time=0.66s (19.81%) |Others=0.20 (5.89%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.37 epoch: 0|step: 2962|ppo_ep: 1|act_loss: -0.132080078125|cri_loss: -0.0438232421875|unsuper_loss: 0.0 average reward score: 3.607421875 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.41%) |Training time=0.64s (19.62%) |Others=0.20 (5.97%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.37 epoch: 0|step: 2963|ppo_ep: 1|act_loss: 0.32177734375|cri_loss: 0.220703125|unsuper_loss: 0.0 average reward score: 3.169921875 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.54%) |Training time=0.64s (19.54%) |Others=0.19 (5.92%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.37 epoch: 0|step: 2964|ppo_ep: 1|act_loss: 0.181396484375|cri_loss: 0.140625|unsuper_loss: 0.0 average reward score: 4.265625 ------------------------------------------------------------------------------------- |E2E latency=3.35s |Gather latency=0.00s (0.00%) |Generate time=2.50s (74.72%) |Training time=0.65s (19.29%) |Others=0.20 (6.00%)|CurSamplesPerSec=2.39 |AvgSamplesPerSec=2.37 epoch: 0|step: 
2965|ppo_ep: 1|act_loss: -0.0001220703125|cri_loss: 0.0227508544921875|unsuper_loss: 0.0 average reward score: 4.42578125 ------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.37s (74.07%) |Training time=0.64s (19.95%) |Others=0.19 (5.98%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.37 epoch: 0|step: 2966|ppo_ep: 1|act_loss: 0.010498046875|cri_loss: 0.042572021484375|unsuper_loss: 0.0 average reward score: 2.611328125 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.38s (73.56%) |Training time=0.66s (20.41%) |Others=0.20 (6.03%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.37 epoch: 0|step: 2967|ppo_ep: 1|act_loss: 0.01153564453125|cri_loss: 0.030059814453125|unsuper_loss: 0.0 average reward score: 3.04296875 ------------------------------------------------------------------------------------- |E2E latency=3.63s |Gather latency=0.00s (0.00%) |Generate time=2.41s (66.43%) |Training time=0.93s (25.69%) |Others=0.29 (7.89%)|CurSamplesPerSec=2.21 |AvgSamplesPerSec=2.37 epoch: 0|step: 2968|ppo_ep: 1|act_loss: -0.0020751953125|cri_loss: 0.019317626953125|unsuper_loss: 0.0 average reward score: 3.37890625 ------------------------------------------------------------------------------------- |E2E latency=3.40s |Gather latency=0.00s (0.00%) |Generate time=2.54s (74.77%) |Training time=0.66s (19.29%) |Others=0.20 (5.94%)|CurSamplesPerSec=2.36 |AvgSamplesPerSec=2.37 epoch: 0|step: 2969|ppo_ep: 1|act_loss: 0.031402587890625|cri_loss: 0.034423828125|unsuper_loss: 0.0 average reward score: 3.248046875 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.47s (74.46%) |Training time=0.65s (19.56%) |Others=0.20 (5.98%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.37 epoch: 0|step: 2970|ppo_ep: 
1|act_loss: 0.0946044921875|cri_loss: 0.09283447265625|unsuper_loss: 0.0 average reward score: 2.869140625 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.26%) |Training time=0.65s (19.54%) |Others=0.21 (6.20%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.37 epoch: 0|step: 2971|ppo_ep: 1|act_loss: -0.1708984375|cri_loss: -0.06524658203125|unsuper_loss: 0.0 average reward score: 3.19140625 ------------------------------------------------------------------------------------- |E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.64s (75.66%) |Training time=0.66s (18.87%) |Others=0.19 (5.47%)|CurSamplesPerSec=2.29 |AvgSamplesPerSec=2.37 epoch: 0|step: 2972|ppo_ep: 1|act_loss: 0.025909423828125|cri_loss: 0.02886962890625|unsuper_loss: 0.0 average reward score: 4.01171875 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.12%) |Training time=0.64s (19.75%) |Others=0.20 (6.13%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.37 epoch: 0|step: 2973|ppo_ep: 1|act_loss: -0.017791748046875|cri_loss: 0.011322021484375|unsuper_loss: 0.0 average reward score: 2.517578125 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.43s (73.89%) |Training time=0.66s (20.13%) |Others=0.20 (5.98%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.37 epoch: 0|step: 2974|ppo_ep: 1|act_loss: 0.0802001953125|cri_loss: 0.0657958984375|unsuper_loss: 0.0 average reward score: 3.947265625 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.44s (73.80%) |Training time=0.66s (20.03%) |Others=0.20 (6.16%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.37 epoch: 0|step: 2975|ppo_ep: 1|act_loss: 
0.21923828125|cri_loss: 0.1331787109375|unsuper_loss: 0.0 average reward score: 2.98046875 ------------------------------------------------------------------------------------- |E2E latency=3.66s |Gather latency=0.00s (0.00%) |Generate time=2.44s (66.67%) |Training time=0.94s (25.65%) |Others=0.28 (7.69%)|CurSamplesPerSec=2.18 |AvgSamplesPerSec=2.37 epoch: 0|step: 2976|ppo_ep: 1|act_loss: 0.0289459228515625|cri_loss: 0.032318115234375|unsuper_loss: 0.0 average reward score: 3.544921875 ------------------------------------------------------------------------------------- |E2E latency=3.33s |Gather latency=0.00s (0.00%) |Generate time=2.46s (73.74%) |Training time=0.66s (19.77%) |Others=0.22 (6.49%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.37 epoch: 0|step: 2977|ppo_ep: 1|act_loss: 0.544921875|cri_loss: 0.32470703125|unsuper_loss: 0.0 average reward score: 3.5859375 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.17%) |Training time=0.65s (19.68%) |Others=0.20 (6.15%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.37 epoch: 0|step: 2978|ppo_ep: 1|act_loss: 0.150146484375|cri_loss: 0.0926513671875|unsuper_loss: 0.0 average reward score: 2.83203125 ------------------------------------------------------------------------------------- |E2E latency=3.37s |Gather latency=0.00s (0.00%) |Generate time=2.53s (74.92%) |Training time=0.64s (19.00%) |Others=0.21 (6.08%)|CurSamplesPerSec=2.37 |AvgSamplesPerSec=2.37 epoch: 0|step: 2979|ppo_ep: 1|act_loss: -0.301513671875|cri_loss: -0.1265869140625|unsuper_loss: 0.0 average reward score: 3.7421875 ------------------------------------------------------------------------------------- |E2E latency=3.17s |Gather latency=0.00s (0.00%) |Generate time=2.34s (73.86%) |Training time=0.64s (20.10%) |Others=0.19 (6.04%)|CurSamplesPerSec=2.52 |AvgSamplesPerSec=2.37 epoch: 0|step: 2980|ppo_ep: 1|act_loss: -0.052093505859375|cri_loss: 
-0.013671875|unsuper_loss: 0.0 average reward score: 3.0859375 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.42s (73.60%) |Training time=0.66s (19.96%) |Others=0.21 (6.44%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.37 epoch: 0|step: 2981|ppo_ep: 1|act_loss: 0.115966796875|cri_loss: 0.07098388671875|unsuper_loss: 0.0 average reward score: 3.68359375 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.36s (71.35%) |Training time=0.76s (22.83%) |Others=0.19 (5.82%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.37 epoch: 0|step: 2982|ppo_ep: 1|act_loss: 0.16162109375|cri_loss: 0.10528564453125|unsuper_loss: 0.0 average reward score: 4.16015625 ------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.17%) |Training time=0.65s (19.71%) |Others=0.20 (6.12%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.37 epoch: 0|step: 2983|ppo_ep: 1|act_loss: 0.0849609375|cri_loss: 0.05426025390625|unsuper_loss: 0.0 average reward score: 2.984375 ------------------------------------------------------------------------------------- |E2E latency=3.59s |Gather latency=0.00s (0.00%) |Generate time=2.34s (65.25%) |Training time=0.96s (26.78%) |Others=0.29 (7.97%)|CurSamplesPerSec=2.23 |AvgSamplesPerSec=2.37 epoch: 0|step: 2984|ppo_ep: 1|act_loss: 0.00679779052734375|cri_loss: 0.019287109375|unsuper_loss: 0.0 average reward score: 2.90234375 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.25%) |Training time=0.64s (19.74%) |Others=0.20 (6.01%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.37 epoch: 0|step: 2985|ppo_ep: 1|act_loss: 0.25146484375|cri_loss: 0.15283203125|unsuper_loss: 0.0 
average reward score: 3.2578125 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.36s (72.57%) |Training time=0.69s (21.16%) |Others=0.20 (6.27%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.37 epoch: 0|step: 2986|ppo_ep: 1|act_loss: -0.0677490234375|cri_loss: -0.016937255859375|unsuper_loss: 0.0 average reward score: 3.2890625 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.37s (73.49%) |Training time=0.66s (20.48%) |Others=0.19 (6.03%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.37 epoch: 0|step: 2987|ppo_ep: 1|act_loss: 0.1494140625|cri_loss: 0.096923828125|unsuper_loss: 0.0 average reward score: 3.00390625 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.34s (71.56%) |Training time=0.73s (22.44%) |Others=0.20 (6.00%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.37 epoch: 0|step: 2988|ppo_ep: 1|act_loss: 0.04095458984375|cri_loss: 0.0438232421875|unsuper_loss: 0.0 average reward score: 3.3359375 ------------------------------------------------------------------------------------- |E2E latency=3.16s |Gather latency=0.00s (0.00%) |Generate time=2.33s (73.53%) |Training time=0.64s (20.24%) |Others=0.20 (6.23%)|CurSamplesPerSec=2.53 |AvgSamplesPerSec=2.37 epoch: 0|step: 2989|ppo_ep: 1|act_loss: 0.153564453125|cri_loss: 0.1043701171875|unsuper_loss: 0.0 average reward score: 3.79296875 ------------------------------------------------------------------------------------- |E2E latency=3.17s |Gather latency=0.00s (0.00%) |Generate time=2.33s (73.43%) |Training time=0.64s (20.29%) |Others=0.20 (6.28%)|CurSamplesPerSec=2.53 |AvgSamplesPerSec=2.37 epoch: 0|step: 2990|ppo_ep: 1|act_loss: -0.0047607421875|cri_loss: 0.0114288330078125|unsuper_loss: 0.0 average reward score: 
4.03515625 ------------------------------------------------------------------------------------- |E2E latency=3.19s |Gather latency=0.00s (0.00%) |Generate time=2.32s (72.91%) |Training time=0.67s (20.94%) |Others=0.20 (6.15%)|CurSamplesPerSec=2.51 |AvgSamplesPerSec=2.37 epoch: 0|step: 2991|ppo_ep: 1|act_loss: 0.1412353515625|cri_loss: 0.1005859375|unsuper_loss: 0.0 average reward score: 2.572265625 ------------------------------------------------------------------------------------- |E2E latency=3.53s |Gather latency=0.00s (0.00%) |Generate time=2.32s (65.69%) |Training time=0.94s (26.57%) |Others=0.27 (7.74%)|CurSamplesPerSec=2.26 |AvgSamplesPerSec=2.37 epoch: 0|step: 2992|ppo_ep: 1|act_loss: 0.19189453125|cri_loss: 0.1181640625|unsuper_loss: 0.0 average reward score: 3.599609375 ------------------------------------------------------------------------------------- |E2E latency=3.18s |Gather latency=0.00s (0.00%) |Generate time=2.32s (73.01%) |Training time=0.67s (21.11%) |Others=0.19 (5.88%)|CurSamplesPerSec=2.51 |AvgSamplesPerSec=2.37 epoch: 0|step: 2993|ppo_ep: 1|act_loss: 0.082275390625|cri_loss: 0.0596923828125|unsuper_loss: 0.0 average reward score: 3.56640625 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.32s (71.70%) |Training time=0.68s (20.85%) |Others=0.24 (7.45%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.37 epoch: 0|step: 2994|ppo_ep: 1|act_loss: -0.035369873046875|cri_loss: 0.008392333984375|unsuper_loss: 0.0 average reward score: 4.01171875 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.37s (72.69%) |Training time=0.64s (19.79%) |Others=0.24 (7.52%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.37 epoch: 0|step: 2995|ppo_ep: 1|act_loss: -0.14013671875|cri_loss: -0.0521240234375|unsuper_loss: 0.0 average reward score: 4.0546875 
------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.36s (73.67%) |Training time=0.65s (20.29%) |Others=0.19 (6.05%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.37 epoch: 0|step: 2996|ppo_ep: 1|act_loss: 0.0457763671875|cri_loss: 0.04290771484375|unsuper_loss: 0.0 average reward score: 3.58984375 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.35s (73.26%) |Training time=0.66s (20.56%) |Others=0.20 (6.18%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.37 epoch: 0|step: 2997|ppo_ep: 1|act_loss: -0.1429443359375|cri_loss: -0.053009033203125|unsuper_loss: 0.0 average reward score: 4.07421875 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.39s (73.30%) |Training time=0.64s (19.49%) |Others=0.24 (7.21%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.37 epoch: 0|step: 2998|ppo_ep: 1|act_loss: 0.384521484375|cri_loss: 0.2342529296875|unsuper_loss: 0.0 average reward score: 2.166015625 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.39%) |Training time=0.64s (19.49%) |Others=0.20 (6.12%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.37 epoch: 0|step: 2999|ppo_ep: 1|act_loss: -0.0631103515625|cri_loss: -0.002838134765625|unsuper_loss: 0.0 average reward score: 2.59375 ------------------------------------------------------------------------------------- |E2E latency=3.64s |Gather latency=0.00s (0.00%) |Generate time=2.43s (66.76%) |Training time=0.93s (25.58%) |Others=0.28 (7.66%)|CurSamplesPerSec=2.20 |AvgSamplesPerSec=2.37 epoch: 0|step: 3000|ppo_ep: 1|act_loss: 0.000457763671875|cri_loss: 0.022735595703125|unsuper_loss: 0.0 average reward score: 3.40625 
------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.16%) |Training time=0.64s (19.80%) |Others=0.20 (6.04%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.37 epoch: 0|step: 3001|ppo_ep: 1|act_loss: -0.1126708984375|cri_loss: -0.028350830078125|unsuper_loss: 0.0 average reward score: 4.03125 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.41s (73.83%) |Training time=0.66s (20.12%) |Others=0.20 (6.05%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.37 epoch: 0|step: 3002|ppo_ep: 1|act_loss: 0.150146484375|cri_loss: 0.1011962890625|unsuper_loss: 0.0 average reward score: 3.25390625 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.42s (73.60%) |Training time=0.68s (20.55%) |Others=0.19 (5.86%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.37 epoch: 0|step: 3003|ppo_ep: 1|act_loss: 0.201416015625|cri_loss: 0.1175537109375|unsuper_loss: 0.0 average reward score: 3.4375 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.15%) |Training time=0.65s (19.77%) |Others=0.20 (6.09%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.37 epoch: 0|step: 3004|ppo_ep: 1|act_loss: -0.02911376953125|cri_loss: 0.023529052734375|unsuper_loss: 0.0 average reward score: 3.484375 ------------------------------------------------------------------------------------- |E2E latency=3.34s |Gather latency=0.00s (0.00%) |Generate time=2.42s (72.36%) |Training time=0.73s (21.73%) |Others=0.20 (5.91%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.37 epoch: 0|step: 3005|ppo_ep: 1|act_loss: -0.25927734375|cri_loss: -0.1058349609375|unsuper_loss: 0.0 average reward score: 4.0078125 
------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.44s (73.57%) |Training time=0.68s (20.55%) |Others=0.19 (5.88%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.37 epoch: 0|step: 3006|ppo_ep: 1|act_loss: -0.108154296875|cri_loss: -0.037933349609375|unsuper_loss: 0.0 average reward score: 3.478515625 ------------------------------------------------------------------------------------- |E2E latency=3.35s |Gather latency=0.00s (0.00%) |Generate time=2.50s (74.56%) |Training time=0.64s (19.11%) |Others=0.21 (6.33%)|CurSamplesPerSec=2.39 |AvgSamplesPerSec=2.37 epoch: 0|step: 3007|ppo_ep: 1|act_loss: 0.03411865234375|cri_loss: 0.035888671875|unsuper_loss: 0.0 average reward score: 5.0546875 ------------------------------------------------------------------------------------- |E2E latency=3.67s |Gather latency=0.00s (0.00%) |Generate time=2.45s (66.79%) |Training time=0.93s (25.21%) |Others=0.29 (8.00%)|CurSamplesPerSec=2.18 |AvgSamplesPerSec=2.37 epoch: 0|step: 3008|ppo_ep: 1|act_loss: -0.03265380859375|cri_loss: 0.006317138671875|unsuper_loss: 0.0 average reward score: 2.68359375 ------------------------------------------------------------------------------------- |E2E latency=3.19s |Gather latency=0.00s (0.00%) |Generate time=2.35s (73.56%) |Training time=0.66s (20.55%) |Others=0.19 (5.89%)|CurSamplesPerSec=2.51 |AvgSamplesPerSec=2.37 epoch: 0|step: 3009|ppo_ep: 1|act_loss: -0.0394287109375|cri_loss: -0.00128173828125|unsuper_loss: 0.0 average reward score: 4.18359375 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.21%) |Training time=0.64s (19.81%) |Others=0.19 (5.97%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.37 epoch: 0|step: 3010|ppo_ep: 1|act_loss: -0.147705078125|cri_loss: -0.0565185546875|unsuper_loss: 0.0 average reward score: 3.15625 
------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.44s (73.68%) |Training time=0.65s (19.48%) |Others=0.23 (6.84%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.37 epoch: 0|step: 3011|ppo_ep: 1|act_loss: 0.03533935546875|cri_loss: 0.028533935546875|unsuper_loss: 0.0 average reward score: 4.53125 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.58%) |Training time=0.64s (19.41%) |Others=0.20 (6.01%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.37 epoch: 0|step: 3012|ppo_ep: 1|act_loss: 0.102294921875|cri_loss: 0.09210205078125|unsuper_loss: 0.0 average reward score: 2.69140625 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.42%) |Training time=0.64s (19.49%) |Others=0.20 (6.09%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.37 epoch: 0|step: 3013|ppo_ep: 1|act_loss: 0.080078125|cri_loss: 0.05767822265625|unsuper_loss: 0.0 average reward score: 4.2265625 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.33%) |Training time=0.64s (19.60%) |Others=0.20 (6.07%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.37 epoch: 0|step: 3014|ppo_ep: 1|act_loss: -0.03216552734375|cri_loss: -0.000152587890625|unsuper_loss: 0.0 average reward score: 3.111328125 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.17%) |Training time=0.65s (19.78%) |Others=0.20 (6.06%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.37 epoch: 0|step: 3015|ppo_ep: 1|act_loss: -0.2471923828125|cri_loss: -0.1026611328125|unsuper_loss: 0.0 average reward score: 3.83984375 
------------------------------------------------------------------------------------- |E2E latency=3.66s |Gather latency=0.00s (0.00%) |Generate time=2.45s (66.83%) |Training time=0.94s (25.57%) |Others=0.28 (7.60%)|CurSamplesPerSec=2.19 |AvgSamplesPerSec=2.37 epoch: 0|step: 3016|ppo_ep: 1|act_loss: -0.174072265625|cri_loss: -0.058319091796875|unsuper_loss: 0.0 average reward score: 3.255859375 ------------------------------------------------------------------------------------- |E2E latency=3.18s |Gather latency=0.00s (0.00%) |Generate time=2.31s (72.62%) |Training time=0.68s (21.29%) |Others=0.19 (6.09%)|CurSamplesPerSec=2.52 |AvgSamplesPerSec=2.37 epoch: 0|step: 3017|ppo_ep: 1|act_loss: -0.041168212890625|cri_loss: -0.009796142578125|unsuper_loss: 0.0 average reward score: 4.2421875 ------------------------------------------------------------------------------------- |E2E latency=3.18s |Gather latency=0.00s (0.00%) |Generate time=2.33s (73.28%) |Training time=0.66s (20.67%) |Others=0.19 (6.05%)|CurSamplesPerSec=2.52 |AvgSamplesPerSec=2.37 epoch: 0|step: 3018|ppo_ep: 1|act_loss: -0.07098388671875|cri_loss: -0.019134521484375|unsuper_loss: 0.0 average reward score: 3.6328125 ------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.36s (73.89%) |Training time=0.64s (20.13%) |Others=0.19 (5.99%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.37 epoch: 0|step: 3019|ppo_ep: 1|act_loss: -0.0858154296875|cri_loss: -0.018280029296875|unsuper_loss: 0.0 average reward score: 4.63671875 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (73.94%) |Training time=0.65s (19.97%) |Others=0.20 (6.09%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.37 epoch: 0|step: 3020|ppo_ep: 1|act_loss: -0.00177001953125|cri_loss: 0.04034423828125|unsuper_loss: 0.0 average reward score: 3.95703125 
------------------------------------------------------------------------------------- |E2E latency=3.42s |Gather latency=0.00s (0.00%) |Generate time=2.57s (75.15%) |Training time=0.65s (18.97%) |Others=0.20 (5.88%)|CurSamplesPerSec=2.34 |AvgSamplesPerSec=2.37 epoch: 0|step: 3021|ppo_ep: 1|act_loss: 0.154052734375|cri_loss: 0.1046142578125|unsuper_loss: 0.0 average reward score: 4.0703125 ------------------------------------------------------------------------------------- |E2E latency=3.41s |Gather latency=0.00s (0.00%) |Generate time=2.53s (74.15%) |Training time=0.68s (20.00%) |Others=0.20 (5.85%)|CurSamplesPerSec=2.34 |AvgSamplesPerSec=2.37 epoch: 0|step: 3022|ppo_ep: 1|act_loss: -0.109619140625|cri_loss: -0.01953125|unsuper_loss: 0.0 average reward score: 3.177734375 ------------------------------------------------------------------------------------- |E2E latency=3.59s |Gather latency=0.00s (0.00%) |Generate time=2.73s (76.12%) |Training time=0.65s (18.11%) |Others=0.21 (5.77%)|CurSamplesPerSec=2.23 |AvgSamplesPerSec=2.37 epoch: 0|step: 3023|ppo_ep: 1|act_loss: -0.140625|cri_loss: -0.046417236328125|unsuper_loss: 0.0 average reward score: 3.7421875 ------------------------------------------------------------------------------------- |E2E latency=3.70s |Gather latency=0.00s (0.00%) |Generate time=2.47s (66.84%) |Training time=0.93s (25.29%) |Others=0.29 (7.87%)|CurSamplesPerSec=2.16 |AvgSamplesPerSec=2.37 epoch: 0|step: 3024|ppo_ep: 1|act_loss: 0.173583984375|cri_loss: 0.1097412109375|unsuper_loss: 0.0 average reward score: 2.48046875 ------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.40s (72.63%) |Training time=0.71s (21.50%) |Others=0.19 (5.87%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.37 epoch: 0|step: 3025|ppo_ep: 1|act_loss: -0.043304443359375|cri_loss: -0.003021240234375|unsuper_loss: 0.0 average reward score: 3.09375 
------------------------------------------------------------------------------------- |E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.64s (75.77%) |Training time=0.65s (18.60%) |Others=0.20 (5.64%)|CurSamplesPerSec=2.30 |AvgSamplesPerSec=2.37 epoch: 0|step: 3026|ppo_ep: 1|act_loss: 0.1510009765625|cri_loss: 0.0986328125|unsuper_loss: 0.0 average reward score: 3.046875 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.48s (74.52%) |Training time=0.64s (19.30%) |Others=0.21 (6.18%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.37 epoch: 0|step: 3027|ppo_ep: 1|act_loss: -0.037506103515625|cri_loss: -0.003753662109375|unsuper_loss: 0.0 average reward score: 4.0546875 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.48s (74.54%) |Training time=0.65s (19.64%) |Others=0.19 (5.82%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.37 epoch: 0|step: 3028|ppo_ep: 1|act_loss: -0.12939453125|cri_loss: -0.04638671875|unsuper_loss: 0.0 average reward score: 2.986328125 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.14%) |Training time=0.65s (19.83%) |Others=0.20 (6.03%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.37 epoch: 0|step: 3029|ppo_ep: 1|act_loss: -0.008392333984375|cri_loss: 0.015228271484375|unsuper_loss: 0.0 average reward score: 3.6015625 ------------------------------------------------------------------------------------- |E2E latency=3.35s |Gather latency=0.00s (0.00%) |Generate time=2.51s (74.80%) |Training time=0.65s (19.34%) |Others=0.20 (5.86%)|CurSamplesPerSec=2.39 |AvgSamplesPerSec=2.37 epoch: 0|step: 3030|ppo_ep: 1|act_loss: 0.1502685546875|cri_loss: 0.096923828125|unsuper_loss: 0.0 average reward score: 2.064453125 
------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.48s (74.48%) |Training time=0.65s (19.59%) |Others=0.20 (5.94%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.37 epoch: 0|step: 3031|ppo_ep: 1|act_loss: -0.09539794921875|cri_loss: -0.0196533203125|unsuper_loss: 0.0 average reward score: 2.81640625 ------------------------------------------------------------------------------------- |E2E latency=3.74s |Gather latency=0.00s (0.00%) |Generate time=2.51s (67.31%) |Training time=0.93s (24.90%) |Others=0.29 (7.79%)|CurSamplesPerSec=2.14 |AvgSamplesPerSec=2.37 epoch: 0|step: 3032|ppo_ep: 1|act_loss: 0.49462890625|cri_loss: 0.294921875|unsuper_loss: 0.0 average reward score: 3.8984375 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.48s (74.82%) |Training time=0.64s (19.19%) |Others=0.20 (5.99%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.37 epoch: 0|step: 3033|ppo_ep: 1|act_loss: 0.017547607421875|cri_loss: 0.020538330078125|unsuper_loss: 0.0 average reward score: 3.94140625 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.47s (74.29%) |Training time=0.66s (19.85%) |Others=0.19 (5.86%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.37 epoch: 0|step: 3034|ppo_ep: 1|act_loss: -0.045196533203125|cri_loss: -0.00640869140625|unsuper_loss: 0.0 average reward score: 4.3671875 ------------------------------------------------------------------------------------- |E2E latency=3.33s |Gather latency=0.00s (0.00%) |Generate time=2.47s (74.25%) |Training time=0.66s (19.68%) |Others=0.20 (6.07%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.37 epoch: 0|step: 3035|ppo_ep: 1|act_loss: 0.18115234375|cri_loss: 0.11627197265625|unsuper_loss: 0.0 average reward score: 3.40625 
------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.35%) |Training time=0.64s (19.67%) |Others=0.20 (5.98%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.37 epoch: 0|step: 3036|ppo_ep: 1|act_loss: 0.062255859375|cri_loss: 0.050048828125|unsuper_loss: 0.0 average reward score: 4.97265625 ------------------------------------------------------------------------------------- |E2E latency=3.33s |Gather latency=0.00s (0.00%) |Generate time=2.41s (72.35%) |Training time=0.72s (21.58%) |Others=0.20 (6.07%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.37 epoch: 0|step: 3037|ppo_ep: 1|act_loss: -0.1182861328125|cri_loss: -0.04632568359375|unsuper_loss: 0.0 average reward score: 3.35546875 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.35s (71.67%) |Training time=0.73s (22.39%) |Others=0.19 (5.93%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.37 epoch: 0|step: 3038|ppo_ep: 1|act_loss: 0.20361328125|cri_loss: 0.135009765625|unsuper_loss: 0.0 average reward score: 3.84375 ------------------------------------------------------------------------------------- |E2E latency=3.46s |Gather latency=0.00s (0.00%) |Generate time=2.60s (75.12%) |Training time=0.66s (19.15%) |Others=0.20 (5.74%)|CurSamplesPerSec=2.31 |AvgSamplesPerSec=2.37
[2023-04-24 16:38:57,531] [INFO] [logging.py:96:log_dist] [Rank 0] step=380, skipped=5, lr=[1.6184755022860995e-06, 1.6184755022860995e-06], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-04-24 16:38:57,780] [INFO] [timer.py:199:stop] epoch=0/micro_step=3040/global_step=380, RunningAvgSamplesPerSec=15.344762104429543, CurrSamplesPerSec=15.197092598193144, MemAllocated=20.44GB, MaxMemAllocated=31.45GB
[2023-04-24 16:38:57,985] [INFO] [logging.py:96:log_dist] [Rank 0] step=380, skipped=4, lr=[8.23038174651942e-07, 8.23038174651942e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 3039|ppo_ep: 1|act_loss: -0.0811767578125|cri_loss: -0.018218994140625|unsuper_loss: 0.0 average reward score: 3.8203125 ------------------------------------------------------------------------------------- |E2E latency=3.70s |Gather latency=0.00s (0.00%) |Generate time=2.47s (66.75%) |Training time=0.94s (25.42%) |Others=0.29 (7.83%)|CurSamplesPerSec=2.16 |AvgSamplesPerSec=2.37 epoch: 0|step: 3040|ppo_ep: 1|act_loss: -0.189453125|cri_loss: -0.0770263671875|unsuper_loss: 0.0 average reward score: 5.296875 ------------------------------------------------------------------------------------- |E2E latency=3.33s |Gather latency=0.00s (0.00%) |Generate time=2.49s (74.67%) |Training time=0.64s (19.24%) |Others=0.20 (6.09%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.37 epoch: 0|step: 3041|ppo_ep: 1|act_loss: 0.1417236328125|cri_loss: 0.1334228515625|unsuper_loss: 0.0 average reward score: 3.125 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.43s (73.89%) |Training time=0.66s (20.11%) |Others=0.20 (6.00%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.37 epoch: 0|step: 3042|ppo_ep: 1|act_loss: -0.11151123046875|cri_loss: -0.03204345703125|unsuper_loss: 0.0 average reward score: 4.359375 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.41s (73.74%) |Training time=0.65s (20.00%) |Others=0.20 (6.26%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.37 epoch: 0|step: 3043|ppo_ep: 1|act_loss: -0.113525390625|cri_loss: -0.03912353515625|unsuper_loss: 0.0 average reward score: 3.546875 ------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.40%) |Training time=0.65s (19.54%) |Others=0.20 (6.07%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.37 epoch: 0|step: 3044|ppo_ep: 
1|act_loss: -0.1416015625|cri_loss: -0.03802490234375|unsuper_loss: 0.0 average reward score: 4.0390625 ------------------------------------------------------------------------------------- |E2E latency=3.34s |Gather latency=0.00s (0.00%) |Generate time=2.49s (74.73%) |Training time=0.64s (19.28%) |Others=0.20 (6.00%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.37 epoch: 0|step: 3045|ppo_ep: 1|act_loss: -0.181396484375|cri_loss: -0.0699462890625|unsuper_loss: 0.0 average reward score: 3.9453125 ------------------------------------------------------------------------------------- |E2E latency=3.37s |Gather latency=0.00s (0.00%) |Generate time=2.52s (74.76%) |Training time=0.65s (19.40%) |Others=0.20 (5.84%)|CurSamplesPerSec=2.37 |AvgSamplesPerSec=2.37 epoch: 0|step: 3046|ppo_ep: 1|act_loss: 0.2420654296875|cri_loss: 0.1571044921875|unsuper_loss: 0.0 average reward score: 3.26171875 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.25%) |Training time=0.65s (19.66%) |Others=0.20 (6.09%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.37 epoch: 0|step: 3047|ppo_ep: 1|act_loss: -0.06787109375|cri_loss: 0.0001220703125|unsuper_loss: 0.0 average reward score: 3.541015625 ------------------------------------------------------------------------------------- |E2E latency=3.68s |Gather latency=0.00s (0.00%) |Generate time=2.47s (66.99%) |Training time=0.93s (25.23%) |Others=0.29 (7.77%)|CurSamplesPerSec=2.17 |AvgSamplesPerSec=2.37 epoch: 0|step: 3048|ppo_ep: 1|act_loss: -0.1282958984375|cri_loss: -0.041290283203125|unsuper_loss: 0.0 average reward score: 3.59765625 ------------------------------------------------------------------------------------- |E2E latency=3.36s |Gather latency=0.00s (0.00%) |Generate time=2.45s (72.96%) |Training time=0.71s (21.26%) |Others=0.19 (5.79%)|CurSamplesPerSec=2.38 |AvgSamplesPerSec=2.37 epoch: 0|step: 3049|ppo_ep: 1|act_loss: 
0.0615234375|cri_loss: 0.0445556640625|unsuper_loss: 0.0 average reward score: 3.87109375 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.11%) |Training time=0.65s (19.88%) |Others=0.20 (6.01%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.37 epoch: 0|step: 3050|ppo_ep: 1|act_loss: -0.23291015625|cri_loss: -0.08984375|unsuper_loss: 0.0 average reward score: 3.470703125 ------------------------------------------------------------------------------------- |E2E latency=3.36s |Gather latency=0.00s (0.00%) |Generate time=2.51s (74.77%) |Training time=0.65s (19.38%) |Others=0.20 (5.85%)|CurSamplesPerSec=2.38 |AvgSamplesPerSec=2.37 epoch: 0|step: 3051|ppo_ep: 1|act_loss: -0.097412109375|cri_loss: -0.030303955078125|unsuper_loss: 0.0 average reward score: 3.55078125 ------------------------------------------------------------------------------------- |E2E latency=3.36s |Gather latency=0.00s (0.00%) |Generate time=2.50s (74.56%) |Training time=0.65s (19.42%) |Others=0.20 (6.03%)|CurSamplesPerSec=2.38 |AvgSamplesPerSec=2.37 epoch: 0|step: 3052|ppo_ep: 1|act_loss: -0.1844482421875|cri_loss: -0.0738525390625|unsuper_loss: 0.0 average reward score: 4.40234375 ------------------------------------------------------------------------------------- |E2E latency=3.33s |Gather latency=0.00s (0.00%) |Generate time=2.47s (74.19%) |Training time=0.66s (19.82%) |Others=0.20 (5.99%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.37 epoch: 0|step: 3053|ppo_ep: 1|act_loss: 0.0280303955078125|cri_loss: 0.03216552734375|unsuper_loss: 0.0 average reward score: 3.87109375 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.08%) |Training time=0.64s (19.86%) |Others=0.20 (6.06%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.37 epoch: 0|step: 3054|ppo_ep: 1|act_loss: -0.2166748046875|cri_loss: 
-0.088134765625|unsuper_loss: 0.0 average reward score: 4.2890625 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.10%) |Training time=0.65s (19.90%) |Others=0.20 (6.00%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.37 epoch: 0|step: 3055|ppo_ep: 1|act_loss: -0.168212890625|cri_loss: -0.04534912109375|unsuper_loss: 0.0 average reward score: 4.15625 ------------------------------------------------------------------------------------- |E2E latency=3.66s |Gather latency=0.00s (0.00%) |Generate time=2.44s (66.68%) |Training time=0.94s (25.67%) |Others=0.28 (7.66%)|CurSamplesPerSec=2.19 |AvgSamplesPerSec=2.37 epoch: 0|step: 3056|ppo_ep: 1|act_loss: -0.1806640625|cri_loss: -0.0693359375|unsuper_loss: 0.0 average reward score: 5.3515625 ------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.48%) |Training time=0.64s (19.48%) |Others=0.20 (6.04%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.37 epoch: 0|step: 3057|ppo_ep: 1|act_loss: -0.182861328125|cri_loss: -0.070556640625|unsuper_loss: 0.0 average reward score: 4.9609375 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.39s (73.87%) |Training time=0.65s (20.06%) |Others=0.20 (6.07%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.37 epoch: 0|step: 3058|ppo_ep: 1|act_loss: -0.1693115234375|cri_loss: -0.0635986328125|unsuper_loss: 0.0 average reward score: 3.92578125 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.30%) |Training time=0.65s (19.52%) |Others=0.20 (6.18%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.37 epoch: 0|step: 3059|ppo_ep: 1|act_loss: -0.0238037109375|cri_loss: 0.004913330078125|unsuper_loss: 
0.0 average reward score: 3.1015625 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.38s (73.73%) |Training time=0.65s (20.13%) |Others=0.20 (6.14%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.37 epoch: 0|step: 3060|ppo_ep: 1|act_loss: -0.052276611328125|cri_loss: -0.013458251953125|unsuper_loss: 0.0 average reward score: 3.568359375 ------------------------------------------------------------------------------------- |E2E latency=3.34s |Gather latency=0.00s (0.00%) |Generate time=2.49s (74.51%) |Training time=0.65s (19.42%) |Others=0.20 (6.07%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.37 epoch: 0|step: 3061|ppo_ep: 1|act_loss: 0.1199951171875|cri_loss: 0.11004638671875|unsuper_loss: 0.0 average reward score: 2.990234375 ------------------------------------------------------------------------------------- |E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.65s (75.39%) |Training time=0.66s (18.83%) |Others=0.20 (5.79%)|CurSamplesPerSec=2.28 |AvgSamplesPerSec=2.37 epoch: 0|step: 3062|ppo_ep: 1|act_loss: -0.0860595703125|cri_loss: -0.024658203125|unsuper_loss: 0.0 average reward score: 3.90625 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.52%) |Training time=0.64s (19.59%) |Others=0.19 (5.89%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.37
[2023-04-24 16:40:18,549] [INFO] [loss_scaler.py:188:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, but hysteresis is 2. Reducing hysteresis to 1
epoch: 0|step: 3063|ppo_ep: 1|act_loss: -0.2425537109375|cri_loss: -0.0986328125|unsuper_loss: 0.0 average reward score: 2.77734375 ------------------------------------------------------------------------------------- |E2E latency=3.61s |Gather latency=0.00s (0.00%) |Generate time=2.46s (67.97%) |Training time=0.93s (25.80%) |Others=0.22 (6.22%)|CurSamplesPerSec=2.21 |AvgSamplesPerSec=2.37 epoch: 0|step: 3064|ppo_ep: 1|act_loss: -0.003448486328125|cri_loss: 0.0173492431640625|unsuper_loss: 0.0 average reward score: 3.017578125 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.37s (73.45%) |Training time=0.67s (20.61%) |Others=0.19 (5.94%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.37 epoch: 0|step: 3065|ppo_ep: 1|act_loss: -0.20947265625|cri_loss: -0.0885009765625|unsuper_loss: 0.0 average reward score: 4.52734375 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.47s (74.51%) |Training time=0.65s (19.49%) |Others=0.20 (6.00%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.37 epoch: 0|step: 3066|ppo_ep: 1|act_loss: -0.0791015625|cri_loss: -0.02203369140625|unsuper_loss: 0.0 average reward score: 4.06640625 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.08%) |Training time=0.64s (19.80%) |Others=0.20 (6.12%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.37 epoch: 0|step: 3067|ppo_ep: 1|act_loss: -0.17041015625|cri_loss: -0.06591796875|unsuper_loss: 0.0 average reward score: 5.25 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.38%) |Training time=0.64s (19.63%) |Others=0.20 (5.99%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.37 epoch: 
0|step: 3068|ppo_ep: 1|act_loss: -0.1728515625|cri_loss: -0.07177734375|unsuper_loss: 0.0 average reward score: 3.39453125 ------------------------------------------------------------------------------------- |E2E latency=3.36s |Gather latency=0.00s (0.00%) |Generate time=2.52s (75.15%) |Training time=0.64s (19.08%) |Others=0.19 (5.77%)|CurSamplesPerSec=2.38 |AvgSamplesPerSec=2.37 epoch: 0|step: 3069|ppo_ep: 1|act_loss: -0.2724609375|cri_loss: -0.1131591796875|unsuper_loss: 0.0 average reward score: 4.9765625 ------------------------------------------------------------------------------------- |E2E latency=3.40s |Gather latency=0.00s (0.00%) |Generate time=2.46s (72.33%) |Training time=0.74s (21.80%) |Others=0.20 (5.87%)|CurSamplesPerSec=2.35 |AvgSamplesPerSec=2.37 epoch: 0|step: 3070|ppo_ep: 1|act_loss: -0.129150390625|cri_loss: -0.04730224609375|unsuper_loss: 0.0 average reward score: 3.59765625 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.61%) |Training time=0.64s (19.45%) |Others=0.20 (5.94%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.37 [2023-04-24 16:40:45,180] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. 
Attempted loss scale: 65536, reducing to 32768 epoch: 0|step: 3071|ppo_ep: 1|act_loss: -0.0684814453125|cri_loss: -0.008880615234375|unsuper_loss: 0.0 average reward score: 3.955078125 ------------------------------------------------------------------------------------- |E2E latency=3.54s |Gather latency=0.00s (0.00%) |Generate time=2.41s (68.04%) |Training time=0.93s (26.23%) |Others=0.20 (5.73%)|CurSamplesPerSec=2.26 |AvgSamplesPerSec=2.37 epoch: 0|step: 3072|ppo_ep: 1|act_loss: -0.000762939453125|cri_loss: 0.0158843994140625|unsuper_loss: 0.0 average reward score: 3.41796875 ------------------------------------------------------------------------------------- |E2E latency=3.34s |Gather latency=0.00s (0.00%) |Generate time=2.50s (74.96%) |Training time=0.64s (19.18%) |Others=0.20 (5.86%)|CurSamplesPerSec=2.39 |AvgSamplesPerSec=2.37 epoch: 0|step: 3073|ppo_ep: 1|act_loss: -0.05523681640625|cri_loss: -0.0183868408203125|unsuper_loss: 0.0 average reward score: 4.6015625 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.47s (74.52%) |Training time=0.65s (19.48%) |Others=0.20 (6.00%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.37 epoch: 0|step: 3074|ppo_ep: 1|act_loss: 0.141357421875|cri_loss: 0.0870361328125|unsuper_loss: 0.0 average reward score: 3.61328125 ------------------------------------------------------------------------------------- |E2E latency=3.35s |Gather latency=0.00s (0.00%) |Generate time=2.49s (74.17%) |Training time=0.66s (19.70%) |Others=0.21 (6.13%)|CurSamplesPerSec=2.39 |AvgSamplesPerSec=2.37 epoch: 0|step: 3075|ppo_ep: 1|act_loss: 0.2841796875|cri_loss: 0.1845703125|unsuper_loss: 0.0 average reward score: 3.56640625 ------------------------------------------------------------------------------------- |E2E latency=3.37s |Gather latency=0.00s (0.00%) |Generate time=2.51s (74.43%) |Training time=0.65s (19.31%) |Others=0.21 
(6.26%)|CurSamplesPerSec=2.37 |AvgSamplesPerSec=2.37 epoch: 0|step: 3076|ppo_ep: 1|act_loss: 0.193115234375|cri_loss: 0.128173828125|unsuper_loss: 0.0 average reward score: 4.15625 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.42%) |Training time=0.65s (19.65%) |Others=0.20 (5.93%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.37 epoch: 0|step: 3077|ppo_ep: 1|act_loss: 0.052398681640625|cri_loss: 0.032440185546875|unsuper_loss: 0.0 average reward score: 3.70703125 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.75%) |Training time=0.64s (19.42%) |Others=0.19 (5.83%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.37 epoch: 0|step: 3078|ppo_ep: 1|act_loss: 0.1148681640625|cri_loss: 0.0849609375|unsuper_loss: 0.0 average reward score: 3.671875 ------------------------------------------------------------------------------------- |E2E latency=3.33s |Gather latency=0.00s (0.00%) |Generate time=2.49s (74.66%) |Training time=0.64s (19.34%) |Others=0.20 (6.01%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.37 epoch: 0|step: 3079|ppo_ep: 1|act_loss: -0.216552734375|cri_loss: -0.0921630859375|unsuper_loss: 0.0 average reward score: 3.57421875 ------------------------------------------------------------------------------------- |E2E latency=3.73s |Gather latency=0.00s (0.00%) |Generate time=2.52s (67.55%) |Training time=0.93s (25.01%) |Others=0.28 (7.44%)|CurSamplesPerSec=2.15 |AvgSamplesPerSec=2.37 epoch: 0|step: 3080|ppo_ep: 1|act_loss: 0.012481689453125|cri_loss: 0.034088134765625|unsuper_loss: 0.0 average reward score: 3.39453125 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.68%) |Training time=0.63s (19.37%) |Others=0.19 (5.95%)|CurSamplesPerSec=2.44 
|AvgSamplesPerSec=2.37 epoch: 0|step: 3081|ppo_ep: 1|act_loss: -0.05865478515625|cri_loss: -0.01873779296875|unsuper_loss: 0.0 average reward score: 4.109375 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.37%) |Training time=0.64s (19.72%) |Others=0.19 (5.91%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.37 epoch: 0|step: 3082|ppo_ep: 1|act_loss: -0.110107421875|cri_loss: -0.044403076171875|unsuper_loss: 0.0 average reward score: 4.0625 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.32%) |Training time=0.64s (19.78%) |Others=0.19 (5.90%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.37 epoch: 0|step: 3083|ppo_ep: 1|act_loss: -0.010467529296875|cri_loss: 0.00160980224609375|unsuper_loss: 0.0 average reward score: 3.287109375 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.15%) |Training time=0.64s (19.76%) |Others=0.20 (6.10%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.37 epoch: 0|step: 3084|ppo_ep: 1|act_loss: 0.002105712890625|cri_loss: 0.0195159912109375|unsuper_loss: 0.0 average reward score: 3.93359375 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.45s (73.95%) |Training time=0.65s (19.57%) |Others=0.22 (6.48%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.37 epoch: 0|step: 3085|ppo_ep: 1|act_loss: -0.08740234375|cri_loss: -0.037872314453125|unsuper_loss: 0.0 average reward score: 5.3671875 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.43s (73.50%) |Training time=0.67s (20.34%) |Others=0.20 (6.16%)|CurSamplesPerSec=2.41 
|AvgSamplesPerSec=2.37 epoch: 0|step: 3086|ppo_ep: 1|act_loss: 0.236572265625|cri_loss: 0.1470947265625|unsuper_loss: 0.0 average reward score: 5.36328125 ------------------------------------------------------------------------------------- |E2E latency=3.38s |Gather latency=0.00s (0.00%) |Generate time=2.51s (74.50%) |Training time=0.65s (19.22%) |Others=0.21 (6.29%)|CurSamplesPerSec=2.37 |AvgSamplesPerSec=2.37 epoch: 0|step: 3087|ppo_ep: 1|act_loss: -0.135986328125|cri_loss: -0.049285888671875|unsuper_loss: 0.0 average reward score: 4.5625 ------------------------------------------------------------------------------------- |E2E latency=3.69s |Gather latency=0.00s (0.00%) |Generate time=2.49s (67.50%) |Training time=0.93s (25.09%) |Others=0.27 (7.40%)|CurSamplesPerSec=2.17 |AvgSamplesPerSec=2.37 epoch: 0|step: 3088|ppo_ep: 1|act_loss: -0.03094482421875|cri_loss: -0.00852203369140625|unsuper_loss: 0.0 average reward score: 4.265625 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.40s (73.99%) |Training time=0.65s (19.94%) |Others=0.20 (6.07%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.37 epoch: 0|step: 3089|ppo_ep: 1|act_loss: 0.1529541015625|cri_loss: 0.1036376953125|unsuper_loss: 0.0 average reward score: 3.6015625 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.39s (73.53%) |Training time=0.66s (20.27%) |Others=0.20 (6.20%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.37 epoch: 0|step: 3090|ppo_ep: 1|act_loss: -0.049957275390625|cri_loss: -0.009002685546875|unsuper_loss: 0.0 average reward score: 3.91796875 ------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.67%) |Training time=0.64s (19.41%) |Others=0.20 (5.93%)|CurSamplesPerSec=2.43 
|AvgSamplesPerSec=2.37 epoch: 0|step: 3091|ppo_ep: 1|act_loss: -0.0556640625|cri_loss: -0.018646240234375|unsuper_loss: 0.0 average reward score: 4.359375 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.48s (74.63%) |Training time=0.64s (19.33%) |Others=0.20 (6.04%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.37 epoch: 0|step: 3092|ppo_ep: 1|act_loss: -0.02471923828125|cri_loss: -0.004608154296875|unsuper_loss: 0.0 average reward score: 4.15234375 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.02%) |Training time=0.65s (19.84%) |Others=0.20 (6.14%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.37 epoch: 0|step: 3093|ppo_ep: 1|act_loss: -0.0029754638671875|cri_loss: 0.00774383544921875|unsuper_loss: 0.0 average reward score: 4.55859375 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.45%) |Training time=0.64s (19.53%) |Others=0.20 (6.02%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.37 epoch: 0|step: 3094|ppo_ep: 1|act_loss: -0.072021484375|cri_loss: -0.025787353515625|unsuper_loss: 0.0 average reward score: 4.2421875 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.38s (73.63%) |Training time=0.65s (20.05%) |Others=0.20 (6.32%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.37 epoch: 0|step: 3095|ppo_ep: 1|act_loss: 0.0660400390625|cri_loss: 0.04388427734375|unsuper_loss: 0.0 average reward score: 4.16796875 ------------------------------------------------------------------------------------- |E2E latency=3.64s |Gather latency=0.00s (0.00%) |Generate time=2.44s (66.94%) |Training time=0.92s (25.29%) |Others=0.28 (7.77%)|CurSamplesPerSec=2.20 
|AvgSamplesPerSec=2.37 epoch: 0|step: 3096|ppo_ep: 1|act_loss: 0.2186279296875|cri_loss: 0.12115478515625|unsuper_loss: 0.0 average reward score: 3.634765625 ------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.37s (73.92%) |Training time=0.64s (20.07%) |Others=0.19 (6.00%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.37 epoch: 0|step: 3097|ppo_ep: 1|act_loss: -0.0810546875|cri_loss: -0.0240478515625|unsuper_loss: 0.0 average reward score: 4.07421875 ------------------------------------------------------------------------------------- |E2E latency=3.33s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.06%) |Training time=0.64s (19.23%) |Others=0.22 (6.71%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.37 epoch: 0|step: 3098|ppo_ep: 1|act_loss: -0.0224456787109375|cri_loss: -0.0047607421875|unsuper_loss: 0.0 average reward score: 3.9140625 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.27%) |Training time=0.65s (19.73%) |Others=0.20 (6.01%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.37 epoch: 0|step: 3099|ppo_ep: 1|act_loss: -0.10833740234375|cri_loss: -0.0430908203125|unsuper_loss: 0.0 average reward score: 4.640625 ------------------------------------------------------------------------------------- |E2E latency=3.42s |Gather latency=0.00s (0.00%) |Generate time=2.56s (75.04%) |Training time=0.64s (18.83%) |Others=0.21 (6.12%)|CurSamplesPerSec=2.34 |AvgSamplesPerSec=2.37 epoch: 0|step: 3100|ppo_ep: 1|act_loss: -0.00347900390625|cri_loss: 0.003871917724609375|unsuper_loss: 0.0 average reward score: 4.5 ------------------------------------------------------------------------------------- |E2E latency=3.34s |Gather latency=0.00s (0.00%) |Generate time=2.51s (75.00%) |Training time=0.64s (19.12%) |Others=0.20 (5.88%)|CurSamplesPerSec=2.39 |AvgSamplesPerSec=2.37 
epoch: 0|step: 3101|ppo_ep: 1|act_loss: 0.04864501953125|cri_loss: 0.031219482421875|unsuper_loss: 0.0 average reward score: 3.40625 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.48s (74.66%) |Training time=0.64s (19.37%) |Others=0.20 (5.97%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.37 epoch: 0|step: 3102|ppo_ep: 1|act_loss: 0.115966796875|cri_loss: 0.07708740234375|unsuper_loss: 0.0 average reward score: 4.0234375 ------------------------------------------------------------------------------------- |E2E latency=3.37s |Gather latency=0.00s (0.00%) |Generate time=2.50s (74.31%) |Training time=0.65s (19.41%) |Others=0.21 (6.28%)|CurSamplesPerSec=2.37 |AvgSamplesPerSec=2.37 epoch: 0|step: 3103|ppo_ep: 1|act_loss: 0.01482391357421875|cri_loss: 0.01462554931640625|unsuper_loss: 0.0 average reward score: 4.57421875 ------------------------------------------------------------------------------------- |E2E latency=3.68s |Gather latency=0.00s (0.00%) |Generate time=2.46s (66.99%) |Training time=0.93s (25.34%) |Others=0.28 (7.68%)|CurSamplesPerSec=2.18 |AvgSamplesPerSec=2.37 epoch: 0|step: 3104|ppo_ep: 1|act_loss: 0.137451171875|cri_loss: 0.086669921875|unsuper_loss: 0.0 average reward score: 2.353515625 ------------------------------------------------------------------------------------- |E2E latency=4.75s |Gather latency=0.00s (0.00%) |Generate time=3.86s (81.43%) |Training time=0.68s (14.34%) |Others=0.20 (4.23%)|CurSamplesPerSec=1.69 |AvgSamplesPerSec=2.37 epoch: 0|step: 3105|ppo_ep: 1|act_loss: 0.12890625|cri_loss: 0.07373046875|unsuper_loss: 0.0 average reward score: 4.671875 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.38%) |Training time=0.64s (19.64%) |Others=0.20 (5.99%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.37 epoch: 0|step: 3106|ppo_ep: 
1|act_loss: 0.050872802734375|cri_loss: 0.0377197265625|unsuper_loss: 0.0 average reward score: 4.640625 ------------------------------------------------------------------------------------- |E2E latency=3.33s |Gather latency=0.00s (0.00%) |Generate time=2.49s (74.74%) |Training time=0.65s (19.40%) |Others=0.19 (5.86%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.37 epoch: 0|step: 3107|ppo_ep: 1|act_loss: 0.1376953125|cri_loss: 0.077880859375|unsuper_loss: 0.0 average reward score: 4.5234375 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.47s (74.55%) |Training time=0.65s (19.53%) |Others=0.20 (5.92%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.37 epoch: 0|step: 3108|ppo_ep: 1|act_loss: 0.0887451171875|cri_loss: 0.051300048828125|unsuper_loss: 0.0 average reward score: 4.5390625 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.49%) |Training time=0.64s (19.54%) |Others=0.20 (5.97%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.37 epoch: 0|step: 3109|ppo_ep: 1|act_loss: 0.045257568359375|cri_loss: 0.0323486328125|unsuper_loss: 0.0 average reward score: 4.09375 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (73.86%) |Training time=0.65s (20.06%) |Others=0.20 (6.08%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.37 epoch: 0|step: 3110|ppo_ep: 1|act_loss: 0.076416015625|cri_loss: 0.04754638671875|unsuper_loss: 0.0 average reward score: 4.5078125 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.47s (74.51%) |Training time=0.65s (19.48%) |Others=0.20 (6.01%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.37 epoch: 0|step: 3111|ppo_ep: 1|act_loss: 
0.10723876953125|cri_loss: 0.07452392578125|unsuper_loss: 0.0 average reward score: 1.890625 ------------------------------------------------------------------------------------- |E2E latency=3.64s |Gather latency=0.00s (0.00%) |Generate time=2.43s (66.57%) |Training time=0.94s (25.82%) |Others=0.28 (7.61%)|CurSamplesPerSec=2.20 |AvgSamplesPerSec=2.37 epoch: 0|step: 3112|ppo_ep: 1|act_loss: 0.0280303955078125|cri_loss: 0.0202789306640625|unsuper_loss: 0.0 average reward score: 3.11328125 ------------------------------------------------------------------------------------- |E2E latency=3.34s |Gather latency=0.00s (0.00%) |Generate time=2.50s (74.85%) |Training time=0.65s (19.43%) |Others=0.19 (5.72%)|CurSamplesPerSec=2.39 |AvgSamplesPerSec=2.37 epoch: 0|step: 3113|ppo_ep: 1|act_loss: 0.0265350341796875|cri_loss: 0.020599365234375|unsuper_loss: 0.0 average reward score: 2.841796875 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.11%) |Training time=0.65s (19.90%) |Others=0.20 (5.99%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.37 epoch: 0|step: 3114|ppo_ep: 1|act_loss: 0.0091400146484375|cri_loss: 0.016632080078125|unsuper_loss: 0.0 average reward score: 3.97265625 ------------------------------------------------------------------------------------- |E2E latency=3.42s |Gather latency=0.00s (0.00%) |Generate time=2.58s (75.37%) |Training time=0.65s (18.85%) |Others=0.20 (5.78%)|CurSamplesPerSec=2.34 |AvgSamplesPerSec=2.37 epoch: 0|step: 3115|ppo_ep: 1|act_loss: 0.0943603515625|cri_loss: 0.05908203125|unsuper_loss: 0.0 average reward score: 4.40234375 ------------------------------------------------------------------------------------- |E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.52s (72.17%) |Training time=0.78s (22.28%) |Others=0.19 (5.55%)|CurSamplesPerSec=2.29 |AvgSamplesPerSec=2.37 epoch: 0|step: 3116|ppo_ep: 1|act_loss: 
0.147216796875|cri_loss: 0.09710693359375|unsuper_loss: 0.0 average reward score: 2.91796875 ------------------------------------------------------------------------------------- |E2E latency=3.34s |Gather latency=0.00s (0.00%) |Generate time=2.48s (74.31%) |Training time=0.65s (19.59%) |Others=0.20 (6.10%)|CurSamplesPerSec=2.39 |AvgSamplesPerSec=2.37 epoch: 0|step: 3117|ppo_ep: 1|act_loss: 0.08349609375|cri_loss: 0.058807373046875|unsuper_loss: 0.0 average reward score: 5.0234375 ------------------------------------------------------------------------------------- |E2E latency=3.33s |Gather latency=0.00s (0.00%) |Generate time=2.49s (74.76%) |Training time=0.65s (19.42%) |Others=0.19 (5.83%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.37 epoch: 0|step: 3118|ppo_ep: 1|act_loss: 0.079833984375|cri_loss: 0.051361083984375|unsuper_loss: 0.0 average reward score: 3.986328125 ------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.24%) |Training time=0.65s (19.64%) |Others=0.20 (6.12%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.37 [2023-04-24 16:43:27,284] [INFO] [logging.py:96:log_dist] [Rank 0] step=390, skipped=5, lr=[1.3287709038983694e-06, 1.3287709038983694e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-04-24 16:43:27,531] [INFO] [timer.py:199:stop] epoch=0/micro_step=3120/global_step=390, RunningAvgSamplesPerSec=15.353909793791889, CurrSamplesPerSec=15.303279662820469, MemAllocated=20.44GB, MaxMemAllocated=31.45GB [2023-04-24 16:43:27,738] [INFO] [logging.py:96:log_dist] [Rank 0] step=390, skipped=6, lr=[7.029407903051771e-07, 7.029407903051771e-07], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 3119|ppo_ep: 1|act_loss: 0.047119140625|cri_loss: 0.03424072265625|unsuper_loss: 0.0 average reward score: 4.18359375 ------------------------------------------------------------------------------------- |E2E latency=3.66s |Gather latency=0.00s (0.00%) |Generate time=2.45s 
(67.00%) |Training time=0.93s (25.28%) |Others=0.28 (7.73%)|CurSamplesPerSec=2.18 |AvgSamplesPerSec=2.37 epoch: 0|step: 3120|ppo_ep: 1|act_loss: -0.0090484619140625|cri_loss: 0.000732421875|unsuper_loss: 0.0 average reward score: 4.4765625 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.49%) |Training time=0.64s (19.60%) |Others=0.19 (5.91%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.37 epoch: 0|step: 3121|ppo_ep: 1|act_loss: -0.018035888671875|cri_loss: -0.00264739990234375|unsuper_loss: 0.0 average reward score: 4.0703125 ------------------------------------------------------------------------------------- |E2E latency=3.34s |Gather latency=0.00s (0.00%) |Generate time=2.48s (74.29%) |Training time=0.65s (19.40%) |Others=0.21 (6.31%)|CurSamplesPerSec=2.39 |AvgSamplesPerSec=2.37 epoch: 0|step: 3122|ppo_ep: 1|act_loss: 0.0247802734375|cri_loss: 0.016265869140625|unsuper_loss: 0.0 average reward score: 5.03125 ------------------------------------------------------------------------------------- |E2E latency=3.36s |Gather latency=0.00s (0.00%) |Generate time=2.51s (74.65%) |Training time=0.66s (19.54%) |Others=0.19 (5.80%)|CurSamplesPerSec=2.38 |AvgSamplesPerSec=2.37 epoch: 0|step: 3123|ppo_ep: 1|act_loss: 0.144775390625|cri_loss: 0.08636474609375|unsuper_loss: 0.0 average reward score: 3.693359375 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.42%) |Training time=0.64s (19.65%) |Others=0.19 (5.93%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.37 epoch: 0|step: 3124|ppo_ep: 1|act_loss: 0.158447265625|cri_loss: 0.090087890625|unsuper_loss: 0.0 average reward score: 4.0859375 ------------------------------------------------------------------------------------- |E2E latency=3.36s |Gather latency=0.00s (0.00%) |Generate time=2.48s (73.78%) |Training 
time=0.66s (19.65%) |Others=0.22 (6.56%)|CurSamplesPerSec=2.38 |AvgSamplesPerSec=2.37 epoch: 0|step: 3125|ppo_ep: 1|act_loss: 0.237060546875|cri_loss: 0.1414794921875|unsuper_loss: 0.0 average reward score: 2.28515625 ------------------------------------------------------------------------------------- |E2E latency=3.38s |Gather latency=0.00s (0.00%) |Generate time=2.53s (74.89%) |Training time=0.65s (19.25%) |Others=0.20 (5.86%)|CurSamplesPerSec=2.37 |AvgSamplesPerSec=2.37 epoch: 0|step: 3126|ppo_ep: 1|act_loss: 0.0162506103515625|cri_loss: 0.020050048828125|unsuper_loss: 0.0 average reward score: 4.0390625 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.16%) |Training time=0.65s (19.66%) |Others=0.20 (6.18%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.37 epoch: 0|step: 3127|ppo_ep: 1|act_loss: 0.28466796875|cri_loss: 0.1591796875|unsuper_loss: 0.0 average reward score: 3.185546875 ------------------------------------------------------------------------------------- |E2E latency=3.70s |Gather latency=0.00s (0.00%) |Generate time=2.45s (66.29%) |Training time=0.94s (25.46%) |Others=0.31 (8.26%)|CurSamplesPerSec=2.16 |AvgSamplesPerSec=2.37 epoch: 0|step: 3128|ppo_ep: 1|act_loss: 0.08642578125|cri_loss: 0.05316162109375|unsuper_loss: 0.0 average reward score: 4.47265625 ------------------------------------------------------------------------------------- |E2E latency=3.36s |Gather latency=0.00s (0.00%) |Generate time=2.49s (74.05%) |Training time=0.66s (19.60%) |Others=0.21 (6.35%)|CurSamplesPerSec=2.38 |AvgSamplesPerSec=2.37 epoch: 0|step: 3129|ppo_ep: 1|act_loss: 0.0263214111328125|cri_loss: 0.026611328125|unsuper_loss: 0.0 average reward score: 4.5234375 ------------------------------------------------------------------------------------- |E2E latency=3.34s |Gather latency=0.00s (0.00%) |Generate time=2.49s (74.50%) |Training time=0.65s (19.50%) 
|Others=0.20 (6.00%)|CurSamplesPerSec=2.39 |AvgSamplesPerSec=2.37 epoch: 0|step: 3130|ppo_ep: 1|act_loss: -0.0218048095703125|cri_loss: -0.0016326904296875|unsuper_loss: 0.0 average reward score: 4.56640625 ------------------------------------------------------------------------------------- |E2E latency=3.38s |Gather latency=0.00s (0.00%) |Generate time=2.52s (74.44%) |Training time=0.65s (19.32%) |Others=0.21 (6.24%)|CurSamplesPerSec=2.37 |AvgSamplesPerSec=2.37 epoch: 0|step: 3131|ppo_ep: 1|act_loss: 0.022735595703125|cri_loss: 0.02294921875|unsuper_loss: 0.0 average reward score: 3.69140625 ------------------------------------------------------------------------------------- |E2E latency=3.39s |Gather latency=0.00s (0.00%) |Generate time=2.53s (74.79%) |Training time=0.65s (19.11%) |Others=0.21 (6.10%)|CurSamplesPerSec=2.36 |AvgSamplesPerSec=2.37 epoch: 0|step: 3132|ppo_ep: 1|act_loss: 0.066650390625|cri_loss: 0.043212890625|unsuper_loss: 0.0 average reward score: 3.89453125 ------------------------------------------------------------------------------------- |E2E latency=3.35s |Gather latency=0.00s (0.00%) |Generate time=2.48s (73.91%) |Training time=0.65s (19.25%) |Others=0.23 (6.84%)|CurSamplesPerSec=2.39 |AvgSamplesPerSec=2.37 epoch: 0|step: 3133|ppo_ep: 1|act_loss: -0.2025146484375|cri_loss: -0.0733642578125|unsuper_loss: 0.0 average reward score: 4.0859375 ------------------------------------------------------------------------------------- |E2E latency=3.39s |Gather latency=0.00s (0.00%) |Generate time=2.53s (74.67%) |Training time=0.64s (19.03%) |Others=0.21 (6.30%)|CurSamplesPerSec=2.36 |AvgSamplesPerSec=2.37 epoch: 0|step: 3134|ppo_ep: 1|act_loss: 0.0633544921875|cri_loss: 0.04693603515625|unsuper_loss: 0.0 average reward score: 3.8984375 ------------------------------------------------------------------------------------- |E2E latency=3.40s |Gather latency=0.00s (0.00%) |Generate time=2.55s (75.17%) |Training time=0.65s (19.02%) |Others=0.20 
(5.80%)|CurSamplesPerSec=2.36 |AvgSamplesPerSec=2.37 epoch: 0|step: 3135|ppo_ep: 1|act_loss: 0.077392578125|cri_loss: 0.051239013671875|unsuper_loss: 0.0 average reward score: 3.09765625 ------------------------------------------------------------------------------------- |E2E latency=3.71s |Gather latency=0.00s (0.00%) |Generate time=2.48s (66.99%) |Training time=0.93s (25.01%) |Others=0.30 (8.00%)|CurSamplesPerSec=2.16 |AvgSamplesPerSec=2.37 epoch: 0|step: 3136|ppo_ep: 1|act_loss: -0.1051025390625|cri_loss: -0.042755126953125|unsuper_loss: 0.0 average reward score: 4.3515625 ------------------------------------------------------------------------------------- |E2E latency=3.35s |Gather latency=0.00s (0.00%) |Generate time=2.45s (73.28%) |Training time=0.69s (20.51%) |Others=0.21 (6.22%)|CurSamplesPerSec=2.39 |AvgSamplesPerSec=2.37 epoch: 0|step: 3137|ppo_ep: 1|act_loss: -0.0139007568359375|cri_loss: 0.002105712890625|unsuper_loss: 0.0 average reward score: 3.615234375 ------------------------------------------------------------------------------------- |E2E latency=3.36s |Gather latency=0.00s (0.00%) |Generate time=2.50s (74.31%) |Training time=0.65s (19.43%) |Others=0.21 (6.26%)|CurSamplesPerSec=2.38 |AvgSamplesPerSec=2.37 epoch: 0|step: 3138|ppo_ep: 1|act_loss: -0.0728759765625|cri_loss: -0.005126953125|unsuper_loss: 0.0 average reward score: 2.94921875 ------------------------------------------------------------------------------------- |E2E latency=3.38s |Gather latency=0.00s (0.00%) |Generate time=2.48s (73.21%) |Training time=0.70s (20.63%) |Others=0.21 (6.16%)|CurSamplesPerSec=2.36 |AvgSamplesPerSec=2.37 epoch: 0|step: 3139|ppo_ep: 1|act_loss: 0.00634002685546875|cri_loss: 0.0174560546875|unsuper_loss: 0.0 average reward score: 3.5546875 ------------------------------------------------------------------------------------- |E2E latency=3.54s |Gather latency=0.00s (0.00%) |Generate time=2.69s (76.05%) |Training time=0.64s (18.12%) |Others=0.21 
(5.83%)|CurSamplesPerSec=2.26 |AvgSamplesPerSec=2.37 epoch: 0|step: 3140|ppo_ep: 1|act_loss: -0.08734130859375|cri_loss: -0.00653076171875|unsuper_loss: 0.0 average reward score: 1.7177734375 ------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.37%) |Training time=0.64s (19.47%) |Others=0.20 (6.16%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.37 epoch: 0|step: 3141|ppo_ep: 1|act_loss: 0.019439697265625|cri_loss: 0.027374267578125|unsuper_loss: 0.0 average reward score: 2.955078125 ------------------------------------------------------------------------------------- |E2E latency=3.34s |Gather latency=0.00s (0.00%) |Generate time=2.48s (74.37%) |Training time=0.64s (19.27%) |Others=0.21 (6.36%)|CurSamplesPerSec=2.39 |AvgSamplesPerSec=2.37 epoch: 0|step: 3142|ppo_ep: 1|act_loss: 0.105712890625|cri_loss: 0.067138671875|unsuper_loss: 0.0 average reward score: 2.029296875 ------------------------------------------------------------------------------------- |E2E latency=3.35s |Gather latency=0.00s (0.00%) |Generate time=2.47s (73.80%) |Training time=0.66s (19.67%) |Others=0.22 (6.54%)|CurSamplesPerSec=2.39 |AvgSamplesPerSec=2.37 epoch: 0|step: 3143|ppo_ep: 1|act_loss: 0.057159423828125|cri_loss: 0.03741455078125|unsuper_loss: 0.0 average reward score: 3.05078125 ------------------------------------------------------------------------------------- |E2E latency=3.70s |Gather latency=0.00s (0.00%) |Generate time=2.47s (66.78%) |Training time=0.94s (25.33%) |Others=0.29 (7.89%)|CurSamplesPerSec=2.16 |AvgSamplesPerSec=2.37 epoch: 0|step: 3144|ppo_ep: 1|act_loss: -0.025421142578125|cri_loss: 0.0011749267578125|unsuper_loss: 0.0 average reward score: 4.41796875 ------------------------------------------------------------------------------------- |E2E latency=3.39s |Gather latency=0.00s (0.00%) |Generate time=2.53s (74.83%) |Training time=0.64s (18.89%) |Others=0.21 
(6.29%)|CurSamplesPerSec=2.36 |AvgSamplesPerSec=2.37 epoch: 0|step: 3145|ppo_ep: 1|act_loss: 0.2117919921875|cri_loss: 0.1519775390625|unsuper_loss: 0.0 average reward score: 3.515625 ------------------------------------------------------------------------------------- |E2E latency=3.38s |Gather latency=0.00s (0.00%) |Generate time=2.53s (74.70%) |Training time=0.65s (19.07%) |Others=0.21 (6.23%)|CurSamplesPerSec=2.36 |AvgSamplesPerSec=2.37 epoch: 0|step: 3146|ppo_ep: 1|act_loss: -0.09405517578125|cri_loss: -0.040374755859375|unsuper_loss: 0.0 average reward score: 5.078125 ------------------------------------------------------------------------------------- |E2E latency=3.33s |Gather latency=0.00s (0.00%) |Generate time=2.43s (72.99%) |Training time=0.69s (20.66%) |Others=0.21 (6.35%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.37 epoch: 0|step: 3147|ppo_ep: 1|act_loss: -0.17041015625|cri_loss: -0.063232421875|unsuper_loss: 0.0 average reward score: 3.947265625 ------------------------------------------------------------------------------------- |E2E latency=3.40s |Gather latency=0.00s (0.00%) |Generate time=2.51s (73.75%) |Training time=0.69s (20.20%) |Others=0.21 (6.05%)|CurSamplesPerSec=2.35 |AvgSamplesPerSec=2.37 epoch: 0|step: 3148|ppo_ep: 1|act_loss: -0.0008087158203125|cri_loss: 0.0134124755859375|unsuper_loss: 0.0 average reward score: 4.0078125 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.36s (71.49%) |Training time=0.74s (22.29%) |Others=0.21 (6.22%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.37 epoch: 0|step: 3149|ppo_ep: 1|act_loss: -0.033477783203125|cri_loss: -0.0077972412109375|unsuper_loss: 0.0 average reward score: 2.91015625 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.48s (74.61%) |Training time=0.64s (19.32%) |Others=0.20 
(6.08%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.37 epoch: 0|step: 3150|ppo_ep: 1|act_loss: 0.30859375|cri_loss: 0.1826171875|unsuper_loss: 0.0 average reward score: 1.93359375 ------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.17%) |Training time=0.65s (19.61%) |Others=0.21 (6.21%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.37 epoch: 0|step: 3151|ppo_ep: 1|act_loss: 0.166259765625|cri_loss: 0.10247802734375|unsuper_loss: 0.0 average reward score: 2.646484375 ------------------------------------------------------------------------------------- |E2E latency=3.70s |Gather latency=0.00s (0.00%) |Generate time=2.47s (66.74%) |Training time=0.93s (25.14%) |Others=0.30 (8.12%)|CurSamplesPerSec=2.16 |AvgSamplesPerSec=2.37 epoch: 0|step: 3152|ppo_ep: 1|act_loss: 0.0145721435546875|cri_loss: 0.04217529296875|unsuper_loss: 0.0 average reward score: 3.275390625 ------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.56%) |Training time=0.64s (19.36%) |Others=0.20 (6.08%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.37 epoch: 0|step: 3153|ppo_ep: 1|act_loss: -0.1922607421875|cri_loss: -0.07586669921875|unsuper_loss: 0.0 average reward score: 4.984375 ------------------------------------------------------------------------------------- |E2E latency=3.33s |Gather latency=0.00s (0.00%) |Generate time=2.48s (74.27%) |Training time=0.66s (19.77%) |Others=0.20 (5.96%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.37 epoch: 0|step: 3154|ppo_ep: 1|act_loss: -0.021759033203125|cri_loss: -0.00476837158203125|unsuper_loss: 0.0 average reward score: 4.2890625 ------------------------------------------------------------------------------------- |E2E latency=3.41s |Gather latency=0.00s (0.00%) |Generate time=2.54s (74.52%) |Training time=0.65s (19.15%) |Others=0.22 
(6.33%)|CurSamplesPerSec=2.34 |AvgSamplesPerSec=2.37 epoch: 0|step: 3155|ppo_ep: 1|act_loss: -0.08154296875|cri_loss: -0.013214111328125|unsuper_loss: 0.0 average reward score: 4.109375 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.41s (73.26%) |Training time=0.68s (20.53%) |Others=0.20 (6.21%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.37 epoch: 0|step: 3156|ppo_ep: 1|act_loss: -0.1641845703125|cri_loss: -0.073486328125|unsuper_loss: 0.0 average reward score: 4.59765625 ------------------------------------------------------------------------------------- |E2E latency=6.75s |Gather latency=0.00s (0.00%) |Generate time=2.46s (36.50%) |Training time=3.56s (52.76%) |Others=0.72 (10.74%)|CurSamplesPerSec=1.19 |AvgSamplesPerSec=2.37 epoch: 0|step: 3157|ppo_ep: 1|act_loss: 0.01271820068359375|cri_loss: 0.01079559326171875|unsuper_loss: 0.0 average reward score: 4.69140625 ------------------------------------------------------------------------------------- |E2E latency=4.01s |Gather latency=0.00s (0.00%) |Generate time=3.03s (75.56%) |Training time=0.67s (16.69%) |Others=0.31 (7.75%)|CurSamplesPerSec=1.99 |AvgSamplesPerSec=2.37 epoch: 0|step: 3158|ppo_ep: 1|act_loss: 0.017974853515625|cri_loss: 0.016998291015625|unsuper_loss: 0.0 average reward score: 5.03125 ------------------------------------------------------------------------------------- |E2E latency=3.36s |Gather latency=0.00s (0.00%) |Generate time=2.49s (74.25%) |Training time=0.64s (19.07%) |Others=0.22 (6.68%)|CurSamplesPerSec=2.38 |AvgSamplesPerSec=2.37 epoch: 0|step: 3159|ppo_ep: 1|act_loss: -0.1033935546875|cri_loss: -0.027923583984375|unsuper_loss: 0.0 average reward score: 3.45703125 ------------------------------------------------------------------------------------- |E2E latency=3.74s |Gather latency=0.00s (0.00%) |Generate time=2.50s (66.97%) |Training time=0.94s (25.27%) |Others=0.29 
(7.76%)|CurSamplesPerSec=2.14 |AvgSamplesPerSec=2.37 epoch: 0|step: 3160|ppo_ep: 1|act_loss: -0.153076171875|cri_loss: -0.065185546875|unsuper_loss: 0.0 average reward score: 4.3984375 ------------------------------------------------------------------------------------- |E2E latency=3.33s |Gather latency=0.00s (0.00%) |Generate time=2.48s (74.44%) |Training time=0.64s (19.19%) |Others=0.21 (6.37%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.37 epoch: 0|step: 3161|ppo_ep: 1|act_loss: -0.0474853515625|cri_loss: -0.003631591796875|unsuper_loss: 0.0 average reward score: 3.53515625 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.38s (71.99%) |Training time=0.72s (21.79%) |Others=0.21 (6.22%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.37 epoch: 0|step: 3162|ppo_ep: 1|act_loss: -0.033172607421875|cri_loss: 0.004852294921875|unsuper_loss: 0.0 average reward score: 3.197265625 ------------------------------------------------------------------------------------- |E2E latency=3.36s |Gather latency=0.00s (0.00%) |Generate time=2.51s (74.75%) |Training time=0.65s (19.23%) |Others=0.20 (6.03%)|CurSamplesPerSec=2.38 |AvgSamplesPerSec=2.37 epoch: 0|step: 3163|ppo_ep: 1|act_loss: -0.0302734375|cri_loss: 0.002899169921875|unsuper_loss: 0.0 average reward score: 4.9296875 ------------------------------------------------------------------------------------- |E2E latency=3.33s |Gather latency=0.00s (0.00%) |Generate time=2.48s (74.52%) |Training time=0.65s (19.53%) |Others=0.20 (5.95%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.37 epoch: 0|step: 3164|ppo_ep: 1|act_loss: -0.0204315185546875|cri_loss: -0.003173828125|unsuper_loss: 0.0 average reward score: 4.12890625 ------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.00%) |Training time=0.64s (19.46%) |Others=0.22 
(6.55%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.37 epoch: 0|step: 3165|ppo_ep: 1|act_loss: -0.035980224609375|cri_loss: 0.007537841796875|unsuper_loss: 0.0 average reward score: 3.560546875 ------------------------------------------------------------------------------------- |E2E latency=3.34s |Gather latency=0.00s (0.00%) |Generate time=2.48s (74.42%) |Training time=0.65s (19.36%) |Others=0.21 (6.22%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.37 epoch: 0|step: 3166|ppo_ep: 1|act_loss: -0.013580322265625|cri_loss: 0.019927978515625|unsuper_loss: 0.0 average reward score: 3.126953125 ------------------------------------------------------------------------------------- |E2E latency=3.42s |Gather latency=0.00s (0.00%) |Generate time=2.55s (74.45%) |Training time=0.66s (19.43%) |Others=0.21 (6.12%)|CurSamplesPerSec=2.34 |AvgSamplesPerSec=2.37 epoch: 0|step: 3167|ppo_ep: 1|act_loss: -0.0906982421875|cri_loss: -0.026275634765625|unsuper_loss: 0.0 average reward score: 4.62109375 ------------------------------------------------------------------------------------- |E2E latency=3.69s |Gather latency=0.00s (0.00%) |Generate time=2.48s (67.27%) |Training time=0.92s (24.99%) |Others=0.29 (7.74%)|CurSamplesPerSec=2.17 |AvgSamplesPerSec=2.37 epoch: 0|step: 3168|ppo_ep: 1|act_loss: 0.052734375|cri_loss: 0.060150146484375|unsuper_loss: 0.0 average reward score: 3.98828125 ------------------------------------------------------------------------------------- |E2E latency=3.40s |Gather latency=0.00s (0.00%) |Generate time=2.54s (74.82%) |Training time=0.64s (18.90%) |Others=0.21 (6.27%)|CurSamplesPerSec=2.35 |AvgSamplesPerSec=2.37 epoch: 0|step: 3169|ppo_ep: 1|act_loss: -0.0506591796875|cri_loss: -0.009796142578125|unsuper_loss: 0.0 average reward score: 4.4375 ------------------------------------------------------------------------------------- |E2E latency=3.35s |Gather latency=0.00s (0.00%) |Generate time=2.49s (74.23%) |Training time=0.66s (19.72%) |Others=0.20 
(6.05%)|CurSamplesPerSec=2.39 |AvgSamplesPerSec=2.37 epoch: 0|step: 3170|ppo_ep: 1|act_loss: -0.0997314453125|cri_loss: -0.02392578125|unsuper_loss: 0.0 average reward score: 3.990234375 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.35%) |Training time=0.64s (19.32%) |Others=0.21 (6.33%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.37 epoch: 0|step: 3171|ppo_ep: 1|act_loss: -0.1160888671875|cri_loss: -0.040618896484375|unsuper_loss: 0.0 average reward score: 2.1796875 ------------------------------------------------------------------------------------- |E2E latency=3.39s |Gather latency=0.00s (0.00%) |Generate time=2.53s (74.79%) |Training time=0.64s (18.98%) |Others=0.21 (6.23%)|CurSamplesPerSec=2.36 |AvgSamplesPerSec=2.37 epoch: 0|step: 3172|ppo_ep: 1|act_loss: -0.087890625|cri_loss: -0.03082275390625|unsuper_loss: 0.0 average reward score: 3.23828125 ------------------------------------------------------------------------------------- |E2E latency=3.37s |Gather latency=0.00s (0.00%) |Generate time=2.51s (74.43%) |Training time=0.64s (19.07%) |Others=0.22 (6.49%)|CurSamplesPerSec=2.37 |AvgSamplesPerSec=2.37 epoch: 0|step: 3173|ppo_ep: 1|act_loss: -0.0018768310546875|cri_loss: 0.0104827880859375|unsuper_loss: 0.0 average reward score: 4.6875 ------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.34%) |Training time=0.64s (19.49%) |Others=0.20 (6.17%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.37 epoch: 0|step: 3174|ppo_ep: 1|act_loss: 0.00799560546875|cri_loss: 0.03271484375|unsuper_loss: 0.0 average reward score: 2.64453125 ------------------------------------------------------------------------------------- |E2E latency=3.40s |Gather latency=0.00s (0.00%) |Generate time=2.52s (74.24%) |Training time=0.65s (19.19%) |Others=0.22 
(6.58%)|CurSamplesPerSec=2.35 |AvgSamplesPerSec=2.37 epoch: 0|step: 3175|ppo_ep: 1|act_loss: 0.12841796875|cri_loss: 0.07794189453125|unsuper_loss: 0.0 average reward score: 3.0546875 ------------------------------------------------------------------------------------- |E2E latency=3.77s |Gather latency=0.00s (0.00%) |Generate time=2.54s (67.37%) |Training time=0.94s (24.95%) |Others=0.29 (7.68%)|CurSamplesPerSec=2.12 |AvgSamplesPerSec=2.37 epoch: 0|step: 3176|ppo_ep: 1|act_loss: -0.08428955078125|cri_loss: -0.0255126953125|unsuper_loss: 0.0 average reward score: 3.755859375 ------------------------------------------------------------------------------------- |E2E latency=3.52s |Gather latency=0.00s (0.00%) |Generate time=2.65s (75.39%) |Training time=0.65s (18.59%) |Others=0.21 (6.02%)|CurSamplesPerSec=2.28 |AvgSamplesPerSec=2.37 epoch: 0|step: 3177|ppo_ep: 1|act_loss: -0.125|cri_loss: -0.053070068359375|unsuper_loss: 0.0 average reward score: 3.625 ------------------------------------------------------------------------------------- |E2E latency=3.33s |Gather latency=0.00s (0.00%) |Generate time=2.44s (73.25%) |Training time=0.69s (20.61%) |Others=0.20 (6.14%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.37 epoch: 0|step: 3178|ppo_ep: 1|act_loss: 0.20166015625|cri_loss: 0.1307373046875|unsuper_loss: 0.0 average reward score: 2.630859375 ------------------------------------------------------------------------------------- |E2E latency=3.37s |Gather latency=0.00s (0.00%) |Generate time=2.50s (74.37%) |Training time=0.65s (19.30%) |Others=0.21 (6.33%)|CurSamplesPerSec=2.38 |AvgSamplesPerSec=2.37 epoch: 0|step: 3179|ppo_ep: 1|act_loss: -0.0477294921875|cri_loss: -0.0170135498046875|unsuper_loss: 0.0 average reward score: 4.0546875 ------------------------------------------------------------------------------------- |E2E latency=3.39s |Gather latency=0.00s (0.00%) |Generate time=2.54s (74.76%) |Training time=0.64s (18.90%) |Others=0.22 (6.34%)|CurSamplesPerSec=2.36 
|AvgSamplesPerSec=2.37 epoch: 0|step: 3180|ppo_ep: 1|act_loss: -0.12005615234375|cri_loss: -0.03912353515625|unsuper_loss: 0.0 average reward score: 3.89453125 ------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.02%) |Training time=0.65s (19.59%) |Others=0.21 (6.39%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.37 epoch: 0|step: 3181|ppo_ep: 1|act_loss: -0.07110595703125|cri_loss: -0.0272674560546875|unsuper_loss: 0.0 average reward score: 3.81640625 ------------------------------------------------------------------------------------- |E2E latency=3.35s |Gather latency=0.00s (0.00%) |Generate time=2.49s (74.34%) |Training time=0.66s (19.55%) |Others=0.21 (6.12%)|CurSamplesPerSec=2.39 |AvgSamplesPerSec=2.37 epoch: 0|step: 3182|ppo_ep: 1|act_loss: 0.07568359375|cri_loss: 0.07305908203125|unsuper_loss: 0.0 average reward score: 3.662109375 ------------------------------------------------------------------------------------- |E2E latency=3.43s |Gather latency=0.00s (0.00%) |Generate time=2.56s (74.66%) |Training time=0.65s (19.00%) |Others=0.22 (6.34%)|CurSamplesPerSec=2.33 |AvgSamplesPerSec=2.37 epoch: 0|step: 3183|ppo_ep: 1|act_loss: -0.1328125|cri_loss: -0.048675537109375|unsuper_loss: 0.0 average reward score: 3.083984375 ------------------------------------------------------------------------------------- |E2E latency=3.70s |Gather latency=0.00s (0.00%) |Generate time=2.48s (66.91%) |Training time=0.93s (25.05%) |Others=0.30 (8.04%)|CurSamplesPerSec=2.16 |AvgSamplesPerSec=2.37 epoch: 0|step: 3184|ppo_ep: 1|act_loss: -0.0479736328125|cri_loss: -0.0114898681640625|unsuper_loss: 0.0 average reward score: 4.3671875 ------------------------------------------------------------------------------------- |E2E latency=3.44s |Gather latency=0.00s (0.00%) |Generate time=2.57s (74.73%) |Training time=0.66s (19.12%) |Others=0.21 (6.15%)|CurSamplesPerSec=2.33 
|AvgSamplesPerSec=2.37 epoch: 0|step: 3185|ppo_ep: 1|act_loss: -0.078369140625|cri_loss: -0.0289306640625|unsuper_loss: 0.0 average reward score: 3.71875 ------------------------------------------------------------------------------------- |E2E latency=3.37s |Gather latency=0.00s (0.00%) |Generate time=2.50s (74.07%) |Training time=0.65s (19.34%) |Others=0.22 (6.59%)|CurSamplesPerSec=2.37 |AvgSamplesPerSec=2.37 epoch: 0|step: 3186|ppo_ep: 1|act_loss: -0.0291900634765625|cri_loss: -0.0019378662109375|unsuper_loss: 0.0 average reward score: 1.95703125 ------------------------------------------------------------------------------------- |E2E latency=3.42s |Gather latency=0.00s (0.00%) |Generate time=2.56s (74.84%) |Training time=0.65s (19.03%) |Others=0.21 (6.12%)|CurSamplesPerSec=2.34 |AvgSamplesPerSec=2.37 epoch: 0|step: 3187|ppo_ep: 1|act_loss: 0.00335693359375|cri_loss: 0.01486968994140625|unsuper_loss: 0.0 average reward score: 3.83203125 ------------------------------------------------------------------------------------- |E2E latency=3.34s |Gather latency=0.00s (0.00%) |Generate time=2.48s (74.45%) |Training time=0.64s (19.24%) |Others=0.21 (6.31%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.37 epoch: 0|step: 3188|ppo_ep: 1|act_loss: -0.0902099609375|cri_loss: -0.032318115234375|unsuper_loss: 0.0 average reward score: 3.1328125 ------------------------------------------------------------------------------------- |E2E latency=3.42s |Gather latency=0.00s (0.00%) |Generate time=2.56s (74.82%) |Training time=0.65s (19.03%) |Others=0.21 (6.15%)|CurSamplesPerSec=2.34 |AvgSamplesPerSec=2.37 epoch: 0|step: 3189|ppo_ep: 1|act_loss: -0.1190185546875|cri_loss: -0.050994873046875|unsuper_loss: 0.0 average reward score: 4.18359375 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.17%) |Training time=0.65s (19.71%) |Others=0.20 (6.12%)|CurSamplesPerSec=2.43 
|AvgSamplesPerSec=2.37 epoch: 0|step: 3190|ppo_ep: 1|act_loss: -0.110595703125|cri_loss: -0.048583984375|unsuper_loss: 0.0 average reward score: 4.109375 ------------------------------------------------------------------------------------- |E2E latency=6.40s |Gather latency=0.00s (0.00%) |Generate time=2.47s (38.55%) |Training time=2.71s (42.35%) |Others=1.22 (19.10%)|CurSamplesPerSec=1.25 |AvgSamplesPerSec=2.37 epoch: 0|step: 3191|ppo_ep: 1|act_loss: -0.1083984375|cri_loss: -0.03857421875|unsuper_loss: 0.0 average reward score: 3.984375 ------------------------------------------------------------------------------------- |E2E latency=4.08s |Gather latency=0.00s (0.00%) |Generate time=2.68s (65.73%) |Training time=0.99s (24.39%) |Others=0.40 (9.88%)|CurSamplesPerSec=1.96 |AvgSamplesPerSec=2.36 epoch: 0|step: 3192|ppo_ep: 1|act_loss: -0.05841064453125|cri_loss: -0.0142974853515625|unsuper_loss: 0.0 average reward score: 3.98046875 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.22%) |Training time=0.64s (19.35%) |Others=0.21 (6.43%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.36 epoch: 0|step: 3193|ppo_ep: 1|act_loss: 0.141845703125|cri_loss: 0.08599853515625|unsuper_loss: 0.0 average reward score: 0.41552734375 ------------------------------------------------------------------------------------- |E2E latency=3.52s |Gather latency=0.00s (0.00%) |Generate time=2.67s (75.83%) |Training time=0.64s (18.24%) |Others=0.21 (5.93%)|CurSamplesPerSec=2.27 |AvgSamplesPerSec=2.36 epoch: 0|step: 3194|ppo_ep: 1|act_loss: -0.0171051025390625|cri_loss: 0.000152587890625|unsuper_loss: 0.0 average reward score: 4.34375 ------------------------------------------------------------------------------------- |E2E latency=3.33s |Gather latency=0.00s (0.00%) |Generate time=2.48s (74.35%) |Training time=0.65s (19.46%) |Others=0.21 (6.19%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.36 
epoch: 0|step: 3195|ppo_ep: 1|act_loss: 0.1025390625|cri_loss: 0.0791015625|unsuper_loss: 0.0 average reward score: 3.48828125 ------------------------------------------------------------------------------------- |E2E latency=3.47s |Gather latency=0.00s (0.00%) |Generate time=2.54s (73.30%) |Training time=0.71s (20.59%) |Others=0.21 (6.11%)|CurSamplesPerSec=2.31 |AvgSamplesPerSec=2.36 epoch: 0|step: 3196|ppo_ep: 1|act_loss: -0.05755615234375|cri_loss: -0.0140533447265625|unsuper_loss: 0.0 average reward score: 4.0859375 ------------------------------------------------------------------------------------- |E2E latency=3.38s |Gather latency=0.00s (0.00%) |Generate time=2.52s (74.36%) |Training time=0.65s (19.16%) |Others=0.22 (6.48%)|CurSamplesPerSec=2.36 |AvgSamplesPerSec=2.36 epoch: 0|step: 3197|ppo_ep: 1|act_loss: 0.046966552734375|cri_loss: 0.039703369140625|unsuper_loss: 0.0 average reward score: 4.11328125 ------------------------------------------------------------------------------------- |E2E latency=3.39s |Gather latency=0.00s (0.00%) |Generate time=2.54s (74.96%) |Training time=0.65s (19.06%) |Others=0.20 (5.97%)|CurSamplesPerSec=2.36 |AvgSamplesPerSec=2.36 epoch: 0|step: 3198|ppo_ep: 1|act_loss: 0.0540771484375|cri_loss: 0.03955078125|unsuper_loss: 0.0 average reward score: 3.107421875 ------------------------------------------------------------------------------------- |E2E latency=3.38s |Gather latency=0.00s (0.00%) |Generate time=2.49s (73.76%) |Training time=0.67s (19.70%) |Others=0.22 (6.55%)|CurSamplesPerSec=2.37 |AvgSamplesPerSec=2.36 [2023-04-24 16:48:07,197] [INFO] [logging.py:96:log_dist] [Rank 0] step=400, skipped=5, lr=[1.063459645505535e-06, 1.063459645505535e-06], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-04-24 16:48:07,448] [INFO] [timer.py:199:stop] epoch=0/micro_step=3200/global_step=400, RunningAvgSamplesPerSec=15.31510068274688, CurrSamplesPerSec=15.524306470635148, MemAllocated=20.44GB, MaxMemAllocated=31.45GB [2023-04-24 16:48:07,661] 
[INFO] [logging.py:96:log_dist] [Rank 0] step=400, skipped=6, lr=[5.641652437644668e-07, 5.641652437644668e-07], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 3199|ppo_ep: 1|act_loss: -0.00818634033203125|cri_loss: 0.00260162353515625|unsuper_loss: 0.0 average reward score: 3.05859375 ------------------------------------------------------------------------------------- |E2E latency=3.75s |Gather latency=0.00s (0.00%) |Generate time=2.52s (67.23%) |Training time=0.94s (25.09%) |Others=0.29 (7.69%)|CurSamplesPerSec=2.13 |AvgSamplesPerSec=2.36 epoch: 0|step: 3200|ppo_ep: 1|act_loss: -0.039794921875|cri_loss: -0.0046539306640625|unsuper_loss: 0.0 average reward score: 2.8359375 ------------------------------------------------------------------------------------- |E2E latency=3.39s |Gather latency=0.00s (0.00%) |Generate time=2.54s (74.90%) |Training time=0.65s (19.21%) |Others=0.20 (5.89%)|CurSamplesPerSec=2.36 |AvgSamplesPerSec=2.36 epoch: 0|step: 3201|ppo_ep: 1|act_loss: 0.3779296875|cri_loss: 0.22314453125|unsuper_loss: 0.0 average reward score: 0.91015625 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.10%) |Training time=0.65s (19.70%) |Others=0.20 (6.20%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.36 epoch: 0|step: 3202|ppo_ep: 1|act_loss: 0.1568603515625|cri_loss: 0.09326171875|unsuper_loss: 0.0 average reward score: 3.171875 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.41s (72.84%) |Training time=0.68s (20.45%) |Others=0.22 (6.71%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.36 epoch: 0|step: 3203|ppo_ep: 1|act_loss: 0.003509521484375|cri_loss: 0.02044677734375|unsuper_loss: 0.0 average reward score: 3.95703125 ------------------------------------------------------------------------------------- |E2E latency=3.38s |Gather latency=0.00s (0.00%) 
|Generate time=2.47s (73.15%) |Training time=0.69s (20.50%) |Others=0.21 (6.35%)|CurSamplesPerSec=2.37 |AvgSamplesPerSec=2.36 epoch: 0|step: 3204|ppo_ep: 1|act_loss: 0.07318115234375|cri_loss: 0.057373046875|unsuper_loss: 0.0 average reward score: 3.46875 ------------------------------------------------------------------------------------- |E2E latency=3.40s |Gather latency=0.00s (0.00%) |Generate time=2.46s (72.33%) |Training time=0.73s (21.38%) |Others=0.21 (6.29%)|CurSamplesPerSec=2.35 |AvgSamplesPerSec=2.36 epoch: 0|step: 3205|ppo_ep: 1|act_loss: -0.1025390625|cri_loss: -0.04400634765625|unsuper_loss: 0.0 average reward score: 5.109375 ------------------------------------------------------------------------------------- |E2E latency=3.35s |Gather latency=0.00s (0.00%) |Generate time=2.49s (74.33%) |Training time=0.65s (19.31%) |Others=0.21 (6.36%)|CurSamplesPerSec=2.39 |AvgSamplesPerSec=2.36 epoch: 0|step: 3206|ppo_ep: 1|act_loss: 0.0946044921875|cri_loss: 0.059417724609375|unsuper_loss: 0.0 average reward score: 3.54296875 ------------------------------------------------------------------------------------- |E2E latency=3.43s |Gather latency=0.00s (0.00%) |Generate time=2.55s (74.31%) |Training time=0.67s (19.57%) |Others=0.21 (6.12%)|CurSamplesPerSec=2.33 |AvgSamplesPerSec=2.36 epoch: 0|step: 3207|ppo_ep: 1|act_loss: 0.07611083984375|cri_loss: 0.0577392578125|unsuper_loss: 0.0 average reward score: 3.583984375 ------------------------------------------------------------------------------------- |E2E latency=5.56s |Gather latency=0.00s (0.00%) |Generate time=3.44s (61.85%) |Training time=1.70s (30.58%) |Others=0.42 (7.56%)|CurSamplesPerSec=1.44 |AvgSamplesPerSec=2.36 epoch: 0|step: 3208|ppo_ep: 1|act_loss: 0.0155792236328125|cri_loss: 0.0143585205078125|unsuper_loss: 0.0 average reward score: 1.626953125 ------------------------------------------------------------------------------------- |E2E latency=3.35s |Gather latency=0.00s (0.00%) |Generate time=2.48s 
(74.08%) |Training time=0.66s (19.67%) |Others=0.21 (6.25%)|CurSamplesPerSec=2.39 |AvgSamplesPerSec=2.36 epoch: 0|step: 3209|ppo_ep: 1|act_loss: 0.0892333984375|cri_loss: 0.058868408203125|unsuper_loss: 0.0 average reward score: 2.283203125 ------------------------------------------------------------------------------------- |E2E latency=3.45s |Gather latency=0.00s (0.00%) |Generate time=2.58s (74.97%) |Training time=0.65s (18.79%) |Others=0.21 (6.23%)|CurSamplesPerSec=2.32 |AvgSamplesPerSec=2.36 epoch: 0|step: 3210|ppo_ep: 1|act_loss: 0.12445068359375|cri_loss: 0.08197021484375|unsuper_loss: 0.0 average reward score: 4.40625 ------------------------------------------------------------------------------------- |E2E latency=3.41s |Gather latency=0.00s (0.00%) |Generate time=2.55s (74.86%) |Training time=0.65s (19.06%) |Others=0.21 (6.08%)|CurSamplesPerSec=2.35 |AvgSamplesPerSec=2.36 epoch: 0|step: 3211|ppo_ep: 1|act_loss: -0.09173583984375|cri_loss: -0.0377197265625|unsuper_loss: 0.0 average reward score: 3.572265625 ------------------------------------------------------------------------------------- |E2E latency=3.36s |Gather latency=0.00s (0.00%) |Generate time=2.50s (74.37%) |Training time=0.65s (19.45%) |Others=0.21 (6.18%)|CurSamplesPerSec=2.38 |AvgSamplesPerSec=2.36 epoch: 0|step: 3212|ppo_ep: 1|act_loss: -0.0002899169921875|cri_loss: 0.014434814453125|unsuper_loss: 0.0 average reward score: 3.998046875 ------------------------------------------------------------------------------------- |E2E latency=3.40s |Gather latency=0.00s (0.00%) |Generate time=2.50s (73.55%) |Training time=0.68s (19.91%) |Others=0.22 (6.55%)|CurSamplesPerSec=2.36 |AvgSamplesPerSec=2.36 epoch: 0|step: 3213|ppo_ep: 1|act_loss: 0.15673828125|cri_loss: 0.10430908203125|unsuper_loss: 0.0 average reward score: 3.234375 ------------------------------------------------------------------------------------- |E2E latency=3.38s |Gather latency=0.00s (0.00%) |Generate time=2.49s (73.66%) |Training 
time=0.66s (19.68%) |Others=0.22 (6.66%)|CurSamplesPerSec=2.37 |AvgSamplesPerSec=2.36 epoch: 0|step: 3214|ppo_ep: 1|act_loss: 0.00628662109375|cri_loss: 0.0227813720703125|unsuper_loss: 0.0 average reward score: 2.74609375 ------------------------------------------------------------------------------------- |E2E latency=3.61s |Gather latency=0.00s (0.00%) |Generate time=2.73s (75.63%) |Training time=0.66s (18.38%) |Others=0.22 (5.99%)|CurSamplesPerSec=2.22 |AvgSamplesPerSec=2.36 epoch: 0|step: 3215|ppo_ep: 1|act_loss: -0.003662109375|cri_loss: 0.0154266357421875|unsuper_loss: 0.0 average reward score: 4.06640625 ------------------------------------------------------------------------------------- |E2E latency=3.90s |Gather latency=0.00s (0.00%) |Generate time=2.67s (68.30%) |Training time=0.94s (23.98%) |Others=0.30 (7.72%)|CurSamplesPerSec=2.05 |AvgSamplesPerSec=2.36 epoch: 0|step: 3216|ppo_ep: 1|act_loss: 0.090087890625|cri_loss: 0.06011962890625|unsuper_loss: 0.0 average reward score: 3.33203125 ------------------------------------------------------------------------------------- |E2E latency=3.40s |Gather latency=0.00s (0.00%) |Generate time=2.45s (72.00%) |Training time=0.72s (21.16%) |Others=0.23 (6.84%)|CurSamplesPerSec=2.35 |AvgSamplesPerSec=2.36 epoch: 0|step: 3217|ppo_ep: 1|act_loss: -0.034210205078125|cri_loss: -0.006927490234375|unsuper_loss: 0.0 average reward score: 3.861328125 ------------------------------------------------------------------------------------- |E2E latency=3.37s |Gather latency=0.00s (0.00%) |Generate time=2.52s (74.77%) |Training time=0.64s (19.12%) |Others=0.21 (6.11%)|CurSamplesPerSec=2.37 |AvgSamplesPerSec=2.36 epoch: 0|step: 3218|ppo_ep: 1|act_loss: 0.421875|cri_loss: 0.2469482421875|unsuper_loss: 0.0 average reward score: 2.142578125 ------------------------------------------------------------------------------------- |E2E latency=3.37s |Gather latency=0.00s (0.00%) |Generate time=2.45s (72.73%) |Training time=0.70s (20.75%) 
|Others=0.22 (6.52%)|CurSamplesPerSec=2.38 |AvgSamplesPerSec=2.36 epoch: 0|step: 3219|ppo_ep: 1|act_loss: 0.0887451171875|cri_loss: 0.0550537109375|unsuper_loss: 0.0 average reward score: 3.640625 ------------------------------------------------------------------------------------- |E2E latency=3.38s |Gather latency=0.00s (0.00%) |Generate time=2.48s (73.30%) |Training time=0.69s (20.47%) |Others=0.21 (6.23%)|CurSamplesPerSec=2.37 |AvgSamplesPerSec=2.36 epoch: 0|step: 3220|ppo_ep: 1|act_loss: -0.0565185546875|cri_loss: -0.00848388671875|unsuper_loss: 0.0 average reward score: 4.078125 ------------------------------------------------------------------------------------- |E2E latency=3.36s |Gather latency=0.00s (0.00%) |Generate time=2.50s (74.46%) |Training time=0.64s (19.19%) |Others=0.21 (6.34%)|CurSamplesPerSec=2.38 |AvgSamplesPerSec=2.36 epoch: 0|step: 3221|ppo_ep: 1|act_loss: -0.034393310546875|cri_loss: -0.0038604736328125|unsuper_loss: 0.0 average reward score: 4.71875 ------------------------------------------------------------------------------------- |E2E latency=3.39s |Gather latency=0.00s (0.00%) |Generate time=2.53s (74.82%) |Training time=0.65s (19.13%) |Others=0.20 (6.05%)|CurSamplesPerSec=2.36 |AvgSamplesPerSec=2.36 epoch: 0|step: 3222|ppo_ep: 1|act_loss: 0.0208587646484375|cri_loss: 0.023468017578125|unsuper_loss: 0.0 average reward score: 2.978515625 ------------------------------------------------------------------------------------- |E2E latency=3.39s |Gather latency=0.00s (0.00%) |Generate time=2.51s (74.22%) |Training time=0.65s (19.22%) |Others=0.22 (6.56%)|CurSamplesPerSec=2.36 |AvgSamplesPerSec=2.36 epoch: 0|step: 3223|ppo_ep: 1|act_loss: 0.1083984375|cri_loss: 0.08221435546875|unsuper_loss: 0.0 average reward score: 4.23046875 ------------------------------------------------------------------------------------- |E2E latency=3.64s |Gather latency=0.00s (0.00%) |Generate time=2.41s (66.18%) |Training time=0.93s (25.44%) |Others=0.30 
(8.37%)|CurSamplesPerSec=2.20 |AvgSamplesPerSec=2.36 epoch: 0|step: 3224|ppo_ep: 1|act_loss: 0.1065673828125|cri_loss: 0.0703125|unsuper_loss: 0.0 average reward score: 3.90234375 ------------------------------------------------------------------------------------- |E2E latency=9.14s |Gather latency=0.00s (0.00%) |Generate time=6.14s (67.18%) |Training time=2.41s (26.32%) |Others=0.59 (6.50%)|CurSamplesPerSec=0.88 |AvgSamplesPerSec=2.36 epoch: 0|step: 3225|ppo_ep: 1|act_loss: -0.0589599609375|cri_loss: -0.012451171875|unsuper_loss: 0.0 average reward score: 3.2265625 ------------------------------------------------------------------------------------- |E2E latency=3.71s |Gather latency=0.00s (0.00%) |Generate time=2.77s (74.70%) |Training time=0.64s (17.24%) |Others=0.30 (8.06%)|CurSamplesPerSec=2.16 |AvgSamplesPerSec=2.36 epoch: 0|step: 3226|ppo_ep: 1|act_loss: 0.05206298828125|cri_loss: 0.0303192138671875|unsuper_loss: 0.0 average reward score: 3.69921875 ------------------------------------------------------------------------------------- |E2E latency=3.44s |Gather latency=0.00s (0.00%) |Generate time=2.57s (74.69%) |Training time=0.65s (19.01%) |Others=0.22 (6.30%)|CurSamplesPerSec=2.33 |AvgSamplesPerSec=2.36 epoch: 0|step: 3227|ppo_ep: 1|act_loss: 0.116455078125|cri_loss: 0.0772705078125|unsuper_loss: 0.0 average reward score: 1.9541015625 ------------------------------------------------------------------------------------- |E2E latency=3.54s |Gather latency=0.00s (0.00%) |Generate time=2.67s (75.42%) |Training time=0.66s (18.63%) |Others=0.21 (5.95%)|CurSamplesPerSec=2.26 |AvgSamplesPerSec=2.36 epoch: 0|step: 3228|ppo_ep: 1|act_loss: 0.155517578125|cri_loss: 0.09075927734375|unsuper_loss: 0.0 average reward score: 2.880859375 ------------------------------------------------------------------------------------- |E2E latency=3.54s |Gather latency=0.00s (0.00%) |Generate time=2.68s (75.84%) |Training time=0.65s (18.30%) |Others=0.21 (5.87%)|CurSamplesPerSec=2.26 
|AvgSamplesPerSec=2.36 epoch: 0|step: 3229|ppo_ep: 1|act_loss: 0.09375|cri_loss: 0.05950927734375|unsuper_loss: 0.0 average reward score: 4.11328125 ------------------------------------------------------------------------------------- |E2E latency=3.71s |Gather latency=0.00s (0.00%) |Generate time=2.85s (76.77%) |Training time=0.65s (17.58%) |Others=0.21 (5.66%)|CurSamplesPerSec=2.16 |AvgSamplesPerSec=2.36 epoch: 0|step: 3230|ppo_ep: 1|act_loss: -0.109130859375|cri_loss: -0.035491943359375|unsuper_loss: 0.0 average reward score: 3.60546875 ------------------------------------------------------------------------------------- |E2E latency=3.95s |Gather latency=0.00s (0.00%) |Generate time=3.06s (77.40%) |Training time=0.65s (16.48%) |Others=0.24 (6.12%)|CurSamplesPerSec=2.03 |AvgSamplesPerSec=2.36 epoch: 0|step: 3231|ppo_ep: 1|act_loss: 0.09893798828125|cri_loss: 0.066162109375|unsuper_loss: 0.0 average reward score: 3.9609375 ------------------------------------------------------------------------------------- |E2E latency=4.43s |Gather latency=0.00s (0.00%) |Generate time=3.15s (71.11%) |Training time=0.95s (21.34%) |Others=0.33 (7.56%)|CurSamplesPerSec=1.81 |AvgSamplesPerSec=2.36 epoch: 0|step: 3232|ppo_ep: 1|act_loss: 0.061859130859375|cri_loss: 0.038970947265625|unsuper_loss: 0.0 average reward score: 3.259765625 ------------------------------------------------------------------------------------- |E2E latency=3.77s |Gather latency=0.00s (0.00%) |Generate time=2.74s (72.56%) |Training time=0.79s (20.84%) |Others=0.25 (6.59%)|CurSamplesPerSec=2.12 |AvgSamplesPerSec=2.36 epoch: 0|step: 3233|ppo_ep: 1|act_loss: -0.020050048828125|cri_loss: -0.00296783447265625|unsuper_loss: 0.0 average reward score: 3.59375 ------------------------------------------------------------------------------------- |E2E latency=3.69s |Gather latency=0.00s (0.00%) |Generate time=2.73s (74.02%) |Training time=0.72s (19.54%) |Others=0.24 (6.45%)|CurSamplesPerSec=2.17 |AvgSamplesPerSec=2.36 
epoch: 0|step: 3234|ppo_ep: 1|act_loss: 0.1328125|cri_loss: 0.0859375|unsuper_loss: 0.0 average reward score: 2.5390625 ------------------------------------------------------------------------------------- |E2E latency=4.20s |Gather latency=0.00s (0.00%) |Generate time=3.32s (79.07%) |Training time=0.65s (15.38%) |Others=0.23 (5.55%)|CurSamplesPerSec=1.91 |AvgSamplesPerSec=2.36 epoch: 0|step: 3235|ppo_ep: 1|act_loss: 0.23486328125|cri_loss: 0.130126953125|unsuper_loss: 0.0 average reward score: 2.845703125 ------------------------------------------------------------------------------------- |E2E latency=3.70s |Gather latency=0.00s (0.00%) |Generate time=2.83s (76.49%) |Training time=0.65s (17.62%) |Others=0.22 (5.89%)|CurSamplesPerSec=2.16 |AvgSamplesPerSec=2.36 epoch: 0|step: 3236|ppo_ep: 1|act_loss: 0.1142578125|cri_loss: 0.0697021484375|unsuper_loss: 0.0 average reward score: 3.986328125 ------------------------------------------------------------------------------------- |E2E latency=3.79s |Gather latency=0.00s (0.00%) |Generate time=2.72s (71.89%) |Training time=0.81s (21.50%) |Others=0.25 (6.61%)|CurSamplesPerSec=2.11 |AvgSamplesPerSec=2.36 epoch: 0|step: 3237|ppo_ep: 1|act_loss: 0.154541015625|cri_loss: 0.0927734375|unsuper_loss: 0.0 average reward score: 2.8125 ------------------------------------------------------------------------------------- |E2E latency=3.77s |Gather latency=0.00s (0.00%) |Generate time=2.89s (76.72%) |Training time=0.64s (17.00%) |Others=0.24 (6.27%)|CurSamplesPerSec=2.12 |AvgSamplesPerSec=2.36 epoch: 0|step: 3238|ppo_ep: 1|act_loss: 0.001068115234375|cri_loss: 0.0116119384765625|unsuper_loss: 0.0 average reward score: 4.34375 ------------------------------------------------------------------------------------- |E2E latency=5.05s |Gather latency=0.00s (0.00%) |Generate time=3.58s (70.91%) |Training time=1.14s (22.67%) |Others=0.32 (6.42%)|CurSamplesPerSec=1.59 |AvgSamplesPerSec=2.36 epoch: 0|step: 3239|ppo_ep: 1|act_loss: 
-0.039215087890625|cri_loss: -0.0159759521484375|unsuper_loss: 0.0 average reward score: 4.0859375 ------------------------------------------------------------------------------------- |E2E latency=3.83s |Gather latency=0.00s (0.00%) |Generate time=2.59s (67.50%) |Training time=0.94s (24.42%) |Others=0.31 (8.08%)|CurSamplesPerSec=2.09 |AvgSamplesPerSec=2.36 epoch: 0|step: 3240|ppo_ep: 1|act_loss: 0.0249786376953125|cri_loss: 0.03228759765625|unsuper_loss: 0.0 average reward score: 4.19921875 ------------------------------------------------------------------------------------- |E2E latency=3.58s |Gather latency=0.00s (0.00%) |Generate time=2.72s (75.82%) |Training time=0.64s (17.94%) |Others=0.22 (6.25%)|CurSamplesPerSec=2.23 |AvgSamplesPerSec=2.36 epoch: 0|step: 3241|ppo_ep: 1|act_loss: 0.00217437744140625|cri_loss: 0.0087127685546875|unsuper_loss: 0.0 average reward score: 3.6875 ------------------------------------------------------------------------------------- |E2E latency=3.78s |Gather latency=0.00s (0.00%) |Generate time=2.92s (77.14%) |Training time=0.65s (17.04%) |Others=0.22 (5.82%)|CurSamplesPerSec=2.11 |AvgSamplesPerSec=2.36 epoch: 0|step: 3242|ppo_ep: 1|act_loss: 0.0102691650390625|cri_loss: 0.013092041015625|unsuper_loss: 0.0 average reward score: 3.41015625 ------------------------------------------------------------------------------------- |E2E latency=3.96s |Gather latency=0.00s (0.00%) |Generate time=2.73s (69.11%) |Training time=1.00s (25.23%) |Others=0.22 (5.66%)|CurSamplesPerSec=2.02 |AvgSamplesPerSec=2.36 epoch: 0|step: 3243|ppo_ep: 1|act_loss: 0.006389617919921875|cri_loss: 0.009979248046875|unsuper_loss: 0.0 average reward score: 4.6484375 ------------------------------------------------------------------------------------- |E2E latency=3.74s |Gather latency=0.00s (0.00%) |Generate time=2.78s (74.45%) |Training time=0.71s (19.11%) |Others=0.24 (6.44%)|CurSamplesPerSec=2.14 |AvgSamplesPerSec=2.36 epoch: 0|step: 3244|ppo_ep: 1|act_loss: 
-0.05743408203125|cri_loss: -0.023284912109375|unsuper_loss: 0.0 average reward score: 4.1484375 ------------------------------------------------------------------------------------- |E2E latency=4.06s |Gather latency=0.00s (0.00%) |Generate time=3.19s (78.68%) |Training time=0.64s (15.84%) |Others=0.22 (5.49%)|CurSamplesPerSec=1.97 |AvgSamplesPerSec=2.36 epoch: 0|step: 3245|ppo_ep: 1|act_loss: -0.01434326171875|cri_loss: -0.0008392333984375|unsuper_loss: 0.0 average reward score: 5.05859375 ------------------------------------------------------------------------------------- |E2E latency=3.63s |Gather latency=0.00s (0.00%) |Generate time=2.69s (74.09%) |Training time=0.72s (19.71%) |Others=0.22 (6.20%)|CurSamplesPerSec=2.21 |AvgSamplesPerSec=2.36 epoch: 0|step: 3246|ppo_ep: 1|act_loss: 0.37255859375|cri_loss: 0.21533203125|unsuper_loss: 0.0 average reward score: 2.51171875 ------------------------------------------------------------------------------------- |E2E latency=3.59s |Gather latency=0.00s (0.00%) |Generate time=2.64s (73.38%) |Training time=0.73s (20.35%) |Others=0.23 (6.27%)|CurSamplesPerSec=2.23 |AvgSamplesPerSec=2.36 epoch: 0|step: 3247|ppo_ep: 1|act_loss: 0.15673828125|cri_loss: 0.1141357421875|unsuper_loss: 0.0 average reward score: 3.96484375 ------------------------------------------------------------------------------------- |E2E latency=4.05s |Gather latency=0.00s (0.00%) |Generate time=2.76s (68.23%) |Training time=0.97s (23.87%) |Others=0.32 (7.90%)|CurSamplesPerSec=1.98 |AvgSamplesPerSec=2.36 epoch: 0|step: 3248|ppo_ep: 1|act_loss: -0.047637939453125|cri_loss: -0.01336669921875|unsuper_loss: 0.0 average reward score: 3.3671875 ------------------------------------------------------------------------------------- |E2E latency=4.16s |Gather latency=0.00s (0.00%) |Generate time=3.27s (78.72%) |Training time=0.66s (15.97%) |Others=0.22 (5.31%)|CurSamplesPerSec=1.92 |AvgSamplesPerSec=2.36 epoch: 0|step: 3249|ppo_ep: 1|act_loss: 
-0.0416259765625|cri_loss: -0.015960693359375|unsuper_loss: 0.0 average reward score: 4.265625 ------------------------------------------------------------------------------------- |E2E latency=3.86s |Gather latency=0.00s (0.00%) |Generate time=2.91s (75.31%) |Training time=0.73s (18.86%) |Others=0.22 (5.83%)|CurSamplesPerSec=2.07 |AvgSamplesPerSec=2.36 epoch: 0|step: 3250|ppo_ep: 1|act_loss: -0.126220703125|cri_loss: -0.054656982421875|unsuper_loss: 0.0 average reward score: 3.630859375 ------------------------------------------------------------------------------------- |E2E latency=3.96s |Gather latency=0.00s (0.00%) |Generate time=2.68s (67.71%) |Training time=1.05s (26.45%) |Others=0.23 (5.84%)|CurSamplesPerSec=2.02 |AvgSamplesPerSec=2.36 epoch: 0|step: 3251|ppo_ep: 1|act_loss: 0.150390625|cri_loss: 0.0904541015625|unsuper_loss: 0.0 average reward score: 2.28125 ------------------------------------------------------------------------------------- |E2E latency=3.69s |Gather latency=0.00s (0.00%) |Generate time=2.80s (76.00%) |Training time=0.65s (17.52%) |Others=0.24 (6.49%)|CurSamplesPerSec=2.17 |AvgSamplesPerSec=2.36 epoch: 0|step: 3252|ppo_ep: 1|act_loss: 0.125732421875|cri_loss: 0.075439453125|unsuper_loss: 0.0 average reward score: 2.416015625 ------------------------------------------------------------------------------------- |E2E latency=3.64s |Gather latency=0.00s (0.00%) |Generate time=2.76s (75.80%) |Training time=0.65s (17.87%) |Others=0.23 (6.32%)|CurSamplesPerSec=2.20 |AvgSamplesPerSec=2.36 epoch: 0|step: 3253|ppo_ep: 1|act_loss: -0.02410888671875|cri_loss: -0.00516510009765625|unsuper_loss: 0.0 average reward score: 4.34765625 ------------------------------------------------------------------------------------- |E2E latency=4.36s |Gather latency=0.00s (0.00%) |Generate time=3.00s (68.76%) |Training time=0.65s (14.90%) |Others=0.71 (16.34%)|CurSamplesPerSec=1.83 |AvgSamplesPerSec=2.36 epoch: 0|step: 3254|ppo_ep: 1|act_loss: 
0.0987548828125|cri_loss: 0.0693359375|unsuper_loss: 0.0 average reward score: 3.830078125 ------------------------------------------------------------------------------------- |E2E latency=3.72s |Gather latency=0.00s (0.00%) |Generate time=2.82s (75.76%) |Training time=0.65s (17.57%) |Others=0.25 (6.66%)|CurSamplesPerSec=2.15 |AvgSamplesPerSec=2.36 epoch: 0|step: 3255|ppo_ep: 1|act_loss: 0.149658203125|cri_loss: 0.0904541015625|unsuper_loss: 0.0 average reward score: 2.498046875 ------------------------------------------------------------------------------------- |E2E latency=4.22s |Gather latency=0.00s (0.00%) |Generate time=2.97s (70.32%) |Training time=0.94s (22.26%) |Others=0.31 (7.42%)|CurSamplesPerSec=1.89 |AvgSamplesPerSec=2.36 epoch: 0|step: 3256|ppo_ep: 1|act_loss: 0.20654296875|cri_loss: 0.1317138671875|unsuper_loss: 0.0 average reward score: 2.12109375 ------------------------------------------------------------------------------------- |E2E latency=3.90s |Gather latency=0.00s (0.00%) |Generate time=3.01s (77.13%) |Training time=0.67s (17.13%) |Others=0.22 (5.74%)|CurSamplesPerSec=2.05 |AvgSamplesPerSec=2.36 epoch: 0|step: 3257|ppo_ep: 1|act_loss: 0.0299072265625|cri_loss: 0.0240631103515625|unsuper_loss: 0.0 average reward score: 5.4140625 ------------------------------------------------------------------------------------- |E2E latency=3.74s |Gather latency=0.00s (0.00%) |Generate time=2.80s (74.87%) |Training time=0.74s (19.65%) |Others=0.21 (5.48%)|CurSamplesPerSec=2.14 |AvgSamplesPerSec=2.36 epoch: 0|step: 3258|ppo_ep: 1|act_loss: -0.031036376953125|cri_loss: -0.01297760009765625|unsuper_loss: 0.0 average reward score: 2.974609375 ------------------------------------------------------------------------------------- |E2E latency=3.86s |Gather latency=0.00s (0.00%) |Generate time=2.96s (76.80%) |Training time=0.66s (17.23%) |Others=0.23 (5.97%)|CurSamplesPerSec=2.07 |AvgSamplesPerSec=2.36 epoch: 0|step: 3259|ppo_ep: 1|act_loss: 
0.0369873046875|cri_loss: 0.023223876953125|unsuper_loss: 0.0 average reward score: 3.46484375 ------------------------------------------------------------------------------------- |E2E latency=3.82s |Gather latency=0.00s (0.00%) |Generate time=2.95s (77.09%) |Training time=0.65s (16.92%) |Others=0.23 (5.99%)|CurSamplesPerSec=2.09 |AvgSamplesPerSec=2.36 epoch: 0|step: 3260|ppo_ep: 1|act_loss: -0.072509765625|cri_loss: -0.030609130859375|unsuper_loss: 0.0 average reward score: 3.98046875 ------------------------------------------------------------------------------------- |E2E latency=3.59s |Gather latency=0.00s (0.00%) |Generate time=2.74s (76.39%) |Training time=0.64s (17.96%) |Others=0.20 (5.64%)|CurSamplesPerSec=2.23 |AvgSamplesPerSec=2.36 epoch: 0|step: 3261|ppo_ep: 1|act_loss: 0.0809326171875|cri_loss: 0.066162109375|unsuper_loss: 0.0 average reward score: 1.0048828125 ------------------------------------------------------------------------------------- |E2E latency=3.55s |Gather latency=0.00s (0.00%) |Generate time=2.65s (74.69%) |Training time=0.67s (18.77%) |Others=0.23 (6.54%)|CurSamplesPerSec=2.25 |AvgSamplesPerSec=2.36 epoch: 0|step: 3262|ppo_ep: 1|act_loss: 0.094970703125|cri_loss: 0.056732177734375|unsuper_loss: 0.0 average reward score: 4.45703125 ------------------------------------------------------------------------------------- |E2E latency=3.77s |Gather latency=0.00s (0.00%) |Generate time=2.87s (76.31%) |Training time=0.66s (17.61%) |Others=0.23 (6.08%)|CurSamplesPerSec=2.12 |AvgSamplesPerSec=2.36 epoch: 0|step: 3263|ppo_ep: 1|act_loss: 0.053192138671875|cri_loss: 0.03485107421875|unsuper_loss: 0.0 average reward score: 3.95703125 ------------------------------------------------------------------------------------- |E2E latency=3.95s |Gather latency=0.00s (0.00%) |Generate time=2.68s (67.85%) |Training time=0.94s (23.83%) |Others=0.33 (8.32%)|CurSamplesPerSec=2.02 |AvgSamplesPerSec=2.36 epoch: 0|step: 3264|ppo_ep: 1|act_loss: 
-0.0119171142578125|cri_loss: 0.003173828125|unsuper_loss: 0.0 average reward score: 4.10546875 ------------------------------------------------------------------------------------- |E2E latency=3.83s |Gather latency=0.00s (0.00%) |Generate time=2.94s (76.90%) |Training time=0.65s (17.01%) |Others=0.23 (6.08%)|CurSamplesPerSec=2.09 |AvgSamplesPerSec=2.36 epoch: 0|step: 3265|ppo_ep: 1|act_loss: 0.0009765625|cri_loss: 0.0128326416015625|unsuper_loss: 0.0 average reward score: 3.08203125 ------------------------------------------------------------------------------------- |E2E latency=3.96s |Gather latency=0.00s (0.00%) |Generate time=3.00s (75.71%) |Training time=0.74s (18.60%) |Others=0.23 (5.69%)|CurSamplesPerSec=2.02 |AvgSamplesPerSec=2.36 epoch: 0|step: 3266|ppo_ep: 1|act_loss: -0.004364013671875|cri_loss: 0.003391265869140625|unsuper_loss: 0.0 average reward score: 4.5234375 ------------------------------------------------------------------------------------- |E2E latency=3.90s |Gather latency=0.00s (0.00%) |Generate time=3.03s (77.66%) |Training time=0.64s (16.41%) |Others=0.23 (5.93%)|CurSamplesPerSec=2.05 |AvgSamplesPerSec=2.36 epoch: 0|step: 3267|ppo_ep: 1|act_loss: -0.030731201171875|cri_loss: -0.00868988037109375|unsuper_loss: 0.0 average reward score: 2.666015625 ------------------------------------------------------------------------------------- |E2E latency=3.60s |Gather latency=0.00s (0.00%) |Generate time=2.72s (75.51%) |Training time=0.66s (18.36%) |Others=0.22 (6.12%)|CurSamplesPerSec=2.22 |AvgSamplesPerSec=2.36 epoch: 0|step: 3268|ppo_ep: 1|act_loss: -0.08050537109375|cri_loss: -0.0289459228515625|unsuper_loss: 0.0 average reward score: 4.28125 ------------------------------------------------------------------------------------- |E2E latency=3.67s |Gather latency=0.00s (0.00%) |Generate time=2.79s (76.10%) |Training time=0.64s (17.52%) |Others=0.23 (6.38%)|CurSamplesPerSec=2.18 |AvgSamplesPerSec=2.36 epoch: 0|step: 3269|ppo_ep: 1|act_loss: 
0.12213134765625|cri_loss: 0.087646484375|unsuper_loss: 0.0 average reward score: 3.17578125 ------------------------------------------------------------------------------------- |E2E latency=3.70s |Gather latency=0.00s (0.00%) |Generate time=2.76s (74.66%) |Training time=0.71s (19.20%) |Others=0.23 (6.13%)|CurSamplesPerSec=2.16 |AvgSamplesPerSec=2.36 epoch: 0|step: 3270|ppo_ep: 1|act_loss: -0.01192474365234375|cri_loss: -0.000762939453125|unsuper_loss: 0.0 average reward score: 4.25 ------------------------------------------------------------------------------------- |E2E latency=3.83s |Gather latency=0.00s (0.00%) |Generate time=2.95s (77.15%) |Training time=0.65s (17.06%) |Others=0.22 (5.79%)|CurSamplesPerSec=2.09 |AvgSamplesPerSec=2.36 epoch: 0|step: 3271|ppo_ep: 1|act_loss: -0.0712890625|cri_loss: -0.02728271484375|unsuper_loss: 0.0 average reward score: 3.54296875 ------------------------------------------------------------------------------------- |E2E latency=3.85s |Gather latency=0.00s (0.00%) |Generate time=2.62s (68.11%) |Training time=0.93s (24.24%) |Others=0.29 (7.65%)|CurSamplesPerSec=2.08 |AvgSamplesPerSec=2.36 epoch: 0|step: 3272|ppo_ep: 1|act_loss: -0.0341796875|cri_loss: -0.0021820068359375|unsuper_loss: 0.0 average reward score: 4.2578125 ------------------------------------------------------------------------------------- |E2E latency=3.85s |Gather latency=0.00s (0.00%) |Generate time=2.98s (77.37%) |Training time=0.65s (16.79%) |Others=0.22 (5.84%)|CurSamplesPerSec=2.08 |AvgSamplesPerSec=2.36 epoch: 0|step: 3273|ppo_ep: 1|act_loss: 0.033294677734375|cri_loss: 0.03594970703125|unsuper_loss: 0.0 average reward score: 3.4609375 ------------------------------------------------------------------------------------- |E2E latency=3.95s |Gather latency=0.00s (0.00%) |Generate time=2.79s (70.58%) |Training time=0.94s (23.75%) |Others=0.22 (5.67%)|CurSamplesPerSec=2.03 |AvgSamplesPerSec=2.36 epoch: 0|step: 3274|ppo_ep: 1|act_loss: 0.0908203125|cri_loss: 
0.061614990234375|unsuper_loss: 0.0 average reward score: 3.2421875 ------------------------------------------------------------------------------------- |E2E latency=4.01s |Gather latency=0.00s (0.00%) |Generate time=2.82s (70.19%) |Training time=0.96s (23.86%) |Others=0.24 (5.95%)|CurSamplesPerSec=1.99 |AvgSamplesPerSec=2.36 epoch: 0|step: 3275|ppo_ep: 1|act_loss: 0.04608154296875|cri_loss: 0.04229736328125|unsuper_loss: 0.0 average reward score: 3.720703125 ------------------------------------------------------------------------------------- |E2E latency=3.67s |Gather latency=0.00s (0.00%) |Generate time=2.80s (76.18%) |Training time=0.65s (17.65%) |Others=0.23 (6.17%)|CurSamplesPerSec=2.18 |AvgSamplesPerSec=2.36 epoch: 0|step: 3276|ppo_ep: 1|act_loss: -0.01340484619140625|cri_loss: 0.000579833984375|unsuper_loss: 0.0 average reward score: 1.6220703125 ------------------------------------------------------------------------------------- |E2E latency=3.67s |Gather latency=0.00s (0.00%) |Generate time=2.76s (75.04%) |Training time=0.67s (18.34%) |Others=0.24 (6.62%)|CurSamplesPerSec=2.18 |AvgSamplesPerSec=2.36 epoch: 0|step: 3277|ppo_ep: 1|act_loss: -0.1094970703125|cri_loss: -0.029144287109375|unsuper_loss: 0.0 average reward score: 2.4140625 ------------------------------------------------------------------------------------- |E2E latency=3.73s |Gather latency=0.00s (0.00%) |Generate time=2.84s (76.14%) |Training time=0.65s (17.33%) |Others=0.24 (6.54%)|CurSamplesPerSec=2.14 |AvgSamplesPerSec=2.36 epoch: 0|step: 3278|ppo_ep: 1|act_loss: -0.006072998046875|cri_loss: 0.0115509033203125|unsuper_loss: 0.0 average reward score: 3.708984375 ------------------------------------------------------------------------------------- |E2E latency=3.67s |Gather latency=0.00s (0.00%) |Generate time=2.78s (75.88%) |Training time=0.65s (17.59%) |Others=0.24 (6.53%)|CurSamplesPerSec=2.18 |AvgSamplesPerSec=2.36 [2023-04-24 16:53:11,798] [INFO] [logging.py:96:log_dist] [Rank 0] 
step=410, skipped=5, lr=[8.243928151091435e-07, 8.243928151091435e-07], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-04-24 16:53:12,054] [INFO] [timer.py:199:stop] epoch=0/micro_step=3280/global_step=410, RunningAvgSamplesPerSec=15.289758404681205, CurrSamplesPerSec=13.442923223253306, MemAllocated=20.44GB, MaxMemAllocated=31.45GB [2023-04-24 16:53:12,295] [INFO] [logging.py:96:log_dist] [Rank 0] step=410, skipped=6, lr=[4.388960991455998e-07, 4.388960991455998e-07], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 3279|ppo_ep: 1|act_loss: 0.0567626953125|cri_loss: 0.049041748046875|unsuper_loss: 0.0 average reward score: 3.927734375 ------------------------------------------------------------------------------------- |E2E latency=4.05s |Gather latency=0.00s (0.00%) |Generate time=2.70s (66.57%) |Training time=1.04s (25.63%) |Others=0.32 (7.80%)|CurSamplesPerSec=1.97 |AvgSamplesPerSec=2.36 epoch: 0|step: 3280|ppo_ep: 1|act_loss: -0.0684814453125|cri_loss: -0.02923583984375|unsuper_loss: 0.0 average reward score: 3.796875 ------------------------------------------------------------------------------------- |E2E latency=3.65s |Gather latency=0.00s (0.00%) |Generate time=2.78s (75.97%) |Training time=0.65s (17.78%) |Others=0.23 (6.26%)|CurSamplesPerSec=2.19 |AvgSamplesPerSec=2.36 epoch: 0|step: 3281|ppo_ep: 1|act_loss: 0.05072021484375|cri_loss: 0.0347900390625|unsuper_loss: 0.0 average reward score: 3.81640625 ------------------------------------------------------------------------------------- |E2E latency=3.69s |Gather latency=0.00s (0.00%) |Generate time=2.81s (76.05%) |Training time=0.67s (18.07%) |Others=0.22 (5.88%)|CurSamplesPerSec=2.17 |AvgSamplesPerSec=2.36 epoch: 0|step: 3282|ppo_ep: 1|act_loss: 0.1871337890625|cri_loss: 0.124267578125|unsuper_loss: 0.0 average reward score: 5.078125 ------------------------------------------------------------------------------------- |E2E latency=3.84s |Gather latency=0.00s (0.00%) |Generate time=2.88s (74.99%) |Training time=0.73s 
(19.09%) |Others=0.23 (5.92%)|CurSamplesPerSec=2.08 |AvgSamplesPerSec=2.36 epoch: 0|step: 3283|ppo_ep: 1|act_loss: 0.437255859375|cri_loss: 0.2724609375|unsuper_loss: 0.0 average reward score: 3.5625 ------------------------------------------------------------------------------------- |E2E latency=3.78s |Gather latency=0.00s (0.00%) |Generate time=2.62s (69.36%) |Training time=0.92s (24.40%) |Others=0.24 (6.24%)|CurSamplesPerSec=2.12 |AvgSamplesPerSec=2.36 epoch: 0|step: 3284|ppo_ep: 1|act_loss: 0.022552490234375|cri_loss: 0.025421142578125|unsuper_loss: 0.0 average reward score: 2.94921875 ------------------------------------------------------------------------------------- |E2E latency=3.73s |Gather latency=0.00s (0.00%) |Generate time=2.82s (75.52%) |Training time=0.68s (18.19%) |Others=0.24 (6.29%)|CurSamplesPerSec=2.14 |AvgSamplesPerSec=2.36 epoch: 0|step: 3285|ppo_ep: 1|act_loss: 0.02728271484375|cri_loss: 0.025543212890625|unsuper_loss: 0.0 average reward score: 3.86328125 ------------------------------------------------------------------------------------- |E2E latency=3.87s |Gather latency=0.00s (0.00%) |Generate time=2.78s (71.93%) |Training time=0.86s (22.28%) |Others=0.22 (5.79%)|CurSamplesPerSec=2.07 |AvgSamplesPerSec=2.36 epoch: 0|step: 3286|ppo_ep: 1|act_loss: -0.0728759765625|cri_loss: -0.0277862548828125|unsuper_loss: 0.0 average reward score: 4.875 ------------------------------------------------------------------------------------- |E2E latency=3.70s |Gather latency=0.00s (0.00%) |Generate time=2.79s (75.35%) |Training time=0.67s (18.11%) |Others=0.24 (6.53%)|CurSamplesPerSec=2.16 |AvgSamplesPerSec=2.36 epoch: 0|step: 3287|ppo_ep: 1|act_loss: -0.0135498046875|cri_loss: 0.0013427734375|unsuper_loss: 0.0 average reward score: 4.95703125 ------------------------------------------------------------------------------------- |E2E latency=3.86s |Gather latency=0.00s (0.00%) |Generate time=2.59s (67.17%) |Training time=0.94s (24.44%) |Others=0.32 
(8.38%)|CurSamplesPerSec=2.07 |AvgSamplesPerSec=2.36 epoch: 0|step: 3288|ppo_ep: 1|act_loss: -0.0004119873046875|cri_loss: 0.01001739501953125|unsuper_loss: 0.0 average reward score: 3.6484375 ------------------------------------------------------------------------------------- |E2E latency=3.95s |Gather latency=0.00s (0.00%) |Generate time=3.10s (78.54%) |Training time=0.64s (16.27%) |Others=0.20 (5.19%)|CurSamplesPerSec=2.03 |AvgSamplesPerSec=2.36 epoch: 0|step: 3289|ppo_ep: 1|act_loss: 0.00299072265625|cri_loss: 0.04718017578125|unsuper_loss: 0.0 average reward score: 3.53125 ------------------------------------------------------------------------------------- |E2E latency=3.95s |Gather latency=0.00s (0.00%) |Generate time=3.04s (77.02%) |Training time=0.67s (16.94%) |Others=0.24 (6.04%)|CurSamplesPerSec=2.03 |AvgSamplesPerSec=2.36 epoch: 0|step: 3290|ppo_ep: 1|act_loss: 0.036834716796875|cri_loss: 0.02838134765625|unsuper_loss: 0.0 average reward score: 3.84765625 ------------------------------------------------------------------------------------- |E2E latency=3.61s |Gather latency=0.00s (0.00%) |Generate time=2.73s (75.59%) |Training time=0.65s (18.07%) |Others=0.23 (6.34%)|CurSamplesPerSec=2.22 |AvgSamplesPerSec=2.36 epoch: 0|step: 3291|ppo_ep: 1|act_loss: -0.0916748046875|cri_loss: -0.04254150390625|unsuper_loss: 0.0 average reward score: 3.98046875 ------------------------------------------------------------------------------------- |E2E latency=3.89s |Gather latency=0.00s (0.00%) |Generate time=3.02s (77.74%) |Training time=0.65s (16.75%) |Others=0.21 (5.51%)|CurSamplesPerSec=2.06 |AvgSamplesPerSec=2.36 epoch: 0|step: 3292|ppo_ep: 1|act_loss: 0.1314697265625|cri_loss: 0.07489013671875|unsuper_loss: 0.0 average reward score: 3.6953125 ------------------------------------------------------------------------------------- |E2E latency=3.81s |Gather latency=0.00s (0.00%) |Generate time=2.94s (77.16%) |Training time=0.65s (17.05%) |Others=0.22 
(5.80%)|CurSamplesPerSec=2.10 |AvgSamplesPerSec=2.36 epoch: 0|step: 3293|ppo_ep: 1|act_loss: 0.082763671875|cri_loss: 0.0555419921875|unsuper_loss: 0.0 average reward score: 2.650390625 ------------------------------------------------------------------------------------- |E2E latency=3.88s |Gather latency=0.00s (0.00%) |Generate time=3.00s (77.19%) |Training time=0.65s (16.78%) |Others=0.23 (6.03%)|CurSamplesPerSec=2.06 |AvgSamplesPerSec=2.36 epoch: 0|step: 3294|ppo_ep: 1|act_loss: 0.11407470703125|cri_loss: 0.0662841796875|unsuper_loss: 0.0 average reward score: 2.513671875 ------------------------------------------------------------------------------------- |E2E latency=3.60s |Gather latency=0.00s (0.00%) |Generate time=2.66s (73.89%) |Training time=0.72s (19.96%) |Others=0.22 (6.15%)|CurSamplesPerSec=2.22 |AvgSamplesPerSec=2.36 epoch: 0|step: 3295|ppo_ep: 1|act_loss: -0.030548095703125|cri_loss: -0.00762939453125|unsuper_loss: 0.0 average reward score: 3.61328125 ------------------------------------------------------------------------------------- |E2E latency=3.94s |Gather latency=0.00s (0.00%) |Generate time=2.70s (68.49%) |Training time=0.93s (23.69%) |Others=0.31 (7.83%)|CurSamplesPerSec=2.03 |AvgSamplesPerSec=2.36 epoch: 0|step: 3296|ppo_ep: 1|act_loss: -0.0836181640625|cri_loss: -0.03192138671875|unsuper_loss: 0.0 average reward score: 4.47265625 ------------------------------------------------------------------------------------- |E2E latency=4.04s |Gather latency=0.00s (0.00%) |Generate time=3.14s (77.85%) |Training time=0.66s (16.22%) |Others=0.24 (5.93%)|CurSamplesPerSec=1.98 |AvgSamplesPerSec=2.36 epoch: 0|step: 3297|ppo_ep: 1|act_loss: -0.0323486328125|cri_loss: 0.02423095703125|unsuper_loss: 0.0 average reward score: 2.55859375 ------------------------------------------------------------------------------------- |E2E latency=3.85s |Gather latency=0.00s (0.00%) |Generate time=2.99s (77.53%) |Training time=0.65s (16.77%) |Others=0.22 
(5.70%)|CurSamplesPerSec=2.08 |AvgSamplesPerSec=2.36 epoch: 0|step: 3298|ppo_ep: 1|act_loss: 0.00604248046875|cri_loss: 0.01390838623046875|unsuper_loss: 0.0 average reward score: 4.828125 ------------------------------------------------------------------------------------- |E2E latency=3.60s |Gather latency=0.00s (0.00%) |Generate time=2.72s (75.55%) |Training time=0.65s (17.99%) |Others=0.23 (6.45%)|CurSamplesPerSec=2.23 |AvgSamplesPerSec=2.36 epoch: 0|step: 3299|ppo_ep: 1|act_loss: -0.100341796875|cri_loss: -0.02288818359375|unsuper_loss: 0.0 average reward score: 2.97265625 ------------------------------------------------------------------------------------- |E2E latency=3.95s |Gather latency=0.00s (0.00%) |Generate time=3.07s (77.70%) |Training time=0.65s (16.49%) |Others=0.23 (5.81%)|CurSamplesPerSec=2.03 |AvgSamplesPerSec=2.36 epoch: 0|step: 3300|ppo_ep: 1|act_loss: 0.0276336669921875|cri_loss: 0.025360107421875|unsuper_loss: 0.0 average reward score: 1.6259765625 ------------------------------------------------------------------------------------- |E2E latency=3.74s |Gather latency=0.00s (0.00%) |Generate time=2.86s (76.48%) |Training time=0.65s (17.38%) |Others=0.23 (6.14%)|CurSamplesPerSec=2.14 |AvgSamplesPerSec=2.36 epoch: 0|step: 3301|ppo_ep: 1|act_loss: 0.034759521484375|cri_loss: 0.02960205078125|unsuper_loss: 0.0 average reward score: 3.833984375 ------------------------------------------------------------------------------------- |E2E latency=4.19s |Gather latency=0.00s (0.00%) |Generate time=3.25s (77.71%) |Training time=0.69s (16.45%) |Others=0.24 (5.84%)|CurSamplesPerSec=1.91 |AvgSamplesPerSec=2.36 epoch: 0|step: 3302|ppo_ep: 1|act_loss: 0.271484375|cri_loss: 0.1685791015625|unsuper_loss: 0.0 average reward score: 4.1171875 ------------------------------------------------------------------------------------- |E2E latency=3.91s |Gather latency=0.00s (0.00%) |Generate time=2.78s (71.12%) |Training time=0.91s (23.24%) |Others=0.22 
(5.64%)|CurSamplesPerSec=2.05 |AvgSamplesPerSec=2.36 epoch: 0|step: 3303|ppo_ep: 1|act_loss: 0.14013671875|cri_loss: 0.0887451171875|unsuper_loss: 0.0 average reward score: 4.2734375 ------------------------------------------------------------------------------------- |E2E latency=4.26s |Gather latency=0.00s (0.00%) |Generate time=2.99s (70.08%) |Training time=0.97s (22.65%) |Others=0.31 (7.28%)|CurSamplesPerSec=1.88 |AvgSamplesPerSec=2.36 epoch: 0|step: 3304|ppo_ep: 1|act_loss: -0.0931396484375|cri_loss: -0.029144287109375|unsuper_loss: 0.0 average reward score: 4.09375 ------------------------------------------------------------------------------------- |E2E latency=3.99s |Gather latency=0.00s (0.00%) |Generate time=2.92s (73.35%) |Training time=0.84s (21.17%) |Others=0.22 (5.48%)|CurSamplesPerSec=2.01 |AvgSamplesPerSec=2.36 epoch: 0|step: 3305|ppo_ep: 1|act_loss: 0.0772705078125|cri_loss: 0.056549072265625|unsuper_loss: 0.0 average reward score: 3.72265625 ------------------------------------------------------------------------------------- |E2E latency=3.55s |Gather latency=0.00s (0.00%) |Generate time=2.69s (75.73%) |Training time=0.65s (18.31%) |Others=0.21 (5.96%)|CurSamplesPerSec=2.26 |AvgSamplesPerSec=2.36 epoch: 0|step: 3306|ppo_ep: 1|act_loss: 0.05731201171875|cri_loss: 0.054779052734375|unsuper_loss: 0.0 average reward score: 3.62890625 ------------------------------------------------------------------------------------- |E2E latency=3.69s |Gather latency=0.00s (0.00%) |Generate time=2.79s (75.68%) |Training time=0.64s (17.46%) |Others=0.25 (6.86%)|CurSamplesPerSec=2.17 |AvgSamplesPerSec=2.36 epoch: 0|step: 3307|ppo_ep: 1|act_loss: 0.089599609375|cri_loss: 0.06195068359375|unsuper_loss: 0.0 average reward score: 4.40625 ------------------------------------------------------------------------------------- |E2E latency=3.98s |Gather latency=0.00s (0.00%) |Generate time=2.71s (68.00%) |Training time=1.04s (26.21%) |Others=0.23 (5.79%)|CurSamplesPerSec=2.01 
|AvgSamplesPerSec=2.35 epoch: 0|step: 3308|ppo_ep: 1|act_loss: 0.080810546875|cri_loss: 0.0546875|unsuper_loss: 0.0 average reward score: 1.6943359375 ------------------------------------------------------------------------------------- |E2E latency=3.97s |Gather latency=0.00s (0.00%) |Generate time=3.10s (78.10%) |Training time=0.65s (16.35%) |Others=0.22 (5.54%)|CurSamplesPerSec=2.01 |AvgSamplesPerSec=2.35 epoch: 0|step: 3309|ppo_ep: 1|act_loss: 0.0682373046875|cri_loss: 0.055267333984375|unsuper_loss: 0.0 average reward score: 3.228515625 ------------------------------------------------------------------------------------- |E2E latency=3.81s |Gather latency=0.00s (0.00%) |Generate time=2.93s (76.98%) |Training time=0.65s (16.99%) |Others=0.23 (6.03%)|CurSamplesPerSec=2.10 |AvgSamplesPerSec=2.35 epoch: 0|step: 3310|ppo_ep: 1|act_loss: -0.04278564453125|cri_loss: -0.015472412109375|unsuper_loss: 0.0 average reward score: 2.9296875 ------------------------------------------------------------------------------------- |E2E latency=3.97s |Gather latency=0.00s (0.00%) |Generate time=3.04s (76.50%) |Training time=0.67s (16.78%) |Others=0.27 (6.72%)|CurSamplesPerSec=2.01 |AvgSamplesPerSec=2.35 epoch: 0|step: 3311|ppo_ep: 1|act_loss: 0.115234375|cri_loss: 0.08123779296875|unsuper_loss: 0.0 average reward score: 3.96875 ------------------------------------------------------------------------------------- |E2E latency=4.01s |Gather latency=0.00s (0.00%) |Generate time=2.73s (68.00%) |Training time=0.99s (24.71%) |Others=0.29 (7.29%)|CurSamplesPerSec=2.00 |AvgSamplesPerSec=2.35 epoch: 0|step: 3312|ppo_ep: 1|act_loss: -0.0093231201171875|cri_loss: 0.0003204345703125|unsuper_loss: 0.0 average reward score: 3.921875 ------------------------------------------------------------------------------------- |E2E latency=3.58s |Gather latency=0.00s (0.00%) |Generate time=2.71s (75.55%) |Training time=0.65s (18.14%) |Others=0.23 (6.30%)|CurSamplesPerSec=2.23 |AvgSamplesPerSec=2.35 
epoch: 0|step: 3313|ppo_ep: 1|act_loss: 0.013336181640625|cri_loss: 0.014892578125|unsuper_loss: 0.0 average reward score: 2.751953125 ------------------------------------------------------------------------------------- |E2E latency=3.80s |Gather latency=0.00s (0.00%) |Generate time=2.90s (76.37%) |Training time=0.65s (17.14%) |Others=0.25 (6.49%)|CurSamplesPerSec=2.11 |AvgSamplesPerSec=2.35 epoch: 0|step: 3314|ppo_ep: 1|act_loss: 0.0161895751953125|cri_loss: 0.01540374755859375|unsuper_loss: 0.0 average reward score: 3.76171875 ------------------------------------------------------------------------------------- |E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.64s (75.79%) |Training time=0.65s (18.64%) |Others=0.19 (5.58%)|CurSamplesPerSec=2.30 |AvgSamplesPerSec=2.35 epoch: 0|step: 3315|ppo_ep: 1|act_loss: 0.0084228515625|cri_loss: 0.032470703125|unsuper_loss: 0.0 average reward score: 3.9921875 ------------------------------------------------------------------------------------- |E2E latency=3.43s |Gather latency=0.00s (0.00%) |Generate time=2.52s (73.32%) |Training time=0.69s (20.17%) |Others=0.22 (6.51%)|CurSamplesPerSec=2.33 |AvgSamplesPerSec=2.35 epoch: 0|step: 3316|ppo_ep: 1|act_loss: 0.239013671875|cri_loss: 0.1475830078125|unsuper_loss: 0.0 average reward score: 2.34375 ------------------------------------------------------------------------------------- |E2E latency=3.53s |Gather latency=0.00s (0.00%) |Generate time=2.58s (73.10%) |Training time=0.73s (20.80%) |Others=0.22 (6.10%)|CurSamplesPerSec=2.27 |AvgSamplesPerSec=2.35 epoch: 0|step: 3317|ppo_ep: 1|act_loss: -0.0283203125|cri_loss: 0.015289306640625|unsuper_loss: 0.0 average reward score: 4.7578125 ------------------------------------------------------------------------------------- |E2E latency=3.45s |Gather latency=0.00s (0.00%) |Generate time=2.61s (75.50%) |Training time=0.65s (18.73%) |Others=0.20 (5.77%)|CurSamplesPerSec=2.32 |AvgSamplesPerSec=2.35 epoch: 0|step: 3318|ppo_ep: 
1|act_loss: -0.09906005859375|cri_loss: -0.03851318359375|unsuper_loss: 0.0 average reward score: 4.16015625 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.13%) |Training time=0.64s (19.77%) |Others=0.20 (6.10%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.35 epoch: 0|step: 3319|ppo_ep: 1|act_loss: 0.226806640625|cri_loss: 0.145263671875|unsuper_loss: 0.0 average reward score: 2.806640625 ------------------------------------------------------------------------------------- |E2E latency=3.60s |Gather latency=0.00s (0.00%) |Generate time=2.38s (66.00%) |Training time=0.93s (25.72%) |Others=0.30 (8.28%)|CurSamplesPerSec=2.22 |AvgSamplesPerSec=2.35 epoch: 0|step: 3320|ppo_ep: 1|act_loss: 0.0012378692626953125|cri_loss: 0.00466156005859375|unsuper_loss: 0.0 average reward score: 5.0703125 ------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.78%) |Training time=0.64s (19.40%) |Others=0.19 (5.82%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.35 epoch: 0|step: 3321|ppo_ep: 1|act_loss: 0.13720703125|cri_loss: 0.0802001953125|unsuper_loss: 0.0 average reward score: 3.724609375 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.38s (74.03%) |Training time=0.64s (19.87%) |Others=0.20 (6.11%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.35 epoch: 0|step: 3322|ppo_ep: 1|act_loss: 0.06097412109375|cri_loss: 0.04949951171875|unsuper_loss: 0.0 average reward score: 1.18359375 ------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.37s (73.94%) |Training time=0.64s (19.89%) |Others=0.20 (6.18%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.35 epoch: 0|step: 3323|ppo_ep: 1|act_loss: 
0.025390625|cri_loss: 0.017578125|unsuper_loss: 0.0 average reward score: 3.92578125 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.28%) |Training time=0.64s (19.81%) |Others=0.19 (5.91%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.35 epoch: 0|step: 3324|ppo_ep: 1|act_loss: 0.09820556640625|cri_loss: 0.07318115234375|unsuper_loss: 0.0 average reward score: 3.81640625 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.37s (73.81%) |Training time=0.64s (19.95%) |Others=0.20 (6.23%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.35 epoch: 0|step: 3325|ppo_ep: 1|act_loss: 0.2802734375|cri_loss: 0.164306640625|unsuper_loss: 0.0 average reward score: 2.890625 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.37s (73.85%) |Training time=0.64s (19.95%) |Others=0.20 (6.20%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.35 epoch: 0|step: 3326|ppo_ep: 1|act_loss: -0.17626953125|cri_loss: -0.06878662109375|unsuper_loss: 0.0 average reward score: 4.41015625 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.37s (73.36%) |Training time=0.66s (20.46%) |Others=0.20 (6.17%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.35 epoch: 0|step: 3327|ppo_ep: 1|act_loss: 0.04302978515625|cri_loss: 0.035430908203125|unsuper_loss: 0.0 average reward score: 2.50390625 ------------------------------------------------------------------------------------- |E2E latency=3.75s |Gather latency=0.00s (0.00%) |Generate time=2.53s (67.49%) |Training time=0.94s (24.97%) |Others=0.28 (7.54%)|CurSamplesPerSec=2.14 |AvgSamplesPerSec=2.35 epoch: 0|step: 3328|ppo_ep: 1|act_loss: 0.0135040283203125|cri_loss: 
0.01549530029296875|unsuper_loss: 0.0 average reward score: 3.9921875 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.23%) |Training time=0.64s (19.68%) |Others=0.20 (6.09%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.35 epoch: 0|step: 3329|ppo_ep: 1|act_loss: 0.2086181640625|cri_loss: 0.144775390625|unsuper_loss: 0.0 average reward score: 5.140625 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.48s (74.71%) |Training time=0.64s (19.37%) |Others=0.20 (5.92%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.35 epoch: 0|step: 3330|ppo_ep: 1|act_loss: -0.0400390625|cri_loss: -0.0045318603515625|unsuper_loss: 0.0 average reward score: 2.70703125 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.28%) |Training time=0.64s (19.50%) |Others=0.20 (6.23%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.35 epoch: 0|step: 3331|ppo_ep: 1|act_loss: -0.009429931640625|cri_loss: 0.023834228515625|unsuper_loss: 0.0 average reward score: 3.93359375 ------------------------------------------------------------------------------------- |E2E latency=3.33s |Gather latency=0.00s (0.00%) |Generate time=2.48s (74.50%) |Training time=0.65s (19.45%) |Others=0.20 (6.05%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.35 epoch: 0|step: 3332|ppo_ep: 1|act_loss: -0.09820556640625|cri_loss: -0.03485107421875|unsuper_loss: 0.0 average reward score: 4.10546875 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.39s (73.19%) |Training time=0.68s (20.83%) |Others=0.20 (5.98%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.35 epoch: 0|step: 3333|ppo_ep: 1|act_loss: -0.03253173828125|cri_loss: 
-0.0071868896484375|unsuper_loss: 0.0 average reward score: 3.923828125 ------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.09%) |Training time=0.66s (19.86%) |Others=0.20 (6.05%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.35 epoch: 0|step: 3334|ppo_ep: 1|act_loss: -0.0970458984375|cri_loss: -0.025726318359375|unsuper_loss: 0.0 average reward score: 3.5546875 ------------------------------------------------------------------------------------- |E2E latency=5.60s |Gather latency=0.00s (0.00%) |Generate time=2.45s (43.75%) |Training time=1.71s (30.52%) |Others=1.44 (25.74%)|CurSamplesPerSec=1.43 |AvgSamplesPerSec=2.35 epoch: 0|step: 3335|ppo_ep: 1|act_loss: 0.051971435546875|cri_loss: 0.033477783203125|unsuper_loss: 0.0 average reward score: 3.986328125 ------------------------------------------------------------------------------------- |E2E latency=3.92s |Gather latency=0.00s (0.00%) |Generate time=2.66s (67.78%) |Training time=0.94s (24.06%) |Others=0.32 (8.15%)|CurSamplesPerSec=2.04 |AvgSamplesPerSec=2.35 epoch: 0|step: 3336|ppo_ep: 1|act_loss: -0.0755615234375|cri_loss: -0.021820068359375|unsuper_loss: 0.0 average reward score: 2.84765625 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.38s (74.17%) |Training time=0.64s (19.88%) |Others=0.19 (5.95%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.35 epoch: 0|step: 3337|ppo_ep: 1|act_loss: 0.0682373046875|cri_loss: 0.0447998046875|unsuper_loss: 0.0 average reward score: 2.3671875 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.39s (73.30%) |Training time=0.66s (20.18%) |Others=0.21 (6.52%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.35 epoch: 0|step: 3338|ppo_ep: 1|act_loss: -0.046356201171875|cri_loss: 
-0.017730712890625|unsuper_loss: 0.0 average reward score: 4.2578125 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.38s (73.77%) |Training time=0.64s (19.96%) |Others=0.20 (6.27%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.35 epoch: 0|step: 3339|ppo_ep: 1|act_loss: 0.10931396484375|cri_loss: 0.06561279296875|unsuper_loss: 0.0 average reward score: 2.095703125 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.38s (74.07%) |Training time=0.64s (19.94%) |Others=0.19 (6.00%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.35 epoch: 0|step: 3340|ppo_ep: 1|act_loss: -0.12255859375|cri_loss: -0.047882080078125|unsuper_loss: 0.0 average reward score: 3.125 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.38s (73.67%) |Training time=0.65s (20.10%) |Others=0.20 (6.22%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.35 epoch: 0|step: 3341|ppo_ep: 1|act_loss: 0.00786590576171875|cri_loss: 0.02337646484375|unsuper_loss: 0.0 average reward score: 3.546875 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.22%) |Training time=0.64s (19.80%) |Others=0.19 (5.98%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.35 epoch: 0|step: 3342|ppo_ep: 1|act_loss: 0.095947265625|cri_loss: 0.0570068359375|unsuper_loss: 0.0 average reward score: 3.93359375 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.47s (74.50%) |Training time=0.65s (19.53%) |Others=0.20 (5.97%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.35 epoch: 0|step: 3343|ppo_ep: 1|act_loss: -0.0384521484375|cri_loss: -0.015625|unsuper_loss: 
0.0 average reward score: 4.390625 ------------------------------------------------------------------------------------- |E2E latency=3.60s |Gather latency=0.00s (0.00%) |Generate time=2.38s (66.27%) |Training time=0.93s (25.86%) |Others=0.28 (7.87%)|CurSamplesPerSec=2.22 |AvgSamplesPerSec=2.35 epoch: 0|step: 3344|ppo_ep: 1|act_loss: 0.11590576171875|cri_loss: 0.087158203125|unsuper_loss: 0.0 average reward score: 5.09765625 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.48s (74.89%) |Training time=0.64s (19.29%) |Others=0.19 (5.82%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.35 epoch: 0|step: 3345|ppo_ep: 1|act_loss: -0.10986328125|cri_loss: -0.04296875|unsuper_loss: 0.0 average reward score: 5.1640625 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.38s (74.04%) |Training time=0.64s (20.04%) |Others=0.19 (5.92%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.35 epoch: 0|step: 3346|ppo_ep: 1|act_loss: -0.0648193359375|cri_loss: -0.0226898193359375|unsuper_loss: 0.0 average reward score: 2.892578125 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.41s (73.77%) |Training time=0.66s (20.26%) |Others=0.19 (5.96%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.35 epoch: 0|step: 3347|ppo_ep: 1|act_loss: -0.05963134765625|cri_loss: -0.0172119140625|unsuper_loss: 0.0 average reward score: 4.453125 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.21%) |Training time=0.64s (19.63%) |Others=0.20 (6.16%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.35 epoch: 0|step: 3348|ppo_ep: 1|act_loss: -0.1875|cri_loss: -0.06707763671875|unsuper_loss: 0.0 average reward score: 3.56640625 
------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.15%) |Training time=0.64s (19.67%) |Others=0.20 (6.17%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.35 epoch: 0|step: 3349|ppo_ep: 1|act_loss: -0.07940673828125|cri_loss: -0.0264434814453125|unsuper_loss: 0.0 average reward score: 3.2265625 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.47%) |Training time=0.64s (19.61%) |Others=0.19 (5.92%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.35 epoch: 0|step: 3350|ppo_ep: 1|act_loss: 0.1904296875|cri_loss: 0.1302490234375|unsuper_loss: 0.0 average reward score: 1.4111328125 ------------------------------------------------------------------------------------- |E2E latency=3.33s |Gather latency=0.00s (0.00%) |Generate time=2.48s (74.49%) |Training time=0.65s (19.49%) |Others=0.20 (6.03%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.35 epoch: 0|step: 3351|ppo_ep: 1|act_loss: -0.0804443359375|cri_loss: -0.02789306640625|unsuper_loss: 0.0 average reward score: 3.2421875 ------------------------------------------------------------------------------------- |E2E latency=3.63s |Gather latency=0.00s (0.00%) |Generate time=2.36s (65.19%) |Training time=0.95s (26.28%) |Others=0.31 (8.53%)|CurSamplesPerSec=2.21 |AvgSamplesPerSec=2.35 epoch: 0|step: 3352|ppo_ep: 1|act_loss: -0.095458984375|cri_loss: -0.0399169921875|unsuper_loss: 0.0 average reward score: 4.234375 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.32%) |Training time=0.64s (19.68%) |Others=0.19 (6.00%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.35 epoch: 0|step: 3353|ppo_ep: 1|act_loss: 0.27978515625|cri_loss: 0.19091796875|unsuper_loss: 0.0 average reward score: 1.697265625 
------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.39s (73.91%) |Training time=0.65s (19.99%) |Others=0.20 (6.10%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.35 epoch: 0|step: 3354|ppo_ep: 1|act_loss: -0.16845703125|cri_loss: -0.05010986328125|unsuper_loss: 0.0 average reward score: 2.158203125 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (73.93%) |Training time=0.64s (19.89%) |Others=0.20 (6.18%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.35 epoch: 0|step: 3355|ppo_ep: 1|act_loss: 0.02520751953125|cri_loss: 0.05718994140625|unsuper_loss: 0.0 average reward score: 2.9609375 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.39s (73.15%) |Training time=0.67s (20.60%) |Others=0.20 (6.25%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.35 epoch: 0|step: 3356|ppo_ep: 1|act_loss: 0.01285552978515625|cri_loss: 0.0264892578125|unsuper_loss: 0.0 average reward score: 3.119140625 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.26%) |Training time=0.64s (19.61%) |Others=0.20 (6.14%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.35 epoch: 0|step: 3357|ppo_ep: 1|act_loss: -0.00579071044921875|cri_loss: 0.003448486328125|unsuper_loss: 0.0 average reward score: 3.93359375 ------------------------------------------------------------------------------------- |E2E latency=3.33s |Gather latency=0.00s (0.00%) |Generate time=2.49s (74.82%) |Training time=0.64s (19.33%) |Others=0.19 (5.85%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.35 epoch: 0|step: 3358|ppo_ep: 1|act_loss: -0.077880859375|cri_loss: -0.0313720703125|unsuper_loss: 0.0 average reward score: 4.71875 
------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.37s (73.67%) |Training time=0.65s (20.19%) |Others=0.20 (6.14%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.35 [2023-04-24 16:57:58,132] [INFO] [logging.py:96:log_dist] [Rank 0] step=420, skipped=5, lr=[6.132383922137024e-07, 6.132383922137024e-07], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-04-24 16:57:58,378] [INFO] [timer.py:199:stop] epoch=0/micro_step=3360/global_step=420, RunningAvgSamplesPerSec=15.277436569132451, CurrSamplesPerSec=15.868363419339065, MemAllocated=20.44GB, MaxMemAllocated=31.45GB [2023-04-24 16:57:58,577] [INFO] [logging.py:96:log_dist] [Rank 0] step=420, skipped=6, lr=[3.2800736463040883e-07, 3.2800736463040883e-07], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 3359|ppo_ep: 1|act_loss: -0.046173095703125|cri_loss: -0.0149383544921875|unsuper_loss: 0.0 average reward score: 4.1640625 ------------------------------------------------------------------------------------- |E2E latency=3.63s |Gather latency=0.00s (0.00%) |Generate time=2.42s (66.84%) |Training time=0.93s (25.62%) |Others=0.27 (7.53%)|CurSamplesPerSec=2.21 |AvgSamplesPerSec=2.35 epoch: 0|step: 3360|ppo_ep: 1|act_loss: 0.06298828125|cri_loss: 0.0411376953125|unsuper_loss: 0.0 average reward score: 2.85546875 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.38s (74.04%) |Training time=0.64s (19.82%) |Others=0.20 (6.15%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.35 epoch: 0|step: 3361|ppo_ep: 1|act_loss: -0.043701171875|cri_loss: -0.0160980224609375|unsuper_loss: 0.0 average reward score: 3.861328125 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.10%) |Training time=0.66s (19.83%) |Others=0.20 (6.07%)|CurSamplesPerSec=2.41 
|AvgSamplesPerSec=2.35 epoch: 0|step: 3362|ppo_ep: 1|act_loss: 0.0904541015625|cri_loss: 0.05499267578125|unsuper_loss: 0.0 average reward score: 5.0390625 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.42s (73.66%) |Training time=0.67s (20.42%) |Others=0.19 (5.92%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.35 epoch: 0|step: 3363|ppo_ep: 1|act_loss: 0.175048828125|cri_loss: 0.10870361328125|unsuper_loss: 0.0 average reward score: 3.32421875 ------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.34s (72.95%) |Training time=0.67s (21.06%) |Others=0.19 (5.99%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.35 epoch: 0|step: 3364|ppo_ep: 1|act_loss: 0.0234375|cri_loss: 0.028076171875|unsuper_loss: 0.0 average reward score: 3.48046875 ------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.30s (71.80%) |Training time=0.70s (21.99%) |Others=0.20 (6.21%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.35 epoch: 0|step: 3365|ppo_ep: 1|act_loss: -0.12255859375|cri_loss: -0.0513916015625|unsuper_loss: 0.0 average reward score: 3.42578125 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.21%) |Training time=0.64s (19.77%) |Others=0.20 (6.03%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.35 epoch: 0|step: 3366|ppo_ep: 1|act_loss: -0.034912109375|cri_loss: -0.007476806640625|unsuper_loss: 0.0 average reward score: 4.57421875 ------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.35s (73.38%) |Training time=0.66s (20.56%) |Others=0.19 (6.06%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.35 epoch: 
0|step: 3367|ppo_ep: 1|act_loss: -0.0838623046875|cri_loss: -0.0280914306640625|unsuper_loss: 0.0 average reward score: 4.6875 ------------------------------------------------------------------------------------- |E2E latency=3.62s |Gather latency=0.00s (0.00%) |Generate time=2.43s (67.00%) |Training time=0.92s (25.30%) |Others=0.28 (7.70%)|CurSamplesPerSec=2.21 |AvgSamplesPerSec=2.35 epoch: 0|step: 3368|ppo_ep: 1|act_loss: 0.056915283203125|cri_loss: 0.040496826171875|unsuper_loss: 0.0 average reward score: 4.22265625 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.48s (74.78%) |Training time=0.64s (19.18%) |Others=0.20 (6.04%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.35 epoch: 0|step: 3369|ppo_ep: 1|act_loss: 0.0694580078125|cri_loss: 0.046661376953125|unsuper_loss: 0.0 average reward score: 1.453125 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.39s (73.53%) |Training time=0.67s (20.56%) |Others=0.19 (5.91%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.35 epoch: 0|step: 3370|ppo_ep: 1|act_loss: 0.08447265625|cri_loss: 0.050872802734375|unsuper_loss: 0.0 average reward score: 4.53515625 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.34s (72.45%) |Training time=0.70s (21.56%) |Others=0.19 (5.99%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.35 epoch: 0|step: 3371|ppo_ep: 1|act_loss: 0.057098388671875|cri_loss: 0.0455322265625|unsuper_loss: 0.0 average reward score: 4.0 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.38s (74.01%) |Training time=0.64s (20.03%) |Others=0.19 (5.96%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.35 epoch: 0|step: 3372|ppo_ep: 
1|act_loss: 0.009613037109375|cri_loss: 0.0109405517578125|unsuper_loss: 0.0 average reward score: 3.3984375 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.42s (73.77%) |Training time=0.66s (20.22%) |Others=0.20 (6.01%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.35 epoch: 0|step: 3373|ppo_ep: 1|act_loss: 0.158203125|cri_loss: 0.09490966796875|unsuper_loss: 0.0 average reward score: 3.853515625 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.02%) |Training time=0.64s (19.51%) |Others=0.21 (6.47%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.35 epoch: 0|step: 3374|ppo_ep: 1|act_loss: -0.0074920654296875|cri_loss: 0.0026092529296875|unsuper_loss: 0.0 average reward score: 3.62109375 ------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.56%) |Training time=0.64s (19.47%) |Others=0.20 (5.97%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.35 epoch: 0|step: 3375|ppo_ep: 1|act_loss: -0.0001983642578125|cri_loss: 0.00983428955078125|unsuper_loss: 0.0 average reward score: 4.55859375 ------------------------------------------------------------------------------------- |E2E latency=3.57s |Gather latency=0.00s (0.00%) |Generate time=2.30s (64.62%) |Training time=0.99s (27.69%) |Others=0.27 (7.69%)|CurSamplesPerSec=2.24 |AvgSamplesPerSec=2.35 epoch: 0|step: 3376|ppo_ep: 1|act_loss: -0.0621337890625|cri_loss: -0.022857666015625|unsuper_loss: 0.0 average reward score: 3.638671875 ------------------------------------------------------------------------------------- |E2E latency=3.18s |Gather latency=0.00s (0.00%) |Generate time=2.29s (72.03%) |Training time=0.70s (22.02%) |Others=0.19 (5.95%)|CurSamplesPerSec=2.51 |AvgSamplesPerSec=2.35 epoch: 0|step: 3377|ppo_ep: 
1|act_loss: 0.0191497802734375|cri_loss: 0.0214996337890625|unsuper_loss: 0.0 average reward score: 3.697265625 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.38s (73.92%) |Training time=0.64s (19.93%) |Others=0.20 (6.15%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.35 epoch: 0|step: 3378|ppo_ep: 1|act_loss: -0.0081939697265625|cri_loss: 0.0099334716796875|unsuper_loss: 0.0 average reward score: 4.109375 ------------------------------------------------------------------------------------- |E2E latency=3.33s |Gather latency=0.00s (0.00%) |Generate time=2.48s (74.42%) |Training time=0.65s (19.69%) |Others=0.20 (5.89%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.35 epoch: 0|step: 3379|ppo_ep: 1|act_loss: -0.0693359375|cri_loss: -0.027252197265625|unsuper_loss: 0.0 average reward score: 4.11328125 ------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.43%) |Training time=0.64s (19.34%) |Others=0.21 (6.23%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.35 epoch: 0|step: 3380|ppo_ep: 1|act_loss: 0.0384521484375|cri_loss: 0.0257568359375|unsuper_loss: 0.0 average reward score: 4.1796875 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.54%) |Training time=0.64s (19.52%) |Others=0.19 (5.95%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.35 epoch: 0|step: 3381|ppo_ep: 1|act_loss: 0.034759521484375|cri_loss: 0.0294342041015625|unsuper_loss: 0.0 average reward score: 2.52734375 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.39s (73.97%) |Training time=0.64s (19.76%) |Others=0.20 (6.27%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.36 epoch: 0|step: 3382|ppo_ep: 1|act_loss: 
-0.013031005859375|cri_loss: 0.003143310546875|unsuper_loss: 0.0 average reward score: 2.9375 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.21%) |Training time=0.64s (19.58%) |Others=0.20 (6.21%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.36 epoch: 0|step: 3383|ppo_ep: 1|act_loss: -0.0386962890625|cri_loss: -0.0115203857421875|unsuper_loss: 0.0 average reward score: 3.65625 ------------------------------------------------------------------------------------- |E2E latency=3.76s |Gather latency=0.00s (0.00%) |Generate time=2.54s (67.62%) |Training time=0.93s (24.77%) |Others=0.29 (7.60%)|CurSamplesPerSec=2.13 |AvgSamplesPerSec=2.35 epoch: 0|step: 3384|ppo_ep: 1|act_loss: -0.0002288818359375|cri_loss: 0.006076812744140625|unsuper_loss: 0.0 average reward score: 3.29296875 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.47s (74.86%) |Training time=0.64s (19.38%) |Others=0.19 (5.77%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.35 epoch: 0|step: 3385|ppo_ep: 1|act_loss: 0.2041015625|cri_loss: 0.1322021484375|unsuper_loss: 0.0 average reward score: 3.59375 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.42%) |Training time=0.64s (19.58%) |Others=0.20 (6.00%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.36 epoch: 0|step: 3386|ppo_ep: 1|act_loss: -0.127197265625|cri_loss: -0.05023193359375|unsuper_loss: 0.0 average reward score: 3.89453125 ------------------------------------------------------------------------------------- |E2E latency=3.39s |Gather latency=0.00s (0.00%) |Generate time=2.56s (75.38%) |Training time=0.64s (18.83%) |Others=0.20 (5.78%)|CurSamplesPerSec=2.36 |AvgSamplesPerSec=2.36 epoch: 0|step: 3387|ppo_ep: 1|act_loss: 
0.0767822265625|cri_loss: 0.0662841796875|unsuper_loss: 0.0 average reward score: 2.998046875 ------------------------------------------------------------------------------------- |E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.66s (75.97%) |Training time=0.64s (18.37%) |Others=0.20 (5.65%)|CurSamplesPerSec=2.28 |AvgSamplesPerSec=2.35 epoch: 0|step: 3388|ppo_ep: 1|act_loss: -0.03192138671875|cri_loss: -0.007476806640625|unsuper_loss: 0.0 average reward score: 2.171875 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.48s (74.56%) |Training time=0.65s (19.54%) |Others=0.20 (5.90%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.36 epoch: 0|step: 3389|ppo_ep: 1|act_loss: 0.036956787109375|cri_loss: 0.027191162109375|unsuper_loss: 0.0 average reward score: 2.712890625 ------------------------------------------------------------------------------------- |E2E latency=3.19s |Gather latency=0.00s (0.00%) |Generate time=2.35s (73.69%) |Training time=0.64s (20.13%) |Others=0.20 (6.18%)|CurSamplesPerSec=2.51 |AvgSamplesPerSec=2.36 epoch: 0|step: 3390|ppo_ep: 1|act_loss: 0.0374755859375|cri_loss: 0.037261962890625|unsuper_loss: 0.0 average reward score: 3.11328125 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.12%) |Training time=0.65s (19.63%) |Others=0.21 (6.24%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.36 epoch: 0|step: 3391|ppo_ep: 1|act_loss: -0.137939453125|cri_loss: -0.04180908203125|unsuper_loss: 0.0 average reward score: 4.3359375 ------------------------------------------------------------------------------------- |E2E latency=3.69s |Gather latency=0.00s (0.00%) |Generate time=2.48s (67.17%) |Training time=0.93s (25.18%) |Others=0.28 (7.65%)|CurSamplesPerSec=2.17 |AvgSamplesPerSec=2.36 epoch: 0|step: 3392|ppo_ep: 1|act_loss: 
-0.131103515625|cri_loss: -0.03387451171875|unsuper_loss: 0.0 average reward score: 3.0078125 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.45s (73.84%) |Training time=0.64s (19.30%) |Others=0.23 (6.86%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.36 epoch: 0|step: 3393|ppo_ep: 1|act_loss: -0.108642578125|cri_loss: -0.0179443359375|unsuper_loss: 0.0 average reward score: 3.9140625 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.32%) |Training time=0.64s (19.68%) |Others=0.20 (6.00%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.36 epoch: 0|step: 3394|ppo_ep: 1|act_loss: 0.0687255859375|cri_loss: 0.041107177734375|unsuper_loss: 0.0 average reward score: 2.9453125 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.22%) |Training time=0.65s (19.69%) |Others=0.20 (6.09%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.36 epoch: 0|step: 3395|ppo_ep: 1|act_loss: 0.037933349609375|cri_loss: 0.042327880859375|unsuper_loss: 0.0 average reward score: 3.265625 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.48s (74.75%) |Training time=0.64s (19.40%) |Others=0.19 (5.85%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.36 epoch: 0|step: 3396|ppo_ep: 1|act_loss: 0.131103515625|cri_loss: 0.080078125|unsuper_loss: 0.0 average reward score: 3.275390625 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.05%) |Training time=0.64s (19.83%) |Others=0.20 (6.12%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.36 epoch: 0|step: 3397|ppo_ep: 1|act_loss: 0.1314697265625|cri_loss: 
0.08575439453125|unsuper_loss: 0.0 average reward score: 2.931640625 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.40s (73.17%) |Training time=0.68s (20.63%) |Others=0.20 (6.19%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.36 epoch: 0|step: 3398|ppo_ep: 1|act_loss: -0.07568359375|cri_loss: -0.017181396484375|unsuper_loss: 0.0 average reward score: 3.671875 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.14%) |Training time=0.65s (19.70%) |Others=0.20 (6.16%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.36 epoch: 0|step: 3399|ppo_ep: 1|act_loss: 0.02294921875|cri_loss: 0.037872314453125|unsuper_loss: 0.0 average reward score: 4.10546875 ------------------------------------------------------------------------------------- |E2E latency=3.64s |Gather latency=0.00s (0.00%) |Generate time=2.41s (66.39%) |Training time=0.93s (25.69%) |Others=0.29 (7.91%)|CurSamplesPerSec=2.20 |AvgSamplesPerSec=2.36 epoch: 0|step: 3400|ppo_ep: 1|act_loss: 0.1195068359375|cri_loss: 0.06842041015625|unsuper_loss: 0.0 average reward score: 2.59375 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.66%) |Training time=0.64s (19.40%) |Others=0.19 (5.94%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.36 epoch: 0|step: 3401|ppo_ep: 1|act_loss: -0.162353515625|cri_loss: -0.0546875|unsuper_loss: 0.0 average reward score: 2.70703125 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.10%) |Training time=0.64s (19.65%) |Others=0.20 (6.25%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.36 epoch: 0|step: 3402|ppo_ep: 1|act_loss: -0.05780029296875|cri_loss: -0.000762939453125|unsuper_loss: 
0.0 average reward score: 2.54296875 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.55%) |Training time=0.64s (19.48%) |Others=0.20 (5.96%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.36 epoch: 0|step: 3403|ppo_ep: 1|act_loss: -0.18603515625|cri_loss: -0.04193115234375|unsuper_loss: 0.0 average reward score: 3.390625 ------------------------------------------------------------------------------------- |E2E latency=3.38s |Gather latency=0.00s (0.00%) |Generate time=2.53s (74.73%) |Training time=0.65s (19.36%) |Others=0.20 (5.91%)|CurSamplesPerSec=2.36 |AvgSamplesPerSec=2.36 epoch: 0|step: 3404|ppo_ep: 1|act_loss: 0.06298828125|cri_loss: 0.03900146484375|unsuper_loss: 0.0 average reward score: 2.765625 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.00%) |Training time=0.64s (19.76%) |Others=0.20 (6.24%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.36 epoch: 0|step: 3405|ppo_ep: 1|act_loss: 0.206787109375|cri_loss: 0.132568359375|unsuper_loss: 0.0 average reward score: 3.6875 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.13%) |Training time=0.65s (19.85%) |Others=0.20 (6.01%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.36 epoch: 0|step: 3406|ppo_ep: 1|act_loss: 0.00555419921875|cri_loss: 0.04669189453125|unsuper_loss: 0.0 average reward score: 3.1484375 ------------------------------------------------------------------------------------- |E2E latency=4.02s |Gather latency=0.00s (0.00%) |Generate time=3.16s (78.75%) |Training time=0.65s (16.19%) |Others=0.20 (5.06%)|CurSamplesPerSec=1.99 |AvgSamplesPerSec=2.36 epoch: 0|step: 3407|ppo_ep: 1|act_loss: 0.09393310546875|cri_loss: 0.06689453125|unsuper_loss: 0.0 average reward score: 3.044921875 
------------------------------------------------------------------------------------- |E2E latency=3.62s |Gather latency=0.00s (0.00%) |Generate time=2.40s (66.33%) |Training time=0.94s (25.91%) |Others=0.28 (7.76%)|CurSamplesPerSec=2.21 |AvgSamplesPerSec=2.36 epoch: 0|step: 3408|ppo_ep: 1|act_loss: -0.0662841796875|cri_loss: -0.021484375|unsuper_loss: 0.0 average reward score: 2.11328125 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.44%) |Training time=0.64s (19.70%) |Others=0.19 (5.86%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.36 epoch: 0|step: 3409|ppo_ep: 1|act_loss: -0.1273193359375|cri_loss: 0.00439453125|unsuper_loss: 0.0 average reward score: 2.3359375 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.43%) |Training time=0.64s (19.56%) |Others=0.20 (6.01%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.36 epoch: 0|step: 3410|ppo_ep: 1|act_loss: 0.00579833984375|cri_loss: 0.0274658203125|unsuper_loss: 0.0 average reward score: 2.291015625 ------------------------------------------------------------------------------------- |E2E latency=3.34s |Gather latency=0.00s (0.00%) |Generate time=2.39s (71.64%) |Training time=0.75s (22.38%) |Others=0.20 (5.98%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.36 epoch: 0|step: 3411|ppo_ep: 1|act_loss: -0.12030029296875|cri_loss: -0.0301513671875|unsuper_loss: 0.0 average reward score: 3.568359375 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.21%) |Training time=0.64s (19.71%) |Others=0.20 (6.08%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.36 epoch: 0|step: 3412|ppo_ep: 1|act_loss: 0.036468505859375|cri_loss: 0.0318603515625|unsuper_loss: 0.0 average reward score: 2.716796875 
------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.10%) |Training time=0.64s (19.70%) |Others=0.20 (6.19%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.36 epoch: 0|step: 3413|ppo_ep: 1|act_loss: -0.10870361328125|cri_loss: -0.01910400390625|unsuper_loss: 0.0 average reward score: 2.58984375 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.39s (73.59%) |Training time=0.66s (20.30%) |Others=0.20 (6.11%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.36 epoch: 0|step: 3414|ppo_ep: 1|act_loss: 0.0201873779296875|cri_loss: 0.02508544921875|unsuper_loss: 0.0 average reward score: 4.12109375 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.39s (73.52%) |Training time=0.66s (20.21%) |Others=0.20 (6.27%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.36 epoch: 0|step: 3415|ppo_ep: 1|act_loss: -0.1754150390625|cri_loss: -0.0535888671875|unsuper_loss: 0.0 average reward score: 3.5078125 ------------------------------------------------------------------------------------- |E2E latency=3.71s |Gather latency=0.00s (0.00%) |Generate time=2.49s (67.19%) |Training time=0.94s (25.23%) |Others=0.28 (7.58%)|CurSamplesPerSec=2.16 |AvgSamplesPerSec=2.36 epoch: 0|step: 3416|ppo_ep: 1|act_loss: -0.192626953125|cri_loss: -0.06146240234375|unsuper_loss: 0.0 average reward score: 1.8359375 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.11%) |Training time=0.64s (19.84%) |Others=0.20 (6.05%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.36 epoch: 0|step: 3417|ppo_ep: 1|act_loss: -0.033447265625|cri_loss: -0.0054168701171875|unsuper_loss: 0.0 average reward score: 3.609375 
------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.46%) |Training time=0.64s (19.49%) |Others=0.20 (6.05%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.36 epoch: 0|step: 3418|ppo_ep: 1|act_loss: -0.12042236328125|cri_loss: -0.0469970703125|unsuper_loss: 0.0 average reward score: 3.033203125 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.42s (73.90%) |Training time=0.65s (19.92%) |Others=0.20 (6.18%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.36 epoch: 0|step: 3419|ppo_ep: 1|act_loss: 0.233642578125|cri_loss: 0.145751953125|unsuper_loss: 0.0 average reward score: 3.173828125 ------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.54%) |Training time=0.64s (19.46%) |Others=0.20 (6.00%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.36 epoch: 0|step: 3420|ppo_ep: 1|act_loss: -0.219970703125|cri_loss: -0.06170654296875|unsuper_loss: 0.0 average reward score: 3.46875 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (73.97%) |Training time=0.65s (19.90%) |Others=0.20 (6.14%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.36 epoch: 0|step: 3421|ppo_ep: 1|act_loss: -0.1220703125|cri_loss: -0.036224365234375|unsuper_loss: 0.0 average reward score: 3.0859375 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.42s (73.83%) |Training time=0.65s (19.78%) |Others=0.21 (6.39%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.36 epoch: 0|step: 3422|ppo_ep: 1|act_loss: 0.05841064453125|cri_loss: 0.054901123046875|unsuper_loss: 0.0 average reward score: 2.240234375 
------------------------------------------------------------------------------------- |E2E latency=3.38s |Gather latency=0.00s (0.00%) |Generate time=2.54s (75.13%) |Training time=0.64s (18.99%) |Others=0.20 (5.88%)|CurSamplesPerSec=2.37 |AvgSamplesPerSec=2.36 epoch: 0|step: 3423|ppo_ep: 1|act_loss: -0.152587890625|cri_loss: -0.0614013671875|unsuper_loss: 0.0 average reward score: 4.62890625 ------------------------------------------------------------------------------------- |E2E latency=3.60s |Gather latency=0.00s (0.00%) |Generate time=2.39s (66.33%) |Training time=0.93s (25.85%) |Others=0.28 (7.82%)|CurSamplesPerSec=2.22 |AvgSamplesPerSec=2.36 epoch: 0|step: 3424|ppo_ep: 1|act_loss: -0.0709228515625|cri_loss: -0.029205322265625|unsuper_loss: 0.0 average reward score: 4.0 ------------------------------------------------------------------------------------- |E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.66s (75.83%) |Training time=0.64s (18.37%) |Others=0.20 (5.80%)|CurSamplesPerSec=2.28 |AvgSamplesPerSec=2.36 epoch: 0|step: 3425|ppo_ep: 1|act_loss: -0.1856689453125|cri_loss: -0.07098388671875|unsuper_loss: 0.0 average reward score: 4.57421875 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.07%) |Training time=0.65s (19.86%) |Others=0.20 (6.08%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.36 epoch: 0|step: 3426|ppo_ep: 1|act_loss: -0.05657958984375|cri_loss: 0.009033203125|unsuper_loss: 0.0 average reward score: 1.755859375 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.49s (74.91%) |Training time=0.64s (19.31%) |Others=0.19 (5.78%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.36 epoch: 0|step: 3427|ppo_ep: 1|act_loss: 0.040252685546875|cri_loss: 0.0302734375|unsuper_loss: 0.0 average reward score: 3.09765625 
------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.10%) |Training time=0.64s (19.80%) |Others=0.20 (6.10%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.36 epoch: 0|step: 3428|ppo_ep: 1|act_loss: -0.1165771484375|cri_loss: -0.029266357421875|unsuper_loss: 0.0 average reward score: 3.59765625 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.14%) |Training time=0.64s (19.76%) |Others=0.20 (6.10%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.36 epoch: 0|step: 3429|ppo_ep: 1|act_loss: 0.078857421875|cri_loss: 0.0679931640625|unsuper_loss: 0.0 average reward score: 2.2265625 ------------------------------------------------------------------------------------- |E2E latency=3.34s |Gather latency=0.00s (0.00%) |Generate time=2.49s (74.45%) |Training time=0.64s (19.21%) |Others=0.21 (6.34%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.36 epoch: 0|step: 3430|ppo_ep: 1|act_loss: -0.21044921875|cri_loss: -0.0709228515625|unsuper_loss: 0.0 average reward score: 4.25 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.39%) |Training time=0.65s (19.53%) |Others=0.20 (6.08%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.36 epoch: 0|step: 3431|ppo_ep: 1|act_loss: -0.09814453125|cri_loss: -0.02484130859375|unsuper_loss: 0.0 average reward score: 3.65625 ------------------------------------------------------------------------------------- |E2E latency=3.64s |Gather latency=0.00s (0.00%) |Generate time=2.43s (66.72%) |Training time=0.93s (25.55%) |Others=0.28 (7.73%)|CurSamplesPerSec=2.20 |AvgSamplesPerSec=2.36 epoch: 0|step: 3432|ppo_ep: 1|act_loss: -0.15283203125|cri_loss: -0.051361083984375|unsuper_loss: 0.0 average reward score: 3.14453125 
------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.19%) |Training time=0.64s (19.53%) |Others=0.20 (6.28%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.36 epoch: 0|step: 3433|ppo_ep: 1|act_loss: 0.0303955078125|cri_loss: 0.03546142578125|unsuper_loss: 0.0 average reward score: 3.99609375 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.39s (73.66%) |Training time=0.66s (20.22%) |Others=0.20 (6.12%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.36 epoch: 0|step: 3434|ppo_ep: 1|act_loss: 0.09014892578125|cri_loss: 0.076171875|unsuper_loss: 0.0 average reward score: 3.92578125 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.20%) |Training time=0.64s (19.50%) |Others=0.21 (6.30%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.36 epoch: 0|step: 3435|ppo_ep: 1|act_loss: -0.08721923828125|cri_loss: 0.01116943359375|unsuper_loss: 0.0 average reward score: 3.0078125 ------------------------------------------------------------------------------------- |E2E latency=3.38s |Gather latency=0.00s (0.00%) |Generate time=2.52s (74.67%) |Training time=0.65s (19.15%) |Others=0.21 (6.17%)|CurSamplesPerSec=2.37 |AvgSamplesPerSec=2.36 epoch: 0|step: 3436|ppo_ep: 1|act_loss: -0.03271484375|cri_loss: -0.0024871826171875|unsuper_loss: 0.0 average reward score: 3.7109375 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.39%) |Training time=0.64s (19.59%) |Others=0.20 (6.02%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.36 epoch: 0|step: 3437|ppo_ep: 1|act_loss: -0.1015625|cri_loss: -0.015380859375|unsuper_loss: 0.0 average reward score: 2.69921875 
------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.22%) |Training time=0.64s (19.86%) |Others=0.19 (5.92%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.36
epoch: 0|step: 3438|ppo_ep: 1|act_loss: -0.2578125|cri_loss: -0.0887451171875|unsuper_loss: 0.0 average reward score: 2.359375
------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.40s (73.92%) |Training time=0.65s (19.90%) |Others=0.20 (6.18%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.36
[2023-04-24 17:02:25,061] [INFO] [logging.py:96:log_dist] [Rank 0] step=430, skipped=5, lr=[4.3146961026231e-07, 4.3146961026231e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-04-24 17:02:25,303] [INFO] [timer.py:199:stop] epoch=0/micro_step=3440/global_step=430, RunningAvgSamplesPerSec=15.286053108377097, CurrSamplesPerSec=15.69153809371825, MemAllocated=20.44GB, MaxMemAllocated=31.45GB
[2023-04-24 17:02:25,504] [INFO] [logging.py:96:log_dist] [Rank 0] step=430, skipped=6, lr=[2.3227271566414827e-07, 2.3227271566414827e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 3439|ppo_ep: 1|act_loss: -0.043701171875|cri_loss: -0.0042724609375|unsuper_loss: 0.0 average reward score: 4.19140625
------------------------------------------------------------------------------------- |E2E latency=3.55s |Gather latency=0.00s (0.00%) |Generate time=2.32s (65.37%) |Training time=0.95s (26.78%) |Others=0.28 (7.86%)|CurSamplesPerSec=2.25 |AvgSamplesPerSec=2.36
epoch: 0|step: 3440|ppo_ep: 1|act_loss: 0.01367950439453125|cri_loss: 0.01207733154296875|unsuper_loss: 0.0 average reward score: 3.46875
------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.53%) |Training time=0.64s (19.62%) |Others=0.19 (5.85%)|CurSamplesPerSec=2.45
|AvgSamplesPerSec=2.36 epoch: 0|step: 3441|ppo_ep: 1|act_loss: -0.02459716796875|cri_loss: 0.02069091796875|unsuper_loss: 0.0 average reward score: 3.99609375 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.08%) |Training time=0.64s (19.81%) |Others=0.20 (6.11%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.36 epoch: 0|step: 3442|ppo_ep: 1|act_loss: 0.237060546875|cri_loss: 0.154541015625|unsuper_loss: 0.0 average reward score: 3.17578125 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.47s (74.61%) |Training time=0.64s (19.24%) |Others=0.20 (6.15%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.36 epoch: 0|step: 3443|ppo_ep: 1|act_loss: 0.062225341796875|cri_loss: 0.064697265625|unsuper_loss: 0.0 average reward score: 3.5390625 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.32%) |Training time=0.65s (19.82%) |Others=0.19 (5.86%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.36 epoch: 0|step: 3444|ppo_ep: 1|act_loss: -0.0181884765625|cri_loss: 0.00665283203125|unsuper_loss: 0.0 average reward score: 2.708984375 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.41s (73.95%) |Training time=0.65s (19.78%) |Others=0.20 (6.27%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.36 epoch: 0|step: 3445|ppo_ep: 1|act_loss: -0.1236572265625|cri_loss: -0.036102294921875|unsuper_loss: 0.0 average reward score: 2.42578125 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.17%) |Training time=0.65s (19.77%) |Others=0.20 (6.05%)|CurSamplesPerSec=2.42 
|AvgSamplesPerSec=2.36 epoch: 0|step: 3446|ppo_ep: 1|act_loss: -0.08441162109375|cri_loss: -0.02734375|unsuper_loss: 0.0 average reward score: 3.56640625 ------------------------------------------------------------------------------------- |E2E latency=3.33s |Gather latency=0.00s (0.00%) |Generate time=2.45s (73.47%) |Training time=0.66s (19.83%) |Others=0.22 (6.69%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.36 epoch: 0|step: 3447|ppo_ep: 1|act_loss: -0.1295166015625|cri_loss: -0.0191650390625|unsuper_loss: 0.0 average reward score: 1.78515625 ------------------------------------------------------------------------------------- |E2E latency=3.56s |Gather latency=0.00s (0.00%) |Generate time=2.34s (65.88%) |Training time=0.94s (26.36%) |Others=0.28 (7.76%)|CurSamplesPerSec=2.25 |AvgSamplesPerSec=2.36 epoch: 0|step: 3448|ppo_ep: 1|act_loss: 0.07958984375|cri_loss: 0.0765380859375|unsuper_loss: 0.0 average reward score: 3.830078125 ------------------------------------------------------------------------------------- |E2E latency=3.17s |Gather latency=0.00s (0.00%) |Generate time=2.34s (73.82%) |Training time=0.64s (20.18%) |Others=0.19 (6.00%)|CurSamplesPerSec=2.52 |AvgSamplesPerSec=2.36 epoch: 0|step: 3449|ppo_ep: 1|act_loss: -0.06610107421875|cri_loss: -0.0235137939453125|unsuper_loss: 0.0 average reward score: 3.40234375 ------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.43s (73.77%) |Training time=0.66s (20.12%) |Others=0.20 (6.11%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.36 epoch: 0|step: 3450|ppo_ep: 1|act_loss: 0.11865234375|cri_loss: 0.0831298828125|unsuper_loss: 0.0 average reward score: 3.5 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.39s (73.76%) |Training time=0.65s (19.96%) |Others=0.20 (6.28%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.36 epoch: 
0|step: 3451|ppo_ep: 1|act_loss: 0.0760498046875|cri_loss: 0.054412841796875|unsuper_loss: 0.0 average reward score: 2.53125 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.29s (71.29%) |Training time=0.72s (22.48%) |Others=0.20 (6.24%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.36 epoch: 0|step: 3452|ppo_ep: 1|act_loss: -0.1800537109375|cri_loss: -0.0577392578125|unsuper_loss: 0.0 average reward score: 2.794921875 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.47%) |Training time=0.64s (19.43%) |Others=0.20 (6.10%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.36 epoch: 0|step: 3453|ppo_ep: 1|act_loss: 0.027801513671875|cri_loss: 0.03656005859375|unsuper_loss: 0.0 average reward score: 3.974609375 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.43s (73.93%) |Training time=0.66s (20.21%) |Others=0.19 (5.86%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.36 epoch: 0|step: 3454|ppo_ep: 1|act_loss: -0.254638671875|cri_loss: -0.09185791015625|unsuper_loss: 0.0 average reward score: 3.33203125 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.22%) |Training time=0.64s (19.72%) |Others=0.20 (6.07%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.36 epoch: 0|step: 3455|ppo_ep: 1|act_loss: -0.0318603515625|cri_loss: -0.0019683837890625|unsuper_loss: 0.0 average reward score: 2.845703125 ------------------------------------------------------------------------------------- |E2E latency=3.64s |Gather latency=0.00s (0.00%) |Generate time=2.44s (66.95%) |Training time=0.92s (25.39%) |Others=0.28 (7.66%)|CurSamplesPerSec=2.20 |AvgSamplesPerSec=2.36 epoch: 0|step: 
3456|ppo_ep: 1|act_loss: 0.0171356201171875|cri_loss: 0.036376953125|unsuper_loss: 0.0 average reward score: 2.853515625 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.34s (72.87%) |Training time=0.68s (21.06%) |Others=0.19 (6.07%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.36 epoch: 0|step: 3457|ppo_ep: 1|act_loss: 0.30517578125|cri_loss: 0.208251953125|unsuper_loss: 0.0 average reward score: 3.6796875 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.30s (71.77%) |Training time=0.71s (22.01%) |Others=0.20 (6.22%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.36 epoch: 0|step: 3458|ppo_ep: 1|act_loss: 0.10186767578125|cri_loss: 0.066162109375|unsuper_loss: 0.0 average reward score: 2.9140625 ------------------------------------------------------------------------------------- |E2E latency=3.34s |Gather latency=0.00s (0.00%) |Generate time=2.47s (74.06%) |Training time=0.67s (20.06%) |Others=0.20 (5.88%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.36 epoch: 0|step: 3459|ppo_ep: 1|act_loss: 0.27001953125|cri_loss: 0.1605224609375|unsuper_loss: 0.0 average reward score: 2.630859375 ------------------------------------------------------------------------------------- |E2E latency=3.33s |Gather latency=0.00s (0.00%) |Generate time=2.48s (74.62%) |Training time=0.64s (19.36%) |Others=0.20 (6.02%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.36 epoch: 0|step: 3460|ppo_ep: 1|act_loss: 0.1505126953125|cri_loss: 0.1107177734375|unsuper_loss: 0.0 average reward score: 1.3173828125 ------------------------------------------------------------------------------------- |E2E latency=3.40s |Gather latency=0.00s (0.00%) |Generate time=2.56s (75.39%) |Training time=0.64s (18.83%) |Others=0.20 (5.78%)|CurSamplesPerSec=2.36 |AvgSamplesPerSec=2.36 epoch: 0|step: 3461|ppo_ep: 1|act_loss: 
0.16552734375|cri_loss: 0.1060791015625|unsuper_loss: 0.0 average reward score: 2.396484375 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.06%) |Training time=0.66s (19.97%) |Others=0.20 (5.97%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.36 epoch: 0|step: 3462|ppo_ep: 1|act_loss: 0.18017578125|cri_loss: 0.12548828125|unsuper_loss: 0.0 average reward score: 2.974609375 ------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.44%) |Training time=0.65s (19.59%) |Others=0.20 (5.97%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.36 epoch: 0|step: 3463|ppo_ep: 1|act_loss: 0.0556640625|cri_loss: 0.052215576171875|unsuper_loss: 0.0 average reward score: 2.361328125 ------------------------------------------------------------------------------------- |E2E latency=3.66s |Gather latency=0.00s (0.00%) |Generate time=2.41s (65.99%) |Training time=0.95s (26.12%) |Others=0.29 (7.89%)|CurSamplesPerSec=2.19 |AvgSamplesPerSec=2.36 epoch: 0|step: 3464|ppo_ep: 1|act_loss: 0.25537109375|cri_loss: 0.1591796875|unsuper_loss: 0.0 average reward score: 2.22265625 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.49s (74.83%) |Training time=0.64s (19.24%) |Others=0.20 (5.93%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.36 epoch: 0|step: 3465|ppo_ep: 1|act_loss: 0.0616455078125|cri_loss: 0.04791259765625|unsuper_loss: 0.0 average reward score: 1.0927734375 ------------------------------------------------------------------------------------- |E2E latency=3.38s |Gather latency=0.00s (0.00%) |Generate time=2.54s (74.98%) |Training time=0.65s (19.18%) |Others=0.20 (5.84%)|CurSamplesPerSec=2.37 |AvgSamplesPerSec=2.36 epoch: 0|step: 3466|ppo_ep: 1|act_loss: -0.0802001953125|cri_loss: 
-0.019989013671875|unsuper_loss: 0.0 average reward score: 2.912109375 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.38s (74.21%) |Training time=0.64s (19.79%) |Others=0.19 (6.00%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.36 epoch: 0|step: 3467|ppo_ep: 1|act_loss: -0.0841064453125|cri_loss: -0.025970458984375|unsuper_loss: 0.0 average reward score: 3.8515625 ------------------------------------------------------------------------------------- |E2E latency=3.35s |Gather latency=0.00s (0.00%) |Generate time=2.51s (74.98%) |Training time=0.64s (19.17%) |Others=0.20 (5.86%)|CurSamplesPerSec=2.39 |AvgSamplesPerSec=2.36 epoch: 0|step: 3468|ppo_ep: 1|act_loss: -0.05810546875|cri_loss: -0.000579833984375|unsuper_loss: 0.0 average reward score: 3.96875 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.44s (73.46%) |Training time=0.67s (20.34%) |Others=0.21 (6.20%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.36 epoch: 0|step: 3469|ppo_ep: 1|act_loss: 0.34228515625|cri_loss: 0.219970703125|unsuper_loss: 0.0 average reward score: 2.5859375 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.47s (74.75%) |Training time=0.64s (19.45%) |Others=0.19 (5.80%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.36 epoch: 0|step: 3470|ppo_ep: 1|act_loss: 0.3544921875|cri_loss: 0.231201171875|unsuper_loss: 0.0 average reward score: 3.494140625 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.41%) |Training time=0.65s (19.54%) |Others=0.20 (6.05%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.36 epoch: 0|step: 3471|ppo_ep: 1|act_loss: -0.09661865234375|cri_loss: 
-0.01580810546875|unsuper_loss: 0.0 average reward score: 4.046875 ------------------------------------------------------------------------------------- |E2E latency=3.73s |Gather latency=0.00s (0.00%) |Generate time=2.50s (67.14%) |Training time=0.95s (25.36%) |Others=0.28 (7.49%)|CurSamplesPerSec=2.15 |AvgSamplesPerSec=2.36 epoch: 0|step: 3472|ppo_ep: 1|act_loss: 0.10821533203125|cri_loss: 0.07843017578125|unsuper_loss: 0.0 average reward score: 2.9375 ------------------------------------------------------------------------------------- |E2E latency=3.36s |Gather latency=0.00s (0.00%) |Generate time=2.49s (74.06%) |Training time=0.68s (20.14%) |Others=0.19 (5.80%)|CurSamplesPerSec=2.38 |AvgSamplesPerSec=2.36 epoch: 0|step: 3473|ppo_ep: 1|act_loss: 0.281982421875|cri_loss: 0.17822265625|unsuper_loss: 0.0 average reward score: 3.126953125 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.48s (74.49%) |Training time=0.65s (19.54%) |Others=0.20 (5.97%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.36 epoch: 0|step: 3474|ppo_ep: 1|act_loss: 0.21484375|cri_loss: 0.132568359375|unsuper_loss: 0.0 average reward score: 2.7578125 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.47s (74.63%) |Training time=0.64s (19.33%) |Others=0.20 (6.05%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.36 epoch: 0|step: 3475|ppo_ep: 1|act_loss: 0.220703125|cri_loss: 0.1435546875|unsuper_loss: 0.0 average reward score: 2.8515625 ------------------------------------------------------------------------------------- |E2E latency=3.34s |Gather latency=0.00s (0.00%) |Generate time=2.50s (74.68%) |Training time=0.65s (19.33%) |Others=0.20 (6.00%)|CurSamplesPerSec=2.39 |AvgSamplesPerSec=2.36 epoch: 0|step: 3476|ppo_ep: 1|act_loss: 0.126953125|cri_loss: 0.0980224609375|unsuper_loss: 0.0 average reward 
score: 3.5 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.35%) |Training time=0.64s (19.54%) |Others=0.20 (6.10%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.36 epoch: 0|step: 3477|ppo_ep: 1|act_loss: 0.479736328125|cri_loss: 0.307861328125|unsuper_loss: 0.0 average reward score: 2.435546875 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.09%) |Training time=0.65s (19.83%) |Others=0.20 (6.08%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.36 epoch: 0|step: 3478|ppo_ep: 1|act_loss: 0.06549072265625|cri_loss: 0.0628662109375|unsuper_loss: 0.0 average reward score: 1.8330078125 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.47s (74.48%) |Training time=0.65s (19.52%) |Others=0.20 (6.00%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.36 epoch: 0|step: 3479|ppo_ep: 1|act_loss: 0.037628173828125|cri_loss: 0.039703369140625|unsuper_loss: 0.0 average reward score: 3.60546875 ------------------------------------------------------------------------------------- |E2E latency=3.64s |Gather latency=0.00s (0.00%) |Generate time=2.43s (66.71%) |Training time=0.93s (25.56%) |Others=0.28 (7.73%)|CurSamplesPerSec=2.20 |AvgSamplesPerSec=2.36 epoch: 0|step: 3480|ppo_ep: 1|act_loss: 0.00836181640625|cri_loss: 0.034576416015625|unsuper_loss: 0.0 average reward score: 3.12109375 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.68%) |Training time=0.64s (19.35%) |Others=0.20 (5.97%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.36 epoch: 0|step: 3481|ppo_ep: 1|act_loss: 0.09613037109375|cri_loss: 0.07781982421875|unsuper_loss: 0.0 average reward score: 2.572265625 
------------------------------------------------------------------------------------- |E2E latency=3.45s |Gather latency=0.00s (0.00%) |Generate time=2.58s (74.80%) |Training time=0.65s (18.94%) |Others=0.22 (6.26%)|CurSamplesPerSec=2.32 |AvgSamplesPerSec=2.36 epoch: 0|step: 3482|ppo_ep: 1|act_loss: 0.1875|cri_loss: 0.123779296875|unsuper_loss: 0.0 average reward score: 2.62890625 ------------------------------------------------------------------------------------- |E2E latency=3.88s |Gather latency=0.00s (0.00%) |Generate time=2.93s (75.59%) |Training time=0.70s (17.96%) |Others=0.25 (6.45%)|CurSamplesPerSec=2.06 |AvgSamplesPerSec=2.36 epoch: 0|step: 3483|ppo_ep: 1|act_loss: 0.39306640625|cri_loss: 0.236572265625|unsuper_loss: 0.0 average reward score: 2.6328125 ------------------------------------------------------------------------------------- |E2E latency=3.99s |Gather latency=0.00s (0.00%) |Generate time=3.09s (77.55%) |Training time=0.66s (16.57%) |Others=0.23 (5.88%)|CurSamplesPerSec=2.01 |AvgSamplesPerSec=2.36 epoch: 0|step: 3484|ppo_ep: 1|act_loss: 0.14306640625|cri_loss: 0.1121826171875|unsuper_loss: 0.0 average reward score: 2.640625 ------------------------------------------------------------------------------------- |E2E latency=3.73s |Gather latency=0.00s (0.00%) |Generate time=2.85s (76.51%) |Training time=0.65s (17.35%) |Others=0.23 (6.14%)|CurSamplesPerSec=2.15 |AvgSamplesPerSec=2.36 epoch: 0|step: 3485|ppo_ep: 1|act_loss: -0.11016845703125|cri_loss: -0.0191650390625|unsuper_loss: 0.0 average reward score: 3.19140625 ------------------------------------------------------------------------------------- |E2E latency=3.54s |Gather latency=0.00s (0.00%) |Generate time=2.63s (74.35%) |Training time=0.68s (19.18%) |Others=0.23 (6.47%)|CurSamplesPerSec=2.26 |AvgSamplesPerSec=2.36 epoch: 0|step: 3486|ppo_ep: 1|act_loss: 0.31201171875|cri_loss: 0.19580078125|unsuper_loss: 0.0 average reward score: 3.796875 
------------------------------------------------------------------------------------- |E2E latency=3.80s |Gather latency=0.00s (0.00%) |Generate time=2.90s (76.22%) |Training time=0.65s (17.13%) |Others=0.25 (6.65%)|CurSamplesPerSec=2.11 |AvgSamplesPerSec=2.36 epoch: 0|step: 3487|ppo_ep: 1|act_loss: 0.6025390625|cri_loss: 0.357666015625|unsuper_loss: 0.0 average reward score: 1.744140625 ------------------------------------------------------------------------------------- |E2E latency=4.42s |Gather latency=0.00s (0.00%) |Generate time=2.80s (63.24%) |Training time=1.30s (29.28%) |Others=0.33 (7.48%)|CurSamplesPerSec=1.81 |AvgSamplesPerSec=2.36 epoch: 0|step: 3488|ppo_ep: 1|act_loss: 0.472900390625|cri_loss: 0.2783203125|unsuper_loss: 0.0 average reward score: 2.78125 ------------------------------------------------------------------------------------- |E2E latency=3.52s |Gather latency=0.00s (0.00%) |Generate time=2.58s (73.30%) |Training time=0.72s (20.38%) |Others=0.22 (6.32%)|CurSamplesPerSec=2.27 |AvgSamplesPerSec=2.36 epoch: 0|step: 3489|ppo_ep: 1|act_loss: 0.2998046875|cri_loss: 0.182861328125|unsuper_loss: 0.0 average reward score: 4.2265625 ------------------------------------------------------------------------------------- |E2E latency=3.38s |Gather latency=0.00s (0.00%) |Generate time=2.49s (73.79%) |Training time=0.65s (19.14%) |Others=0.24 (7.07%)|CurSamplesPerSec=2.37 |AvgSamplesPerSec=2.36 epoch: 0|step: 3490|ppo_ep: 1|act_loss: 0.301513671875|cri_loss: 0.1875|unsuper_loss: 0.0 average reward score: 2.9609375 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.09%) |Training time=0.65s (19.74%) |Others=0.20 (6.17%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.36 epoch: 0|step: 3491|ppo_ep: 1|act_loss: 0.1070556640625|cri_loss: 0.09906005859375|unsuper_loss: 0.0 average reward score: 3.3046875 
------------------------------------------------------------------------------------- |E2E latency=3.33s |Gather latency=0.00s (0.00%) |Generate time=2.48s (74.36%) |Training time=0.65s (19.54%) |Others=0.20 (6.09%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.36 epoch: 0|step: 3492|ppo_ep: 1|act_loss: 0.267333984375|cri_loss: 0.18115234375|unsuper_loss: 0.0 average reward score: 3.533203125 ------------------------------------------------------------------------------------- |E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.56s (73.46%) |Training time=0.67s (19.38%) |Others=0.25 (7.16%)|CurSamplesPerSec=2.30 |AvgSamplesPerSec=2.36 epoch: 0|step: 3493|ppo_ep: 1|act_loss: 0.003570556640625|cri_loss: 0.03204345703125|unsuper_loss: 0.0 average reward score: 3.28125 ------------------------------------------------------------------------------------- |E2E latency=3.47s |Gather latency=0.00s (0.00%) |Generate time=2.62s (75.34%) |Training time=0.65s (18.77%) |Others=0.20 (5.88%)|CurSamplesPerSec=2.30 |AvgSamplesPerSec=2.36 epoch: 0|step: 3494|ppo_ep: 1|act_loss: 0.167236328125|cri_loss: 0.11572265625|unsuper_loss: 0.0 average reward score: 2.078125 ------------------------------------------------------------------------------------- |E2E latency=3.36s |Gather latency=0.00s (0.00%) |Generate time=2.49s (74.02%) |Training time=0.65s (19.28%) |Others=0.23 (6.70%)|CurSamplesPerSec=2.38 |AvgSamplesPerSec=2.36 epoch: 0|step: 3495|ppo_ep: 1|act_loss: 0.3056640625|cri_loss: 0.1793212890625|unsuper_loss: 0.0 average reward score: 3.39453125 ------------------------------------------------------------------------------------- |E2E latency=3.80s |Gather latency=0.00s (0.00%) |Generate time=2.55s (67.14%) |Training time=0.94s (24.71%) |Others=0.31 (8.15%)|CurSamplesPerSec=2.11 |AvgSamplesPerSec=2.36 epoch: 0|step: 3496|ppo_ep: 1|act_loss: 0.396484375|cri_loss: 0.234619140625|unsuper_loss: 0.0 average reward score: 4.5078125 
------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.44s (73.78%) |Training time=0.64s (19.49%) |Others=0.22 (6.72%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.36 epoch: 0|step: 3497|ppo_ep: 1|act_loss: -0.07159423828125|cri_loss: -0.013153076171875|unsuper_loss: 0.0 average reward score: 3.193359375 ------------------------------------------------------------------------------------- |E2E latency=3.37s |Gather latency=0.00s (0.00%) |Generate time=2.47s (73.36%) |Training time=0.65s (19.19%) |Others=0.25 (7.45%)|CurSamplesPerSec=2.37 |AvgSamplesPerSec=2.36 epoch: 0|step: 3498|ppo_ep: 1|act_loss: 0.1343994140625|cri_loss: 0.08782958984375|unsuper_loss: 0.0 average reward score: 3.53125 ------------------------------------------------------------------------------------- |E2E latency=3.41s |Gather latency=0.00s (0.00%) |Generate time=2.53s (74.07%) |Training time=0.65s (19.02%) |Others=0.24 (6.91%)|CurSamplesPerSec=2.34 |AvgSamplesPerSec=2.36 epoch: 0|step: 3499|ppo_ep: 1|act_loss: 0.1439208984375|cri_loss: 0.1083984375|unsuper_loss: 0.0 average reward score: 3.529296875 ------------------------------------------------------------------------------------- |E2E latency=3.36s |Gather latency=0.00s (0.00%) |Generate time=2.51s (74.83%) |Training time=0.64s (19.09%) |Others=0.20 (6.08%)|CurSamplesPerSec=2.38 |AvgSamplesPerSec=2.36 epoch: 0|step: 3500|ppo_ep: 1|act_loss: 0.443603515625|cri_loss: 0.274169921875|unsuper_loss: 0.0 average reward score: 3.87890625 ------------------------------------------------------------------------------------- |E2E latency=3.37s |Gather latency=0.00s (0.00%) |Generate time=2.49s (74.03%) |Training time=0.64s (19.04%) |Others=0.23 (6.93%)|CurSamplesPerSec=2.38 |AvgSamplesPerSec=2.36 epoch: 0|step: 3501|ppo_ep: 1|act_loss: 0.38671875|cri_loss: 0.236572265625|unsuper_loss: 0.0 average reward score: 4.296875 
------------------------------------------------------------------------------------- |E2E latency=3.63s |Gather latency=0.00s (0.00%) |Generate time=2.77s (76.33%) |Training time=0.64s (17.60%) |Others=0.22 (6.07%)|CurSamplesPerSec=2.20 |AvgSamplesPerSec=2.36 epoch: 0|step: 3502|ppo_ep: 1|act_loss: 0.2724609375|cri_loss: 0.1728515625|unsuper_loss: 0.0 average reward score: 2.546875 ------------------------------------------------------------------------------------- |E2E latency=3.36s |Gather latency=0.00s (0.00%) |Generate time=2.49s (74.03%) |Training time=0.65s (19.26%) |Others=0.23 (6.71%)|CurSamplesPerSec=2.38 |AvgSamplesPerSec=2.36 epoch: 0|step: 3503|ppo_ep: 1|act_loss: 0.02008056640625|cri_loss: 0.04144287109375|unsuper_loss: 0.0 average reward score: 3.85546875 ------------------------------------------------------------------------------------- |E2E latency=3.94s |Gather latency=0.00s (0.00%) |Generate time=2.58s (65.36%) |Training time=1.06s (26.89%) |Others=0.31 (7.75%)|CurSamplesPerSec=2.03 |AvgSamplesPerSec=2.36 epoch: 0|step: 3504|ppo_ep: 1|act_loss: 0.403076171875|cri_loss: 0.248779296875|unsuper_loss: 0.0 average reward score: 1.9130859375 ------------------------------------------------------------------------------------- |E2E latency=3.37s |Gather latency=0.00s (0.00%) |Generate time=2.51s (74.55%) |Training time=0.65s (19.16%) |Others=0.21 (6.29%)|CurSamplesPerSec=2.37 |AvgSamplesPerSec=2.36 epoch: 0|step: 3505|ppo_ep: 1|act_loss: 0.4111328125|cri_loss: 0.2457275390625|unsuper_loss: 0.0 average reward score: 1.7119140625 ------------------------------------------------------------------------------------- |E2E latency=3.55s |Gather latency=0.00s (0.00%) |Generate time=2.64s (74.54%) |Training time=0.69s (19.36%) |Others=0.22 (6.10%)|CurSamplesPerSec=2.26 |AvgSamplesPerSec=2.36 epoch: 0|step: 3506|ppo_ep: 1|act_loss: 0.278076171875|cri_loss: 0.1690673828125|unsuper_loss: 0.0 average reward score: 2.69921875 
------------------------------------------------------------------------------------- |E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.55s (73.10%) |Training time=0.71s (20.38%) |Others=0.23 (6.52%)|CurSamplesPerSec=2.29 |AvgSamplesPerSec=2.36 epoch: 0|step: 3507|ppo_ep: 1|act_loss: 0.197021484375|cri_loss: 0.130126953125|unsuper_loss: 0.0 average reward score: 3.57421875 ------------------------------------------------------------------------------------- |E2E latency=3.39s |Gather latency=0.00s (0.00%) |Generate time=2.51s (74.22%) |Training time=0.64s (18.99%) |Others=0.23 (6.79%)|CurSamplesPerSec=2.36 |AvgSamplesPerSec=2.36 epoch: 0|step: 3508|ppo_ep: 1|act_loss: 0.04937744140625|cri_loss: 0.05609130859375|unsuper_loss: 0.0 average reward score: 4.7109375 ------------------------------------------------------------------------------------- |E2E latency=3.41s |Gather latency=0.00s (0.00%) |Generate time=2.48s (72.85%) |Training time=0.70s (20.67%) |Others=0.22 (6.48%)|CurSamplesPerSec=2.35 |AvgSamplesPerSec=2.36 epoch: 0|step: 3509|ppo_ep: 1|act_loss: 0.40478515625|cri_loss: 0.2476806640625|unsuper_loss: 0.0 average reward score: 4.12109375 ------------------------------------------------------------------------------------- |E2E latency=3.44s |Gather latency=0.00s (0.00%) |Generate time=2.58s (74.94%) |Training time=0.64s (18.73%) |Others=0.22 (6.32%)|CurSamplesPerSec=2.33 |AvgSamplesPerSec=2.36 epoch: 0|step: 3510|ppo_ep: 1|act_loss: 0.201171875|cri_loss: 0.138671875|unsuper_loss: 0.0 average reward score: 1.896484375 ------------------------------------------------------------------------------------- |E2E latency=3.47s |Gather latency=0.00s (0.00%) |Generate time=2.54s (73.11%) |Training time=0.71s (20.57%) |Others=0.22 (6.31%)|CurSamplesPerSec=2.30 |AvgSamplesPerSec=2.36 epoch: 0|step: 3511|ppo_ep: 1|act_loss: 0.02459716796875|cri_loss: 0.051300048828125|unsuper_loss: 0.0 average reward score: 3.455078125 
------------------------------------------------------------------------------------- |E2E latency=3.85s |Gather latency=0.00s (0.00%) |Generate time=2.62s (68.09%) |Training time=0.94s (24.45%) |Others=0.29 (7.47%)|CurSamplesPerSec=2.08 |AvgSamplesPerSec=2.36 epoch: 0|step: 3512|ppo_ep: 1|act_loss: 0.05645751953125|cri_loss: 0.047515869140625|unsuper_loss: 0.0 average reward score: 3.7734375 ------------------------------------------------------------------------------------- |E2E latency=3.70s |Gather latency=0.00s (0.00%) |Generate time=2.45s (66.36%) |Training time=1.01s (27.22%) |Others=0.24 (6.42%)|CurSamplesPerSec=2.16 |AvgSamplesPerSec=2.35 epoch: 0|step: 3513|ppo_ep: 1|act_loss: 0.1689453125|cri_loss: 0.1109619140625|unsuper_loss: 0.0 average reward score: 3.73828125 ------------------------------------------------------------------------------------- |E2E latency=3.82s |Gather latency=0.00s (0.00%) |Generate time=2.95s (77.09%) |Training time=0.64s (16.84%) |Others=0.23 (6.07%)|CurSamplesPerSec=2.09 |AvgSamplesPerSec=2.35 epoch: 0|step: 3514|ppo_ep: 1|act_loss: 0.369873046875|cri_loss: 0.22314453125|unsuper_loss: 0.0 average reward score: 3.953125 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.45s (73.82%) |Training time=0.64s (19.23%) |Others=0.23 (6.95%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.35 epoch: 0|step: 3515|ppo_ep: 1|act_loss: 0.1004638671875|cri_loss: 0.081298828125|unsuper_loss: 0.0 average reward score: 3.4296875 ------------------------------------------------------------------------------------- |E2E latency=3.98s |Gather latency=0.00s (0.00%) |Generate time=2.47s (62.00%) |Training time=1.29s (32.49%) |Others=0.22 (5.51%)|CurSamplesPerSec=2.01 |AvgSamplesPerSec=2.35 epoch: 0|step: 3516|ppo_ep: 1|act_loss: 0.31201171875|cri_loss: 0.1854248046875|unsuper_loss: 0.0 average reward score: 3.56640625 
------------------------------------------------------------------------------------- |E2E latency=3.56s |Gather latency=0.00s (0.00%) |Generate time=2.66s (74.76%) |Training time=0.66s (18.55%) |Others=0.24 (6.69%)|CurSamplesPerSec=2.25 |AvgSamplesPerSec=2.35 epoch: 0|step: 3517|ppo_ep: 1|act_loss: 0.05755615234375|cri_loss: 0.062255859375|unsuper_loss: 0.0 average reward score: 3.2421875 ------------------------------------------------------------------------------------- |E2E latency=3.54s |Gather latency=0.00s (0.00%) |Generate time=2.66s (75.24%) |Training time=0.64s (18.13%) |Others=0.23 (6.63%)|CurSamplesPerSec=2.26 |AvgSamplesPerSec=2.35 epoch: 0|step: 3518|ppo_ep: 1|act_loss: 0.291015625|cri_loss: 0.173095703125|unsuper_loss: 0.0 average reward score: 4.21875 ------------------------------------------------------------------------------------- |E2E latency=3.75s |Gather latency=0.00s (0.00%) |Generate time=2.89s (77.13%) |Training time=0.64s (17.10%) |Others=0.22 (5.78%)|CurSamplesPerSec=2.13 |AvgSamplesPerSec=2.35 [2023-04-24 17:07:01,465] [INFO] [logging.py:96:log_dist] [Rank 0] step=440, skipped=5, lr=[2.80354677823893e-07, 2.80354677823893e-07], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-04-24 17:07:01,713] [INFO] [timer.py:199:stop] epoch=0/micro_step=3520/global_step=440, RunningAvgSamplesPerSec=15.280178101221631, CurrSamplesPerSec=12.540322908036309, MemAllocated=20.44GB, MaxMemAllocated=31.45GB [2023-04-24 17:07:01,947] [INFO] [logging.py:96:log_dist] [Rank 0] step=440, skipped=6, lr=[1.5236009698880532e-07, 1.5236009698880532e-07], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 3519|ppo_ep: 1|act_loss: 0.257568359375|cri_loss: 0.18359375|unsuper_loss: 0.0 average reward score: 2.951171875 ------------------------------------------------------------------------------------- |E2E latency=3.91s |Gather latency=0.00s (0.00%) |Generate time=2.62s (66.99%) |Training time=0.98s (25.06%) |Others=0.31 (7.95%)|CurSamplesPerSec=2.05 |AvgSamplesPerSec=2.35 epoch: 
0|step: 3520|ppo_ep: 1|act_loss: 0.2196044921875|cri_loss: 0.1424560546875|unsuper_loss: 0.0 average reward score: 2.201171875 ------------------------------------------------------------------------------------- |E2E latency=3.54s |Gather latency=0.00s (0.00%) |Generate time=2.68s (75.70%) |Training time=0.64s (17.93%) |Others=0.23 (6.37%)|CurSamplesPerSec=2.26 |AvgSamplesPerSec=2.35 epoch: 0|step: 3521|ppo_ep: 1|act_loss: 0.44140625|cri_loss: 0.26416015625|unsuper_loss: 0.0 average reward score: 1.84765625 ------------------------------------------------------------------------------------- |E2E latency=3.64s |Gather latency=0.00s (0.00%) |Generate time=2.76s (75.87%) |Training time=0.64s (17.58%) |Others=0.24 (6.55%)|CurSamplesPerSec=2.20 |AvgSamplesPerSec=2.35 epoch: 0|step: 3522|ppo_ep: 1|act_loss: 0.841796875|cri_loss: 0.5634765625|unsuper_loss: 0.0 average reward score: 4.046875 ------------------------------------------------------------------------------------- |E2E latency=3.81s |Gather latency=0.00s (0.00%) |Generate time=2.69s (70.69%) |Training time=0.88s (23.04%) |Others=0.24 (6.27%)|CurSamplesPerSec=2.10 |AvgSamplesPerSec=2.35 epoch: 0|step: 3523|ppo_ep: 1|act_loss: 0.0843505859375|cri_loss: 0.0692138671875|unsuper_loss: 0.0 average reward score: 3.23828125 ------------------------------------------------------------------------------------- |E2E latency=3.34s |Gather latency=0.00s (0.00%) |Generate time=2.46s (73.58%) |Training time=0.66s (19.63%) |Others=0.23 (6.79%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.35 epoch: 0|step: 3524|ppo_ep: 1|act_loss: 0.24658203125|cri_loss: 0.1495361328125|unsuper_loss: 0.0 average reward score: 1.984375 ------------------------------------------------------------------------------------- |E2E latency=3.67s |Gather latency=0.00s (0.00%) |Generate time=2.76s (75.39%) |Training time=0.67s (18.21%) |Others=0.23 (6.40%)|CurSamplesPerSec=2.18 |AvgSamplesPerSec=2.35 epoch: 0|step: 3525|ppo_ep: 1|act_loss: 
0.37548828125|cri_loss: 0.2435302734375|unsuper_loss: 0.0 average reward score: 3.01171875 ------------------------------------------------------------------------------------- |E2E latency=3.60s |Gather latency=0.00s (0.00%) |Generate time=2.44s (67.67%) |Training time=0.97s (26.91%) |Others=0.20 (5.42%)|CurSamplesPerSec=2.22 |AvgSamplesPerSec=2.35 epoch: 0|step: 3526|ppo_ep: 1|act_loss: 0.26220703125|cri_loss: 0.15966796875|unsuper_loss: 0.0 average reward score: 4.9296875 ------------------------------------------------------------------------------------- |E2E latency=3.41s |Gather latency=0.00s (0.00%) |Generate time=2.52s (73.74%) |Training time=0.70s (20.47%) |Others=0.20 (5.79%)|CurSamplesPerSec=2.34 |AvgSamplesPerSec=2.35 epoch: 0|step: 3527|ppo_ep: 1|act_loss: 0.04974365234375|cri_loss: 0.048797607421875|unsuper_loss: 0.0 average reward score: 3.39453125 ------------------------------------------------------------------------------------- |E2E latency=4.03s |Gather latency=0.00s (0.00%) |Generate time=2.80s (69.54%) |Training time=0.93s (23.01%) |Others=0.30 (7.45%)|CurSamplesPerSec=1.99 |AvgSamplesPerSec=2.35 epoch: 0|step: 3528|ppo_ep: 1|act_loss: 0.117919921875|cri_loss: 0.082763671875|unsuper_loss: 0.0 average reward score: 4.453125 ------------------------------------------------------------------------------------- |E2E latency=3.72s |Gather latency=0.00s (0.00%) |Generate time=2.72s (73.26%) |Training time=0.70s (18.80%) |Others=0.30 (7.95%)|CurSamplesPerSec=2.15 |AvgSamplesPerSec=2.35 epoch: 0|step: 3529|ppo_ep: 1|act_loss: 0.235595703125|cri_loss: 0.158447265625|unsuper_loss: 0.0 average reward score: 3.54296875 ------------------------------------------------------------------------------------- |E2E latency=3.56s |Gather latency=0.00s (0.00%) |Generate time=2.68s (75.48%) |Training time=0.65s (18.27%) |Others=0.22 (6.24%)|CurSamplesPerSec=2.25 |AvgSamplesPerSec=2.35 epoch: 0|step: 3530|ppo_ep: 1|act_loss: 0.223388671875|cri_loss: 
0.1510009765625|unsuper_loss: 0.0 average reward score: 3.703125 ------------------------------------------------------------------------------------- |E2E latency=3.57s |Gather latency=0.00s (0.00%) |Generate time=2.63s (73.73%) |Training time=0.72s (20.21%) |Others=0.22 (6.06%)|CurSamplesPerSec=2.24 |AvgSamplesPerSec=2.35 epoch: 0|step: 3531|ppo_ep: 1|act_loss: 0.28076171875|cri_loss: 0.181396484375|unsuper_loss: 0.0 average reward score: 3.62109375 ------------------------------------------------------------------------------------- |E2E latency=3.53s |Gather latency=0.00s (0.00%) |Generate time=2.46s (69.65%) |Training time=0.85s (24.11%) |Others=0.22 (6.24%)|CurSamplesPerSec=2.26 |AvgSamplesPerSec=2.35 epoch: 0|step: 3532|ppo_ep: 1|act_loss: 0.29638671875|cri_loss: 0.181396484375|unsuper_loss: 0.0 average reward score: 4.640625 ------------------------------------------------------------------------------------- |E2E latency=3.64s |Gather latency=0.00s (0.00%) |Generate time=2.59s (70.99%) |Training time=0.83s (22.83%) |Others=0.23 (6.18%)|CurSamplesPerSec=2.20 |AvgSamplesPerSec=2.35 epoch: 0|step: 3533|ppo_ep: 1|act_loss: 0.1925048828125|cri_loss: 0.126953125|unsuper_loss: 0.0 average reward score: 2.810546875 ------------------------------------------------------------------------------------- |E2E latency=3.69s |Gather latency=0.00s (0.00%) |Generate time=2.78s (75.37%) |Training time=0.68s (18.54%) |Others=0.22 (6.09%)|CurSamplesPerSec=2.17 |AvgSamplesPerSec=2.35 epoch: 0|step: 3534|ppo_ep: 1|act_loss: 0.36181640625|cri_loss: 0.2198486328125|unsuper_loss: 0.0 average reward score: 3.244140625 ------------------------------------------------------------------------------------- |E2E latency=3.85s |Gather latency=0.00s (0.00%) |Generate time=2.99s (77.67%) |Training time=0.64s (16.60%) |Others=0.22 (5.72%)|CurSamplesPerSec=2.08 |AvgSamplesPerSec=2.35 epoch: 0|step: 3535|ppo_ep: 1|act_loss: 0.3544921875|cri_loss: 0.22509765625|unsuper_loss: 0.0 average reward 
score: 1.94921875 ------------------------------------------------------------------------------------- |E2E latency=4.07s |Gather latency=0.00s (0.00%) |Generate time=2.84s (69.72%) |Training time=0.93s (22.74%) |Others=0.31 (7.54%)|CurSamplesPerSec=1.96 |AvgSamplesPerSec=2.35 epoch: 0|step: 3536|ppo_ep: 1|act_loss: 0.51220703125|cri_loss: 0.309326171875|unsuper_loss: 0.0 average reward score: 2.9140625 ------------------------------------------------------------------------------------- |E2E latency=3.65s |Gather latency=0.00s (0.00%) |Generate time=2.81s (76.82%) |Training time=0.64s (17.41%) |Others=0.21 (5.77%)|CurSamplesPerSec=2.19 |AvgSamplesPerSec=2.35 epoch: 0|step: 3537|ppo_ep: 1|act_loss: 0.287109375|cri_loss: 0.17236328125|unsuper_loss: 0.0 average reward score: 3.2265625 ------------------------------------------------------------------------------------- |E2E latency=3.66s |Gather latency=0.00s (0.00%) |Generate time=2.79s (76.28%) |Training time=0.64s (17.49%) |Others=0.23 (6.23%)|CurSamplesPerSec=2.18 |AvgSamplesPerSec=2.35 epoch: 0|step: 3538|ppo_ep: 1|act_loss: 0.56494140625|cri_loss: 0.343994140625|unsuper_loss: 0.0 average reward score: 4.34375 ------------------------------------------------------------------------------------- |E2E latency=3.65s |Gather latency=0.00s (0.00%) |Generate time=2.79s (76.51%) |Training time=0.64s (17.55%) |Others=0.22 (5.94%)|CurSamplesPerSec=2.19 |AvgSamplesPerSec=2.35 epoch: 0|step: 3539|ppo_ep: 1|act_loss: 0.2484130859375|cri_loss: 0.153564453125|unsuper_loss: 0.0 average reward score: 2.583984375 ------------------------------------------------------------------------------------- |E2E latency=3.76s |Gather latency=0.00s (0.00%) |Generate time=2.89s (76.75%) |Training time=0.65s (17.22%) |Others=0.23 (6.03%)|CurSamplesPerSec=2.13 |AvgSamplesPerSec=2.35 epoch: 0|step: 3540|ppo_ep: 1|act_loss: 0.05828857421875|cri_loss: 0.054107666015625|unsuper_loss: 0.0 average reward score: 2.49609375 
------------------------------------------------------------------------------------- |E2E latency=3.59s |Gather latency=0.00s (0.00%) |Generate time=2.58s (71.86%) |Training time=0.78s (21.81%) |Others=0.23 (6.33%)|CurSamplesPerSec=2.23 |AvgSamplesPerSec=2.35 epoch: 0|step: 3541|ppo_ep: 1|act_loss: 0.5830078125|cri_loss: 0.3642578125|unsuper_loss: 0.0 average reward score: 3.189453125 ------------------------------------------------------------------------------------- |E2E latency=3.68s |Gather latency=0.00s (0.00%) |Generate time=2.82s (76.49%) |Training time=0.66s (17.97%) |Others=0.20 (5.53%)|CurSamplesPerSec=2.17 |AvgSamplesPerSec=2.35 epoch: 0|step: 3542|ppo_ep: 1|act_loss: 0.206787109375|cri_loss: 0.145263671875|unsuper_loss: 0.0 average reward score: 3.685546875 ------------------------------------------------------------------------------------- |E2E latency=3.78s |Gather latency=0.00s (0.00%) |Generate time=2.75s (72.90%) |Training time=0.81s (21.51%) |Others=0.21 (5.59%)|CurSamplesPerSec=2.12 |AvgSamplesPerSec=2.35 epoch: 0|step: 3543|ppo_ep: 1|act_loss: 0.11328125|cri_loss: 0.0765380859375|unsuper_loss: 0.0 average reward score: 3.080078125 ------------------------------------------------------------------------------------- |E2E latency=3.73s |Gather latency=0.00s (0.00%) |Generate time=2.49s (66.90%) |Training time=0.92s (24.81%) |Others=0.31 (8.29%)|CurSamplesPerSec=2.15 |AvgSamplesPerSec=2.35 epoch: 0|step: 3544|ppo_ep: 1|act_loss: 0.42919921875|cri_loss: 0.264404296875|unsuper_loss: 0.0 average reward score: 3.791015625 ------------------------------------------------------------------------------------- |E2E latency=3.82s |Gather latency=0.00s (0.00%) |Generate time=2.71s (70.80%) |Training time=0.85s (22.18%) |Others=0.27 (7.01%)|CurSamplesPerSec=2.09 |AvgSamplesPerSec=2.35 epoch: 0|step: 3545|ppo_ep: 1|act_loss: 0.67333984375|cri_loss: 0.42822265625|unsuper_loss: 0.0 average reward score: 3.8359375 
------------------------------------------------------------------------------------- |E2E latency=3.47s |Gather latency=0.00s (0.00%) |Generate time=2.59s (74.70%) |Training time=0.65s (18.89%) |Others=0.22 (6.41%)|CurSamplesPerSec=2.31 |AvgSamplesPerSec=2.35 epoch: 0|step: 3546|ppo_ep: 1|act_loss: 0.319580078125|cri_loss: 0.2078857421875|unsuper_loss: 0.0 average reward score: 3.8671875 ------------------------------------------------------------------------------------- |E2E latency=3.60s |Gather latency=0.00s (0.00%) |Generate time=2.70s (75.08%) |Training time=0.67s (18.55%) |Others=0.23 (6.37%)|CurSamplesPerSec=2.22 |AvgSamplesPerSec=2.35 epoch: 0|step: 3547|ppo_ep: 1|act_loss: -0.08734130859375|cri_loss: -0.026214599609375|unsuper_loss: 0.0 average reward score: 3.6796875 ------------------------------------------------------------------------------------- |E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.44s (69.74%) |Training time=0.83s (23.62%) |Others=0.23 (6.64%)|CurSamplesPerSec=2.28 |AvgSamplesPerSec=2.35 epoch: 0|step: 3548|ppo_ep: 1|act_loss: 0.253173828125|cri_loss: 0.154052734375|unsuper_loss: 0.0 average reward score: 2.30078125 ------------------------------------------------------------------------------------- |E2E latency=3.82s |Gather latency=0.00s (0.00%) |Generate time=2.54s (66.56%) |Training time=1.08s (28.26%) |Others=0.20 (5.18%)|CurSamplesPerSec=2.10 |AvgSamplesPerSec=2.35 epoch: 0|step: 3549|ppo_ep: 1|act_loss: -0.065673828125|cri_loss: -0.015655517578125|unsuper_loss: 0.0 average reward score: 4.15625 ------------------------------------------------------------------------------------- |E2E latency=3.67s |Gather latency=0.00s (0.00%) |Generate time=2.83s (76.97%) |Training time=0.65s (17.60%) |Others=0.20 (5.43%)|CurSamplesPerSec=2.18 |AvgSamplesPerSec=2.35 epoch: 0|step: 3550|ppo_ep: 1|act_loss: 0.5|cri_loss: 0.297119140625|unsuper_loss: 0.0 average reward score: 2.80078125 
------------------------------------------------------------------------------------- |E2E latency=3.40s |Gather latency=0.00s (0.00%) |Generate time=2.52s (74.24%) |Training time=0.65s (19.14%) |Others=0.23 (6.62%)|CurSamplesPerSec=2.35 |AvgSamplesPerSec=2.35 epoch: 0|step: 3551|ppo_ep: 1|act_loss: -0.0179443359375|cri_loss: 0.012908935546875|unsuper_loss: 0.0 average reward score: 4.5546875 ------------------------------------------------------------------------------------- |E2E latency=3.77s |Gather latency=0.00s (0.00%) |Generate time=2.53s (67.27%) |Training time=0.93s (24.82%) |Others=0.30 (7.91%)|CurSamplesPerSec=2.12 |AvgSamplesPerSec=2.35 epoch: 0|step: 3552|ppo_ep: 1|act_loss: 0.19140625|cri_loss: 0.155517578125|unsuper_loss: 0.0 average reward score: 3.58203125 ------------------------------------------------------------------------------------- |E2E latency=3.52s |Gather latency=0.00s (0.00%) |Generate time=2.47s (70.21%) |Training time=0.82s (23.31%) |Others=0.23 (6.48%)|CurSamplesPerSec=2.27 |AvgSamplesPerSec=2.35 epoch: 0|step: 3553|ppo_ep: 1|act_loss: 0.100830078125|cri_loss: 0.0845947265625|unsuper_loss: 0.0 average reward score: 3.3125 ------------------------------------------------------------------------------------- |E2E latency=3.42s |Gather latency=0.00s (0.00%) |Generate time=2.54s (74.32%) |Training time=0.64s (18.78%) |Others=0.24 (6.89%)|CurSamplesPerSec=2.34 |AvgSamplesPerSec=2.35 epoch: 0|step: 3554|ppo_ep: 1|act_loss: -0.0838623046875|cri_loss: -0.01129150390625|unsuper_loss: 0.0 average reward score: 3.923828125 ------------------------------------------------------------------------------------- |E2E latency=3.72s |Gather latency=0.00s (0.00%) |Generate time=2.70s (72.65%) |Training time=0.78s (21.03%) |Others=0.24 (6.32%)|CurSamplesPerSec=2.15 |AvgSamplesPerSec=2.35 epoch: 0|step: 3555|ppo_ep: 1|act_loss: 0.268798828125|cri_loss: 0.1856689453125|unsuper_loss: 0.0 average reward score: 2.53515625 
------------------------------------------------------------------------------------- |E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.63s (75.48%) |Training time=0.65s (18.52%) |Others=0.21 (6.01%)|CurSamplesPerSec=2.29 |AvgSamplesPerSec=2.35 epoch: 0|step: 3556|ppo_ep: 1|act_loss: 0.32421875|cri_loss: 0.20849609375|unsuper_loss: 0.0 average reward score: 3.123046875 ------------------------------------------------------------------------------------- |E2E latency=3.59s |Gather latency=0.00s (0.00%) |Generate time=2.72s (75.76%) |Training time=0.65s (18.07%) |Others=0.22 (6.17%)|CurSamplesPerSec=2.23 |AvgSamplesPerSec=2.35 epoch: 0|step: 3557|ppo_ep: 1|act_loss: 0.2373046875|cri_loss: 0.1505126953125|unsuper_loss: 0.0 average reward score: 4.328125 ------------------------------------------------------------------------------------- |E2E latency=3.61s |Gather latency=0.00s (0.00%) |Generate time=2.67s (74.07%) |Training time=0.70s (19.46%) |Others=0.23 (6.47%)|CurSamplesPerSec=2.22 |AvgSamplesPerSec=2.35 epoch: 0|step: 3558|ppo_ep: 1|act_loss: 0.175048828125|cri_loss: 0.126708984375|unsuper_loss: 0.0 average reward score: 2.68359375 ------------------------------------------------------------------------------------- |E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.57s (73.59%) |Training time=0.69s (19.83%) |Others=0.23 (6.57%)|CurSamplesPerSec=2.29 |AvgSamplesPerSec=2.35 epoch: 0|step: 3559|ppo_ep: 1|act_loss: 0.06927490234375|cri_loss: 0.0687255859375|unsuper_loss: 0.0 average reward score: 4.2421875 ------------------------------------------------------------------------------------- |E2E latency=3.86s |Gather latency=0.00s (0.00%) |Generate time=2.61s (67.60%) |Training time=0.93s (24.07%) |Others=0.32 (8.32%)|CurSamplesPerSec=2.07 |AvgSamplesPerSec=2.35 epoch: 0|step: 3560|ppo_ep: 1|act_loss: 0.08001708984375|cri_loss: 0.06689453125|unsuper_loss: 0.0 average reward score: 2.88671875 
------------------------------------------------------------------------------------- |E2E latency=3.66s |Gather latency=0.00s (0.00%) |Generate time=2.77s (75.75%) |Training time=0.65s (17.89%) |Others=0.23 (6.37%)|CurSamplesPerSec=2.19 |AvgSamplesPerSec=2.35 epoch: 0|step: 3561|ppo_ep: 1|act_loss: 0.01556396484375|cri_loss: 0.05126953125|unsuper_loss: 0.0 average reward score: 2.578125 ------------------------------------------------------------------------------------- |E2E latency=3.81s |Gather latency=0.00s (0.00%) |Generate time=2.94s (77.09%) |Training time=0.65s (16.95%) |Others=0.23 (5.96%)|CurSamplesPerSec=2.10 |AvgSamplesPerSec=2.35 epoch: 0|step: 3562|ppo_ep: 1|act_loss: 0.603515625|cri_loss: 0.35107421875|unsuper_loss: 0.0 average reward score: 3.140625 ------------------------------------------------------------------------------------- |E2E latency=3.62s |Gather latency=0.00s (0.00%) |Generate time=2.76s (76.16%) |Training time=0.64s (17.67%) |Others=0.22 (6.17%)|CurSamplesPerSec=2.21 |AvgSamplesPerSec=2.35 epoch: 0|step: 3563|ppo_ep: 1|act_loss: -0.1546630859375|cri_loss: -0.03375244140625|unsuper_loss: 0.0 average reward score: 2.55078125 ------------------------------------------------------------------------------------- |E2E latency=3.71s |Gather latency=0.00s (0.00%) |Generate time=2.85s (76.82%) |Training time=0.64s (17.34%) |Others=0.22 (5.84%)|CurSamplesPerSec=2.16 |AvgSamplesPerSec=2.35 epoch: 0|step: 3564|ppo_ep: 1|act_loss: -0.035888671875|cri_loss: 0.002960205078125|unsuper_loss: 0.0 average reward score: 4.2109375 ------------------------------------------------------------------------------------- |E2E latency=3.42s |Gather latency=0.00s (0.00%) |Generate time=2.54s (74.36%) |Training time=0.66s (19.24%) |Others=0.22 (6.40%)|CurSamplesPerSec=2.34 |AvgSamplesPerSec=2.35 epoch: 0|step: 3565|ppo_ep: 1|act_loss: 0.2008056640625|cri_loss: 0.1395263671875|unsuper_loss: 0.0 average reward score: 3.42578125 
------------------------------------------------------------------------------------- |E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.52s (72.15%) |Training time=0.75s (21.40%) |Others=0.23 (6.45%)|CurSamplesPerSec=2.29 |AvgSamplesPerSec=2.35 epoch: 0|step: 3566|ppo_ep: 1|act_loss: -0.1407470703125|cri_loss: -0.05389404296875|unsuper_loss: 0.0 average reward score: 3.63671875 ------------------------------------------------------------------------------------- |E2E latency=3.43s |Gather latency=0.00s (0.00%) |Generate time=2.55s (74.31%) |Training time=0.65s (18.83%) |Others=0.24 (6.86%)|CurSamplesPerSec=2.33 |AvgSamplesPerSec=2.35 epoch: 0|step: 3567|ppo_ep: 1|act_loss: 0.25830078125|cri_loss: 0.1595458984375|unsuper_loss: 0.0 average reward score: 2.1015625 ------------------------------------------------------------------------------------- |E2E latency=3.70s |Gather latency=0.00s (0.00%) |Generate time=2.45s (66.35%) |Training time=0.93s (25.22%) |Others=0.31 (8.43%)|CurSamplesPerSec=2.16 |AvgSamplesPerSec=2.35 epoch: 0|step: 3568|ppo_ep: 1|act_loss: 0.07879638671875|cri_loss: 0.07293701171875|unsuper_loss: 0.0 average reward score: 2.876953125 ------------------------------------------------------------------------------------- |E2E latency=3.55s |Gather latency=0.00s (0.00%) |Generate time=2.68s (75.43%) |Training time=0.64s (18.07%) |Others=0.23 (6.50%)|CurSamplesPerSec=2.25 |AvgSamplesPerSec=2.35 epoch: 0|step: 3569|ppo_ep: 1|act_loss: 0.10693359375|cri_loss: 0.0787353515625|unsuper_loss: 0.0 average reward score: 3.830078125 ------------------------------------------------------------------------------------- |E2E latency=3.40s |Gather latency=0.00s (0.00%) |Generate time=2.52s (74.07%) |Training time=0.64s (18.95%) |Others=0.24 (6.98%)|CurSamplesPerSec=2.35 |AvgSamplesPerSec=2.35 epoch: 0|step: 3570|ppo_ep: 1|act_loss: 0.197265625|cri_loss: 0.13134765625|unsuper_loss: 0.0 average reward score: 3.81640625 
------------------------------------------------------------------------------------- |E2E latency=3.38s |Gather latency=0.00s (0.00%) |Generate time=2.51s (74.37%) |Training time=0.64s (19.08%) |Others=0.22 (6.55%)|CurSamplesPerSec=2.37 |AvgSamplesPerSec=2.35 epoch: 0|step: 3571|ppo_ep: 1|act_loss: 0.1396484375|cri_loss: 0.10955810546875|unsuper_loss: 0.0 average reward score: 3.48828125 ------------------------------------------------------------------------------------- |E2E latency=3.58s |Gather latency=0.00s (0.00%) |Generate time=2.69s (75.20%) |Training time=0.65s (18.05%) |Others=0.24 (6.75%)|CurSamplesPerSec=2.24 |AvgSamplesPerSec=2.35 epoch: 0|step: 3572|ppo_ep: 1|act_loss: -0.0953369140625|cri_loss: -0.01629638671875|unsuper_loss: 0.0 average reward score: 4.27734375 ------------------------------------------------------------------------------------- |E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.50s (71.61%) |Training time=0.75s (21.58%) |Others=0.24 (6.81%)|CurSamplesPerSec=2.29 |AvgSamplesPerSec=2.35 epoch: 0|step: 3573|ppo_ep: 1|act_loss: 0.042755126953125|cri_loss: 0.050567626953125|unsuper_loss: 0.0 average reward score: 4.09375 ------------------------------------------------------------------------------------- |E2E latency=3.74s |Gather latency=0.00s (0.00%) |Generate time=2.87s (76.65%) |Training time=0.65s (17.37%) |Others=0.22 (5.98%)|CurSamplesPerSec=2.14 |AvgSamplesPerSec=2.35 epoch: 0|step: 3574|ppo_ep: 1|act_loss: 0.203369140625|cri_loss: 0.131591796875|unsuper_loss: 0.0 average reward score: 2.92578125 ------------------------------------------------------------------------------------- |E2E latency=3.47s |Gather latency=0.00s (0.00%) |Generate time=2.47s (71.10%) |Training time=0.78s (22.49%) |Others=0.22 (6.41%)|CurSamplesPerSec=2.30 |AvgSamplesPerSec=2.35 epoch: 0|step: 3575|ppo_ep: 1|act_loss: 0.0765380859375|cri_loss: 0.0635986328125|unsuper_loss: 0.0 average reward score: 3.5234375 
------------------------------------------------------------------------------------- |E2E latency=3.87s |Gather latency=0.00s (0.00%) |Generate time=2.63s (67.96%) |Training time=0.93s (24.12%) |Others=0.31 (7.91%)|CurSamplesPerSec=2.07 |AvgSamplesPerSec=2.35 epoch: 0|step: 3576|ppo_ep: 1|act_loss: 0.0050048828125|cri_loss: 0.04473876953125|unsuper_loss: 0.0 average reward score: 4.6484375 ------------------------------------------------------------------------------------- |E2E latency=3.43s |Gather latency=0.00s (0.00%) |Generate time=2.57s (74.93%) |Training time=0.64s (18.73%) |Others=0.22 (6.34%)|CurSamplesPerSec=2.33 |AvgSamplesPerSec=2.35 epoch: 0|step: 3577|ppo_ep: 1|act_loss: 0.4287109375|cri_loss: 0.26220703125|unsuper_loss: 0.0 average reward score: 3.09375 ------------------------------------------------------------------------------------- |E2E latency=3.37s |Gather latency=0.00s (0.00%) |Generate time=2.50s (74.21%) |Training time=0.64s (19.13%) |Others=0.22 (6.66%)|CurSamplesPerSec=2.38 |AvgSamplesPerSec=2.35 epoch: 0|step: 3578|ppo_ep: 1|act_loss: -0.01171875|cri_loss: 0.031494140625|unsuper_loss: 0.0 average reward score: 3.73046875 ------------------------------------------------------------------------------------- |E2E latency=3.38s |Gather latency=0.00s (0.00%) |Generate time=2.52s (74.48%) |Training time=0.64s (18.93%) |Others=0.22 (6.58%)|CurSamplesPerSec=2.36 |AvgSamplesPerSec=2.35 epoch: 0|step: 3579|ppo_ep: 1|act_loss: 0.026153564453125|cri_loss: 0.042449951171875|unsuper_loss: 0.0 average reward score: 4.71875 ------------------------------------------------------------------------------------- |E2E latency=3.36s |Gather latency=0.00s (0.00%) |Generate time=2.48s (73.88%) |Training time=0.66s (19.70%) |Others=0.22 (6.41%)|CurSamplesPerSec=2.38 |AvgSamplesPerSec=2.35 epoch: 0|step: 3580|ppo_ep: 1|act_loss: 0.46435546875|cri_loss: 0.285400390625|unsuper_loss: 0.0 average reward score: 3.123046875 
------------------------------------------------------------------------------------- |E2E latency=3.43s |Gather latency=0.00s (0.00%) |Generate time=2.58s (75.09%) |Training time=0.65s (18.93%) |Others=0.21 (5.98%)|CurSamplesPerSec=2.33 |AvgSamplesPerSec=2.35 epoch: 0|step: 3581|ppo_ep: 1|act_loss: -0.0902099609375|cri_loss: -0.017364501953125|unsuper_loss: 0.0 average reward score: 3.703125 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.36%) |Training time=0.65s (19.76%) |Others=0.19 (5.88%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.35 epoch: 0|step: 3582|ppo_ep: 1|act_loss: 0.057159423828125|cri_loss: 0.06329345703125|unsuper_loss: 0.0 average reward score: 2.443359375 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.43s (73.37%) |Training time=0.68s (20.57%) |Others=0.20 (6.06%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.35 epoch: 0|step: 3583|ppo_ep: 1|act_loss: 0.04571533203125|cri_loss: 0.05352783203125|unsuper_loss: 0.0 average reward score: 2.6640625 ------------------------------------------------------------------------------------- |E2E latency=3.74s |Gather latency=0.00s (0.00%) |Generate time=2.50s (66.70%) |Training time=0.93s (24.93%) |Others=0.31 (8.37%)|CurSamplesPerSec=2.14 |AvgSamplesPerSec=2.35 epoch: 0|step: 3584|ppo_ep: 1|act_loss: 0.471435546875|cri_loss: 0.2998046875|unsuper_loss: 0.0 average reward score: 4.41796875 ------------------------------------------------------------------------------------- |E2E latency=3.33s |Gather latency=0.00s (0.00%) |Generate time=2.49s (74.89%) |Training time=0.64s (19.24%) |Others=0.20 (5.86%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.35 epoch: 0|step: 3585|ppo_ep: 1|act_loss: 0.1175537109375|cri_loss: 0.084716796875|unsuper_loss: 0.0 average reward score: 4.26953125 
------------------------------------------------------------------------------------- |E2E latency=3.37s |Gather latency=0.00s (0.00%) |Generate time=2.50s (74.08%) |Training time=0.67s (19.99%) |Others=0.20 (5.93%)|CurSamplesPerSec=2.37 |AvgSamplesPerSec=2.35 epoch: 0|step: 3586|ppo_ep: 1|act_loss: -0.1009521484375|cri_loss: -0.020538330078125|unsuper_loss: 0.0 average reward score: 4.09375 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.47s (74.53%) |Training time=0.64s (19.47%) |Others=0.20 (6.00%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.35 epoch: 0|step: 3587|ppo_ep: 1|act_loss: 0.156494140625|cri_loss: 0.1280517578125|unsuper_loss: 0.0 average reward score: 4.5859375 ------------------------------------------------------------------------------------- |E2E latency=3.35s |Gather latency=0.00s (0.00%) |Generate time=2.49s (74.41%) |Training time=0.65s (19.45%) |Others=0.21 (6.13%)|CurSamplesPerSec=2.39 |AvgSamplesPerSec=2.35 epoch: 0|step: 3588|ppo_ep: 1|act_loss: -0.168212890625|cri_loss: -0.0416259765625|unsuper_loss: 0.0 average reward score: 1.5849609375 ------------------------------------------------------------------------------------- |E2E latency=3.34s |Gather latency=0.00s (0.00%) |Generate time=2.49s (74.64%) |Training time=0.65s (19.34%) |Others=0.20 (6.03%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.35 epoch: 0|step: 3589|ppo_ep: 1|act_loss: 0.1912841796875|cri_loss: 0.136474609375|unsuper_loss: 0.0 average reward score: 4.1484375 ------------------------------------------------------------------------------------- |E2E latency=3.35s |Gather latency=0.00s (0.00%) |Generate time=2.50s (74.45%) |Training time=0.65s (19.52%) |Others=0.20 (6.02%)|CurSamplesPerSec=2.39 |AvgSamplesPerSec=2.35 epoch: 0|step: 3590|ppo_ep: 1|act_loss: -0.224365234375|cri_loss: -0.07659912109375|unsuper_loss: 0.0 average reward score: 2.984375 
------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.55%) |Training time=0.64s (19.42%) |Others=0.20 (6.03%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.35 epoch: 0|step: 3591|ppo_ep: 1|act_loss: 0.0065460205078125|cri_loss: 0.03228759765625|unsuper_loss: 0.0 average reward score: 3.517578125 ------------------------------------------------------------------------------------- |E2E latency=3.67s |Gather latency=0.00s (0.00%) |Generate time=2.47s (67.19%) |Training time=0.92s (25.12%) |Others=0.28 (7.69%)|CurSamplesPerSec=2.18 |AvgSamplesPerSec=2.35 epoch: 0|step: 3592|ppo_ep: 1|act_loss: 0.2017822265625|cri_loss: 0.135498046875|unsuper_loss: 0.0 average reward score: 3.9375 ------------------------------------------------------------------------------------- |E2E latency=3.36s |Gather latency=0.00s (0.00%) |Generate time=2.44s (72.75%) |Training time=0.72s (21.52%) |Others=0.19 (5.73%)|CurSamplesPerSec=2.38 |AvgSamplesPerSec=2.35 epoch: 0|step: 3593|ppo_ep: 1|act_loss: 0.17578125|cri_loss: 0.136474609375|unsuper_loss: 0.0 average reward score: 3.91796875 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.38s (73.96%) |Training time=0.64s (19.99%) |Others=0.19 (6.05%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.35 epoch: 0|step: 3594|ppo_ep: 1|act_loss: -0.0479736328125|cri_loss: 0.00933837890625|unsuper_loss: 0.0 average reward score: 4.5234375 ------------------------------------------------------------------------------------- |E2E latency=3.36s |Gather latency=0.00s (0.00%) |Generate time=2.52s (74.76%) |Training time=0.65s (19.23%) |Others=0.20 (6.01%)|CurSamplesPerSec=2.38 |AvgSamplesPerSec=2.35 epoch: 0|step: 3595|ppo_ep: 1|act_loss: 0.0457763671875|cri_loss: 0.0543212890625|unsuper_loss: 0.0 average reward score: 2.6875 
-------------------------------------------------------------------------------------
|E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.45s (73.78%) |Training time=0.65s (19.67%) |Others=0.22 (6.55%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.35
epoch: 0|step: 3596|ppo_ep: 1|act_loss: 0.2349853515625|cri_loss: 0.158447265625|unsuper_loss: 0.0
average reward score: 1.931640625
-------------------------------------------------------------------------------------
|E2E latency=3.36s |Gather latency=0.00s (0.00%) |Generate time=2.50s (74.45%) |Training time=0.65s (19.49%) |Others=0.20 (6.07%)|CurSamplesPerSec=2.38 |AvgSamplesPerSec=2.35
epoch: 0|step: 3597|ppo_ep: 1|act_loss: 0.128173828125|cri_loss: 0.0899658203125|unsuper_loss: 0.0
average reward score: 4.15625
-------------------------------------------------------------------------------------
|E2E latency=3.33s |Gather latency=0.00s (0.00%) |Generate time=2.49s (74.78%) |Training time=0.64s (19.32%) |Others=0.20 (5.90%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.35
epoch: 0|step: 3598|ppo_ep: 1|act_loss: 0.25439453125|cri_loss: 0.1533203125|unsuper_loss: 0.0
average reward score: 3.390625
-------------------------------------------------------------------------------------
|E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.32s (72.39%) |Training time=0.69s (21.38%) |Others=0.20 (6.23%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.35
[2023-04-24 17:11:45,899] [INFO] [logging.py:96:log_dist] [Rank 0] step=450, skipped=5, lr=[1.6094793024775132e-07, 1.6094793024775132e-07], mom=[(0.9, 0.95), (0.9, 0.95)]
[2023-04-24 17:11:46,149] [INFO] [timer.py:199:stop] epoch=0/micro_step=3600/global_step=450, RunningAvgSamplesPerSec=15.265384707326893, CurrSamplesPerSec=14.820354050360795, MemAllocated=20.44GB, MaxMemAllocated=31.45GB
[2023-04-24 17:11:46,360] [INFO] [logging.py:96:log_dist] [Rank 0] step=450, skipped=6, lr=[8.882706236405886e-08, 8.882706236405886e-08], mom=[(0.9, 0.95), (0.9, 0.95)]
epoch: 0|step: 3599|ppo_ep: 1|act_loss: 0.248291015625|cri_loss: 0.155029296875|unsuper_loss: 0.0 average reward score: 0.853515625 ------------------------------------------------------------------------------------- |E2E latency=3.66s |Gather latency=0.00s (0.00%) |Generate time=2.30s (62.83%) |Training time=1.08s (29.40%) |Others=0.28 (7.78%)|CurSamplesPerSec=2.19 |AvgSamplesPerSec=2.35 epoch: 0|step: 3600|ppo_ep: 1|act_loss: 0.058837890625|cri_loss: 0.07049560546875|unsuper_loss: 0.0 average reward score: 3.6640625 ------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.40s (72.82%) |Training time=0.68s (20.55%) |Others=0.22 (6.62%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.35 epoch: 0|step: 3601|ppo_ep: 1|act_loss: 0.303955078125|cri_loss: 0.1923828125|unsuper_loss: 0.0 average reward score: 3.484375 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.40s (72.19%) |Training time=0.72s (21.62%) |Others=0.21 (6.19%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.35 epoch: 0|step: 3602|ppo_ep: 1|act_loss: -0.1400146484375|cri_loss: -0.04949951171875|unsuper_loss: 0.0 average reward score: 4.46875 ------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.43s (73.63%) |Training time=0.66s (20.13%) |Others=0.21 (6.24%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.35 epoch: 0|step: 3603|ppo_ep: 1|act_loss: 0.1767578125|cri_loss: 0.12158203125|unsuper_loss: 0.0 average reward score: 3.83984375 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.16%) |Training time=0.64s (19.40%) |Others=0.21 (6.44%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.35 epoch: 0|step: 3604|ppo_ep: 1|act_loss: 
0.1180419921875|cri_loss: 0.102294921875|unsuper_loss: 0.0 average reward score: 2.640625 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.35%) |Training time=0.64s (19.47%) |Others=0.20 (6.18%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.35 epoch: 0|step: 3605|ppo_ep: 1|act_loss: 0.1197509765625|cri_loss: 0.1007080078125|unsuper_loss: 0.0 average reward score: 3.458984375 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.14%) |Training time=0.64s (19.71%) |Others=0.20 (6.15%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.35 epoch: 0|step: 3606|ppo_ep: 1|act_loss: 0.08331298828125|cri_loss: 0.0799560546875|unsuper_loss: 0.0 average reward score: 4.09375 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.40s (73.60%) |Training time=0.65s (19.83%) |Others=0.21 (6.57%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.35 epoch: 0|step: 3607|ppo_ep: 1|act_loss: 0.31640625|cri_loss: 0.2060546875|unsuper_loss: 0.0 average reward score: 2.140625 ------------------------------------------------------------------------------------- |E2E latency=3.67s |Gather latency=0.00s (0.00%) |Generate time=2.44s (66.41%) |Training time=0.95s (25.93%) |Others=0.28 (7.66%)|CurSamplesPerSec=2.18 |AvgSamplesPerSec=2.35 epoch: 0|step: 3608|ppo_ep: 1|act_loss: 0.1259765625|cri_loss: 0.08795166015625|unsuper_loss: 0.0 average reward score: 3.35546875 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.42s (73.78%) |Training time=0.67s (20.40%) |Others=0.19 (5.83%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.35 epoch: 0|step: 3609|ppo_ep: 1|act_loss: 0.17236328125|cri_loss: 
0.1180419921875|unsuper_loss: 0.0 average reward score: 2.681640625 ------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.39s (72.43%) |Training time=0.71s (21.53%) |Others=0.20 (6.03%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.35 epoch: 0|step: 3610|ppo_ep: 1|act_loss: 0.07159423828125|cri_loss: 0.06292724609375|unsuper_loss: 0.0 average reward score: 2.40234375 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.38%) |Training time=0.64s (19.64%) |Others=0.20 (5.99%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.35 epoch: 0|step: 3611|ppo_ep: 1|act_loss: 0.324462890625|cri_loss: 0.197509765625|unsuper_loss: 0.0 average reward score: 2.28125 ------------------------------------------------------------------------------------- |E2E latency=3.47s |Gather latency=0.00s (0.00%) |Generate time=2.53s (72.85%) |Training time=0.74s (21.23%) |Others=0.21 (5.93%)|CurSamplesPerSec=2.30 |AvgSamplesPerSec=2.35 epoch: 0|step: 3612|ppo_ep: 1|act_loss: 0.274658203125|cri_loss: 0.1741943359375|unsuper_loss: 0.0 average reward score: 2.578125 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.47s (74.41%) |Training time=0.64s (19.42%) |Others=0.20 (6.18%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.35 epoch: 0|step: 3613|ppo_ep: 1|act_loss: -0.04248046875|cri_loss: -0.004913330078125|unsuper_loss: 0.0 average reward score: 2.982421875 ------------------------------------------------------------------------------------- |E2E latency=3.46s |Gather latency=0.00s (0.00%) |Generate time=2.56s (74.00%) |Training time=0.65s (18.82%) |Others=0.25 (7.17%)|CurSamplesPerSec=2.31 |AvgSamplesPerSec=2.35 epoch: 0|step: 3614|ppo_ep: 1|act_loss: -0.1163330078125|cri_loss: 
-0.028167724609375|unsuper_loss: 0.0 average reward score: 2.39453125 ------------------------------------------------------------------------------------- |E2E latency=3.34s |Gather latency=0.00s (0.00%) |Generate time=2.49s (74.49%) |Training time=0.64s (19.23%) |Others=0.21 (6.28%)|CurSamplesPerSec=2.39 |AvgSamplesPerSec=2.35 epoch: 0|step: 3615|ppo_ep: 1|act_loss: -0.1507568359375|cri_loss: -0.053741455078125|unsuper_loss: 0.0 average reward score: 2.640625 ------------------------------------------------------------------------------------- |E2E latency=3.83s |Gather latency=0.00s (0.00%) |Generate time=2.58s (67.50%) |Training time=0.96s (25.02%) |Others=0.29 (7.48%)|CurSamplesPerSec=2.09 |AvgSamplesPerSec=2.35 epoch: 0|step: 3616|ppo_ep: 1|act_loss: 0.030242919921875|cri_loss: 0.0347900390625|unsuper_loss: 0.0 average reward score: 4.125 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.48s (74.71%) |Training time=0.64s (19.24%) |Others=0.20 (6.05%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.35 epoch: 0|step: 3617|ppo_ep: 1|act_loss: 0.1534423828125|cri_loss: 0.124755859375|unsuper_loss: 0.0 average reward score: 3.958984375 ------------------------------------------------------------------------------------- |E2E latency=3.41s |Gather latency=0.00s (0.00%) |Generate time=2.55s (74.77%) |Training time=0.64s (18.79%) |Others=0.22 (6.44%)|CurSamplesPerSec=2.35 |AvgSamplesPerSec=2.35 epoch: 0|step: 3618|ppo_ep: 1|act_loss: 0.17431640625|cri_loss: 0.123046875|unsuper_loss: 0.0 average reward score: 1.1298828125 ------------------------------------------------------------------------------------- |E2E latency=3.36s |Gather latency=0.00s (0.00%) |Generate time=2.46s (73.35%) |Training time=0.69s (20.57%) |Others=0.20 (6.08%)|CurSamplesPerSec=2.38 |AvgSamplesPerSec=2.35 epoch: 0|step: 3619|ppo_ep: 1|act_loss: 0.00970458984375|cri_loss: 
0.036468505859375|unsuper_loss: 0.0 average reward score: 2.58203125 ------------------------------------------------------------------------------------- |E2E latency=3.37s |Gather latency=0.00s (0.00%) |Generate time=2.49s (73.95%) |Training time=0.67s (19.91%) |Others=0.21 (6.14%)|CurSamplesPerSec=2.38 |AvgSamplesPerSec=2.35 epoch: 0|step: 3620|ppo_ep: 1|act_loss: 0.1136474609375|cri_loss: 0.0848388671875|unsuper_loss: 0.0 average reward score: 3.826171875 ------------------------------------------------------------------------------------- |E2E latency=3.46s |Gather latency=0.00s (0.00%) |Generate time=2.44s (70.58%) |Training time=0.82s (23.70%) |Others=0.20 (5.72%)|CurSamplesPerSec=2.31 |AvgSamplesPerSec=2.35 epoch: 0|step: 3621|ppo_ep: 1|act_loss: 0.12158203125|cri_loss: 0.0911865234375|unsuper_loss: 0.0 average reward score: 3.9375 ------------------------------------------------------------------------------------- |E2E latency=3.47s |Gather latency=0.00s (0.00%) |Generate time=2.44s (70.45%) |Training time=0.81s (23.31%) |Others=0.22 (6.24%)|CurSamplesPerSec=2.31 |AvgSamplesPerSec=2.35 epoch: 0|step: 3622|ppo_ep: 1|act_loss: -0.12109375|cri_loss: -0.031585693359375|unsuper_loss: 0.0 average reward score: 2.44921875 ------------------------------------------------------------------------------------- |E2E latency=3.44s |Gather latency=0.00s (0.00%) |Generate time=2.46s (71.68%) |Training time=0.77s (22.40%) |Others=0.20 (5.92%)|CurSamplesPerSec=2.33 |AvgSamplesPerSec=2.35 epoch: 0|step: 3623|ppo_ep: 1|act_loss: 0.034820556640625|cri_loss: 0.039581298828125|unsuper_loss: 0.0 average reward score: 2.826171875 ------------------------------------------------------------------------------------- |E2E latency=3.64s |Gather latency=0.00s (0.00%) |Generate time=2.42s (66.44%) |Training time=0.93s (25.67%) |Others=0.29 (7.89%)|CurSamplesPerSec=2.20 |AvgSamplesPerSec=2.35 epoch: 0|step: 3624|ppo_ep: 1|act_loss: -0.013671875|cri_loss: 0.02825927734375|unsuper_loss: 
0.0 average reward score: 3.5859375 ------------------------------------------------------------------------------------- |E2E latency=3.43s |Gather latency=0.00s (0.00%) |Generate time=2.47s (71.85%) |Training time=0.76s (22.02%) |Others=0.21 (6.13%)|CurSamplesPerSec=2.33 |AvgSamplesPerSec=2.35 epoch: 0|step: 3625|ppo_ep: 1|act_loss: 0.117431640625|cri_loss: 0.093994140625|unsuper_loss: 0.0 average reward score: 2.314453125 ------------------------------------------------------------------------------------- |E2E latency=3.34s |Gather latency=0.00s (0.00%) |Generate time=2.49s (74.48%) |Training time=0.65s (19.47%) |Others=0.20 (6.05%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.35 epoch: 0|step: 3626|ppo_ep: 1|act_loss: 0.24951171875|cri_loss: 0.1571044921875|unsuper_loss: 0.0 average reward score: 3.96875 ------------------------------------------------------------------------------------- |E2E latency=3.48s |Gather latency=0.00s (0.00%) |Generate time=2.61s (75.09%) |Training time=0.65s (18.61%) |Others=0.22 (6.30%)|CurSamplesPerSec=2.30 |AvgSamplesPerSec=2.35 epoch: 0|step: 3627|ppo_ep: 1|act_loss: 0.217529296875|cri_loss: 0.138427734375|unsuper_loss: 0.0 average reward score: 3.6953125 ------------------------------------------------------------------------------------- |E2E latency=3.37s |Gather latency=0.00s (0.00%) |Generate time=2.53s (75.04%) |Training time=0.64s (19.08%) |Others=0.20 (5.88%)|CurSamplesPerSec=2.38 |AvgSamplesPerSec=2.35 epoch: 0|step: 3628|ppo_ep: 1|act_loss: 0.011932373046875|cri_loss: 0.028656005859375|unsuper_loss: 0.0 average reward score: 1.9501953125 ------------------------------------------------------------------------------------- |E2E latency=3.38s |Gather latency=0.00s (0.00%) |Generate time=2.52s (74.55%) |Training time=0.64s (18.94%) |Others=0.22 (6.51%)|CurSamplesPerSec=2.37 |AvgSamplesPerSec=2.35 epoch: 0|step: 3629|ppo_ep: 1|act_loss: 0.1785888671875|cri_loss: 0.1326904296875|unsuper_loss: 0.0 average reward score: 
3.8828125 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.49s (75.00%) |Training time=0.64s (19.22%) |Others=0.19 (5.78%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.35 epoch: 0|step: 3630|ppo_ep: 1|act_loss: 0.31640625|cri_loss: 0.2008056640625|unsuper_loss: 0.0 average reward score: 2.234375 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.04%) |Training time=0.64s (19.79%) |Others=0.20 (6.18%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.35 epoch: 0|step: 3631|ppo_ep: 1|act_loss: -0.1990966796875|cri_loss: -0.0638427734375|unsuper_loss: 0.0 average reward score: 3.013671875 ------------------------------------------------------------------------------------- |E2E latency=3.54s |Gather latency=0.00s (0.00%) |Generate time=2.34s (66.30%) |Training time=0.92s (25.97%) |Others=0.27 (7.73%)|CurSamplesPerSec=2.26 |AvgSamplesPerSec=2.35 epoch: 0|step: 3632|ppo_ep: 1|act_loss: -0.01611328125|cri_loss: 0.020355224609375|unsuper_loss: 0.0 average reward score: 3.21875 ------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.38s (74.31%) |Training time=0.63s (19.80%) |Others=0.19 (5.90%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.35 epoch: 0|step: 3633|ppo_ep: 1|act_loss: 0.105712890625|cri_loss: 0.08233642578125|unsuper_loss: 0.0 average reward score: 3.43359375 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.38s (73.39%) |Training time=0.64s (19.65%) |Others=0.23 (6.96%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.35 epoch: 0|step: 3634|ppo_ep: 1|act_loss: 0.13427734375|cri_loss: 0.100830078125|unsuper_loss: 0.0 average reward score: 4.22265625 
------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.47s (74.63%) |Training time=0.64s (19.42%) |Others=0.20 (5.96%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.35 epoch: 0|step: 3635|ppo_ep: 1|act_loss: 0.0347900390625|cri_loss: 0.06396484375|unsuper_loss: 0.0 average reward score: 4.4375 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.40%) |Training time=0.65s (19.76%) |Others=0.19 (5.84%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.35 epoch: 0|step: 3636|ppo_ep: 1|act_loss: 0.1710205078125|cri_loss: 0.1241455078125|unsuper_loss: 0.0 average reward score: 1.61328125 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.22%) |Training time=0.64s (19.72%) |Others=0.20 (6.06%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.35 epoch: 0|step: 3637|ppo_ep: 1|act_loss: 0.0269317626953125|cri_loss: 0.04815673828125|unsuper_loss: 0.0 average reward score: 4.6328125 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.04%) |Training time=0.65s (19.86%) |Others=0.20 (6.11%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.35 epoch: 0|step: 3638|ppo_ep: 1|act_loss: 0.096923828125|cri_loss: 0.10076904296875|unsuper_loss: 0.0 average reward score: 2.921875 ------------------------------------------------------------------------------------- |E2E latency=3.45s |Gather latency=0.00s (0.00%) |Generate time=2.59s (75.15%) |Training time=0.65s (18.90%) |Others=0.21 (5.95%)|CurSamplesPerSec=2.32 |AvgSamplesPerSec=2.35 epoch: 0|step: 3639|ppo_ep: 1|act_loss: -0.023345947265625|cri_loss: 0.016387939453125|unsuper_loss: 0.0 average reward score: -0.000244140625 
------------------------------------------------------------------------------------- |E2E latency=3.76s |Gather latency=0.00s (0.00%) |Generate time=2.53s (67.10%) |Training time=0.94s (24.93%) |Others=0.30 (7.98%)|CurSamplesPerSec=2.13 |AvgSamplesPerSec=2.35 epoch: 0|step: 3640|ppo_ep: 1|act_loss: 0.1400146484375|cri_loss: 0.10198974609375|unsuper_loss: 0.0 average reward score: 3.681640625 ------------------------------------------------------------------------------------- |E2E latency=3.43s |Gather latency=0.00s (0.00%) |Generate time=2.59s (75.52%) |Training time=0.64s (18.81%) |Others=0.19 (5.67%)|CurSamplesPerSec=2.33 |AvgSamplesPerSec=2.35 epoch: 0|step: 3641|ppo_ep: 1|act_loss: -0.03155517578125|cri_loss: 0.01043701171875|unsuper_loss: 0.0 average reward score: 2.3828125 ------------------------------------------------------------------------------------- |E2E latency=3.46s |Gather latency=0.00s (0.00%) |Generate time=2.56s (74.15%) |Training time=0.69s (19.98%) |Others=0.20 (5.87%)|CurSamplesPerSec=2.31 |AvgSamplesPerSec=2.35 epoch: 0|step: 3642|ppo_ep: 1|act_loss: 0.02392578125|cri_loss: 0.060882568359375|unsuper_loss: 0.0 average reward score: 2.509765625 ------------------------------------------------------------------------------------- |E2E latency=3.54s |Gather latency=0.00s (0.00%) |Generate time=2.50s (70.52%) |Training time=0.84s (23.75%) |Others=0.20 (5.72%)|CurSamplesPerSec=2.26 |AvgSamplesPerSec=2.35 epoch: 0|step: 3643|ppo_ep: 1|act_loss: 0.419921875|cri_loss: 0.26904296875|unsuper_loss: 0.0 average reward score: 2.0625 ------------------------------------------------------------------------------------- |E2E latency=3.44s |Gather latency=0.00s (0.00%) |Generate time=2.57s (74.84%) |Training time=0.65s (18.89%) |Others=0.22 (6.27%)|CurSamplesPerSec=2.33 |AvgSamplesPerSec=2.35 epoch: 0|step: 3644|ppo_ep: 1|act_loss: -0.00897216796875|cri_loss: 0.019134521484375|unsuper_loss: 0.0 average reward score: 3.59765625 
------------------------------------------------------------------------------------- |E2E latency=3.46s |Gather latency=0.00s (0.00%) |Generate time=2.52s (72.87%) |Training time=0.71s (20.41%) |Others=0.23 (6.71%)|CurSamplesPerSec=2.31 |AvgSamplesPerSec=2.35 epoch: 0|step: 3645|ppo_ep: 1|act_loss: -0.06854248046875|cri_loss: -0.00128173828125|unsuper_loss: 0.0 average reward score: 3.67578125 ------------------------------------------------------------------------------------- |E2E latency=3.39s |Gather latency=0.00s (0.00%) |Generate time=2.52s (74.44%) |Training time=0.66s (19.40%) |Others=0.21 (6.17%)|CurSamplesPerSec=2.36 |AvgSamplesPerSec=2.35 epoch: 0|step: 3646|ppo_ep: 1|act_loss: 0.3046875|cri_loss: 0.1922607421875|unsuper_loss: 0.0 average reward score: 2.828125 ------------------------------------------------------------------------------------- |E2E latency=3.49s |Gather latency=0.00s (0.00%) |Generate time=2.65s (75.93%) |Training time=0.64s (18.40%) |Others=0.20 (5.67%)|CurSamplesPerSec=2.29 |AvgSamplesPerSec=2.35 epoch: 0|step: 3647|ppo_ep: 1|act_loss: -0.0615234375|cri_loss: -0.0155181884765625|unsuper_loss: 0.0 average reward score: 4.46875 ------------------------------------------------------------------------------------- |E2E latency=3.60s |Gather latency=0.00s (0.00%) |Generate time=2.40s (66.57%) |Training time=0.93s (25.71%) |Others=0.28 (7.72%)|CurSamplesPerSec=2.22 |AvgSamplesPerSec=2.35 epoch: 0|step: 3648|ppo_ep: 1|act_loss: 0.071044921875|cri_loss: 0.063232421875|unsuper_loss: 0.0 average reward score: 3.41796875 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.27%) |Training time=0.64s (19.79%) |Others=0.19 (5.95%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.35 epoch: 0|step: 3649|ppo_ep: 1|act_loss: -0.11962890625|cri_loss: -0.03790283203125|unsuper_loss: 0.0 average reward score: 3.07421875 
------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.41s (73.64%) |Training time=0.67s (20.50%) |Others=0.19 (5.87%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.35 epoch: 0|step: 3650|ppo_ep: 1|act_loss: 0.0703125|cri_loss: 0.07537841796875|unsuper_loss: 0.0 average reward score: 3.591796875 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.00%) |Training time=0.64s (19.87%) |Others=0.20 (6.12%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.35 epoch: 0|step: 3651|ppo_ep: 1|act_loss: 0.1722412109375|cri_loss: 0.13232421875|unsuper_loss: 0.0 average reward score: 2.15234375 ------------------------------------------------------------------------------------- |E2E latency=3.29s |Gather latency=0.00s (0.00%) |Generate time=2.45s (74.62%) |Training time=0.64s (19.53%) |Others=0.19 (5.85%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.35 epoch: 0|step: 3652|ppo_ep: 1|act_loss: 0.21435546875|cri_loss: 0.14990234375|unsuper_loss: 0.0 average reward score: 3.4296875 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.39s (73.53%) |Training time=0.64s (19.75%) |Others=0.22 (6.72%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.35 epoch: 0|step: 3653|ppo_ep: 1|act_loss: 0.1697998046875|cri_loss: 0.1158447265625|unsuper_loss: 0.0 average reward score: 3.76953125 ------------------------------------------------------------------------------------- |E2E latency=3.19s |Gather latency=0.00s (0.00%) |Generate time=2.35s (73.76%) |Training time=0.64s (20.13%) |Others=0.19 (6.11%)|CurSamplesPerSec=2.51 |AvgSamplesPerSec=2.35 epoch: 0|step: 3654|ppo_ep: 1|act_loss: 0.07470703125|cri_loss: 0.06304931640625|unsuper_loss: 0.0 average reward score: 1.458984375 
------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.01%) |Training time=0.64s (19.71%) |Others=0.20 (6.28%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.35 epoch: 0|step: 3655|ppo_ep: 1|act_loss: -0.0888671875|cri_loss: -0.01348876953125|unsuper_loss: 0.0 average reward score: 4.703125 ------------------------------------------------------------------------------------- |E2E latency=3.65s |Gather latency=0.00s (0.00%) |Generate time=2.44s (66.97%) |Training time=0.92s (25.34%) |Others=0.28 (7.69%)|CurSamplesPerSec=2.19 |AvgSamplesPerSec=2.35 epoch: 0|step: 3656|ppo_ep: 1|act_loss: 0.049774169921875|cri_loss: 0.04339599609375|unsuper_loss: 0.0 average reward score: 3.306640625 ------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.36s (73.79%) |Training time=0.64s (20.10%) |Others=0.20 (6.11%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.35 epoch: 0|step: 3657|ppo_ep: 1|act_loss: 0.25|cri_loss: 0.15478515625|unsuper_loss: 0.0 average reward score: 2.59765625 ------------------------------------------------------------------------------------- |E2E latency=3.17s |Gather latency=0.00s (0.00%) |Generate time=2.33s (73.43%) |Training time=0.65s (20.39%) |Others=0.20 (6.18%)|CurSamplesPerSec=2.53 |AvgSamplesPerSec=2.35 epoch: 0|step: 3658|ppo_ep: 1|act_loss: -0.0843505859375|cri_loss: -0.015289306640625|unsuper_loss: 0.0 average reward score: 4.34765625 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.36s (73.19%) |Training time=0.66s (20.42%) |Others=0.21 (6.39%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.35 epoch: 0|step: 3659|ppo_ep: 1|act_loss: -0.047637939453125|cri_loss: -0.0013427734375|unsuper_loss: 0.0 average reward score: 3.220703125 
------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.26%) |Training time=0.65s (19.76%) |Others=0.20 (5.99%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.35 epoch: 0|step: 3660|ppo_ep: 1|act_loss: -0.07672119140625|cri_loss: -0.01312255859375|unsuper_loss: 0.0 average reward score: 4.08203125 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.27%) |Training time=0.64s (19.64%) |Others=0.20 (6.09%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.35 epoch: 0|step: 3661|ppo_ep: 1|act_loss: -0.100341796875|cri_loss: -0.034271240234375|unsuper_loss: 0.0 average reward score: 3.51953125 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.38%) |Training time=0.65s (19.69%) |Others=0.19 (5.93%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.35 epoch: 0|step: 3662|ppo_ep: 1|act_loss: -0.0697021484375|cri_loss: -0.015533447265625|unsuper_loss: 0.0 average reward score: 4.05078125 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.03%) |Training time=0.64s (19.84%) |Others=0.20 (6.13%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.35 epoch: 0|step: 3663|ppo_ep: 1|act_loss: 0.37890625|cri_loss: 0.23583984375|unsuper_loss: 0.0 average reward score: 2.796875 ------------------------------------------------------------------------------------- |E2E latency=3.59s |Gather latency=0.00s (0.00%) |Generate time=2.39s (66.41%) |Training time=0.93s (25.77%) |Others=0.28 (7.82%)|CurSamplesPerSec=2.23 |AvgSamplesPerSec=2.35 epoch: 0|step: 3664|ppo_ep: 1|act_loss: 0.176025390625|cri_loss: 0.1156005859375|unsuper_loss: 0.0 average reward score: 4.05859375 
------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.44%) |Training time=0.63s (19.69%) |Others=0.19 (5.87%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.35 epoch: 0|step: 3665|ppo_ep: 1|act_loss: 0.02239990234375|cri_loss: 0.03582763671875|unsuper_loss: 0.0 average reward score: 3.740234375 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.38s (74.10%) |Training time=0.64s (19.88%) |Others=0.19 (6.02%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.35 epoch: 0|step: 3666|ppo_ep: 1|act_loss: -0.281982421875|cri_loss: -0.10919189453125|unsuper_loss: 0.0 average reward score: 3.533203125 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.39s (73.93%) |Training time=0.64s (19.93%) |Others=0.20 (6.13%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.35 epoch: 0|step: 3667|ppo_ep: 1|act_loss: 0.10784912109375|cri_loss: 0.06982421875|unsuper_loss: 0.0 average reward score: 3.091796875 ------------------------------------------------------------------------------------- |E2E latency=4.13s |Gather latency=0.00s (0.00%) |Generate time=2.71s (65.56%) |Training time=1.22s (29.48%) |Others=0.21 (4.97%)|CurSamplesPerSec=1.94 |AvgSamplesPerSec=2.35 epoch: 0|step: 3668|ppo_ep: 1|act_loss: -0.18994140625|cri_loss: -0.0780029296875|unsuper_loss: 0.0 average reward score: 3.6875 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.14%) |Training time=0.64s (19.82%) |Others=0.20 (6.04%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.35 epoch: 0|step: 3669|ppo_ep: 1|act_loss: 0.26025390625|cri_loss: 0.1669921875|unsuper_loss: 0.0 average reward score: 1.810546875 
------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.34%) |Training time=0.64s (19.73%) |Others=0.19 (5.93%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.35 epoch: 0|step: 3670|ppo_ep: 1|act_loss: 0.15380859375|cri_loss: 0.0989990234375|unsuper_loss: 0.0 average reward score: 2.3125 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.12%) |Training time=0.64s (19.83%) |Others=0.20 (6.05%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.35 epoch: 0|step: 3671|ppo_ep: 1|act_loss: 0.23974609375|cri_loss: 0.1392822265625|unsuper_loss: 0.0 average reward score: 3.21875 ------------------------------------------------------------------------------------- |E2E latency=3.60s |Gather latency=0.00s (0.00%) |Generate time=2.40s (66.61%) |Training time=0.93s (25.79%) |Others=0.27 (7.60%)|CurSamplesPerSec=2.22 |AvgSamplesPerSec=2.35 epoch: 0|step: 3672|ppo_ep: 1|act_loss: 0.1943359375|cri_loss: 0.1209716796875|unsuper_loss: 0.0 average reward score: 3.546875 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.38s (74.05%) |Training time=0.64s (19.87%) |Others=0.20 (6.08%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.35 epoch: 0|step: 3673|ppo_ep: 1|act_loss: 0.170166015625|cri_loss: 0.1298828125|unsuper_loss: 0.0 average reward score: 1.5830078125 ------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.37s (74.09%) |Training time=0.64s (19.96%) |Others=0.19 (5.95%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.35 epoch: 0|step: 3674|ppo_ep: 1|act_loss: 0.035888671875|cri_loss: 0.04901123046875|unsuper_loss: 0.0 average reward score: 3.6171875 
------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.38s (73.91%) |Training time=0.64s (19.98%) |Others=0.20 (6.11%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.35 epoch: 0|step: 3675|ppo_ep: 1|act_loss: -0.32470703125|cri_loss: -0.122314453125|unsuper_loss: 0.0 average reward score: 3.70703125 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.20%) |Training time=0.64s (19.88%) |Others=0.19 (5.92%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.35 epoch: 0|step: 3676|ppo_ep: 1|act_loss: 0.024627685546875|cri_loss: 0.042388916015625|unsuper_loss: 0.0 average reward score: 4.69140625 ------------------------------------------------------------------------------------- |E2E latency=3.17s |Gather latency=0.00s (0.00%) |Generate time=2.32s (73.25%) |Training time=0.64s (20.05%) |Others=0.21 (6.70%)|CurSamplesPerSec=2.52 |AvgSamplesPerSec=2.35 epoch: 0|step: 3677|ppo_ep: 1|act_loss: 0.1468505859375|cri_loss: 0.1304931640625|unsuper_loss: 0.0 average reward score: 3.435546875 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.26%) |Training time=0.64s (19.78%) |Others=0.19 (5.96%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.35 epoch: 0|step: 3678|ppo_ep: 1|act_loss: 0.051849365234375|cri_loss: 0.05096435546875|unsuper_loss: 0.0 average reward score: 3.302734375 ------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.35s (73.51%) |Training time=0.64s (20.13%) |Others=0.20 (6.36%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.35 [2023-04-24 17:16:14,567] [INFO] [logging.py:96:log_dist] [Rank 0] step=460, skipped=5, lr=[7.408247352072008e-08, 7.408247352072008e-08], mom=[(0.9, 
0.95), (0.9, 0.95)] [2023-04-24 17:16:14,813] [INFO] [timer.py:199:stop] epoch=0/micro_step=3680/global_step=460, RunningAvgSamplesPerSec=15.262481084202106, CurrSamplesPerSec=15.805546330983304, MemAllocated=20.44GB, MaxMemAllocated=31.45GB [2023-04-24 17:16:15,012] [INFO] [logging.py:96:log_dist] [Rank 0] step=460, skipped=6, lr=[4.211688449084123e-08, 4.211688449084123e-08], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 3679|ppo_ep: 1|act_loss: 0.1318359375|cri_loss: 0.1199951171875|unsuper_loss: 0.0 average reward score: 4.2734375 ------------------------------------------------------------------------------------- |E2E latency=3.51s |Gather latency=0.00s (0.00%) |Generate time=2.30s (65.43%) |Training time=0.94s (26.78%) |Others=0.27 (7.80%)|CurSamplesPerSec=2.28 |AvgSamplesPerSec=2.35 epoch: 0|step: 3680|ppo_ep: 1|act_loss: 0.031646728515625|cri_loss: 0.042877197265625|unsuper_loss: 0.0 average reward score: 3.140625 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.37s (73.39%) |Training time=0.66s (20.36%) |Others=0.20 (6.25%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.35 epoch: 0|step: 3681|ppo_ep: 1|act_loss: -0.1036376953125|cri_loss: -0.01861572265625|unsuper_loss: 0.0 average reward score: 2.7734375 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.51%) |Training time=0.64s (19.62%) |Others=0.19 (5.87%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.35 epoch: 0|step: 3682|ppo_ep: 1|act_loss: -0.002838134765625|cri_loss: 0.016815185546875|unsuper_loss: 0.0 average reward score: 4.2421875 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.37s (73.79%) |Training time=0.64s (20.07%) |Others=0.20 (6.14%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.35 epoch: 
0|step: 3683|ppo_ep: 1|act_loss: 0.009521484375|cri_loss: 0.032958984375|unsuper_loss: 0.0 average reward score: 2.41015625 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.21%) |Training time=0.65s (19.71%) |Others=0.20 (6.08%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.35 epoch: 0|step: 3684|ppo_ep: 1|act_loss: 0.338134765625|cri_loss: 0.2098388671875|unsuper_loss: 0.0 average reward score: 2.404296875 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.42s (74.20%) |Training time=0.64s (19.69%) |Others=0.20 (6.11%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.35 epoch: 0|step: 3685|ppo_ep: 1|act_loss: -0.00396728515625|cri_loss: 0.0419921875|unsuper_loss: 0.0 average reward score: 3.341796875 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.37s (73.81%) |Training time=0.65s (20.18%) |Others=0.19 (6.01%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.35 epoch: 0|step: 3686|ppo_ep: 1|act_loss: 0.0697021484375|cri_loss: 0.05816650390625|unsuper_loss: 0.0 average reward score: 3.1640625 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.02%) |Training time=0.65s (19.96%) |Others=0.19 (6.02%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.35 epoch: 0|step: 3687|ppo_ep: 1|act_loss: -0.097900390625|cri_loss: -0.028533935546875|unsuper_loss: 0.0 average reward score: 2.24609375 ------------------------------------------------------------------------------------- |E2E latency=3.58s |Gather latency=0.00s (0.00%) |Generate time=2.38s (66.47%) |Training time=0.93s (25.90%) |Others=0.27 (7.63%)|CurSamplesPerSec=2.23 |AvgSamplesPerSec=2.35 epoch: 0|step: 3688|ppo_ep: 
1|act_loss: -0.067626953125|cri_loss: -0.014617919921875|unsuper_loss: 0.0 average reward score: 3.841796875 ------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.37s (74.23%) |Training time=0.64s (19.87%) |Others=0.19 (5.91%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.35 epoch: 0|step: 3689|ppo_ep: 1|act_loss: -0.05609130859375|cri_loss: -0.0069580078125|unsuper_loss: 0.0 average reward score: 4.28125 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.13%) |Training time=0.64s (19.90%) |Others=0.19 (5.97%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.35 epoch: 0|step: 3690|ppo_ep: 1|act_loss: 0.073974609375|cri_loss: 0.04998779296875|unsuper_loss: 0.0 average reward score: 1.66796875 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.16%) |Training time=0.64s (19.82%) |Others=0.19 (6.02%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.35 epoch: 0|step: 3691|ppo_ep: 1|act_loss: -0.0628662109375|cri_loss: -0.006683349609375|unsuper_loss: 0.0 average reward score: 2.93359375 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.38s (74.03%) |Training time=0.64s (19.87%) |Others=0.20 (6.10%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.35 epoch: 0|step: 3692|ppo_ep: 1|act_loss: -0.00128173828125|cri_loss: 0.0265350341796875|unsuper_loss: 0.0 average reward score: 3.78125 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.38s (73.91%) |Training time=0.65s (20.03%) |Others=0.20 (6.06%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.35 epoch: 0|step: 3693|ppo_ep: 1|act_loss: 
0.0849609375|cri_loss: 0.06988525390625|unsuper_loss: 0.0 average reward score: 4.3828125 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.21%) |Training time=0.64s (19.86%) |Others=0.19 (5.93%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.35 epoch: 0|step: 3694|ppo_ep: 1|act_loss: -0.0196533203125|cri_loss: 0.024169921875|unsuper_loss: 0.0 average reward score: 3.671875 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.12%) |Training time=0.64s (19.84%) |Others=0.19 (6.04%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.35 epoch: 0|step: 3695|ppo_ep: 1|act_loss: -0.04022216796875|cri_loss: -0.007110595703125|unsuper_loss: 0.0 average reward score: 2.73828125 ------------------------------------------------------------------------------------- |E2E latency=3.59s |Gather latency=0.00s (0.00%) |Generate time=2.39s (66.66%) |Training time=0.92s (25.68%) |Others=0.28 (7.66%)|CurSamplesPerSec=2.23 |AvgSamplesPerSec=2.35 epoch: 0|step: 3696|ppo_ep: 1|act_loss: 0.126220703125|cri_loss: 0.09454345703125|unsuper_loss: 0.0 average reward score: 2.90625 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.39%) |Training time=0.64s (19.75%) |Others=0.19 (5.86%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.35 epoch: 0|step: 3697|ppo_ep: 1|act_loss: 0.0029296875|cri_loss: 0.02032470703125|unsuper_loss: 0.0 average reward score: 4.2109375 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.13%) |Training time=0.64s (19.77%) |Others=0.20 (6.10%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.35 epoch: 0|step: 3698|ppo_ep: 1|act_loss: 0.171875|cri_loss: 
0.122802734375|unsuper_loss: 0.0 average reward score: 3.126953125 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.04%) |Training time=0.64s (19.88%) |Others=0.20 (6.07%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.35 epoch: 0|step: 3699|ppo_ep: 1|act_loss: 0.2269287109375|cri_loss: 0.1400146484375|unsuper_loss: 0.0 average reward score: 3.02734375 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.17%) |Training time=0.64s (19.85%) |Others=0.19 (5.99%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.35 epoch: 0|step: 3700|ppo_ep: 1|act_loss: -0.29296875|cri_loss: -0.1182861328125|unsuper_loss: 0.0 average reward score: 3.955078125 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.08%) |Training time=0.64s (19.83%) |Others=0.20 (6.09%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.35 epoch: 0|step: 3701|ppo_ep: 1|act_loss: -0.09051513671875|cri_loss: -0.03485107421875|unsuper_loss: 0.0 average reward score: 3.0 ------------------------------------------------------------------------------------- |E2E latency=3.12s |Gather latency=0.00s (0.00%) |Generate time=2.29s (73.28%) |Training time=0.64s (20.58%) |Others=0.19 (6.13%)|CurSamplesPerSec=2.56 |AvgSamplesPerSec=2.35 epoch: 0|step: 3702|ppo_ep: 1|act_loss: 0.43115234375|cri_loss: 0.255859375|unsuper_loss: 0.0 average reward score: 2.9140625 ------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.37s (74.09%) |Training time=0.63s (19.85%) |Others=0.19 (6.06%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.35 epoch: 0|step: 3703|ppo_ep: 1|act_loss: 0.265625|cri_loss: 0.151123046875|unsuper_loss: 0.0 average reward 
score: 3.75390625 ------------------------------------------------------------------------------------- |E2E latency=3.58s |Gather latency=0.00s (0.00%) |Generate time=2.38s (66.39%) |Training time=0.93s (25.86%) |Others=0.28 (7.76%)|CurSamplesPerSec=2.23 |AvgSamplesPerSec=2.35 epoch: 0|step: 3704|ppo_ep: 1|act_loss: 0.195556640625|cri_loss: 0.14404296875|unsuper_loss: 0.0 average reward score: 1.287109375 ------------------------------------------------------------------------------------- |E2E latency=3.35s |Gather latency=0.00s (0.00%) |Generate time=2.53s (75.47%) |Training time=0.64s (18.94%) |Others=0.19 (5.59%)|CurSamplesPerSec=2.39 |AvgSamplesPerSec=2.35 epoch: 0|step: 3705|ppo_ep: 1|act_loss: 0.12939453125|cri_loss: 0.095458984375|unsuper_loss: 0.0 average reward score: 3.58203125 ------------------------------------------------------------------------------------- |E2E latency=3.18s |Gather latency=0.00s (0.00%) |Generate time=2.34s (73.69%) |Training time=0.64s (20.28%) |Others=0.19 (6.03%)|CurSamplesPerSec=2.52 |AvgSamplesPerSec=2.35 epoch: 0|step: 3706|ppo_ep: 1|act_loss: 0.0296783447265625|cri_loss: 0.046630859375|unsuper_loss: 0.0 average reward score: 3.185546875 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.37s (73.95%) |Training time=0.64s (19.93%) |Others=0.20 (6.12%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.35 epoch: 0|step: 3707|ppo_ep: 1|act_loss: 0.05517578125|cri_loss: 0.056243896484375|unsuper_loss: 0.0 average reward score: 3.0703125 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.48s (74.71%) |Training time=0.65s (19.50%) |Others=0.19 (5.78%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.35 epoch: 0|step: 3708|ppo_ep: 1|act_loss: -0.055084228515625|cri_loss: 0.001495361328125|unsuper_loss: 0.0 average reward score: 3.857421875 
------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.75%) |Training time=0.64s (19.35%) |Others=0.19 (5.90%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.35 epoch: 0|step: 3709|ppo_ep: 1|act_loss: 0.100830078125|cri_loss: 0.08270263671875|unsuper_loss: 0.0 average reward score: 3.640625 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.16%) |Training time=0.65s (19.91%) |Others=0.19 (5.93%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.35 epoch: 0|step: 3710|ppo_ep: 1|act_loss: -0.1011962890625|cri_loss: -0.02764892578125|unsuper_loss: 0.0 average reward score: 3.8203125 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.24%) |Training time=0.64s (19.79%) |Others=0.19 (5.97%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.35 epoch: 0|step: 3711|ppo_ep: 1|act_loss: 0.281005859375|cri_loss: 0.182373046875|unsuper_loss: 0.0 average reward score: 2.60546875 ------------------------------------------------------------------------------------- |E2E latency=3.61s |Gather latency=0.00s (0.00%) |Generate time=2.41s (66.86%) |Training time=0.92s (25.53%) |Others=0.27 (7.61%)|CurSamplesPerSec=2.22 |AvgSamplesPerSec=2.35 epoch: 0|step: 3712|ppo_ep: 1|act_loss: 0.08648681640625|cri_loss: 0.082275390625|unsuper_loss: 0.0 average reward score: 2.890625 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.39s (73.57%) |Training time=0.67s (20.47%) |Others=0.19 (5.95%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.35 epoch: 0|step: 3713|ppo_ep: 1|act_loss: 0.1278076171875|cri_loss: 0.0970458984375|unsuper_loss: 0.0 average reward score: 2.46875 
------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.37s (71.62%) |Training time=0.74s (22.47%) |Others=0.20 (5.90%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.35 epoch: 0|step: 3714|ppo_ep: 1|act_loss: 0.1602783203125|cri_loss: 0.110595703125|unsuper_loss: 0.0 average reward score: 3.55078125 ------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.37s (71.88%) |Training time=0.73s (22.20%) |Others=0.20 (5.92%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.35 epoch: 0|step: 3715|ppo_ep: 1|act_loss: 0.0606689453125|cri_loss: 0.05291748046875|unsuper_loss: 0.0 average reward score: 3.84765625 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.37s (73.38%) |Training time=0.67s (20.64%) |Others=0.19 (5.98%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.35 epoch: 0|step: 3716|ppo_ep: 1|act_loss: 0.415771484375|cri_loss: 0.281494140625|unsuper_loss: 0.0 average reward score: 2.810546875 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.35s (71.80%) |Training time=0.72s (21.88%) |Others=0.21 (6.33%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.35 epoch: 0|step: 3717|ppo_ep: 1|act_loss: -0.0059814453125|cri_loss: 0.02960205078125|unsuper_loss: 0.0 average reward score: 2.361328125 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.10%) |Training time=0.64s (19.82%) |Others=0.20 (6.08%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.35 epoch: 0|step: 3718|ppo_ep: 1|act_loss: -0.0684814453125|cri_loss: -0.01483154296875|unsuper_loss: 0.0 average reward score: 3.255859375 
------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.18%) |Training time=0.64s (19.76%) |Others=0.20 (6.06%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.35 epoch: 0|step: 3719|ppo_ep: 1|act_loss: 0.148193359375|cri_loss: 0.09033203125|unsuper_loss: 0.0 average reward score: 2.197265625 ------------------------------------------------------------------------------------- |E2E latency=3.61s |Gather latency=0.00s (0.00%) |Generate time=2.41s (66.74%) |Training time=0.92s (25.55%) |Others=0.28 (7.71%)|CurSamplesPerSec=2.22 |AvgSamplesPerSec=2.35 epoch: 0|step: 3720|ppo_ep: 1|act_loss: 0.29345703125|cri_loss: 0.18603515625|unsuper_loss: 0.0 average reward score: 1.3984375 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.38s (74.19%) |Training time=0.64s (19.85%) |Others=0.19 (5.96%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.35 epoch: 0|step: 3721|ppo_ep: 1|act_loss: -0.076416015625|cri_loss: -0.02154541015625|unsuper_loss: 0.0 average reward score: 2.171875 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.37s (73.91%) |Training time=0.64s (19.98%) |Others=0.20 (6.11%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.35 epoch: 0|step: 3722|ppo_ep: 1|act_loss: 0.129150390625|cri_loss: 0.10040283203125|unsuper_loss: 0.0 average reward score: 2.5625 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.46s (74.09%) |Training time=0.66s (19.89%) |Others=0.20 (6.03%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.35 epoch: 0|step: 3723|ppo_ep: 1|act_loss: 0.167236328125|cri_loss: 0.1123046875|unsuper_loss: 0.0 average reward score: 3.126953125 
------------------------------------------------------------------------------------- |E2E latency=3.36s |Gather latency=0.00s (0.00%) |Generate time=2.49s (74.09%) |Training time=0.66s (19.73%) |Others=0.21 (6.18%)|CurSamplesPerSec=2.38 |AvgSamplesPerSec=2.35 epoch: 0|step: 3724|ppo_ep: 1|act_loss: 0.0400390625|cri_loss: 0.05499267578125|unsuper_loss: 0.0 average reward score: 3.759765625 ------------------------------------------------------------------------------------- |E2E latency=3.36s |Gather latency=0.00s (0.00%) |Generate time=2.52s (75.04%) |Training time=0.64s (19.14%) |Others=0.20 (5.81%)|CurSamplesPerSec=2.38 |AvgSamplesPerSec=2.35 epoch: 0|step: 3725|ppo_ep: 1|act_loss: 0.039398193359375|cri_loss: 0.041046142578125|unsuper_loss: 0.0 average reward score: 3.6328125 ------------------------------------------------------------------------------------- |E2E latency=3.40s |Gather latency=0.00s (0.00%) |Generate time=2.54s (74.81%) |Training time=0.65s (19.06%) |Others=0.21 (6.13%)|CurSamplesPerSec=2.35 |AvgSamplesPerSec=2.35 epoch: 0|step: 3726|ppo_ep: 1|act_loss: -0.1455078125|cri_loss: -0.04278564453125|unsuper_loss: 0.0 average reward score: 3.625 ------------------------------------------------------------------------------------- |E2E latency=3.37s |Gather latency=0.00s (0.00%) |Generate time=2.53s (75.01%) |Training time=0.64s (19.06%) |Others=0.20 (5.93%)|CurSamplesPerSec=2.37 |AvgSamplesPerSec=2.35 epoch: 0|step: 3727|ppo_ep: 1|act_loss: -0.2841796875|cri_loss: -0.119384765625|unsuper_loss: 0.0 average reward score: 3.5390625 ------------------------------------------------------------------------------------- |E2E latency=3.78s |Gather latency=0.00s (0.00%) |Generate time=2.56s (67.80%) |Training time=0.93s (24.64%) |Others=0.29 (7.56%)|CurSamplesPerSec=2.12 |AvgSamplesPerSec=2.35 epoch: 0|step: 3728|ppo_ep: 1|act_loss: -0.06805419921875|cri_loss: -0.024444580078125|unsuper_loss: 0.0 average reward score: 4.328125 
------------------------------------------------------------------------------------- |E2E latency=3.40s |Gather latency=0.00s (0.00%) |Generate time=2.53s (74.48%) |Training time=0.67s (19.57%) |Others=0.20 (5.95%)|CurSamplesPerSec=2.35 |AvgSamplesPerSec=2.35 epoch: 0|step: 3729|ppo_ep: 1|act_loss: 0.282958984375|cri_loss: 0.1884765625|unsuper_loss: 0.0 average reward score: 3.666015625 ------------------------------------------------------------------------------------- |E2E latency=3.39s |Gather latency=0.00s (0.00%) |Generate time=2.55s (75.32%) |Training time=0.64s (18.96%) |Others=0.19 (5.72%)|CurSamplesPerSec=2.36 |AvgSamplesPerSec=2.35 epoch: 0|step: 3730|ppo_ep: 1|act_loss: 0.12176513671875|cri_loss: 0.10003662109375|unsuper_loss: 0.0 average reward score: 3.34375 ------------------------------------------------------------------------------------- |E2E latency=3.50s |Gather latency=0.00s (0.00%) |Generate time=2.65s (75.81%) |Training time=0.65s (18.45%) |Others=0.20 (5.73%)|CurSamplesPerSec=2.29 |AvgSamplesPerSec=2.35 epoch: 0|step: 3731|ppo_ep: 1|act_loss: 0.012481689453125|cri_loss: 0.031524658203125|unsuper_loss: 0.0 average reward score: 3.34375 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.35s (73.07%) |Training time=0.64s (20.02%) |Others=0.22 (6.91%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.35 epoch: 0|step: 3732|ppo_ep: 1|act_loss: 0.1533203125|cri_loss: 0.1014404296875|unsuper_loss: 0.0 average reward score: 4.515625 ------------------------------------------------------------------------------------- |E2E latency=3.31s |Gather latency=0.00s (0.00%) |Generate time=2.43s (73.26%) |Training time=0.64s (19.23%) |Others=0.25 (7.51%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.35 epoch: 0|step: 3733|ppo_ep: 1|act_loss: -0.013031005859375|cri_loss: 0.0102691650390625|unsuper_loss: 0.0 average reward score: 3.796875 
------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.37s (73.98%) |Training time=0.64s (19.97%) |Others=0.19 (6.05%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.35 epoch: 0|step: 3734|ppo_ep: 1|act_loss: 0.07177734375|cri_loss: 0.048004150390625|unsuper_loss: 0.0 average reward score: 4.1953125 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.40s (73.82%) |Training time=0.65s (20.03%) |Others=0.20 (6.15%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.35 epoch: 0|step: 3735|ppo_ep: 1|act_loss: 0.253173828125|cri_loss: 0.16162109375|unsuper_loss: 0.0 average reward score: 2.986328125 ------------------------------------------------------------------------------------- |E2E latency=3.58s |Gather latency=0.00s (0.00%) |Generate time=2.38s (66.49%) |Training time=0.92s (25.83%) |Others=0.27 (7.67%)|CurSamplesPerSec=2.24 |AvgSamplesPerSec=2.35 epoch: 0|step: 3736|ppo_ep: 1|act_loss: 0.0343017578125|cri_loss: 0.0369873046875|unsuper_loss: 0.0 average reward score: 2.876953125 ------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.37s (74.04%) |Training time=0.64s (19.99%) |Others=0.19 (5.97%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.35 epoch: 0|step: 3737|ppo_ep: 1|act_loss: 0.0362548828125|cri_loss: 0.057861328125|unsuper_loss: 0.0 average reward score: 3.2421875 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.06%) |Training time=0.64s (19.91%) |Others=0.19 (6.03%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.35 epoch: 0|step: 3738|ppo_ep: 1|act_loss: -0.128173828125|cri_loss: -0.0384521484375|unsuper_loss: 0.0 average reward score: 3.44140625 
------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.38s (72.04%) |Training time=0.73s (22.18%) |Others=0.19 (5.78%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.35 epoch: 0|step: 3739|ppo_ep: 1|act_loss: 0.2310791015625|cri_loss: 0.151123046875|unsuper_loss: 0.0 average reward score: 4.5703125 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.39s (73.28%) |Training time=0.64s (19.56%) |Others=0.23 (7.16%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.35 epoch: 0|step: 3740|ppo_ep: 1|act_loss: 0.052978515625|cri_loss: 0.06085205078125|unsuper_loss: 0.0 average reward score: 4.04296875 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.20%) |Training time=0.63s (19.66%) |Others=0.20 (6.15%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.35 epoch: 0|step: 3741|ppo_ep: 1|act_loss: 0.5703125|cri_loss: 0.35107421875|unsuper_loss: 0.0 average reward score: 3.181640625 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.38s (73.85%) |Training time=0.65s (20.06%) |Others=0.20 (6.09%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.35 epoch: 0|step: 3742|ppo_ep: 1|act_loss: 0.26171875|cri_loss: 0.167724609375|unsuper_loss: 0.0 average reward score: 2.701171875 ------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.37s (73.88%) |Training time=0.64s (20.03%) |Others=0.20 (6.10%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.35 epoch: 0|step: 3743|ppo_ep: 1|act_loss: 0.3359375|cri_loss: 0.231201171875|unsuper_loss: 0.0 average reward score: 2.6640625 
------------------------------------------------------------------------------------- |E2E latency=3.57s |Gather latency=0.00s (0.00%) |Generate time=2.36s (66.06%) |Training time=0.93s (26.11%) |Others=0.28 (7.83%)|CurSamplesPerSec=2.24 |AvgSamplesPerSec=2.35 epoch: 0|step: 3744|ppo_ep: 1|act_loss: 0.083740234375|cri_loss: 0.0665283203125|unsuper_loss: 0.0 average reward score: 3.390625 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.23%) |Training time=0.64s (19.83%) |Others=0.19 (5.94%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.35 epoch: 0|step: 3745|ppo_ep: 1|act_loss: 0.10540771484375|cri_loss: 0.08270263671875|unsuper_loss: 0.0 average reward score: 3.65625 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.38s (74.03%) |Training time=0.64s (19.89%) |Others=0.20 (6.08%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.35 epoch: 0|step: 3746|ppo_ep: 1|act_loss: -0.064453125|cri_loss: -0.0047607421875|unsuper_loss: 0.0 average reward score: 3.9921875 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.38s (73.77%) |Training time=0.65s (20.17%) |Others=0.20 (6.06%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.35 epoch: 0|step: 3747|ppo_ep: 1|act_loss: 0.14404296875|cri_loss: 0.094970703125|unsuper_loss: 0.0 average reward score: 3.5390625 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.18%) |Training time=0.64s (19.88%) |Others=0.19 (5.94%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.35 epoch: 0|step: 3748|ppo_ep: 1|act_loss: -0.03717041015625|cri_loss: 0.01361083984375|unsuper_loss: 0.0 average reward score: 3.36328125 
------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.33%) |Training time=0.64s (19.68%) |Others=0.20 (5.99%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.35 epoch: 0|step: 3749|ppo_ep: 1|act_loss: -0.1416015625|cri_loss: -0.049224853515625|unsuper_loss: 0.0 average reward score: 4.84375 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (73.99%) |Training time=0.65s (19.91%) |Others=0.20 (6.10%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.35 epoch: 0|step: 3750|ppo_ep: 1|act_loss: 0.431640625|cri_loss: 0.26904296875|unsuper_loss: 0.0 average reward score: 0.916015625 ------------------------------------------------------------------------------------- |E2E latency=3.33s |Gather latency=0.00s (0.00%) |Generate time=2.50s (75.07%) |Training time=0.64s (19.12%) |Others=0.19 (5.82%)|CurSamplesPerSec=2.40 |AvgSamplesPerSec=2.35 epoch: 0|step: 3751|ppo_ep: 1|act_loss: 0.01934814453125|cri_loss: 0.043487548828125|unsuper_loss: 0.0 average reward score: 3.234375 ------------------------------------------------------------------------------------- |E2E latency=3.58s |Gather latency=0.00s (0.00%) |Generate time=2.38s (66.41%) |Training time=0.93s (25.87%) |Others=0.28 (7.72%)|CurSamplesPerSec=2.23 |AvgSamplesPerSec=2.35 epoch: 0|step: 3752|ppo_ep: 1|act_loss: -0.00152587890625|cri_loss: 0.03350830078125|unsuper_loss: 0.0 average reward score: 2.40234375 ------------------------------------------------------------------------------------- |E2E latency=3.19s |Gather latency=0.00s (0.00%) |Generate time=2.36s (74.00%) |Training time=0.64s (20.01%) |Others=0.19 (5.99%)|CurSamplesPerSec=2.51 |AvgSamplesPerSec=2.35 epoch: 0|step: 3753|ppo_ep: 1|act_loss: 0.131591796875|cri_loss: 0.09075927734375|unsuper_loss: 0.0 average reward score: 3.8359375 
------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.36s (73.71%) |Training time=0.64s (20.10%) |Others=0.20 (6.19%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.35 epoch: 0|step: 3754|ppo_ep: 1|act_loss: 0.08294677734375|cri_loss: 0.07843017578125|unsuper_loss: 0.0 average reward score: 1.998046875 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.37s (73.76%) |Training time=0.65s (20.08%) |Others=0.20 (6.16%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.35 epoch: 0|step: 3755|ppo_ep: 1|act_loss: 0.08660888671875|cri_loss: 0.0830078125|unsuper_loss: 0.0 average reward score: 3.73828125 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.07%) |Training time=0.64s (19.88%) |Others=0.20 (6.05%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.35 epoch: 0|step: 3756|ppo_ep: 1|act_loss: 0.1954345703125|cri_loss: 0.1375732421875|unsuper_loss: 0.0 average reward score: 0.85986328125 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.19%) |Training time=0.64s (19.78%) |Others=0.19 (6.04%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.35 epoch: 0|step: 3757|ppo_ep: 1|act_loss: -0.19677734375|cri_loss: -0.0672607421875|unsuper_loss: 0.0 average reward score: 3.7890625 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.38s (74.13%) |Training time=0.64s (19.94%) |Others=0.19 (5.93%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.35 epoch: 0|step: 3758|ppo_ep: 1|act_loss: -0.1021728515625|cri_loss: -0.027313232421875|unsuper_loss: 0.0 average reward score: 4.546875 
------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.38s (74.01%) |Training time=0.64s (19.89%) |Others=0.20 (6.10%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.35 [2023-04-24 17:20:38,372] [INFO] [logging.py:96:log_dist] [Rank 0] step=470, skipped=5, lr=[2.036437165105328e-08, 2.036437165105328e-08], mom=[(0.9, 0.95), (0.9, 0.95)] [2023-04-24 17:20:38,618] [INFO] [timer.py:199:stop] epoch=0/micro_step=3760/global_step=470, RunningAvgSamplesPerSec=15.27062382918362, CurrSamplesPerSec=15.823832094297583, MemAllocated=20.44GB, MaxMemAllocated=31.45GB [2023-04-24 17:20:38,820] [INFO] [logging.py:96:log_dist] [Rank 0] step=470, skipped=6, lr=[1.255546227873966e-08, 1.255546227873966e-08], mom=[(0.9, 0.95), (0.9, 0.95)] epoch: 0|step: 3759|ppo_ep: 1|act_loss: 0.44580078125|cri_loss: 0.282470703125|unsuper_loss: 0.0 average reward score: 3.640625 ------------------------------------------------------------------------------------- |E2E latency=3.57s |Gather latency=0.00s (0.00%) |Generate time=2.37s (66.41%) |Training time=0.92s (25.86%) |Others=0.28 (7.73%)|CurSamplesPerSec=2.24 |AvgSamplesPerSec=2.35 epoch: 0|step: 3760|ppo_ep: 1|act_loss: -0.131103515625|cri_loss: -0.0213623046875|unsuper_loss: 0.0 average reward score: 3.7421875 ------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.36s (73.86%) |Training time=0.63s (19.84%) |Others=0.20 (6.30%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.35 epoch: 0|step: 3761|ppo_ep: 1|act_loss: 0.1583251953125|cri_loss: 0.112060546875|unsuper_loss: 0.0 average reward score: 3.0546875 ------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.36s (73.73%) |Training time=0.64s (20.04%) |Others=0.20 (6.23%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.35 
epoch: 0|step: 3762|ppo_ep: 1|act_loss: -0.071044921875|cri_loss: -0.022186279296875|unsuper_loss: 0.0 average reward score: 4.22265625 ------------------------------------------------------------------------------------- |E2E latency=3.16s |Gather latency=0.00s (0.00%) |Generate time=2.33s (73.67%) |Training time=0.64s (20.11%) |Others=0.20 (6.21%)|CurSamplesPerSec=2.53 |AvgSamplesPerSec=2.35 epoch: 0|step: 3763|ppo_ep: 1|act_loss: -0.133544921875|cri_loss: -0.038238525390625|unsuper_loss: 0.0 average reward score: 2.23046875 ------------------------------------------------------------------------------------- |E2E latency=3.19s |Gather latency=0.00s (0.00%) |Generate time=2.35s (73.72%) |Training time=0.64s (20.16%) |Others=0.19 (6.11%)|CurSamplesPerSec=2.51 |AvgSamplesPerSec=2.35 epoch: 0|step: 3764|ppo_ep: 1|act_loss: 0.16552734375|cri_loss: 0.1160888671875|unsuper_loss: 0.0 average reward score: 3.4375 ------------------------------------------------------------------------------------- |E2E latency=3.16s |Gather latency=0.00s (0.00%) |Generate time=2.32s (73.60%) |Training time=0.64s (20.16%) |Others=0.20 (6.23%)|CurSamplesPerSec=2.53 |AvgSamplesPerSec=2.35 epoch: 0|step: 3765|ppo_ep: 1|act_loss: -0.1055908203125|cri_loss: -0.027496337890625|unsuper_loss: 0.0 average reward score: 2.884765625 ------------------------------------------------------------------------------------- |E2E latency=3.17s |Gather latency=0.00s (0.00%) |Generate time=2.33s (73.63%) |Training time=0.64s (20.14%) |Others=0.20 (6.23%)|CurSamplesPerSec=2.52 |AvgSamplesPerSec=2.35 epoch: 0|step: 3766|ppo_ep: 1|act_loss: 0.028076171875|cri_loss: 0.04315185546875|unsuper_loss: 0.0 average reward score: 4.3203125 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.06%) |Training time=0.64s (19.80%) |Others=0.20 (6.14%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.35 epoch: 0|step: 
3767|ppo_ep: 1|act_loss: -0.00225830078125|cri_loss: 0.0149078369140625|unsuper_loss: 0.0 average reward score: 3.3203125 ------------------------------------------------------------------------------------- |E2E latency=3.61s |Gather latency=0.00s (0.00%) |Generate time=2.41s (66.76%) |Training time=0.92s (25.56%) |Others=0.28 (7.69%)|CurSamplesPerSec=2.21 |AvgSamplesPerSec=2.35 epoch: 0|step: 3768|ppo_ep: 1|act_loss: 0.188720703125|cri_loss: 0.115234375|unsuper_loss: 0.0 average reward score: 1.6650390625 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.19%) |Training time=0.64s (19.95%) |Others=0.19 (5.86%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.35 epoch: 0|step: 3769|ppo_ep: 1|act_loss: 0.04168701171875|cri_loss: 0.0433349609375|unsuper_loss: 0.0 average reward score: 3.515625 ------------------------------------------------------------------------------------- |E2E latency=3.15s |Gather latency=0.00s (0.00%) |Generate time=2.32s (73.64%) |Training time=0.64s (20.35%) |Others=0.19 (6.01%)|CurSamplesPerSec=2.54 |AvgSamplesPerSec=2.35 epoch: 0|step: 3770|ppo_ep: 1|act_loss: 0.380859375|cri_loss: 0.242919921875|unsuper_loss: 0.0 average reward score: 2.322265625 ------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.47s (74.75%) |Training time=0.64s (19.40%) |Others=0.19 (5.85%)|CurSamplesPerSec=2.43 |AvgSamplesPerSec=2.35 epoch: 0|step: 3771|ppo_ep: 1|act_loss: -0.02557373046875|cri_loss: 0.011505126953125|unsuper_loss: 0.0 average reward score: 2.15234375 ------------------------------------------------------------------------------------- |E2E latency=3.32s |Gather latency=0.00s (0.00%) |Generate time=2.44s (73.48%) |Training time=0.64s (19.41%) |Others=0.24 (7.11%)|CurSamplesPerSec=2.41 |AvgSamplesPerSec=2.35 epoch: 0|step: 3772|ppo_ep: 1|act_loss: 
-0.31201171875|cri_loss: -0.11468505859375|unsuper_loss: 0.0 average reward score: 3.796875 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.15%) |Training time=0.64s (19.85%) |Others=0.19 (6.00%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.35 epoch: 0|step: 3773|ppo_ep: 1|act_loss: 0.309326171875|cri_loss: 0.180908203125|unsuper_loss: 0.0 average reward score: 1.32421875 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.22%) |Training time=0.64s (19.74%) |Others=0.20 (6.04%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.35 epoch: 0|step: 3774|ppo_ep: 1|act_loss: 0.00146484375|cri_loss: 0.0183258056640625|unsuper_loss: 0.0 average reward score: 3.0390625 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.39s (73.93%) |Training time=0.64s (19.82%) |Others=0.20 (6.24%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.35 epoch: 0|step: 3775|ppo_ep: 1|act_loss: -0.05230712890625|cri_loss: 0.01300048828125|unsuper_loss: 0.0 average reward score: 3.935546875 ------------------------------------------------------------------------------------- |E2E latency=3.63s |Gather latency=0.00s (0.00%) |Generate time=2.42s (66.81%) |Training time=0.93s (25.53%) |Others=0.28 (7.65%)|CurSamplesPerSec=2.21 |AvgSamplesPerSec=2.35 epoch: 0|step: 3776|ppo_ep: 1|act_loss: 0.10528564453125|cri_loss: 0.0943603515625|unsuper_loss: 0.0 average reward score: 1.6015625 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.36%) |Training time=0.64s (19.75%) |Others=0.19 (5.89%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.35 epoch: 0|step: 3777|ppo_ep: 1|act_loss: 0.131103515625|cri_loss: 
0.08966064453125|unsuper_loss: 0.0 average reward score: 3.53515625 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.40s (74.25%) |Training time=0.64s (19.75%) |Others=0.19 (6.01%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.35 epoch: 0|step: 3778|ppo_ep: 1|act_loss: 0.135498046875|cri_loss: 0.08740234375|unsuper_loss: 0.0 average reward score: 3.47265625 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.05%) |Training time=0.64s (19.96%) |Others=0.19 (5.99%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.35 epoch: 0|step: 3779|ppo_ep: 1|act_loss: 0.1588134765625|cri_loss: 0.10894775390625|unsuper_loss: 0.0 average reward score: 3.3828125 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.38s (74.15%) |Training time=0.64s (19.94%) |Others=0.19 (5.91%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.35 epoch: 0|step: 3780|ppo_ep: 1|act_loss: -0.0445556640625|cri_loss: 0.01971435546875|unsuper_loss: 0.0 average reward score: 3.357421875 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.33%) |Training time=0.64s (19.74%) |Others=0.19 (5.93%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.35 epoch: 0|step: 3781|ppo_ep: 1|act_loss: -0.2186279296875|cri_loss: -0.086181640625|unsuper_loss: 0.0 average reward score: 3.4296875 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.38s (73.87%) |Training time=0.65s (20.13%) |Others=0.19 (6.00%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.35 epoch: 0|step: 3782|ppo_ep: 1|act_loss: 0.427978515625|cri_loss: 0.254638671875|unsuper_loss: 
0.0 average reward score: 2.11328125 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.38s (74.00%) |Training time=0.64s (19.84%) |Others=0.20 (6.16%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.35 epoch: 0|step: 3783|ppo_ep: 1|act_loss: 0.0836181640625|cri_loss: 0.07000732421875|unsuper_loss: 0.0 average reward score: 3.283203125 ------------------------------------------------------------------------------------- |E2E latency=3.65s |Gather latency=0.00s (0.00%) |Generate time=2.45s (67.11%) |Training time=0.92s (25.26%) |Others=0.28 (7.63%)|CurSamplesPerSec=2.19 |AvgSamplesPerSec=2.35 epoch: 0|step: 3784|ppo_ep: 1|act_loss: 0.212646484375|cri_loss: 0.1378173828125|unsuper_loss: 0.0 average reward score: 2.6484375 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.38s (74.17%) |Training time=0.64s (19.81%) |Others=0.19 (6.02%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.35 epoch: 0|step: 3785|ppo_ep: 1|act_loss: 0.04296875|cri_loss: 0.0372314453125|unsuper_loss: 0.0 average reward score: 3.423828125 ------------------------------------------------------------------------------------- |E2E latency=3.19s |Gather latency=0.00s (0.00%) |Generate time=2.35s (73.70%) |Training time=0.65s (20.22%) |Others=0.19 (6.07%)|CurSamplesPerSec=2.51 |AvgSamplesPerSec=2.35 epoch: 0|step: 3786|ppo_ep: 1|act_loss: 0.06390380859375|cri_loss: 0.064453125|unsuper_loss: 0.0 average reward score: 2.25 ------------------------------------------------------------------------------------- |E2E latency=3.19s |Gather latency=0.00s (0.00%) |Generate time=2.36s (73.84%) |Training time=0.64s (20.06%) |Others=0.19 (6.10%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.35 epoch: 0|step: 3787|ppo_ep: 1|act_loss: 0.320556640625|cri_loss: 0.1966552734375|unsuper_loss: 0.0 average reward score: 2.26171875 
------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.36s (72.88%) |Training time=0.64s (19.86%) |Others=0.24 (7.26%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.36 epoch: 0|step: 3788|ppo_ep: 1|act_loss: 0.2393798828125|cri_loss: 0.156494140625|unsuper_loss: 0.0 average reward score: 3.771484375 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.21%) |Training time=0.63s (19.71%) |Others=0.20 (6.08%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.36 epoch: 0|step: 3789|ppo_ep: 1|act_loss: -0.04754638671875|cri_loss: 0.000885009765625|unsuper_loss: 0.0 average reward score: 4.26171875 ------------------------------------------------------------------------------------- |E2E latency=3.20s |Gather latency=0.00s (0.00%) |Generate time=2.37s (73.98%) |Training time=0.64s (20.02%) |Others=0.19 (6.00%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.36 epoch: 0|step: 3790|ppo_ep: 1|act_loss: 0.003570556640625|cri_loss: 0.031005859375|unsuper_loss: 0.0 average reward score: 2.62109375 ------------------------------------------------------------------------------------- |E2E latency=3.30s |Gather latency=0.00s (0.00%) |Generate time=2.47s (74.83%) |Training time=0.64s (19.31%) |Others=0.19 (5.86%)|CurSamplesPerSec=2.42 |AvgSamplesPerSec=2.36 epoch: 0|step: 3791|ppo_ep: 1|act_loss: -0.281005859375|cri_loss: -0.10784912109375|unsuper_loss: 0.0 average reward score: 3.70703125 ------------------------------------------------------------------------------------- |E2E latency=3.87s |Gather latency=0.00s (0.00%) |Generate time=2.65s (68.41%) |Training time=0.94s (24.42%) |Others=0.28 (7.16%)|CurSamplesPerSec=2.07 |AvgSamplesPerSec=2.36 epoch: 0|step: 3792|ppo_ep: 1|act_loss: 0.105712890625|cri_loss: 0.068359375|unsuper_loss: 0.0 average reward score: 4.2890625 
------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.60%) |Training time=0.64s (19.54%) |Others=0.19 (5.86%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.36 epoch: 0|step: 3793|ppo_ep: 1|act_loss: 0.17138671875|cri_loss: 0.12548828125|unsuper_loss: 0.0 average reward score: 2.880859375 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.28%) |Training time=0.64s (19.80%) |Others=0.19 (5.92%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.36 epoch: 0|step: 3794|ppo_ep: 1|act_loss: -0.1162109375|cri_loss: -0.040863037109375|unsuper_loss: 0.0 average reward score: 4.1796875 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.34s (72.67%) |Training time=0.68s (20.99%) |Others=0.20 (6.34%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.36 epoch: 0|step: 3795|ppo_ep: 1|act_loss: -0.008544921875|cri_loss: 0.023895263671875|unsuper_loss: 0.0 average reward score: 2.2734375 ------------------------------------------------------------------------------------- |E2E latency=3.23s |Gather latency=0.00s (0.00%) |Generate time=2.35s (72.79%) |Training time=0.69s (21.28%) |Others=0.19 (5.94%)|CurSamplesPerSec=2.48 |AvgSamplesPerSec=2.36 epoch: 0|step: 3796|ppo_ep: 1|act_loss: -0.03375244140625|cri_loss: 0.009063720703125|unsuper_loss: 0.0 average reward score: 3.06640625 ------------------------------------------------------------------------------------- |E2E latency=3.21s |Gather latency=0.00s (0.00%) |Generate time=2.38s (74.13%) |Training time=0.64s (19.93%) |Others=0.19 (5.94%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.36 epoch: 0|step: 3797|ppo_ep: 1|act_loss: 0.06707763671875|cri_loss: 0.060394287109375|unsuper_loss: 0.0 average reward score: 2.53125 
------------------------------------------------------------------------------------- |E2E latency=3.15s |Gather latency=0.00s (0.00%) |Generate time=2.31s (73.34%) |Training time=0.65s (20.54%) |Others=0.19 (6.12%)|CurSamplesPerSec=2.54 |AvgSamplesPerSec=2.36 epoch: 0|step: 3798|ppo_ep: 1|act_loss: -0.14404296875|cri_loss: -0.0518798828125|unsuper_loss: 0.0 average reward score: 2.673828125 ------------------------------------------------------------------------------------- |E2E latency=3.19s |Gather latency=0.00s (0.00%) |Generate time=2.35s (73.55%) |Training time=0.65s (20.27%) |Others=0.20 (6.18%)|CurSamplesPerSec=2.51 |AvgSamplesPerSec=2.36 epoch: 0|step: 3799|ppo_ep: 1|act_loss: 0.05401611328125|cri_loss: 0.04803466796875|unsuper_loss: 0.0 average reward score: 1.6953125 ------------------------------------------------------------------------------------- |E2E latency=3.53s |Gather latency=0.00s (0.00%) |Generate time=2.33s (65.98%) |Training time=0.93s (26.23%) |Others=0.27 (7.79%)|CurSamplesPerSec=2.27 |AvgSamplesPerSec=2.36 epoch: 0|step: 3800|ppo_ep: 1|act_loss: 0.06182861328125|cri_loss: 0.046173095703125|unsuper_loss: 0.0 average reward score: 3.6015625 ------------------------------------------------------------------------------------- |E2E latency=3.27s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.59%) |Training time=0.64s (19.46%) |Others=0.19 (5.94%)|CurSamplesPerSec=2.45 |AvgSamplesPerSec=2.36 epoch: 0|step: 3801|ppo_ep: 1|act_loss: -0.058197021484375|cri_loss: 0.001861572265625|unsuper_loss: 0.0 average reward score: 2.2578125 ------------------------------------------------------------------------------------- |E2E latency=3.24s |Gather latency=0.00s (0.00%) |Generate time=2.38s (73.49%) |Training time=0.64s (19.69%) |Others=0.22 (6.82%)|CurSamplesPerSec=2.47 |AvgSamplesPerSec=2.36 epoch: 0|step: 3802|ppo_ep: 1|act_loss: 0.1624755859375|cri_loss: 0.1043701171875|unsuper_loss: 0.0 average reward score: 2.640625 
------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.20%) |Training time=0.64s (19.87%) |Others=0.19 (5.93%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.36 epoch: 0|step: 3803|ppo_ep: 1|act_loss: 0.170654296875|cri_loss: 0.1297607421875|unsuper_loss: 0.0 average reward score: 4.3359375 ------------------------------------------------------------------------------------- |E2E latency=3.22s |Gather latency=0.00s (0.00%) |Generate time=2.39s (74.21%) |Training time=0.64s (19.91%) |Others=0.19 (5.88%)|CurSamplesPerSec=2.49 |AvgSamplesPerSec=2.36 epoch: 0|step: 3804|ppo_ep: 1|act_loss: 0.014923095703125|cri_loss: 0.0272674560546875|unsuper_loss: 0.0 average reward score: 2.380859375 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.21%) |Training time=0.64s (19.73%) |Others=0.20 (6.06%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.36 epoch: 0|step: 3805|ppo_ep: 1|act_loss: 0.156005859375|cri_loss: 0.0946044921875|unsuper_loss: 0.0 average reward score: 3.0 ------------------------------------------------------------------------------------- |E2E latency=3.16s |Gather latency=0.00s (0.00%) |Generate time=2.28s (72.11%) |Training time=0.69s (21.80%) |Others=0.19 (6.09%)|CurSamplesPerSec=2.53 |AvgSamplesPerSec=2.36 epoch: 0|step: 3806|ppo_ep: 1|act_loss: -0.113525390625|cri_loss: -0.033905029296875|unsuper_loss: 0.0 average reward score: 2.67578125 ------------------------------------------------------------------------------------- |E2E latency=3.19s |Gather latency=0.00s (0.00%) |Generate time=2.36s (73.77%) |Training time=0.64s (20.02%) |Others=0.20 (6.21%)|CurSamplesPerSec=2.50 |AvgSamplesPerSec=2.36 epoch: 0|step: 3807|ppo_ep: 1|act_loss: 0.21337890625|cri_loss: 0.1571044921875|unsuper_loss: 0.0 average reward score: 3.8515625 
------------------------------------------------------------------------------------- |E2E latency=3.56s |Gather latency=0.00s (0.00%) |Generate time=2.36s (66.26%) |Training time=0.92s (25.97%) |Others=0.28 (7.77%)|CurSamplesPerSec=2.25 |AvgSamplesPerSec=2.36 epoch: 0|step: 3808|ppo_ep: 1|act_loss: 0.08770751953125|cri_loss: 0.08319091796875|unsuper_loss: 0.0 average reward score: 3.2890625 ------------------------------------------------------------------------------------- |E2E latency=3.16s |Gather latency=0.00s (0.00%) |Generate time=2.34s (73.82%) |Training time=0.64s (20.22%) |Others=0.19 (5.96%)|CurSamplesPerSec=2.53 |AvgSamplesPerSec=2.36 epoch: 0|step: 3809|ppo_ep: 1|act_loss: 0.07421875|cri_loss: 0.05810546875|unsuper_loss: 0.0 average reward score: 2.55078125 ------------------------------------------------------------------------------------- |E2E latency=3.25s |Gather latency=0.00s (0.00%) |Generate time=2.41s (74.30%) |Training time=0.64s (19.73%) |Others=0.19 (5.97%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.36 epoch: 0|step: 3810|ppo_ep: 1|act_loss: -0.0750732421875|cri_loss: -0.010284423828125|unsuper_loss: 0.0 average reward score: 5.375 ------------------------------------------------------------------------------------- |E2E latency=3.26s |Gather latency=0.00s (0.00%) |Generate time=2.43s (74.55%) |Training time=0.64s (19.65%) |Others=0.19 (5.79%)|CurSamplesPerSec=2.46 |AvgSamplesPerSec=2.36 epoch: 0|step: 3811|ppo_ep: 1|act_loss: 0.259765625|cri_loss: 0.158447265625|unsuper_loss: 0.0 average reward score: 2.958984375 ------------------------------------------------------------------------------------- |E2E latency=3.28s |Gather latency=0.00s (0.00%) |Generate time=2.44s (74.33%) |Training time=0.64s (19.54%) |Others=0.20 (6.13%)|CurSamplesPerSec=2.44 |AvgSamplesPerSec=2.36 epoch: 0|step: 3812|ppo_ep: 1|act_loss: 0.2265625|cri_loss: 0.14990234375|unsuper_loss: 0.0 average reward score: 1.7412109375 
-------------------------------------------------------------------------------------
saving model ...
saving model ...
[2023-04-24 17:23:37,147] [INFO] [launch.py:460:main] Process 28287 exits successfully.
[2023-04-24 17:23:45,159] [INFO] [launch.py:460:main] Process 28286 exits successfully.