2024-09-09 14:14:39.290280: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-09-09 14:14:39.308004: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-09-09 14:14:39.329151: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-09-09 14:14:39.335520: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-09-09 14:14:39.350729: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-09-09 14:14:40.598840: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
/usr/local/lib/python3.10/dist-packages/transformers/training_args.py:1525: FutureWarning: `evaluation_strategy` is deprecated and will be removed in version 4.46 of 🤗 Transformers. Use `eval_strategy` instead
  warnings.warn(
09/09/2024 14:14:42 - WARNING - __main__ - Process rank: 0, device: cuda:0, n_gpu: 1
distributed training: True, 16-bits training: False
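The oneDNN notice is informational rather than an error: with oneDNN custom ops enabled, floating-point results can differ slightly between runs because reductions may be reordered. If bit-exact reproducibility matters more than the speedup, the variable the message names can be set before TensorFlow is first imported; a minimal sketch (assuming TensorFlow is only pulled in transitively by the script's dependencies):

```python
import os

# Per the oneDNN notice above: disable oneDNN custom ops so numerical
# results do not depend on computation order. This must happen before
# TensorFlow is imported anywhere in the process.
os.environ["TF_ENABLE_ONEDNN_OPTS"] = "0"

import tensorflow as tf  # now safe: the flag is read at import time
```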
09/09/2024 14:14:42 - INFO - __main__ - Training/evaluation parameters TrainingArguments(
_n_gpu=1,
accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False},
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
batch_eval_metrics=False,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_persistent_workers=False,
dataloader_pin_memory=True,
dataloader_prefetch_factor=None,
ddp_backend=None,
ddp_broadcast_buffers=None,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=None,
disable_tqdm=False,
dispatch_batches=None,
do_eval=True,
do_predict=True,
do_train=True,
eval_accumulation_steps=None,
eval_delay=0,
eval_do_concat_batches=True,
eval_on_start=False,
eval_steps=None,
eval_strategy=epoch,
eval_use_gather_object=False,
evaluation_strategy=epoch,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
fsdp=[],
fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False},
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
gradient_accumulation_steps=2,
gradient_checkpointing=False,
gradient_checkpointing_kwargs=None,
greater_is_better=True,
group_by_length=False,
half_precision_backend=auto,
hub_always_push=False,
hub_model_id=None,
hub_private_repo=False,
hub_strategy=every_save,
hub_token=,
ignore_data_skip=False,
include_inputs_for_metrics=False,
include_num_input_tokens_seen=False,
include_tokens_per_second=False,
jit_mode_eval=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=5e-05,
length_column_name=length,
load_best_model_at_end=True,
local_rank=0,
log_level=passive,
log_level_replica=warning,
log_on_each_node=True,
logging_dir=/content/dissertation/scripts/ner/output/tb,
logging_first_step=False,
logging_nan_inf_filter=True,
logging_steps=500,
logging_strategy=steps,
lr_scheduler_kwargs={},
lr_scheduler_type=linear,
max_grad_norm=1.0,
max_steps=-1,
metric_for_best_model=f1,
mp_parameters=,
neftune_noise_alpha=None,
no_cuda=False,
num_train_epochs=10.0,
optim=adamw_torch,
optim_args=None,
optim_target_modules=None,
output_dir=/content/dissertation/scripts/ner/output,
overwrite_output_dir=True,
past_index=-1,
per_device_eval_batch_size=8,
per_device_train_batch_size=32,
prediction_loss_only=False,
push_to_hub=True,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=,
ray_scope=last,
remove_unused_columns=True,
report_to=['tensorboard'],
restore_callback_states_from_checkpoint=False,
resume_from_checkpoint=None,
run_name=/content/dissertation/scripts/ner/output,
save_on_each_node=False,
save_only_model=False,
save_safetensors=True,
save_steps=500,
save_strategy=epoch,
save_total_limit=None,
seed=42,
skip_memory_metrics=True,
split_batches=None,
tf32=None,
torch_compile=False,
torch_compile_backend=None,
torch_compile_mode=None,
torch_empty_cache_steps=None,
torchdynamo=None,
tpu_metrics_debug=False,
tpu_num_cores=None,
use_cpu=False,
use_ipex=False,
use_legacy_prediction_loop=False,
use_mps_device=False,
warmup_ratio=0.0,
warmup_steps=0,
weight_decay=0.0,
)
Downloading builder script: 0%|          | 0.00/3.62k [00:00<…]
loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--michiyasunaga--BioLinkBERT-base/snapshots/b71f5d70f063d1c8f1124070ce86f1ee463ca1fe/config.json
[INFO|configuration_utils.py:800] 2024-09-09 14:15:00,008 >> Model config BertConfig {
  "_name_or_path": "michiyasunaga/BioLinkBERT-base",
  "architectures": [
    "BertModel"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "finetuning_task": "ner",
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "id2label": {
    "0": "O",
    "1": "B-FARMACO",
    "2": "I-FARMACO"
  },
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "label2id": {
    "B-FARMACO": 1,
    "I-FARMACO": 2,
    "O": 0
  },
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "transformers_version": "4.44.2",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 28895
}
[INFO|tokenization_utils_base.py:2269] 2024-09-09 14:15:00,242 >> loading file vocab.txt from cache at /root/.cache/huggingface/hub/models--michiyasunaga--BioLinkBERT-base/snapshots/b71f5d70f063d1c8f1124070ce86f1ee463ca1fe/vocab.txt
[INFO|tokenization_utils_base.py:2269] 2024-09-09 14:15:00,242 >> loading file tokenizer.json from cache at /root/.cache/huggingface/hub/models--michiyasunaga--BioLinkBERT-base/snapshots/b71f5d70f063d1c8f1124070ce86f1ee463ca1fe/tokenizer.json
[INFO|tokenization_utils_base.py:2269] 2024-09-09 14:15:00,242 >> loading file added_tokens.json from cache at None
[INFO|tokenization_utils_base.py:2269] 2024-09-09 14:15:00,242 >> loading file special_tokens_map.json from cache at /root/.cache/huggingface/hub/models--michiyasunaga--BioLinkBERT-base/snapshots/b71f5d70f063d1c8f1124070ce86f1ee463ca1fe/special_tokens_map.json
[INFO|tokenization_utils_base.py:2269] 2024-09-09 14:15:00,242 >> loading file tokenizer_config.json from cache at /root/.cache/huggingface/hub/models--michiyasunaga--BioLinkBERT-base/snapshots/b71f5d70f063d1c8f1124070ce86f1ee463ca1fe/tokenizer_config.json
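The dump pins down the run's effective configuration: an epoch-wise evaluate-and-save loop that keeps the best-F1 checkpoint, an effective batch of 64 via gradient accumulation, and TensorBoard logging. A sketch of how these arguments are typically constructed, reproducing only the values visible above and using the `eval_strategy` spelling the earlier FutureWarning asks for:

```python
from transformers import TrainingArguments

# Only values visible in the dump above are set; everything else is left
# at its Transformers 4.44 default.
training_args = TrainingArguments(
    output_dir="/content/dissertation/scripts/ner/output",
    overwrite_output_dir=True,
    do_train=True,
    do_eval=True,
    do_predict=True,
    eval_strategy="epoch",   # replaces the deprecated `evaluation_strategy`
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="f1",
    greater_is_better=True,
    learning_rate=5e-5,
    num_train_epochs=10,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,
    seed=42,
    logging_dir="/content/dissertation/scripts/ner/output/tb",
    report_to=["tensorboard"],
    push_to_hub=True,
)
```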
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py:1601: FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be deprecated in transformers v4.45, and will then be set to `False` by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884
  warnings.warn(
[INFO|modeling_utils.py:3678] 2024-09-09 14:15:00,548 >> loading weights file pytorch_model.bin from cache at /root/.cache/huggingface/hub/models--michiyasunaga--BioLinkBERT-base/snapshots/b71f5d70f063d1c8f1124070ce86f1ee463ca1fe/pytorch_model.bin
[INFO|modeling_utils.py:4497] 2024-09-09 14:15:00,628 >> Some weights of the model checkpoint at michiyasunaga/BioLinkBERT-base were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
[WARNING|modeling_utils.py:4509] 2024-09-09 14:15:00,628 >> Some weights of BertForTokenClassification were not initialized from the model checkpoint at michiyasunaga/BioLinkBERT-base and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Map: 0%|          | 0/28668 [00:00<…]
The following columns in the training set don't have a corresponding argument in `BertForTokenClassification.forward` and have been ignored: id, tokens, ner_tags. If id, tokens, ner_tags are not expected by `BertForTokenClassification.forward`, you can safely ignore this message.
[INFO|trainer.py:2134] 2024-09-09 14:15:07,820 >> ***** Running training *****
[INFO|trainer.py:2135] 2024-09-09 14:15:07,820 >>   Num examples = 28,668
[INFO|trainer.py:2136] 2024-09-09 14:15:07,820 >>   Num Epochs = 10
[INFO|trainer.py:2137] 2024-09-09 14:15:07,820 >>   Instantaneous batch size per device = 32
[INFO|trainer.py:2140] 2024-09-09 14:15:07,820 >>   Total train batch size (w. parallel, distributed & accumulation) = 64
[INFO|trainer.py:2141] 2024-09-09 14:15:07,820 >>   Gradient Accumulation steps = 2
[INFO|trainer.py:2142] 2024-09-09 14:15:07,820 >>   Total optimization steps = 4,480
[INFO|trainer.py:2143] 2024-09-09 14:15:07,820 >>   Number of trainable parameters = 107,644,419
  0%|          | 0/4480 [00:00<…]
The following columns in the evaluation set don't have a corresponding argument in `BertForTokenClassification.forward` and have been ignored: id, tokens, ner_tags. If id, tokens, ner_tags are not expected by `BertForTokenClassification.forward`, you can safely ignore this message.
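The numbers above are mutually consistent: the effective batch is 32 per device × 2 accumulation steps × 1 GPU = 64, so one pass over the 28,668 training examples is ceil(28668 / 64) = 448 optimizer steps, and 10 epochs give the reported 4,480 total steps; the checkpoint-448 saved below is therefore the end of epoch 1. The two weight messages are likewise expected: the hub checkpoint ships a bare BertModel, so its pooler is dropped and a fresh classification head is created. A minimal sketch of that setup, with the model ID and label scheme taken from the config dump:

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Label scheme copied from the id2label/label2id entries in the config dump.
label_list = ["O", "B-FARMACO", "I-FARMACO"]
id2label = dict(enumerate(label_list))
label2id = {label: i for i, label in id2label.items()}

tokenizer = AutoTokenizer.from_pretrained("michiyasunaga/BioLinkBERT-base")
model = AutoModelForTokenClassification.from_pretrained(
    "michiyasunaga/BioLinkBERT-base",
    num_labels=len(label_list),
    id2label=id2label,
    label2id=label2id,
)
# The checkpoint contains a plain BertModel, so `bert.pooler.*` goes unused
# and `classifier.weight`/`classifier.bias` are randomly initialized --
# exactly the two loader messages in the log above.
```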
[INFO|trainer.py:3819] 2024-09-09 14:17:54,863 >> ***** Running Evaluation *****
[INFO|trainer.py:3821] 2024-09-09 14:17:54,863 >>   Num examples = 6946
[INFO|trainer.py:3824] 2024-09-09 14:17:54,863 >>   Batch size = 8
  0%|          | 0/869 [00:00<…]
Saving model checkpoint to /content/dissertation/scripts/ner/output/checkpoint-448
[INFO|configuration_utils.py:472] 2024-09-09 14:18:09,953 >> Configuration saved in /content/dissertation/scripts/ner/output/checkpoint-448/config.json
[INFO|modeling_utils.py:2799] 2024-09-09 14:18:10,847 >> Model weights saved in /content/dissertation/scripts/ner/output/checkpoint-448/model.safetensors
[INFO|tokenization_utils_base.py:2684] 2024-09-09 14:18:10,848 >> tokenizer config file saved in /content/dissertation/scripts/ner/output/checkpoint-448/tokenizer_config.json
[INFO|tokenization_utils_base.py:2693] 2024-09-09 14:18:10,848 >> Special tokens file saved in /content/dissertation/scripts/ner/output/checkpoint-448/special_tokens_map.json
[INFO|tokenization_utils_base.py:2684] 2024-09-09 14:18:13,510 >> tokenizer config file saved in /content/dissertation/scripts/ner/output/tokenizer_config.json
[INFO|tokenization_utils_base.py:2693] 2024-09-09 14:18:13,510 >> Special tokens file saved in /content/dissertation/scripts/ner/output/special_tokens_map.json
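The eval pass size also checks out: 6,946 examples at batch size 8 is ceil(6946 / 8) = 869 batches, matching the progress bar. With `metric_for_best_model=f1`, the 3.62 kB metric builder script downloaded earlier is most likely seqeval, as in the stock `run_ner.py` example. A hedged sketch of that `compute_metrics` hookup (the `evaluate`/seqeval names and the -100 label-masking convention are standard for this script family, but are assumptions here, not shown in the log):

```python
import numpy as np
import evaluate

seqeval = evaluate.load("seqeval")
label_list = ["O", "B-FARMACO", "I-FARMACO"]

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    # Drop positions labeled -100 (special tokens / non-first subwords).
    true_predictions = [
        [label_list[p] for p, l in zip(pred, lab) if l != -100]
        for pred, lab in zip(predictions, labels)
    ]
    true_labels = [
        [label_list[l] for p, l in zip(pred, lab) if l != -100]
        for pred, lab in zip(predictions, labels)
    ]
    results = seqeval.compute(predictions=true_predictions,
                              references=true_labels)
    # Trainer prefixes these keys with `eval_`, so `metric_for_best_model=f1`
    # resolves to the `eval_f1` reported after each epoch.
    return {
        "precision": results["overall_precision"],
        "recall": results["overall_recall"],
        "f1": results["overall_f1"],
        "accuracy": results["overall_accuracy"],
    }
```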
[per-step tqdm progress elided: steps 449–570 of 4480 (10–13%), elapsed 03:06–03:51, throughput recovering from the checkpoint-save stall (~6 s/it) back to a steady ~2.3–3.1 it/s]
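Because `save_strategy=epoch` wrote a complete config, weights, and tokenizer into checkpoint-448, the intermediate model can already be smoke-tested while training continues. A sketch; the checkpoint path comes from the save messages above, while the example sentence is invented here (the FARMACO label set points to Spanish-language drug-mention NER):

```python
from transformers import pipeline

# Load the end-of-epoch-1 checkpoint saved above.
ner = pipeline(
    "token-classification",
    model="/content/dissertation/scripts/ner/output/checkpoint-448",
    aggregation_strategy="simple",  # merge B-/I- pieces into whole entity spans
)

# Hypothetical input: "The patient received ibuprofen 600 mg every 8 hours."
print(ner("El paciente recibió ibuprofeno 600 mg cada 8 horas."))
```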