language:
  - en
license: apache-2.0
tags:
  - edu
  - continual pretraining
base_model: BEE-spoke-data/smol_llama-220M-GQA
datasets:
  - HuggingFaceFW/fineweb-edu
metrics:
  - accuracy
inference:
  parameters:
    max_new_tokens: 64
    do_sample: true
    temperature: 0.8
    repetition_penalty: 1.05
    no_repeat_ngram_size: 4
    eta_cutoff: 0.0006
    renormalize_logits: true
widget:
  - text: My name is El Microondas the Wise, and
    example_title: El Microondas
  - text: Kennesaw State University is a public
    example_title: Kennesaw State University
  - text: >-
      Bungie Studios is an American video game developer. They are most famous
      for developing the award winning Halo series of video games. They also
      made Destiny. The studio was founded
    example_title: Bungie
  - text: The Mona Lisa is a world-renowned painting created by
    example_title: Mona Lisa
  - text: >-
      The Harry Potter series, written by J.K. Rowling, begins with the book
      titled
    example_title: Harry Potter Series
  - text: >-
      Question: I have cities, but no houses. I have mountains, but no trees. I
      have water, but no fish. What am I?

      Answer:
    example_title: Riddle
  - text: The process of photosynthesis involves the conversion of
    example_title: Photosynthesis
  - text: >-
      Jane went to the store to buy some groceries. She picked up apples,
      oranges, and a loaf of bread. When she got home, she realized she forgot
    example_title: Story Continuation
  - text: >-
      Problem 2: If a train leaves Station A at 9:00 AM and travels at 60 mph,
      and another train leaves Station B at 10:00 AM and travels at 80 mph, when
      will they meet if the distance between the stations is 300 miles?

      To determine
    example_title: Math Problem
  - text: In the context of computer programming, an algorithm is
    example_title: Algorithm Definition
pipeline_tag: text-generation
model-index:
  - name: smol_llama-220M-GQA-fineweb_edu
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: IFEval (0-Shot)
          type: HuggingFaceH4/ifeval
          args:
            num_few_shot: 0
        metrics:
          - type: inst_level_strict_acc and prompt_level_strict_acc
            value: 19.88
            name: strict accuracy
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=BEE-spoke-data/smol_llama-220M-GQA-fineweb_edu
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: BBH (3-Shot)
          type: BBH
          args:
            num_few_shot: 3
        metrics:
          - type: acc_norm
            value: 2.31
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=BEE-spoke-data/smol_llama-220M-GQA-fineweb_edu
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MATH Lvl 5 (4-Shot)
          type: hendrycks/competition_math
          args:
            num_few_shot: 4
        metrics:
          - type: exact_match
            value: 0
            name: exact match
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=BEE-spoke-data/smol_llama-220M-GQA-fineweb_edu
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: GPQA (0-shot)
          type: Idavidrein/gpqa
          args:
            num_few_shot: 0
        metrics:
          - type: acc_norm
            value: 1.23
            name: acc_norm
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=BEE-spoke-data/smol_llama-220M-GQA-fineweb_edu
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MuSR (0-shot)
          type: TAUR-Lab/MuSR
          args:
            num_few_shot: 0
        metrics:
          - type: acc_norm
            value: 14.26
            name: acc_norm
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=BEE-spoke-data/smol_llama-220M-GQA-fineweb_edu
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MMLU-PRO (5-shot)
          type: TIGER-Lab/MMLU-Pro
          config: main
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 1.41
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=BEE-spoke-data/smol_llama-220M-GQA-fineweb_edu
          name: Open LLM Leaderboard

smol_llama-220M-GQA-fineweb-edu-10BT

This model is a continually pretrained version of BEE-spoke-data/smol_llama-220M-GQA, trained on the 10BT-sample subset of HuggingFaceFW/fineweb-edu.
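
A minimal sketch of running the model for text generation, assuming the `transformers` and `torch` packages are installed. The repository id is taken from the leaderboard URL in the metadata, and the sampling settings mirror the `inference.parameters` block there; the helper name `generate` is hypothetical and only for illustration.

```python
# Repository id as it appears in the Open LLM Leaderboard URL in the metadata.
MODEL_ID = "BEE-spoke-data/smol_llama-220M-GQA-fineweb_edu"

# Sampling settings copied from the card's `inference.parameters` block.
GENERATION_KWARGS = {
    "max_new_tokens": 64,
    "do_sample": True,
    "temperature": 0.8,
    "repetition_penalty": 1.05,
    "no_repeat_ngram_size": 4,
    "eta_cutoff": 0.0006,
    "renormalize_logits": True,
}


def generate(prompt: str) -> str:
    """Hypothetical helper: continue `prompt` with the card's sampling settings."""
    # Imported lazily so the settings above can be inspected without
    # transformers installed; the weights are downloaded on first call.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        **GENERATION_KWARGS,
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("The process of photosynthesis involves the conversion of")` should return the prompt followed by a sampled continuation. Because `do_sample` is enabled, the output differs between runs. This is a base language model, so it continues text rather than following instructions.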

It achieves the following results on the evaluation set:

  • Loss: 2.7416
  • Accuracy: 0.4560
  • Num Input Tokens Seen: 10810818560
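
The loss above can be turned into a perplexity for easier comparison with other models. The sketch below assumes the reported loss is the mean per-token cross-entropy in nats, which is the usual convention for causal language model training but is not stated on this card.

```python
import math

# Final evaluation loss reported above.
eval_loss = 2.7416

# Assumption: the loss is mean per-token cross-entropy in nats,
# so perplexity is simply its exponential.
perplexity = math.exp(eval_loss)

print(f"perplexity ~ {perplexity:.2f}")  # about 15.51
```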

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 80085
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 256
  • optimizer: Adam with betas=(0.9,0.95) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1.0
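
As a rough illustration of how these values fit together, the sketch below recomputes the effective batch size and evaluates a warmup-plus-cosine learning rate curve. It assumes a single training device (inferred from 8 × 32 = 256) and assumes linear warmup followed by cosine decay to zero; the card only states the scheduler type and warmup ratio. The function `lr_at` and its `total_steps` argument are hypothetical names for illustration.

```python
import math

# Values from the hyperparameter list above.
learning_rate = 5e-05
train_batch_size = 8
gradient_accumulation_steps = 32
warmup_ratio = 0.05

# Assumption: one device, since 8 * 32 already equals the reported 256.
num_devices = 1
total_train_batch_size = (
    train_batch_size * gradient_accumulation_steps * num_devices
)


def lr_at(step: int, total_steps: int) -> float:
    """Learning rate at `step` under linear warmup then cosine decay to zero."""
    warmup_steps = math.ceil(warmup_ratio * total_steps)
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return learning_rate * step / max(1, warmup_steps)
    # Cosine decay from the peak down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))


print(total_train_batch_size)
```

With this shape, the rate reaches its peak of 5e-05 at the end of warmup and falls to zero at the final step.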

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy | Input Tokens Seen |
|---|---|---|---|---|---|
| 2.8567 | 0.0145 | 300 | 2.8291 | 0.4450 | 157286400 |
| 2.8517 | 0.0291 | 600 | 2.8153 | 0.4465 | 314572800 |
| 2.8224 | 0.0436 | 900 | 2.8025 | 0.4481 | 471859200 |
| 2.8178 | 0.0582 | 1200 | 2.7912 | 0.4495 | 629145600 |
| 2.8001 | 0.0727 | 1500 | 2.7832 | 0.4505 | 786432000 |
| 2.8045 | 0.0873 | 1800 | 2.7772 | 0.4512 | 943718400 |
| 2.8019 | 0.1018 | 2100 | 2.7729 | 0.4516 | 1101004800 |
| 2.7995 | 0.1164 | 2400 | 2.7691 | 0.4522 | 1258291200 |
| 2.8006 | 0.1309 | 2700 | 2.7657 | 0.4526 | 1415577600 |
| 2.7886 | 0.1455 | 3000 | 2.7631 | 0.4528 | 1572864000 |
| 2.7907 | 0.1600 | 3300 | 2.7606 | 0.4532 | 1730150400 |
| 2.7907 | 0.1746 | 3600 | 2.7588 | 0.4536 | 1887436800 |
| 2.7788 | 0.1891 | 3900 | 2.7569 | 0.4537 | 2044723200 |
| 2.7942 | 0.2037 | 4200 | 2.7552 | 0.4540 | 2202009600 |
| 2.793 | 0.2182 | 4500 | 2.7538 | 0.4543 | 2359296000 |
| 2.7958 | 0.2328 | 4800 | 2.7526 | 0.4544 | 2516582400 |
| 2.78 | 0.2473 | 5100 | 2.7515 | 0.4547 | 2673868800 |
| 2.7937 | 0.2619 | 5400 | 2.7506 | 0.4548 | 2831155200 |
| 2.7717 | 0.2764 | 5700 | 2.7498 | 0.4548 | 2988441600 |
| 2.7832 | 0.2910 | 6000 | 2.7490 | 0.4548 | 3145728000 |
| 2.768 | 0.3055 | 6300 | 2.7482 | 0.4550 | 3303014400 |
| 2.7653 | 0.3201 | 6600 | 2.7476 | 0.4551 | 3460300800 |
| 2.7843 | 0.3346 | 6900 | 2.7470 | 0.4551 | 3617587200 |
| 2.7765 | 0.3492 | 7200 | 2.7464 | 0.4550 | 3774873600 |
| 2.7778 | 0.3637 | 7500 | 2.7460 | 0.4552 | 3932160000 |
| 2.7655 | 0.3783 | 7800 | 2.7455 | 0.4553 | 4089446400 |
| 2.7943 | 0.3928 | 8100 | 2.7449 | 0.4554 | 4246732800 |
| 2.7715 | 0.4074 | 8400 | 2.7447 | 0.4552 | 4404019200 |
| 2.7828 | 0.4219 | 8700 | 2.7443 | 0.4554 | 4561305600 |
| 2.7883 | 0.4365 | 9000 | 2.7440 | 0.4556 | 4718592000 |
| 2.7627 | 0.4510 | 9300 | 2.7437 | 0.4556 | 4875878400 |
| 2.7841 | 0.4656 | 9600 | 2.7435 | 0.4557 | 5033164800 |
| 2.7734 | 0.4801 | 9900 | 2.7433 | 0.4557 | 5190451200 |
| 2.7829 | 0.4947 | 10200 | 2.7430 | 0.4557 | 5347737600 |
| 2.781 | 0.5092 | 10500 | 2.7429 | 0.4557 | 5505024000 |
| 2.7757 | 0.5238 | 10800 | 2.7428 | 0.4557 | 5662310400 |
| 2.779 | 0.5383 | 11100 | 2.7426 | 0.4559 | 5819596800 |
| 2.7771 | 0.5529 | 11400 | 2.7425 | 0.4559 | 5976883200 |
| 2.7828 | 0.5674 | 11700 | 2.7424 | 0.4560 | 6134169600 |
| 2.7814 | 0.5820 | 12000 | 2.7423 | 0.4558 | 6291456000 |
| 2.7735 | 0.5965 | 12300 | 2.7422 | 0.4559 | 6448742400 |
| 2.7848 | 0.6111 | 12600 | 2.7420 | 0.4559 | 6606028800 |
| 2.7748 | 0.6256 | 12900 | 2.7420 | 0.4559 | 6763315200 |
| 2.7697 | 0.6402 | 13200 | 2.7419 | 0.4560 | 6920601600 |
| 2.7689 | 0.6547 | 13500 | 2.7419 | 0.4560 | 7077888000 |
| 2.7747 | 0.6692 | 13800 | 2.7419 | 0.4559 | 7235174400 |
| 2.786 | 0.6838 | 14100 | 2.7418 | 0.4561 | 7392460800 |
| 2.7801 | 0.6983 | 14400 | 2.7417 | 0.4560 | 7549747200 |
| 2.7658 | 0.7129 | 14700 | 2.7417 | 0.4561 | 7707033600 |
| 2.7717 | 0.7274 | 15000 | 2.7417 | 0.4560 | 7864320000 |
| 2.7717 | 0.7420 | 15300 | 2.7417 | 0.4560 | 8021606400 |
| 2.777 | 0.7565 | 15600 | 2.7417 | 0.4559 | 8178892800 |
| 2.7793 | 0.7711 | 15900 | 2.7416 | 0.4560 | 8336179200 |
| 2.7718 | 0.7856 | 16200 | 2.7416 | 0.4559 | 8493465600 |
| 2.7757 | 0.8002 | 16500 | 2.7416 | 0.4560 | 8650752000 |
| 2.7763 | 0.8147 | 16800 | 2.7416 | 0.4559 | 8808038400 |
| 2.7581 | 0.8293 | 17100 | 2.7416 | 0.4559 | 8965324800 |
| 2.7719 | 0.8438 | 17400 | 2.7416 | 0.4560 | 9122611200 |
| 2.7609 | 0.8584 | 17700 | 2.7416 | 0.4560 | 9279897600 |
| 2.7753 | 0.8729 | 18000 | 2.7416 | 0.4559 | 9437184000 |
| 2.7674 | 0.8875 | 18300 | 2.7415 | 0.4560 | 9594470400 |
| 2.7601 | 0.9020 | 18600 | 2.7416 | 0.4560 | 9751756800 |
| 2.7823 | 0.9166 | 18900 | 2.7416 | 0.4560 | 9909043200 |
| 2.7767 | 0.9311 | 19200 | 2.7416 | 0.4560 | 10066329600 |
| 2.7759 | 0.9457 | 19500 | 2.7416 | 0.4560 | 10223616000 |
| 2.7722 | 0.9602 | 19800 | 2.7415 | 0.4560 | 10380902400 |
| 2.7764 | 0.9748 | 20100 | 2.7416 | 0.4560 | 10538188800 |
| 2.7724 | 0.9893 | 20400 | 2.7416 | 0.4559 | 10695475200 |
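
The step and token columns above allow a quick consistency check. The sketch below derives tokens per optimizer step, an implied sequence length, and the implied total step count. It assumes every sequence has the same fixed length and uses the total train batch size of 256 from the hyperparameters; the sequence length itself is an inference, not a figure stated on this card.

```python
# (step, input tokens seen) pairs copied from the first two and last rows.
logged = [(300, 157286400), (600, 314572800), (20400, 10695475200)]

# Tokens consumed per optimizer step; all rows should agree.
per_step_values = {tokens // step for step, tokens in logged}
assert len(per_step_values) == 1
tokens_per_step = per_step_values.pop()

# Assumption: fixed-length sequences, 256 sequences per optimizer step.
total_train_batch_size = 256
tokens_per_sequence = tokens_per_step // total_train_batch_size

# Final "Num Input Tokens Seen" reported for the evaluation set results.
final_tokens_seen = 10810818560
total_steps = final_tokens_seen // tokens_per_step

print(tokens_per_step, tokens_per_sequence, total_steps)
# 524288 2048 20620
```

Under these assumptions the run corresponds to 20620 optimizer steps of 2048-token sequences, and step 300 lands at epoch 300 / 20620 ≈ 0.0145, matching the first row.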

Framework versions

  • Transformers 4.41.1
  • PyTorch 2.3.1+cu118
  • Datasets 2.19.1
  • Tokenizers 0.19.1

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

| Metric | Value |
|---|---|
| Avg. | 6.52 |
| IFEval (0-Shot) | 19.88 |
| BBH (3-Shot) | 2.31 |
| MATH Lvl 5 (4-Shot) | 0.00 |
| GPQA (0-shot) | 1.23 |
| MuSR (0-shot) | 14.26 |
| MMLU-PRO (5-shot) | 1.41 |
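
The reported average appears to be the unweighted mean of the six benchmark scores. A small sketch to check this, assuming equal weighting (the card does not state how the average is computed):

```python
# Benchmark scores copied from the results above.
scores = {
    "IFEval (0-Shot)": 19.88,
    "BBH (3-Shot)": 2.31,
    "MATH Lvl 5 (4-Shot)": 0.00,
    "GPQA (0-shot)": 1.23,
    "MuSR (0-shot)": 14.26,
    "MMLU-PRO (5-shot)": 1.41,
}

# Assumption: every benchmark carries equal weight.
average = sum(scores.values()) / len(scores)

print(average)
```

The mean comes to about 6.515, which is consistent with the listed 6.52 after rounding.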