louislu9911 committed on
Commit f65854a
1 Parent(s): f637744

Model save

Files changed (2)
  1. README.md +75 -0
  2. modeling_moe.py +75 -0
README.md ADDED
@@ -0,0 +1,75 @@
---
tags:
- generated_from_trainer
datasets:
- imagefolder
metrics:
- accuracy
model-index:
- name: MoE-leaf-disease-convnextv2-base-22k-224
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# MoE-leaf-disease-convnextv2-base-22k-224

This model is a fine-tuned version of [](https://huggingface.co/) on the imagefolder dataset.
It achieves the following results on the evaluation set:
- Loss: 0.3415
- Accuracy: 0.8827

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 2000
- eval_batch_size: 2000
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 8000
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 16
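These settings map onto Hugging Face `TrainingArguments` roughly as sketched below. This is a minimal sketch, not code from this repo: `output_dir` is a placeholder, the Adam betas and epsilon listed above are the optimizer defaults, and the total train batch size of 8000 is simply 2000 x 4 gradient-accumulation steps.

```python
from transformers import TrainingArguments

# Sketch of the hyperparameters listed above; output_dir is a placeholder.
training_args = TrainingArguments(
    output_dir="MoE-leaf-disease-convnextv2-base-22k-224",
    learning_rate=5e-5,
    per_device_train_batch_size=2000,
    per_device_eval_batch_size=2000,
    seed=42,
    gradient_accumulation_steps=4,  # 2000 * 4 = total train batch of 8000
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    num_train_epochs=16,
    # Adam betas=(0.9, 0.999) and epsilon=1e-08 are the defaults.
)
```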
### Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| No log        | 0.8   | 2    | 0.4614          | 0.8808   |
| No log        | 2.0   | 5    | 0.3440          | 0.8827   |
| No log        | 2.8   | 7    | 0.3441          | 0.8827   |
| 0.3155        | 4.0   | 10   | 0.3439          | 0.8827   |
| 0.3155        | 4.8   | 12   | 0.3437          | 0.8827   |
| 0.3155        | 6.0   | 15   | 0.3431          | 0.8827   |
| 0.3155        | 6.8   | 17   | 0.3426          | 0.8827   |
| 0.2577        | 8.0   | 20   | 0.3421          | 0.8827   |
| 0.2577        | 8.8   | 22   | 0.3419          | 0.8827   |
| 0.2577        | 10.0  | 25   | 0.3417          | 0.8827   |
| 0.2577        | 10.8  | 27   | 0.3416          | 0.8827   |
| 0.2601        | 12.0  | 30   | 0.3415          | 0.8827   |
| 0.2601        | 12.8  | 32   | 0.3415          | 0.8827   |

### Framework versions

- Transformers 4.39.3
- Pytorch 2.2.1
- Datasets 2.18.0
- Tokenizers 0.15.1
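Because the model is implemented with the custom code in modeling_moe.py below, pulling it from the Hub goes through `trust_remote_code`. A minimal loading sketch, assuming the repo id follows the card name under this user and that the repo's config registers the custom class (neither is confirmed by this commit):

```python
from transformers import AutoModelForImageClassification

# Assumed repo id; loading custom architectures requires trust_remote_code=True.
model = AutoModelForImageClassification.from_pretrained(
    "louislu9911/MoE-leaf-disease-convnextv2-base-22k-224",
    trust_remote_code=True,
)
```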
modeling_moe.py ADDED
@@ -0,0 +1,75 @@
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import PreTrainedModel, AutoModelForImageClassification

from .configuration_moe import MoEConfig


def subgate(num_out):
    """MLP gate over flattened 224x224 RGB inputs, emitting `num_out` scores."""
    layers = nn.Sequential(
        nn.Flatten(),
        nn.Linear(224 * 224 * 3, 1024),
        nn.ReLU(),
        nn.Linear(1024, 512),
        nn.ReLU(),
        nn.Linear(512, num_out),
    )
    return layers


class MoEModelForImageClassification(PreTrainedModel):
    config_class = MoEConfig

    def __init__(self, config):
        super().__init__(config)
        self.num_classes = config.num_classes
        self.switch_gate_model = AutoModelForImageClassification.from_pretrained(
            config.switch_gate
        )
        self.baseline_model = AutoModelForImageClassification.from_pretrained(
            config.baseline_model
        )
        self.expert_model_1 = AutoModelForImageClassification.from_pretrained(
            config.experts[0]
        )
        self.expert_model_2 = AutoModelForImageClassification.from_pretrained(
            config.experts[1]
        )

        self.subgate = subgate(2)

        # Freeze the pretrained gate, baseline, and expert models; only the
        # subgate MLP above remains trainable.
        for module in [
            self.switch_gate_model,
            self.baseline_model,
            self.expert_model_1,
            self.expert_model_2,
        ]:
            for param in module.parameters():
                param.requires_grad = False

    def forward(self, pixel_values, labels=None):
        # Switch-gate logits, shape (batch, 2): one weight per expert.
        switch_gate_result = self.switch_gate_model(pixel_values).logits
        expert1_result = self.expert_model_1(pixel_values).logits
        expert2_result = self.expert_model_2(pixel_values).logits

        # Gating network: weight each expert's logits by the switch gate's raw
        # (unsoftmaxed) outputs, then sum over the expert dimension.
        # stack -> (batch, 2, num_classes); unsqueeze -> (batch, 2, 1).
        experts_result = torch.stack(
            [expert1_result, expert2_result], dim=1
        ) * switch_gate_result.unsqueeze(-1)

        experts_result = experts_result.sum(dim=1)
        baseline_model_result = self.baseline_model(pixel_values).logits

        # The subgate softmax decides how to mix the expert combination with
        # the baseline model's logits.
        subgate_result = self.subgate(pixel_values)
        subgate_prob = F.softmax(subgate_result, dim=-1)

        experts_and_base_result = torch.stack(
            [experts_result, baseline_model_result], dim=1
        ) * subgate_prob.unsqueeze(-1)

        logits = experts_and_base_result.sum(dim=1)
        if labels is not None:
            loss = F.cross_entropy(logits, labels)
            return {"loss": loss, "logits": logits}
        return {"logits": logits}
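For completeness, a minimal usage sketch under stated assumptions: the repo ids passed to `MoEConfig` are placeholders, `MoEConfig` lives in configuration_moe.py (not part of this commit) and is assumed to accept the fields the model reads above, and the relative import in modeling_moe.py means the files must be importable as a package (or the import adjusted) for a standalone run.

```python
import torch

from configuration_moe import MoEConfig  # adjust if importing as a package
from modeling_moe import MoEModelForImageClassification

# Placeholder repo ids; MoEConfig's exact signature is defined in
# configuration_moe.py, which this commit does not include.
config = MoEConfig(
    num_classes=10,
    switch_gate="your-username/switch-gate",
    baseline_model="your-username/baseline",
    experts=["your-username/expert-1", "your-username/expert-2"],
)
model = MoEModelForImageClassification(config)
model.eval()

pixel_values = torch.randn(1, 3, 224, 224)  # the subgate expects 224x224 RGB
with torch.no_grad():
    outputs = model(pixel_values)
print(outputs["logits"].shape)  # (1, num_classes of the underlying classifiers)
```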