Update README.md with benchmark results

#2
Files changed (1)
  1. README.md +20 -2
README.md CHANGED
@@ -36,16 +36,34 @@ InternLM2.5 has open-sourced a 7 billion parameter base model and a chat model t
 
  - **Outstanding reasoning capability**: State-of-the-art performance on Math reasoning, surpassing models like Llama3 and Gemma2-9B.
 
- - **1M Context window**: Nearly perfect at finding needles in the haystack with 1M-long context, with leading performance on long-context tasks like LongBench. Try it with [LMDeploy](https://github.com/InternLM/InternLM/blob/main/chat/lmdeploy.md) for 1M-context inference.
+ - **1M Context window**: Nearly perfect at finding needles in the haystack with 1M-long context, with leading performance on long-context tasks like LongBench. Try it with [LMDeploy](https://github.com/InternLM/InternLM/blob/main/chat/lmdeploy.md) for 1M-context inference and a [file chat demo](https://github.com/InternLM/InternLM/tree/main/long_context).
 
  - **Stronger tool use**: InternLM2.5 supports gathering information from more than 100 web pages, corresponding implementation will be released in [Lagent](https://github.com/InternLM/lagent/tree/main) soon. InternLM2.5 has better tool utilization-related capabilities in instruction following, tool selection and reflection. See [examples](https://github.com/InternLM/InternLM/blob/main/agent/lagent.md).
 
  ## InternLM2.5-7B-Chat-1M
 
- InternLM2.5-7B-Chat-1M is the 1M-long-context version of InternLM2.5-7B-Chat. Since huggingface Transformers does not directly support inference with 1M-long context, we recommand to use LMDeploy. The conventional usage with huggingface Transformers is also shown below.
+ InternLM2.5-7B-Chat-1M is the 1M-long-context version of InternLM2.5-7B-Chat.
+
+ ### Performance Evaluation
+
+ We employed the "*needle in a haystack approach*" to evaluate the model's ability to retrieve information from long texts. Results show that InternLM2.5-7B-Chat-1M can accurately locate key information in documents up to 1M tokens in length.
+
+ <p align="center">
+ <img src="https://github.com/libowen2121/InternLM/assets/19970308/2ce3745f-26f5-4a39-bdcd-2075790d7b1d" alt="drawing" width="700"/>
+ </p>
+
+ We also used the [LongBench](https://github.com/THUDM/LongBench) benchmark to assess long-document comprehension capabilities. Our model achieved leading performance in these tests.
+
+ <p align="center">
+ <img src="https://github.com/libowen2121/InternLM/assets/19970308/1e8f7da8-8193-4def-8b06-0550bab6a12f" alt="drawing" width="800"/>
+ </p>
+
 
  ### LMDeploy
 
+ Since huggingface Transformers does not directly support inference with 1M-long context, we recommend using LMDeploy. The conventional usage with huggingface Transformers is also shown below.
+
+
  LMDeploy is a toolkit for compressing, deploying, and serving LLM, developed by the MMRazor and MMDeploy teams.
 
  Here is an example of 1M-long context inference. **Note: 1M context length requires 4xA100-80G!**
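 
The 1M-context example referenced at the end of the diff is not included in this change. For reference, below is a minimal sketch of the kind of LMDeploy call that text points to, assuming lmdeploy's Python `pipeline` API with `TurbomindEngineConfig` and the Hugging Face model id `internlm/internlm2_5-7b-chat-1m`; the parameter values are illustrative, not taken from the README.

```python
# Illustrative sketch only -- not part of this PR's diff.
# Assumes lmdeploy's pipeline API; values below are indicative.
from lmdeploy import GenerationConfig, TurbomindEngineConfig, pipeline

# Engine settings sized for a ~1M-token session; tp=4 mirrors the
# "4xA100-80G" note above.
backend_config = TurbomindEngineConfig(
    rope_scaling_factor=2.5,    # extend the positional range for long context (assumed value)
    session_len=1048576,        # ~1M-token session length
    max_batch_size=1,
    cache_max_entry_count=0.7,  # fraction of free GPU memory reserved for the KV cache
    tp=4,                       # tensor parallelism across 4 GPUs
)

pipe = pipeline('internlm/internlm2_5-7b-chat-1m', backend_config=backend_config)

prompt = 'Replace this with a very long prompt (up to ~1M tokens).'
response = pipe(prompt, gen_config=GenerationConfig(max_new_tokens=256))
print(response)
```

The plain Transformers path mentioned in the added text would instead load the model with `AutoModelForCausalLM.from_pretrained(..., trust_remote_code=True)`, but, as the diff notes, that route does not directly support 1M-long context.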