Introduction

OpenCodeInterpreter is a family of open-source code generation systems designed to bridge the gap between open-source large language models and advanced proprietary systems like the GPT-4 Code Interpreter. It improves code generation by integrating code execution and iterative refinement.

OpenCodeInterpreter-DS-1.3B GGUF

Original model: OpenCodeInterpreter-DS-1.3B

Model creator: Multimodal Art Projection Research Community

This repo contains GGUF format model files for Multimodal Art Projection Research Community (M-A-P)’s OpenCodeInterpreter-DS-1.3B.

The introduction of large language models has significantly advanced code generation. However, open-source models often lack the execution capabilities and iterative refinement of advanced systems like the GPT-4 Code Interpreter. To address this, we introduce OpenCodeInterpreter, a family of open-source code systems designed for generating, executing, and iteratively refining code. Supported by Code-Feedback, a dataset featuring 68K multi-turn interactions, OpenCodeInterpreter integrates execution and human feedback for dynamic code refinement. Our comprehensive evaluation of OpenCodeInterpreter across key benchmarks such as HumanEval, MBPP, and their enhanced versions from EvalPlus reveals its exceptional performance. Notably, OpenCodeInterpreter-33B achieves an accuracy of 83.2 (76.4) on the average (and plus versions) of HumanEval and MBPP, closely rivaling GPT-4's 84.2 (76.2), and rises further to 91.6 (84.6) with synthesized human feedback from GPT-4. OpenCodeInterpreter bridges the gap between open-source code generation models and proprietary systems like GPT-4 Code Interpreter.

Learn more on M-A-P’s Model page.

What is GGUF?

GGUF is a file format for representing AI models. It is the third version of the format, introduced by the llama.cpp team on August 21, 2023. It replaces GGML, which is no longer supported by llama.cpp. The files in this repo were converted using llama.cpp build 2276 (revision b11a93d).
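As a quick sanity check on a downloaded file, the sketch below reads the start of a GGUF file. It assumes the header layout from the GGUF specification: a 4-byte magic `GGUF` followed by a little-endian unsigned 32-bit version number. The function name `read_gguf_version` is illustrative and is not part of llama.cpp.

```python
import struct


def read_gguf_version(path):
    """Return the format version stored in a GGUF file header.

    Assumption: the file starts with the 4-byte magic b"GGUF" followed by
    a little-endian unsigned 32-bit version number.
    """
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) < 8 or header[:4] != b"GGUF":
        raise ValueError(f"{path} does not look like a GGUF file")
    (version,) = struct.unpack("<I", header[4:8])
    return version
```

Calling `read_gguf_version` with the path to one of the downloaded `.gguf` files returns the version number, or raises `ValueError` if the magic bytes are missing.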

Prompt template

<|User|>
{{prompt}}

<|Assistant|>
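A minimal sketch of filling in this template from code. The helper name `build_prompt` is hypothetical, and the exact whitespace (one blank line before `<|Assistant|>` and a trailing newline after it) is an assumption based on the template as displayed here; adjust it if the model responds better to a different layout.

```python
def build_prompt(user_message):
    """Insert a single user message into the prompt template.

    Assumption: whitespace mirrors the template as displayed, with one
    blank line before the assistant tag and a newline after it.
    """
    return f"<|User|>\n{user_message}\n\n<|Assistant|>\n"


# Example: build a prompt for a simple coding request.
prompt = build_prompt("Write a Python function that reverses a string.")
print(prompt)
```

The resulting string can be passed as the raw prompt to whichever GGUF runtime you use; the model's reply is expected to follow the `<|Assistant|>` tag.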

Download & run with cnvrs on iPhone, iPad, and Mac!

cnvrs is the best app for private, local AI on your device:

create & save Characters with custom system prompts & temperature settings
download and experiment with any GGUF model you can find on HuggingFace!
make it your own with custom Theme colors
powered by Metal ⚡️ & Llama.cpp, with haptics during response streaming!
try it out yourself today, on Testflight!
follow cnvrs on twitter to stay up to date

Original Model Evaluation

The table below reports the performance of OpenCodeInterpreter-DS-1.3B with and without execution feedback. Scores are given for two benchmarks, HumanEval and MBPP, together with their average.

Model	HumanEval (+)	MBPP (+)	Average (+)
OpenCodeInterpreter-DS-1.3B	65.2 (61)	63.4 (52.4)	64.3 (56.7)
+ Execution Feedback	65.2 (62.2)	65.2 (55.6)	65.2 (58.9)
—	—	—	—
GPT-3.5-Turbo	72.6 (65.9)	81.7 (69.4)	77.2 (67.7)
+ Execution Feedback	76.8 (70.7)	87.0 (73.9)	81.9 (72.3)

Note: The "(+)" notation represents scores from the extended versions of the HumanEval and MBPP benchmarks. To ensure a fair comparison, the results with execution feedback reflect a single iteration of feedback rather than an unlimited number of iterations. This highlights the immediate impact of execution feedback on performance across benchmarks.
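The Average column and the gain from execution feedback can be re-derived from the per-benchmark scores. The sketch below assumes the average is the unweighted mean of the HumanEval and MBPP scores, which is consistent with the values in the table; the variable and function names are illustrative.

```python
# Scores copied from the table: (base, plus) for each benchmark.
scores = {
    "OpenCodeInterpreter-DS-1.3B": {
        "HumanEval": (65.2, 61.0),
        "MBPP": (63.4, 52.4),
    },
    "+ Execution Feedback": {
        "HumanEval": (65.2, 62.2),
        "MBPP": (65.2, 55.6),
    },
}


def average(row):
    """Unweighted mean of the base and plus scores across benchmarks."""
    base = sum(pair[0] for pair in row.values()) / len(row)
    plus = sum(pair[1] for pair in row.values()) / len(row)
    return base, plus


before = average(scores["OpenCodeInterpreter-DS-1.3B"])
after = average(scores["+ Execution Feedback"])
gain = (after[0] - before[0], after[1] - before[1])

print(f"Without feedback: {before[0]:.1f} ({before[1]:.1f})")
print(f"With feedback:    {after[0]:.1f} ({after[1]:.1f})")
print(f"Gain:             {gain[0]:.1f} ({gain[1]:.1f})")
```

For the 1.3B model this reproduces the averages 64.3 (56.7) and 65.2 (58.9), a gain of 0.9 (2.2) points after one round of execution feedback.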