What is the conversion method for an onnx-web model?

#4
by nickelshh - opened

I used `optimum-cli export onnx --task text-generation-with-past --opset 14 --device cuda --dtype fp16 --model ~/models/Phi-3.5-mini-instruct/ ./microsoft` and then quantized with `--input_folder ./microsoft --output_folder ./microsoft/onnx --modes q4f16 --block_size 128`. It does not seem to work with the onnx-web framework at all: inference only repeats the same word instead of generating the correct content.
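For reference, here is the pipeline spelled out as runnable commands. The export step is exactly the `optimum-cli` invocation above; the quantization step is a sketch, since the post does not name the script being run — the flags match the interface of `scripts/quantize.py` from the transformers.js repository, so that is assumed here:

```bash
# 1) Export the model to ONNX with optimum-cli (as described above).
optimum-cli export onnx \
  --task text-generation-with-past \
  --opset 14 \
  --device cuda \
  --dtype fp16 \
  --model ~/models/Phi-3.5-mini-instruct/ \
  ./microsoft

# 2) Quantize to 4-bit weights with fp16 activations.
#    Assumption: this uses scripts/quantize.py from a transformers.js
#    checkout, whose flags match the ones quoted in the post.
python -m scripts.quantize \
  --input_folder ./microsoft \
  --output_folder ./microsoft/onnx \
  --modes q4f16 \
  --block_size 128
```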
