What is the conversion method for an onnx-web model?

#4
by nickelshh - opened

I used `optimum-cli export onnx --task text-generation-with-past --opset 14 --device cuda --dtype fp16 --model ~/models/Phi-3.5-mini-instruct/ ./microsoft` and then quantized with `--input_folder ./microsoft --output_folder ./microsoft/onnx --modes q4f16 --block_size 128`. It does not seem to work with the onnx-web framework at all: inference only repeats the same word instead of generating the correct content.
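For reference, here is the pipeline spelled out as runnable commands. The export step is exactly the `optimum-cli` invocation above; the quantization step is a sketch, since the post does not name the script being run — the flags match the interface of `scripts/quantize.py` from the transformers.js repository, so that is assumed here:

```bash
# 1) Export the model to ONNX with optimum-cli (as described above).
optimum-cli export onnx \
  --task text-generation-with-past \
  --opset 14 \
  --device cuda \
  --dtype fp16 \
  --model ~/models/Phi-3.5-mini-instruct/ \
  ./microsoft

# 2) Quantize to 4-bit weights with fp16 activations.
#    Assumption: this uses scripts/quantize.py from a transformers.js
#    checkout, whose flags match the ones quoted in the post.
python -m scripts.quantize \
  --input_folder ./microsoft \
  --output_folder ./microsoft/onnx \
  --modes q4f16 \
  --block_size 128
```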
