---
license: apache-2.0
---
# Chinese Visual-language Multi-modal models for captions and robot actions
## Release
- [9/22] 🔥 We release two major models: the *CN-caption* model for accurate Chinese image captioning, and the *robot action* model for demo-level robot actions.
## Contents
### CNCaption models
This model can provide accurate and fine-grained Chinese descriptions of given images.
### Robot models
This model can provide accurate instructions for robot actions.
## Install
1. Install Package
```Shell
conda create -n llava python=3.10 -y
conda activate llava
pip install --upgrade pip
pip install -e .
```
2. Install additional packages for training cases
```Shell
pip install ninja
pip install flash-attn --no-build-isolation
```
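After installing, you can sanity-check that the optional training packages resolved in the active environment. A minimal sketch (the helper name is ours; package names come from the steps above):

```python
import importlib.util

def is_installed(pkg: str) -> bool:
    """Return True if the package can be imported in the current environment."""
    return importlib.util.find_spec(pkg) is not None

# Check the optional training dependencies installed above.
for pkg in ("ninja", "flash_attn"):
    print(f"{pkg}: {'ok' if is_installed(pkg) else 'missing'}")
```

Note that `flash-attn` installs as the module `flash_attn`, which is why the import name differs from the pip package name.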
## Demo
To run our demo, you need to prepare LLaVA checkpoints locally. Please follow the instructions [here](#llava-weights) to download the checkpoints.
### Gradio Web UI
To launch a Gradio demo locally, please run the following commands one by one. If you plan to launch multiple model workers to compare between different checkpoints, you only need to launch the controller and the web server *ONCE*.
#### Launch a controller
```Shell
python -m llava.serve.controller --host 0.0.0.0 --port 10000
```
#### Launch a Gradio web server
```Shell
python -m llava.serve.gradio_web_server --controller http://localhost:10000 --model-list-mode reload
```
You have now launched the Gradio web interface; open it with the URL printed on the screen.
#### Launch a model worker
This is the actual *worker* that performs the inference on the GPU. Each worker is responsible for a single model specified in `--model-path`.
```Shell
python -m llava.serve.model_worker --host 0.0.0.0 --controller http://localhost:10000 --port 40000 --worker http://localhost:40000 --model-path <path-to-checkpoint>
```
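Once the worker is up, you can confirm it registered with the controller. A minimal sketch, assuming the controller exposes the `/list_models` POST endpoint used by upstream LLaVA's serving code (the helper names here are our own):

```python
import json
import urllib.request

CONTROLLER = "http://localhost:10000"  # matches the --port used above

def endpoint(controller_url: str, path: str) -> str:
    """Join a controller base URL and an API path (illustrative helper)."""
    return controller_url.rstrip("/") + "/" + path.lstrip("/")

def list_models(controller_url: str = CONTROLLER) -> list:
    """POST to the controller's /list_models endpoint and return the names
    of registered model workers (endpoint assumed from upstream LLaVA)."""
    req = urllib.request.Request(endpoint(controller_url, "list_models"), method="POST")
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["models"]
```

If the worker registered correctly, its `--model-path` name should appear in the returned list.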
## API inference
For more convenient use, we also provide an API interface, with a server-side startup script and client-side test code below.
### Server
```Shell
python -m llava.serve.controller --host 0.0.0.0 --port 10000
python -m llava.serve.model_worker --host 0.0.0.0 --controller http://localhost:10000 --port 40000 --worker http://localhost:40000 --model-path <path-to-checkpoint>
```
### Client
```Shell
python req_test.py ${text} ${image}
```
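The `req_test.py` script is not shown here, but a client along these lines would pack the prompt and image into a request body. A minimal sketch, assuming the server accepts a JSON payload with a base64-encoded image (the field names `prompt` and `image` are illustrative, not the actual API):

```python
import base64
import json

def build_payload(text: str, image_path: str) -> str:
    """Pack prompt text and a base64-encoded image into a JSON request body.
    Field names are illustrative; adapt them to the server's actual schema."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("ascii")
    return json.dumps({"prompt": text, "image": image_b64})
```

The resulting string can then be sent to the model worker with any HTTP client (e.g. `requests.post`).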