Usage example for the visual model

#3
by Mihaiii - opened

"Bonus: Included is an implementation of a Vision Language Model that has undergone Locked-Image Tuning."

How do I use it as a visual model? Please provide inference code.


For model inference, please refer to THUDM/glm-4-9b-chat-1m and THUDM/glm-4v-9b.

JosephusCheung changed discussion status to closed

That is not nice; those models do not document the chat template well, nor do they give clear examples.
Can we please have some simple examples?

I was able to find this template, but it's hardly clear what a conversation should look like:

[gMASK]<sop>{% for item in messages %}{% if item['tools'] is defined %}<|system|>\n你是一个名为 GLM-4 的人工智能助手。你是基于智谱AI训练的语言模型 GLM-4 模型开发的,你的任务是针对用户的问题和要求提供适当的答复和支持。\n\n# 可用工具{% set tools = item['tools'] %}{% for tool in tools %}{% if tool['type'] == 'function' %}\n\n## {{ tool['function']['name'] }}\n\n{{ tool['function'] | tojson(indent=4) }}\n在调用上述函数时,请使用 Json 格式表示调用的参数。{% elif tool['type'] == 'python' %}\n\n## python\n\n当你向 python 发送包含 Python 代码的消息时,该代码将会在一个有状态的 Jupyter notebook 环境中执行。\npython 返回代码执行的输出,或在执行 60 秒后返回超时。\n/mnt/data 将会持久化存储你的文件。在此会话中,python 无法访问互联网。不要使用 python 进行任何网络请求或者在线 API 调用,这些在线内容的访问将不会成功。{% elif tool['type'] == 'simple_browser' %}\n\n## simple_browser\n\n你可以使用 simple_browser 工具。该工具支持以下函数:\nsearch(query: str, recency_days: int):使用搜索引擎进行查询并显示结果,可以使用 recency_days 参数控制搜索内容的时效性。\nmclick(ids: list[int]):获取一系列指定 id 的页面内容。每次调用时,须选择3-10个页面。选择多个角度的页面,同时尽可能选择可信任的信息来源。考虑到部分页面是无法加载的,你也可以多打开一些可能有用的页面而不用担心内容过多。\nopen_url(url: str):打开指定的 URL。\n\n使用 【{引用 id}†{引用文本}】 来引用内容。\n\n操作步骤:1. 使用 search 来获得信息列表; 2. 使用 mclick 来获取指定 ID 页面的内容; 3. 根据获得的内容进行回复。在回复中应当引用信息来源。\n 如果用户提供了 URL,也可以用 open_url 直接打开页面。\n如果初次搜索结果没有找到合适的信息,也可以再次使用 search 进行搜索。{% elif tool['type'] == 'cogview' %}\n\n## cogview\n\n如果用户的请求中包含了对图像的描述,你可以使用 cogview 来生成图像并展示给用户。你需要向 cogview 发送图像描述,规则:\n- 发送给 cogview 的消息必须使用英语。用户的中文描述必须完全翻译为英语。\n- 应当尽可能详细地描述图像生成的需求,需求描述约 100 英文单词。\n- 保持用户原始描述的意图。不要虚构内容或者没见过的人物。\n- 如无特殊说明,所在地为中国,持有中国立场并遵循中国社会主义价值观。{% endif %}{% endfor %}{% endif %}{% if item['content'] %}<|{{ item['role'] }}|>{{ item['metadata'] }}\n{{ item['content'] }}{% endif %}{% endfor %}{% if add_generation_prompt %}<|assistant|>{% endif %}

Please consult the documentation for tokenizer.apply_chat_template: https://huggingface.co/docs/transformers/chat_templating

There is no need to do it manually. Usage should always be the same as for THUDM/glm-4-9b-chat-1m and THUDM/glm-4v-9b, both of which work well with transformers.

Chat templates are implemented for both the text-only and main revisions.
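For readers who want to see the exact string the template produces, here is a minimal sketch of the apply_chat_template route recommended above. It makes several assumptions: the helper names build_messages and render_prompt are made up for illustration, the model id is simply one of the repositories named in this thread, and trust_remote_code=True is assumed to be needed because the GLM-4 repositories have shipped custom tokenizer code. Calling render_prompt downloads the tokenizer files.

```python
def build_messages(system_prompt, user_prompt):
    """Build the list-of-dicts conversation format that chat templates consume."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]


def render_prompt(model_id, messages):
    """Render the conversation to a plain string using the repo's own chat template.

    transformers is imported lazily so that defining this helper does not
    require the library to be installed.
    """
    from transformers import AutoTokenizer

    # Assumption: the repository ships custom tokenizer code, hence trust_remote_code.
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    return tokenizer.apply_chat_template(
        messages,
        tokenize=False,              # return the rendered text, not token ids
        add_generation_prompt=True,  # append the assistant turn marker
    )


# Example call (needs network access to fetch the tokenizer):
# print(render_prompt("THUDM/glm-4-9b-chat-1m",
#                     build_messages("the system prompt", "the user prompt")))
```

Printing the rendered string is a quick way to check where the template does and does not insert newlines, without writing the prompt by hand.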

I like to do it manually, so that I can understand how it works.

I get it, you don't want to help. That's an acceptable answer. Thank you for your time.

I figured it out:

[gMASK]<sop><|system|>
the system prompt
<|user|>
the user prompt
<|assistant|>

That's all I wanted to know.

I am afraid that is incorrect. It seems that there is no newline after content, right before <|role|>. That's why I do not recommend you write it manually.

It is also quite awkward to write out function calls and image inputs manually. Another hint for image input: there are special tokens before and after the image embeddings, if you insist on doing it yourself.

The other point is that there is always a newline after <|assistant|>; if it is missing, the model will always try to generate one.

Thank you for the correction! That's very helpful

Corrected:

[gMASK]<sop><|system|>
the system prompt<|user|>
the user prompt<|assistant|>
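The corrected layout can also be reproduced by a small hand-written builder, sketched here for the plain-chat path only. It assumes the template quoted earlier in this thread is the one in use and deliberately skips the tools branch (function calls, python, simple_browser, cogview) as well as image input. The name build_glm4_prompt is made up for illustration and is not part of any library.

```python
def build_glm4_prompt(messages, add_generation_prompt=True):
    """Hand-build a GLM-4 style prompt for messages without tools or images.

    Mirrors the simple branch of the template quoted in this thread:
    each message becomes <|role|> + metadata + newline + content, with no
    newline between the content and the next role token.
    """
    parts = ["[gMASK]<sop>"]
    for message in messages:
        # The template skips messages whose content is empty.
        if message.get("content"):
            metadata = message.get("metadata", "")
            parts.append(f"<|{message['role']}|>{metadata}\n{message['content']}")
    if add_generation_prompt:
        # The template adds no newline here; as noted in the thread, the model
        # is expected to generate the newline after <|assistant|> itself.
        parts.append("<|assistant|>")
    return "".join(parts)


prompt = build_glm4_prompt([
    {"role": "system", "content": "the system prompt"},
    {"role": "user", "content": "the user prompt"},
])
print(prompt)
```

This is only meant as an aid to understanding; comparing its output against the string produced by tokenizer.apply_chat_template is the safer way to confirm the format.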
