Running this model without flash-attn

#17
by lisa-tse - opened

Hi, is it possible to run this model without flash-attn? It seems to be a requirement in your code at the moment (e.g. modeling_qwen.py).

Hi, I managed to run it without installing flash-attn on SageMaker. Looking at the code, it checks whether flash-attn is installed and, if it is not, skips the import:

import inspect

from transformers.utils import (
    add_start_docstrings,
    add_start_docstrings_to_model_forward,
    is_flash_attn_2_available,
    is_flash_attn_greater_or_equal_2_10,
    logging,
    replace_return_docstrings,
)


if is_flash_attn_2_available():
    from flash_attn import flash_attn_func, flash_attn_varlen_func
    from flash_attn.bert_padding import index_first_axis, pad_input, unpad_input  # noqa

    _flash_supports_window_size = "window_size" in list(inspect.signature(flash_attn_func).parameters)
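
For reference, here is a minimal loading sketch that avoids the FA2 path entirely when flash-attn is absent. The model ID is a placeholder (substitute the actual repo), and passing attn_implementation="eager" is one way to make sure transformers never requests flash_attention_2:

    from transformers import AutoModel, AutoTokenizer

    # Placeholder repo ID; substitute the actual model repository.
    model_id = "org/qwen-embedding-model"

    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

    # With flash-attn not installed, is_flash_attn_2_available() returns False
    # and the guard above skips the import; requesting "eager" attention
    # keeps the model off any FA2 code path.
    model = AutoModel.from_pretrained(
        model_id,
        trust_remote_code=True,
        attn_implementation="eager",
    )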

Bumping this request: this model currently creates a bad user experience - https://github.com/michaelfeil/infinity/issues/308 - the code checks for flash-attn and raises when it is missing.
I'd suggest PRing this remote code into the transformers library itself for a better experience. The embedding model does not employ sliding-window attention, so there should be little need for FA2.
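
As a quick sanity check (a sketch using only the public transformers utility, nothing model-specific), you can confirm which attention path transformers will take on a given machine:

    from transformers.utils import is_flash_attn_2_available

    # False when flash-attn is not installed, so the FA2 import block above
    # is skipped and the model must fall back to a non-FA2 implementation.
    print(is_flash_attn_2_available())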
