Nomic Embeddings API vs Transformers output

#3 by rajaiswal

I have this code:

    import torch.nn.functional as F
    from transformers import AutoModel, AutoImageProcessor
    from PIL import Image

    img = Image.open("./image.jpg")

    processor = AutoImageProcessor.from_pretrained("nomic-ai/nomic-embed-vision-v1")
    vision_model = AutoModel.from_pretrained("nomic-ai/nomic-embed-vision-v1",
                                             trust_remote_code=True)

    inputs = processor([img], return_tensors="pt")

    # Take the CLS token embedding and L2-normalize it
    img_emb = vision_model(**inputs).last_hidden_state
    img_embeddings = F.normalize(img_emb[:, 0], p=2, dim=1)
    print(img_embeddings[0][:5])

This prints: tensor([-0.0672, -0.0483, -0.0122, -0.0547, -0.0542], grad_fn=<SliceBackward0>)

    import nomic
    from PIL import Image
    from nomic import embed

    nomic.cli.login(NOMIC_API_KEY)

    # Embed the same PIL image through the hosted Nomic API
    output = embed.image(
        images=[img],
        model='nomic-embed-vision-v1',
    )

    print(output["embeddings"][0][:5])

This prints: [-0.06616211, -0.072265625, 0.002506256, -0.05718994, -0.04675293]

Both should produce vectors with similar values, right? What am I missing? And, like the output from the Nomic API, how do we get values with more precision in the transformers case?

Nomic AI org

hmm thanks for raising this. when i deployed the models they had equivalent outputs, but it seems like something's gone wrong. i will investigate and get back to you asap

Nomic AI org

The only immediate thing I would check is whether you get similar values when running the transformers model in fp16. The model we have running in production uses that same precision.
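
Something like this, roughly (an untested sketch reusing inputs, F, and the model name from the snippet above; torch_dtype=torch.float16 is the standard transformers argument for half precision):

    import torch

    # Load the same model in half precision (fp16), matching the production setup.
    vision_model_fp16 = AutoModel.from_pretrained("nomic-ai/nomic-embed-vision-v1",
                                                  trust_remote_code=True,
                                                  torch_dtype=torch.float16)

    # The pixel values need the same dtype, otherwise the forward pass raises a
    # float/half mismatch error.
    pixel_values = inputs["pixel_values"].to(torch.float16)

    img_emb_fp16 = vision_model_fp16(pixel_values=pixel_values).last_hidden_state
    img_embeddings_fp16 = F.normalize(img_emb_fp16[:, 0], p=2, dim=1)
    print(img_embeddings_fp16[0][:5])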

@zpn Thanks for the quick response. I tried what you suggested; it did not make the outputs match, nor did it increase the precision.

Nomic AI org

Ok, will dig in! apologies for this

Nomic AI org

@rajaiswal ok i believe i've identified an issue. it seems like something is going on when we upload the image via bytes to our API. trying to debug a bit more. a workaround (if possible) is to instead pass URLs to the API, and you should see similar results. The error should be around:

    np.abs(emb - nom_emb).min()=0.0
    np.abs(emb - nom_emb).mean()=6.73e-05
    np.abs(emb - nom_emb).max()=0.0003052
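
Roughly, the workaround looks like this (a sketch only; the URL is a placeholder, embed.image is assumed to accept URL strings as described above, and img_embeddings is the normalized transformers output from the first snippet):

    import numpy as np
    from nomic import embed

    image_url = "https://example.com/image.jpg"  # placeholder URL

    # Passing a URL lets the API fetch the image itself, skipping the
    # client-side byte-upload path.
    output = embed.image(
        images=[image_url],
        model='nomic-embed-vision-v1',
    )
    nom_emb = np.array(output["embeddings"][0])

    # Compare against the normalized CLS embedding from the transformers snippet.
    emb = img_embeddings[0].detach().cpu().numpy()

    print(f"{np.abs(emb - nom_emb).mean()=}")
    print(f"{np.abs(emb - nom_emb).max()=}")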

@zpn Thanks for working on this. I tried your suggestion, and yes, I am getting the same results if I pass a URL to the Nomic API. Does that mean the issue is on the Nomic API side and not on the transformers side?

Nomic AI org

do you mind sharing the image so I can test? it appears there may be a bug in our api, we’re working to fix it asap

@zpn What I meant was: yes, if I use the image URL, then both the transformers model and the API produce the same results.

    transformers output: tensor([-0.0201, 0.0056, -0.0255, -0.0168, -0.0528], dtype=torch.float16, grad_fn=<SliceBackward0>)
    API output: [-0.020095825, 0.0056610107, -0.025756836, -0.016479492, -0.052612305]

Here is the image URL I am testing with: https://m.media-amazon.com/images/M/MV5BZTc0ZjNkYTktMmJmOS00OTJlLTg1NWUtMzQ5ZGMxM2NhY2M0L2ltYWdlL2ltYWdlXkEyXkFqcGdeQXVyNTAyODkwOQ@@._V1_.jpg

But does this mean that the bug lies within Nomic's hosted API and not with the transformers implementation?

Nomic AI org

Oh! I realize it is a client-side "bug". If you upload the image directly, we resize it, since otherwise it would be too big for the request: https://github.com/nomic-ai/nomic/blob/main/nomic/embed.py#L342

The API and the transformers model should be equivalent. When I've tested with a fixed input, they return nearly identical values.

I think the optimal solution is to use the URLs if possible
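
If you want to see the effect locally, the following is a rough illustration only (the actual resize bounds live in the linked embed.py; the (512, 512) limit here is a made-up placeholder, and processor, vision_model, and F come from the first snippet):

    from PIL import Image

    img = Image.open("./image.jpg")

    # Hypothetical downscale standing in for the client-side resize; the real
    # bounds are in nomic/embed.py, so (512, 512) is just a placeholder.
    resized = img.copy()
    resized.thumbnail((512, 512))

    # Re-running the transformers snippet on the resized image shifts the
    # embedding slightly, which is the same kind of effect the byte-upload
    # path sees.
    inputs_resized = processor([resized], return_tensors="pt")
    emb_resized = F.normalize(
        vision_model(**inputs_resized).last_hidden_state[:, 0], p=2, dim=1)
    print(emb_resized[0][:5])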

@zpn Okay, that explains the difference. Thanks for investigating! Also, is there any way to get more precision using transformers, like we get from the API? For the text model I get less precision using transformers but more precision using sentence_transformers. Is there something like that for the vision model?
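
(A note on the precision question: the transformers tensor already stores full fp32 values; torch just truncates the printout to four decimal places by default. A small sketch, reusing img_embeddings from the first snippet:)

    import torch

    # Option 1: widen torch's default print formatting (4 decimal places).
    torch.set_printoptions(precision=8)
    print(img_embeddings[0][:5])

    # Option 2: convert to plain Python floats, which print at full precision
    # (roughly what the API's JSON response gives you).
    print(img_embeddings[0][:5].tolist())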
