Potential base for one of your new models.

Opened by DazzlingXeno

I made a merge with 32k context. It might be a good base for one of your future creations.

DazzlingXeno/Westlake-Scribe

@DazzlingXeno

Thank you for the heads-up - I'll check it out!

Have a lovely day!

I've no idea how useful it'll be, but I hope it is.

@DazzlingXeno

It looks interesting! I haven't tried it out yet, but it looks like you replicated WestLake-11B's stack pattern? Seems like you put some care into formulating the recipe. I'm not familiar with Scribe.

Why do you think it works at 32K? I'm curious what your thoughts are. Mistral's trained context is actually 8K, so it's nifty that your model can go so much further. For comparison, Fimbulvetr-11B-v2 was trained on only 4K, and Chaifighter-v3 can reach 16K using RoPE scaling, but my hardware isn't really up to the task, so it's hard to verify that it holds up.
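For reference, RoPE scaling is basically a config tweak in transformers. Here's a minimal sketch of the idea (the model ID and scaling factor are placeholders, assuming a Llama/Mistral-style architecture that supports the rope_scaling config field):

```python
# Minimal sketch: linear RoPE scaling to stretch a model's usable context.
# The model ID and factor below are placeholders, not a specific recipe.
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "your-org/your-model"  # placeholder

config = AutoConfig.from_pretrained(model_id)
# A factor of 2.0 roughly doubles the usable positions, e.g. 8K -> ~16K.
config.rope_scaling = {"type": "linear", "factor": 2.0}

model = AutoModelForCausalLM.from_pretrained(model_id, config=config)
```

Quality usually degrades somewhat as you scale further, which is part of why it really needs hands-on testing.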

I haven't done much testing, but the models used in the merge all have 32k context.

@DazzlingXeno

I'd be cautious about that. Mistral 7B models claim 32K in their configs, but that's not really true. The trained context length is 8K, and I believe the 32K figure comes from the sliding-window attention (SWA) scheme they were trying to get people to use. That isn't clearly documented anywhere, though, and it also breaks RoPE, which stinks too.
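If you want to see it for yourself, the config shows the mismatch. A quick sketch with transformers (the values in the comments are what Mistral-7B-v0.1's config reports, not what it was trained on):

```python
# Inspect what a Mistral 7B checkpoint advertises in its config.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("mistralai/Mistral-7B-v0.1")
print(cfg.max_position_embeddings)           # 32768 -> the advertised "32K"
print(getattr(cfg, "sliding_window", None))  # 4096  -> the SWA window
```

That max_position_embeddings number is what ends up listed as the context length, which is how "32K" tends to get attached to Mistral-based merges.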

But yeah. For that reason, I'd assume that your merge has a native context window of 8K.

Have a wonderful day!!
