A small opinion. (Now a long feedback thread!)

#1
by Diavator - opened

The model is really good; I downloaded the GGUF from mradermacher. The model understands complex characters where aggression borders on vulnerability, the kind usually called tsundere. Even the big models (Llama 3 70B+, WizardLM 2 8x22) did not cope well with this char, reducing his behaviour to something average as a result. Thank you for your hard work!

Owner

That's great to hear! Pantheon's training is supposed to do exactly that, so I'm glad to hear it's working.

I'm liking it a lot as well after switching back to ChatML format as you suggested. This improves upon the last model for sure - it's descriptive in all the right ways, noticeably smarter overall, better at knowing human anatomy and limitations, handles longer context, keeps better track of details, and writes very well - better than 1.0 in basically every way. I'd like to compare it with the other Nemo finetunes/merges so far.

Interestingly, while it struggles a little to pick up the gist at first, after a bit of a nudge and some edits on the initial step it really gets what you're going for, and from there it's smart about picking up relevant details. This is the opposite of other Nemo finetunes, which somehow seem better at picking up what you want initially but struggle to flesh out the details after that, slowly reverting to a neutral, boring AI way of writing over time. I'm not sure why this happens, but maybe others know better, if it is indeed a real thing.

I also needed to logit bias away variations of the word "dark", as it kept popping up even when the situation had nothing to do with it, but that's the only word I had to do this to. That said, it still sometimes mixes up details and, more frequently, pronouns (from he/she to you), plus the occasional spelling error, but it's all easily correctable and doesn't usually interfere with the response as a whole; it's probably a Nemo/LLM problem overall. There is a small consistency issue when the context gets a bit longer, and a mild repetition problem, mostly in regards to formatting and paragraph structure. Regardless, this is excellent overall and one of, if not the best, Nemo finetunes so far in my experience.
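For reference, "logit biasing away" a word means pushing its token probabilities down at sampling time. In SillyTavern the Logit Bias panel does this from plain strings; a rough sketch of the same idea over an OpenAI-compatible endpoint (the route koboldcpp exposes) could look like the code below. The port, model name and token IDs are placeholders, and whether a given backend honors OpenAI-style `logit_bias` depends on the version, so treat this as an assumption to verify:

```python
# Hedged sketch: discouraging variations of the word "dark" via an
# OpenAI-compatible endpoint. Token IDs are hypothetical placeholders;
# resolve the real ones with your model's tokenizer first.
import requests

# Hypothetical token IDs for "dark" / " dark" / " darkness" etc.
logit_bias = {12345: -5, 23456: -5, 34567: -5}  # -5 discourages, -100 bans

resp = requests.post(
    "http://localhost:5001/v1/chat/completions",  # koboldcpp's default port
    json={
        "model": "pantheon",  # placeholder; local backends often ignore this
        "messages": [{"role": "user", "content": "Describe the forest at night."}],
        "logit_bias": logit_bias,
        "max_tokens": 300,
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```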

I noticed it liked to cling to a pattern of speech. E.g.:

I bet you ... . I bet you ... . I bet you ... . I bet you ... .

This is from one 107-token message with details removed x•x

And another, a 106-token message:

I can just make... . I can make... . I can make... . I can make...

> I noticed it liked to cling to a pattern of speech. E.g.:
>
> I bet you ... . I bet you ... . I bet you ... . I bet you ... .
>
> This is from one 107-token message with details removed x•x

I had a bit of that issue too until I switched to ChatML, with rep penalty 1.05 and DRY 0.8/1.75/2/0. Not sure if you're already using those, but if you aren't, see if that helps!
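For anyone setting this up outside SillyTavern's UI, here's a rough sketch of those values sent to koboldcpp's native generate endpoint. I'm reading 0.8/1.75/2/0 as DRY multiplier/base/allowed length/penalty range (the order SillyTavern displays them in); the exact field names vary between koboldcpp builds, so verify against your version's API docs:

```python
# Hedged sketch: rep penalty 1.05 plus DRY 0.8/1.75/2/0 as a koboldcpp
# native API payload. Field names follow recent koboldcpp builds and
# should be checked against your version.
import requests

payload = {
    "prompt": "<|im_start|>user\nHello!<|im_end|>\n<|im_start|>assistant\n",
    "max_length": 300,          # response token budget
    "rep_pen": 1.05,            # classic repetition penalty
    "dry_multiplier": 0.8,      # DRY strength; 0 disables DRY entirely
    "dry_base": 1.75,           # exponential base of the DRY penalty
    "dry_allowed_length": 2,    # repeats longer than this get penalized
    "dry_penalty_range": 0,     # 0 = consider the whole context
}

resp = requests.post("http://localhost:5001/api/v1/generate", json=payload)
print(resp.json()["results"][0]["text"])
```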

Owner

Sadly repetition is a typical issue with Mistral-trained models, and hard to get rid of.

I eventually decided on Nemo for the smarter brain it offers, but I do plan on following the same multi-stage finetuning sequence for a Llama 3.1 8B model to see how it compares.

Hats off to you, sir. Professionalism, as always.

Gryphe, thank you so much - this is the perfect model for me! <3

> That said, it still sometimes mixes up details and, more frequently, pronouns (from he/she to you) [...]

In all my RPs the pronouns were also reduced from the "he/she" format to the "you" response format. There were problems with formatting too: the model does not like to put speech in inverted commas as dialogue, reducing everything to the format action dialogue action. But next to the thought-provoking and deeply emotional responses of this model, such problems simply pale into insignificance.
I've also noticed the model has a favourite swear, in various forms: "Fuck me sideways with a (cactus)". The word in brackets varies regularly; I've already seen three variants: a rusty chainsaw, a cactus, and a pillar.

I use SillyTavern, and this model writes very long messages; even the token restrictions don't help. To load the model I use koboldcpp. Can anyone suggest a solution to the problem?

Hmm, maybe you can try trimming incomplete sentences and telling it to be concise in the instruct prompt? I guess it depends on the person, because some people prefer longer messages haha.
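(SillyTavern has a "Trim Incomplete Sentences" option for this; conceptually it boils down to something like this hypothetical helper, not ST's actual code:)

```python
import re

def trim_incomplete_sentence(text: str) -> str:
    """Cut a response back to its last complete sentence.

    A sentence counts as complete if it ends in ., !, or ?,
    optionally followed by a closing quote. Anything after the
    last such terminator is dropped.
    """
    matches = list(re.finditer(r'[.!?]["\u201d\u2019]?', text))
    if not matches:
        return text  # no terminator at all; leave the text alone
    return text[:matches[-1].end()].rstrip()

# Example: the dangling 'I bet you' gets trimmed off.
print(trim_incomplete_sentence('She smiled. "I bet you'))  # -> She smiled.
```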

> Hmm, maybe you can try trimming incomplete sentences and telling it to be concise in the instruct prompt? [...]

In my opinion, 1800 tokens is too much! Instead of RP we get storytelling, high-quality storytelling, admittedly, but you also want to participate in it, not just read.)))

The ones I get back are usually around 200-400 tokens. If you start a chat with a long initial message and let it add more and more, though, I can see it snowballing into a giant essay per response lol. Like with most models, if you trim the responses down for the first few, it usually doesn't snowball more than that.

> The ones I get back are usually around 200-400 tokens. If you start a chat with a long initial message and let it add more and more, though, I can see it snowballing into a giant essay per response lol. [...]

Yes, that's exactly what happens. Apparently I'm just used to MLewd-ReMM-L2-Chat-20B and Noromaid-v0.4-Mixtral-Instruct-8x7b, and I never noticed such problems with them. My last RPs were on 70B+ models on together.ai; they may be verbose, but they keep to the instructed length very well. It's a pity that no one will undertake to finish training WizardLM 2 8x22, as I think that model looks very strong next to Llama 3.

Owner

Hey all, fantastic to see all the back and forth going on in this thread! I haven't quite experienced any issues with message lengths myself in my extensive testing, but I exclusively use GGUFs, which might produce different results. The rebuilt persona dataset consists of full 4k dialogue examples, which could be a possible reason why it's biased towards producing longer content. (Or it's just Nemo.)

Either way, all your feedback (both positive and negative) is super valuable and I'm already brainstorming some ideas for the next iteration. Despite Nemo's shortcomings I'm going to continue using it a while longer simply because Llama 3.1 8B pales in comparison when it comes to complex roleplay.

Gryphe changed discussion title from A small opinion. to A small opinion. (Now a long feedback thread!)

> I haven't quite experienced any issues with message lengths myself in my extensive testing

I find with Nemo you can make it use any message length you want; if you regenerate enough times you'll end up with the right length. If you do that for 2-3 messages, it sticks to that length pretty well. I've had chats with average message lengths of 300ish tokens, and chats with an average of below 50 tokens.

And Nemo is much better than Llama 3 or 3.1. It's also not so overly verbose that it gets boring like Llama 3 (I prefer short messages and first person, which Llama sucked at). Nemo reminds me a lot of Solar, but smarter again. At Q4_K_S it's better than Llama 3 at Q6_K for what I like, and it just fits into 8GB at 16k context. Meanwhile Gemma needs 5.5GB for context alone at 16k, on top of the model size @_@

Edit - Just to add on: I feel it would make an awesome ingredient in a merge, as a bandage for the many common RP-model faults this model fixes.

Lol, my experience is the opposite, as I had issues getting short content instead of longer content, haha. I noticed that it really follows the pattern of the chat, so if you have 2-3 short messages at the beginning of the chat, it just reproduces the same length.

I've been using it mostly for general roleplaying with my own character cards; I didn't even notice the persona things that come with the model hehe

Btw, amazing model, I'm sure this one will get very popular soon.

Owner

I'm currently cooking a variation of Pantheon that's focused on novel-style RP rather than Markdown-style RP, as I discovered I can seamlessly transform the datasets back and forth. If successful, I'll publish it as Pantheon-NovelRP-1.5. I'll try to combine these datasets in later iterations, but for now I believe that specializing in a single style produces better results for models in this size range. I do wonder what would happen if I merged the two... (My todo list is already way too long for my liking.)

I'm also fully aware few folks bother with the extensive personas that make Pantheon, well, Pantheon, but the entire idea is that these traits, situations and personalities bleed through to general roleplay. Though, funnily enough, in some cases it might bleed through a bit too much... "Fuck me sideways" being a recurring phrase from Stella.

Either way there's still plenty of ideas that I want to incorporate for future versions but those will take time to build, test and train, hence my sporadic release schedule.

I hope I haven't bored you yet, so here's another interesting observation. Perhaps it will help develop the model in future iterations.

I've noticed that all new models can be divided into two types: NSFW with consent and NSFW without consent, and the problem is that they are two opposites. Either the model very aggressively ignores the user, behaving by default like a cat in mating season with no way to appease it, or it asks for consent 3-5 times; in both cases it gets infuriating during RP and completely kills the immersion.

I don't know what this is related to, but older models were more laconic in this behaviour: if the character they were portraying was aggressive, they acted aggressively without asking for NSFW consent, and if the character was calm, they asked permission...

Unsurprisingly, most people used the old Llama 2 models and still do, as they are more appropriate and human, albeit less intelligent.

> I've noticed that all new models can be divided into two types: NSFW with consent and NSFW without consent, and the problem is that they are two opposites. [...]

Summarized my feelings really well; it's why I hated 90% of Llama 3 finetunes and found myself leaning towards merges that balanced it more evenly. It was either complete reluctance to the point of being annoying, or so eager that the character is all over you within 3 messages (rather unrealistic and awkward for cards with familial relations). Nemo seems to get the idea better, in that it will usually ask once but won't ask a hundred damn times.
But Solar still wins in that area, somehow managing to know whether or not to ask consent depending on how the user acts.

Also an interesting thing I found messing around in ST: disabling instruct mode makes messages notably shorter.
No instruct: 108+141+116+132+90 = 587 tokens, 587/5 = 117.4 average
Instruct: 306+214+340+138+240 = 1238 tokens, 1238/5 = 247.6 average

And that's with just the standard ST Mistral preset:
Write {{char}}'s next reply in this fictional roleplay with {{user}}.

I bring this up for those who struggle with message length; I found that disabling it for one message, then enabling it again from there, brought the following messages' lengths down a lot.
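To illustrate what that toggle actually changes (a rough, simplified sketch, not SillyTavern's real prompt builder): with instruct mode off the chat goes out as a plain transcript to continue, while the Mistral preset wraps turns in [INST] tags, which tends to pull the model towards longer, assistant-style replies:

```python
# Rough, simplified illustration of the instruct-mode toggle.
system = "Write {{char}}'s next reply in this fictional roleplay with {{user}}."
turns = [("User", "Hi there!"), ("Char", "Hello! How are you?"), ("User", "Good, you?")]

# Instruct mode off: plain transcript continuation.
no_instruct = system + "\n\n" + "\n".join(f"{name}: {text}" for name, text in turns) + "\nChar:"

# Instruct mode on (Mistral preset): turns wrapped in [INST] tags.
instruct = (
    f"[INST] {system}\n\nUser: Hi there! [/INST]"
    " Hello! How are you?</s>"
    "[INST] User: Good, you? [/INST]"
)

print(no_instruct)
print(instruct)
```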

I'm not one of those RP fans who are only interested in NSFW with bots, but I have nothing against it when it fits succinctly and logically into the story. Since I'm a girl and interact mostly with male characters, I'd like to see something in between from the model: no endless questions, but also no rape by the third post.

I really liked Pantheon, but the model still asked for my consent quite often. In a test with a rather aggressive character, it asked for my consent 3 times in a row.

Its understanding of the environment and of nuances not described in the character card is great! The way it described my waterman - realising that he swims, and that below the belt he has a fish tail instead of legs - is even better than GPT-4o and Claude 3.5 Sonnet. In my year of RP experience with LLMs no other model has managed it so realistically and accurately; only MM and WizardLM 2 came close.

I use the simple setup too, just for ChatML.
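For anyone wondering what the ChatML setup several of us switched to looks like on the wire, this is the standard template (the system prompt here is just a placeholder example, not necessarily the model card's recommended one):

```python
# Standard ChatML turn structure; generation continues after the
# final assistant header. System prompt content is a placeholder.
chatml = (
    "<|im_start|>system\n"
    "You are {{char}}, roleplaying with {{user}}.<|im_end|>\n"
    "<|im_start|>user\n"
    "Hello!<|im_end|>\n"
    "<|im_start|>assistant\n"
)
print(chatml)
```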
