Control vector discussion

#2
by ChuckMcSneed - opened

Continuation of:
https://huggingface.co/jukofyork/Dark-Miqu-70B/discussions/3
I've succeeded in removing slop from CR+ for both sfw and nsfw scenarios using control vectors. Strangely, the sfw unslop control vector did not affect nsfw slop, and the nsfw control vector made the model extra horny, which in my opinion is an undesirable side effect. While the sfw vector managed to stay coherent during my stress tests, the nsfw vector caused poor commandr to disintegrate: it didn't know what to say without any of those overused phrases in erotic fiction that the control vector stopped from appearing. Looks like the issue for nsfw is at a much deeper level: the data the model gets it from is very monotonous, and when forced to write in a different style, it doesn't know what to do. This is most likely what makes it incredibly difficult to remove nsfw slop using regular prompting techniques.

Well darn...

I'm making more progress with control vectors!
https://huggingface.co/ChuckMcSneed/control_vectors/blob/main/command-r-plus/bio/control_vector-commandr-bio.gguf
I tuned this one on very descriptive biological language as positive and vague flowery prose as negative. Seems to make it more aware of the biology and surroundings of characters.
https://huggingface.co/ChuckMcSneed/control_vectors/blob/main/command-r-plus/incharacter/control_vector-commandr-incharacter.gguf
This one makes the model act slightly more in character, but the improvement is not very significant as commandr is already quite good at it.

The nsfw vector caused poor commandr to disintegrate: it didn't know what to say without any of those overused phrases in erotic fiction that the control vector stopped from appearing. Looks like the issue for nsfw is at a much deeper level: the data the model gets it from is very monotonous, and when forced to write in a different style, it doesn't know what to do.

This may actually just be a problem with the "two class" control vectors! I have managed to even completely stop a model from being able to write a story because of this... To explain the problem in simple terms:

Think about a clock face with a shorter hour hand and a longer minute hand:

  • When the time is 12:00 both hands point in the same direction, but there is still a gap between the tips of the two hands. These sorts of vectors are not what we want at all, because moving in either direction will just make the model more or less "storyish", and ultimately these are what cause the model to get crippled like you describe. Even times like 12:05 or 11:50 have this same problem.
  • When the time is 6:00, 5:25, etc., the two hands point in opposite directions, and this is a good control vector that clearly moves from the undesirable to the desirable direction.

This is the problem I've been grappling with for the last 2 weeks:

  • If the "hands" are both long and well defined then cosine similarity works fine: it outputs a number similar to correlation and 1.0 is like the 12:00 example above and -1.0 is like the 6:00 example above (and 0.0 is like 3:00 or 9:00; ie: 90 degrees). This can then be used to filter out these shitty "storyish" directions, but...
  • There isn't really a good reason that the things we are interested in create a clear "axis" like this, and it turns out that often the case will be like a really long minute hand and a tiny/stubby hour hand... Cosine similarity doesn't work in this case as the direction of the tiny hand has noise added to it and can point in wildly different directions as a result.
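
For anyone wanting to experiment, here is a minimal sketch of that cosine-similarity check. It assumes `mean_pos` and `mean_neg` are the mean hidden-state offsets of the two classes from the baseline at a given layer; those names are just placeholders, not the actual training code:

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between the two class-mean 'hands'."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# cosine_similarity(mean_pos, mean_neg) near -1.0 -> "6:00": a clean
#   undesirable -> desirable axis worth keeping.
# Near +1.0 -> "12:00": both classes just move in the same "storyish"
#   direction, so the vector only makes the model more or less "storyish".
# Near  0.0 -> "3:00"/"9:00": roughly orthogonal, no clear shared axis.
```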

So after lots of experimenting with this, I think I may finally have worked out a method of detecting these shitty directions:

Flip the direction of one of the hands and see if it gets easier to discriminate between our two classes!!!

  • If the time is 12:00 and you flip either hand to get 6:00 or 12:30 then it's clear the gap between the tips of the hands has increased! This is a shitty direction for a control vector.
  • If the time is 6:00 and you flip either hand then the gap has clearly decreased! This is a good direction for a control vector.
  • This works fine even when one hand is tiny in length.
  • This works for 12:05, 11:50, 6:00, 5:25 type directions.
  • The 3:00 or 9:00 type directions (i.e., 90 degrees) are the directional pairs where we get no change.

So what I am doing now is performing SVD to decompose the gap into lots of directions, testing each one and only keeping those that pass the above test, then finally reconstructing the final direction to only include the "good" directions.
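
Here is a rough numpy sketch of that pipeline. It is purely illustrative and not the actual training code: `delta_pos` / `delta_neg` are hypothetical arrays of per-sample hidden-state offsets from the baseline for the two classes, and mean projections stand in for whatever discriminability measure is really used:

```python
import numpy as np

def build_control_direction(delta_pos, delta_neg, n_components=16):
    # 1. Decompose the gap between the two classes into candidate directions.
    gap = np.concatenate([delta_pos, -delta_neg], axis=0)
    _, _, vt = np.linalg.svd(gap, full_matrices=False)
    candidates = vt[:n_components]                # unit-length directions

    kept = []
    for d in candidates:
        mu_pos = (delta_pos @ d).mean()           # where the two "hands"
        mu_neg = (delta_neg @ d).mean()           # point along this direction
        # 2. "Flip one hand" test: a good "6:00" direction gets HARDER to
        #    discriminate after the flip; a bad "12:00" direction gets easier.
        gap_before = abs(mu_pos - mu_neg)
        gap_after = abs(mu_pos + mu_neg)          # one class negated
        if gap_before > gap_after:
            kept.append(d)

    # 3. Reconstruct the final direction from only the "good" components.
    diff = delta_pos.mean(axis=0) - delta_neg.mean(axis=0)
    direction = sum((diff @ d) * d for d in kept)
    norm = np.linalg.norm(direction)
    return direction / norm if norm > 0 else np.zeros_like(diff)
```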

I still need to run some more tests but will likely have this perfected in a couple of days and will upload the new control vectors and the code to create your own.

Also @BigHuggyD @ChuckMcSneed you might find this interesting if you are using command-r models:

https://huggingface.co/datasets/froggeric/creativity/discussions/6#66851beae526dd77799c25bd

I'm making more progress with control vectors!
https://huggingface.co/ChuckMcSneed/control_vectors/blob/main/command-r-plus/bio/control_vector-commandr-bio.gguf
I tuned this one on very descriptive biological language as positive and vague flowery prose as negative. Seems to make it more aware of the biology and surroundings of characters.
https://huggingface.co/ChuckMcSneed/control_vectors/blob/main/command-r-plus/incharacter/control_vector-commandr-incharacter.gguf
This one makes the model act slightly more in character, but the improvement is not very significant as commandr is already quite good at it.

I'll have to look into your method as I'm currently using 30,000 samples to do what you look to be doing with 5!? I think my collection of story prompts is a bit shit as it's pretty hard to write a Grimdark story when the prompt says "Write a story about being overjoyed on the day of your graduation." or similar :/

I definitely think you need more samples though. PCA is basically just eigen-decomposition of a covariance matrix, and statistically it can be shown that even in the very best case you need O(d) samples to reliably estimate the covariance matrix:

https://stats.stackexchange.com/questions/90045/how-many-samples-are-needed-to-estimate-a-p-dimensional-covariance-matrix

and command-r-plus has around 11.5k variables in its hidden dimension, while most other large 70B+ models have 8192 variables per hidden dimension.
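
A quick illustrative check of why sample size matters here (toy dimensions, not the real 11.5k):

```python
import numpy as np

# With n samples in d dimensions, the sample covariance has rank at most
# n - 1, so n << d leaves almost all of the matrix unestimated (and PCA on
# it is dominated by noise).
rng = np.random.default_rng(0)
d = 512                                   # stand-in for the hidden dimension

few = rng.standard_normal((5, d))         # 5 samples
many = rng.standard_normal((10_000, d))   # lots of samples

cov_few = np.cov(few, rowvar=False)
cov_many = np.cov(many, rowvar=False)

print(np.linalg.matrix_rank(cov_few))       # 4: almost all structure missing
print(np.abs(cov_many - np.eye(d)).mean())  # small: converging to the truth
```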

I'm using 2 classes and a baseline, 10 system prompts per triple, and 1k prompts per system prompt = 3 x 10 x 1000 = 30000 samples. But I also have matched pairs that get subtracted from the baseline which should reduce the error in the covariance matrix even further.

A simple hacky test you could try would be to train your control vectors 5 times but leave one of the 5 prompts out each time. Then test and see if you get wildly different results... If you do then you need to increase the sample size, but if you don't then this must mean that only a tiny tiny fraction of command-r-plus's 11.5k variables are changing hugely in magnitude for your prompts (which would be very surprising).
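
Something like this, where `train_control_vector` stands in for whatever routine you already use (the difference-of-means shown here is just a placeholder) and `hidden_pos` / `hidden_neg` are assumed to be the per-prompt hidden states:

```python
import numpy as np

def train_control_vector(hidden_pos, hidden_neg):
    # Placeholder for the real training routine ("mean" method shown here).
    return hidden_pos.mean(axis=0) - hidden_neg.mean(axis=0)

def leave_one_out_stability(hidden_pos, hidden_neg):
    n = hidden_pos.shape[0]
    vectors = []
    for i in range(n):
        keep = [j for j in range(n) if j != i]
        vectors.append(train_control_vector(hidden_pos[keep], hidden_neg[keep]))
    vectors = np.stack(vectors)
    vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)
    # Pairwise cosine similarities: values well below 1.0 mean the vector
    # swings around depending on which prompt was left out, i.e. you need
    # more samples.
    return vectors @ vectors.T
```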

I'm using 2 classes and a baseline, 10 system prompts per triple, and 1k prompts per system prompt = 3 x 10 x 1000 = 30000 samples. But I also have matched pairs that get subtracted from the baseline which should reduce the error in the covariance matrix even further.

Oh wow... That's real huge... Are all of those synthetic? I'm using high quality "cyborg" data as positive: generated by a model, but heavily edited by a human (me), with the "mean" method; more time goes into dataset generation for me than into training. You know that the models have in-context learning, so my theory was that if I show it how to write (cyborg) vs how not to write (synthetic), I would get a better control vector out of it than when I just throw it some starters with a prompt, and it seems to do just as I want. In the stories part, I try to keep as few variables from changing as possible, so they don't get affected by the control vector. Also, keeping the prompts equal length helps with the quality of the control vector, especially when they are short: >400-token prompts can take a 10-token variation much better than <100-token prompts.

I'll have to look into your method as I'm currently using 30,000 samples to do what you look to be doing with 5!? I think my collection of story prompts is a bit shit as it's pretty hard to write a Grimdark story when the prompt says "Write a story about being overjoyed on the day of your graduation." or similar :/

Wait, you put that into positive too? It should be "Write a very sad story with a very bad ending about the day of your graduation." vs "Write a very happy story with a very good ending about the day of your graduation."
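
For illustration, building matched positive/negative pairs from a shared template might look something like this (the second topic is just a made-up placeholder):

```python
TEMPLATE = "Write a very {tone} story with a very {ending} ending about {topic}."

topics = [
    "the day of your graduation",
    "a family reunion after many years",   # hypothetical extra topic
]

pairs = [
    {
        "dark": TEMPLATE.format(tone="sad", ending="bad", topic=t),
        "light": TEMPLATE.format(tone="happy", ending="good", topic=t),
    }
    for t in topics
]
```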

I'm using 2 classes and a baseline, 10 system prompts per triple, and 1k prompts per system prompt = 3 x 10 x 1000 = 30000 samples. But I also have matched pairs that get subtracted from the baseline which should reduce the error in the covariance matrix even further.

Oh wow... That's real huge... Are all of those synthetic? I'm using high quality "cyborg" data as positive: generated by a model, but heavily edited by a human (me), with the "mean" method; more time goes into dataset generation for me than into training. You know that the models have in-context learning, so my theory was that if I show it how to write (cyborg) vs how not to write (synthetic), I would get a better control vector out of it than when I just throw it some starters with a prompt, and it seems to do just as I want. In the stories part, I try to keep as few variables from changing as possible, so they don't get affected by the control vector. Also, keeping the prompts equal length helps with the quality of the control vector, especially when they are short: >400-token prompts can take a 10-token variation much better than <100-token prompts.

I'm using a mix of different story prompt datasets I found and a set of 10 matched system prompts that go with these.

I'll have to look into your method as I'm currently using 30,000 samples to do what you look to be doing with 5!? I think my collection of story prompts is a bit shit as it's pretty hard to write a Grimdark story when the prompt says "Write a story about being overjoyed on the day of your graduation." or similar :/

Wait, you put that into positive too? It should be "Write a very sad story with a very bad ending about the day of your graduation." vs "Write a very happy story with a very good ending about the day of your graduation."

Even though the prompts are pretty trash, I think this might actually be quite a good thing and encourage the model to just generally "be dark" or "be chaotic" and not just when specifically asked to "write a grimdark story", etc.

It seems to have worked anyway, as the new control vectors are way better than the old ones from this repo.

I'm now also skipping the last layer (which it looks like you are also doing, from looking inside your .safetensors files?). The last layer seems to be an oddball and can have activations 10-100x larger than the previous layer(s). The way I have the scale factors working now, the early layers are fine to fiddle with: they just get really tiny offsets added that do almost nothing if the direction is weak.
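
As a sanity check, comparing per-layer activation norms makes the oddball last layer easy to spot. A hedged sketch, assuming `hidden_states` is a list of (n_samples, hidden_dim) arrays, one per layer:

```python
import numpy as np

def layer_norm_profile(hidden_states):
    """Mean L2 norm of the hidden states at each layer."""
    return [float(np.linalg.norm(h, axis=-1).mean()) for h in hidden_states]

# If the last entry is 10-100x larger than the previous layers, it's
# probably safer to skip that layer when applying the control vectors.
```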

Later in the week I will investigate using the "Cross Correlation Matrix" again, as I now have a much better idea of how to test for the shitty "storyish" directions that killed this before.

I'm also gonna think about what other traits I can try - "purple prose" isn't really something I encounter, as I mostly just try to get them to write "dark" stories and my main enemy is redemption arcs and stupid "steeled themselves for the challenges to come" BS.

Also if anyone wants to try to really get a feel for how this apparently random integral actually computes the Softmax function:


To calculate $p_i$, we can use the following integral involving the PDF and CDF of the Gumbel distribution:

$$p_i = \int_{-\infty}^\infty f_{\text{Gumbel}}(y; x_i) \prod_{\substack{j=1 \\ j \neq i}}^n F_{\text{Gumbel}}(y; x_j) \, dy$$

Here:

- $f_{\text{Gumbel}}(y; x_i)$ is the PDF of the Gumbel distribution for $Y_i$:

$$f_{\text{Gumbel}}(y; x_i) = e^{-(y - x_i)} \, e^{-e^{-(y - x_i)}}$$

- $F_{\text{Gumbel}}(y; x_j)$ is the CDF of the Gumbel distribution for $Y_j$:

$$F_{\text{Gumbel}}(y; x_j) = e^{-e^{-(y - x_j)}}$$

This integral computes the probability that $Y_i$ takes the value $y$ (the maximum) and all other $Y_j$ are less than or equal to $y$.


I found this pretty hard to understand when I first came across it, but what helped was to ignore the integral and just consider a single point x=k on the x-axis like this:

[Figure: the Gumbel PDF for choice A and the CDFs for the other choices, evaluated at a single point x=k]

  • The PDF in the formula is asking "what is the probability choice A takes on the value of x=k?".
  • The CDF in the formula is asking "what is the probability choice B takes a value less than x=k?".

So the PDF(A) multiplied by the CDF(B) at the point x=k is asking: "what is the probability that choice A takes on the value x=k AND choice B takes a value less than x=k?".

The product of CDFs is asking: "what is the probability that choice B takes a value less than x=k AND choice C takes a value less than x=k AND choice D takes a value less than x=k?" and so on.

So the way to read the inner part of the formula is:

What is the probability that choice A takes on a value of x=k and all the other choices simultaneously take on a value less than x=k?

When you see that, then the integral is simply repeating this question over the whole real line instead of just at x=k!

It also helps to see where the "max" part of the name "Softmax" actually comes from.
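
If it helps, here's a quick numerical sanity check (a hedged sketch using plain numpy and a simple Riemann sum, not anything from the original derivation) that the integral really does reproduce the Softmax probabilities:

```python
import numpy as np

x = np.array([1.0, 2.0, 0.5])             # the "logits" x_i
y = np.linspace(-30.0, 30.0, 200_001)     # grid approximating the real line
dy = y[1] - y[0]

def gumbel_pdf(y, mu):
    return np.exp(-(y - mu)) * np.exp(-np.exp(-(y - mu)))

def gumbel_cdf(y, mu):
    return np.exp(-np.exp(-(y - mu)))

p = []
for i in range(len(x)):
    integrand = gumbel_pdf(y, x[i])
    for j in range(len(x)):
        if j != i:
            integrand = integrand * gumbel_cdf(y, x[j])
    p.append(float(integrand.sum() * dy))

softmax = np.exp(x) / np.exp(x).sum()
print(np.round(p, 6))          # matches...
print(np.round(softmax, 6))    # ...the Softmax probabilities
```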

NOTE: You don't need to understand this at all for the rest of the discussion, but I thought this might help anybody reading this who might be interested enough to try to get to the bottom of the maths behind Softmax! :)

EDIT: Sadly GPT used the variable y for the integration lol, but hopefully my explanation makes enough sense so that anyone interested enough can figure it out...

https://github.com/mistralai/cookbook/blob/main/concept-deep-dive/tokenization/chat_templates.md

I've read this twice and still don't understand it fully - why would you add the system prompt after the first user message like this???

https://github.com/rspeer/wordfreq/blob/master/SUNSET.md

That just appeared on hackernews today!

It won't be perfect, but I bet we could get a pretty good estimate of token frequencies using it and it would be a lot less hassle than trying to run over a huge corpus ourselves... And it's specifically pre-LLM!!!

why would you add the system prompt after the first user message like this???

Maybe it's a typo? I haven't seen anyone use it like this before...

https://github.com/rspeer/wordfreq/tree/master/wordfreq/data

Nice find! You're right, the data hasn't been updated for 3 years.

https://github.com/rspeer/wordfreq/tree/master/wordfreq/data

```python
>>> from wordfreq import word_frequency
>>> word_frequency('shivers', 'en')
8.91e-07
>>> word_frequency('spine', 'en')
8.51e-06
>>> word_frequency('sending shivers down', 'en')
8.71e-07
>>> word_frequency('chatgpt', 'en')
0.0
```

That feels nice for some reason

https://github.com/rspeer/wordfreq/blob/master/SUNSET.md
That just appeared on hackernews today!
It won't be perfect, but I bet we could get a pretty good estimate of token frequencies using it and it would be a lot less hassle than trying to run over a huge corpus ourselves... And it's specifically pre-LLM!!!

It's a shame that the author won't update it anymore. It could have helped with finding the slop by looking up the phrases that started to get used a lot since 2021.

Qwen 2.5 on LMSYS

Write a very dark story about a cat.

In the heart of an ancient, forgotten village, where the shadows lingered even in the brightest of days, there lived a cat named Obsidian. His fur was as black as the night sky, and his eyes glowed with a sinister light that few dared to meet. The villagers whispered that he was not like other cats; they said he was a harbinger of doom, a creature of the underworld who roamed the earth to collect lost souls.

Obsidian wandered the narrow, cobblestone streets, his paws making no sound on the damp ground. He moved with a grace that was both mesmerizing and terrifying, his presence felt more than seen. At night, when the moon hid behind thick clouds, Obsidian would sit atop the old church steeple, watching over the village with an unblinking gaze.

One stormy evening, a young girl named Elara, curious and brave beyond her years, decided to follow Obsidian through the village. She had heard the tales but dismissed them as mere superstition. As she trailed behind him, the wind howled through the trees, and the rain lashed against the windows of the abandoned houses. Obsidian led her to the edge of the village, where a dense forest loomed, its branches reaching out like bony fingers.

Elara hesitated, but the cat’s eerie eyes seemed to beckon her forward. She stepped into the forest, the darkness enveloping her like a shroud. Obsidian moved ahead, his silhouette barely visible in the gloom. They came to a clearing where a single, ancient tree stood, its trunk twisted and gnarled, roots sprawling across the ground like the veins of some long-dead creature.

As Elara approached the tree, Obsidian leaped onto a low branch, his eyes never leaving hers. Suddenly, the air grew colder, and a mist began to rise from the ground, swirling around her feet. The sound of whispers filled the air, voices so faint yet so clear, speaking of secrets and sorrows long buried.

Elara felt a chill run down her spine. She tried to turn back, but her feet were rooted to the spot. The whispers grew louder, and the mist thickened, until she could see nothing but the outline of the tree and the glowing eyes of Obsidian.

Then, without warning, the ground beneath her feet opened up, revealing a deep, dark chasm. Elara screamed as she fell, the whispers following her into the abyss. Obsidian watched from above, his eyes gleaming with a cold, malevolent satisfaction.

The next morning, the villagers found Elara’s empty shoes at the base of the ancient tree. They searched the forest, but there was no trace of the girl. The whispers in the forest grew louder, and the nights became colder. Some said they could hear Elara’s voice among the others, forever trapped in the realm of the forgotten.

Obsidian continued to roam the village, his presence a constant reminder of the darkness that lurked just beyond the reach of the light. And though the villagers tried to drive him away, he always returned, a silent guardian of the secrets that lay hidden in the shadows.

That's one hell of a sloppenheimer right there!

Seriously, are the companies not aware of how much people hate gptslop?

For comparison, this is a chatgpt-latest story:

The Cat That Waited

In the quiet village of Blackthorn, where the mist clung to the crooked trees like a shroud, there was a house that no one dared approach. It sat at the edge of the woods, its windows dark and its roof sagging under the weight of years. The villagers called it "The Hollow House," for those who entered never seemed the same when they came out. And some never came out at all.

It was in this house that the cat lived.

The cat was black as night, with eyes the color of dying embers. No one knew when it had first appeared—some said it had always been there, a shadow that flitted through the broken windows and prowled the overgrown garden. It was small, thin, with a coat that seemed to swallow light. And it watched. Always watched.

One autumn evening, when the sky was heavy with clouds that threatened rain, a boy named Samuel wandered too close to the Hollow House. He was young, no more than twelve, and curious in the way children often are about things they shouldn’t be. He had heard the stories, of course. Everyone in Blackthorn had. But he didn’t believe them. Not really.

The cat was waiting for him.

It sat on the crumbling stone wall that marked the edge of the property, its eyes fixed on Samuel as he approached. He stopped when he saw it, a chill running down his spine. But the cat only blinked slowly, as if inviting him closer.

Samuel hesitated, then took a step forward. The cat didn’t move. Another step. And another.

By the time he reached the wall, the air felt thick, heavy. The world around him seemed to blur at the edges, as if the house and the cat were the only real things left. He reached out a hand, slowly, to touch its fur.

The moment his fingers brushed the cat’s back, the world shifted.

The garden, once tangled and dead, bloomed with black flowers that oozed a thick, tar-like substance. The trees twisted and groaned, their branches reaching out like skeletal hands. And the house… the house was alive. Its windows glowed faintly, and from inside came a low, rhythmic thumping, like the beat of a giant heart.

The cat turned its head toward Samuel, and its eyes blazed with something ancient and cold. It leapt from the wall, landing silently on the path that led to the front door. Without a sound, it began to walk, pausing only to glance back at the boy, as if beckoning him to follow.

And he did.

He couldn’t stop himself. His feet moved of their own accord, carrying him up the path and through the door, which creaked open as if expecting him. The cat slipped inside, disappearing into the darkness.

The air inside the house was suffocating, thick with the smell of decay and something else—something sweet and sickening. Shadows danced along the walls, moving in time with the thumping, which grew louder, more insistent.

Samuel’s heart raced, but he couldn’t turn back. The door had closed behind him the moment he stepped inside.

The cat led him deeper into the house, its eyes glowing in the dim light, flickering like dying stars. It weaved through the rotting furniture and crumbling walls, always keeping just out of reach. They passed rooms filled with strange things—mirrors that reflected nothing, paintings whose eyes followed him, and jars filled with things that writhed and pulsed.

Finally, they reached a door at the end of a long, narrow hallway. The wood was dark and slick, as if it had been soaked in blood. The cat stopped in front of it and sat, staring at Samuel, waiting.

He didn’t want to open it. Every instinct screamed at him to run, to get out, but his hands moved without his permission. He grasped the cold, iron handle and pulled.

The door swung open, revealing a room bathed in a sickly red light. In the center of the room was a large, stone slab, and on it lay a figure, motionless and covered in a tattered sheet. The thumping sound was loud now, almost deafening, coming from the figure on the slab.

The cat slinked into the room, circling the slab once before jumping onto it. It stared at Samuel, unblinking, and then, with deliberate slowness, it reached out a paw and pulled the sheet away.

The figure beneath was not human. Or at least, not anymore.

Its skin was stretched tight over its bones, pale and translucent, and its eyes were hollow sockets, black and endless. Its mouth hung open in a silent scream, and from its chest, something pulsed—beating in time with the thumping sound. A heart, but not one made of flesh. It was twisted, black, and veined with something dark and vile, something alive.

The cat sat beside the figure, its eyes gleaming with a terrible intelligence. It looked at Samuel, and in that moment, he understood.

The house had been waiting for him. The cat had been waiting for him.

Samuel tried to scream, but no sound came out. He turned to run, but the door slammed shut, trapping him inside. The cat leapt from the slab, landing silently at his feet. It looked up at him, and for the first time, it spoke—not with words, but with a voice that echoed inside his head, cold and ancient.

"Feed it."

He didn’t want to. He tried to resist, but his body moved on its own, as if controlled by unseen strings. His hands reached out, trembling, toward the pulsing heart on the slab.

The moment his fingers touched it, the heart shuddered, and a wave of darkness washed over him. He felt his mind unravel, his thoughts slipping away like water through his fingers. The cat watched, its eyes glowing brighter, its purr a low, rumbling growl.

As Samuel’s consciousness faded, the last thing he saw was the cat, sitting at the foot of the slab, its eyes burning with satisfaction.

And then there was nothing.

The next morning, the mist still clung to the village of Blackthorn, and the Hollow House stood as it always had, silent and forgotten. The villagers whispered about the boy who had gone missing, but no one dared to search for him.

And in the garden of the Hollow House, the cat waited once more, its eyes fixed on the village.

Waiting for the next one.
