inoki-giskard/scan-report-temp · Report for textattack/bert-base-uncased-SST-2

Hey Team!🤗✨
We’re thrilled to share some amazing evaluation results that’ll make your day!🎉📊

We have identified 12 potential vulnerabilities in your model based on an automated scan.

This automated analysis evaluated the model on the dataset sst2 (subset default, split validation).

👉Robustness issues (1)

Vulnerability	Level	Data slice	Metric	Transformation	Deviation
Robustness	major 🔴	—	Fail rate = 0.111	Add typos	90/812 tested samples (11.08%) changed prediction after perturbation

🔍✨Examples

When feature “text” is perturbed with the transformation “Add typos”, the model changes its prediction in 11.08% of the cases. We expected the predictions not to be affected by this transformation.

	text	Add typos(text)	Original prediction	Prediction after perturbation
3	the acting , costumes , music , cinematography and sound are all astounding given the production 's austere locales .	the acting , costmes , mjsic , cinematography and sound are all asotunding given the production 's austere locales .	LABEL_1 (p = 1.00)	LABEL_0 (p = 0.99)
38	as surreal as a dream and as detailed as a photograph , as visually dexterous as it is at times imaginatively overwhelming .	as surreal as a eeam and as detailed as a photograph , as visually dexterlus as it is at tmes imafginatively overwhelming .	LABEL_1 (p = 1.00)	LABEL_0 (p = 0.78)
41	this illuminating documentary transcends our preconceived vision of the holy land and its inhabitants , revealing the human complexities beneath .	this ipluminating documentary ffranscends kour preconceived visuon of the holy land and its ibhabitxants , reealing the human complexities beneath .	LABEL_1 (p = 1.00)	LABEL_0 (p = 0.83)
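
The fail rate reported above can be reproduced from the two counts in the table. The snippet below is a minimal sketch, not Giskard's implementation: the `fail_rate` helper and the synthetic label lists are hypothetical and only mirror the reported counts (90 changed predictions out of 812 tested samples).

```python
def fail_rate(original, perturbed):
    """Fraction of samples whose predicted label changed after perturbation."""
    if not original or len(original) != len(perturbed):
        raise ValueError("prediction lists must be non-empty and of equal length")
    changed = sum(o != p for o, p in zip(original, perturbed))
    return changed / len(original)


# Placeholder predictions that only reproduce the counts from the report:
# 812 tested samples, 90 of which flip after "Add typos".
original = ["LABEL_1"] * 812
perturbed = ["LABEL_0"] * 90 + ["LABEL_1"] * 722

rate = fail_rate(original, perturbed)
print(f"{rate:.3f} ({rate:.2%})")  # 0.111 (11.08%)
```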

👉Performance issues (11)

Vulnerability	Level	Data slice	Metric	Transformation	Deviation
Performance	major 🔴	`avg_word_length(text)` < 4.618 AND `avg_word_length(text)` >= 4.483	Precision = 0.788	—	-14.19% than global

🔍✨Examples

For records in the dataset where `avg_word_length(text)` < 4.618 AND `avg_word_length(text)` >= 4.483, the Precision is 14.19% lower than the global Precision.

	text	avg_word_length(text)	label	Predicted `label`
22	holden caulfield did it better .	4.5	LABEL_0	LABEL_1 (p = 0.99)
95	this riveting world war ii moral suspense story deals with the shadow side of american culture : racial prejudice in its ugly and diverse forms .	4.61538	LABEL_0	LABEL_1 (p = 1.00)
115	sam mendes has become valedictorian at the school for soft landings and easy ways out .	4.5	LABEL_0	LABEL_1 (p = 0.98)
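
The "Deviation" column appears to be a relative difference, (slice - global) / global, rather than an absolute one. The report does not state the global metrics, so the sketch below back-computes them from the four Precision rows in this report under that assumption. The helper name `implied_global` is hypothetical, and the resulting values are estimates from rounded figures, not numbers reported by the scan.

```python
def implied_global(slice_metric, deviation_pct):
    """Global metric implied by a slice metric and a relative deviation in percent.

    Assumes deviation = (slice - global) / global, so
    global = slice / (1 + deviation).
    """
    return slice_metric / (1 + deviation_pct / 100)


# (slice Precision, deviation in %) for the Precision rows of this report.
precision_rows = [(0.788, -14.19), (0.837, -8.81), (0.846, -7.84), (0.861, -6.21)]

estimates = [implied_global(metric, dev) for metric, dev in precision_rows]
for (metric, dev), estimate in zip(precision_rows, estimates):
    print(f"slice={metric:.3f} deviation={dev:+.2f}% -> implied global={estimate:.3f}")
```

All four rows point to roughly the same global Precision, which is what one would expect if the relative-difference reading is correct.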

Vulnerability	Level	Data slice	Metric	Transformation	Deviation
Performance	major 🔴	`avg_whitespace(text)` >= 0.178 AND `avg_whitespace(text)` < 0.182	Precision = 0.788	—	-14.19% than global

🔍✨Examples

For records in the dataset where `avg_whitespace(text)` >= 0.178 AND `avg_whitespace(text)` < 0.182, the Precision is 14.19% lower than the global Precision.

	text	avg_whitespace(text)	label	Predicted `label`
22	holden caulfield did it better .	0.181818	LABEL_0	LABEL_1 (p = 0.99)
95	this riveting world war ii moral suspense story deals with the shadow side of american culture : racial prejudice in its ugly and diverse forms .	0.178082	LABEL_0	LABEL_1 (p = 1.00)
115	sam mendes has become valedictorian at the school for soft landings and easy ways out .	0.181818	LABEL_0	LABEL_1 (p = 0.98)
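
The `avg_word_length(text)` and `avg_whitespace(text)` slices flag the same rows with the same Precision, which suggests the two features carry the same information here. The sketch below shows one set of definitions that reproduces the values in the tables. These definitions are inferred from the numbers, not taken from Giskard's source, and they assume the raw SST-2 text ends with a trailing space (not visible in the tables above).

```python
import math


def avg_word_length(text):
    """Mean length of whitespace-separated tokens (punctuation counts as a token)."""
    tokens = text.split()
    return sum(len(token) for token in tokens) / len(tokens)


def avg_whitespace(text):
    """Share of characters in the raw string that are whitespace."""
    return sum(ch.isspace() for ch in text) / len(text)


# Row 22 of the report; the trailing space is an assumption about the raw data.
text = "holden caulfield did it better . "

awl = avg_word_length(text)
aws = avg_whitespace(text)
print(round(awl, 5), round(aws, 6))  # 4.5 0.181818

# With exactly one space after every token, the two features are tied together:
# avg_whitespace = 1 / (1 + avg_word_length)
print(math.isclose(aws, 1 / (1 + awl)))  # True
```

Under these assumptions the slice bounds also map onto each other: 1 / (1 + 4.618) is about 0.178 and 1 / (1 + 4.483) is about 0.182, so the two findings are likely one issue reported twice.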

Vulnerability	Level	Data slice	Metric	Transformation	Deviation
Performance	major 🔴	`avg_word_length(text)` < 3.867 AND `avg_word_length(text)` >= 3.691	Recall = 0.840	—	-10.13% than global

🔍✨Examples

For records in the dataset where `avg_word_length(text)` < 3.867 AND `avg_word_length(text)` >= 3.691, the Recall is 10.13% lower than the global Recall.

	text	avg_word_length(text)	label	Predicted `label`
92	you wo n't like roger , but you will quickly recognize him .	3.69231	LABEL_0	LABEL_1 (p = 1.00)
93	if steven soderbergh 's ` solaris ' is a failure it is a glorious failure .	3.75	LABEL_1	LABEL_0 (p = 0.59)
183	the lower your expectations , the more you 'll enjoy it .	3.83333	LABEL_0	LABEL_1 (p = 1.00)

Vulnerability	Level	Data slice	Metric	Transformation	Deviation
Performance	major 🔴	`avg_whitespace(text)` >= 0.205 AND `avg_whitespace(text)` < 0.213	Recall = 0.840	—	-10.13% than global

🔍✨Examples

For records in the dataset where `avg_whitespace(text)` >= 0.205 AND `avg_whitespace(text)` < 0.213, the Recall is 10.13% lower than the global Recall.

	text	avg_whitespace(text)	label	Predicted `label`
92	you wo n't like roger , but you will quickly recognize him .	0.213115	LABEL_0	LABEL_1 (p = 1.00)
93	if steven soderbergh 's ` solaris ' is a failure it is a glorious failure .	0.210526	LABEL_1	LABEL_0 (p = 0.59)
183	the lower your expectations , the more you 'll enjoy it .	0.206897	LABEL_0	LABEL_1 (p = 1.00)

Vulnerability	Level	Data slice	Metric	Transformation	Deviation
Performance	medium 🟡	`text` contains "movie"	Precision = 0.837	—	-8.81% than global

🔍✨Examples

For records in the dataset where `text` contains "movie", the Precision is 8.81% lower than the global Precision.

	text	label	Predicted `label`
69	this one is definitely one to skip , even for horror movie fanatics .	LABEL_0	LABEL_1 (p = 0.95)
172	it seems like i have been waiting my whole life for this movie and now i ca n't wait for the sequel .	LABEL_1	LABEL_0 (p = 0.72)
509	a movie that successfully crushes a best selling novel into a timeframe that mandates that you avoid the godzilla sized soda .	LABEL_1	LABEL_0 (p = 0.91)

Vulnerability	Level	Data slice	Metric	Transformation	Deviation
Performance	medium 🟡	`idx` >= 826.500	Precision = 0.846	—	-7.84% than global

🔍✨Examples

For records in the dataset where `idx` >= 826.500, the Precision is 7.84% lower than the global Precision.

	idx	label	Predicted `label`
827	827	LABEL_0	LABEL_1 (p = 0.91)
829	829	LABEL_0	LABEL_1 (p = 0.98)
832	832	LABEL_0	LABEL_1 (p = 0.80)

Vulnerability	Level	Data slice	Metric	Transformation	Deviation
Performance	medium 🟡	`text_length(text)` < 82.500 AND `text_length(text)` >= 73.500	Recall = 0.870	—	-6.97% than global

🔍✨Examples

For records in the dataset where `text_length(text)` < 82.500 AND `text_length(text)` >= 73.500, the Recall is 6.97% lower than the global Recall.

	text	text_length(text)	label	Predicted `label`
93	if steven soderbergh 's ` solaris ' is a failure it is a glorious failure .	76	LABEL_1	LABEL_0 (p = 0.59)
142	what better message than ` love thyself ' could young women of any size receive ?	82	LABEL_1	LABEL_0 (p = 0.98)
411	i do n't mind having my heartstrings pulled , but do n't treat me like a fool .	80	LABEL_0	LABEL_1 (p = 0.95)
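
To check whether a new sentence falls into this slice, the values in the table are consistent with `text_length(text)` being the character count of the raw string. This is a sketch under that assumption, again with a trailing space on each record; `in_length_slice` is a hypothetical helper, not part of Giskard.

```python
def in_length_slice(text, low=73.5, high=82.5):
    """True if the character count falls in the half-open range [low, high)."""
    return low <= len(text) < high


# Rows 93, 142 and 411 of the report; trailing spaces are an assumption.
examples = [
    "if steven soderbergh 's ` solaris ' is a failure it is a glorious failure . ",
    "what better message than ` love thyself ' could young women of any size receive ? ",
    "i do n't mind having my heartstrings pulled , but do n't treat me like a fool . ",
]

lengths = [len(text) for text in examples]
print(lengths)  # [76, 82, 80]
print(all(in_length_slice(text) for text in examples))  # True
```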

Vulnerability	Level	Data slice	Metric	Transformation	Deviation
Performance	medium 🟡	`text_length(text)` >= 165.500 AND `text_length(text)` < 183.500	Recall = 0.872	—	-6.73% than global

🔍✨Examples

For records in the dataset where `text_length(text)` >= 165.500 AND `text_length(text)` < 183.500, the Recall is 6.73% lower than the global Recall.

	text	text_length(text)	label	Predicted `label`
266	a coda in every sense , the pinochet case splits time between a minute-by-minute account of the british court 's extradition chess game and the regime 's talking-head survivors .	179	LABEL_1	LABEL_0 (p = 0.85)
282	while there 's something intrinsically funny about sir anthony hopkins saying ` get in the car , bitch , ' this jerry bruckheimer production has little else to offer	166	LABEL_1	LABEL_0 (p = 1.00)
292	the story and the friendship proceeds in such a way that you 're watching a soap opera rather than a chronicle of the ups and downs that accompany lifelong friendships .	170	LABEL_0	LABEL_1 (p = 0.88)

Vulnerability	Level	Data slice	Metric	Transformation	Deviation
Performance	medium 🟡	`text_length(text)` < 98.500 AND `text_length(text)` >= 86.500	Precision = 0.861	—	-6.21% than global

🔍✨Examples

For records in the dataset where `text_length(text)` < 98.500 AND `text_length(text)` >= 86.500, the Precision is 6.21% lower than the global Precision.

	text	text_length(text)	label	Predicted `label`
115	sam mendes has become valedictorian at the school for soft landings and easy ways out .	88	LABEL_0	LABEL_1 (p = 0.98)
230	reign of fire looks as if it was made without much thought -- and is best watched that way .	93	LABEL_1	LABEL_0 (p = 1.00)
519	moretti 's compelling anatomy of grief and the difficult process of adapting to loss .	87	LABEL_0	LABEL_1 (p = 1.00)

Vulnerability	Level	Data slice	Metric	Transformation	Deviation
Performance	medium 🟡	`idx` >= 500.500 AND `idx` < 546.500	Accuracy = 0.870	—	-5.92% than global

🔍✨Examples

For records in the dataset where `idx` >= 500.500 AND `idx` < 546.500, the Accuracy is 5.92% lower than the global Accuracy.

	idx	label	Predicted `label`
501	501	LABEL_1	LABEL_0 (p = 1.00)
509	509	LABEL_1	LABEL_0 (p = 0.91)
519	519	LABEL_0	LABEL_1 (p = 1.00)

Vulnerability	Level	Data slice	Metric	Transformation	Deviation
Performance	medium 🟡	`idx` >= 121.500 AND `idx` < 182.500	Recall = 0.885	—	-5.36% than global

🔍✨Examples

For records in the dataset where `idx` >= 121.500 AND `idx` < 182.500, the Recall is 5.36% lower than the global Recall.

	idx	label	Predicted `label`
142	142	LABEL_1	LABEL_0 (p = 0.98)
143	143	LABEL_1	LABEL_0 (p = 0.89)
171	171	LABEL_0	LABEL_1 (p = 0.67)

Disclaimer: it's important to note that automated scans may produce false positives or miss certain vulnerabilities. We encourage you to review the findings and assess the impact accordingly.

💡 What's Next?

Check out the Giskard Space and improve your model.
The Giskard community is always buzzing with ideas. 🐢🤔 What do you want to see next? Your feedback is our favorite fuel, so drop your thoughts in the community forum! 🗣️💬 Together, we're building something extraordinary.

🙌 Big Thanks!

We're grateful to have you on this adventure with us. 🚀🌟 Here's to more breakthroughs, laughter, and code magic! 🥂✨ Keep hugging that code and spreading the love! 💻 #Giskard #Huggingface #AISafety 🌈👏 Your enthusiasm, feedback, and contributions are what we seek. 🌟 Keep being awesome!