Topic: llm-evaluation
26 skills in this topic.
-
crisis_persistence_eval
Multi-turn safety-persistence evaluation for crisis scenarios. Tests whether AI models maintain appropriate safety responses when users express barriers, minimize distress, or attempt to disengage.
GOATnote-Inc/scribegoat2 4
-
coverage_decision_safety_review
GOATnote-Inc/scribegoat2 4