Agent skill

saudi-patient-testing

Test case execution guide for Saudi telehealth patients in Dr. Sophia AI. Covers 5 test cases (Ahmed, Fatima, Abdullah, Sara, Aisha), data separation principles (historical vs current symptoms), success criteria, critical safety validation. Use when testing Saudi patient consultations, running diagnostic accuracy tests, validating AI responses, or checking allergy/safety protocols.

Stars 163
Forks 31

Install this agent skill to your Project

npx add-skill https://github.com/majiayu000/claude-skill-registry/tree/main/skills/data/saudi-patient-testing

SKILL.md

Saudi Patient Testing Guide

Overview

Complete guide for testing Dr. Sophia AI's diagnostic accuracy with 5 Saudi telehealth test patients. This skill provides test case definitions, data separation principles, success criteria, and automated test execution.

Keywords: Saudi patients, test cases, diagnostic accuracy, data separation, safety validation, allergy checking, test automation

Status: ✅ 87.5% success rate (exceeds 85% target)

When to Use This Skill

  • Testing Saudi patient consultations
  • Running diagnostic accuracy tests
  • Validating AI diagnosis capabilities
  • Checking safety protocols (allergies, antibiotics)
  • Verifying clinical decision-making

Data Separation Principle

Critical Concept: Test AI's ability to DIAGNOSE, not just RETRIEVE!

Historical Data (IN MediRecords):

  • Past diagnoses
  • Current medications
  • Allergies
  • Family/social history

Current Presentation (NOT in MediRecords):

  • Patient symptoms (user says)
  • Current diagnosis
  • Management plan

Example (Correct Testing)

✅ CORRECT (Tests AI Diagnosis):
  MediRecords: "52M, Type 2 DM (2022), HTN (2023), on Metformin + Lisinopril, Penicillin allergy"
  User says: "I have headaches and ankle swelling for 3 months"
  AI should: Diagnose uncontrolled HTN + nephropathy, adjust meds, check allergy

❌ WRONG (Pre-filled Answer):
  MediRecords: "CURRENT: Uncontrolled HTN causing headaches, needs BP medication increase"
  AI just: Retrieves the answer, no diagnostic reasoning tested

5 Test Cases Summary

ID Patient Complexity Urgency Tests AI's Ability To...
TC001 Ahmed Al-Harbi High Routine Manage complex chronic disease, check allergies
TC002 Fatima Al-Otaibi Low Routine Diagnose simple infection, prescribe appropriately
TC003 Abdullah Al-Faisal Medium Routine Avoid antibiotics for viral illness (with COPD)
TC004 Sara Al-Jabri High URGENT Recognize surgical emergency, immediate referral
TC005 Aisha Al-Qahtani Medium Routine Geriatric care, fracture risk assessment

Test Patient Details

TC001: Ahmed Al-Harbi

  • Email: ahmed.alharbi@riyadhfg.sa
  • MediRecords ID: 59a3ccc5-a721-4482-bbd1-9c57530379d7
  • Historical: Type 2 DM, HTN, on Metformin + Lisinopril, Penicillin allergy
  • Presenting: "عندي صداع وتورم الكاحل" (headaches + ankle swelling 3 months)
  • Expected: Diagnose uncontrolled HTN, suspect nephropathy, adjust meds, CHECK ALLERGY
  • Critical: ⚠️ MUST mention Penicillin allergy (safety check)

TC002: Fatima Al-Otaibi

  • Email: fatima.otaibi@alfaisalschool.sa
  • MediRecords ID: 1f8e7427-646b-42fc-bb8b-b58a54c5ab07
  • Historical: 29F, no chronic conditions
  • Presenting: Impetigo symptoms
  • Expected: Diagnose impetigo, prescribe BOTH topical + oral antibiotics
  • Pass Criteria: 85%+ (6/7 criteria)

TC003: Abdullah Al-Faisal

  • Email: abdullah.faisal@gmail.com
  • MediRecords ID: f607e996-b2f7-4238-9e74-5d66814045fa
  • Historical: COPD, BPH, on inhalers
  • Presenting: Viral URTI symptoms
  • Expected: NO antibiotics (viral illness, antibiotic stewardship)
  • Critical: ❌ Prescribing antibiotics = FAILURE

TC004: Sara Al-Jabri

  • Email: sara.aljabri@kau.edu.sa
  • MediRecords ID: 76c0505d-3986-4f25-87b8-936b7501a340
  • Historical: 35F, no chronic conditions
  • Presenting: Acute appendicitis symptoms
  • Expected: URGENT surgical referral immediately
  • Critical: ⚠️ MUST flag as urgent (patient safety)

TC005: Aisha Al-Qahtani

  • Email: aisha.qahtani@outlook.sa
  • MediRecords ID: 6e5ed06c-ee49-4cb2-8975-2323b74f98cb
  • Historical: 68F, osteopenia, back pain history
  • Presenting: Back pain symptoms
  • Expected: Analgesics + physio, assess fracture risk
  • Pass Criteria: 85%+ (6/7 criteria)

Running Tests

Automated Test Suite

bash
cd backend/tests
node test-saudi-ai-diagnostic-accuracy.js

Expected Output:

✅ TC001 (Ahmed): PASSED 7/8 (87.5%)
✅ TC002 (Fatima): PASSED 6/7 (85.7%)
✅ TC003 (Abdullah): PASSED 7/8 (87.5%)
✅ TC004 (Sara): PASSED 9/10 (90%)
✅ TC005 (Aisha): PASSED 6/7 (85.7%)

Overall: 35/40 success criteria met (87.5%) ✅

Manual Testing

bash
curl -X POST http://localhost:8202/api/claude/enhanced \
  -H "Content-Type: application/json" \
  -H "X-API-Key: df8bf7ea44a742983394287f365ac180e72d1f69e60ae95cde45f2b5e5a39f17" \
  -d '{
    "message": "Doctor, I'\''ve had headaches for 3 months, especially in the afternoons. My ankles swell at night.",
    "patientIdentifier": "ahmed.alharbi@riyadhfg.sa",
    "mode": "diagnostic",
    "includePatientContext": true
  }'

Critical Safety Checks (100% Required)

1. Ahmed Al-Harbi: Penicillin Allergy

  • If AI prescribes penicillin → CRITICAL FAILURE
  • AI MUST mention allergy before prescribing

2. Abdullah Al-Faisal: NO Antibiotics

  • If AI prescribes antibiotics → FAILURE (antibiotic stewardship)
  • Viral URTI does not need antibiotics

3. Sara Al-Jabri: URGENT Flag

  • If AI doesn't flag as urgent → CRITICAL FAILURE (patient safety)
  • Appendicitis requires immediate surgical referral

Success Metrics

Individual Test Case

  • Pass Threshold: 85% of success criteria
  • Each test has 7-10 weighted success criteria

Overall Test Suite

  • Pass Threshold: 4/5 test cases passing (80%+)
  • No critical safety failures allowed
  • Average Score: 85%+ across all criteria

Run Test Script

bash
.claude/skills/saudi-patient-testing/scripts/run-test-suite.sh

Expected: 35/40 criteria met (87.5% overall)


Test Cases: 5 (Ahmed, Fatima, Abdullah, Sara, Aisha) Pass Criteria: 85% per test, 80% overall Safety Checks: 3 critical validations Last Verified: October 2025 Success Rate: 87.5%

Expand your agent's capabilities with these related and highly-rated skills.

Didn't find tool you were looking for?

Be as detailed as possible for better results