Healthcare

OSCE-Project

Medical dialogue evaluation framework inspired by Objective Structured Clinical Examinations (OSCEs). Simulates 64 patient personas to measure Empathy, Persuasion, and Safety.

Python, MIT License, Multi-Agent, LLM-as-Judge
Healthcare

FHIR-Agent

An LLM agent framework for interacting with FHIR databases, enabling structured clinical data retrieval and reasoning.

Python, FHIR, Healthcare
Benchmark

OSCE-AgentBeats Leaderboard

Public leaderboard for the OSCE evaluator within the AgentBeats challenge ecosystem. Compare medical agent performance across standardized clinical scenarios.

Python, MIT License, Benchmark
Security

AgentBeats Security Arena

Adversarial security testing framework for AI agents. Evaluates robustness and safety against prompt injection and manipulation attacks.

Python, Security, Red-Teaming
Voice AI

OSCE Real-Time Voice

Real-time voice-based clinical examination system extending the OSCE framework with speech interaction for realistic doctor-patient dialogue evaluation.

Python, Voice AI, Real-Time
Research

Model Training & Distillation

End-to-end LLM fine-tuning and knowledge distillation pipelines for domain-specific medical AI models, aimed at building smaller, faster, and more accurate clinical language models.

Fine-Tuning, Distillation, LLM Training