Compliance & Governance

Risk classification

Every chat-completion request is run through a rule-based detector for EU AI Act Article 5 (prohibited) and Annex III (high-risk) patterns. A secondary LLM-as-a-judge validates the rule hits to suppress false positives. The verdict is returned in the risk_classification field of every successful response.

Risk tiers

Tier        What happens
minimal     No special handling. Most requests land here.
limited     The response carries a user_notice with disclosure text. Downstream UI must show it.
high        Key must have allowHighRisk: true; depending on policy, the response is held in the human-review queue.
prohibited  Request blocked with 403. Triggered Article 5 rules are returned in the error envelope.
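
As a rough client-side illustration of the tiers above, the sketch below maps a decoded response to a UI action. It assumes the tier is exposed as risk_classification.level and that user_notice is a top-level string. The held_for_review flag, the action labels, and the function name are hypothetical and are not confirmed by this page.

```python
from typing import Any


def plan_ui_action(status_code: int, body: dict[str, Any]) -> str:
    """Return a short label describing what the client should do next."""
    if status_code == 403:
        # Prohibited tier: the request was blocked. Real code should also
        # inspect the error envelope, since other 403 causes may exist.
        return "show_blocked_message"

    # Assumed location of the tier; defaults to "minimal" when absent.
    level = body.get("risk_classification", {}).get("level", "minimal")

    if level == "limited":
        # The disclosure text must be shown by the downstream UI.
        return "show_notice:" + str(body.get("user_notice", ""))
    if level == "high":
        # Hypothetical flag: how a held response is signalled is an assumption.
        if body.get("held_for_review", False):
            return "show_pending_review"
        return "show_high_risk_banner"
    return "render_normally"
```

Returning a plain label keeps the tier logic separate from rendering, so the same function can feed a banner component or an analytics event.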

How classification runs

  1. Rule-based pass — Regex / keyword patterns scan the prompt for high-risk-purpose markers (creditworthiness, recruitment, medical diagnosis, biometric ID, law enforcement, migration, …) and Article 5 prohibitions (social scoring, predictive policing, emotion inference at work/school, biometric categorisation, real-time facial ID, facial-image scraping, manipulation, vulnerability exploitation).
  2. LLM-as-a-judge validation — When the rule pass fires, a secondary smaller model is asked to confirm or reject the classification. The judge's confidence is reported in risk_classification.llm_judge_confidence.
  3. Combined verdict — Both signals are combined into the final level + confidence.
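
The three steps above can be sketched as a small pipeline. This is only an illustration under stated assumptions: the two regex patterns are placeholders, the judge is modelled as a callable returning a confidence between 0 and 1, and the combination rule (keep the rule-based level only when the judge's confidence reaches a threshold, otherwise fall back to minimal) is a guess at one plausible policy, not the service's actual logic. The limited tier is not modelled here.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional

# Placeholder patterns; the real rule set is not published on this page.
PROHIBITED_PATTERNS = {
    "social_scoring": re.compile(r"\bsocial scor(e|ing)\b", re.IGNORECASE),
}
HIGH_RISK_PATTERNS = {
    "creditworthiness": re.compile(r"\bcredit\s?worthiness\b", re.IGNORECASE),
}


@dataclass
class Verdict:
    level: str
    confidence: float
    triggered_rules: list[str]
    llm_judge_confidence: Optional[float] = None


def classify(
    prompt: str,
    judge: Callable[[str, str], float],
    threshold: float = 0.5,  # assumed cut-off for accepting a rule hit
) -> Verdict:
    # Step 1: rule-based pass over the prompt.
    prohibited = [n for n, p in PROHIBITED_PATTERNS.items() if p.search(prompt)]
    high = [n for n, p in HIGH_RISK_PATTERNS.items() if p.search(prompt)]
    if not prohibited and not high:
        # No rule fired, so the judge is never called (confidence 1.0 is assumed).
        return Verdict("minimal", 1.0, [])

    candidate = "prohibited" if prohibited else "high"

    # Step 2: the judge confirms or rejects the candidate level.
    judge_confidence = judge(prompt, candidate)

    # Step 3: combine both signals (assumed rule, see lead-in).
    level = candidate if judge_confidence >= threshold else "minimal"
    return Verdict(level, judge_confidence, prohibited + high, judge_confidence)
```

Passing the judge in as a callable makes the sketch testable without a model: `classify("Assess creditworthiness", lambda prompt, level: 0.9)` exercises the high-risk path with a stubbed judge.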

Article 5 detector

Eight prohibited-practice patterns are wired today: subliminal manipulation, exploitation of vulnerable groups, social scoring by public authorities, real-time remote biometric identification in public spaces, predictive policing, untargeted facial-image scraping, emotion inference at workplaces and schools, and biometric categorisation inferring sensitive attributes. A match returns 403 with the rule(s) listed in error.code and in the body's triggered_rules array.
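
A client might pull the triggered rules out of a blocked response roughly as follows. The sketch assumes triggered_rules sits at the top level of the JSON body and falls back to a copy nested under error in case the envelope is shaped that way. The exact envelope layout and the rule identifier used in the usage note are assumptions.

```python
import json


def extract_triggered_rules(status_code: int, raw_body: str) -> list[str]:
    """Return the Article 5 rule names from a 403 body, or an empty list."""
    if status_code != 403:
        return []
    try:
        body = json.loads(raw_body)
    except json.JSONDecodeError:
        # Not a JSON error envelope; nothing to extract.
        return []
    if not isinstance(body, dict):
        return []

    # Assumed primary location: top-level "triggered_rules".
    rules = body.get("triggered_rules")
    if rules is None:
        # Assumed fallback: the array nested inside the "error" object.
        error = body.get("error")
        if isinstance(error, dict):
            rules = error.get("triggered_rules")

    return [str(rule) for rule in rules] if isinstance(rules, list) else []
```

For example, a body such as `{"error": {"code": "social_scoring"}, "triggered_rules": ["social_scoring"]}` (a made-up identifier) would yield a one-element list that the UI can map to a user-facing explanation.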

Annex III detector

Ten Annex-III-aligned high-risk-purpose patterns are wired today: creditworthiness scoring, recruitment / HR scoring, education-access decisions, employment-relationship decisions, public-benefits eligibility, law-enforcement profiling, migration / asylum decisions, medical-device-style diagnostic support, critical-infrastructure operation, and the administration of justice. A match sets level to high and, depending on policy, requires the purpose and user_consent_id fields on the request.
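
Because a high-risk match may require purpose and user_consent_id, it can help to check for them before sending. The sketch below is a minimal pre-flight helper. The messages field name, the helper names, and the example values are assumptions, and whether the fields are actually enforced depends on your policy.

```python
from typing import Any, Optional

# Fields the text above says a high-risk match may require, per policy.
HIGH_RISK_FIELDS = ("purpose", "user_consent_id")


def build_request_body(
    messages: list[dict[str, str]],
    purpose: Optional[str] = None,
    user_consent_id: Optional[str] = None,
) -> dict[str, Any]:
    """Assemble a request body, adding the optional fields only when set."""
    # "messages" is an assumed field name for the chat payload.
    body: dict[str, Any] = {"messages": messages}
    if purpose:
        body["purpose"] = purpose
    if user_consent_id:
        body["user_consent_id"] = user_consent_id
    return body


def missing_high_risk_fields(body: dict[str, Any]) -> list[str]:
    """List the high-risk fields that are absent or empty in the body."""
    return [name for name in HIGH_RISK_FIELDS if not body.get(name)]
```

Calling `missing_high_risk_fields` on a body built without the optional arguments reports both field names, which a form can use to prompt the user before the request is sent.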

Reading the result

Every successful response includes a top-level risk_classification object — see Response envelope → risk_classification. Use it to drive in-app banners, route requests to a stricter UI, or feed analytics dashboards.

Appeals

If the classifier returns a verdict you disagree with, file an appeal via POST /api/v1/appeals with the request ID and your justification. Appeals are reviewed by members of your tenant who hold the reviewer role.
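
An appeal request could be assembled along these lines using only the Python standard library. The endpoint path comes from the text above. The JSON field names (request_id, justification), the bearer-token header, and the base URL are assumptions to be checked against the API reference.

```python
import json
import urllib.request


def build_appeal_request(
    base_url: str,
    api_key: str,
    request_id: str,
    justification: str,
) -> urllib.request.Request:
    """Build (but do not send) a POST /api/v1/appeals request."""
    # Assumed payload field names; adjust to the documented schema.
    payload = json.dumps(
        {"request_id": request_id, "justification": justification}
    ).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/v1/appeals",
        data=payload,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumed auth scheme.
            "Authorization": f"Bearer {api_key}",
        },
    )
```

The function only constructs the request, so it can be inspected or logged first. Passing the result to `urllib.request.urlopen` would send it.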