PROJECT OVERVIEW
The Software Engineer in Test (AI) is a hands-on engineering role that bridges test engineering rigour and ML development. Rather than producing simple scripts, the SET writes production-quality validation code designed to ensure the reliability of AI-integrated features, model behaviour, and data quality. The primary test targets span UI, APIs, business logic, and data pipelines, supported by a broad range of test types including end-to-end, integration, unit, model validation, data quality, and drift detection.
Quality in this role is measured through accuracy metrics, bias indicators, and model behaviour analysis, with a strong compliance focus on ISO 42001 AIMS standards as well as bias and fairness requirements. The primary outputs are model validation suites, data quality gates, and compliance evidence that together provide a robust assurance framework for AI systems.
The SET works closely with Team Leads, QA Engineers, Solution Architects, and DevOps to deliver across the full quality lifecycle. The technical stack includes Playwright, TypeScript, .NET/C#, Azure Pipelines, Docker, and xUnit/NUnit, reflecting the role’s scope across both modern web testing and enterprise-grade backend validation.
IN THIS ROLE, YOU WILL
AI Model Testing & Validation: Design and implement model validation frameworks covering accuracy, precision, recall, and F1 across clinical subgroups (age, hearing loss severity, device type).
Write regression tests for model updates — detect silent accuracy degradation before any deployment to production.
Validate model outputs and design adversarial test cases: edge cases, out-of-distribution inputs, boundary conditions, and clinically implausible inputs.
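As an illustration of the subgroup validation described above, here is a minimal Python sketch that computes precision, recall, and F1 per subgroup and flags any subgroup below an agreed floor. The function names, the record layout, and the threshold are assumptions made for this example, not part of an existing framework.

```python
from collections import defaultdict


def subgroup_metrics(records):
    """Compute precision, recall, and F1 per subgroup.

    records: iterable of (subgroup, y_true, y_pred) with binary labels 0/1.
    The record layout is a hypothetical choice for this sketch.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for group, y_true, y_pred in records:
        c = counts[group]  # creates the entry even for true negatives
        if y_pred == 1 and y_true == 1:
            c["tp"] += 1
        elif y_pred == 1 and y_true == 0:
            c["fp"] += 1
        elif y_pred == 0 and y_true == 1:
            c["fn"] += 1

    results = {}
    for group, c in counts.items():
        predicted_pos = c["tp"] + c["fp"]
        actual_pos = c["tp"] + c["fn"]
        # Report 0.0 when a metric is undefined (no positives to divide by).
        precision = c["tp"] / predicted_pos if predicted_pos else 0.0
        recall = c["tp"] / actual_pos if actual_pos else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        results[group] = {"precision": precision, "recall": recall, "f1": f1}
    return results


def failing_subgroups(results, min_f1):
    """Return the subgroups whose F1 falls below the agreed floor."""
    return sorted(g for g, m in results.items() if m["f1"] < min_f1)
```

A regression test for a model update could run the same labelled set through the old and new model and fail the build if `failing_subgroups` returns anything for the new one.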
AI API & Integration Testing: Design and maintain API test suites for AI feature endpoints consumed by the PMS frontend and microservices.
Write contract tests between AI services and consuming services — prevent integration breakage when models are updated or retrained.
Test latency, throughput, and graceful degradation under load (AI inference endpoints have stricter SLAs than standard CRUD APIs).
Validate error handling: model confidence thresholds, fallback behaviour when models are unavailable or return low-confidence outputs.
Collaborate with the AQA on shared Playwright E2E coverage for AI-integrated UI flows.
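The error-handling checks above (confidence thresholds and fallback when a model is unavailable) could be exercised against a wrapper like the following Python sketch. Every name here (`ModelUnavailableError`, `Suggestion`, `suggest_with_fallback`) and the 0.8 threshold are hypothetical and only illustrate the kind of behaviour under test.

```python
from dataclasses import dataclass
from typing import Optional


class ModelUnavailableError(Exception):
    """Hypothetical error raised when the inference service cannot respond."""


@dataclass
class Suggestion:
    value: Optional[str]
    source: str  # "model" or "fallback"


def suggest_with_fallback(predict, features, min_confidence=0.8):
    """Return the model's label only when it is available and confident.

    predict: callable returning (label, confidence); an assumption of this sketch.
    """
    try:
        label, confidence = predict(features)
    except (ModelUnavailableError, TimeoutError):
        # Model is down or too slow: degrade gracefully instead of failing.
        return Suggestion(None, "fallback")
    if confidence < min_confidence:
        # Low-confidence output is withheld rather than shown to the user.
        return Suggestion(None, "fallback")
    return Suggestion(label, "model")
```

Tests would then stub `predict` with a confident result, a low-confidence result, and a raised error, and assert the expected `source` in each case.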
ISO 42001 & Regulatory Compliance Testing: Design testing evidence for ISO 42001 AIMS: traceability from AI system requirements to test cases to results, supporting audit and certification.
Observability & Production Monitoring: Design and implement model monitoring pipelines — track accuracy, confidence distribution, and prediction drift in production against baseline.
Contribute to post-incident reviews when AI features cause unexpected clinical workflow impacts or regulatory flags.
Feed production monitoring findings back into the regression test suite to prevent recurrence and improve model robustness.
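One way to track drift in the confidence distribution against a baseline is the population stability index (PSI). The Python sketch below is an illustrative assumption, not a prescribed method for this role: it bins scores in [0, 1] into equal-width buckets and compares production proportions with baseline proportions.

```python
import math


def population_stability_index(baseline, production, bins=10, eps=1e-6):
    """PSI between two samples of scores assumed to lie in [0, 1].

    Returns 0.0 for identical distributions; larger values mean more drift.
    """
    if not baseline or not production:
        raise ValueError("both samples must be non-empty")

    def proportions(values):
        counts = [0] * bins
        for v in values:
            # Clamp so a score of exactly 1.0 lands in the last bucket.
            idx = min(int(v * bins), bins - 1)
            counts[idx] += 1
        total = len(values)
        # Floor each proportion at eps to avoid log(0) for empty buckets.
        return [max(c / total, eps) for c in counts]

    p = proportions(baseline)
    q = proportions(production)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))
```

A commonly cited rule of thumb treats a PSI above roughly 0.25 as a significant shift, but the alert threshold is an assumption that would need to be agreed per model.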
IF YOU ARE
5+ years in software testing; 1+ years specifically testing AI/ML systems, data pipelines, or model-serving APIs.
Proficiency in automation frameworks: .NET, TypeScript / Playwright for AI-integrated UI coverage.
Experience with API testing: contract testing, latency profiling, error scenario coverage.
Deep understanding of QA methodologies, processes, and CI/CD practices.
Understanding of service-oriented and microservice architectures.
Understanding of ML fundamentals: train/test splits, evaluation metrics (precision, recall, F1, AUC).
Strong problem-solving and debugging skills, including backend and frontend issue investigation.
Ability to think like an end-user and test accordingly, especially when product requirements are minimal or evolving.
Strong communication in English (B2+).
NICE TO HAVE
Familiarity with Python or willingness to become comfortable with it over time.
Statistical testing methods: hypothesis testing, A/B evaluation, bootstrap confidence intervals.
LLM evaluation and test

