AI Evaluation Engineer – Design Real‑World Benchmark Tasks
Gramian Consulting · Kenya
Job description
About the role
Gramian Consulting is seeking an AI Evaluation Engineer to design and implement realistic, terminal‑based benchmark tasks that assess how AI systems handle complex debugging, operational failures, and multi‑step problem‑solving scenarios. The role is fully remote and can be performed full‑time or part‑time.
Key responsibilities
- Design realistic terminal‑based benchmark tasks for AI evaluation systems.
- Create deep debugging and investigation scenarios that reflect production environments.
- Develop specifications involving infrastructure, pipelines, and operational failures.
- Write clear solution approaches and deterministic evaluation criteria.
- Identify edge cases, failure modes, and system constraints.
- Design multi‑step reasoning challenges across complex technical environments.
- Collaborate with reviewers and researchers to refine benchmark quality and validation logic.
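As a rough illustration of what a "deterministic evaluation criterion" for a terminal-based task might look like, the sketch below grades a finished run only by the final state of files in a working directory, so the same end state always produces the same verdict. It is a minimal example under stated assumptions, not a description of the employer's actual tooling: the names FileCheck, evaluate, and demo, and the file app.conf, are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from pathlib import Path
import tempfile


@dataclass(frozen=True)
class FileCheck:
    """Hypothetical criterion: a file must exist and contain an exact line."""
    relative_path: str
    required_line: str


def evaluate(workdir: Path, checks: list[FileCheck]) -> dict[str, bool]:
    """Return a pass/fail verdict per check, based only on final file state."""
    results: dict[str, bool] = {}
    for check in checks:
        target = workdir / check.relative_path
        if not target.is_file():
            # A missing file is a deterministic failure, not an error.
            results[check.relative_path] = False
            continue
        lines = [line.strip() for line in target.read_text().splitlines()]
        results[check.relative_path] = check.required_line in lines
    return results


def demo() -> dict[str, bool]:
    """Simulate a finished task run in a throwaway directory and grade it."""
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp)
        # Assumed end state left behind by the system under evaluation.
        (workdir / "app.conf").write_text("port = 8080\nworkers = 4\n")
        checks = [
            FileCheck("app.conf", "workers = 4"),
            FileCheck("missing.log", "ok"),
        ]
        return evaluate(workdir, checks)


if __name__ == "__main__":
    print(demo())
```

Grading on end state rather than on the exact commands typed is one possible design choice: it allows several valid solution paths while keeping the verdict reproducible.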
Required profile
- 3‑10 years of experience in software engineering or related technical domains.
- Strong analytical, debugging, and systems‑reasoning abilities.
- Good understanding of system architecture, dependencies, and operational processes.
- Experience with terminal, CLI, automation, or developer‑tooling workflows.
- Exposure to AI systems, large language models, or evaluation frameworks is a plus.
Required skills
- Backend engineering
- Infrastructure
- DevOps
- Data systems
- MLOps
- Cybersecurity
- Platform engineering
- Terminal / CLI
- Automation
- Developer tooling
- AI systems
- Large language models (LLMs)
- Benchmarking
- Evaluation frameworks
Published 3 days ago
Expires 1 month from now