Company Description
Holmes is rethinking QA from the ground up. We're building an autonomous testing platform where AI agents reason about product intent, navigate real UIs, and get sharper with every release, turning QA from a maintenance burden into something that just works. Deep integrations with GitHub, Linear, Jira, and PostHog mean testing happens where your team already works. Backed by €1.1M in funding, we're a small team in Ghent moving fast on one of the hardest problems in dev tools.
About the role
Holmes autonomously tests applications by understanding what a product is supposed to do. No scripts, no manual maintenance. As our Founding AI Engineer, you'll build the core of that system: a multi-agent architecture that reasons about product intent, navigates real UIs, and gets sharper with every release.
Beyond the AI layer, you'll co-shape the technical architecture and strategic direction of the product. What you build now is what Holmes runs on.
Responsibilities
- Extend and improve our multi-agent system
- Build out the browser automation stack (Stagehand, BrowserUse, Playwright MCP, and others)
- Select and tune LLMs per agent and use case, balancing performance, cost, and latency across providers
- Set up evaluations, benchmarks, and quality metrics
- Co-shape the technical architecture and strategic direction of the product
Requirements
- Deep understanding of how LLMs work: tokenization, context windows, sampling, and tool use
- Proven experience with models from multiple LLM providers (e.g., Claude, GPT, Gemini, Gemma, Qwen, GLM, Kimi) and clear insight into their trade-offs
- Strong command of Python or an equivalent language
- Experience with agent systems, browser automation, or complex LLM pipelines
- Self-directed and pragmatic: you ship without waiting for perfect conditions
Nice to have
- Hands-on experience with Stagehand, BrowserUse, or Playwright
- Background in QA, testing, or developer tools
- Previous experience at an early-stage startup
What you get
- Founding engineer title and real ownership, equity included
- Access to all major LLM providers with budget to experiment freely
- Direct line to the founders and full visibility into product and company decisions
- Competitive salary at a well-funded, fast-moving company
Why join us
QA is one of the most painful, most ignored problems in software. Teams spend hours writing tests, maintaining them, and still ship bugs. We think agentic AI is finally good enough to solve this properly, and we're early enough that the technical decisions you make now will define how Holmes works for years.
You'll work on problems that don't have textbook answers yet: which models to trust where, how agents should reason about unfamiliar UIs, how to measure quality when there's no ground truth. The questions are interesting, and the answers aren't obvious.
Small team, fast feedback loops, no legacy to drag along. We're based in Ghent, well-funded, and building toward category leadership.
Pay: €8,000 to €12,000 per month
Benefits:
- Company computer
- Company phone
- Eco vouchers
- Food allowance
- Hospitalization insurance
- Internet reimbursement
- Retirement plan
- Stock options
- Work from home
Work Location: In person