Now Recruiting

Infra Software Engineering Expert

$50–$110/hour · Remote (US) · Contract · Project-based

About Polymath

Polymath is an applied research lab building the simulation environments that the next generation of AI agents will be trained and evaluated in. We partner with leading model labs to push the frontier of long-horizon agent capabilities — environments where agents must plan, use tools, recover from errors, and work autonomously for hours or days at a time.

We recently announced our $8M seed round led by Base10, with Y Combinator and a roster of angels alongside. We're a small team of researchers, engineers, and operators, and we ship.

About the role

You'll design and build reinforcement learning environments and tasks that push frontier coding models past their current limits. Polymath partners with senior practitioners to author the high-fidelity scenarios that AI systems are trained on and evaluated against. In this role, that means standing up realistic software systems (containers, clusters, networks, services) and turning them into well-scoped tasks an agent can attempt, fail at, and learn from. Engagements range from authoring a single environment to longer-running collaborations on task suites and grading rubrics.

The interesting judgment calls here are technical and pedagogical at once. What does a non-trivial Kubernetes debugging task look like when the agent has shell access? How do you instrument a distributed system so success and failure are unambiguous? Where's the line between a task that's tractable and one that's actually hard? You'll be drawing on real systems engineering experience — Docker, Kubernetes, Terraform, networking, infra — to construct problems that look like the work senior engineers actually do.

Responsibilities

Design and build RL environments and tasks that target real software engineering and systems work
Containerize and orchestrate the underlying services, networks, and infrastructure each task depends on
Specify clear success criteria, failure modes, and grading signals for agent attempts
Stress-test your own environments against current frontier coding models and iterate on difficulty
Collaborate with Polymath researchers and other domain experts on task suites and rubric design
Document environments and tasks so other engineers and graders can extend them

Who we're looking for

You:
Have substantial professional experience in software engineering and the full software development lifecycle
Are fluent with containerization, networking, and infrastructure as everyday tools
Have hands-on depth with Docker, Kubernetes, and Terraform
Bring a systems engineering mindset — comfortable reasoning across services, hosts, and network boundaries
Can scope a hard technical problem into a task with crisp inputs, outputs, and grading signals
Are interested in how frontier coding models actually behave, and in building environments that expose their weaknesses

Nice-to-haves

Depth in performance engineering or distributed systems
Background in cybersecurity, including offensive or defensive tooling
DevOps or SRE experience running production systems at scale
ML engineering experience, especially around training or evaluating large models
Operating systems internals expertise
