TypeSafe is a frontier model lab. We build reliable and general AI systems to power economically valuable automation. Our mission is to usher in a new era of Transformative Artificial Intelligence (TAI): technology with the power to drive a societal shift on the scale of the agricultural and industrial revolutions.

While others chase benchmarks and academic puzzles, we’ve been quietly rethinking the LLM stack from first principles — building a new kind of general frontier model designed for real-world reliability, decision-making, and autonomy in production.

We’re a small, fast-moving team from OpenAI, Google Brain and Meta/FAIR, backed by top-tier investors. Since mid-2024, we’ve been engineering the foundation for what comes after the current “state-of-the-art” — a model that actually gets things done.

About the Role

As an ML Engineer, you will own and evolve the core training and infrastructure systems that power our frontier models. You’ll sit at the intersection of large-scale distributed systems and cutting-edge ML, building the tooling and workflows that let us train, evaluate, and iterate on models with production-grade reliability.

Our tech stack is primarily Python. We also use TypeScript, Next.js, and Tailwind CSS for the frontend, with Kubernetes for orchestration. We encourage developers to use whatever tooling helps them get their job done, including Claude Code and Cursor.

Day to day, you might:

- Design, build, and maintain distributed training infrastructure and job submission systems.

- Run and analyze experiments to understand model behavior, diagnose failure modes, and guide the next iteration of training and architecture changes.

- Build and refine evaluation pipelines to measure model quality, reliability, and safety across a variety of tasks and benchmarks.

- Monitor live and long-running training jobs, building alerting, dashboards, and diagnostics to detect and resolve issues quickly.

- Implement and automate hyperparameter tuning workflows to efficiently explore training configurations and push the performance frontier.

- Profile and optimize training performance across the stack.

We are looking for people who:

- Care deeply about our mission and are driven to translate it into real-world impact

- Have 5+ years of professional software engineering experience, or equivalent professional expertise

- Have 3+ years of experience working in machine learning

- Have strong product intuition and enjoy thinking beyond the code to how systems are used in the real world

- Maintain high attention to detail and a strong bar for quality

- Can describe complex problems succinctly and clearly

- Thrive in ambiguous problem spaces and take satisfaction in finding creative solutions

Life at TypeSafe

We’re a small, flat, close-knit team dedicated to real-world impact and to preparing the world for Transformative AI. Our team works fully in person at our San Francisco office near Embarcadero station. We love what we do and care deeply about our work.

We strive for excellence and craftsmanship and won't stop until we get there. When the team wins, we all win, and we enjoy collaborating and inspiring each other to grow as a team and as individuals.

We also value emotional honesty, kindness, and bringing your whole self to work. We build machines; we don't try to be machines.

We want you to be able to do the most impactful work of your career at TypeSafe and help define our future as a company.

We provide:

- Competitive salary and equity

- 100% covered health insurance

- Daily lunch and dinner

- Visa sponsorship

- 401(k) plan
