Join us in building some of the largest AI supercomputing clusters in the world! On the infrastructure team, you will manage and scale the company's supercomputers (powered by Kubernetes), build our research platform, and work on cross-functional projects to accelerate progress at the cutting edge of AI research. This work involves a wide range of tasks, from writing high-performance ImageNet models to tracking down ARP table overflows in a fleet of thousands of servers. See our recent blog post (https://blog.openai.com/scaling-kubernetes-to-2500-nodes/) to get a sense of the challenges we solve in our day-to-day work.
We look for a track record of the following:
Experience designing, implementing, and running production services
Comfort managing and monitoring large-scale infrastructure deployments
Willingness to debug problems across the stack, such as networking issues, performance problems, or memory leaks
You might be a good fit if you:
Know your way around a Unix shell
Are self-directed and enjoy figuring out the most important problem to work on
Own problems end-to-end, and are willing to pick up whatever knowledge you're missing to get the job done
In this role, you will work closely with and directly accelerate researchers, but you don't need to become a machine learning expert yourself. We value people who can quickly obtain deep technical understanding of new domains, and who enjoy being self-directed and identifying the most important problems to solve. Experience with high-performance computing or open-source contributions is a bonus.
OpenAI is a non-profit AI research company, discovering and enacting the path to safe artificial general intelligence.
OpenAI's mission is to build safe AGI, and ensure AGI's benefits are as widely and evenly distributed as possible. We expect AI technologies to be hugely impactful in the short term, but their impact will be outstripped by that of the first AGIs.
We're a non-profit research company. Our full-time staff of 60 researchers and engineers is dedicated to working towards our mission, regardless of the opportunities for selfish gain that arise along the way.
We focus on long-term research, working on problems that require us to make fundamental advances in AI capabilities. By being at the forefront of the field, we can influence the conditions under which AGI is created. As Alan Kay said, "The best way to predict the future is to invent it."
We publish at top machine learning conferences, open-source software tools for accelerating AI research, and release blog posts to communicate our research. We will not keep information private for private benefit, but in the long term, we expect to create formal processes for keeping technologies private when there are safety concerns.