Uncubed

Senior Software Engineer - Container Platform

Netflix, Los Gatos, California

Leading subscription service for watching TV episodes and movies


At the heart of Netflix technology is the Cloud Computing platform, which serves as the distributed systems foundation for Netflix application development. We are building a job and resource scheduling engine for container-based workloads on top of the public cloud that powers Netflix. This system manages both service and batch jobs across multiple regions of the world. Operating at this scale, we launch over 1 million containers per week across thousands of underlying container hosts, and leverage the elastic cloud to optimize efficiency through advanced bin packing and capacity bursting.

We architect our system to be highly available, fault tolerant and distributed from the ground up. We invest deeply in reliability improvements to support the scale and business criticality of our container applications. Operational automation, testing, and performance improvement are critical to the success of the container platform. We are looking to expand the team with software developers who can not only advance the functionality of the platform, but also keep a strong focus on the operational challenges of keeping the platform reliable as it continues to scale.

For more information on the Netflix container platform, see our recent techblog post and our most recent public presentation.

What we are building:

  • We extend Linux and container runtimes to provide isolation and deep integration with Amazon EC2 networking and security. We integrate the container execution environment with other critical Netflix infrastructural systems.
  • We provide advanced scheduling across both service and batch jobs (capacity management, bin packing for efficiency, anti-colocation for high availability, cross workload optimization, etc.).
  • We focus on driving a consistent and fault-tolerant control plane. We drive all parts of the system to be operationally resilient and capable of worldwide scale in support of all Netflix users.

Skills we are looking for:

  • Passion and demonstrated experience in improving the reliability and operational automation of complex, multi-tier systems. SRE experience is a big plus.
  • Experience beyond usage of container management platforms (Mesos, Swarm and Kubernetes) and container runtimes (Docker and rkt). Specifically, we are looking for developers who have extended and improved these platforms.
  • Experience with addressing performance issues across the whole stack from applications to operating systems.
  • Good understanding of OS fundamentals, Linux internals and shell programming.
  • Experience building business-critical, large-scale systems with extreme availability.
  • Ability to program across the core project languages, Java and Golang.

Netflix offers a unique culture that values freedom and responsibility. You can learn more on our jobs page.

About Netflix

Netflix is the world’s leading Internet television network with over 100 million members in over 190 countries enjoying more than 125 million hours of TV shows and movies per day, including original series, documentaries and feature films. Members can watch as much as they want, anytime, anywhere, on nearly any Internet-connected screen. Members can play, pause and resume watching, all without commercials or commitments.

Want to learn more about Netflix? Visit https://www.netflix.com/