
Senior Software Engineer, Resilience Engineering

Netflix, Los Gatos, California

Leading subscription service for watching TV episodes and movies


The big picture
Netflix has more than 130 million subscribers worldwide. To support such a large subscriber base, we run a large, distributed, and ever-changing system. The Resilience Engineering team’s goal is to make this complex system as resilient as possible, so that our customers enjoy a great experience. The opportunity to impact Netflix and its subscribers is huge! If you like scale and global impact, this is an amazing place to be.

How do we make our system more resilient? We find vulnerabilities and risks in our system before they lead to customer-facing outages. To find vulnerabilities, we build Chaos tools that let us inject events we expect the system to handle, then check that the service stays healthy. We are currently using these tools to build a platform for load testing services with production traffic, which helps us better understand the limits of our production systems. Finally, we track patterns of risks and vulnerabilities, which show us our biggest availability challenges and help us develop risk mitigation strategies. You can read more about the practice of Chaos engineering here.

Who you are

  • You are intensely curious about how complex distributed systems operate and fail at scale
  • When you code, you reflect and seek feedback on design choices and trade-offs you make
  • You value engineering excellence and write testable, clear, and reusable code
  • You think freely and independently, and are ready to share your view
  • You are humble, eager to learn from mistakes, and you share the lessons learned
  • You can argue both sides of most disagreements
  • You collaborate well with partner teams

What you’ll do

  • Study the problems in the software resilience space
  • Create new solutions and see them through, from conception to production
  • Write code to support our existing solutions
  • Work with partner teams to find and fix vulnerabilities in their services

Requirements

  • You have built or contributed to a variety of systems, ideally in different technologies
  • You have experience with microservice architectures and understand scaling and concurrency concerns
  • You have strong software design and development skills in modern programming languages

Nice to have

  • Experience with multi-site high availability
  • Experience with Chaos engineering or testing in production
  • Experience creating products for engineers
  • Experience developing tools to improve reliability
  • Experience with internet-scale infrastructure

About Netflix

Netflix is the world’s leading Internet television network with over 100 million members in over 190 countries enjoying more than 125 million hours of TV shows and movies per day, including original series, documentaries and feature films. Members can watch as much as they want, anytime, anywhere, on nearly any Internet-connected screen. Members can play, pause and resume watching, all without commercials or commitments.

Want to learn more about Netflix? Visit Netflix's website.