Software Reliability Engineer, Databases
Airbnb, San Francisco, United States
Who wouldn't want to work here?
Reliability Engineers (REs) are responsible for the overall performance and reliability of Airbnb's infrastructure and products. We believe that every engineering team at Airbnb is responsible for running and operating the software that they build. REs support product and infrastructure teams with tools, processes, and expertise that make services easy to operate and as reliable as possible. The RE role is critical to ensuring that our services are properly instrumented and able to scale with our growing business.
What makes Reliability Engineering different at Airbnb?
- We emphasize building tools over manual processes. We create, not operate. Things should go from repeatable to automated quickly.
- We're rooted in open source (http://airbnb.io/) and give as much back to the community as possible, through both new projects and contributions to existing ones.
- Our job is to build reliable infrastructure and tools for our product teams so that they can focus on solving user problems and building new features, not reinventing platforms.
- SREs don't sit on the other side of a fence waiting for work to be tossed over; we're first-class engineering citizens and help lead our infrastructure focus.
What are some examples of Reliability Engineering work at Airbnb?
- Drive service reliability by developing tooling that enables metric visibility using SLIs, SLOs, and SLAs.
- Develop Production Readiness standards to ensure service reliability
- Work with product engineering teams on design and implementation choices for large-scale distributed systems
- Automate as much as humanly possible and always manage configuration as code
- Bring ideas to life (i.e., into production) to help make the lives of engineers better
- Predict our future failures and work proactively to mitigate them
- Advocate for and implement reliable design patterns (circuit breakers, graceful degradation, etc.)
- Partner with the broader Airbnb organization to learn from incidents through a blameless postmortem process
Some examples of Database Reliability projects are:
- Working on our next-generation database architecture supporting a data-localized, multi-region setup
- Assessing database capacity and working with Airbnb developers to proactively address scaling challenges
- Developing tooling for database operations including upgrades, patches and schema migrations
- Building monitoring systems to detect query workload regressions and deliver performance insights
The following experience is relevant to us:
- 4+ years of industry experience
- Experience operating and maintaining database clusters (MySQL, Vitess, etc.)
- A knack for writing clean, readable, maintainable code
- An eye for automation and instrumentation
- The ability to decompose complex systems and find failure scenarios
- Great communication skills
- Knowledge of public cloud platforms (AWS, Google Cloud Platform, etc.)
Founded in August of 2008 and based in San Francisco, California, Airbnb is a trusted community marketplace for people to list, discover, and book unique accommodations around the world — online or from a mobile phone. Whether an apartment for a night, a castle for a week, or a villa for a month, Airbnb connects people to unique travel experiences, at any price point, in more than 33,000 cities and 192 countries.