Software Engineer, Systems Observability

Airbnb, San Francisco, California

Who wouldn't want to work here?

As Airbnb's infrastructure continues to move from a monolithic stack to a highly concurrent, distributed one, gaining visibility into the health and performance of our systems becomes harder in proportion to the complexity of that infrastructure. Basic metric monitoring tools can no longer capture the full depth of the increasingly unpredictable and complex interactions between services. The maturity of the monitoring and introspection tools available to our engineers will determine how well we can anticipate performance bottlenecks, identify anomalous interactions between systems, and diagnose the root cause of incidents. That maturity therefore underpins the productivity of our entire engineering team.

The Observability team’s mission is to build the observability tools Airbnb engineers need to succeed in a highly distributed, modern architecture. We are building a unified platform for instrumenting, processing, storing, and presenting the state of our systems as metrics, traces, profiles, or call graphs. Our engineers should be able to switch seamlessly between opinionated aggregate views, which help identify N+1 queries or performance regressions, and detailed trace or profile views for closer root cause analysis. Using stream processors, we already correlate exceptions with deploys to trigger automated rollbacks, and in the future we hope to surface correlated anomalous metrics to our engineers more generally.

We formed the Observability team in early 2017 to build observability infrastructure that matches our scale, and we are already processing many billions of data points per day. We rely heavily on open source technologies and standards but are not shy about researching new tracing architectures or stream processing techniques. The team works on both the backend collection of data and the custom interfaces and tools that reveal deeper relationships within it. Beyond building a monitoring system more robust than the production system it monitors, we must also work closely with other infrastructure and product teams to anticipate the modern technologies being adopted across our engineering organization, such as GraphQL, React Native, HTML streaming, and single-page apps. Each of these poses new observability challenges that call for its own instrumentation and data cubes.

We are looking for new teammates who have 2+ years of industry experience in, or a strong interest in, any of the following:

  • Elastic Stack (Elasticsearch, Logstash, Kibana)
  • Stream processing (Flink)
  • Tracing (OpenTracing, LTTng, Chrome DevTools, Zipkin)
  • Profiling (ruby-prof, perf)
  • High-performance, column-oriented, distributed data store (Druid)
  • Event relay (Kafka)
  • Automated correlation and anomaly detection
  • Data visualization (dynamic dashboards, call graphs, flame graphs)
  • Site reliability engineering
  • Site performance tracking and management
  • Building robust distributed systems that must fail independently of our production system
  • Building high-leverage tools for engineers where engineers are our customers

Benefits

  • Stock
  • Competitive salaries
  • Quarterly employee travel coupon
  • Paid time off
  • Medical, dental, & vision insurance
  • Life insurance and disability benefits
  • Fitness discounts
  • 401(k)
  • Flexible spending accounts
  • Apple equipment
  • Commuter subsidies
  • Community involvement (4 hours per month to give back to the community)
  • Company-sponsored tech talks and happy hours
  • Much more...

About Airbnb

Founded in August of 2008 and based in San Francisco, California, Airbnb is a trusted community marketplace for people to list, discover, and book unique accommodations around the world — online or from a mobile phone. Whether an apartment for a night, a castle for a week, or a villa for a month, Airbnb connects people to unique travel experiences, at any price point, in more than 33,000 cities and 192 countries. 

Want to learn more about Airbnb? Visit https://www.airbnb.com/