Sr. Software Engineer, Observability

Slack, Remote, Canada

Slack's cloud-based collaboration tools and services are used worldwide.

The Monitoring Infrastructure team in the Observability & Performance group at Slack develops platforms and tools that produce telemetry, provide insights, and improve observability in Slack production services, with a focus on performance and reliability. We develop management tools for distributed applications and infrastructure, maintain datasets for performance and system analytics, and build interfaces and backend systems to answer questions and infer behavioral patterns about our users and systems. Our toolset is varied: we work with open-source observability and monitoring technologies like Elasticsearch and Prometheus and with cloud providers such as AWS, and we develop software using a combination of Go, Python, and Java.

As part of the Monitoring Infrastructure team, you will focus on our log pipelines and work closely with other teams in engineering, product development, and customer experience to provide insights that drive decisions and ensure a positive experience for our Slack customers. You will also help build and maintain distributed services that process millions of data points per second, self-heal, and scale up or down to meet demand. We are an inclusive team with deep empathy for our colleagues and customers.

You can see the team at work here at Monitorama 2018.

Slack has a positive, diverse, and supportive culture. We look for people who are curious and inventive, and who work to be a little better every single day. In our work together we aim to be smart, humble, hardworking, and, above all, collaborative. If this sounds like a good fit for you, why not say hello?

About the Role

This is a remote senior engineering position based in the United States or Canada.

What you will be doing

  • Build, maintain, and ensure timely delivery of our high-volume event log pipelines.
  • Create libraries, tools, and automation to help ensure that critical event data gets to the right place.
  • Encourage a culture of observability at Slack: help identify problem areas and consult on improving visibility into our systems.
  • Prototype tooling interfaces or build new features for engineering use cases.
  • Improve auto-remediation in our logging infrastructure to avoid recurring failures.
  • Teach engineers and customer experience agents how to use our tools to introspect their systems.
  • Participate in the Monitoring Infrastructure on-call rotation, triaging and addressing production issues as they arise.

What you should have

  • You are a strong communicator. Explaining complex technical concepts to designers, support, and other engineers is no problem for you.
  • You enjoy helping onboard new team members, mentoring, and teaching others.
  • You live for unit tests, code review, design documentation, debugging, and solving problems.
  • You have a deep curiosity about how things work under the hood.
  • You are motivated by helping others succeed. When things break — and they will — you are eager and able to help fix things. You like thinking of ways to improve efficiency or bring delight to your coworkers.
  • You care about improving the performance of systems through data-informed decisions.
  • You know that the internet is a scary place. You understand security concepts deeply and can put them into action to protect us and our users.



  • Firm grasp of computer science fundamentals: data structures, algorithms, programming languages, distributed systems, and information retrieval.
  • Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent training, fellowship, or work experience.


  • Experience with functional or imperative programming languages, e.g., PHP, Python, Go, C, or Java (used without frameworks).
  • Experience with creating interfaces, tooling, or automation to help define a path for engineers to self-service.
  • Experience deploying, operating, and debugging server software on Linux at scale.

Bonus Points:

  • Passionate about data visualization, graphing, and maximizing signal versus noise.
  • Experience with Elasticsearch, Logstash, and Kibana.
  • Solid competency with Prometheus, OpenTracing, or any other widely used monitoring/visibility platform.
  • Prior experience with or knowledge of large-scale, high-volume distributed systems, distributed databases, and data pipelines.
  • Experience with containerization frameworks such as Kubernetes.
  • Experience using deployment automation/configuration management, especially Terraform or Chef.
  • Experience with AWS and other virtualized environments.
  • Experience with message queue services, such as Kafka.

About Slack

Empathy. Courtesy. Playfulness. Craftsmanship. Solidarity — these are some of the values we live by, as a company. We work by them, too: we’re building a platform and products we believe in — knowing there is real value to be gained from helping people, wherever they are, simplify whatever it is that they do and bring more of themselves to their work.

We’re building a strong, diverse team of curious, creative people who want to find a purpose in their work and support each other in the process. We work hard and we play to win… within normal business hours. And then we go home.

That balance is important: It enables us to truly do the best work of our lives. As a result, we create a place where all kinds of work happens — and happens well — all while working alongside people we respect and admire.

Want to learn more about Slack? Visit Slack's website.