Uncubed

Site Reliability Engineer - Machine Learning Systems

Quora, Mountain View, CA

Q&A platform that empowers people to share and grow the world’s knowledge


Our SRE for Machine Learning Systems role at Quora is responsible for building and maintaining our serving infrastructure, optimizing and automating work, developing tools and tests to ensure safe deployment of code and data, and supporting and advocating for code quality, documentation, maintainability, and other engineering best practices. The SRE for ML Systems focuses on maintaining trained machine learning models, training pipelines, and online systems for data prediction.

Quora is a rapidly growing company with an intense focus on solving challenging technical problems. We believe in fostering a culture with strong engineering values and goals as the key to building a great company and product. Our engineers are responsible for all the elements of the system lifecycle, from design to implementation to ongoing support, which helps create a strong emphasis on high code quality and maintainability.

Machine Learning Systems at Quora are particularly critical since they power some of the most important parts of our infrastructure and support our goals of providing the best personalized experience to every user. Helping them work smoothly is extremely important for the company.

Responsibilities

  • Maintain and support live services and background pipelines
  • Improve the reliability and efficiency of Quora's core Machine Learning Systems
  • Participate in the design and implementation of next generation systems
  • Work with software engineers to ensure high maintainability and reliability of software
  • Build tools to improve reliability and automate tasks
  • Ensure the security of our systems
  • Be proactive in preventing problems and eliminating recurring issues

Requirements

  • Minimum of 3 years of SRE experience
  • Overall software engineering experience
  • Excellent C++/Python skills
  • Strong experience with large-scale distributed systems
  • Strong experience with complex interdependent data pipelines
  • Strong ability to effectively debug problems and optimize code
  • Ability to provide thought leadership/advocacy on best practices to ensure high reliability
  • Experience developing and tracking resource usage and other metrics
  • Experience with AWS or other public cloud technologies
  • BS/MS in Computer Science (or equivalent)
  • Passion for Quora and its mission

We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.

About Quora

We want to democratize access to knowledge of all kinds—from politics to painting, cooking to coding, etymology to experiences—so if someone out there knows something, anyone else can learn it. Our mission is to share and grow the world's knowledge, and we're building a world-class team to help us achieve this mission. 

Want to learn more about Quora? Visit https://www.quora.com/