We’re looking for a Technical Lead on the Site Reliability team to design, implement, and deliver software and infrastructure solutions to improve the scalability, availability, and efficiency of Pinterest’s services. The SRE team operates the most fundamental layers of Pinterest’s global infrastructure, which handles billions of requests per month.
What You'll Do:
Influence and create new designs, architectures, standards and methods for large-scale distributed systems with a focus on operability
Collaborate with developers in the deployment and scaling of new product features
Perform deep dives into reliability issues, partnering with software and systems engineers across the organization to produce and roll out fixes
Lead and mentor multiple team members in improving efficiency, performance and availability of Pinterest's services (previous management experience a plus)
What We're Looking For:
Proficiency in scripting, Python preferred. Systems languages (Go, C) are a plus
Strong knowledge of Linux/Unix/BSD internals and shell scripting; production experience with JVM, Python, and Golang runtimes is a plus
Deep knowledge of a configuration management tool (e.g. Puppet, Chef, Ansible, Salt, CFEngine). Experience with containers is a plus
Experience operating in a modern cloud environment (such as AWS, GCP, or Azure) or in large-scale data centers
Familiarity with distributed systems including service discovery, pub/sub, search indexing, storage, and caching. We use ZooKeeper (service discovery), Kafka (pub/sub), Elasticsearch (search indexing), MySQL and HBase (storage), and Memcache (caching).
Pinterest is full of possibilities to design your life. Discover recipes, style inspiration, projects for your home and other ideas to try.