Senior Site Reliability Engineer
Zuora, Beijing Shi, China
Unifying order-to-cash for a dynamic subscription world
OUR VISION: THE WORLD. SUBSCRIBED.
Customers have changed. They’re looking for new ways to engage with businesses. Consumers today have a new set of expectations. They want outcomes, not ownership. Customization, not generalization. Constant improvement, not planned obsolescence.
In the old world (let’s call it the Product Economy) it was all about things. Acquiring new customers, shipping commodities, billing for one-time transactions. But in today’s new era, it’s all about relationships. More and more customers are becoming subscribers because subscription experiences built around services meet consumers’ needs better than static offerings or a single product.
Our vision is “The World Subscribed,” where one day every company will be a part of the Subscription Economy® (a phrase coined by our CEO, Tien Tzuo, author of the best-selling book Subscribed).
As consumers wave goodbye to ownership, join us as we help companies win on their journey to usership!
Senior/Principal Site Reliability Engineer (SRE)
Site Reliability Engineers at Zuora play a critical and visible role in delivering and supporting our platform. We are responsible for scaling and optimizing the reliability, availability, and performance of our infrastructure and platform services, and for partnering with Engineering teams to build highly available and performant services. We work with amazing developer teams on the design, provisioning, integration, configuration, monitoring, and incident response of large-scale distributed applications and platform services. We deliver kickass SaaS.
As a Senior/Principal SRE, you will be a member of a team that understands the configuration, technical dependencies, and overall behavioral characteristics of production services. In partnership with developers, you have the responsibility to ensure services are designed and delivered with focus on security, resiliency, scale, and performance. SREs are the ultimate authority and are accountable for end-to-end performance and operability of the services they own.
What you'll do
Champion service reliability and prevention
- You will be part of the team whose mission is the shared ownership of a collection of services and technology areas, in partnership with developer teams.
- Service restoration: You are a key escalation point for complex or critical issues that have not yet been documented as Standard Operating Procedures (SOPs) for L1 staff. You will often be called in during major incidents as a Subject Matter Expert (SME) when the source of a problem is unclear. You will have the deep understanding of service topology and dependencies required to troubleshoot issues and define mitigations. You will help maintain up-to-date documentation on deployments, processes, and SOP runbooks.
- Prevention: Once you have expertly resolved an issue, you will immediately work on how to resolve it more quickly next time, with the goal of preventing the problem from recurring. You will drive the discovery and implementation of automated and self-healing solutions.
Service design and implementation
- You will partner with development Scrum teams in defining and implementing improvements to service architecture, both current and future. You will be an expert at articulating the technical characteristics of services and their dependencies, and at guiding development teams to engineer highly reliable and performant services.
- You will frequently partner with developer Scrum teams and actively participate in the execution of tasks required to meet milestones and deliverables set by the team throughout a release cycle.
- You will own the reliability and performance of one or more services. You will understand and be able to communicate the capacity, scale, security, and performance attributes and requirements of the services you own. You are an SME, able to understand and communicate the characteristics of your service stack, such as:
  - Degradation and behavior under load of the services and their dependencies
  - End-to-end tuning needs, optimizing resource utilization, as load patterns fluctuate
  - Instrumentation and metrics that clearly describe the service behaviors
  - Scaling requirements and patterns
  - Resiliency and recoverability, ensuring that backup / restore and disaster recovery capabilities are implemented, tested and maintained
You will take part in a shared on-call rotation that won’t cripple your life or kill your soul.
What you need to have
SREs are a rare mix of sysadmins and development engineers, and as such you have the ability to understand and explain the effect of product architecture decisions on how well the product runs as a distributed system. You are driven by professional curiosity and a desire to develop a deep understanding of the services and the technologies they depend upon.
You are proactive, self-motivated, customer-focused, organized, and a good communicator.
You demonstrate competence in shell scripting (such as Bash) and in high-level programming languages such as Python. We use Python extensively.
You have over 4 years of experience running large-scale, customer-facing web services with a solid understanding of:
- REST APIs
- Linux/Unix system internals
- Load balancing technologies, including L7 routing, DNS, and CDN
- Networking and TCP/IP
- Off-the-shelf observability (monitoring, metrics, alerting, tracing) solutions (Wavefront, LogicMonitor, Pingdom, Datadog) or open source ones (Prometheus)
- Standard Internet services, such as DNS, HTTP, etc.
- Cloud computing patterns
- Configuration management using Puppet, Chef, Ansible, or similar
- IT Security and compliance
- Container-based orchestration platforms such as Kubernetes/EKS/AKS and ECS at scale
- CI/CD pipelines using tools such as Git, Jenkins, Spinnaker, Terraform and Ansible
You demonstrate practical knowledge of various aspects of distributed service design, including messaging protocols, caching strategies, persistence technologies, and queuing.
You have experience with AWS services such as EC2, ELB, ElastiCache, DynamoDB, SQS, SNS, RDS, and S3.
You are passionate about automation.
Your head is full of customer-delighting ideas for the next hackathon.
An ideal candidate will also have experience with:
- Container and Container Management technologies, such as Docker and Kubernetes
- Databases and big data stores
- Defining and documenting technical architecture of complex and highly scalable products
- ITIL-based incident, problem, and change management
- Working with large global teams and coordinating well within and across development teams
ABOUT ZUORA & OUR “ZEO” CULTURE
Zuora (NYSE: ZUO) provides the leading cloud-based subscription management platform that functions as a system of record for subscription businesses across all industries. Powering the Subscription Economy®, the Zuora platform was architected specifically for dynamic, recurring subscription business models and acts as an intelligent subscription management hub that automates and orchestrates the entire subscription order-to-revenue process seamlessly across billing and revenue recognition. Zuora serves more than 1,000 companies around the world, including Box, Ford, Penske Media Corporation, Schneider Electric, Siemens, Xplornet, and Zoom.
At Zuora, we have one CEO but every employee is empowered and supported to be the ‘ZEO’ of their own career experience. By embedding inclusion and belonging into our processes, policies and culture, we are building a workplace where our 1,200+ ZEOs across North America, Europe, and APAC can bring all the elements of who they are into their work. In addition to an industry-leading six-month, 100% paid parental leave for all our ZEOs, we also offer programs to support your mental health and give back to our communities along with “career cash” and plenty of learning and development opportunities.
To learn more visit www.zuora.com
Zuora is proud to be an Equal Employment Opportunity Employer.
Think, be and do you! At Zuora, different perspectives, experiences and contributions matter. Everyone counts. Zuora is proud to be an Equal Opportunity Employer committed to creating an inclusive environment for all.
Zuora does not discriminate on the basis of, and considers individuals seeking employment with Zuora without regard to, race, religion, color, national origin, sex (including pregnancy, childbirth, reproductive health decisions, or related medical conditions), sexual orientation, gender identity, gender expression, age, status as a protected veteran, status as an individual with a disability, genetic information, political views or activity, or other applicable legally protected characteristics.
We encourage candidates from all backgrounds to apply. Applicants in need of special assistance or accommodation during the interview process or in accessing our website may contact us by sending an email to assistance(at)zuora.com.
Zuora is a SaaS company and the world’s foremost evangelist of the Subscription Economy. Zuora’s leading subscription relationship management platform helps enable businesses in any industry to launch or shift products to subscription, implement new pay-as-you-go pricing and packaging models, gain new insights into subscriber behavior, open new revenue streams, and disrupt market segments to gain competitive advantage.