About the department
As part of the Cloudflare Engineering organization, SREs are primarily responsible for production reliability. SREs are based in San Francisco, London, Singapore, Austin and Lisbon, and use this global distribution to provide follow-the-sun coverage, which allows work to be focused in business hours in each location.
SREs are supported by all engineering teams at Cloudflare, who participate in on-call schedules for their services. The SRE teams facilitate remediation and follow-up of production issues and mature the tooling that enables all engineering teams to self-service on production. Incident follow-up work across all engineering teams is prioritized above product innovation, and the impact of each production incident influences its priority.
Currently SREs support two main environments: Edge SRE are focused on edge distribution, where most client traffic is served. Core SRE are focused on core services such as the control plane, data pipeline and other supporting services.
Edge SRE project work is organized in four development areas: Platform Engineering, Production Tooling, Hardware Lifecycle and Observability.
Who you are
- You have 5+ years of software engineering, reliability, or operations experience in a customer-focused environment.
- You have 2+ years of experience managing a team of 5 or more engineers on projects in the areas of: distributed systems, tooling, Linux, internetworking, infrastructure security or infrastructure management
- You are comfortable collaborating and co-ordinating on cross-team projects and workflows
- You can provide a strong technical vision for systems and infrastructure teams
- You have experience building services and systems, have successfully taken projects from inception to production, and are comfortable diving in to provide leadership for major projects when needed
- You are capable of leading a discussion with upper management, and are able to tailor the level of technical detail to suit your audience
What you'll do
We are looking for an Engineering Manager to join the Edge SRE team in Austin. You will lead and develop a team of SREs who are responsible for Cloudflare edge production and for building the tools that let all teams understand and interact with it. You will play a lead role in driving our Observability initiatives for edge services and will be tasked with leading engineers who build tools and best practices for engineering teams to debug in production, measure availability and performance indicators, and track and report on thresholds.
- Lead a team of engineers who are working to keep the Cloudflare edge reliable and scalable
- Mentor, grow, and empower your team by giving them the skills, confidence and motivation to make decisions
- Help the individuals on your team to build and execute personal development plans that align with Cloudflare’s goals and objectives
- Take an active role in prioritizing the roadmap for the SRE Org
- Drive cross-team and cross-org alignment in engineering, infrastructure and product teams
- Partner with other Engineering Managers across Cloudflare to achieve reliability outcomes for their services
- Participate in deep technical design discussions within your team, and across partner teams, and ensure that we're building the right systems and keeping the quality high
Examples of desirable skills, knowledge and experience
- Hands-on experience with software or reliability engineering
- Experience leading and hiring a team that builds and runs tools and platforms
- Ability to plan and oversee execution to meet commitments and deliver with predictability
- Observability: Tracking and refining key customer
- Incident root cause analysis and follow-ups
- Incident management
- Comfortable managing teams/projects with deadlines and short release cycles
- Experience using observability tools such as Jaeger, OpenTracing, ELK, Prometheus, Thanos, Grafana, ClickHouse
- Experience running and maturing distributed systems
- Familiarity with proxies, DNS, databases, Internet and security
- Experience developing tools and APIs
What Makes Cloudflare Special?
We’re not just a highly ambitious, large-scale technology company. We’re a highly ambitious, large-scale technology company with a soul. Fundamental to our mission to help build a better Internet is protecting the free and open Internet.
Project Galileo: We equip politically and artistically important organizations and journalists with powerful tools to defend themselves against attacks that would otherwise censor their work. This is technology already used by Cloudflare’s enterprise customers, provided at no cost.
Athenian Project: We created the Athenian Project to ensure that state and local governments have the highest level of protection and reliability for free, so that their constituents have access to election information and voter registration.
Path Forward Partnership: Since 2016, we have partnered with Path Forward, a nonprofit organization, to create 16-week positions for mid-career professionals who want to get back to the workplace after taking time off to care for a child, parent, or loved one.
Sound like something you’d like to be a part of? We’d love to hear from you!
This position may require access to information protected under U.S. export control laws, including the U.S. Export Administration Regulations. Please note that any offer of employment may be conditioned on your authorization to receive software or technology controlled under these U.S. export laws without sponsorship for an export license.
Cloudflare is proud to be an equal opportunity employer. We are committed to providing equal employment opportunity for all people and place great value in both diversity and inclusiveness. All qualified applicants will be considered for employment without regard to their, or any other person's, perceived or actual race, color, religion, sex, gender, gender identity, gender expression, sexual orientation, national origin, ancestry, citizenship, age, physical or mental disability, medical condition, family care status, or any other basis protected by law. We are an AA/Veterans/Disabled Employer.
Cloudflare provides reasonable accommodations to qualified individuals with disabilities. Please tell us if you require a reasonable accommodation to apply for a job. Examples of reasonable accommodations include, but are not limited to, changing the application process, providing documents in an alternate format, using a sign language interpreter, or using specialized equipment. If you require a reasonable accommodation to apply for a job, please contact us via e-mail at [email protected] or via mail at 101 Townsend St. San Francisco, CA 94107.
Cloudflare is the simplest way to make websites faster, safer and smarter. Millions of websites have signed up for our service, including large enterprises, major consumer destinations, and government agencies. With offices in San Francisco and London, Cloudflare operates a highly-available global network that has security measures built into every layer and regularly clocks in lightning-fast speeds.
We're on a mission to build a better web, and we need smart, talented people to join our team. Our team works at the forefront of leading technologies, including nginx and the Go and Lua programming languages. We're a strong supporter of the open source community and regularly share what we learn at https://blog.cloudflare.com.
Want to learn more about Cloudflare? Visit Cloudflare's website.