- Collaborate with the Executive Team, Product Management, Architects, and existing engineering teams to design, develop, and publish software, processes, and workflows supporting a highly available, fault-tolerant SaaS platform.
- Maintain and actively harden infrastructure shared among multiple development teams.
- Scale out and manage 2000+ hosts across multiple AWS services using Chef and Terraform.
- Participate in service-level monitoring, metrics gathering, and an on-call rotation using Datadog, Sensu, and PagerDuty.
- Work across the company to identify and implement new ideas and mature existing processes.
- Actively solve problems using modern open source technologies and techniques.
- One or more years of experience running Linux-based systems in a customer-facing production environment.
- Demonstrable experience with CI/CD tools (Jenkins preferred).
- Experience with a systems management framework (Chef preferred).
- Experience with an infrastructure management framework (Terraform preferred).
- The ability to debug complicated issues with others in a group setting.
- Comfort learning new tools and technologies to serve new purposes.
- Excellent verbal and written communication skills.
Bonus Points For:
We're already intrigued, but would love experience with:
- Experience working with Kubernetes, Mesos, ECS, or a similar container orchestration tool.
- Administration of technologies like MySQL, Redis, Elasticsearch, or Resque.
- Monitoring and tuning a production system based on event-driven job scheduling.
- Sizing hardware and measuring the resulting performance.
About CloudHealth Technologies
CloudHealth, the recognized worldwide leader in the growing Cloud Service Management industry, provides integrated reporting, recommendations and active policy management to help companies control the problems associated with “cloud chaos.” Our comprehensive platform gives enterprise companies and MSPs the ability to visualize, optimize and govern their cloud and hybrid environments. By providing analysis and deep insight into historical trends, capacity planning, resource optimization and resource automation, CloudHealth enables stakeholders ranging from C-level executives to engineers, cloud specialists, architects, IT directors and LOB managers to improve performance and drive value through their cloud ecosystems. Well-known organizations that rely on CloudHealth’s capabilities and expertise include Amtrak, Dow Jones, Acquia, and Sumo Logic, among others. Based in Boston, the company is backed by Kleiner Perkins, Meritech Capital Partners, Sapphire Ventures, Scale Venture Partners, .406 Ventures and Sigma Prime Ventures.