Site Reliability Engineer (Big Data / Hadoop)
Foursquare, San Francisco, New York
About the Role:
At Foursquare, our production systems run on an innovative hybrid cloud-and-coloc installation. We embrace open source and home-grown tools in the belief that what works best, is best. We're looking for a seasoned site reliability engineer to help us grow, automate, and monitor our footprint, in the datacenter and in the cloud.
The Big Data SRE will focus on the operation and optimization of our large Hadoop cluster (7000+ cores, 4 petabytes of storage, and growing!). You will work closely with the rest of the engineering org to ensure that a stable and scalable platform is available to support our extensive data analytics and machine learning efforts. You will cross-train with the rest of the SRE team to share your Hadoop expertise and to acquire skills relevant to maintaining and scaling the rest of our infrastructure.
You should have a proven track record of writing automation tools, a solid understanding of operating system fundamentals, and familiarity with common production environment services. You should be comfortable running with your own ideas and eager to learn new skills on a bleeding-edge platform. We use a variety of tools, technologies, and languages to build software (e.g., Scala, Hadoop, Python, Thrift, MongoDB, Memcached, Redis, Kafka, Chef, Aurora, Mesos, RocksDB, Luigi, Pants, Nginx, HAProxy, Logstash, Grafana), but experience with equivalent ones will do just fine.
- 5+ years of proven industry experience.
- Strong written and verbal communication skills.
- Solid background using Linux and *nix operating systems.
- Experience with deployment automation tools like Ambari, Chef, Puppet or similar systems.
- Familiarity with a breadth of projects in the Hadoop ecosystem, and expertise in at least a few of them. We primarily use HDFS, YARN, Hive, MapReduce, Cascading, Scalding, Presto, Spark, PySpark, Jupyter, and Zeppelin.
- Familiarity with using and supporting analytics systems like Hive, Redshift, Presto, Athena, Tableau and similar tools.
- Familiarity with performance debugging and tuning at the OS, JVM and cluster (MapReduce, Hive, Spark jobs) levels.
- Bonus points for deploying/operating large-ish Hadoop clusters in AWS/GCP and use of EMR, Terraform, DC/OS, Dataproc.
- Bachelor's degree or higher in Computer Science, Electrical Engineering, or a related field.
About us:
Foursquare is a technology company that enriches consumer experiences and informs business decisions through a deep understanding of location intelligence. Every month, more than 50 million people use the Foursquare City Guide app, Foursquare Swarm check-in app and websites to discover new places, explore the world and check in. Our community of explorers have left 91 million tips and checked in 12 billion times. Foursquare’s Places API powers location data for Apple, Samsung, Microsoft, Twitter, Uber, Airbnb and 100,000 other developers. Foursquare’s business solutions also include Pinpoint, Attribution, Pilgrim SDK and Foursquare Analytics, which empower brands to understand and connect to targeted audiences as well as measure foot traffic and advertising success. Foursquare has over 250 employees based in New York headquarters and offices in San Francisco, Los Angeles, London, and Singapore. Foursquare is proud to be funded by Union Square Ventures, Andreessen Horowitz, DFJ Growth, Morgan Stanley Alternative Investment Partners and more.
U.S. offices: New York City, San Francisco, Chicago, Los Angeles
International offices: London, Singapore