Data Engineer III

Box, Redwood City, CA

A Cloud Content Platform for the Digital Age


Box's Data Engineering team is going through a rapid transformation as we continue to scale our business. Hadoop on S3, Redshift, Spark, Kafka, and other data-related technologies are all at the heart of Box's ability to store and retrieve critical customer data with low latency. The Data Engineering team is responsible for continually pushing the boundaries of our data evolution by developing tools and frameworks to create data products and improve availability and quality, and by collaborating with engineers across various teams to onboard and support new products.

In this role, you will discover opportunities to create data products, deliver quality data to empower our analytics community, provide metadata to facilitate data usage, and automate manual processes to allow self-service. As data volume continues to grow, you'll develop robust products and pipelines and improve the overall throughput of our data platforms. Additionally, you should push to become a better developer (e.g., through independent reading, attending tech talks, and taking classes) and help the team improve.

Why the team needs you

The Data Engineering team is made up of talented and motivated engineers who continually push the boundaries for our team and the organization as a whole. We will be looking for your unrivaled experience and insights in developing with open source, data-related technologies. Your background in software development and data technologies, along with your continued passion to grow and challenge others on the team, will be paramount to finding the best possible solutions to the challenges we set out to tackle as a team.

Why Box needs you

Box is growing fast. Real fast. Every business in the world is looking to modernize the way that they work. As the leader in cloud content management, Box is the only company that can help enterprises transform how people work together. Come help us define a robust way to build, operate, and scale Box's data platforms that power this industry-leading mission!

Why you need Box

You're going to have the unique opportunity to help scale out a rapidly growing data platform built in and for the cloud to power our data-driven culture. Like others in this position, you'll be challenged to push existing configurations beyond their current limits through performance testing and tuning. You'll also get opportunities to build software with a test-driven development mindset that aims to simplify routine work or resolve larger technical problems. Through cooperation with engineers on other teams in the engineering organization, you'll also enable the development of new products that have a lasting and direct impact on the entire business.


  • Model and create data sets that meet our business requirements.
  • Replace existing manual processes with automation and enable self-service data consumption.
  • Develop data infrastructure that can power ETL from structured and unstructured data sources.
  • Create data tools that help our analytics and data science organizations provide insights about our product and recommendations for constant innovation.
  • Develop data pipelines that empower stakeholders to create actionable insights into customer acquisition, operational efficiency, and other key business performance metrics.
  • Support stakeholders with data-related issues.
  • Enforce our data classification policies to secure customer data across national boundaries and cloud regions as needed.
  • Collaborate with data experts to continuously improve our data systems.
  • Create and maintain scalable data platforms and data products that can meet our SLAs.


  • 4+ years of experience in software engineering.
  • 2+ years of working experience as a Data Engineer in a high data volume environment.
  • Good programmer - able to write modular, maintainable code with guidance.
  • Strong technical skills, allowing you to work on small projects independently, and medium to large projects with supervision.
  • Quality Engineering mindset.
  • Strong experience with SQL and working experience with relational databases.
  • 2+ years of experience developing “big data” pipelines, architecture patterns, and data set models.
  • 2+ years of experience performing root cause analysis on “big data” issues in production and identifying opportunities for improvement.
  • Analytic skills related to working with unstructured datasets.
  • 2+ years of experience building ETL, data structures, metadata, dependency, and workload management. Success stories are important.
  • Working experience with “big data” around message queuing, stream processing, batch processing and scalable data stores.
  • Bachelor's degree in Computer Science or another quantitative field.
  • Must have experience with the following technologies: Hadoop, Spark, Kafka, a data warehouse database, and Python or Java.
  • Experience with Redshift, Scala, RDS, Cassandra, pipeline workflows (Apache Airflow), Spark Streaming, Apache Flink, EC2, EMR, CI/CD, or test automation frameworks is a plus. Being an active contributor to an open source project in the data space is a big plus.

About Box

Box is an enterprise content management platform that solves simple and complex challenges, from sharing and accessing files on mobile devices to sophisticated business processes like data governance and retention.

Since 2005, Box has made it easier for people to securely share ideas, collaborate and get work done faster. Today, more than 41 million users and 74,000 businesses—including 59% of the Fortune 500—trust Box to manage content in the cloud.

Want to learn more about Box? Visit Box's website.