- Architect, build, test, and deploy distributed, scalable, and resilient Spark/Scala/Kafka Big Data processing and machine learning model pipelines for batch, micro-batch, and streaming workloads, loading results into Cerebri AI’s proprietary data stores for use in machine learning modeling
- Develop and maintain data ontologies for key market segments
- Collaborate with data scientists to develop automated orchestration of model pipelines to solve Cerebri AI business use case objectives
- Collaborate with clients to develop pipeline infrastructure and to ask the right questions to gain a deep understanding of client data
- Deploy fully containerized (Docker/Kubernetes) data processing and machine learning model pipelines into Azure, AWS, and GCP cloud environments, and into on-premise systems as necessary
- Document detailed designs (including source-to-target mappings) and code for data quality frameworks that measure and maintain data completeness, data integrity, and data validity between interfacing systems
- Ensure all solutions comply with the highest levels of security, privacy, and data governance requirements as outlined by Cerebri AI and client legal and information security guidelines, law enforcement requirements, and privacy legislation, including data anonymization, encryption, and security in transit and at rest
- Train and mentor junior team members
- Act as a subject matter expert and thought leader: continuously follow industry trends and the latest competitive developments, and deliver papers and presentations at major industry conferences and events
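The batch, micro-batch, and streaming pipeline work described above could be sketched, under hypothetical topic, broker, and path names, as a minimal Spark Structured Streaming job that reads customer events from Kafka and writes windowed micro-batch aggregates to a Parquet sink:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object EventPipeline {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("customer-event-pipeline") // hypothetical app name
      .getOrCreate()
    import spark.implicits._

    // Read a stream of raw customer events from Kafka
    // ("customer-events" and "broker:9092" are hypothetical)
    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")
      .option("subscribe", "customer-events")
      .load()

    // Kafka delivers keys and values as binary; cast the value to a string payload
    val events = raw.selectExpr("CAST(value AS STRING) AS payload", "timestamp")

    // Micro-batch aggregation: event counts per 5-minute window,
    // with a watermark so late data is bounded
    val counts = events
      .withWatermark("timestamp", "10 minutes")
      .groupBy(window($"timestamp", "5 minutes"))
      .count()

    // Write append-only aggregates to a Parquet sink (paths are hypothetical)
    counts.writeStream
      .format("parquet")
      .option("path", "/data/event-counts")
      .option("checkpointLocation", "/data/checkpoints/event-counts")
      .outputMode("append")
      .start()
      .awaitTermination()
  }
}
```

This is a sketch only: it assumes a running Spark cluster with the Kafka connector on the classpath, and real pipelines would add schema enforcement, error handling, and the security controls noted above.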
- A degree in Computer Science, Engineering, AI, Machine Learning, BI, MIS, or an equivalent technology field
- Minimum 2 years of production programming experience with Scala, Spark, PySpark, and Python on Big Data workloads
- Minimum 2 years of production experience with the Hadoop Big Data platform
- Able to understand data science and data engineering ideas expressed in Python and translate them into modular, functional Scala components
- Streaming and micro-batch application development experience would be an asset, including Kafka, Storm, NiFi, Spark Streaming, Confluent, or equivalent
- Proficiency with Linux/Unix operating systems, utilities and tools
- Experience working directly with relational database structures and flat files
- Ability to write efficient database queries, functions, and views, including complex joins, and to identify and develop custom indexes
- Knowledge of professional software engineering practices and best practices for the full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, continuous integration and delivery (CI/CD), and operations
- Experience deploying containerized Docker/Kubernetes applications
- Experience with Microsoft Azure or similar cloud computing solutions
- Big Data application architecture experience and in-depth understanding of the Big Data ecosystem, applications, services, and design patterns
- Production systems integration experience
- Good verbal and written communication skills, with both technical and non-technical stakeholders
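As an illustration of the Python-to-Scala translation skill listed above, a Python/pandas-style step such as "group transactions by customer and sum spend" might be rewritten as a small, pure Scala function; the type and function names here are hypothetical:

```scala
// Hypothetical example: translating a pandas-style
// df.groupby("customer_id")["amount"].sum() into modular Scala.

final case class Transaction(customerId: String, amount: Double)

object SpendAggregation {
  // Pure function: total spend per customer.
  def totalSpend(txns: Seq[Transaction]): Map[String, Double] =
    txns.groupBy(_.customerId)
        .map { case (id, ts) => id -> ts.map(_.amount).sum }
}

object Demo extends App {
  val txns = Seq(
    Transaction("a", 10.0),
    Transaction("a", 5.0),
    Transaction("b", 2.5)
  )
  val totals = SpendAggregation.totalSpend(txns)
  assert(totals("a") == 15.0)
  assert(totals("b") == 2.5)
}
```

Keeping the logic as a pure function over immutable data makes it easy to unit-test and to lift unchanged into a Spark `Dataset[Transaction]` aggregation later.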
Nice to Haves
- Experience in business intelligence visualization tools such as Grafana, Superset, Redash or Tableau.
- Master’s degree or higher in a relevant quantitative subject
- Experience with the Atlassian suite (JIRA, Confluence, BitBucket).
- Any other related experience with Big Data, artificial intelligence, natural language processing, machine learning and/or deep learning, or predictive analytics
- Familiarity with automated machine learning (AutoML) concepts would be an asset
- Experience with Breeze would be an asset
About Cerebri AI
Cerebri AI provides AI and machine learning solutions that help enterprises grow top-line revenue by giving them a 1:1 relationship with their customers. We do this by processing internal and external customer data and determining the dollar value a customer places on a vendor, its products, assets, etc. We also monetize a critical variable in any revenue situation, the customer’s ability to pay, so that opportunities such as up-selling can be clearly scoped and delivered. We call the results Customer Value Indexes (CVIs) for brands, vendors, assets, and financing.
Want to learn more about Cerebri AI? Visit Cerebri AI's website.