Description:
We are seeking a highly skilled Data Engineer with complementary Data Science expertise to design, build, and optimize data platforms that power advanced analytics and machine learning solutions. The ideal candidate has deep hands-on experience with Databricks, Python, and AWS SageMaker, and thrives in a fast-paced environment where both engineering excellence and analytical curiosity are valued.
Key Responsibilities
Data Engineering – 75%
- Design, develop, and maintain scalable data pipelines using Databricks (PySpark, Delta Lake, SQL).
- Build and optimize ETL/ELT workflows for structured and unstructured data.
- Develop and manage Delta Lake architectures, ensuring ACID compliance and high data quality.
- Integrate data from various sources (APIs, databases, streaming platforms, cloud storage).
- Implement data quality frameworks, monitoring, logging, and alerting.
- Optimize Databricks clusters, workflows, and job performance.
- Collaborate with cloud platform teams on AWS architecture, security, and cost optimization.
- Ensure proper data governance, documentation, and metadata standards.
Data Science – 25%
- Develop, train, and evaluate ML models in Python using libraries such as scikit-learn and pandas (PyTorch or TensorFlow experience is optional).
- Operationalize and deploy machine learning models using SageMaker (training jobs, endpoints, pipelines).
- Perform exploratory data analysis (EDA), feature engineering, and statistical analysis.
- Collaborate with business stakeholders to understand analytical requirements and communicate insights.
- Partner with Data Engineering teams to productionize ML features and pipelines.
Required Skills & Experience
- 5+ years of experience as a Data Engineer or Data Scientist.
- Strong proficiency in Python for data processing and ML.
- Hands-on experience with Databricks (PySpark, SQL, Delta Lake, MLflow).
- Experience building and deploying ML models using AWS SageMaker.
- Solid understanding of data modeling, warehouse/lakehouse architectures, and cloud-native patterns.
- Proficiency with ETL/ELT development, CI/CD, and version control (Git).
- Strong problem-solving mindset and ability to work cross-functionally.