Senior Data Engineer
CORE PROFILE
The Data Engineer is responsible for creating, maintaining, and continuously improving data pipelines. This includes implementing data management best practices, such as data cleaning, validation, and transformation, and ensuring that data is structured into high-quality datasets that downstream teams can consume efficiently.
This role works closely with software engineers, data analysts, data scientists, and data governance teams to understand data behavior within specific domains, clarify business and technical requirements, and translate data use cases into efficient, reliable, and maintainable data pipelines.
The Data Engineer plays a critical role in enabling advanced analytics and AI use cases by ensuring that data pipelines, datasets, and platforms are reliable, scalable, and suitable for machine learning and real-time decision systems.
Within Data Engineering, whether in Central DE or Distributed DE, this role applies and continuously improves best practices in data management, data architecture, and DataOps. The Data Engineer is essential to ensuring data availability, reliability, and usability, enabling teams across the organization to generate insights and deliver value through data.
NATURE OF WORK
The Data Engineer works on existing data pipelines, including developing data models and managing data across data warehouses, data lakes (including Delta Lake), and lakehouse architectures. The role collaborates with upstream teams (e.g., Mesh teams) that provide data into the platform, as well as downstream teams that consume and operationalize data. The role understands existing technology choices and adopts new technologies and practices that comply with Data Engineering standards.
Central DE
- Independently designs and builds new data pipelines within existing architectures
- Owns ingestion and transformation logic end-to-end, ensuring solutions meet both functional and long-term requirements
- Handles both new and legacy pipelines with minimal guidance
- Improves pipeline reliability, performance, and scalability
- Leads projects, not just development tasks
- Designs solutions spanning multiple pipelines or domains, including those supporting analytical and advanced data use cases
- Navigates technical dependencies across teams and systems
- Drives improvements in data integrity, timeliness, and quality within assigned domains
- Mentors junior engineers and provides guidance on implementation, best practices, and technical decision-making
AI & Advanced Analytics Enablement
- Designs and maintains data pipelines that support advanced analytics and AI use cases
- Ensures data quality, consistency, and availability for analytical and data-driven applications
- Collaborates with downstream teams to align data models and pipelines with evolving analytical requirements
AI-Assisted Development Practice
- Uses AI-assisted tools to improve productivity and efficiency in development
- Remains fully accountable for the technical correctness and quality of all outputs
- Reviews, validates, and challenges AI-generated code, recommendations, and design proposals
- Ensures all solutions align with Data Engineering standards, security requirements, maintainability, and long-term reliability
DISPLAYED SKILL MASTERY
Common Skills
- Proficiency in shell scripting (e.g., Bash, Zsh)
- Strong proficiency in data manipulation using SQL and experience with structured and semi-structured data (e.g., JSON, data stored in NoSQL databases)
- Solid working knowledge of cloud platforms (e.g., AWS services such as S3, EC2, Glue, Lambda, and Athena) or equivalent
- Proficiency in Apache Spark using SQL and/or Python for large-scale data processing
- Strong understanding of data storage and processing architectures, including Data Warehouse, Data Lake, Delta Lake, and Lakehouse
- Ability to collaborate effectively with cross-functional teams and contribute to a culture of technical excellence and accountability
- Familiarity with AI-assisted development tools and ability to critically validate generated outputs
Central DE
- Advanced expertise in designing and implementing scalable data ingestion pipelines across batch, change data capture (CDC), and streaming patterns
- Strong design skills across distributed data processing systems, including batch, streaming, and hybrid architectures
- Deep understanding of data modeling, pipeline performance optimization, and cost management in cloud environments
- Strong troubleshooting skills across distributed systems, with the ability to identify and resolve complex data pipeline and infrastructure issues
- Good understanding of data pipeline reliability practices, including monitoring, observability, failure handling, and recovery strategies
- Ability to lead technical design discussions, review solutions, and guide implementation across teams
- Proven ability to mentor engineers and enforce engineering standards and best practices
REQUIRED QUALIFICATIONS
Education
- Bachelor’s degree in Computer Science, Information Technology, Engineering, or a related field
- Master’s degree is a plus but not required
Experience
- 4+ years of experience in data engineering, data platform engineering, or related roles
- Proven experience designing and building scalable data ingestion and data processing pipelines in cloud-based environments
- Strong hands-on experience with distributed data processing frameworks (e.g., Apache Spark) and cloud services (e.g., AWS or equivalent)
- Experience with both batch and near-real-time or streaming data ingestion pipelines
- Experience managing data pipelines across data lakes, data warehouses, or lakehouse architectures
- Demonstrated ability to lead end-to-end delivery of data pipeline and infrastructure solutions
- Experience troubleshooting and resolving complex data pipeline or production issues across distributed systems
- Experience collaborating with upstream data providers (e.g., source systems, platform teams) and downstream consumers (e.g., analytics or data science teams)
Technical Expertise
- Advanced proficiency in SQL and strong programming skills in Python or equivalent
- Strong understanding of data ingestion patterns (batch, CDC, streaming)
- Solid understanding of distributed data architectures (Data Lake, Delta Lake, Lakehouse)
- Experience with performance optimization and cost management in cloud-based data platforms
- Experience with CI/CD practices and infrastructure-as-code tools (e.g., Terraform) is highly preferred
AI & Advanced Analytics Awareness
- Understanding of how data pipelines support downstream advanced analytics and AI use cases
- Ability to design pipelines that provide consistent, reliable, and timely data for analytical workloads
- Familiarity with data quality, completeness, and reliability requirements for downstream model consumption
- Experience using AI-assisted development tools and the ability to critically validate generated outputs
Behavioral & Leadership Competencies
- Ability to lead technical discussions and drive data pipeline design decisions across teams
- Strong problem-solving skills in distributed and data-intensive systems
- Effective communication skills across technical and non-technical stakeholders
- Proven ability to mentor engineers and review technical outputs