Data Engineer
The Data Engineer is responsible for the creation, maintenance, and continuous improvement of data pipelines. These responsibilities include implementing data management best practices (e.g., cleaning, validating, and transforming data) to turn raw data into usable datasets that other teams can easily consume.
This role will also work closely with software engineers, data analysts, data scientists, and data governance teams to understand how data behaves in its respective domain, to clarify business and technical requirements for different data use cases, and to design and build efficient, reliable data pipelines.
Within Data Engineering, this person will learn and adopt best practices in data management, data architecture design, and DataOps principles. Whether in Central DE or Distributed DE, a Data Engineer is crucial to creating value for the downstream teams that use data.
Key Responsibilities:
- Develop, maintain, and optimize data pipelines, data models, and data management solutions across data warehouses, data/delta lakes, or lakehouse environments.
- Collaborate with upstream teams (e.g., Mesh Teams) to integrate data sources and with downstream teams to ensure data usability and accessibility.
- Understand and adhere to existing technology standards and Data Engineering (DE) best practices.
Central DE:
- Maintain and enhance the overall data architecture, ensuring scalability, high availability, and timely data ingestion.
- Build and optimize data pipelines for new data sources, applying DataOps principles to ensure seamless operations and minimal disruptions.
Distributed DE:
- Acquire and maintain deep domain knowledge of assigned data areas to inform data modeling and pipeline development.
- Design and develop data models for Zone 2 (silver layer) and Zone 3 (gold layer), ensuring business datasets are accurate, reliable, and ready for downstream consumption.
Qualifications:
- Good working knowledge of shell scripting (e.g., bash, zsh)
- Good working knowledge of data manipulation (SQL statements, JSON, NoSQL queries, etc.)
- Good working knowledge of AWS services (EC2, S3, Glue Crawlers, Glue Jobs, Batch, Athena, Lambda, etc.) or equivalent cloud offerings is a big plus
- Good working knowledge of Apache Spark using SQL/Python
- Good understanding of Data Warehouse, Data Lake/Delta Lake, and/or Lakehouse concepts
- Ability to work with other Leads to foster a culture of collaboration and teamwork
Central DE:
- Good knowledge of Linux/Unix administration
- CI/CD experience using Terraform is a big plus
Distributed DE:
- Good working knowledge of data modeling