Data Engineer
The Data Engineer is responsible for the creation, maintenance, and continuous improvement of data pipelines. Part of his/her responsibilities is to implement best practices in data management (i.e., cleaning, validating, and transforming data) and to turn raw data into usable datasets that other teams can easily consume.
This role will also work closely with software engineers, data analysts, data scientists, and data governance teams to understand how the data behaves in its respective domain, to clarify business and technical requirements for different data use cases, and to design and build efficient and reliable data pipelines.
Within Data Engineering, this person will learn and adopt best practices in data management, data architecture design, and DataOps principles. Whether in Central DE or Distributed DE, a Data Engineer is crucial in creating value for downstream teams that use data.
NATURE OF WORK
The Data Engineer works on existing data pipelines, including the development of data models and data management, whether in a data warehouse, data/delta lake, or lakehouse. He/She collaborates with upstream teams (e.g., Mesh Teams) that feed data into the data architecture and with downstream teams that consume data from it. He/She understands the existing technology choices and consistently complies with the DE Standard.
Central DE: A Data Engineer in Central DE helps maintain the overall data architecture, ensuring its scalability, high availability, and on-time data ingestion, and keeping operations free from disruption. He/She will build data pipelines as new data comes in, applying best practices and DataOps principles.
Distributed DE: A Data Engineer in Distributed DE acquires and maintains in-depth domain knowledge of the data within the assigned scope. This domain knowledge is crucial in creating and developing data models for Zone 2 and Zone 3 data (a.k.a. the silver and gold layers, respectively). This expertise ensures that DE-transformed business datasets are usable by downstream teams.
Qualifications
- Good working knowledge of shell scripting (e.g., bash, zsh)
- Good working knowledge of data manipulation (SQL statements, JSON, NoSQL queries, etc.)
- Good working knowledge of AWS services (EC2, S3, Glue Crawlers and Jobs, Batch, Athena, Lambda, etc.); knowledge of equivalent offerings from other cloud providers is a big plus
- Good working knowledge of Apache Spark using SQL/Python
- Good understanding of data warehouse, data lake/delta lake, and/or lakehouse concepts
- Ability to work with other Leads to foster a culture of collaboration and teamwork
Required Skills
Central DE:
- Good knowledge of Linux/Unix administration
- CI/CD experience using Terraform is a big plus
Distributed DE:
- Good working knowledge of data modeling