Senior Site Reliability Engineer
CORE PROFILE
As a Senior Site Reliability Engineer, you will lead a team that keeps some of the critical API gateways and backend services powering the Maya App running reliably for millions of users in the Philippines and beyond. Your work will improve features that people rely on every day.
Maya operates at large scale, so we need someone with a sharp eye for detail. You will build the infrastructure for new services, analyze and optimize existing infrastructure, improve reliability, reduce cost, and work with software engineers to maintain high quality. This is a chance to make a real impact in a fast-moving environment.
NATURE OF WORK
- Work with product owners, developers, and other SREs to understand requirements and deliver projects.
- Build and maintain highly available, reliable, robust, scalable, and cost-efficient infrastructure.
- Automate deployments, monitoring, and system management to reduce manual work and improve delivery speed and operational efficiency.
- Develop reusable infrastructure templates to simplify and standardize resource provisioning.
- Manage and optimize budget while balancing cost and performance.
- Ensure compliance and security by adhering to industry standards and frameworks such as PCI-DSS and BSP regulations.
- Lead incident response, troubleshoot issues, and conduct root cause analysis.
- Stay up to date on SRE and DevOps best practices and new technologies.
- Support and mentor team members.
REQUIRED QUALIFICATIONS
- 5+ years of experience in Site Reliability Engineering and working in a DevOps culture.
- AWS certification at the Associate level or higher.
- Strong expertise in Kubernetes (EKS) and container orchestration.
- Hands-on experience with Infrastructure-as-Code (IaC) using Terraform.
- Experience with CI/CD pipeline management using GitLab CI or similar tools.
- Experience with monitoring, logging, and telemetry tools such as Splunk, AWS CloudWatch, or Dynatrace, and knowledge of how to use them effectively.
- Strong knowledge of WAF rules and policies and of security configurations.
- Experience with service mesh technologies (Istio, Envoy, AWS App Mesh).
- Experience with deployment strategies (Blue/Green, Canary, Rolling).
- Hands-on networking experience covering VPCs, VLANs, peering, and routing.
- Knowledge of operating, scaling, and optimizing relational database systems (such as PostgreSQL), NoSQL database systems (such as DynamoDB or MongoDB), and key-value stores (such as Redis).
- Proficient in shell scripting and/or Python.
- Familiarity with Java, Node.js, or other programming languages.
- Strong troubleshooting and problem-solving skills, especially under pressure.
- Effective communication and teamwork in an Agile environment (we use Scrum).
- Nice to have: Experience leading a small SRE team (up to 5 members).