SRE Manager - Jagoan Loker

Jakarta

Job Description

Reliability is an extremely important aspect of our app. As a Site Reliability Engineering Manager, you will play a crucial role in leading a team responsible for building and running large-scale, massively distributed, fault-tolerant systems. The team's main task is to ensure the high availability, reliability, and stability of both our internal and external user-facing systems. Additionally, SREs will keep an ever-watchful eye on the capacity and performance of our systems.

Responsibilities
  • Collaborate with key stakeholders across Product, Engineering, IT Security, and other teams on initiatives and capabilities related to the operational health, security, growth, and design of our applications.
  • Monitor, analyze and tackle potential reliability issues by implementing comprehensive monitoring tools & metrics across different systems.
  • Provide observability and insights on service reliability and customer experience to Business, Product, and other non-tech teams, as part of the tooling that supports OKRs.
  • During troubleshooting, lead your team in root cause analysis, pattern identification, and continuous improvement to optimize application performance, resilience, and reliability.
  • Drive service reliability by developing tooling that enables metric visibility using SLIs, SLOs, and SLAs.
  • Provide advice and solutions, and lead infrastructure team initiatives to improve our availability, performance, efficiency, change management, monitoring, emergency response, and capacity planning.
  • Mentor, coordinate, and guide the SRE team.
  • Develop safe rollout plans for our services to prevent potential outages.
Requirements
  • 6+ years of experience in a site reliability engineering, DevOps, or cloud architect role, with hands-on experience in defining processes and implementing best practices for enterprise-scale infrastructure.
  • Experience working with modern infrastructure and monitoring tech stacks for enterprise-scale applications, e.g., New Relic, Datadog, ELK, Kubernetes, cloud service platforms, and CI/CD pipelines.
  • Experience successfully managing a distributed team of 5-8 engineers on large-scale projects that included technical deep-dives and production troubleshooting in the areas of distributed systems, code, networking, storage, and operating systems.
  • Exemplary leadership and communication abilities (both verbal and written) are a must.
  • Experience in activities like architecture reviews, code reviews, creating platforms and frameworks, capacity planning, etc.
  • Skilled in Bash scripting and general Linux commands. Experience with other programming languages such as Go, Python, or Java is a big plus.
Benefits
  • Capital market sharing session
  • Flexible work arrangement
  • Self-development program
  • Health insurance benefits
  • Well-being and counseling program

Stockbit

Job Detail

  • Location
    Jakarta
  • Company
  • Type
    Private
  • Employment Status
    Permanent
  • Positions
    Available
  • Career Level
    Experience
  • Gender
    Male/Female
