Job Description
Key Responsibilities
- Define and drive the reliability roadmap (SLOs, error budgets, capacity planning, cost/performance optimization).
- Establish platform standards for progressive delivery, safe rollbacks, and change management.
- Expand observability (metrics, logs, and traces via OpenTelemetry) and implement actionable alerts.
- Oversee incident management (on-call rotations, root cause analysis, postmortems) to ensure continuous improvement.
- Develop policies for secret and key management (Vault/HSM/KMS) and infrastructure hardening.
- Standardize blockchain node/RPC operations (setup, upgrades, failover) and integrate them into service workflows.
- Recruit, mentor, and develop the team while collaborating with backend, infrastructure, security, and product teams.
 
Job Requirements
- 5+ years of DevOps/SRE experience, including 2+ years managing blockchain or mission-critical infrastructure.
- Experience leading a team of 3-5 engineers.
- Deep expertise in Kubernetes, automation frameworks (Terraform/Helm/Ansible), and CI/CD pipelines.
- Proven track record of delivering production-grade reliability for large-scale microservices.
- Hands-on experience with blockchain nodes (Ethereum, Solana, Bitcoin, or similar).
- Strong foundation in observability, incident response, and system hardening.
- Excellent communication skills; English proficiency preferred.
 
Preferred Qualifications
- Ability to balance hands-on engineering with team leadership and strategic planning.
 
Benefits
- Annual leave
- Health check-ups
- Performance bonuses
- Remote work flexibility
 


