Job Description
Key Responsibilities
- Develop and drive the reliability roadmap including SLOs, error budgets, capacity planning, and cost/performance optimization
- Establish platform standards for progressive rollouts, safe rollbacks, and change management
- Expand observability systems (metrics/logging/tracing using OpenTelemetry) and implement actionable alerting strategies
- Oversee incident management processes including on-call rotations, root cause analysis, and postmortems
- Implement secure key and secret management strategies using Vault/HSM/KMS solutions
- Standardize blockchain node/RPC operations (deployment, upgrades, failover) and integrate with service workflows
- Lead, mentor, and grow team members while collaborating with backend, infrastructure, security, and product teams
 
Job Requirements
- 5+ years of DevOps/SRE experience with at least 2 years in blockchain or critical production infrastructure
- Proven leadership experience managing teams of 3-5 engineers
- Deep expertise in Kubernetes, automation frameworks (Terraform/Helm/Ansible), and CI/CD pipelines
- Experience delivering production-grade reliability for large-scale microservices
- Familiarity with blockchain node operations (Ethereum, Solana, Bitcoin, or similar)
- Strong knowledge of observability, incident response, and system hardening
- Excellent communication skills; English proficiency preferred
 
Preferred Qualifications
- Ability to balance hands-on engineering, team management, and strategic planning
 
Benefits
- Opportunity to work on cutting-edge Web3 projects with a technology-driven, flat organizational structure
- Highly competitive compensation package with additional performance incentives
- Abundant technical growth opportunities, including participation in top industry conferences and collaboration with elite development teams
- Flexible work arrangements with remote work support and a comprehensive benefits package
 


