Job Description:
As a key member of our operations team, you will be responsible for ensuring the stability, scalability, and efficiency of our core business systems and cloud platforms. Your expertise will directly contribute to maintaining our high-availability infrastructure and optimizing our operational processes.
- Deploy and optimize core business systems, cloud platforms (AWS/Aliyun/Tencent Cloud), and foundational services (Kubernetes, Docker, Nginx, MySQL, Redis, Kafka), and ensure their high availability.
- Plan and implement system capacity management, performance optimization, and disaster recovery solutions to guarantee service stability and scalability.
- Build and maintain CI/CD pipelines to achieve automated building, testing, deployment, and rollback.
- Design and enhance system monitoring, log collection, and alerting systems (Prometheus/Grafana/ELK/OpenSearch).
- Participate in emergency response for production incidents, troubleshooting, and post-mortem analysis to drive long-term optimization.
- Contribute to standardization and process improvement in operations, and document best practices.
- Assess overall operational costs and audit IT expenditures.
Job Requirements:
We are looking for a highly skilled professional with extensive experience in large-scale internet or cloud platform operations. The ideal candidate will possess strong technical expertise and problem-solving abilities.
- Bachelor's degree or higher in Computer Science or a related field, with 5+ years of experience in large-scale internet or cloud platform operations.
- Proficiency in Linux systems and at least one scripting/programming language (Shell/Python/Go).
- Expertise in Docker, Kubernetes, and CI/CD toolchains (Jenkins, GitLab CI, ArgoCD, etc.).
- Familiarity with monitoring and logging systems (Prometheus, Grafana, ELK/OpenSearch).
- Experience with public cloud architectures (AWS, Aliyun, GCP, Azure).
- Strong communication and teamwork skills, with the ability to quickly identify and resolve complex issues.
Preferred Qualifications:
- Extensive experience in IT operations cost optimization.
- Experience with global network acceleration and security deployment in enterprise cloud environments.
- Excellent documentation skills.
Benefits:
We offer a fully remote work environment, competitive salary and performance bonuses, and a positive team atmosphere that fosters collaboration and professional growth.


