Job Description
Key Responsibilities
- Deploy updates and fixes that keep our services stable and performant, managing version control, testing, and rollback procedures.
- Monitor system health and maintain high uptime by proactively identifying and mitigating potential risks.
- Provide Level 2 technical support for escalated issues, and serve on call to support the DevOps team during production outages.
- Develop and maintain tools that automate error detection, reduce manual intervention, and improve overall operational efficiency.
- Design and implement integration solutions for internal back-end systems, ensuring compatibility and data consistency across platforms.
- Conduct root cause analysis for production errors, document findings, and propose preventive measures to avoid recurrence.
- Investigate and resolve complex technical issues, including system configuration, network connectivity, and application performance bottlenecks.
- Create and refine scripts that automate visualization-related tasks such as data processing, reporting, and dashboard generation.
- Establish standardized procedures for system troubleshooting, maintenance, and incident response to ensure consistency and scalability.
- Collaborate with cross-functional teams to align technical solutions with business objectives and user requirements.
- Continuously optimize system workflows and infrastructure to enhance reliability, security, and user experience.
- Stay updated on emerging technologies and industry best practices to drive innovation in system management and automation.
 
Job Requirements
- Proven DevOps experience, with a strong track record of maintaining high system uptime and resolving critical issues.
- Advanced knowledge of system administration, automation tools (e.g., Ansible, Puppet), and cloud platforms (e.g., AWS, Azure).
- Excellent problem-solving skills and the ability to analyze complex technical scenarios, identify root causes, and implement effective solutions.
- Proficiency in scripting languages (e.g., Python, Bash) for automation and visualization tasks, including API integration and data processing.
- Strong understanding of the software development lifecycle, with experience integrating applications with internal back-end systems.
- Ability to design and document standardized procedures for system maintenance, troubleshooting, and incident management.
- Excellent communication skills to collaborate with teams and explain technical solutions to non-technical stakeholders.
- Preferred: Experience with CI/CD pipelines, containerization technologies (e.g., Docker, Kubernetes), and monitoring tools (e.g., Prometheus, Grafana).
- Ability to work independently and as part of a team, with a proactive approach to identifying opportunities for improvement.
- Strong attention to detail and commitment to delivering high-quality, reliable technical solutions that align with business goals.
- Preferred: Familiarity with ITIL frameworks and incident management best practices.
- Ability to adapt to evolving technologies and continuously improve system performance and security protocols.
 


