The Cloud DevOps Engineer combines software and systems engineering to ensure the efficient and effective operation of the PRITS cloud platform. This role is responsible for managing the Cloud DevOps team.
Responsibilities
Oversee Cloud Infrastructure design, implementation, and support.
Provide leadership and guidance for software development, focusing on a cloud-first, microservices development model.
Core responsibilities span both the technical and strategic aspects of infrastructure as code, automation, governance, storage, backups, change management, cost optimization, software delivery, security, resource orchestration, configuration management, monitoring, business continuity, disaster recovery, and emergency response
Enhance delivery and orchestration of cloud and platform services through code
Implement, operate and optimize CI/CD pipelines for effective delivery of cloud resources and software
Develop and implement effective reference architecture solutions for the delivery of platform services
Analyze and resolve performance and availability issues affecting Code Dog customers and internal stakeholders
Perform systems engineering and/or automation activities to solve complex problems associated with running large-scale, multi-tenant production environments
Build, migrate, and operate our clients' cloud infrastructure, and improve its security posture and operational capabilities
Lead the way to an automated, reliable, secure, scalable and cost-effective cloud
Implement proactive monitoring, alerting, trend analysis, and self-healing systems
Participate in incident resolution processes driving restoration and repair of service-impacting issues
Define non-functional requirements as part of the product and software lifecycle to influence new designs, standards, and methods for scalable, highly available distributed systems
Solve problems related to mission-critical services and build automation to prevent recurrence, with the goal of automating the response to all non-exceptional service conditions
Support services before they go live through activities such as system design consulting, developing automation tools and frameworks, capacity planning, and operational and security reviews
Identify and drive opportunities to improve operational workflows
Required Skills
Bachelor's degree in Computer Science or equivalent
2+ years of IT management experience
2+ years of experience with Azure, AWS, or other cloud computing platforms
5+ years of experience in a Site Reliability, DevOps, or Systems Engineer role administering customer-facing, high-availability, large-scale web-based applications
5+ years of Linux/Unix administration
5+ years of Windows server administration
5+ years of programming experience in Python, Ruby, C#, Java, Bash, or similar languages
1+ years of CI/CD experience in a customer-facing production environment
Desired Skills
Prior successful experience as a Systems, DevOps or Site Reliability Engineer
Eats, sleeps, and breathes all things cloud, Azure and AWS
Mastery of Linux
Windows experience is a plus
Experience with container orchestration (Docker Swarm, Kubernetes, Mesosphere, AWS ECS/EKS)
Experience with serverless architecture design, implementation, and deployment
Proficient in Python, Ruby, Java, Bash or similar languages
Administrative experience installing, configuring, troubleshooting, monitoring, and maintaining Linux infrastructure
Experience with CI/CD tools (Jenkins, CodePipeline, CodeBuild, CodeDeploy, Bamboo)
Experience with Atlassian tools (Jira, Confluence, Bitbucket)
Experience writing SQL and NoSQL queries and procedures
Experience analyzing logs using log management and analysis tools
Prior experience working in an Agile environment
Demonstrated experience with Cloud and DevOps best practices
Experience with infrastructure-as-code tools such as Terraform or CloudFormation
Experience leveraging configuration management tools such as Ansible, Puppet, or Chef
Experience instrumenting and monitoring production systems with monitoring tools such as Datadog, New Relic, collectd, or Grafana
Networking: knowledge and understanding of network concepts and technology such as TCP/IP, UDP, MAC addresses, IP packets, DNS, OSI layers, ACLs, routing tables, VPN and load balancing.
Desire to work in a fast-paced and dynamic environment
A passion for operational excellence
Certifications: Azure DevOps Engineer, Azure Solutions Architect, AWS SysOps Administrator Associate, AWS Solutions Architect Associate, or AWS DevOps Engineer Professional