Sr. Manager - Digital Solutions SRE Support - Remote (Biotech)
As an SRE Manager in Digital Solutions, you will lead a team of talented individuals and be responsible for the delivery, optimization, resilience, and availability of high-value, high-transaction-rate services.
Site Reliability Engineering (SRE) is an engineering discipline that combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. SRE ensures that Digital Solutions' systems maintain the appropriate service levels (availability, latency, and reliability) to meet our customers' needs. It also reduces the friction of managing change, plans capacity strategically, and continuously manages performance.
How you will do it
Lead a team of site reliability engineers (SREs) and managed services partners/vendors who provide 24/7 monitoring and L2/L3 support for SRE infrastructure, CI/CD platforms/tools, and digital products.
Be responsible for reliability, availability, and performance. Help the team meet its strategic goals: maintain the highest level of availability through resilience and observability, maximize developer velocity while keeping our products reliable, and deliver the highest-quality experience to our customers.
Work closely with engineering teams to ensure that services are correctly designed for scale, define proper metrics and related SLOs, and follow best practices and guidelines for health, security, observability, and operability.
Define best practices and standards for observability, monitoring, alerting, capacity planning, availability, performance/latency, change and incident management, troubleshooting, and support across all our technology teams and services.
Ensure that changes are delivered safely and securely, that resilience is built into our products, and that deployments follow best practices: automated and free of single points of failure.
Lead and participate in incident troubleshooting, problem analysis, and postmortems.
Develop and maintain product-level runbooks for incident response, in collaboration with SMEs on each product team, documenting the step-by-step process for recovering specific components within a system.
Solve problems affecting mission-critical services, and drive the delivery of automation that prevents problem recurrence, with the goal of an automated response to all non-exceptional service conditions.
Strive to maintain SLA adherence and continuously improve service-level KPIs. Oversee end-to-end service operations, and establish and monitor appropriate metrics and measurements.
Conduct operational KPI and metric reviews with Engineering and Product Management teams, and present to executive leadership.
What we look for
BS/MS in Computer Science or equivalent experience.
10+ years of infrastructure operations or DevOps/SRE experience, including running large-scale cloud infrastructure and applications, with a minimum of 3 years of management experience.
Experience leading production support organizations (24/7 monitoring, incident and problem management, preventive maintenance, and toil reduction).
Prior experience in DevOps, infrastructure engineering, and site reliability engineering is required.
Experience defining and implementing highly resilient and reliable infrastructure.
Able to work effectively across multiple time zones to collaborate with peers in other geographies.
Experience building, maintaining and operating production systems (> 99.9% SLA) on Azure.
Experience with CI/CD processes, tools, and technologies, Kubernetes (K8s), and networking
Experience automating complicated infrastructure systems using Chef, Puppet, or Ansible
Excellent communication skills (verbal and written) are critical to the role.
Experience in Agile software development
Knowledge of or experience with AWS and GCP
Johnson Controls International plc. is an equal employment opportunity and affirmative action employer and all qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, age, protected veteran status, genetic information, sexual orientation, gender identity, status as a qualified individual with a disability or any other characteristic protected by law. To view more information about your equal opportunity and non-discrimination rights as a candidate, visit EEO is the Law. If you are an individual with a disability and you require an accommodation during the application process, please visit here.