Site Reliability Engineer Lead

 

Description:

Responsibilities:

  • Collaborate with Application Development, Quality, Product, and Data Engineering teams to champion SRE/DevOps culture and practices.
  • Take a strategic approach, with clear objectives, to improve service and product availability, performance, incident MTTR, and change success rate, and ensure a feedback loop to development teams.
  • Build and maintain reliable systems and platforms using SRE and DevSecOps principles, with a special focus on observability, resiliency (proactive impact prevention), self-healing, and reliability testing.
  • Work with application and business teams to establish SLOs/SLIs and SRE dashboards that provide multiple views (by LOB, business process, or application) to track value and enable effective decision-making.
  • Bring an innovative approach to reliability, from the architecture and feasibility phase through operations and continuous improvement, following the product model and Agile methodologies.
  • Stay current with the latest technology trends in observability, automation, and platform technology and tools, including AIOps and MLOps reliability and resiliency.
  • Ensure toil is addressed from inception and throughout operations (self-healing, self-configuration, self-provisioning, and optimization) by leveraging sense-and-respond techniques and advanced monitoring (synthetic and RUM).
  • Lead and participate in the Community of Practice (CoP), including SRE office hours, to connect and collaborate with like-minded teams and to set objectives, roadmaps, and implementation plans.

Organization: Iris Software Inc.
Industry: Engineering Jobs
Occupational Category: Site Reliability Engineer Lead
Job Location: Toronto, Canada
Shift Type: Morning
Job Type: Full Time
Gender: No Preference
Career Level: Intermediate
Experience: 2 Years
Posted at: 2025-06-23 2:49 pm
Expires on: 2026-01-05