Manager of Site Reliability Engineering
ECI Software Solutions
Remote | Full Time
Job Description
Location: US - Remote
Overview
The Site Reliability Engineering (SRE) Manager is responsible for leading the reliability, performance, operational excellence, and cost efficiency of ECI’s production systems across hybrid environments (cloud and on-prem). This role partners closely with Product, Development, Infrastructure, Finance, and Support teams to ensure platforms meet uptime, SLA, performance, and cost optimization objectives while supporting continuous delivery and business growth.
Responsibilities
Operational Excellence, Reliability & Availability
Lead and manage SRE operations supporting 24/7/365 availability.
Own uptime, SLA compliance, SLIs, SLOs, error budgets, MTTR, and incident trends.
Oversee incident management, on-call rotations, and post-incident reviews.
FinOps & Cost Optimization
Lead FinOps practices across hybrid environments.
Drive right-sizing, resource optimization, and the elimination of infrastructure waste.
Establish cost visibility, allocation, and reporting.
Observability, Telemetry & Alerting
Define and maintain observability standards across hybrid environments, including AWS, Azure, and vSphere.
Utilize platforms such as Coralogix, OpenTelemetry, and FireHydrant.
GitOps, Infrastructure & Automation
Champion GitOps practices and pull request governance.
Lead Terraform-based infrastructure automation initiatives.
Collaboration & Leadership
Partner across Product, Engineering, Infrastructure, Finance, and Support teams.
Lead, mentor, and develop a high-performing SRE team.
Qualifications
Leadership experience managing SRE, DevOps, or Infrastructure teams.
Experience operating hybrid (cloud and on-prem) production environments.
Proven experience with FinOps and cost optimization initiatives.
Experience with GitOps workflows, Terraform, and observability tooling.
Preferred
Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent practical experience.
8+ years of experience in Site Reliability Engineering, DevOps, Infrastructure, or related disciplines, including people leadership.
Cloud certifications such as AWS Certified Solutions Architect, Google Professional Cloud Architect, or Microsoft Azure certifications.
Experience working in Agile/Scrum environments and managing work through tools such as Jira.
Experience supporting high-availability SaaS platforms and modernization initiatives.
Experience driving application modernization efforts, establishing robust CI/CD frameworks that accelerate delivery cycles.
#LI-Remote