Site Reliability Engineer - SRE
LATENT
San Francisco, CA 94114
$200,000 - $300,000 a year
Full Time
DevOps Engineer
Job Description
*Job Overview*
We are seeking a highly skilled Site Reliability Engineer (SRE) to join our dynamic IT team. The ideal candidate will be responsible for maintaining the reliability, availability, and performance of our enterprise-scale systems and cloud infrastructure. This role involves designing, implementing, and managing scalable, secure, and efficient systems using a broad range of technologies including cloud computing platforms, containerization, automation tools, and scripting languages. The SRE will collaborate closely with development teams to ensure seamless deployment and operation of software solutions across various environments. A strong background in system administration, software development, and incident management is essential to succeed in this position.
*Responsibilities*
* Design, build, and maintain scalable cloud infrastructure utilizing platforms such as AWS, Google Cloud Platform, Azure, OpenStack, and VMware.
* Develop automation scripts and configuration management workflows using tools like Ansible, Puppet, Chef, Terraform, PowerShell, Bash (Unix shell), and Python to streamline deployment processes.
* Manage containerized workloads built with Docker and orchestrated with Kubernetes to support microservices architectures.
* Monitor system health and application performance using tools like New Relic, Splunk, Elasticsearch, and log analysis techniques; perform proactive troubleshooting to prevent outages.
* Implement disaster recovery plans and incident response procedures to ensure high availability and business continuity.
* Maintain security best practices by managing firewalls, identity & access management systems, DNS configurations, and cloud security protocols.
* Collaborate with development teams on CI/CD pipelines using Jenkins, GitHub/GitLab integrations, Maven, Gradle, TFS, and other DevOps tools for continuous integration and deployment.
* Conduct system testing and debugging for enterprise software applications built on Java (including WebSphere), C#, .NET frameworks, Ruby on Rails, Node.js, C++, Perl, Groovy, Go, and other technologies.
* Manage databases such as MySQL, Microsoft SQL Server, Oracle (including PL/SQL), and DynamoDB to ensure data integrity and optimal performance.
* Participate in requirements gathering for new projects; contribute to requirements management and SDLC processes.
* Provide technical support for IT infrastructure issues related to network administration (TCP/IP, WAN, LAN) and to Active Directory and Identity & Access Management services.
*Experience*
* Proven experience in system administration or DevOps roles supporting large-scale enterprise or SaaS environments.
* Extensive hands-on experience with cloud computing platforms such as AWS (S3), Google Cloud Platform (GCP), and Azure; familiarity with cloud security best practices is preferred.
* Strong proficiency in scripting languages, including Python and Bash; experience with PowerShell is a plus.
* Deep understanding of containerization (Docker) and orchestration (Kubernetes).
* Experience with configuration management tools like Ansible, Puppet or Chef; infrastructure as code using Terraform preferred.
* Knowledge of monitoring tools such as New Relic or Splunk for log analysis and system health checks.
* Familiarity with microservices architecture principles utilizing RESTful APIs and web services.
* Ability to troubleshoot complex issues involving distributed systems across Linux/Unix environments; experience with WebSphere, WebLogic, or JBoss is advantageous.
* Strong background in incident recovery & incident management processes; experience with agile methodologies such as Scrum or Kanban is desirable.
* Excellent problem-solving skills combined with the ability to work effectively under pressure in a fast-paced environment.
This position offers an exciting opportunity for a dedicated SRE professional eager to contribute to the stability of critical enterprise systems while working with cutting-edge technologies in a collaborative environment.
Job Type: Full-time
Pay: $200,000.00 - $300,000.00 per year
Work Location: In person