CloudWave

Cloud Systems Monitoring Engineer

Remote

Job Description

SUMMARY:

The Cloud Systems Monitoring Engineer reports to the Manager, Cloud Operations and is a member of the Monitoring and Reliability Division (MRD). This individual is responsible for the architecture, configuration, administration, operations, capacity planning, and support of the cloud systems management infrastructure. The engineer provides guidance and training on monitoring platforms to other technical resources and plays a role in capacity and availability management, ensuring that resources are appropriately monitored and managed to maintain maximum availability for service levels. The engineer provides expert-level support in incident management, to restore service as soon as possible, and in problem management, to prevent the recurrence of incidents. The engineer also plays a key role in working directly with customers during design, implementation, and ongoing support of the services provided.

ESSENTIAL DUTIES AND RESPONSIBILITIES:

Configuration and release management of all monitoring, capacity forecasting and associated systems management technologies ensuring data and systems are secure and available.
Demonstrate strong knowledge of cloud computing concepts as well as virtualized datacenter and multi-tenant infrastructure.
Provide expert-level support in the related technologies for incidents to ensure operations systems are returned to service as soon as possible, and prevent the recurrence of incidents by supporting the problem management process. Monitor the performance of all assigned systems and respond to reports of slow or erratic performance.
Establish and follow operating and security procedures, standards, and protocols related to stated technologies.
Manage the systems by monitoring infrastructure and program end-to-end, ensuring the highest quality and validity of data collection to support availability and efficient capacity planning. Implement and manage a proactive process to collect and report data and statistics on the cloud and related customer environments, ensuring the systems operate efficiently and meet the needs of the organization.
Establish and manage a program to ensure systems are current, including, but not limited to, firmware revisions, patch management, and current management software versions.
Establish capacity, performance, and availability monitoring thresholds, alerts, and actions for automated event management as well as measuring and maintaining capacity for demand growth.
Analyze clinical application and delivery infrastructure, including virtual servers and hosts.
Configure monitoring agents for network and storage devices.
Configure monitoring application inputs and interfaces.
Participate in application performance management and tuning activities (analysis and problem solving).
Perform analysis of application outages and service degradations.
Establish standards for availability and business continuity, and test related systems against those standards on a defined schedule.
Support change and release management by maintaining defined release plans, developing and implementing change requests with defined back-out plans, and conducting after-action reviews of failed changes.
Train, mentor, and develop other engineers and support personnel where needed on use of monitoring platforms.
Work directly with customers, utilizing excellent customer service skills, during all phases of the customer relationship.

OTHER DUTIES AND RESPONSIBILITIES:

Constant improvement of all services and processes
Other duties as assigned

MINIMUM QUALIFICATIONS: (To perform this job successfully, an individual must be able to perform each essential duty satisfactorily. The requirements listed below are representative of the minimum knowledge, skill, and/or ability required. Reasonable accommodation may be made to enable individuals with disabilities to perform the essential functions.)

Experience:

Five years or more of progressive experience in a large corporate and/or cloud IT systems environment with a wide variety of information management systems, networks, and technologies required.
Five years or more of in-depth integration, planning, and design experience with large-scale cloud or enterprise operations and monitoring infrastructures required.
Five years or more of experience directly implementing, managing, and supporting systems management solutions such as ScienceLogic AIOps & Observability Solution, Microsoft System Center Operations Manager (SCOM), SolarWinds, Nimsoft Monitoring, HP Operations Manager/OpenView/Insight Manager, or other similar solutions required.
Baseline knowledge of cloud best practices and data center virtualization in multi-tenant environments.
Two or more years of experience in healthcare IT or related managed services organizations preferred.
Education or equivalent experience in ITIL, COBIT, and/or HIPAA preferred for this position.

Education:

Bachelor’s degree in Information Management, Computer Science, Engineering, Math, or a related field, or equivalent experience (5 years), required.

Certifications or Licenses:

Certifications in Microsoft, VMware, Storage, ITIL, or other Cloud Technologies are required for this position.

Special Knowledge, Skills and Abilities:

Excellent customer service skills required.

TRAVEL REQUIRED: Approximately 5% Travel

PHYSICAL DEMANDS: (The physical demands and work environment described here are representative of those that must be met by an employee to successfully perform the essential functions of this job. Reasonable accommodation may be made to enable individuals with disabilities to perform the essential functions.)

Ability to work long hours at a desk using a PC, video conferencing, and phone.
Ability to occasionally lift and move computer equipment if necessary.
Moderate overnight travel by land or air.

WORK ENVIRONMENT: (The physical demands and work environment described here are representative of those that must be met by an employee to successfully perform the essential functions of this job. Reasonable accommodation may be made to enable individuals with disabilities to perform the essential functions.)

Ability to work from a home office.
Extensive use of desktop computers, mobile technologies, video conferencing, phone, and cell phone is essential for this function.