Open Positions

We are looking for Monitoring & Operations Engineers at Junior, Mid, and Senior levels to operate and monitor hybrid environments spanning AWS, Azure, and on-premises infrastructure, including Windows and Linux servers, databases, cloud services, and Kubernetes platforms.

This role focuses on 24/7 monitoring, incident detection, first-level troubleshooting, and operational support, working closely with DevOps, SRE, Platform, Infrastructure, and Application development teams to ensure high availability and system reliability.

Job Description

Responsibilities

Monitor cloud, on-premises, and Kubernetes-based systems in a 24/7 shift-based environment
Track system health, performance, and availability using:
- AWS CloudWatch, Azure Monitor
- Grafana, Prometheus
- ELK
Monitor Windows and Linux servers (CPU, memory, disk, services, events)
Monitor Kubernetes clusters (EKS / AKS / On-Prem K8s):
- Nodes, pods, deployments, services
- Cluster events and resource usage
Analyze alarms and alerts, identify potential root causes, and take first-level actions
Escalate incidents to relevant teams with clear technical findings and evidence
Perform end-to-end system checks during incidents (infra, application, network, security, platform)
Execute operational procedures using runbooks / SOPs
Log incidents, events, and actions accurately in ticketing systems
Support maintenance, change, and release activities
Contribute to improving monitoring coverage, alert quality, and operational processes

Required Skills & Qualifications

Core Technical Skills

Experience with, or strong interest in, hybrid environments:
- Cloud (AWS, Azure)
- On-Prem infrastructure
Knowledge of Windows Server and Linux fundamentals
Hands-on experience with monitoring & observability tools:
- CloudWatch, Azure Monitor
- Grafana, Prometheus
- ELK
Kubernetes monitoring and troubleshooting knowledge
Understanding of:
- Networking basics (DNS, TCP/IP, Load Balancers)
- Application metrics, logs, and events
Ability to distinguish false alerts from real incidents
Experience with ticketing and incident management tools (Jira, ServiceNow, Opsgenie, PagerDuty, etc.)