Incident Management
Job Req Id: 1365561
Key Responsibilities
- Oversee major incident management within the organization, ensuring rapid response and resolution.
- Collaborate with cross-functional teams to identify, analyze, and remediate incidents.
- Communicate effectively with stakeholders and provide timely updates.
- Proactively identify potential issues and prepare mitigation strategies.
- Use DataDog for monitoring and incident detection.
- Maintain working knowledge of AWS and OCI cloud operations and microservices, including node and pod management.
- Perform L1 troubleshooting based on logs and documented processes to save time for development teams.
Qualifications
- 6+ years of experience.
- Proven work experience in incident management within a streaming or media organization.
- Strong understanding of, and troubleshooting experience with, DataDog, AWS K8s, and other microservices.
- Proactive mindset and ability to work under pressure.
Good to Have
- Familiarity with Slack for team communication.
- Knowledge of JIRA for ticketing and documentation.
- Knowledge of PagerDuty for incident response.
- Understanding of Tableau for data visualization.
Job Segment:
Manager, Management