Job Description:
Main Duties and Responsibilities:
· Plan, organize, direct, control and evaluate the ongoing operations of the Watch and TOC teams.
· Develop and implement policies and procedures for Watch information processing and day-to-day technical operations.
· Hire and manage Watch & TOC personnel and contractors.
· Collaborate with other managers to discuss monitoring tools, thresholds, technical requirements and timelines for establishing and operating effective 24x7 monitoring and alert processing.
· Create extensive dashboard layouts covering end-to-end operations health and status.
· Manage the incident ticketing system and the routing of tickets to higher support tiers.
· Manage 24x7 team shift rotations, vacations and administrative issues.
· Prepare Watch & TOC budgets and control expenditures.
· Define KPIs and success factors for measuring team efficiency and service levels.
· Act as the management escalation point for first-level issues: manage the events, communicate with other business units and provide support until the problem is resolved.
· Communicate with internal and customer-facing teams, providing status updates for open issues, root cause analysis and problem remediation.
· Conduct periodic health-check analyses and produce reports for various Operative systems, collaborating with other teams to gather and analyze information.
· Ensure that Watch & TOC teams apply the company’s information security best practices and respond to security events.
· Provide operational reports regarding Watch & TOC teams’ performance and support SLA.
· Ensure teams create and maintain proper runbooks and documentation.
· Identify processes and problems that are candidates for automation and corrective L1 actions.
· Plan and implement training and knowledge-sharing processes to maintain technical skill levels across the teams and mitigate attrition.
Skills and Experience - Required Skills:
· Bachelor's (4-year) degree with a technical major, such as engineering or computer science.
· Must have over 8 years of proven technical & management experience.
· Strong technical knowledge and background in managing complex IT systems, cloud environments, and monitoring, NOC, TechOps or DevOps teams.
· Experience in managing monitoring and alerting teams for microservices and SaaS systems, including container-based solutions such as Kubernetes, ECS and EKS.
· Strong technical AWS skills, including EC2, RDS, S3, CloudWatch, CloudFormation, Aurora, Lambda, the AWS API and other AWS services.
· Experience in defining and operating end-to-end monitoring systems such as the ELK stack, Zabbix, Prometheus, SolarWinds, New Relic, Datadog and Grafana.
· Strong technical Unix/Linux and Windows systems administration skills in physical, cloud or virtualized environments.
· Must have proven project management skills with projects of varying complexity and size.
· Advanced communication skills with the ability to interact with all levels of management.
· Understanding of Information Security concepts.
· Knowledge of IT products and techniques, network infrastructure, applications, and equipment for a large, distributed, heterogeneous computing environment.
· Must be available outside normal business hours to assist in an emergency or recovery, or in the event of a failure or outage of critical Operative systems.
· Readiness for possible travel abroad.
Must demonstrate the following skills:
· Management and supervisory skills.
· Team building skills.
· Project management skills.
· Analytical and problem-solving skills.
· Decision-making skills.
· Effective verbal, presentation and listening communication skills.
· Effective written communication skills.