Responsibilities:
Designing and implementing infrastructure and systems (such as metrics, monitoring, node management, alerting, deployment, logging)
Setting up new environments and deploying solutions
Building a proactive monitoring and alerting service
Automation using Ansible, Python, and Perl scripting
Investigating performance and stability problems, both internally and on client sites
Tuning the Actimize platform (AIS and RCM), operating systems, application servers, and databases for optimal performance and stability
Identifying performance bottlenecks and assisting in root cause analysis.
Performance-related design reviews
Create and set up deployment scripts for different environments (e.g., Test properties vs. Prod properties)
Configure and optimize instances and web servers for optimal performance (e.g., adjusting default connection limits and request queuing thresholds)
AWS troubleshooting support
Support, architect, and implement alongside Technical & Operations teams to meet our customers' individual needs for their infrastructure and application deployments.
Work on critical, highly complex customer problems that span multiple AWS services (dealing daily with high-severity incidents).
Help build and improve customer operations through scripts to automate and deploy AWS resources seamlessly with as little manual intervention as possible.
Collaborate on and help build utilities and tools for internal use that enable you and your fellow AWS Engineers to operate safely at high speed and wide scale.
Drive customer communication during critical events.
Provide on-call, off-hours support and be flexible to work in a 24x7 shift environment
Qualifications:
3 to 4 years of relevant experience
Good experience in a DevOps environment, Operations team, or Infrastructure Operations team
Excellent troubleshooting skills
Expertise in performance tuning, investigation, root cause analysis, and bottleneck mitigation
Excellent hands-on experience in managing application support (3-tier/2-tier apps)
Strong problem-solving and analytical skills
Good written and verbal communication skills
Knowledge of core AWS services (EC2, S3, IAM, ASG, ELB, CFN, VPC, DX, VPN)
Good exposure to managing containers and Kubernetes
Exposure to scripting languages and automation tools (Ansible, Perl, Python, Ruby, shell script, PowerShell, etc.)
Database skills (SQL, Oracle or Postgres / Cassandra)
Good exposure to ELK, Splunk, Kafka
Application server skills in any middleware technology (e.g., Tomcat, WebLogic, WebSphere)
Good exposure to application performance monitoring tools such as AppDynamics and Dynatrace
Troubleshooting and tuning performance issues
Working with the Architecture team on hardware sizing recommendations
Java application performance testing, diagnosis, and tuning
Additional Skills Desired:
Cloud / application-level security experience
Experience working in an Agile / Sprint development model
Experience working with tools like OpsGenie, AlertOps, PagerDuty/OpenDuty
Troubleshooting Java-related issues
Performance testing and investigation experience
Database performance testing, diagnosis, and tuning.