Sr Engineer, Systems Reliability
Are you ready for an exciting new challenge and the chance to make a real impact on T-Mobile's Frontline Care Experts? Do you want to be on a team that is developing cutting-edge software? Do you want to drive change and make a positive impact? If the answer is yes, then the Sr Engineer, Systems Reliability position on the Atlas Global Nav and tools team might be just the right place for you.
The Systems Reliability Engineer (SRE) improves and protects the software and systems behind all of T-Mobile's IT services. This includes managing scalability, availability, latency, performance, security, and capacity, and delivering software faster, better, and cheaper. From designing and maintaining CI/CD pipelines to building the next generation of T-Mobile applications on cloud-native platforms, SREs enable a great customer experience and product innovation through continuous improvement of operational support.
What you'll do in your role.
Enterprise Core Competencies
Requires competency in customer focus, change & innovation, strategic thinking, relationship building & influencing, results focus and inspirational leadership.
Essential Functions
Demonstrates fluency in emerging DevOps-centric automation tools and technologies for CI/CD, configuration management, and related tasks in non-production environments.
Performs environment management and automated server provisioning (Kubernetes/Conductor, private cloud/Docker).
Delivers software to improve the availability, scalability, latency, and efficiency of T-Mobile's services.
Creates, manages, and uses dashboards for continuous monitoring and health checks of applications and the underlying infrastructure, and uses the monitoring feedback to improve service quality in the non-production environment.
Contributes to future improvements in software delivery processes and operations, e.g., cloud enablement and the use of microservices with containerization.
Drives automation through software and scripting to ensure efficiency for the SRE team.
Maintains and improves CI/CD pipelines for stability, consistency, and removal of manual deployment steps.
The experience you'll bring.
Minimum Required
4-7 years of relevant experience in technology operations, software engineering, and/or SRE functions.
Strong scripting experience in languages such as Python or Shell/Bash.
Experience with DevOps tools such as Ansible, Chef, or Puppet. Experience with Docker, Kubernetes, or similar tools is preferable.
Experience with an APM tool such as AppDynamics and a logging tool such as Splunk.
Experience working in a cloud environment (public/private).
Desired
Experience migrating to cloud or cloud-native environments.
Experience with Continuous Integration/Continuous Delivery tools such as GitLab.
Certifications in AWS.
Networking knowledge (TCP/IP, UDP, DNS, load balancing, inbound/outbound rules in security groups).
Experience with container technologies like Docker or Kubernetes.
Ability to find issues in production and resolve them quickly.
Experience with APM tools such as AppDynamics and logging tools such as Splunk.
Self-motivated and able to work well under pressure.
Ability and desire to work cooperatively with others on a team.
Excellent communication and interpersonal skills.
Strong analytical and problem-solving skills.
Bachelor's degree; Master's preferred.
Position details
Req ID: 191371BR
Department: Information Technology
Travel Required: No