TEKsystems Entry Level SRE in Phoenix, Arizona

Description:

Our client has partnered with us (TEKsystems) to run an SRE accelerator program designed to upskill entry-level infrastructure professionals into Level 1 Site Reliability Engineers. This full-time, 9-week immersive program gives you the knowledge and skills necessary to excel in the field of SRE.

This boot camp will cover a wide range of topics, including Agile Scrum, Linux and Bash shell scripting, Python programming, SRE principles with practical examples, DevOps with Docker and Kubernetes, SQL programming with MySQL, using MongoDB, and working with the Apache Kafka stream API. Each week, you will delve into a different aspect of SRE, gradually building a strong foundation and the practical experience to succeed in this critical role. As you expand your skills, you’ll apply what you learn in hands-on projects. Upon successfully completing the program, you’ll have the necessary experience and the opportunity to pursue rewarding employment with one of TEKsystems’ financial customers.

As a Retail Site Reliability Engineer (SRE) within the Site Reliability Center, you will combine your software and systems expertise to manage applications and create innovative, automated solutions that simplify operations, eliminate toil, and increase the reliability and availability of our critical applications and business services.

Objectives:

  • Run the production environment by monitoring availability and taking a holistic view of system health

  • Build software and systems to manage platform infrastructure and applications

  • Improve reliability, quality, and time-to-market of our suite of software solutions

  • Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating for continual improvement

  • Provide primary operational support and engineering for multiple large-scale distributed software applications

The ideal candidate will have the following qualifications:

  • Proactive approach to identifying problems, performance bottlenecks, and areas for improvement

  • Experience operating in a 24x7 environment

  • Experience in Site Reliability Engineering (preferred) or DevOps engineering

  • Expertise in one or more programming languages such as Java, Python, or Go

  • Strong working knowledge of RHEL 8 or later, including Bash shell scripting

  • Able and eager to develop technical skills and employ a continuous improvement mindset

  • Deep understanding of performance measurement, such as mean time to detect (MTTD), mean time to repair (MTTR), and mean time to restore service (MTRS)

Candidates should have some level of familiarity with the following:

  • Engineering and support of micro-app and microservice architectures

  • Working knowledge of web services technologies such as SOAP, JSON and REST

  • Application platforms such as OpenShift, Docker and Kubernetes

  • Development frameworks such as React, Node.js and Spring Boot

  • Expertise in database technologies including SQL Server, MySQL, Oracle and MongoDB

  • Understanding of distributed tracing and monitoring tools such as Dynatrace, Jaeger and Humio

  • Experience with Kafka event streaming

  • Strong working knowledge of modern development practices and tools such as Agile, CI/CD, Git and Jenkins

  • Strong working knowledge of Internet protocols such as HTTP, TCP and UDP

Responsibilities of this position:

  • Gather and analyze metrics from operating systems as well as applications to assist in performance tuning and fault finding

  • Partner with development teams to improve services through rigorous testing and release procedures

  • Participate in system design consulting, platform management, and capacity planning

  • Create sustainable systems and services through automation and uplifts

  • Balance feature development speed and reliability with well-defined service-level objectives

  • Partner with technology teams across the enterprise to establish SRE best practices and automated solutions with a focus on operational excellence

  • Identify opportunities to evangelize the adoption of self-healing and resiliency patterns

  • Troubleshoot priority incidents and participate in blameless post-mortems

  • Perform analytics on previous incidents and usage patterns to better predict issues and take proactive action

  • Participate in 24x7 on-call rotations and escalation workflows

Benefits:

Eligibility requirements apply to some benefits and may depend on your job classification and length of employment. Benefits are subject to change and may be subject to specific elections, plan, or program terms. If eligible, the benefits available for this temporary role may include the following:

  • Medical, dental & vision

  • Critical illness, accident, and hospital insurance

  • 401(k) Retirement Plan – Pre-tax and Roth post-tax contributions available

  • Life Insurance (Voluntary Life & AD&D for the employee and dependents)

  • Short and long-term disability

  • Health Spending Account (HSA)

  • Transportation benefits

  • Employee Assistance Program

  • Time Off/Leave (PTO, Vacation or Sick Leave)

About TEKsystems:

We're partners in transformation. We help clients activate ideas and solutions to take advantage of a new world of opportunity. We are a team 80,000 strong, working with over 6,000 clients, including 80% of the Fortune 500, across North America, Europe and Asia. As an industry leader in Full-Stack Technology Services, Talent Services, and real-world application, we work with progressive leaders to drive change. That's the power of true partnership. TEKsystems is an Allegis Group company.

The company is an equal opportunity employer and will consider all applications without regard to race, sex, age, color, religion, national origin, veteran status, disability, sexual orientation, gender identity, genetic information or any other characteristic protected by law.
