TEKsystems Network Reliability Operations Engineer in Santa Clara, California

Description:

NVIDIA is looking for a Network Reliability and Operations (NRO) Engineer to support and maintain our cloud network infrastructure. This network serves NVIDIA's entire software stack, from graphics drivers to autonomous vehicles and artificial intelligence.

In this role, the NRO Engineer will remediate critical alerts within defined SLAs, provide the initial line of triage for network incidents, and work with internal customers on network-related issues. They will also engage with external vendors to remediate issues such as circuit outages, and participate in project-related work such as network device upgrades and link capacity augmentations. The ideal candidate will have a wide range of skills, including alert monitoring and resolution in large-scale networks and CSP environments, outstanding troubleshooting skills, and knowledge of network protocols in large multi-vendor infrastructures.

What You Will Be Doing:

Monitor and troubleshoot the entire NVIDIA network stack across our cloud and on-premises network infrastructures, which include intra-DC, inter-DC, and CSP environments.

Network Reliability Operations Experience:

● Knowledge of large-scale IP networking technologies and protocols, such as MP-BGP, VRF, VXLAN, EVPN, IPsec, and DNS

● Experience with one or more of the following CSP environments: AWS, Azure, GCP, OCI

● Ability to multi-task in an interrupt-driven environment

● Familiarity with Arista, Fortinet, and Juniper platforms

● Strong track record of alert response and resolution within defined SLAs

● Excellent verbal and written communication skills

● Experience with high-performance networking and network optimization in highly available, large-scale, multi-site, international environments

● Hands-on experience contributing to tooling and automation for provisioning, monitoring, and managing network infrastructure

● 4+ years of experience in network operations

● BS degree or an equivalent combination of education, technical training, and work experience

Skills:

Alert Management, Network Reliability Operations, BGP, VXLAN, Scripting

Additional Skills & Qualifications:

Ways To Stand Out From The Crowd:

● Working knowledge of Mellanox/Cumulus OS (nice to have)

● Ability to write and understand Python and shell scripts and programs for automation, tooling, frameworks, dashboards, and alarms

● Passionate about innovating and investing in groundbreaking technologies

About TEKsystems:

We're partners in transformation. We help clients activate ideas and solutions to take advantage of a new world of opportunity. We are 80,000 strong, working with over 6,000 clients, including 80% of the Fortune 500, across North America, Europe, and Asia. As an industry leader in Full-Stack Technology Services, Talent Services, and real-world application, we work with progressive leaders to drive change. That's the power of true partnership. TEKsystems is an Allegis Group company.

The company is an equal opportunity employer and will consider all applications without regard to race, sex, age, color, religion, national origin, veteran status, disability, sexual orientation, gender identity, genetic information, or any other characteristic protected by law.