companyScale AI logo

Site Reliability Engineer / DevOps Specialist

Scale AIMexico City, MX
On-site Full-time

Clicking Apply Now takes you to AutoApply where you can tailor your resume and apply.


Unlock Your Potential

Generate Job-Optimized Resume

One Click And Our AI Optimizes Your Resume to Match The Job Description.

Is Your Resume Optimized For This Role?

Find Out If You're Highlighting The Right Skills And Fix What's Missing

Experience Level

Experience

Qualifications

We are looking for candidates who possess: 5+ years of experience in roles such as IT Technician, Field Engineer, Facilities Technician, or similar hands-on technical positions. A proven history of installing and maintaining server hardware and workstations. Experience in configuring and troubleshooting network infrastructures and executing physical installation tasks. Experience in coordinating shared resources or facilities while managing schedules and access. Familiarity with Linux, Windows, and MacOS operating systems. Knowledge of Python/C++, scripting, and command line operations (cmd/bash). A basic understanding of electrical systems associated with robotic stations and equipment.

About the job

As Scale AI continues to expand its product offerings and customer reach, we are on the lookout for talented Site Reliability Engineers to oversee and enhance our physical infrastructure, which comprises robotic stations, servers, computing devices, and network systems. In this pivotal role, you will ensure that our technical facilities operate seamlessly while managing resource allocation and providing direct support to remote engineering teams.

These engineers will acquire an in-depth understanding of our physical infrastructure and technical setups, enabling them to efficiently install, configure, and maintain essential systems as required. A key responsibility of this position will involve coordinating access to shared technical resources and acting as the primary on-site liaison for remote engineers who require physical adjustments, transforming current manual coordination processes into a more streamlined and effective workflow.

Your key responsibilities will include:

  • Installing, configuring, and maintaining on-site robotic stations and associated technical equipment.
  • Managing and maintaining server infrastructure, workstations, and computing resources within our facilities.
  • Overseeing network installations, including routers and cabling systems.
  • Coordinating access to technical facilities, ensuring a structured schedule and orderly use of shared resources.
  • Establishing and enforcing safety protocols and usage guidelines for technical facilities.
  • Providing hands-on assistance to remote engineers, executing physical changes and configurations as requested.
  • Troubleshooting hardware, network, and connectivity issues to minimize operational disruptions.
  • Documenting infrastructure configurations, modifications, and maintenance procedures for knowledge retention and transfer.
  • Identifying maintenance requirements and capacity limitations proactively to avert issues before they occur.
  • Promoting standardization and collaboration across various teams to optimize facility usage.

About Scale AI

Scale AI is a leading provider of data infrastructure for AI applications, enabling organizations to harness the power of machine learning with high-quality datasets and advanced solutions. Join us to be part of a team that is shaping the future of AI technology.

Similar jobs

Tailoring 0 resumes

We'll move completed jobs to Ready to Apply automatically.