Qualifications
The ideal candidate should possess a strong background in computing and networking. Key qualifications include:
Proficiency in cloud platforms such as AWS, Azure, or Google Cloud
Experience with containerization technologies like Docker and Kubernetes
Solid understanding of scripting languages (e.g., Python, Bash)
Familiarity with monitoring and logging tools (e.g., Prometheus, Grafana)
Ability to troubleshoot and resolve system issues efficiently
Strong communication skills and the ability to work collaboratively in a team environment
About the job
Medal seeks a Site Reliability Engineer - Infrastructure Specialist in New York City. The focus is on strengthening the company’s infrastructure and ensuring the stability of Medal’s systems. This role works within a collaborative team to design, build, and maintain the technical foundation that enables the company’s growth and efficiency.
What you will do
Design and implement infrastructure solutions that can scale as demand increases
Maintain and improve system reliability to help minimize downtime
Monitor and optimize system performance to keep applications running smoothly
Collaborate with team members to address ongoing infrastructure requirements
About Medal
Medal is at the forefront of innovation in the tech industry, dedicated to delivering cutting-edge solutions and exceptional service. Our team is passionate about leveraging technology to solve complex challenges and drive business success. Join us and be a part of a vibrant culture that fosters growth, creativity, and collaboration.
Similar jobs
Search for Site Reliability Engineer At Cloaked New York Ny
Cloaked is an innovative privacy startup committed to restoring consumer confidence in the handling of personal data. Our mission is to build an internet that prioritizes user needs, placing individual privacy and opt-in choices at the forefront. Our flagship product serves as a virtual 'cloak' for users, enabling them to navigate websites like Facebook and Amazon while controlling the sharing of their private information according to their preferences.
Cloaked is an innovative privacy startup committed to restoring consumer confidence in the use of personal data. Our mission focuses on creating a user-centric internet where privacy and the right to choose are paramount. Our flagship product is a virtual "cloak" that enables you to navigate any website—be it Facebook, Amazon, or others—while selectively sharing your private information according to your personal preferences.
Cloaked is an innovative privacy startup committed to restoring consumer confidence in the use of personal data. Our mission is to create a user-centric internet where individual privacy and opt-in preferences are paramount. Our flagship product is a virtual 'cloak' that allows users to navigate any website—be it Facebook, Amazon, or others—while selectively sharing their personal information according to their preferences.
Full-time|$170.1K/yr - $283.6K/yr|On-site|New York, NY, United States of America
At Block, we are more than just a company; we are a collective of diverse teams united by a common mission of economic empowerment. Our foundational teams (including People, Finance, Counsel, Hardware, Information Security, and Platform Infrastructure Engineering) collaborate across various business sectors and global time zones to create inclusive policies, provide financial forecasting, deliver legal support, secure our systems, and nurture innovative initiatives. Every challenge we face opens new opportunities, and we value diverse perspectives to uncover them. We invite you to bring yours to Block.
The Role
As a vital member of our Site Reliability Engineering (SRE) team, you will take on the dual responsibility of proactively enhancing and reactively managing the reliability of Block's platform and critical infrastructure. You are driven by metrics, possess a systems-oriented mindset, and are dedicated to building distributed platforms that facilitate safe, scalable product development. You will utilize and continuously refine AI-driven tools and automation to boost observability, expedite incident detection and response, and minimize operational toil. This includes applying AI techniques to incident analysis, alert tuning, and operational workflows. Your role will also involve primary platform on-call duties (12 hours a day, one week every few weeks, depending on team size), supporting Block's most critical (Tier 0) services. In this capacity, you will lead incident command, coordinate mitigation efforts, and ensure effective escalation during high-severity incidents.
You Will
Build and extend platforms to enhance system reliability
Collaborate on team objectives that prioritize reliability across the entire company
Standardize reliability tools across multiple platforms and departments
Triage, coordinate, and lead stabilization efforts for severity 0–1 incidents
Serve as the primary on-call engineer, maintaining clear escalation paths and demonstrating leadership during escalations
Drive improvements in platform-wide reliability, shared operational tools, and safe deployment patterns
Leverage AI-driven systems to enhance signal detection, reduce noise, and accelerate root cause analysis
Design and implement safe deployment strategies (including progressive delivery, automated rollback, and guardrails)
You Have
A strong inclination towards identifying root causes in complex systems and implementing necessary fixes
Proven technical initiative and leadership on prior projects, particularly those focused on backend/platform
Experience with AI-driven tools for observability, incident analysis, or automation
A mindset that naturally re-evaluates existing processes to drive continual improvement
Role Overview Mistral is looking for a Site Reliability Engineer based in New York. This role focuses on keeping systems reliable, available, and high-performing. The position involves working closely with teams across the company to support and improve scalable infrastructure. What You Will Do Maintain and enhance system reliability and uptime Work with other teams to design and build scalable infrastructure Implement monitoring, automation, and incident response practices Location This position is based in New York, NY.
NBCUniversal Media, LLC seeks a Site Reliability Engineer to support Nexthink in New York. This position centers on keeping systems stable and performing well, which helps deliver dependable digital experiences to users. Role overview The Site Reliability Engineer will work to maintain and enhance the reliability of key systems. Efforts in this role contribute to consistent performance and minimize disruptions for users. What you will do Monitor and improve system performance Address reliability issues as they arise Support initiatives that strengthen digital user experiences Location This role is based in New York.
Full-time|$100K/yr - $250K/yr|On-site|New York Office
About Kalshi
Kalshi is pioneering a new frontier in finance with its unique prediction markets platform, empowering individuals to trade on the outcomes of various events and transform any future question into a financial opportunity. We have worked diligently to legalize prediction markets in the United States, making history as the fastest-growing financial market in the country, with a diverse range of markets spanning politics, economics, finance, weather, technology, AI, culture, and beyond.
At Kalshi, we envision a future where prediction markets emerge as the largest financial marketplace globally, enabling everyone to turn their insights into financial positions.
Our Vision: To construct the largest financial market on Earth.
Our Mission: To foster greater truth in the world through the power of markets.
Our culture thrives on attracting top-tier talent, embracing hard work, and celebrating our collective journey. We are on the lookout for exceptional and driven individuals to join our compact team as we build the future of financial markets.
Your Role
As a key member of Kalshi's engineering team, you will play a crucial role in developing the next-generation financial ecosystem, akin to establishing a new NYSE or CME from the ground up. In our agile and dynamic environment, your responsibilities will quickly expand, and the impact of your work will be highly visible. Much of our infrastructure is still in its early stages, giving you the opportunity to design, own, and evolve entire systems.
Key Responsibilities
Enhance observability, reliability, and service availability by defining and measuring critical metrics.
Develop automation and systems to eliminate toil and lessen operational burdens.
Work collaboratively with core infrastructure engineers to optimize cloud deployments (Docker, Terraform, Kubernetes, EC2, etc.).
Partner with product teams to minimize service disruptions and automate incident response.
Identify and analyze reliability issues across the stack, implementing software solutions for substantial, long-term improvements.
Mentor engineers and cultivate a culture where reliability is a fundamental engineering principle.
Produce high-quality, thoroughly tested code that meets both internal and external customer requirements.
Troubleshoot complex technical challenges to enhance system usability, operability, and diagnosability.
Join Our Team as a Site Reliability Engineer
At Claylabs, our mission is to empower organizations to transform their growth ideas into reality. We believe growth is a creative endeavor rather than a mere formula. Identifying and engaging with your ideal customers requires innovative thinking and continuous experimentation.
As artificial intelligence accelerates execution and simplifies tactics, creativity remains our unique advantage. We proudly serve thousands of clients, including industry leaders such as Anthropic, Notion, Google, and Ramp, providing them with unparalleled data, insights, and AI-driven research to successfully enter the market.
In 2025, we achieved over $100 million in revenue and successfully raised a $100 million Series C at a valuation of $5 billion, supported by esteemed investors like Sequoia, CapitalG, and First Round. We also completed our second employee tender offer and launched a community equity round for our valued customers, agency partners, and club members.
Here are some highlights about our company:
Our community consists of over 11,000 customers, 150+ integration partners, 125+ agencies, and more than 30,000 Slack members.
We boast a unique culture that extends beyond work; our team members include DJs, activists, writers, marathoners, and more.
All employees have the opportunity to collaborate with world-class coaches specializing in creativity, management, and other fields.
Our operating principles, such as negative maintenance and non-attached action, guide our work. Discover more about them here.
About Radar
Radar stands at the forefront of geolocation technology, offering cutting-edge geofencing SDKs, maps APIs, and AI-driven solutions tailored for marketing, fraud detection, and operational excellence.
Why Join Radar?
Collaborate with some of the most respected companies globally, ranging from innovative startups to established Fortune 500 giants.
Experience significant scale with over 1 billion API calls processed daily from hundreds of millions of devices.
Benefit from robust resources, having secured $85.5 million in funding from top-tier investors like Accel and Insight Partners.
Thrive in a high-performance culture, surrounded by ambitious and entrepreneurial colleagues.
Enjoy our newly relocated office in the vibrant Flatiron district of Manhattan, NYC.
Be part of a team recognized as one of the top 10 best workplaces in NYC by Crain's.
Despite our impressive growth, we are just getting started, and we need your expertise!
About the Role
We are seeking skilled Site Reliability Engineers to enhance our production infrastructure. Radar is a high-throughput, data-intensive application that manages over 1 billion API calls per day and supports usage from over 100 million devices globally. We operate within a multi-availability zone architecture and are actively working towards expanding our deployment capabilities to a multi-region setup.
Technology Stack:
Our infrastructure is managed using Terraform, and we deploy to AWS via EKS. We utilize MongoDB on Atlas, implement CI and deployments through CircleCI, and monitor production with tools like CloudWatch, Grafana, Pingdom, and PagerDuty. DNS management is handled by CloudFlare. Most engineering team members participate in the on-call rotation. Our primary server languages include TypeScript and Rust, while our data pipelines are powered by Airflow and Scala Spark. Additionally, we proudly sponsor OpenStreetMaps, MapLibre, and OpenAddresses.
Team Dynamics:
Our engineering team comprises former technical co-founders and exceptional interns from renowned institutions like Waterloo and CMU. Engineers at Radar typically fit one of two profiles: staff-level expertise in a specific stack or multi-stack proficiency across various technologies.
Full-time|$176.8K/yr - $209.1K/yr|On-site|New York, New York
About the Role Peloton Interactive, Inc. is committed to building a platform that matches the quality and ambition of its products. The platform supports rapid development and continuous learning, freeing engineers to deliver new features and improvements. With a strong focus on data, the team identifies where to invest effort for the greatest impact on members. The platform spans hardware, firmware, web, mobile, backend, data, messaging, content, streaming, and machine learning, serving millions of users worldwide. The Site Reliability Engineer (SRE) will join a growing team in New York, working closely with colleagues across disciplines. The main focus: support and develop a monitorable, reliable, and highly scalable deployment platform. The team manages thousands of nodes and pods across many deployments, addressing large-scale operational challenges every day. What You Will Do Implement rapid auto-scaling for live rides and major events Maintain infrastructure to deliver a seamless experience for members across tens of thousands of pods in multiple clusters Support a platform that enables machine learning and other complex workloads, helping developers move quickly Promote best practices for building and running reliable systems Act as a subject matter expert in observability and monitoring Advise on system design to meet reliability and capacity goals Automate processes, from infrastructure management to daily operations Lead post-mortem analysis after infrastructure incidents Support operational security and compliance efforts Identify and address potential security and reliability risks Work with tools such as Amazon Web Services, Chef, Python, Ubuntu, Nginx, Jenkins, and Terraform Location This role is based in New York, New York.
Cloaked is a pioneering privacy startup focused on restoring consumer confidence in the management of personal data. Our mission centers around building an internet that prioritizes user needs, with personal privacy and opt-in consent at its foundation. Our innovative product functions as a virtual “cloak”, allowing users to navigate any website—be it Facebook, Amazon, or others—while providing the option to share complete, partial, or no personal information according to their preferences.
Full-time|$160K/yr - $180K/yr|On-site|USA - New York, NY
Gen Digital brings together brands like Norton, Avast, LifeLock, and MoneyLion to deliver cybersecurity, privacy, identity protection, and financial wellness solutions to nearly 500 million users in over 150 countries. The company is committed to helping people protect and manage their digital and financial lives, encouraging the use of AI as a collaborative tool to achieve results. Gen Digital values open discussion, experimentation, and continuous learning. The team welcomes diverse backgrounds and perspectives, emphasizing respect and support for every member. Flexible work options, generous time off, competitive pay, and comprehensive benefits are part of the company’s approach to supporting career growth. Role overview As a Senior Site Reliability Engineer for Engine by MoneyLion, the focus will be on scaling the platform and ensuring high standards for security, reliability, and performance. This leadership role involves partnering with top financial institutions to deliver a broad range of personalized financial products to consumers. The position centers on guiding the evolution of DevOps and SRE architecture, establishing best practices for cloud-native infrastructure, and mentoring engineers across teams. Deep technical expertise, sound architectural judgment, and effective collaboration with colleagues around the world are essential for success in this role. Location New York, NY, USA
Cloaked is a pioneering AI Data Privacy startup focused on restoring consumer trust in the management of personal data. Our mission is to craft a digital landscape that prioritizes user needs, with individual privacy and opt-in choices at its core. Our innovative product acts as a virtual 'cloak' that users can employ while browsing any website—be it Facebook, Amazon, or others—allowing them to selectively share their private information based on personal preferences.
Join Spotify as a Senior Site Reliability Engineer, where you'll play a crucial role in maintaining the reliability and performance of our services. This position involves collaborating with cross-functional teams to enhance our infrastructure and ensure a seamless experience for our users.
As a key member of our engineering team, you will be responsible for monitoring system health, implementing automation processes, and troubleshooting issues to improve system performance. Your expertise will be instrumental in driving our mission to deliver an exceptional streaming service.
About Legora Legora builds AI-driven solutions for the legal sector, partnering directly with legal professionals to create tools that support better insights and decision-making. Our platform is trusted by major global firms, including Cleary Gottlieb and Goodwin, and is used in over 40 countries. We focus on continuous improvement and innovation, working closely with users to ensure our technology truly meets their needs. Site Reliability Engineer – New York City (On-site) Legora is looking for a Site Reliability Engineer to join the founding SRE team at our New York City engineering hub. This role is based fully on-site, five days a week. The position centers on maintaining and improving the reliability and performance of our platform as we expand. Expect to work side by side with experienced engineers, focusing on production systems, observability, incident response, and automation. What You Will Do Oversee and improve production services, including deployments, monitoring, and system health. Develop and maintain observability tools for metrics, logs, and traces, aiming for high-quality signals and minimal noise. Help define Service Level Indicators (SLIs) and Service Level Objectives (SLOs), and set up alerting and reliability metrics for key services. Participate in on-call rotations, contribute to post-incident reviews, and help implement measures to prevent future issues. Location Requirement This role requires working on-site at Legora’s New York City office, Monday through Friday. In-person collaboration is core to how we work and deliver results.
Join Alloy as a Site Reliability Engineer and play a crucial role in ensuring the reliability, availability, and performance of our systems. You will work closely with development teams to design and implement robust infrastructure solutions that enable seamless user experiences. Your expertise will be vital in maintaining our high standards for uptime and efficiency.
Full-time|$160K/yr - $160K/yr|On-site|New York Office
As energy companies face increasing challenges from severe weather events and the need for infrastructure modernization, Treeswift is at the forefront, enabling them to transform their field operations. Our innovative solutions utilize advanced sensors deployed on backpacks and vehicles, generating vast amounts of LiDAR and imagery data. This data is processed through sophisticated AI models, providing our clients with actionable insights via our web platform.
Since initiating our first utility pilot in June 2024, we have rapidly grown, collaborating with three of the five largest utilities in the United States and continuing to expand our client base and use cases.
Our team is composed of passionate experts from top institutions and companies in robotics and software development, and we are backed by prominent investors like Penny Pritzker’s Inspired Capital. We are headquartered in lower Manhattan with an additional office in Philadelphia, and we encourage our team members, including software engineers, to engage directly with our customers at their sites.
Join us and be part of our mission to shape the future of energy management.
As a Cloud Site Reliability Engineer, you will be responsible for deploying innovative solutions within the public cloud environment, specifically utilizing AWS services. You will create and manage configuration templates designed for scalable infrastructure, including AWS components like EFS, EC2, and RDS. Collaborating closely with the Scrum Master, you will ensure the project requirements are met within an agile development setting.
Key Responsibilities:
• Contribute to architectural design to enhance system consistency, security, maintainability, and flexibility.
• Assist architects in creating highly scalable and automated deployments for diverse applications.
• Develop configuration templates using established architectural blueprints.
• Ensure the development of robust and scalable services across public cloud platforms, including AWS and GCP.
• Monitor and assess system performance to ensure optimal operation.
Role overview ro is looking for a Senior Site Reliability Engineer based in New York, NY. This role focuses on maintaining and improving the reliability, availability, and performance of our cloud infrastructure and applications. The position supports ongoing enhancements and encourages a culture of continuous improvement across the engineering team.
Apr 24, 2026