We're excited to offer this opportunity to join our team. Take a look at the details of this position and apply if you feel you're a good fit. We look forward to receiving your application.
Design and implement tooling to improve the availability, scalability, observability, and latency of our platform services, which developers and customers use to deploy and operate their own services.
Participate in an on-call rotation for the platform services you own and respond to incidents alongside engineering teams.
Take a lead role in implementing and maintaining system health monitoring and alerting.
Collaborate closely with architects, developers, and database administrators to ensure the reliability and scalability of the infrastructure.
Ensure application performance, uptime, and scalability while maintaining high standards of code quality and thoughtful design.
Foster a DevOps culture of communication and support between product engineering and our SRE Studio.
4+ years of experience contributing to the architecture and design of new and existing systems, with a focus on reliability and scaling.
Solid understanding of Linux containerization with Docker.
2+ years of production experience with AWS or other cloud providers.
2+ years of experience with Kubernetes, ECS or similar orchestration frameworks.
Programming skills in at least one language, preferably Python, Go, Node.js, or Ruby.
Experience implementing and maintaining observability tools (e.g., Datadog, New Relic, Prometheus, Grafana).
Ability to identify the root causes of instability in high-traffic, large-scale distributed systems.
Experience with infrastructure as code and configuration management (e.g., Terraform, CloudFormation).
Good written and verbal communication skills in English.
You don't have to meet every requirement to apply.
Apply if you think you are a good fit, and we will take it from there.