Site Reliability Engineer

At Moove It, we design, develop, and deploy custom software solutions for organizations that want to make an impact through technology. Officially recognized as a Great Place To Work in LATAM, we offer the perfect balance between work and a fulfilling life. Here you can develop your passions and form strong relationships. Moove It is your opportunity to leave your mark on creating a better world for everyone, working on international projects that have a positive impact on our society.

Our fast-growing SRE, DevOps, and Cloud Studio is looking for a new teammate!

We're excited to offer this opportunity to join our team. Take a look at the details of this position and apply if you feel you're a good fit. We look forward to receiving your application.

You are excellent at doing the following

Design and implement tooling to improve the availability, scalability, observability, and latency of our services, which are used by developers and customers to deploy and operate their services.

Take part in a shared on-call rotation for the platform services you own, and respond to incidents alongside engineering teams.

Take a lead role in the implementation and maintenance of system health monitoring and alerting.

Collaborate closely with architects, developers, and database administrators to ensure the reliability and scalability of the infrastructure.

Ensure application performance, uptime, and scale while maintaining high standards of code quality and thoughtful design.

Create a DevOps culture of communication and support between product engineering and our SRE Studio.

What will make you a perfect fit:

4+ years of experience contributing to the architecture and design (including reliability and scaling) of new and existing systems.

Solid understanding of Linux containerization with Docker.

2+ years of production experience with AWS or another cloud provider.

2+ years of experience with Kubernetes, ECS or similar orchestration frameworks.

Programming skills in at least one language, preferably Python, Go, Node.js, or Ruby.

Experience implementing and maintaining observability tools (e.g. Datadog, New Relic, Prometheus, Grafana).

Ability to identify root-cause sources of instability in high-traffic, large-scale distributed systems.

Experience with configuration management and infrastructure-as-code tools (e.g. Terraform, CloudFormation).

Good written and verbal communication skills in English.

You don't have to meet every requirement to apply.

Apply if you think you are a good fit and we will take it from there.

The benefits that make us a great fit

Work-life balance

Plenty of fun, team-building activities every month.
Positive and collaborative working environment.
Free lunch at the office every Friday!
Develop your passions, whether by playing sports, an instrument, or brewing the perfect coffee.
Healthy breakfasts and snacks to share every day.