# DevOps Engineer (GCP & Kubernetes)

**Company:** [Devsu](http://jobs.workable.com/companies/4sM4PvpDWNbkTLw15DtW7T.md)
**Location:** Remote
**Workplace:** remote
**Employment type:** Full-time
**Department:** Engineering

[Apply for this job](http://jobs.workable.com/view/ec60ff5b-f76b-4f20-b682-2a5eb195146d)

## Description

We are looking for a hands-on **Semi-Senior DevOps Engineer** to join a high-impact project supporting a global-scale sports event. This role is ideal for someone who enjoys working close to production systems, troubleshooting complex issues, automating infrastructure, and ensuring platform reliability in mission-critical environments.

You will work closely with engineering teams to build, maintain, and improve cloud-native infrastructure running on Google Cloud Platform (GCP) and Kubernetes. The role requires participation in an on-call rotation, including occasional weekend coverage.

## Requirements

### Responsibilities

-   Deploy, maintain, and improve cloud infrastructure in Google Cloud Platform (GCP).
-   Operate and support Kubernetes environments, including GKE.
-   Build and maintain Infrastructure as Code using Terraform.
-   Monitor production systems and proactively identify reliability risks.
-   Troubleshoot infrastructure, networking, application, and performance issues.
-   Participate in incident response, root cause analysis, and postmortem activities.
-   Implement and maintain observability solutions, dashboards, and alerting systems.
-   Collaborate with software engineering teams to improve deployment processes and operational excellence.
-   Support highly available and scalable production environments.
-   Contribute to automation initiatives that reduce operational overhead and improve reliability.

### Required Qualifications

-   3+ years of experience in DevOps, Cloud Engineering, Site Reliability Engineering, or similar roles.
-   Hands-on experience with Google Cloud Platform (GCP).
-   Strong understanding of core GCP services, including:
    -   Compute Engine
    -   Cloud Run
    -   App Engine
    -   Google Kubernetes Engine (GKE)
-   Production experience managing Kubernetes environments.
-   Experience configuring Kubernetes resources such as Deployments, Services, Ingress, ConfigMaps, Secrets, and Autoscaling.
-   Solid understanding of Kubernetes health checks, including readiness and liveness probes.
-   Experience with Infrastructure as Code using Terraform.
-   Understanding of Terraform state management and multi-environment infrastructure design.
-   Strong Linux administration and troubleshooting skills.
-   Good understanding of networking concepts, including:
    -   VPCs
    -   Subnets
    -   Firewall rules
    -   Load balancing
    -   Private networking
-   Experience with monitoring, logging, and observability platforms.
-   Experience investigating and resolving production incidents.
-   Understanding of reliability concepts such as SLA, SLO, and SLI.
-   Strong verbal and written English communication skills.

### Preferred Qualifications

-   Experience designing highly available and globally distributed applications in GCP.
-   Knowledge of zero-downtime deployment strategies.
-   Experience supporting large-scale production environments.
-   Experience with multi-tenant architectures.
-   Scripting experience using Python, Bash, or similar languages.
-   Experience working in hybrid cloud/on-premises environments.
-   Experience participating in severity-based (SEV) incident management.
-   Familiarity with capacity planning and performance tuning.

### Technology Stack

-   Cloud: Google Cloud Platform (GCP)
-   Containers: Kubernetes, GKE
-   Infrastructure as Code: Terraform
-   Monitoring & Observability: Grafana, Prometheus, logging platforms
-   Operating Systems: Linux
-   Incident Management: PagerDuty, ServiceNow, Slack (or equivalent tools)

### Working Requirements

-   Availability to work within Central Time (CT) business hours.
-   Participation in an on-call rotation that includes coverage for one weekend day when scheduled.

### What Success Looks Like

-   Reliable operation of production systems during periods of high traffic and critical business activity.
-   Fast and effective incident response and troubleshooting.
-   Well-automated, maintainable infrastructure managed through Infrastructure as Code.
-   Strong collaboration with development teams to improve reliability, scalability, and operational efficiency.

## Benefits

At Devsu, we believe in creating an environment where you can thrive both personally and professionally. By joining our team, you’ll enjoy:

-   A stable, long-term contract with opportunities for career growth
-   A remote-friendly culture that promotes work-life balance
-   Continuous training, mentorship, and learning programs to keep you at the forefront of the industry
-   Free access to AI training resources and state-of-the-art AI tools to elevate your daily work
-   A flexible Paid Time Off (PTO) policy, as well as paid holidays
-   Challenging, world-class software projects for clients in the US and LatAm
-   Collaboration with some of the most talented software engineers in Latin America and the US, in a diverse work environment

Join Devsu and discover a workplace that values your growth, supports your well-being, and empowers you to make a global impact.
