# AI/MLOps Engineer

**Company:** [99x Brazil (formerly Nextly)](http://jobs.workable.com/companies/o46v8cTo8oqXp2WFZKR6o5.md)
**Location:** Remote
**Workplace:** remote
**Employment type:** Full-time

[Apply for this job](http://jobs.workable.com/view/d0f4f0b1-1a38-4e86-bec3-74a62f1e1e9f)

## Description

We are seeking a skilled AI/MLOps Engineer to join the innovative team at 99x Brazil. In this role, you will be responsible for designing, deploying, and maintaining scalable machine learning infrastructure and pipelines that enable rapid development and reliable deployment of AI models. You will work closely with data scientists, engineers, and product managers to ensure seamless integration of AI capabilities into production systems.

You will play a crucial part in automating ML workflows, monitoring model performance, and optimizing resource utilization in cloud environments. Join us to help drive the future of AI-powered solutions in a fast-paced, collaborative environment.

### Responsibilities

-   Design and maintain monitoring and observability solutions for AI applications and ML pipelines
-   Track logs, metrics, and traces using tools such as CloudWatch, Datadog, or similar platforms
-   Develop evaluation and testing frameworks for prompts, models, and AI workflows
-   Perform regression testing and quality validation for LLM-based systems
-   Manage prompt experimentation, versioning, and A/B testing processes
-   Debug AI workflows, including model outputs, orchestration pipelines, and infrastructure failures
-   Support deployment, scaling, and maintenance of AI/ML infrastructure in production environments
-   Collaborate with engineering and product teams to improve system reliability and performance
-   Analyze production data and user feedback to drive continuous improvement of AI systems
-   Contribute to operational best practices, documentation, and incident response processes

## Requirements

-   Experience with DevOps, SRE, MLOps, or AI infrastructure engineering
-   Strong understanding of monitoring and observability concepts
-   Hands-on experience with tools such as Datadog, CloudWatch, Grafana, Prometheus, or similar
-   Experience supporting AI/ML or LLM-based applications in production
-   Familiarity with prompt engineering, model evaluation, and experimentation workflows
-   Knowledge of cloud platforms such as AWS, Azure, or Google Cloud
-   Experience troubleshooting distributed systems and production pipelines
-   Proficiency in Python, scripting, or automation tooling
-   Strong analytical and problem-solving skills
-   Excellent communication and collaboration abilities

### Nice to Have

-   Experience with LLM orchestration frameworks
-   Familiarity with vector databases and RAG architectures
-   Experience with CI/CD pipelines for ML systems
-   Knowledge of Kubernetes, Docker, and infrastructure-as-code tools
-   Experience with AI governance, security, or compliance practices

## Benefits

-   Your choice of employment model: CLT, PJ, or Cooperativa
-   Resources to help you grow and learn on the job, including online courses, mentoring, and latest-generation laptops
-   A fully remote work environment with flexible working hours
-   A bonus for every referral we hire
