# AI Developer

**Company:** [Salvo Software](http://jobs.workable.com/companies/iNNW8MH46uFLdta8kHbK5x.md)
**Location:** Remote
**Workplace:** remote
**Employment type:** Full-time
**Department:** Firmware Development

[Apply for this job](http://jobs.workable.com/view/b57ccbd1-f3ca-4cfb-8ab5-7438707e2dfe)

## Description

**About Salvo Software**

Salvo Software is a global firm that provides cost-effective software solutions to guide enterprises and startups through digital transformation. With distributed teams across the US, LATAM, and India, we partner with clients to build high-performance, scalable systems that solve complex technical challenges. Our culture values innovation, ownership, and engineering excellence.

**Role Overview**

We are seeking a highly skilled AI Developer with a strong backend and machine learning engineering background to design, train, optimize, and deploy LLMs in on-prem and offline environments. This role is deeply technical and hands-on, requiring expertise across Python ML stacks, model optimization, local inference frameworks, RAG (Retrieval-Augmented Generation) architectures, MCP (Model Context Protocol) integrations, and DevOps workflows tailored for offline systems.

You will work closely with our engineering and product teams to build end-to-end LLM pipelines, including data preprocessing, supervised fine-tuning, model quantization, evaluation, RAG pipeline design, and deployment on local or air-gapped infrastructure. If you enjoy working with cutting-edge open-source LLMs, building context-aware AI systems, and designing reliable backend pipelines, this role is for you.

**Key Responsibilities**

**Core LLM Development**

-   Train and fine-tune LLMs using supervised fine-tuning (SFT).
-   Work with open-source models such as LLaMA, Mistral, Qwen, and similar architectures.
-   Build LoRA / QLoRA pipelines for efficient fine-tuning.
-   Implement and optimize data preprocessing workflows, including tokenization and long-context handling.
-   Use and extend Hugging Face Transformers & Datasets for training and inference.
-   Parse and process structured and semi-structured data, including XML/XSD files.
-   Implement document parsing solutions for Office formats (python-docx, OpenXML).

**RAG & Context-Aware Systems**

-   Design and implement end-to-end Retrieval-Augmented Generation (RAG) pipelines for document-grounded question answering and knowledge retrieval.
-   Build and maintain vector stores and embedding pipelines using tools such as FAISS, Chroma, Weaviate, or pgvector.
-   Optimize retrieval strategies including hybrid search, re-ranking, and chunking approaches tailored for domain-specific corpora.
-   Develop and maintain MCP (Model Context Protocol) server integrations to enable LLMs to interact dynamically with tools, APIs, and external data sources.
-   Design agentic workflows that leverage MCP to give models structured access to internal systems and context in a controlled, auditable manner.

**Offline / On-Prem Model Expertise**

-   Deploy, run, and maintain models fully offline and in air-gapped environments.
-   Perform model optimization and quantization using formats and methods such as GGUF, GPTQ, AWQ, and bitsandbytes.
-   Build and maintain inference systems using frameworks like vLLM, TGI, and Ollama.
-   Optimize GPU usage (CUDA, cuDNN, VRAM-aware batching).
-   Maintain local CI/CD pipelines for ML models without cloud dependencies.
-   Manage local model registries, versioning, and artifacts.
-   Ensure RAG and MCP components are fully operational in offline and restricted network environments.

**Backend & DevOps**

-   Build backend services in Python for ML training and inference workflows.
-   Work with relational databases (Postgres/MySQL) and vector databases for RAG storage layers.
-   Use Docker and Git for reliable development and deployment pipelines.
-   Use Azure DevOps for CI/CD, including self-hosted (local) agents when applicable.

## Requirements

**Technical Skills**

-   Strong experience in Python for backend and ML development.
-   Expertise with ML frameworks such as PyTorch or TensorFlow, plus scikit-learn and pandas.
-   Solid knowledge of Postgres or MySQL for data storage.
-   Experience with Docker, Git, and DevOps best practices.
-   Hands-on expertise with LLM training, fine-tuning, and optimization.
-   Experience with Hugging Face Transformers & Datasets.
-   Familiarity with XML/XSD and Office document parsing tools.
-   Experience deploying models with vLLM, TGI, or Ollama.
-   Understanding of quantization formats and techniques (GGUF/GPTQ/AWQ).
-   Experience working with GPU optimization and the CUDA stack.
-   Ability to build solutions for offline, on-prem, and air-gapped environments.
-   Hands-on experience designing and implementing RAG pipelines, including embedding models, vector stores (FAISS, Chroma, Weaviate, or pgvector), and retrieval optimization strategies.
-   Experience building or integrating MCP (Model Context Protocol) servers to connect LLMs with external tools, APIs, and structured data sources.

**Nice to Have**

-   Experience building agentic systems using MCP in production or near-production environments.
-   Familiarity with advanced RAG techniques such as HyDE, re-ranking, or multi-hop retrieval.
-   Experience managing ML model registries in offline environments.
-   Familiarity with AWS for hybrid deployments.
-   Experience with secure environments, restricted networks, or enterprise compliance requirements.

**Soft Skills**

-   Strong ownership mindset and problem-solving ability.
-   Ability to work effectively in distributed teams across time zones.
-   Clear communication when discussing complex technical topics with both technical and non-technical stakeholders.