# Senior Agentic (AI) Engineer

**Company:** [Worth AI](http://jobs.workable.com/companies/k2E8uRLJaqV8qQDdQHzTkn.md)
**Location:** Remote
**Workplace:** Remote
**Employment type:** Full-time
**Department:** Engineering

[Apply for this job](http://jobs.workable.com/view/ff5ad997-b80a-492e-bc16-2f6a96730ee8)

## Description

Worth AI is hiring a Senior Agentic AI Engineer to design and ship production agent systems that automate KYB, underwriting, and risk decisions on regulated financial data. You'll own agents end-to-end (architecture, retrieval, tools, evals, and production deployment) and partner closely with our Chief AI Officer, applied scientists, and platform teams.

### Responsibilities

-   Design and ship multi-step agentic systems (planner/executor, tool-using, multi-agent, human-in-the-loop) for onboarding, underwriting, case review, and continuous monitoring.
-   Architect agent graphs in LangGraph (or a comparable framework such as CrewAI, AutoGen, or the Claude Agent SDK) with explicit state, durable execution, retries, and safe fallbacks.
-   Build the retrieval layer powering our agents: chunking, hybrid search, reranking, and grounded citation.
-   Own the eval stack: golden sets, offline regression suites, LLM-as-judge, online A/B and shadow evals, and red-teaming for jailbreaks, prompt injection, and PII leakage.
-   Expose agents to production systems via well-typed tools and MCP servers. Treat tool surface area as a product.
-   Drive production MLOps: deployment, versioning, traffic shaping, cost/latency budgets, tracing, and on-call playbooks for agent incidents.
-   Partner with security and compliance to keep agents inside SOC 2, GDPR, CCPA, and fair-lending posture, with auditability and explainability built in, not bolted on.
-   Mentor engineers on agent patterns, prompt hygiene, eval discipline, and LLM failure modes.
### Technology Stack

-   Languages: Python, Node.js, TypeScript
-   Agent / LLM frameworks: LangGraph, LangChain, Claude Agent SDK, MCP, OpenAI SDK
-   Models: Anthropic Claude, OpenAI, open-weight where appropriate
-   Retrieval & Data: PostgreSQL, pgvector, OpenSearch, Kafka, Redshift, Redis
-   Infra: AWS, Kubernetes (EKS), ArgoCD, Terraform
-   Evals & Observability: LangSmith / Langfuse / Braintrust-style tooling, DataDog

## Requirements

-   5+ years of software engineering experience, with 2+ years building production LLM or agentic systems (not just notebooks or demos).
-   Hands-on experience with a modern agent framework (LangGraph strongly preferred) and a track record of shipping agents that run, fail gracefully, and recover.
-   Strong RAG fundamentals (chunking, embeddings, hybrid retrieval, reranking, grounding) and judgment about when RAG isn’t the right answer.
-   Real eval experience: golden sets plus offline and online evaluations, used to make ship/no-ship calls.
-   Production MLOps fluency: you have deployed LLM workloads under real latency, cost, and reliability constraints.
-   Strong Python; comfortable in TypeScript / Node.js.
-   Solid systems engineering instincts: APIs, async patterns, queues, databases, and distributed-system failure modes.
-   Calibrated communicator; thrives in ambiguous, fast-moving environments.
-   Prior experience in fintech, lending, payments, KYB/KYC, fraud, or AML.
-   Experience building MCP servers or other structured tool interfaces for LLMs.
-   Background in classical ML (ranking, scoring, calibration).
-   Experience designing explainable / auditable AI workflows for regulated environments.
-   Open-source contributions to agent frameworks, eval tooling, or retrieval libraries.
-   AWS depth (EKS, MSK, RDS, S3, Lambda) and IaC with Terraform.

### Success Metrics

-   Agent Quality: Measurable improvements in task success rate, grounding accuracy, and hallucination rate on our eval suites.
-   Production Reliability: Agents you own meet defined SLOs for latency (P90/P99), tool-call success, and cost per task.
-   Velocity: New agent capabilities go from prototype to production in weeks, without skipping evals or guardrails.
-   Risk Posture: Zero material incidents tied to prompt injection, PII leakage, or unsafe tool use on agents you own.
-   Force Multiplier: Patterns, tools, and eval scaffolding you build get adopted across engineering.

**All Remote Hires will be required to travel to Orlando, Florida at least twice per year for Town Halls and team collaboration, in addition to orientation in Orlando.**

## Benefits

-   Health Care Plan (Medical, Dental & Vision)
-   Retirement Plan (401k, IRA)
-   Life Insurance
-   Flexible Paid Time Off
-   9 Paid Holidays
-   Family Leave
-   Remote work
-   Hybrid work (for Orlando Associates)
-   Free Food & Snacks (Orlando)
-   Wellness Resources
