Job Description
This role offers a unique opportunity to build AI workflow orchestration and automation platforms from scratch (0→1) and to scale existing systems (1→N). You'll work with cutting-edge technologies including RAG (Retrieval-Augmented Generation), tool calling, agent collaboration, and asynchronous orchestration.
Key Responsibilities
- AI/LLM Workflow Orchestration: Design and implement multi-step reasoning, agent collaboration, tool calling (function calling), asynchronous task queues, and compensation mechanisms. Optimize RAG pipelines end to end: data ingestion, chunking, vectorization, retrieval/reranking, context compression, caching, and cost reduction.
- Evaluation & Quality Assurance: Build automated evaluation frameworks (benchmarks, Ragas, G-Eval, or custom metrics) with A/B testing and real-time monitoring. Implement shadow traffic replay and response comparison (using Diffy or an equivalent solution) to catch regressions during model/prompt/service upgrades, supporting canary releases and rollbacks.
- Engineering & Observability: Establish version control for models/prompts, feature/data versioning, experiment tracking (MLflow/W&B), and audit logs. Implement end-to-end observability covering latency, error rates, prompt/context lengths, hit rates, and cost monitoring (tokens/$).
- Platform Integration: Expose workflows via API/SDK/microservices. Integrate with business backends (Go/PHP/Node), queues (Kafka/RabbitMQ), storage (Postgres/Redis/object storage), and vector databases (Milvus/Qdrant/pgvector). Ensure security and compliance (data masking, PII protection, auditing, rate limiting, and model governance).
Job Requirements
- Must-Have Qualifications:
- 3+ years of backend/data/platform engineering experience, including 1–2 years of hands-on work on LLM/generative AI projects
- Expertise in LLM application engineering: prompt engineering, function calling, dialogue state management, memory, structured output, alignment, and evaluation
- Proficiency in at least one orchestration framework: LangChain/LangGraph, LlamaIndex, Temporal/Prefect/Airflow, or custom DAG/state machine implementations
- End-to-end RAG experience: data cleaning → vectorization → retrieval → reranking → evaluation
- Experience with Diffy or equivalent shadow traffic/replay comparison solutions
- Strong engineering fundamentals: Docker, CI/CD, Git, observability (OpenTelemetry/Prometheus/Grafana)
- Proficiency in Go/Python/TypeScript with ability to write reliable services and tests
- Excellent remote collaboration and documentation skills
- Preferred Qualifications:
- Deep Diffy implementation experience
- LLMOps/evaluation platform experience (Arize Phoenix, Evidently, PromptLayer, etc.)
- Practical agent framework experience (LangGraph, AutoGen/CrewAI, GraphRAG)
- Security/compliance expertise (data masking, PDPA/GDPR, moderation systems)
- Domain knowledge in IM/customer service/marketing automation
- Cost optimization experience (caching, retrieval compression, model routing)
Technical Stack
- Orchestration: LangChain/LangGraph, LlamaIndex, Temporal/Prefect/Airflow
- Models & Evaluation: OpenAI/Anthropic/Google, vLLM/Ollama, Ragas, G-Eval
- Retrieval: Milvus, Qdrant, pgvector, Elasticsearch, reranking models
- Services: Go/Python/TypeScript, gRPC/REST, Redis, Postgres, Kafka/RabbitMQ
- Observability: OpenTelemetry, Prometheus, Grafana, ELK/ClickHouse
Benefits
Fully remote work environment, competitive compensation package, and collaborative team culture that values innovation and professional growth.