Job Description
This role offers a unique opportunity to help build AI workflow orchestration and automation platforms from the ground up (0→1) or scale existing systems (1→N). You'll work with cutting-edge technologies including RAG (Retrieval-Augmented Generation), tool calling, Agent collaboration, and asynchronous orchestration.
Key Responsibilities
- AI/LLM Workflow Orchestration: Design and implement multi-step reasoning, Agent collaboration, tool/function calling, asynchronous task queues, and compensation mechanisms. Optimize RAG pipelines across data ingestion, chunking, vectorization, retrieval/reranking, context compression, caching, and cost reduction.
- Evaluation & Quality Assurance: Build automated evaluation and alignment systems (benchmark sets, Ragas/G-Eval/custom metrics) and integrate A/B testing and online monitoring. Implement shadow traffic replay and response comparison with Diffy (or an equivalent) to catch regression risks from model, prompt, or service upgrades, supporting canary releases and rollbacks.
- Engineering & Observability: Establish model/prompt versioning, feature/data versioning, experiment tracking (MLflow/W&B), and audit logs. Implement end-to-end observability for latency, error rates, prompt/context lengths, hit rates, and cost monitoring (tokens/$).
- Platform Integration: Expose workflows via API/SDK/microservices. Integrate with business backends (Go/PHP/Node), queues (Kafka/RabbitMQ), storage (Postgres/Redis/object storage), and vector databases (Milvus/Qdrant/pgvector). Implement security and compliance measures (data masking, PII protection, auditing, rate limiting, quotas, model governance).
Job Requirements (Mandatory)
- 3+ years of backend or data/platform engineering experience, including 1-2 years of hands-on LLM/generative AI project work
- Proficiency in LLM application engineering: prompt engineering, function/tool calling, conversation state management, memory, structured output, alignment and evaluation
- Experience with at least one orchestration framework: LangChain/LangGraph, LlamaIndex, Temporal/Prefect/Airflow, or custom DAG/state machine/compensation solutions
- End-to-end RAG implementation experience: data cleaning → vectorization → retrieval → reranking → evaluation, with knowledge of Milvus/Qdrant/pgvector
- Experience with Diffy or an equivalent production traffic replay/diff-comparison workflow (shadow traffic, record/replay, regression output comparison, canary releases)
- Strong engineering fundamentals: Docker, CI/CD, Git workflows, logging/metrics (OpenTelemetry/Prometheus/Grafana)
- Proficiency in at least one primary language (Go/Python/TypeScript) with ability to write reliable services and tests
- Excellent remote collaboration and documentation skills, with a metrics-driven approach to delivery
Preferred Qualifications
- Deep Diffy implementation experience (or experience integrating API gateways for shadow traffic, routing, and response comparison)
- LLMOps/evaluation platform experience (Arize Phoenix, Evidently, PromptLayer, OpenAI Evals, Ragas)
- Practical experience with Agent frameworks (LangGraph, AutoGen/CrewAI, GraphRAG, tool ecosystems)
- Security/compliance knowledge (data masking, access control, PDPA/GDPR) and moderation systems (Llama Guard/Moderation)
- Domain experience in IM/customer service/marketing automation or multilingual scenarios (Chinese/English/Vietnamese)
- Cost optimization experience: caching, retrieval compression, model routing, multi-provider switching (OpenAI/Anthropic/Google/local models)
Technology Stack (Partial List)
- Orchestration: LangChain/LangGraph, LlamaIndex, Temporal/Prefect/Airflow
- Models & Evaluation: OpenAI/Anthropic/Google, vLLM/Ollama, Ragas, G-Eval, MLflow/W&B
- Vector Search: Milvus, Qdrant, pgvector, Elasticsearch, rerankers (bge/multilingual/E5)
- Backend: Go/Python/TypeScript, gRPC/REST, Redis, Postgres, Kafka/RabbitMQ, Docker/K8s
- Observability: OpenTelemetry, Prometheus, Grafana, ELK/ClickHouse
- Testing: Twitter Diffy or equivalent shadow traffic/replay + diff comparison systems
Benefits
We offer competitive compensation, career growth opportunities, and a collaborative team environment with fully remote work. University interns with relevant experience who can commit to regular working hours are also welcome to apply. Interested candidates, please reach out via Telegram (@Oran_Gina) or email ([email protected]).


