Binance is a leading global blockchain ecosystem behind the world’s largest cryptocurrency exchange by trading volume and registered users. We are trusted by over 280 million people in 100+ countries for our industry-leading security, user fund transparency, trading engine speed, deep liquidity, and an unmatched portfolio of digital-asset products. Binance offerings range from trading and finance to education, research, payments, institutional services, Web3 features, and more. We leverage the power of digital assets and blockchain to build an inclusive financial ecosystem to advance the freedom of money and improve financial access for people around the world.

As an A.I. Agent Evaluation and Optimisation Specialist, you will play a critical role in ensuring both the outstanding performance and continuous improvement of large language model (LLM)-driven autonomous agents. Responsibilities range from designing and implementing robust evaluation frameworks to proactively identifying and executing optimisation strategies that enhance reliability, adaptability, and compliance across the agent lifecycle.

Responsibilities:

Design, Develop & Optimise Evaluation Plans:
Create structured, risk-aware, and adaptive evaluation and optimisation plans. Align these with user goals, governance requirements, and system architectures. Translate objectives into measurable criteria, scenarios, and optimisation targets.
Test Suite Development & Performance Tuning:
Develop and curate tests covering standard, edge, and emergent agent behaviours. Collaborate to generate synthetic data, incorporate domain expertise, and apply hands-on optimisation techniques to improve agent robustness.
Multi-Stage Evaluation & Optimisation:
Execute controlled (offline) and real-world (online) evaluations, assessing not just outputs but also reasoning steps, tool usage, and workflow execution. Identify and resolve performance bottlenecks, drive fine-tuning, and recommend systemic improvements.
Analyse, Diagnose & Optimise:
Conduct deep analysis of evaluation results to find performance gaps, failure modes, and optimisation opportunities at both the model and system level. Provide clear, actionable recommendations to directly improve agent efficiency, accuracy, and reliability.
Drive Continuous Improvement:
Collaborate closely with development teams to translate evaluation and optimisation findings into runtime adaptations, code performance enhancements, architectural upgrades, and targeted model retraining, including prompt engineering and reinforcement learning from human feedback (RLHF) methodologies.
Implement Feedback Loops:
Establish feedback mechanisms that combine human and machine evaluator input for ongoing monitoring, anomaly detection, and dynamic agent behaviour adjustment, integrating optimisation insights into deployment pipelines.
Ensure Compliance and Safety:
Maintain up-to-date governance documentation and safety cases, overseeing regulatory, ethical, and operational compliance through both evaluation and optimisation cycles.
Cross-Functional Collaboration:
Work with A.I. researchers, engineers, and domain experts to align evaluation and optimisation strategies with product objectives and user needs.

Requirements:

Bachelor’s or Master’s degree in Computer Science, Artificial Intelligence, Data Science, or a related field.
Demonstrated hands-on A.I. agent development experience, with a track record of identifying and implementing agent performance improvements.
In-depth understanding of large language models (LLMs), their optimisation, and agent system architectures.
Experience in both A.I. evaluation methodologies (such as benchmarking and online/offline analysis) and direct agent optimisation, such as model fine-tuning or prompt design.
Familiarity with software engineering best practices (e.g. TDD, BDD), and deep exposure to AI-specific frameworks, observability, and lifecycle analytics.
Proven ability to perform data-driven diagnostics and root cause analysis, with direct contributions to measurable improvement in A.I. agent performance.
Strong communication skills, especially for documenting evaluation plans, optimisation strategies, result rationales, and technical recommendations.
Effective teamwork and experience with cross-functional feedback processes that bridge evaluation, development, and operations.
Programming skills in Python plus experience with major A.I./ML libraries and APIs, including hands-on development of LLM agents.

Why Binance

• Shape the future with the world’s leading blockchain ecosystem

• Collaborate with world-class talent in a user-centric global organization with a flat structure

• Tackle unique, fast-paced projects with autonomy in an innovative environment

• Thrive in a results-driven workplace with opportunities for career growth and continuous learning

• Competitive salary and company benefits

• Work-from-home arrangement (the arrangement may vary depending on the work nature of the business team)

Binance is committed to being an equal opportunity employer. We believe that having a diverse workforce is fundamental to our success.

By submitting a job application, you confirm that you have read and agree to our Candidate Privacy Notice.

