Research concept · 2025 · India markets

Hierarchical AI agents for autonomous retail trading

An open research project at the intersection of reinforcement learning, large language models, and real-time market signals — applied to NSE equities, cryptocurrency pairs, INR forex segments, and commodity instruments.

Abstract

We present a hierarchical agent architecture combining trading models trained with reinforcement learning and a large language model orchestration layer that handles real-time news ingestion, agentic tool use, and natural language instruction parsing. The system is evaluated across three risk-appetite profiles on NSE equities, cryptocurrency pairs, INR forex segments, and commodity instruments using simulated paper portfolios. Agents demonstrate continuous self-improvement through post-trade analysis loops. This work explores whether conversational AI interfaces can democratize access to quantitative strategies historically available only to institutional desks.

Domain: Computational finance · AI agents
Markets: NSE · BSE · Crypto · Forex · Commodities
Status: Active research · Pre-publication
Simulated benchmark results

Paper portfolio backtesting 2018–2024. Not indicative of live trading performance.

11.4% — Conservative agent: average simulated annual return on the Nifty 50 universe
Sharpe ratio 0.87 · Max drawdown 7.3%
Benchmark: Nifty 50 CAGR ~12% over the same period. The agent prioritizes capital preservation.

23.7% — Balanced agent: average simulated annual return on mid-cap + momentum strategies
Sharpe ratio 1.41 · Max drawdown 14.2%
Best risk-adjusted return across profiles. Multi-factor stock selection with position sizing.

41.2% — Aggressive agent: average simulated annual return on a multi-asset, high-volatility strategy
Sharpe ratio 0.98 · Max drawdown 31.8%
Highest absolute return but significant tail risk. Includes crypto and commodity positions.
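The headline metrics above (annualized Sharpe ratio and maximum drawdown) can be reproduced from a daily returns series. A minimal sketch, assuming a zero risk-free rate and 252 trading days per year; the functions are illustrative, not the project's evaluation code:

```python
import numpy as np

def sharpe_ratio(daily_returns, periods=252):
    """Annualized Sharpe ratio of a daily returns series (risk-free rate 0)."""
    r = np.asarray(daily_returns, dtype=float)
    return np.sqrt(periods) * r.mean() / r.std(ddof=1)

def max_drawdown(daily_returns):
    """Largest peak-to-trough decline of the compounded equity curve."""
    equity = np.cumprod(1.0 + np.asarray(daily_returns, dtype=float))
    peaks = np.maximum.accumulate(equity)  # running high-water mark
    return float(((peaks - equity) / peaks).max())
```

A single -1% day immediately after an equity peak, for example, registers as a 1% drawdown regardless of what happened earlier in the series.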
Four-layer intelligence stack

The RL core handles strategy. The LLM layer handles context and language. Execution and learning loops close the feedback cycle.

Layer 1 · RL core
Reinforcement learning trading agents
PPO and SAC agents trained on 10+ years of historical OHLCV data. Reward functions are shaped by Sharpe ratio, Calmar ratio, and maximum drawdown. A separate model is trained for each of the three risk tiers, and all models are evaluated with walk-forward validation to reduce overfitting.
FinRL Stable-Baselines3 PPO / SAC Sharpe reward
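The walk-forward validation used to evaluate the RL models can be sketched as a rolling train/test splitter, so each model is always tested on data that comes strictly after its training window. The window sizes below are illustrative, not the project's actual configuration:

```python
def walk_forward_splits(n_samples, train_size, test_size, step=None):
    """Yield (train_indices, test_indices) pairs that roll forward in time.

    Each test window starts immediately after its training window, so no
    model is ever evaluated on data it could have seen during training.
    """
    step = step or test_size  # default: advance by one full test window
    start = 0
    while start + train_size + test_size <= n_samples:
        train = list(range(start, start + train_size))
        test = list(range(start + train_size, start + train_size + test_size))
        yield train, test
        start += step

# Example: 10 years of daily bars (~2520), train on ~3 years, test on ~1 year
splits = list(walk_forward_splits(2520, 756, 252))
```

Each split would back one training run of a PPO/SAC agent in Stable-Baselines3, with out-of-sample metrics aggregated across the test windows.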
Layer 2 · LLM orchestration
Agentic LLM with tool-calling
A large language model acts as the reasoning layer — ingesting financial news, parsing natural language instructions, and modulating RL agent parameters in response to macro context. Latency-optimized for swing trading, not HFT.
LangGraph RAG pipeline Tool calling Claude / GPT-4o
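The agentic loop in this layer can be sketched in a provider-agnostic way: the LLM either returns a final answer or requests a tool, and tool results are fed back into the conversation. The tool names and the `llm_step` interface below are hypothetical placeholders, not the project's or any vendor's actual API:

```python
import json

# Hypothetical tool registry — names and signatures are illustrative only.
TOOLS = {
    "get_news": lambda query: [{"headline": f"stub result for {query}"}],
    "set_risk_limit": lambda profile, max_position_pct: {"ok": True},
}

def run_agent(llm_step, user_message, max_turns=5):
    """Generic tool-calling loop: `llm_step` (the provider call — e.g. a
    Claude or GPT-4o wrapper) either answers or names a tool to invoke;
    tool output is appended to the transcript and the loop continues."""
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_turns):
        reply = llm_step(messages)
        if reply.get("tool") is None:
            return reply["content"]                      # final answer
        result = TOOLS[reply["tool"]](**reply["args"])   # dispatch tool call
        messages.append({"role": "tool", "name": reply["tool"],
                         "content": json.dumps(result)})
    raise RuntimeError("agent exceeded max_turns without a final answer")
```

Capping the loop at `max_turns` is one simple guard against runaway tool use; a production orchestration framework such as LangGraph adds state management and branching on top of this basic cycle.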
Layer 3 · Execution
Broker API + confirmation layer
Paper trading via simulated portfolios. Live execution (research phase) through Zerodha Kite API and Angel SmartAPI. A mandatory confirmation layer intercepts every autonomous action before execution, preserving human-in-the-loop control. All orders are idempotent.
Zerodha Kite Angel SmartAPI Idempotent orders Human-in-the-loop
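A minimal sketch of how a confirmation layer with idempotent order handling might be structured. The class, its interface, and the stubbed broker call are assumptions for illustration, not the project's actual code or the Kite/SmartAPI surface:

```python
import uuid

class ConfirmationGate:
    """Every agent-proposed order is parked under an idempotency key until
    a human confirms it; confirming the same key twice is a no-op, so a
    retried confirmation can never submit a duplicate order."""

    def __init__(self, broker_submit):
        self.broker_submit = broker_submit  # e.g. a Kite/SmartAPI wrapper
        self.pending = {}
        self.executed = set()

    def propose(self, symbol, side, qty, reason):
        key = str(uuid.uuid4())  # idempotency key for this order intent
        self.pending[key] = {"symbol": symbol, "side": side,
                             "qty": qty, "reason": reason}
        return key

    def confirm(self, key):
        if key in self.executed:  # idempotent: re-confirmation does nothing
            return "already_executed"
        order = self.pending.pop(key)
        self.broker_submit(order)
        self.executed.add(key)
        return "executed"
```

Keeping the key client-generated means a network retry of the confirmation request hits the `already_executed` branch instead of placing a second order.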
Layer 4 · Learning loop
Continuous post-trade analysis
After each completed trade cycle, a critic module analyzes the outcome against the agent's stated reasoning. Losing trades trigger a strategy review. Updated policies are tested on paper before re-deployment to live portfolios.
Critic module Policy update Paper validation
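The post-trade review step can be sketched as follows; `critique_fn` and the trade record schema are hypothetical stand-ins for the LLM critic module and the project's trade logs:

```python
def post_trade_review(trade, critique_fn):
    """Compare a completed trade's outcome against the agent's stated
    reasoning. `critique_fn` stands in for the LLM critic: it receives the
    reasoning and realized PnL and returns a text critique. Losing trades
    are flagged to trigger a strategy review."""
    direction = 1 if trade["side"] == "long" else -1
    pnl = direction * (trade["exit_price"] - trade["entry_price"]) * trade["qty"]
    critique = critique_fn(trade["reasoning"], pnl)
    return {"pnl": pnl, "critique": critique, "flag_for_review": pnl < 0}
```

Flagged reviews would feed the policy-update step, with revised policies validated on paper portfolios before redeployment, as described above.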
What this project is — and isn't

Clearly defining the boundaries of this research effort.

This is: an open research project exploring AI agent behavior in financial market simulations. All results are from paper portfolios. No real capital is deployed in any experiment.
This is not: a financial product, investment advisor, or trading platform. Nothing published here constitutes investment advice or a solicitation to trade any security or instrument.
Our goal: to understand whether hierarchical AI agents can match or exceed benchmark performance while remaining interpretable, safe, and accessible to non-expert retail participants.
Open questions: overfitting to historical market regimes, LLM hallucination on financial news, correlated agent behavior at scale, and regulatory uncertainty around AI-driven execution in India.
Follow this research

Get notified when we publish findings, architecture updates, and early access to the paper trading sandbox.
