Agent skill
agentdb-reinforcement-learning-training
Train AI agents using AgentDB's 9 reinforcement learning algorithms including Q-Learning, DQN, PPO, and Actor-Critic. Build self-learning agents, implement RL training loops with experience replay, and deploy optimized models to production.
Install this agent skill in your project
npx add-skill https://github.com/aiskillstore/marketplace/tree/main/skills/dnyoussef/agentdb-reinforcement-learning-training
Metadata
Additional technical details for this skill
- tags: agentdb, reinforcement-learning, neural-networks, ai-training, q-learning
- author: claude-flow
- created: 1761782400
- updated: 1761782400
SKILL.md
AgentDB Reinforcement Learning Training
Overview
Train agents with AgentDB's nine reinforcement learning algorithms: Q-Learning, SARSA, DQN, Actor-Critic, PPO, Decision Transformer, A2C, TD3, and SAC. This skill covers building self-learning agents, implementing RL training loops with experience replay, and deploying trained models to production.
When to Use This Skill
Use this skill when you need to:
- Train autonomous agents that learn from experience
- Implement reinforcement learning systems
- Optimize agent behavior through trial and error
- Build self-improving AI systems
- Deploy RL agents in production environments
- Benchmark and compare RL algorithms
Available RL Algorithms
- Q-Learning - Value-based, off-policy
- SARSA - Value-based, on-policy
- Deep Q-Network (DQN) - Deep RL with experience replay
- Actor-Critic - Policy gradient with value baseline
- Proximal Policy Optimization (PPO) - Policy gradient with a clipped surrogate objective
- Decision Transformer - Offline RL with transformers
- Advantage Actor-Critic (A2C) - Synchronous advantage estimation
- Twin Delayed DDPG (TD3) - Continuous control
- Soft Actor-Critic (SAC) - Maximum entropy RL
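The on-policy/off-policy distinction in the list above comes down to which next-state value an update bootstraps from. Here is a minimal tabular sketch in plain JavaScript. It is independent of AgentDB's actual implementation; the function names and the Q-table layout are illustrative assumptions.

```javascript
// Q is a plain object: Q[state][action] -> estimated value.
// Hypothetical helper names; not part of any AgentDB API.
function qLearningUpdate(Q, s, a, r, sNext, alpha, gamma) {
  // Off-policy: bootstrap from the best action in the next state,
  // regardless of which action the behaviour policy takes next.
  const maxNext = Math.max(...Object.values(Q[sNext]));
  Q[s][a] += alpha * (r + gamma * maxNext - Q[s][a]);
  return Q[s][a];
}

function sarsaUpdate(Q, s, a, r, sNext, aNext, alpha, gamma) {
  // On-policy: bootstrap from the action actually chosen next.
  Q[s][a] += alpha * (r + gamma * Q[sNext][aNext] - Q[s][a]);
  return Q[s][a];
}

// Two identical toy tables so the updates can be compared side by side.
const makeTable = () => ({
  s0: { left: 0, right: 0 },
  s1: { left: 1, right: 5 },
});

const qValue = qLearningUpdate(makeTable(), 's0', 'right', 1, 's1', 0.5, 0.9);
const sarsaValue = sarsaUpdate(makeTable(), 's0', 'right', 1, 's1', 'left', 0.5, 0.9);
console.log({ qValue, sarsaValue });
```

With the same transition, Q-Learning moves toward the value of the best next action (`right`, worth 5), while SARSA moves toward the value of the action the policy actually took (`left`, worth 1), so the two estimates diverge whenever the agent explores.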
SOP Framework: 5-Phase RL Training Deployment
Phase 1: Initialize Learning Environment (1-2 hours)
Objective: Set up the AgentDB learning infrastructure and configure the environment
Agent: ml-developer
Steps:
- Install AgentDB Learning Module
npm install agentdb-learning@latest
npm install @agentdb/rl-algorithms @agentdb/environments
- Initialize learning database
import { AgentDB, LearningPlugin } from 'agentdb-learning';
const learningDB = new AgentDB({
name: 'rl-training-db',
dimensions: 512, // State embedding dimension
learning: {
enabled: true,
persistExperience: true,
replayBufferSize: 100000
}
});
await learningDB.initialize();
// Create learning plugin
const learningPlugin = new LearningPlugin({
database: learningDB,
algorithms: ['q-learning', 'dqn', 'ppo', 'actor-critic'],
config: {
batchSize: 64,
learningRate: 0.001,
discountFactor: 0.99,
explorationRate: 1.0,
explorationDecay: 0.995
}
});
await learningPlugin.initialize();
- Define environment
import { Environment } from '@agentdb/environments';
const environment = new Environment({
name: 'grid-world',
stateSpace: {
type: 'continuous',
shape: [10, 10],
bounds: [[0, 10], [0, 10]]
},
actionSpace: {
type: 'discrete',
actions: ['up', 'down', 'left', 'right']
},
rewardFunction: (state, action, nextState) => {
// Distance to goal reward
const goalDistance = Math.sqrt(
Math.pow(nextState[0] - 9, 2) +
Math.pow(nextState[1] - 9, 2)
);
return -goalDistance + (goalDistance === 0 ? 100 : 0);
},
terminalCondition: (state) => {
return state[0] === 9 && state[1] === 9; // Reached goal
}
});
await environment.initialize();
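To make the interface concrete, here is a minimal, dependency-free stand-in for the grid world described above. It is a sketch under assumptions: the `reset()`/`step()` shape mirrors how the environment is called in Phase 3, but the class name, the movement rules, and the integer cell coordinates are illustrative and are not the `@agentdb/environments` implementation.

```javascript
// Hypothetical stand-in environment; not the @agentdb/environments API.
class GridWorld {
  constructor(size = 10) {
    this.size = size;
    this.goal = [size - 1, size - 1];
    this.state = [0, 0];
  }

  reset() {
    this.state = [0, 0];
    return [...this.state];
  }

  step(action) {
    const moves = { up: [0, 1], down: [0, -1], left: [-1, 0], right: [1, 0] };
    const [dx, dy] = moves[action];
    // Clamp to the grid so the agent cannot leave the board.
    const clamp = (v) => Math.min(this.size - 1, Math.max(0, v));
    const nextState = [clamp(this.state[0] + dx), clamp(this.state[1] + dy)];
    // Same shaping as the reward function above: negative Euclidean
    // distance to the goal, plus a bonus of 100 on arrival.
    const dist = Math.hypot(nextState[0] - this.goal[0], nextState[1] - this.goal[1]);
    const done = dist === 0;
    const reward = -dist + (done ? 100 : 0);
    this.state = nextState;
    return { nextState: [...nextState], reward, done };
  }
}

// Walk along the bottom edge, then up the right edge, to reach the goal.
const env = new GridWorld();
env.reset();
let last;
for (let i = 0; i < 9; i++) last = env.step('right');
for (let i = 0; i < 9; i++) last = env.step('up');
console.log(last);
```

Because every non-goal step earns a negative reward proportional to the remaining distance, shorter paths accumulate less penalty, which is what pushes the agent toward the goal.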
- Set up monitoring
const monitor = learningPlugin.createMonitor({
metrics: ['reward', 'loss', 'exploration-rate', 'episode-length'],
logInterval: 100, // Log every 100 episodes
saveCheckpoints: true,
checkpointInterval: 1000
});
monitor.on('episode-complete', (episode) => {
console.log('Episode:', episode.number, 'Reward:', episode.totalReward);
});
Memory Pattern:
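// Assumption: the memory API lives on the AgentDB instance created above,
// so `agentDB` is an alias for `learningDB`; adjust if your setup differs.
const agentDB = learningDB;
await agentDB.memory.store('agentdb/learning/environment', {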
name: environment.name,
stateSpace: environment.stateSpace,
actionSpace: environment.actionSpace,
initialized: Date.now()
});
Validation:
- Learning database initialized
- Environment configured and tested
- Monitor capturing metrics
- Configuration stored in memory
Phase 2: Configure RL Algorithm (1-2 hours)
Objective: Select and configure an RL algorithm for the learning task
Agent: ml-developer
Steps:
- Select algorithm
// Example: Deep Q-Network (DQN)
const dqnAgent = learningPlugin.createAgent({
algorithm: 'dqn',
config: {
networkArchitecture: {
layers: [
{ type: 'dense', units: 128, activation: 'relu' },
{ type: 'dense', units: 128, activation: 'relu' },
{ type: 'dense', units: environment.actionSpace.size, activation: 'linear' }
]
},
learningRate: 0.001,
batchSize: 64,
replayBuffer: {
size: 100000,
prioritized: true,
alpha: 0.6,
beta: 0.4
},
targetNetwork: {
updateFrequency: 1000,
tauSync: 0.001 // Soft update
},
exploration: {
initial: 1.0,
final: 0.01,
decay: 0.995
},
training: {
startAfter: 1000, // Start training after 1000 experiences
updateFrequency: 4
}
}
});
await dqnAgent.initialize();
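The `targetNetwork` block above sets both an `updateFrequency` and a `tauSync` value. These correspond to two different ways of refreshing a DQN target network: a periodic hard copy, or a small soft (Polyak) blend on every update. How AgentDB combines the two settings is not specified here. The sketch below only illustrates the two rules on plain arrays, with illustrative function names.

```javascript
// Soft (Polyak) update: target <- tau * online + (1 - tau) * target.
function softUpdate(targetWeights, onlineWeights, tau) {
  return targetWeights.map((t, i) => tau * onlineWeights[i] + (1 - tau) * t);
}

// Hard update: copy the online weights wholesale.
function hardUpdate(onlineWeights) {
  return [...onlineWeights];
}

let targetWeights = [0, 0];
const onlineWeights = [1, -2];
for (let step = 0; step < 1000; step++) {
  targetWeights = softUpdate(targetWeights, onlineWeights, 0.001);
}
console.log('soft after 1000 steps:', targetWeights);
console.log('hard copy:', hardUpdate(onlineWeights));
```

With `tau = 0.001`, a thousand soft updates close about 63% of the gap to the online weights (1 - 0.999^1000), whereas a hard copy closes it in one step. Using both mechanisms at once is unusual, so it is worth checking which one the library actually applies.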
- Configure hyperparameters
const hyperparameters = {
// Learning parameters
learningRate: 0.001,
discountFactor: 0.99, // Gamma
batchSize: 64,
// Exploration
epsilonStart: 1.0,
epsilonEnd: 0.01,
epsilonDecay: 0.995,
// Experience replay
replayBufferSize: 100000,
minReplaySize: 1000,
prioritizedReplay: true,
// Training
maxEpisodes: 10000,
maxStepsPerEpisode: 1000,
targetUpdateFrequency: 1000,
// Evaluation
evalFrequency: 100,
evalEpisodes: 10
};
dqnAgent.setHyperparameters(hyperparameters);
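With multiplicative decay, the exploration rate after n decay steps is `epsilonStart * epsilonDecay^n`, floored at `epsilonEnd`. The sketch below uses illustrative helper names and assumes decay is applied once per episode, as in the Phase 3 loop. It shows how long the values above keep the agent exploring.

```javascript
function epsilonAt(episode, { epsilonStart, epsilonEnd, epsilonDecay }) {
  return Math.max(epsilonEnd, epsilonStart * Math.pow(epsilonDecay, episode));
}

function episodesUntilFloor({ epsilonStart, epsilonEnd, epsilonDecay }) {
  // Solve epsilonStart * decay^n <= epsilonEnd for the smallest integer n.
  return Math.ceil(Math.log(epsilonEnd / epsilonStart) / Math.log(epsilonDecay));
}

const schedule = { epsilonStart: 1.0, epsilonEnd: 0.01, epsilonDecay: 0.995 };
console.log('epsilon after 100 episodes:', epsilonAt(100, schedule).toFixed(3));
console.log('episodes until floor:', episodesUntilFloor(schedule));
```

For these settings epsilon hits its floor after 919 episodes, so more than 90% of a 10,000-episode run happens at the minimum exploration rate. If the task needs longer exploration, a decay closer to 1 is the parameter to adjust.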
- Set up experience replay
import { PrioritizedReplayBuffer } from '@agentdb/rl-algorithms';
const replayBuffer = new PrioritizedReplayBuffer({
capacity: 100000,
alpha: 0.6, // Prioritization exponent
beta: 0.4, // Importance sampling
betaIncrement: 0.001,
epsilon: 0.01 // Small constant for stability
});
dqnAgent.setReplayBuffer(replayBuffer);
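The roles of `alpha` and `beta` can be seen in a few lines of arithmetic. In prioritized experience replay, transition i is sampled with probability `p_i^alpha / sum_k p_k^alpha`, and its update is scaled by an importance-sampling weight `(N * P(i))^(-beta)`, normalized by the largest weight. The sketch below is standalone and does not use `PrioritizedReplayBuffer`; the function name and the example priorities are illustrative.

```javascript
function prioritizedStats(priorities, alpha, beta) {
  // Sampling probabilities: higher priority -> sampled more often.
  const scaled = priorities.map((p) => Math.pow(p, alpha));
  const total = scaled.reduce((a, b) => a + b, 0);
  const probs = scaled.map((p) => p / total);

  // Importance-sampling weights: frequently sampled transitions are
  // down-weighted to correct the bias introduced by prioritization.
  const n = priorities.length;
  const rawWeights = probs.map((p) => Math.pow(n * p, -beta));
  const maxWeight = Math.max(...rawWeights);
  const weights = rawWeights.map((w) => w / maxWeight);
  return { probs, weights };
}

const { probs, weights } = prioritizedStats([1, 1, 2, 4], 0.6, 0.4);
console.log('probs:  ', probs.map((p) => p.toFixed(3)));
console.log('weights:', weights.map((w) => w.toFixed(3)));
```

Setting `alpha` to 0 recovers uniform sampling, and raising `beta` toward 1 (which is what `betaIncrement` does over time) applies the full bias correction.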
- Configure training loop
const trainingConfig = {
episodes: 10000,
stepsPerEpisode: 1000,
warmupSteps: 1000,
trainFrequency: 4,
targetUpdateFrequency: 1000,
saveFrequency: 1000,
evalFrequency: 100,
earlyStoppingPatience: 500,
earlyStoppingThreshold: 0.01
};
dqnAgent.setTrainingConfig(trainingConfig);
Memory Pattern:
await agentDB.memory.store('agentdb/learning/algorithm-config', {
algorithm: 'dqn',
hyperparameters: hyperparameters,
trainingConfig: trainingConfig,
configured: Date.now()
});
Validation:
- Algorithm selected and configured
- Hyperparameters validated
- Replay buffer initialized
- Training config set
Phase 3: Train Agents (3-4 hours)
Objective: Execute training iterations and optimize agent behavior
Agent: safla-neural
Steps:
- Start training loop
async function trainAgent() {
console.log('Starting RL training...');
const trainingStats = {
episodes: [],
totalReward: [],
episodeLength: [],
loss: [],
explorationRate: []
};
let totalSteps = 0; // Environment steps taken across all episodes
for (let episode = 0; episode < trainingConfig.episodes; episode++) {
let state = await environment.reset();
let episodeReward = 0;
let episodeLength = 0;
let episodeLoss = 0;
let trainUpdates = 0; // Gradient updates performed this episode
for (let step = 0; step < trainingConfig.stepsPerEpisode; step++) {
// Select action
const action = await dqnAgent.selectAction(state, {
explore: true
});
// Execute action
const { nextState, reward, done } = await environment.step(action);
// Store experience
await dqnAgent.storeExperience({
state,
action,
reward,
nextState,
done
});
// Train if enough experiences
if (dqnAgent.canTrain()) {
const loss = await dqnAgent.train();
episodeLoss += loss;
trainUpdates += 1;
}
// Update target network every targetUpdateFrequency environment steps
totalSteps += 1;
if (totalSteps % trainingConfig.targetUpdateFrequency === 0) {
await dqnAgent.updateTargetNetwork();
}
episodeReward += reward;
episodeLength += 1;
state = nextState;
if (done) break;
}
// Decay exploration
dqnAgent.decayExploration();
// Log progress (average the loss over the updates that actually ran)
const avgLoss = trainUpdates > 0 ? episodeLoss / trainUpdates : 0;
trainingStats.episodes.push(episode);
trainingStats.totalReward.push(episodeReward);
trainingStats.episodeLength.push(episodeLength);
trainingStats.loss.push(avgLoss);
trainingStats.explorationRate.push(dqnAgent.getExplorationRate());
if (episode % 100 === 0) {
console.log(`Episode ${episode}:`, {
reward: episodeReward.toFixed(2),
length: episodeLength,
loss: avgLoss.toFixed(4),
epsilon: dqnAgent.getExplorationRate().toFixed(3)
});
}
// Save checkpoint
if (episode % trainingConfig.saveFrequency === 0) {
await dqnAgent.save(`checkpoint-${episode}`);
}
// Evaluate (evaluateAgent returns a summary object; see Phase 4)
if (episode % trainingConfig.evalFrequency === 0) {
const { meanReward } = await evaluateAgent(dqnAgent, environment, hyperparameters.evalEpisodes);
console.log(`Evaluation at episode ${episode}: ${meanReward.toFixed(2)}`);
}
// Early stopping: stop once the reward curve has plateaued
// (checkConvergence is defined in the "Handle convergence" step)
if (checkConvergence(trainingStats)) {
console.log('Early stopping triggered');
break;
}
}
return trainingStats;
}
const trainingStats = await trainAgent();
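A standard DQN update regresses Q(s, a) toward a one-step bootstrapped target, `r + gamma * max_a' Q_target(s', a')`, and drops the bootstrap term on terminal transitions. Something like this presumably happens inside `dqnAgent.train()`, but that is an assumption. The sketch below is a generic illustration for a single transition, with illustrative function names and made-up Q-values.

```javascript
// qNext: target-network Q-values for the next state, one entry per action.
function tdTarget(reward, qNext, done, gamma) {
  // Terminal transitions have no future value to bootstrap from.
  return done ? reward : reward + gamma * Math.max(...qNext);
}

// TD error for the action that was actually taken.
function tdError(qCurrent, action, target) {
  return target - qCurrent[action];
}

const gamma = 0.99;
const stepTarget = tdTarget(-1.5, [0.2, 3.0, -1.0, 0.5], false, gamma);
const stepError = tdError([1.0, 0.0, 0.0, 0.0], 0, stepTarget);
const goalTarget = tdTarget(100, [5, 5, 5, 5], true, gamma);
console.log({ stepTarget, stepError, goalTarget });
```

The magnitude of the TD error is also what a prioritized replay buffer typically uses as the transition's priority, which ties this step back to the replay configuration in Phase 2.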
- Monitor training progress
monitor.on('training-update', (stats) => {
// Calculate moving averages
const window = 100;
const recentRewards = stats.totalReward.slice(-window);
const avgReward = recentRewards.reduce((a, b) => a + b, 0) / recentRewards.length;
// Store metrics
agentDB.memory.store('agentdb/learning/training-progress', {
episode: stats.episodes[stats.episodes.length - 1],
avgReward: avgReward,
explorationRate: stats.explorationRate[stats.explorationRate.length - 1],
timestamp: Date.now()
});
// Plot learning curve (if visualization enabled)
if (monitor.visualization) {
monitor.plot('reward-curve', stats.episodes, stats.totalReward);
monitor.plot('loss-curve', stats.episodes, stats.loss);
}
});
- Handle convergence
function checkConvergence(stats, windowSize = 100, threshold = 0.01) {
if (stats.totalReward.length < windowSize * 2) {
return false;
}
const recent = stats.totalReward.slice(-windowSize);
const previous = stats.totalReward.slice(-windowSize * 2, -windowSize);
const recentAvg = recent.reduce((a, b) => a + b, 0) / recent.length;
const previousAvg = previous.reduce((a, b) => a + b, 0) / previous.length;
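// Guard against division by zero when the previous window averages to 0
if (previousAvg === 0) return recentAvg === 0;
const improvement = (recentAvg - previousAvg) / Math.abs(previousAvg);
// Converged when the moving average changes by less than the threshold in either direction
return Math.abs(improvement) < threshold;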
}
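To see how the window comparison behaves, it helps to run it on a few synthetic reward histories. The sketch below reimplements just the relative-improvement calculation as a standalone helper (the name is illustrative) and feeds it made-up data.

```javascript
function relativeImprovement(rewards, windowSize) {
  if (rewards.length < windowSize * 2) return null; // not enough history yet
  const mean = (xs) => xs.reduce((a, b) => a + b, 0) / xs.length;
  const recentAvg = mean(rewards.slice(-windowSize));
  const previousAvg = mean(rewards.slice(-windowSize * 2, -windowSize));
  return (recentAvg - previousAvg) / Math.abs(previousAvg);
}

// Synthetic reward histories (made-up numbers, for illustration only).
const stillLearning = [...Array(100).fill(-50), ...Array(100).fill(-40)];
const plateaued = [...Array(100).fill(-20), ...Array(100).fill(-19.9)];
const regressing = [...Array(100).fill(-20), ...Array(100).fill(-30)];

console.log('still learning:', relativeImprovement(stillLearning, 100));
console.log('plateaued:     ', relativeImprovement(plateaued, 100));
console.log('regressing:    ', relativeImprovement(regressing, 100));
```

The three series give roughly 0.2, 0.005, and -0.5. The regressing series is below any positive threshold, so a check that should fire only on a plateau needs to compare the magnitude of the change rather than its signed value.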
- Save trained model
await dqnAgent.save('trained-agent-final', {
includeReplayBuffer: false,
includeOptimizer: false,
metadata: {
trainingStats: trainingStats,
hyperparameters: hyperparameters,
finalReward: trainingStats.totalReward[trainingStats.totalReward.length - 1]
}
});
console.log('Training complete. Model saved.');
Memory Pattern:
await agentDB.memory.store('agentdb/learning/training-results', {
algorithm: 'dqn',
episodes: trainingStats.episodes.length,
finalReward: trainingStats.totalReward[trainingStats.totalReward.length - 1],
converged: checkConvergence(trainingStats),
modelPath: 'trained-agent-final',
timestamp: Date.now()
});
Validation:
- Training completed or converged
- Reward curve shows improvement
- Model saved successfully
- Training stats stored
Phase 4: Validate Performance (1-2 hours)
Objective: Benchmark trained agent and validate performance
Agent: performance-benchmarker
Steps:
- Load trained agent
const trainedAgent = await learningPlugin.loadAgent('trained-agent-final');
- Run evaluation episodes
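// Population standard deviation of an array of numbers
function calculateStd(values) {
const mean = values.reduce((a, b) => a + b, 0) / values.length;
const variance = values.reduce((sum, v) => sum + (v - mean) ** 2, 0) / values.length;
return Math.sqrt(variance);
}
async function evaluateAgent(agent, env, numEpisodes = 100) {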
const results = {
rewards: [],
episodeLengths: [],
successRate: 0
};
for (let i = 0; i < numEpisodes; i++) {
let state = await env.reset();
let episodeReward = 0;
let episodeLength = 0;
let success = false;
for (let step = 0; step < 1000; step++) {
const action = await agent.selectAction(state, { explore: false });
const { nextState, reward, done } = await env.step(action);
episodeReward += reward;
episodeLength += 1;
state = nextState;
if (done) {
success = env.isSuccessful(state);
break;
}
}
results.rewards.push(episodeReward);
results.episodeLengths.push(episodeLength);
if (success) results.successRate += 1;
}
results.successRate /= numEpisodes;
return {
meanReward: results.rewards.reduce((a, b) => a + b, 0) / results.rewards.length,
stdReward: calculateStd(results.rewards),
meanLength: results.episodeLengths.reduce((a, b) => a + b, 0) / results.episodeLengths.length,
successRate: results.successRate,
results: results
};
}
const evalResults = await evaluateAgent(trainedAgent, environment, 100);
console.log('Evaluation results:', evalResults);
- Compare with baseline
// Random policy baseline
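// Assumes the plugin provides a 'random' policy; it is not one of the nine algorithms listed above
const randomAgent = learningPlugin.createAgent({ algorithm: 'random' });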
const randomResults = await evaluateAgent(randomAgent, environment, 100);
// Calculate improvement
const improvement = {
rewardImprovement: (evalResults.meanReward - randomResults.meanReward) / Math.abs(randomResults.meanReward),
lengthImprovement: (randomResults.meanLength - evalResults.meanLength) / randomResults.meanLength,
successImprovement: evalResults.successRate - randomResults.successRate
};
console.log('Improvement over random:', improvement);
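The `Math.abs` in the denominator matters because rewards in this environment are mostly negative. A short worked example with made-up numbers (the helper name is illustrative) shows what goes wrong without it.

```javascript
function relativeGain(trainedReward, baselineReward) {
  return (trainedReward - baselineReward) / Math.abs(baselineReward);
}

// Made-up mean rewards: the trained agent is clearly better (less negative).
const trainedMean = -120;
const randomMean = -400;

const withAbs = relativeGain(trainedMean, randomMean);
const withoutAbs = (trainedMean - randomMean) / randomMean;
console.log({ withAbs, withoutAbs });
```

Dividing by the signed baseline reports the better agent as a 70% regression. Dividing by its absolute value reports the intended 70% gain. A baseline of exactly zero would still need separate handling.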
- Run comprehensive benchmarks
const benchmarks = {
performanceMetrics: {
meanReward: evalResults.meanReward,
stdReward: evalResults.stdReward,
successRate: evalResults.successRate,
meanEpisodeLength: evalResults.meanLength
},
algorithmComparison: {
dqn: evalResults,
random: randomResults,
improvement: improvement
},
inferenceTiming: {
actionSelection: 0,
totalEpisode: 0
}
};
// Measure inference speed
const timingTrials = 1000;
const startTime = performance.now();
for (let i = 0; i < timingTrials; i++) {
const state = await environment.randomState();
await trainedAgent.selectAction(state, { explore: false });
}
const endTime = performance.now();
benchmarks.inferenceTiming.actionSelection = (endTime - startTime) / timingTrials;
await agentDB.memory.store('agentdb/learning/benchmarks', benchmarks);
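A single mean latency can hide slow outliers, which matter for a production threshold. The sketch below times each call separately and reports percentiles. It assumes a Node.js runtime where `performance` is available as a global, and it uses a synchronous stand-in policy rather than the real `selectAction`, so the absolute numbers are not representative.

```javascript
function percentile(sortedValues, p) {
  // Nearest-rank percentile on an ascending array.
  const rank = Math.ceil((p / 100) * sortedValues.length);
  return sortedValues[Math.max(0, rank - 1)];
}

function measureLatency(fn, trials) {
  const samples = [];
  for (let i = 0; i < trials; i++) {
    const start = performance.now();
    fn();
    samples.push(performance.now() - start);
  }
  samples.sort((a, b) => a - b);
  return {
    mean: samples.reduce((a, b) => a + b, 0) / samples.length,
    p50: percentile(samples, 50),
    p95: percentile(samples, 95),
    max: samples[samples.length - 1],
  };
}

// Stand-in for a greedy policy: argmax over made-up Q-values.
const fakePolicy = () => {
  const q = [0.1, 0.7, 0.3, 0.2];
  return q.indexOf(Math.max(...q));
};

const latency = measureLatency(fakePolicy, 1000);
console.log(latency);
```

For an async agent, the same structure applies with `await fn()` inside an `async` version of `measureLatency`. Discarding the first few iterations as warm-up usually gives steadier numbers.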
Memory Pattern:
await agentDB.memory.store('agentdb/learning/validation', {
evaluated: true,
meanReward: evalResults.meanReward,
successRate: evalResults.successRate,
improvement: improvement,
timestamp: Date.now()
});
Validation:
- Evaluation completed (100 episodes)
- Mean reward exceeds threshold
- Success rate acceptable
- Improvement over baseline demonstrated
Phase 5: Deploy Trained Agents (1-2 hours)
Objective: Deploy trained agents to a production environment
Agent: ml-developer
Steps:
- Export production model
await trainedAgent.export('production-agent', {
format: 'onnx', // or 'tensorflowjs', 'pytorch'
optimize: true,
quantize: 'int8', // Quantization for faster inference
includeMetadata: true
});
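What `quantize: 'int8'` does inside the exporter is not shown here. As general background, int8 quantization stores each weight as an 8-bit integer plus a shared scale factor, trading a small rounding error for a tensor roughly four times smaller than float32. The following is a minimal sketch of the symmetric per-tensor variant. That choice is an assumption, as real exporters may use per-channel or asymmetric schemes, and the function names are illustrative.

```javascript
// Symmetric per-tensor int8 quantization: w ~= scale * q, with q in [-127, 127].
function quantizeInt8(weights) {
  const maxAbs = Math.max(...weights.map((w) => Math.abs(w)));
  const scale = maxAbs === 0 ? 1 : maxAbs / 127;
  const q = Int8Array.from(weights, (w) => Math.round(w / scale));
  return { q, scale };
}

function dequantize({ q, scale }) {
  return Array.from(q, (v) => v * scale);
}

const originalWeights = [0.02, -1.27, 0.64, 0.0];
const packed = quantizeInt8(originalWeights);
const restored = dequantize(packed);
console.log(Array.from(packed.q), packed.scale, restored);
```

The round-trip error per weight is at most half the scale, so it is worth re-running the Phase 4 evaluation on the quantized model to confirm that the policy's action choices have not shifted.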
- Create inference API
import express from 'express';
const app = express();
app.use(express.json());
// Load production agent
const productionAgent = await learningPlugin.loadAgent('production-agent');
app.post('/api/predict', async (req, res) => {
try {
const { state } = req.body;
const action = await productionAgent.selectAction(state, {
explore: false,
returnProbabilities: true
});
res.json({
action: action.action,
probabilities: action.probabilities,
confidence: action.confidence
});
} catch (error) {
res.status(500).json({ error: error.message });
}
});
app.listen(3000, () => {
console.log('RL agent API running on port 3000');
});
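The handler above passes `req.body.state` straight to the agent, so a malformed request surfaces as a 500. A small validation step lets the API reject bad input with a 400 instead. The sketch below is a hypothetical helper. It assumes the state is a flat array of finite numbers with a known length, such as `[x, y]` for the grid world.

```javascript
// Hypothetical validator: returns an error message, or null if the state is valid.
function validateState(state, expectedLength) {
  if (!Array.isArray(state)) {
    return 'state must be an array';
  }
  if (state.length !== expectedLength) {
    return `state must have ${expectedLength} elements`;
  }
  if (!state.every((v) => typeof v === 'number' && Number.isFinite(v))) {
    return 'state must contain only finite numbers';
  }
  return null;
}

console.log(validateState([3, 4], 2));
console.log(validateState([3, NaN], 2));
console.log(validateState('3,4', 2));
```

In the route, a non-null result would typically be returned with `res.status(400).json({ error })` before `selectAction` is called.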
- Set up monitoring
import { ProductionMonitor } from '@agentdb/monitoring';
const prodMonitor = new ProductionMonitor({
agent: productionAgent,
metrics: ['inference-latency', 'action-distribution', 'reward-feedback'],
alerting: {
latencyThreshold: 100, // ms
anomalyDetection: true
}
});
await prodMonitor.start();
- Create deployment pipeline
const deploymentPipeline = {
stages: [
{
name: 'validation',
steps: [
'Load trained model',
'Run validation suite',
'Check performance metrics',
'Verify inference speed'
]
},
{
name: 'export',
steps: [
'Export to production format',
'Optimize model',
'Quantize weights',
'Package artifacts'
]
},
{
name: 'deployment',
steps: [
'Deploy to staging',
'Run smoke tests',
'Deploy to production',
'Monitor performance'
]
}
]
};
await agentDB.memory.store('agentdb/learning/deployment-pipeline', deploymentPipeline);
Memory Pattern:
await agentDB.memory.store('agentdb/learning/production', {
deployed: true,
modelPath: 'production-agent',
apiEndpoint: 'http://localhost:3000/api/predict',
monitoring: true,
timestamp: Date.now()
});
Validation:
- Model exported successfully
- API running and responding
- Monitoring active
- Deployment pipeline documented
Integration Scripts
Complete Training Script
#!/bin/bash
# train-rl-agent.sh
set -e
echo "AgentDB RL Training Script"
echo "=========================="
# Phase 1: Initialize
echo "Phase 1: Initializing learning environment..."
npm install agentdb-learning @agentdb/rl-algorithms @agentdb/environments
# Run each phase script directly; the examples use ES module imports,
# which require() cannot load
# Phase 2: Configure
echo "Phase 2: Configuring algorithm..."
node ./config-algorithm.js
# Phase 3: Train
echo "Phase 3: Training agent..."
node ./train-agent.js
# Phase 4: Validate
echo "Phase 4: Validating performance..."
node ./evaluate-agent.js
# Phase 5: Deploy
echo "Phase 5: Deploying to production..."
node ./deploy-agent.js
echo "Training complete!"
Quick Start Script
// quickstart-rl.ts
import { setupRLTraining } from './setup';
async function quickStart() {
console.log('Starting RL training quick setup...');
// Setup
const { learningDB, environment, agent } = await setupRLTraining({
algorithm: 'dqn',
environment: 'grid-world',
episodes: 1000
});
// Train
console.log('Training agent...');
const stats = await agent.train(environment, {
episodes: 1000,
logInterval: 100
});
// Evaluate
console.log('Evaluating agent...');
const results = await agent.evaluate(environment, {
episodes: 100
});
console.log('Results:', results);
// Save
await agent.save('quickstart-agent');
console.log('Quick start complete!');
}
quickStart().catch(console.error);
Evidence-Based Success Criteria
- Training Convergence (Self-Consistency)
  - Reward curve stabilizes
  - Moving average improvement < 1%
  - Agent achieves consistent performance
- Performance Benchmarks (Quantitative)
  - Mean reward exceeds baseline by 50%
  - Success rate > 80%
  - Inference time < 10ms per action
- Algorithm Validation (Chain-of-Verification)
  - Hyperparameters validated
  - Exploration-exploitation balanced
  - Experience replay functioning
- Production Readiness (Multi-Agent Consensus)
  - Model exported successfully
  - API responds within latency threshold
  - Monitoring active and alerting
  - Deployment pipeline documented
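The quantitative thresholds above are easy to encode as an automated gate. The sketch below is illustrative: the function and field names are assumptions, the input numbers are made up, and "exceeds baseline by 50%" is interpreted as a relative gain of at least 0.5 measured against the absolute baseline reward.

```javascript
function checkSuccessCriteria({ meanReward, baselineReward, successRate, inferenceMs, recentImprovement }) {
  const rewardGain = (meanReward - baselineReward) / Math.abs(baselineReward);
  const checks = {
    converged: Math.abs(recentImprovement) < 0.01, // moving average change < 1%
    beatsBaseline: rewardGain >= 0.5,              // mean reward exceeds baseline by 50%
    successRate: successRate > 0.8,                // success rate > 80%
    latency: inferenceMs < 10,                     // inference time < 10 ms per action
  };
  return { checks, passed: Object.values(checks).every(Boolean) };
}

// Made-up evaluation numbers, for illustration only.
const report = checkSuccessCriteria({
  meanReward: -100,
  baselineReward: -400,
  successRate: 0.92,
  inferenceMs: 2.5,
  recentImprovement: 0.004,
});
console.log(report);
```

Returning the individual checks alongside the overall verdict makes it clear which criterion blocked a deployment.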
Additional Resources
- AgentDB Learning Documentation: https://agentdb.dev/docs/learning
- RL Algorithms Guide: https://agentdb.dev/docs/rl-algorithms
- Training Best Practices: https://agentdb.dev/docs/training
- Production Deployment: https://agentdb.dev/docs/deployment