Agent skill

rl-policy-optimization

Best practices for reinforcement learning policy optimization. Use when working on RL agents, PPO, SAC, or reward design.

Stars 11,027
Forks 1,262

Install this agent skill to your Project

npx add-skill https://github.com/aiming-lab/AutoResearchClaw/tree/main/researchclaw/skills/builtin/domain/rl-policy-optimization

Metadata

Additional technical details for this skill

author: researchclaw
version: 1.0
category: domain
priority: 3
references: Schulman et al., Proximal Policy Optimization, 2017; Haarnoja et al., Soft Actor-Critic, ICML 2018
trigger keywords: reinforcement learning,rl,policy,reward,agent,environment,ppo,sac
applicable stages: 9,10

SKILL.md

RL Policy Optimization Best Practice

Algorithm selection:

  • Discrete actions: PPO, DQN, A2C
  • Continuous actions: SAC, TD3, PPO
  • Multi-agent: MAPPO, QMIX
  • Offline: CQL, IQL, Decision Transformer
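The selection list above can be encoded as a small lookup. This is only a sketch: the `suggest_algorithms` helper, its argument names, and the rule that the offline and multi-agent settings take precedence over the action-space type are assumptions made for illustration, not part of the skill.

```python
# Candidate algorithms per setting, copied from the list above.
ALGORITHMS = {
    "discrete": ["PPO", "DQN", "A2C"],
    "continuous": ["SAC", "TD3", "PPO"],
    "multi_agent": ["MAPPO", "QMIX"],
    "offline": ["CQL", "IQL", "Decision Transformer"],
}


def suggest_algorithms(action_space, multi_agent=False, offline=False):
    """Return candidate algorithms for a problem setting (hypothetical helper).

    Assumption: offline data and multi-agent structure constrain the choice
    more than the action-space type, so they are checked first.
    """
    if offline:
        return ALGORITHMS["offline"]
    if multi_agent:
        return ALGORITHMS["multi_agent"]
    if action_space not in ("discrete", "continuous"):
        raise ValueError(f"unknown action space: {action_space!r}")
    return ALGORITHMS[action_space]


if __name__ == "__main__":
    print(suggest_algorithms("continuous"))
    print(suggest_algorithms("discrete", offline=True))
```

The returned lists are candidates to try, not a ranking.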

Training recipe:

  • PPO: clip=0.2, lr=3e-4, gamma=0.99, GAE lambda=0.95
  • SAC: lr=3e-4, tau=0.005, auto-tune alpha
  • Use vectorized environments (e.g., gymnasium.vector)
  • Normalize observations and rewards
  • Log episode return, episode length, value loss, policy entropy
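To show where the PPO values above enter the computation, here is a minimal pure-Python sketch of generalized advantage estimation and the clipped surrogate for a single sample. The function names and the list-based interface are assumptions for illustration; a real implementation would work on batched tensors in a deep learning framework.

```python
# PPO hyperparameters from the recipe above.
PPO_CONFIG = {"clip": 0.2, "lr": 3e-4, "gamma": 0.99, "gae_lambda": 0.95}


def compute_gae(rewards, values, last_value, dones, gamma=0.99, lam=0.95):
    """Generalized advantage estimation over one rollout.

    rewards[t], values[t], dones[t] describe step t; last_value is the value
    estimate of the state reached after the final step (used for bootstrapping).
    Returns (advantages, returns), where returns are the value-function targets.
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = last_value
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if dones[t] else 1.0  # no bootstrapping across episode ends
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
        next_value = values[t]
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns


def ppo_clip_objective(ratio, advantage, clip=0.2):
    """Clipped surrogate for one sample; ratio = pi_new(a|s) / pi_old(a|s)."""
    clipped_ratio = max(1.0 - clip, min(1.0 + clip, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    adv, ret = compute_gae(
        rewards=[1.0, 1.0, 1.0],
        values=[0.5, 0.5, 0.5],
        last_value=0.0,
        dones=[False, False, True],
        gamma=PPO_CONFIG["gamma"],
        lam=PPO_CONFIG["gae_lambda"],
    )
    print(adv, ret)
    print(ppo_clip_objective(ratio=1.5, advantage=1.0, clip=PPO_CONFIG["clip"]))
```

The objective is maximized, so a training loop would minimize its negative mean over a batch. The clip caps how much a single update can gain from pushing the probability ratio beyond 1 ± clip.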

Evaluation:

  • Report mean ± std of episode return over 10+ evaluation episodes
  • Use a deterministic policy for evaluation (e.g., the mean or argmax action, with exploration noise disabled)
  • Compare against random policy and simple baselines
  • Report sample efficiency (return vs. env steps)
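The evaluation protocol above might look like the following sketch. `ToyEnv`, `greedy_policy`, and `random_policy` are hypothetical stand-ins so the example runs on the standard library alone; in practice the environment would be a real task and the greedy policy would be the trained agent acting deterministically.

```python
import random
import statistics


class ToyEnv:
    """Hypothetical task: reward 1 when the action matches the sign of the observation."""

    def __init__(self, seed, horizon=20):
        self.rng = random.Random(seed)
        self.horizon = horizon
        self.t = 0
        self.obs = 0.0

    def reset(self):
        self.t = 0
        self.obs = self.rng.choice([-1.0, 1.0])
        return self.obs

    def step(self, action):
        reward = 1.0 if (action == 1) == (self.obs > 0) else 0.0
        self.t += 1
        self.obs = self.rng.choice([-1.0, 1.0])
        return self.obs, reward, self.t >= self.horizon


def greedy_policy(obs, rng):
    """Deterministic policy: ignores rng, as an evaluation-time policy should."""
    return 1 if obs > 0 else 0


def random_policy(obs, rng):
    """Random baseline to compare against."""
    return rng.choice([0, 1])


def evaluate(policy, n_episodes=10, base_seed=0):
    """Run n_episodes with distinct seeds and return (mean, std) of episode return."""
    returns = []
    for ep in range(n_episodes):
        env = ToyEnv(seed=base_seed + ep)
        rng = random.Random(base_seed + ep + 10_000)
        obs, done, total = env.reset(), False, 0.0
        while not done:
            obs, reward, done = env.step(policy(obs, rng))
            total += reward
        returns.append(total)
    return statistics.mean(returns), statistics.stdev(returns)


if __name__ == "__main__":
    for name, policy in [("greedy", greedy_policy), ("random", random_policy)]:
        mean, std = evaluate(policy)
        print(f"{name}: {mean:.1f} ± {std:.1f} over 10 episodes")
```

For the sample-efficiency curve, the same `evaluate` call can be repeated at fixed intervals of environment steps during training and the results plotted against the step count.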

Common pitfalls:

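  • Reward shaping can bias the agent toward the shaped signal instead of the true objective; check the final policy against the unshaped reward
  • Seed sensitivity is high: run 5+ seeds and report the spread across them
  • Hyperparameters are sensitive: run a small sweep (e.g., over learning rate and clip range) before drawing conclusions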
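A small sweep with several seeds per configuration can be organized as below. This is a sketch under stated assumptions: `sweep` and its result format are illustrative, and `placeholder_run` is a stand-in that returns synthetic numbers only so the example executes; it would be replaced by a real training-and-evaluation run.

```python
import itertools
import random
import statistics


def sweep(run_fn, grid, seeds):
    """Evaluate every configuration in `grid` across all `seeds`.

    run_fn(seed=..., **config) must return a scalar score such as the final
    mean return. Results are sorted by mean score, best first.
    """
    keys = sorted(grid)
    results = []
    for values in itertools.product(*(grid[k] for k in keys)):
        config = dict(zip(keys, values))
        scores = [run_fn(seed=s, **config) for s in seeds]
        results.append({
            "config": config,
            "mean": statistics.mean(scores),
            "std": statistics.stdev(scores),
            "n_seeds": len(scores),
        })
    return sorted(results, key=lambda r: r["mean"], reverse=True)


def placeholder_run(lr, clip, seed):
    """Placeholder only: synthetic score, NOT a real training result."""
    rng = random.Random(f"{lr}-{clip}-{seed}")
    return 100.0 - 1e4 * abs(lr - 3e-4) + rng.gauss(0.0, 5.0)


if __name__ == "__main__":
    grid = {"lr": [1e-4, 3e-4], "clip": [0.1, 0.2]}
    for row in sweep(placeholder_run, grid, seeds=range(5)):
        print(row["config"], f"{row['mean']:.1f} ± {row['std']:.1f} (n={row['n_seeds']})")
```

Reporting the standard deviation across seeds next to each mean makes it easier to tell whether a difference between two configurations exceeds seed noise.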

