siiRL

Quickstart

  • Installation
  • Quickstart: GRPO training on GSM8K dataset

Programming guide

  • siiRL Complete Architecture Guide
  • Code Structure
  • siiRL’s Implementation Explained
  • SRPO Code Implementation Explained

Data Preparation

  • Prepare Data for Post-Training
  • Implementing Reward Functions for Datasets

User-Defined Interface

  • Filter Interface
  • Reward Interface
  • Pipeline API
  • Metrics Interface

Configurations

  • Configuration Guide

Example

  • DeepScaleR Example with PPO
  • MM-Eureka Example with GRPO
  • DeepScaleR Example with CPGD
  • Megatron-LM Training Backend
  • Embodied SRPO Training

Hardware Support

  • Ascend NPU
  • Data Collection on Ascend Devices Based on the FSDP Backend
  • MetaX(沐曦) GPU

© Copyright 2025, SII AI Infra Team.
