siiRL documentation

Quickstart

  • Installation
    • Requirements
    • Method 1: Install from Docker image
    • Method 2: Install from pip
    • Method 3: Install from custom environment
  • Quickstart: GRPO training on GSM8K dataset
    • Introduction
    • Dataset Introduction
    • Step 1: Prepare the dataset
    • Step 2: Download a model for post-training
    • Step 3: Perform GRPO training with the instruct model

Programming guide

  • siiRL: The DistFlow Programming Guide
    • Motivation: Overcoming the Limits of Centralized Control
    • The DistFlow Architecture
    • Codebase Walkthrough: How DistFlow is Implemented
    • Key Takeaways

Data Preparation

  • Prepare Data for Post-Training
  • Implementing Reward Functions for Datasets

Configurations

  • Config Explanation
    • ppo_dag_trainer.yaml for RL FSDP Backend
    • workflow_grpo.yaml for GRPO

Example

  • DeepScaleR Example with PPO
  • MM-Eureka Example with GRPO
  • DeepScaleR Example with CPGD

Hardware Support

  • Ascend NPU
  • Data Collection on Ascend Devices Based on the FSDP Backend

© Copyright 2025, SII AI Infra Team.

Built with Sphinx using a theme provided by Read the Docs.