siiRL

Quickstart

  • Installation
  • Quickstart: GRPO training on GSM8K dataset

Programming Guide

  • siiRL: The DistFlow Programming Guide

Data Preparation

  • Prepare Data for Post-Training
  • Implementing Reward Functions for Datasets

Configurations

  • Config Explanation

Example

  • DeepScaleR Example with PPO
  • MM-Eureka Example with GRPO
  • DeepScaleR Example with CPGD

Hardware Support

  • Ascend NPU
  • Data Collection on Ascend Devices Based on the FSDP Backend

© Copyright 2025, SII AI Infra Team.
