siiRL documentation
Quickstart
  Installation
    Requirements
    Method 1: Install from docker image
    Method 2: Install from PIP
    Method 3: Install from custom environment
  Quickstart: GRPO training on GSM8K dataset
    Introduction
    Dataset Introduction
    Step 1: Prepare the dataset
    Step 2: Download a model for post-training
    Step 3: Perform GRPO training with the instruct model
Programming guide
  siiRL: The DistFlow Programming Guide
    Motivation: Overcoming the Limits of Centralized Control
    The DistFlow Architecture
    Codebase Walkthrough: How DistFlow is Implemented
    Key Takeaways
Data Preparation
  Prepare Data for Post-Training
  Implementing Reward Functions for Datasets
Configurations
  Config Explanation
    ppo_dag_trainer.yaml for RL FSDP Backend
    workflow_grpo.yaml for GRPO
Example
  DeepScaleR Example with PPO
  MM-Eureka Example with GRPO
  DeepScaleR Example with CPGD
Hardware Support
  Ascend NPU
  Data Collection on Ascend Devices Based on the FSDP Backend