Ascend NPU
==========

siiRL also supports Huawei's Ascend NPU devices. This guide has been tested with the following hardware:

- Atlas 200T A2 Box16

Installation Process
--------------------

Core Environment Requirements
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Ensure your environment meets these core software version requirements:

+----------------------+------------+
| Software             | Version    |
+======================+============+
| Python               | == 3.10    |
+----------------------+------------+
| CANN                 | == 8.1.RC1 |
+----------------------+------------+
| PyTorch              | == 2.5.1   |
+----------------------+------------+
| torch_npu            | == 2.5.1   |
+----------------------+------------+
| mindspeed (optional) | == 0.12.1  |
+----------------------+------------+

Recommended Base Image
^^^^^^^^^^^^^^^^^^^^^^

For a smoother setup, we strongly recommend using our pre-built Docker image. It already contains the torch, torch-npu, vLLM and vLLM-Ascend packages, so after pulling it you only need to install the siiRL framework from source.

.. code-block:: bash

   docker pull crispig/verl_npu:cann8.1rc1-py3.10-torch2.5.1-vllm-ascend0.7.3.post1-250616

Compiling vLLM and vllm-ascend [Optional]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Proper integration of vLLM within siiRL requires compiling both ``vllm`` and ``vllm-ascend`` from source. Follow the steps below, paying close attention to the instructions specific to your hardware.

.. note::

   We recommend vllm v0.9.2 and vllm-ascend v0.9.0rc2, which support setting ``use_remove_padding=True``.

.. code-block:: bash

   # vllm
   git clone -b v0.9.2 --depth 1 https://github.com/vllm-project/vllm.git
   cd vllm
   pip install -r requirements-build.txt

   # For Atlas 200T A2 Box16
   VLLM_TARGET_DEVICE=empty pip install -e . --extra-index-url https://download.pytorch.org/whl/cpu/

.. code-block:: bash

   # vllm-ascend
   git clone -b v0.9.0rc2 --depth 1 https://github.com/vllm-project/vllm-ascend.git
   cd vllm-ascend
   export COMPILE_CUSTOM_KERNELS=1
   python setup.py install

SiiRL Installation
^^^^^^^^^^^^^^^^^^

Finally, install the siiRL framework itself from source. Do not install siiRL as a package with a plain ``pip install`` command, because that causes dependency conflicts; use the editable install from the cloned repository instead.

.. code-block:: bash

   git clone https://github.com/sii-research/siiRL.git
   cd siiRL
   pip install -e .

Third-Party Library Considerations
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Be aware of the following requirements and limitations for certain libraries on Ascend hardware:

+--------------+---------------+
| Software     | Requirement   |
+==============+===============+
| transformers | v4.52.4       |
+--------------+---------------+
| flash_attn   | not supported |
+--------------+---------------+
| liger-kernel | not supported |
+--------------+---------------+
| tensordict   | 0.8.3 (ARM)   |
+--------------+---------------+

1. Using ``--flash_attention_2`` through ``transformers`` is supported (requires ``transformers`` version >= 4.52.0).
2. Flash Attention acceleration via the ``flash_attn`` package is not supported.
3. ``liger-kernel`` is not supported.
4. For ARM servers, ``tensordict`` version 0.8.3 is required. You can install it manually after the main dependencies are installed.
5. For x86 servers, the CPU version of ``torchvision`` must be installed.

   .. code-block:: bash

      pip install torchvision==0.20.1+cpu --index-url https://download.pytorch.org/whl/cpu

Verification with a Quick Start Example
---------------------------------------

To confirm that your setup is correct, we recommend performing a quick test run. The following example trains a Qwen2.5-0.5B model on the GSM8k dataset using the GRPO algorithm.

1. **Prepare the Dataset**

   First, download and preprocess the GSM8k dataset. The provided script converts it to the Parquet format required by the framework.

   .. code-block:: bash

      python3 examples/data_preprocess/gsm8k.py --local_dir ~/data/gsm8k

2. **Run the Training Job**

   Next, execute the training command below. Ensure you have set the ``VLLM_ATTENTION_BACKEND`` environment variable. The command expects the Parquet files under ``/datasets/gsm8k`` and the model under ``/models``, while the preprocessing step above writes to ``~/data/gsm8k``; adjust the paths to match your environment.

   .. code-block:: bash

      set -x

      python3 -m siirl.main_dag \
          algorithm.adv_estimator=grpo \
          data.train_files=/datasets/gsm8k/train.parquet \
          data.val_files=/datasets/gsm8k/test.parquet \
          data.train_batch_size=1024 \
          data.max_prompt_length=1024 \
          data.max_response_length=1024 \
          data.filter_overlong_prompts=True \
          data.truncation='error' \
          actor_rollout_ref.model.path=/models/Qwen2.5-0.5B-Instruct \
          actor_rollout_ref.actor.optim.lr=5e-8 \
          actor_rollout_ref.model.use_remove_padding=False \
          actor_rollout_ref.actor.ppo_mini_batch_size=32 \
          actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=2 \
          actor_rollout_ref.actor.use_kl_loss=True \
          actor_rollout_ref.actor.entropy_coeff=0 \
          actor_rollout_ref.actor.kl_loss_coef=0.001 \
          actor_rollout_ref.actor.kl_loss_type=low_var_kl \
          actor_rollout_ref.model.enable_gradient_checkpointing=True \
          actor_rollout_ref.actor.fsdp_config.param_offload=False \
          actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
          actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=2 \
          actor_rollout_ref.rollout.tensor_model_parallel_size=4 \
          actor_rollout_ref.rollout.name=vllm \
          actor_rollout_ref.rollout.gpu_memory_utilization=0.3 \
          actor_rollout_ref.rollout.n=5 \
          actor_rollout_ref.rollout.enable_chunked_prefill=False \
          actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=2 \
          actor_rollout_ref.ref.fsdp_config.param_offload=True \
          algorithm.use_kl_in_reward=False \
          trainer.critic_warmup=0 \
          trainer.logger=['console'] \
          trainer.project_name='siirl_grpo_example_gsm8k' \
          trainer.experiment_name='qwen2_05b_function_rm' \
          trainer.n_gpus_per_node=16 \
          trainer.nnodes=$NNODES \
          trainer.save_freq=-1 \
          trainer.test_freq=5 \
          trainer.total_epochs=300 \
          trainer.device=npu $@

(Optional) Setting Up MindSpeed Training Backend Guide
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Refer to the MindSpeed README for instructions on installing the MindSpeed acceleration library. Recommended versions: MindSpeed Core 0.12.1, Megatron-LM 0.12.2.

.. warning::

   Be sure to install **megatron-core** via ``pip install``. Using ``PYTHONPATH`` to point to Megatron will crash the program.

Set the siiRL worker model ``strategy`` to ``megatron``, for example ``actor_rollout_ref.actor.strategy=megatron``.

Custom MindSpeed parameters can be passed through the ``override_transformer_config`` option. For instance, to enable Flash Attention (FA) for the actor model, use ``+actor_rollout_ref.actor.megatron.override_transformer_config.use_flash_attn=True``.

MindSpeed provides the same support for siiRL and verl. For more feature details, refer to the MindSpeed+verl documentation.
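
Before launching a run, it can help to confirm that the installed Python packages match the pins in the Core Environment Requirements table. The following is a minimal sketch and not part of siiRL: the function names ``installed_version``, ``check_versions`` and ``check_python`` are hypothetical, the distribution names ``torch`` and ``torch_npu`` are assumptions about how the packages are registered in your environment, and CANN is skipped because it is not a Python package. The comparison is a strict equality that ignores local build tags such as ``+cpu``.

```python
import sys
from importlib import metadata

# Pins copied from the Core Environment Requirements table.
# Assumption: these are the distribution names used in your environment.
REQUIRED = {
    "torch": "2.5.1",
    "torch_npu": "2.5.1",
}
REQUIRED_PYTHON = (3, 10)


def installed_version(dist_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def check_versions(required, lookup=installed_version):
    """Return a list of problems; an empty list means every pin matches."""
    problems = []
    for name, wanted in required.items():
        found = lookup(name)
        if found is None:
            problems.append(f"{name}: not installed (need {wanted})")
        # Drop a local build tag such as "+cpu" before comparing.
        elif found.split("+")[0] != wanted:
            problems.append(f"{name}: found {found}, need {wanted}")
    return problems


def check_python(required=REQUIRED_PYTHON, current=None):
    """Return a problem string if the interpreter's major.minor differs, else None."""
    if current is None:
        current = tuple(sys.version_info[:2])
    if tuple(current) != tuple(required):
        return (
            f"python: found {current[0]}.{current[1]}, "
            f"need {required[0]}.{required[1]}"
        )
    return None


if __name__ == "__main__":
    report = check_versions(REQUIRED)
    python_issue = check_python()
    if python_issue:
        report.insert(0, python_issue)
    print("\n".join(report) if report else "All pinned versions match.")
```

To also check the optional MindSpeed pin, add an entry such as ``"mindspeed": "0.12.1"`` to ``REQUIRED``, again assuming that is the name the package is installed under.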