I ran into a problem when training the qwen3-dlm model: the function interface is incompatible. In liger-kernel 0.6.2, the forward method of LigerFusedLinearCrossEntropyFunction takes 12 parameters, but qwen3_dlm passes it 13 arguments, which matches the signature in liger-kernel 0.6.3. The same problem occurs in llada_dlm and dream_dlm.
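
For anyone who wants to confirm the mismatch locally, here is a minimal sketch of an arity check using only the standard library. The helper `accepts_n_positional` and the stand-in function `forward_with_12_params` are hypothetical names for illustration; the stand-in only mimics a 12-parameter signature and is not the real liger-kernel code.

```python
import inspect


def accepts_n_positional(fn, n_args):
    """Return True if fn can be called with n_args positional arguments."""
    sig = inspect.signature(fn)
    try:
        # bind() only validates the call shape; it never calls fn.
        sig.bind(*([None] * n_args))
    except TypeError:
        return False
    return True


# Hypothetical stand-in for a forward function with 12 parameters.
# It is NOT the actual liger-kernel signature, only the same arity.
def forward_with_12_params(p0, p1, p2, p3, p4, p5, p6, p7, p8, p9, p10, p11):
    return None


if __name__ == "__main__":
    print(accepts_n_positional(forward_with_12_params, 12))
    print(accepts_n_positional(forward_with_12_params, 13))
```

With liger-kernel installed, the same helper could presumably be pointed at the installed `LigerFusedLinearCrossEntropyFunction.forward`, and `importlib.metadata.version("liger-kernel")` should report which release is in the environment (assuming that is the installed distribution name).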