Question about the three schedulers used by the generator and the real/fake teachers

This is truly a remarkable piece of work! Many thanks to the author for open-sourcing it.

I have a question about the critic loss computation.

From my understanding, the student predicts $x_0$ using the generator's scheduler (denoted as `scheduler_stu`). Then `scheduler_stu` is used to add noise, and the resulting noisy sample is passed to the fake/real teachers.

However, each teacher converts its own output (`pred_flow`) into $x_0$ using a scheduler whose shift value differs from the student's. After that, the code converts $x_0$ back into a flow prediction using the generator's scheduler.

I am wondering why the critic loss is not computed directly from the teacher's original output (`pred_flow`).

Instead, `pred_flow` is first transformed to $x_0$ and then converted back using a different shift setting. Is this intentional, or could it be a bug?

File: `model/dmd.py`
```python
_, pred_fake_image = self.fake_score(
    noisy_image_or_video=noisy_generated_image,  # !!!!!!!!!!!!!!!!!
    conditional_dict=conditional_dict,
    timestep=critic_timestep
)  # Using fake teacher's scheduler, default shift = 8
```

```python
flow_pred = WanDiffusionWrapper._convert_x0_to_flow_pred(
    scheduler=self.scheduler,
    x0_pred=pred_fake_image.flatten(0, 1),  # !!!!!!!!!!!!!!!!!
    xt=noisy_generated_image.flatten(0, 1),
    timestep=critic_timestep.flatten(0, 1)
) # Using generator's scheduler, shift was set to 5.
```
