Hi, I was going through your work. After reading the paper, I understand that you mention using a diffusion loss, embedded in the autoregressive optimization objective, during pre-training.
However, in the code I only see `MSELoss` or `CrossEntropyLoss` being used:
```python
def pretrain_one_epoch(self, train_loader, model_optim, model_scheduler):
    train_loss = []
    model_criterion = self._select_criterion()
    self.model.train()
    for i, (batch_x, batch_y, batch_x_mark, batch_y_mark) in enumerate(
        train_loader
    ):
        model_optim.zero_grad()
        batch_x = batch_x.float().to(self.device)
        batch_y = batch_y.float().to(self.device)
        pred_x = self.model(batch_x)
        diff_loss = model_criterion(pred_x, batch_x)
        diff_loss.backward()
        model_optim.step()
        train_loss.append(diff_loss.item())
    model_scheduler.step()
    train_loss = np.mean(train_loss)
```
where the `_select_criterion()` function is:
```python
def _select_criterion(self):
    if self.args.task_name == "finetune" and self.args.downstream_task == "classification":
        criterion = nn.CrossEntropyLoss()
        print("Using CrossEntropyLoss")
    else:
        criterion = nn.MSELoss()
        print("Using MSELoss")
    return criterion
```
Could you please clarify where the "diffusion loss instead of MSE" from the paper is actually applied in the code? As far as I can tell, `diff_loss` here is just the MSE reconstruction loss.
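For context, here is roughly what I expected a per-token diffusion loss to look like, i.e. a small noise-prediction head conditioned on the autoregressive backbone's output, trained with the standard DDPM objective. This is only my own sketch (the class name, MLP head, and linear beta schedule are my assumptions, not taken from your repo):

```python
import torch
import torch.nn as nn


class DiffusionLoss(nn.Module):
    """Sketch of a per-token diffusion loss: the AR backbone's output z
    conditions a small MLP that predicts the noise added to the target x0.
    All design choices here (head size, schedule) are illustrative guesses."""

    def __init__(self, token_dim, cond_dim, num_steps=1000):
        super().__init__()
        # Hypothetical noise-prediction head; input = noisy token + condition + step.
        self.net = nn.Sequential(
            nn.Linear(token_dim + cond_dim + 1, 256),
            nn.SiLU(),
            nn.Linear(256, token_dim),
        )
        # Linear beta schedule -> cumulative alpha_bar, as in DDPM.
        betas = torch.linspace(1e-4, 0.02, num_steps)
        self.register_buffer("alpha_bar", torch.cumprod(1.0 - betas, dim=0))
        self.num_steps = num_steps

    def forward(self, x0, z):
        # Sample a diffusion step t and Gaussian noise per sample.
        t = torch.randint(0, self.num_steps, (x0.shape[0],), device=x0.device)
        eps = torch.randn_like(x0)
        ab = self.alpha_bar[t].unsqueeze(-1)
        # Forward (noising) process: x_t = sqrt(ab) * x0 + sqrt(1 - ab) * eps
        x_t = ab.sqrt() * x0 + (1.0 - ab).sqrt() * eps
        # Predict the noise, conditioned on z and the normalized step index.
        t_emb = (t.float() / self.num_steps).unsqueeze(-1)
        eps_hat = self.net(torch.cat([x_t, z, t_emb], dim=-1))
        # DDPM training objective: MSE between true and predicted noise.
        return ((eps - eps_hat) ** 2).mean()
```

With something like this, the criterion in `pretrain_one_epoch` would take both the target tokens and the model's conditioning output, rather than a plain `nn.MSELoss()` on the reconstruction.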
Thank You