Having a problem with a diffusion model: are lowering the steps and num_timesteps related?


python -m torch.distributed.launch --nproc_per_node=3 --master_port=12233 --use_env run_train.py
--diff_steps 1000 --lr 0.0001
--learning_steps 3000 --save_interval 600
--seed 102
--noise_schedule sqrt
--hidden_dim 128
--bsz 2048
--dataset qqp
--data_dir datasets/ForTest
--vocab bert
--seq_len 128
--schedule_sampler lossaware
--notes test-qqp

I changed the '--learning_steps' and '--diff_steps' values.

File "/home/documents/DiffuSeq-main/diffuseq/gaussian_diffusion.py", line 805, in ddim_sample_loop_progressive indices = list(range(self.num_timesteps))[::-1][::gap] ValueError: slice step cannot be zero ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 2903262) of binary: /home/yjhongkr/miniconda3/envs/dec/bin/python ERROR:torch.distributed.elastic.agent.server.local_elastic_agent:[default] Worker group failed INFO:torch.distributed.elastic.agent.server.api:[default] Worker group FAILED. 3/3 attempts left; will restart worker group INFO:torch.distributed.elastic.agent.server.api:[default] Stopping worker group INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous'ing worker group INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous complete for workers. Result:

Then this 'slice step cannot be zero' error comes up. Could it be related to the values I changed?
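As far as I can tell, the ValueError itself just means that gap reached line 805 as 0: Python forbids a slice step of zero, so list(range(self.num_timesteps))[::-1][::gap] fails whenever gap == 0, regardless of num_timesteps. A standalone check in plain Python (no DiffuSeq code involved) shows this:

# Standalone reproduction of the error, independent of DiffuSeq.
num_timesteps = 1000

gap = 1
indices = list(range(num_timesteps))[::-1][::gap]
print(len(indices))  # 1000 -> every timestep, in reverse order

gap = 0
try:
    indices = list(range(num_timesteps))[::-1][::gap]
except ValueError as e:
    print(e)  # "slice step cannot be zero"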

The default value of gap is 1, like this:

def ddim_sample_loop_progressive(
    self, model, shape, noise=None, clip_denoised=True, denoised_fn=None,
    model_kwargs=None, device=None, progress=False, eta=0.0,
    langevin_fn=None, mask=None, x_start=None, gap=1,
):
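That gap=1 is only the default in the signature, so whatever the caller passes in overrides it. My guess (an assumption; the names below are illustrative, not DiffuSeq's actual code) is that the decoding path derives gap from the trained diffusion steps and a requested number of sampling steps by integer division, in which case asking for more sampling steps than --diff_steps provides would make gap 0:

def compute_gap(diffusion_steps, sampling_steps):
    # Hypothetical call-site computation (illustrative only, not DiffuSeq's code).
    # Integer division gives 0 as soon as sampling_steps > diffusion_steps,
    # and the slice [::gap] then raises "slice step cannot be zero".
    return diffusion_steps // sampling_steps

print(compute_gap(2000, 1000))  # 2 -> keep every 2nd timestep
print(compute_gap(1000, 1000))  # 1 -> keep every timestep
print(compute_gap(1000, 2000))  # 0 -> ValueError downstream

If the caller really does something like this, then lowering --diff_steps without also lowering the number of sampling steps would explain the error.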

So I want to know whether the two are really related.

If they are, I will have to train the model all over again...
