Parallel computation of models

Hi author, when I trained my model on coloured datasets, the memory required for training was too large. After adding multiple GPUs I switched to data-parallel training, but the training time was still unsatisfactory. What changes would I need to make to try model-parallel training instead?