Conversation
|
Thanks a lot for the PR! The review might take a while due to the upcoming holiday season. |
guarin
left a comment
There was a problem hiding this comment.
Thanks again for the PR! This will take a while to get merged as it contains many changes. I left a lot of comments/questions regarding the various parts. Let us know if you need help, we can also support you in making the changes or take over parts of the PR if that helps.
| SparseMaxPooling, | ||
| SparseSyncBatchNorm2d, | ||
| SparseConvNeXtBlock, | ||
| SparseConvNeXtLayerNorm |
There was a problem hiding this comment.
SparseConvNeXtLayerNorm is imported but not used
| ) | ||
|
|
||
|
|
||
| class SparseConvNeXtBlock(nn.Module): |
There was a problem hiding this comment.
What is the difference between this ConvNeXt implementation and the ones from torchvision and timm? Would it be possible to inherit from torchvisions CNBlock instead as it is used in dense_model_to_sparse? This way we wouldn't have to implement DropPath ourselves.
There was a problem hiding this comment.
I was initially opting to not rely on Timm, but I'm fine with working with Timm for simplicity. Will change!
There was a problem hiding this comment.
Let's move code in this file to lightly/models/modules/spark_sparse.py
There was a problem hiding this comment.
Let's move the code in this file to lighlty/models/modules/spark_sparse.py
There was a problem hiding this comment.
Let's not add this emtpy file
|
Thanks for all the comments! I'll make some changes later today and I'll let you know how much help I'll need given the scope of this project. |
|
I wish to continue working on that. Is that possible? |
|
Sure, contributions are very welcome :) |
|
@guarin Starting new PR, for now, a draft. Fixing some things you pointed out. I will first fix those that do not imply in great modifications, then i will test and see if its working properly, model training and etc, then i will refactor it to adhere to better engineering patterns. Is that okay to you? |
This is a WIP implementation of the SparK algorithm, as requested by #1462