Closed
Labels: question (Further information is requested)
Description
While inspecting the parameter distribution of the model, I noticed that roughly 43% of the total trainable parameters sit in the linear components alone (linear_layers plus final_layer, i.e. ~1.148M of 2.648M). The per-component counts are below; a counting sketch follows the table:
| Component | Parameters |
|---|---|
| gnn | 1.5M |
| linear_layers | 365K |
| final_layer | 783K |
| Total | 2.648M |
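For reproducibility, this is a minimal sketch of how such a per-component breakdown can be obtained, assuming the top-level submodules are registered under the names shown above; `count_parameters` is a hypothetical helper, not part of this codebase:

```python
import torch.nn as nn


def count_parameters(model: nn.Module) -> dict:
    """Count trainable parameters per top-level submodule (hypothetical helper)."""
    counts = {
        name: sum(p.numel() for p in child.parameters() if p.requires_grad)
        for name, child in model.named_children()
    }
    counts["total"] = sum(counts.values())
    return counts
```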
I was wondering about a few architectural questions:
- Is this heavy reliance on linear layers intentional, or could it be simplified? It may overshadow the representational power of the GNN backbone.
- Could we experiment with a single linear head, as is commonly done in standard models like those in PyG? (A sketch of this variant follows the list.)
- Alternatively, would it make sense to shift more capacity back into the GNN module, where the inductive bias is typically stronger?
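For concreteness, here is a minimal sketch of the single-linear-head variant mentioned above, assuming a plain GCN backbone and node classification; the class name, layer count, and dimensions are illustrative rather than the repository's actual architecture:

```python
import torch
import torch.nn.functional as F
from torch import nn
from torch_geometric.nn import GCNConv


class GNNWithSingleHead(nn.Module):
    """GCN backbone followed by a single linear classifier (illustrative only)."""

    def __init__(self, in_dim: int, hidden_dim: int, num_classes: int, num_layers: int = 2):
        super().__init__()
        dims = [in_dim] + [hidden_dim] * num_layers
        self.convs = nn.ModuleList(
            [GCNConv(dims[i], dims[i + 1]) for i in range(num_layers)]
        )
        # Single linear head: the only dense layer after message passing,
        # so almost all capacity stays in the GNN backbone.
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, x, edge_index):
        for conv in self.convs:
            x = F.relu(conv(x, edge_index))
        return self.head(x)
```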
Metrics:
train_macro-f1: 0.960, train_micro-f1: 0.990, val_macro-f1: 0.699, val_micro-f1: 0.911
| Metric | Train | Validation | Gap |
|---|---|---|---|
| Macro F1 | 0.960 | 0.699 | 0.261 |
| Micro F1 | 0.990 | 0.911 | 0.079 |
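For context on why the macro gap (0.261) is so much larger than the micro gap (0.079): macro-F1 weights every class equally, so poorly generalized minority classes drag it down, while micro-F1 is dominated by the majority classes. A toy illustration, assuming scikit-learn is available; the arrays are placeholders, not this project's data:

```python
from sklearn.metrics import f1_score

y_true = [0, 0, 0, 0, 1, 2]   # imbalanced toy labels
y_pred = [0, 0, 0, 0, 2, 2]   # the rare class 1 is missed entirely

# Micro-F1 aggregates over all samples, so the majority class dominates it.
print("micro:", f1_score(y_true, y_pred, average="micro"))
# Macro-F1 averages per-class F1 unweighted, so the missed rare class hurts it badly.
print("macro:", f1_score(y_true, y_pred, average="macro"))
```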