
Linear layers constitute almost half of the trainable parameters #17

@aditya0by0

Description


While inspecting the parameter distribution of the model, I noticed that ~43% of the total trainable parameters are concentrated in the linear layers alone (the `linear_layers` block together with the linear `final_layer`):

| Component     | Parameters |
|---------------|-----------:|
| gnn           | 1.5M       |
| linear_layers | 365K       |
| final_layer   | 783K       |
| **Total**     | 2.648M     |
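
For reference, here is a minimal sketch of how such a per-component breakdown can be produced for any PyTorch model; it iterates over top-level sub-modules via `named_children` and does not assume this repository's actual attribute names:

```python
import torch.nn as nn

def trainable_params(module: nn.Module) -> int:
    """Number of trainable parameters in a module."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)

def report_param_distribution(model: nn.Module) -> None:
    """Print trainable-parameter counts and shares per top-level child."""
    total = trainable_params(model)
    for name, child in model.named_children():
        n = trainable_params(child)
        print(f"{name:>15}: {n:>10,} ({n / total:.1%})")
    print(f"{'total':>15}: {total:>10,}")
```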

I was wondering about a few architectural questions:

  • Is this heavy reliance on linear layers intentional, or could it be simplified? It may overshadow the representational power of the GNN backbone.
  • Could we experiment with a single linear head, as commonly done in standard models like those in PyG? (A rough sketch follows this list.)
  • Alternatively, would it make sense to shift more capacity back into the GNN module, where the inductive bias is typically stronger?
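
To make the single-head idea concrete, here is a rough sketch of the pattern used in many PyG examples; `GCNConv`, `global_mean_pool`, and the dimensions are illustrative placeholders rather than this repository's actual backbone:

```python
import torch.nn as nn
import torch.nn.functional as F
from torch_geometric.nn import GCNConv, global_mean_pool

class GNNWithSingleHead(nn.Module):
    """GNN backbone followed by a single linear classification head."""

    def __init__(self, in_dim: int, hidden_dim: int, num_classes: int):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden_dim)
        self.conv2 = GCNConv(hidden_dim, hidden_dim)
        # A single linear head instead of a stack of linear layers,
        # keeping most of the trainable capacity in the message passing.
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, x, edge_index, batch):
        x = F.relu(self.conv1(x, edge_index))
        x = F.relu(self.conv2(x, edge_index))
        x = global_mean_pool(x, batch)  # graph-level readout
        return self.head(x)
```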

Metrics (for context, there is a noticeable train/validation gap, particularly on macro F1, which is part of why I suspect the extra linear capacity may not be helping generalization):

| Metric   | Train | Validation | Gap   |
|----------|------:|-----------:|------:|
| Macro F1 | 0.960 | 0.699      | 0.261 |
| Micro F1 | 0.990 | 0.911      | 0.079 |
