Closed
Labels: question (Further information is requested)
Description
While inspecting the parameter distribution of the model, I noticed that roughly 43% of the total trainable parameters sit in the linear components alone (linear_layers plus final_layer, i.e. ~1.148M of 2.648M). The per-component counts are below; a counting sketch follows the table:
| Component | Parameters |
|---|---|
| gnn | 1.5M |
| linear_layers | 365K |
| final_layer | 783K |
| Total | 2.648M |
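For reproducibility, this is a minimal sketch of how such a per-component breakdown can be obtained, assuming the top-level submodules are registered under the names shown above; `count_parameters` is a hypothetical helper, not part of this codebase:

```python
import torch.nn as nn


def count_parameters(model: nn.Module) -> dict:
    """Count trainable parameters per top-level submodule (hypothetical helper)."""
    counts = {
        name: sum(p.numel() for p in child.parameters() if p.requires_grad)
        for name, child in model.named_children()
    }
    counts["total"] = sum(counts.values())
    return counts
```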
I was wondering about a few architectural questions:
- Is this heavy reliance on linear layers intentional, or could it be simplified? It may overshadow the representational power of the GNN backbone.
- Could we experiment with a single linear head, as is commonly done in standard models like those in PyG? (A sketch of this variant follows the list.)
- Alternatively, would it make sense to shift more capacity back into the GNN module, where the inductive bias is typically stronger?
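For concreteness, here is a minimal sketch of the single-linear-head variant mentioned above, assuming a plain GCN backbone and node classification; the class name, layer count, and dimensions are illustrative rather than the repository's actual architecture:

```python
import torch
import torch.nn.functional as F
from torch import nn
from torch_geometric.nn import GCNConv


class GNNWithSingleHead(nn.Module):
    """GCN backbone followed by a single linear classifier (illustrative only)."""

    def __init__(self, in_dim: int, hidden_dim: int, num_classes: int, num_layers: int = 2):
        super().__init__()
        dims = [in_dim] + [hidden_dim] * num_layers
        self.convs = nn.ModuleList(
            [GCNConv(dims[i], dims[i + 1]) for i in range(num_layers)]
        )
        # Single linear head: the only dense layer after message passing,
        # so almost all capacity stays in the GNN backbone.
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, x, edge_index):
        for conv in self.convs:
            x = F.relu(conv(x, edge_index))
        return self.head(x)
```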
Metrics:
train_macro-f1: 0.960, train_micro-f1: 0.990, val_macro-f1: 0.699, val_micro-f1: 0.911
| Metric | Train | Validation | Gap |
|---|---|---|---|
| Macro F1 | 0.960 | 0.699 | 0.261 |
| Micro F1 | 0.990 | 0.911 | 0.079 |
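For context on why the macro gap (0.261) is so much larger than the micro gap (0.079): macro-F1 weights every class equally, so poorly generalized minority classes drag it down, while micro-F1 is dominated by the majority classes. A toy illustration, assuming scikit-learn is available; the arrays are placeholders, not this project's data:

```python
from sklearn.metrics import f1_score

y_true = [0, 0, 0, 0, 1, 2]   # imbalanced toy labels
y_pred = [0, 0, 0, 0, 2, 2]   # the rare class 1 is missed entirely

# Micro-F1 aggregates over all samples, so the majority class dominates it.
print("micro:", f1_score(y_true, y_pred, average="micro"))
# Macro-F1 averages per-class F1 unweighted, so the missed rare class hurts it badly.
print("macro:", f1_score(y_true, y_pred, average="macro"))
```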