Commit 5c09a7d

Doc_updates (#370)
* fixed a logger issue and added some docs
* updated the experiment tracking tutorial
* deleted an unnecessary notebook
* added folder to .gitignore
* added documentation and tutorials
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
1 parent 04f3448 commit 5c09a7d

29 files changed (+5,846 / -8,947 lines)

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -158,3 +158,4 @@ tests/.datasets/
 test.py
 lightning_logs/
 docs/tutorials/examples/basic/
+docs/tutorials/pytorch-tabular-covertype/

README.md

Lines changed: 3 additions & 4 deletions
@@ -76,13 +76,13 @@ For complete Documentation with tutorials visit [ReadTheDocs](https://pytorch-ta
 - FT Transformer from [Revisiting Deep Learning Models for Tabular Data](https://arxiv.org/abs/2106.11959)
 - [Gated Additive Tree Ensemble](https://arxiv.org/abs/2207.08548v3) is a novel, high-performance, parameter- and computationally efficient deep learning architecture for tabular data. GATE uses a gating mechanism, inspired by GRUs, as a feature representation learning unit with an in-built feature selection mechanism. We combine it with an ensemble of differentiable, non-linear decision trees, re-weighted with simple self-attention, to predict the desired output.
 - [Gated Adaptive Network for Deep Automated Learning of Features (GANDALF)](https://arxiv.org/abs/2207.08548) is a pared-down version of GATE that is more efficient and better-performing than GATE. GANDALF makes GFLUs the main learning unit, also introducing some speed-ups in the process. With very few hyperparameters to tune, it is an easy model to use and tune.
-
 - [DANETs: Deep Abstract Networks for Tabular Data Classification and Regression](https://arxiv.org/pdf/2112.02962v4.pdf) is a novel and flexible neural component for tabular data, called Abstract Layer (AbstLay), which learns to explicitly group correlative input features and generate higher-level features for semantic abstraction. A special basic block is built using AbstLays, and a family of Deep Abstract Networks (DANets) for tabular data classification and regression is constructed by stacking such blocks.

 **Semi-Supervised Learning**

 - [Denoising AutoEncoder](https://www.kaggle.com/code/faisalalsrheed/denoising-autoencoders-dae-for-tabular-data) is an autoencoder that learns a robust feature representation to compensate for any noise in the dataset.

+## Implement Custom Models
 To implement new models, see the [How to implement new models tutorial](https://github.com/manujosephv/pytorch_tabular/blob/main/docs/tutorials/04-Implementing%20New%20Architectures.ipynb). It covers basic as well as advanced architectures.

 ## Usage

@@ -140,11 +140,10 @@ loaded_model = TabularModel.load_model("examples/basic")
 ## Future Roadmap (Contributions are Welcome)

 1. Integrate Optuna Hyperparameter Tuning
-1. Integrate Captum for interpretability
-1. Have a scikit-learn compatible API
+1. Migrate Datamodule to Polars or NVTabular for faster data loading and to handle larger-than-RAM datasets.
 1. Add GaussRank as Feature Transformation
+1. Have a scikit-learn compatible API
 1. Enable support for multi-label classification
-1. Migrate Datamodule to Polars or Vaex for faster data loading and to handle larger-than-RAM datasets.
 1. Keep adding more architectures

 ## Contributors
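Architectures in PyTorch Tabular are selected via config classes, as the Usage example's `CategoryEmbeddingModelConfig` shows. As a rough, hedged sketch (not part of this commit), the GANDALF model described above could be wired up the same way; this assumes the installed version exports a `GANDALFConfig` class from `pytorch_tabular.models` with a `gflu_stages` hyperparameter, and it reuses the `num_col_names` / `cat_col_names` / `train` / `val` placeholders from the Usage example.

```python
# Hedged sketch (not from this commit): training GANDALF instead of the
# CategoryEmbedding model shown in the Usage example. Assumes GANDALFConfig
# is exported by pytorch_tabular.models in the installed version and that
# gflu_stages is one of its hyperparameters; num_col_names, cat_col_names,
# train and val are the same placeholders used in the Usage example.
from pytorch_tabular import TabularModel
from pytorch_tabular.config import DataConfig, OptimizerConfig, TrainerConfig
from pytorch_tabular.models import GANDALFConfig  # assumed export

data_config = DataConfig(
    target=["target"],  # target is always passed as a list
    continuous_cols=num_col_names,
    categorical_cols=cat_col_names,
)
model_config = GANDALFConfig(
    task="classification",
    gflu_stages=6,  # number of GFLU feature-learning stages (assumed name)
    learning_rate=1e-3,
)
tabular_model = TabularModel(
    data_config=data_config,
    model_config=model_config,
    optimizer_config=OptimizerConfig(),
    trainer_config=TrainerConfig(batch_size=1024, max_epochs=20),
)
tabular_model.fit(train=train, validation=val)
```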

docs/gs_cite.md

Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
If you use PyTorch Tabular for a scientific publication, we would appreciate citations to the published software and the following paper:

- [arxiv Paper](https://arxiv.org/abs/2104.13638)

```
@misc{joseph2021pytorch,
    title={PyTorch Tabular: A Framework for Deep Learning with Tabular Data},
    author={Manu Joseph},
    year={2021},
    eprint={2104.13638},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
```

- Zenodo Software Citation

```
@article{manujosephv_2021,
    title={manujosephv/pytorch_tabular: v0.5.0-alpha},
    DOI={10.5281/zenodo.4732773},
    abstractNote={<p>First Alpha Release</p>},
    publisher={Zenodo},
    author={manujosephv},
    year={2021},
    month={May}
}
```

docs/gs_installation.md

Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
!!! note

    Although the installation includes PyTorch, the best and recommended way is to first install PyTorch from [here](https://pytorch.org/get-started/locally/), picking the right CUDA version for your machine. (PyTorch version > 1.3)

Once you have PyTorch installed and working, just use:

```bash
pip install pytorch_tabular[extra]
```

to install the complete library with extra dependencies:

- Weights&Biases for experiment tracking
- Plotly for some visualization
- Captum for interpretability

And:

```bash
pip install pytorch_tabular
```

for the bare essentials.

The sources for `pytorch_tabular` can be downloaded from the GitHub repo.

You can clone the public repository:

```bash
git clone git://github.com/manujosephv/pytorch_tabular
```

Once you have a copy of the source, you can install it with:

```bash
pip install .
```

or

```bash
python setup.py install
```
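Whichever route you take, a quick import check confirms the installation. This is just a sanity-check sketch; it assumes the package exposes a `__version__` attribute, as its releases generally do.

```python
# Minimal post-install sanity check (assumes pytorch_tabular exposes __version__).
import pytorch_tabular

print(pytorch_tabular.__version__)
```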

docs/gs_usage.md

Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@
PyTorch Tabular comes with intelligent defaults that make it easy to get started with tabular deep learning. However, it also provides the flexibility to customize the model and pipeline to suit your needs.

Here is a simple example of how to use PyTorch Tabular to train a model, evaluate it on new data, generate predictions, and save and load the model.

```python
from pytorch_tabular import TabularModel
from pytorch_tabular.models import CategoryEmbeddingModelConfig
from pytorch_tabular.config import (
    DataConfig,
    OptimizerConfig,
    TrainerConfig,
)

data_config = DataConfig(
    target=[
        "target"
    ],  # target should always be a list. Multi-targets are only supported for regression. Multi-Task Classification is not implemented
    continuous_cols=num_col_names,
    categorical_cols=cat_col_names,
)
trainer_config = TrainerConfig(
    auto_lr_find=True,  # Runs the LRFinder to automatically derive a learning rate
    batch_size=1024,
    max_epochs=100,
)
optimizer_config = OptimizerConfig()

model_config = CategoryEmbeddingModelConfig(
    task="classification",
    layers="1024-512-512",  # Number of nodes in each layer
    activation="LeakyReLU",  # Activation between each layer
    learning_rate=1e-3,
)

tabular_model = TabularModel(
    data_config=data_config,
    model_config=model_config,
    optimizer_config=optimizer_config,
    trainer_config=trainer_config,
)
tabular_model.fit(train=train, validation=val)
result = tabular_model.evaluate(test)
pred_df = tabular_model.predict(test)
tabular_model.save_model("examples/basic")
loaded_model = TabularModel.load_model("examples/basic")
```

For more detailed tutorials and how-to guides, refer to the **Tutorials** and **How-To Guides** sections.
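The example above assumes `train`, `val`, and `test` DataFrames plus `num_col_names` / `cat_col_names` lists already exist. One way to construct such placeholders, shown purely as an illustrative sketch on a synthetic scikit-learn dataset (not part of the library), is:

```python
# Sketch: building the DataFrames and column lists the example above expects.
# Uses a synthetic scikit-learn dataset purely for illustration.
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(
    n_samples=10_000, n_features=12, n_informative=8, random_state=42
)
num_col_names = [f"num_{i}" for i in range(10)]
cat_col_names = ["cat_0", "cat_1"]

df = pd.DataFrame(X[:, :10], columns=num_col_names)
# Turn the last two features into coarse categorical columns for illustration.
df["cat_0"] = pd.qcut(X[:, 10], q=4, labels=False).astype(str)
df["cat_1"] = pd.qcut(X[:, 11], q=3, labels=False).astype(str)
df["target"] = y

train, test = train_test_split(df, test_size=0.2, random_state=42)
train, val = train_test_split(train, test_size=0.2, random_state=42)
```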

docs/imgs/diataxis.webp

41.6 KB

docs/imgs/gflu_v2.png

92.4 KB
12.9 KB

docs/index.md

Lines changed: 13 additions & 113 deletions
@@ -1,4 +1,5 @@
-![PyTorch Tabular](imgs/pytorch_tabular_logo.png)
+![PyTorch Tabular](imgs/pytorch_tabular_logo.png#only-light)
+![PyTorch Tabular](imgs/pytorch_tabular_logo_inv.png#only-dark)

 [![pypi](https://img.shields.io/pypi/v/pytorch_tabular.svg)](https://pypi.python.org/pypi/pytorch_tabular)
 [![Testing](https://github.com/manujosephv/pytorch_tabular/actions/workflows/testing.yml/badge.svg?event=push)](https://github.com/manujosephv/pytorch_tabular/actions/workflows/testing.yml)

@@ -8,126 +9,25 @@
 [![DOI](https://zenodo.org/badge/321584367.svg)](https://zenodo.org/badge/latestdoi/321584367)
 [![contributions welcome](https://img.shields.io/badge/contributions-welcome-brightgreen.svg?style=flat-square)](https://github.com/manujosephv/pytorch_tabular/issues)

-PyTorch Tabular aims to make Deep Learning with Tabular data easy and accessible to real-world cases and research alike. The core principles behind the design of the library are:

-- **Low Resistance Usability**
-- **Easy Customization**
-- **Scalable and Easier to Deploy**
+**PyTorch Tabular** is a powerful library that aims to simplify and popularize the application of deep learning techniques to tabular data. Tabular deep learning has gained significant importance in the field of machine learning due to its ability to handle structured data, such as data in spreadsheets or databases. However, working with tabular data can be challenging, requiring expertise in both deep learning and data preprocessing.

-It has been built on the shoulders of giants like [**PyTorch**](https://pytorch.org/)(obviously), [**PyTorch Lightning**](https://www.pytorchlightning.ai/), and [pandas](https://pandas.pydata.org/)
+This is where **PyTorch Tabular** comes in. Built on the shoulders of giants like `PyTorch`, `PyTorch Lightning`, and `pandas`, PyTorch Tabular offers **low-resistance usability**, making it accessible to both real-world use cases and research projects. The library's core principles revolve around **easy customization**, allowing users to tailor their models and pipelines to specific requirements. Moreover, PyTorch Tabular provides **scalable and efficient tooling**, making it easier to deploy models in production environments. The underlying goodness of `PyTorch` makes designing deep learning architectures pythonic and intuitive, while `PyTorch Lightning` simplifies the training process. `pandas` is the de-facto standard for working with tabular data, and PyTorch Tabular leverages its strengths to simplify the preprocessing of tabular data. With PyTorch Tabular, data scientists and researchers can focus on the core aspects of their work, while the library takes care of the underlying complexities, enabling efficient and effective tabular deep learning.

-## Installation
+The documentation is organized taking inspiration from the Diátaxis system of documentation.

-Although the installation includes PyTorch, the best and recommended way is to first install PyTorch from [here](https://pytorch.org/get-started/locally/), picking up the right CUDA version for your machine. (PyTorch Version >1.3)
+> Diátaxis is a way of thinking about and doing documentation. Diátaxis identifies four distinct needs, and four corresponding forms of documentation - tutorials, how-to guides, technical reference and explanation. It places them in a systematic relationship, and proposes that documentation should itself be organised around the structures of those needs. Diátaxis solves problems related to documentation content (what to write), style (how to write it) and architecture (how to organise it). It is a system for thinking about documentation, and a system for doing documentation. - [Diátaxis](https://diataxis.fr/)

-Once, you have got Pytorch installed, just use:
+![Diátaxis System of Documentation](imgs/diataxis.webp)

-```bash
-pip install pytorch_tabular[extra]
-```
+Taking cues from the system, the documentation is separated into five sections:

-to install the complete library with extra dependencies(Weights&Biases and Plotly).
+- **Getting Started** - A quick introduction on how to install and get started with PyTorch Tabular.

-And :
+- **Tutorials** - Short and focused exercises to get you going quickly.

-```bash
-pip install pytorch_tabular
-```
+- **How-to Guides** - Step-by-step guides covering key tasks, real-world operations, and common problems.

-for the bare essentials.
+- **Concepts** - Explanations of some of the larger concepts and intricacies of the library.

-The sources for pytorch_tabular can be downloaded from the `Github repo`.
-
-You can either clone the public repository:
-
-```bash
-git clone git://github.com/manujosephv/pytorch_tabular
-```
-
-Once you have a copy of the source, you can install it with:
-
-```bash
-pip install .
-```
-
-or
-
-```bash
-python setup.py install
-```
-
-## Usage
-
-```python
-from pytorch_tabular import TabularModel
-from pytorch_tabular.models import CategoryEmbeddingModelConfig
-from pytorch_tabular.config import (
-    DataConfig,
-    OptimizerConfig,
-    TrainerConfig,
-)
-
-data_config = DataConfig(
-    target=[
-        "target"
-    ],  # target should always be a list. Multi-targets are only supported for regression. Multi-Task Classification is not implemented
-    continuous_cols=num_col_names,
-    categorical_cols=cat_col_names,
-)
-trainer_config = TrainerConfig(
-    auto_lr_find=True,  # Runs the LRFinder to automatically derive a learning rate
-    batch_size=1024,
-    max_epochs=100,
-)
-optimizer_config = OptimizerConfig()
-
-model_config = CategoryEmbeddingModelConfig(
-    task="classification",
-    layers="1024-512-512",  # Number of nodes in each layer
-    activation="LeakyReLU",  # Activation between each layers
-    learning_rate=1e-3,
-)
-
-tabular_model = TabularModel(
-    data_config=data_config,
-    model_config=model_config,
-    optimizer_config=optimizer_config,
-    trainer_config=trainer_config,
-)
-tabular_model.fit(train=train, validation=val)
-result = tabular_model.evaluate(test)
-pred_df = tabular_model.predict(test)
-tabular_model.save_model("examples/basic")
-loaded_model = TabularModel.load_model("examples/basic")
-```
-
-## Citation
-
-If you use PyTorch Tabular for a scientific publication, we would appreciate citations to the published software and the following paper:
-
-- [arxiv Paper](https://arxiv.org/abs/2104.13638)
-
-```
-@misc{joseph2021pytorch,
-    title={PyTorch Tabular: A Framework for Deep Learning with Tabular Data},
-    author={Manu Joseph},
-    year={2021},
-    eprint={2104.13638},
-    archivePrefix={arXiv},
-    primaryClass={cs.LG}
-}
-```
-
-- Zenodo Software Citation
-
-```
-@article{manujosephv_2021,
-    title={manujosephv/pytorch_tabular: v0.5.0-alpha},
-    DOI={10.5281/zenodo.4732773},
-    abstractNote={<p>First Alpha Release</p>},
-    publisher={Zenodo},
-    author={manujosephv},
-    year={2021},
-    month={May}
-}
-```
+- **API Reference** - The technical details of the library: all classes and functions, along with their parameters and return types.
