@@ -1,4 +1,5 @@
-
+
+
 
 [](https://pypi.python.org/pypi/pytorch_tabular)
 [](https://github.com/manujosephv/pytorch_tabular/actions/workflows/testing.yml)
@@ -8,126 +9,25 @@
 [](https://zenodo.org/badge/latestdoi/321584367)
 [](https://github.com/manujosephv/pytorch_tabular/issues)
 
-PyTorch Tabular aims to make Deep Learning with Tabular data easy and accessible to real-world cases and research alike. The core principles behind the design of the library are:
 
-- **Low Resistance Usability**
-- **Easy Customization**
-- **Scalable and Easier to Deploy**
+**PyTorch Tabular** is a library that aims to simplify and popularize the application of deep learning techniques to tabular data. Tabular deep learning has gained importance in machine learning because it handles structured data, such as the data found in spreadsheets or databases. However, applying deep learning to tabular data can be challenging and requires expertise in both deep learning and data preprocessing.
 
-It has been built on the shoulders of giants like [**PyTorch**](https://pytorch.org/)(obviously), [**PyTorch Lightning**](https://www.pytorchlightning.ai/), and [pandas](https://pandas.pydata.org/)
+This is where **PyTorch Tabular** comes in. Built on the shoulders of giants like `PyTorch`, `PyTorch Lightning`, and `pandas`, PyTorch Tabular offers **low-resistance usability**, making it accessible for both real-world use cases and research projects. Its core principles also include **easy customization**, which lets users tailor their models and pipelines to specific requirements. Moreover, PyTorch Tabular provides **scalable and efficient tooling**, making it easier to deploy models in production environments. `PyTorch` makes designing deep learning architectures pythonic and intuitive, while `PyTorch Lightning` simplifies the training process. `pandas` is the de facto standard for working with tabular data, and PyTorch Tabular leverages its strengths to simplify preprocessing. With PyTorch Tabular, data scientists and researchers can focus on the core aspects of their work while the library takes care of the underlying complexities.
 
-## Installation
+The organization of this documentation takes inspiration from the Diátaxis system of documentation.
 
-Although the installation includes PyTorch, the best and recommended way is to first install PyTorch from [here](https://pytorch.org/get-started/locally/), picking up the right CUDA version for your machine. (PyTorch Version >1.3)
+> Diátaxis is a way of thinking about and doing documentation. Diátaxis identifies four distinct needs, and four corresponding forms of documentation - tutorials, how-to guides, technical reference and explanation. It places them in a systematic relationship, and proposes that documentation should itself be organised around the structures of those needs. Diátaxis solves problems related to documentation content (what to write), style (how to write it) and architecture (how to organise it). It is a system for thinking about documentation, and a system for doing documentation. - [Diátaxis](https://diataxis.fr/)
 
-Once, you have got Pytorch installed, just use:
+
 
-```bash
- pip install pytorch_tabular[extra]
-```
+Taking cues from the system, the documentation is separated into five sections:
 
-to install the complete library with extra dependencies(Weights&Biases and Plotly).
+- **Getting Started** - A quick introduction to installing and getting started with PyTorch Tabular.
 
-And :
+- **Tutorials** - Short and focused exercises to get you going quickly.
 
-```bash
- pip install pytorch_tabular
-```
+- **How-to Guides** - Step-by-step guides covering key tasks, real-world operations and common problems.
 
-for the bare essentials.
+- **Concepts** - Explanations of some of the larger concepts and intricacies of the library.
 
-The sources for pytorch_tabular can be downloaded from the `Github repo`.
-
-You can either clone the public repository:
-
-```bash
-git clone git://github.com/manujosephv/pytorch_tabular
-```
-
-Once you have a copy of the source, you can install it with:
-
-```bash
-pip install .
-```
-
-or
-
-```bash
-python setup.py install
-```
-
-## Usage
-
-```python
-from pytorch_tabular import TabularModel
-from pytorch_tabular.models import CategoryEmbeddingModelConfig
-from pytorch_tabular.config import (
-    DataConfig,
-    OptimizerConfig,
-    TrainerConfig,
-)
-
-data_config = DataConfig(
-    target=[
-        "target"
-    ],  # target should always be a list. Multi-targets are only supported for regression. Multi-Task Classification is not implemented
-    continuous_cols=num_col_names,
-    categorical_cols=cat_col_names,
-)
-trainer_config = TrainerConfig(
-    auto_lr_find=True,  # Runs the LRFinder to automatically derive a learning rate
-    batch_size=1024,
-    max_epochs=100,
-)
-optimizer_config = OptimizerConfig()
-
-model_config = CategoryEmbeddingModelConfig(
-    task="classification",
-    layers="1024-512-512",  # Number of nodes in each layer
-    activation="LeakyReLU",  # Activation between each layers
-    learning_rate=1e-3,
-)
-
-tabular_model = TabularModel(
-    data_config=data_config,
-    model_config=model_config,
-    optimizer_config=optimizer_config,
-    trainer_config=trainer_config,
-)
-tabular_model.fit(train=train, validation=val)
-result = tabular_model.evaluate(test)
-pred_df = tabular_model.predict(test)
-tabular_model.save_model("examples/basic")
-loaded_model = TabularModel.load_model("examples/basic")
-```
-
-## Citation
-
-If you use PyTorch Tabular for a scientific publication, we would appreciate citations to the published software and the following paper:
-
-- [arxiv Paper](https://arxiv.org/abs/2104.13638)
-
-```
-@misc{joseph2021pytorch,
-    title={PyTorch Tabular: A Framework for Deep Learning with Tabular Data},
-    author={Manu Joseph},
-    year={2021},
-    eprint={2104.13638},
-    archivePrefix={arXiv},
-    primaryClass={cs.LG}
-}
-```
-
-- Zenodo Software Citation
-
-```
-@article{manujosephv_2021,
-    title={manujosephv/pytorch_tabular: v0.5.0-alpha},
-    DOI={10.5281/zenodo.4732773},
-    abstractNote={<p>First Alpha Release</p>},
-    publisher={Zenodo},
-    author={manujosephv},
-    year={2021},
-    month={May}
-}
-```
+- **API Reference** - The technical details of the library: all classes and functions, along with their parameters and return types.
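The removed usage snippet relies on names it never defines: the `train`, `val` and `test` DataFrames and the `num_col_names` / `cat_col_names` lists passed to `DataConfig`. The following is a minimal sketch of how those inputs might be prepared with pandas. The synthetic data, the column names, the 60/20/20 split and the rule "numeric means continuous, everything else means categorical" are all assumptions for illustration. None of this is part of PyTorch Tabular itself.

```python
import random

import pandas as pd

# Hypothetical synthetic dataset standing in for real data.
random.seed(42)
n_rows = 1000
df = pd.DataFrame(
    [
        {
            "age": random.randint(18, 79),
            "income": random.gauss(50_000, 15_000),
            "city": random.choice(["A", "B", "C"]),
            "target": random.randint(0, 1),
        }
        for _ in range(n_rows)
    ]
)

# Assumption: numeric columns are continuous, everything else is categorical.
features = df.drop(columns=["target"])
num_col_names = features.select_dtypes(include="number").columns.tolist()
cat_col_names = [c for c in features.columns if c not in num_col_names]

# Shuffle once, then cut a 60/20/20 train/validation/test split.
shuffled = df.sample(frac=1.0, random_state=42).reset_index(drop=True)
n_train = n_rows * 6 // 10
n_val = n_rows * 2 // 10
train = shuffled.iloc[:n_train]
val = shuffled.iloc[n_train : n_train + n_val]
test = shuffled.iloc[n_train + n_val :]
```

With these names in scope, the snippet's `DataConfig(target=["target"], ...)` and `tabular_model.fit(train=train, validation=val)` calls have concrete inputs. A random split is only a starting point. Time-ordered data would need a chronological split instead.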