|
1 | | -# felixgwilliams python-copier-template |
| 1 | +# Python Copier Template for Data Science |
2 | 2 |
|
| 3 | +[](https://github.com/astral-sh/ruff) |
3 | 4 | [](https://github.com/copier-org/copier) |
| 5 | +[](https://github.com/pre-commit/pre-commit) |
| 6 | + |
| 7 | +This is a template built with [Copier](https://github.com/copier-org/copier) to generate a data-science-focused Python project.
| 8 | + |
| 9 | +Get started with the following command: |
| 10 | + |
| 11 | +```shell |
| 12 | +copier copy gh:felixgwilliams/python-copier-template-ds path/to/destination |
| 13 | +``` |
| 14 | + |
| 15 | +## Features |
| 16 | + |
| 17 | +### Project structure |
| 18 | + |
| 19 | +It is assumed that most of the work will be done in Jupyter Notebooks. |
| 20 | +However, the template also includes a Python project, in which you can put functions and classes shared across notebooks.
| 21 | +The repository is set up to use [Pytest](https://docs.pytest.org/en/stable/) for unit testing this module code. |
| 22 | + |
| 23 | +The template also includes a `data` directory whose contents will be ignored by git. |
| 24 | +You can use this folder to store data that you do not commit. |
| 25 | +You may also put a readme file there to document the source datasets you use and how to acquire them.
| 26 | + |
| 27 | +### [just](https://github.com/casey/just) |
| 28 | + |
| 29 | +`just` is a command runner that allows you to easily run project-specific commands.
| 30 | +In fact, you can use `just` to run all the setup commands listed below: |
| 31 | + |
| 32 | +```shell |
| 33 | +just setup |
| 34 | +``` |
| 35 | + |
| 36 | +### [uv](https://github.com/astral-sh/uv) |
| 37 | + |
| 38 | +The repository is set up to use [uv](https://github.com/astral-sh/uv) for package and project management.
| 39 | +You may set up your Python environment with:
| 40 | + |
| 41 | +```shell |
| 42 | +uv sync --all-groups --all-extras |
| 43 | +``` |
| 44 | + |
| 45 | +### [Ruff](https://github.com/astral-sh/ruff) |
| 46 | + |
| 47 | +The repository is configured to use [Ruff](https://github.com/astral-sh/ruff) for linting and formatting. |
| 48 | +By default, all lint rules are enabled except:
| 49 | + |
| 50 | +- [`COM`](https://docs.astral.sh/ruff/rules/#flake8-commas-com) Enforces trailing commas |
| 51 | +- [`ERA`](https://docs.astral.sh/ruff/rules/#eradicate-era) Disallows commented-out code |
| 52 | +- [`ISC001`](https://docs.astral.sh/ruff/rules/single-line-implicit-string-concatenation/) Disallows implicit string concatenation on a single line (conflicts with the formatter)
| 53 | + |
| 54 | +In addition, the following rules are only enforced for module code as they are inappropriate or too strict for unit tests and notebooks: |
| 55 | + |
| 56 | +- [`D`](https://docs.astral.sh/ruff/rules/#pydocstyle-d) Requires docstrings on functions, classes and modules |
| 57 | +- [`ANN`](https://docs.astral.sh/ruff/rules/#flake8-annotations-ann) Requires type annotations on functions and methods |
| 58 | +- [`S101`](https://docs.astral.sh/ruff/rules/assert/) Disallows use of `assert` |
| 59 | +- [`PLR2004`](https://docs.astral.sh/ruff/rules/magic-value-comparison/) Disallows "magic" values in comparisons |
| 60 | +- [`T20`](https://docs.astral.sh/ruff/rules/#flake8-print-t20) Disallows print statements |
| 61 | + |
| 62 | +The target line length is 120 and the docstring convention is Google.
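
As an illustration of what the module-only rules ask for, here is a minimal sketch of a shared helper with type annotations (`ANN`) and a Google-style docstring (`D`). The function `rescale` and its parameters are hypothetical and not part of the template; whether it passes every enabled lint depends on the generated configuration.

```python
"""Hypothetical helpers shared across notebooks."""

from collections.abc import Sequence


def rescale(values: Sequence[float], lower: float = 0.0, upper: float = 1.0) -> list[float]:
    """Linearly rescale values to the range [lower, upper].

    Args:
        values: The numbers to rescale.
        lower: The smallest value of the output range.
        upper: The largest value of the output range.

    Returns:
        The rescaled values, in the same order as the input.

    Raises:
        ValueError: If values is empty or all values are equal.
    """
    smallest, largest = min(values), max(values)
    if smallest == largest:
        msg = "Cannot rescale values that are all equal."
        raise ValueError(msg)
    span = largest - smallest
    return [lower + (value - smallest) * (upper - lower) / span for value in values]
```

In a notebook or a unit test the same function could be written without the docstring or annotations, since `D` and `ANN` are only enforced for module code.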
| 63 | + |
| 64 | +### [pre-commit](https://github.com/pre-commit/pre-commit) |
| 65 | + |
| 66 | +pre-commit is a tool that runs checks on your files before you commit them with git, thereby helping ensure code quality. |
| 67 | +Enable it with the following command: |
| 68 | + |
| 69 | +```shell |
| 70 | +pre-commit install --install-hooks |
| 71 | +``` |
| 72 | + |
| 73 | +The configuration is stored in `.pre-commit-config.yaml`. |
| 74 | + |
| 75 | +### [nbwipers](https://github.com/felixgwilliams/nbwipers) |
| 76 | + |
| 77 | +`nbwipers` is a tool written in Rust to ensure Jupyter notebooks are clean.
| 78 | +Committing notebooks that are not clean makes diffs more confusing, can degrade performance and increases the risk of leaking sensitive information. |
| 79 | +You can set it up as a git filter with the following command:
| 80 | + |
| 81 | +```shell |
| 82 | +nbwipers install local |
| 83 | +``` |
| 84 | + |
| 85 | +### [pytest](https://docs.pytest.org/en/stable/) |
| 86 | + |
| 87 | +The repository comes configured to use `pytest` for unit testing the module code. |
| 88 | +Feel free to ignore it if you do not write module code. |
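
A test file might look like the sketch below. The file name `tests/test_stats.py` and the function `mean` are assumptions made for illustration; in a real project `mean` would be imported from the module code rather than defined next to the test.

```python
"""Sketch of a test module, e.g. tests/test_stats.py (hypothetical path)."""


def mean(values: list[float]) -> float:
    """Return the arithmetic mean (stand-in for a function from the module code)."""
    return sum(values) / len(values)


def test_mean_of_known_values() -> None:
    # A bare `assert` and a literal expected value are acceptable here:
    # S101 and PLR2004 are only enforced for module code, not for tests.
    assert mean([1.0, 2.0, 3.0]) == 2.0
```

Pytest collects functions whose names start with `test_` from files named `test_*.py`. With the uv environment described above, running `uv run pytest` from the project root should pick this file up.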
| 89 | + |
| 90 | +### GitHub Actions
| 91 | + |
| 92 | +You may optionally add a GitHub workflow file that does the following:
| 93 | +
| 94 | +- Uses Ruff to check that files are formatted and linted
| 95 | +- Runs unit tests and checks coverage
| 96 | +- Checks that any Markdown files are formatted with [markdownlint-cli2](https://github.com/DavidAnson/markdownlint-cli2)
| 97 | +- Checks that all Jupyter notebooks are clean
| 98 | + |
| 99 | +### [Typos](https://github.com/crate-ci/typos) |
| 100 | + |
| 101 | +Typos checks for common typos in code, aiming for a low false positive rate. |
| 102 | +The repository is configured not to use it for Jupyter notebook files, as it tends to find errors in cell outputs. |
4 | 103 |
|
5 | 104 | Test with [Copier](https://github.com/copier-org/copier) and [copier-template-tester](https://github.com/KyleKing/copier-template-tester). |