diff --git a/AI_POLICY.md b/AI_POLICY.md
new file mode 100644
index 0000000..3e5a5b4
--- /dev/null
+++ b/AI_POLICY.md
@@ -0,0 +1,84 @@
+# AI Contributions Policy
+
+## Overview
+
+We welcome AI-assisted contributions. AI tools can be genuinely useful for writing, editing, refactoring, and exploring ideas. However, AI is a tool, not a contributor. Every submission must be owned, understood, and vouched for by a human.
+
+The rules below exist to respect everyone's time, preserve code quality, and ensure authentic engagement in our community. They apply to all contributors equally, regardless of role.
+
+## All AI usage must be disclosed
+
+If any part of your contribution - code or documentation - was meaningfully shaped by an AI tool, you **must** state this in the opening comment of your pull request. Include the specific tool and version, and a brief description of how it was used.
+
+Version matters. "ChatGPT" tells us very little. "ChatGPT o3" or "Claude-Sonnet-4-5" tells us something meaningful about the capabilities and tendencies of the model involved. If you do not know the version, check before submitting. If you still cannot determine it, provide as much information as possible.
+
+Routine use of AI for spell-checking or light grammar correction does not require disclosure.
+
+Example AI usage disclosure:
+
+```text
+This PR used ChatGPT o3 to help refactor a parser function and generate an initial set of unit tests. All code and tests were reviewed, simplified, and rewritten where necessary.
+```
+
+## You must fully understand everything you submit
+
+If you cannot explain what your changes do, why they are the correct approach, and how they interact with the rest of the codebase *without the aid of AI tools*, your contribution is not ready for review. When you open a PR, you **must** provide a *short* summary that explains what it does and why the changes were made. This summary **must not** be written by AI.
+
+This is the most important rule in this policy. We will ask questions during review. "The AI suggested it" is not an answer.
+
+## AI-generated code must be indistinguishable from high-quality human work
+
+Submitting a raw or lightly edited AI output is not acceptable. Before opening a PR, you are expected to thoroughly read, test, and clean up any AI-generated code. In practice this means:
+
+- **Remove unnecessary code.** AI models routinely generate guards for conditions that cannot realistically occur because the model is overly cautious. If you cannot point to a real scenario where a guard fires, remove it.
+- **Remove redundant or superfluous tests.** AI-generated tests frequently cover unreachable states, re-test behaviour already covered by other tests, or test the language itself rather than your code. Every test you submit should cover a real, meaningful case. If a test would pass regardless of whether your code is correct, it has no place here.
+- **Strip all AI artefacts.** This includes issue numbers embedded in docstrings or comments, references to specific line numbers, summaries of what a function does written in a way that mirrors the prompt, `TODO` comments left by the model, and any other content that reads like the model narrating its own output. None of this belongs in submitted code and its presence signals the output has not been read.
+- **Remove unnecessary comments.** AI models over-comment. Comments that restate what the code obviously does add noise and should be deleted. Comments should explain *why*, not *what*.
+- **Eliminate inconsistent or incorrect naming and style.** AI output frequently mixes naming conventions, uses overly generic identifiers, or invents abstractions that do not exist elsewhere in the codebase. Rename things to match the project's conventions and remove abstractions that are not pulling their weight.
+- **Do not pad scope.** AI models have a tendency to add things that were not asked for: extra utility functions, additional configuration options, broader error hierarchies, and convenience overloads. If it was not part of the intended change, remove it. Scope creep in AI-generated PRs is common and wastes review time. **Keep PRs small and focused.**
+- **Conform to project style.** Beyond naming, this means matching the project's patterns for error handling, logging, module structure, and code organisation. AI models are trained on the entire Internet, not this repository.
+- **All tests must pass.** This obviously applies to both human and AI-assisted contributions.
+
+If a PR shows obvious signs of unreviewed AI output, it may be closed without detailed feedback. Cleaning up someone else's AI-generated junk is not a reasonable thing to ask of a reviewer.
+
+## Issues and discussions must be written by you
+
+You **must not** use AI to write issue reports, pull request descriptions, or discussion comments.
+
+This is intentional. Writing a PR description or issue in your own words is one of the clearest signals that you actually understand what you are submitting. A concise, accurate, human-written description saves everyone time and demonstrates genuine engagement with the problem.
+
+AI-generated descriptions are often verbose and imprecise and, critically, may not accurately reflect what the code actually does. They also tend to pad with noise: lists of files modified, marketing-style statements of purported benefits (e.g., "improves robustness"), and bullet-point summaries of trivial changes that restate what the code obviously does. A reviewer who reads a description like that learns nothing useful and may be actively misled. PR descriptions **must not** contain any of this. Inclusion of such text may be cited as evidence of AI generation and failure to read and understand this policy.
+
+If you cannot write a clear description of your change in your own words, that is a sign the contribution is not ready.
+
+Posting AI-generated content via automated bots or agents is strictly forbidden. Accounts repeatedly doing this may be banned and reported to GitHub as spam.
+
+## Documentation
+
+The same rules that apply to code apply to documentation. AI tools can hallucinate or invent details, as well as produce confident-sounding text that is subtly wrong. Review everything carefully and make sure you can stand behind it.
+
+## AI-generated media
+
+AI-generated images, illustrations, audio, or other media assets are not accepted without explicit prior approval from the project maintainers for copyright reasons.
+
+## Enforcement
+
+Contributions that appear to be low-effort AI output, including anything that reads like it was generated without genuine engagement with the codebase, will be closed without detailed feedback.
+
+AI **must not** be used as the sole basis for approving or rejecting a contribution. A reviewer is always accountable for their decision, and that decision must reflect their own understanding of the change.
+
+When someone submits unreviewed AI output, they are not really contributing - they are offloading the actual work of understanding, validating, and cleaning up onto the person reviewing it. That is not a fair exchange of effort, and it is not acceptable regardless of who is doing it.
+
+Repeated low-quality submissions may result in a contributor being blocked.
+
+## Why this policy exists
+
+This is not an anti-AI policy. It is an anti-slop policy.
+
+Reviewing a contribution takes real time and attention from another person. When a submission has not been genuinely understood or cleaned up by its author, the reviewer ends up doing all the actual work. That is true whether the author is a first-time contributor or a long-standing one. This policy applies to everyone.
+
+If you are using AI to learn - great. Use it to understand the codebase, experiment locally, and build your skills. When you submit, the work should be yours: you read it, you tested it, you understand it, you can defend it, and you assert that you are the copyright owner (or, if it includes parts of other open source software, that the relevant license is included).
+
+---
+
+This policy draws on the [Ghostty AI Usage Policy](https://github.com/ghostty-org/ghostty/blob/main/AI_POLICY.md), the [htop AI-Assisted Contributions Policy](https://github.com/htop-dev/htop/blob/main/docs/ai-contributions-policy.md), and the [Fedora AI Contribution Policy](https://docs.fedoraproject.org/en-US/council/policy/ai-contribution-policy/).
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 95aec50..f3fc9a0 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -1,99 +1,116 @@
 # Contributing
 
-Contributions to this repository are very welcome. Please create an issue before starting any significant work so that we can discuss and understand the changes. If you are interested in contributing, feel free to contact us or create an issue in the [issue tracking system](https://github.com/AI-SDC/ACRO/issues). Alternatively, you may [fork](https://docs.github.com/en/github/getting-started-with-github/fork-a-repo) the project and submit a [pull request](https://docs.github.com/en/github/collaborating-with-issues-and-pull-requests/creating-a-pull-request-from-a-fork). All contributions must be made under the same license as the rest of the project: [MIT License](../blob/main/LICENSE).
+Contributions to this repository are very welcome. If this is your first contribution to the repository, please ensure that you have carefully read and understood the entirety of this contributing guide and our [AI Policy](AI_POLICY.md).
 
-New code should be accompanied with appropriate unit tests and documentation. The `CHANGELOG.md` is generated from PR titles (see [Changelog](#changelog) below). If this is your first contribution to the repository, please also add your details to `CITATION.cff`. If you are introducing new imports, then these must also be added to `requirements.txt` (in root and docs folders) and `setup.py`. After creating a pull request, the continuous integration tools will automatically run the unit tests, apply the pre-commit checks listed below, and build and deploy the Sphinx documentation (when merged into the main branch.)
+Please create an issue before starting any significant work so that we can discuss and understand the changes before you invest time in it. You can contact us directly or use the [issue tracking system](https://github.com/AI-SDC/ACRO/issues). Once agreed, external collaborators should [fork](https://docs.github.com/en/github/getting-started-with-github/fork-a-repo) the project and submit a [pull request](https://docs.github.com/en/github/collaborating-with-issues-and-pull-requests/creating-a-pull-request-from-a-fork) (PR). If you are a member of the repository team, your changes should be made in a feature branch before opening a PR.
 
 ## Pull Request Standards
 
-All pull requests must meet the following requirements before being accepted:
+All PRs **must** meet the following requirements before being accepted.
 
-- **All prek checks pass** (includes Ruff formatting and linting)
-- **Codecov reports at least 99% statement coverage**
-- **Tests do more than just make sure the lines run, they actually check for desired effects**
+### Provenance and legal
+
+- Contributors assert copyright ownership and release their contribution under the [MIT License](../blob/main/LICENSE).
+- All contributor details are present in `CITATION.cff`. First-time contributors must add themselves.
+- If work is copied from another open source repository, the license must be checked and included.
+
+### Code quality
+
+- The PR is small and addresses a single specific issue.
+- Code is high quality. This includes: small focused functions and modules, no duplication, fully documented, extensive use of type hints, no unused arguments, no more than 3 levels of nesting except in rare justified cases, no bloat.
+- No inline pragmas. If a rule suppression is genuinely necessary, add a per-file setting to `pyproject.toml` to keep the source code clean.
+- New dependencies are added to `pyproject.toml`.
+- All [pre-commit checks](#pre-commit) pass, including automatic formatting and linting. Run pre-commit/prek locally before opening a PR.
+
+### Tests
+
+- All existing tests pass.
+- New code is accompanied by appropriate tests.
+- Statement coverage is at least 90%.
+- Tests verify real-world effects, not just that lines of code execute.
+- Run the full test suite locally before opening a PR. CI minutes are not unlimited.
+
+### Pull request description
+
+- The PR title follows [Conventional Commits](#pull-request-titles) format.
+- The description is **short**, written in your own words, and explains what changed and why. See the [AI Policy](AI_POLICY.md) for what this means in practice. AI-generated descriptions are not acceptable.
+- Do not add issue or PR numbers to the title manually. To close an issue automatically, add the closing keyword in a comment instead.
+
+### AI
+
+- Any use of AI tools to assist with code or documentation is disclosed in the opening PR comment, including the specific tool and version. See the [AI Policy](AI_POLICY.md) for the full requirements.
 
 ## Development
 
-Clone the repository and install the dependencies (within a virtual environment):
+Clone the repository and install the dependencies within a virtual environment:
 
 ```shell
 $ git clone git@github.com:AI-SDC/ACRO.git
 $ cd ACRO
-$ pip install -e .
+$ pip install -e ".[test]"
 ```
 
-Then to run the tests:
+Run the tests:
 
 ```shell
-$ pip install pytest
-$ pytest .
+$ pytest
 ```
 
 ## Directory Structure
 
-* `acro`: contains ACRO source code.
-* `data`: contains data files for testing.
-* `docs`: contains [Sphinx](https://www.sphinx-doc.org) documentation.
-* `notebooks`: contains example notebooks.
-* `stata`: contains Stata wrapper code.
-* `test`: contains unit tests.
+| Directory | Contents |
+| ----------- | -------------------------------------------------- |
+| `acro` | ACRO source code |
+| `data` | Data files for testing |
+| `docs` | [Sphinx](https://www.sphinx-doc.org) documentation |
+| `notebooks` | Example notebooks |
+| `stata` | Stata wrapper code |
+| `test` | Unit tests |
 
-## Style Guide
+## Pre-commit
 
-Code quality is maintained through [prek](https://prek.j178.dev) hooks that run [Ruff](https://github.com/astral-sh/ruff) (which includes pylint-style checks) along with other formatting and linting tools.
+Code quality is maintained through [pre-commit](https://prek.j178.dev) hooks that run [Ruff](https://github.com/astral-sh/ruff) along with other formatting and linting tools. A `.pre-commit-config.yaml` configuration file is provided to automatically handle:
 
-A [prek](https://prek.j178.dev) configuration [file](../tree/main/.pre-commit-config.yaml) is provided to automatically:
-* Trim trailing whitespace and fix line endings;
-* Check for spelling errors;
-* Check Yaml files;
-* Automatically format and lint with [Ruff](https://github.com/astral-sh/ruff);
+- Trimming trailing whitespace and fixing line endings
+- Spell checking
+- Validating JSON, TOML, YAML, etc., files
+- Formatting and linting with Ruff
+- Checking types with mypy
+- And others
 
-Prek can be setup locally as follows:
+We recommend using [uv](https://docs.astral.sh/uv/) to install prek:
 
 ```shell
-$ pip install prek
+uv tool install prek
 ```
 
-Then to run on all files locally:
+However, you may prefer to use pip:
 
 ```shell
-$ prek run -a
+pip install prek
 ```
 
-Make any corrections as necessary and re-run before committing the fixes and then pushing.
-
-To install as a hook that executes with every `git commit`:
+Run on all files locally:
 
 ```shell
-$ prek install
+prek run -a
 ```
 
-## Pull Request Titles
-
-Titles for pull requests should follow the [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/) specification.
+Optionally, install as a git hook so it runs automatically on every commit:
 
-This is a lightweight convention on top of commit messages. It provides a simple set of rules for creating an explicit, readable, and automation-friendly project history.
-
-Individual commit messages in branches may follow an unrestricted policy, but **PR titles must follow Conventional Commits**.
+```shell
+prek install
+```
 
-Do **not** add issue or PR numbers manually. If you wish to automatically close
-an issue then add the text in a comment.
+Make any corrections and re-run before committing.
 
-### Why We Use Conventional Commit PR Titles
-We require PR titles to follow the Conventional Commits format because it:
+## Pull Request Titles
 
-- **Enables automatic changelogs** - release notes can be generated from PR titles without manual work.
-- **Clearly communicates intent** - reviewers can immediately see whether a PR is a `feat`, `fix`, `chore`, etc.
-- **Improves git history navigation** - makes it easy to scan and understand changes over time.
-- **Aligns with Semantic Versioning (SemVer)** - structured titles help determine version bumps automatically.
-- **Supports better PR labeling & filtering** - PRs are labeled by type, making them easier to prioritise and review.
-- **Flags breaking changes** - adding `!` (e.g. `feat!:`) automatically marks a PR as a breaking change.
+PR titles **must** follow the [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/) specification. Individual commit messages within a branch are unrestricted, but the PR title is used to generate the changelog and must be correct.
 
 ### Format
 
-The general structure is:
-
 ```text
 <type>[optional scope]: <description>
 ```
@@ -104,56 +121,55 @@ Example:
 feat: send an email to the customer when a product is shipped
 ```
 
-Types:
+### Types
 
-```text
-feat — new feature
-fix — bug fix
-docs — documentation changes
-style — formatting/styling (no code logic)
-refactor — code changes without feature/bug impact
-perf — performance improvements
-test — adding/updating tests
-build — changes to build system or dependencies
-ci — changes to CI config/scripts
-chore — miscellaneous maintenance tasks
-revert — reverts an earlier commit
-```
+| Type | Use for |
+| ---------- | ------------------------------------------------ |
+| `feat` | New feature |
+| `fix` | Bug fix |
+| `docs` | Documentation changes only |
+| `style` | Formatting or styling with no logic change |
+| `refactor` | Code restructuring without feature or bug impact |
+| `perf` | Performance improvements |
+| `test` | Adding or updating tests |
+| `build` | Build system or dependency changes |
+| `ci` | CI configuration or script changes |
+| `chore` | Miscellaneous maintenance |
+| `revert` | Reverting an earlier commit |
 
-## Changelog
+To flag a breaking change, append `!` to the type: `refactor!: renamed foo() to goo()`.
 
-The `CHANGELOG.md` is generated from the commit history using [git-cliff](https://github.com/orhun/git-cliff) and assumes conventional commit messages.
+### Why we use Conventional Commit PR titles
 
-### Generate the changelog
+We require PR titles to follow the Conventional Commits format because it:
+
+* Enables automatic changelogs - release notes can be generated from PR titles without manual work.
+* Clearly communicates intent - reviewers can immediately see whether a PR is a `feat`, `fix`, `chore`, etc.
+* Improves git history navigation - makes it easy to scan and understand changes over time.
+* Aligns with Semantic Versioning (SemVer) - structured titles help determine version bumps automatically.
+* Supports better PR labeling and filtering - PRs are labeled by type, making them easier to prioritise and review.
+* Flags breaking changes - adding `!` (e.g. `feat!:`) automatically marks a PR as a breaking change.
 
-#### Install
+## Changelog
 
-We recommend using [uv](https://docs.astral.sh/uv/) to install:
+Contributors **should not** modify the `CHANGELOG.md` directly. It is generated at release time by a maintainer from the commit history using [git-cliff](https://github.com/orhun/git-cliff). In future this process may be fully automated. The tool works best when Conventional Commit messages are used. The configuration lives in `cliff.toml` at the repository root, which converts `(#NNN)` references into markdown PR links and skips noise commits such as pre-commit auto-fixes and release-prep commits.
 
-```shell
-uv tool install git-cliff
-```
+### Install git-cliff
 
-However, you may prefer to use pip or obtain it using any of the installation
-methods provided on the official repository.
+The maintainer issuing a release should install as follows:
 
 ```shell
+uv tool install git-cliff
+# or
 pip install git-cliff
 ```
 
-#### Usage
+### Generate the changelog
 
-Example that prepends entries between v0.4.12 and the current HEAD (this can be
-changed to a specific version if desired).
+Example that prepends entries from a given tag to HEAD:
 
 ```shell
 git-cliff v0.4.12..HEAD --config cliff.toml --prepend CHANGELOG.md
 ```
 
-### Configuration
-
-The configuration lives in `cliff.toml` at the repository root. It:
-
-- Converts `(#NNN)` in commit messages into markdown PR links.
-- Skips noise commits (pre-commit auto-fixes, changelog and release-prep commits).
-- Filters out commits that do not follow Conventional Commits (see [Pull Request Titles](#pull-request-titles)).
+Adjust the tag to match the last release you want to start from.
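
The PR title rule that this patch adds to CONTRIBUTING.md could also be checked mechanically. The following is a minimal sketch, not part of the patch. It assumes the type list from the Types table and the `<type>[optional scope]: <description>` format, with an optional `!` for breaking changes. The names `TYPES`, `PR_TITLE_RE`, and `is_valid_pr_title` are hypothetical, and the full Conventional Commits specification allows more than this pattern covers.

```python
import re

# Types copied from the "Types" table in CONTRIBUTING.md (assumed complete).
TYPES = (
    "feat", "fix", "docs", "style", "refactor", "perf",
    "test", "build", "ci", "chore", "revert",
)

# <type>[optional scope][!]: <description>
# Hypothetical pattern for illustration; it is stricter than the full spec.
PR_TITLE_RE = re.compile(
    r"^(?P<type>" + "|".join(TYPES) + r")"  # one of the allowed types
    r"(?:\((?P<scope>[^()\s]+)\))?"         # optional scope in parentheses
    r"(?P<breaking>!)?"                     # optional breaking-change marker
    r": (?P<description>\S.*)$"             # colon, one space, non-empty text
)


def is_valid_pr_title(title: str) -> bool:
    """Return True if the title matches the simplified pattern above."""
    return PR_TITLE_RE.match(title) is not None


if __name__ == "__main__":
    for candidate in (
        "feat: send an email to the customer when a product is shipped",
        "refactor!: renamed foo() to goo()",
        "Update README",
    ):
        print(candidate, "->", is_valid_pr_title(candidate))
```

A check like this would only cover the title format. It says nothing about whether the description was written by the author, which the policy leaves to reviewers.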