Merged
4 changes: 2 additions & 2 deletions .asf.yaml
Review comment (Member, Author):
I'd review this config in a follow-up PR.

@@ -5,9 +5,9 @@
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
-#
+#
# http://www.apache.org/licenses/LICENSE-2.0
-#
+#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
28 changes: 28 additions & 0 deletions .editorconfig
@@ -0,0 +1,28 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

root = true

[*]
end_of_line = lf
indent_style = space
insert_final_newline = true
trim_trailing_whitespace = true

[*.toml]
indent_size = tab
tab_width = 2
2 changes: 0 additions & 2 deletions .gitattributes
@@ -1,4 +1,3 @@
-#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
@@ -13,7 +12,6 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
-#

# The default behavior, which overrides 'core.autocrlf', is to use Git's
# built-in heuristics to determine whether a particular file is text or binary.
45 changes: 45 additions & 0 deletions .github/semantic.yml
Review comment (Member, Author):
We may need INFRA's help to turn on Semantic PR on this repo. But anyway if we agree on using conventional commits, we can leave it here for now.

@@ -0,0 +1,45 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

# The pull request's title should match the following pattern:
#
# <type>[optional scope]: <description>
#
# ... where valid types and scopes can be found below; for example:
#
# build(maven): One level down for native profile
#
# More about configurations on https://github.com/Ezard/semantic-prs#configuration

enabled: true

titleOnly: true

types:
- feat
- fix
- docs
- style
- refactor
- perf
- test
- build
- ci
- chore
- revert

targetUrl: https://github.com/apache/datasketches-rust/blob/main/.github/semantic.yml
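
To make the title pattern above concrete, here is a minimal sketch of a local check. It assumes the scope is written in parentheses, as in the `build(maven)` example, and it only approximates the `<type>[optional scope]: <description>` shape with a regular expression; it is not the Semantic PRs app's actual validation logic. The function name `is_semantic_title` is hypothetical.

```shell
#!/usr/bin/env bash

# Hypothetical helper: returns 0 when a PR title looks like
#   <type>[optional scope]: <description>
# using the types listed in .github/semantic.yml.
# Assumption: this regex only approximates the rules the app enforces.
is_semantic_title() {
  local title="$1"
  local types='feat|fix|docs|style|refactor|perf|test|build|ci|chore|revert'
  # type, optional "(scope)", then ": " and a non-empty description
  local pattern="^(${types})(\([^()]+\))?: .+$"
  [[ "$title" =~ $pattern ]]
}

# Example usage
if is_semantic_title "build(maven): One level down for native profile"; then
  echo "title accepted"
fi
if ! is_semantic_title "Update README"; then
  echo "title rejected"
fi
```

Because `titleOnly: true` is set, only the pull request title would need to pass such a check, not the individual commit messages.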
104 changes: 104 additions & 0 deletions .github/workflows/ci.yml
@@ -0,0 +1,104 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

name: CI
on:
  pull_request:
    branches: [ main ]
  push:
    branches: [ main ]

# Concurrency strategy:
#   github.workflow: distinguish this workflow from others
#   github.event_name: distinguish `push` event from `pull_request` event
#   github.event.number: set to the number of the pull request if `pull_request` event
#   github.run_id: otherwise, it's a `push` event, only cancel if we rerun the workflow
#
# Reference:
#   https://docs.github.com/en/actions/using-jobs/using-concurrency
#   https://docs.github.com/en/actions/learn-github-actions/contexts#github-context
concurrency:
  group: ${{ github.workflow }}-${{ github.event_name }}-${{ github.event.number || github.run_id }}
  cancel-in-progress: true
jobs:
  check:
    name: Check
    runs-on: ubuntu-24.04
    steps:
      - uses: actions/checkout@v6
      - name: Delete rust-toolchain.toml
        run: rm rust-toolchain.toml
      - name: Install toolchain
        uses: dtolnay/rust-toolchain@nightly
        with:
          components: rustfmt,clippy
      - uses: Swatinem/rust-cache@v2
      - uses: taiki-e/install-action@v2
        with:
          tool: typos-cli,taplo-cli,hawkeye
      - name: Check all
        run: |
          hawkeye check
          taplo format --check
          typos
          cargo +nightly fmt --all -- --check
          cargo +nightly clippy --all-targets --all-features -- -D warnings
  test:
    name: Run tests
    strategy:
      matrix:
        os: [ ubuntu-24.04, macos-14, windows-2022 ]
        rust-version: [ "1.85.0", "stable" ]
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v6
      - name: Delete rust-toolchain.toml
        run: rm rust-toolchain.toml
      - uses: Swatinem/rust-cache@v2
      - name: Install toolchain
        uses: dtolnay/rust-toolchain@master
        with:
          toolchain: ${{ matrix.rust-version }}
      - name: Build
        run: cargo build --workspace --all-features --bins --tests --examples --benches --lib
      - name: Run unit tests
        shell: bash
        run: cargo test --all-features -- --nocapture
      - name: Run examples
        shell: bash
        run: |
          set -x
          cargo run --example hll_usage
  required:
    name: Required
Review comment (Member, Author):
We can later add a required status check in .asf.yaml to ensure all PRs can only be merged if CI passed.

    runs-on: ubuntu-24.04
    if: ${{ always() }}
    needs:
      - check
      - test
    steps:
      - name: Guardian
        run: |
          if [[ ! ( \
                 "${{ needs.check.result }}" == "success" \
                 && "${{ needs.test.result }}" == "success" \
                ) ]]; then
            echo "Required jobs haven't been completed successfully."
            exit -1
          fi
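
The two pieces of workflow logic above (the concurrency group key and the `Required` guardian) can be sketched as plain shell functions to show what they evaluate to. This is an illustration only: the function names and positional parameters are hypothetical stand-ins for the `github.*` and `needs.*` contexts, and GitHub evaluates the real expressions itself.

```shell
#!/usr/bin/env bash

# Hypothetical model of the concurrency group expression:
#   ${{ github.workflow }}-${{ github.event_name }}-${{ github.event.number || github.run_id }}
# An empty PR number (a `push` event) falls back to the run id, so runs
# triggered by pushes only share a group when the same run is rerun.
concurrency_group() {
  local workflow="$1" event_name="$2" pr_number="$3" run_id="$4"
  echo "${workflow}-${event_name}-${pr_number:-$run_id}"
}

# Hypothetical model of the Guardian step: succeed only if every
# needed job reported "success" (not "failure", "cancelled", or "skipped").
required_passed() {
  local result
  for result in "$@"; do
    [[ "$result" == "success" ]] || return 1
  done
}

# Example usage
concurrency_group "CI" "pull_request" "42" "1001"
concurrency_group "CI" "push" "" "1001"
required_passed "success" "skipped" || echo "Required jobs haven't been completed successfully."
```

Since the `Required` job runs with `if: ${{ always() }}`, it still executes when `check` or `test` fails or is skipped, so a single status check can stand in for the whole matrix.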
85 changes: 19 additions & 66 deletions .gitignore
@@ -1,85 +1,38 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

# Eclipse project files
.classpath
.project
.settings/
.settings
.checkstyle

# IntelliJ project files
*.idea
.idea
**/*.iml
*.ipr
*.iws

# VSCode project files
**/.vscode/

# Additional tools
.clover/
.vscode
!.vscode/settings.json

# OSX files
**/.DS_Store

# Compiler output, class files
*.class
bin/

# Log file
*.log

# BlueJ files
*.ctxt

# Mobile Tools for Java (J2ME)
.mtj.tmp/

# Package Files #
*.jar
*.war
*.ear
*.zip
*.tar.gz
*.rar

# virtual machine crash logs, see http://www.java.com/en/download/help/error_hotspot.xml
hs_err_pid*

#Test config and output
test-output/
local/
reports/
.pmd
tmp/

# Build artifacts
target/
out/
build/
jarsIn/
build.xml
*.properties
*.releaseBackup
*.next
*.tag
doc/

# Jekyll
_site/
_*
_*/
**/target
43 changes: 43 additions & 0 deletions CONTRIBUTING.md
@@ -0,0 +1,43 @@
# Contributing

Thank you for contributing to Apache DataSketches!

The goal of this document is to provide everything you need to start contributing to this core Rust library.

## Your First Contribution

1. [Fork the DataSketches repository](https://github.com/apache/datasketches-rust/fork) to your own GitHub account.
2. [Create a new Git branch](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/creating-and-deleting-branches-within-your-repository).
3. Make your changes.
4. [Submit the branch as a pull request](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/creating-a-pull-request-from-a-fork) to the upstream repo. A DataSketches team member should comment on and/or review your pull request within a few days, although it may take longer depending on the circumstances.

## Setup

This repo develops the Apache® DataSketches™ Core Rust Library Component. To build this project, you first need to set up a Rust development environment. We highly recommend using [rustup](https://rustup.rs/) for the setup process.

For Linux or macOS users, use the following command:

```shell
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
```

For Windows users, download `rustup-init.exe` from [here](https://win.rustup.rs/x86_64) instead.

Rustup will read the `rust-toolchain.toml` file and set up everything else automatically. To ensure that everything works correctly, run `cargo version` under the root directory:

```shell
cargo version
# cargo 1.85.0 (<hash> 2024-12-31)
```

To keep code style consistent, we use the following tools:

* Nightly `rustfmt` for code formatting: `cargo +nightly fmt --all -- --check`
* Nightly `clippy` for linting: `cargo +nightly clippy --all-targets --all-features -- -D warnings`
* [`typos`](https://github.com/crate-ci/typos) for spell checking: `cargo install typos-cli` and then `typos`
* [`taplo`](https://taplo.tamasfe.dev/) for checking `toml` files: `cargo install taplo-cli` and then `taplo check`
* [`hawkeye`](https://github.com/korandoru/hawkeye) for checking license headers: `cargo install hawkeye` and then `hawkeye check`
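
To run the same checks locally in one go, a small wrapper script might look like the sketch below. The commands are copied from the "Check all" step of the CI workflow; the `run_checks` function and the `DRY_RUN` switch are hypothetical additions for illustration and are not part of this repository.

```shell
#!/usr/bin/env bash

# Commands mirrored from the "Check all" CI step.
CHECKS=(
  "hawkeye check"
  "taplo format --check"
  "typos"
  "cargo +nightly fmt --all -- --check"
  "cargo +nightly clippy --all-targets --all-features -- -D warnings"
)

# Hypothetical helper: print each command, then run it unless DRY_RUN=1.
# Stops at the first failing check and returns a non-zero status.
run_checks() {
  local cmd
  for cmd in "${CHECKS[@]}"; do
    echo "+ ${cmd}"
    if [[ "${DRY_RUN:-0}" != "1" ]]; then
      # Intentional word splitting: each entry is a simple command line.
      ${cmd} || return 1
    fi
  done
}

# Usage (from the repository root):
#   run_checks              # run every check
#   DRY_RUN=1 run_checks    # only print the commands
```

This assumes the tools above are already installed and on your `PATH`.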

## Code of Conduct

We expect all community members to follow our [Code of Conduct](https://www.apache.org/foundation/policies/conduct.html).
16 changes: 15 additions & 1 deletion Cargo.toml
@@ -18,7 +18,21 @@
[package]
name = "datasketches"
version = "0.1.0"

edition = "2024"
rust-version = "1.85.0"
Review comment (Member, Author):
edition 2024 requires at least MSRV=1.85.

I personally like moving MSRV eagerly because the old toolchain doesn't get maintained anyway. But other library authors would like to stay as low as possible to keep a wide user adoption range, which, in turn, blocks their dependencies from bumping MSRV.

This is another topic we may discuss later.

Review comment (Member):
I'm happy to start with 1.85.


categories = ["data-structures", "algorithms"]
description = "A software library of stochastic streaming algorithms (a.k.a. sketches)"
homepage = "https://datasketches.apache.org"
keywords = ["sketch", "hyperloglog", "probabilistic"]
license = "Apache-2.0"
readme = "README.md"
repository = "https://github.com/apache/datasketches-rust"

[package.metadata.docs.rs]
all-features = true
rustdoc-args = ["--cfg", "docsrs"]

[dependencies]
-mur3 = "0.1"
+mur3 = { version = "0.1.0" }
4 changes: 2 additions & 2 deletions README.md
@@ -17,13 +17,13 @@
under the License.
-->

-# Apache<sup>&reg;</sup> DataSketches&trade; Core Rust Library Component
+# Apache® DataSketches Core Rust Library Component

> [!WARNING]
>
> This repository is under early development. Use it with caution!

-This is the core Rust component of the DataSketches library. It contains a subset of the sketching algorithms and can be accessed directly from user applications.
+This is the core Rust component of the DataSketches library. It contains a subset of the sketching algorithms and can be accessed directly from user applications.

Note that we have parallel core library components for Java, C++, Python, and Go implementations of many of the same sketch algorithms:
