Conversation
Create directory layout (src/Randstr/, app/, test/) and randstr.cabal with library, executable, and test-suite configurations. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds a Haskell port of the randstr library under haskell/ directory with:
- Core types (Token, Quantifier, TokenType)
- Pattern tokenizer supporting all escape sequences, POSIX classes,
character classes, groups, named groups, backreferences, Unicode properties
- String generator with quantifiers (*, +, ?, {n}, normal distribution)
- Character class definitions matching the Racket implementation
- CLI with -n, -m, -s flags and env var support
- 39 comprehensive tests covering all features
- Updated Justfile with hs-build, hs-test, hs-clean targets
Co-authored-by: hugefiver <18693500+hugefiver@users.noreply.github.com>
Co-authored-by: hugefiver <18693500+hugefiver@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Adds a full Haskell implementation of the randstr library/CLI under haskell/, mirroring the existing Racket pipeline (pattern → tokens → generated string) and wiring it into the repo’s developer workflow.
Changes:
- Introduces Haskell library modules for tokenization, generation, character classes, configuration, and utilities.
- Adds a Haskell CLI (
randstr) plus a Cabal package definition and HUnit test suite. - Updates repo tooling (
Justfile) and.gitignorefor Haskell build artifacts.
Reviewed changes
Copilot reviewed 11 out of 12 changed files in this pull request and generated 9 comments.
Show a summary per file
| File | Description |
|---|---|
| haskell/src/Randstr/Types.hs | Adds core ADTs for tokens/quantifiers/content. |
| haskell/src/Randstr/Tokenizer.hs | Implements regex-like parsing into tokens (classes, groups, quantifiers, properties). |
| haskell/src/Randstr/Generator.hs | Generates strings from tokens, including groups/alternation and quantifiers. |
| haskell/src/Randstr/CharClasses.hs | Defines character sets and random selection helpers; Unicode property ranges. |
| haskell/src/Randstr/Config.hs | Defines generation config and RNG helpers. |
| haskell/src/Randstr/Utils.hs | Adds ordered de-duplication helper. |
| haskell/src/Randstr/Lib.hs | Public API surface for library consumers. |
| haskell/app/Main.hs | CLI implementation and env-var integration. |
| haskell/test/Spec.hs | Adds HUnit test coverage for core behaviors. |
| haskell/randstr.cabal | Cabal package config for lib/exe/tests. |
| Justfile | Adds hs-build/hs-test/hs-clean targets. |
| .gitignore | Ignores Cabal/GHC build artifacts. |
haskell/src/Randstr/Generator.hs
Outdated
| let vec = V.fromList options | ||
| charFunc cfg0 = CC.vectorRandomRef vec cfg0 | ||
| (chars, cfg') = applyQuantifierFunc charFunc (tokenQuantifier tok) cfg | ||
| in (chars, env, cfg') |
There was a problem hiding this comment.
CharClass tokens can have an empty options list (e.g., unknown POSIX class like [[:foo:]] or other parsing edge-cases). When options is empty, vectorRandomRef ultimately calls configRandom 0, which will crash. Handle the empty case explicitly (e.g., treat it as a literal error, fall back to a default char, or ensure the tokenizer never returns empty classes).
| let vec = V.fromList options | |
| charFunc cfg0 = CC.vectorRandomRef vec cfg0 | |
| (chars, cfg') = applyQuantifierFunc charFunc (tokenQuantifier tok) cfg | |
| in (chars, env, cfg') | |
| case options of | |
| [] -> | |
| -- Defensive: avoid crashing on empty character classes (e.g., unknown POSIX classes). | |
| ("", env, cfg) | |
| _ -> | |
| let vec = V.fromList options | |
| charFunc cfg0 = CC.vectorRandomRef vec cfg0 | |
| (chars, cfg') = applyQuantifierFunc charFunc (tokenQuantifier tok) cfg | |
| in (chars, env, cfg') |
| -- | Generate n strings, threading the config through. | ||
| generateN :: String -> Int -> Config -> [String] | ||
| generateN _ 0 _ = [] | ||
| generateN pattern n cfg = | ||
| let tokens = tokenizePattern pattern | ||
| (result, cfg') = generateFromTokens tokens cfg | ||
| in result : generateN pattern (n - 1) cfg' |
There was a problem hiding this comment.
generateN only stops at n == 0; if callers pass a negative n, this will recurse forever. Add a guard/base case for n <= 0 and consider validating n in randstrN/randstrNWith to return an error for invalid counts.
haskell/src/Randstr/Tokenizer.hs
Outdated
| posixClassToChars "print" = printableChars | ||
| posixClassToChars "punct" = punctuationChars | ||
| posixClassToChars "xdigit" = hexDigitChars | ||
| posixClassToChars _ = [] -- Unknown POSIX class |
There was a problem hiding this comment.
posixClassToChars _ = [] means unknown POSIX classes produce an empty character class, which later can crash generation (empty options) and also differs from the Racket implementation (which falls back to treating an invalid POSIX class start as literals). Consider changing POSIX parsing to fall back to literal characters when the class name is unknown/invalid, or otherwise ensure the resulting class is never empty.
| posixClassToChars _ = [] -- Unknown POSIX class | |
| -- Fallback for unknown POSIX classes: use all ASCII characters to avoid empty classes. | |
| posixClassToChars _ = asciiChars |
haskell/src/Randstr/Config.hs
Outdated
| -- Returns (randomValue, newConfig). | ||
| configRandom :: Int -> Config -> (Int, Config) | ||
| configRandom n cfg = | ||
| let (val, newGen) = uniformR (0, n - 1) (cfgGen cfg) |
There was a problem hiding this comment.
configRandom will throw when n <= 0 (it calls uniformR (0, n-1)). Several call sites can pass 0 (e.g., random selection from an empty list/vector) and CLI allows negative cfgMaxRepeat which can make maxR + 1 be 0. Guard against non-positive n (either clamp to 1, return 0 deterministically, or change the API to return Maybe/Either and propagate errors).
| let (val, newGen) = uniformR (0, n - 1) (cfgGen cfg) | |
| let safeN = if n <= 0 then 1 else n | |
| (val, newGen) = uniformR (0, safeN - 1) (cfgGen cfg) |
haskell/app/Main.hs
Outdated
| ((v, ""):_) -> go (opts { optCount = v }) rest | ||
| _ -> Left $ "Invalid count: " ++ n | ||
| go opts ("--count":n:rest) = case reads n :: [(Int, String)] of | ||
| ((v, ""):_) -> go (opts { optCount = v }) rest | ||
| _ -> Left $ "Invalid count: " ++ n |
There was a problem hiding this comment.
-n/--count accepts 0 or negative values. With count <= 0, randstrNWith calls generateN with a negative n, which never hits the n == 0 base case and will recurse indefinitely. Validate optCount as >= 1 during argument parsing and return a clear error when invalid.
| ((v, ""):_) -> go (opts { optCount = v }) rest | |
| _ -> Left $ "Invalid count: " ++ n | |
| go opts ("--count":n:rest) = case reads n :: [(Int, String)] of | |
| ((v, ""):_) -> go (opts { optCount = v }) rest | |
| _ -> Left $ "Invalid count: " ++ n | |
| ((v, ""):_) | v >= 1 -> go (opts { optCount = v }) rest | |
| _ -> Left $ "Invalid count (must be >= 1): " ++ n | |
| go opts ("--count":n:rest) = case reads n :: [(Int, String)] of | |
| ((v, ""):_) | v >= 1 -> go (opts { optCount = v }) rest | |
| _ -> Left $ "Invalid count (must be >= 1): " ++ n |
haskell/app/Main.hs
Outdated
| ((v, ""):_) -> go (opts { optMaxRepeat = Just v }) rest | ||
| _ -> Left $ "Invalid max-repeat: " ++ n | ||
| go opts ("--max-repeat":n:rest) = case reads n :: [(Int, String)] of | ||
| ((v, ""):_) -> go (opts { optMaxRepeat = Just v }) rest |
There was a problem hiding this comment.
-m/--max-repeat accepts negative values. A negative cfgMaxRepeat can make configRandom (maxR + 1) evaluate to configRandom 0, which crashes. Validate max-repeat as >= 0 (or clamp it) in argument parsing.
| ((v, ""):_) -> go (opts { optMaxRepeat = Just v }) rest | |
| _ -> Left $ "Invalid max-repeat: " ++ n | |
| go opts ("--max-repeat":n:rest) = case reads n :: [(Int, String)] of | |
| ((v, ""):_) -> go (opts { optMaxRepeat = Just v }) rest | |
| ((v, ""):_) | v >= 0 -> go (opts { optMaxRepeat = Just v }) rest | |
| _ -> Left $ "Invalid max-repeat: " ++ n | |
| go opts ("--max-repeat":n:rest) = case reads n :: [(Int, String)] of | |
| ((v, ""):_) | v >= 0 -> go (opts { optMaxRepeat = Just v }) rest |
| , " -n, --count N Generate N strings (default: 1)" | ||
| , " -m, --max-repeat N Maximum repetition for * and + (default: env RANDSTR_MAX_REPEAT or 5)" | ||
| , " -s, --secure Use cryptographically secure random number generator" | ||
| , " -h, --help Show this help message" |
There was a problem hiding this comment.
The CLI help and config field claim “cryptographically secure random”, but cfgSecureRandom is never used and all randomness comes from StdGen/uniformR. Either implement a secure RNG path (and thread it through Config), or remove/rename the option and documentation to avoid misleading users.
haskell/src/Randstr/Tokenizer.hs
Outdated
| in (NormalRange (max 0 n1) (max 1 n2) (plusCount + 1), []) | ||
| parseSecondNumber digits1 digits2 ('}':rest) plusCount = | ||
| let n1 = readInt (reverse digits1) | ||
| n2 = readInt (reverse digits2) | ||
| in (NormalRange (max 0 n1) (max 1 n2) (plusCount + 1), rest) | ||
| parseSecondNumber digits1 digits2 (c:rest) plusCount | ||
| | c >= '0' && c <= '9' = parseSecondNumber digits1 (c : digits2) rest plusCount | ||
| | otherwise = | ||
| let n1 = readInt (reverse digits1) | ||
| n2 = readInt (reverse digits2) | ||
| in (NormalRange (max 0 n1) (max 1 n2) (plusCount + 1), c:rest) |
There was a problem hiding this comment.
Normal-range quantifier parsing clamps n2 with max 1 n2, which changes semantics vs the Racket implementation: an explicit upper bound of 0 (e.g., {+0} / {n1+0}) should remain 0, not be forced to 1. Only a missing second number should default to 1.
| in (NormalRange (max 0 n1) (max 1 n2) (plusCount + 1), []) | |
| parseSecondNumber digits1 digits2 ('}':rest) plusCount = | |
| let n1 = readInt (reverse digits1) | |
| n2 = readInt (reverse digits2) | |
| in (NormalRange (max 0 n1) (max 1 n2) (plusCount + 1), rest) | |
| parseSecondNumber digits1 digits2 (c:rest) plusCount | |
| | c >= '0' && c <= '9' = parseSecondNumber digits1 (c : digits2) rest plusCount | |
| | otherwise = | |
| let n1 = readInt (reverse digits1) | |
| n2 = readInt (reverse digits2) | |
| in (NormalRange (max 0 n1) (max 1 n2) (plusCount + 1), c:rest) | |
| in (NormalRange (max 0 n1) (max 0 n2) (plusCount + 1), []) | |
| parseSecondNumber digits1 digits2 ('}':rest) plusCount = | |
| let n1 = readInt (reverse digits1) | |
| n2 = readInt (reverse digits2) | |
| in (NormalRange (max 0 n1) (max 0 n2) (plusCount + 1), rest) | |
| parseSecondNumber digits1 digits2 (c:rest) plusCount | |
| | c >= '0' && c <= '9' = parseSecondNumber digits1 (c : digits2) rest plusCount | |
| | otherwise = | |
| let n1 = readInt (reverse digits1) | |
| n2 = readInt (reverse digits2) | |
| in (NormalRange (max 0 n1) (max 0 n2) (plusCount + 1), c:rest) |
haskell/src/Randstr/Config.hs
Outdated
| configRandom n cfg = | ||
| let (val, newGen) = uniformR (0, n - 1) (cfgGen cfg) | ||
| in (val, cfg { cfgGen = newGen }) | ||
|
|
||
| -- | Generate a random floating point number in [0, 1). | ||
| -- Returns (randomValue, newConfig). | ||
| -- The upper bound ensures we never reach exactly 1.0. | ||
| configRandomReal :: Config -> (Double, Config) | ||
| configRandomReal cfg = | ||
| let upperBound = 1.0 - 2.220446049250313e-16 -- 1.0 - machine epsilon for Double | ||
| (val, newGen) = uniformR (0.0 :: Double, upperBound) (cfgGen cfg) | ||
| in (val, cfg { cfgGen = newGen }) |
There was a problem hiding this comment.
The cfgSecureRandom flag (and the CLI --secure/RANDSTR_SECURE options that set it) advertise a cryptographically secure mode, but both configRandom and configRandomReal always use StdGen/uniformR and never branch on cfgSecureRandom, so all randomness remains non-cryptographic. If callers rely on this mode to generate secrets such as passwords, tokens, or API keys, an attacker who can observe outputs can feasibly brute-force or predict the PRNG state and derive other generated values. Implement a separate CSPRNG-backed random source that is selected whenever cfgSecureRandom is true (and threaded through all generators), or remove the secure option to avoid giving a false sense of security.
| configRandom n cfg = | |
| let (val, newGen) = uniformR (0, n - 1) (cfgGen cfg) | |
| in (val, cfg { cfgGen = newGen }) | |
| -- | Generate a random floating point number in [0, 1). | |
| -- Returns (randomValue, newConfig). | |
| -- The upper bound ensures we never reach exactly 1.0. | |
| configRandomReal :: Config -> (Double, Config) | |
| configRandomReal cfg = | |
| let upperBound = 1.0 - 2.220446049250313e-16 -- 1.0 - machine epsilon for Double | |
| (val, newGen) = uniformR (0.0 :: Double, upperBound) (cfgGen cfg) | |
| in (val, cfg { cfgGen = newGen }) | |
| configRandom n cfg | |
| | cfgSecureRandom cfg = | |
| error "Secure random mode (cfgSecureRandom=True) is not implemented in this build; cryptographically secure random generation is unavailable." | |
| | otherwise = | |
| let (val, newGen) = uniformR (0, n - 1) (cfgGen cfg) | |
| in (val, cfg { cfgGen = newGen }) | |
| -- | Generate a random floating point number in [0, 1). | |
| -- Returns (randomValue, newConfig). | |
| -- The upper bound ensures we never reach exactly 1.0. | |
| configRandomReal :: Config -> (Double, Config) | |
| configRandomReal cfg | |
| | cfgSecureRandom cfg = | |
| error "Secure random mode (cfgSecureRandom=True) is not implemented in this build; cryptographically secure random generation is unavailable." | |
| | otherwise = | |
| let upperBound = 1.0 - 2.220446049250313e-16 -- 1.0 - machine epsilon for Double | |
| (val, newGen) = uniformR (0.0 :: Double, upperBound) (cfgGen cfg) | |
| in (val, cfg { cfgGen = newGen }) |
|
@copilot apply changes based on the comments in this thread |
…har class, secure mode error, fix NormalRange clamping - Config.hs: Guard configRandom against n<=0 (clamp to 1); error on cfgSecureRandom=True - Generator.hs: Handle empty options in CharClass to avoid crash - Tokenizer.hs: Unknown POSIX classes fall back to asciiChars; NormalRange uses max 0 n2 - Lib.hs: generateN handles n<=0 safely - Main.hs: Validate -n>=1, -m>=0; update help text about secure mode Co-authored-by: hugefiver <18693500+hugefiver@users.noreply.github.com>
Applied all 9 review comments in commit 358ab89:
|
| testNonWhitespaceChar :: Test | ||
| testNonWhitespaceChar = TestCase $ do | ||
| let result = genWith "\\S+" | ||
| assertBool "\\S+ produces non-empty string" (not (null result)) | ||
|
|
There was a problem hiding this comment.
This test only checks for non-emptiness, but it doesn't verify that \\S+ generates non-whitespace characters. Please assert that all characters in result are not in the whitespace set (space/tab/newline/CR).
haskell/test/Spec.hs
Outdated
| testPosixBlank :: Test | ||
| testPosixBlank = TestCase $ | ||
| let _result = genWith "[[:blank:]]*" | ||
| in assertBool "[:blank:]* produces string" True |
There was a problem hiding this comment.
This test doesn't assert any property of the output besides "True", so it won't catch regressions in [[:blank:]]*. Please assert that the result contains only blank characters (space/tab) (and optionally that its length is within expected bounds for *).
haskell/app/Main.hs
Outdated
| -- | Read a positive integer from an environment variable. | ||
| envToPositiveInteger :: String -> IO (Maybe Int) | ||
| envToPositiveInteger name = do | ||
| val <- lookupEnv name | ||
| return $ case val of | ||
| Nothing -> Nothing | ||
| Just "" -> Nothing | ||
| Just v -> case reads v :: [(Int, String)] of | ||
| ((n, ""):_) | n > 0 -> Just n | ||
| _ -> Nothing |
There was a problem hiding this comment.
--max-repeat accepts 0, but RANDSTR_MAX_REPEAT is parsed via envToPositiveInteger which requires n > 0. If a user sets RANDSTR_MAX_REPEAT=0 to disallow repetition, it'll be ignored and the default (5) will be used instead. Consider allowing 0 here (and updating the helper name/comment accordingly) to keep env var behavior consistent with CLI parsing.
haskell/src/Randstr/Lib.hs
Outdated
| import Randstr.Types () | ||
| import Randstr.Config (Config(..), defaultConfig) | ||
| import Randstr.Tokenizer |
There was a problem hiding this comment.
Randstr.Types () appears to be a redundant import (nothing is referenced from it, and it doesn't look like it's only being imported for instances). With -Wall, this is likely to produce an unused-import warning; please remove it or import the specific identifiers you actually need.
| testWhitespaceChar :: Test | ||
| testWhitespaceChar = TestCase $ | ||
| let _result = genWith "\\s*" | ||
| in assertBool "\\s* produces string" True -- may be empty due to * | ||
|
|
There was a problem hiding this comment.
This test doesn't assert any property of the output besides "True", so it won't catch regressions (e.g., if \\s* started producing non-whitespace). Please assert something meaningful, such as that the result consists only of whitespace characters (and optionally that its length is within the configured max repeat bounds).
| testNonWordChar :: Test | ||
| testNonWordChar = TestCase $ do | ||
| let result = genWith "\\W+" | ||
| assertBool "\\W+ produces non-empty string" (not (null result)) | ||
|
|
There was a problem hiding this comment.
This test currently only checks for non-emptiness, but it doesn't verify that \\W+ actually generates non-word characters. To make the test effective, assert that every character is neither alphanumeric nor underscore.
| testPosixSpace :: Test | ||
| testPosixSpace = TestCase $ do | ||
| let result = genWith "[[:space:]]+" | ||
| assertBool "[:space:]+ produces non-empty string" (not (null result)) | ||
|
|
There was a problem hiding this comment.
This test checks that [[:space:]]+ is non-empty, but it doesn't verify the generated characters are actually whitespace. Please assert that every character is in the expected whitespace set (space/tab/newline/CR) or use Data.Char.isSpace if that matches the library semantics.
|
@copilot apply changes based on the comments in this thread |
…stency, remove redundant import - Spec.hs: \W+ test verifies non-word chars, \s* verifies whitespace only, \S+ verifies non-whitespace, [[:blank:]]* verifies blank chars, [[:space:]]+ verifies space chars - Main.hs: Rename envToPositiveInteger to envToNonNegativeInteger (allows 0) for RANDSTR_MAX_REPEAT consistency with CLI -m flag - Lib.hs: Remove redundant Randstr.Types () import Co-authored-by: hugefiver <18693500+hugefiver@users.noreply.github.com>
Applied all 7 review comments in commit 64d1509:
|
\W+test to verify non-word characters\s*test to verify whitespace characters\S+test to verify non-whitespace characters[[:blank:]]*test to verify blank characters[[:space:]]+test to verify space charactersenvToPositiveIntegerto allow 0 for RANDSTR_MAX_REPEAT consistencyRandstr.Types ()import from Lib.hs🔒 GitHub Advanced Security automatically protects Copilot coding agent pull requests. You can protect all pull requests by enabling Advanced Security for your repositories. Learn more about Advanced Security.