feat: Elliptic Curve Method (ECM) for integer factorization by s-celles · Pull Request #174 · JuliaMath/Primes.jl

s-celles · 2026-03-12T09:29:02Z

ecm_factor(n, B1, num_curves) — attempts to find a non-trivial factor of n using ECM Stage 1.
Two optimized code paths, selected automatically by dispatch:
- BigInt: in-place GMP arithmetic via ccall (ECMBuffers with 12 preallocated BigInts, zero allocation in the hot loop).
- Generic T<:Integer: works for UInt128, Int128, and BitIntegers.jl types (UInt256, UInt512, etc.). Modular multiplication uses widen(T) to stay in fixed-width LLVM integers, avoiding BigInt heap allocation.
Overflow-safe modular arithmetic (_addmod, _submod, _mulmod) that handles unsigned types near typemax correctly.
Shared helpers (_ecm_prime_powers, _ecm_suyama) factored out to avoid duplication between the two paths

Helps #159
Part of #173 with some BitIntegers.jl types support

The previous implementation was GMP-only, forcing callers to convert to BigInt even when a fixed-width type (e.g. UInt256 from BitIntegers.jl) would be much faster. Changes in src/ecm.jl: - Add _addmod / _submod / _mulmod generic helpers that use widen(T) for overflow-safe arithmetic. For BitIntegers types the widen chain stays in LLVM fixed-width integers (UInt256 → UInt512 → UInt1024), so no BigInt allocation occurs in the inner loop. - Add generic _ecm_add / _ecm_double / _ecm_scalar_mul (functional style, no mutation) that work for any T<:Integer. - Add generic ecm_factor(n::T, ...) where T<:Integer dispatching to the generic helpers. - Retain the BigInt-specific ecm_factor(n::BigInt, ...) overload using GMP in-place arithmetic (ECMBuffers) for large BigInt factorizations. The _ecm_add! / _ecm_double! / _ecm_scalar_mul! (bang variants) are BigInt-only and called exclusively from this overload. - Remove MontgomeryCurvePoint struct (unused after refactor). - Update _ecm_scalar_mul! to accept k::Integer (was k::BigInt). Changes in test/runtests.jl: - Add testset 'ECM generic integer types' covering UInt128 (widen→BigInt path) and UInt256/Int256 from BitIntegers.jl (widen→UInt512 LLVM path). Changes in Project.toml: - Add BitIntegers as a test-only dependency ([extras] + [targets]).

Use 'a >= n - b ? a - (n - b) : a + b' instead of computing a + b first, which could silently wrap around for unsigned T when n is close to typemax(T). Also eliminate redundant _addmod/_submod calls in _ecm_double by computing the sum and difference of PX, PZ once.

Extract _ecm_prime_powers(B1) and _ecm_suyama(n) shared helpers used by both the generic and BigInt-optimized ecm_factor methods. The Suyama parametrization (curve setup) is not performance-critical, so both paths now share the generic modular arithmetic version. Only the hot Montgomery ladder inner loop retains a separate BigInt in-place implementation.

codecov · 2026-03-12T09:31:10Z

Codecov Report

❌ Patch coverage is 97.26776% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 94.58%. Comparing base (20a92a0) to head (48e23f9).

Files with missing lines	Patch %	Lines
src/ecm.jl	97.26%	5 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #174      +/-   ##
==========================================
+ Coverage   93.08%   94.58%   +1.49%     
==========================================
  Files           2        3       +1     
  Lines         463      646     +183     
==========================================
+ Hits          431      611     +180     
- Misses         32       35       +3

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

oscardssmith · 2026-03-12T13:22:19Z

What are the benchmarks for the GMP optimized version vs the generic version (with BigInt)?

s-celles · 2026-03-12T14:37:47Z

Both interesting and disappointing results @oscardssmith

❯ julia --project=benchmarks benchmarks/run_all.jl                   

######################################################################
# Running: ecm_microbenchmarks.jl
######################################################################

======================================================================
  ECM Micro-Benchmarks
======================================================================

  Note: widen(UInt128) == BigInt   → still allocates (no LLVM benefit)
        widen(UInt256) == UInt512  → pure LLVM fixed-width arithmetic

--- _mulmod ---
  BigInt GMP in-place (_mulmod!):   39.4 ns,  0 bytes
  BigInt generic    (_mulmod):      152.8 ns,  136 bytes
  UInt128 generic   (_mulmod):      295.4 ns,  304 bytes
  UInt256 generic   (_mulmod):      434.1 ns,  0 bytes

--- _ecm_add ---
  BigInt GMP in-place (_ecm_add!):  243.5 ns,  0 bytes
  BigInt generic    (_ecm_add):     1300.0 ns,  1560 bytes
  UInt128 generic   (_ecm_add):     1795.8 ns,  1800 bytes
  UInt256 generic   (_ecm_add):     2611.1 ns,  0 bytes

--- _ecm_double ---
  BigInt GMP in-place (_ecm_double!): 182.2 ns,  0 bytes
  BigInt generic    (_ecm_double):    959.3 ns,  1056 bytes
  UInt128 generic   (_ecm_double):    1487.5 ns,  1504 bytes
  UInt256 generic   (_ecm_double):    2087.5 ns,  0 bytes

--- _ecm_scalar_mul (k=997, prime) ---
  BigInt GMP in-place (_ecm_scalar_mul!): 4.35 μs,  0 bytes
  BigInt generic    (_ecm_scalar_mul):    21.83 μs,  26240 bytes
  UInt128 generic   (_ecm_scalar_mul):    31.21 μs,  31400 bytes
  UInt256 generic   (_ecm_scalar_mul):    45.46 μs,  0 bytes

======================================================================
  Summary: median time
  (GMP in-place → Generic BigInt → UInt128 → UInt256)
======================================================================
  _mulmod:         39.0 ns → 153.0 ns (0.26×) → 295.0 ns (0.13×) → 434.0 ns (0.09×)
  _ecm_add:        244.0 ns → 1300.0 ns (0.19×) → 1796.0 ns (0.14×) → 2611.0 ns (0.09×)
  _ecm_double:     182.0 ns → 959.0 ns (0.19×) → 1488.0 ns (0.12×) → 2088.0 ns (0.09×)
  _ecm_scalar_mul: 4.3 μs → 21.8 μs (0.2×) → 31.2 μs (0.14×) → 45.5 μs (0.1×)
======================================================================

######################################################################
# Running: ecm_gmp_vs_generic.jl
######################################################################

======================================================================
  ECM Benchmark: GMP In-Place vs Generic (Allocating) BigInt
======================================================================

--- small (~40-digit semiprime) ---
  n      = 824633726603436045817  (21 digits)
  B1     = 2000
  curves = 50

  [GMP in-place]
BenchmarkTools.Trial: 20 samples with 1 evaluation per sample.
 Range (min … max):  422.458 μs … 17.570 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):       5.898 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):     5.878 ms ±  4.574 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▁ █▁   █  ▁    ▁    ▁ ▁ ▁▁ ▁▁   ▁▁     ▁                  ▁  
  ██▁██▁▁▁█▁▁█▁▁▁▁█▁▁▁▁█▁█▁██▁██▁▁▁██▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
  422 μs          Histogram: frequency by time         17.6 ms <

 Memory estimate: 13.11 KiB, allocs estimate: 169.

  [Generic (allocating)]
BenchmarkTools.Trial: 20 samples with 1 evaluation per sample.
 Range (min … max):   1.910 ms … 87.844 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     20.396 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   25.224 ms ± 24.158 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

   █   ▃       ▃                                               
  ▇█▁▇▁█▇▁▁▇▁▇▁█▇▁▁▇▇▁▁▁▁▇▁▇▁▁▁▁▁▁▁▇▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▇▁▁▁▇ ▁
  1.91 ms         Histogram: frequency by time        87.8 ms <

 Memory estimate: 2.13 MiB, allocs estimate: 85227.

  Summary (median):
    GMP in-place : 5.9 ms,  13.1 KiB
    Generic alloc: 20.4 ms,  2180.6 KiB
    Speedup      : 0.29×
    Memory saved : 99.4%

--- medium (~55-digit semiprime) ---
  n      = 632375198520019741607405909210007359304371321  (45 digits)
  B1     = 11000
  curves = 200

  [GMP in-place]
BenchmarkTools.Trial: 20 samples with 1 evaluation per sample.
 Range (min … max):  653.500 μs … 891.417 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     680.125 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   691.173 μs ±  52.258 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▃▃   ▃ ▃█                                                      
  ██▇▇▇█▇██▇▇▁▇▁▁▁▁▁▁▁▁▁▁▁▁▁▇▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▇ ▁
  654 μs           Histogram: frequency by time          891 μs <

 Memory estimate: 37.48 KiB, allocs estimate: 174.

  [Generic (allocating)]
BenchmarkTools.Trial: 20 samples with 1 evaluation per sample.
 Range (min … max):  2.509 ms …  2.770 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     2.695 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.665 ms ± 80.181 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▁ ▁            ▁▁  ▁ ▁▁▁               ▁▁▁ █ ▁▁  ▁ ▁  ▁ ▁▁  
  █▁█▁▁▁▁▁▁▁▁▁▁▁▁██▁▁█▁███▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁███▁█▁██▁▁█▁█▁▁█▁██ ▁
  2.51 ms        Histogram: frequency by time        2.77 ms <

 Memory estimate: 2.74 MiB, allocs estimate: 96238.

  Summary (median):
    GMP in-place : 0.68 ms,  37.5 KiB
    Generic alloc: 2.7 ms,  2804.1 KiB
    Speedup      : 0.25×
    Memory saved : 98.7%

======================================================================
  Done.
======================================================================

######################################################################
# Running: factorization_endtoend.jl
######################################################################

======================================================================
  End-to-End factor(n) Benchmarks
======================================================================

--- ~12 digits  (12 digits) ---
  n = 824633720831
  factor(n) = Primes.Factorization{BigInt}(824633720831 => 1)
BenchmarkTools.Trial: 20 samples with 1 evaluation per sample.
 Range (min … max):  1.666 μs … 11.875 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     2.021 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.491 μs ±  2.214 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

   ▆█                                                         
  ▇██▁▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄ ▁
  1.67 μs        Histogram: frequency by time        11.9 μs <

 Memory estimate: 320 bytes, allocs estimate: 10.
  Median: 2.02 μs, 0.3 KiB allocated

--- ~20 digits  (20 digits) ---
  n = 99999999999999999877
  factor(n) = Primes.Factorization{BigInt}(17 => 1, 29 => 1, 433 => 1, 468452093746633 => 1)
BenchmarkTools.Trial: 20 samples with 1 evaluation per sample.
 Range (min … max):  4.077 ms …  4.388 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     4.263 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   4.260 ms ± 86.169 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▁            ▁▁▁  ▁     ▁     ▁  █▁▁ ▁▁▁   ▁     ▁  ▁  █ ▁  
  █▁▁▁▁▁▁▁▁▁▁▁▁███▁▁█▁▁▁▁▁█▁▁▁▁▁█▁▁███▁███▁▁▁█▁▁▁▁▁█▁▁█▁▁█▁█ ▁
  4.08 ms        Histogram: frequency by time        4.39 ms <

 Memory estimate: 5.40 MiB, allocs estimate: 262166.
  Median: 4.26 ms, 5530.4 KiB allocated

--- ~30 digits  (21 digits) ---
  n = 824633726603436045817
  factor(n) = Primes.Factorization{BigInt}(1000000007 => 1, 824633720831 => 1)
BenchmarkTools.Trial: 20 samples with 1 evaluation per sample.
 Range (min … max):  10.213 ms … 106.894 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     33.195 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   41.554 ms ±  25.428 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

             ▄  █                                               
  ▆▁▁▆▁▁▁▁▁▆▆█▆▆█▁▆▁▁▁▁▁▆▁▁▁▁▆▁▁▆▆▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▆▆ ▁
  10.2 ms         Histogram: frequency by time          107 ms <

 Memory estimate: 11.72 MiB, allocs estimate: 545740.
  Median: 33.19 ms, 12001.4 KiB allocated

--- ~40 digits  (33 digits) ---
  n = 215230289874699999735266743454119
  factor(n) = Primes.Factorization{BigInt}(17 => 1, 29 => 1, 433 => 1, 6763 => 1, 10627 => 1, 29947 => 1, 468452093746633 => 1)
BenchmarkTools.Trial: 20 samples with 1 evaluation per sample.
 Range (min … max):  4.085 ms …  4.351 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     4.174 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   4.175 ms ± 60.993 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▁  ▁▁▁   ▁▁   █  ▁▁▁ ▁ ▁█ ▁  ▁▁   ▁                      ▁  
  █▁▁███▁▁▁██▁▁▁█▁▁███▁█▁██▁█▁▁██▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
  4.09 ms        Histogram: frequency by time        4.35 ms <

 Memory estimate: 5.41 MiB, allocs estimate: 263523.
  Median: 4.17 ms, 5542.3 KiB allocated

--- ~45 digits  (45 digits) ---
  n = 632375198520019741607405909210007359304371321
  factor(n) = Primes.Factorization{BigInt}(1667 => 1, 2633 => 1, 31723 => 1, 190699 => 1, 9338360851 => 1, 2550322198519739393 => 1)
BenchmarkTools.Trial: 5 samples with 1 evaluation per sample.
 Range (min … max):   26.052 ms … 177.475 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):      90.075 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   103.597 ms ±  66.699 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █           █            █                              █   █  
  █▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁█ ▁
  26.1 ms          Histogram: frequency by time          177 ms <

 Memory estimate: 31.38 MiB, allocs estimate: 1412523.
  Median: 90.08 ms, 32137.9 KiB allocated

======================================================================
  Done.
======================================================================

src/ecm.jl

Co-authored-by: Oscar Smith <oscardssmith@gmail.com>

oscardssmith · 2026-03-12T18:54:18Z

With the widemul removed, I'm seeing

n = Int128(nextprime(2^32))*nextprime(2^30)
B1 = 2000
curves = 50

julia> @benchmark Primes.ecm_factor($n, $B1, $curves)
BenchmarkTools.Trial: 4577 samples with 1 evaluation per sample.
 Range (min … max):   84.421 μs … 10.381 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     764.037 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):     1.091 ms ±  1.092 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▅▁▂▂
  ███████▇▅▆▆▆▅▆▅▄▅▄▄▄▄▄▄▄▃▃▃▃▃▃▃▃▃▃▂▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▂▂ ▃
  84.4 μs         Histogram: frequency by time         5.26 ms <

 Memory estimate: 8.98 KiB, allocs estimate: 13.

julia> @benchmark Primes.ecm_factor($(big(n)), $B1, $curves)
BenchmarkTools.Trial: 1094 samples with 1 evaluation per sample.
 Range (min … max):  337.064 μs … 26.380 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):       3.261 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):     4.568 ms ±  4.306 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▃ ▁ ▂
  ████▆█▇▇▆█▇▅▆▄▅▅▅▄▅▄▄▄▄▄▄▄▃▃▃▃▂▃▄▂▂▂▂▃▂▂▂▂▃▂▂▂▃▁▂▂▂▂▂▃▂▂▁▁▁▃ ▃
  337 μs          Histogram: frequency by time         19.4 ms <

 Memory estimate: 12.37 KiB, allocs estimate: 146.

julia> @benchmark ecm_factor_generic($(big(n)), $B1, $curves)
BenchmarkTools.Trial: 112 samples with 1 evaluation per sample.
 Range (min … max):   1.964 ms …    1.589 s  ┊ GC (min … max):  0.00% … 66.66%
 Time  (median):     22.564 ms               ┊ GC (median):     0.00%
 Time  (mean ± σ):   54.252 ms ± 181.760 ms  ┊ GC (mean ± σ):  29.27% ±  8.66%

  █▆▅▁
  ████▆▆▄▁▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄ ▄
  1.96 ms       Histogram: log(frequency) by time       1.13 s <

 Memory estimate: 1.94 MiB, allocs estimate: 76268.

So for numbers <2^64 the Int128 path seems very helpful.

It is very much a shame that the optimized BigInt versions are so much faster. Ideally we would be able to write them in a more uniform way.

oscardssmith · 2026-03-12T19:02:51Z

src/Primes.jl

       nextprime, nextprimes, prevprime, prevprimes, prime, prodfactors, radical, totient

 include("factorization.jl")
+include("ecm.jl")


we need to actually include ecm_factor in addition (or instead?) of polard_factor. Ideally ecm_factor would be always better and then we could delete polard, but I don't know if we're that lucky.

Well my opinion about this is a bit different. Removing this kind of code remove the possibility for users to "experiment" different factorization methods with a given number.
See #175

fair enough. We should, however, probalby remove pollard and add ecm to the default factor algorithm

oscardssmith · 2026-03-12T19:51:13Z

Eventually, it would be good to have the 2 stage version of ECM, but the single stage version should already be an improvement.

s-celles · 2026-03-12T21:07:31Z

I found some time in the evening to look at stage 2 implementation.
But some tests are broken now ;-(

oscardssmith · 2026-03-12T21:12:12Z

I think the problem might be that you are testing ecm_factor on Int128s larger than 2^64 (which is invalid).

s-celles · 2026-03-17T06:48:37Z

This PR is a concrete motivation for #176. _ecm_prime_powers currently calls primes(B1) which allocates a full vector upfront:

function _ecm_prime_powers(B1::Int)
    result = Int[]
    for p in primes(B1)   # allocates all primes ≤ B1
        pk = p
        while pk * p <= B1
            pk *= p
        end
        push!(result, pk)
    end
    result
end

With primes_from from #176, the inner loop becomes lazy — no upfront allocation:

function _ecm_prime_powers(B1::Int)
    result = Int[]
    for p in Iterators.takewhile(<=(B1), primes_from(2))
        pk = p
        while pk * p <= B1
            pk *= p
        end
        push!(result, pk)
    end
    result
end

For typical ECM usage with large B1, this would reduce memory pressure in the Stage 1 setup. And if primes_from is eventually backed by a segmented sieve (as discussed in the planned refactor), the benefit would come for free — without any change to the ECM code itself.

oscardssmith · 2026-03-17T12:59:24Z

IMO this is a fairly weak motivation as a B1 of 11_000 requires very little memory. I do agree that it is a good idea though.

s-celles added 4 commits March 12, 2026 09:30

ECM with BigInt and GMP

174300f

s-celles added 2 commits March 12, 2026 15:25

perf: benchmark

3c5030f

perf: benchmark - fix speedup calc

48e23f9

oscardssmith reviewed Mar 12, 2026

View reviewed changes

src/ecm.jl Outdated Show resolved Hide resolved

Apply suggestions from code review

fbd2b2b

Co-authored-by: Oscar Smith <oscardssmith@gmail.com>

oscardssmith reviewed Mar 12, 2026

View reviewed changes

feat: stage 2 continuation

6841648

Uh oh!

Conversation

s-celles commented Mar 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codecov bot commented Mar 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

oscardssmith commented Mar 12, 2026

Uh oh!

s-celles commented Mar 12, 2026

Uh oh!

Uh oh!

oscardssmith commented Mar 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

oscardssmith Mar 12, 2026

Choose a reason for hiding this comment

Uh oh!

s-celles Mar 12, 2026

Choose a reason for hiding this comment

Uh oh!

oscardssmith Mar 12, 2026

Choose a reason for hiding this comment

Uh oh!

oscardssmith commented Mar 12, 2026

Uh oh!

s-celles commented Mar 12, 2026

Uh oh!

oscardssmith commented Mar 12, 2026

Uh oh!

s-celles commented Mar 17, 2026

Uh oh!

oscardssmith commented Mar 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

s-celles commented Mar 12, 2026 •

edited

Loading

codecov bot commented Mar 12, 2026 •

edited

Loading

oscardssmith commented Mar 12, 2026 •

edited

Loading