Skip to content

feat: Elliptic Curve Method (ECM) for integer factorization#174

Open
s-celles wants to merge 8 commits intoJuliaMath:mainfrom
s-celles:feature_ecm
Open

feat: Elliptic Curve Method (ECM) for integer factorization#174
s-celles wants to merge 8 commits intoJuliaMath:mainfrom
s-celles:feature_ecm

Conversation

@s-celles
Copy link
Copy Markdown

@s-celles s-celles commented Mar 12, 2026

  • ecm_factor(n, B1, num_curves) — attempts to find a non-trivial factor of n using ECM Stage 1.
  • Two optimized code paths, selected automatically by dispatch:
    • BigInt: in-place GMP arithmetic via ccall (ECMBuffers with 12 preallocated BigInts, zero allocation in the hot loop).
    • Generic T<:Integer: works for UInt128, Int128, and BitIntegers.jl types (UInt256, UInt512, etc.). Modular multiplication uses widen(T) to stay in fixed-width LLVM integers, avoiding BigInt heap allocation.
  • Overflow-safe modular arithmetic (_addmod, _submod, _mulmod) that handles unsigned types near typemax correctly.
  • Shared helpers (_ecm_prime_powers, _ecm_suyama) factored out to avoid duplication between the two paths

Helps #159
Part of #173 with some BitIntegers.jl types support

The previous implementation was GMP-only, forcing callers to convert to
BigInt even when a fixed-width type (e.g. UInt256 from BitIntegers.jl)
would be much faster.

Changes in src/ecm.jl:
- Add _addmod / _submod / _mulmod generic helpers that use widen(T) for
  overflow-safe arithmetic. For BitIntegers types the widen chain stays
  in LLVM fixed-width integers (UInt256 → UInt512 → UInt1024), so no
  BigInt allocation occurs in the inner loop.
- Add generic _ecm_add / _ecm_double / _ecm_scalar_mul (functional style,
  no mutation) that work for any T<:Integer.
- Add generic ecm_factor(n::T, ...) where T<:Integer dispatching to the
  generic helpers.
- Retain the BigInt-specific ecm_factor(n::BigInt, ...) overload using
  GMP in-place arithmetic (ECMBuffers) for large BigInt factorizations.
  The _ecm_add! / _ecm_double! / _ecm_scalar_mul! (bang variants) are
  BigInt-only and called exclusively from this overload.
- Remove MontgomeryCurvePoint struct (unused after refactor).
- Update _ecm_scalar_mul! to accept k::Integer (was k::BigInt).

Changes in test/runtests.jl:
- Add testset 'ECM generic integer types' covering UInt128 (widen→BigInt
  path) and UInt256/Int256 from BitIntegers.jl (widen→UInt512 LLVM path).

Changes in Project.toml:
- Add BitIntegers as a test-only dependency ([extras] + [targets]).
Use 'a >= n - b ? a - (n - b) : a + b' instead of computing a + b first,
which could silently wrap around for unsigned T when n is close to
typemax(T).

Also eliminate redundant _addmod/_submod calls in _ecm_double by
computing the sum and difference of PX, PZ once.
Extract _ecm_prime_powers(B1) and _ecm_suyama(n) shared helpers used by
both the generic and BigInt-optimized ecm_factor methods. The Suyama
parametrization (curve setup) is not performance-critical, so both paths
now share the generic modular arithmetic version. Only the hot Montgomery
ladder inner loop retains a separate BigInt in-place implementation.
@codecov
Copy link
Copy Markdown

codecov bot commented Mar 12, 2026

Codecov Report

❌ Patch coverage is 97.26776% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 94.58%. Comparing base (20a92a0) to head (48e23f9).

Files with missing lines Patch % Lines
src/ecm.jl 97.26% 5 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #174      +/-   ##
==========================================
+ Coverage   93.08%   94.58%   +1.49%     
==========================================
  Files           2        3       +1     
  Lines         463      646     +183     
==========================================
+ Hits          431      611     +180     
- Misses         32       35       +3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@oscardssmith
Copy link
Copy Markdown
Member

What are the benchmarks for the GMP optimized version vs the generic version (with BigInt)?

@s-celles
Copy link
Copy Markdown
Author

Both interesting and disappointing results @oscardssmith

❯ julia --project=benchmarks benchmarks/run_all.jl                   

######################################################################
# Running: ecm_microbenchmarks.jl
######################################################################

======================================================================
  ECM Micro-Benchmarks
======================================================================

  Note: widen(UInt128) == BigInt   → still allocates (no LLVM benefit)
        widen(UInt256) == UInt512  → pure LLVM fixed-width arithmetic

--- _mulmod ---
  BigInt GMP in-place (_mulmod!):   39.4 ns,  0 bytes
  BigInt generic    (_mulmod):      152.8 ns,  136 bytes
  UInt128 generic   (_mulmod):      295.4 ns,  304 bytes
  UInt256 generic   (_mulmod):      434.1 ns,  0 bytes

--- _ecm_add ---
  BigInt GMP in-place (_ecm_add!):  243.5 ns,  0 bytes
  BigInt generic    (_ecm_add):     1300.0 ns,  1560 bytes
  UInt128 generic   (_ecm_add):     1795.8 ns,  1800 bytes
  UInt256 generic   (_ecm_add):     2611.1 ns,  0 bytes

--- _ecm_double ---
  BigInt GMP in-place (_ecm_double!): 182.2 ns,  0 bytes
  BigInt generic    (_ecm_double):    959.3 ns,  1056 bytes
  UInt128 generic   (_ecm_double):    1487.5 ns,  1504 bytes
  UInt256 generic   (_ecm_double):    2087.5 ns,  0 bytes

--- _ecm_scalar_mul (k=997, prime) ---
  BigInt GMP in-place (_ecm_scalar_mul!): 4.35 μs,  0 bytes
  BigInt generic    (_ecm_scalar_mul):    21.83 μs,  26240 bytes
  UInt128 generic   (_ecm_scalar_mul):    31.21 μs,  31400 bytes
  UInt256 generic   (_ecm_scalar_mul):    45.46 μs,  0 bytes

======================================================================
  Summary: median time
  (GMP in-place → Generic BigInt → UInt128 → UInt256)
======================================================================
  _mulmod:         39.0 ns → 153.0 ns (0.26×) → 295.0 ns (0.13×) → 434.0 ns (0.09×)
  _ecm_add:        244.0 ns → 1300.0 ns (0.19×) → 1796.0 ns (0.14×) → 2611.0 ns (0.09×)
  _ecm_double:     182.0 ns → 959.0 ns (0.19×) → 1488.0 ns (0.12×) → 2088.0 ns (0.09×)
  _ecm_scalar_mul: 4.3 μs → 21.8 μs (0.2×) → 31.2 μs (0.14×) → 45.5 μs (0.1×)
======================================================================

######################################################################
# Running: ecm_gmp_vs_generic.jl
######################################################################

======================================================================
  ECM Benchmark: GMP In-Place vs Generic (Allocating) BigInt
======================================================================

--- small (~40-digit semiprime) ---
  n      = 824633726603436045817  (21 digits)
  B1     = 2000
  curves = 50

  [GMP in-place]
BenchmarkTools.Trial: 20 samples with 1 evaluation per sample.
 Range (min … max):  422.458 μs … 17.570 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):       5.898 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):     5.878 ms ±  4.574 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▁ █▁   █  ▁    ▁    ▁ ▁ ▁▁ ▁▁   ▁▁     ▁                  ▁  
  ██▁██▁▁▁█▁▁█▁▁▁▁█▁▁▁▁█▁█▁██▁██▁▁▁██▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
  422 μs          Histogram: frequency by time         17.6 ms <

 Memory estimate: 13.11 KiB, allocs estimate: 169.

  [Generic (allocating)]
BenchmarkTools.Trial: 20 samples with 1 evaluation per sample.
 Range (min … max):   1.910 ms … 87.844 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     20.396 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   25.224 ms ± 24.158 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

   █   ▃       ▃                                               
  ▇█▁▇▁█▇▁▁▇▁▇▁█▇▁▁▇▇▁▁▁▁▇▁▇▁▁▁▁▁▁▁▇▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▇▁▁▁▇ ▁
  1.91 ms         Histogram: frequency by time        87.8 ms <

 Memory estimate: 2.13 MiB, allocs estimate: 85227.

  Summary (median):
    GMP in-place : 5.9 ms,  13.1 KiB
    Generic alloc: 20.4 ms,  2180.6 KiB
    Speedup      : 0.29×
    Memory saved : 99.4%

--- medium (~55-digit semiprime) ---
  n      = 632375198520019741607405909210007359304371321  (45 digits)
  B1     = 11000
  curves = 200

  [GMP in-place]
BenchmarkTools.Trial: 20 samples with 1 evaluation per sample.
 Range (min … max):  653.500 μs … 891.417 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     680.125 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   691.173 μs ±  52.258 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▃▃   ▃ ▃█                                                      
  ██▇▇▇█▇██▇▇▁▇▁▁▁▁▁▁▁▁▁▁▁▁▁▇▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▇ ▁
  654 μs           Histogram: frequency by time          891 μs <

 Memory estimate: 37.48 KiB, allocs estimate: 174.

  [Generic (allocating)]
BenchmarkTools.Trial: 20 samples with 1 evaluation per sample.
 Range (min … max):  2.509 ms …  2.770 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     2.695 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.665 ms ± 80.181 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▁ ▁            ▁▁  ▁ ▁▁▁               ▁▁▁ █ ▁▁  ▁ ▁  ▁ ▁▁  
  █▁█▁▁▁▁▁▁▁▁▁▁▁▁██▁▁█▁███▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁███▁█▁██▁▁█▁█▁▁█▁██ ▁
  2.51 ms        Histogram: frequency by time        2.77 ms <

 Memory estimate: 2.74 MiB, allocs estimate: 96238.

  Summary (median):
    GMP in-place : 0.68 ms,  37.5 KiB
    Generic alloc: 2.7 ms,  2804.1 KiB
    Speedup      : 0.25×
    Memory saved : 98.7%

======================================================================
  Done.
======================================================================

######################################################################
# Running: factorization_endtoend.jl
######################################################################

======================================================================
  End-to-End factor(n) Benchmarks
======================================================================

--- ~12 digits  (12 digits) ---
  n = 824633720831
  factor(n) = Primes.Factorization{BigInt}(824633720831 => 1)
BenchmarkTools.Trial: 20 samples with 1 evaluation per sample.
 Range (min … max):  1.666 μs … 11.875 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     2.021 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.491 μs ±  2.214 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

   ▆█                                                         
  ▇██▁▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄ ▁
  1.67 μs        Histogram: frequency by time        11.9 μs <

 Memory estimate: 320 bytes, allocs estimate: 10.
  Median: 2.02 μs, 0.3 KiB allocated

--- ~20 digits  (20 digits) ---
  n = 99999999999999999877
  factor(n) = Primes.Factorization{BigInt}(17 => 1, 29 => 1, 433 => 1, 468452093746633 => 1)
BenchmarkTools.Trial: 20 samples with 1 evaluation per sample.
 Range (min … max):  4.077 ms …  4.388 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     4.263 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   4.260 ms ± 86.169 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▁            ▁▁▁  ▁     ▁     ▁  █▁▁ ▁▁▁   ▁     ▁  ▁  █ ▁  
  █▁▁▁▁▁▁▁▁▁▁▁▁███▁▁█▁▁▁▁▁█▁▁▁▁▁█▁▁███▁███▁▁▁█▁▁▁▁▁█▁▁█▁▁█▁█ ▁
  4.08 ms        Histogram: frequency by time        4.39 ms <

 Memory estimate: 5.40 MiB, allocs estimate: 262166.
  Median: 4.26 ms, 5530.4 KiB allocated

--- ~30 digits  (21 digits) ---
  n = 824633726603436045817
  factor(n) = Primes.Factorization{BigInt}(1000000007 => 1, 824633720831 => 1)
BenchmarkTools.Trial: 20 samples with 1 evaluation per sample.
 Range (min … max):  10.213 ms … 106.894 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     33.195 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   41.554 ms ±  25.428 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

             ▄  █                                               
  ▆▁▁▆▁▁▁▁▁▆▆█▆▆█▁▆▁▁▁▁▁▆▁▁▁▁▆▁▁▆▆▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▆▆ ▁
  10.2 ms         Histogram: frequency by time          107 ms <

 Memory estimate: 11.72 MiB, allocs estimate: 545740.
  Median: 33.19 ms, 12001.4 KiB allocated

--- ~40 digits  (33 digits) ---
  n = 215230289874699999735266743454119
  factor(n) = Primes.Factorization{BigInt}(17 => 1, 29 => 1, 433 => 1, 6763 => 1, 10627 => 1, 29947 => 1, 468452093746633 => 1)
BenchmarkTools.Trial: 20 samples with 1 evaluation per sample.
 Range (min … max):  4.085 ms …  4.351 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     4.174 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   4.175 ms ± 60.993 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▁  ▁▁▁   ▁▁   █  ▁▁▁ ▁ ▁█ ▁  ▁▁   ▁                      ▁  
  █▁▁███▁▁▁██▁▁▁█▁▁███▁█▁██▁█▁▁██▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▁
  4.09 ms        Histogram: frequency by time        4.35 ms <

 Memory estimate: 5.41 MiB, allocs estimate: 263523.
  Median: 4.17 ms, 5542.3 KiB allocated

--- ~45 digits  (45 digits) ---
  n = 632375198520019741607405909210007359304371321
  factor(n) = Primes.Factorization{BigInt}(1667 => 1, 2633 => 1, 31723 => 1, 190699 => 1, 9338360851 => 1, 2550322198519739393 => 1)
BenchmarkTools.Trial: 5 samples with 1 evaluation per sample.
 Range (min … max):   26.052 ms … 177.475 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):      90.075 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   103.597 ms ±  66.699 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █           █            █                              █   █  
  █▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁█ ▁
  26.1 ms          Histogram: frequency by time          177 ms <

 Memory estimate: 31.38 MiB, allocs estimate: 1412523.
  Median: 90.08 ms, 32137.9 KiB allocated

======================================================================
  Done.
======================================================================

Co-authored-by: Oscar Smith <oscardssmith@gmail.com>
@oscardssmith
Copy link
Copy Markdown
Member

oscardssmith commented Mar 12, 2026

With the widemul removed, I'm seeing

n = Int128(nextprime(2^32))*nextprime(2^30)
B1 = 2000
curves = 50

julia> @benchmark Primes.ecm_factor($n, $B1, $curves)
BenchmarkTools.Trial: 4577 samples with 1 evaluation per sample.
 Range (min … max):   84.421 μs … 10.381 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     764.037 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):     1.091 ms ±  1.092 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▅▁▂▂
  ███████▇▅▆▆▆▅▆▅▄▅▄▄▄▄▄▄▄▃▃▃▃▃▃▃▃▃▃▂▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▂▂ ▃
  84.4 μs         Histogram: frequency by time         5.26 ms <

 Memory estimate: 8.98 KiB, allocs estimate: 13.

julia> @benchmark Primes.ecm_factor($(big(n)), $B1, $curves)
BenchmarkTools.Trial: 1094 samples with 1 evaluation per sample.
 Range (min … max):  337.064 μs … 26.380 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):       3.261 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):     4.568 ms ±  4.306 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▃ ▁ ▂
  ████▆█▇▇▆█▇▅▆▄▅▅▅▄▅▄▄▄▄▄▄▄▃▃▃▃▂▃▄▂▂▂▂▃▂▂▂▂▃▂▂▂▃▁▂▂▂▂▂▃▂▂▁▁▁▃ ▃
  337 μs          Histogram: frequency by time         19.4 ms <

 Memory estimate: 12.37 KiB, allocs estimate: 146.

julia> @benchmark ecm_factor_generic($(big(n)), $B1, $curves)
BenchmarkTools.Trial: 112 samples with 1 evaluation per sample.
 Range (min … max):   1.964 ms …    1.589 s  ┊ GC (min … max):  0.00% … 66.66%
 Time  (median):     22.564 ms               ┊ GC (median):     0.00%
 Time  (mean ± σ):   54.252 ms ± 181.760 ms  ┊ GC (mean ± σ):  29.27% ±  8.66%

  █▆▅▁
  ████▆▆▄▁▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄ ▄
  1.96 ms       Histogram: log(frequency) by time       1.13 s <

 Memory estimate: 1.94 MiB, allocs estimate: 76268.

So for numbers <2^64 the Int128 path seems very helpful.

It is very much a shame that the optimized BigInt versions are so much faster. Ideally we would be able to write them in a more uniform way.

nextprime, nextprimes, prevprime, prevprimes, prime, prodfactors, radical, totient

include("factorization.jl")
include("ecm.jl")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to actually include ecm_factor in addition (or instead?) of polard_factor. Ideally ecm_factor would be always better and then we could delete polard, but I don't know if we're that lucky.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well my opinion about this is a bit different. Removing this kind of code remove the possibility for users to "experiment" different factorization methods with a given number.
See #175

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fair enough. We should, however, probalby remove pollard and add ecm to the default factor algorithm

@oscardssmith
Copy link
Copy Markdown
Member

Eventually, it would be good to have the 2 stage version of ECM, but the single stage version should already be an improvement.

@s-celles
Copy link
Copy Markdown
Author

I found some time in the evening to look at stage 2 implementation.
But some tests are broken now ;-(

@oscardssmith
Copy link
Copy Markdown
Member

I think the problem might be that you are testing ecm_factor on Int128s larger than 2^64 (which is invalid).

@s-celles
Copy link
Copy Markdown
Author

This PR is a concrete motivation for #176. _ecm_prime_powers currently calls primes(B1) which allocates a full vector upfront:

function _ecm_prime_powers(B1::Int)
    result = Int[]
    for p in primes(B1)   # allocates all primes ≤ B1
        pk = p
        while pk * p <= B1
            pk *= p
        end
        push!(result, pk)
    end
    result
end

With primes_from from #176, the inner loop becomes lazy — no upfront allocation:

function _ecm_prime_powers(B1::Int)
    result = Int[]
    for p in Iterators.takewhile(<=(B1), primes_from(2))
        pk = p
        while pk * p <= B1
            pk *= p
        end
        push!(result, pk)
    end
    result
end

For typical ECM usage with large B1, this would reduce memory pressure in the Stage 1 setup. And if primes_from is eventually backed by a segmented sieve (as discussed in the planned refactor), the benefit would come for free — without any change to the ECM code itself.

@oscardssmith
Copy link
Copy Markdown
Member

IMO this is a fairly weak motivation as a B1 of 11_000 requires very little memory. I do agree that it is a good idea though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants