Use float64 in Jenks natural breaks internals (#1100) #1101

Merged
brendancol merged 3 commits into master from issue-1100 on Mar 31, 2026
@brendancol
Contributor

Summary

Fixes #1100. The Jenks natural breaks algorithm used float32 for its internal matrices and bin edge array. The naive variance formula sum_squares - (sum * sum) / w subtracts two large, nearly equal numbers, so in float32 it loses all significant digits when the data has a large offset relative to its spread (elevations around 1000 m, projected coordinates in the millions, etc.).
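To make the cancellation concrete, here is a minimal sketch that accumulates the same formula in float32 and in float64. The helper name `naive_sum_sq_dev` and the sample values are illustrative assumptions, not code from this PR.

```python
import numpy as np


def naive_sum_sq_dev(values, dtype):
    """Sum of squared deviations via sum_squares - (sum * sum) / w.

    Every intermediate is kept in `dtype`, mimicking accumulators that
    are stored in float32 or float64. Hypothetical helper for illustration.
    """
    w = dtype(0)
    total = dtype(0)
    total_sq = dtype(0)
    for v in values:
        val = dtype(v)
        w += dtype(1)
        total += val
        total_sq += val * val
    return total_sq - (total * total) / w


# Five points with a spread of 4 sitting on an offset of one million
# (assumed sample, in the spirit of "projected coordinates in the millions").
data = [1_000_000.0, 1_000_001.0, 1_000_002.0, 1_000_003.0, 1_000_004.0]

# Two-pass reference: subtract the mean first, then square.
reference = float(np.sum((np.array(data) - np.mean(data)) ** 2))

print("reference:", reference)
print("float64  :", float(naive_sum_sq_dev(data, np.float64)))
print("float32  :", float(naive_sum_sq_dev(data, np.float32)))
```

In float64 every intermediate here is an exactly representable integer, so the naive formula matches the reference. In float32 the squares are near 1e12, where adjacent representable values are 65,536 apart, so a true answer of 10 cannot survive the subtraction.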

Changed four float32 sites to float64:

  • lower_class_limits matrix dtype
  • var_combinations matrix dtype
  • val = np.float32(data[i4]) cast removed
  • kclass bin edge array dtype

Test plan

  • test_natural_breaks_large_offset_1100: five tight clusters at offset 100,000 with spread of 10 -- all 5 classes must be separated cleanly
  • Full test_classify.py suite: 85 passed, no regressions
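The scenario behind the new test can be sketched as follows. This is not the test from the PR or the library's implementation: `jenks_upper_edges` is a hypothetical, simplified dynamic-programming version of natural breaks written only for illustration, and the cluster layout (five clusters 2.5 apart with a jitter of 0.1) is an assumed stand-in for "offset 100,000, spread of 10". It keeps the same naive formula but runs it in float64.

```python
import numpy as np


def jenks_upper_edges(values, k):
    """Upper edge of each of k classes that minimise the total
    within-class sum of squared deviations (simple O(k * n^2) DP).

    Illustrative only; all accumulators are float64.
    """
    x = np.sort(np.asarray(values, dtype=np.float64))
    n = x.size

    # Prefix sums of x and x^2, so any segment cost is O(1).
    s1 = np.zeros(n + 1, dtype=np.float64)
    s2 = np.zeros(n + 1, dtype=np.float64)
    s1[1:] = np.cumsum(x)
    s2[1:] = np.cumsum(x * x)

    def cost(i, j):
        # Naive formula for the segment x[i:j]: sum_squares - sum^2 / w.
        w = j - i
        s = s1[j] - s1[i]
        return (s2[j] - s2[i]) - (s * s) / w

    # best[c, j]: minimal cost of splitting x[:j] into c classes.
    best = np.full((k + 1, n + 1), np.inf)
    split = np.zeros((k + 1, n + 1), dtype=int)
    best[0, 0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                cand = best[c - 1, i] + cost(i, j)
                if cand < best[c, j]:
                    best[c, j] = cand
                    split[c, j] = i

    # Walk the split table backwards to recover each class's last value.
    edges = []
    j = n
    for c in range(k, 0, -1):
        edges.append(x[j - 1])
        j = split[c, j]
    return np.array(edges[::-1])


OFFSET = 100_000.0                      # large offset, as in the test plan
jitter = np.linspace(-0.1, 0.1, 10)     # assumed within-cluster jitter
clusters = [OFFSET + 2.5 * c + jitter for c in range(5)]
data = np.concatenate(clusters)

edges = jenks_upper_edges(data, k=5)
expected = np.array([cluster.max() for cluster in clusters])
print(edges)
```

With float64 accumulators the rounding error in each segment cost is far smaller than the penalty for merging two clusters, so each class edge lands on the largest value of its own cluster.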

The Jenks matrices and bin edge array used float32, causing the naive
variance formula (sum_squares - sum*sum/w) to lose all significant
digits when data had a large offset relative to its spread. Changed
lower_class_limits, var_combinations, val cast, and kclass to float64.

test_natural_breaks_large_offset_1100: five tight clusters offset by
100,000 must be separated into 5 distinct classes. With float32
internals, the variance calculation lost all signal and merged clusters.
github-actions bot added the performance label (PR touches performance-sensitive code) on Mar 30, 2026
brendancol merged commit 629d533 into master on Mar 31, 2026
11 checks passed
Development

Successfully merging this pull request may close these issues.

Jenks natural breaks uses float32 internally, wrong bin edges for offset data