@proost
Contributor

@proost proost commented Dec 10, 2025

previous PR: apache/datasketches-characterization#89

The current Go theta sketch performs poorly. One of the reasons is the use of errors.Is.

Go generally recommends errors.Is for error checks, but we return sentinel errors, so we can compare them directly and avoid errors.Is.

benchmark code:

func BenchmarkUpdateSketchErrorHandling(b *testing.B) {
	b.Run("UpdateInt64_NewKeys", func(b *testing.B) {
		sketch, _ := NewQuickSelectUpdateSketch()
		b.ResetTimer()
		for i := 0; i < b.N; i++ {
			_ = sketch.UpdateInt64(int64(i))
		}
	})

	b.Run("UpdateInt64_DuplicateKeys", func(b *testing.B) {
		sketch, _ := NewQuickSelectUpdateSketch()
		_ = sketch.UpdateInt64(42)
		b.ResetTimer()
		for i := 0; i < b.N; i++ {
			_ = sketch.UpdateInt64(42)
		}
	})
}
  • Current UpdateSketch
goos: darwin
goarch: arm64
pkg: github.com/apache/datasketches-go/theta
cpu: Apple M1 Pro
BenchmarkUpdateSketchErrorHandling
BenchmarkUpdateSketchErrorHandling/UpdateInt64_NewKeys
BenchmarkUpdateSketchErrorHandling/UpdateInt64_NewKeys-10         	 7506458	       150.9 ns/op
BenchmarkUpdateSketchErrorHandling/UpdateInt64_DuplicateKeys
BenchmarkUpdateSketchErrorHandling/UpdateInt64_DuplicateKeys-10   	165562837	         7.287 ns/op
  • New UpdateSketch
BenchmarkUpdateSketchErrorHandling
BenchmarkUpdateSketchErrorHandling/UpdateInt64_NewKeys
BenchmarkUpdateSketchErrorHandling/UpdateInt64_NewKeys-10         	 7503918	       143.6 ns/op
BenchmarkUpdateSketchErrorHandling/UpdateInt64_DuplicateKeys
BenchmarkUpdateSketchErrorHandling/UpdateInt64_DuplicateKeys-10   	166055470	         7.227 ns/op

benchmark code:

func BenchmarkUnionUpdate(b *testing.B) {
	b.Run("Update_NewKeys", func(b *testing.B) {
		sketches := make([]*QuickSelectUpdateSketch, b.N)
		for i := 0; i < b.N; i++ {
			sketch, _ := NewQuickSelectUpdateSketch()
			sketch.UpdateInt64(int64(i))
			sketches[i] = sketch
		}

		union, _ := NewUnion()
		b.ResetTimer()
		for i := 0; i < b.N; i++ {
			_ = union.Update(sketches[i])
		}
	})

	b.Run("Update_DuplicateKeys", func(b *testing.B) {
		sketches := make([]*QuickSelectUpdateSketch, b.N)
		for i := 0; i < b.N; i++ {
			sketch, _ := NewQuickSelectUpdateSketch()
			sketch.UpdateInt64(42)
			sketches[i] = sketch
		}

		union, _ := NewUnion()
		b.ResetTimer()
		for i := 0; i < b.N; i++ {
			_ = union.Update(sketches[i])
		}
	})
}
  • Current Union Update
goos: darwin
goarch: arm64
pkg: github.com/apache/datasketches-go/theta
cpu: Apple M1 Pro
BenchmarkUnionUpdate
BenchmarkUnionUpdate/Update_NewKeys
BenchmarkUnionUpdate/Update_NewKeys-10         	 8556276	       457.7 ns/op
BenchmarkUnionUpdate/Update_DuplicateKeys
BenchmarkUnionUpdate/Update_DuplicateKeys-10   	 7176099	       443.6 ns/op
  • New Union Update
goos: darwin
goarch: arm64
pkg: github.com/apache/datasketches-go/theta
cpu: Apple M1 Pro
BenchmarkUnionUpdate
BenchmarkUnionUpdate/Update_NewKeys
BenchmarkUnionUpdate/Update_NewKeys-10         	 8699008	       430.7 ns/op
BenchmarkUnionUpdate/Update_DuplicateKeys
BenchmarkUnionUpdate/Update_DuplicateKeys-10   	 7546215	       292.6 ns/op
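
For a quick read of the numbers above, the relative change in ns/op can be computed as below. This is only a helper sketch: pctChange is a name made up here, and the inputs are the single-run figures copied from the outputs above, so small differences may be within noise.

```go
package main

import "fmt"

// pctChange returns the relative change from before to after, in percent.
// Negative values mean the new code is faster.
func pctChange(before, after float64) float64 {
	return (after - before) / before * 100
}

func main() {
	// ns/op values copied from the benchmark outputs above.
	results := []struct {
		name          string
		before, after float64
	}{
		{"UpdateSketch/UpdateInt64_NewKeys", 150.9, 143.6},
		{"UpdateSketch/UpdateInt64_DuplicateKeys", 7.287, 7.227},
		{"Union/Update_NewKeys", 457.7, 430.7},
		{"Union/Update_DuplicateKeys", 443.6, 292.6},
	}
	for _, r := range results {
		fmt.Printf("%-40s %+.1f%%\n", r.name, pctChange(r.before, r.after))
	}
}
```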

@proost
Contributor Author

proost commented Dec 10, 2025

Suggestion:

If a breaking change is acceptable, since we have not officially released the theta sketch yet, I'd like to change the returned formatted error (like this) to a sentinel error. Changing to a sentinel error avoids a heap allocation when the hash exceeds theta.
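
A rough sketch of why this helps, under stated assumptions: ErrHashExceedsTheta, checkFormatted and checkSentinel are hypothetical names, and the hash >= theta condition is only a placeholder for the real rejection check. A formatted error is built fresh on every rejection, while a sentinel is one package-level value that is returned as-is. testing.AllocsPerRun from the standard library is used to count allocations on the rejection path.

```go
package main

import (
	"errors"
	"fmt"
	"testing"
)

// ErrHashExceedsTheta is a hypothetical sentinel; the real name may differ.
var ErrHashExceedsTheta = errors.New("hash exceeds theta")

// sink keeps results alive so the calls cannot be optimized away.
var sink error

// checkFormatted builds a new error value on every rejection.
func checkFormatted(hash, theta uint64) error {
	if hash >= theta { // placeholder condition, assumed for illustration
		return fmt.Errorf("hash %d exceeds theta %d", hash, theta)
	}
	return nil
}

// checkSentinel returns the same preallocated error value on every rejection.
func checkSentinel(hash, theta uint64) error {
	if hash >= theta { // placeholder condition, assumed for illustration
		return ErrHashExceedsTheta
	}
	return nil
}

// allocsPerCall reports the average number of heap allocations per call
// of check when the hash is rejected.
func allocsPerCall(check func(hash, theta uint64) error) float64 {
	return testing.AllocsPerRun(1000, func() {
		sink = check(1<<40, 1<<20)
	})
}

func main() {
	fmt.Println("formatted allocs/op:", allocsPerCall(checkFormatted))
	fmt.Println("sentinel allocs/op: ", allocsPerCall(checkSentinel))
}
```

The cost of the sentinel is that the error message no longer carries the hash and theta values, which is the breaking part of the change for anyone matching on the message text.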

@freakyzoidberg
Copy link
Member

That's cool, one of the benefits of not having done an official release yet.

@freakyzoidberg freakyzoidberg merged commit 606825f into apache:main Dec 10, 2025
1 check passed
@proost proost deleted the perf-use-direct-comparison branch December 10, 2025 10:40