-
Notifications
You must be signed in to change notification settings - Fork 13
Switch spark connector to use Weaviate Java client6 #130
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Orca Security Scan Summary
| Status | Check | Issues by priority | |
|---|---|---|---|
| Infrastructure as Code | View in Orca | ||
| SAST | View in Orca | ||
| Secrets | View in Orca | ||
| Vulnerabilities | View in Orca |
e43d19b to
fbbff3e
Compare
fbbff3e to
7c70d30
Compare
54875c1 to
694010a
Compare
694010a to
d0d0a72
Compare
| assert(weaviateObject.getProperties.get("content") == "Sam") | ||
| assert(weaviateObject.getProperties.get("wordCount") == 5) | ||
| assert(weaviateObject.getTenant == "TenantA") | ||
| assert(weaviateObject.properties().get("title").equals("Sam")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| assert(weaviateObject.properties().get("title").equals("Sam")) | |
| assert(weaviateObject.properties().get("title") == ("Sam")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure about Scala, but in Java you actually want to compare strings with .equals, because it compares the contents themselves, whereas == compares object references (which will always differ).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nvmnd, seems Scala doesn't have that catch 👍
g-despot
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good
build.sbt
Outdated
| lazy val scalaCollectionCompatVersion = "2.13.0" | ||
| lazy val sparkVersion = "4.0.1" | ||
| lazy val grpcNettyShadedVersion = "1.76.0" | ||
| lazy val weaviateClient6Version = "6.0.0-RC2" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should be able to use 6.0.0 now ✨
| if (result.isEmpty) throw WeaviateClassNotFoundError(s"Collection ${className} was not found.") | ||
| val properties = result.get().properties().asScala |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: if you want to lean into Optional syntax hard
| if (result.isEmpty) throw WeaviateClassNotFoundError(s"Collection ${className} was not found.") | |
| val properties = result.get().properties().asScala | |
| if (result.isEmpty) throw WeaviateClassNotFoundError(s"Collection ${className} was not found.") | |
| val properties = result | |
| .map(c -> c.properties().asScala) | |
| .orElseThrow(() -> WeaviateClassNotFoundError(s"Collection ${className} was not found.")) |
| if (weaviateOptions.id == null) { | ||
| builder.id(java.util.UUID.randomUUID.toString) | ||
| builder.uuid(java.util.UUID.randomUUID.toString) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: this if-block is not strictly necessary
I believe this was originally added because BatchInsert over gRPC does not allow the client to leave out the UUID (unlike REST-based inserts).
It's something I also stumbled upon while working on client6, so in the new version every WeaviateObject gets a random UUID by default.
| val allVectors = ListBuffer.empty[Vectors] | ||
| if (vector != null) { | ||
| allVectors += Vectors.of(vector) | ||
| } | ||
| if (vectors.nonEmpty) { | ||
| builder.vectors(vectors.map { case (key, arr) => key -> arr.map(Float.box) }.asJava) | ||
| val arr = vectors.map { case (key, arr) => Vectors.of(key, arr) }.toArray | ||
| allVectors ++= arr | ||
| } | ||
| if (multiVectors.nonEmpty) { | ||
| builder.multiVectors(multiVectors.map { case (key, multiVector) => key -> multiVector.map { vec => { vec.map(Float.box) }} }.toMap.asJava) | ||
| val arr = multiVectors.map { case (key, multiVector) => Vectors.of(key, multiVector) }.toArray | ||
| allVectors ++= arr | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note: there's now Vectors::withVectors method, that should let you do sth like:
val vectors = new Vectors();
if (vector != null) {
vectors = vectors.withVectors(Vectors.of(vector))
}
if (vectors.nonEmpty) {
// ... so on ...
}But if the list-syntax works well for you there's also no real need to change it, bc withVectors creates more intermediate objects (more GC work).
| dt = p.dataTypes().get(0) | ||
| } | ||
| }) | ||
| if ((dt == "geoCoordinates" || dt == "phoneNumber") && valueFromField.isInstanceOf[String]) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: you can also use data type names from the library itself, e.g. DataType.GEO_COORDINATES or DataType.PHONE_NUMBER
|
|
||
| override def abort(): Unit = { | ||
| // TODO rollback previously written batch results if issue occured | ||
| // TODO rollback previously written batch results if issue occurred |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: Is that something Weaviate supports? Rolling things back?
| assert(weaviateObject.getProperties.get("content") == "Sam") | ||
| assert(weaviateObject.getProperties.get("wordCount") == 5) | ||
| assert(weaviateObject.getTenant == "TenantA") | ||
| assert(weaviateObject.properties().get("title").equals("Sam")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nvmnd, seems Scala doesn't have that catch 👍
| val moreNested = new Property.Builder("moreNested", DataType.OBJECT). | ||
| nestedProperties(Property.text("a"), Property.number("b")).build() | ||
|
|
||
| val nestedObjects = new Property.Builder("nestedObjects", DataType.OBJECT_ARRAY) | ||
| .nestedProperties( | ||
| Property.bool("nestedBoolLvl2"), | ||
| Property.date("nestedDateLvl2"), | ||
| Property.numberArray("nestedNumbersLvl2"), | ||
| moreNested | ||
| ).build() | ||
|
|
||
| new Property.Builder("objectProperty", dataType) | ||
| .nestedProperties( | ||
| Property.integer("nestedInt"), | ||
| Property.number("nestedNumber"), | ||
| Property.text("nestedText"), | ||
| nestedObjects | ||
| ).build() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: you should be able to do
Property.object("objectProperty", obj -> obj
.nestedProperties(
...,
Property.numberArray("nestedNumbersLvl2"),
Property.object("moreNested", more -> more
.nestedProperties(
Property.text("a"),
Property.number("b")
)
)
)
)
This PR adds switches Weaviate Java v5 client to latest client6