Handle NPE on null vector columns #13938
base: main
Could you elaborate on where this is occurring, or is this just a library issue?
This is related to #10275, unless there's a different or better way to do this.
@RussellSpitzer, any updates on what you think? I'm trying to get this functionality landed, as it's currently blocking some internal tooling from functioning without an exclusion for the columns that encounter this behavior.
Ah, I think the issue is that our library code assumes the Parquet reader already has a projection that selects only the columns that need to be read, applied before the file is opened. We have to do this anyway, because we have to map the names in the schema to the names in the file based on field IDs. See iceberg/arrow/src/main/java/org/apache/iceberg/arrow/vectorized/VectorizedReaderBuilder.java, lines 171 to 174 in cf74b65.
It seems like we aren't doing a similar thing with the Arrow reader? Is that on the right track? I'm trying to figure this out, but I think ideally we just wouldn't try to read null vectors at all, at a higher level.
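A minimal sketch of the projection idea described above, assuming a simplified schema model. `Field`, `ColumnPlan`, `plan`, and `fileProjection` are hypothetical names, not Iceberg's API. Expected columns are matched to file columns by field ID, and columns absent from the file are flagged up front, so a reader could fill them with nulls without ever materializing a null vector.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FieldIdProjection {
  // Hypothetical simplified field: an Iceberg-style field ID plus a column name.
  record Field(int id, String name) {}

  // Plan for one expected column: the file column to read, or null if the file lacks it.
  record ColumnPlan(Field expected, String fileColumn) {
    boolean readFromFile() {
      return fileColumn != null;
    }
  }

  // Match expected fields to file columns by field ID (not by name), so renamed
  // columns still resolve and missing columns are detected before any read.
  static List<ColumnPlan> plan(List<Field> expected, List<Field> fileSchema) {
    Map<Integer, String> fileNamesById = new LinkedHashMap<>();
    for (Field f : fileSchema) {
      fileNamesById.put(f.id(), f.name());
    }
    List<ColumnPlan> plans = new ArrayList<>();
    for (Field e : expected) {
      plans.add(new ColumnPlan(e, fileNamesById.get(e.id())));
    }
    return plans;
  }

  // Only columns present in the file are requested from the file reader;
  // the others would be served as constant nulls at a higher level.
  static List<String> fileProjection(List<ColumnPlan> plans) {
    List<String> names = new ArrayList<>();
    for (ColumnPlan p : plans) {
      if (p.readFromFile()) {
        names.add(p.fileColumn());
      }
    }
    return names;
  }

  public static void main(String[] args) {
    List<Field> expected = List.of(new Field(1, "id"), new Field(2, "name"), new Field(3, "added_col"));
    List<Field> file = List.of(new Field(1, "id"), new Field(2, "old_name"), new Field(4, "unused"));
    List<ColumnPlan> plans = plan(expected, file);
    System.out.println(fileProjection(plans));     // [id, old_name]
    System.out.println(plans.get(2).readFromFile()); // false
  }
}
```

Matching on IDs rather than names is what lets the expected column `name` resolve to `old_name` in the file, while `added_col` is never requested from the file at all.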
That sounds correct.
```java
return new NullAccessor(t);
}

// Primitive typed fast-paths return boxed nulls; callers should check nullability separately.
```
It seems to be missing getDecimal() and other getters.
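The review point can be illustrated with a self-contained sketch. `ValueAccessor` and `NullAccessor` below are hypothetical simplifications, not Iceberg's actual accessor classes or signatures. If the base type throws for every getter an implementation does not override, a null accessor that only covers the primitive getters would still fail on `getDecimal()`; overriding every getter avoids that. Boxed return types are an assumption here, chosen so that null can be returned, in line with the diff comment that callers check nullability separately.

```java
import java.math.BigDecimal;

public class NullAccessorSketch {
  // Hypothetical accessor interface. Getters return boxed types so a null column can
  // return null; any getter an implementation does not override fails loudly.
  interface ValueAccessor {
    boolean isNullAt(int rowId);

    default Boolean getBoolean(int rowId) { throw unsupported("getBoolean"); }
    default Integer getInt(int rowId) { throw unsupported("getInt"); }
    default Long getLong(int rowId) { throw unsupported("getLong"); }
    default Float getFloat(int rowId) { throw unsupported("getFloat"); }
    default Double getDouble(int rowId) { throw unsupported("getDouble"); }
    default BigDecimal getDecimal(int rowId, int precision, int scale) { throw unsupported("getDecimal"); }
    default String getString(int rowId) { throw unsupported("getString"); }
    default byte[] getBinary(int rowId) { throw unsupported("getBinary"); }

    private static UnsupportedOperationException unsupported(String method) {
      return new UnsupportedOperationException(method + " is not implemented");
    }
  }

  // Accessor for a column whose vector is null: every row is null, for every getter,
  // including getDecimal(), getString(), and getBinary().
  static final class NullAccessor implements ValueAccessor {
    @Override public boolean isNullAt(int rowId) { return true; }
    @Override public Boolean getBoolean(int rowId) { return null; }
    @Override public Integer getInt(int rowId) { return null; }
    @Override public Long getLong(int rowId) { return null; }
    @Override public Float getFloat(int rowId) { return null; }
    @Override public Double getDouble(int rowId) { return null; }
    @Override public BigDecimal getDecimal(int rowId, int precision, int scale) { return null; }
    @Override public String getString(int rowId) { return null; }
    @Override public byte[] getBinary(int rowId) { return null; }
  }

  public static void main(String[] args) {
    ValueAccessor accessor = new NullAccessor();
    System.out.println(accessor.isNullAt(0));          // true
    System.out.println(accessor.getDecimal(0, 10, 2)); // null
  }
}
```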
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions.
Currently, a NullPointerException is thrown when a column's vector is null.
This only happens when I'm using vectorized stream reading. This might fix the issue here.
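A minimal sketch of the kind of guard this change describes, assuming a plain `int[]` stands in for an Arrow vector. `IntAccessor` and `forVector` are hypothetical names, not part of Iceberg or Arrow. The null check happens once, when the accessor for a column is chosen, so per-row reads never dereference a missing vector.

```java
public class AccessorFactorySketch {
  // Hypothetical single-method accessor; boxed return so a missing column can yield null.
  interface IntAccessor {
    Integer get(int rowId);
  }

  // Choose the accessor once per column. A null vector (e.g. a column absent from
  // the file) gets a constant-null accessor instead of one that would throw an NPE.
  static IntAccessor forVector(int[] vector) {
    if (vector == null) {
      return rowId -> null;
    }
    return rowId -> vector[rowId];
  }

  public static void main(String[] args) {
    System.out.println(forVector(null).get(5));             // null
    System.out.println(forVector(new int[] {7, 8}).get(1)); // 8
  }
}
```

Doing the check at construction rather than inside every `get` keeps the per-row path free of extra branches for columns that do have data.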