Skip to content

GH-1142: Fix buffer ownership in VectorSchemaRoot addVector/removeVector#1145

Closed
John-W-Lewis wants to merge 2 commits into
apache:mainfrom
John-W-Lewis:Correcting-VectorSchemaRoot
Closed

GH-1142: Fix buffer ownership in VectorSchemaRoot addVector/removeVector#1145
John-W-Lewis wants to merge 2 commits into
apache:mainfrom
John-W-Lewis:Correcting-VectorSchemaRoot

Conversation

@John-W-Lewis
Copy link
Copy Markdown

What's Changed

VectorSchemaRoot#addVector and #removeVector previously created a new VectorSchemaRoot by sharing raw FieldVector references between the original and new root. This violates Arrow's ownership model: closing either root would release buffers still in use by the other, leading to use-after-free errors.

This fix uses TransferPair to properly transfer buffer ownership to the returned root, consistent with how VectorSchemaRoot#slice already handles this. After the operation, the original root's vectors are left in a transferred (empty) state and can be reused via allocateNew().

Tests added to verify that the returned root's data remains valid after the original root and input vector are closed.

Closes #1142.

Made with Cursor

…oveVector

Use TransferPair to properly transfer buffer ownership when creating a
new VectorSchemaRoot, preventing use-after-free when the original root
is closed.

Co-authored-by: Cursor <cursoragent@cursor.com>
@github-actions
Copy link
Copy Markdown

Thank you for opening a pull request!

Please label the PR with one or more of:

  • bug-fix
  • chore
  • dependencies
  • documentation
  • enhancement

Also, add the 'breaking-change' label if appropriate.

See CONTRIBUTING.md for details.

@John-W-Lewis
Copy link
Copy Markdown
Author

John-W-Lewis commented May 11, 2026

Being neither a chicken nor an egg, I understand that I am unable to label this as "bug-fix".
Thank you to whoever can do that.

@John-W-Lewis
Copy link
Copy Markdown
Author

There are also a couple of enhancements I would suggest for VectorSchemaRoot but, based on the advice I've seen, it seems better to describe those in a separate PR.

@axreldable
Copy link
Copy Markdown
Contributor

axreldable commented May 17, 2026

Hi @John-W-Lewis , thank you for the bug-fix. Yes, we need to wait for the maintainers to add the label and trigger the other checks. In the meantime, you can build locally, for example using Maven. Now, there is a format violation in TestVectorSchemaRoot.java.

mvn clean install
...
[ERROR] Failed to execute goal com.diffplug.spotless:spotless-maven-plugin:3.4.0:check (spotless-check) on project arrow-vector: The following files had format violations:
[ERROR]     src/test/java/org/apache/arrow/vector/TestVectorSchemaRoot.java
[ERROR]         @@ -363,8 +363,7 @@
[ERROR]          ······intVector2.setValueCount(5);
[ERROR]          ······intVector3.setValueCount(5);
[ERROR]          
[ERROR]         -······VectorSchemaRoot·original·=
[ERROR]         -··········new·VectorSchemaRoot(Arrays.asList(intVector1,·intVector2));
[ERROR]         +······VectorSchemaRoot·original·=·new·VectorSchemaRoot(Arrays.asList(intVector1,·intVector2));
[ERROR]          ······original.setRowCount(5);
[ERROR]          
[ERROR]          ······VectorSchemaRoot·result·=·original.addVector(1,·intVector3);
[ERROR] Run 'mvn spotless:apply' to fix these violations.

You can fix it with the suggested command. mvn spotless:apply

image

With the formatting fix, the build was successful for me, which means that the unit tests passed.

Copy link
Copy Markdown
Contributor

@axreldable axreldable left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than the nit (nitpick) and the doc clarification, LGTM.

FieldVector v = fieldVectors.get(i);
TransferPair transferPair = v.getTransferPair(v.getAllocator());
transferPair.transfer();
newVectors.add((FieldVector) transferPair.getTo());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Consider extracting the transferring logic into a separate method to avoid code duplication.
e.g.

private static FieldVector transferVector(FieldVector vector) {
    TransferPair transferPair = vector.getTransferPair(vector.getAllocator());
    transferPair.transfer();
    return (FieldVector) transferPair.getTo();
  }

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion., done.

*
* <p>Buffer ownership is transferred to the returned root via {@link TransferPair}. After this
* operation, the vectors in this root are left in a transferred (empty) state. The removed
* vector's data is not transferred and is released. This root can be reused by calling {@link
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The removed vector's data is not transferred and is released.

is released is not 100% accurate and can be confusing. The vector remains in the original schema root until it's closed/cleared.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for spotting this. You're right that "is released" was inaccurate.

Looking more closely, it was also a code issue. For the source vectors which are not removed, transfer() creates new ArrowBufs in the corresponding target vector, each referencing the same underlying allocation as its source counterpart, while the source vector's ArrowBufs release their references. So, after the operation, those source vectors' ArrowBufs no longer own any allocated memory. But for the removed vector, we weren't doing anything -- its ArrowBufs still held references to their underlying allocations, which would only be freed when the original root was eventually closed.

I've added a clear() on the removed vector so that its buffers also release their underlying allocations, making it consistent with the other vectors in the source root after the operation.

…d vector

- Extract transferVector() helper to reduce duplication in addVector/removeVector
- Fix Spotless formatting violation in TestVectorSchemaRoot.java
- Clear the removed vector's buffers in removeVector() so the original root
  is consistently empty after the operation, matching the documented behaviour

Co-authored-by: Cursor <cursoragent@cursor.com>
@John-W-Lewis
Copy link
Copy Markdown
Author

@axreldable Thank you. TestVectorSchemaRoot.java formatting fixed.

@John-W-Lewis
Copy link
Copy Markdown
Author

Closing in favour of #1149, which provides a better fix using a new VectorOps.shareCopy utility. That approach preserves the original intended semantics (both source and result roots remain readable) while making it safe through proper reference counting, rather than changing the semantics to transfer/empty the source.

Thank you @axreldable for your helpful review feedback on this PR -- it contributed to the thinking that led to the improved solution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Managing ownership in VectorSchemaRoot#addVector, recent changes miss the main fault.

2 participants