Optimize Cube output with OpenMP formatting and async writing #7425
Open
Ulikin wants to merge 2 commits into
Summary
This PR optimizes Cube file output in write_cube() by reducing the overhead of formatting and writing large real-space grid data.
Main changes:
Motivation
When out_chg is enabled for large systems, Cube output can become a noticeable cost. Initial profiling showed that most of the time was spent in write_cube_data_records.
Further timing showed that the main bottleneck was not only disk I/O, but also converting a large number of floating-point grid values into text. Therefore, this PR first optimizes the formatting path, then adds asynchronous writing to hide part of the remaining write time.
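To illustrate the formatting step described above, here is a minimal sketch of chunked, OpenMP-parallel float-to-text conversion. It is not the code in this PR: the helper names format_range and format_parallel, the " %.11e" layout, and the chunking parameters are assumptions for illustration, and real Cube output also breaks lines at the end of each z-row, which this sketch ignores. Each chunk is formatted into its own string, so threads never share a buffer and the pieces can be written in order afterwards.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: format values[begin, end) as text, per_line values per line.
// The " %.11e" layout is an assumption, not necessarily the format used by write_cube().
static std::string format_range(const std::vector<double>& values,
                                std::size_t begin, std::size_t end,
                                std::size_t per_line)
{
    std::string out;
    out.reserve((end - begin) * 20);
    char buf[32];
    for (std::size_t i = begin; i < end; ++i)
    {
        const int n = std::snprintf(buf, sizeof(buf), " %.11e", values[i]);
        out.append(buf, static_cast<std::size_t>(n));
        // Break the line after every per_line values and at the end of the range.
        if ((i + 1) % per_line == 0 || i + 1 == end)
        {
            out.push_back('\n');
        }
    }
    return out;
}

// Hypothetical driver: split the data into chunks of whole lines and format the
// chunks in parallel. Without -fopenmp the pragma is ignored and the loop runs serially.
std::vector<std::string> format_parallel(const std::vector<double>& values,
                                         std::size_t per_line,
                                         std::size_t lines_per_chunk)
{
    assert(per_line > 0 && lines_per_chunk > 0);
    const std::size_t chunk = per_line * lines_per_chunk;
    const long n_chunks = static_cast<long>((values.size() + chunk - 1) / chunk);
    std::vector<std::string> pieces(static_cast<std::size_t>(n_chunks));
#pragma omp parallel for schedule(static)
    for (long c = 0; c < n_chunks; ++c)
    {
        const std::size_t begin = static_cast<std::size_t>(c) * chunk;
        const std::size_t end = std::min(begin + chunk, values.size());
        pieces[static_cast<std::size_t>(c)] = format_range(values, begin, end, per_line);
    }
    return pieces;
}
```

Because every chunk boundary falls on a line boundary, concatenating the pieces in index order gives the same text as a single serial pass over the whole array.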
Performance
Test case:
375 x 192 x 192
SPIN1_CHG.cube
Observed timing:
write_cube_total
For the complete write_vdata_palgrid output path:
write_vdata_palgrid_total
Correctness
The optimized path preserves:
The change is limited to Cube output and does not modify SCF logic or physical quantities.
Notes
The asynchronous writer uses a bounded queue with capacity 2. If the background I/O thread is slower than formatting, push() blocks when the queue is full, preventing unbounded memory growth.
This work is based on the LTS branch used in the course environment. If the target branch has diverged or contains concurrent course submissions, manual conflict resolution may be required.
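The bounded-queue behavior described in the notes above can be sketched as follows. This is an illustrative sketch, not the implementation in this PR: the class name BoundedQueue, the close() method, and the write_async driver are hypothetical, and error handling (for example, joining the writer thread if push() throws) is left out for brevity.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <ostream>
#include <queue>
#include <sstream>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical bounded queue of formatted text blocks.
class BoundedQueue
{
  public:
    explicit BoundedQueue(std::size_t capacity) : capacity_(capacity) { assert(capacity > 0); }

    // Blocks while the queue is full, so the producer can never get more than
    // capacity_ blocks ahead of the consumer.
    void push(std::string item)
    {
        std::unique_lock<std::mutex> lock(mutex_);
        not_full_.wait(lock, [this] { return items_.size() < capacity_; });
        items_.push(std::move(item));
        not_empty_.notify_one();
    }

    // Blocks until an item is available; returns false once the queue is closed and drained.
    bool pop(std::string& item)
    {
        std::unique_lock<std::mutex> lock(mutex_);
        not_empty_.wait(lock, [this] { return !items_.empty() || closed_; });
        if (items_.empty())
        {
            return false;
        }
        item = std::move(items_.front());
        items_.pop();
        not_full_.notify_one();
        return true;
    }

    // Signals that no more items will be pushed.
    void close()
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        not_empty_.notify_all();
    }

  private:
    std::size_t capacity_;
    std::queue<std::string> items_;
    std::mutex mutex_;
    std::condition_variable not_full_;
    std::condition_variable not_empty_;
    bool closed_ = false;
};

// Hypothetical driver: the calling thread produces blocks while a background
// thread writes them to the stream in the order they were pushed.
void write_async(std::ostream& out, const std::vector<std::string>& blocks)
{
    BoundedQueue queue(2); // capacity 2, as stated in the notes
    std::thread writer([&queue, &out] {
        std::string block;
        while (queue.pop(block))
        {
            out << block;
        }
    });
    for (const auto& block : blocks)
    {
        queue.push(block);
    }
    queue.close();
    writer.join();
}
```

With a single producer and a single consumer, blocks leave the queue in the order they were pushed, so the output matches a synchronous write. At most two formatted blocks are queued at any time, plus the one the writer is currently handling.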