Optimize Cube output with OpenMP formatting and async writing by Ulikin · Pull Request #7425 · deepmodeling/abacus-develop

Ulikin · 2026-06-03T07:29:55Z

Summary

This PR optimizes Cube file output in write_cube() by reducing the overhead of formatting and writing large real-space grid data.

Main changes:

- Replace repeated iostream-based Cube data formatting with preallocated buffers.
- Parallelize Cube data formatting on the output rank with OpenMP.
- Write Cube data in chunks while preserving the original data order.
- Add an asynchronous writer with a bounded queue to overlap formatted data submission and file writing.
- Keep the Cube header format, data precision, and output ordering unchanged.
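
The chunked formatting idea can be illustrated with a minimal sketch. This is not the code in this PR: `format_rows` and `format_all` are hypothetical names, the grid is assumed to be flattened with `nz` values per row, and the line layout (6 values per line, a line break at the end of each row) follows the common Cube convention rather than anything stated here. Each chunk is formatted with `snprintf` into a string sized once up front, and every OpenMP iteration writes only its own slot, so writing the chunks back in index order keeps the original data order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: format rows [row_begin, row_end) of a flattened grid.
// Each row holds nz values; a newline follows every 6th value and the row end.
std::string format_rows(const std::vector<double>& values, std::size_t nz,
                        std::size_t row_begin, std::size_t row_end, int precision)
{
    assert(nz > 0 && precision >= 0 && row_begin <= row_end
           && row_end * nz <= values.size());
    // Upper bound per value for " %.*e": space, sign, digit, point, `precision`
    // digits, 'e', exponent sign, three exponent digits, plus one newline.
    const std::size_t width = static_cast<std::size_t>(precision) + 10;
    std::string out((row_end - row_begin) * nz * width + 1, '\0');
    std::size_t pos = 0;
    for (std::size_t row = row_begin; row < row_end; ++row)
    {
        for (std::size_t iz = 0; iz < nz; ++iz)
        {
            const int n = std::snprintf(&out[pos], out.size() - pos, " %.*e",
                                        precision, values[row * nz + iz]);
            assert(n > 0);
            pos += static_cast<std::size_t>(n);
            if (iz % 6 == 5 || iz + 1 == nz) { out[pos++] = '\n'; }
        }
    }
    out.resize(pos); // trim the unused tail of the preallocated buffer
    return out;
}

// Hypothetical driver: split the rows into chunks and format them in parallel.
// Without OpenMP enabled the pragma is ignored and the loop runs serially.
std::vector<std::string> format_all(const std::vector<double>& values, std::size_t nz,
                                    std::size_t rows_per_chunk, int precision)
{
    assert(nz > 0 && rows_per_chunk > 0 && values.size() % nz == 0);
    const std::size_t nrows = values.size() / nz;
    const long nchunks = static_cast<long>((nrows + rows_per_chunk - 1) / rows_per_chunk);
    std::vector<std::string> chunks(static_cast<std::size_t>(nchunks));
    #pragma omp parallel for schedule(static)
    for (long c = 0; c < nchunks; ++c)
    {
        const std::size_t begin = static_cast<std::size_t>(c) * rows_per_chunk;
        const std::size_t end = std::min(begin + rows_per_chunk, nrows);
        chunks[static_cast<std::size_t>(c)] = format_rows(values, nz, begin, end, precision);
    }
    return chunks;
}
```

A caller would stream `chunks[0]`, `chunks[1]`, ... to the file in that order. The chunk size trades memory for parallel granularity; the value used in the PR is not stated here.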

Motivation

When out_chg is enabled for large systems, Cube output can become a noticeable cost. Initial profiling showed that most of the time was spent in write_cube_data_records.

Further timing showed that the main bottleneck was not only disk I/O, but also converting a large number of floating-point grid values into text. Therefore, this PR first optimizes the formatting path, then adds asynchronous writing to hide part of the remaining write time.

Performance

Test case:

- System: Si256 LCAO
- Grid: 375 x 192 x 192 (13,824,000 grid values)
- Output: SPIN1_CHG.cube
- Formatting threads: 8 OpenMP threads on the output rank

Observed timing:

| Version | `write_cube_total` |
| --- | --- |
| Original | ~5.8-6.0 s |
| OpenMP formatting | ~0.77-1.36 s |
| OpenMP + async writing | ~0.71-1.33 s |

For the complete write_vdata_palgrid output path:

| Version | `write_vdata_palgrid_total` |
| --- | --- |
| Original | ~6.3-6.5 s |
| OpenMP formatting | ~1.30-1.73 s |
| OpenMP + async writing | ~1.24-1.71 s |

Correctness

The optimized path preserves:

- Cube header format;
- data ordering;
- output precision;
- number of values per data line;
- rank-0 ordered file writing.

The change is limited to Cube output and does not modify SCF logic or physical quantities.
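
One way to sanity-check the "unchanged precision" claim is to compare the two formatting paths on sample values. The sketch below is an assumption about how such a check could look, not a test shipped with this PR: `with_iostream`, `with_snprintf`, and `same_text` are hypothetical names, and the field layout (one leading space, scientific notation) is assumed. In the default "C" locale, `std::scientific` with `std::setprecision(p)` and `printf`-style `%.{p}e` should produce the same characters.

```cpp
#include <cstddef>
#include <cstdio>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Reference path: iostream formatting, as in the pre-optimization style.
std::string with_iostream(double v, int precision)
{
    std::ostringstream os;
    os << ' ' << std::scientific << std::setprecision(precision) << v;
    return os.str();
}

// Candidate path: snprintf into a fixed stack buffer.
std::string with_snprintf(double v, int precision)
{
    char buf[64];
    const int n = std::snprintf(buf, sizeof(buf), " %.*e", precision, v);
    // Return an empty string on error or truncation so the comparison fails loudly.
    if (n < 0 || n >= static_cast<int>(sizeof(buf))) { return std::string(); }
    return std::string(buf, static_cast<std::size_t>(n));
}

// True when both paths give byte-identical text for every sample value.
bool same_text(const std::vector<double>& values, int precision)
{
    for (double v : values)
    {
        if (with_iostream(v, precision) != with_snprintf(v, precision)) { return false; }
    }
    return true;
}
```

A stronger end-to-end check is a byte-for-byte comparison of the Cube files written by the original and optimized builds for the same input.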

Notes

The asynchronous writer uses a bounded queue with capacity 2. If the background I/O thread is slower than formatting, push() blocks when the queue is full, preventing unbounded memory growth.
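
The backpressure behaviour described above can be sketched with a mutex and two condition variables. This is a minimal illustration under assumptions, not the PR's implementation: `BoundedQueue`, `close()`, `pop()`, and `write_async` are hypothetical names, and error handling for a failing stream is omitted.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <ostream>
#include <sstream> // std::ostringstream, handy for in-memory checks
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical bounded queue of formatted text chunks.
class BoundedQueue
{
  public:
    explicit BoundedQueue(std::size_t capacity) : capacity_(capacity) { assert(capacity > 0); }

    // Blocks while the queue is full, which caps the memory held in flight.
    void push(std::string chunk)
    {
        std::unique_lock<std::mutex> lock(mutex_);
        not_full_.wait(lock, [this] { return queue_.size() < capacity_; });
        queue_.push_back(std::move(chunk));
        not_empty_.notify_one();
    }

    // Signals that no more chunks will be pushed.
    void close()
    {
        std::lock_guard<std::mutex> lock(mutex_);
        closed_ = true;
        not_empty_.notify_all();
    }

    // Returns false once the queue is closed and fully drained.
    bool pop(std::string& out)
    {
        std::unique_lock<std::mutex> lock(mutex_);
        not_empty_.wait(lock, [this] { return !queue_.empty() || closed_; });
        if (queue_.empty()) { return false; }
        out = std::move(queue_.front());
        queue_.pop_front();
        not_full_.notify_one();
        return true;
    }

  private:
    std::size_t capacity_;
    bool closed_ = false;
    std::deque<std::string> queue_;
    std::mutex mutex_;
    std::condition_variable not_full_;
    std::condition_variable not_empty_;
};

// A single writer thread pops in FIFO order, so chunk order is preserved.
void write_async(std::ostream& os, const std::vector<std::string>& chunks)
{
    BoundedQueue queue(2); // capacity 2, as stated in the note above
    std::thread writer([&queue, &os] {
        std::string chunk;
        while (queue.pop(chunk)) { os << chunk; }
    });
    for (const std::string& chunk : chunks) { queue.push(chunk); }
    queue.close();
    writer.join();
}
```

With a single producer and a single consumer, at most two formatted chunks wait in the queue while a third is being written, so peak memory stays proportional to the chunk size rather than the full grid.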

This work is based on the LTS branch used in the course environment. If the target branch has diverged or contains concurrent course submissions, manual conflict resolution may be required.

dyzheng added 2 commits June 3, 2026 11:38

Optimize cube output formatting

2d4ef28

Optimize cube file data formatting and writing

6a9a0e4

mohanchen added the project_learning label Jun 3, 2026

Ulikin wants to merge 2 commits into deepmodeling:LTS from Ulikin:optimize-cube-output
