
[refactor](storage) drop StorageField wrapper and clean up related dead code #63233

Open
csun5285 wants to merge 2 commits into apache:master from csun5285:refactor/drop-storagefield-wrapper


@csun5285
Contributor

@csun5285 csun5285 commented May 14, 2026

Drop the StorageField wrapper and related dead code. StorageField was a thin layer over TabletColumn: every accessor just forwarded to it, and all 11 subclasses were empty stubs that no caller distinguished via dynamic_cast/typeid. Removing it left several other pieces of code dead, which are removed here as well.

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

StorageField was a thin wrapper over TabletColumn that cached a KeyCoder*
pointer and pre-resolved an owned tree of sub-fields. Aside from those two
extras, every accessor (type/length/name/is_nullable/unique_id/...) was a
direct forward to the underlying TabletColumn it held a copy of, and all 11
subclasses (CharField/VarcharField/.../HllAggField) were empty stubs with
zero callers distinguishing them (no dynamic_cast/typeid/static_cast).
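Aside from forwarding, the wrapper's only extra state was the cached KeyCoder*. The following is a minimal sketch, not Doris code: `FieldTypeSketch`, `KeyCoderSketch`, `get_key_coder_sketch`, `encode_one` and `encode_many` are hypothetical stand-ins, and the byte encoding is a simplified memcomparable scheme assumed only for illustration. It contrasts resolving the coder through a free lookup on every call (the cold-path pattern the row_cursor.cpp sites move to) with resolving it once and caching the pointer locally in a loop (the hot-path pattern the description says the segment and indexed-column writers already use).

```cpp
// Hypothetical sketch, not Doris code: all *Sketch names are illustrative.
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

enum class FieldTypeSketch { INT, BIGINT };

struct KeyCoderSketch {
    size_t width;  // encoded width in bytes; values are assumed to fit in it

    // Simplified memcomparable encoding: big-endian with the sign bit flipped,
    // so unsigned byte-wise order matches signed numeric order.
    void encode_ascending(int64_t v, std::string* out) const {
        uint64_t u = static_cast<uint64_t>(v) ^ (uint64_t{1} << (width * 8 - 1));
        for (size_t i = 0; i < width; ++i) {
            out->push_back(static_cast<char>((u >> ((width - 1 - i) * 8)) & 0xFF));
        }
    }
};

// Free lookup helper: a switch on every call instead of a per-field cached pointer.
const KeyCoderSketch* get_key_coder_sketch(FieldTypeSketch type) {
    static const KeyCoderSketch int_coder{4};
    static const KeyCoderSketch bigint_coder{8};
    switch (type) {
    case FieldTypeSketch::INT:
        return &int_coder;
    case FieldTypeSketch::BIGINT:
        return &bigint_coder;
    }
    return nullptr;
}

// Cold path: resolve the coder at the call site each time.
std::string encode_one(FieldTypeSketch type, int64_t v) {
    std::string out;
    get_key_coder_sketch(type)->encode_ascending(v, &out);
    return out;
}

// Hot path: resolve once, then reuse the locally cached pointer in the loop.
std::vector<std::string> encode_many(FieldTypeSketch type, const std::vector<int64_t>& vals) {
    const KeyCoderSketch* coder = get_key_coder_sketch(type);
    std::vector<std::string> keys;
    keys.reserve(vals.size());
    for (int64_t v : vals) {
        std::string out;
        coder->encode_ascending(v, &out);
        keys.push_back(std::move(out));
    }
    return keys;
}
```

Because the lookup here is a small switch returning a pointer to a static object, calling it per value is cheap, and the local cache in `encode_many` takes even that out of the inner loop.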

Replace StorageField with TabletColumn throughout the storage layer:
- Schema now stores vector<TabletColumnPtr> instead of vector<StorageField*>,
  so copy/dtor are handled by shared_ptr ref counting; the deep-copy clone()
  path is gone.
- ColumnWriter family takes TabletColumnPtr (owned by writer) instead of
  unique_ptr<StorageField>; get_field()/_field renamed to get_column()/_column.
- IndexColumnWriter::create and ZoneMapIndexWriter::create take
  const TabletColumn* directly.
- DataTypeFactory drops the StorageField overload (kept the existing
  TabletColumn one).
- Five row_cursor.cpp encode sites switch to free helpers
  (get_key_coder(type)->...), the only consumers of StorageField's
  _key_coder cache. Per-call switch overhead is negligible since this is
  not a hot path; production hot paths (vertical/segment_writer,
  indexed_column_*) already cache KeyCoder locally without going through
  StorageField.
- _has_char_type, _init_column_mapping, index_builder, variant_*, and
  segment_iterator switch to TabletColumn::get_sub_column/get_subtype_count
  in place of StorageField::get_sub_field/get_sub_field_count.
- Rename row_cursor's _encode_field to _encode_column_value and
  column_schema(cid) to column(cid) to match the new semantics; rename
  local null_field/bigint_field to null_column_ptr/length_column_ptr.
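To illustrate the ownership and traversal changes listed above, here is a minimal sketch under stated assumptions: `TabletColumnSketch`, `SchemaSketch` and `has_char_type` are hypothetical, simplified stand-ins (the real TabletColumn and Schema carry far more state), and only the accessor names `get_subtype_count`/`get_sub_column` and `column(cid)` come from the description. Copying a schema that holds shared pointers just bumps reference counts, and a nested-type check can recurse over sub-columns directly.

```cpp
// Hypothetical sketch, not Doris code: simplified stand-ins for TabletColumn/Schema.
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

enum class FieldTypeSketch { INT, CHAR, VARCHAR, ARRAY, STRUCT };

class TabletColumnSketch {
public:
    TabletColumnSketch(std::string name, FieldTypeSketch type)
            : _name(std::move(name)), _type(type) {}

    const std::string& name() const { return _name; }
    FieldTypeSketch type() const { return _type; }
    void add_sub_column(TabletColumnSketch sub) { _sub_columns.push_back(std::move(sub)); }

    // Accessor names follow the PR text; the signatures are assumed.
    size_t get_subtype_count() const { return _sub_columns.size(); }
    const TabletColumnSketch& get_sub_column(size_t i) const { return _sub_columns[i]; }

private:
    std::string _name;
    FieldTypeSketch _type;
    std::vector<TabletColumnSketch> _sub_columns;  // nested types (ARRAY item, STRUCT fields)
};

using TabletColumnSketchPtr = std::shared_ptr<TabletColumnSketch>;

class SchemaSketch {
public:
    explicit SchemaSketch(std::vector<TabletColumnSketchPtr> cols) : _cols(std::move(cols)) {}

    // No user-written copy ctor or dtor: copying a schema copies shared_ptrs
    // (ref-count bumps) rather than deep-cloning each column.
    const TabletColumnSketch* column(size_t cid) const { return _cols[cid].get(); }
    size_t num_columns() const { return _cols.size(); }

private:
    std::vector<TabletColumnSketchPtr> _cols;
};

// Recursive nested-type check using only TabletColumn-style accessors.
bool has_char_type(const TabletColumnSketch& col) {
    if (col.type() == FieldTypeSketch::CHAR) {
        return true;
    }
    for (size_t i = 0; i < col.get_subtype_count(); ++i) {
        if (has_char_type(col.get_sub_column(i))) {
            return true;
        }
    }
    return false;
}
```

With `shared_ptr` elements the implicitly generated copy constructor and destructor are already correct, which is why a hand-written deep-copy `clone()` path is no longer needed in this shape.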

Net effect: -434 lines (field.h removed plus dead 11-subclass hierarchy),
no behavioral change. BE UT (302 tests across tablet_schema, storage_types,
KeyCoder, InvertedIndex, BkdIndex, ColumnWriter, ZoneMap, RowCursor) all
pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@csun5285
Contributor Author

run buildall

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

…rs + minor cleanup

Follow-up to "drop StorageField wrapper". Three small refactors:

1. ColumnWriter::cell_size() helper

Replace 7 occurrences of `field_type_size(get_column()->type())` in
ColumnWriter::append_nullable / ScalarColumnWriter::append_* with a single
inline cell_size() method on the base class. Pure DRY, no behavior change.
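A minimal sketch of the `cell_size()` helper follows. It assumes hypothetical stand-ins (`ColumnWriterSketch`, `TabletColumnSketch`, `field_type_size_sketch`, `append_fixed`) rather than the real Doris signatures, and only shows how the repeated `field_type_size(get_column()->type())` expression collapses into one base-class method.

```cpp
// Hypothetical sketch, not Doris code: simplified stand-ins for the writer types.
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

enum class FieldTypeSketch { TINYINT, INT, BIGINT };

// Stand-in for field_type_size(): fixed byte width per type.
size_t field_type_size_sketch(FieldTypeSketch t) {
    switch (t) {
    case FieldTypeSketch::TINYINT:
        return 1;
    case FieldTypeSketch::INT:
        return 4;
    case FieldTypeSketch::BIGINT:
        return 8;
    }
    return 0;
}

struct TabletColumnSketch {
    FieldTypeSketch type_;
    FieldTypeSketch type() const { return type_; }
};

class ColumnWriterSketch {
public:
    explicit ColumnWriterSketch(std::shared_ptr<TabletColumnSketch> column)
            : _column(std::move(column)) {}
    virtual ~ColumnWriterSketch() = default;

    const TabletColumnSketch* get_column() const { return _column.get(); }

    // One helper instead of repeating field_type_size(get_column()->type()).
    size_t cell_size() const { return field_type_size_sketch(get_column()->type()); }

    // Copies `num_rows` fixed-width cells from *ptr, advances *ptr, returns bytes consumed.
    size_t append_fixed(const uint8_t** ptr, size_t num_rows) {
        size_t bytes = num_rows * cell_size();
        _buffer.insert(_buffer.end(), *ptr, *ptr + bytes);
        *ptr += bytes;
        return bytes;
    }

    size_t buffered_bytes() const { return _buffer.size(); }

private:
    std::shared_ptr<TabletColumnSketch> _column;
    std::vector<uint8_t> _buffer;
};
```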

2. VariantWriter ctors: drop redundant raw `column` param

VariantDocCompactWriter / VariantSubcolumnWriter / VariantColumnWriter
previously took both `(const TabletColumn* column, TabletColumnPtr owned_column)`
where the call site always passed the same column in both positions:

  std::make_unique<VariantDocCompactWriter>(
      opts, column, std::make_shared<TabletColumn>(*column));

The raw `column` was stored as `_tablet_column` for direct getter access, but
the same pointer is available via the base class's `get_column()` (returning
`_column.get()`). Collapse to a single `TabletColumnPtr column` param and drop
the redundant `_tablet_column` member from both ColumnWriter subclasses; rewrite
their 6 method-body usages to call `get_column()`.
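A minimal sketch of the constructor collapse, assuming hypothetical simplified types (`TabletColumnSketch`, `ColumnWriterSketch`, `VariantSubcolumnWriterSketch`, `make_subcolumn_writer`); it mirrors the shape described above rather than the actual Doris headers. The subclass keeps no raw `_tablet_column` member and reads the column back through the base class's `get_column()`.

```cpp
// Hypothetical sketch, not Doris code: simplified stand-ins for the writer types.
#include <cassert>
#include <memory>
#include <string>
#include <utility>

struct TabletColumnSketch {
    std::string name;
};
using TabletColumnSketchPtr = std::shared_ptr<TabletColumnSketch>;

class ColumnWriterSketch {
public:
    explicit ColumnWriterSketch(TabletColumnSketchPtr column) : _column(std::move(column)) {}
    virtual ~ColumnWriterSketch() = default;

    // Raw, non-owning view of the column owned by this writer.
    const TabletColumnSketch* get_column() const { return _column.get(); }

private:
    TabletColumnSketchPtr _column;
};

// Before (shape only): ctor(const TabletColumnSketch* column, TabletColumnSketchPtr owned)
// stored both. After: a single owned column, read back via get_column().
class VariantSubcolumnWriterSketch : public ColumnWriterSketch {
public:
    explicit VariantSubcolumnWriterSketch(TabletColumnSketchPtr column)
            : ColumnWriterSketch(std::move(column)) {}

    const std::string& column_name() const { return get_column()->name; }
};

// Call site: only the owned copy is passed now.
std::unique_ptr<VariantSubcolumnWriterSketch> make_subcolumn_writer(
        const TabletColumnSketch& column) {
    return std::make_unique<VariantSubcolumnWriterSketch>(
            std::make_shared<TabletColumnSketch>(column));
}
```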

VariantColumnWriterImpl is the pimpl impl of VariantColumnWriter and is NOT a
ColumnWriter subclass, so it keeps its own `_tablet_column` member; the outer
VariantColumnWriter ctor now passes `get_column()` to the impl ctor.

Semantic change: the raw pointer formerly pointed to the caller's original
TabletColumn; now it points to the base class's owned copy (made via
`std::make_shared<TabletColumn>(*column)` at the call site). Lifetime-safe and
content-equivalent for read-only access.
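The pimpl wiring and the pointer-identity change can be checked with a small sketch. All names here are hypothetical stand-ins: as in the description, the impl is not a ColumnWriter subclass, so it keeps its own raw pointer, which the outer writer fills from `get_column()`. Since the base class is constructed before the derived class's members, `get_column()` is already valid in the member initializer.

```cpp
// Hypothetical sketch, not Doris code: simplified stand-ins for the writer types.
#include <cassert>
#include <memory>
#include <string>
#include <utility>

struct TabletColumnSketch {
    std::string name;
    int unique_id;
};
using TabletColumnSketchPtr = std::shared_ptr<TabletColumnSketch>;

class ColumnWriterSketch {
public:
    explicit ColumnWriterSketch(TabletColumnSketchPtr column) : _column(std::move(column)) {}
    virtual ~ColumnWriterSketch() = default;
    const TabletColumnSketch* get_column() const { return _column.get(); }

private:
    TabletColumnSketchPtr _column;  // the owned copy
};

// Not a ColumnWriterSketch subclass, so it keeps its own non-owning pointer.
class VariantColumnWriterImplSketch {
public:
    explicit VariantColumnWriterImplSketch(const TabletColumnSketch* column)
            : _tablet_column(column) {}
    const TabletColumnSketch* tablet_column() const { return _tablet_column; }

private:
    const TabletColumnSketch* _tablet_column;  // lifetime tied to the outer writer
};

class VariantColumnWriterSketch : public ColumnWriterSketch {
public:
    explicit VariantColumnWriterSketch(TabletColumnSketchPtr column)
            : ColumnWriterSketch(std::move(column)),
              // Base is fully constructed here, so get_column() is valid.
              _impl(std::make_unique<VariantColumnWriterImplSketch>(get_column())) {}

    const VariantColumnWriterImplSketch& impl() const { return *_impl; }

private:
    std::unique_ptr<VariantColumnWriterImplSketch> _impl;
};
```

Constructed as `VariantColumnWriterSketch w(std::make_shared<TabletColumnSketch>(original));`, the impl's pointer equals `w.get_column()` and differs from `&original`, while the pointed-to contents are equal, which is the "lifetime-safe and content-equivalent" property claimed above.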

3. olap_common.h: update stale comments

  - FieldType enum: drop the dangling "Field" reference (StorageField was
    removed by the parent commit; doris::Field is unrelated here).
  - FieldAggregationMethod enum: restore the previous verbose comment style
    with the class name updated to TabletColumn.
@hello-stephen
Contributor

TPC-H: Total hot run time: 29579 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit c322ef733f915fa262c959cac119947984d1a3de, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
query	cold run	hot run 1	hot run 2	best hot run (ms)
q1	17645	3852	3839	3839
q2
q3	10718	900	618	618
q4	4663	456	351	351
q5	7465	1331	1139	1139
q6	183	167	140	140
q7	944	943	745	745
q8	9316	1428	1301	1301
q9	5552	5393	5331	5331
q10	6268	2093	1834	1834
q11	482	268	259	259
q12	633	414	292	292
q13	18081	3283	2771	2771
q14	295	285	260	260
q15
q16	900	877	791	791
q17	1011	1089	777	777
q18	6475	5695	5664	5664
q19	1150	1261	1034	1034
q20	502	407	263	263
q21	4553	2277	1868	1868
q22	421	357	302	302
Total cold run time: 97257 ms
Total hot run time: 29579 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
query	cold run	hot run 1	hot run 2	best hot run (ms)
q1	4165	4083	4094	4083
q2
q3	4652	4768	4205	4205
q4	2072	2165	1373	1373
q5	4996	5017	5256	5017
q6	183	165	130	130
q7	2071	1765	1793	1765
q8	3632	3234	3294	3234
q9	8612	8546	8571	8546
q10	4498	4498	4230	4230
q11	616	428	416	416
q12	702	753	509	509
q13	3542	3633	2906	2906
q14	310	314	289	289
q15
q16	792	789	681	681
q17	1362	1323	1283	1283
q18	8013	7119	7123	7119
q19	1170	1146	1139	1139
q20	2209	2230	1955	1955
q21	6226	5427	4922	4922
q22	540	527	425	425
Total cold run time: 60363 ms
Total hot run time: 54227 ms

@hello-stephen
Contributor

TPC-DS: Total hot run time: 172234 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit c322ef733f915fa262c959cac119947984d1a3de, data reload: false

query	cold run	hot run 1	hot run 2	best hot run (ms)
query5	4306	683	540	540
query6	324	225	201	201
query7	4227	580	319	319
query8	328	238	218	218
query9	8843	4062	4061	4061
query10	452	353	289	289
query11	5776	2448	2284	2284
query12	184	132	129	129
query13	1279	624	474	474
query14	6583	5366	5089	5089
query14_1	4329	4336	4305	4305
query15	206	200	183	183
query16	997	468	434	434
query17	1133	759	630	630
query18	2763	519	360	360
query19	218	206	158	158
query20	137	131	137	131
query21	229	135	112	112
query22	13643	13605	13509	13509
query23	17377	16335	16700	16335
query23_1	16331	16321	16304	16304
query24	7537	1843	1428	1428
query24_1	1456	1417	1407	1407
query25	598	571	486	486
query26	1452	340	184	184
query27	2875	622	344	344
query28	4339	1970	1972	1970
query29	1047	654	543	543
query30	316	257	208	208
query31	1112	1081	952	952
query32	91	75	74	74
query33	579	369	313	313
query34	1184	1155	646	646
query35	796	789	688	688
query36	1334	1286	1203	1203
query37	152	115	92	92
query38	3228	3144	3074	3074
query39	932	943	910	910
query39_1	876	877	860	860
query40	234	161	144	144
query41	71	68	69	68
query42	113	110	110	110
query43	326	332	284	284
query44	
query45	212	205	249	205
query46	1048	1231	741	741
query47	2287	2288	2166	2166
query48	392	424	293	293
query49	632	545	444	444
query50	682	282	216	216
query51	4405	4349	4225	4225
query52	105	105	91	91
query53	257	272	201	201
query54	319	272	263	263
query55	93	89	83	83
query56	299	308	313	308
query57	1400	1378	1318	1318
query58	298	255	262	255
query59	1609	1678	1426	1426
query60	364	326	321	321
query61	175	165	162	162
query62	668	623	564	564
query63	241	198	200	198
query64	2327	836	708	708
query65	
query66	1700	504	397	397
query67	30497	29729	30300	29729
query68	
query69	465	347	299	299
query70	1036	1000	995	995
query71	305	276	269	269
query72	3015	2774	2427	2427
query73	851	776	433	433
query74	5091	4913	4760	4760
query75	2810	2725	2356	2356
query76	2310	1158	751	751
query77	449	430	341	341
query78	13308	13308	12538	12538
query79	1524	978	739	739
query80	1376	555	503	503
query81	510	294	256	256
query82	1280	154	117	117
query83	348	283	258	258
query84	259	134	109	109
query85	924	516	441	441
query86	444	352	308	308
query87	3468	3403	3222	3222
query88	3547	2701	2654	2654
query89	434	371	333	333
query90	1898	183	179	179
query91	179	171	140	140
query92	79	71	72	71
query93	956	964	571	571
query94	724	340	313	313
query95	672	459	360	360
query96	999	808	318	318
query97	2738	2689	2528	2528
query98	241	235	227	227
query99	1110	1114	985	985
Total cold run time: 256525 ms
Total hot run time: 172234 ms

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 67.88% (112/165) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 53.60% (20589/38412)
Line Coverage 37.18% (194795/523906)
Region Coverage 33.59% (152469/453864)
Branch Coverage 34.58% (66440/192137)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 85.98% (141/164) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.81% (27766/37619)
Line Coverage 57.64% (301226/522565)
Region Coverage 54.93% (251797/458397)
Branch Coverage 56.39% (108775/192891)

@csun5285
Contributor Author

run buildall

@hello-stephen
Contributor

TPC-H: Total hot run time: 29501 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit dba29514cf661aeb26b3d7da818211a793307cac, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
query	cold run	hot run 1	hot run 2	best hot run (ms)
q1	17670	3908	3869	3869
q2
q3	10718	885	610	610
q4	4659	462	339	339
q5	7461	1335	1136	1136
q6	191	166	141	141
q7	915	932	740	740
q8	9325	1372	1307	1307
q9	5633	5443	5337	5337
q10	6250	2100	1839	1839
q11	456	261	261	261
q12	631	415	295	295
q13	18180	3272	2779	2779
q14	292	285	264	264
q15
q16	903	859	786	786
q17	954	1017	715	715
q18	6512	5782	5408	5408
q19	1156	1248	1106	1106
q20	504	392	257	257
q21	4644	2384	1986	1986
q22	481	448	326	326
Total cold run time: 97535 ms
Total hot run time: 29501 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
query	cold run	hot run 1	hot run 2	best hot run (ms)
q1	4789	4710	4746	4710
q2
q3	4666	4821	4214	4214
q4	2130	2192	1446	1446
q5	5008	5000	5360	5000
q6	200	182	137	137
q7	2095	1783	1616	1616
q8	3310	3078	3083	3078
q9	8518	8420	8395	8395
q10	4475	4489	4241	4241
q11	612	414	397	397
q12	699	745	513	513
q13	3212	3616	2960	2960
q14	310	294	280	280
q15
q16	963	805	702	702
q17	1362	1311	1284	1284
q18	8033	7254	7160	7160
q19	1170	1138	1206	1138
q20	2216	2214	1936	1936
q21	6143	5460	4854	4854
q22	556	512	439	439
Total cold run time: 60467 ms
Total hot run time: 54500 ms

@hello-stephen
Contributor

TPC-DS: Total hot run time: 171293 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit dba29514cf661aeb26b3d7da818211a793307cac, data reload: false

query	cold run	hot run 1	hot run 2	best hot run (ms)
query5	4317	667	508	508
query6	330	223	204	204
query7	4368	597	303	303
query8	329	226	221	221
query9	8823	4041	4032	4032
query10	461	342	305	305
query11	5794	2382	2226	2226
query12	186	130	127	127
query13	1344	621	443	443
query14	6692	5361	5072	5072
query14_1	4383	4444	4334	4334
query15	219	205	190	190
query16	1020	431	450	431
query17	1307	761	644	644
query18	2709	491	346	346
query19	248	198	153	153
query20	136	129	122	122
query21	214	135	118	118
query22	13533	14101	14519	14101
query23	17328	16550	16229	16229
query23_1	16305	16279	16229	16229
query24	7414	1746	1330	1330
query24_1	1336	1348	1366	1348
query25	540	477	424	424
query26	1278	301	168	168
query27	2674	588	339	339
query28	4388	1946	1919	1919
query29	987	669	530	530
query30	307	233	197	197
query31	1118	1065	942	942
query32	86	74	74	74
query33	550	361	316	316
query34	1159	1135	635	635
query35	769	797	673	673
query36	1360	1305	1125	1125
query37	152	106	91	91
query38	3183	3136	3085	3085
query39	925	918	887	887
query39_1	889	876	866	866
query40	236	159	139	139
query41	70	68	66	66
query42	109	110	107	107
query43	318	338	278	278
query44	
query45	210	201	191	191
query46	1045	1221	765	765
query47	2284	2350	2176	2176
query48	411	429	294	294
query49	656	559	438	438
query50	724	280	235	235
query51	4341	4308	4185	4185
query52	107	106	92	92
query53	257	277	204	204
query54	335	285	266	266
query55	96	93	87	87
query56	321	318	308	308
query57	1430	1380	1298	1298
query58	308	287	276	276
query59	1570	1626	1410	1410
query60	399	334	319	319
query61	162	155	185	155
query62	668	619	559	559
query63	236	195	204	195
query64	2328	815	669	669
query65	
query66	1714	501	398	398
query67	30085	29870	29257	29257
query68	
query69	467	340	300	300
query70	1012	983	991	983
query71	308	260	267	260
query72	2995	2716	2460	2460
query73	808	775	414	414
query74	5102	4881	4696	4696
query75	2792	2650	2320	2320
query76	2276	1152	727	727
query77	413	417	343	343
query78	12980	12967	12343	12343
query79	1497	1025	763	763
query80	1376	555	469	469
query81	500	283	239	239
query82	1248	155	123	123
query83	359	277	250	250
query84	256	144	114	114
query85	915	517	465	465
query86	441	336	304	304
query87	3402	3333	3214	3214
query88	3566	2676	2672	2672
query89	435	379	332	332
query90	1911	181	177	177
query91	177	165	142	142
query92	75	75	71	71
query93	952	953	572	572
query94	719	338	297	297
query95	675	479	351	351
query96	1010	788	368	368
query97	2675	2678	2580	2580
query98	233	228	226	226
query99	1110	1121	993	993
Total cold run time: 255233 ms
Total hot run time: 171293 ms

@csun5285 csun5285 changed the title from "[refactor](storage) drop StorageField wrapper and others dead code" to "[refactor](storage) drop StorageField wrapper and clean up related dead code" on May 14, 2026
@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 69.19% (119/172) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 53.64% (20603/38413)
Line Coverage 37.21% (194930/523904)
Region Coverage 33.61% (152566/453867)
Branch Coverage 34.60% (66485/192139)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 91.23% (156/171) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.77% (27752/37619)
Line Coverage 57.61% (301012/522530)
Region Coverage 54.94% (251796/458291)
Branch Coverage 56.40% (108782/192867)


Labels

None yet

Projects

None yet


2 participants