branch-4.1 [fix](variant) Bind Variant search to nested indexes #63765

Merged
eldenmoon merged 1 commit into apache:branch-4.1 from eldenmoon:branch-pick-63660-4.1
May 28, 2026

@eldenmoon
Member

@eldenmoon eldenmoon commented May 27, 2026

Cherry-pick of #63660 to branch-4.1.

### What problem does this PR solve?

Issue Number: N/A

Related PR: apache#63660

Problem Summary: Backport apache#63660 to branch-4.1. Bind Variant inverted-index search to the resolved scalar or nested Variant index reader, map nested leaf results back to the expected document scope, and preserve null bitmap semantics for empty bitset truth bitmaps. Adapt the segment index iterator call to the branch-4.1 ColumnReader API.
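
The null bitmap point in the summary can be illustrated with a small sketch. It is not the Doris implementation: `TernaryResult`, `negate_ternary`, and `drop_nulls_when_truth_empty` are hypothetical names, and `std::set` stands in for the real bitmap types. It assumes SQL three-valued logic, where a row whose value is NULL evaluates to UNKNOWN and must stay UNKNOWN under NOT, even when no row evaluated to TRUE.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Hypothetical stand-in for a per-segment query result: `truth` holds rows
// that matched, `nulls` holds rows whose value is NULL (UNKNOWN).
struct TernaryResult {
    std::set<uint32_t> truth;
    std::set<uint32_t> nulls;
};

// NOT under three-valued logic: TRUE and FALSE swap, UNKNOWN stays UNKNOWN.
TernaryResult negate_ternary(const TernaryResult& in, uint32_t num_rows) {
    // Precondition: every row id is inside the segment.
    assert(in.truth.empty() || *in.truth.rbegin() < num_rows);
    assert(in.nulls.empty() || *in.nulls.rbegin() < num_rows);
    TernaryResult out;
    out.nulls = in.nulls;
    for (uint32_t row = 0; row < num_rows; ++row) {
        if (in.truth.count(row) == 0 && in.nulls.count(row) == 0) {
            out.truth.insert(row);
        }
    }
    return out;
}

// The shortcut this sketch warns about: treating "no true rows" as
// "nothing to carry forward" and discarding the null bitmap with it.
TernaryResult drop_nulls_when_truth_empty(const TernaryResult& in) {
    if (in.truth.empty()) {
        return TernaryResult{};
    }
    return in;
}
```

With four rows, an empty truth set, and NULL rows 1 and 3, `negate_ternary` yields rows 0 and 2 as matches and keeps 1 and 3 as NULL. Applying the shortcut first makes all four rows match after negation, which is the kind of wrong answer that preserving the null bitmap avoids.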

Cherry-picked from commits 8310d28 and 315ad31.

### Release note

Fix Variant inverted-index search binding for scalar and nested Variant paths.

### Check List (For Author)

- Test:
    - Unit Test: ./run-be-ut.sh --run --filter='*Variant*:FunctionSearchTest.TestBuildLeafQueryDirectUnknownClauseUsesLeafMapper:FunctionSearchNestedTest.*:BitSetQueryTest.EmptyTruthBitmapPreservesNullBitmap'
- Behavior changed: Yes. Fixes Variant inverted-index search binding and null bitmap handling.
- Does this need documentation: No
@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@eldenmoon
Member Author

run buildall

@eldenmoon eldenmoon marked this pull request as ready for review May 28, 2026 01:26
@eldenmoon eldenmoon requested a review from yiguolei as a code owner May 28, 2026 01:26
Copilot AI review requested due to automatic review settings May 28, 2026 01:26
@eldenmoon eldenmoon changed the title from "[fix](variant) Bind Variant search to nested indexes" to "branch-4.1 [fix](variant) Bind Variant search to nested indexes" May 28, 2026
Contributor

Copilot AI left a comment

Pull request overview

This PR cherry-picks Variant inverted-index search fixes into branch-4.1, separating Variant binding/nested search logic from the generic search() function and adding diagnostics for Variant index binding.

Changes:

- Adds `variant_inverted_index_search` support for Variant subcolumn binding, direct BKD reads, UNKNOWN bitmap handling, and nested-doc mapping.
- Updates `search()` query construction and `BitSetQuery`/`BitSetWeight` to preserve null bitmap semantics.
- Adds diagnostics and focused unit tests for Variant binding, nested mapping, and empty-truth/null-bitmap behavior.
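
The nested-doc mapping mentioned in the list above can be sketched as follows. This is only an illustration under a stated assumption, not the layout Doris uses: it assumes nested elements are flattened into consecutive leaf ids and that a hypothetical `end_offsets` array records, per row, the exclusive end of that row's leaf range. `map_leaf_hits_to_rows` is an invented helper name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Assumption: row r owns leaf ids [end_offsets[r - 1], end_offsets[r]),
// with the start of row 0 taken as 0. Rows with no nested elements repeat
// the previous offset.
std::vector<uint32_t> map_leaf_hits_to_rows(const std::vector<uint32_t>& leaf_hits,
                                            const std::vector<uint32_t>& end_offsets) {
    assert(std::is_sorted(end_offsets.begin(), end_offsets.end()));
    std::vector<uint32_t> rows;
    for (uint32_t leaf : leaf_hits) {
        // The owning row is the first one whose end offset exceeds the leaf id.
        auto it = std::upper_bound(end_offsets.begin(), end_offsets.end(), leaf);
        if (it == end_offsets.end()) {
            continue; // leaf id beyond the last row: ignore it
        }
        rows.push_back(static_cast<uint32_t>(it - end_offsets.begin()));
    }
    // Several leaf hits may fall in one row; report each row once, in order.
    std::sort(rows.begin(), rows.end());
    rows.erase(std::unique(rows.begin(), rows.end()), rows.end());
    return rows;
}
```

For `end_offsets = {2, 2, 5}` (row 0 owns leaves 0 and 1, row 1 is empty, row 2 owns leaves 2 to 4), leaf hits `{1, 3, 4}` map to rows `{0, 2}`. The binary search skips the empty row because its end offset is not strictly greater than any leaf id it could own.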

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| `be/src/exprs/function/function_search.cpp` | Integrates Variant resolver/evaluator, UNKNOWN queries, and direct-reader leaf query handling. |
| `be/src/exprs/function/function_search.h` | Moves Variant-specific resolver/evaluator APIs into the new header. |
| `be/src/exprs/function/variant_inverted_index_search.cpp` | Implements Variant field binding, nested leaf mapping, and nested search evaluation. |
| `be/src/exprs/function/variant_inverted_index_search.h` | Declares Variant search binding and nested mapping APIs. |
| `be/src/exprs/vsearch.cpp` | Adds Variant binding diagnostics during input collection and search evaluation. |
| `be/src/storage/index/inverted/query_v2/bit_set_query/bit_set_query.h` | Adds null-bitmap storage to BitSetQuery. |
| `be/src/storage/index/inverted/query_v2/bit_set_query/bit_set_weight.h` | Preserves scorers when only null bitmap data is present. |
| `be/src/storage/index/inverted/inverted_index_stats.h` | Stores capped Variant binding diagnostics. |
| `be/src/storage/index/inverted/inverted_index_profile.h` | Publishes binding diagnostics to runtime profile info strings. |
| `be/src/storage/segment/segment.cpp` | Adds index-file probe and iterator creation diagnostics. |
| `be/src/storage/segment/segment_iterator.cpp` | Passes stats into Variant subcolumn index discovery and logs iterator diagnostics. |
| `be/src/storage/segment/variant/variant_column_reader.cpp` | Adds direct/inherited/missing subcolumn index candidate diagnostics. |
| `be/src/storage/segment/variant/variant_column_reader.h` | Extends subcolumn index lookup API with optional stats. |
| `be/test/exprs/function/function_search_test.cpp` | Adds tests for Variant missing fields, resolver selection, and direct scalar reads. |
| `be/test/exprs/function/function_search_nested_test.cpp` | Adds tests for nested doc mapping and evaluator behavior. |
| `be/test/storage/index/inverted/query_v2/boolean_query_test.cpp` | Adds coverage for empty truth bitmap with preserved null bitmap. |
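
The "capped" diagnostics described for `inverted_index_stats.h` and `inverted_index_profile.h` could look roughly like the sketch below. `CappedDiagnostics`, its methods, and the output format are hypothetical and chosen for illustration; the actual structure in the PR may differ. The idea shown is to keep at most a fixed number of messages, count the rest, and render one string suitable for a profile entry.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical collector: stores up to `max_entries` messages and counts
// any further ones so the profile string stays bounded in size.
class CappedDiagnostics {
public:
    explicit CappedDiagnostics(size_t max_entries) : _max_entries(max_entries) {
        assert(max_entries > 0);
    }

    void add(std::string message) {
        if (_entries.size() < _max_entries) {
            _entries.push_back(std::move(message));
        } else {
            ++_dropped;
        }
    }

    // Joins the kept messages and notes how many were dropped, if any.
    std::string to_info_string() const {
        std::string out;
        for (size_t i = 0; i < _entries.size(); ++i) {
            if (i > 0) {
                out += "; ";
            }
            out += _entries[i];
        }
        if (_dropped > 0) {
            out += " (+" + std::to_string(_dropped) + " more)";
        }
        return out;
    }

private:
    size_t _max_entries;
    size_t _dropped = 0;
    std::vector<std::string> _entries;
};
```

A cap matters here because a Variant column can expose many subcolumn paths per segment, so an unbounded list of binding messages could bloat the runtime profile.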

Comment on lines +141 to +146
const bool is_text_field =
        column_type != nullptr && is_string_type(column_type->get_storage_field_type());
auto fb_it = _field_binding_map.find(field_name);
std::string analyzer_key;
if (is_text_field && is_variant_sub && fb_it != _field_binding_map.end() &&
    fb_it->second->__isset.index_properties && !fb_it->second->index_properties.empty()) {
Comment thread be/src/exprs/vsearch.cpp
@@ -252,6 +303,11 @@ Status VSearchExpr::evaluate_inverted_index(VExprContext* context, uint32_t segm
if (bundle.iterators.empty() && !is_nested_query) {
Comment on lines +1938 to +1951
FieldReaderBinding binding;
binding.logical_field_name = "var.items.active";
binding.stored_field_name = "1.var.items.active";
binding.stored_field_wstr = L"1.var.items.active";
binding.column_type = bool_type;
binding.query_type = InvertedIndexQueryType::MATCH_PHRASE_QUERY;
binding.state = SearchFieldBindingState::BOUND;
TabletIndex index_meta;
binding.inverted_reader = std::make_shared<DummyInvertedIndexReader>(&index_meta);

std::string key = resolver.binding_key_for("1.var.items.active",
                                           InvertedIndexQueryType::MATCH_PHRASE_QUERY);
binding.binding_key = key;
resolver._cache[key] = binding;
Comment on lines +2008 to +2029
FieldReaderBinding binding;
binding.logical_field_name = "var.items.active";
binding.stored_field_name = "1.var.items.active";
binding.stored_field_wstr = L"1.var.items.active";
binding.column_type = bool_type;
binding.query_type = InvertedIndexQueryType::MATCH_ANY_QUERY;
binding.state = SearchFieldBindingState::BOUND;
TabletIndex index_meta;
binding.inverted_reader = std::make_shared<DummyInvertedIndexReader>(&index_meta);

std::string key =
        resolver.binding_key_for("1.var.items.active", InvertedIndexQueryType::MATCH_ANY_QUERY);
binding.binding_key = key;
resolver._cache[key] = binding;

inverted_index::query_v2::QueryPtr out;
std::string out_binding_key;
Status st = function_search->build_leaf_query(clause, context, resolver, &out, &out_binding_key,
                                              "OR", 0, 10);
ASSERT_TRUE(st.ok());
ASSERT_NE(out, nullptr);
EXPECT_EQ(key, out_binding_key);
Comment on lines +2067 to +2088
FieldReaderBinding binding;
binding.logical_field_name = "var.items.flags.level";
binding.stored_field_name = "1.var.items.flags.level";
binding.stored_field_wstr = L"1.var.items.flags.level";
binding.column_type = int_type;
binding.query_type = InvertedIndexQueryType::MATCH_ANY_QUERY;
binding.state = SearchFieldBindingState::BOUND;
TabletIndex index_meta;
binding.inverted_reader = std::make_shared<DummyInvertedIndexReader>(&index_meta);

std::string key = resolver.binding_key_for("1.var.items.flags.level",
                                           InvertedIndexQueryType::MATCH_ANY_QUERY);
binding.binding_key = key;
resolver._cache[key] = binding;

inverted_index::query_v2::QueryPtr out;
std::string out_binding_key;
Status st = function_search->build_leaf_query(clause, context, resolver, &out, &out_binding_key,
                                              "OR", 0, 10);
ASSERT_TRUE(st.ok());
ASSERT_NE(out, nullptr);
EXPECT_EQ(key, out_binding_key);
@yiguolei
Contributor

skip buildall

@github-actions github-actions Bot added the approved label May 28, 2026
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

@eldenmoon eldenmoon merged commit b360a98 into apache:branch-4.1 May 28, 2026
37 of 40 checks passed
Labels: approved, reviewed
