Skip to content

Conversation

dwblaikie
Copy link
Contributor

@dwblaikie dwblaikie commented Aug 28, 2025

XOR the fiddled CheckIRId (add one (to make sure the too-uninteresting zero id isn't used) then bit reverse (to reduce chance of overlap with simple/early InstIds, and increase the chance that failure to remove this will produce numbers outside the valid range of indexes for a given store) into InstIds, then when they're used to retrieve an Inst, XOR this value back in to get the original index.

If an Id from a different CheckIRId is used, it's likely it'll be outside the bounds (especially for test cases with smaller numbers of insts and only a few checkirs) of the inst store and assert.

This doesn't feel quite baked, but would love some design feedback on at least these things, and anything else that springs to mind:

  • /some/ (at the moment, at least - maybe if/once every store has this bit fiddling this won't be true anymore) FixedSizeValueStores have to reference some other ValueStore to decode and validate indexes. For now I've added a janky ADL-based way for FixedSizeValueStore to get the CheckIRIdTag from its underlying store - with a default implementation that returns a trivial CheckIRIdTag that does nothing (by including all ids in the reserved prefix)

  • the usual naming concerns - CheckIRIdTag is a bit verbose and simultaneously not especially descriptive/motivating...

  • some? ConstantValueStores need their backing InstStore's CheckIRIdTag, but it seems some ConstantValueStores don't have a backing InstStore (& so get a trivial CheckIRIdTag) - I don't quite know enough to know what all the ConstantValueStores are, why this is, or if it's legitimate. (see the added ConstantValueStore ctor that takes a possibly-null const InstStore* to retrieve the CheckIRIdTag from, and the construction in import_ref.cpp:InternalAddImportIR)

  • printing - currently the reserved/global insts are printed the old way inst13 for instance, and then the CheckIR-specific insts are printed as inst.<CheckIRId>.<inst index> (eg: inst.0.17). This means some regex matching needs to be a bit more complicated to account for both - we could make the global ids more consistent (perhaps inst..13 or inst._.13 or somesuch - but those all feel a bit more confusing - maybe just inst.13 would make it more consistent, though still need some regex variation))

dwblaikie added a commit to dwblaikie/carbon-lang that referenced this pull request Aug 28, 2025
Found by WIP validation for this type of issue ongoing in carbon-language#5997

I'm not entirely sure how the one test update falls out of this change -
but it is from the same test that I originally reduced the problem from,
which is reassuring.

The reduced test case I investigated the issue with was this:
a.carbon:
    library "lib";
    interface I1(Other:! type) {
      let Result:! type;
    }
b.carbon:
    import library "lib";
    class T1 { }
    impl T1 as I1(Self) where .Result = Self { }

The SemIR dump diff looked like this:
89c89
<   %Main.import_ref.b6f = import_ref Main//lib, inst28 [no loc], unloaded
---
>   %Main.import_ref.b6f = import_ref Main//lib, inst27 [no loc], unloaded
96c96
<   %Main.import_ref.f7b: @i1.%I1.type (%I1.type.e87) = import_ref Main//lib, inst28 [no loc], loaded [symbolic = @i1.%Self (constants.%Self.c47)]
---
>   %Main.import_ref.f7b: @i1.%I1.type (%I1.type.e87) = import_ref Main//lib, inst27 [no loc], loaded [symbolic = @i1.%Self (constants.%Self.c47)]

Which is a difference, but given the `inst28`/`inst27` don't appear
anywhere else than these two lines, it doesn't give a terribly
meaningful diff/story about what changed - but perhaps it's
sufficient...

Not sure if this test ^ is sufficiently more interesting than the diff
update already in this patch. If so, happy to add the above as a new
test case.

Open to ideas.
github-merge-queue bot pushed a commit that referenced this pull request Aug 28, 2025
…#5998)

Found by WIP validation for this type of issue ongoing in #5997

I'm not entirely sure how the one test update falls out of this change -
but it is from the same test that I originally reduced the problem from,
which is reassuring.

The reduced test case I investigated the issue with was this:
`a.carbon`:
```
library "lib";
interface I1(Other:! type) {
   let Result:! type;
}
```
`b.carbon`:
```
import library "lib";
class T1 { }
impl T1 as I1(Self) where .Result = Self { }
```
The SemIR dump diff looked like this:
```
89c89
<   %Main.import_ref.b6f = import_ref Main//lib, inst28 [no loc], unloaded
---
>   %Main.import_ref.b6f = import_ref Main//lib, inst27 [no loc], unloaded
96c96
<   %Main.import_ref.f7b: @i1.%I1.type (%I1.type.e87) = import_ref Main//lib, inst28 [no loc], loaded [symbolic = @i1.%Self (constants.%Self.c47)]
---
>   %Main.import_ref.f7b: @i1.%I1.type (%I1.type.e87) = import_ref Main//lib, inst27 [no loc], loaded [symbolic = @i1.%Self (constants.%Self.c47)]
```
Which is a difference, but given the `inst28`/`inst27` don't appear
anywhere else than these two lines, it doesn't give a terribly
meaningful diff/story about what changed - but perhaps it's
sufficient...

Not sure if this test ^ is sufficiently more interesting than the diff
update already in this patch. If so, happy to add the above as a new
test case.

Open to ideas.
@dwblaikie dwblaikie force-pushed the debug_checks_checkirid_matching_value_store branch from e89bd87 to 9c3465d Compare August 28, 2025 23:14
@dwblaikie dwblaikie force-pushed the debug_checks_checkirid_matching_value_store branch from 9c3465d to d0734b3 Compare September 8, 2025 18:51
Otherwise invalid InstIds passed to canstants would cause
arbitrary/large memory allocation as the constant storage would be grown
to fit the giant invalid index.
… of any context

useful for dumping given only an id without the container or SemIR::File
it might've come from
The more-informative check failure (reproducing the original bug that
motivated this extra checking) looks like:

CHECK failure at ./toolchain/base/value_store.h:295: index < size_:
  Untagged index was outside of container range.
  Possibly tagged index 2113929262.
  Best-effort decomposition: CheckIRId: 30, index: 46.
  The CheckIRIdTag for this container is: 29

Not sure how this should be phrased - it all feels a bit awkward.

Maybe something like "CheckIRId was 30, should be 29. Index is 46"?
Maybe print the tagged index as hex?
@dwblaikie dwblaikie force-pushed the debug_checks_checkirid_matching_value_store branch from d0734b3 to ea8fb53 Compare September 9, 2025 19:47
@dwblaikie dwblaikie requested a review from zygoloid September 9, 2025 22:15
Comment on lines 135 to 141
auto Set(InstId inst_id, ConstantId const_id) -> void {
CARBON_DCHECK(inst_id.index >= 0);
if (static_cast<size_t>(inst_id.index) >= values_.size()) {
values_.Resize(inst_id.index + 1, default_);
auto index = insts_ ? insts_->GetRawIndex(inst_id) : inst_id.index;
if (static_cast<size_t>(index) >= values_.size()) {
values_.Resize(index + 1, default_);
}
values_.Get(inst_id) = const_id;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like if insts_ isn't set, we could easily allocate a 4-8GB value store here if inst_id is tagged.

It would be good to understand how we build a ConstantValueStore with no InstStore -- this is presumably happening in the call from import_ref.cpp, and I guess we're calling that function before we build the SemIR for the dependency. Maybe it's possible to determine and pass in the CheckIRId from there.

(As far as I'm aware, there are only two places where we create ConstantValueStores -- we have one for a file, and then each file has one for each ImportIRId to determine the constant values of that other file's instructions in this file's constant ID space. Both kinds should have tagged InstIds.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like if insts_ isn't set, we could easily allocate a 4-8GB value store here if inst_id is tagged.

Yep... had some of those failures while debugging this work in progress.

It would be good to understand how we build a ConstantValueStore with no InstStore -- this is presumably happening in the call from import_ref.cpp, and I guess we're calling that function before we build the SemIR for the dependency. Maybe it's possible to determine and pass in the CheckIRId from there.

Yep, that's the one that I think I came across.

(As far as I'm aware, there are only two places where we create ConstantValueStores -- we have one for a file, and then each file has one for each ImportIRId to determine the constant values of that other file's instructions in this file's constant ID space. Both kinds should have tagged InstIds.)

Hm, OK.

Looks like the case where the ConstantValueStores in import_ref.cpp are created without a SemIR::File are for the two special ImportIRs: ImportIRId::ApiForImpl and ImportIRId::Cpp - in both those cases, it seems the ImportIR never gets a SemIR::File?

Hmm, "A null entry is added if there is none, in which case this ID should not show up in instructions."

I see, so empty placeholder ones are added, and then optionally replaced if needed. It's the empty ones that are created without a SemIR::File?

Hmm - maybe the ApiForImpl one has no SemIR::File if it's unused. But for Cpp - it looks like it never has a SemIR::File? Am I understanding that correctly?

If so, what'd you reckon we do for the ConstantValueStores associated with these two (well, associated with Cpp in any case, and associated with ApiForImpl when there is no api for impl - I guess in this latter case some kind of nonce IdTag that's never valid/check-fails immediately on use would suffice, maybe the same is true for Cpp, that nothing should actually get stored in this ConstantValueStore?)

Yeah, looks like in these cases the ConstantValueStore is never used - so I can change the conditional use of insts_ to a CARBON_CHECK that insts_ is non-null and then use it unconditionally. Done in 280ed8c

But I'm open to better ideas here... maybe some way not to have these unused ConstantValueStores at all? Or make them even more explicitly/directly explosive-if-used.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, so the problem is specifically the calls to InternalAddImportIR from SetSpecialImportIR, which are just creating placeholders and not adding a real value store at all.

Perhaps we could add another ConstantValueStore constructor to build an invalid value store, and use that instead of calling InternalAddImportIR from SetSpecialImportIR in the case where there is no corresponding SemIR.

explicit FixedSizeValueStore(const ValueStoreT& size_source,
ConstRefType default_value) {
ConstRefType default_value)
: tag_(GetCheckIRIdTag(size_source)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know what types get used as ValueStoreT here? I wonder if we can use size_source.GetCheckIRIdTag() here instead of going via the helper function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, at least - ValueStoreT can also be

In theory if we push this id-validation into other containers beyond insts (which I'd be happy to - if you/folks are on board with it) - then eventually we'd have it for specific & generic store, and they'd all have the GetIdTag (nee GetCheckIRIdTag) function and there'd be no need for this template specialization indirection.

Certainly open to ideas.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now, could we add a GetIdTag member function to those two stores and make it return IdTag() to explicitly indicate that no tagging is happening? Then I think we could use size_source.GetIdTag() here -- I think that'd be simpler overall than the function template specialization.

…urate

Add back in the original `IdToChunkIndices` but as a non-static member
(since it's needed to un-swizzle the CheckIRIdTag back out of the Id to
get to the raw index)
@dwblaikie dwblaikie requested a review from zygoloid September 17, 2025 20:28
@dwblaikie dwblaikie marked this pull request as ready for review September 17, 2025 21:07
@github-actions github-actions bot requested a review from josh11b September 17, 2025 21:07
@dwblaikie dwblaikie removed the request for review from josh11b September 17, 2025 21:08
@dwblaikie
Copy link
Contributor Author

Ping @zygoloid - further thoughts?

Copy link
Contributor

@zygoloid zygoloid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, a couple of comments but basically LGTM.

(Looks like it's worth making some updates to the first comment on the PR before that becomes the commit message.)

I forget -- have you done any performance testing here? I'm not really expecting much difference but value stores are pretty hot so it seems worth checking.

explicit ConstantValueStore(ConstantId default_value, const InstStore* insts)
: default_(default_value),
values_(insts ? insts->GetIdTag()
: IdTag(0, std::numeric_limits<int32_t>::max())),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
: IdTag(0, std::numeric_limits<int32_t>::max())),
: IdTag(),

explicit FixedSizeValueStore(const ValueStoreT& size_source,
ConstRefType default_value) {
ConstRefType default_value)
: tag_(GetCheckIRIdTag(size_source)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now, could we add a GetIdTag member function to those two stores and make it return IdTag() to explicitly indicate that no tagging is happening? Then I think we could use size_source.GetIdTag() here -- I think that'd be simpler overall than the function template specialization.

Comment on lines 135 to 141
auto Set(InstId inst_id, ConstantId const_id) -> void {
CARBON_DCHECK(inst_id.index >= 0);
if (static_cast<size_t>(inst_id.index) >= values_.size()) {
values_.Resize(inst_id.index + 1, default_);
auto index = insts_ ? insts_->GetRawIndex(inst_id) : inst_id.index;
if (static_cast<size_t>(index) >= values_.size()) {
values_.Resize(index + 1, default_);
}
values_.Get(inst_id) = const_id;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, so the problem is specifically the calls to InternalAddImportIR from SetSpecialImportIR, which are just creating placeholders and not adding a real value store at all.

Perhaps we could add another ConstantValueStore constructor to build an invalid value store, and use that instead of calling InternalAddImportIR from SetSpecialImportIR in the case where there is no corresponding SemIR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants