-
Notifications
You must be signed in to change notification settings - Fork 1.5k
WIP check for use of InstIds from the wrong SemIR File #5997
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: trunk
Are you sure you want to change the base?
WIP check for use of InstIds from the wrong SemIR File #5997
Conversation
Found by WIP validation for this type of issue ongoing in carbon-language#5997 I'm not entirely sure how the one test update falls out of this change - but it is from the same test that I originally reduced the problem from, which is reassuring. The reduced test case I investigated the issue with was this: a.carbon: library "lib"; interface I1(Other:! type) { let Result:! type; } b.carbon: import library "lib"; class T1 { } impl T1 as I1(Self) where .Result = Self { } The SemIR dump diff looked like this: 89c89 < %Main.import_ref.b6f = import_ref Main//lib, inst28 [no loc], unloaded --- > %Main.import_ref.b6f = import_ref Main//lib, inst27 [no loc], unloaded 96c96 < %Main.import_ref.f7b: @i1.%I1.type (%I1.type.e87) = import_ref Main//lib, inst28 [no loc], loaded [symbolic = @i1.%Self (constants.%Self.c47)] --- > %Main.import_ref.f7b: @i1.%I1.type (%I1.type.e87) = import_ref Main//lib, inst27 [no loc], loaded [symbolic = @i1.%Self (constants.%Self.c47)] Which is a difference, but given the `inst28`/`inst27` don't appear anywhere else than these two lines, it doesn't give a terribly meaningful diff/story about what changed - but perhaps it's sufficient... Not sure if this test ^ is sufficiently more interesting than the diff update already in this patch. If so, happy to add the above as a new test case. Open to ideas.
…#5998) Found by WIP validation for this type of issue ongoing in #5997 I'm not entirely sure how the one test update falls out of this change - but it is from the same test that I originally reduced the problem from, which is reassuring. The reduced test case I investigated the issue with was this: `a.carbon`: ``` library "lib"; interface I1(Other:! type) { let Result:! type; } ``` `b.carbon`: ``` import library "lib"; class T1 { } impl T1 as I1(Self) where .Result = Self { } ``` The SemIR dump diff looked like this: ``` 89c89 < %Main.import_ref.b6f = import_ref Main//lib, inst28 [no loc], unloaded --- > %Main.import_ref.b6f = import_ref Main//lib, inst27 [no loc], unloaded 96c96 < %Main.import_ref.f7b: @i1.%I1.type (%I1.type.e87) = import_ref Main//lib, inst28 [no loc], loaded [symbolic = @i1.%Self (constants.%Self.c47)] --- > %Main.import_ref.f7b: @i1.%I1.type (%I1.type.e87) = import_ref Main//lib, inst27 [no loc], loaded [symbolic = @i1.%Self (constants.%Self.c47)] ``` Which is a difference, but given the `inst28`/`inst27` don't appear anywhere else than these two lines, it doesn't give a terribly meaningful diff/story about what changed - but perhaps it's sufficient... Not sure if this test ^ is sufficiently more interesting than the diff update already in this patch. If so, happy to add the above as a new test case. Open to ideas.
e89bd87
to
9c3465d
Compare
9c3465d
to
d0734b3
Compare
Otherwise invalid InstIds passed to canstants would cause arbitrary/large memory allocation as the constant storage would be grown to fit the giant invalid index.
… of any context useful for dumping given only an id without the container or SemIR::File it might've come from
The more-informative check failure (reproducing the original bug that motivated this extra checking) looks like: CHECK failure at ./toolchain/base/value_store.h:295: index < size_: Untagged index was outside of container range. Possibly tagged index 2113929262. Best-effort decomposition: CheckIRId: 30, index: 46. The CheckIRIdTag for this container is: 29 Not sure how this should be phrased - it all feels a bit awkward. Maybe something like "CheckIRId was 30, should be 29. Index is 46"? Maybe print the tagged index as hex?
d0734b3
to
ea8fb53
Compare
auto Set(InstId inst_id, ConstantId const_id) -> void { | ||
CARBON_DCHECK(inst_id.index >= 0); | ||
if (static_cast<size_t>(inst_id.index) >= values_.size()) { | ||
values_.Resize(inst_id.index + 1, default_); | ||
auto index = insts_ ? insts_->GetRawIndex(inst_id) : inst_id.index; | ||
if (static_cast<size_t>(index) >= values_.size()) { | ||
values_.Resize(index + 1, default_); | ||
} | ||
values_.Get(inst_id) = const_id; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like if insts_
isn't set, we could easily allocate a 4-8GB value store here if inst_id
is tagged.
It would be good to understand how we build a ConstantValueStore
with no InstStore
-- this is presumably happening in the call from import_ref.cpp
, and I guess we're calling that function before we build the SemIR
for the dependency. Maybe it's possible to determine and pass in the CheckIRId
from there.
(As far as I'm aware, there are only two places where we create ConstantValueStore
s -- we have one for a file, and then each file has one for each ImportIRId
to determine the constant values of that other file's instructions in this file's constant ID space. Both kinds should have tagged InstId
s.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like if
insts_
isn't set, we could easily allocate a 4-8GB value store here ifinst_id
is tagged.
Yep... had some of those failures while debugging this work in progress.
It would be good to understand how we build a ConstantValueStore with no InstStore -- this is presumably happening in the call from import_ref.cpp, and I guess we're calling that function before we build the SemIR for the dependency. Maybe it's possible to determine and pass in the CheckIRId from there.
Yep, that's the one that I think I came across.
(As far as I'm aware, there are only two places where we create ConstantValueStores -- we have one for a file, and then each file has one for each ImportIRId to determine the constant values of that other file's instructions in this file's constant ID space. Both kinds should have tagged InstIds.)
Hm, OK.
Looks like the case where the ConstantValueStore
s in import_ref.cpp
are created without a SemIR::File
are for the two special ImportIR
s: ImportIRId::ApiForImpl
and ImportIRId::Cpp
- in both those cases, it seems the ImportIR
never gets a SemIR::File
?
Hmm, "A null entry is added if there is none, in which case this ID should not show up in instructions."
I see, so empty placeholder ones are added, and then optionally replaced if needed. It's the empty ones that are created without a SemIR::File
?
Hmm - maybe the ApiForImpl
one has no SemIR::File
if it's unused. But for Cpp
- it looks like it never has a SemIR::File
? Am I understanding that correctly?
If so, what'd you reckon we do for the ConstantValueStore
s associated with these two (well, associated with Cpp
in any case, and associated with ApiForImpl
when there is no api for impl - I guess in this latter case some kind of nonce IdTag
that's never valid/check-fails immediately on use would suffice, maybe the same is true for Cpp
, that nothing should actually get stored in this ConstantValueStore
?)
Yeah, looks like in these cases the ConstantValueStore is never used - so I can change the conditional use of insts_
to a CARBON_CHECK
that insts_
is non-null and then use it unconditionally. Done in 280ed8c
But I'm open to better ideas here... maybe some way not to have these unused ConstantValueStore
s at all? Or make them even more explicitly/directly explosive-if-used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, so the problem is specifically the calls to InternalAddImportIR
from SetSpecialImportIR
, which are just creating placeholders and not adding a real value store at all.
Perhaps we could add another ConstantValueStore
constructor to build an invalid value store, and use that instead of calling InternalAddImportIR
from SetSpecialImportIR
in the case where there is no corresponding SemIR
.
explicit FixedSizeValueStore(const ValueStoreT& size_source, | ||
ConstRefType default_value) { | ||
ConstRefType default_value) | ||
: tag_(GetCheckIRIdTag(size_source)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you know what types get used as ValueStoreT
here? I wonder if we can use size_source.GetCheckIRIdTag()
here instead of going via the helper function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, at least - ValueStoreT
can also be
SpecificStore
herecarbon-lang/toolchain/lower/specific_coalescer.cpp
Lines 15 to 17 in cac3578
lowered_specifics_type_fingerprint_(specifics, {}), lowered_specific_fingerprint_(specifics, {}), equivalent_specifics_(specifics, SemIR::SpecificId::None) {} GenericStore
herelowered_specifics_(sem_ir.generics(),
In theory if we push this id-validation into other containers beyond insts (which I'd be happy to - if you/folks are on board with it) - then eventually we'd have it for specific & generic store, and they'd all have the GetIdTag
(nee GetCheckIRIdTag
) function and there'd be no need for this template specialization indirection.
Certainly open to ideas.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now, could we add a GetIdTag
member function to those two stores and make it return IdTag()
to explicitly indicate that no tagging is happening? Then I think we could use size_source.GetIdTag()
here -- I think that'd be simpler overall than the function template specialization.
…urate Add back in the original `IdToChunkIndices` but as a non-static member (since it's needed to un-swizzle the CheckIRIdTag back out of the Id to get to the raw index)
Ping @zygoloid - further thoughts? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, a couple of comments but basically LGTM.
(Looks like it's worth making some updates to the first comment on the PR before that becomes the commit message.)
I forget -- have you done any performance testing here? I'm not really expecting much difference but value stores are pretty hot so it seems worth checking.
explicit ConstantValueStore(ConstantId default_value, const InstStore* insts) | ||
: default_(default_value), | ||
values_(insts ? insts->GetIdTag() | ||
: IdTag(0, std::numeric_limits<int32_t>::max())), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
: IdTag(0, std::numeric_limits<int32_t>::max())), | |
: IdTag(), |
explicit FixedSizeValueStore(const ValueStoreT& size_source, | ||
ConstRefType default_value) { | ||
ConstRefType default_value) | ||
: tag_(GetCheckIRIdTag(size_source)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now, could we add a GetIdTag
member function to those two stores and make it return IdTag()
to explicitly indicate that no tagging is happening? Then I think we could use size_source.GetIdTag()
here -- I think that'd be simpler overall than the function template specialization.
auto Set(InstId inst_id, ConstantId const_id) -> void { | ||
CARBON_DCHECK(inst_id.index >= 0); | ||
if (static_cast<size_t>(inst_id.index) >= values_.size()) { | ||
values_.Resize(inst_id.index + 1, default_); | ||
auto index = insts_ ? insts_->GetRawIndex(inst_id) : inst_id.index; | ||
if (static_cast<size_t>(index) >= values_.size()) { | ||
values_.Resize(index + 1, default_); | ||
} | ||
values_.Get(inst_id) = const_id; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, so the problem is specifically the calls to InternalAddImportIR
from SetSpecialImportIR
, which are just creating placeholders and not adding a real value store at all.
Perhaps we could add another ConstantValueStore
constructor to build an invalid value store, and use that instead of calling InternalAddImportIR
from SetSpecialImportIR
in the case where there is no corresponding SemIR
.
XOR the fiddled
CheckIRId
(add one (to make sure the too-uninteresting zero id isn't used) then bit reverse (to reduce chance of overlap with simple/earlyInstId
s, and increase the chance that failure to remove this will produce numbers outside the valid range of indexes for a given store) intoInstId
s, then when they're used to retrieve anInst
, XOR this value back in to get the original index.If an Id from a different
CheckIRId
is used, it's likely it'll be outside the bounds (especially for test cases with smaller numbers of insts and only a few checkirs) of the inst store and assert.This doesn't feel quite baked, but would love some design feedback on at least these things, and anything else that springs to mind:
/some/ (at the moment, at least - maybe if/once every store has this bit fiddling this won't be true anymore)
FixedSizeValueStore
s have to reference some otherValueStore
to decode and validate indexes. For now I've added a janky ADL-based way forFixedSizeValueStore
to get theCheckIRIdTag
from its underlying store - with a default implementation that returns a trivialCheckIRIdTag
that does nothing (by including all ids in the reserved prefix)the usual naming concerns -
CheckIRIdTag
is a bit verbose and simultaneously not especially descriptive/motivating...some?
ConstantValueStore
s need their backingInstStore
'sCheckIRIdTag
, but it seems someConstantValueStore
s don't have a backingInstStore
(& so get a trivialCheckIRIdTag
) - I don't quite know enough to know what all theConstantValueStore
s are, why this is, or if it's legitimate. (see the addedConstantValueStore
ctor that takes a possibly-nullconst InstStore*
to retrieve theCheckIRIdTag
from, and the construction inimport_ref.cpp
:InternalAddImportIR
)printing - currently the reserved/global insts are printed the old way
inst13
for instance, and then theCheckIR
-specific insts are printed asinst.<CheckIRId>.<inst index>
(eg:inst.0.17
). This means some regex matching needs to be a bit more complicated to account for both - we could make the global ids more consistent (perhapsinst..13
orinst._.13
or somesuch - but those all feel a bit more confusing - maybe justinst.13
would make it more consistent, though still need some regex variation))