copilot-theorem: Add function to produce counterexamples for invalid properties. Refs #589. (#595)
Merged: ivanperez-keera merged 10 commits into Copilot-Language:master from GaloisInc:develop-copilot-theorem-counterexamples-take-three on Feb 28, 2025.
Currently, the `Copilot.Theorem.What4.prove` function returns a list of results, each containing a `SatResult` that describes whether a property is `Valid`, `Invalid`, or `Unknown`. However, an `Invalid` result gives no information about a specific counterexample that could drive Copilot into falsifying the property, which makes the results of `prove` hard to interpret. To define a companion function to `prove` that also returns counterexample information when a proof fails, it is convenient to be able to display `Type` information in panic messages. This commit derives a basic `Show` instance for `Type` so that `copilot-theorem` can display `Type` values whenever an internal invariant is violated.
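As a minimal sketch of the idea, assuming a GADT-style type representation like copilot-core's `Type`: GHC rejects a `deriving Show` clause on a GADT whose constructors have specialised result types, but a standalone deriving declaration works, and the derived instance can then be used in an internal-error message. `Ty`, its constructors, and `unexpectedType` are hypothetical names for this sketch; the actual commit may derive its instance differently.

```haskell
{-# LANGUAGE GADTs #-}
{-# LANGUAGE StandaloneDeriving #-}

import Data.Int (Int8)

-- A toy, type-indexed representation in the spirit of copilot-core's `Type`.
-- The constructor names are hypothetical, not copilot-core's own.
data Ty a where
  TBool  :: Ty Bool
  TInt8  :: Ty Int8
  TArray :: Ty a -> Ty [a]

-- A `deriving Show` clause on the declaration itself is rejected because the
-- constructors have specialised result types; standalone deriving works.
deriving instance Show (Ty a)

-- Hypothetical helper that builds an internal-error message including the
-- offending type, as a panic message might.
unexpectedType :: String -> Ty a -> String
unexpectedType location ty =
  "Internal error in " ++ location ++ ": unexpected type " ++ show ty
```

For example, `show (TArray (TArray TInt8))` yields `"TArray (TArray TInt8)"`, with the derived instance inserting parentheses for nested constructors.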
…Copilot-Language#589. To update the `valFromExpr` function so that it produces concrete array values for counterexamples, we need to call the `Array` data constructor, which has a `Typed` constraint. However, the `XEmptyArray` and `XArray` data constructors do not record evidence that their array element types are instances of the `Typed` class, which makes it impossible to use them in `valFromExpr`. This commit adds the necessary constraints to each data constructor so that their array elements are `Typed`.
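Why the constructors must store the evidence can be shown with a toy model. This sketch assumes a class `Typed` with two invented methods (`typeName`, `render`) and toy `XEmptyArray`/`XArray` constructors over plain lists; none of these are the actual copilot-core or copilot-theorem definitions. Because each constructor hides its element type, a consumer can only use class methods on that type if the constructor itself carries the `Typed` constraint.

```haskell
{-# LANGUAGE GADTs #-}

import Data.List (intercalate)
import Data.Proxy (Proxy (..))

-- Hypothetical stand-in for a class like Copilot's `Typed`; its methods are
-- invented for this sketch.
class Typed a where
  typeName :: Proxy a -> String
  render   :: a -> String

instance Typed Bool where
  typeName _ = "Bool"
  render b   = if b then "true" else "false"

instance Typed Int where
  typeName _ = "Int"
  render     = show

-- Toy array expressions. Each constructor stores `Typed a` evidence for its
-- existentially hidden element type `a`.
data XExpr where
  XEmptyArray :: Typed a => Proxy a -> XExpr
  XArray      :: Typed a => [a] -> XExpr

proxyFor :: [b] -> Proxy b
proxyFor _ = Proxy

-- Pattern matching brings the stored `Typed a` constraint back into scope,
-- so the class methods can be used on the hidden element type. Without the
-- constraints on the constructors, neither equation would typecheck.
concretize :: XExpr -> String
concretize (XEmptyArray p) = "[] :: [" ++ typeName p ++ "]"
concretize (XArray xs) =
  "[" ++ intercalate ", " (map render xs) ++ "] :: [" ++ typeName (proxyFor xs) ++ "]"
```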
Change Manager: The build has failed. Please check.
Force-pushed from 2321709 to 2aaeea1.
My apologies, I was accidentally relying on code that only typechecked using GHC 8.8 or later. I've fixed this now; PTAL.
Implementor: Fix implemented, review requested.
Copilot-Language#589. The `valFromExpr` function, which produces concrete values when building a counterexample, lacked cases for `XEmptyArray` and `XArray`, so it failed when called on these values. This commit adds the missing cases, which use the `Typed` evidence added to `XEmptyArray` and `XArray` in a previous commit. It does not yet add a case for structs, which are more challenging.
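A rough, untyped sketch of the shape of such array cases: an empty array maps to an empty concrete array, and a non-empty array converts each element recursively. Everything here is hypothetical (`XExpr`, `Value`, and this `valFromExpr` are toys named after the commit's terminology). The real function presumably evaluates What4 expressions against a solver model, which this sketch does not model, and returning `Left` for structs is this sketch's own choice rather than the commit's behavior.

```haskell
-- Toy, untyped model; every name here is hypothetical. Inputs are already
-- concrete so the sketch stays self-contained.
data XExpr
  = XBool Bool
  | XInt Integer
  | XEmptyArray
  | XArray [XExpr]
  | XStruct [(String, XExpr)]

data Value
  = VBool Bool
  | VInt Integer
  | VArray [Value]
  deriving (Eq, Show)

-- Converts an expression to a concrete value, covering the array cases.
valFromExpr :: XExpr -> Either String Value
valFromExpr (XBool b)   = Right (VBool b)
valFromExpr (XInt i)    = Right (VInt i)
valFromExpr XEmptyArray = Right (VArray [])
-- Array elements are expressions too, so convert each one recursively;
-- `traverse` propagates the first failure, if any.
valFromExpr (XArray xs) = VArray <$> traverse valFromExpr xs
-- As in the commit, structs are left unsupported for now; this sketch
-- reports that as an error value rather than crashing.
valFromExpr (XStruct _) = Left "valFromExpr: structs are not supported yet"
```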
…properties. Refs Copilot-Language#589. This introduces a new `proveWithCounterExample` function to `Copilot.Theorem.What4` that mirrors the type signature of `prove`, except that it returns a variant of `SatResult` (`SatResultCex`) whose `Invalid` equivalent (`InvalidCex`) encodes counterexample information. `copilot-theorem` users can then interpret the counterexample in terms of their Copilot specifications. As part of this commit, we change the definition of the `CounterExample` data type. This is safe to do, as `CounterExample` was completely unused and unexported prior to this commit.
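To illustrate how the two result types relate, the following toy model assumes constructor names `ValidCex` and `UnknownCex` alongside the PR's `InvalidCex`, and gives `CounterExample` an invented shape (per-stream value traces); the real `CounterExample` type defined in this commit is not reproduced here. It also assumes that results are reported per property, here as a name paired with a result.

```haskell
import Data.List (intercalate)

-- Toy models. Only the names SatResult, Valid, Invalid, Unknown, SatResultCex,
-- InvalidCex and CounterExample come from the PR text; ValidCex, UnknownCex
-- and every field below are assumptions for this sketch.
data SatResult = Valid | Invalid | Unknown
  deriving (Eq, Show)

-- Hypothetical counterexample: each stream's values over a falsifying trace.
newtype CounterExample = CounterExample [(String, [Integer])]
  deriving (Eq, Show)

-- Same three outcomes as SatResult, but the invalid case carries a counterexample.
data SatResultCex = ValidCex | InvalidCex CounterExample | UnknownCex
  deriving (Eq, Show)

-- Forgetting the counterexample recovers the plain result.
forgetCex :: SatResultCex -> SatResult
forgetCex ValidCex       = Valid
forgetCex (InvalidCex _) = Invalid
forgetCex UnknownCex     = Unknown

-- One way a user might report a named property's result.
describe :: (String, SatResultCex) -> String
describe (prop, ValidCex)   = prop ++ ": valid"
describe (prop, UnknownCex) = prop ++ ": unknown"
describe (prop, InvalidCex (CounterExample streams)) =
  prop ++ ": invalid, e.g. "
    ++ intercalate "; " [name ++ " = " ++ show vals | (name, vals) <- streams]
```

The design point the sketch captures is that `forgetCex` makes the new result type a strict refinement of the old one: callers who only need valid/invalid/unknown lose nothing by switching.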
Copilot-Language#589. The `CounterExample`, `SatResultCex`, and `CopilotValue` data types lack `Show` and `ShowF` instances, which makes it impractical for users to display them. This commit adds `Show` and `ShowF` instances for all three data types so that they can be shown.
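A `ShowF` instance matters alongside `Show` because values like `CopilotValue` are presumably indexed by a type parameter and often stored behind an existential wrapper, where the index is hidden. The sketch below defines a local class `ShowIndexed` as a stand-in for parameterized-utils' `ShowF` (not its exact definition), with hypothetical `Tag`, `TaggedValue`, and `SomeValue` types.

```haskell
{-# LANGUAGE ExistentialQuantification #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE KindSignatures #-}

import Data.Kind (Type)

-- Local stand-in for an index-polymorphic Show class such as
-- parameterized-utils' `ShowF`; not that class's exact definition.
class ShowIndexed (f :: Type -> Type) where
  showIndexed :: f tp -> String

-- Hypothetical type witness and a value paired with it.
data Tag a where
  TagBool :: Tag Bool
  TagInt  :: Tag Int

data TaggedValue a = TaggedValue (Tag a) a

-- Matching on the witness tells GHC the concrete type, so `show` is available.
instance ShowIndexed TaggedValue where
  showIndexed (TaggedValue TagBool b) = "Bool " ++ show b
  showIndexed (TaggedValue TagInt i)  = "Int " ++ show i

-- Existential wrapper that hides the index, in the spirit of `Some`.
data SomeValue f = forall tp. SomeValue (f tp)

-- The hidden `tp` cannot appear in the instance context, so showing the
-- payload needs a method that works for every index, which is what
-- `showIndexed` (and, analogously, a `ShowF` instance) provides.
instance ShowIndexed f => Show (SomeValue f) where
  show (SomeValue x) = "SomeValue (" ++ showIndexed x ++ ")"
```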
…ilot-Language#589. A prior commit introduced a `proveWithCounterExample` function, which provides a counterexample when a property is proven invalid. This commit updates the test suite to ensure that basic uses of `proveWithCounterExample` work as intended.
…lot-Language#589. To demonstrate how to use the newly added `proveWithCounterExample` function effectively, this commit adds a new example, `examples/what4/ArithmeticCounterExamples.hs`, which behaves like `examples/what4/Arithmetic.hs` but uses `proveWithCounterExample` instead of `prove`.
Force-pushed from 2aaeea1 to 9861071.
Implementor: Fix implemented, review requested.
Change Manager: Verified that:
Currently, the `Copilot.Theorem.What4.prove` function returns a list of results, each containing a `SatResult` that describes whether a property is `Valid`, `Invalid`, or `Unknown`. However, an `Invalid` result gives no information about a specific counterexample that could drive Copilot into falsifying the property, which makes the results of `prove` hard to interpret.

This introduces a new `proveWithCounterExample` function to `Copilot.Theorem.What4` that mirrors the type signature of `prove`, except that it returns a variant of `SatResult` (`SatResultCex`) whose `Invalid` equivalent (`InvalidCex`) encodes counterexample information. `copilot-theorem` users can then interpret the counterexample in terms of their Copilot specifications.

As part of this commit, we change the definition of the `CounterExample` data type. This is safe to do, as `CounterExample` was completely unused and unexported prior to this commit.

Fixes #589.