Skip to content

Conversation

RyanGlScott
Copy link
Collaborator

Currently, the Copilot.Theorem.What4.prove function returns a list of results, where each result contains a SatResult that describes whether a property is Valid, Invalid, or Unknown. The Invalid result has the limitation that it does not give any information about a specific counterexample that could drive Copilot into falsifying the property, however. This makes it challenging to interpret what the results of prove mean.

This introduces a new proveWithCounterExample function to Copilot.Theorem.What4 that mirrors the type signature of prove, except that it returns a variant of SatResult (SatResultCex) where the Invalid equivalent (InvalidCex) encodes counterexample information. copilot-theorem users can then interpret the results of the counterexample in Copilot specifications.

As part of this commit, we change the definition of the CounterExample data type. This is safe to do, as CounterExample was completely unused prior to this commit, nor was it exported.

Fixes #589.

Currently, the `Copilot.Theorem.What4.prove` function returns a list of
results, where each result contains a `SatResult` that describes whether a
property is `Valid`, `Invalid`, or `Unknown`. The `Invalid` result has the
limitation that it does not give any information about a specific
counterexample that could drive Copilot into falsifying the property, however.
This makes it challenging to interpret what the results of prove mean.

To define a companion function to `prove` that also returns counterexample
information upon a failed proof, it is convenient to be able to display `Type`
information in panic messages.

This commit derives a basic `Show` instance for `Type` so that
`copilot-theorem` can display them whenever an internal invariant is violated.
…Copilot-Language#589.

Currently, the `Copilot.Theorem.What4.prove` function returns a list of
results, where each result contains a `SatResult` that describes whether a
property is `Valid`, `Invalid`, or `Unknown`. The `Invalid` result has the
limitation that it does not give any information about a specific
counterexample that could drive Copilot into falsifying the property, however.
This makes it challenging to interpret what the results of prove mean.

To update the `valFromExpr` function in order to produce concrete array values
for counterexample purposes, we need to call the `Array` data constructor,
which has a `Typed` constraint. However, the `XEmptyArray` and `XArray` data
constructors do not record evidence that their array element types were
instances of the `Typed` class, which makes it impossible to use them in
`valFromExpr`.

This commit adds the necessary constraints to each data constructor to make
their array elements `Typed`.
@ivanperez-keera
Copy link
Member

Change Manager: The build has failed. Please check.

@RyanGlScott RyanGlScott force-pushed the develop-copilot-theorem-counterexamples-take-three branch from 2321709 to 2aaeea1 Compare February 25, 2025 10:48
@RyanGlScott
Copy link
Collaborator Author

My apologies, I was accidentally relying on code that only typechecked using GHC 8.8 or later. I've fixed this now—PTAL.

@RyanGlScott
Copy link
Collaborator Author

Implementor: Fix implemented, review requested.

Copilot-Language#589.

Currently, the `Copilot.Theorem.What4.prove` function returns a list of
results, where each result contains a `SatResult` that describes whether a
property is `Valid`, `Invalid`, or `Unknown`. The `Invalid` result has the
limitation that it does not give any information about a specific
counterexample that could drive Copilot into falsifying the property, however.
This makes it challenging to interpret what the results of prove mean.

The `valFromExpr` function (which produces concrete values when making a
counterexample) was lacking cases for `XEmptyArray` and `XArray`, so it would
fail if the function was called on these values.

This commit adds these missing cases, which make use of the `Typed` evidence
added to `XEmptyArray` and `XArray` in a previous commit. We do not yet add a
case for structs, which prove more challenging.
…properties. Refs Copilot-Language#589.

Currently, the `Copilot.Theorem.What4.prove` function returns a list of
results, where each result contains a `SatResult` that describes whether a
property is `Valid`, `Invalid`, or `Unknown`. The `Invalid` result has the
limitation that it does not give any information about a specific
counterexample that could drive Copilot into falsifying the property, however.
This makes it challenging to interpret what the results of prove mean.

This introduces a new `proveWithCounterExample` function to
`Copilot.Theorem.What4` that mirrors the type signature of `prove`, except that
it returns a variant of `SatResult` (`SatResultCex`) where the `Invalid`
equivalent (`InvalidCex`) encodes counterexample information. `copilot-theorem`
users can then interpret the results of the counterexample in Copilot
specifications.

As part of this commit, we change the definition of the `CounterExample` data
type. This is safe to do, as `CounterExample` was completely unused prior to
this commit, nor was it exported.
Copilot-Language#589.

Currently, the `Copilot.Theorem.What4.prove` function returns a list of
results, where each result contains a `SatResult` that describes whether a
property is `Valid`, `Invalid`, or `Unknown`. The `Invalid` result has the
limitation that it does not give any information about a specific
counterexample that could drive Copilot into falsifying the property, however.
This makes it challenging to interpret what the results of prove mean.

The `CounterExample`, `SatResultCex`, and `CopilotValue` data types lack `Show`
and `ShowF` instances, which makes it impractical for users to display them.

This commit adds `Show` and `ShowF` instances for all three data types so that
they can be shown.
…ilot-Language#589.

Currently, the `Copilot.Theorem.What4.prove` function returns a list of
results, where each result contains a `SatResult` that describes whether a
property is `Valid`, `Invalid`, or `Unknown`. The `Invalid` result has the
limitation that it does not give any information about a specific
counterexample that could drive Copilot into falsifying the property, however.
This makes it challenging to interpret what the results of prove mean.

A prior commit has introduced a `proveWithCounterExample` function, which
provides a counterexample when a property is proven invalid.

This commit updates the test suite to ensure that basic uses of
`proveWithCounterExample` work as intended.
…lot-Language#589.

Currently, the `Copilot.Theorem.What4.prove` function returns a list of
results, where each result contains a `SatResult` that describes whether a
property is `Valid`, `Invalid`, or `Unknown`. The `Invalid` result has the
limitation that it does not give any information about a specific
counterexample that could drive Copilot into falsifying the property, however.
This makes it challenging to interpret what the results of prove mean.

To demonstrate how to effectively use the newly added `proveWithCounterExample`
function, this commit adds a new `examples/what4/ArithmeticCounterExamples.hs`
function that behaves like `examples/what4/Arithmetic.hs`, but using
`proveWithCounterExamples` instead of `prove`.
@RyanGlScott RyanGlScott force-pushed the develop-copilot-theorem-counterexamples-take-three branch from 2aaeea1 to 9861071 Compare February 27, 2025 14:01
@RyanGlScott
Copy link
Collaborator Author

Implementor: Fix implemented, review requested.

@ivanperez-keera
Copy link
Member

ivanperez-keera commented Feb 28, 2025

Change Manager: Verified that:

  • Solution is implemented:
    • The code proposed compiles and passes all tests. Details:
      Build log: https://github.com/Copilot-Language/copilot/runs/37925988037
    • The solution proposed produces the expected result. Details:
      The following Dockerfile runs an example included in the PR that demonstrates the new feature by printing the results of proving some combination of valid and invalid properties, for which counterexamples are produced when applicable. I have checked by visual inspection that the results are produced, and that the message is well formatted and informative enough:
      FROM ubuntu:focal
      
      ENV DEBIAN_FRONTEND=noninteractive
      RUN apt-get update
      
      RUN apt-get install --yes \
            libz-dev \
            git \
            curl \
            gcc \
            g++ \
            make \
            libgmp3-dev  \
            pkg-config \
            z3
      
      RUN mkdir -p $HOME/.ghcup/bin
      RUN curl https://downloads.haskell.org/~ghcup/0.1.19.2/x86_64-linux-ghcup-0.1.19.2 -o $HOME/.ghcup/bin/ghcup
      RUN chmod a+x $HOME/.ghcup/bin/ghcup
      ENV PATH=$PATH:/root/.ghcup/bin/
      ENV PATH=$PATH:/root/.cabal/bin/
      
      SHELL ["/bin/bash", "-c"]
      
      RUN ghcup install ghc 9.10
      RUN ghcup install cabal 3.2
      RUN ghcup set ghc 9.10
      RUN cabal update
      
      SHELL ["/bin/bash", "-c"]
      CMD git clone $REPO && cd $NAME && git checkout $COMMIT && cd .. \
        && cabal v1-sandbox init \
        && cabal v1-install alex happy --constraint='happy <= 2' \
        && cabal v1-install $NAME/copilot**/ \
        && cabal v1-exec -- runhaskell $NAME/copilot/examples/what4/ArithmeticCounterExamples.hs \
        && echo "Success"
      Command (substitute variables based on new path after merge):
      $ docker run -e "REPO=https://github.com/GaloisInc/copilot-1" -e "NAME=copilot-1" -e "COMMIT=9861071718850fb8ef85e533a6c0d6d083f69094" -it copilot-verify-589
      
  • Implementation is documented. Details:
    All new top-level definitions include haddock documentations, as well as the internals of the code and the examples.
  • Change history is clear.
  • Commit messages are clear.
  • Changelogs are updated.
  • Examples are updated. Details:
    A new example is introduced to demonstrate the feature.
  • Required version bumps are evaluated. Details:
    Bump required; the new feature changes the types of some value constructors in copilot-theorem, thus affecting the public API.

@ivanperez-keera ivanperez-keera merged commit f39b9da into Copilot-Language:master Feb 28, 2025
1 check passed
@RyanGlScott RyanGlScott deleted the develop-copilot-theorem-counterexamples-take-three branch May 18, 2025 12:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

copilot-theorem: Produce a counterexample when the What4 backend falsifies a property
2 participants