Skip to content

mercari/cel-smt

Repository files navigation

cel-smt

This repository implements a portion of Google's Common Expression Language (CEL) with an SMT solver (Z3) backend by leveraging cel-python. The primary motivation is to allow performing symbolic analysis on Google Cloud IAM configurations which use CEL for conditions.

Overview

The following features are implemented:

  • Core CEL types:
    • Scalars: bool, int, uint, double, string, bytes, null
    • Composite: list, map
    • Error pseudo values
  • Literals for supported types
  • Operators:
    • Comparisons: equality, lt, gt, lte, gte, ne, in
    • member dot access and index operator
    • math operators: addition, subtraction, multiplication, division, negation, modulus
    • Logic operators: and, or, ternary
  • Functions:
    • Casting functions for supported types
    • Strings: contains, startsWith, endsWith
    • size function for containers/strings/bytes

For all of the above, cel-smt passes the official CEL conformance tests, with a few exceptions:

  • Some floating point operations are not exactly correct, particularly with respect to overflow, rounding modes, NaNs and Infs.
  • Several conversion functions have gaps and return a CEL runtime error instead of the correct result:
    • Conversions from strings to doubles is not supported as this is unsupported in z3.
    • Conversion between bytes and strings are not yet supported
  • Map and list lookups that rely on equality between different types are not supported and instead will not find the value (for example, a map key that is a uint and looking it up using an int, or using the index operator on a list with a double)

Further, the following things are not implemented and attempting to use them will throw a NotImplementedError (these tests are marked as "skipped" in pytest):

  • Types:
    • timestamp and duration
    • Everything related to protobufs
    • CEL Type checking
    • the type value (runtime type reflection)
  • Macros (e.g. has)
  • Bitwise operations
  • Functions:
    • Bitwise functions
    • Floating point
    • Most string functions
    • Floating point functions
    • Anything else not specified above
  • Specific error messages are not guaranteed, just if there is an error propogated or not.
  • Anything cel-python doesn't support

In practice, the current coverage has been sufficient to analyze in-the-wild GCP IAM expressions that are in use at Mercari.

Usage

General usage looks like:

import z3
import celpy
from cel_smt import CelSmtTranspiler, assert_true

source = "a == b"

env = celpy.Environment()

vars_array = CelSmtTranspiler.create_vars_array()

ast = env.compile(source)
expr = CelSmtTranspiler.transpile(ast, vars_array)

solver = z3.Solver()
# assert_true is a helper function that returns true iff the expression is a "true" CEL boolean value
solver.add(assert_true(expr))
print(solver.check())
print(solver.model()[vars_array])

This will return:

sat
cel_map_inner(Concat(Unit(simple_string("b")),
                     Unit(simple_string("a"))),
              Concat(Unit(cel_null), Unit(cel_null)))

Which is a valid sat model for the expression provided representing the map {"a":null, "b": null}

Workload Identity Pool Analysis Demo

In the demo directory there is a script gcp_widp.py that can be run with uv run ./demo/gcp_widp.py that contains a demo program that uses cel-smt to analyze Google Cloud OIDC workload identity pools.

It supports two modes:

  • check_binding: As input takes a workload identity pool provider id, and a pool attribute value principal identifier. This will print a model of what (if any) OIDC token could assume that identity using the pool+provider.
  • verify_github_orgs: As an input takes a workload identity pool provider id that trusts the GitHub Actions OIDC issuer (https://token.actions.githubusercontent.com), and a list of trusted github organization names. If a token issued outside of those organizations could pass the attributeCondition check of the pool, it will print an example token and fail.

Structure/Design

The core programming model of CEL is a good fit for modeling with an SMT solver as it is fundementally expression based, does not allow side effects, has minimal control flow management functionality and is relatively simple. Further, with CEL's extensive public test suite, it is possible to verify that the transpiler implementation is correct.

The transpiler operators on Lark ASTs that are produced by the cel-python parser. These ASTs closely follow the official spec. cel-smt uses a Lark "Interpreter" to walk the parse tree and build up an equivilent expression for the SMT solver.

As CEL is dynamically typed, the expression produced by the transpiler must have the same z3 "sort" regardless of the contents. As such, CEL values are represented using the CEL datatype theory with a single sort CelVal that has separate constructors for the different types in CEL. A separate sort CelSimpleVal is used for the types of scalars that can be map keys - int, uint, bool and string.

CEL does not support exceptions, so instead errors are represented as a value in the CelVal datatype. These errors need to be correctly propogated across operator boundaries which is an important consideration when implementing operators/functions like == as these can return true/false/error which is important as all functions in the SMT solvers are total.

The operators and functions are implemented mainly as python functions that generate an expression, with a few that are implemented as smt recursive functions. These functions/operators generally follow the pattern of first switching on the input types, unwrapping the CelVal to get the underlying value, calling the z3 builtin and then re-wrapping the result.

Development

This project uses uv and pre-commit, so you will need them installed.

Ensure you have the submodules checked out.

When you first set up the repository, run uv run pre-commit install to get the pre-commit hooks set up.

To run the tests, use:

uv run pytest tests/ -v -n auto

To add new CEL functionality, the general process is to:

  • Identify a failing/skipped conformance test case, and isolate it with the pytest -k switch.
  • Implement the appropriate function/operation in ops.py, taking care to appropriately propogate errors.
  • Call that function in the transpiler at the appropriate place in the parse tree (this can be identified by looking at the traceback for the NotImplementedError)

Backends

Currently the z3 backend is recommended and the default.

A cvc5 backend is also available using the cvc5 pythonic API. You can try it by setting the environment variable CEL_SMT_BACKEND to cvc5. There are a few gaps in the cvc5 pythonic API relative to z3 that are polyfilled in the file backend.py. Generall CVC5 has slightly worse performance with the complex expressions cel-smt spits out which heavily use recrusive datatypes and recursive functions. Experimentally, the full unit test suite runs about 5 times faster with Z3 than CVC5.

Contribution

If you want to submit a PR for bug fixes or documentation, please read the CONTRIBUTING.md and follow the instruction beforehand.

License

cel-smt is released under the MIT License.

About

No description, website, or topics provided.

Resources

License

Contributing

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages