parse boolean expressions semantically. Intended for semantic equivalence and SMT solving.
Find a file
2026-05-31 18:26:09 +02:00
tests Implemented L1 fully, fixed bugs, implemented testing 2026-05-31 18:26:09 +02:00
.gitignore Implemented L1 fully, fixed bugs, implemented testing 2026-05-31 18:26:09 +02:00
_ast.py ran llm code, minor changes 2026-05-20 23:32:54 +02:00
canonicalize.py Implemented L1 fully, fixed bugs, implemented testing 2026-05-31 18:26:09 +02:00
LICENSE added license 2026-05-23 23:30:43 +02:00
models.py Implemented L1 fully, fixed bugs, implemented testing 2026-05-31 18:26:09 +02:00
README.md Implemented L1 fully, fixed bugs, implemented testing 2026-05-31 18:26:09 +02:00
regions.py Implemented L1 fully, fixed bugs, implemented testing 2026-05-31 18:26:09 +02:00
requirements-dev.txt Implemented L1 fully, fixed bugs, implemented testing 2026-05-31 18:26:09 +02:00

semparse

A semantic analysis and JIT compilation frontend for predicate expressions over numerical and categorical data.

What it does

Semantic domain

  • Parse and canonicalize boolean/comparison expressions
  • Find semantic equivalence between expressions
  • SMT-backed satisfiability solving — find the input regions where an expression is true or false
  • User-facing diagnostics: "this criterion can never return True", "this term is redundant"

JIT numerics domain (in progress)

  • Compile predicate expressions to high-performance native code
  • Explicit SIMD vectorization for evaluating boolean criteria over large data vectors
  • Target backend: Cranelift

Use cases

  • Columnar data filtering — detect redundant or equivalent filters in a Pandas/Polars/Arrow pipeline; JIT-compile predicates for fast column evaluation
  • Monitoring and alerting — find alert rules that can never fire, or alerts fully subsumed by a broader condition (Prometheus/Grafana-style expressions)
  • Scientific selection criteria — semantic analysis of experiment cuts in HEP or simulation pipelines (particle/event selection)
  • Business rules and policy — detect contradictory, redundant, or unreachable rules in eligibility criteria, fraud detection, or policy-as-code systems
  • Weak supervision / labeling functions — identify semantically equivalent labeling functions in Snorkel-style ML pipelines

Status

Core boolean canonicalization (Level 1) is implemented. SMT-backed region extraction and JIT compilation are in active development. Progress is measured by the class of expressions handled in each domain.

Design

Expression classes are handled incrementally:

  • Level 1: Boolean operators, comparison operators, abs/min/max
  • Level 1+: Constant folding, membership operators (in, not in)
  • Level 2: Arithmetic operators (+, -, *, /, //, %, **)
  • Level 3: Conditional expressions

SMT solving is backed by Z3.