Mechanizing the Metatheory of Sledgehammer Jasmin Christian Blanchette and Andrei Popescu Fakultät für Informatik, Technische Universität München, Germany

Abstract. This paper presents an Isabelle/HOL formalization of recent research in automated reasoning: efficient encodings of sorts in unsorted first-order logic, as implemented in Isabelle’s Sledgehammer proof tool. The formalization provides the general-purpose machinery to reason about formulas and models, emulating the theory of institutions. Quantifiers are represented using a nominal-like approach designed for interpreting syntax in semantic domains.



Introduction

Despite steady progress in the usability of proof assistants, paper proofs reign supreme in the automated reasoning community. Myreen and Davis’s verification of an ACL2-like prover in HOL4 [17] and Harrison’s partial self-verification of HOL Light [13] are exceptions rather than the rule. Important metamathematical results have been formalized (e.g., Shankar’s Gödel proof [26]), but new research is still carried out almost exclusively on paper, with all the risks this entails.

This paper presents a formalization in Isabelle/HOL [18] of the proofs for translations from many-sorted to unsorted first-order logic (FOL). Claessen et al. [10] designed lightweight encodings that eliminate much of the clutter associated with traditional schemes. Blanchette et al. [3, 4] introduced even lighter encodings in a sequel.

Central to these new encodings is the notion of monotonicity. Informally, a sort is monotonic if its domain can be extended with new elements without compromising satisfiability. Nonmonotonic sorts can be made monotonic by introducing protector functions or predicates, and monotonic sorts can be merged into a single sort. Sorts are trivially monotonic in FOL without equality. The addition of interpreted equality makes it possible to encode upper cardinality bounds on the models, breaking monotonicity. Like other interesting semantic properties, monotonicity is undecidable but can often be inferred in practice. Monotonicity has many applications in theorem provers and model finders [5, 10]. It is also roughly equivalent to smoothness, a criterion that arises when combining decision procedures in SMT solvers [28].

The Sledgehammer [19] proof tool for Isabelle/HOL relies on the monotonicity-based encodings to apply state-of-the-art unsorted provers to sorted problems. The tool translates interactive proof goals along with relevant lemmas and invokes the external automatic theorem provers to find proofs, which are reconstructed through Isabelle’s inference kernel. Early versions of Sledgehammer relied on unsound sort encodings; as a result, they would often find spurious, unreconstructable proofs, which annoyed users and could conceal sound proofs. Whereas Sledgehammer reconstructs the external proofs, tools such as Monotonox [10] and the fully automatic competition version of Isabelle [27] do not perform such checks; soundness is crucial for them.

The mechanization of the sort encodings fully covers the correctness proofs from Claessen et al. [10] and the monomorphic half of its sequel [3, 4], as well as a theorem by Bouillaguet et al. [9]. This formalization work arose from a desire to provide more solid assurance to this recent research. Even if the intuition is clear, a paper proof offers many opportunities for flaws, especially because of the variety of encodings.

The mechanization effort partly coincided with the development of the informal proofs [4]. The two proofs largely follow the same conventions, with one major difference: the core of the formal proof (Sections 3 to 5) assumes quantifier-free clausal normal form (CNF) rather than negation normal form (NNF). This reduces the exposure to name binders, which are notoriously difficult to reason about. The results are lifted to NNF using a clausification theorem (Sections 6 to 8). This organization is reminiscent of the architecture of automatic reasoners that combine a clausifier and a CNF core.

Isabelle’s higher-order logic (HOL) might not be as expressive as set or type theory, but it can cope with the statements and proofs of classical metatheorems (as shown by Harrison and others [2, 12, 25]) and practical results. The proof assistant offers many conveniences; two features have been particularly useful:

• Locales [1,15] parameterize theories over constants and assumptions, with the usual benefits associated with modularity. Locales are particularly suited to expressing logic translations abstractly as in the theory of institutions [11].

• A framework for syntax with bindings [23, 24] eases reasoning about quantified formulas. It lies at the intersection of first-order nominal approaches [21] and higher-order abstract syntax [20]. The framework is designed specifically for interpreting syntax in semantic domains.

Locales have been part of Isabelle for many years and are widely used. The syntax with bindings is a newer addition; the current application is among the first case studies that feature it.

The formal proofs are available online [6, 7]. Although sort encodings are the focus of this paper, our infrastructure is designed to be reusable for other applications of many-sorted FOL. Many important metatheories are awaiting formalization, such as the completeness of paramodulation and tableaux.


An Isabelle View of Logic Translations

The formalization covers a variety of translations, including not only the sort encodings but also clausification. The guiding principles, described below, originate from the theory of institutions; their Isabelle materialization relies on locales.

Institutions. A logic L provides a category of signatures Sig and, for each signature Σ ∈ Sig, a set of sentences Sen(Σ), a class of structures (interpretations) Str(Σ), and a satisfaction relation ⊨_Σ between structures and sentences. A signature morphism k : Σ → Σ′ is equipped with a forward sentence translation k : Sen(Σ) → Sen(Σ′) and a backward structure translation k : Str(Σ′) → Str(Σ). An institution is a logic whose signature morphisms enjoy the property that “truth is invariant under change of notation”:

  M′ ⊨_Σ′ k ϕ ←→ k M′ ⊨_Σ ϕ

for all k : Σ → Σ′, M′ ∈ Str(Σ′), and ϕ ∈ Sen(Σ).

A translation of L-problems (sets of sentences) into L′-problems consists of a function $ between L’s and L′’s signature classes and, for each Σ ∈ dom($) and Σ-problem Φ, a sentence translation enc_Φ : Sen(Σ) → Sen(Σ$) and a set of axioms Ax_Φ ⊆ Sen(Σ$). The translation of Φ is defined as enc Φ = {enc_Φ ϕ | ϕ ∈ Φ} ∪ Ax_Φ. Thus, L-problems are mapped to L′-problems by joining an elementwise translation and additional axioms. Given a class C of L-problems, the translation is sound w.r.t. C if satisfiability of Φ implies satisfiability of enc Φ for all Φ ∈ C, and complete if the converse holds.

The institution literature focuses on “uniform” encodings. For these, the sentence translation depends only on Φ’s signature Σ, and there exists a backward translation dec : Str(Σ$) → Str(Σ) for which an inter-institution version of the institutional condition holds: M′ ⊨_Σ$ enc_Σ ϕ ←→ dec M′ ⊨_Σ ϕ. This condition implies completeness.

The source logic L for all the translations considered in this paper is many-sorted FOL; the target logic L′ is either many-sorted or unsorted FOL. Sentences are either CNF clauses or NNF formulas. Most of the translations are nonuniform.

Isabelle. Isabelle/HOL is based on polymorphic HOL, which can be thought of as a fragment of Standard ML enriched with logical constructs and a proof system. Type variables are identified by a leading prime (e.g., 'a). The type σ → τ is interpreted as the set of (total) functions from σ to τ. Propositions are terms of type bool, and predicates are functions to bool. Function applications are written without parentheses (e.g., f x y) or in infix notation (e.g., x + y). Constants and variables can be functions.

The type 'a list of finite lists over 'a is generated freely from the empty list [] and the infix constructor # : 'a → 'a list → 'a list. The notation [x1, x2, ..., xn] abbreviates x1 # (x2 # (··· # (xn # []) ···)). The higher-order constant map : ('a → 'b) → 'a list → 'b list applies a unary function to each element in a list, and set : 'a list → 'a set returns the set of elements in a list. Sets are written using traditional mathematical notation. Type parameters of polymorphic types are sometimes omitted (e.g., set for 'a set).

Locales. Isabelle locales are a structuring mechanism provided on top of basic HOL. They fix types, constants, and assumptions, as in the following schematic examples:

  locale X =
    fixes 'a
    fixes c : σ_'a
    assumes P_'a,c

  locale Y =
    fixes 'b
    fixes d : τ_'b
    assumes Q_'b,d

The definition of locale X fixes a type 'a and a constant c whose type σ_'a may depend on 'a, and states an assumption P_'a,c : bool over 'a and c. Lemmas proved within the locales can rely on them. In general, a single locale can introduce several types, constants, and assumptions. The definition of X also produces a polymorphic locale predicate X = (λc. P_'a,c). Seen from outside the locale, the lemmas proved in locale X are polymorphic in type variable 'a, universally quantified over variable c, and conditional on X c.

Locales support inheritance, union, and embedding. To embed X into Y, one needs to indicate how an arbitrary instance of X can be regarded as an instance of Y, by providing, in the context of X, definitions of the types and constants of Y together with proofs of Y’s assumptions. The command

  sublocale X < Y where 'b = υ and d = t

emits the goal Q_υ,t, where υ and t may depend on types and constants from X. After the proof, all the lemmas proved in Y become available in X, with υ and t in place of 'b and d. Homonymous constants d in X and Y are instantiated as d = d by default.

The sublocale relationship is sometimes abbreviated to X_'a,c < Y_υ,t or X < Y.

Locales provide a shallow realization of institutions in Isabelle. The institutional methodology serves as an inspiration and guidance to formulate results about specific logic translations in a consistent style. Given a logic L, its signatures Sig are captured by a locale L.Signature, which fixes Isabelle constants for the signature components (e.g., sorts and symbols) and defines a notion of sentence (e.g., clauses or formulas). A locale L.Problem extends L.Signature with a fixed set of sentences Φ. Structures M are represented by a locale L.Structure that also defines a notion of satisfaction. Finally, satisfiable problems are represented by a locale L.Model that joins L.Problem and L.Structure and further requires satisfaction between Φ and M.

In this setting, translations between logics L and L′ and their properties are captured via locale embedding mechanisms in four steps.

SIG: Define $ as a sublocale relationship L.Signature < L′.Signature with suitable parameter instantiations reflecting the definition of Σ$ in terms of Σ.

TRANS: Define enc_Φ inside L.Problem (where Σ and the Σ-problem Φ are fixed).

SOUND: To prove soundness, define a Σ$-structure M′ inside L.Model (where the signature Σ, the Σ-problem Φ, and the structure M such that M ⊨_Σ Φ are fixed) and show L.Model_M < L′.Model_M′.

COMPLETE: To prove completeness, define a locale Problem_Model′ = L.Problem + L′.Model that joins a Σ-problem Φ and a Σ$-model M′ of enc Φ, define inside Problem_Model′ a Σ-structure M, and show Problem_Model′_M′ < L.Model_M.


Clausal First-Order Logic

The terms, atoms, and literals of (quantifier-free) CNF are represented in HOL by ML-style free datatypes, parameterized by types 'f and 'p of function and predicate symbols:

  datatype 'f tm         = Var var | Fn 'f ('f tm list)
           ('f, 'p) atm  = Pr 'p ('f tm list) | Eq ('f tm) ('f tm)
           ('f, 'p) lit  = Pos (('f, 'p) atm) | Neg (('f, 'p) atm)

The type var is countably infinite. An atom is either an applied predicate (e.g., p(t)) or equality (e.g., t ≈ u). A clause is a list of literals (interpreted disjunctively), and a problem is a set of clauses (interpreted conjunctively). Formally, ('f, 'p) clause = ('f, 'p) lit list and ('f, 'p) problem = ('f, 'p) clause set. The CNF representation involves no name binders, unlike (quantified) NNF (Section 6).

Many-sorted signatures (for CNF and NNF) are captured by the following locale:

  locale Signature =
    fixes 's and 'f and 'p
    fixes arity_F : 'f → 's list and res : 'f → 's and arity_P : 'p → 's list
    assumes countable UNIV_'s and countable UNIV_'f and countable UNIV_'p

The locale is parameterized by types for sorts ('s), function symbols ('f), and predicate symbols ('p), all required to be countable (i.e., finite or countably infinite). The locale attaches to each symbol a sort arity (arity_F or arity_P) and, for functions, a result sort (res). The sort arity can be empty. Symbols cannot be overloaded. The polymorphic constant UNIV_'a : 'a set is predefined in Isabelle as the set of all values of type 'a.

The Signature locale defines an underspecified function sort : var → 's that arbitrarily assigns sorts to variables. Whereas the formalization consistently refers to FOL’s sorts as types (in view of a possible extension to n-ary type constructors and polymorphism), in this paper they are more precisely called sorts. Wellsortedness and wellformedness of terms and the other syntactic categories are defined in the usual way. Wellformedness is a precondition to many operations, but such details are omitted here.

The Problem locale joins a signature Σ and a CNF Σ-problem Φ. The Structure locale combines a signature, a universe 'u, and a triple of functions (int_S, int_F, int_P) that interpret sorts, function symbols, and predicate symbols:

  locale Problem = Signature_'s,'f,'p arity_F res arity_P +
    fixes Φ : ('f, 'p) problem

  locale Structure = Signature_'s,'f,'p arity_F res arity_P +
    fixes 'u
    fixes int_S : 's → 'u → bool and int_F : 'f → 'u list → 'u and int_P : 'p → 'u list → bool

A few wellformedness assumptions are made on the triple (int_S, int_F, int_P), such as inhabitation of all sorts (∀σ. ∃d. int_S σ d). The Structure locale also defines the interpretation of terms and satisfaction of clauses. A related locale, Model, represents satisfiable CNF problems by combining a Problem and a Structure that satisfies it.
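To make the roles of int_S, int_F, and int_P concrete, here is a minimal Python sketch of term interpretation and clause satisfaction over a finite structure. It is an illustration only, not the Isabelle definition: syntax is encoded as tagged tuples, int_s maps each sort to a finite list of elements (instead of a predicate on a universe), a clause is taken to hold when it is true under every well-sorted valuation of its variables, and all Python names are hypothetical.

```python
from itertools import product

def eval_term(t, int_f, xi):
    """Interpret a term under the valuation xi (variable name -> element)."""
    if t[0] == "Var":
        return xi[t[1]]
    _, f, ts = t
    return int_f[f](*[eval_term(u, int_f, xi) for u in ts])

def holds_lit(lit, int_f, int_p, xi):
    """Truth value of a literal ("Pos" | "Neg", atom) under xi."""
    sign, atom = lit
    if atom[0] == "Eq":
        truth = eval_term(atom[1], int_f, xi) == eval_term(atom[2], int_f, xi)
    else:
        truth = int_p[atom[1]](*[eval_term(u, int_f, xi) for u in atom[2]])
    return truth if sign == "Pos" else not truth

def term_vars(t):
    """All variables occurring in a term."""
    if t[0] == "Var":
        return {t[1]}
    return set().union(*[term_vars(u) for u in t[2]])

def clause_vars(clause):
    """All variables occurring in a clause (a list of literals)."""
    return set().union(*[term_vars(u) for _, atom in clause
                         for u in (atom[1:] if atom[0] == "Eq" else atom[2])])

def satisfies_clause(clause, sort, int_s, int_f, int_p):
    """A clause holds if some literal is true under every well-sorted valuation."""
    xs = sorted(clause_vars(clause))
    for values in product(*[int_s[sort[x]] for x in xs]):
        xi = dict(zip(xs, values))
        if not any(holds_lit(l, int_f, int_p, xi) for l in clause):
            return False
    return True

# Example data (hypothetical): one sort "st" with two elements.
S = ("Var", "S")
on, off = ("Fn", "on", ()), ("Fn", "off", ())
c1 = [("Pos", ("Eq", S, on)), ("Pos", ("Eq", S, off))]   # S ≈ on ∨ S ≈ off
c2 = [("Neg", ("Eq", ("Fn", "flip", (S,)), S))]          # flip(S) ≉ S
sort = {"S": "st"}
int_s = {"st": [0, 1]}
int_f = {"on": lambda: 0, "off": lambda: 1, "flip": lambda d: 1 - d}
```

With these definitions, `satisfies_clause(c1, sort, int_s, int_f, {})` checks the first clause against the two-element structure; replacing the interpretation of `flip` by the identity makes the second clause fail.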


Monotonicity and Its Inference

This section focuses on monotonicity in its own right; Section 5 discusses the associated sort encodings. To simplify the monotonicity arguments, both sections assume a fixed countably infinite type ω as the universe 'u of structures, thus working implicitly with the instances Structure_ω and Model_ω. This limitation is lifted in Section 8 by appealing to the downward Löwenheim–Skolem theorem.

Claessen et al. [10, §2] define monotonicity on single sorts. Blanchette et al. [3, §3] generalized the notion to sets of sorts S, making it more useful. The sorts S are collectively monotonic in the problem Φ if for all models M of Φ, there exists a model M′ such that for all sorts σ, M′ interprets σ by an infinite domain if σ ∈ S and by a domain of the same cardinality as in M otherwise.

In the formalization, the Mono_Problem locale enriches Problem with a monotonicity assumption on all sorts, expressed using locale predicates:

  (∃int_S int_F int_P. Model arity_F res arity_P Φ int_S int_F int_P) −→
  (∃int_S int_F int_P. Infinite_Model arity_F res arity_P Φ int_S int_F int_P)

The Infinite_Model locale is itself an enrichment of Model with the assumption that for each sort σ, the expression int_S σ d is true for infinitely many elements d.

First Criterion. Claessen et al. designed two syntactic criteria to infer monotonicity. The first one is defined as a predicate ⊳ that checks the absence of naked variables of a given sort σ in a clause c or a problem Φ:

  σ ⊳ c ←→ ∀x ∈ nv c. sort x ≠ σ
  σ ⊳ Φ ←→ ∀c ∈ Φ. σ ⊳ c

A naked variable is a variable that occurs directly on either side of a positive equality, such as X in the literal X ≈ f(Y). Formally:

  nv (Var x) = {x}                    nv (Fn f ts) = ∅
  nv (Eq t1 t2) = nv t1 ∪ nv t2       nv (Pr p ts) = ∅
  nv (Pos a) = nv a                   nv (Neg a) = ∅


with nv c = ⋃ (set (map nv c)) for clauses. The criterion ⊳ soundly infers monotonicity. This is expressed as a sublocale inclusion Problem_Crit1 < Mono_Problem, where Problem_Crit1 enriches Problem with the assumption ∀σ. σ ⊳ Φ. The inclusion holds because a model of a problem whose sorts pass ⊳ can be extended into an infinite model by replicating elements. For each finite sort σ, the extended model contains infinitely many copies of some element pick σ, all interpreted as in the original model.

Blanchette et al. strengthened the criterion by injecting “infinity knowledge”: any sort that is interpreted by an infinite domain in all models is monotonic, regardless of naked variables [3, §3]. This aspect is part of the formalization but omitted here.

Second Criterion. The improved criterion is parameterized by an assignment of a per-sort extension policy (which may be true, false, or copy) to each predicate symbol. In the model construction, the true-extended (resp. false-extended) predicates are interpreted as true (resp. false) for new domain elements of the given sort, whereas the copy-extended predicates are treated as in the simple criterion. Implementations can enumerate the possible policy combinations (e.g., using a SAT solver). In the formalization, the policies are supplied along with the problem as a curried function policy that maps pairs σ, p to T, F, or C. A function guard associates each variable x in need of protection with its guarding literal. The criterion is defined as

  σ ▶ c ←→ ∀l x. l ∈ set c ∧ x ∈ nv l ∧ sort x = σ −→ isGuard x (guard c l x)
  σ ▶ Φ ←→ ∀c ∈ Φ. σ ▶ c

where isGuard determines whether the given literal actually protects the variable x:

  isGuard x (Pos (Eq t1 t2)) ←→ False
  isGuard x (Neg (Eq t1 t2)) ←→ ⋁_{i=1..2} (t_i = Var x ∧ ∃f ts. t_(3−i) = Fn f ts)
  isGuard x (Pos (Pr p ts))  ←→ x ∈ ⋃ (set (map nv ts)) ∧ policy (sort x) p = T
  isGuard x (Neg (Pr p ts))  ←→ x ∈ ⋃ (set (map nv ts)) ∧ policy (sort x) p = F

The notion of naked variables is generalized to account for ill-polarized predicates:

  nv (Pos (Pr p ts)) = {x ∈ ⋃ (set (map nv ts)) | policy (sort x) p = F}
  nv (Neg (Pr p ts)) = {x ∈ ⋃ (set (map nv ts)) | policy (sort x) p = T}
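The two syntactic criteria can be sketched in Python as follows. This is an illustrative reading of the equations for nv, isGuard, and the two criteria, not the formalized definitions: syntax is encoded as tagged tuples, `sort` is a dictionary from variable names to sorts, and `first_guard` is a hypothetical choice of the guard function (it picks the first literal of the clause that guards x, falling back to the literal itself).

```python
T, F, C = "T", "F", "C"  # extension policies: true, false, copy

def direct_vars(ts):
    """Variables occurring directly in the term list: the union of nv over ts."""
    return {t[1] for t in ts if t[0] == "Var"}

def nv_lit(lit, sort, policy):
    """Generalized naked variables of a literal ("Pos" | "Neg", atom)."""
    sign, atom = lit
    if atom[0] == "Eq":
        # Only positive equalities expose naked variables.
        return direct_vars(atom[1:]) if sign == "Pos" else set()
    _, p, ts = atom
    bad = F if sign == "Pos" else T  # ill-polarized extension policy
    return {x for x in direct_vars(ts) if policy(sort[x], p) == bad}

def is_guard(x, lit, sort, policy):
    """Does the literal protect the variable x?"""
    sign, atom = lit
    if atom[0] == "Eq":
        _, t1, t2 = atom
        return sign == "Neg" and any(
            a == ("Var", x) and b[0] == "Fn" for a, b in ((t1, t2), (t2, t1)))
    _, p, ts = atom
    good = T if sign == "Pos" else F
    return x in direct_vars(ts) and policy(sort[x], p) == good

def crit1(sigma, clause, sort):
    """First criterion: no naked variable of sort sigma in the clause."""
    copy = lambda s, p: C
    return all(sort[x] != sigma for l in clause for x in nv_lit(l, sort, copy))

def first_guard(sort, policy):
    """Hypothetical guard function: first guarding literal, else the literal itself."""
    def guard(clause, lit, x):
        return next((g for g in clause if is_guard(x, g, sort, policy)), lit)
    return guard

def crit2(sigma, clause, sort, policy):
    """Second criterion: every naked variable of sort sigma is guarded."""
    guard = first_guard(sort, policy)
    return all(is_guard(x, guard(clause, l, x), sort, policy)
               for l in clause for x in nv_lit(l, sort, policy) if sort[x] == sigma)

# Example data (hypothetical symbols f, g, c and sorts a, b).
X, Y = ("Var", "X"), ("Var", "Y")
sort = {"X": "a", "Y": "b"}
c1 = [("Pos", ("Eq", X, ("Fn", "f", (Y,))))]                            # X ≈ f(Y)
c2 = [("Neg", ("Pr", "g", (X,))), ("Pos", ("Eq", X, ("Fn", "c", ())))]  # ¬g(X) ∨ X ≈ c
false_for_g = lambda s, p: F if p == "g" else C
```

In the example, X is naked in both clauses, so the first criterion rejects sort a. Under the policy that false-extends g, the literal ¬g(X) guards X in the second clause, so the second criterion accepts a there.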

Theorem 1. Let Φ be a Σ-problem and σ be a Σ-sort. (1) If σ ⊳ Φ, then σ ▶ Φ for a copy-extended policy. (2) Given some extension policies, if σ ▶ Φ for all Σ-sorts σ, then the set of all Σ-sorts is monotonic in Φ.

This theorem is expressed in Isabelle as a pair of sublocale inclusions. The where clause below instantiates Problem_Policy_Crit2’s policy parameter with λσ p. C to enforce the copy policy for all sorts and predicate symbols:

  sublocale Problem_Crit1 < Problem_Policy_Crit2 where policy = (λσ p. C)
  sublocale Problem_Policy_Crit2 < Mono_Problem


Sort Encodings

A naive, unsound way to translate a many-sorted FOL problem to unsorted FOL is to erase all the sorts and otherwise leave the problem unchanged. There are two main sound alternatives that encode the sort information. Sort tags are functions t_σ(X) that directly associate a term X with its sort σ. Sort guards are predicates g_σ(X) that check whether X has sort σ in the original problem. The formalized versions of these encodings follow the four steps sketched in Section 2.

Full Erasure. Full sort erasure is unsound but complete. What makes it interesting is that it is sound for the class of monotonic problems. By way of composition, it lies at the heart of the tag- and guard-based encodings. The theory prefix U distinguishes unsorted entities from their many-sorted counterparts.

SIG: The signature of the target unsorted problem has the same function and predicate symbols as the original signature but collapses the sorts into a single, implicit sort.

TRANS: The translation function e is the identity except that it forgets the sorts.

SOUND: For the soundness proof, a model of a monotonic problem is extended into a model that interprets all sorts infinitely, which in turn is transformed into an isomorphic “full” model that interprets all the sorts uniformly as λd. True (i.e., ∀σ. ∀d. int_S σ d), from which it is easy to build an unsorted model for the e-translated problem:

  Mono_Model < Infinite_Model < Full_Model < U.Model

The last step corresponds to Theorem 1 in Bouillaguet et al. [9] and, more approximately, to Lemma 1 in Claessen et al. [10]. Incidentally, the formalization revealed a flaw in Claessen et al.: their main result holds, but not their Lemma 3.¹

COMPLETE: The locale Problem_UModel combines a many-sorted problem and an unsorted model with domain D of the problem’s e translation. The unsorted model can be regarded as a many-sorted model in which every sort is interpreted as D.

¹ The flawed lemma states that whenever there exists a model M where a monotonic sort σ is interpreted with a given cardinality, there exists for any larger cardinality k a model where σ has cardinality k and the other sorts have the same cardinalities as in M. This proposition is invalid for k > ℵ0 because FOL problems can encode the constraint that there exists a bijection between two infinite, and hence monotonic, sorts σ and τ, making it impossible to increase σ’s cardinality without also increasing τ’s. This issue is independent of which of the two definitions of monotonicity is used. We discovered it at an early stage of the formalization as we were looking for a correct formulation of Löwenheim–Skolem for many-sorted FOL.

Protector-Based Encodings. Claessen et al. observe that protectors, whether tags or guards, are not needed for terms with monotonic sorts. The sequel [3] advocates protecting only those variables that cause the monotonicity check to fail, to reduce clutter. Thus, for both tags and guards, three schemes are available: the traditional encoding, the lightweight version due to Claessen et al., and the “featherweight” version from the sequel. These are called t̃, t̃?, and t̃?? for tags and g̃, g̃?, and g̃?? for guards. Consider the following fragment of a many-sorted problem, where S has sort st:

  S ≈ on ∨ S ≈ off
  flip(S) ≉ S
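The fragment S ≈ on ∨ S ≈ off, flip(S) ≉ S pins the sort st to exactly two elements, which is why st is not monotonic. The following brute-force check is a small Python sketch of that cardinality argument, assuming on and off are constants of sort st and flip is a unary function on st; the function name `has_model` is hypothetical.

```python
from itertools import product

def has_model(n):
    """Is there an interpretation of on, off, flip over an n-element domain
    that satisfies both clauses for every value of S?"""
    dom = range(n)
    for on, off in product(dom, repeat=2):
        for flip in product(dom, repeat=n):  # flip[d] is the image of d
            clause1 = all(s == on or s == off for s in dom)   # S ≈ on ∨ S ≈ off
            clause2 = all(flip[s] != s for s in dom)          # flip(S) ≉ S
            if clause1 and clause2:
                return True
    return False

# Domain sizes between 1 and 4 that admit a model.
model_sizes = [n for n in range(1, 5) if has_model(n)]
```

A one-element domain is ruled out by the second clause (flip would need a value different from its argument), and three or more elements are ruled out by the first clause (every element must equal on or off).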

The traditional t̃ encoding inserts tags around every subterm:

  t_st(S) ≈ t_st(on) ∨ t_st(S) ≈ t_st(off)
  t_st(flip(t_st(S))) ≉ t_st(S)

Since the sort st is not monotonic (its only models have cardinality 2), the t̃? encoding coincides with t̃. In contrast, the featherweight t̃?? encoding tags only naked variables:

  t_st(S) ≈ on ∨ t_st(S) ≈ off
  flip(S) ≉ S

The t̃??-encoded problem is complemented by typing axioms that repair mismatches between tagged and untagged occurrences of well-sorted terms:

  t_st(on) ≈ on
  t_st(off) ≈ off
  t_st(flip(S)) ≈ flip(S)

For guards, the traditional and lightweight encodings g̃ and g̃? protect each variable:

  ¬ g_st(S) ∨ S ≈ on ∨ S ≈ off
  ¬ g_st(S) ∨ flip(S) ≉ S

The featherweight encoding g̃?? guards only naked variables:

  ¬ g_st(S) ∨ S ≈ on ∨ S ≈ off
  flip(S) ≉ S

The guard encodings are completed by the axioms g_st(on), g_st(off), and g_st(flip(S)).

General Encoding Procedure. The full sort erasure encoding e is part of a two-stage procedure to encode any many-sorted FOL problem into unsorted FOL. The first stage makes the problem monotonic by introducing protectors (tags or guards). This corresponds to a sound and complete encoding of many-sorted FOL into itself; the soundness proofs rely on the monotonicity criteria. The second stage merges all the sorts using e, which is sound and complete for monotonic problems.

Tags and guards are formalized separately, but for a protector kind, the traditional, lightweight, and featherweight encodings are treated as instances of a single generalized encoding. Both generalized encodings are parameterized by a partition of sorts by level of protection, via disjoint predicates prot, protFw, unprot : 's → bool indicating whether terms of a sort should be fully protected, protected in a featherweight fashion, or left unprotected. The last option is available only for sorts inferred monotonic by ⊳.

Tags. The tag encoding builds on a datatype of extended function symbols containing the old symbols as well as a tag for each sort:

  datatype ('f, 's) efsym = Old 'f | Tag 's

SIG: Signatures over the extended symbols treat the old function symbols as before. The new symbols Tag σ are unary operations of sort arity [σ] and result sort σ.

TRANS: The encoding function is specified as follows:

  t (Var x) =
      Var x                        if unprot (sort x)
      Fn (Tag (sort x)) [Var x]    otherwise
  t (Fn f ts) = t′ (Fn f ts)

  t (Pos (Eq t1 t2)) = Pos (Eq (t t1) (t t2))
  t (Neg (Eq t1 t2)) = Neg (Eq (t′ t1) (t′ t2))
  t (Pos (Pr p ts))  = Pos (Pr p (map t′ ts))
  t (Neg (Pr p ts))  = Neg (Pr p (map t′ ts))

  t′ (Var x) =
      Fn (Tag (sort x)) [Var x]    if prot (sort x)
      Var x                        otherwise
  t′ (Fn f ts) =
      Fn (Tag (res f)) [Fn (Old f) (map t′ ts)]    if prot (res f)
      Fn (Old f) (map t′ ts)                       otherwise
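The equations for t and t′ can be transcribed almost literally. The Python sketch below does so under a few assumptions that are not part of the formalization: syntax is encoded as tagged tuples, `sort` and `res` are dictionaries, `prot` and `unprot` are Python predicates on sorts, and the helper names (`make_tag_encoder`, `show`) are hypothetical. The typing axioms Ax_Φ are not generated here.

```python
def make_tag_encoder(sort, res, prot, unprot):
    """Return a function that tag-encodes a literal ("Pos" | "Neg", atom)."""

    def t_aux(term):  # t′: tags only fully protected sorts
        if term[0] == "Var":
            x = term[1]
            return ("Fn", ("Tag", sort[x]), (term,)) if prot(sort[x]) else term
        _, f, ts = term
        inner = ("Fn", ("Old", f), tuple(t_aux(u) for u in ts))
        return ("Fn", ("Tag", res[f]), (inner,)) if prot(res[f]) else inner

    def t_term(term):  # t on terms: also tags naked variables
        if term[0] == "Var":
            x = term[1]
            return term if unprot(sort[x]) else ("Fn", ("Tag", sort[x]), (term,))
        return t_aux(term)

    def t_lit(lit):
        sign, atom = lit
        if atom[0] == "Eq":
            # Only the sides of a positive equality are naked positions.
            enc = t_term if sign == "Pos" else t_aux
            return (sign, ("Eq", enc(atom[1]), enc(atom[2])))
        return (sign, ("Pr", atom[1], tuple(t_aux(u) for u in atom[2])))

    return t_lit

def show(term):
    """Pretty-print an encoded term, writing Tag s as t_s and dropping Old."""
    if term[0] == "Var":
        return term[1]
    _, f, ts = term
    name = "t_" + f[1] if f[0] == "Tag" else f[1]
    return name + ("(" + ", ".join(show(u) for u in ts) + ")" if ts else "")

# Example data: the flip fragment over the single sort st.
sort = {"S": "st"}
res = {"on": "st", "off": "st", "flip": "st"}
S = ("Var", "S")
pos = ("Pos", ("Eq", S, ("Fn", "on", ())))        # S ≈ on
neg = ("Neg", ("Eq", ("Fn", "flip", (S,)), S))    # flip(S) ≉ S

# Traditional: st fully protected. Featherweight: st neither prot nor unprot.
trad = make_tag_encoder(sort, res, prot=lambda s: True, unprot=lambda s: False)
fw = make_tag_encoder(sort, res, prot=lambda s: False, unprot=lambda s: False)
```

Applying `trad` and `fw` to the two literals and printing the sides with `show` reproduces the shapes of the traditional and featherweight tag encodings of the flip fragment.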

The t function tags naked variables unless they are of an unprotected sort. The auxiliary function t′ adds tags only for fully protected sorts; it is invoked on all subterms except naked variables. The tag axioms Ax_Φ, needed to repair mismatches between tagged and untagged terms in the featherweight encoding t̃??, have the form Pos (Eq (Fn (Tag (res f)) [t]) t), where t = Fn (Old f) (map Var xs) and xs is a list of distinct variables of sorts arity_F f, for all function symbols f such that protFw (res f). The encoding of a problem is t Φ = {map t c | c ∈ Φ} ∪ Ax_Φ.

SOUND: Given a model of the fixed problem Φ, a model of t Φ is obtained by extending it so that the tags are interpreted as identity functions.

COMPLETE: Completeness is more difficult. To convey a sense of the complexity, let us quote the informal proof, in which x stands for t̃? or t̃?? (t̃ is analogous to t̃?) and ⟦Φ⟧_x denotes the x-encoding of the NNF problem Φ [4, §4.4]:

“A model of ⟦Φ⟧_x is canonical if all tag functions t_σ are interpreted as the identity. From a canonical model, we obtain a model of Φ by leaving out t_σ. It then suffices to prove that whenever there exists a model M of ⟦Φ⟧_x, there exists a canonical model M′.

“For t̃?, values of a tagged type σ are systematically accessed through t_σ. Hence, we can safely permute the entries of the function table of each t_σ so that it is the identity for the values in its range. We then construct M′ by removing the domain elements for which t_σ is not the identity. It is a model by Lemma 4.13 [which states that substructures of NNF models are models if they preserve existential witnesses].

“For t̃??, the construction must take possibly nonmonotonic types into account. No permutation is necessary for these thanks to the typing axioms, which ensure that the tag functions are the identity for well-typed terms. For each σ ⋫ Φ, we remove the model elements for which t_σ is not the identity. The typing axioms ensure that the substructure is well-defined: each tag function is the identity for at least one element and also for each element within the range of a non-tag function. The equations t_σ(X) ≈ X generated for existential variables ensure that witnesses are preserved, as required by Lemma 4.13.”

Relying on permutations is intuitive on paper, but in the proof assistant it is simpler to combine the permutation and the reduction to a canonical model:

  int_F f as =
      eint_F (Tag (res f)) [eint_F (Old f) (map2 q (arity_F f) as)]    if prot (res f)
      eint_F (Old f) (map2 q (arity_F f) as)                           otherwise

Here, eint_F denotes the int_F component of the fixed model of t Φ, and map2 applies a binary function elementwise on parallel lists. The auxiliary function q maps a sort σ and an element d to d if either unprot σ or d is in the range of eint_F (Tag σ); otherwise, it maps σ, d to eint_F (Tag σ) d. The proof that the resulting structure is a model of the original problem Φ involves defining suitable back-and-forth functions between the two structures.

Finally, proving monotonicity of t Φ is reduced to showing that the first criterion always succeeds on the translated problem: Problem Φ < Problem_Crit1 (t Φ).

Guards. The guard encoding requires extending the signature with guard predicates:

  datatype ('p, 's) epsym = Old 'p | Guard 's

Each symbol Guard σ has arity [σ] and contributes axioms to the translated problem. The soundness proof extends models of Φ into models of g Φ by interpreting the guard predicates as true everywhere. The completeness part is easier for guards than for tags. A canonical model is one where all guard predicates are interpreted as true everywhere. The proof handles the three levels of protection uniformly, reflecting the more uniform nature of g̃??: there are no counterparts to the “typing axioms that repair mismatches between tagged and untagged occurrences of well-sorted terms” of t̃??.

Monotonicity is proved using the second criterion, with the extension policy C for the predicates Old p and F for the distinguished symbols Guard σ. This is a departure from the informal proof, which inlines the model extension argument without appealing to the monotonicity criterion.


First-Order Logic with Quantifiers

This and the next two sections are concerned with lifting the results presented in the previous sections to negation normal form and structures with arbitrarily large domains. The locales for quantified FOL formulas in NNF are either the same as or similar to those for CNF; the theory prefix Q is used for disambiguation (e.g., Q.Model). No cardinality assumption is made about the universe.

Terms and atoms are as for CNF, but formulas can nest positive connectives and quantifiers arbitrarily. The following declaration gives an approximation of the syntactic category of formulas. The actual type identifies them modulo α-equivalence (variable renaming):

  datatype ('s, 'f, 'p) fm =
      Pos (('f, 'p) atm) | Conj (('s, 'f, 'p) fm) (('s, 'f, 'p) fm) | All 's var (('s, 'f, 'p) fm)
    | Neg (('f, 'p) atm) | Disj (('s, 'f, 'p) fm) (('s, 'f, 'p) fm) | Ex 's var (('s, 'f, 'p) fm)

The proper formal management of binding syntax modulo α-equivalence is a topic of extensive research in λ-calculus and programming languages. FOL syntax poses similar challenges. In particular, substitution and its interplay with the semantics is difficult to handle rigorously; for example, a standard textbook [16] dedicates dozens of lemmas to these preliminaries, with rough proof sketches. Many of these refer to properties of any syntax with static bindings, falling under the scope of a general metatheory of syntax formalized by Popescu et al. [23, 24]. A prominent feature of this framework— distinguishing it from the more established Nominal Isabelle [14], based on nominal logic [21]—is that it is centered around the notion of substitution:

• The framework defines substitution, including parallel and unary variants, and provides a large collection of basic facts about the interaction of substitution with free variables and the other operators.

• It provides a recursor for defining operators that are directly compositional with substitution. (In contrast, the nominal logic recursor targets compositionality with permutations, a less useful concept.)

This unconventional focus is appropriate: Substitution is without doubt the central syntactic operator in logics and type systems.

Another main feature is the facilitation of semantic interpretation of syntax, which is problematic in frameworks optimized for manipulating finitary syntax. For example, Pitts encounters “a really nontrivial freshness condition on binders” [22, §6.3] he needs to discharge in the context of applying the nominal recursor to interpret the λ-calculus in a semantic domain. This feature is illustrated below for interpreting FOL syntax.

The framework requires the user to provide semantic domains (for FOL, types T, A, and F for interpreting terms, atoms, and formulas) as well as first-order operations corresponding to the non-binding constructors other than for variables (e.g., FN : 'f → T list → T) and second-order operations corresponding to the binders: ALL : 's → (T → F) → F and EX : 's → (T → F) → F. In exchange, the framework produces the functions int_Tm : tm → (var → T) → T, int_At : atm → (var → T) → A, and int_Fm : fm → (var → T) → F that interpret syntax in the semantic domains. They map variables according to a valuation ξ. They map the action of non-binding constructors to that of the corresponding semantic operators, and similarly for binding constructors but in a valuation-sensitive way. For example:

    int_Tm (Var x) ξ        = ξ x
    int_Tm (Fn f ts) ξ      = FN f (map (λt. int_Tm t ξ) ts)
    int_At (Eq t1 t2) ξ     = EQ (int_Tm t1 ξ) (int_Tm t2 ξ)
    int_Fm (Conj ϕ1 ϕ2) ξ   = CONJ (int_Fm ϕ1 ξ) (int_Fm ϕ2 ξ)
    int_Fm (All σ x ϕ) ξ    = ALL σ (λd. int_Fm ϕ ξ[x ↦ d])

where ξ[x ↦ d] denotes the function that maps x to d and otherwise coincides with ξ. So far, this looks like the standard interpretation of binding syntax in a semantic domain, except that here the recursive definition is modulo α-equivalence (which is a priori difficult to achieve in a proof assistant). The framework also derives compositionality of substitution w.r.t. valuation update and obliviousness of the interpretation w.r.t. fresh variables in a systematic, FOL-agnostic way:

    int_Fm ϕ[t/x] ξ = int_Fm ϕ ξ[x ↦ int_Tm t ξ]
    int_Fm ϕ ξ = int_Fm ϕ ξ'    if ξ and ξ' differ only on variables fresh for ϕ

In the first equation, ϕ[t/x] denotes capture-free substitution of t for x in ϕ.

A many-sorted structure (int_S, int_F, int_P) can be organized as a semantic domain by taking T = 'u (the universe type, about which no cardinality assumption is made), A = F = bool, FN = int_F, EQ = (=), CONJ = (∧), ALL σ P = (∀d. int_S σ d ⟶ P d), and so on. This yields the recursive equations

    ⟦Var x⟧_ξ = ξ x
    ⟦Fn f ts⟧_ξ = int_F f (map (λt. ⟦t⟧_ξ) ts)
    ξ ⊨ Eq t1 t2 ⟷ ⟦t1⟧_ξ = ⟦t2⟧_ξ
    ξ ⊨ Conj ϕ1 ϕ2 ⟷ ξ ⊨ ϕ1 ∧ ξ ⊨ ϕ2
    ξ ⊨ All σ x ϕ ⟷ ∀d. int_S σ d ⟶ ξ[x ↦ d] ⊨ ϕ

which characterize term interpretation (with ⟦t⟧_ξ = int_Tm t ξ), atom satisfaction ((ξ ⊨ a) = int_At a ξ), and formula satisfaction ((ξ ⊨ ϕ) = int_Fm ϕ ξ). These functions are defined in the Q.Structure locale. The framework also produces the substitution lemma ξ ⊨ ϕ[t/x] ⟷ ξ[x ↦ ⟦t⟧_ξ] ⊨ ϕ. In the next section, the notations ⊨ ϕ and ⊨ Φ abbreviate ∀ξ. ξ ⊨ ϕ and ∀ϕ ∈ Φ. ⊨ ϕ. The structure can also be made explicit, e.g., (int_S, int_F, int_P) ξ ⊨ ϕ.

If the orientation toward substitution is the main strength of the framework, its main weakness is the lack of automation. For each desired binding syntax type, users must currently instantiate the general theorems manually, much like mathematicians do routinely when applying universal algebra to groups or rings. The instantiation is tedious due to the large number of theorems. Despite the availability of “template files,” this process can take days and thousands of lines of proof text. Automation in the form of a definitional package, which would provide the basic convenience expected by users of Nominal Isabelle (while supporting substitution natively), remains for future work.
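The interpretation scheme of this section can be mimicked outside a proof assistant. The Python sketch below is only an approximation under stated assumptions: syntax is encoded as nested tuples with named variables (so α-equivalence is not built in), the universe is a finite list so that quantifiers can be evaluated, only Eq atoms are covered, and the operator names POS, NEG, DISJ, and EX are chosen by analogy with FN, EQ, CONJ, and ALL. It instantiates the generic interpreter with a small structure and tests the substitution lemma on one formula.

```python
# Terms: ("Var", x) | ("Fn", f, [t1, ..., tn]);  atoms: ("Eq", t1, t2)
# Formulas: ("Pos", a) | ("Neg", a) | ("Conj", p, q) | ("Disj", p, q)
#           | ("All", sort, x, p) | ("Ex", sort, x, p)

def int_tm(t, xi, ops):
    if t[0] == "Var":
        return xi[t[1]]                                   # variables follow the valuation
    return ops["FN"](t[1], [int_tm(u, xi, ops) for u in t[2]])

def int_at(a, xi, ops):
    return ops["EQ"](int_tm(a[1], xi, ops), int_tm(a[2], xi, ops))

def int_fm(phi, xi, ops):
    tag = phi[0]
    if tag in ("Pos", "Neg"):
        return ops[tag.upper()](int_at(phi[1], xi, ops))
    if tag in ("Conj", "Disj"):
        return ops[tag.upper()](int_fm(phi[1], xi, ops), int_fm(phi[2], xi, ops))
    sort, x, body = phi[1:]
    # Binders receive a function of the bound value, evaluated under an updated valuation.
    return ops[tag.upper()](sort, lambda d: int_fm(body, {**xi, x: d}, ops))

def structure_ops(universe, int_S, int_F):
    """Semantic operators for a structure over a finite universe (A = F = bool)."""
    return {
        "FN": lambda f, ds: int_F[f](*ds),
        "EQ": lambda d1, d2: d1 == d2,
        "POS": lambda b: b,
        "NEG": lambda b: not b,
        "CONJ": lambda b1, b2: b1 and b2,
        "DISJ": lambda b1, b2: b1 or b2,
        "ALL": lambda s, P: all(P(d) for d in universe if int_S[s](d)),
        "EX": lambda s, P: any(P(d) for d in universe if int_S[s](d)),
    }

def vars_tm(t):
    if t[0] == "Var":
        return {t[1]}
    return set().union(*(vars_tm(u) for u in t[2]))

def subst_tm(t, s, x):
    if t[0] == "Var":
        return s if t[1] == x else t
    return ("Fn", t[1], [subst_tm(u, s, x) for u in t[2]])

def subst_fm(phi, s, x):
    tag = phi[0]
    if tag in ("Pos", "Neg"):
        _, t1, t2 = phi[1]
        return (tag, ("Eq", subst_tm(t1, s, x), subst_tm(t2, s, x)))
    if tag in ("Conj", "Disj"):
        return (tag, subst_fm(phi[1], s, x), subst_fm(phi[2], s, x))
    sort, y, body = phi[1:]
    # Simplifying assumption: bound names differ from x and from the variables of s,
    # so no renaming is needed to avoid capture.
    assert y != x and y not in vars_tm(s)
    return (tag, sort, y, subst_fm(body, s, x))

universe = [0, 1, 2, 3]
ops = structure_ops(universe, {"nat": lambda d: True},
                    {"succ": lambda d: (d + 1) % 4, "zero": lambda: 0})
zero = ("Fn", "zero", [])
# phi says: for all y, y = x implies succ y = zero (in NNF: y != x or succ y = zero)
phi = ("All", "nat", "y",
       ("Disj", ("Neg", ("Eq", ("Var", "y"), ("Var", "x"))),
                ("Pos", ("Eq", ("Fn", "succ", [("Var", "y")]), zero))))
t = ("Fn", "succ", [("Var", "z")])

def lemma_holds(xi):
    """Substitution lemma: xi |= phi[t/x]  iff  xi[x := [[t]]_xi] |= phi."""
    lhs = int_fm(subst_fm(phi, t, "x"), xi, ops)
    rhs = int_fm(phi, {**xi, "x": int_tm(t, xi, ops)}, ops)
    return lhs == rhs
```

Calling lemma_holds({"z": d}) for each d in the universe exercises both the true and the false case of phi. Swapping structure_ops for another dictionary of operators reuses the same interpreter for a different semantic domain, which is the point of the recursor.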


Classical Metatheorems

The lifting argument from countable CNF structures to unbounded NNF structures (Section 8) relies on clausification and Löwenheim–Skolem for many-sorted FOL with equality. Earlier formalizations focus on unsorted FOL without equality [2, 12, 25]. Sorts and equality are tedious to formalize, and they often fail to reward the formalizer with deep logical insight, but they are central to monotonicity and sort encodings.

Clausification. The translation of a finite quantified problem into clausal form involves skolemizing all the existentially quantified variables into function symbols that take the universally quantified variables in scope as arguments. Skolemization is surprisingly difficult to treat formally; for example, Harrison [12] claims that it poses greater challenges than completeness. On the positive side, clausification can be seen as an instance of the general semantic interpretation principle introduced in Section 6. The definition of clausification and its soundness and completeness proof follow the four-step institutional approach.

SIG: Skolemization introduces new function symbols Sko σs x, built from a list of sorts σs (specifying the arity) and a variable name x, while preserving the sorts of Σ-symbols:

    datatype 'f efsym = Old 'f | Sko ('s list) var

TRANS: The clausification function cls takes a Σ-formula ϕ, an environment ρ : var → tm, a list of universal variables vs, and a set of fresh variables V as arguments. In addition to massaging the connectives, it replaces existential variables by new symbols that depend on vs, replaces bound universal variables by fresh variables from V, and substitutes free variables according to ρ to produce a Σ'-clause. The characteristic equations for cls are obtained by instantiating the semantic interpretation principle with T = tm, A = atm, and F = var list → var set → fm, taking suitable operators on these domains, and letting cls be int_Fm. The interesting cases are

    cls (All σ x ϕ) ρ vs V = cls ϕ ρ[x ↦ Var v] (v # vs) (V \ {v})
    cls (Ex σ x ϕ) ρ vs V  = cls ϕ ρ[x ↦ Fn f (map Var vs)] vs (V \ {v})

where v ∈ V is some variable of sort σ and f = Sko (map sort vs) v is the Skolem function symbol, which is applied to the universal variables vs. For closed formulas, clausification is defined as clausify ϕ = cls ϕ ρ [] UNIV for some irrelevant choice of ρ. As a simple example, let ϕ = All σ x (Ex τ y (Eq (Var x) (Var y))), let v1, v2 be the variables picked from UNIV and UNIV \ {v1}, and let f = Sko [σ] v2. Then

    clausify ϕ
    = cls ϕ ρ [] UNIV
    = cls (Ex τ y (Eq (Var x) (Var y))) ρ[x ↦ Var v1] [v1] (UNIV \ {v1})
    = cls (Eq (Var x) (Var y)) ρ[x ↦ Var v1, y ↦ Fn f [Var v1]] [v1] (UNIV \ {v1, v2})
    = Eq (Var v1) (Fn f [Var v1])
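The two quantifier equations for cls and the worked example can be replayed with a short Python sketch. It makes several simplifications that are not in the formalization: syntax is a nested-tuple encoding, a finite list of names stands in for the set V (the first element plays the role of the chosen v), each universal variable is stored together with its sort so that map sort vs can be computed, and the cases for Conj, Disj, Pos, and Neg are left out because the text does not spell out how the connectives are massaged.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Sko:
    """Skolem function symbol built from argument sorts and a variable name."""
    sorts: Tuple[str, ...]
    var: str

def subst_tm(t, rho):
    # Terms: ("Var", x) | ("Fn", f, [t1, ..., tn]); rho maps variable names to terms.
    if t[0] == "Var":
        return rho.get(t[1], t)
    return ("Fn", t[1], [subst_tm(u, rho) for u in t[2]])

def cls(phi, rho, vs, fresh):
    """Sketch of cls for quantifiers over a single Eq atom.

    vs is a list of (variable, sort) pairs for the universal variables in scope;
    fresh is a list of unused variable names standing in for the set V.
    """
    tag = phi[0]
    if tag == "Eq":
        return ("Eq", subst_tm(phi[1], rho), subst_tm(phi[2], rho))
    if tag not in ("All", "Ex"):
        raise ValueError(f"connective {tag!r} is not covered by this sketch")
    sort, x, body = phi[1:]
    v, rest = fresh[0], fresh[1:]            # pick v from V and continue with V \ {v}
    if tag == "All":
        # Universal variables are renamed to the fresh v, which is prepended to vs.
        return cls(body, {**rho, x: ("Var", v)}, [(v, sort)] + vs, rest)
    # Existential variables become Skolem terms over the universal variables in scope.
    f = Sko(tuple(s for _, s in vs), v)
    skolem_term = ("Fn", f, [("Var", u) for u, _ in vs])
    return cls(body, {**rho, x: skolem_term}, vs, rest)

def clausify(phi, fresh=("v1", "v2", "v3", "v4")):
    return cls(phi, {}, [], list(fresh))

phi = ("All", "sigma", "x", ("Ex", "tau", "y", ("Eq", ("Var", "x"), ("Var", "y"))))
result = clausify(phi)
```

For this phi, result is the tuple encoding of Eq (Var v1) (Fn f [Var v1]) with f = Sko [sigma] v2, matching the derivation above. A formula with more quantifiers than supplied fresh names fails with an IndexError, which is the finite-list counterpart of V being required to contain enough fresh variables.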

SOUND: Soundness is proved in the Structure locale, which fixes a Σ-structure (int_S, int_F, int_P). The “Skolem model” predicate skmod ϕ ρ vs V eint_F eint'_F transforms, for each valuation ξ : var → 'u, an extended structure eint_F such that ξ ⊨ cls ϕ ρ vs V into an extended structure eint'_F such that ξ ∘ ρ ⊨ ϕ, where ∘ composes valuations with environments. The introduction rules of skmod emulate cls’s equations; for example,

    skmod ϕ ρ[x ↦ Fn f (map Var vs)] vs (V \ {v}) eint_F[f ↦ F] eint'_F
    --------------------------------------------------------------------
    skmod (Ex σ x ϕ) ρ vs V eint_F eint'_F

where v ∈ V and F : 'u list → 'u is a suitable interpretation for the Skolem symbol f, defined so that F us gives an arbitrary u such that (int_S, int_F, int_P) ξ ∘ ρ[x ↦ u] ⊨ ϕ, where ξ maps vs to us elementwise. The skmod relation is total on the last argument. For closed formulas ϕ such that (int_S, int_F, int_P) ⊨ ϕ, starting with an extension eint_F of int_F, skmod yields eint'_F such that (int_S, eint'_F, int_P) ⊨ clausify ϕ. Thus, if ϕ has a model, then clausify ϕ also has a model.

For problems, we define clausify Φ = clausify (⋀ Φ), where ⋀ Φ is the conjunction of all formulas in Φ, which must be finite. The locale Q.Model fixes Φ and a model, which is also a model of the formula ⋀ Φ. By soundness of clausify on closed formulas, this yields a model of clausify Φ.

COMPLETE: For completeness, it suffices to show that the backward structure translation of a model of clausify Φ is a model of Φ. This is straightforward.

Löwenheim–Skolem. The proof of the downward Löwenheim–Skolem theorem is based on a formalization of a complete inference system, described in a separate paper [8]. In the Q.Model locale, which fixes a problem and model, it constructs a syntactic Henkin model. Since this model has a countable universe, there exists an isomorphic copy on ω (the countably infinite universe fixed throughout Sections 4 and 5). This yields Q.Model 'u < Q.Model ω.

Using the obvious sound and complete embedding embed of CNF problems into NNF problems, it is possible to transfer the Löwenheim–Skolem theorem to CNF:

    Model 'u, Φ < Q.Model 'u, embed Φ < Q.Model ω, embed Φ < Model ω, Φ

To summarize the results of this section:

Theorem 2. An NNF problem Φ has a model iff clausify Φ has a model.

Theorem 3. An NNF problem has a model iff it has a countable model.
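The role of the Skolem interpretation F in the soundness step can be illustrated on a finite universe, where witnesses can be found by search. The Python sketch below is an informal analogue only: the function skolem_function, the predicate holds, and the example structure (addition modulo 5) are hypothetical, and the choice of "an arbitrary u" is made concrete by taking the first witness in the universe.

```python
from itertools import product

def skolem_function(universe, holds, arity):
    """Build F mapping each tuple us of universe elements to a witness u.

    holds(us, u) states that the body of the existential formula is satisfied
    when the universal variables take the values us and the existential
    variable takes the value u.
    """
    table = {}
    for us in product(universe, repeat=arity):
        witnesses = [u for u in universe if holds(us, u)]
        # Arbitrary choice: the first witness. If none exists, the original formula
        # was false for us, and any value can be returned.
        table[us] = witnesses[0] if witnesses else universe[0]
    return lambda *us: table[us]

universe = [0, 1, 2, 3, 4]

def holds(us, u):
    # Body of "for all x there exists y with x + y = 0 (mod 5)".
    return (us[0] + u) % 5 == 0

F = skolem_function(universe, holds, arity=1)

# The skolemized clause "x + F(x) = 0 (mod 5)" holds for every x because the
# quantified formula holds in the structure.
skolemized_clause_holds = all(holds((x,), F(x)) for x in universe)
```

Because every x has an additive inverse modulo 5, the quantified formula is true in this structure and skolemized_clause_holds evaluates to True, mirroring the statement that a model of ϕ can be extended to a model of clausify ϕ.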


Lifting to Arbitrary Structures and Formulas with Binders

The focus on clausal form and countable structures is a useful simplification, but it is not faithful to the NNF-based paper proof [3] (or to the implementation in Sledgehammer). Thanks to a lifting argument that relies on clausification and Löwenheim–Skolem, the final results are free of such restrictions.

Figure 1 shows how the results are connected. Starting at the top with a satisfiable quantified problem Φ, the problem is first clausified, then by Löwenheim–Skolem it is countably satisfiable (by taking 'u = ω). On the left-hand side, the clausified problem is further encoded using tags or guards (x ∈ {t, g}) and shown to pass one of the monotonicity criteria (i = 1 for t and 2 for g), meaning it is monotonic. On the right-hand side, the encoded problem is satisfiable. Merging the two branches yields a monotonic satisfiable problem, whose erasure is a satisfiable unsorted problem. Since every translation step is also shown complete, the right-hand side can also be traversed bottom-up, producing a model of the original problem from a model of the translated unsorted problem. The overall translation is thus sound and complete.

Theorem 4. Given x ∈ {t, g} and a finite many-sorted NNF problem Φ, let Φ' be the unsorted CNF problem e (x (clausify Φ)), i.e., the sort-erased x-translated clausified Φ. (1) For each model M of Φ (forming together with Φ an instance of Q.Model Φ), there exists a model M' of Φ' (forming together with Φ' an instance of U.Model Φ'). (2) Conversely, for every model of Φ', there exists a model of Φ.

The formal proof puts together many constructions and results of independent interest, notably soundness of the monotonicity criteria (Theorem 1), soundness and completeness of clausification (Theorem 2), and downward Löwenheim–Skolem (Theorem 3).

[Figure 1. The verified translation pipeline. The diagram connects the locale instances Q.Model 'u, Φ; Model 'u, clausify Φ; Model ω, clausify Φ; Problem ω, clausify Φ; Problem_Crit_i ω, x (clausify Φ); Model ω, x (clausify Φ); Mono_Problem ω, x (clausify Φ); Mono_Model ω, x (clausify Φ); U.Model ω, e (x (clausify Φ)); and U.Model 'u, e (x (clausify Φ)).]



Conclusion

This paper describes a framework and a methodology for formalizing applications of many-sorted first-order logic while acting as a companion to recent papers on sort encodings [3, 9, 10]. To readers from the proof assistant community, it also provides a contribution to the ongoing binder representation debate. And to readers rooted in algebraic methods, it shows a practical application of the theory of institutions in a context where the translation functions cannot be assumed to be uniform.

For the most part, the formalization reaffirmed already proved results. On one occasion, it revealed a flaw in a published lemma (Lemma 3 of Claessen et al. [10]). It also helped detect mistakes in a subsequent paper proof [4] before it reached any readers. The work provided the opportunity to rethink the proof; for example, the generalized monotonicity concept, in terms of sets of sorts, arose during the formalization.

A potential practical benefit of this work is connected to step-by-step proof reconstruction. Although the encodings are sound, the inferences in a machine-generated proof may violate the sort discipline, resulting in failures in Sledgehammer’s proof replay. In future work, we want to investigate the feasibility of connecting the soundness proofs of the encodings with a verified checker for unsorted FOL proofs.

The advantages of machine-checked metatheory are well known from programming language research, where papers are often accompanied by formal developments and proof assistants have made it into the classroom. Paradoxically, in the automated reasoning community, we have not been very enthusiastic about formalizing our own results. This paper reported on some steps we have taken to address this.

Acknowledgement. We thank Tobias Nipkow for making this work possible. Jesper Bengtson, Nicholas Smallbone, Mark Summerfield, Dmitriy Traytel, and several anonymous reviewers suggested improvements to earlier versions of this paper. The research was supported by the Deutsche Forschungsgemeinschaft (DFG) projects Security Type Systems and Deduction (grant Ni 491/13-1), part of the program Reliably Secure Software Systems (RS3, Priority Program 1496), and Hardening the Hammer (grant Ni 491/14-1). The authors are listed in alphabetical order.

References

[1] Ballarin, C.: Locales: A module system for mathematical theories. J. Autom. Reasoning, to appear
[2] Berghofer, S.: First-order logic according to Fitting. In: Klein, G., Nipkow, T., Paulson, L. (eds.) Archive of Formal Proofs. (2007)
[3] Blanchette, J.C., Böhme, S., Popescu, A., Smallbone, N.: Encoding monomorphic and polymorphic types. In: Piterman, N., Smolka, S. (eds.) TACAS 2013. LNCS, vol. 7795, pp. 493–507. Springer (2013)
[4] Blanchette, J.C., Böhme, S., Popescu, A., Smallbone, N.: Encoding monomorphic and polymorphic types. Tech. report associated with TACAS 2013 paper [3], http://www21. (2013)
[5] Blanchette, J.C., Krauss, A.: Monotonicity inference for higher-order formulas. J. Autom. Reasoning 47(4), 369–398 (2011)
[6] Blanchette, J.C., Popescu, A.: Formal development associated with this paper. http:// (2013)
[7] Blanchette, J.C., Popescu, A.: Sound and complete sort encodings for first-order logic. In: Klein, G., Nipkow, T., Paulson, L. (eds.) Archive of Formal Proofs. http://afp. (2013)
[8] Blanchette, J.C., Popescu, A., Traytel, D.: Coinductive pearl: Modular first-order logic completeness, submitted,
[9] Bouillaguet, C., Kuncak, V., Wies, T., Zee, K., Rinard, M.: Using first-order theorem provers in the Jahob data structure verification system. In: Cook, B., Podelski, A. (eds.) VMCAI 2007. LNCS, vol. 4349, pp. 74–88. Springer (2007)
[10] Claessen, K., Lillieström, A., Smallbone, N.: Sort it out with monotonicity—Translating between many-sorted and unsorted first-order logic. In: Bjørner, N., Sofronie-Stokkermans, V. (eds.) CADE-23. LNAI, vol. 6803, pp. 207–221. Springer (2011)
[11] Goguen, J.A., Burstall, R.M.: Institutions: Abstract model theory for specification and programming. J. ACM 39(1), 95–146 (1992)
[12] Harrison, J.: Formalizing basic first order model theory. In: Grundy, J., Newey, M. (eds.) TPHOLs ’98. LNCS, vol. 1479, pp. 153–170. Springer (1998)
[13] Harrison, J.: Towards self-verification of HOL Light. In: Furbach, U., Shankar, N. (eds.) IJCAR 2006. LNCS, vol. 4130, pp. 177–191. Springer (2006)
[14] Huffman, B., Urban, C.: Proof pearl: A new foundation for Nominal Isabelle. In: Kaufmann, M., Paulson, L.C. (eds.) ITP 2010. LNCS, vol. 6172, pp. 35–50. Springer (2010)
[15] Kammüller, F., Wenzel, M., Paulson, L.C.: Locales—A sectioning concept for Isabelle. In: Bertot, Y., Dowek, G., Hirschowitz, A., Paulin, C., Théry, L. (eds.) TPHOLs ’99. LNCS, vol. 1690, pp. 149–166. Springer (1999)
[16] Monk, J.D.: Mathematical Logic. Springer (1976)
[17] Myreen, M.O., Davis, J.: A verified runtime for a verified theorem prover. In: van Eekelen, M., Geuvers, H., Schmaltz, J., Wiedijk, F. (eds.) ITP 2011. LNCS, vol. 6898, pp. 265–280. Springer (2011)
[18] Nipkow, T., Paulson, L.C., Wenzel, M.: Isabelle/HOL: A Proof Assistant for Higher-Order Logic, LNCS, vol. 2283. Springer (2002)
[19] Paulson, L.C., Blanchette, J.C.: Three years of experience with Sledgehammer, a practical link between automatic and interactive theorem provers. In: Sutcliffe, G., Ternovska, E., Schulz, S. (eds.) IWIL-2010 (2010)
[20] Pfenning, F., Elliott, C.: Higher-order abstract syntax. In: Wexelblat, R.L. (ed.) PLDI ’88. pp. 199–208. ACM (1988)
[21] Pitts, A.M.: Nominal logic, a first order theory of names and binding. Inf. Comput. 186(2), 165–193 (2003)
[22] Pitts, A.M.: Alpha-structural recursion and induction. J. ACM 53(3), 459–506 (2006)
[23] Popescu, A., Gunter, E.L.: Recursion principles for syntax with bindings and substitution. In: Chakravarty, M.M.T., Hu, Z., Danvy, O. (eds.) ICFP 2011. pp. 346–358. ACM (2011)
[24] Popescu, A., Gunter, E.L., Osborn, C.J.: Strong normalization of System F by HOAS on top of FOAS. In: LICS 2010. pp. 31–40. IEEE (2010)
[25] Ridge, T., Margetson, J.: A mechanically verified, sound and complete theorem prover for first order logic. In: Hurd, J., Melham, T.F. (eds.) TPHOLs 2005. LNCS, vol. 3603, pp. 294–309. Springer (2005)
[26] Shankar, N.: Metamathematics, Machines, and Gödel’s Proof, Cambridge Tracts in Theoretical Computer Science, vol. 38. Cambridge University Press (1994)
[27] Sutcliffe, G.: The 6th IJCAR automated theorem proving system competition—CASC-J6. AI Comm. 26(2), 211–223 (2013)
[28] Tinelli, C., Zarba, C.G.: Combining decision procedures for sorted theories. In: Alferes, J., Leite, J. (eds.) JELIA 2004. LNCS, vol. 3229, pp. 641–653. Springer (2004)