Title: Synthesizing Pareto-Optimal Interpretations for Black-Box ML Models
When: Thursday, 14 September 2023 at 1900 hrs (IST)
Abstract:
We present a new multi-objective optimization approach for
synthesizing interpretations that "explain" the behavior of
black-box machine learning models. Constructing
human-understandable interpretations for black-box models often
requires balancing conflicting objectives. A simple
interpretation may be easier for humans to understand while
being less precise in its predictions than a complex
interpretation. Existing methods for synthesizing
interpretations use a single objective function and are often
optimized for a single class of interpretations. In contrast,
we provide a more general, multi-objective synthesis
framework that allows users to choose (1) the class of
syntactic templates from which an interpretation should be
synthesized, and (2) quantitative measures of both the
correctness and the explainability (or another suitable measure)
of an interpretation. For a given black-box model, our approach yields a
set of Pareto-optimal interpretations with respect to the
correctness and explainability measures. We show that the
underlying multi-objective optimization problem can be solved
via a reduction to quantitative constraint solving, such as
weighted maximum satisfiability. To demonstrate the benefits of
our approach, we have applied it to synthesize interpretations
for black-box neural-network classifiers. Our experiments show
that there often exists a rich and varied set of candidate
interpretations that existing approaches miss.
This is joint work with Hazem Torfah, Shetal Shah, S. Akshay
and Sanjit Seshia.
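
As a rough illustration of the Pareto-optimality notion used in the
abstract (not the synthesis procedure presented in the talk), the
following Python sketch filters a set of hypothetical candidate
interpretations down to those that are Pareto-optimal with respect to a
correctness score and an explainability score. The Candidate record,
the scoring conventions, and the example numbers are all illustrative
assumptions.

from dataclasses import dataclass
from typing import List

@dataclass
class Candidate:
    """A hypothetical candidate interpretation with two scores,
    both assumed to be 'higher is better'."""
    name: str
    correctness: float      # e.g., agreement with the black-box on a test set
    explainability: float   # e.g., inverse of the interpretation's size

def pareto_front(cands: List[Candidate]) -> List[Candidate]:
    """Keep only candidates not dominated by any other candidate.
    A candidate is dominated if another one is at least as good on
    both measures and strictly better on at least one."""
    front = []
    for c in cands:
        dominated = any(
            o.correctness >= c.correctness
            and o.explainability >= c.explainability
            and (o.correctness > c.correctness
                 or o.explainability > c.explainability)
            for o in cands
        )
        if not dominated:
            front.append(c)
    return front

if __name__ == "__main__":
    # Illustrative scores only; a real flow would obtain them by querying
    # the black-box model and measuring each interpretation's complexity.
    candidates = [
        Candidate("single decision stump", correctness=0.72, explainability=0.95),
        Candidate("depth-3 decision tree", correctness=0.86, explainability=0.70),
        Candidate("small rule set", correctness=0.84, explainability=0.70),
        Candidate("large rule set", correctness=0.91, explainability=0.40),
    ]
    for c in pareto_front(candidates):
        print(f"{c.name}: correctness={c.correctness}, "
              f"explainability={c.explainability}")

In this toy example the small rule set is dominated by the depth-3
decision tree (no better on either measure), so only the other three
candidates survive as Pareto-optimal trade-offs between correctness and
explainability.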