IARCS Verification Seminar Series



Title: Synthesizing Pareto-Optimal Interpretations for Black-Box ML Models

Speaker: Supratik Chakraborty    (bio) (bio)


Supratik Chakraborty is Bajaj Group Chair Professor in the Department of Computer Science and Engineering at IIT Bombay. His current research interests include constrained sampling and counting, automated synthesis, formal verification, automata theory and logic. He is particularly interested in the development of scalable algorithmic techniques with strong guarantees for reasoning about different computational models. Supratik is a Distinguished Member of ACM, a Fellow of Indian National Academy of Engineering and a Distinguished Alumnus Awardee of IIT Kharagpur.


When: Thursday, 14 September 2023 at 1900 hrs (IST)   Slides  Video  

Abstract:
We present a new multi-objective optimization approach for synthesizing interpretations that "explain" the behavior of black-box machine learning models. Constructing human-understandable interpretations for black-box models often requires balancing conflicting objectives. A simple interpretation may be easier to understand for humans while being less precise in its predictions vis-a-vis a complex interpretation. Existing methods for synthesizing interpretations use a single objective function and are often optimized for a single class of interpretations. In contrast, we provide a more general and multi-objective synthesis framework that allows users to choose (1) the class of syntactic templates from which an interpretation should be synthesized, and (2) quantitative measures on both the correctness and explainability (or other suitable measure) of an interpretation. For a given black-box, our approach yields a set of Pareto-optimal interpretations with respect to the correctness and explainability measures. We show that the underlying multi-objective optimization problem can be solved via a reduction to quantitative constraint solving, such as weighted maximum satisfiability. To demonstrate the benefits of our approach, we have applied it to synthesize interpretations for black-box neural-network classifiers. Our experiments show that there often exists a rich and varied set of choices for interpretations that are missed by existing approaches.

This is joint work with Hazem Torfah, Shetal Shah, S. Akshay and Sanjit Seshia.