Grace C. Kim

What's Your Stake in Sustainability of AI?: An Informed Insider's Guide

AAAI/ACM Conference on AI, Ethics, and Society (AIES), 2024

Grace C. Kim
Annabel Rothschild
Carl DiSalvo
Betsy DiSalvo

Abstract

As large language models (LLMs) see wider real-world use, understanding and mitigating their unsafe behaviors is critical. Interpretation techniques can reveal causes of unsafe outputs and guide safety, but such connections with safety are often overlooked in prior surveys. We present the first survey that bridges this gap, introducing a unified framework that connects safety-focused interpretation methods, the safety enhancements they inform, and the tools that operationalize them. Our novel taxonomy, organized by LLM workflow stages, summarizes nearly 70 works at their intersections. We conclude with open challenges and future directions. This timely survey helps researchers and practitioners navigate key advancements for safer, more interpretable LLMs.

BibTeX

@inbook{kim2025stake,
  author    = {Grace C. Kim and Annabel Rothschild and Carl DiSalvo and Betsy DiSalvo},
  title     = {What's Your Stake in Sustainability of {AI}?: An Informed Insider's Guide},
  booktitle = {Proceedings of the 2024 AAAI/ACM Conference on AI, Ethics, and Society},
  year      = {2025},
  publisher = {AAAI Press},
  pages     = {738--750},
  numpages  = {13},
  doi       = {10.5555/3716662.3716726}
}