What's Your Stake in Sustainability of AI?: An Informed Insider's Guide
Abstract
As large language models (LLMs) see wider real-world use, understanding and mitigating their unsafe behaviors is critical. Interpretation techniques can reveal the causes of unsafe outputs and guide safety improvements, yet this connection with safety has often been overlooked in prior surveys. We present the first survey that bridges this gap, introducing a unified framework that connects safety-focused interpretation methods, the safety enhancements they inform, and the tools that operationalize them. Our novel taxonomy, organized by the stages of the LLM workflow, summarizes nearly 70 works at these intersections. We conclude with open challenges and future directions. This timely survey helps researchers and practitioners navigate key advances toward safer, more interpretable LLMs.
BibTeX
@inproceedings{kim2025stake,
  author    = {Grace C. Kim and Annabel Rothschild and Carl DiSalvo and Betsy DiSalvo},
  title     = {What's Your Stake in Sustainability of {AI}?: An Informed Insider's Guide},
  booktitle = {Proceedings of the 2024 AAAI/ACM Conference on AI, Ethics, and Society},
  year      = {2025},
  publisher = {AAAI Press},
  pages     = {738--750},
  numpages  = {13},
  doi       = {10.5555/3716662.3716726}
}