publications


published works

Focus On This, Not That! Steering LLMs with Adaptive Feature Specification (ICML 2025)
Lamb, T., Davies, A., Paren, A., Torr, P., & Pinto, F.
webpage paper poster slides code

Social Science Is Necessary for Operationalizing Socially Responsible Foundation Models (HAIC @ ICLR 2025)
Davies, A., Nguyen, E., Simeone, M., Johnston, E., & Gubri, M.
paper poster

Can LLMs Reliably Simulate Human Learner Actions? A Simulation Authoring Framework for Open-Ended Learning Environments (AAAI 2025)
Mannekote, A., Davies, A., Kang, J., & Boyer, K. E.
paper code

Hidden in Plain Sight: Evaluating Abstract Shape Recognition in Vision-Language Models (NeurIPS 2024)
Hemmat, A., Davies, A., Lamb, T., Yuan, J., Torr, P., Khakzar, A., & Pinto, F.
webpage paper poster dataset code

Measuring the Reliability of Causal Probing Methods: Tradeoffs, Limitations, and the Plight of Nullifying Interventions (IAI @ NeurIPS 2024, Oral)
Canby, M.*, Davies, A.*, Rastogi, C., & Hockenmaier, J.
preprint (updated 2025) IAI 2024 paper IAI 2024 slides IAI 2024 poster
(* denotes equal contribution.)

Competence-Based Analysis of Language Models (IAI @ NeurIPS 2024)
Davies, A., Jiang, J., & Zhai, C.
paper poster

Large Language Models for Whole-Learner Support: Opportunities and Challenges (Frontiers in AI, 2024)
Mannekote, A., Davies, A., Pinto, J. D., Zhang, S., Olds, D., Schroeder, N. L., … & Zhai, C.
paper

Not Just Pretty Pictures: Toward Interventional Data Augmentation Using Text-to-Image Generators (ICML 2024)
Yuan, J.*, Pinto, F.*, Davies, A.*, & Torr, P.
paper webpage poster video code
(* denotes equal contribution.)

Understanding the social construction of juvenile delinquency: insights from semantic analysis of big-data historical newspaper collections (Journal of Computational Social Science, 2024)
Zhang, Y.*, Davies, A.*, & Zhai, C.
paper
(* denotes equal contribution.)

Toward a Big Data Analysis System for Historical Newspaper Collections Research (PASC 2022)
Satheesan, S. P., Bhavya, Davies, A., Craig, A. B., Zhang, Y., & Zhai, C.
paper slides video

preprints

Evaluating and Designing Sparse Autoencoders by Approximating Quasi-Orthogonality (Accepted at COLM 2025)
Lee, S., Davies, A., Canby, M., & Hockenmaier, J.
preprint

The Cognitive Revolution in Interpretability: From Explaining Behavior to Interpreting Representations and Algorithms (arXiv preprint, in review)
Davies, A., & Khakzar, A.
preprint