publications


How Reliable are Causal Probing Interventions? (Accepted for publication at AACL 2025)
Canby, M.*, Davies, A.*, Rastogi, C., & Hockenmaier, J.
preprint
(* denotes equal contribution.)
Note: this work was originally presented at IAI @ NeurIPS 2024 (Oral).
slides (IAI 2024 oral) poster (IAI 2024)

Evaluating and Designing Sparse Autoencoders by Approximating Quasi-Orthogonality (COLM 2025)
Lee, S., Davies, A., Canby, M., & Hockenmaier, J.
paper

Do Role-Playing Agents Practice What They Preach? Belief-Behavior Alignment in LLM-Based Simulations of Human Trust (SocialSim @ COLM 2025)
Mannekote, A., Davies, A., Li, G., Boyer, K. E., Zhai, C., Dorr, B. J., & Pinto, F.
paper

Focus On This, Not That! Steering LLMs with Adaptive Feature Specification (ICML 2025)
Lamb, T., Davies, A., Paren, A., Torr, P., & Pinto, F.
webpage paper poster slides code

Social Science Is Necessary for Operationalizing Socially Responsible Foundation Models (HAIC @ ICLR 2025)
Davies, A., Nguyen, E., Simeone, M., Johnston, E., & Gubri, M.
paper poster

Can LLMs Reliably Simulate Human Learner Actions? A Simulation Authoring Framework for Open-Ended Learning Environments (AAAI 2025)
Mannekote, A., Davies, A., Kang, J., & Boyer, K. E.
paper code

Hidden in Plain Sight: Evaluating Abstract Shape Recognition in Vision-Language Models (NeurIPS 2024)
Hemmat, A., Davies, A., Lamb, T., Yuan, J., Torr, P., Khakzar, A., & Pinto, F.
webpage paper poster dataset code

Competence-Based Analysis of Language Models (IAI @ NeurIPS 2024)
Davies, A., Jiang, J., & Zhai, C.
paper poster

Large Language Models for Whole-Learner Support: Opportunities and Challenges (Frontiers in AI, 2024)
Mannekote, A., Davies, A., Pinto, J. D., Zhang, S., Olds, D., Schroeder, N. L., … & Zhai, C.
paper

Not Just Pretty Pictures: Toward Interventional Data Augmentation Using Text-to-Image Generators (ICML 2024)
Yuan, J.*, Pinto, F.*, Davies, A.*, & Torr, P.
paper webpage poster video code
(* denotes equal contribution.)

Understanding the social construction of juvenile delinquency: insights from semantic analysis of big-data historical newspaper collections (Journal of Computational Social Science, 2024)
Zhang, Y.*, Davies, A.*, & Zhai, C.
paper
(* denotes equal contribution.)

Toward a Big Data Analysis System for Historical Newspaper Collections Research (PASC 2022)
Satheesan, S. P., Bhavya, Davies, A., Craig, A. B., Zhang, Y., & Zhai, C.
paper slides video

preprints

The Cognitive Revolution in Interpretability: From Explaining Behavior to Interpreting Representations and Algorithms (arXiv preprint, in review)
Davies, A., & Khakzar, A.
preprint