publications
How Reliable are Causal Probing Interventions?
(Accepted for publication at AACL 2025)
Canby, M.*, Davies, A.*, Rastogi, C., & Hockenmaier, J.
preprint
(* denotes equal contribution.)
Note: this work was originally presented at IAI @ NeurIPS 2024 (Oral).
slides (IAI 2024 oral) poster (IAI 2024)
Evaluating and Designing Sparse Autoencoders by Approximating Quasi-Orthogonality
(COLM 2025)
Lee, S., Davies, A., Canby, M., & Hockenmaier, J.
paper
Do Role-Playing Agents Practice What They Preach? Belief-Behavior Alignment in LLM-Based Simulations of Human Trust
(SocialSim @ COLM 2025)
Mannekote, A., Davies, A., Li, G., Boyer, K. E., Zhai, C., Dorr, B. J., & Pinto, F.
paper
Focus On This, Not That! Steering LLMs with Adaptive Feature Specification
(ICML 2025)
Lamb, T., Davies, A., Paren, A., Torr, P., & Pinto, F.
webpage paper poster slides code
Social Science Is Necessary for Operationalizing Socially Responsible Foundation Models
(HAIC @ ICLR 2025)
Davies, A., Nguyen, E., Simeone, M., Johnston, E., & Gubri, M.
paper poster
Can LLMs Reliably Simulate Human Learner Actions? A Simulation Authoring Framework for Open-Ended Learning Environments
(AAAI 2025)
Mannekote, A., Davies, A., Kang, J., & Boyer, K. E.
paper code
Hidden in Plain Sight: Evaluating Abstract Shape Recognition in Vision-Language Models
(NeurIPS 2024)
Hemmat, A., Davies, A., Lamb, T., Yuan, J., Torr, P., Khakzar, A., & Pinto, F.
webpage paper poster dataset code
Competence-Based Analysis of Language Models
(IAI @ NeurIPS 2024)
Davies, A., Jiang, J., & Zhai, C.
paper poster
Large Language Models for Whole-Learner Support: Opportunities and Challenges
(Frontiers in AI, 2024)
Mannekote, A., Davies, A., Pinto, J. D., Zhang, S., Olds, D., Schroeder, N. L., … & Zhai, C.
paper
Not Just Pretty Pictures: Toward Interventional Data Augmentation Using Text-to-Image Generators
(ICML 2024)
Yuan, J.*, Pinto, F.*, Davies, A.*, & Torr, P.
paper webpage poster video code
(* denotes equal contribution.)
Understanding the social construction of juvenile delinquency: insights from semantic analysis of big-data historical newspaper collections
(Journal of Computational Social Science, 2024)
Zhang, Y.*, Davies, A.*, & Zhai, C.
paper
(* denotes equal contribution.)
Toward a Big Data Analysis System for Historical Newspaper Collections Research
(PASC 2022)
Satheesan, S. P., Bhavya, Davies, A., Craig, A. B., Zhang, Y., & Zhai, C.
paper slides video
preprints
The Cognitive Revolution in Interpretability: From Explaining Behavior to Interpreting Representations and Algorithms
(arXiv preprint, in review)
Davies, A., & Khakzar, A.
preprint