publications
published works
Focus On This, Not That! Steering LLMs with Adaptive Feature Specification
(ICML 2025)
Lamb, T., Davies, A., Paren, A., Torr, P., & Pinto, F.
webpage paper poster slides code
Social Science Is Necessary for Operationalizing Socially Responsible Foundation Models
(HAIC @ ICLR 2025)
Davies, A., Nguyen, E., Simeone, M., Johnston, E., & Gubri, M.
paper poster
Can LLMs Reliably Simulate Human Learner Actions? A Simulation Authoring Framework for Open-Ended Learning Environments
(AAAI 2025)
Mannekote, A., Davies, A., Kang, J., & Boyer, K. E.
paper code
Hidden in Plain Sight: Evaluating Abstract Shape Recognition in Vision-Language Models
(NeurIPS 2024)
Hemmat, A., Davies, A., Lamb, T., Yuan, J., Torr, P., Khakzar, A., & Pinto, F.
webpage paper poster dataset code
Measuring the Reliability of Causal Probing Methods: Tradeoffs, Limitations, and the Plight of Nullifying Interventions
(IAI @ NeurIPS 2024, Oral)
Canby, M.*, Davies, A.*, Rastogi, C., & Hockenmaier, J.
preprint (updated 2025) IAI 2024 paper IAI 2024 slides IAI 2024 poster
(* denotes equal contribution.)
Competence-Based Analysis of Language Models
(IAI @ NeurIPS 2024)
Davies, A., Jiang, J., & Zhai, C.
paper poster
Large Language Models for Whole-Learner Support: Opportunities and Challenges
(Frontiers in AI, 2024)
Mannekote, A., Davies, A., Pinto, J. D., Zhang, S., Olds, D., Schroeder, N. L., … & Zhai, C.
paper
Not Just Pretty Pictures: Toward Interventional Data Augmentation Using Text-to-Image Generators
(ICML 2024)
Yuan, J.*, Pinto, F.*, Davies, A.*, & Torr, P.
paper webpage poster video code
(* denotes equal contribution.)
Understanding the social construction of juvenile delinquency: insights from semantic analysis of big-data historical newspaper collections
(Journal of Computational Social Science, 2024)
Zhang, Y.*, Davies, A.*, & Zhai, C.
paper
(* denotes equal contribution.)
Toward a Big Data Analysis System for Historical Newspaper Collections Research
(PASC 2022)
Satheesan, S. P., Bhavya, Davies, A., Craig, A. B., Zhang, Y., & Zhai, C.
paper slides video
preprints
Evaluating and Designing Sparse Autoencoders by Approximating Quasi-Orthogonality
(Accepted at COLM 2025)
Lee, S., Davies, A., Canby, M., & Hockenmaier, J.
preprint
The Cognitive Revolution in Interpretability: From Explaining Behavior to Interpreting Representations and Algorithms
(arXiv preprint, in review)
Davies, A., & Khakzar, A.
preprint