Research

Background

I work on LLM pretraining research at Anthropic. I also collaborate with AI safety researchers through MATS, SPAR, and the Anthropic Fellows Program. Before transitioning to ML research, I worked as a software engineer at Stripe and studied Biomedical Engineering at Imperial College London.

Steering Llama 2 via Contrastive Activation Addition. Nina Panickssery, Nick Gabrieli, Julian Schulz, Meg Tong, Evan Hubinger, Alexander Turner. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics. Outstanding Paper Award at ACL 2024 Conference.
Many-shot Jailbreaking. Cem Anil, Esin Durmus, Nina Panickssery, Mrinank Sharma, Joe Benton, Sandipan Kundu, Joshua Batson, Meg Tong, Jesse Mu, Daniel Ford, Francesco Mosconi et al. Advances in Neural Information Processing Systems, 2024.
Refusal in Language Models Is Mediated by a Single Direction. Andy Arditi, Oscar Obeso, Aaquib Syed, Daniel Paleka, Nina Panickssery, Wes Gurnee, Neel Nanda. Advances in Neural Information Processing Systems, 2024.
Beyond Data Filtering: Knowledge Localization for Capability Removal in LLMs. Igor Shilov, Alex Cloud, Aryo Pradipta Gema, Jacob Goldman-Wetzler, Nina Panickssery, Henry Sleight, Erik Jones, Cem Anil. arXiv preprint arXiv:2512.05648, 2025.
Inspection and Control of Self-Generated-Text Recognition Ability in Llama3-8b-Instruct. Christopher Ackerman, Nina Panickssery. International Conference on Learning Representations, 2025.
Mitigating Many-Shot Jailbreaking. Christopher Ackerman, Nina Panickssery. arXiv preprint arXiv:2504.09604, 2025.
Understanding Jailbreak Success: A Study of Latent Space Dynamics in Large Language Models. Sarah Ball, Frauke Kreuter, Nina Panickssery. arXiv preprint arXiv:2406.09289, 2024.

Enhancing Model Safety through Pretraining Data Filtering. Yanda Chen, Mycal Tucker, Nina Panickssery, Tony Wang, Francesco Mosconi, Anjali Gopal, Carson Denison, Linda Petrini, Jan Leike, Ethan Perez, Mrinank Sharma. Anthropic Alignment Science Blog, 2025.
Role embeddings: making authorship more salient to LLMs. Nina Panickssery, Christopher Ackerman. LessWrong, 2025.
Decomposing independent generalizations in neural networks via Hessian analysis. Dmitry Vaintrob, Nina Panickssery. LessWrong, 2023.
Investigating the learning coefficient of modular addition: hackathon project. Nina Panickssery, Dmitry Vaintrob. LessWrong, 2023.