Research
Background
- I work on LLM pretraining research at Anthropic. I also collaborate with AI safety researchers through MATS, SPAR, and the Anthropic Fellows Program. Before transitioning to ML research, I worked as a software engineer at Stripe and studied Biomedical Engineering at Imperial College London.
Papers
- Steering Llama 2 via Contrastive Activation Addition. Nina Panickssery, Nick Gabrieli, Julian Schulz, Meg Tong, Evan Hubinger, Alexander Turner. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics. Outstanding Paper Award at ACL 2024 Conference.
- Many-shot Jailbreaking. Cem Anil, Esin Durmus, Nina Panickssery, Mrinank Sharma, Joe Benton, Sandipan Kundu, Joshua Batson, Meg Tong, Jesse Mu, Daniel Ford, Francesco Mosconi, et al. Advances in Neural Information Processing Systems, 2024.
- Refusal in Language Models Is Mediated by a Single Direction. Andy Arditi, Oscar Obeso, Aaquib Syed, Daniel Paleka, Nina Panickssery, Wes Gurnee, Neel Nanda. Advances in Neural Information Processing Systems, 2024.
- Inspection and Control of Self-Generated-Text Recognition Ability in Llama3-8b-Instruct. Christopher Ackerman, Nina Panickssery. International Conference on Learning Representations, 2025.
- Mitigating Many-Shot Jailbreaking. Christopher Ackerman, Nina Panickssery. arXiv preprint arXiv:2504.09604, 2025.
- Understanding Jailbreak Success: A Study of Latent Space Dynamics in Large Language Models. Sarah Ball, Frauke Kreuter, Nina Panickssery. arXiv preprint arXiv:2406.09289, 2024.
Other
- Enhancing Model Safety through Pretraining Data Filtering. Yanda Chen, Mycal Tucker, Nina Panickssery, Tony Wang, Francesco Mosconi, Anjali Gopal, Carson Denison, Linda Petrini, Jan Leike, Ethan Perez, Mrinank Sharma. Anthropic Alignment Science Blog, 2025.
- Role embeddings: making authorship more salient to LLMs. Nina Panickssery, Christopher Ackerman. LessWrong, 2025.
- Decomposing independent generalizations in neural networks via Hessian analysis. Dmitry Vaintrob, Nina Panickssery. LessWrong, 2023.
- Investigating the learning coefficient of modular addition: hackathon project. Nina Panickssery, Dmitry Vaintrob. LessWrong, 2023.