Official code for 'Not All Tokens Are Equally Useful for Steering'. Provides Robust DiM, a token-selective method for constructing steering directions, and Prefix Steering, an early-token intervention for activation steering.
mechanistic-interpretability representation-engineering llm-interpretability activation-steering llm-steering robust-dim prefix-steering
Updated Jun 15, 2026 - Python