Notes on Interpretability

Taxonomy-style notes on interpretability methods for transformer language models.