Questions for the seminar paper "Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models".
-----------------------------------------------------------------------------------------------
Please send your answers to: schrodi@cs.uni-freiburg.de

1) How do the authors attribute causal effects to nodes (SAE features) and to edges? (2-3 sentences)
2) How do the authors incorporate the SAE errors into the circuits? (2 sentences)
3) What are 2 strengths and 2 weaknesses of the paper? (2 sentences)