Back to Portfolio
ML Research
A Black Box Made Less Opaque (Part 2)
Syntax vs Semantics
Objective
Investigate whether models focus on syntax (symbols) or semantics (meaning) using sparse autoencoders.
Motivation
Understand whether models pay attention to style or substance — an important question with implications for both performance and safety.
About this installment
The second installment explores whether models focus on syntax (symbols) or semantics (meaning) using sparse autoencoders.