ML Research

A Black Box Made Less Opaque (Part 2)

Syntax vs Semantics

Objective

Investigate whether models focus on syntax (symbols) or semantics (meaning) using sparse autoencoders.

Understand whether models pay attention to style or substance, a question with implications for both performance and safety.
