A Black Box Made Less Opaque (Part 1)

Introduction to SAEs

Objective

Apply sparse autoencoders to GPT-2 Small to explore feature activation and how it changes as the model processes inputs.

Motivation

Use SAEs to examine how a model begins to classify and 'understand' user inputs.

About this installment

An introductory application of SAEs to GPT-2 Small, exploring feature activation and how it changes as the model processes inputs.
