Section and Structure

Section analysis divides a track into longer musical spans such as intro, verse, chorus, bridge, instrumental, and outro. It is a structural estimate: useful for navigation and visualization, but not a replacement for a producer's arrangement notes.

Use it to navigate a track, set loop points around a chorus, or drive a structural visualization — anywhere you need an approximate map of the song's large-scale form.

For newcomers, think of section analysis as building a map of the song. It looks at large spans such as an intro, verse-like area, chorus-like area, or bridge-like area, rather than short events like individual beats or chords.

Reading a self-similarity matrix

The central tool is the self-similarity matrix (SSM). It compares every moment of a feature sequence against every other moment.

You can read it as a "which parts sound alike?" table:

Pattern in the SSM	Meaning
Bright cell	These two times sound similar
Block along the diagonal	A span is internally consistent, so it may be one section
Stripe away from the diagonal	Two different times sound similar, often a repeated chorus or repeated verse
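
As a minimal sketch of the idea (not libsonare's code), the following builds a cosine self-similarity matrix from a handful of feature frames. The function name `cosine_ssm` and the toy two-dimensional features are assumptions made for illustration.

```python
import math

def cosine_ssm(frames):
    """Return an N x N matrix of cosine similarities between feature frames."""
    # L2-normalize each frame so a plain dot product equals cosine similarity.
    unit = []
    for frame in frames:
        norm = math.sqrt(sum(x * x for x in frame))
        unit.append([x / norm for x in frame] if norm > 0 else list(frame))
    return [[sum(a * b for a, b in zip(u, v)) for v in unit] for u in unit]

# Toy features: two frames of material A, two of material B, then A again.
frames = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0], [1.0, 0.0], [1.0, 0.1]]
ssm = cosine_ssm(frames)
```

In this toy matrix, `ssm[0][4]` is bright because frames 0 and 4 carry the same material, which is the kind of off-diagonal match that hints at a repeat, while `ssm[0][2]` is dark because A and B differ.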

Two signals are especially useful:

Novelty means the SSM changes suddenly. It helps find boundaries.
Repetition means similar material appears in separate places. It helps group recurring sections.

Novelty alone tends to split too much. Repetition alone can miss one-off parts. libsonare combines both.
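
To make the repetition signal concrete, the sketch below averages similarity along one off-diagonal of an SSM. A high average at some lag means the material tends to return that many frames later. `lag_repetition` is a hypothetical helper, not a libsonare function, and the binary toy matrix stands in for a real SSM.

```python
def lag_repetition(ssm, lag):
    """Mean similarity between each frame and the frame `lag` steps later."""
    n = len(ssm)
    if lag <= 0 or lag >= n:
        raise ValueError("lag must be between 1 and len(ssm) - 1")
    values = [ssm[i][i + lag] for i in range(n - lag)]
    return sum(values) / len(values)

# Toy SSM for the pattern A A B B A A (1.0 = same material, 0.0 = different).
pattern = "AABBAA"
ssm = [[1.0 if a == b else 0.0 for b in pattern] for a in pattern]

# Skip lag 1, where neighboring frames are trivially similar.
best_lag = max(range(2, len(ssm)), key=lambda lag: lag_repetition(ssm, lag))
```

Here the strongest stripe sits at a lag of four frames, the distance between the two A spans. The one-off B span produces no stripe at all, which is why repetition on its own can miss it.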

Boundaries first

libsonare detects section boundaries by building frame-level features, computing a self-similarity matrix, and looking for novelty peaks. The default feature mix uses MFCC and chroma, so boundaries can come from timbre changes, harmonic changes, or both.
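
The novelty step can be sketched as follows. This is an illustration under stated assumptions, not libsonare's implementation: the kernel is an unweighted checkerboard, the peak picker is a plain thresholded local maximum, and the names `checkerboard_novelty` and `pick_peaks` are hypothetical.

```python
def checkerboard_novelty(ssm, half):
    """Slide a (2*half) x (2*half) checkerboard kernel along the SSM diagonal."""
    n = len(ssm)
    novelty = [0.0] * n
    # t is a candidate boundary: frames t-half..t-1 are "past", t..t+half-1 "future".
    for t in range(half, n - half + 1):
        score = 0.0
        for i in range(t - half, t + half):
            for j in range(t - half, t + half):
                same_side = (i < t) == (j < t)
                # Reward similarity within each side, penalize it across sides.
                score += ssm[i][j] if same_side else -ssm[i][j]
        novelty[t] = score
    return novelty

def pick_peaks(curve, threshold):
    """Return indices that exceed the threshold and are local maxima."""
    return [
        t for t in range(1, len(curve) - 1)
        if curve[t] > threshold and curve[t] > curve[t - 1] and curve[t] >= curve[t + 1]
    ]

# Toy SSM: four frames of material A followed by four frames of material B.
pattern = "AAAABBBB"
ssm = [[1.0 if a == b else 0.0 for b in pattern] for a in pattern]

novelty = checkerboard_novelty(ssm, half=2)
boundaries = pick_peaks(novelty, threshold=4.0)
```

The curve peaks at frame 4, exactly where A gives way to B. On real features the peak is broader and noisier, so the kernel size and threshold matter a great deal.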

minSectionSec sets the minimum duration a detected section is allowed to have. As a result, very short edits, drops, or pickup bars may be merged into neighboring spans instead of appearing as sections of their own.
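
One way to picture a minimum-length rule is shown below. The source does not describe how libsonare merges short spans, so this is only an assumed strategy: any boundary that would leave a section shorter than the minimum is dropped. `enforce_min_section` and its parameters are hypothetical names.

```python
def enforce_min_section(boundaries_sec, duration_sec, min_section_sec):
    """Drop boundaries that would create a section shorter than the minimum."""
    kept = []
    previous = 0.0  # the track start acts as an implicit boundary
    for t in sorted(boundaries_sec):
        too_close_to_previous = t - previous < min_section_sec
        too_close_to_end = duration_sec - t < min_section_sec
        if too_close_to_previous or too_close_to_end:
            continue  # the short span is absorbed by its neighbor
        kept.append(t)
        previous = t
    return kept

kept = enforce_min_section(
    [12.0, 14.5, 45.0, 88.0, 118.0], duration_sec=120.0, min_section_sec=8.0
)
```

In this example the boundary at 14.5 s is dropped because it follows the previous one by only 2.5 s, and the boundary at 118.0 s is dropped because it would leave a 2 s tail.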

For long-form input, the boundary detector mean-pools its feature sequence when the self-similarity matrix would exceed the native integer index cap. Boundary time values remain in the original audio timeline; on very long files, the diagnostic frame field refers to the pooled analysis grid, so UI code should place markers from time, not from frame.
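
The sketch below illustrates why a pooled frame index is not a safe position for a UI marker. The pooling factor, hop length, and sample rate are illustrative values, and both function names are hypothetical. The source does not state how libsonare chooses the factor.

```python
def mean_pool(frames, factor):
    """Average consecutive groups of `factor` frames into one pooled frame."""
    if factor < 1:
        raise ValueError("factor must be at least 1")
    pooled = []
    for start in range(0, len(frames), factor):
        group = frames[start:start + factor]  # the last group may be shorter
        dims = len(group[0])
        pooled.append([sum(f[d] for f in group) / len(group) for d in range(dims)])
    return pooled

def pooled_frame_to_sec(pooled_index, factor, hop_length, sample_rate):
    """Map a pooled-grid frame index back to seconds in the original audio."""
    return pooled_index * factor * hop_length / sample_rate

pooled = mean_pool([[0.0], [2.0], [4.0], [6.0], [8.0]], factor=2)

# The same frame index lands at very different times depending on the pooling factor.
unpooled_sec = pooled_frame_to_sec(100, factor=1, hop_length=512, sample_rate=16000)
pooled_sec = pooled_frame_to_sec(100, factor=4, hop_length=512, sample_rate=16000)
```

With these assumed values, frame 100 corresponds to 3.2 s on the unpooled grid but 12.8 s on a grid pooled by four, so code that reads the reported time never has to know which grid was used.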

Then labels

After boundaries are found, the implementation classifies each span using several clues:

Clue	What it helps identify
Normalized energy	Whether a span feels like a high-energy or low-energy section
Chroma similarity to other spans	Whether the same harmonic material returns elsewhere
Vocal-likelihood descriptor	Whether the span is likely to contain a lead vocal or vocal-like material

Typical outcomes follow these patterns:

Pattern	Likely label
Repeated, high-energy, vocal-like span	Chorus
Repeated, lower-energy span	Verse
Low-energy first or last span	Intro / outro
Distinctive interior span with low vocal likelihood	Instrumental / bridge

These labels are intentionally heuristic. They are good for orientation, not for declaring a canonical song form.
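
The table can be read as a small rule set. The sketch below is a toy version of it, not libsonare's classifier: the thresholds, the rule order, and the `guess_label` signature are all assumptions chosen for illustration.

```python
def guess_label(index, count, energy, repeats, vocal_likelihood):
    """Toy rule set mirroring the table; thresholds are arbitrary assumptions."""
    high_energy = energy >= 0.6          # energy is assumed normalized to 0..1
    vocal = vocal_likelihood >= 0.5
    interior = 0 < index < count - 1
    if repeats and high_energy and vocal:
        return "Chorus"
    if repeats and not high_energy:
        return "Verse"
    if not high_energy and index == 0:
        return "Intro"
    if not high_energy and index == count - 1:
        return "Outro"
    if interior and not repeats and not vocal:
        return "Instrumental / Bridge"
    return "Unknown"

# Each span: (normalized energy, repeats elsewhere?, vocal likelihood)
spans = [
    (0.2, False, 0.1),
    (0.4, True, 0.7),
    (0.9, True, 0.8),
    (0.7, False, 0.2),
    (0.9, True, 0.8),
    (0.3, False, 0.1),
]
labels = [
    guess_label(i, len(spans), energy, repeats, vocal)
    for i, (energy, repeats, vocal) in enumerate(spans)
]
```

Small changes to the thresholds flip labels near the edges, which is one practical reason to treat the output as orientation rather than a canonical form.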

Why the result is an estimate

Song structure is partly subjective. Two listeners may disagree about the exact start of a chorus, and different genres use different cues. A techno track and a ballad do not announce sections in the same way.

Treat section output as a strong hint for navigation, looping, and visualization. It works best on music with clear repeated sections, and it is weaker on through-composed, ambient, or very gradual material. A boundary being a few seconds off is normal, so automatic results are best used as a starting point for review.

How libsonare computes it

BoundaryDetector combines MFCC and chroma features, L2-normalizes them, mean-pools long inputs when needed, builds a cosine self-similarity matrix, computes a checkerboard novelty curve, and picks boundary peaks.

SectionAnalyzer turns the boundaries into spans and computes RMS energy, chroma descriptors, spectral flatness, and vocal-band energy for each one. It then assigns each span a label (Intro, Verse, Pre-Chorus, Chorus, Bridge, Instrumental, Outro, or Unknown) together with a confidence value.
