Finding Choruses

There are several ideas about what constitutes a chorus. In general we say that a chorus is a contiguous set of lyrical lines that repeat. However, this leaves several details to define:

Length. Though we might initially be tempted to just take the longest sequence of lyrical lines, there are cases where lines preceding or following the chorus also repeat (e.g., All for Leyna)
Repeats. We might also be tempted to just take the sequence of lyrical lines that has the most repeats. But in many cases, particularly where a closing tagline repeats to fade, this solution will fail (e.g., Rocket Man)
Distribution. Seemingly inherent in the definition of a chorus is its role in tying together diverse sections of the song. As such, it seems desirable to select sequences of lines that are more evenly distributed across the song, whether by some sort of coverage metric or by the average or standard deviation in the length of the sequences delimited by the selected chorus.
Substructure. The purpose in our identifying structure is to model patterns of repetition. Thus if the pattern of repetition spans over what would be considered partly the chorus and partly the verse (e.g., Just the Way You Are), it may be desirable to consider the whole block of lines containing the complete substructure as belonging to the chorus.

Attempts to Identify Choruses

Longest Local Alignment

In this approach, we find the longest subsequence of lines that have a (nearly) perfect match (i.e., each line matches a corresponding line in another contiguous set of lines). It will prove helpful to visualize this as a matrix. In this matrix, each column represents a line and each row represents an offset (e.g., column 1 aligns line 1 against all other lines at increasing offsets; column 2 aligns line 2 against all other lines at increasing offsets; …; the last column aligns the second-to-last line against the last line (i.e., offset of 1)). Here's a matrix for “Goodnight Saigon”:

  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1   1   1
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0
  0   0   0   0   0   0
  0   0   0   0   0
  0   0   0   0
  0   0   0
  0   0
  0

Note the three 1's about half way down, half way in (i.e., close to the diagonal). This means there are three lines about half way through the song that perfectly match three lines at an offset roughly equal to half the song length (i.e., a chorus about halfway that is repeated at the end). In this representation, the longest local alignment approach just looks for each instance contiguous 1's and takes the one (or more) that is the longest. Here is the result:

...
: [(11, u'Em'), (15, u'Am'), (28, u'Em'), (35, u'Am')]
: And it was dark So dark at night
: [(12, u'Dm'), (35, u'F')]
: And we held onto each other Like brother to brother
: [(3, u'Bb'), (16, u'F'), (29, u'G')]
: We promised our mothers we'd write
C : [(13, u'F'), (20, u'Bb'), (27, u'C'), (35, u'C7')]
C : And we would all go down together
C : [(13, u'F'), (20, u'Bb'), (27, u'C'), (35, u'C7')]
C : We said we'd all go down together
C : [(14, u'F'), (21, u'Bb'), (28, u'G'), (36, u'F'), (38, u'Dm'), (41, u'C')]
C : Yes we would all go down together
: [(0, u'C'), (9, u'Dm'), (19, u'Dm7'), (27, u'C')]
: Remember Charlie Remember Baker
...
: [(4, u'Bb'), (15, u'Dm7'), (23, u'G')]
: And waited for us to arrive
C : [(13, u'F'), (20, u'Bb'), (27, u'C'), (35, u'C7')]
C : And we would all go down together
C : [(13, u'F'), (20, u'Bb'), (27, u'C'), (35, u'C7')]
C : We said we'd all go down together
C : [(14, u'F'), (21, u'Bb'), (28, u'G'), (36, u'F'), (38, u'Dm'), (41, u'Bb')]
C : Yes we would all go down together
: [(0, u'G'), (3, u'F'), (5, u'Dm'), (8, u'Bb')]
:

It got it right! But quite often there is some variation in the chorus, even something as simple as repeating a phrase for emphasis on the last chorus, for example this song:

…

: [(0, u'A'), (7, u'B')]
: But you were the one
: [(0, u'Am/C'), (16, u'B')]
: You were the one
C : [(0, u'E'), (4, u'E7'), (7, u'G#'), (10, u'Am')]
C : I'm looking for comfort
C : [(9, u'D/F#')]
C : That I can take
C : [(1, u'D'), (3, u'F#7')]
C : >From someone else
C : [(6, u'B7'), (12, u'E')]
C : But after all
C : [(3, u'E7'), (6, u'G#'), (16, u'Am'), (19, u'Am/G')]
C : I know there is no one
: [(7, u'D/F#'), (13, u'D'), (20, u'Em')]
: That can save me from myself
: [(0, u'Am/C'), (10, u'G/D'), (14, u'D'), (18, u'G')]
: You were the only one
: [(0, u'G/B'), (6, u'C'), (10, u'D'), (14, u'G/B'), (20, u'C'), (24, u'A'), (28, u'D')]
:
...
: [(0, u'Dm'), (3, u'F#'), (7, u'D'), (11, u'Em')]
: Personne qui puisse me sauver
: [(2, u'Am/C'), (12, u'G/D'), (19, u'D')]
: Tu etais la seule
C : [(0, u'E'), (4, u'E7'), (7, u'G#'), (16, u'Am')]
C : I'm looking for comfort
C : [(0, u'Am/G'), (12, u'D/F#')]
C : That I can take
C : [(1, u'D'), (3, u'F#7')]
C : >From someone else
C : [(6, u'B7'), (12, u'E')]
C : But after al
C : [(3, u'E7'), (6, u'G#'), (14, u'Am'), (20, u'Am/G')]
C : I know there is no one
: [(4, u'D/F#'), (12, u'D'), (28, u'Em')]
: That can save me, save me from myself
: [(0, u'Am/C'), (10, u'G/D'), (15, u'D'), (19, u'G')]
: You were the only one
: [(17, u'G/B'), (23, u'C'), (27, u'D'), (31, u'G/B'), (37, u'C'), (41, u'A'), (45, u'D')]
:

As you can see, the longest local alignment approach failed to label the last two lines of the chorus because the second to last line repeats “save me” (very effectively I might add). The corresponding line in the matrix looks like this:

  0   0   0   0   0   1   0   0   0   0   0   0   1   1   1   1   1   0   1

Note the second-to-last value, a “0”, reflecting that the second-to-last line in the real chorus didn't find a match. So how to deal with this?

Highest-Scoring Local Alignment

In this approach we consider the accumulative alignment score over a subsequence of lyric lines. This is kind of like typical sequence alignment algorithms that allow finding an optimal score even if its path at times dips. In general, the Smith-Waterman algorithm differs from the Needleman-Wunch algorithm in that you never let the score go below 0 and you start your backtrack from whichever cell (speaking now of the character sequence alignment matrix) has the highest score.

By this line of thinking, the highest-scoring local alignment approach would fill the matrix with accumulative alignment scores (instead of 1's and 0's), where each cell is an accumulation of the pairwise alignment scores. Here is the matrix for “Goodnight Saigon”:

  3   0   3   0   9   0   0   0   0   0   0   0   0   0   0   0   0  19  39  25  11   0   0   0   0   0   0   0   0   0   0  19  37
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   6   0  28  17   3   0   0   0   0   0   0   0   0   2   0   0  26
  0   0   0   4   0   0   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   4   0   0   0   0   0   0
  0   0   0   0   0   0   0   0   7   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
  0   0   0   0   0   0   2   0   0   0   0   0   0   0   4   0   0   0   0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0   0   0   0   0   0   3   0   0   0   0   0   0   0   0   0
  0   0   2   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0  28  12   1
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0  17  38  20
  0   0   0   6   0   0   0   0   0   0   0   0   0   0   0   0   0  34  68 100
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   3   0  19  37
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   6   0  26
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0   0   0   0   4   0   0   0
  0   0   0   0   0   0   0   0   0   0   0   1   0   0
  0   0   0   0   0   0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0   0   0   0
  0   0   0   0   0   0   5   0   0   0
  0   0   0   5   0   0   0   0   0
 11   0  33  13   5   0   0   0
  0   0   0   5   0   0   0
  0   0   2   8   0   0
  0   0   0   0   0
  0   0   0   0
  0   0   0
  0   0
  0

Note that where there were three 1's before there is now a “34 68 100”, where 100 is the highest score in the matrix. To find the subsequence you include every previous line until you find a “0” score. This produces the same (correct) result for “Goodnight Saigon”. But fixes the problem with “C'était Toi”:

…

: But you were the one
: [(0, u'Am/C'), (16, u'B')]
: You were the one
C : [(0, u'E'), (4, u'E7'), (7, u'G#'), (10, u'Am')]
C : I'm looking for comfort
C : [(9, u'D/F#')]
C : That I can take
C : [(1, u'D'), (3, u'F#7')]
C : >From someone else
C : [(6, u'B7'), (12, u'E')]
C : But after all
C : [(3, u'E7'), (6, u'G#'), (16, u'Am'), (19, u'Am/G')]
C : I know there is no one
C : [(7, u'D/F#'), (13, u'D'), (20, u'Em')]
C : That can save me from myself
C : [(0, u'Am/C'), (10, u'G/D'), (14, u'D'), (18, u'G')]
C : You were the only one
: [(0, u'G/B'), (6, u'C'), (10, u'D'), (14, u'G/B'), (20, u'C'), (24, u'A'), (28, u'D')]
:
...
: Personne qui puisse me sauver
: [(2, u'Am/C'), (12, u'G/D'), (19, u'D')]
: Tu etais la seule
C : [(0, u'E'), (4, u'E7'), (7, u'G#'), (16, u'Am')]
C : I'm looking for comfort
C : [(0, u'Am/G'), (12, u'D/F#')]
C : That I can take
C : [(1, u'D'), (3, u'F#7')]
C : >From someone else
C : [(6, u'B7'), (12, u'E')]
C : But after al
C : [(3, u'E7'), (6, u'G#'), (14, u'Am'), (20, u'Am/G')]
C : I know there is no one
C : [(4, u'D/F#'), (12, u'D'), (28, u'Em')]
C : That can save me, save me from myself
C : [(0, u'Am/C'), (10, u'G/D'), (15, u'D'), (19, u'G')]
C : You were the only one
: [(17, u'G/B'), (23, u'C'), (27, u'D'), (31, u'G/B'), (37, u'C'), (41, u'A'), (45, u'D')]
:

Awesome, right? It is awesome. And it works pretty well generally. But so far I've only shown songs where the chorus repeats once.

Substructure Identification

We are primarily concerned about ensuring that patterns of lyrical repetition (i.e., rhymes) are grouped within structures. But can we effectively identify rhymes? We investigated several options for identifying rhymes, ultimately choosing… Using this rhyme detection algorithm we… And the results were…

Preferring Distributed Choruses

I need to think out loud. Here's some options:

Find a bunch of candidate choruses and then score them
Find a metric which avoids finding candidates and instead just gets the highest matching sequence of lines

I like the first because it breaks up the task. Although the second seems more “professional”?

K, well let's assume we go with the first. How do we find candidate choruses? It seems that any line that repeats even once ought to be considered as a candidate chorus. That's easy enough, but we also need to consider sequences of lines (which may include a smaller candidate). I'm trying to avoid brute force.

How about for each line l at position i that has repeats, consider all candidate sequences of lines starting with line l such:

the first line and last line are both repeats
there are no two consecutive non-repeating lines in the sequence

So this would include any where in the matrix that matched, for example

1
1 1
1 0 1
1 0 1 0 1
1 1 1 1

but not

0 1 0
1 0
1 0 0 1

Scoring should favor:

Distributed choruses (as a function of amount of area is “covered” by each instance), as long as it isn't so distributed that the chorus “regions” overlap
Lengthier choruses (in number of lines)
More repetitive choruses (where the line is the fundamental unit of repetition)

What I ended up doing

I find candidate choruses as described above. Then I score them using a linear interpolation of three weighted factors:

a distribution metric (this is essentially how much of the song is “covered” by the choruses if you consider that a chorus covers its constituent lines and up to 2 lines on either side)
the length squared
the number of repetitions (repetitions beyond 4 aren't counted to avoid giving too much weight to a highly repetitive line)

Next thing I need to do is

play around with values for the weights
mark the chorus lines given the chosen chorus
move on to finding verses and bridges!

Model vs Reality

It may not be important that our model recognizes a chorus the same way that a person recognizes the chorus, as long as it ultimately produces a song that looks like a pop song. After all, there is some ambiguity about how humans label what is and what is not a chorus.

mind/chorus_detection.txt · Last modified: 2016/06/06 17:05 by norkish

Back to top

Table of Contents