There are several ideas about what constitutes a chorus. In general we say that a chorus is a contiguous set of lyrical lines that repeat. However, this leaves several details to define:
In this approach, we find the longest subsequence of lines that have a (nearly) perfect match (i.e., each line matches a corresponding line in another contiguous set of lines). It will prove helpful to visualize this as a matrix. In this matrix, each column represents a line and each row represents an offset (e.g., column 1 aligns line 1 against all other lines at increasing offsets; column 2 aligns line 2 against all other lines at increasing offsets; …; the last column aligns the second-to-last line against the last line (i.e., offset of 1)). Here's a matrix for “Goodnight Saigon”:
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Note the three 1's about half way down, half way in (i.e., close to the diagonal). This means there are three lines about half way through the song that perfectly match three lines at an offset roughly equal to half the song length (i.e., a chorus about halfway that is repeated at the end). In this representation, the longest local alignment approach just looks for each instance contiguous 1's and takes the one (or more) that is the longest. Here is the result:
... : [(11, u'Em'), (15, u'Am'), (28, u'Em'), (35, u'Am')] : And it was dark So dark at night : [(12, u'Dm'), (35, u'F')] : And we held onto each other Like brother to brother : [(3, u'Bb'), (16, u'F'), (29, u'G')] : We promised our mothers we'd write C : [(13, u'F'), (20, u'Bb'), (27, u'C'), (35, u'C7')] C : And we would all go down together C : [(13, u'F'), (20, u'Bb'), (27, u'C'), (35, u'C7')] C : We said we'd all go down together C : [(14, u'F'), (21, u'Bb'), (28, u'G'), (36, u'F'), (38, u'Dm'), (41, u'C')] C : Yes we would all go down together : [(0, u'C'), (9, u'Dm'), (19, u'Dm7'), (27, u'C')] : Remember Charlie Remember Baker ... : [(4, u'Bb'), (15, u'Dm7'), (23, u'G')] : And waited for us to arrive C : [(13, u'F'), (20, u'Bb'), (27, u'C'), (35, u'C7')] C : And we would all go down together C : [(13, u'F'), (20, u'Bb'), (27, u'C'), (35, u'C7')] C : We said we'd all go down together C : [(14, u'F'), (21, u'Bb'), (28, u'G'), (36, u'F'), (38, u'Dm'), (41, u'Bb')] C : Yes we would all go down together : [(0, u'G'), (3, u'F'), (5, u'Dm'), (8, u'Bb')] :
It got it right! But quite often there is some variation in the chorus, even something as simple as repeating a phrase for emphasis on the last chorus, for example this song:
…
: [(0, u'A'), (7, u'B')] : But you were the one : [(0, u'Am/C'), (16, u'B')] : You were the one C : [(0, u'E'), (4, u'E7'), (7, u'G#'), (10, u'Am')] C : I'm looking for comfort C : [(9, u'D/F#')] C : That I can take C : [(1, u'D'), (3, u'F#7')] C : >From someone else C : [(6, u'B7'), (12, u'E')] C : But after all C : [(3, u'E7'), (6, u'G#'), (16, u'Am'), (19, u'Am/G')] C : I know there is no one : [(7, u'D/F#'), (13, u'D'), (20, u'Em')] : That can save me from myself : [(0, u'Am/C'), (10, u'G/D'), (14, u'D'), (18, u'G')] : You were the only one : [(0, u'G/B'), (6, u'C'), (10, u'D'), (14, u'G/B'), (20, u'C'), (24, u'A'), (28, u'D')] : ... : [(0, u'Dm'), (3, u'F#'), (7, u'D'), (11, u'Em')] : Personne qui puisse me sauver : [(2, u'Am/C'), (12, u'G/D'), (19, u'D')] : Tu etais la seule C : [(0, u'E'), (4, u'E7'), (7, u'G#'), (16, u'Am')] C : I'm looking for comfort C : [(0, u'Am/G'), (12, u'D/F#')] C : That I can take C : [(1, u'D'), (3, u'F#7')] C : >From someone else C : [(6, u'B7'), (12, u'E')] C : But after al C : [(3, u'E7'), (6, u'G#'), (14, u'Am'), (20, u'Am/G')] C : I know there is no one : [(4, u'D/F#'), (12, u'D'), (28, u'Em')] : That can save me, save me from myself : [(0, u'Am/C'), (10, u'G/D'), (15, u'D'), (19, u'G')] : You were the only one : [(17, u'G/B'), (23, u'C'), (27, u'D'), (31, u'G/B'), (37, u'C'), (41, u'A'), (45, u'D')] :
As you can see, the longest local alignment approach failed to label the last two lines of the chorus because the second to last line repeats “save me” (very effectively I might add). The corresponding line in the matrix looks like this:
0 0 0 0 0 1 0 0 0 0 0 0 1 1 1 1 1 0 1
Note the second-to-last value, a “0”, reflecting that the second-to-last line in the real chorus didn't find a match. So how to deal with this?
In this approach we consider the accumulative alignment score over a subsequence of lyric lines. This is kind of like typical sequence alignment algorithms that allow finding an optimal score even if its path at times dips. In general, the Smith-Waterman algorithm differs from the Needleman-Wunch algorithm in that you never let the score go below 0 and you start your backtrack from whichever cell (speaking now of the character sequence alignment matrix) has the highest score.
By this line of thinking, the highest-scoring local alignment approach would fill the matrix with accumulative alignment scores (instead of 1's and 0's), where each cell is an accumulation of the pairwise alignment scores. Here is the matrix for “Goodnight Saigon”:
3 0 3 0 9 0 0 0 0 0 0 0 0 0 0 0 0 19 39 25 11 0 0 0 0 0 0 0 0 0 0 19 37 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 6 0 28 17 3 0 0 0 0 0 0 0 0 2 0 0 26 0 0 0 4 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 28 12 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 17 38 20 0 0 0 6 0 0 0 0 0 0 0 0 0 0 0 0 0 34 68 100 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 19 37 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 6 0 26 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 0 0 0 0 0 0 5 0 0 0 0 0 11 0 33 13 5 0 0 0 0 0 0 5 0 0 0 0 0 2 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Note that where there were three 1's before there is now a “34 68 100”, where 100 is the highest score in the matrix. To find the subsequence you include every previous line until you find a “0” score. This produces the same (correct) result for “Goodnight Saigon”. But fixes the problem with “C'était Toi”:
…
: But you were the one : [(0, u'Am/C'), (16, u'B')] : You were the one C : [(0, u'E'), (4, u'E7'), (7, u'G#'), (10, u'Am')] C : I'm looking for comfort C : [(9, u'D/F#')] C : That I can take C : [(1, u'D'), (3, u'F#7')] C : >From someone else C : [(6, u'B7'), (12, u'E')] C : But after all C : [(3, u'E7'), (6, u'G#'), (16, u'Am'), (19, u'Am/G')] C : I know there is no one C : [(7, u'D/F#'), (13, u'D'), (20, u'Em')] C : That can save me from myself C : [(0, u'Am/C'), (10, u'G/D'), (14, u'D'), (18, u'G')] C : You were the only one : [(0, u'G/B'), (6, u'C'), (10, u'D'), (14, u'G/B'), (20, u'C'), (24, u'A'), (28, u'D')] : ... : Personne qui puisse me sauver : [(2, u'Am/C'), (12, u'G/D'), (19, u'D')] : Tu etais la seule C : [(0, u'E'), (4, u'E7'), (7, u'G#'), (16, u'Am')] C : I'm looking for comfort C : [(0, u'Am/G'), (12, u'D/F#')] C : That I can take C : [(1, u'D'), (3, u'F#7')] C : >From someone else C : [(6, u'B7'), (12, u'E')] C : But after al C : [(3, u'E7'), (6, u'G#'), (14, u'Am'), (20, u'Am/G')] C : I know there is no one C : [(4, u'D/F#'), (12, u'D'), (28, u'Em')] C : That can save me, save me from myself C : [(0, u'Am/C'), (10, u'G/D'), (15, u'D'), (19, u'G')] C : You were the only one : [(17, u'G/B'), (23, u'C'), (27, u'D'), (31, u'G/B'), (37, u'C'), (41, u'A'), (45, u'D')] :
Awesome, right? It is awesome. And it works pretty well generally. But so far I've only shown songs where the chorus repeats once.
We are primarily concerned about ensuring that patterns of lyrical repetition (i.e., rhymes) are grouped within structures. But can we effectively identify rhymes? We investigated several options for identifying rhymes, ultimately choosing… Using this rhyme detection algorithm we… And the results were…
I need to think out loud. Here's some options:
I like the first because it breaks up the task. Although the second seems more “professional”?
K, well let's assume we go with the first. How do we find candidate choruses? It seems that any line that repeats even once ought to be considered as a candidate chorus. That's easy enough, but we also need to consider sequences of lines (which may include a smaller candidate). I'm trying to avoid brute force.
How about for each line l at position i that has repeats, consider all candidate sequences of lines starting with line l such:
So this would include any where in the matrix that matched, for example
but not
Scoring should favor:
I find candidate choruses as described above. Then I score them using a linear interpolation of three weighted factors:
Next thing I need to do is
It may not be important that our model recognizes a chorus the same way that a person recognizes the chorus, as long as it ultimately produces a song that looks like a pop song. After all, there is some ambiguity about how humans label what is and what is not a chorus.