# Down by the Bay

## Project Overview

Pachet creates rhythmic templates from existing lyrics and then uses Markov models with stress, rhyme, and POS constraints over lyrics to find other suitable replacements. However, this solution is very limited. Take the rhythmic template [100, 1, 1, 10, 1, 1, 1, 01]: although the phrase “innocence of a story i could leave today” satisfies the rhythmic constraints, the phrase “innocence of a story in an alleyway” ([100, 1, 1, 10, 1, 1, 101]) does not, despite being a very suitable lyric. This is one of many examples showing that the word-level rhythmic template is too restrictive. What we would really like is to consider any phrase that matches the syllable-level rhythmic template [100111011101].
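The difference between the two kinds of template can be sketched in a few lines of Python. This is only an illustration: the function names are made up here, and stress patterns are kept as strings so that a leading zero (as in "01") is not lost.

```python
def flatten(word_template):
    """Concatenate per-word stress patterns into one syllable-level string."""
    return "".join(word_template)


def matches_word_level(candidate, template):
    """Word-level match: same word boundaries and same stresses."""
    return list(candidate) == list(template)


def matches_syllable_level(candidate, template):
    """Syllable-level match: same stresses, word boundaries ignored."""
    return flatten(candidate) == flatten(template)


# Word-level templates for the two phrases discussed above.
template = ["100", "1", "1", "10", "1", "1", "1", "01"]
leave_today = ["100", "1", "1", "10", "1", "1", "1", "01"]  # "... i could leave today"
alleyway = ["100", "1", "1", "10", "1", "1", "101"]         # "... in an alleyway"

print(flatten(template))                           # 100111011101
print(matches_word_level(alleyway, template))      # False
print(matches_syllable_level(alleyway, template))  # True
```

Both phrases flatten to the same twelve-syllable pattern, so only the word-level comparison rejects the "alleyway" variant.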

How do we do this? We create a constrained Markov model over syllables. We can still constrain for rhyming (perhaps even more effectively since we're looking at phonemes). We can consider a greater breadth of solutions, but would need to sacrifice the POS template, which would break the Markov assumption in our case.

The order would have to be high enough to ensure that words are formed from the syllables. Syllables would probably also need to be marked with their POS tag and possibly their position in a word, to encourage grammatical cohesion and word cohesion respectively. We could then optionally constrain on certain POS tags at specific syllable positions. The reality is that if we want to maintain the Markov property and have internal rhymes, the order has to be high enough that both rhyming positions fall within the scope of the order.

Take as a simple example “Down by the Bay”. This famous song requires the singer to ad lib words that follow the phrase “Did you ever see…” such that the rhythmic template matches [01010101010] or a derived form of this template where certain syllables are optionally omitted:

[01010101010] ("a llama eating polka dot pajamas")
[01---10-01-] ("a bear . . . combing . his hair .")
[-1---10-01-] (". us . . . riding . the bus .")
[01-10101-1-] ("a moose . with a pair of new . shoes .")

This sequence is at most 11 syllables long. The constraints for this problem would be (using 1-based positions):

1. the syllable at position 1 must either be null or have POS tag DT
2. the syllable at position 2 must have POS tag NN (can't be null)
3. the syllable at position 10 must have POS tag NN, ADJ, ADV (can't be null)
4. the syllables at positions 2 and 10 must rhyme
5. the syllables at positions 3 and 11 must either both be null or must rhyme
6. OPTIONAL: the syllables at positions 6 (and maybe 7) must be non-null
7. every non-null syllable must have the stress indicated by the template at its position

Note that to ensure that the constrained model only produces solutions that meet the rhyme constraint (#4), the Markov order has to be 8 or bigger, since positions 2 and 10 are eight syllables apart (keep in mind this is syllables we're talking about, so that's not too bad).
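As a rough sketch of how such constraints could be checked against a candidate, the Python below tests constraints 1 to 4 and 7 on an 11-slot sequence (constraints 5 and 6 are left out for brevity). Everything here is an assumption made for illustration: the `Syl` class, the `rhyme` key compared by simple equality (a stand-in for a real rhyme score), and the POS tags and rhyme keys assigned to the example syllables. It checks finished sequences only; it is not the constrained Markov model itself.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

TEMPLATE = "01010101010"  # stress expected at positions 1..11


@dataclass(frozen=True)
class Syl:
    """Hypothetical syllable token; field names are illustrative only."""
    text: str
    stress: int  # binarized stress, 0 or 1
    pos: str     # POS tag of the word containing the syllable
    rhyme: str   # assumed rhyme key (e.g. vowel plus coda)


def rhymes(a: Syl, b: Syl) -> bool:
    # Stand-in for a real rhyme function: equal rhyme keys count as a rhyme.
    return a.rhyme == b.rhyme


def satisfies(seq: Sequence[Optional[Syl]]) -> bool:
    """Check constraints 1-4 and 7; None marks a null (omitted) syllable."""
    if len(seq) != len(TEMPLATE):
        return False
    s = [None, *seq]  # shift so that s[1] is position 1
    if s[1] is not None and s[1].pos != "DT":                   # constraint 1
        return False
    if s[2] is None or s[2].pos != "NN":                        # constraint 2
        return False
    if s[10] is None or s[10].pos not in {"NN", "ADJ", "ADV"}:  # constraint 3
        return False
    if not rhymes(s[2], s[10]):                                 # constraint 4
        return False
    # Constraint 7: every non-null syllable carries the template's stress.
    return all(syl is None or syl.stress == int(TEMPLATE[i])
               for i, syl in enumerate(seq))


def min_order(i: int, j: int) -> int:
    """Smallest order whose context at position j still reaches position i."""
    return abs(j - i)


# "a bear . . . combing . his hair ." with made-up tags and rhyme keys.
bear = [Syl("a", 0, "DT", "AH"), Syl("bear", 1, "NN", "EH R"), None, None, None,
        Syl("comb", 1, "VBG", "OW M"), Syl("ing", 0, "VBG", "IH NG"), None,
        Syl("his", 0, "PRP$", "IH Z"), Syl("hair", 1, "NN", "EH R"), None]

print(satisfies(bear))   # True
print(min_order(2, 10))  # 8
```

In an actual constrained model the same predicates would be applied while building the transitions at each position, so that non-conforming sequences are never generated.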

Why do we care? I'm generating a melody without lyrics and I need to add lyrics. Whatever melody (specifically the rhythm of the melody) I generate suggests a syllable-level rhythmic template, not a word-level rhythmic template. I need to be able to generate lyrics according to that rhythm template.

This would work great for shorter lyrics (e.g., poetry, maybe kids' books), but longer songs might require some adaptation. For example, songwriters often choose the words they want to rhyme first, or a set of possible rhymes, and then figure out how to weave them together syntactically and grammatically. The farther apart the rhyming words are, the more likely it seems that you'll find a way to make them fit together, so you could pick a pair of rhyming words, constrain the respective positions for that pair accordingly, and then generate using a Markov model.

Question: is there a way to use parts of speech in the training data to augment the Markov transition model with examples it hasn't seen? For example, if the Markov model has only ever trained on “skin a beaver” (or the equivalent sequence of syllable tokens) then the transition matrix would look like this:

| from \ to | skin | a | beaver |
|---|---|---|---|
| skin | 0.0 | 1.0 | 0.0 |
| a | 0.0 | 0.0 | 1.0 |
| beaver | 0.0 | 0.0 | 0.0 |
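For reference, a matrix like the one above can be estimated by counting adjacent token pairs and normalizing each row. The sketch below is a generic first-order estimate over word tokens, not the project's own training code, and the function name is made up.

```python
from collections import Counter, defaultdict


def transition_matrix(sequences):
    """Estimate first-order transition probabilities from token sequences."""
    counts = defaultdict(Counter)
    vocab = []
    for seq in sequences:
        for tok in seq:
            if tok not in vocab:
                vocab.append(tok)  # keep first-seen order for readable rows
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    matrix = {}
    for prev in vocab:
        total = sum(counts[prev].values())
        # Rows with no outgoing transitions (e.g. a final token) stay all zero.
        matrix[prev] = {nxt: (counts[prev][nxt] / total if total else 0.0)
                        for nxt in vocab}
    return matrix


m = transition_matrix([["skin", "a", "beaver"]])
for row, probs in m.items():
    print(row, probs)
```

With a single training phrase every observed transition has probability 1.0, which is exactly why the model cannot produce anything it has not seen.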

But what if we represented “skin a beaver” as “single_syllable_infinitive_verb single_syllable_determiner 2_syllable_noun_1 2_syllable_noun_2”? Suddenly our model (equipped with a dictionary of suitable words and their POS tags) has more expressive power. Perhaps instead of generating a sequence of syllables, we simply generate a sequence of POS-tagged syllables. We could not deal with rhymes directly this way, but it would increase expressive power.
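A small sketch of this abstraction, under stated assumptions: the `LEXICON` below is a made-up toy dictionary keyed by (POS label, syllable count), and the helper names are hypothetical. It turns a tagged phrase into the abstract syllable tokens described above and then lists the phrases the dictionary could substitute back in.

```python
from itertools import product

# Toy dictionary, invented for illustration: (POS label, syllable count) -> words.
LEXICON = {
    ("infinitive_verb", 1): ["skin", "hug"],
    ("determiner", 1): ["a", "the"],
    ("noun", 2): ["beaver", "llama"],
}


def abstract(tagged_words):
    """Map (label, syllable_count) pairs to abstract syllable tokens."""
    tokens = []
    for label, n in tagged_words:
        if n == 1:
            tokens.append(f"single_syllable_{label}")
        else:
            tokens.extend(f"{n}_syllable_{label}_{i}" for i in range(1, n + 1))
    return tokens


def realizations(tagged_words):
    """All phrases obtained by filling each slot from the toy lexicon."""
    choices = [LEXICON[(label, n)] for label, n in tagged_words]
    return [" ".join(words) for words in product(*choices)]


# "skin a beaver": 1-syllable verb, 1-syllable determiner, 2-syllable noun.
pattern = [("infinitive_verb", 1), ("determiner", 1), ("noun", 2)]
print(abstract(pattern))
print(realizations(pattern))
```

Even with two words per slot, one training phrase now licenses eight surface phrases. The price, as noted, is that rhyme can no longer be constrained on the abstract tokens directly.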

There are connections to prosody here. Given a melody (think, for example, of the first half of the whistling part of “Don't Worry, Be Happy”) that suggests a rhythmic template [1111011010101] and a phrase like “Ring around the rosies a pocketful of rye” that has rhythmic template [101011010101], how do you match this phrase to this melody? Here are two solutions (note again how this is an alignment problem):

[1111011010101]
[1-01011010101]
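One way to make the alignment framing concrete is a brute-force search over where to leave melody notes without a syllable. The sketch below is an assumption-laden illustration, not the project's method: it treats a gap ("-") as free and charges one unit for every syllable whose stress differs from the melody's at that position.

```python
from itertools import combinations


def align(melody, phrase):
    """Return (lowest cost, all alignments of phrase to melody with that cost)."""
    gaps = len(melody) - len(phrase)
    if gaps < 0:
        raise ValueError("phrase has more syllables than the melody has notes")
    best_cost, best = None, []
    for gap_positions in combinations(range(len(melody)), gaps):
        gap_set = set(gap_positions)
        syllables = iter(phrase)
        # Place "-" at gap positions and consume phrase stresses elsewhere.
        aligned = "".join("-" if i in gap_set else next(syllables)
                          for i in range(len(melody)))
        # Cost: number of syllables whose stress disagrees with the melody.
        cost = sum(1 for m, p in zip(melody, aligned) if p != "-" and p != m)
        if best_cost is None or cost < best_cost:
            best_cost, best = cost, [aligned]
        elif cost == best_cost:
            best.append(aligned)
    return best_cost, best


cost, alignments = align("1111011010101", "101011010101")
print(cost)
for a in alignments:
    print(a)
```

Under this cost, the alignment [1-01011010101] is among the lowest-cost results, with a single stress mismatch. Enumerating gap positions is fine for one or two gaps; a dynamic-programming alignment would be the natural replacement for longer melodies.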

## To Dos

### To Do

• Figure out whether parallelization can be improved somehow (is there synchronization hiding there somewhere?)
• Figure out a good threshold for the memory cap for FSL
• Try running on smaller node on FSL
• Improve rhyme function
• Weaken stress constraints?
• Genetic algorithm to find optimal weights for rhyme function using linguistic attributes - Ben
• Get some good examples from our system
• Don't split on all punctuation, make sure that syllables are generalizable beyond punctuation
• No past tense verbs, possessive adjectives
• Allow preterites at final position
• Allow more complex constraints
• Constraints typical of noun phrases
• Try the NOW corpus

### Future To Do

• Find a way to assign stress for multi-syllable G2P pronunciations
• Allow syllable tokens to have multiple POSes for contraction words and combine Stanford POS tokens for contraction words

### Done

• Finish draft of paper - Paul
• Get RhymeZone API working - Ben
• Try haiku, limerick (either abstract or specify rhymes ahead of time), kids book
• Roses are red, Violets are blue (use half-generated examples)
• Try Hirjee with alignment to improve rhyming - Paul
• Maybe aim for a simple constraint set initially for proof of concept, then discuss how to allow multiple possibilities without actually implementing
• Set up twitterbot - Ben
• Normalize probabilities for sentences with multiple pronunciations
• Discovered that I couldn't fully normalize probabilities correctly without framing sequence in terms of priors and transitions (vs just transitions) because how do you normalize initial probabilities without a prior? - Paul
• Optimize NHMM to A) not copy the whole transition matrix, but rather just copy viable transitions given position (speed up: none) and B) apply constraints when adding transitions at position rather than at the end (speed up: roughly 50x).
• Train on every possible pronunciation of a sentence
• Add G2P functionality for words without pronunciations
• incorporate rhymeScore function into binary rhyme constraint class - Ben
• pre-process corpus as sequences of syllableTokens (binarize stress to [0,1]) - Paul
• split on sentence ending punctuation
• Ignore sentences with '@'
• Ignore tokens starting with '##'
• Ignore <p> tokens (are there other meta characters?)
• Reattach contractions, etc.
• Start smallish
• implement a constrained Markov model - Paul
• implement double rhymeScore(syl1, syl2) - Ben
• implement a variable-order Markov model - Paul
• Clean up wiki - Paul
• Replace Pos Enum and Phoneme classes - Ben
• implement variable-order Markov model
• Create GIT repo and email repo address - Paul
• Add constraints in main class - Paul