Lyrist To-Dos

Research

Look into this: http://wordnet.princeton.edu/
Look for existing rhyming APIs, like RhymeZone or datamuse.com/api
Do more research on Stanford CoreNLP, nltk, and Parsey McParseface
Learn how to do vector arithmetic, learn what cosine distances are. Khan Academy.
Research all my paper’s sources more in-depth, find further applications
Stay up-to-date with NLP and NLG news, browse Google Scholar for ideas
Hierarchical Neural Network Model? https://pdfs.semanticscholar.org/17f5/c7411eeeeedf25b0db99a9130aa353aee4ba.pdf
Read: word2vec parameter learning explained https://arxiv.org/abs/1411.2738
Read http://multithreaded.stitchfix.com/blog/2015/03/11/word-is-worth-a-thousand-vectors/
Read http://hen-drik.de/msc_thesis/sci_2015_heuer_hendrik.pdf
Read https://districtdatalabs.silvrback.com/modern-methods-for-sentiment-analysis
LSTM (long short term memory)?
Eventually learn about matrices, they seem useful too
Chinese poetry generation https://arxiv.org/pdf/1604.01537v1.pdf
Come up with second study idea for evaluating poem scores and people’s preferences
Wevi: https://ronxin.github.io/wevi/ https://github.com/ronxin/wevi
Frank Liang → syllabification w/out phonemes
Go back and study my lab from CS 236, including rules.
Study and understand casting, superclasses, subclasses, and inheritance better
Understand Cloneable in Java, find out whether it’s okay or deprecated
Study Stanford classes I use http://www-nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/tagger/maxent/MaxentTagger.html
Look up RhymeZone’s API http://www.rhymezone.com/
Look at generators on this list http://www.yoyogames.com/blog/119
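
For the cosine distance item above, a minimal sketch of the arithmetic. VectorMath and its method names are made up for illustration and are not part of Lyrist. Cosine similarity is the dot product divided by the product of the two vector lengths, and cosine distance is taken here as 1 minus that.

```java
// Sketch only: VectorMath is a hypothetical helper, not an existing Lyrist class.
final class VectorMath {
    private VectorMath() {}

    /** Dot product of two equal-length vectors. */
    static double dot(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    /** Euclidean length of a vector. */
    static double norm(double[] a) {
        return Math.sqrt(dot(a, a));
    }

    /** Cosine of the angle between a and b, in [-1, 1]. */
    static double cosineSimilarity(double[] a, double[] b) {
        double denom = norm(a) * norm(b);
        if (denom == 0.0) {
            throw new IllegalArgumentException("cosine is undefined for a zero vector");
        }
        return dot(a, b) / denom;
    }

    /** Cosine distance, defined here as 1 - cosine similarity. */
    static double cosineDistance(double[] a, double[] b) {
        return 1.0 - cosineSimilarity(a, b);
    }
}
```

Two vectors pointing the same way get distance 0 no matter how long they are, and perpendicular vectors get distance 1.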

Design

Decide on really good naming conventions, programming-wise, linguistics-wise, and music-wise.
Frequency means occurrences / total # of words
Should methods return a result, or something more specific?
> Decide the relationship between filters and jobs, look back at CS 340 and decide which design patterns to use
State pattern for operation-filter process on songs? Draw out possible diagrams.
Command pattern for filters and jobs?
> Use Paul’s classes for inspiration
Decide how to syllabify
Figure out exceptions, errors.
> What to do in a case where filters block all replacement words? NoReplacementException?
Keep punctuation somehow
Retain grammar across lines
Use priority queue for Word2Vec operations?
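
One possible shape for the NoReplacementException idea above, as a sketch only. ReplacementPicker, pick, and the Predicate-based filters are hypothetical stand-ins, not the real filter classes. The exception is unchecked here just to keep the sketch short; checked vs. unchecked is still an open decision.

```java
import java.util.List;
import java.util.function.Predicate;

// Only the name NoReplacementException comes from the note above; the rest is hypothetical.
class NoReplacementException extends RuntimeException {
    NoReplacementException(String original) {
        super("Every candidate replacement for \"" + original + "\" was filtered out");
    }
}

final class ReplacementPicker {
    private ReplacementPicker() {}

    /** Returns the first candidate that passes every filter, or throws if none does. */
    static String pick(String original, List<String> candidates, List<Predicate<String>> filters) {
        for (String candidate : candidates) {
            boolean passes = true;
            for (Predicate<String> filter : filters) {
                if (!filter.test(candidate)) {
                    passes = false;
                    break;
                }
            }
            if (passes) {
                return candidate;
            }
        }
        throw new NoReplacementException(original);
    }
}
```

A caller could catch the exception and keep the original word, which would be one way to answer the question of what to do when filters block everything.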

Eventually think of a way to make currently relevant lyrics by using day-of data from the web. Bot compositions are great, but the day after election day they seemed out of it and unconnected to current human emotion.

Decide whether to make an optional recursive option sort of like a bigger stanza that can hold lists of stanzas
Decide whether protected or private is better when dealing with big class hierarchies
Decide good enum practice, look at Paul’s examples
Think of a way for Lyrist to theme certain lines and song sections differently

& While designing, imagine the project at a much larger scale. Does the current design make sense?

Design time tracker by method
Decide how to accept user parameters w/ command line args or an outside file w/ chosen settings.
Decide if I should use Doc2Vec or Phrase2Vec or my own Java methods.
Decide how to deal with punctuation, capitalization. Have lower-case mode and normal mode. Have no punctuation mode and normal punctuation mode.
Decide scope and access rules, like should a PosTagger accept a song, or an arraylist of arraylists of strings?
Look into Pop* lyric stuff, appropriate wherever possible
Come up with a good plan for proper nouns (named entity recognition)
Decide which POS tagger is best: Stanford CoreNLP, nltk, or Parsey McParseface
Come up with lots of new W2v job ideas
Opposite finder: spatially? By analogy (white is to black as word is to opposite)?
> Come up with word data usage ideas
Occurrences / total words = frequency
> Come up with lots of new filter ideas
Figure out whether it’s best in practice to use marked words, and if so, whether there should be multiple types of word markings.
Figure out how to integrate n-grams to build actual new sentences (modify templates before replacements?).
Figure out how to use rules like my rap research to score new sentences.
Figure out measures of song aesthetic like the 4 in my poem research: appropriateness, flamboyance, lyricism, relevancy.
Figure out how to use comparison, analogy, metaphor, and simile like in my poem research.
Figure out what interesting features could come about by Lyrist drawing content from the Internet.
Figure out how to deal with word stresses and meter
Think of ways to generate new themes
Decide how the high-level lyric replacement jobs are decided (all mark then replace by analogy?)
Ponder ways to make the program faster. Time every method in the program and look at the proportions.
Decide whether and where to use serialization
Find out how to add a corpus to an already-existing model (Gensim or DL4J lets you do that), also how to weight it.
Figure out how to optimally categorize data for filters use.
Brainstorm in advance parameter sets for model training on my corpora (see above, word2vec parameter learning explained). Try a few different sets of parameters and compare the results.
Decide if it’s okay to use only 1 model or if I should have multiple
If I ever release any source code, come up with ideas for crazy cool comments I can put in it. Poetry, inspirational quotes, generated pieces, art w/ characters, scriptures, images, codes to decrypt, links to interesting external points. Don’t make it creepy; make it artistic and inspirational.
Design a brain
> Design my intention base. Based on a general song sentiment, learn from a class of emotional progressions and create my own 500-dimensional new emotional progression to follow.
> Design a way for Lyrist to choose its own filters, its own templates, etc
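
For the opposite-by-analogy idea above (white is to black as word is to opposite), a minimal sketch of the usual word2vec-style arithmetic: target = black - white + word, then take the nearest remaining word by cosine similarity. AnalogyFinder and the plain Map of vectors are assumptions for illustration, not the actual W2v classes.

```java
import java.util.Map;

// Sketch only: AnalogyFinder is hypothetical; a real model would not be a plain Map.
final class AnalogyFinder {
    private AnalogyFinder() {}

    /** Answers "a is to b as c is to ?" by searching near vec(b) - vec(a) + vec(c). */
    static String analogy(Map<String, double[]> vectors, String a, String b, String c) {
        double[] va = vectors.get(a);
        double[] vb = vectors.get(b);
        double[] vc = vectors.get(c);
        if (va == null || vb == null || vc == null) {
            throw new IllegalArgumentException("unknown word in analogy");
        }
        double[] target = new double[va.length];
        for (int i = 0; i < target.length; i++) {
            target[i] = vb[i] - va[i] + vc[i];
        }
        String best = null;
        double bestSim = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, double[]> entry : vectors.entrySet()) {
            String word = entry.getKey();
            // Skip the three input words so they cannot be returned as the answer.
            if (word.equals(a) || word.equals(b) || word.equals(c)) {
                continue;
            }
            double sim = cosine(target, entry.getValue());
            if (sim > bestSim) {
                bestSim = sim;
                best = word;
            }
        }
        return best; // null if no other word is available
    }

    /** Cosine similarity; zero vectors score lowest so they never win. */
    static double cosine(double[] x, double[] y) {
        double dot = 0.0, nx = 0.0, ny = 0.0;
        for (int i = 0; i < x.length; i++) {
            dot += x[i] * y[i];
            nx += x[i] * x[i];
            ny += y[i] * y[i];
        }
        if (nx == 0.0 || ny == 0.0) {
            return Double.NEGATIVE_INFINITY;
        }
        return dot / (Math.sqrt(nx) * Math.sqrt(ny));
    }
}
```

This scans every word once per query. Keeping the top few results instead of just the best one is where the priority queue idea could come in.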

Implementation

Change Sentiment to Theme. Sentiment now means something more specific.
Add Google n-gram frequency filter, allowing year input and threshold input
Stop variations of “to be” from being replaced
Look up verbs in dictionary, then tag them with transitive, intransitive, or ambitransitive
Make complete Stanford pipeline parse complete song before any alterations. Use these part of speech tags and named entity tags. Have templateReader read the same text and put its tagged Word objects into its own objects (Song, Stanzas, Lines).
Get Java W2v operations to read in bytes correctly
> Then train newer, better, word2vec models
>> Then ensure my word2vec operations for analogy and sentiment are correct
> Then experiment with multiplication and division
» use a serialized Stanford NLP pipeline object
Figure out how the C word2vec script avoids a triple nested for loop.

Outline a complete contract for Lyrist: where errors occur, required input, guaranteed output, etc.
Simplify any object that there are thousands of instances of (W2vSuggestion, W2vWordSuggestion, Word, the Stanford tagging process)

Make an enum for each filter type, use this for FiltrationCommander instead.
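
A rough sketch of the filter enum, reading "an enum for each filter type" as one enum constant per filter. The constant names and the choice of which ones count as functional are placeholders, not the real filter list.

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.Set;

// Hypothetical constants; the real set of filters is still to be decided.
enum FilterType {
    DICTIONARY,
    RHYME,
    NGRAM_FREQUENCY,
    PART_OF_SPEECH
}

final class FunctionalFilters {
    private FunctionalFilters() {}

    /** Filters that currently work; membership here is made up for the example. */
    static final Set<FilterType> ACTIVE = Collections.unmodifiableSet(
            EnumSet.of(FilterType.DICTIONARY, FilterType.PART_OF_SPEECH));

    static boolean isFunctional(FilterType type) {
        return ACTIVE.contains(type);
    }
}
```

An EnumSet keeps the list of working filters in one place, so a commander class could refuse to run anything that is not in it.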

Change multiple spaces to tabs in IntelliJ
Have list of filter enums that are the currently functional filters.
Make a W2V job only return the resulting song, then have filters and other jobs run on it elsewhere

 > Recognize compound nouns, replace them with nouns or compound nouns > use coreferences to get gender right

Build point system
Give points to words / sentences for: correct POS, similar POS, correct NER, a good rhyme, a mediocre rhyme, sticking to a meter, sticking to a grammatical structure 
> Add ability to preserve punctuation or not
Scan POS as best as possible
> Scan w2v suggestions within their context
> Use more advanced POS, all the categories mom taught me
Recognize named entities
Add a good dictionary filter
Scan for phonemes accurately
Scan for syllables accurately
Scan for stresses accurately
Add a good Rhyme filter
Turn off blinking cursor
Separate Rhyme-complete from Lyrist’s RhymeFilter

Build a class or software that interfaces w/ Twitter API
Build my own scraper for a giant lyrical database. Store lyrics by artist / group, genre, date written / published, structure
Implement structures using superclass to make an object of a subclass like this:
List<String> names = new ArrayList<String>();
> Make a bunch of useful exceptions and organize them

@> Allow W2vModel to be Serializable. This may speed up the model loading time.
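
A minimal sketch of what a Serializable model could look like, using standard Java object streams. W2vModelSketch and its fields are stand-ins, since the real W2vModel layout is not written down here, and whether this actually loads faster than re-reading the original model file is untested.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.HashMap;

// Stand-in for W2vModel; the field layout is an assumption.
class W2vModelSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    private final HashMap<String, float[]> vectors = new HashMap<>();

    void put(String word, float[] vector) {
        vectors.put(word, vector.clone());
    }

    float[] get(String word) {
        float[] v = vectors.get(word);
        return v == null ? null : v.clone();
    }

    /** Serializes the whole model to bytes (a file stream would work the same way). */
    byte[] toBytes() {
        try (ByteArrayOutputStream buffer = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(this);
            out.flush();
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Rebuilds a model from bytes produced by toBytes(). */
    static W2vModelSketch fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (W2vModelSketch) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Swapping the byte array streams for FileOutputStream and FileInputStream would give the on-disk version. Timing both loading paths on a real model is the only way to know if it helps.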

Get rid of the Stanford logging, I think it slows me down a bit when there’s one print for
Eventually set up test classes mirroring all my current classes.
Eventually make my git repositories private, especially before any publication.
Change the annoying default comment when I make a new class
Implement time tracker that tells the name of the method and the time it took
Build functionality for the mean of a Job, and the hierarchy:
1 job = x input words, 1 output word
> By trial and error get my software to handle large datasets like Google News.
Try building huge models on the supercomputer.
Test effects of different arithmetic operations on word vectors.
word * pi
word * e
word * constants > 1 (intensifies word? simply ruins its meaning?)
word * constants < 1 (weakens word? simply ruins its meaning?)
word * word (finds word related to both? gives unrelated/useless result?)
sqrt(word)
log(word)
standard deviation(words) = range of ideas??
median(words)
average(words) = sentiment?
> Eventually do test classes. Check my line coverage and eliminate functions I don’t need.
Eventually start writing my crazy cool text to be inserted into Lyrist as comments.
Eventually do documentation for Lyrist (Javadoc? Comments?).
Eventually change i++ to ++i everywhere, Dr. Rodham once said it’s always the same speed or faster.
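
For the word vector arithmetic tests listed above, a small sketch of three of the operations, treating "word * word" as element-wise multiplication (an assumption). WordVectorOps is a hypothetical helper. One thing that is certain in advance: multiplying a vector by any positive constant (pi, e, anything > 0) does not change its direction, so its cosine similarity to every other vector stays the same.

```java
import java.util.List;

// Sketch only: WordVectorOps is a hypothetical helper, not an existing Lyrist class.
final class WordVectorOps {
    private WordVectorOps() {}

    /** word * constant */
    static double[] scale(double[] v, double k) {
        double[] out = new double[v.length];
        for (int i = 0; i < v.length; i++) {
            out[i] = v[i] * k;
        }
        return out;
    }

    /** word * word, taken here as element-wise multiplication. */
    static double[] multiply(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double[] out = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = a[i] * b[i];
        }
        return out;
    }

    /** average(words): component-wise mean of a non-empty list of vectors. */
    static double[] mean(List<double[]> vectors) {
        if (vectors.isEmpty()) {
            throw new IllegalArgumentException("need at least one vector");
        }
        double[] out = new double[vectors.get(0).length];
        for (double[] v : vectors) {
            if (v.length != out.length) {
                throw new IllegalArgumentException("vectors must have the same length");
            }
            for (int i = 0; i < out.length; i++) {
                out[i] += v[i];
            }
        }
        for (int i = 0; i < out.length; i++) {
            out[i] /= vectors.size();
        }
        return out;
    }

    /** Cosine similarity, used to compare a vector before and after an operation. */
    static double cosine(double[] a, double[] b) {
        double dot = 0.0, na = 0.0, nb = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }
}
```

So with cosine-based nearest neighbors, the word * pi, word * e, and word * constant experiments should return the same neighbors as the original word. The element-wise product, sqrt, and log are the ones that can actually change direction.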

mind/lyristtodo.txt · Last modified: 2016/12/19 00:57 by bayb2