1/14/17 Researching three data collection methods. The first is preferable, but depends on whether I can find software for it (or build it, if not too difficult):

  • Method 1: Generate a spectrogram from a MusicXML or MIDI song and compare it to a spectrogram of a musician’s performance of that song. Align the two spectrograms (edit distance algorithm?), then note the differences at each point in tone length, pitch, and volume. (Rough sketch below.)
  • Method 2: Seems more difficult but potentially possible – instead of aligning spectrogram to spectrogram, align the audio to its MusicXML score. Reading the paper “Real-Time Audio-to-Score Alignment of Music Performances Containing Errors and Arbitrary Repeats and Skips” for this method.
  • Method 3 (if the above two are too difficult): Take a MusicXML song, strip all musical expression, and compare it to the original.

Other data analyzed from the MusicXML, MIDI, or user-supplied input will include the time signature.
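A rough sketch of what method 1 might look like, assuming librosa and dynamic time warping (a close cousin of the edit-distance idea above). Nothing here is committed to; the file names and feature choice are purely illustrative:

  # Hypothetical method-1 sketch: align a synthesized rendition of the score
  # to a real performance by dynamic time warping over chroma features.
  import librosa

  ref_audio, sr = librosa.load("score_rendition.wav")     # synthesized from MusicXML/MIDI
  perf_audio, _ = librosa.load("performance.wav", sr=sr)  # the musician's recording

  ref_chroma = librosa.feature.chroma_cqt(y=ref_audio, sr=sr)
  perf_chroma = librosa.feature.chroma_cqt(y=perf_audio, sr=sr)

  # wp is the warping path: pairs of (ref_frame, perf_frame) indices. Timing
  # drift along the path would hint at tone-length changes; comparing the
  # aligned frames could expose pitch and volume differences.
  D, wp = librosa.sequence.dtw(X=ref_chroma, Y=perf_chroma)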

1/16/17 Potential for method #2: Antescofo http://forumnet.ircam.fr/product/antescofo-en/

1/17/17 Meeting notes: Annotated Beatles. Find good data, otherwise you’re toast.

Dr. Ventura has not seen anything like methods 1 and 2 in research. They hinge on whether I can find good data (or if not, whether I can create good data, in which case my objective would change).

1/18/17 Decided that Antescofo and methods 1 & 2 are too difficult. Most research and software here is not well-supported, and the data is hard to obtain. I will stay with method 3 for now, as there is a large enough corpus in MusicXML format (where ornamentations are supported). This assumes that the composer’s additions (in contrast to the performer’s) are sufficient – but an interesting project would be comparing how composers add ornamentation vs. how performers interpret it. Potentially, I can hand-annotate music based on live performances, but I plan to do this after analyzing enough data with method 3.

1/21/17 Downloaded music21, a MusicXML corpus and library in Python. Installed Python and tested that everything worked. Created a WinForms app in C#. I plan to use IronPython to call Python scripts for parsing.

1/23/17 Windows 10 crashed and ended up reinstalling everything.

1/24/17 Starting to create the part of the program responsible for parsing the music21 corpus and putting it in a database, in a format that is easier to analyze. Creating definitions for parts of music. Got IronPython working. Looked at Paul’s code (in Java) and decided it is not focused on my area of research.

1/25/17 Understanding more about Music21, Python and IronPython.

1/29/17 Tried to get IronPython working with the music21 module. Individually they work, but together they do not: I get exceptions when trying to import modules. Cannot figure it out. Opened a Stack Overflow question.

1/31/17 No answers to the Stack Overflow question and no luck fixing it myself. Tried cx_Freeze to compile it to an exe, but that didn’t work either. Decided to switch to pure Python. I might not know much about the language, but hopefully I can get something done.

2/1/17 Porting the definitions created in C# to Python. Learning more about music21.

2/4/17 Installed TensorFlow and ensured it worked with music21 (had to reinstall Python for x64). Researched the music21 format; I was able to get all notes from a score, and also the score metadata.
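The kind of lookup I mean, as a minimal music21 sketch (the corpus path is just a stock example, not one of my scores):

  from music21 import corpus

  score = corpus.parse('bach/bwv66.6')
  print(score.metadata.title, '-', score.metadata.composer)

  # .flat collapses the part/measure hierarchy so every note is reachable
  # (newer music21 versions prefer .flatten()).
  for n in score.flat.notes:
      print(n.pitches, n.quarterLength)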

2/5/17 Continued working on the parser. Able to get slurs and loudness dynamics (forte, piano, etc.). Also separated the song into parts.

2/6/17 Able to get music keys and time signatures.
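A minimal music21 sketch of the lookups from this entry and the last (stock corpus example again):

  from music21 import corpus, spanner, dynamics, meter, key

  score = corpus.parse('bach/bwv66.6')

  for part in score.parts:  # one stream per voice/instrument
      slurs = list(part.recurse().getElementsByClass(spanner.Slur))
      dyns = list(part.recurse().getElementsByClass(dynamics.Dynamic))
      keys = list(part.recurse().getElementsByClass(key.KeySignature))
      times = list(part.recurse().getElementsByClass(meter.TimeSignature))
      print(part.partName, len(slurs), [d.value for d in dyns],
            [t.ratioString for t in times], keys)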

2/7/17 Parsing literals, expressions, etc. Basically everything else related to the definitions is done.

2/9/17 Started to fix up missing composers in scores – works on Bach (will work on others). Trying different scores to fix parsing issues.

2/12/17 Studying TensorFlow options. Also realized the parser needs more work, especially metadata like whether a piece is choral or instrumental.

2/13/17 Considering different models for prediction. Markov chains seem the most appropriate, although I think they will need memory. An ordinary Markov chain has no memory, and because things like ornamentation are usually sparse, the model must be kept from generating one constantly. Researching additive Markov chains. (Toy sketch of the memory idea below.)
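To make the memory point concrete, here is a toy order-k chain over hypothetical ornament tokens (none of this is from my actual data):

  import random
  from collections import defaultdict

  def train_order_k(tokens, k=3):
      """Count transitions conditioned on the previous k tokens."""
      counts = defaultdict(lambda: defaultdict(int))
      for i in range(len(tokens) - k):
          counts[tuple(tokens[i:i + k])][tokens[i + k]] += 1
      return counts

  def sample_next(counts, context):
      dist = counts.get(tuple(context))
      if not dist:
          return 'none'  # back off to "no ornament" for unseen contexts
      choices, weights = zip(*dist.items())
      return random.choices(choices, weights=weights)[0]

  # Mostly plain notes with one sparse ornament:
  seq = ['none'] * 10 + ['trill'] + ['none'] * 10
  model = train_order_k(seq, k=3)
  print(sample_next(model, ['none', 'none', 'none']))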

2/14/17 Talked to a mentor and decided to consider an LSTM (Long short-term memory) RNN. Researching TensorFlow for how to do it.

2/15/17 More research into TensorFlow. Getting the sample MNIST data set to work.

2/16/17 Researching RNNs and LSTMs. Looking for an example geared toward more sequential data. (Minimal sketch of the basic setup below.)
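What the basic setup seems to look like, as a TF 1.x-era sketch (module paths moved around between early TensorFlow releases, and the dimensions here are placeholders, not my real feature sizes):

  import tensorflow as tf

  FEATURE_DIM = 16  # placeholder, not my real feature width
  NUM_UNITS = 64    # placeholder LSTM size

  # [batch, time, features]; None lets batch and sequence length vary.
  inputs = tf.placeholder(tf.float32, [None, None, FEATURE_DIM])

  cell = tf.contrib.rnn.BasicLSTMCell(NUM_UNITS)
  outputs, final_state = tf.nn.dynamic_rnn(cell, inputs, dtype=tf.float32)

  # `outputs` has one NUM_UNITS-wide vector per time step, which a dense
  # layer could map to per-note expression predictions.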

2/17/17 Read an RNN example for text generation. The generated text is usually grammatically correct, but doesn't make sense. Worried the same might happen in my project. A few things I could try to reduce the error:

  • Evaluate scores generated from the algorithm, and manually penalize bad expression generation.

2/19/17 Realized I needed to get some basics down, so I did more research on TensorFlow.

2/20/17 Did more research on TensorFlow, specifically how it handles LSTMs. Planning a way to represent the sequence of notes/rests (“literals”) as a feature. Going to try a batch size of 1 measure for now.

2/23/17 Discussed the possible features at the meeting. All seem to be nominal, which can be a problem when they have different numbers of possible classes. It was recommended that I “binarize” the inputs to some set bit size. (Sketch of my reading of that below.)
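My reading of the “binarize” suggestion, as a sketch (the bit width here is arbitrary):

  BIT_SIZE = 5  # must satisfy 2**BIT_SIZE >= the largest class count

  def binarize(class_index, bits=BIT_SIZE):
      """Encode a nominal class index as a fixed-width bit vector."""
      return [(class_index >> i) & 1 for i in range(bits)]

  print(binarize(0))  # [0, 0, 0, 0, 0]
  print(binarize(6))  # [0, 1, 1, 0, 0]  (little-endian bits of 6)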

2/24/17 Implementing the algorithm for my own data. Testing with just one part of the song, on a single note of each chord, training only on relative note length.

2/26/17 Successfully trained on the above simple task. Bug: I get an error at epoch 100.

2/28/17 Discovered that a variable-size input would not work well if I want to train on multiple songs. Considering a fixed size and just labeling the literals with their relationship to the measure. (Hypothetical padding scheme below.)
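One hypothetical shape this could take (MAX_LEN and the feature layout are stand-ins, not decisions):

  MAX_LEN = 16      # fixed number of rows per measure
  PAD = [0.0, 0.0]  # filler row: (relative_length, position_in_measure)

  def encode_measure(literals, measure_length):
      """literals: (length, offset) pairs, in quarter notes."""
      rows = [[length / measure_length, offset / measure_length]
              for (length, offset) in literals]
      return rows + [PAD] * (MAX_LEN - len(rows))

  # Two quarter notes then a half note in a 4/4 measure:
  print(encode_measure([(1, 0), (1, 1), (2, 2)], measure_length=4))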

3/2/17 Successfully trained on multiple scores, some with different numbers of parts. The error function was very simple, though. My short-term goal is to improve the error function and also try it on more combinations of scores to make sure I didn't forget anything. Then I will implement the actual output part.

3/6/17 More formatting. Getting output writing to work is a higher priority now, so working on that.

3/7/17 Was able to get it to write, but it has just learned to never write any expressions. Need to update the error function or train on more data.

3/11/17 I was able to get semi-decent results. My objective algorithm is as follows:

  • Let diffs = predicted_outputs - actual_outputs
  • Let minV = abs(min(diffs))
  • Let shifted = diffs + minV + epsilon, for some small epsilon > 0 (so every value is positive before taking the log)
  • Let absLogged = abs(ln(shifted))
  • Let squared = absLogged^2
  • Let error = sum(squared) + minV * k, for some constant k

The intention for this function was that false negatives (i.e. predicting “normal” when there should have been an expression) would get penalized more than false positives, without distorting the relative distance from the ideal. (A NumPy sketch of it is below.) A score was produced on a Bach piece, “40.6” (actual name unknown), trained on about 20 examples for 40 epochs. It put staccatos everywhere and the occasional fermata, but notably placed an accent on the half notes of the downbeat. This could be promising.
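The objective above as a NumPy sketch (epsilon and k were never pinned down in this log, so the values here are stand-ins):

  import numpy as np

  EPSILON = 1e-3  # the "small value"; actual value not recorded
  K = 0.5         # the constant k; likewise a stand-in

  def expression_error_v1(predicted, actual):
      diffs = predicted - actual
      min_v = abs(diffs.min())
      shifted = diffs + min_v + EPSILON  # keep everything > 0 for the log
      abs_logged = np.abs(np.log(shifted))
      return np.sum(abs_logged ** 2) + min_v * K

  print(expression_error_v1(np.array([0.2, 0.9]), np.array([0.0, 1.0])))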

3/14/17 Made a few changes:

  • Replaced pitch with relative pitch (relative to the key signature) as a feature
  • Eliminated grace notes as features (for now)
  • Gave “filler” notes the same value as real notes (for example, accidental = None rather than “undefined”)
  • Changed the previous log-based algorithm to the following:
    • Let diffs = predicted_outputs - actual_outputs
    • Let error = mean(diffs^2 * (sign(diffs) + A_COEF)^2),
      • where -1 < A_COEF < 1 (currently 0.1) and sign(x) outputs 1 if x > 0, 0 if x = 0, and -1 if x < 0

I found that this works better; with the old function I was getting many extra trills and mordents everywhere. (NumPy sketch below.) I decided to go back to basics and train on just a single instance. With a few hundred epochs, it was able to converge perfectly. Training on two instances closely matched the originals, although it still outputs trills and mordents where fermatas should be.
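The new error function as a NumPy sketch:

  import numpy as np

  A_COEF = 0.1  # from the entry above

  def expression_error_v2(predicted, actual):
      diffs = predicted - actual
      # With A_COEF = 0.1, positive diffs (predicting an expression that
      # isn't there) are weighted (1 + 0.1)^2, while negative ones (missing
      # an expression) get (-1 + 0.1)^2; spurious trills/mordents cost more.
      return np.mean(diffs ** 2 * (np.sign(diffs) + A_COEF) ** 2)

  print(expression_error_v2(np.array([0.8, 0.0]), np.array([0.0, 1.0])))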

3/16/17 Got some feedback on my progress so far:

  • Try to enable GPU for speed
  • Separate nominal inputs into individual classes
  • Consider a bi-directional LSTM
  • Store & Recall Weights

3/20/17 Trying to parse Beethoven scores. Inferred the key for scores missing one using the Krumhansl-Schmuckler algorithm. Enabling GPU acceleration was too difficult, so I'm skipping it for now. (Key analysis sketch below.)
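Key analysis in music21 as a minimal sketch; as I understand it, analyze('key') uses Krumhansl-Schmuckler-style profile matching under the hood (stock corpus path again):

  from music21 import corpus

  score = corpus.parse('bach/bwv66.6')
  k = score.analyze('key')
  print(k.tonic.name, k.mode, k.correlationCoefficient)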

3/23/17 On recommendation, I will try the GPU acceleration again. Also trying to get range dynamics working, but music21's “offset” is unreliable.

3/26/17 Found out the reason it was unreliable. I inspected the offset of a dynamic, and when I inspected it again (without running the program any further), it had changed. Schrödinger's bug :) I gave up and made my own custom offset function (a plausible reconstruction below). Also, GPU acceleration works, but the speedup is barely noticeable. Did more monkeying around with how the parts were set up. It amazes me that music21 can have so many features yet so little consistency in how the data is stored.
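The log doesn't preserve the actual custom function; one plausible reconstruction walks up the containment hierarchy summing offsets instead of trusting a flattened view (music21's getOffsetInHierarchy may do a similar job):

  def absolute_offset(element, top_stream):
      """Sum offsets from an element up to its enclosing score/stream."""
      total = 0.0
      site = element
      while site is not None and site is not top_stream:
          total += site.offset
          site = site.activeSite
      return total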

4/1/17 Combining different composers in the training. Sent a request for supercomputer access. I think results will be more interesting when I'm not training on just a handful of scores. Researching previous attempts at modeling musical expression. Can't find anything relevant so far.
