
This is Jacob O'Bryant's project to create a music player that learns what songs the user wants to listen to based primarily on when they skip or listen to the currently playing song.

# current plans

1. implement model-based CF
2. implement freshness
3. implement exploration vs. exploitation
4. recruit users and evaluate
5. add a content-based model to help with exploration

# activity notes

## 29 Apr 2017

My three main future-work points from last semester are 1) exploration vs. exploitation, 2) freshness, and 3) model-based collaborative filtering. I wanted to start with exploration, so this week I've read this paper in depth: "Enhancing Collaborative Filtering Music Recommendation by Balancing Exploration and Exploitation." It was one of the papers I cited in my literature review. It turns out that the methods used in this paper involve both freshness and model-based CF, so basically I just need to replicate what that paper did. It's a little different because they used explicit 5-star rating data while we're using implicit skipping data. The stuff they did on exploration vs. exploitation is very hard for me to understand, but I think I've got a decent grip on model-based CF and freshness. Their implementation of exploration depends on model-based CF, so I'll probably do that first. Freshness is super easy; it'd probably take less than half an hour to get the whole thing written and tested. The exploration part is gonna be a beast, though. It involves Bayesian statistics and machine learning, neither of which I've used before. So if I could implement those three things and then get a lot of people to use the app so I can run a real evaluation, I would consider this a very successful summer.

That paper though… I understand it for the most part up until section 3.2.2, at which point it's all greek to me (literally, ha ha). And I have no idea what in the heck is going on with section 3.3. But by the end of the summer, I'd like to understand everything.

## 13 Apr 2017

I've implemented a working session-based collaborative filtering recommendation thing. I've replaced the original moodless algorithm with it. So there's no longer an option in the app to specify your mood. I've used it a little bit so far and it seems to be working well. I cleared the database on my phone, so for the next week I'll be using just the new algorithm from a clean slate. The next step is making the algorithm better at exploring instead of just exploiting what it already knows. You can run a cool demo of the system with the code here.

## 06 Apr 2017

I've finally finished the literature review of music recommendation that I had to do for ENGL 316. The paper is available here. Writing it has been extremely helpful for seeing how this project can fit into the existing research on music recommendation. For instance, it helped me to figure out how to implement the next step in the moodful algorithm which I'll describe now.

At first, the system will use a simple content-based approach to pick songs. Something based solely on artist similarity might work well enough. It will have to balance keeping the user happy with trying to explore new music. Apparently this is a well-studied problem in reinforcement learning (exploration vs. exploitation). I found a couple papers that cover this problem specifically in music recommendation. According to the authors (the same people wrote both papers), very few recommendation systems incorporate this idea; instead they use the greedy approach (100% exploitation). The two papers study exploration vs. exploitation in content-based and collaborative filtering, respectively. As I implement active learning in this app, I'll dig deeper into those papers to see how they handle the problem.

As the app collects more data, it'll gradually switch over to a session-based collaborative filtering model. At first it'll use a simple memory-based collaborative approach, so no machine learning will be necessary. Once we get everything working, I could implement a model-based approach in order to improve the recommendations. We won't need the model-based approach to handle scaling since the algorithm will just be working on data from the current user (so it shouldn't be a ton of data). But once we get tons of users and start recommending new songs, scaling could become an issue.

The papers I've mentioned are all in the references list at the end of my lit review paper. Writing that paper was seriously such a beneficial experience. It's given me a lot of direction for this project. I think it'll basically set the stage for what I'll be doing over the entire next semester of working on this project.

But yeah, so I have a really clear direction now. So it's time to start coding again.

(Also: I've advertised the app on the Play Store a little bit, but only like one person has started using it. I could try to recruit harder for more users, but I'm really tempted to finish the session-based collaborative filtering algorithm first because that should make the app way better.)

Oh, and one more thing: I figured out how to create a Java class in Clojure (a dialect of Lisp that runs on the JVM) and add it to an Android project, so I'll be writing all the fancy new algorithm stuff in Clojure. I was feeling kinda bummed about doing it in Java, so this will be really nice. The normal Java part of the app will handle interfacing with the Android framework, but all the recommendation logic will be encapsulated within the Clojure code.

## 23 Mar 2017

I've done a lot of stuff over the past week. The app now uploads the skipping data to a web app I have running on DigitalOcean. It just uploads the raw sqlite3 db file. The app generates a random user id the first time it uploads, and the server uses that to tell whether a subsequent upload is an update for an existing user or a new db from a new user. I've also cleaned up various UI things and packaged everything up for Google Play. It's on the store now: Moody Music Player. I also reworked how the skipping data is saved. At first, the database had fields for “mood” (integer) and “skipped” (boolean). To mark songs that were suggested randomly instead of with the algorithm, I would set mood to -1. I realized this was bad because you need a separate control group for each mood. It also prevented feedback from the random shuffling from being used by the algorithm. So I added an “algorithm” field (integer): 0 means control; otherwise it's the version of the algorithm that suggested the song. Right now the version is just 1, but that number can be incremented as I modify the algorithm.
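
For concreteness, here's a sketch of what the reworked table might look like in sqlite3. The table and column names are my own guesses, not the app's actual schema:

```python
import sqlite3

# Hypothetical schema mirroring the fields described above.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE skip_events (
        song      TEXT,
        album     TEXT,
        artist    TEXT,
        timestamp INTEGER,
        skipped   INTEGER,  -- boolean: 1 = skipped, 0 = listened
        mood      INTEGER,  -- user-selected mood id
        algorithm INTEGER   -- 0 = control (random pick), else algorithm version
    )
""")
conn.execute(
    "INSERT INTO skip_events VALUES (?, ?, ?, ?, ?, ?, ?)",
    ("Some Song", "Some Album", "Some Artist", 1490000000, 0, 0, 1),
)

# The control group for a given mood is now a simple filter:
control_count = conn.execute(
    "SELECT COUNT(*) FROM skip_events WHERE mood = 0 AND algorithm = 0"
).fetchone()[0]
```

This keeps a separate control group per mood and still lets the algorithm learn from the randomly shuffled plays.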

I need to get people to start downloading the app so I can make sure it works for other people too, but other than that, I believe it's time to start working on the algorithm again. I need to figure out 1) how to implement the mood-detecting part and 2) how to improve the moodless part so it can learn your music preferences faster. And I need a good way to test the changes I make.

## 14 Mar 2017

I found and fixed a critical bug. When the app read in old skip-event data, each event was counted as “listened”, whether the song was actually listened to or skipped. So that explains why the algorithm wasn't outperforming pure shuffling. I fixed the bug last Saturday, and I've generated about 100 skip/listen events since then. Here's the data now:

    control (n=50, confidence=0.950)
    skip ratio: 0.680
    margin of error: 0.109
    mood 0 (n=50, confidence=0.950)
    skip ratio: 0.280
    margin of error: 0.104

So thank goodness the algorithm is actually doing something good. I didn't want to get other users if the app sucked, but now I'm feeling a lot better about getting other people to start using it.

## 9 Mar 2017

Latest data:

    control (n=30, confidence=0.950)
    skip ratio: 0.533
    margin of error: 0.150
    mood 0 (n=30, confidence=0.950)
    skip ratio: 0.567
    margin of error: 0.149


I'm concerned that the recommender doesn't seem to be working very well; it isn't outperforming the baseline. I don't think the problem is the validity of the recommendation algorithm (it's so simple that it should at least make a small improvement). I think the problem is just how long it takes to learn the user's preferences. If we played each song in the library several times, we should have no problem making good recommendations. But how do we get to an acceptable level of performance without having to wait that long?

general directions I could take:

- Figure out how to visualize the system's current understanding of the user's preferences
- Improve the algorithm so it learns faster
- Recruit more users so we have more data to work with

I'm thinking of doing those things in that order. I think my own data is sufficient for this early development.

## 7 Mar 2017

I discovered a bug in the code that records skip events. I was recording the current position in the song and counting the song as skipped if over 50% of the song had been listened to. However, I discovered that I was receiving incorrect values for the current position. I recently figured out how to tell whether a song is ending because the user pressed skip or simply because we've reached the end of the song, so I changed the code to record skip events based on that instead; now we don't even worry about the current song position. I've been using the app for a few days since fixing that, and I can now verify that the data is being collected correctly.

I also wrote some Python code to do statistical analysis on the data. The output looks like this:

    control (n=20, confidence=0.950)
    skip ratio: 0.500
    margin of error: 0.184
    mood 0 (n=30, confidence=0.950)
    skip ratio: 0.333
    margin of error: 0.142

The control group is the songs that were suggested purely at random. All other groups have songs suggested by our algorithm. The data in mood 0 shows that our algorithm performs better than the control, but the margin of error is pretty high. I only have a couple days' data to work with, so the numbers will be more interesting after I've used the app for longer. The script also generates this graph:

The graph shows the skip ratio when considering the first n skip events. Since the algorithm improves over time but the graph always includes data from the beginning, the graph will tend to overestimate the skip ratio for our algorithm.
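
The analysis script itself isn't reproduced here, but the reported margins are consistent with a normal-approximation confidence interval. The z value below (≈1.645) is inferred by working backwards from the printed numbers, not taken from the actual script:

```python
from math import sqrt

def skip_stats(skips, n, z=1.645):
    """Skip ratio and margin of error via the normal approximation.

    z is a guess reverse-engineered from the printed margins; the real
    script may compute its interval differently.
    """
    p = skips / n
    moe = z * sqrt(p * (1 - p) / n)
    return p, moe

ratio, moe = skip_stats(10, 20)  # control group from the output above
print(f"skip ratio: {ratio:.3f}")     # 0.500
print(f"margin of error: {moe:.3f}")  # 0.184
```

Plugging in the mood 0 numbers (10 skips out of 30) reproduces the 0.142 margin as well.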

## 2 Mar 2017

I've changed the app so that whenever it suggests a new song, there is a 10% chance the song will be picked totally at random. When the app records the user's response to the song (skip or listen), it also records whether the song was suggested with our fancy algorithm or by chance. This data will be our “control group.” I've also added some very basic code for letting the user manually switch between different moods. Right now they get three moods; soon I'll make it so you can create however many moods you want, etc.
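
A minimal sketch of that 10% control pick (the function names are placeholders, and `algorithm_pick` stands in for the real recommender):

```python
import random

EPSILON = 0.10  # chance of a purely random "control" suggestion

def next_song(library, algorithm_pick, rng=random):
    """Return (song, is_control): a random control pick 10% of the time,
    otherwise whatever the recommendation algorithm chooses."""
    if rng.random() < EPSILON:
        return rng.choice(library), True
    return algorithm_pick(library), False

song, is_control = next_song(["a", "b", "c"], lambda lib: lib[0])
```

Recording `is_control` alongside each skip/listen event is what makes the control-group comparison possible later.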

My goal right now is to get to the recruiting stage as fast as possible. As soon as I have the code in place to make the app upload its data to me, I'll start looking for some early research participants. While they start to use the app and give me their data, I'll work on the app's usability so I can start recruiting more heavily.

## 23 Feb 2017

I was thinking about how to remedy the following problem. The current moodless player falls back to album and artist skip ratios if there isn't sufficient data about the actual song being played. If you have an album with nine songs you hate and one song you love, it's likely the one song will get buried. We record the timestamp whenever you listen to or skip a song, but currently it isn't being used. We could give higher weight to songs that haven't been played. The timestamps could be used so we don't….

OK, different idea that just popped into my head: before we make the app automatically detect the user's mood, we can give them an option to specify what mood they're in. This would make the app work for everyday use, so we could deploy it and start collecting data right away while we figure out the auto-mood stuff. The data would probably help with figuring out the mood-detecting algorithm too. We could also start measuring how good the moodless algorithm is: to get its performance, we calculate the number of skips per hour or skips per listen. To compare this to random shuffling, every ten songs we could give the user a totally random song and use their reaction to measure baseline performance. That way we get both kinds of data without making the user sit through a lot of random shuffling. (Maybe throwing in a totally random song here and there would help add variety anyway.)

The app can keep track of what version of the algorithm is used to pick songs. So if we want to improve the algorithm, all we do is make the change, update the version, push out an update for the app and wait for the new data to be uploaded to our server. That would be a really nice setup to have working. So I think getting that up and running should be the first priority.

After that's done, we could start making tweaks like the one I started to mention in the first paragraph. But more importantly, we'd be able to see if the current algorithm is good enough. If we can verify that the app is pretty good at playing the right songs when the user tells us what mood they're in, then we should be in good shape to do the auto mood stuff. And then we'd have some data we could use to write a paper.

Note on the user specifying their mood: To the app, a new mood would be like a new user account on a computer or something. The app wouldn't infer anything from what the user names the mood (like “rock” or “lighthearted” or whatever), it would basically be the same as “mood 1”, “mood 2” etc. When the user creates and switches to a new mood, the skip ratio data from other moods would be completely ignored. So that's why it's like a new user account: the new user's data is completely separate.

To help improve the algorithm, I'll add a feature so you can browse through your songs/artists/albums and the app will show you how likely it is to play them. This will help us find ways to improve the algorithm. I've also read that users trust and think more favorably of recommendation systems when the systems are transparent, i.e. the user has some idea of what's going on under the hood instead of it being a black box. So the users would probably appreciate that, and it could make them more willing to keep using the app. They might also be more enthusiastic about using it when we push out updates to the algorithm, since they'd be more cognizant of how our updates are changing the app's behaviour.

## 22 Feb 2017

As of now, the music player app keeps track of skip ratios correctly. I actually finished the skip-behaviour-recording code last Thursday, but there were some bugs I had to iron out. The music player had some code to do gapless playback that messed things up: while the current song was playing, the next song would be preloaded so it could start right as the current one finished. The problem is that the new code I wrote doesn't decide what song to play next until the current song ends, so sometimes the next song would be the preloaded one instead of the one my code came up with. I spent a few hours figuring that out, and the problem was fixed after commenting out a few lines of code. So everything is fine and dandy. Now I can start using the app to collect data from my own personal use. After a little while (a few days? a week?) there will hopefully be enough data to start working with.

Now that this part of the coding is over, I think I'm back into a thinking stage. Here's an updated todo list:

1. Figure out how to evaluate the performance of the smart shuffle thing (e.g. we could record number of skips per hour and compare this to normal shuffling).
2. Figure out an algorithm for detecting what mood the user is in and organizing the data we collect accordingly.
3. Implement the evaluation code so we can compare the moodless algorithm to normal shuffling and make a nice graph or something.
4. Implement the moodful shuffling algorithm.
5. Compare the algorithm's performance to normal shuffling and moodless smart shuffling.
6. Set up a web app and make the app upload the data to it.
7. Recruit people to use the app so we can get their data.
8. Measure the performance of the algorithm on all the other people and write a paper about it.

I also just remembered that I'm technically supposed to write a progress report halfway through the semester. So maybe I'll do that soon too.

Side note: here's an interesting paper about music information retrieval (MIR): http://www.cp.jku.at/research/papers/schedl_jiis_2013.pdf (“The Neglected User in Music Information Retrieval Research” by Schedl et al.). It basically says that most MIR studies do evaluation by comparing their results to existing databases, etc., instead of having actual humans directly involved in evaluating. The problem is that these “system-based evaluations” don't necessarily match up with how humans would evaluate things (since humans are really complicated). The authors say other related fields (like information retrieval in general) are better about this than MIR is right now. So a cool thing about this project is that we're making an app that actual humans use, and we can use their usage data to directly evaluate our algorithm's performance. That'll be a good thing to keep in mind.

## 09 Feb 2017

The raw data we collect will be a collection of records with these fields:

- song title
- album
- artist
- timestamp
- skipped?

A record will be created whenever the music player moves to the next song, whether that be from the current song finishing or the user skipping it.
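
In code, one raw record could look something like this (the field names here are my own, not the app's actual identifiers):

```python
from dataclasses import dataclass

@dataclass
class SkipEvent:
    song: str        # song title
    album: str
    artist: str
    timestamp: int   # Unix time when the player moved to the next song
    skipped: bool    # True if the user pressed skip, False if it played out

event = SkipEvent("Song A", "Album X", "Artist Y", 1486600000, skipped=True)
```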

The current state includes the following information that we care about:

- listen/skip ratios for songs, artists and albums (observable)
- the user's mood (hidden)

(Paul mentioned that I should look into hidden Markov models. From what I can tell, these would be the observable and hidden components of that model).

Using the timestamp data, we can find out when the listen/skip behaviour of the user changes; presumably that means the user is in a different mood.

Given a library of songs, we need to come up with a probability for each song that it is the one the user would most enjoy listening to next (so the sum of the probabilities will be 1). To simplify things, let's assume the mood is constant and the current skip ratios correspond to the current mood. We can calculate the probability that if we play a certain song, the user will listen to it (instead of skipping).

The ratio of number of times listened to number of times played (i.e. number of times listened + number of times skipped) for the individual song is this probability. If we have played song A for the user 1,000 times and they skipped it 500 times, then the probability that the user will listen to the song if we play it again is clearly 0.5. (Remember that we're holding the mood constant, and we also do not take into account that the user will get tired of the song if we play it over and over again). In this case, we don't care about the artist and album skip ratios. The song skip ratio is categorically more important than the other skip ratios.

However, things are different if we don't have as much data on the individual song. If we played song A for the user only two times and they skipped it once, then the ratio is the same, but we probably don't want to rely solely on it. If we have a lot of data on the album or artist skip ratios, those would probably improve our calculations. This suggests a confidence level for each ratio. Let R_s, R_al, R_ar be the ratio of number of times listened to number of times played for the song, album and artist, respectively. Let C_s, C_al, C_ar be the confidence levels of those ratios, where each confidence level is between 0 and 1. Then we could calculate P(the user will listen to the song | we play the song) like so:

    P = R_s*C_s + (1 - C_s)*(R_al*C_al + (1 - C_al)*(R_ar*C_ar))

Given this probability for each song, we could normalize them to get the probability of each song being the best song to play next. Then we would have the data we need to choose the next song to play.

We do need to calculate the confidence level though. If we've played the song 0 times, the confidence should be zero. As the number of times we've played the song increases, the confidence should approach 1. So some sort of simple, mathy function should do fine there.

    confidence(n) = n/(n + a)

where n is the number of times played and a is some constant. The smaller a is, the faster the confidence will increase. Perhaps machine learning could be used to find the optimal value of a?
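
Putting the two formulas together, a sketch (the default value of `a` is an arbitrary placeholder, and the example ratios and counts are illustrative):

```python
def confidence(n, a=5.0):
    """confidence(n) = n / (n + a): 0 when unplayed, approaching 1 with plays."""
    return n / (n + a)

def listen_probability(r_s, c_s, r_al, c_al, r_ar, c_ar):
    """P = R_s*C_s + (1 - C_s)*(R_al*C_al + (1 - C_al)*(R_ar*C_ar))"""
    return r_s * c_s + (1 - c_s) * (r_al * c_al + (1 - c_al) * r_ar * c_ar)

def song_distribution(listen_probs):
    """Normalize per-song P values into the 'best next song' distribution."""
    total = sum(listen_probs.values())
    return {song: p / total for song, p in listen_probs.items()}

# With full confidence in the song's own ratio, P reduces to R_s:
p = listen_probability(0.5, 1.0, 0.8, 0.9, 0.6, 0.7)  # 0.5
dist = song_distribution({"song A": 0.6, "song B": 0.3, "song C": 0.1})
```

The nesting means the album ratio only matters to the extent we're unsure about the song, and the artist ratio only to the extent we're unsure about the album, which matches the fallback behaviour described above.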

But I think machine learning will primarily be needed only when we consider the mood. As a first step, I could implement all this stuff and make a music player that assumes you're always in the same mood. Then the next step would be figuring out how to model mood changes (how hard could it be?).

Here's something we could do: use the timestamps to cluster the data records into listening sessions. We could assume that the mood stays the same during each listening session (I don't think that assumption will be true all the time, so it'd be best to do something fancier down the road so we don't need that assumption). Then we could calculate the skip ratios for each listening session. Then we'd need a way to figure out if the skip ratios for one session are similar or different from another session. If they're similar enough, we can assume the user was in the same mood and combine the ratios. After doing that with all the listening sessions, we'll have a number of “moods”, each with a corresponding set of skip ratio data.
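
The clustering step above can be sketched with a simple gap-based rule (the 30-minute threshold is my own assumption, not something I've settled on):

```python
SESSION_GAP = 30 * 60  # seconds of silence that ends a listening session

def split_sessions(timestamps, gap=SESSION_GAP):
    """Group timestamps into sessions wherever the gap between
    consecutive events exceeds the threshold."""
    if not timestamps:
        return []
    ordered = sorted(timestamps)
    result, current = [], [ordered[0]]
    for t in ordered[1:]:
        if t - current[-1] > gap:
            result.append(current)
            current = [t]
        else:
            current.append(t)
    result.append(current)
    return result

# Three events close together, then one an hour later -> two sessions.
print(split_sessions([0, 200, 400, 4000]))  # [[0, 200, 400], [4000]]
```

Each session would then get its own skip-ratio table, and sufficiently similar sessions get merged into a "mood."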

When the user starts listening and we don't know what mood they're in, we can sample songs from each mood set that are likely to be played when the user is in that mood. We can calculate the current skip ratio data and continue sampling until the skip ratio data becomes similar to one of the existing moods. If the data doesn't become similar, the user must be in a different mood. Then we assign them to a “new mood”, and make suggestions solely off of data from the current listening session.

Short-term goals: figure out if this whole thing is the correct approach to take. If it is, start working on the moodless music player. Also figure out more concretely how to implement the moodful music player: a model and stuff like that.

## 02 Feb 2017

I finished step one: adding a dummy method to control the next song when skipping. In the code, there's not a huge distinction between the next song playing because the user skipped and the next song playing because the current one finished. So when the song skips, I get the current position in the song, which can be used to see if the song played to completion. I modified the music player so that if you skip a song that's been playing for less than five seconds, the next song will be the theme song from “Bill Nye the Science Guy.” Once the machine learning stuff is in there, I just have to change the code to play whatever song the algorithm suggests.

Short-term plans: Finish the machine learning tutorial and start messing around with TensorFlow. Design the data model and start implementing it.

## 31 Jan 2017

Up until today, most of the work I've done has been researching and deciding what platform/tools to use for the music player app. I looked into a few projects that let you write a mobile app once and compile it for Android, iOS and other platforms. The two main ones I looked into were NativeScript and Kivy. With NativeScript, you write the app in HTML + CSS + JavaScript, and the project somehow translates that into actual native UI elements on the respective platforms. It looked nice, but Paul brought up the question of whether machine learning libraries were available. I then started looking at Kivy because it lets you write the code in Python. Kivy doesn't use native UI elements, so the app would look funny, but that doesn't really matter for our purposes. I made an Android hello-world app with Kivy. It was pleasant, but after some more thought, I decided to scrap the whole cross-platform idea and just write a standard Android app in Java. I wanted to try one of the non-Java solutions because

1. I've done a fair amount of normal Android development already and I don't particularly like Java.
2. It would make it easier to target iOS instead of just Android.

I ended up sticking with just Java + Android because

1. I can use an existing open-source music player instead of writing one from scratch (specifically, Vanilla Music Player*). I don't think it would have taken too long to write a basic music player from scratch (we only need the most basic features), but the time needed would still have been nontrivial. Starting with the open-source player will let me jump into machine learning faster.
2. Targeting iOS would be nice, but I think it's more important to get a prototype working as fast as possible. Even if the project stays on just Android, that should be sufficient for the near future. I can always switch to a cross-platform solution or create a separate iOS app later if it really would add enough value.

(*Last year, as part of a different research project, I tested out a bunch of open-source music players for Android. Vanilla Music Player is (relatively) simple and I can actually clone and build the project without getting a million errors (that's a lot more than I can say for the other players. Encore Music Player in particular was a nightmare, even though the precompiled app on the Play store looks really nice).)

So I've been delving into the code for Vanilla, making minor changes to help me understand how it's working. The disadvantage of using an existing player is that it's a relatively large project and I have to get used to the code base. I've also been reading through a machine learning tutorial on the TensorFlow website.

Short-term plans: Identify where the code for shuffling songs is. Figure out how to replace it with my own song-choosing code. Finish the machine learning tutorial and start messing around with TensorFlow.