
This is Jacob O'Bryant's project to create a music player that learns what songs the user wants to listen to based primarily on when they skip or listen to the currently playing song.

# activity notes

## 5 Aug 2017

As of today I've finished the Spotify integration. The app now streams new recommendations from Spotify, and it uses audio feature data from Spotify to calculate song similarity to help out with exploration.

## 1 Aug 2017

Last week I wrote up a description of the algorithm. It's written for a general audience, but it gives a decent overview of what I've done so far. I've included it below. I'm almost done integrating Spotify stuff so that the app can recommend new songs if the user has a Spotify Premium account. Spotify includes a few API endpoints that provide a list of recommendations for the user based on their previous Spotify listening history. I made the app so it takes those recommendations and adds them to the user's library of songs. So as far as the smart shuffle algorithm is concerned, there's no difference between local songs and songs from Spotify. After the algorithm picks a song, the app checks whether it's a Spotify song or a local song and then plays it accordingly.

Here's the algorithm description:

SMART SHUFFLE ALGORITHM

Every time you listen to a song, the algorithm records the current time and whether you finished the song or skipped it. Then your listening history is grouped into sessions. Suppose in the morning you listened to songs A, B and C but skipped song D, and then in the afternoon you listened to songs D, E and F but skipped songs A and B. There would be two sessions:

   Session #   Listened    Skipped
   1           A, B, C     D
   2           D, E, F     A, B

The algorithm doesn't have a notion of whether you “like” a song. Instead, it thinks about what songs “go well together.” In this example, A, B and C go well together, but D does not go well with any of A, B or C. So if you started a new session and you listened to song D, the algorithm would be less likely to play A, B or C, but it would be more likely to play E or F. This allows us to automatically play the correct music for whatever mood you're currently in. It's based on the assumption that your mood doesn't change during a single session.

This is called session-based collaborative filtering, and it's the main part of the algorithm. However, there are two more parts: exploration and freshness.
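
Here is a rough Python sketch of the "go well together" idea, using the two example sessions from the table. The scoring rule (+1 for two songs listened to in the same session, -1 when one was listened to and the other skipped) and all function names are assumptions for illustration; the app's actual scoring isn't spelled out in this description.

```python
from collections import defaultdict

# Each session is (listened, skipped), taken from the example table.
sessions = [
    ({"A", "B", "C"}, {"D"}),
    ({"D", "E", "F"}, {"A", "B"}),
]

def affinity(sessions):
    """Score how well each ordered pair of songs goes together.

    Hypothetical rule: +1 when both songs were listened to in the same
    session, -1 when one was listened to and the other was skipped.
    """
    score = defaultdict(int)
    for listened, skipped in sessions:
        for a in listened:
            for b in listened:
                if a != b:
                    score[a, b] += 1
            for b in skipped:
                score[a, b] -= 1
                score[b, a] -= 1
    return score

def rank_candidates(current_listened, candidates, score):
    """Order candidates by total affinity with songs heard this session."""
    def total(candidate):
        return sum(score[s, candidate] for s in current_listened)
    return sorted(candidates, key=total, reverse=True)

score = affinity(sessions)
# A new session in which the user has just listened to D:
ranking = rank_candidates({"D"}, ["A", "B", "C", "E", "F"], score)
print(ranking)  # E and F come first, A and B last
```

With this toy rule, E and F rank highest after listening to D, which matches the behaviour described in the example.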

Exploration

The problem with collaborative filtering is that it doesn't work well until you already have a lot of data. When you first start using the app, songs are simply chosen at random. But after the algorithm learns about some of your songs, it must make a choice: should it play a song that it already knows you'll like, or should it randomly pick another song that you haven't listened to very much yet (or at all)?

Currently, both choices are given equal time. Whenever the algorithm chooses which song to play next, it partitions all the songs in your library into two sets. The first set contains songs that you've listened to a lot; the second set contains songs you haven't listened to very much. The algorithm chooses the best possible song from each set and then flips a coin to decide which one should be played.

How does it know which song from the second set is the “best” if there isn't much listening data? It chooses randomly for the most part, but it tries to make an educated guess. If you've skipped songs from a certain artist, it will give higher priority to songs from different artists. I have a lot of Beatles music in my collection, but I rarely listen to it. When the algorithm plays a few Beatles songs for me and I skip them, it'll instead explore the other artists I have before it goes back to try a different Beatles song.
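
The partition-and-coin-flip step could look roughly like the Python sketch below. The `min_plays` cutoff, the two scoring callbacks and the `coin` parameter are hypothetical names introduced for illustration; the description doesn't say how "listened to a lot" is actually decided.

```python
import random

def pick_next(songs, play_count, cf_score, guess_score,
              min_plays=3, coin=random.random):
    """Pick the next song with a 50/50 explore/exploit coin flip.

    songs: non-empty list of song ids
    play_count: dict mapping song -> number of times played so far
    cf_score: function song -> float, used for well-known songs
    guess_score: function song -> float, the "educated guess" for the rest
    min_plays: assumed cutoff separating the two sets
    coin: function returning a float in [0, 1)
    """
    known = [s for s in songs if play_count.get(s, 0) >= min_plays]
    unknown = [s for s in songs if play_count.get(s, 0) < min_plays]
    if not unknown:
        return max(known, key=cf_score)
    if not known:
        # Brand-new library: nothing to exploit yet, so only explore.
        return max(unknown, key=guess_score)
    best_known = max(known, key=cf_score)
    best_unknown = max(unknown, key=guess_score)
    return best_known if coin() < 0.5 else best_unknown
```

Passing the coin in as a parameter is only a convenience for testing; replacing the fixed 0.5 with a tunable rate is the adjustment mentioned under future work.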

Freshness

The algorithm also tries to avoid playing the same songs over and over again–it tries to play songs that are “fresh.” Before the algorithm plays a song, it'll look at all the times it was played in the past and how long ago it was played each time. The more times the song was played and the more recently it was played, the less fresh it will be, so that song will have a smaller chance of being chosen.

But the algorithm also realizes that some songs are OK to play a lot while other songs should be played only once in a while. If you skipped a song and the last time you heard it was yesterday, the algorithm will try to wait a longer period of time before playing it again. If you instead listen to the song, the algorithm will play the song again sooner.

FUTURE WORK

In its current state, the algorithm performs well for my use. But there are many ways it could be improved. I think exploration needs work the most. Instead of choosing songs mostly at random, it would be better to find ways to improve our guesses. Some services like last.fm offer song data that could be used to figure out what songs in the user's library are similar to each other.

In addition, the 50/50 split between exploration and exploitation could be adjusted. It would be better if the algorithm decided how often would be best to explore new songs instead of always doing it 50% of the time.

## 26 July 2017

Yesterday I completed the freshness code. Instead of the following:

   Freshness = 1 - sum_{i=1 to n} 1/e^(t_i/s)

I did this:

   Freshness = (1 - 1/e^(t_1/s)) * (1 - 1/e^(t_2/s)) * ... * (1 - 1/e^(t_n/s))

That way the freshness score stays between 0 and 1. To figure out a value for s (the memory strength for the individual song), the app minimizes an error function.
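
To see why the product form helps, here is a small Python sketch of both formulas. Measuring time in days and the function names are assumptions for illustration; the app's real code isn't shown in these notes.

```python
import math

def freshness_sum(elapsed, s):
    """Earlier idea: 1 minus the sum of decayed plays (can drop below 0)."""
    return 1.0 - sum(math.exp(-t / s) for t in elapsed)

def freshness(elapsed, s):
    """Product form: every factor is in [0, 1], so the result is too.

    elapsed: times since each previous play (t_1..t_n), same unit as s
    s: memory strength for this song (larger means slower recovery)
    A song that has never been played (empty list) gets freshness 1.
    """
    result = 1.0
    for t in elapsed:
        result *= 1.0 - math.exp(-t / s)
    return result

# Two very recent plays (assumed unit: days) with s = 1:
recent = [0.1, 0.2]
print(freshness_sum(recent, 1.0))  # negative, outside the 0..1 range
print(freshness(recent, 1.0))      # small but still between 0 and 1
```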

Suppose a song has been played n times in the past. The error function will take n data points. The ith data point includes t_1 to t_(i-1), i.e. the amounts of time that have passed since each previous time we played that song. The ith data point also includes whether or not the song was skipped on the ith time we played it. We use that skip value as the observed value for what the freshness function “should” have been. If the song was skipped, the observed value is 0; otherwise the value is 1. In other words, we would expect a song with a high freshness value (i.e. near 1) to be listened to, and we would expect a song with a low freshness value (near 0) to be skipped. Given a value for s and the n data points, the error function returns the sum of the squares of the errors.

To minimize the error function, we simply call it with a bunch of different (constant) strength values, e.g. {0.5,1,1.5,…,30}. It's kind of janky, but it was easy to code and it seems to be working well enough. I tested the strength-figuring-out code on my own listening history. As expected, it gave low memory strength values to songs that I like and high memory strength values to songs that I didn't like. In other words, the songs that I liked were given a memory strength value that would allow them to be played with a high frequency and vice versa.
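
The brute-force search can be sketched in Python as follows. This is a minimal reconstruction from the description above, not the app's code: the history format (a time-sorted list of `(play_time, skipped)` pairs, in days) and the function names are assumptions, and the freshness function is repeated so the block runs on its own.

```python
import math

def freshness(elapsed, s):
    """Product-form freshness; an empty history gives 1."""
    result = 1.0
    for t in elapsed:
        result *= 1.0 - math.exp(-t / s)
    return result

def squared_error(history, s):
    """Sum of squared errors for one song and one strength value.

    history: list of (play_time, skipped) pairs sorted by play_time.
    For the i-th play, freshness is computed from the earlier plays only,
    and the observed value is 0 if skipped, 1 if listened to.
    """
    total = 0.0
    for i, (now, skipped) in enumerate(history):
        elapsed = [now - earlier for earlier, _ in history[:i]]
        observed = 0.0 if skipped else 1.0
        total += (observed - freshness(elapsed, s)) ** 2
    return total

def fit_strength(history, candidates=None):
    """Try each candidate strength and keep the one with the least error."""
    if candidates is None:
        candidates = [0.5 * k for k in range(1, 61)]  # 0.5, 1.0, ..., 30.0
    return min(candidates, key=lambda s: squared_error(history, s))

liked = [(0, False), (1, False), (2, False), (3, False)]
disliked = [(0, True), (1, True), (2, True)]
print(fit_strength(liked))     # 0.5, the lowest candidate
print(fit_strength(disliked))  # 30.0, the highest candidate
```

On these made-up histories the search behaves as described: a song that is always listened to gets the lowest strength, and a song that is always skipped gets the highest.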

I thought all of that was going to take me longer than it did, so I guess I'll be on to something else now. I'll be using the app and trying to evaluate it qualitatively. If it seems to work well, I'll probably focus on adding support for cloud music services so we can hopefully get a lot more users and get a more quantitative analysis. Otherwise, I'll keep working on the algorithm and get by with the users we have.

## 24 July 2017

I have returned. The wiki logged me out as I was writing my last entry in June and I lost part of it, and I never got around to finishing it. So I'll just start fresh here with the state of the project.

We've got about nine users and a total of about 2,200 listening events. The data we have shows that the skip ratio for the algorithm is about 10 percentage points lower than the skip ratio for pure random shuffling.

Things that could be done next include the following:

GETTING MORE USERS:

- create a version of the app for iOS
- add support for cloud music services

IMPROVING THE ALGORITHM:

- add content-based recommendation
- improve freshness score handling

For the next step I'd like to do “improve freshness score handling.” If I can do it quickly, I'd also like to do “add support for cloud music services.”

Here's the plan for freshness stuff. Right now, the freshness score (a value from 0 to 1) is calculated using this formula:

   Freshness = 1 - 1/e^(t/s)

where t is the amount of time elapsed since the song was last played and s is a parameter representing the strength of the user's memory (the higher s is, the longer it will take for the song's freshness to recover). I'd like to change it to something like this:

   Freshness = 1 - sum_{i=1 to n} 1/e^(t_i/s)

where n is the number of times the song has been played and t_i is the amount of time elapsed since the ith time the song was played. This would make the freshness score take into account all the times the user listened to the song, not just the most recent time. The second thing I want to do with freshness is figure out the appropriate value for s. It should be different for each song. The app should dynamically learn the value of s for each song. Given a value of s for a particular song, we could calculate an error for that song like so. For each time that we played the song in the past, calculate the freshness f of that song at that time. If the user listened to the song, the error for that instance is 1 - f. If the user skipped the song, the error is f. Then we sum the errors for each instance to get a total error for that value of s. Then we just find a value for s such that the total error is minimized.

So I want to implement a basic version of the freshness stuff as fast as I can and then hopefully get the cloud music stuff working. To really get some useful data I'll need a lot more users, and to get more users I really need to get the app to work with cloud music services. So if I could get both of those knocked out before summer ends, I'd be tickled pink.

## 23 June 2017

I've done some recruiting and we've got a handful of users now. I'll still make some more online posts before I start EFY in two days. One issue with recruiting is that there seem to be so few people who 1) have their own music collections and 2) store them locally on their phones. Even if they are in the group of people who still buy music, they often use something like Google Play Music to store and sync their music for them. So the app doesn't work for them. I did find a really cool open-source music player called Tomahawk. It can play songs from your local collection, but it can also seamlessly play songs from cloud collections like Google Play Music or Amazon Music. It can also play from other sources like Spotify and SoundCloud. I'd like to transition my project over to using that app instead of the current one (Vanilla music player). I toyed with the idea of doing it before summer ends so that we could recruit users that store their music in the cloud. However, I'm thinking it would be better to wait until summer term is over to do that. I'll just focus on recruiting people that can use the app as it is right now so that I can use the rest of the semester to keep working on the algorithm.

Speaking of the algorithm, I've been thinking about something that will hopefully be a major improvement. For freshness, there's a “memory strength” parameter that controls how long it takes a song to become fresh again after you play it. I currently have it set so that most of the freshness is recovered after two days. Here's the brilliant idea: keep individual memory strength parameters for each song. Whenever the user skips a song, increase the parameter for that song (so it'll take longer for the song to regain its freshness). This is brilliant because we can think of the memory strength parameter as how much the user likes the song. E.g., if you hate a song, you might say that the appropriate amount of time to wait before playing it a second time is 1,000 years. The more you like a song, the more frequently you

## 10 June 2017

I've finally got a system that works really well for myself personally. The main recent change is I've altered how the app combines the main score with the “content-based” (artist similarity) score. Before, we computed a confidence level for the main score and then used a weighted average between the two scores. However, all the songs we hadn't played yet had the same content score, so they all got lumped together. That made the app either 1) always play songs we hadn't heard before, or 2) never play songs we hadn't heard before. The exploration code only helped for songs we've listened to a little but not very much.

Here's how it's done now: We divide all the possible next songs into two lists. If the main score confidence level for a song is high enough, we put it in the first list; otherwise, we put it in the second list. 80% of the time, we choose the best song from the first list to play. The rest of the time we choose the best song from the second list. It's a rough solution, but it's working decently.

The next step is to recruit users. But for improving the algorithm, here are the things I'm thinking about:

1. Figure out a way to dynamically adjust the ratio for picking a content-based song instead of always using 80%/20%. This could take into account the size of the two lists and the scores of the songs in both lists in addition to other things.

2. Use a better freshness function. The formula now only uses the most recent time, so there would be no difference between a song that had been played a thousand times and a song that had been played only once if they both had been played recently. So it'd be nice to incorporate more of the play history. Also, currently the freshness of a song almost fully recovers after about two days. It might be better to do a logarithmic thing that keeps improving over time without an asymptote. And the speed at which the freshness recovers should be adjusted depending on the user and the individual song.

3. Get a real content-based score for the songs we don't know anything about. We could use last.fm tags to try and get something, but I'm not sure if that'll really be so helpful.

## 07 June 2017

I've mainly done a lot of coding over the last month. I heavily refactored the way the app does recommendations so the time complexity is a lot better. Startup time still takes a while, but recommendations can be done quickly. I also implemented a simple artist-similarity content-model to help make suggestions when there's not a lot of usage data about a particular song. In effect, this means that when we're guessing which songs to play (because we can't make an informed recommendation), we'll be less likely to suggest a song from a certain artist if the user has already skipped a different song from the same artist.
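
As a rough illustration of the artist-similarity idea, the Python sketch below turns per-artist skip ratios into a guess score for songs with little or no usage data. The function names and the `default` prior are assumptions; the actual content model may work differently.

```python
from collections import Counter

def artist_skip_ratios(events):
    """events: iterable of (artist, skipped) pairs from the listening history."""
    played, skipped = Counter(), Counter()
    for artist, was_skipped in events:
        played[artist] += 1
        if was_skipped:
            skipped[artist] += 1
    return {artist: skipped[artist] / played[artist] for artist in played}

def guess_score(artist, ratios, default=0.5):
    """Score a song we know little about: higher if its artist is rarely skipped.

    `default` is a hypothetical prior for artists with no history at all.
    """
    return 1.0 - ratios.get(artist, default)

history = [("Beatles", True), ("Beatles", True), ("Beatles", False),
           ("Other Artist", False)]
ratios = artist_skip_ratios(history)
print(guess_score("Beatles", ratios))       # low: two of three plays were skipped
print(guess_score("Other Artist", ratios))  # high: never skipped so far
```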

## 12 May 2017

I've implemented model-based CF, freshness and exploration using a really simple (and pretty arbitrary) margin-of-error function. Here are some things I'd like to do:

• add some simple content-based stuff so we can make guesses for songs the user hasn't listened to yet.
• figure out how to quantitatively measure the algorithm's performance
• use the quantitative feedback to iterate on the recommendation algorithm and make it really good
• recruit a bunch of users
• make the code more efficient. It takes ~10 seconds to start, and then it takes ~5 seconds between songs to make the next recommendation.

## 29 Apr 2017

My three main future work points from last semester are 1) exploration vs. exploitation, 2) freshness, 3) model-based collaborative filtering. I wanted to start with exploration, so this week I've read this paper in depth: ENHANCING COLLABORATIVE FILTERING MUSIC RECOMMENDATION BY BALANCING EXPLORATION AND EXPLOITATION. It was one of the papers I cited in my literature review. It turns out that the methods used in this paper involve both freshness and model-based CF. So basically I just need to replicate what that paper did. It's a little different because they used explicit 5-star rating data while we're using implicit skipping data. The stuff they did on exploration vs. exploitation is very hard for me to understand. I think I've got a decent grip on model-based CF and freshness though. And their implementation of exploration depends on model-based CF, so I'll probably do that first. Freshness is super easy; it'd probably take less than half an hour to get the whole thing written and tested. The exploration part is gonna be a beast though. It involves bayesian statistics and machine learning, neither of which I've used before. So if I could implement those three things and then get a lot of people to use the app so I can have a real evaluation, I would consider this a very successful summer.

That paper though… I understand it for the most part up until section 3.2.2, at which point it's all greek to me (literally, ha ha). And I have no idea what in the heck is going on with section 3.3. But by the end of the summer, I'd like to understand everything.

## 13 Apr 2017

I've implemented a working session-based collaborative filtering recommendation thing. I've replaced the original moodless algorithm with it. So there's no longer an option in the app to specify your mood. I've used it a little bit so far and it seems to be working well. I cleared the database on my phone, so for the next week I'll be using just the new algorithm from a clean slate. The next step is making the algorithm better at exploring instead of just exploiting what it already knows. You can run a cool demo of the system with the code here.

## 06 Apr 2017

I've finally finished the literature review of music recommendation that I had to do for ENGL 316. The paper is available here. Writing it has been extremely helpful for seeing how this project can fit into the existing research on music recommendation. For instance, it helped me to figure out how to implement the next step in the moodful algorithm which I'll describe now.

At first, the system will use a simple content-based approach to pick songs. Something based solely on artist similarity might work well enough. It will have to balance keeping the user happy with trying to explore new music. Apparently this is a well-studied problem in reinforcement learning (exploration vs. exploitation). I found a couple of papers that cover this problem specifically in music recommendation. According to the authors (the same people wrote both papers), very few recommendation systems incorporate this idea. Instead they use the greedy approach (100% exploitation). The two papers study exploration vs. exploitation in content-based and collaborative filtering, respectively. As I implement active learning in this app, I'll dig deeper into those papers to see how they handle the problem.

As the app collects more data, it'll gradually switch over to a session-based collaborative filtering model. At first it'll use a simple memory-based collaborative approach, so no machine learning will be necessary. Once we get everything working, I could implement a model-based approach in order to improve the recommendations. We won't need the model-based approach to handle scaling since the algorithm will just be working on data from the current user (so it shouldn't be a ton of data). But once we get tons of users and start recommending new songs, scaling could become an issue.

The papers I've mentioned are all in the references list at the end of my lit review paper. Writing that paper was seriously such a beneficial experience. It's given me a lot of direction for this project. I think it'll basically set the stage for what I'll be doing over the entire next semester of working on this project.

But yeah, so I have a really clear direction now. So it's time to start coding again.

(also–I've advertised the app on the Play store a little bit, but only like one person has started using it. I could try to recruit harder for more users, but I'm really tempted to finish the session-based collaborative filtering algorithm first because that should make the app way better)

oh and one more thing. I figured out how to create a java class in clojure (a dialect of lisp that runs on the JVM) and add it to an android project, so I'll be writing all the fancy new algorithm stuff in clojure. I was feeling kinda bummed about doing it in java, so this will be really nice. The normal java part of the app will handle interfacing with the android framework, but all the recommendation logic will be encapsulated within the clojure code.

## 23 Mar 2017

I've done a lot of stuff over the past week. The app now uploads the skipping data to a web app I have running on DigitalOcean. It just uploads the raw sqlite3 db file. The app generates a random user id the first time it uploads, so the server uses that to know if a subsequent upload is an update for an existing user or a new db for a new user. I've also cleaned up various UI things and packaged everything up for Google Play. It's on the store now: Moody Music Player. I also reworked how the skipping data is saved. At first, the database had fields for “mood” (integer) and “skipped” (boolean). To separate songs that were suggested randomly instead of with the algorithm, I would set mood to -1. I realized this was bad because you need to have a separate control group for each mood. Also, this prevented feedback from the random shuffling from being used by the algorithm. So I added an “algorithm” field (integer). 0 means control; otherwise it's the version of the algorithm. Right now the version is just 1, but that number can be incremented as I modify the algorithm.
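
For illustration, here is a hypothetical version of that table using Python's built-in `sqlite3` module, plus a query that gives a separate control group per mood. The table name, column names and sample rows are all made up; only the “mood”, “skipped” and “algorithm” fields come from the description above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE events (
        title     TEXT,
        artist    TEXT,
        album     TEXT,
        timestamp INTEGER,  -- Unix time of the event
        mood      INTEGER,
        skipped   INTEGER,  -- 0 = listened, 1 = skipped
        algorithm INTEGER   -- 0 = control (random), otherwise algorithm version
    )
    """
)

# Made-up sample rows: two control events and two algorithm events in mood 0.
rows = [
    ("Song 1", "Artist 1", "Album 1", 1490000000, 0, 1, 0),
    ("Song 2", "Artist 1", "Album 1", 1490000200, 0, 0, 0),
    ("Song 3", "Artist 2", "Album 2", 1490000400, 0, 0, 1),
    ("Song 4", "Artist 2", "Album 2", 1490000600, 0, 0, 1),
]
conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?, ?, ?, ?)", rows)

def skip_ratios(conn):
    """Return {(mood, algorithm): (event_count, skip_ratio)}."""
    query = """
        SELECT mood, algorithm, COUNT(*), AVG(skipped)
        FROM events
        GROUP BY mood, algorithm
    """
    return {(mood, algo): (n, ratio) for mood, algo, n, ratio in conn.execute(query)}

print(skip_ratios(conn))
```

Because control events keep their real mood value, each mood gets its own control row in the result, which the old `mood = -1` convention could not provide.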

I need to get people to start downloading the app so I can make sure it works for other people too, but other than that, I believe it's time to start working on the algorithm again. I need to figure out 1) how to implement the mood-detecting part and 2) how to improve the moodless part so it can learn about your music preferences faster. And I need a good way to test the changes I make.

## 14 Mar 2017

I found and fixed a critical bug. When the app read in old skip event data, each event was counted as “listened”, whether the song was actually listened to or skipped. So that explains why the algorithm wasn't outperforming pure shuffling. I fixed the bug last Saturday, and I've generated about 100 skip/listen events since then. Here's the data now:

   control (n=50, confidence=0.950)
   skip ratio: 0.680
   margin of error: 0.109
   mood 0 (n=50, confidence=0.950)
   skip ratio: 0.280
   margin of error: 0.104

So thank goodness the algorithm is actually doing something good. I didn't want to get other users if the app sucked, but now I'm feeling a lot better about getting other people to start using it.

## 9 Mar 2017

Latest data:

   control (n=30, confidence=0.950)
   skip ratio: 0.533
   margin of error: 0.150
   mood 0 (n=30, confidence=0.950)
   skip ratio: 0.567
   margin of error: 0.149


I'm concerned that the recommender doesn't seem to be working very well. It isn't outperforming the baseline. I don't think the problem is with the validity of the recommendation algorithm (it's so simple that it should at least make a small improvement). I think the problem is just how long it takes to learn about the user's preferences. If we played each song in the library several times, we should have no problem making good recommendations. But how do we get to an acceptable level of performance without having to wait that long?

general directions I could take:

- Figure out how to visualize the system's current understanding of the user's preferences
- Improve the algorithm so it learns faster
- recruit more users so we have more data to work with

I'm thinking of doing those things in that order. I think my own data is sufficient for this early development.

## 7 Mar 2017

I discovered a bug in the code that records skip events. I was recording what the current position in the song was and counting the song as skipped if over 50% of the song had been listened to. However, I discovered that I was receiving incorrect values for the current position in the song. I recently discovered how to find out if the song is being skipped because the user pressed skip or simply because we've reached the end of the song. I changed the code to record skip events based on that so we don't even worry about the current song position. I've been using the app for a few days since fixing that, and I can now verify that the data is being collected correctly.

I wrote some python code to do statistical analysis on the data also. The output looks like this:

   control (n=20, confidence=0.950)
   skip ratio: 0.500
   margin of error: 0.184
   mood 0 (n=30, confidence=0.950)
   skip ratio: 0.333
   margin of error: 0.142

The control group is the songs that were suggested purely at random. All other groups have songs suggested by our algorithm. The data in mood 0 shows that our algorithm performs better than the control, but the margin of error is pretty high. I only have a couple days' data to work with, so the numbers will be more interesting after I've used the app for longer. The script also generates this graph:

The graph shows the skip ratio when considering the first n skip events. Since the algorithm improves over time but the graph always includes data from the beginning, the graph will tend to overestimate the skip ratio for our algorithm.
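
The analysis script itself isn't included here, but the reported margins of error appear to be consistent with the normal approximation for a proportion, z * sqrt(p(1-p)/n), with z taken as the 0.95 quantile of the standard normal (about 1.645). That is an inference from the printed numbers, not a confirmed detail of the script; note that a conventional two-sided 95% interval would use z ≈ 1.96 and give somewhat wider margins. A minimal Python sketch under that assumption, with hypothetical function names:

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(skip_ratio, n, confidence=0.95):
    """Normal-approximation margin of error for a skip ratio.

    Assumption: z is the `confidence` quantile of the standard normal,
    which is about 1.645 for confidence=0.95.
    """
    z = NormalDist().inv_cdf(confidence)
    return z * sqrt(skip_ratio * (1 - skip_ratio) / n)

def running_skip_ratio(skipped_flags):
    """Skip ratio over the first k events, for k = 1..len(skipped_flags)."""
    ratios, skips = [], 0
    for k, skipped in enumerate(skipped_flags, start=1):
        skips += bool(skipped)
        ratios.append(skips / k)
    return ratios

print(f"{margin_of_error(10 / 20, 20):.3f}")  # control group: 0.184
print(f"{margin_of_error(10 / 30, 30):.3f}")  # mood 0: 0.142
```

`running_skip_ratio` produces the kind of series the graph describes: each point averages over all events since the beginning, which is why early, poorly informed recommendations keep weighing on the curve.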

## 2 Mar 2017

I've changed the app so whenever it suggests a new song, there is a 10% chance the song will be picked totally at random. When the app records the user's response to the song (skip or listen), it also records whether the song was suggested with our fancy algorithm or by chance. This data will be our “control group.” I've also added some very basic code for allowing the user to manually switch between different moods. Right now they get three different moods. Soon I'll make it so you can create however many moods you want, etc.
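
The control-group mechanism can be sketched in a few lines of Python. The function and parameter names are hypothetical, and `recommend` stands in for whatever the real algorithm does.

```python
import random

def choose_song(library, recommend, control_rate=0.10, rng=random):
    """Return (song, is_control) for the next track.

    library: non-empty list of songs
    recommend: function library -> song (placeholder for the real algorithm)
    control_rate: chance of ignoring the algorithm and picking at random
    rng: the random module or a random.Random instance
    """
    if rng.random() < control_rate:
        return rng.choice(library), True
    return recommend(library), False

# The is_control flag is what gets stored next to the skip/listen response.
song, is_control = choose_song(["a", "b", "c"], lambda lib: lib[0])
```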

My goal right now is to get to the recruiting stage as fast as possible. As soon as I have the code in place to make the app upload its data to me, I'll start looking for some early research participants. While they start to use the app and give me their data, I'll work on the app's usability so I can start recruiting more heavily.

## 23 Feb 2017

I was thinking about how to remedy the following problem. With the current moodless player, it falls back to album and artist skip ratios if there isn't sufficient data about the actual song itself being played. If you have an album with nine songs you hate and one song you love, it's likely the one song will get buried. We record the timestamp of whenever you listen or skip a song, but currently it isn't being used. We could give higher weight to songs that haven't been played. The timestamps could be used so we don't….

OK different idea that just popped into my head: Before we make the app automatically detect the user's mood, we can give them an option to specify what mood they're in. This would make the app work for everyday use, so we could deploy it and start collecting data right away while we figure out the auto mood stuff. The data would probably help with figuring out the mood-detecting algorithm too. We could also start detecting how good the moodless algorithm is. To get the performance of our moodless algorithm, we calculate the number of skips per hour or number of skips per number of listens. To compare this to random shuffling, every ten songs we could give the user a totally random song and use their reaction to see what the performance is. That way we can get both kinds of data, but we don't have to make the user sit through a lot of random shuffling. (maybe throwing in a totally random song here and there would help with adding variety anyway).

The app can keep track of what version of the algorithm is used to pick songs. So if we want to improve the algorithm, all we do is make the change, update the version, push out an update for the app and wait for the new data to be uploaded to our server. That would be a really nice setup to have working. So I think getting that up and running should be the first priority.

After that's done, we could start making tweaks like the one I started to mention in the first paragraph. But more importantly, we'd be able to see if the current algorithm is good enough. If we can verify that the app is pretty good at playing the right songs when the user tells us what mood they're in, then we should be in good shape to do the auto mood stuff. And then we'd have some data we could use to write a paper.

Note on the user specifying their mood: To the app, a new mood would be like a new user account on a computer or something. The app wouldn't infer anything from what the user names the mood (like “rock” or “lighthearted” or whatever), it would basically be the same as “mood 1”, “mood 2” etc. When the user creates and switches to a new mood, the skip ratio data from other moods would be completely ignored. So that's why it's like a new user account: the new user's data is completely separate.

To help in improving the algorithm, I'll add a feature in the app so you can browse through your songs/artists/albums and the app will show you how likely it is to play them. This will help us to find ways to improve the algorithm. I've also read that users have been shown to have more trust/think more favorably towards recommendation systems when the systems are transparent, i.e. the user has some idea of what's going on under the hood instead of it being a black box. So the users would probably appreciate that. It could make them more willing to keep using the app. They might also be more enthusiastic about using it when we push out updates to the algorithm since they might be more cognizant of how our updates are changing the app's behaviour.

## 22 Feb 2017

As of now, the music player app keeps track of skip ratios correctly. I actually finished the skip-behaviour-recording code last Thursday, but there were some bugs I had to iron out. The music player app had some code to do gapless playback that messed things up. While the current song was playing, the next song would be preloaded so it could start right as the current one finishes. The problem is that the new code I wrote doesn't decide what song to play next until the current song ends, so sometimes the next song would be the preloaded one instead of the one my code came up with. I spent a few hours figuring that out, and the problem was fixed after commenting out a few lines of code. So everything is fine and dandy. Now I can start using the app to collect data from my own personal use. After a little while (a few days? a week?) there will hopefully be enough data to start working with.

Now that this part of the coding is over, I think I'm back into a thinking stage. Here's an updated todo list:

1. figure out how to evaluate the performance of the smart shuffle thing (e.g. we could record number of skips/hour and compare this to when we use normal shuffling)
2. figure out an algorithm for detecting what mood the user is in and organizing the data we collect accordingly
3. implement the evaluation code so we can compare the moodless algorithm to normal shuffling and make a nice graph or something.
4. implement the moodful shuffling algorithm.
5. compare the algorithm's performance to normal shuffling and moodless smart shuffling.
6. Set up a web app and make the app upload the data to it.
7. Recruit people to use the app so we can get their data.
8. Measure the performance of the algorithm on all the other people and write a paper about it.

I also just remembered that I'm technically supposed to write a progress report halfway through the semester. So maybe I'll do that soon too.

Side note: Here's an interesting paper about music information retrieval (MIR): http://www.cp.jku.at/research/papers/schedl_jiis_2013.pdf (The Neglected User in Music Information Retrieval Research by Schedl et al.). It basically says that most MIR studies do evaluation by comparing their results to existing databases, etc. instead of having actual humans directly involved in evaluating. The problem is that these “system-based evaluations” don't necessarily match up with how humans would evaluate things (since humans are really complicated). The authors say other related fields (like information retrieval in general) are better about this than MIR is right now. So a cool thing about this project is that we're making an app that actual humans use, and we can use their usage data to directly evaluate our algorithm's performance. So that'll be a good thing to keep in mind.

## 09 Feb 2017

The raw data we collect will be a collection of records with these fields:

- song title
- album
- artist
- timestamp
- skipped?

A record will be created whenever the music player moves to the next song, whether that be from the current song finishing or the user skipping it.
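
As a sketch, one record could look like this in Python. The class name, field names and sample values are made up for illustration; the actual app stores this inside the Android player.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class PlayRecord:
    """One row of raw data, created whenever the player moves to the next song."""
    title: str
    album: str
    artist: str
    timestamp: datetime  # when the player moved on from this song
    skipped: bool        # True if the user skipped, False if the song finished


# Hypothetical example record.
record = PlayRecord(
    title="Song A",
    album="Album X",
    artist="Artist Y",
    timestamp=datetime(2017, 2, 9, 8, 30),
    skipped=False,
)
```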

The current state includes the following information that we care about:

- listen/skip ratios for songs, artists and albums (observable)
- the user's mood (hidden)

(Paul mentioned that I should look into hidden Markov models. From what I can tell, these would be the observable and hidden components of that model.)

Using the timestamp data, we can find out when the user's listen/skip behaviour changes. We'll take such a change to mean the user is in a different mood.

Given a library of songs, we need to come up with a probability for each song that it is the one the user would most enjoy listening to next (so the sum of the probabilities will be 1). To simplify things, let's assume the mood is constant and the current skip ratios correspond to the current mood. We can calculate the probability that if we play a certain song, the user will listen to it (instead of skipping).

For an individual song, this probability is the ratio of the number of times listened to the number of times played (i.e. number of times listened + number of times skipped). If we have played song A for the user 1,000 times and they skipped it 500 times, then our estimate of the probability that the user will listen to the song if we play it again is 0.5. (Remember that we're holding the mood constant, and we also do not take into account that the user will get tired of the song if we play it over and over again.) In this case, we don't care about the artist and album skip ratios. With this much data, the song skip ratio is categorically more important than the other skip ratios.
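
A minimal sketch of that ratio, assuming the play history has been reduced to `(key, skipped)` pairs. The function name is hypothetical, and the key could just as well be an album or artist name.

```python
from collections import Counter


def listen_ratios(plays):
    """Return {key: times listened / times played} for each key.

    plays: iterable of (key, skipped) pairs, where key identifies a song
    (or an album, or an artist) and skipped is a bool.
    """
    played = Counter()
    listened = Counter()
    for key, skipped in plays:
        played[key] += 1
        if not skipped:
            listened[key] += 1
    return {key: listened[key] / played[key] for key in played}


plays = [
    ("A", False), ("A", True),
    ("B", False), ("B", False), ("B", True), ("B", False),
]
ratios = listen_ratios(plays)  # A was finished 1 of 2 times, B 3 of 4 times
```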

However, things are different if we don't have as much data on the individual song. If we played song A for the user only two times and they skipped it once, then the ratio is the same, but we probably don't want to rely solely on it. If we have a lot of data on the album or artist skip ratios, those would probably improve our calculations. This suggests a confidence level for each ratio. Let R_s, R_al, R_ar be the ratio of number of times listened to number of times played for the song, album and artist, respectively. Let C_s, C_al, C_ar be the confidence levels of those ratios, where each confidence level is between 0 and 1. Then we could calculate P(the user will listen to the song | we play the song) like so:

   P = R_s*C_s + (1-C_s)(R_al*C_al + (1-C_al)(R_ar*C_ar))

Given this probability for each song, we could normalize them to get the probability of each song being the best song to play next. Then we would have the data we need to choose the next song to play.
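
Here's a sketch of the formula plus the normalization step. The function names are made up, and two things are assumptions on my part rather than decisions already made: falling back to a uniform distribution when every score is zero, and picking the next song by weighted random sampling.

```python
import random


def listen_probability(r_s, c_s, r_al, c_al, r_ar, c_ar):
    """P = R_s*C_s + (1-C_s)(R_al*C_al + (1-C_al)(R_ar*C_ar))."""
    return r_s * c_s + (1 - c_s) * (r_al * c_al + (1 - c_al) * (r_ar * c_ar))


def normalize(scores):
    """Scale per-song scores so they sum to 1."""
    total = sum(scores.values())
    if total == 0:
        # Assumption: with no usable data, treat every song as equally likely.
        return {song: 1 / len(scores) for song in scores}
    return {song: score / total for song, score in scores.items()}


def choose_next(probabilities, rng=random):
    """Assumption: pick the next song by sampling from the distribution."""
    songs = list(probabilities)
    weights = [probabilities[song] for song in songs]
    return rng.choices(songs, weights=weights, k=1)[0]


# With full confidence in the song ratio, album and artist are ignored.
p_a = listen_probability(0.5, 1.0, 0.9, 0.5, 0.2, 0.5)
probabilities = normalize({"A": p_a, "B": 0.25, "C": 0.25})
next_song = choose_next(probabilities, random.Random(0))
```

One thing the sketch makes visible: when all three confidence levels are 0, the formula gives P = 0, so a song with no data at all would never be sampled unless something else handles it.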

We do need to calculate the confidence level though. If we've played the song 0 times, the confidence should be zero. As the number of times we've played the song increases, the confidence should approach 1. So some sort of simple, mathy function should do fine there.

   confidence(n) = n/(n + a)

where n is the number of times played and a is some constant. The smaller a is, the faster the confidence will increase. Perhaps machine learning could be used to find the optimal value of a?
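
A quick sketch of that function. The default `a = 5` is an arbitrary placeholder, not a tuned value; note that `a` has to be positive, otherwise n = 0 would divide by zero.

```python
def confidence(n, a=5.0):
    """confidence(n) = n / (n + a): 0 when n is 0, approaching 1 as n grows.

    n: number of times played (non-negative).
    a: positive constant; smaller values make confidence rise faster.
    """
    if n < 0 or a <= 0:
        raise ValueError("n must be non-negative and a must be positive")
    return n / (n + a)


never_played = confidence(0)       # no data, no confidence
played_a_times = confidence(5)     # n == a gives exactly one half
played_a_lot = confidence(1000)    # close to 1
```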

But I think machine learning will primarily be needed only when we consider the mood. As a first step, I could implement all this stuff and make a music player that assumes you're always in the same mood. Then the next step would be figuring out how to model mood changes (how hard could it be?).

Here's something we could do: use the timestamps to cluster the data records into listening sessions. We could assume that the mood stays the same during each listening session (I don't think that assumption will be true all the time, so it'd be best to do something fancier down the road so we don't need that assumption). Then we could calculate the skip ratios for each listening session. Then we'd need a way to figure out if the skip ratios for one session are similar to or different from those of another session. If they're similar enough, we can assume the user was in the same mood and combine the ratios. After doing that with all the listening sessions, we'll have a number of “moods”, each with a corresponding set of skip ratio data.
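
Here's one simple way the session clustering could work: start a new session whenever the gap between consecutive timestamps is too large. The 30-minute threshold and the function name are assumptions for illustration; a different clustering method might turn out to work better.

```python
from datetime import datetime, timedelta


def split_into_sessions(timestamps, max_gap=timedelta(minutes=30)):
    """Group timestamps into sessions separated by gaps longer than max_gap."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= max_gap:
            sessions[-1].append(ts)   # close enough: same session
        else:
            sessions.append([ts])     # first record or long gap: new session
    return sessions


timestamps = [
    datetime(2017, 2, 9, 8, 0),
    datetime(2017, 2, 9, 8, 4),
    datetime(2017, 2, 9, 8, 9),
    datetime(2017, 2, 9, 15, 0),
    datetime(2017, 2, 9, 15, 3),
]
sessions = split_into_sessions(timestamps)  # a morning and an afternoon session
```

The skip ratios for each session can then be computed from just the records whose timestamps fall in that session.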

When the user starts listening and we don't know what mood they're in, we can sample songs from each mood set that are likely to be played when the user is in that mood. We can calculate the current skip ratio data and continue sampling until the skip ratio data becomes similar to one of the existing moods. If the data doesn't become similar, the user must be in a different mood. Then we assign them to a “new mood”, and make suggestions solely off of data from the current listening session.

Short-term goals: Figure out if this whole thing is the correct approach to take. If it is, start working on the moodless music player. Also figure out more concretely how to implement the moodful music player. Figure out a model and stuff like that.

## 02 Feb 2017

I finished step one, adding a dummy method to control the next song when skipping. In the code, there's not a huge distinction between when the next song is played because the user skipped or because the song finished. So when the song skips, I get the current position in the song which can be used to see if the song played to completion. I modified the music player so that if you skip a song and it's been playing for less than five seconds, the next song will be the theme song from “Bill Nye the Science Guy.” Once the machine learning stuff is in there, I just have to change the code to play whatever song the algorithm suggests.

Short-term plans: Finish the machine learning tutorial and start messing around with TensorFlow. Design the data model and start implementing it.

## 31 Jan 2017

Up until today, most of the work I've done has been researching and deciding on what platform/tools to use for the music player app. I looked into a few projects that allow you to write a mobile app and then compile it automatically for Android, iOS and other platforms. The two main ones I looked into were NativeScript and Kivy. With NativeScript, you write the app in HTML + CSS + JavaScript, and the project translates that somehow into actual native UI elements on the respective platforms. It looked nice, but Paul brought up the question of whether machine learning libraries were available. I started looking at Kivy because it lets you write the code in Python. Kivy doesn't use native UI elements, so the app would look funny, but that doesn't really matter for our purposes. I made an Android hello world app with Kivy. It was pleasant, but after some more thought, I decided to scrap the whole cross-platform idea and just write a standard Android app in Java. I wanted to try one of the non-Java solutions because

1. I've done a fair amount of normal Android development already and I don't particularly like Java.
2. It would make it easier to target iOS instead of just Android.

I ended up sticking with just Java + Android because

1. I can use an existing open-source music player instead of writing one from scratch (specifically, Vanilla Music Player*). I don't think it would have taken too long to write a basic music player from scratch (we only need the most basic features), but the time needed would still have been nontrivial. Starting with the open-source player will let me jump into machine learning faster.
2. Targeting iOS would be nice, but I think it's more important to get a prototype working as fast as possible. Even if the project stays on just Android, that should be sufficient for the near future. I can always switch to a cross-platform solution or create a separate iOS app later if it really would add enough value.

(*Last year, as part of a different research project, I tested out a bunch of open-source music players for Android. Vanilla Music Player is (relatively) simple and I can actually clone and build the project without getting a million errors (that's a lot more than I can say for the other players. Encore Music Player in particular was a nightmare, even though the precompiled app on the Play store looks really nice).)

So I've been delving into the code for Vanilla, making minor changes to help me understand how it's working. The disadvantage of using an existing player is that it's a relatively large project and I have to get used to the code base. I've also been reading through a machine learning tutorial on the TensorFlow website.

Short-term plans: Identify where the code for shuffling songs is. Figure out how to replace it with my own song-choosing code. Finish the machine learning tutorial and start messing around with TensorFlow.