Differences

This shows you the differences between two versions of the page.

mind:smart-shuffle [2017/03/09 15:00]
obryantj
mind:smart-shuffle [2017/08/22 14:21] (current)
obryantj
Line 1:
 This is Jacob O'Bryant's project to create a music player that learns what songs the user wants to listen to based primarily on when they skip or listen to the currently playing song.
-= plans = 
-# Make the app upload the skip behavior data to a server 
-# make some minor usability improvements 
-# start recruiting users 
-# use the data from other people to verify that our basic, moodless algorithm is an improvement over pure shuffling 
-# start improving the algorithm, including making the app auto-guess the current mood 
  
 = activity notes =
 +
 +== 22 Aug 2017 ==
 +Since finishing the paper, I've been adding various improvements to the app. Mainly, I've been reworking how the app generates the model so that it's more efficient. Originally the app would read through all the listening data every time it started, which takes a long time when there's a lot of listening history. I refactored the code so that the model can be built incrementally as more listening data is generated. Now the model gets serialized to a file and updated whenever the app starts. It still takes about 20 seconds to deserialize the model, but that's a pretty big improvement, and startup time no longer grows with the size of the listening history.
 +
 +I have plans to further improve the performance, though. Instead of serializing and deserializing the entire model, I'm going to store the model in an SQL database and only load the parts that we need at runtime. The size of the model is O(n^2), where n is the number of songs in the user's library. We use the model whenever the user finishes listening to a song: we take the parts of the model that correspond to that song and update the scores for all the possible songs that we could recommend next. So we only need at most n entries from the model at any one time. After we migrate the model to the SQL database, we'll save a lot of time because we won't have to load the entire thing into memory all at once. We'll also save a lot of space, since we won't have to keep the entire model in memory all the time.
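Here's a minimal sketch of the "only load what we need" idea using Python's built-in sqlite3 module. It is not the app's actual code: the table name `model`, the columns `song_a`, `song_b` and `score`, and the helper names are all hypothetical. The point is just that a lookup keyed on the song that finished returns at most n rows instead of the whole O(n^2) model.

```python
import sqlite3


def create_model_db(conn):
    # One row per ordered song pair; the primary key makes lookups by song_a fast.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS model ("
        " song_a INTEGER NOT NULL,"
        " song_b INTEGER NOT NULL,"
        " score REAL NOT NULL,"
        " PRIMARY KEY (song_a, song_b))"
    )


def store_scores(conn, rows):
    # rows is an iterable of (song_a, song_b, score) tuples.
    conn.executemany(
        "INSERT OR REPLACE INTO model (song_a, song_b, score) VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()


def load_row(conn, song_id):
    # Load only the entries for the song that just finished (at most n of them).
    cur = conn.execute(
        "SELECT song_b, score FROM model WHERE song_a = ?", (song_id,)
    )
    return dict(cur.fetchall())


conn = sqlite3.connect(":memory:")
create_model_db(conn)
store_scores(conn, [(1, 2, 0.9), (1, 3, -0.4), (2, 3, 0.1)])
row_for_song_1 = load_row(conn, 1)
```

With this layout the memory needed per recommendation grows with n rather than n^2, at the cost of one indexed query each time a song finishes.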
 +
 +(I've started coding this up already, by the way)
 +
 +
 +== 5 Aug 2017 ==
 +As of today I've finished the Spotify integration. The app now streams new recommendations from Spotify, and it uses audio feature data from Spotify to calculate song similarity to help out with exploration.
 +
 +== 1 Aug 2017 ==
 +Last week I wrote up a description of the algorithm. It's written for a general
 +audience, but it gives a decent overview of what I've done so far. I've included
 +it below. I'm almost done integrating Spotify stuff so that the app can
 +recommend new songs if the user has a Spotify Premium account. Spotify includes
 +a few API endpoints that provide a list of recommendations for the user based on
 +their previous Spotify listening history. I made the app so it takes those
 +recommendations and adds them to the user's library of songs. So as far as the
 +smart shuffle algorithm is concerned, there's no difference between local songs
 +and songs from Spotify. After the algorithm picks a song, the app checks whether
 +it's a Spotify song or a local song and then plays it accordingly.
 +
 +Here's the algorithm description:
 +
 +SMART SHUFFLE ALGORITHM
 +
 +Every time you listen to a song, the algorithm records the current time and
 +whether you finished the song or skipped it. Then your listening history is
 +grouped into sessions. Suppose in the morning you listened to songs A, B and C
 +but skipped song D, and then in the afternoon you listened to songs D, E and F
 +but skipped songs A and B. There would be two sessions:
 +
 +    Session #   Listened    Skipped
 +    1           A, B, C     D
 +    2           D, E, F     A, B
 +
 +The algorithm has no notion of whether you "like" a song. Instead,
 +it thinks about which songs "go well together." In this example, A, B and C go
 +well together, but D does not go well with any of A, B or C. So if you started a
 +new session and listened to song D, the algorithm would be less likely to
 +play A, B or C, but it would be more likely to play E or F. This lets us
 +automatically play the right music for whatever mood you're currently in. It's
 +based on the assumption that your mood doesn't change during a single session.
 +
 +This is called session-based collaborative filtering, and it's the main part of
 +the algorithm. However, there are two more parts: exploration and freshness.
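The two-session example can be sketched in a few lines of Python. This is only an illustration, not the app's real scoring: the rule used here (+1 for every pair of songs listened to in the same session, -1 for every listened/skipped pair) and the function names are assumptions made for the sake of the example.

```python
from collections import defaultdict
from itertools import combinations

# The two sessions from the table above.
sessions = [
    {"listened": {"A", "B", "C"}, "skipped": {"D"}},
    {"listened": {"D", "E", "F"}, "skipped": {"A", "B"}},
]


def build_pair_scores(sessions):
    # Map each unordered song pair to a "goes well together" score.
    scores = defaultdict(int)
    for session in sessions:
        # Songs listened to in the same session go well together.
        for x, y in combinations(sorted(session["listened"]), 2):
            scores[(x, y)] += 1
        # A listened song and a skipped song in the same session do not.
        for x in session["listened"]:
            for y in session["skipped"]:
                scores[tuple(sorted((x, y)))] -= 1
    return scores


def score_candidates(scores, listened_so_far, candidates):
    # Score each candidate against the songs listened to in the current session.
    return {
        c: sum(scores.get(tuple(sorted((c, s))), 0) for s in listened_so_far)
        for c in candidates
    }


pair_scores = build_pair_scores(sessions)
# A new session in which the user has just listened to D.
ranking = score_candidates(pair_scores, {"D"}, ["A", "B", "C", "E", "F"])
```

After listening to D, songs E and F end up with positive scores and A, B and C with negative ones, which matches the behavior described above.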
 +
 +'''Exploration'''
 +
 +The problem with collaborative filtering is that it doesn't work well until you
 +already have a lot of data. When you first start using the app, songs are simply
 +chosen at random. But after the algorithm learns about some of your songs, it
 +must make a choice: should it play a song that it already knows you'll like, or
 +should it randomly pick another song that you haven't listened to very much yet
 +(or at all)?
 +
 +Currently, both choices are given equal time. Whenever the algorithm chooses
 +which song to play next, it partitions all the songs in your library into two
 +sets. The first set contains songs that you've listened to a lot; the second set
 +contains songs you haven't listened to very much. The algorithm chooses the
 +best possible song from each set and then flips a coin to decide which one
 +should be played.
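A rough sketch of that partition-then-coin-flip step is below. The `min_plays` cutoff, the `explore_prob` parameter and the function name are hypothetical, and the per-song scores are taken as given; the real app computes them from the listening history.

```python
import random


def pick_next(songs, play_counts, scores, min_plays=3, explore_prob=0.5, rng=random):
    # Partition the library into well-known and little-known songs.
    known = [s for s in songs if play_counts.get(s, 0) >= min_plays]
    unknown = [s for s in songs if play_counts.get(s, 0) < min_plays]

    # Best candidate from each set (None if the set is empty).
    best_known = max(known, key=lambda s: scores.get(s, 0.0)) if known else None
    best_unknown = max(unknown, key=lambda s: scores.get(s, 0.0)) if unknown else None

    if best_known is None:
        return best_unknown
    if best_unknown is None:
        return best_known
    # Flip a (possibly biased) coin between exploring and exploiting.
    return best_unknown if rng.random() < explore_prob else best_known


songs = ["a", "b", "c", "d"]
play_counts = {"a": 10, "b": 5, "c": 1}
scores = {"a": 0.2, "b": 0.8, "c": 0.5, "d": 0.1}
choice = pick_next(songs, play_counts, scores)
```

With the default `explore_prob=0.5` this is the even split described above; making it a parameter leaves room for tuning the ratio later.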
 +
 +How does it know which song from the second set is the "best" if there isn't
 +much listening data? It chooses randomly for the most part, but it tries to make
 +an educated guess. If you've skipped songs from a certain artist, it will give
 +higher priority to songs from different artists. I have a lot of Beatles music
 +in my collection, but I rarely listen to it. When the algorithm plays a few
 +Beatles songs for me and I skip them, it'll instead explore the other artists I
 +have before it goes back to try a different Beatles song.
 +
 +'''Freshness'''
 +
 +The algorithm also tries to avoid playing the same songs over and over again--it
 +tries to play songs that are "fresh." Before the algorithm plays a song, it'll
 +look at all the times it was played in the past and how long ago it was played
 +each time. The more times the song was played and the more recently it was
 +played, the less fresh it will be, so that song will have a smaller chance of
 +being chosen.
 +
 +But the algorithm also realizes that some songs are OK to play a lot while other
 +songs should be played only once in a while. If you skipped a song and the last
 +time you heard it was yesterday, the algorithm will try to wait a longer period
 +of time before playing it again. If you instead listen to the song, the
 +algorithm will play the song again sooner.
 +
 +
 +'''FUTURE WORK'''
 +
 +In its current state, the algorithm performs well for my use. But there are
 +many ways it could be improved. I think exploration needs work the most. Instead
 +of choosing songs mostly at random, it would be better to find ways to improve
 +our guesses. Some services like last.fm offer song data that could be used to
 +figure out what songs in the user's library are similar to each other.
 +
 +In addition, the 50/50 split between exploration and exploitation could be
 +adjusted. It would be better if the algorithm decided for itself how often to
 +explore new songs instead of always doing it 50% of the time.
 +
 +
 +== 26 July 2017 ==
 +Yesterday I completed the freshness code. Instead of the following:
 +
 +    Freshness = 1 - sum_{i=1 to n} 1/e^(t_i/s)
 +
 +I did this:
 +
 +    Freshness = (1 - 1/e^(t_1/s)) * (1 - 1/e^(t_2/s)) * ... * (1 - 1/e^(t_n/s))
 +
 +That way the freshness score stays between 0 and 1. To figure out a value for
 +s (the memory strength for the individual song), the app minimizes an error
 +function.
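To see why the product form behaves better, here's a small Python sketch of both formulas. The function names and the sample numbers are made up for illustration; `elapsed` holds t_1 to t_n and `s` is the memory strength, in the same time unit.

```python
import math


def freshness_sum(elapsed, s):
    # Original idea: 1 - sum of 1/e^(t_i/s). Can drop below 0 after several recent plays.
    return 1.0 - sum(math.exp(-t / s) for t in elapsed)


def freshness_product(elapsed, s):
    # Revised formula: product of (1 - 1/e^(t_i/s)). Every factor is in [0, 1).
    result = 1.0
    for t in elapsed:
        result *= 1.0 - math.exp(-t / s)
    return result


# A song played 0.5, 1 and 2 days ago, with a memory strength of 2 days.
elapsed_days = [0.5, 1.0, 2.0]
s = 2.0
sum_score = freshness_sum(elapsed_days, s)
product_score = freshness_product(elapsed_days, s)
```

Here the sum version comes out negative because the three decay terms add up to more than 1, while the product version stays between 0 and 1. A song that has never been played gets a product of 1.0, i.e. fully fresh.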
 +
 +Suppose a song has been played n times in the past. The error function will take
 +n data points. The ith data point includes t_1 to t_(i-1), i.e. the amounts of
 +time that have passed since each previous time we played that song. The ith data
 +point also includes whether or not the song was skipped on the ith time we
 +played it. We use that skip value as the observed value for what the freshness
 +function "should" have been. If the song was skipped, the observed value is 0;
 +otherwise the value is 1. In other words, we would expect a song with a high
 +freshness value (i.e. near 1) to be listened to, and we would expect a song with
 +a low freshness value (near 0) to be skipped. Given a value for s and the n data
 +points, the error function returns the sum of the squares of the errors.
 +
 +To minimize the error function, we simply call it with a bunch of different
 +(constant) strength values, e.g. {0.5, 1, 1.5, ..., 30}. It's kind of janky, but it
 +was easy to code and it seems to be working well enough. I tested the
 +strength-figuring-out code on my own listening history. As expected, it gave low
 +memory strength values to songs that I like and high memory strength values to
 +songs that I didn't like. In other words, the songs that I liked were given a
 +memory strength value that would allow them to be played with a high frequency
 +and vice versa.
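The fitting procedure can be sketched like this. It follows the description above (squared error against an observed value of 0 for a skip and 1 for a listen, then a brute-force search over 0.5, 1, ..., 30), but the function names and the data layout are assumptions, and the sample histories are invented.

```python
import math


def freshness(elapsed, s):
    # Product-form freshness; elapsed holds the times since each earlier play.
    result = 1.0
    for t in elapsed:
        result *= 1.0 - math.exp(-t / s)
    return result


def squared_error(s, data_points):
    # data_points is a list of (elapsed_times_before_this_play, skipped) pairs.
    total = 0.0
    for elapsed, skipped in data_points:
        observed = 0.0 if skipped else 1.0
        total += (freshness(elapsed, s) - observed) ** 2
    return total


def fit_strength(data_points, candidates=None):
    # Brute-force search over a fixed grid of memory strengths.
    if candidates is None:
        candidates = [0.5 * k for k in range(1, 61)]  # 0.5, 1.0, ..., 30.0
    return min(candidates, key=lambda s: squared_error(s, data_points))


# Invented histories (times in days): one song always listened to, one always skipped.
liked = [([], False), ([1.0], False), ([1.0, 2.0], False)]
disliked = [([], True), ([1.0], True), ([1.0, 2.0], True)]
liked_strength = fit_strength(liked)
disliked_strength = fit_strength(disliked)
```

On these toy histories the search picks the smallest strength for the song that was always listened to and the largest for the one that was always skipped, which is the pattern reported above.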
 +
 +I thought all of that was going to take me longer than it did, so I guess
 +I'll move on to something else now. I'll be using the app and trying to evaluate
 +it qualitatively. If it seems to work well, I'll probably focus on adding
 +support for cloud music services so we can hopefully get a lot more users and
 +get a more quantitative analysis. Otherwise, I'll keep working on the algorithm
 +and get by with the users we have.
 +
 +== 24 July 2017 ==
 +I have returned. The wiki logged me out as I was writing my last entry in June
 +and I lost part of it, and I never got around to finishing it. So I'll just
 +start fresh here with the state of the project.
 +
 +We've got about nine users and a total of about 2,200 listening events. The data
 +we have shows that the skip ratio for the algorithm is about 10 percentage
 +points lower than the skip ratio for pure random shuffling.
 +
 +Things that could be done next include the following:
 +
 +GETTING MORE USERS:
 + - create a version of the app for iOS
 + - add support for cloud music services
 +
 +IMPROVING THE ALGORITHM:
 + - add content-based recommendation
 + - improve freshness score handling
 +
 +For the next step I'd like to do "improve freshness score handling." If I can do
 +it quickly, I'd also like to do "add support for cloud music services."
 +
 +Here's the plan for freshness stuff. Right now, the freshness score (a value
 +from 0 to 1) is calculated using this formula:
 +
 +    Freshness = 1 - 1/e^(t/s)
 +
 +where t is the amount of time elapsed since the
 +song was last played and s is a parameter representing the strength of the
 +user's memory (the higher s is, the longer it will take for the song's freshness
 +to recover). I'd like to change it to something like this:
 +
 +    Freshness = 1 - sum_{i=1 to n} 1/e^(t_i/s)
 +
 +where n is the number of times the song has been played and t_i is the amount of
 +time elapsed since the ith time the song was played. This would make the
 +freshness score take into account all the times the user listened to the song,
 +not just the most recent time.
 + 
 +The second thing I want to do with freshness is figure out the appropriate value
 +for s. It should be different for each song. The app should dynamically learn
 +the value of s for each song. Given a value of s for a particular song, we could
 +calculate an error for that song like so. For each time that we played the song
 +in the past, calculate the freshness f of that song at that time. If the user
 +listened to the song, the error for that instance is 1 - f. If the user skipped
 +the song, the error is f. Then we sum the errors for each instance to get a
 +total error for that value of s. Then we just find a value for s such that the
 +total error is minimized.
 +
 +So I want to implement a basic version of the freshness stuff as fast as I can
 +and then hopefully get the cloud music stuff working. To really get some useful
 +data I'll need a lot more users, and to get more users I really need to get the
 +app to work with cloud music services. So if I could get both of those knocked
 +out before summer ends, I'd be tickled pink.
 +
 +
 +== 23 June 2017 ==
 +I've done some recruiting and we've got a handful of users now. I'll still make some more online posts before I start EFY in two days. One issue with recruiting is that there seem to be so few people who 1) have their own music collections and 2) store them locally on their phones. Even if they are in the group of people who still buy music, they often use something like Google Play Music to store and sync their music for them, so the app doesn't work for them. I did find a really cool open-source music player called Tomahawk. It can play songs from your local collection, but it can also seamlessly play songs from cloud collections like Google Play Music or Amazon Music. It can also play from other sources like Spotify and SoundCloud. I'd like to transition my project over to using that app instead of the current one (Vanilla music player). I toyed with the idea of doing it before summer ends so that we could recruit users that store their music in the cloud. However, I'm thinking it would be better to wait until summer term is over to do that. I'll just focus on recruiting people that can use the app as it is right now so that I can use the rest of the semester to keep working on the algorithm.
 +
 +Speaking of the algorithm, I've been thinking about something that will hopefully be a major improvement. For freshness, there's a "memory strength" parameter that controls how long it takes a song to become fresh again after you play it. I currently have it set so that most of the freshness is recovered after two days. Here's the brilliant idea: keep individual memory strength parameters for each song. Whenever the user skips a song, increase the parameter for that song (so it'll take longer for the song to regain its freshness). This is brilliant because we can think of the memory strength parameter as a measure of how much the user likes the song, e.g. if you hate a song, you might say that the appropriate amount of time to wait before playing it a second time is 1,000 years. The more you like a song, the more frequently you 
 +
 +== 10 June 2017 ==
 +I've finally got a system that works really well for me personally. The main recent change is I've altered how the app combines the main score with the "content-based" (artist similarity) score. Before, we computed a confidence level for the main score and then used a weighted average between the two scores. However, all the songs we hadn't played yet had the same content score, so they all got lumped together. That made the app either 1) always play songs we hadn't heard before, or 2) never play songs we hadn't heard before. The exploration code only helped for songs we've listened to a little but not very much.
 +
 +Here's how it's done now: We divide all the possible next songs into two lists. If the main score confidence level for a song is high enough, we put it in the first list; otherwise, we put it in the second list. 80% of the time, we choose the best song from the first list to play. The rest of the time we choose the best song from the second list. It's a rough solution, but it's working decently.
 +
 +The next step is to recruit users. But for improving the algorithm, here are the things I'm thinking about:
 +
 +1. Figure out a way to dynamically adjust the ratio for picking a content-based song instead of always using 80%/20%. This could take into account the size of the two lists and the scores of the songs in both lists in addition to other things.
 +
 +2. Use a better freshness function. The formula now only uses the most recent time, so there would be no difference between a song that had been played a thousand times and a song that had been played only once if they both had been played recently. So it'd be nice to incorporate more of the play history. Also, currently the freshness of a song almost fully recovers after about two days. It might be better to do a logarithmic thing that keeps improving over time without an asymptote. And the speed at which the freshness recovers should be adjusted depending on the user and the individual song.
 +
 +3. Get a real content-based score for the songs we don't know anything about. We could use last.fm tags to try and get something, but I'm not sure if that'll really be so helpful.
 +
 +== 07 June 2017 ==
 +I've mainly done a lot of coding over the last month. I heavily refactored the way the app does recommendations so the time complexity is a lot better. Startup still takes a while, but recommendations can be done quickly. I also implemented a simple artist-similarity content model to help make suggestions when there's not a lot of usage data about a particular song. In effect, this means that when we're guessing which songs to play (because we can't make an informed recommendation), we'll be less likely to suggest a song from a certain artist if the user has already skipped a different song from the same artist.
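One simple way to get that effect is to score each artist by the fraction of their songs the user did not skip. The sketch below is a guess at the general shape, not the app's actual model; the function names, the neutral default of 0.5 for unseen artists and the sample events are all assumptions.

```python
from collections import defaultdict


def artist_scores(events):
    # events is an iterable of (artist, skipped) pairs from the listening history.
    plays = defaultdict(int)
    skips = defaultdict(int)
    for artist, skipped in events:
        plays[artist] += 1
        if skipped:
            skips[artist] += 1
    # Score = fraction of this artist's plays that were not skipped.
    return {artist: 1.0 - skips[artist] / plays[artist] for artist in plays}


def content_score(artist, scores, default=0.5):
    # Artists we have no data for get a neutral score.
    return scores.get(artist, default)


events = [("The Beatles", True), ("The Beatles", True), ("Queen", False)]
scores = artist_scores(events)
beatles = content_score("The Beatles", scores)
unheard = content_score("Some New Artist", scores)
```

An unplayed song by an artist whose other songs were skipped then ranks below songs by artists the user hasn't tried yet.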
 +
 +== 12 May 2017 ==
 +I've implemented model-based CF, freshness and exploration using a really
 +simple (and pretty arbitrary) margin-of-error function. Here are some things
 +I'd like to do:
 +
 +* add some simple content-based stuff so we can make guesses for songs the user hasn't listened to yet.
 +* figure out how to quantitatively measure the algorithm's performance
 +* use the quantitative feedback to iterate on the recommendation algorithm and make it really good
 +* recruit a bunch of users
 +* make the code more efficient. It takes ~10 seconds to start, and then it takes ~5 seconds between songs to make the next recommendation.
 +
 +
 +
 +==29 Apr 2017==
 +My three main future work points from last semester are 1) exploration vs. exploitation, 2) freshness, 3) model-based collaborative filtering. I wanted to start with exploration, so this week I've read this paper in depth:
 +[[http://www.terasoft.com.tw/conf/ismir2014/proceedings/T081_140_Paper.pdf|ENHANCING COLLABORATIVE FILTERING MUSIC RECOMMENDATION BY BALANCING EXPLORATION AND EXPLOITATION]]. It was one of the papers I cited in my literature review. It turns out that the methods used in this paper involve both freshness and model-based CF. So basically I just need to replicate what that paper did. It's a little different because they used explicit 5-star rating data while we're using implicit skipping data. The stuff they did on exploration vs. exploitation is very hard for me to understand. I think I've got a decent grip on model-based CF and freshness though. And their implementation of exploration depends on model-based CF, so I'll probably do that first. Freshness is super easy; it'd probably take less than half an hour to get the whole thing written and tested. The exploration part is gonna be a beast though. It involves Bayesian statistics and machine learning, neither of which I've used before. So if I could implement those three things and then get a lot of people to use the app so I can have a real evaluation, I would consider this a very successful summer.
 +
 +That paper though... I understand it for the most part up until section 3.2.2, at which point it's all Greek to me (literally, ha ha). And I have no idea what in the heck is going on with section 3.3. But by the end of the summer, I'd like to understand everything.
 +
 +==13 Apr 2017==
 +I've implemented a working session-based collaborative filtering recommendation thing. I've replaced the original moodless algorithm with it, so there's no longer an option in the app to specify your mood. I've used it a little bit so far and it seems to be working well. I cleared the database on my phone, so for the next week I'll be using just the new algorithm from a clean slate.
 +The next step is making the algorithm better at exploring instead of just exploiting what it already knows. You can run a cool demo of the system with the code [[https://github.com/tooke7/reco|here]].
 +
 +
 +==06 Apr 2017==
 +I've finally finished the literature review of music recommendation that I had to do for ENGL 316. The paper is available [[http://jacobobryant.com/about/mrs.pdf|here]]. Writing it has been extremely helpful for seeing how this project can fit into the existing research on music recommendation. For instance, it helped me figure out how to implement the next step in the moodful algorithm, which I'll describe now.
 +
 +At first, the system will use a simple content-based approach to pick songs. Something based solely on artist similarity might work well enough. It will have to balance keeping the user happy with trying to explore new music. Apparently this is a well-studied problem in reinforcement learning (exploration vs. exploitation). I found a couple of papers that cover this problem specifically in music recommendation. According to the authors (the same people wrote both papers), very few recommendation systems incorporate this idea. Instead they use the greedy approach (100% exploitation). The two papers study exploration vs. exploitation in content-based and collaborative filtering, respectively. As I implement active learning in this app, I'll dig deeper into those papers to see how they handle the problem.
 +
 +As the app collects more data, it'll gradually switch over to a session-based collaborative filtering model. At first it'll use a simple memory-based collaborative approach, so no machine learning will be necessary. Once we get everything working, I could implement a model-based approach in order to improve the recommendations. We won't need the model-based approach to handle scaling since the algorithm will just be working on data from the current user (so it shouldn't be a ton of data). But once we get tons of users and start recommending new songs, scaling could become an issue.
 +
 +The papers I've mentioned are all in the references list at the end of my lit review paper. Writing that paper was seriously such a beneficial experience. It's given me a lot of direction for this project. I think it'll basically set the stage for what I'll be doing over the entire next semester of working on this project.
 +
 +But yeah, so I have a really clear direction now. So it's time to start coding again.
 +
 +(also--I've advertised the app on the Play store a little bit, but only like one person has started using it. I could try to recruit harder for more users, but I'm really tempted to finish the session-based collaborative filtering algorithm first because that should make the app way better)
 +
 +Oh, and one more thing. I figured out how to create a Java class in Clojure (a dialect of Lisp that runs on the JVM) and add it to an Android project, so I'll be writing all the fancy new algorithm stuff in Clojure. I was feeling kinda bummed about doing it in Java, so this will be really nice. The normal Java part of the app will handle interfacing with the Android framework, but all the recommendation logic will be encapsulated within the Clojure code.
 +
 +
 +==23 Mar 2017==
 +I've done a lot of stuff over the past week. The app now uploads the skipping data to a web app I have running on DigitalOcean. It just uploads the raw sqlite3 db file. The app generates a random user id the first time it uploads, so the server uses that to know if a subsequent upload is an update for an existing user or a new db for a new user. I've also cleaned up various UI things and packaged everything up for Google Play. It's on the store now: [[https://play.google.com/store/apps/details?id=com.jacobobryant.moody.vanilla&hl=en|Moody Music Player]]. I also reworked how the skipping data is saved. At first, the database had fields for "mood" (integer) and "skipped" (boolean). To separate songs that were suggested randomly instead of with the algorithm, I would set mood to -1. I realized this was bad because you need to have a separate control group for each mood. Also, this prevented feedback from the random shuffling from being used by the algorithm. So I added an "algorithm" field (integer). 0 means control; otherwise it's the version of the algorithm. Right now the version is just 1, but that number can be incremented as I modify the algorithm.
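To make the new layout concrete, here's a sketch of a table with the three fields mentioned above and a query that compares skip ratios between the control group and the algorithm. The table name `events`, the `song` column and the sample rows are hypothetical; only the `mood`, `skipped` and `algorithm` fields come from the description.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events ("
    " song TEXT NOT NULL,"
    " mood INTEGER NOT NULL,"
    " skipped INTEGER NOT NULL,"    # 0 = listened, 1 = skipped
    " algorithm INTEGER NOT NULL)"  # 0 = control (random shuffle), else algorithm version
)

# Invented sample events: three from the control group, four from algorithm version 1.
rows = [
    ("a", 0, 1, 0), ("b", 0, 1, 0), ("c", 0, 0, 0),
    ("a", 0, 0, 1), ("b", 0, 0, 1), ("c", 0, 1, 1), ("d", 0, 0, 1),
]
conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", rows)


def skip_ratios(conn):
    # Average of the 0/1 skipped flag per algorithm value is the skip ratio.
    cur = conn.execute(
        "SELECT algorithm, AVG(skipped) FROM events"
        " GROUP BY algorithm ORDER BY algorithm"
    )
    return dict(cur.fetchall())


ratios = skip_ratios(conn)
```

Because the control flag lives in its own column, randomly shuffled plays keep their real mood value and can still feed the algorithm.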
 +
 +I need to get people to start downloading the app so I can make sure it works for other people too, but other than that, I believe it's time to start working on the algorithm again. I need to figure out 1) how to implement the mood-detecting part and 2) how to improve the moodless part so it can learn about your music preferences faster. And I need a good way to test the changes I make.
 +
 +==14 Mar 2017==
 +I found and fixed a critical bug. When the app read in old skip event data, each event was counted as "listened", whether or not the song was actually listened to or skipped. So that explains why the algorithm wasn't outperforming pure shuffling. I fixed the bug last Saturday, and I've generated about 100 skip/listen events since then. Here's the data now:
 +
 +    control (n=50, confidence=0.950)
 +    skip ratio: 0.680
 +    margin of error: 0.109
 +
 +    mood 0 (n=50, confidence=0.950)
 +    skip ratio: 0.280
 +    margin of error: 0.104
 +
 +[{{ mind:obryant:ratios_2.png }}]
 +
 +So thank goodness the algorithm is actually doing something good. I didn't want to get other users if the app sucked, but now I'm feeling a lot better about getting other people to start using it.
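For reference, here's a sketch of the usual normal-approximation margin of error for a proportion, applied to the skip ratios above. It is not the actual analysis script, and the function name and the choice of z are assumptions. The reported margins are consistent with z of about 1.645 (the 95th percentile of the standard normal); a two-sided 95% interval would use about 1.96 and give somewhat wider margins.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(p, n, z):
    # Normal-approximation margin of error for a sample proportion p with n samples.
    return z * sqrt(p * (1 - p) / n)


z_95th = NormalDist().inv_cdf(0.95)        # about 1.645
z_two_sided = NormalDist().inv_cdf(0.975)  # about 1.960

control_margin = margin_of_error(0.68, 50, z_95th)
mood0_margin = margin_of_error(0.28, 50, z_95th)
control_margin_two_sided = margin_of_error(0.68, 50, z_two_sided)
mood0_margin_two_sided = margin_of_error(0.28, 50, z_two_sided)
```

Even with the wider two-sided margins (roughly 0.129 and 0.124), the intervals around 0.68 and 0.28 don't overlap, so the gap between the control group and the algorithm holds up either way.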
 +
 +
 ==9 Mar 2017==
 Latest data:
mind/smart-shuffle.1489096833.txt.gz · Last modified: 2017/03/09 15:00 by obryantj
CC Attribution-Share Alike 4.0 International