Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revision Previous revision
Next revision
Previous revision
mind:smart-shuffle [2017/07/24 22:57]
obryantj
mind:smart-shuffle [2017/08/22 14:21] (current)
obryantj
Line 2: Line 2:
  
 = activity notes = = activity notes =
 +
 +== 22 Aug 2017 ==
 +Since finishing writing the paper, I've been adding a few various improvements to the app. Mainly, I've been reworking how the app generates the model so that it's more efficient. Originally the app would read through all the listening data every time the app starts. This takes a long time when there'​s a lot of listening history data. I refactored the code so that the model could be built incrementally as more listening data is generated. Now the model gets serialized to a file and updated whenever the app starts. It still takes about 20 seconds to deserialize the model, but that's a pretty big improvement,​ and the time complexity is constant with the size of the listening history.
 +
 +I have plans to further improve the performance,​ though. Instead of serializing and deserializing the entire model, I'm going to store the model in an SQL database and only load the parts that we need during runtime. The size of the model is O(n^2) where n is the number of songs in the user's library. We use the model whenever the user finishes listening to a song. We take the parts of the model that correspond to that song and update the scores for all the possible songs that we could recommend next. So we only need at most n entries from the model at any one time. After we migrate the model to the SQL database, we'll save a lot of time because we won't have to load the entire thing into memory all at once. (we'll also save a lot of space since we don't have to keep the entire model in memory all the time).
 +
 +(I've started coding this up already, by the way)
 +
 +
 +== 5 Aug 2017 ==
 +As of today I've finished the Spotify integration. The app now streams new recommendations from Spotify, and it audio feature data from Spotify to calculate song similarity to help out with exploration.
 +
 +== 1 Aug 2017 ==
 +Last week I wrote up a description of the algorithm. It's written for a general
 +audience, but it gives a decent overview of what I've done so far. I've included
 +it below. I'm almost done integrating Spotify stuff so that the app can
 +recommend new songs if the user has a Spotify premium account. Spotify includes
 +a few API endpoints that provide a list of recommendations for the user based on
 +their previous spotify listening history. I made the app so it takes those
 +recommendations and adds them to the user's library of songs. So as far as the
 +smart shuffle algorithm is concerned, there'​s no difference between local songs
 +and songs from spotify. After the algorithm picks a song, the app checks if it's
 +a spotify song or a normal song and then takes measures accordingly to play it.
 +
 +Here's the algorithm description:​
 +
 +SMART SHUFFLE ALGORITHM
 +
 +Every time you listen to a song, the algorithm records the current time and
 +whether you finished the song or skipped it. Then your listening history is
 +grouped into sessions. Suppose in the morning you listened to songs A, B and C
 +but skipped song D, and then in the afternoon you listened to songs D, E and F
 +but skipped songs A and B. There would be two sessions:
 +
 +    Session #   ​Listened ​   Skipped
 +    1           A, B, C     D
 +    2           D, E, F     A, B
 +
 +The algorithm doesn'​t have a notion about if you "​like"​ a song or not. Instead,
 +it thinks about what songs "go well together."​ In this example, A, B and C go
 +well together, but D does not go well with either A, B or C. So if you started a
 +new session and you listened to song D, the algorithm would be less likely to
 +play A, B or C, but it would be more likely to play E or F. This allows us to
 +automatically play the correct music for whatever mood you're currently in. It's
 +based on the assumption that your mood doesn'​t change during a single session.
 +
 +This is called session-based collaborative filtering, and it's the main part of
 +the algorithm. However, there are two more parts: exploration and freshness.
 +
 +'''​Exploration'''​
 +
 +The problem with collaborative filtering is that it doesn'​t work well until you
 +already have a lot of data. When you first start using the app, songs are simply
 +chosen at random. But after the algorithm learns about some of your songs, it
 +must make a choice: should it play a song that it already knows you'll like, or
 +should it randomly pick another song that you haven'​t listened to very much yet
 +(or at all)?
 +
 +Currently, both choices are given equal time. Whenever the algorithm chooses
 +which song to play next, it partitions all the songs in your library into two
 +sets. The first set contains songs that you've listened to a lot; the second set
 +contains songs you haven'​t listened to very much. The alghorithm chooses the
 +best possible song from each set and then flips a coin to decide which one
 +should be played.
 +
 +How does it know which song from the second set is the "​best"​ if there isn't
 +much listening data? It chooses randomly for the most part, but it tries to make
 +an educated guess. If you've skipped songs from a certain artist, it will give
 +higher priority to songs from different artists. I have a lot of Beatles music
 +in my collection, but I rarely listen to it. When the algorithm plays a few
 +Beatles songs for me and I skip them, it'll instead explore the other artists I
 +have before it goes back to try a different Beatles song.
 +
 +'''​Freshness'''​
 +
 +The algorithm also tries to avoid playing the same songs over and over again--it
 +tries to play songs that are "​fresh."​ Before the algorithm plays a song, it'll
 +look at all the times it was played in the past and how long ago it was played
 +each time. The more times the song was played and the more recently it was
 +played, the less fresh it will be, so that song will have a smaller chance of
 +being chosen.
 +
 +But the algorithm also realizes that some songs are OK to play a lot while other
 +songs should be played only once in a while. If you skipped a song and the last
 +time you heard it was yesterday, the algorithm will try to wait a longer period
 +of time before playing it again. If you instead listen to the song, the
 +algorithm will play the song again sooner.
 +
 +
 +'''​FUTURE WORK'''​
 +
 +In it's current state, the algorithm performs well for my use. But there are
 +many ways it could be improved. I think exploration needs work the most. Instead
 +of choosing songs mostly at random, it would be better to find ways to improve
 +our guesses. Some services like last.fm offer song data that could be used to
 +figure out what songs in the user's library are similar to each other.
 +
 +In addition, the 50/50 split between exploration and exploitation could be
 +adjusted. It would be better if the algorithm decided how often would be best to
 +explore new songs instead of always doing it 50% of the time.
 +
 +
 +== 26 July 2017 ==
 +Yesterday I completed the freshness code. Instead of the following:
 +
 +    Freshness = 1 - sum_{i=1 to n} 1/e^(t_i/s)
 +
 +I did this:
 +
 +    Freshness = (1 - 1/​e^(t_1/​s)) * (1 - 1/​e^(t_2/​s)) * ... * (1 - 1/​e^(t_n/​s))
 +
 +That way the freshness score stays between 0 and 1. To figure out a value for
 +s (the memory strength for the individual song), the app minimizes an error
 +function. ​
 +
 +Suppose a song has been played n times in the past. The error function will take
 +n data points. The ith data point includes t_1 to t_(i-1), i.e. the amounts of
 +time that have passed since each previous time we played that song. The ith data
 +point also includes whether or not the song was skipped on the ith time we
 +played it. We use that skip value as the observed value for what the freshness
 +function "​should"​ have been. If the song was skipped, the observed value is 0;
 +otherwise the value is 1. In other words, we would expect a song with a high
 +freshness value (i.e. near 1) to be listened to, and we would expect a song with
 +a low freshness value (near 0) to be skipped. Given a value for s and the n data
 +points, the error function returns the sum of the squares of the errors.
 +
 +To minimize the error function, we simply call it with a bunch of different
 +(constant) strength values, e.g. {0.5,​1,​1.5,​...,​30}. It's kind of janky, but it
 +was easy to code and it seems to be working well enough. I tested the
 +strength-figuring-out code on my own listening history. As expected, it gave low
 +memory strength values to songs that I like and high memory strength values to
 +songs that I didn't like. In other words, the songs that I liked were given a
 +memory strength value that would allow them to be played with a high frequency
 +and vice versa.
 +
 +I thought that all was going to take me longer to do than it did, so I guess
 +I'll be on to something else now. I'll be using the app and trying to evaluate
 +it qualitatively. If it seems to work well, I'll probably focus on adding
 +support for cloud music services so we can hopefully get a lot more users and
 +get a more quantitative analysis. Otherwise, I'll keep working on the algorithm
 +and get by with the users we have.
 +
 == 24 July 2017 == == 24 July 2017 ==
 I have returned. The wiki logged me out as I was writing my last entry in June I have returned. The wiki logged me out as I was writing my last entry in June
mind/smart-shuffle.1500958628.txt.gz ยท Last modified: 2017/07/24 22:57 by obryantj
Back to top
CC Attribution-Share Alike 4.0 International
chimeric.de = chi`s home Valid CSS Driven by DokuWiki do yourself a favour and use a real browser - get firefox!! Recent changes RSS feed Valid XHTML 1.0