Metalearning
From NNML
Objective
The objective of this project is to do large-scale metalearning using various techniques and get several papers accepted in top-tier venues so we can graduate with good jobs.
Action Items
Overall
 First, create an extensible metadata set available to the community, similar to the UCI data repository.

Look at machine-learned ranking (MLR) algorithms. This will let us compare our ranking with other ranking algorithms.
 Maybe look for some implementations of MLR algorithms
 Also look at the evaluation metrics that they use for ranking
 Look at getting more familiar with Recommender System stuff that is already out there.
Our Algorithms

Unsupervised backpropagation (maybe other collaborative filtering algorithms as well)
 Just the accuracies
 Accuracies and the data set metafeatures
 Also try using the rankings instead of the accuracies
 Neural network trained with the data set metafeatures
 Neural network trained with latent variables from unsupervised backprop
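The collaborative-filtering idea above can be sketched as matrix factorization over the (data set x algorithm) accuracy matrix, imputing the missing entries. Everything below (the rank, learning rate, regularization, and the toy matrix) is an illustrative assumption, not a project setting:

```python
import numpy as np

def factorize(A, mask, rank=2, lr=0.05, reg=0.02, epochs=1000, seed=0):
    """A: (n_datasets x n_algorithms) accuracy matrix; mask: 1 where observed.
    Fits low-rank factors by SGD over the observed entries only."""
    rng = np.random.default_rng(seed)
    n, m = A.shape
    U = 0.1 * rng.standard_normal((n, rank))
    V = 0.1 * rng.standard_normal((m, rank))
    rows, cols = np.nonzero(mask)
    for _ in range(epochs):
        for i, j in zip(rows, cols):
            err = A[i, j] - U[i] @ V[j]           # residual on one observed cell
            U[i] += lr * (err * V[j] - reg * U[i])
            V[j] += lr * (err * U[i] - reg * V[j])
    return U @ V.T                                 # dense completed matrix

# Toy example: 4 data sets x 3 algorithms, one accuracy unobserved.
A = np.array([[0.90, 0.80, 0.70],
              [0.85, 0.75, 0.65],
              [0.60, 0.70, 0.80],
              [0.88, 0.78, 0.00]])   # last entry is missing
mask = np.ones_like(A)
mask[3, 2] = 0
pred = factorize(A, mask)            # pred[3, 2] is the imputed accuracy
```

The same completed matrix would then be ranked per data set to produce a recommendation.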
Things to think about
 Think about how to incorporate the time that an algorithm takes to run
 Think about how to incorporate parameter settings into the algorithm such that we can rank the algorithm together with its parameter settings.
Competitors
 Brazdil's 5-NN with his metafeatures

Maybe some ranking algorithms??
 RT Rank is an available implementation that we should try
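A minimal sketch of the Brazdil-style competitor: find the k data sets whose metafeatures are nearest the new one, then average the algorithm ranks observed on those neighbours. The distance metric (Euclidean), k, and all values below are illustrative assumptions:

```python
import numpy as np

def knn_rank(meta_train, acc_train, meta_new, k=2):
    """Return algorithm indices ordered best-first for the new data set."""
    d = np.linalg.norm(meta_train - meta_new, axis=1)   # distance in metafeature space
    nn = np.argsort(d)[:k]                              # k nearest data sets
    # Rank algorithms on each neighbour (0 = best), then average the ranks.
    ranks = np.argsort(np.argsort(-acc_train[nn], axis=1), axis=1)
    return np.argsort(ranks.mean(axis=0))

# Toy data: 3 known data sets (2 metafeatures each), 3 algorithms.
meta_train = np.array([[0.1, 5.0], [0.2, 4.0], [0.9, 1.0]])
acc_train  = np.array([[0.9, 0.7, 0.5],
                       [0.8, 0.6, 0.4],
                       [0.3, 0.6, 0.9]])
order = knn_rank(meta_train, acc_train, np.array([0.15, 4.5]), k=2)
# The new data set resembles the first two, so algorithm 0 is ranked first.
```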
Evaluation Metrics
 ARR (adjusted ratio of ratios from Brazdil)
 Look at the machine-learned ranking (MLR) algorithms
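For one data set, ARR can be sketched as a success-rate ratio discounted by the log of the run-time ratio; AccD trades accuracy against time. The formula follows my reading of Brazdil's paper, and the AccD value and numbers below are only examples (Brazdil then aggregates these ratios across algorithm pairs and data sets):

```python
import math

def arr(sr_p, sr_q, t_p, t_q, acc_d=0.1):
    """Adjusted ratio of ratios of algorithm p over q on one data set.
    sr_*: success rates (accuracies); t_*: run times; acc_d: accuracy/time trade-off."""
    return (sr_p / sr_q) / (1.0 + acc_d * math.log(t_p / t_q))

# Algorithm p is slightly more accurate but 10x slower than q,
# so the time penalty pulls its ARR below 1.
score = arr(sr_p=0.85, sr_q=0.80, t_p=100.0, t_q=10.0)
```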
Data Points
 Continue to gather more accuracies from random parameter settings
 Look at this data set: HERE
@inproceedings{Reif2012, author = {Matthias Reif}, title = {A Comprehensive Dataset for Evaluating Approaches of Various Meta-learning Tasks}, booktitle = {ICPRAM 2012 - Proceedings of the 1st International Conference on Pattern Recognition Applications and Methods, Volume 1, Vilamoura, Algarve, Portugal, 6-8 February, 2012}, year = {2012}, pages = {273--276}, }
Other Thoughts/Ideas

We should see how we do as a ranking algorithm as well, and/or whether we can improve on collaborative filtering
 There is some data from Microsoft research
 LETOR , also from Microsoft research has a number of data sets, and other information that could prove useful if we choose to also pursue this course.
 How to add a single instance after training up a collaborative filtering model.

How to add results from a novel learning algorithm/parameter settings

Make it incremental

Here are some references for incremental recommender systems (collaborative filtering):
 Incremental Collaborative Filtering for Highly-Scalable Recommendation Algorithms
 Item-Based and User-Based Incremental Collaborative Filtering for Web Recommendations
 Incremental Collaborative Filtering Considering Temporal Effects
 Incremental Collaborative Filtering recommender based on Regularized Matrix Factorization
 Incremental SVD-Based Algorithms for Highly Scalable Recommender Systems

Experiments

Collaborative filtering (Just accuracies and accuracies with metafeatures)
 For the initial experiments, run with just the accuracies missing but with the metafeatures in the training data

Models to try:
 Matrix Factorization
 Nonlinear PCA

Unsupervised Backprop
 Try it without a hidden layer so that intrinsic variables can be computed for novel instances
 Fuzzy K-Means

Classification-based approaches
 Make sure to use a neural network with Backprop
 Combination of classification with collaborative filtering
 Ranking algorithms
Logan
5/2/2013

Get more data points (i.e. data sets). More data sets can be found at:
 UCI data repository
 Promise data repository
 http://mldata.org
 Search for additional repositories if needed?

Run learning algorithms over the new data sets
 Set upper limit to 100 hours? If a few don't finish, that's OK
 Parameter optimization (10 random searches?)
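The "10 random searches" step could look like the sketch below: sample hyperparameter settings at random, evaluate each, keep the best. Here evaluate() is a stand-in for training one learner on one data set, and its accuracy surface, parameter names, and ranges are invented:

```python
import random

def evaluate(lr, hidden):
    # Hypothetical accuracy surface peaked near lr=0.1, hidden=32;
    # in practice this would train and score a real learner.
    return 1.0 - abs(lr - 0.1) - abs(hidden - 32) / 100.0

random.seed(0)
# Draw 10 random (learning rate, hidden-unit) settings and keep the best.
best = max(
    ((random.uniform(0.001, 1.0), random.choice([8, 16, 32, 64])) for _ in range(10)),
    key=lambda p: evaluate(*p),
)
```

Bergstra and Bengio's point (cited under Related Works) is that this simple loop often beats grid search for the same budget.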

Get metafeatures on the data sets
 Hardness heuristics (See Mike)
 Brazdil
 Ho and Basu (Download source code for DCoL )

Run waffles over the results

Just collaborative filtering

Just accuracies
 using the Spearman Correlation Coefficient: .64822 when removing 30% of the data

Adding data set metafeatures
 using the Spearman Correlation Coefficient: .67633 when removing 30% of the data
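For reproducing numbers like those above, Spearman's coefficient is just the Pearson correlation of the two rank vectors. A tie-free pure-Python sketch (the accuracy lists are toy values, not our results):

```python
def ranks(xs):
    """Rank positions of xs in ascending order (no tie handling)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0] * len(xs)
    for rank, i in enumerate(order):
        r[i] = rank
    return r

def spearman(x, y):
    """Pearson correlation computed on the rank vectors of x and y."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Predicted vs. true accuracies for 4 algorithms on one data set.
rho = spearman([0.9, 0.8, 0.7, 0.6], [0.85, 0.82, 0.60, 0.65])
```

With ties present, the usual fix is to assign average ranks, which this sketch omits.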

Just accuracies
 Just a neural network
 Both
 Previous work (Brazdil)

Just collaborative filtering
 compare results of other methods

Ranking Algorithms
 Use and implement other ranking algorithms
 Compare how often the recommender's top pick is actually the best, the second best, ..., the worst.
 Compare how often the recommender's 2nd pick is actually the best, the second best, ..., the worst.
 etc.
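The comparison above amounts to a rank-confusion table: entry (i, j) counts how often our i-th recommendation is truly the j-th best algorithm. A sketch with toy orderings (the algorithm names and rankings are made up):

```python
def rank_confusion(predicted, actual):
    """predicted/actual: per-data-set algorithm orderings, best first.
    Returns counts[i][j] = times our i-th pick was truly j-th best."""
    m = len(predicted[0])
    counts = [[0] * m for _ in range(m)]
    for pred, act in zip(predicted, actual):
        true_pos = {alg: j for j, alg in enumerate(act)}  # algorithm -> true rank
        for i, alg in enumerate(pred):
            counts[i][true_pos[alg]] += 1
    return counts

# Two data sets, three algorithms; our 2nd prediction swaps a and b.
predicted = [["a", "b", "c"], ["b", "a", "c"]]
actual    = [["a", "b", "c"], ["a", "b", "c"]]
counts = rank_confusion(predicted, actual)
```

Row 0 of the table answers "how often did our top pick turn out best / 2nd / worst", row 1 the same for our 2nd pick, etc.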

Future Ideas

parameter optimization
 train model to predict accuracies of a specific model/dataset given metafeatures of the dataset and parameters of the model.
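As a first cut, that predictor could be a plain least-squares fit on concatenated metafeatures and model parameters, one model per learning algorithm. All data below is synthetic, generated just to show the shape of the problem:

```python
import numpy as np

rng = np.random.default_rng(0)
meta   = rng.random((50, 3))      # 3 metafeatures per data set
params = rng.random((50, 2))      # 2 hyperparameter values per run
X = np.hstack([meta, params, np.ones((50, 1))])   # features + bias column

# Synthetic "accuracies": a noiseless linear function of the features,
# so the fit recovers them exactly.
true_w = np.array([0.2, -0.1, 0.05, 0.3, -0.2, 0.5])
y = X @ true_w

w, *_ = np.linalg.lstsq(X, y, rcond=None)         # least-squares fit
pred = X @ w                                       # predicted accuracies
```

A real version would swap in a nonlinear model (e.g. the neural network mentioned above) and cross-validate per algorithm.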

Mike
5/2/2013
 Get Logan code for metafeatures
 Get Logan code for random hyperparameter selection
 Help Logan see overall picture to help with design for the application
Rob
5/2/2013
 Get data sets for Logan
 Help Logan see overall picture to help with design for the application
Related Works
Random Hyper-Parameter Optimization
@article{Bergstra2012, author = {Bergstra, James and Bengio, Yoshua}, title = {Random Search for Hyper-Parameter Optimization}, journal = {Journal of Machine Learning Research}, volume = {13}, month = mar, year = {2012}, issn = {1532-4435}, pages = {281--305}, numpages = {25}, url = {http://dl.acm.org/citation.cfm?id=2188385.2188395}, acmid = {2188395}, publisher = {JMLR.org}, }
Brazdil: Ranking Learning Algorithms
@Article{ Brazdil2003, author = "Pavel B. Brazdil and Carlos Soares and Joaquim Pinto Da Costa", title = "Ranking Learning Algorithms: Using {IBL} and Meta-Learning on Accuracy and Time Results", journal = "Machine Learning", volume = "50", number = "3", year = "2003", pages = "251--277", publisher = "Kluwer Academic Publishers", address = "Hingham, MA, USA", doi = "http://dx.doi.org/10.1023/A:1021713901879", annote = "This work presents a method to rank learning algorithms according to their utility given a data set. The similarity of a data set to a set of previously processed data sets is computed using \textit{k}-NN on a set of metafeatures. The ranking method uses aggregate information about classification accuracy as well as the run time of the learning algorithm. This work ranks learning algorithms as will be done in part of this thesis; however, this thesis ranks learning algorithms by classification accuracy with respect to instance hardness." }
Ho and Basu: Complexity Measures
@Article{ Ho2002, author = "Tin Kam Ho and Mitra Basu", title = "Complexity Measures of Supervised Classification Problems", journal = "IEEE Trans. Pattern Anal. Mach. Intell.", volume = "24", number = "3", month = "March", year = "2002", pages = "289--300", numpages = "12", acmid = "507476", publisher = "IEEE Computer Society", address = "Washington, DC, USA" }