MLRR

Administering the Website

Starting and Stopping the Site

As it is currently built, MLRR uses node.js as a backend. However, since Apache is already listening on port 80 on axon, Apache acts as a proxy that routes traffic intended for mlrr.axon.cs.byu.edu to the node.js server. The Apache virtual host that acts as the proxy is defined at /etc/apache2/sites-available/default . This arrangement is somewhat inefficient and negates some of node.js's performance benefits, but it would only become a serious concern if traffic were extremely heavy.
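
For reference, a reverse-proxy virtual host of this kind typically looks something like the sketch below. This is not the contents of the actual file; it assumes the node.js server listens on port 5000 (the port used when running a local copy, see below) and that mod_proxy and mod_proxy_http are enabled.

<VirtualHost *:80>
    ServerName mlrr.axon.cs.byu.edu

    # Forward all requests to the node.js server (port 5000 is an assumption).
    ProxyPreserveHost On
    ProxyPass        / http://localhost:5000/
    ProxyPassReverse / http://localhost:5000/
</VirtualHost>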

The node.js server is registered with upstart. The config file can be found at /etc/init/mlrr.conf . The following self-explanatory upstart commands can be run to manage the server.

start mlrr
stop mlrr
status mlrr

The node.js server also starts automatically on boot, and its log file is located at /var/log/node.log .
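
An upstart job for a node.js server generally looks something like the sketch below. The real /etc/init/mlrr.conf may differ; the path to server.js is a placeholder, and only the log path is taken from this page.

# /etc/init/mlrr.conf (sketch only; the real file may differ)
description "MLRR node.js server"

start on runlevel [2345]
stop on runlevel [016]

respawn

# /path/to/mlrr is a placeholder for the live site's checkout.
exec /usr/bin/node /path/to/mlrr/server.js >> /var/log/node.log 2>&1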

Revision Control

The revision control for the website is managed by git. The central repository is at axon.cs.byu.edu:/srv/git/mlrr.git . So, as an example, to check out your own working copy, type

git clone [Your username]@axon.cs.byu.edu:/srv/git/mlrr.git

Because the website is based on Node.js, as long as you have node installed on your machine, you can run a local copy of the website by navigating to the base directory of your local repository and entering

node ./server.js

or on Ubuntu

sudo nodejs ./server.js

Note that you need root privileges on Ubuntu to write to the log file. I think that we should move the log file. At that point, you can simply navigate in your favorite web browser (as well as Internet Explorer) to localhost:5000 and view your copy of the website.

Updating the Website

The central repository exists at /srv/git/mlrr.git on Axon. Once you have pushed your changes to the central repo, they still need to be pulled to the live website; a commit and push from your local repo to the central repo will not, by itself, affect the live site visible to the world. The central repo essentially acts as a staging area for changes that will eventually (or maybe not, if you revert them) be applied to the site. The website itself is just another local copy of the central repo. As such, a simple BASH script was created to make updating the website easy. Whenever you are ready to push the changes in the central repo to the live site, enter the following command as root.

updatemlrr

The script also makes sure that the permissions and such are correct for you. Once it has been run, you are done and all changes from the central repo are live. If you want to know exactly what is happening, the script is located at /usr/local/bin/updatemlrr .
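
Conceptually, the script does something along the lines of the sketch below. This is only an illustration; the web root path, the service user, and the restart step are assumptions, so read the real script at /usr/local/bin/updatemlrr if you need the exact behavior.

#!/bin/bash
# Sketch of what updatemlrr conceptually does; the real script may differ.
WEB_ROOT=/path/to/live/mlrr   # placeholder for the live site's working copy

cd "$WEB_ROOT"
git pull                      # pull the staged changes from /srv/git/mlrr.git
chown -R node:node .          # fix ownership/permissions (user name is an assumption)
restart mlrr                  # restart the server via upstart (assumption)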

Other Resources

Development

Technologies Used

The server is built using node.js, but it naturally relies on a significant number of modules and other pieces of software. The following list enumerates most of them.

  • node.js
  • express.js - Web application framework for node.js
    • express-session - Session middleware for Express
  • mongodb
    • mongoose - MongoDB object modeling for node.js
  • mustache.js - A lightweight template engine for node.js
  • Passport - Authentication middleware for node.js
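
To give a rough idea of how these pieces fit together, a minimal sketch is shown below. It is not the actual server.js; the module names come from the list above, port 5000 matches the local-copy instructions above, and the database name, session secret, and template are purely illustrative.

// Minimal sketch of how the listed components are typically wired together.
var express  = require('express');
var session  = require('express-session');
var mongoose = require('mongoose');
var mustache = require('mustache');
var passport = require('passport');

var app = express();

// Database name 'mlrr' is an assumption, not the real connection string.
mongoose.connect('mongodb://localhost/mlrr');

// Sessions and authentication middleware. The real app would configure
// Passport strategies and user (de)serialization elsewhere.
app.use(session({ secret: 'change-me', resave: false, saveUninitialized: false }));
app.use(passport.initialize());
app.use(passport.session());

app.get('/', function (req, res) {
  // mustache.js renders a template string against a view object.
  res.send(mustache.render('<h1>{{title}}</h1>', { title: 'MLRR' }));
});

app.listen(5000); // same port used when viewing a local copy (localhost:5000)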

Future Directions

TODO

  • Pyramid Scheme (MIKE)
  • Upload
    • Associate with users
  • Summaries about users
  • Clarify descriptions (MIKE)
  • Hover over results menu (DANIEL)
  • Present results better and more clearly (Is it a good idea to list all of them?)
  • Hook with OpenML API
  • Usage agreement and Privacy Policy
  • Domains (Topics of results)
  • Other metrics
  • Continuous outputs
  • Notify users of new/better data/results
  • Add capabilities for uploading data from Auto-WEKA
    • This would come from the trajectories file, which contains the optimization metric and the evaluation metric (which will probably be cross-validation). Auto-WEKA does not store instance-level predictions, so we need to think about how to store the results.
  • Update how training data are stored
    • Assume if not used for testing, it was used for training. If filtered or weighted, add the value.
  • Data visualization

WEKA

  • Create a plug-in for WEKA.
  • Interface directly with the back end from WEKA
  • Certificate to automate login
  • Potentially include an anonymous user


Database Discussion

Issues
  • Integrate users
    • Associate uploads with users
    • Personal information about users
  • Integrate papers
  • Integrate code
  • Sparse representation of training data
  • How to smartly keep track of the instances

SQL

  • More familiar

MongoDB

  • More flexible
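
As a rough illustration of the MongoDB option, the sketch below shows one way uploads could be associated with users via mongoose references. The schema names and fields are assumptions for discussion, not the implemented design.

var mongoose = require('mongoose');

// Hypothetical schemas: every upload references the user who submitted it.
var userSchema = new mongoose.Schema({
  username: String,
  institution: String
});

var uploadSchema = new mongoose.Schema({
  user: { type: mongoose.Schema.Types.ObjectId, ref: 'User' }, // owning user
  dataset: String,
  results: [Number],
  uploadedAt: { type: Date, default: Date.now }
});

var User = mongoose.model('User', userSchema);
var Upload = mongoose.model('Upload', uploadSchema);

// Example query: all uploads for a given user, with the user document populated.
function uploadsForUser(userId, callback) {
  Upload.find({ user: userId }).populate('user').exec(callback);
}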

These are the pictures of our ideas for the database design:

Direction for paper by May 2015

  • How to control the quality of the data.
    • log in/users
      • Have user pages to display institutions/their research/why they are using the MLRR
      • Allow for saved queries
      • Show contributions to MLRR
    • Flag questionable results
    • Have a flag for data that needs verification
    • Validate the data by rerunning the experiments
      • If only ran once, default to needs verification
      • Show the number of times validated
  • Also want to address the reproducibility issue
    • Have users upload a script/file showing how to run their experiments
    • Include a README file
    • Creates a cycle with the validation.
    • Validated results allow a user to:
      • use those results
      • verify that their implementation is working correctly.
  • Remove some results/data sets (only if the user uploaded them)
  • How to handle proprietary data sets
    • Downloadable script to extract meta-features, or run it on our server?
    • Make the script cross-platform
  • Allow for outputs from commonly used machine learning libraries
  • openML API integration

Possible paper on linking data for papers

One problem faced in doing good science in computer science is the availability of the data and code used in a paper. This paper would focus on:

  • Linking data with papers
  • Allowing users to create profiles
    • They can link their papers from their profiles
    • They can show what they have contributed
  • The papers are then linked to:
    • Author profile pages if they exist
    • Data sets that were used in their paper
    • Results of their experiments
    • Implementation of their technique
  • We would need to integrate their paper results into the overall repository
  • Allow for anonymous postings for results and data for blind reviewing
  • Present the paper and talk with a conference about hosting the data or at least for a workshop
    • get feedback on the process


Other Ideas/Needs

  • How to visualize the data
  • use the data for other fun experiments
    • Meta-learning
    • Most commonly used data set
    • Trends for problems being solved
  • How to extend this idea for cross-discipline collaboration
    • Bioinformatics