Administering the Website
Starting and Stopping the Site
As currently built, MLRR uses node.js as its backend. However, since Apache is already listening on port
on axon, Apache is being used as a proxy to route traffic intended for mlrr.axon.cs.byu.edu to the node.js server. The Apache virtual host that acts as a proxy is defined at
. Doing this is rather inefficient and negates some of node.js's performance benefits. However, it would only be a serious concern if traffic becomes extremely heavy.
The node.js server is registered with upstart. The config file can be found at
. The following self-explanatory upstart commands can be run to manage the server.
start mlrr
stop mlrr
status mlrr
Otherwise, node.js will automatically start on boot and the log file is located at
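As a hedged illustration of combining the upstart commands, the sketch below starts a job only when upstart does not already report it as running. The helper name `ensure_running` is hypothetical, and the check assumes upstart's usual `status` output, which contains `start/running` for a running job.

```shell
#!/bin/sh
# Hypothetical helper: start an upstart job only if it is not already running.
# Assumes `status <job>` prints a line containing "start/running" when the job is up.
ensure_running() {
    job="$1"
    if status "$job" 2>/dev/null | grep -q 'start/running'; then
        echo "$job is already running"
    else
        start "$job"
    fi
}
```

Run as root, `ensure_running mlrr` would leave a healthy server alone and start a stopped one.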
The revision control for the website is managed by git. The central repository is at
. For example, to check out your own working copy, type
git clone [Your username]@axon.cs.byu.edu:/srv/git/mlrr.git
Because the website is based on Node.js, as long as you have node installed on your machine, you can run a local copy of the website by navigating to the base directory of your local repository and entering
or on Ubuntu
sudo nodejs ./server.js
Note that you need root privileges on Ubuntu to write to the log file. I think that we should move the log file.
At that point, you can simply navigate in your favorite web browser (as well as Internet Explorer) to
and view your copy of the website.
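Older Ubuntu packages install the Node.js binary as `nodejs` rather than `node`, which is why the Ubuntu command differs. A minimal sketch of a helper that picks whichever binary is available follows; the function name `find_node` is hypothetical and not part of the repository.

```shell
#!/bin/sh
# Hypothetical helper: print the name of the installed Node.js binary.
# Prefers "node"; falls back to "nodejs" (the name used by older Ubuntu packages).
find_node() {
    if command -v node >/dev/null 2>&1; then
        echo node
    elif command -v nodejs >/dev/null 2>&1; then
        echo nodejs
    else
        echo "Node.js not found on PATH" >&2
        return 1
    fi
}
```

From the base directory of a local repository, `"$(find_node)" ./server.js` would then start the local copy, with `sudo` added where root is needed to write the log file.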
Updating the Website
The central repository exists at
on Axon. Once you have pushed your changes to the central repo, the changes need to be pulled to the live website. Note: that means a commit and push from your local repo to the central repo will not affect the live site visible to the world. Basically, the central repo acts as a staging area for the changes that will eventually (or maybe not, if you revert them) be applied to the site. The website itself is just a local copy of the central repo. As such, a simple Bash script was created to manage updating the website. Whenever you are ready to push the changes in the
repo to the live site, enter the following command as root.
It will make sure that the permissions and such are correct for you. Once that script has been run, you are done and all changes from the central repo are live. If you want to know exactly what is happening, the script is located at
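The real script is not reproduced here. As a rough sketch of what an update step of this kind might do, the function below fast-forwards a live working copy from the central repo and then normalizes permissions. Everything in it is an assumption for illustration: the function name `update_site`, the remote name `origin`, the default branch `master`, and the permission scheme.

```shell
#!/bin/sh
# Hypothetical sketch of an update step; NOT the script referenced above.
# Usage: update_site <live_dir> [branch]
#   live_dir: working copy that serves the live site (assumed to be a clone
#             of the central repo with a remote named "origin")
#   branch:   branch to deploy (assumed default: master)
update_site() {
    live_dir="$1"
    branch="${2:-master}"

    # Fast-forward only, so stray edits on the live copy are never merged silently.
    git -C "$live_dir" pull --ff-only origin "$branch" || return 1

    # Normalize permissions: owner can read and write, everyone else can read
    # but not write; X keeps directories and executables traversable.
    chmod -R u+rwX,go+rX,go-w "$live_dir"
}
```

Using `--ff-only` makes the pull fail loudly if the live copy has diverged, which seems a safer default for a public site than an automatic merge. Depending on what changed, the node.js server may also need to be restarted before the new code takes effect.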
- A good series of tutorials that cover the basics can be found at https://www.atlassian.com/git/ .
- The authoritative reference for git is http://git-scm.com/docs
The server is built using node.js, but there are naturally a significant number of modules and other pieces of software needed. The following list enumerates most of those.
- mongoose - MongoDB object modeling for node.js
- mustache.js - A lightweight template engine for node.js
- Passport - Authentication middleware for node.js
- Pyramid Scheme (MIKE)
- Associate with users
- Summaries about users
- Clarify descriptions (MIKE)
- Hover over results menu (DANIEL)
- Present results better and more clearly (Is it a good idea to list all of them?)
- Hook with OpenML API
- Domains (Topics of results)
- Other metrics
- Continuous outputs
- Notify users of new/better data/results
Add Capabilities for uploading data from Auto-WEKA
- This would be in the trajectories file. It contains the optimization metric and the evaluation metric (which will probably be cross validation). From Auto-WEKA, the instance level predictions are not stored. So, we need to think about how to store the results.
Update how training data are stored
- Assume if not used for testing, it was used for training. If filtered or weighted, add the value.
- Data visualization
- Create a plug in for WEKA.
- Interface direct to back end for WEKA
- Certificate to automate login
- Potentially include an anonymous user
- Associate uploads with users
- Personal information about users
- Integrate papers
- Integrate code
- Sparse representation of training data
- How to smartly keep track of the instances
- More familiar
- More flexible
These are the pictures of our ideas for the database design:
Direction for paper by May 2015
How to control the quality of the data.
- Have user pages to display institutions/their research/why they are using the MLRR
- Allow for saved queries
- Show contributions to MLRR
- Flag questionable results
- Have a flag for data that needs verification
Validate the data by rerunning the experiments
- If only run once, default to needs verification
- Show the number of times validated
- log in/users
Also want to address the reproducibility issue
- Have users upload a script/file showing how to run their experiments
- Include a README file
- Creates a cycle with the validation.
Validated results allow a user to:
- use those results
- verify that their implementation is working correctly.
- Remove some results/data sets (only if the user uploaded it)
How to handle proprietary data sets
- Downloadable script to extract meta-features - OR - run it on our server???
- Make the script cross-platform
- Allow for outputs from commonly used machine learning libraries
- OpenML API integration
Possible paper on linking data for papers
One problem faced in doing good science in computer science is the availability of data and code that is used in a paper. This paper would focus on:
- Linking data with papers
Allowing users to create profiles
- They can link their papers from their profiles
- They can show what they have contributed
The papers are then linked to:
- Author profile pages if they exist
- Data sets that were used in their paper
- Results of their experiments
- Implementation of their technique
- We would need to integrate their paper results into the overall repository
- Allow for anonymous postings for results and data for blind reviewing
Present the paper and talk with a conference about hosting the data or at least for a workshop
- get feedback on the process
- How to visualize the data
use the data for other fun experiments
- Most commonly used data set
- Trends for problems being solved
How to extend this idea for cross-discipline collaboration