'''Alert! Work in Progress'''

We are in the process of making major upgrades to CCASH's architecture. New features include flexible data structures suitable for a variety of textual data and annotations. This reduces the burden on developers who do not wish to implement their own data structures. Other new features include fine-grained user permissions over these data structures. Unfortunately, the size of these changes means that the code base will be unstable for a time. Feel free to email us about progress details, or to get involved if you want things to move faster!

== Introduction ==

=== What is CCASH? ===

CCASH (Cost-Conscious Annotation Supervised by Humans) is a web-based annotation framework. It is designed as an environment both for evaluating state-of-the-art and experimental techniques for efficient annotation and for applying those techniques to real-world annotation projects. While designing CCASH we had our eye particularly on Active Learning; however, other techniques such as feature labeling and incorporating rich prior knowledge could also be incorporated into CCASH without too much trouble.

=== How does it work? ===

CCASH coordinates the activities of two components:
* '''Annotation Tasks''' are graphical user interfaces that run in your browser and allow you to annotate instances or correct automatic annotations. An annotation task's job is to display a particular kind of instance and solicit a particular kind of annotation. CCASH tasks are implemented with the [http://code.google.com/webtoolkit/ Google Web Toolkit], allowing you to write code in Java assisted by GWT's WYSIWYG editors.
* '''Annotation Managers''' run as XML-RPC services on the network. As such they may be written in any language with an XML-RPC implementation (that is to say, almost anything).
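Because an annotation manager is just an XML-RPC service, a toy one can be sketched with nothing but Python's standard library. The sketch below is an illustration only: the class name, the method names <code>get_instance</code> and <code>record_annotation</code>, the payload fields, and port 8000 are all assumptions made for this example and are not CCASH's actual protocol. See [[Creating an Annotation Manager]] for the real interface.

```python
from xmlrpc.server import SimpleXMLRPCServer


class InMemoryAnnotationManager:
    """Toy annotation manager that keeps everything in memory.

    All method names and payload shapes here are hypothetical,
    chosen for illustration; they are not CCASH's real protocol.
    """

    def __init__(self, instances, prelabels=None):
        self.instances = list(instances)        # (instance_id, text) pairs
        self.prelabels = dict(prelabels or {})  # instance_id -> suggested label
        self.next_index = {}                    # annotator -> position in the list
        self.annotations = []                   # every annotation received so far

    def get_instance(self, annotator):
        """Return the annotator's next instance, pre-annotated when a label exists."""
        i = self.next_index.get(annotator, 0)
        if i >= len(self.instances):
            # Empty struct means "nothing left"; XML-RPC has no null by default.
            return {}
        self.next_index[annotator] = i + 1
        instance_id, text = self.instances[i]
        return {
            "id": instance_id,
            "text": text,
            "prelabel": self.prelabels.get(instance_id, ""),
        }

    def record_annotation(self, annotator, instance_id, label):
        """Store a completed annotation so it is preserved."""
        self.annotations.append(
            {"annotator": annotator, "id": instance_id, "label": label}
        )
        return True


def build_server(manager, host="localhost", port=8000):
    """Wrap the manager in an XML-RPC server exposing its public methods."""
    server = SimpleXMLRPCServer((host, port), logRequests=False)
    server.register_instance(manager)
    return server
```

To try it, call <code>build_server(manager).serve_forever()</code> in one process and, from another, use <code>xmlrpc.client.ServerProxy("http://localhost:8000")</code> to call <code>get_instance</code> and <code>record_annotation</code>. A real manager would persist annotations rather than hold them in a list.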
Annotation managers are in charge of two important tasks:
# Provide annotators with optionally pre-annotated instances
# Record annotations

In a typical annotation scenario, CCASH queries an annotation manager for a pre-annotated instance, then presents that instance to a human annotator via a compatible GUI task. After the annotator finishes, the completed annotation is sent back to the annotation manager to be preserved.

== Getting Started ==

=== Eclipse ===

CCASH is an Eclipse project, so you will want to get a current copy of [http://www.eclipse.org/downloads Eclipse]. We recommend the Eclipse Enterprise Edition (Eclipse EE) since it comes ready to run Apache Tomcat servers, which you'll need to run data providers.

You'll need to install the following Eclipse plugins:
* [http://code.google.com/eclipse/docs/getting_started.html Google Plugin for Eclipse]. You will need at least the Google Plugin for your version of Eclipse and the GWT SDK 2.4.0. The other features are for Android development.
* [http://www.eclipse.org/subversive/ Subversive] for Subversion functionality. Use an SVN 1.6 API library. Subclipse would also work, but it requires more manual setup in non-Windows environments.

=== Postgres ===

==== Install ====

CCASH manages its data in a relational database. We chose postgres as the default implementation because of its permissive licensing and its support for sub-second timing values. You will need to [http://wiki.postgresql.org/wiki/Detailed_installation_guides install the postgres server] on your system.

==== Configure ====

After installing postgres, you must configure it to accept connections from your CCASH install. Do this by editing the [http://developer.postgresql.org/docs/postgres/auth-pg-hba-conf.html pg_hba.conf file] and changing the line that reads
host    all         all         127.0.0.1/32          ident
to read
host    all         all         127.0.0.1/32          trust
This tells postgres to trust all connections from the localhost, which is fine for development. (In the future when you deploy CCASH, you will probably want to increase security by changing the word "trust" to "md5", which will require you to create a postgres account and password for CCASH. The username and password can be whatever you want as long as you change the corresponding data inside the file Ccash/src/META-INF/persistence.xml.)

==== Create a database ====

Create a postgres database for CCASH by running the following command:
createdb -U postgres ccash
==== Create a database user ====

Create a postgres user for CCASH by running the following command:
createuser -U postgres ccash
=== Get CCASH ===

For a copy of CCASH licensed under the AGPL, see the SourceForge project at https://sourceforge.net/p/ccash/code/HEAD/tree/. Using Subclipse or Subversive, check out a read-only copy of the code from http://svn.code.sf.net/p/ccash/code/trunk. If you are interested in CCASH under a different license, please contact us directly.

=== Run CCASH ===

To run CCASH, right-click on the Eclipse CCASH project, click "Run As," and select "Web Application." After a minute a "Development Mode" tab will open in Eclipse and display a URL. Copy this URL into a browser, and you will see the CCASH login screen. Log in with username "admin" and password "passwd99". You can change this password after logging in by clicking the "Admin" menu item and selecting "Annotators".

=== "Hello, World" annotation task ===

Start annotating by [[Doing Simple Sentiment Classification]].

== How do I implement my own annotation task in CCASH? ==

CCASH is an annotation framework. Before you can apply CCASH to the annotation task you are interested in, you'll need to create an Annotation Manager to run on the server and an Annotation Task to run in your annotators' browsers. If you create something that you think others might be interested in, please contribute it to the repository!
* [[Creating an Annotation Task]]
* [[Creating an Annotation Manager]]

=== Do I have to build my application from scratch? ===

We have already developed some annotation tasks that we are interested in. Feel free to use their pieces as building blocks for your own project!

=== Example annotation tasks ===

These are fully formed annotation tasks you can use for reference. Relevant classes are indicated by links to their javadocs.
* Simple Sentiment Classification (an '''extremely''' simple task put together for demo purposes).
** [http://tempus.cs.byu.edu/Ccash#bogus Demo]
** [http://nlp.cs.byu.edu/Ccash/javadoc/bogus Javadoc] - SimpleClassificationAnnotationManager
** [http://nlp.cs.byu.edu/Ccash/javadoc/bogus Javadoc] - SimpleClassificationTask
* English part of speech tagging - Label sequences of English words with their respective parts of speech from the [http://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html Penn Treebank Tagset].
** [http://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&ved=0CCMQFjAA&url=http%3A%2F%2Fwww.lrec-conf.org%2Fproceedings%2Flrec2010%2Fpdf%2F451_Paper.pdf&ei=Bx4PT4TgFoGviAKH3qDpDQ&usg=AFQjCNGtjjNXdFNbI3rK9AHcy-qUSxKe6g&sig2=n8_qpONDf-oiTG45K8hYmQ Tag Dictionaries Accelerate Manual Annotation] - A report on the use of this task (adapted slightly) to answer a question regarding the utility of tag dictionaries.
** [http://tempus.cs.byu.edu/Ccash#bogus Demo]
** [http://nlp.cs.byu.edu/Ccash/javadoc/bogus Javadoc] - EnglishPosTagAnnotationManager
** [http://nlp.cs.byu.edu/Ccash/javadoc/bogus Javadoc] - EnglishPosTagTask
* Syriac morphological tagging - Label sequences of Syriac words with their respective morphological analyses. This includes separating the prefix and stem from the main word, assigning a grammatical category (Noun, Verb, etc.), assigning gender (common, masculine, feminine), and so on.
** [http://tempus.cs.byu.edu/Ccash#bogus Demo]
** [http://nlp.cs.byu.edu/Ccash/javadoc/bogus Javadoc] - SyriacMorphTagAnnotationManager
** [http://nlp.cs.byu.edu/Ccash/javadoc/bogus Javadoc] - SyriacMorphTagTask
* Syriac morphological tagging tutorial - The same as normal Syriac morphological tagging, except that after each sentence the annotator receives feedback on how they did and is optionally required to try the sentence again.
** [http://tempus.cs.byu.edu/Ccash#bogus Demo]
** [http://nlp.cs.byu.edu/Ccash/javadoc/bogus Javadoc] - SyriacTutorialTask
* Survey - Asks users to answer a series of short-answer and multiple-choice questions.
** [http://tempus.cs.byu.edu/Ccash#bogus Demo]
** [http://nlp.cs.byu.edu/Ccash/javadoc/bogus Javadoc] - SurveyAnnotationManager
** [http://nlp.cs.byu.edu/Ccash/javadoc/bogus Javadoc] - SurveyTask
* User study - Takes an annotator through a predetermined sequence of other tasks.
** [http://tempus.cs.byu.edu/Ccash#bogus Demo] - A user is taken through three surveys, an English POS tagging sentence, and a Syriac POS tagging sentence.
** [http://nlp.cs.byu.edu/Ccash/javadoc/bogus Javadoc] - UserStudyAnnotationManager
** [http://nlp.cs.byu.edu/Ccash/javadoc/bogus Javadoc] - UserStudyTask
* Training - Presents annotators with a series of instructions on the left side of the screen while they perform an annotation task on the right side of the screen.
** [http://tempus.cs.byu.edu/Ccash#bogus Demo] - A user receives training in Syriac morphological analysis.
** [http://nlp.cs.byu.edu/Ccash/javadoc/bogus Javadoc] - AnnotationTrainingAnnotationManager
** [http://nlp.cs.byu.edu/Ccash/javadoc/bogus Javadoc] - AnnotationTrainingTask

=== Reusable components ===

These are reusable components that we have developed while working on our own tasks. Check out the linked javadocs for more information.
* AbstractFileReadingAnnotationManager - reads a list of instances from a file and (optionally) a list of pre-labels from another file, and records the annotations it receives to a file.
** AbstractFileReadingInstanceProvider - reads a list of instances from a file and serves them up sequentially to each annotator. [bogus Javadoc]. Used to implement AbstractFileReadingAnnotationManager.
** AbstractFileReadingAutomaticAnnotationProvider - reads a list of annotations from a file and then uses their instance ids to match annotations to instances. [bogus Javadoc]. Used to implement AbstractFileReadingAnnotationManager.
** AbstractFileReadingAnnotationRecorder - records all annotations received to a file. Used to implement AbstractFileReadingAnnotationManager.

=== Half-baked Tasks ===

These are tasks that we have in the incubator.
* Named Entity Tagging - Label noun phrases as Person, Location, Business, etc.
** [http://nlp.cs.byu.edu/Ccash/javadoc/bogus Javadoc] - NerTaggingAnnotationManager
** [http://nlp.cs.byu.edu/Ccash/javadoc/bogus Javadoc] - NerTaggingTask

== I have a problem! What should I do? ==

First consult the [[CCASH Frequently Asked Questions]]. If your question isn't answered there, send us a note at ccash at cs dot byu dot edu.

== Related Papers ==

* [http://www.lrec-conf.org/proceedings/lrec2012/pdf/511_Paper.pdf CCASH being used to evaluate pre-annotation and correction propagation for Syriac morphological analysis]
* [http://lexitron.nectec.or.th/public/LREC-2010_Malta/pdf/451_Paper.pdf CCASH being used to evaluate tag dictionaries in English POS tagging]
* [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.178.4696&rep=rep1&type=pdf Original CCASH paper] (partially out-of-date)