 += Formatting Lexicon Information =
 +== Details ==
 +This page tracks candidate storage formats for the lexicon. Currently the goal is simply to store the part-of-speech tag for each token. However, we want the design to be extensible so that additional types of information can be added in the future. We cannot predict everything a project might want to store in its dictionary/lexicon, so we are considering a storage format that can easily be modified on a per-dictionary basis. The candidates collected so far are listed below.
 +== Candidates ==
 +Please discuss these possibilities, even if only by adding a list of pros/cons to each candidate.
 +=== General Formats ===
 +These formats would allow a project to define its own list of attributes to include. With the exception of Protocol Buffers, a project admin could easily add to or update the list of attributes.
 +* [http://www.w3.org/XML/ XML]
 +* [http://code.google.com/p/protobuf/ Protocol Buffers]
 +* [http://www.json.org/ JSON]
 +* Plain Text (Values separated by some delimiter)
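To make the comparison more concrete, here is a minimal sketch of how a part-of-speech entry might round-trip through two of the general formats above, JSON and delimited plain text. The entry layout, the attribute names (`pos`, `gloss`), the tag values, and all function names are assumptions made for illustration; none of them come from an existing lexicon design.

```python
import json

# Hypothetical lexicon: token -> dictionary of attributes.
# Only "pos" is needed today; "gloss" stands in for a future attribute.
lexicon = {
    "run": {"pos": "VERB"},
    "dog": {"pos": "NOUN", "gloss": "domestic canine"},
}


def to_json(entries):
    """Serialize the lexicon; new attributes need no format change."""
    return json.dumps(entries, sort_keys=True, ensure_ascii=False)


def from_json(text):
    """Parse a lexicon previously written by to_json."""
    return json.loads(text)


def to_delimited(entries, attributes, delimiter="\t"):
    """Write one line per token: token, then one column per attribute.

    The attribute list acts as the per-dictionary column definition.
    This sketch assumes values never contain the delimiter or a newline.
    """
    lines = []
    for token in sorted(entries):
        values = [entries[token].get(name, "") for name in attributes]
        lines.append(delimiter.join([token] + values))
    return "\n".join(lines)


def from_delimited(text, attributes, delimiter="\t"):
    """Read lines written by to_delimited; empty columns are dropped."""
    entries = {}
    for line in text.splitlines():
        token, *values = line.split(delimiter)
        entries[token] = {
            name: value for name, value in zip(attributes, values) if value
        }
    return entries
```

In this sketch the JSON form carries its attribute names with every entry, while the delimited form depends on a column list kept outside the file. That external column list is what a project admin would have to maintain if plain text were chosen.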
 +=== Specific Formats ===
 +These formats are standardized to some extent, which would allow easy exchange with other programs. However, they tend to be very complicated, and they place some restrictions on what can be stored.
 +* [http://www.lisa.org/Term-Base-eXchange.32.0.html TBX] or [http://www.lisa.org/TBX-Basic.926.0.html TBX-Basic]
 +* [http://www.olif.net/ OLIF]
