__NOTOC__
CMake utilizes a more verbose, structured syntax than gmake. Hopefully this will improve maintainability of our fairly brittle build system in the future.
CMake works by generating Makefiles for regular make to process. The main configuration file is called CMakeLists.txt. Other support scripts are contained in HEAD/Language-ID/scripts/cmake. To generate the Makefiles, cd into HEAD and type:
: cmake .
Assuming this succeeds, proceed to invoke make. First do so without specifying a target in order to build all needed utilities:
: make
This will build DETware and Sphinx. Next, invoke a specific build target:
: make detcurve
This will run the default experiments and produce detcurves using gnuplot.
To specify a particular experiment, a special --norm value for resultbuilder.pl, or a different result name (nist-result-file, etc.), run a command like the following before running make detcurve:
: cmake -D FEATURE_SET_NAME_FORCE=fivegram -D RBLDR_NORM_FORCE=1 -D RESULT_NAME_FORCE=nist_norm1 .
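For scripted runs, the override command above can be assembled from a table of settings. The following is a minimal Ruby sketch of that idea; the helper name cmake_command is hypothetical and is not part of the repository, and only the -D variable names come from the example above.

```ruby
# Hypothetical helper (not in the repository): build the argument list for
# a cmake invocation that passes each override as a "-D NAME=VALUE" pair.
def cmake_command(overrides, source_dir = '.')
  args = ['cmake']
  overrides.each { |name, value| args.push('-D', "#{name}=#{value}") }
  args << source_dir
end

# Example: the same overrides as the command shown above.
command = cmake_command(
  'FEATURE_SET_NAME_FORCE' => 'fivegram',
  'RBLDR_NORM_FORCE'       => 1,
  'RESULT_NAME_FORCE'      => 'nist_norm1'
)
puts command.join(' ')
```

Passing the array to system(*command) would avoid shell quoting problems, since each argument is handed to cmake as-is.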
With this commit, Cygwin continues to be a broken platform. Run experiments on entropy or on any other Linux workstation that meets the minimum requirements. Spoken Language-ID now depends on CMake 2.4 and Ruby 1.8 in addition to previous dependencies. Both packages are already installed on entropy and can be installed under Cygwin, in the event that Cygwin starts working again.
The cmake support folder is being moved to Language-ID/scripts/cmake. We're dropping some no-longer-used CMake modules, scripts, and documentation in the process. The FindJava.cmake module provides a special workaround for entropy having GCJ set as the default Java implementation, so we can use Sun's Java 5. A new get_wav_files.rb script replaces the old get_wav_files.sh.cmake script, and get_seg_files.pl and mkdir_if_missing.pl are dropped in favor of their already-existing Ruby counterparts, so that all of the scripts in cmake/Scripts are Ruby-based rather than a hodgepodge. This allowed some improvements in indicating how much progress has been made in copying the large seg and wav file datasets:<br/>
D cmake<br/>
D cmake/Scripts<br/>
D cmake/Scripts/get_wav_files.sh.cmake<br/>
D cmake/Scripts/get_seg_files.sh.cmake<br/>
D cmake/Docs<br/>
D cmake/Docs/todo.txt<br/>
D cmake/Docs/cmake notes.txt<br/>
D cmake/Modules<br/>
D cmake/Modules/MacroStripFileExtension.cmake<br/>
D cmake/Modules/MacroAddPrefix+AddSuffix.cmake<br/>
D cmake/Modules/MacroGetCygpath.cmake<br/>
D cmake/Modules/MacroMakeLangDirs.cmake<br/>
D cmake/Modules/MacroMakeDirectory.cmake<br/>
A + Language-ID/scripts/cmake<br/>
A + Language-ID/scripts/cmake/Scripts<br/>
A + Language-ID/scripts/cmake/Scripts/mkdir_if_missing.rb<br/>
A + Language-ID/scripts/cmake/Scripts/get_seg_files.rb<br/>
A Language-ID/scripts/cmake/Scripts/get_wav_files.rb<br/>
A + Language-ID/scripts/cmake/Modules<br/>
A + Language-ID/scripts/cmake/Modules/MacroAddPrefix+AddSuffix.cmake<br/>
A + Language-ID/scripts/cmake/Modules/FindJava.cmake<br/>
A + Language-ID/scripts/cmake/Modules/MacroLoadProperty.cmake<br/>
Make some final core changes to the CMakeLists.txt files and move the old Makefile to Language-ID/scripts/Makefile-original so there will be no conflict with the CMake-generated output:<br/>
M CMakeLists.txt<br/>
M Sphinx4-1.0beta/CMakeLists.txt<br/>
M Statistical-NLP/CMakeLists.txt<br/>
D Language-ID/CMakeLists.txt<br/>
M Language-ID/scripts/CMakeLists.txt<br/>
M Language-ID/scripts/detware/bin/CMakeLists.txt<br/>
D Language-ID/scripts/Makefile<br/>
R + Language-ID/scripts/Makefile-original<br/>
Reduce the verbosity of this class's output so we can see what else is going on around it:<br/>
M Statistical-NLP/src/edu/berkeley/nlp/math/LBFGSMinimizer.java<br/>
Create a cmake-ified version of blddetcurve.sh as well as a fresh new Ruby implementation. The Ruby script no longer stores temporary data in an external file called infile. More is done within the script itself, rather than handed off to helpers like awk, which helps guarantee that the data flows through a sequential pipeline. The Ruby version may eventually be allowed to fully supersede the shell script and is currently being invoked by the build system:<br/>
D Language-ID/scripts/blddetcurve.sh<br/>
M Language-ID/scripts/blddetcurve.sh.cmake<br/>
A + Language-ID/scripts/blddetcurve.rb.cmake<br/>
Also create a cmake-ified version of regressiontest.sh along with a new Ruby implementation. The Ruby implementation allows lid-console.rb to run regression tests very conveniently, but can also be invoked independently from the command line and may eventually supersede regressiontest.sh[.cmake]:<br/>
D Language-ID/scripts/regressiontest.sh<br/>
A + Language-ID/scripts/regressiontest.sh.cmake<br/>
A + Language-ID/scripts/regression_test.rb<br/>
Fix a small syntax error in the gnu_det.sh script. This script could be considered deprecated in favor of generate_gnuplot_script.rb:<br/>
M Language-ID/scripts/detware/scripts/gnu_det.sh<br/>
Remove the detware plot binary. This binary caused a problem by not having the executable bit set, preventing detcurves from being built. By removing it, we force it to be rebuilt on each system, which removes the need for 64-bit systems to run 32-bit code, since the repository's copy was 32-bit:<br/>
D Language-ID/scripts/detware/bin/plot<br/>
Then, create a successor implementation of gnu_det.sh. This is implemented in Ruby and facilitates generation of plots by the lid-console.rb script as well as the blddetcurve.rb script:<br/>
A + Language-ID/scripts/generate_gnuplot_script.rb<br/>
A + Language-ID/scripts/plot.rb<br/>
A + Language-ID/scripts/plot_line.rb<br/>
Take the 'norm' option as an integer rather than as a string; fix a typo:<br/>
M Language-ID/scripts/thetasweep.pl<br/>
Add a 'lang' command line option that specifies what language is being operated on rather than having that be inferred from the filename. The format of the outcome file names has changed in the past. This flag should prevent this script from breaking if we ever change the filenames again:<br/>
M Language-ID/scripts/thetasweep2.pl<br/>
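The idea of taking the language from an explicit flag instead of parsing it out of a file name can be sketched as follows. This is an illustration in Ruby using the standard optparse library, not the Perl code in thetasweep2.pl; the helper name and the exact spelling --lang are assumptions.

```ruby
require 'optparse'

# Hypothetical helper: read the language from a --lang option so that the
# script never has to guess it from the (changeable) outcome file name.
def parse_lang_option(argv)
  lang = nil
  parser = OptionParser.new do |opts|
    opts.on('--lang NAME', 'Language being operated on') { |name| lang = name }
  end
  remaining = parser.parse(argv) # non-option arguments, e.g. input files
  raise ArgumentError, 'missing required --lang option' if lang.nil?
  [lang, remaining]
end

lang, files = parse_lang_option(['--lang', 'mandarin', 'some_outcome_file'])
puts "language=#{lang} files=#{files.inspect}"
```

Because the language arrives as an argument, renaming the outcome files would only affect the caller, not the parsing logic.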
Correctly infer languages from filenames. It's not possible to use a 'lang' flag as in thetasweep2.pl because this script operates on more than one language at a time. Also make some variable names more descriptive:<br/>
M Language-ID/scripts/resultbuilder.pl<br/>
Adjust to absolute paths being used in cmake. Take OS and architecture command line options so we can avoid quadratic regression on 64-bit and cygwin setups; check if corresponding output files already exist and don't re-process the wav/seg files if so; clean up output so it doesn't flood the terminal:<br/>
M Language-ID/scripts/seg2xml3.pl<br/>
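The "don't re-process if the output already exists" check described for seg2xml3.pl follows a common pattern. Here is a minimal Ruby sketch of that pattern; it is not the Perl implementation, and the helper name and return values are hypothetical.

```ruby
require 'fileutils'

# Hypothetical helper: run the given block only when output_path does not
# exist yet, so repeated builds skip inputs that were already converted.
def process_unless_done(input_path, output_path)
  return :skipped if File.exist?(output_path)

  FileUtils.mkdir_p(File.dirname(output_path)) # make sure the target dir exists
  yield input_path, output_path
  :processed
end
```

A caller would wrap the expensive conversion step in the block, for example process_unless_done(wav, xml) { |input, output| ... }, and could count the :skipped results to keep terminal output short.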
Introduce the Feature Engineering Console prototype script:<br/>
A + lid-console.rb<br/>
A property file system for use both by lid-console.rb and cmake. This allows certain settings to be persistent and shared between cmake and ruby scripts:<br/>
A + Language-ID/scripts/properties.rb<br/>
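To illustrate how settings can be shared between tools through a plain text file, here is a small Ruby sketch. The on-disk format of the real property files is not described here, so the "key=value per line" layout, the '#' comment rule, and both function names are assumptions rather than a description of properties.rb.

```ruby
# Assumed format (hypothetical): one "key=value" pair per line,
# blank lines ignored, lines starting with '#' treated as comments.
def parse_properties(text)
  props = {}
  text.each_line do |line|
    line = line.strip
    next if line.empty? || line.start_with?('#')
    key, value = line.split('=', 2)
    next if value.nil? # skip lines without an '=' sign
    props[key.strip] = value.strip
  end
  props
end

# Write the properties back out in a stable (sorted) order.
def dump_properties(props)
  props.sort.map { |key, value| "#{key}=#{value}\n" }.join
end
```

A format this simple is easy to read from both sides: Ruby can parse it as shown, and a CMake macro can read the same file line by line.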
Add a Perl one-liner as a standalone script, since it didn't work inside of CMake due to escaping problems. Of course, we could also probably use awk, but this works for now:<br/>
A + Language-ID/scripts/printfirstcolumn.pl<br/>
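For comparison with the other Ruby helpers, extracting the first column can be sketched in Ruby as well. This assumes that columns are separated by whitespace, which is a guess about the input; the function name is hypothetical and this is not a replacement for printfirstcolumn.pl.

```ruby
# Return the first whitespace-separated field of every non-blank line.
# Assumption: columns are separated by spaces or tabs.
def first_column(text)
  text.each_line.map { |line| line.split.first }.compact
end

puts first_column("alpha 1 2\nbeta 3 4\n")
```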
Enable filtering of what languages are used in training, etc. This helper script is used by the build system:<br/>
A + Language-ID/scripts/filter_langs.rb<br/>
Delete old, defunct detcurve stuff I discovered during the process:<br/>
D Language-ID/scripts/preamble.gp<br/>
D Language-ID/scripts/multiplot.pl<br/>
D Language-ID/scripts/detcurve.gp<br/>
You can successfully use the $@ variable as in regular make, but I suggest that you don't. This variable is left uninterpreted by cmake and is only resolved at the gmake level, so it's difficult to tell exactly which file it will point to in the end.
FindJava: Find Java<br/>
This module finds if Java is installed and determines where the include files and libraries are. This code sets the following variables:<br/>
JAVA_RUNTIME = the full path to the Java runtime<br/>
JAVA_COMPILE = the full path to the Java compiler<br/>
JAVA_ARCHIVE = the full path to the Java archiver

FindPerl: Find Perl<br/>
This module looks for Perl.<br/>
PERL_EXECUTABLE = the full path to perl<br/>
PERL_FOUND = if false, don't attempt to use perl
Source data: /home/data/langid/OGI_TS/SEGLOLA/MANDARIN seems to be a symlink, but it should be the actual directory. To fix this, run:
: mv /home/data/langid/OGI_TS/SEGLOLA/mandarin /home/data/langid/OGI_TS/SEGLOLA/MANDARIN
LID Project Prerequisites (Ubuntu Package Names):
My original heading: I have been attempting to move the Language-ID build / testrun system from (g)make to cmake. This has been rather tricky since I knew nothing about cmake at first, but it's coming together now and I want to document how I've accomplished the conversion.
Create CMakeLists.txt files in HEAD/, HEAD/Language-ID, etc. CMakeLists.txt is the main file upon which cmake operates. Language-ID/CMakeLists.txt was originally copied from Language-ID/scripts/Makefile.
<br/>gmake: THE_PREFIXED_LIST = $(addprefix theprefix/, $(SOME_LIST_OF_VALUES))<br/>
cmake: set(THE_PREFIXED_LIST ${SOME_LIST_OF_VALUES})<br/>
add_prefix(THE_PREFIXED_LIST "theprefix/")

Make all targets explicit:<br/>
gmake: A_TARGET_NAME : ANOTHER_TARGET somefile/that-should-exist<br/>
cmake: add_custom_target(A_TARGET_NAME DEPENDS ANOTHER_TARGET DEPENDS somefile/that-should-exist)<br/>
gmake: somefile/that-should-exist :<br/>
|–tab–| command_to_create_file arg1 arg2 …<br/>
cmake: add_custom_command(OUTPUT somefile/that-should-exist COMMAND command_to_create_file arg1 arg2)
: $(word $(words $(subst /, ,$(dir $@))),$(subst /, ,$(dir $@)))
This just finds the last directory name in a path: /a/path/example/file.extension would yield example
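Since the support scripts are Ruby-based, it may help to see what the two gmake expressions above compute when written in Ruby. This is only an illustrative sketch: the function names are hypothetical and the code is not part of the build system.

```ruby
# Counterpart of $(addprefix theprefix/, $(SOME_LIST_OF_VALUES)):
# prepend the same prefix to every element of a list.
def add_prefix(prefix, values)
  values.map { |value| "#{prefix}#{value}" }
end

# Counterpart of $(word $(words $(subst /, ,$(dir $@))),$(subst /, ,$(dir $@))):
# take the directory part of a path, then keep only its last component.
def last_directory_name(path)
  File.basename(File.dirname(path))
end

puts add_prefix('theprefix/', %w[one two]).inspect
puts last_directory_name('/a/path/example/file.extension')
```

File.dirname strips the file name and File.basename then keeps the final directory component, which is the same two-step idea as the nested $(dir ...) and $(word ...) calls.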