Warning: Do EM found without Unlabel Portion!
Warning: You should specify a file parser!

File List Root Directory: z:\Reuters\data.new\reduced_set
Data Root Directory: z:\Reuters
Classifier Type: NB_MN
EM Enabled
Number of Clusters: 10
Distribution Initializer Type: RANDOM_HARD

Loading Training Data...
  Loading file list z:\Reuters\data.new\reduced_set\training\GHEA.txt...589 files.
  Loading file list z:\Reuters\data.new\reduced_set\training\GVOTE.txt...1098 files.
  Loading file list z:\Reuters\data.new\reduced_set\training\GDEF.txt...837 files.
  Loading file list z:\Reuters\data.new\reduced_set\training\GREL.txt...280 files.
  Loading file list z:\Reuters\data.new\reduced_set\training\GENT.txt...391 files.
Loading class labels...
[GVOTE, GREL, GHEA, GDEF, GENT]
Loading Test Data...
  Loading file list z:\Reuters\data.new\reduced_set\dev\GHEA.txt...73 files.
  Loading file list z:\Reuters\data.new\reduced_set\dev\GVOTE.txt...137 files.
  Loading file list z:\Reuters\data.new\reduced_set\dev\GDEF.txt...104 files.
  Loading file list z:\Reuters\data.new\reduced_set\dev\GREL.txt...35 files.
  Loading file list z:\Reuters\data.new\reduced_set\dev\GENT.txt...48 files.
Found command line argument k
Unlabeled: 3544/3544 documents (target: 1.0).
Just created instance of Random Hard Initializer
No labeled documents detected, commence clustering.
Likelihood for round 0 = -4108512.9494642015
Likelihood for round 1 = -4108485.8545367457
Likelihood for round 2 = -4035512.0653823065
Likelihood for round 3 = -3966089.4853993617
Likelihood for round 4 = -3921910.82400111
Likelihood for round 5 = -3895045.5002938258
Likelihood for round 6 = -3881272.5391428014
Likelihood for round 7 = -3870279.333689569
Likelihood for round 8 = -3856354.415602896
Likelihood for round 9 = -3844625.574732625
Likelihood for round 10 = -3834605.0231753476
Likelihood for round 11 = -3824872.0415230813
Likelihood for round 12 = -3818767.132530139
Likelihood for round 13 = -3815710.052677934
Likelihood for round 14 = -3814792.6428085743
Likelihood for round 15 = -3814240.611344331
Likelihood for round 16 = -3813557.825200702
Likelihood for round 17 = -3813324.683373153
Likelihood for round 18 = -3813090.477829844
Likelihood for round 19 = -3812948.9962592535
Likelihood for round 20 = -3812944.094247621
Likelihood for round 21 = -3812945.338805639
Likelihood for round 22 = -3812936.51004531
Likelihood for round 23 = -3812772.609597657
Likelihood for round 24 = -3812676.4874882638
Likelihood for round 25 = -3812669.2891049692
Likelihood for round 26 = -3812665.0176783875
Likelihood for round 27 = -3812635.0462839566
Likelihood for round 28 = -3812624.948677367
Likelihood for round 29 = -3812568.4781594537
Likelihood for round 30 = -3812459.071967899
Likelihood for round 31 = -3812068.7494916776
Likelihood for round 32 = -3811346.2435353254
Likelihood for round 33 = -3811171.591922586
Likelihood for round 34 = -3811137.241213156
Likelihood for round 35 = -3811127.018922891
Likelihood for round 36 = -3811127.0375125445
Likelihood for round 37 = -3811127.0560666067
Likelihood for round 38 = -3811127.0785230165
Likelihood for round 39 = -3811127.1094681732
Likelihood for round 40 = -3811127.159344023
Likelihood for round 41 = -3811127.2549962364
Likelihood for round 42 = -3811127.0055806288
Likelihood for round 43 = -3811104.876000952
Likelihood for round 44 = -3811083.1818371853
Likelihood for round 45 = -3811054.1568926456
Likelihood for round 46 = -3811054.160615779
Likelihood for round 47 = -3811054.161220626
Likelihood for round 48 = -3811054.1613139277
Likelihood for round 49 = -3811054.1613279106
Likelihood for round 50 = -3811054.1613300266
Likelihood for round 51 = -3811054.1613303465
Likelihood for round 52 = -3811054.1613304005
Likelihood for round 53 = -3811054.1613304005
Likelihood for round 54 = -3811054.1613304056
Likelihood for round 55 = -3811054.1613304056
Likelihood for round 56 = -3811054.1613304047
Now testing classifier: Naive Bayes Multinomial, EM: true  Clustering: true
		"Cluster00"		"Cluster01"		"Cluster02"		"Cluster03"		"Cluster04"		"Cluster05"		"Cluster06"		"Cluster07"		"Cluster08"		"Cluster09"		
GVOTE		546(0.44426)		260(0.21155)		6(0.00488)		23(0.01871)		2(0.00163)		0(0.00000)		392(0.31896)		0(0.00000)		0(0.00000)		0(0.00000)		
GREL		19(0.06230)		73(0.23934)		21(0.06885)		164(0.53770)		0(0.00000)		0(0.00000)		28(0.09180)		0(0.00000)		0(0.00000)		0(0.00000)		
GHEA		86(0.12991)		9(0.01360)		113(0.17069)		53(0.08006)		275(0.41541)		0(0.00000)		126(0.19033)		0(0.00000)		0(0.00000)		0(0.00000)		
GDEF		11(0.01198)		13(0.01416)		150(0.16340)		504(0.54902)		3(0.00327)		0(0.00000)		237(0.25817)		0(0.00000)		0(0.00000)		0(0.00000)		
GENT		86(0.20000)		18(0.04186)		189(0.43953)		49(0.11395)		72(0.16744)		0(0.00000)		15(0.03488)		0(0.00000)		1(0.00233)		0(0.00000)		
Adjusted Rand Index: 0E+1
Accuracy: 0.0
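
The Adjusted Rand Index line above can be cross-checked against the class-by-cluster counts printed in the table. What follows is a minimal Python sketch, not the tool's own code: it assumes the standard pair-counting (Hubert-Arabie) definition of the index, and the names `counts`, `pairs`, and `adjusted_rand_index` are hypothetical helpers introduced only for illustration. The counts are copied from the table; the proportions in parentheses are omitted.

```python
# Contingency counts copied from the table above.
# Rows are the true classes; columns are Cluster00..Cluster09.
counts = {
    "GVOTE": [546, 260, 6, 23, 2, 0, 392, 0, 0, 0],
    "GREL":  [19, 73, 21, 164, 0, 0, 28, 0, 0, 0],
    "GHEA":  [86, 9, 113, 53, 275, 0, 126, 0, 0, 0],
    "GDEF":  [11, 13, 150, 504, 3, 0, 237, 0, 0, 0],
    "GENT":  [86, 18, 189, 49, 72, 0, 15, 0, 1, 0],
}


def pairs(n):
    """Number of unordered pairs that can be drawn from n items."""
    return n * (n - 1) // 2


def adjusted_rand_index(table):
    """Adjusted Rand Index of a contingency table (list of equal-length rows).

    Assumption: standard pair-counting formula,
    ARI = (index - expected) / (maximum - expected).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    if n < 2:
        return 1.0  # fewer than two documents: no pairs to disagree on

    index = sum(pairs(cell) for row in table for cell in row)
    row_pairs = sum(pairs(a) for a in row_totals)
    col_pairs = sum(pairs(b) for b in col_totals)

    expected = row_pairs * col_pairs / pairs(n)
    maximum = (row_pairs + col_pairs) / 2
    if maximum == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (index - expected) / (maximum - expected)


table = list(counts.values())
total = sum(sum(row) for row in table)
ari = adjusted_rand_index(table)
print(f"documents: {total}, ARI: {ari:.4f}")
```

The document total computed this way should agree with the `3544/3544` figure in the log, which is a quick sanity check that the counts were transcribed correctly before comparing the index with the logged value.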
nlp/document-clustering.txt · Last modified: 2015/04/23 21:45 by ryancha