Root-Cause Analysis:

## Problem

The recent genetic-algorithm experiments to generate a high-level level layout produced rubbish results, even after much tuning and 1-2 weeks of effort.

## 4 whys

### Why did the recent genetic-algorithm experiments produce no viable results?

• I made a faulty assumption: that histograms of basic metrics of a level's layout (distance between shape nodes, etc.) would make a good fitness function. Essentially, I thought that these histograms would be 'characteristic' of a level's layout style, when in fact many rubbish level layouts roughly shared the same histograms.

Action Item: The next set of metrics I use to characterize a level's flow must be validated before I build on them. Several levels that share the desired style should produce closely matching metric values, and levels that do not share that style should not.
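One way this validation could look is sketched below. Everything in it is an assumption for illustration: the function name `metric_separates_styles`, the idea of grouping hand-labelled levels by style, and the toy level data (made-up lists of node distances) are hypothetical, and the pass/fail rule (every same-style pair must be closer than every cross-style pair) is just one possible criterion.

```python
from itertools import combinations


def metric_separates_styles(levels_by_style, metric, distance):
    """Return True if every same-style pair of levels is closer, under
    `metric` and `distance`, than every cross-style pair."""
    # Reduce each level to its metric value ("signature"), grouped by style.
    signatures = {
        style: [metric(level) for level in levels]
        for style, levels in levels_by_style.items()
    }
    # Distances between levels that are supposed to share a style.
    within = [
        distance(a, b)
        for sigs in signatures.values()
        for a, b in combinations(sigs, 2)
    ]
    # Distances between levels that are supposed to differ in style.
    between = [
        distance(a, b)
        for sigs_a, sigs_b in combinations(signatures.values(), 2)
        for a in sigs_a
        for b in sigs_b
    ]
    if not within or not between:
        raise ValueError("need at least two styles and two levels in one style")
    return max(within) < min(between)


# Toy data (made up): each level is a list of node distances.
toy_levels = {
    "open": [[10, 12, 11], [11, 13, 12]],
    "tight": [[2, 3, 2], [3, 2, 3]],
}


def mean(values):
    return sum(values) / len(values)


def abs_diff(a, b):
    return abs(a - b)


mean_ok = metric_separates_styles(toy_levels, mean, abs_diff)   # True
count_ok = metric_separates_styles(toy_levels, len, abs_diff)   # False
```

In this toy case the mean distance separates the two styles, while the node count does not, because every level has the same count. A metric that fails a check like this on real levels would be rejected before any genetic-algorithm code is written around it.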

### Why did it take so long to determine that the metrics were bogus?

• I was focused on getting results, and so I worked on coding up the solution before running any experimental validation on the metrics. This was my mistake in setting priorities.

### Why was only one level used as a means of brainstorming these metrics?

• I had a definition of 'level flow style' that seemed to be accurate on the surface. I only applied these metrics to one real-world level, using a bunch of paper/toy examples to justify them. These paper examples were academically interesting but not representative of how other real-world levels looked.

### Why was the use of property histograms as a fitness function not sufficient?

• What I was doing: I compared the source level's histogram of a property (such as “nearest-neighbor node distance”) with the histogram of the same property for a level from the genetic population. The raw values from the source level and the raw values from the phenotype were placed on a single scale that spanned all the values from both, then re-binned into two histograms (one for the source level, one for the phenotype). This caused some issues:
• Example:
• source: [1, 2, 3, 4, 100]
• phenotype: [1, 2, 3, 4, 1000]
• In this case, because both histograms are re-binned onto the shared scale (so that their similarity can be compared), the source and phenotype appear to have very similar histograms, even though one of the distances is ten times larger than the other. [1, 2, 3, 4] all get binned into the first bin, [100] gets binned in or near the first bin (depending on the bin count), and [1000] gets binned at the end. That means that the two normalized histograms appear to be similar: only one value out of five is different between them.
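The example above can be reproduced with a short sketch. The bin count of 20 and the use of histogram intersection as the similarity score are assumptions, since these notes do not say which were used in the experiments. The helper names `shared_range_histogram` and `histogram_intersection` are likewise hypothetical.

```python
def shared_range_histogram(values, lo, hi, bins=20):
    """Normalized histogram of `values` over the fixed range [lo, hi]."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        # Clamp so that v == hi falls in the last bin rather than past it.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    total = len(values)
    return [c / total for c in counts]


def histogram_intersection(h1, h2):
    """Overlap of two normalized histograms; 1.0 means identical."""
    return sum(min(a, b) for a, b in zip(h1, h2))


source = [1, 2, 3, 4, 100]
phenotype = [1, 2, 3, 4, 1000]

# Shared scale: spans every value from both data sets.
lo = min(source + phenotype)
hi = max(source + phenotype)

source_hist = shared_range_histogram(source, lo, hi)
phenotype_hist = shared_range_histogram(phenotype, lo, hi)

similarity = histogram_intersection(source_hist, phenotype_hist)
print(f"similarity = {similarity:.2f}")  # similarity = 0.80
```

Under these assumptions the two levels score 0.80 out of 1.00. The score would be the same if the phenotype's outlier were 10,000 or 1,000,000, because the shared scale stretches to fit it and the outlier always lands in the last bin. The fitness function therefore cannot tell how far off the outlier is, only that one value out of five sits in a different bin.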