Classical Procedural Content Generation
Introduction
I have been reading this book on Procedural Content Generation and its integration with ML. As I go through the book, I will do my best to implement the concepts we cover in small, easily implementable examples. In this first chapter, we journey through a series of classic Procedural Content Generation methodologies, which fall into three categories — constructive-based, constraint-based, and search-based.
The notebooks used in this article can be found in the repository below.
Constructive Based
Constructive generation is a family of Procedural Content Generation methods which quickly generate content using rules and randomness in a one-shot fashion. The rules are critical and are hand-crafted by a domain expert in the content space.
Noise Based
The first constructive-based approach we will look at is noise-based. The randomness in noise generation is trivial; the rules are the parameters and the distribution the noise is sampled from.
Let’s go through an example of how we can generate some useful content for a map terrain using hierarchical noises.
The book shows how different noises with different ranges and resolutions can be combined to form a final output which can nicely represent terrain.
import random

def generate_noise(resolution, max_val):
    # build a resolution x resolution grid of uniform noise in [0, max_val]
    noise = []
    for _ in range(resolution):
        row = [random.uniform(0, max_val) for _ in range(resolution)]
        noise.append(row)
    return noise
First, we will define a function which will take in the resolution and range of the noise we want to generate.
low_res = generate_noise(16, 0.25)
med_res = generate_noise(32, 0.25)
high_res = generate_noise(64, 0.5)
very_high_res = generate_noise(128, 1)
Then we will generate four noise grids of different resolutions and ranges. The idea is that the low-resolution grid will capture large features, while the higher-resolution grids capture smaller features.
The ranges indicate how dynamic the features are; I played around with the ranges for each resolution to get cool-looking mountain ranges.
import numpy as np
from scipy.ndimage import zoom

# upsample every grid to 128x128 so they can be combined element-wise
low_res_interpolated = zoom(low_res, 128 / 16)
med_res_interpolated = zoom(med_res, 128 / 32)
high_res_interpolated = zoom(high_res, 128 / 64)
low_weight = 20
med_weight = 6
high_weight = 2
very_high_weight = 1
total_weight = low_weight + med_weight + high_weight + very_high_weight
final = (np.multiply(low_res_interpolated, low_weight)
         + np.multiply(med_res_interpolated, med_weight)
         + np.multiply(high_res_interpolated, high_weight)
         + np.multiply(very_high_res, very_high_weight)) / total_weight
The book said we could simply average the noises, but I introduced a weighted average to have more control over the output. I want the large features to be 20x more important than the very-high-resolution features.
Graphing the final output in 3D gave me some very nice-looking mountain ranges.
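If you want to reproduce the plot, here is a minimal sketch (assumes matplotlib; the styling choices are mine, not the book's):
import matplotlib.pyplot as plt

# 3D surface plot of the combined 128x128 height map from above
x, y = np.meshgrid(np.arange(128), np.arange(128))
fig = plt.figure()
ax = fig.add_subplot(projection='3d')
ax.plot_surface(x, y, final, cmap='terrain')
plt.show()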
You could tweak the weights manually to generate rolling hills or jagged mountain ranges, chop the tops off the mountains to create plateaus, or specify a water-level threshold to turn this into a world map generator.
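Both of those last two tweaks reduce to clamping the height map; here is a quick sketch (the threshold values are arbitrary picks of mine, tuned to the roughly [0, 0.3] range the weights above produce):
# Clamp the height map: everything below water_level flattens into sea,
# everything above plateau_height flattens into a plateau (thresholds are arbitrary)
water_level = 0.1
plateau_height = 0.2
world = np.clip(final, water_level, plateau_height)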
Rules Based
The next approach is rule-based generation. This was a lot less random than our noise generation and required verbose rules, which I wrote in code.
For our example, we will make a Pokemon Mystery Dungeon quest generator — a core memory from my childhood. If you have never played, the game sends you on a never-ending stream of quests which follow a very predictable pattern: “[objective] [subject] at [location] on floor [floor] to receive [item]”. Using a rule system, we can generate a never-ending stream of such quests (much like, I now suspect, Chunsoft did in the actual game).
First, let’s specify our rules:
locations = ["Tiny Woods", "ThunderWave Cave", "Mt. Steam"]
types = ["escort", "rescue", "defeat"]
pokemon = ["wurmple", "diglett", "magnemite", "nidoran"]
floors = {
    "Tiny Woods": 4,
    "ThunderWave Cave": 8,
    "Mt. Steam": 12
}
diff_to_rewards = {
    10: ["potion", "seed"],
    20: ["gummi", "orb"],
    30: ["scarf"],
    50: ["TM"]
}
We have a basic rule set: missions can only take place in the known locations, there are only three types of mission, and only a certain set of Pokemon can be the subject of a mission.
Once we pick a dungeon, another rule caps the floor the mission can be on. Finally, after generating all of these parameters, we synthesize a mission difficulty to determine the player’s reward.
First, we can choose the parameters which have no dependencies:
location = np.random.choice(locations)
print(location)
typ = np.random.choice(types)
print(typ)
subject = np.random.choice(pokemon)
print(subject)
and use the dungeon to select our floor:
floor = np.random.randint(1, floors[location] + 1)  # randint's upper bound is exclusive
print(floor)
Finally, we can determine a reward to give our player:
import math  # needed for math.floor below

difficulty = 0
if location == "Tiny Woods":
    difficulty += 5 + floor
elif location == "ThunderWave Cave":
    difficulty += 10 + math.floor(floor * 1.5)
elif location == "Mt. Steam":
    difficulty += 20 + 2 * floor

if typ == "rescue":
    difficulty += 5
elif typ in ("escort", "defeat"):  # `typ == "escort" or "defeat"` is always truthy
    difficulty += 10

possible_rewards = []
if difficulty >= 50:
    possible_rewards = diff_to_rewards[50]
elif difficulty >= 30:
    possible_rewards = diff_to_rewards[30]
elif difficulty >= 20:
    possible_rewards = diff_to_rewards[20]
elif difficulty >= 10:
    possible_rewards = diff_to_rewards[10]

reward = np.random.choice(possible_rewards)
and generate our quest:
print(f"{typ} {subject} at {location} Floor {floor} to receive a {reward}")
“defeat diglett at ThunderWave Cave Floor 2 to receive a orb”
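To get an actual never-ending stream, we can bundle the steps above into a single function. Here is a compact sketch (my refactor of the code above, not from the book):
def generate_quest():
    # same rules as above, bundled for repeated generation
    location = np.random.choice(locations)
    typ = np.random.choice(types)
    subject = np.random.choice(pokemon)
    floor = np.random.randint(1, floors[location] + 1)
    # the difficulty rules rewritten as a lookup
    base = {"Tiny Woods": 5 + floor,
            "ThunderWave Cave": 10 + math.floor(floor * 1.5),
            "Mt. Steam": 20 + 2 * floor}[location]
    difficulty = base + (5 if typ == "rescue" else 10)
    # pick the highest reward tier the difficulty qualifies for
    tier = max(t for t in diff_to_rewards if difficulty >= t)
    reward = np.random.choice(diff_to_rewards[tier])
    return f"{typ} {subject} at {location} Floor {floor} to receive a {reward}"

for _ in range(3):
    print(generate_quest())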
Grammar Based
The last of our constructive-based generators is grammar-based. This method is a subset of the rule-based approach. While general rules have no specific formalization, a grammar must have the following:
- terminal symbols (can’t be changed)
- non-terminal symbols (must be changed)
- rules which tell us how to go from non-terminal to terminal symbols
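To make these concrete, here is a tiny grammar of my own (not from the book): the non-terminal S expands to aSb or ab, so every fully expanded string has the form a…ab…b.
import random

# Toy context-free grammar: 'S' is non-terminal; 'a' and 'b' are terminals
grammar = {'S': ['aSb', 'ab']}

def derive(symbols='S'):
    # keep substituting until only terminal symbols remain
    while any(c in grammar for c in symbols):
        symbols = ''.join(random.choice(grammar[c]) if c in grammar else c
                          for c in symbols)
    return symbols

print(derive())  # e.g. 'aaabbb'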
Grammar-based approaches are the foundation of L-Systems, which can create strikingly organic trees in a completely procedural manner.
For our example, we will create a pseudo L-System to create our own trees. First, we need to define the grammar rules and our starting symbol.
rules = {
    '|': ['|', '|\\', ' \\'],
    '\\': [' ', ' *', ' \\'],
    '*': [' ']
}
symbols = '|'
Our rules tell us that the root can become another root, a branch, or a root and a branch; a branch can turn into nothing, a fruit, or another branch; and a fruit must turn into nothing.
By printing out each stage until we have all terminal symbols,
while True:
    print(symbols)
    expansions = 0
    next_gen = ''
    for c in symbols:
        if c in rules:
            # non-terminal: replace it with one of its expansions
            options = rules[c]
            choice = np.random.choice(options)
            next_gen += choice
            expansions += 1
        else:
            # terminal: carry it through unchanged
            next_gen += c
    if expansions == 0:
        break
    symbols = next_gen
We can create something that looks like a branch of a tree; below is an iteration I thought looked nice.
|
|\
| *
|
|\
\ *
\
Summary
- Constructive Based Generation involves building content up using a set of defined rules and randomness
- Noise is mostly random; the rules live in the parameters and distributions we choose
- Rule based approaches require hand crafted rules which must be translated into code
- Grammars are a formalized subset of rules which expand and grow using substitutions of non-terminal symbols for terminal ones
Constraint Based
Constraint-based generation represents the generative space as a set of n variables. We then enforce a set of constraints to ensure our generated content lives on some sub-hypersurface of this n-dimensional space.
To illustrate this, we will implement the palindrome generator the book specifies. In this example, the dimensionality of the solution space is the number of letters in our palindrome. Each letter has the entire English alphabet as its domain.
If it helps, instead of thinking of the values as characters, you can think of them as numerical coordinates. Our space ranges over [0, 26) in every dimension — so abc is actually [0, 1, 2] — and we are looking for a palindromic coordinate.
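A one-liner makes that mapping explicit (my illustration, not the book’s code):
# Map each letter to its coordinate: a=0, b=1, ..., z=25
word = "abc"
coords = [ord(c) - ord('a') for c in word]
print(coords)  # [0, 1, 2]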
The first step is to load and clean our English dictionary, which you can find in the GitHub repo — or you could bring your own.
with open('./words.txt') as f:
    words = f.read().splitlines()

cleaned_words = [word.lower() for word in words if ',' not in word and '.' not in word]
Next, we will define a function which will try to generate a palindrome.
import math
import re
import numpy as np

def generate_palindrome(word_len, corpus):
    # restrict the corpus to words of the target length
    potential_words = [word for word in corpus if len(word) == word_len]
    print(f"{len(potential_words)} {word_len} letter words available")
    alphabet = 'abcdefghijklmnopqrstuvwxyz'
    possible_solutions = [list(alphabet) for i in range(word_len)]
    while True:
        # get char to collapse
        condition = np.array([len(char_options) for char_options in possible_solutions]) != 1
        collapsable_indices = np.where(condition)[0]
        index_to_collapse = np.random.choice(collapsable_indices)
        # collapse the character and the corresponding char for the palindrome
        character_choice = np.random.choice(possible_solutions[index_to_collapse])
        possible_solutions[index_to_collapse] = [character_choice]
        possible_solutions[word_len - 1 - index_to_collapse] = [character_choice]
        # get remaining valid corpus
        template_word = ['.'] * word_len
        for i in range(word_len):
            if len(possible_solutions[i]) == 1:
                template_word[i] = possible_solutions[i][0]
        pattern = ''.join(template_word)
        regex = re.compile(pattern)
        potential_words = [word for word in potential_words if regex.match(word)]
        print(f"{len(potential_words)} words available matching {pattern}")
        # update character options
        for i in range(math.ceil(word_len / 2)):
            if len(possible_solutions[i]) != 1:
                potential_chars = set([word[i] for word in potential_words])
                complement_potential_chars = set([word[word_len - 1 - i] for word in potential_words])
                new_character_choices = potential_chars & complement_potential_chars
                if len(new_character_choices) == 0:
                    raise RuntimeError("Generation Failed")
                possible_solutions[i] = list(new_character_choices)
                possible_solutions[word_len - 1 - i] = list(new_character_choices)
        # if we are done => break
        if np.all([len(char_options) == 1 for char_options in possible_solutions]):
            break
    final_word = [arr[0] for arr in possible_solutions]
    return ''.join(final_word)
Let’s break this function down into steps since it is quite large (and probably should be refactored).
- We trim down our potential words to only those of the length we want to generate — this defines our generative space dimensionality
- We then initialize each dimension so that it can take any value a–z (0–25)
- In a loop, we randomly pick a dimension to collapse and a coordinate to assign to it
- We then satisfy our palindromic constraint, which says coordinate_i == coordinate_(n-1-i), by collapsing the mirrored coordinate as well
- Next, we satisfy our other constraint — that the word is a valid English word — by keeping only the remaining words which match our pattern so far and updating the potential coordinates of each dimension according to the words left; if no viable solutions remain, we throw an error
- When we have collapsed all dimensions, we return the word
Since our generation can fail, we need to retry it until we succeed:
final_word = ''
while final_word == '':
    try:
        final_word = generate_palindrome(7, cleaned_words)
    except RuntimeError:
        print("Failed To Generate Restarting...")
        continue
print(f"Final Word {final_word}")
Below is an example output of a run:
52050 7 letter words available
39 words available matching k.....k
0 words available matching k.o.o.k
Failed To Generate Restarting...
52050 7 letter words available
203 words available matching .u...u.
0 words available matching .ul.lu.
Failed To Generate Restarting...
52050 7 letter words available
0 words available matching j.....j
Failed To Generate Restarting...
52050 7 letter words available
8 words available matching ..h.h..
0 words available matching nah.han
Failed To Generate Restarting...
52050 7 letter words available
959 words available matching s.....s
4 words available matching so...os
1 words available matching sooloos
Final Word sooloos
There are a lot of improvements we could make here such as…
- Collapsing the dimension with the fewest options remaining instead of randomly picking one (sketched just below)
- Collapsing to the coordinate that preserves the most remaining solutions instead of randomly picking one
- Backtracking when no solution is left instead of just restarting
However, for a toy example, this works great!
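As a sketch of that first improvement, the random pick inside generate_palindrome could become a minimum-remaining-values pick (variable names mirror the function above; this would replace the index_to_collapse lines):
# Minimum remaining values (sketch): collapse the most constrained dimension first.
# Drop-in replacement for the random index_to_collapse pick in generate_palindrome.
lengths = [len(opts) for opts in possible_solutions]
uncollapsed = [i for i, n in enumerate(lengths) if n > 1]
index_to_collapse = min(uncollapsed, key=lambda i: lengths[i])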
Summary
- Constraint based approaches define a solution space using a set of tunable variables
- Constraint based approaches specify rules on the solution space to ensure the generated solution lies on a valid sub-surface of the space
- Constraint based approaches can get stuck and fail, they need to be restarted or cleverly backtrack
Search Based Approaches
Search-based approaches define what makes “good” content and then search for it in the space defined by the input variables. Programmatically defining “good” content is the most difficult part of this approach. Any search strategy is possible, including exhaustive search — though that is usually extremely sub-optimal.
In this example we will explore how two specific search strategies — hill climbing and evolutionary search — find values for a weapon’s damage and time to attack that maximize our fitness function.
First, let’s define our fitness function — which is intentionally arbitrary, chosen just to produce an interesting graph.
def fitness(damage, time_to_attack):
    return (damage**3) / 10 - time_to_attack * 5 - (damage - time_to_attack)**2 + 150
Below is the resulting fitness graph.
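Since damage and time to attack each range over 0–10 here, the space is only 11×11, so an exhaustive baseline is trivial. A quick sanity-check sketch (my addition, useful for validating the smarter searches below):
# Exhaustive search over the full 11x11 grid -- fine here, hopeless in larger spaces
best = max(((d, t) for d in range(11) for t in range(11)),
           key=lambda p: fitness(*p))
print(f"Exhaustive best: damage={best[0]} time_to_attack={best[1]}")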
Hill Climbing
Hill climbing samples a random starting point, then repeatedly checks the surrounding points and moves to the best one until no neighbor improves on the current fitness.
import numpy as np

# pick a random starting position
weapon_damage = np.random.randint(0, 11)
weapon_tta = np.random.randint(0, 11)
weapon_fitness = fitness(weapon_damage, weapon_tta)

def clamp(value, min_value, max_value):
    return max(min_value, min(value, max_value))

while True:
    # check if there is an adjacent coordinate with better fitness
    best_fitness = weapon_fitness
    weapon_tuple = (weapon_damage, weapon_tta)
    for d in [-1, 0, 1]:
        for t in [-1, 0, 1]:
            candidate_fitness = fitness(clamp(weapon_damage + d, 0, 10), clamp(weapon_tta + t, 0, 10))
            if candidate_fitness > best_fitness:
                best_fitness = candidate_fitness
                weapon_tuple = (clamp(weapon_damage + d, 0, 10), clamp(weapon_tta + t, 0, 10))
    # finish if we have not moved
    if weapon_tuple[0] == weapon_damage and weapon_tuple[1] == weapon_tta:
        break
    # move to the best coordinate found
    weapon_damage = weapon_tuple[0]
    weapon_tta = weapon_tuple[1]
    weapon_fitness = fitness(weapon_damage, weapon_tta)

print(f"Best weapon damage={weapon_damage} time_to_attack={weapon_tta}")
This works very well so long as there is no local maximum for us to get stuck on. For example, in this graph (0,0) is a local maximum; if our algorithm arrives at that hilltop, it will stop there.
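One common remedy (my addition, not the book’s) is random restarts: re-run the climb from several random starting points and keep the best result. A sketch, assuming the loop above were wrapped in a hypothetical hill_climb() that returns the final coordinate:
# Random-restart hill climbing; hill_climb() is a hypothetical wrapper around
# the loop above that returns (weapon_damage, weapon_tta)
best_weapon, best_fit = None, float('-inf')
for _ in range(20):
    damage, tta = hill_climb()
    if fitness(damage, tta) > best_fit:
        best_weapon, best_fit = (damage, tta), fitness(damage, tta)
print(f"Best over restarts: {best_weapon}")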
Evolutionary Search
Evolutionary search is a genetic algorithm which mimics survival of the fittest. It initializes a population, breeds and mutates individuals, and then kills off the least fit ones before the next generation.
population_size = 10
generations = 10
mutation_chance = 0.1
population = []

def generate_weapon(damage, time_to_attack):
    fit = fitness(damage, time_to_attack)
    return {"damage": damage, "time_to_attack": time_to_attack, "fitness": fit}

def cross_over_and_mutate(pop1, pop2):
    # each stat is inherited from a random parent
    dmg = pop1["damage"] if np.random.random() < 0.5 else pop2["damage"]
    tta = pop1["time_to_attack"] if np.random.random() < 0.5 else pop2["time_to_attack"]
    # occasionally mutate one of the stats to a fresh random value
    if np.random.random() < mutation_chance:
        if np.random.random() < 0.5:
            dmg = np.random.randint(0, 11)
        else:
            tta = np.random.randint(0, 11)
    fit = fitness(dmg, tta)
    return {"damage": dmg, "time_to_attack": tta, "fitness": fit}

for i in range(population_size):
    population.append(generate_weapon(np.random.randint(0, 11), np.random.randint(0, 11)))

for i in range(generations):
    next_generation = []
    for p in range(population_size):
        random_numbers = np.random.randint(0, population_size, size=2)
        next_generation.append(cross_over_and_mutate(population[random_numbers[0]], population[random_numbers[1]]))
    population.extend(next_generation)
    # keep only the fittest population_size individuals
    sorted_population = sorted(population, key=lambda x: x["fitness"])
    population = sorted_population[-1 * population_size:]

sorted_population = sorted(population, key=lambda x: x["fitness"])
print(sorted_population[-1])  # the fittest surviving weapon
The longer we run this, the more evolved our population becomes and the better our generated content. The mutation step helps us avoid getting stuck in local maxima: 10% of the time we randomize one of the coordinates, which can throw us off any hilltop we may be sitting on.
Summary
- Search based generation hinges on defining a good fitness function which properly encodes domain knowledge of what a good piece of content looks like
- Searching can be done in any fashion we want — including exhaustive search. In practice, we want to pick a search algorithm which runs faster than exhaustively searching and has a low chance of getting stuck in local maxima
Conclusion
In this article we explored classical Procedural Content Generation strategies. At their core, they all involve specifying some form of rule set and using randomness to create content that satisfies those rules. In every case, a good Procedural Content Generation application depends heavily on a well-authored, informed rule set for the domain — something we could perhaps learn with ML.