Probabilistic Based Procedural Content Generation with ML

Matthew MacFarquhar
9 min read · Aug 22, 2024


Introduction

I have been reading a book on Procedural Content Generation and its integration with ML. As I go through this book, I will try my best to implement the concepts we go over in small, easily digestible examples. In this chapter, we will implement algorithms that let our model learn rules from example levels and layouts rather than requiring us to input that information manually. These examples use the frequency of observed tile adjacencies to build a probabilistic model instead of a purely rule-based one.

The Notebooks used in this article can be found in the below repository.

In the previous article we implemented approaches to synthesize rules from example content; however, that approach had two shortcomings.

  1. We will only ever generate content based on what is seen in the example content (i.e. if tile A never borders tile C in the example content, we will never see them adjacent in generated content). This may be desirable or undesirable depending on your goals.
  2. The frequency of an adjacency is not considered when generating content (i.e. if B comes after A 500 times and C comes after A only once in the example content, both will appear equally often after A in the generated content).

We will explore solutions which can encode this probability alongside the adjacency rules.

Markov Chain

A Markov chain is a stochastic process where the probability of transitioning to the next state depends only on the current state and not on the sequence of prior states. The current state can be a single tile, a set of the previous tiles, etc… For our example, we will use a Markov chain which only knows the column immediately preceding itself.
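As a toy illustration of the Markov property (the state names and probabilities here are made up, not taken from the example level), sampling the next state needs nothing more than a lookup keyed by the current state:

```python
import random

# Hypothetical transition table for a two-state chain: P(next | current).
transitions = {
    "ground": {"ground": 0.8, "gap": 0.2},
    "gap":    {"ground": 0.9, "gap": 0.1},
}

def step(state):
    # The next state depends only on `state`, not on how we got here.
    options = transitions[state]
    return random.choices(list(options), weights=list(options.values()), k=1)[0]

state = "ground"
walk = [state]
for _ in range(5):
    state = step(state)
    walk.append(state)
print(walk)
```

Our level generator below is exactly this loop, with the transition table learned from example content instead of written by hand.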

Generate Probabilities

Our first step is to extract adjacency frequencies from sample content.

next_col_frequencies = {}
for i in range(len(example_level) - 1):
    col = example_level[i]
    if col in next_col_frequencies:
        next_col_frequency = next_col_frequencies[col]
    else:
        next_col_frequency = {}

    next_col = example_level[i+1]
    if next_col in next_col_frequency:
        next_col_frequency[next_col] += 1
    else:
        next_col_frequency[next_col] = 1

    next_col_frequencies[col] = next_col_frequency

print(next_col_frequencies)

We iterate over the columns and populate a next column frequency dict for our example level.
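The same counting can be written more compactly with the standard library's Counter and defaultdict. A sketch, using a small hypothetical stand-in for example_level (any list of hashable column identifiers works):

```python
from collections import Counter, defaultdict

# Hypothetical stand-in for the example_level used in the article.
example_level = ["C", "G", "A", "A", "H"]

next_col_frequencies = defaultdict(Counter)
for col, next_col in zip(example_level, example_level[1:]):
    next_col_frequencies[col][next_col] += 1

print({k: dict(v) for k, v in next_col_frequencies.items()})
# {'C': {'G': 1}, 'G': {'A': 1}, 'A': {'A': 1, 'H': 1}}
```

defaultdict(Counter) removes both of the if/else membership checks, since missing keys start at an empty Counter and missing counts start at 0.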

{
  "C": {"G": 1, "H": 2},
  "G": {"A": 3},
  "A": {"C": 2, "A": 2, "E": 1, "H": 1},
  "H": {"H": 3, "G": 2, "E": 1},
  "E": {"A": 1}
}

Next, we will take this frequency map and use it to generate a probability map we can sample from.

relative_probabilities = {}
smoothing_alpha = 0.05
for conditional in columns:
    conditional_prob = {}
    observed = next_col_frequencies.get(conditional, {})
    total_conditional = sum(observed.values())
    for col in columns:
        observed_count = observed.get(col, 0)
        conditional_prob[col] = (observed_count + smoothing_alpha) / (total_conditional + smoothing_alpha * len(columns))
    relative_probabilities[conditional] = conditional_prob

print(relative_probabilities)

This map encodes the occurrences as smoothed conditional probabilities. The smoothing value gives never-before-seen adjacencies a small nonzero chance of appearing in our generated content; if it is 0 we will only generate adjacencies seen in the examples. Here is the resulting map, with values rounded for readability:

{
  "A": {"A": 0.3203, "B": 0.0078, "C": 0.3203, "D": 0.0078, "E": 0.1641, "F": 0.0078, "G": 0.0078, "H": 0.1641},
  "B": {"A": 0.125, "B": 0.125, "C": 0.125, "D": 0.125, "E": 0.125, "F": 0.125, "G": 0.125, "H": 0.125},
  "C": {"A": 0.0147, "B": 0.0147, "C": 0.0147, "D": 0.0147, "E": 0.0147, "F": 0.0147, "G": 0.3088, "H": 0.6029},
  "D": {"A": 0.125, "B": 0.125, "C": 0.125, "D": 0.125, "E": 0.125, "F": 0.125, "G": 0.125, "H": 0.125},
  "E": {"A": 0.75, "B": 0.0357, "C": 0.0357, "D": 0.0357, "E": 0.0357, "F": 0.0357, "G": 0.0357, "H": 0.0357},
  "F": {"A": 0.125, "B": 0.125, "C": 0.125, "D": 0.125, "E": 0.125, "F": 0.125, "G": 0.125, "H": 0.125},
  "G": {"A": 0.8971, "B": 0.0147, "C": 0.0147, "D": 0.0147, "E": 0.0147, "F": 0.0147, "G": 0.0147, "H": 0.0147},
  "H": {"A": 0.0078, "B": 0.0078, "C": 0.0078, "D": 0.0078, "E": 0.1641, "F": 0.0078, "G": 0.3203, "H": 0.4766}
}

Each conditional's row sums to one, so these are proper probability distributions we can sample from directly (random.choices will also happily treat them as relative weights).

Generation

To generate, we start with a random choice for our first column and then flow the generation forward.

import random

level_length = 10
first_col = random.choice(columns)
generated_level = [first_col]

for i in range(1, level_length):
    prev_col = generated_level[i - 1]
    options_freqs = relative_probabilities[prev_col]
    chosen_col = random.choices(list(options_freqs.keys()), weights=list(options_freqs.values()), k=1)[0]
    generated_level.append(chosen_col)

print(generated_level)
display_level_image(col_dim, generated_level, col_name_to_image)

Here is an example level I generated.

Even though we did not see the column with the green power up block in the example content, we were still able to generate it because of the smoothing we applied to our weights.

Summary

  • Adjacency probabilities can be extracted from observing frequencies in example content
  • Markov chains act as conditional probabilities “Given the previous column is X what is the probability the next column is Y”
  • Markov chains can be given more memory in their state “Given the previous columns were W,X what is the probability the next column is Y”
  • Adding probabilities to rule generation allows us to generate patterns not observed in sample content and generate content with distributions which are similar to the example content
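The "more memory" bullet above can be sketched by keying the frequency map on a tuple of the previous two columns instead of one (the column names here are hypothetical; the counting idea is the same as before):

```python
from collections import Counter, defaultdict

# Hypothetical example content; a state is now the previous TWO columns.
example_level = ["A", "B", "A", "B", "C", "A", "B", "C"]

order2 = defaultdict(Counter)
for i in range(len(example_level) - 2):
    state = (example_level[i], example_level[i + 1])
    order2[state][example_level[i + 2]] += 1

print({k: dict(v) for k, v in order2.items()})
# {('A', 'B'): {'A': 1, 'C': 2}, ('B', 'A'): {'B': 1}, ('B', 'C'): {'A': 1}, ('C', 'A'): {'B': 1}}
```

More memory captures longer patterns but needs far more example content: the number of possible states grows exponentially with the order of the chain.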

Markov Random Field

Another approach we can try is a Markov Random Field. A Markov random field is a probabilistic graphical model that represents the joint distribution of a set of variables with an undirected graph, where each variable depends only on its neighbors in the graph, reflecting a local Markov property.

Unlike a Markov chain, it does not encode any implicit notion of directionality (i.e. "this tile comes after that one"); it encodes only an undirected neighbor association.

Extracting Frequencies

We will split this sample content up by tile (instead of by column) to ascertain the adjacency frequencies.

tile_adjacency_frequencies = {}
occurences = {}
for row in range(len(example_level)):
    for col in range(len(example_level[0])):
        tile = example_level[row][col]
        if tile in occurences:
            occurences[tile] += 1
        else:
            occurences[tile] = 1

        if tile in tile_adjacency_frequencies:
            tile_adjacency_frequency = tile_adjacency_frequencies[tile]
        else:
            tile_adjacency_frequency = {}

        for tile_offset in [(0,-1), (1,0), (0,1), (-1,0)]:
            if (row + tile_offset[1]) < 0 or (row + tile_offset[1]) >= len(example_level) or (col + tile_offset[0]) < 0 or (col + tile_offset[0]) >= len(example_level[0]):
                continue
            adjacent_tile = example_level[row + tile_offset[1]][col + tile_offset[0]]

            if adjacent_tile in tile_adjacency_frequency:
                tile_adjacency_frequency[adjacent_tile] += 1
            else:
                tile_adjacency_frequency[adjacent_tile] = 1

        tile_adjacency_frequencies[tile] = tile_adjacency_frequency

print(tile_adjacency_frequencies)
print(occurences)

We have extracted the adjacency frequencies as well as a generic tile occurrence frequency.

To determine the possible values for a tile, we need its neighbors’ values so we can evaluate the probability distribution P(tile value | neighbor tile values). This becomes somewhat of a chicken-and-egg scenario — I need my neighbor’s value to get my value, and my neighbor needs my value to get theirs — so the procedural flow of picking tiles we used for Markov chains will not work here.

Metropolis Hastings Algorithm

To get around this problem, we will use a Markov Chain Monte Carlo method called Metropolis-Hastings. The algorithm first fills our level with randomly selected tiles.

At each time step, we randomly select two tiles to swap. We compute the likelihood of our level before and after the swap and use the relationship between the two likelihoods to decide whether to keep the swap.
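Our accept step below weights the two outcomes by their likelihoods; the textbook Metropolis acceptance rule is a closely related alternative that always takes improvements and accepts worse proposals with probability equal to the likelihood ratio. A minimal sketch (the likelihood values passed in are placeholders):

```python
import random

def metropolis_accept(before_likelihood, after_likelihood):
    # Classic Metropolis rule: always accept improvements, otherwise
    # accept with probability after/before.
    if after_likelihood >= before_likelihood:
        return True
    return random.random() < after_likelihood / before_likelihood

print(metropolis_accept(0.2, 0.8))  # True: the swap improved the likelihood
```

Occasionally accepting worse states is what lets the sampler escape local optima instead of greedily hill-climbing.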

Initialization

To create our starting grid, I just shuffled our example content (you could also randomly sample from the occurrence distribution we created).

import random

flat_list = [item for sublist in example_level for item in sublist]
random.shuffle(flat_list)
rows, cols = len(example_level), len(example_level[0])
grid = [flat_list[i * cols:(i + 1) * cols] for i in range(rows)]
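The alternative mentioned above — sampling the starting grid from the occurrence distribution rather than shuffling — might look like this (the occurrence counts and grid size here are hypothetical):

```python
import random

# Hypothetical occurrence counts extracted from example content.
occurences = {"sky": 40, "ground": 20, "brick": 4}
rows, cols = 3, 5

tiles = list(occurences)
weights = list(occurences.values())
# Each cell is drawn independently, biased toward the more frequent tiles.
grid = [random.choices(tiles, weights=weights, k=cols) for _ in range(rows)]

print(grid)
```

Shuffling preserves the exact tile counts of the example; sampling only matches them in expectation, which can matter for rare tiles.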

Propose

In our propose method, we select two random tiles to swap.

def propose(grid):
    rows = len(grid)
    cols = len(grid[0])
    choices = [i for i in range(rows * cols)]
    entries = random.sample(choices, 2)
    while grid[entries[0] // cols][entries[0] % cols] == grid[entries[1] // cols][entries[1] % cols]:
        entries = random.sample(choices, 2)

    return [(entries[0] // cols, entries[0] % cols), (entries[1] // cols, entries[1] % cols)]

We then return the (row, col) pairs of those tiles. We keep sampling until the two tiles have different values, to avoid pointless no-op swaps.

Accept

To determine whether a swap improves the grid, we need to calculate the likelihood of the grid before and after the swap.

def calculate_liklihood(grid, position, adjacency_frequencies, occurences):
    tile_val = grid[position[0]][position[1]]
    prob = 1

    for offset in [(0,1), (1,0), (0,-1), (-1,0)]:
        row = position[0] + offset[0]
        col = position[1] + offset[1]
        if row < 0 or row >= len(grid) or col < 0 or col >= len(grid[0]):
            prob *= occurences[tile_val] / sum(occurences.values())
            continue
        offset_tile_val = grid[row][col]
        adjacencies = adjacency_frequencies[offset_tile_val]
        total = sum(adjacencies.values())
        if tile_val in adjacencies:
            adjacency_count = adjacencies[tile_val]
        else:
            adjacency_count = 0
        prob *= adjacency_count / total

    return prob

We multiply the four probabilities computed using the neighboring cells’ adjacency maps to determine how likely our current tile value is. If we are on an edge, we just use the occurrence map for the out-of-bounds neighbors — a more advanced solution might add padding tile types to help guide the ground and sky placement.
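To make this concrete, here is a tiny worked example with made-up adjacency and occurrence maps, tracing the likelihood of one corner tile by hand:

```python
# Hypothetical 2x2 grid and statistics.
grid = [["sky", "sky"],
        ["ground", "ground"]]
adjacency_frequencies = {
    "sky":    {"sky": 4, "ground": 2},
    "ground": {"ground": 4, "sky": 2},
}
occurences = {"sky": 6, "ground": 6}

# Likelihood of the "sky" tile at (0, 0): its right and down neighbors are
# in bounds; up and left fall off the grid, so each out-of-bounds neighbor
# contributes the marginal occurrence probability of "sky" instead.
right = adjacency_frequencies["sky"]["sky"] / 6       # neighbor (0,1) is sky    -> 4/6
down = adjacency_frequencies["ground"]["sky"] / 6     # neighbor (1,0) is ground -> 2/6
edge = occurences["sky"] / sum(occurences.values())   # 6/12 = 0.5, applied twice
prob = right * down * edge * edge
print(prob)  # (4/6) * (2/6) * 0.5 * 0.5 ≈ 0.0556
```

Each in-bounds factor asks the neighbor's adjacency map "how often did you sit next to a sky tile in the examples?", which is exactly what calculate_liklihood computes in its loop.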

from copy import deepcopy

def accept(grid, tile_one_pos, tile_two_pos, adjacency_frequencies, occurences):
    before_grid = deepcopy(grid)
    before_likelihood = 1
    for pos in [tile_one_pos, tile_two_pos]:
        for offset in [(0,0), (0,1), (1,0), (0,-1), (-1,0)]:
            row = pos[0] + offset[0]
            col = pos[1] + offset[1]
            if row < 0 or row >= len(grid) or col < 0 or col >= len(grid[0]):
                continue
            before_likelihood *= calculate_liklihood(before_grid, (row, col), adjacency_frequencies, occurences)

    after_grid = deepcopy(grid)
    temp = after_grid[tile_one_pos[0]][tile_one_pos[1]]
    after_grid[tile_one_pos[0]][tile_one_pos[1]] = after_grid[tile_two_pos[0]][tile_two_pos[1]]
    after_grid[tile_two_pos[0]][tile_two_pos[1]] = temp

    after_likelihood = 1
    for pos in [tile_one_pos, tile_two_pos]:
        for offset in [(0,0), (0,1), (1,0), (0,-1), (-1,0)]:
            row = pos[0] + offset[0]
            col = pos[1] + offset[1]
            if row < 0 or row >= len(grid) or col < 0 or col >= len(grid[0]):
                continue
            after_likelihood *= calculate_liklihood(after_grid, (row, col), adjacency_frequencies, occurences)

    if before_likelihood == after_likelihood == 0:
        after_likelihood = 0.5
        before_likelihood = 0.5

    should_accept = random.choices([False, True], weights=[before_likelihood, after_likelihood])[0]

    if should_accept:
        return after_grid
    else:
        return before_grid

For each of the two swapped positions, we multiply the likelihoods of the five tiles the swap can affect (the tile itself plus its four neighbors), once before and once after the swap. We then use these two likelihoods as sampling weights to decide whether to accept the swap.

Loop

We run this propose-accept loop for a set number of steps — you could also choose to run it until you reach an agreeable total grid likelihood.

for i in range(100000):
    tiles_to_swap = propose(grid)
    grid = accept(grid, tiles_to_swap[0], tiles_to_swap[1], tile_adjacency_frequencies, occurences)

display_level_image(tile_dim, grid, tile_name_to_image)
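For the "run until an agreeable likelihood" variant, the score is easier to track in log space, since multiplying hundreds of per-tile probabilities underflows to zero. A sketch, with a stand-in per-tile likelihood function (in the article's notebook you would pass calculate_liklihood with its extra arguments bound):

```python
import math

def grid_log_likelihood(grid, tile_likelihood):
    # Sum of log-likelihoods over all tiles; -inf if any tile is impossible.
    total = 0.0
    for row in range(len(grid)):
        for col in range(len(grid[0])):
            p = tile_likelihood(grid, (row, col))
            if p == 0:
                return float("-inf")
            total += math.log(p)
    return total

# Stand-in likelihood: pretend every tile value has probability 0.5.
uniform = lambda grid, pos: 0.5
print(grid_log_likelihood([["A", "B"], ["B", "A"]], uniform))  # 4 * log(0.5) ≈ -2.77
```

Checking this score every few thousand iterations gives a cheap stopping criterion: stop once it plateaus.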

Below is the result I got from running it. It was pretty good at figuring out local relations and clustering like tiles together. It also captured adjacency relationships (e.g. the dirt tile never touches sky, the power-up block is surrounded by sky) but could use some help with relational positions (the star should be on top of the cliff edge, not to the left of it).

We could augment this approach by working at different resolutions (generating a larger meta layout and then generating the sub-layouts within it). We could also experiment with including the diagonals as neighbors, or expanding our neighborhood and adding some cardinality to the edges to indicate neighbor distance.

Summary

  • Markov random fields capture adjacency without direction
  • Since they have no directionality they cannot be procedurally created like the Markov chain example and are generated using Monte Carlo methods
  • This approach worked well on the micro level but failed to capture macro patterns and had no way of encoding directionality of adjacency

Conclusion

In this article we explored how we can add probability to our level generation, which gave us the power to generate levels based on observed distributions and to create never-before-seen layouts. Probability also enabled us to synthesize content in ways we previously could not because of the implied directionality of rule dependencies. Adding probability to our PCGML algorithm allows us to create much more diverse, unseen content, but may lead to content that is considered invalid. How much freedom to give the generation should be decided on a case-by-case basis.
