
A long time ago I blogged about Conway's Game of Life in a MODEL clause, which also made it into chapter 6 of my book Practical Oracle SQL.

Today Samuel Nitsche applied this approach to solve Day 11 of Advent of Code 2020 (follow Sam's blog to see him solve Advent of Code puzzles in SQL).

Unfortunately it was awfully slow 😒. So here’s how to make it fast again – with some interesting findings on aggregates in MODEL clause…

For this post I'm starting out with the code outlined in the chapter 6 sample code for the book.

Meaning I have this table:

create table conway_gen_zero (
   x     integer not null
 , y     integer not null
 , alive integer not null check (alive in (0, 1))
 , constraint conway_gen_zero_pk primary key (x, y)
);
But instead of the small 10×10 grid used in the book and original blog post, I’ll fill this table with an initial generation zero population on a larger 100×100 grid:

truncate table conway_gen_zero;

insert into conway_gen_zero (x, y, alive)
select * from (
   with numbers as (
      select level as n from dual
      connect by level <= 100
   ), grid as (
      select
         x.n as x
       , y.n as y
      from numbers x
      cross join numbers y
   ), start_cells(x, y) as (
      select 19, 10+level from dual connect by level <= 78 union /* distinct */
      select 20, 10+level from dual connect by level <= 78 union /* distinct */
      select 80, 10+level from dual connect by level <= 78 union /* distinct */
      select 81, 10+level from dual connect by level <= 78 union /* distinct */
      select 10+level, 19 from dual connect by level <= 78 union /* distinct */
      select 10+level, 20 from dual connect by level <= 78 union /* distinct */
      select 10+level, 80 from dual connect by level <= 78 union /* distinct */
      select 10+level, 81 from dual connect by level <= 78
   )
   select
      g.x
    , g.y
    , nvl2(sc.x, 1, 0) as alive
   from grid g
   left outer join start_cells sc
      on sc.x = g.x
      and sc.y = g.y
);


That inserts the 10,000 rows of a 100×100 grid, with ALIVE=1 populated in the shape of a big hash-tag (not that it matters for the demo which rows have ALIVE=1 and which have ALIVE=0).
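As a quick sanity check on the shape (a Python sketch, not part of the SQL demo), the start_cells pattern can be reproduced outside the database and its distinct cell count verified:

```python
# Reproduce the start_cells pattern from the INSERT above: four double-thick
# bars (vertical at x = 19/20 and 80/81, horizontal at y = 19/20 and 80/81),
# each 78 cells long, combined with UNION (distinct).
start_cells = set()
for fixed in (19, 20, 80, 81):
    for level in range(1, 79):                 # connect by level <= 78
        start_cells.add((fixed, 10 + level))   # the vertical bars
        start_cells.add((10 + level, fixed))   # the horizontal bars

# 8 bars x 78 cells = 624, minus the 16 crossing cells counted twice
print(len(start_cells))  # 608
```

So 608 of the 10,000 grid cells start out with ALIVE=1.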

I do a single iteration of my original code:

with conway as (
   select *
   from conway_gen_zero
   model
      dimension by (
         0 as generation
       , x, y
      )
      measures (
         alive
      )
      ignore nav
      rules upsert all iterate (1)
      (
         alive[iteration_number + 1, any, any] =
            case sum(alive)[
                    generation = iteration_number,
                    x between cv() - 1 and cv() + 1,
                    y between cv() - 1 and cv() + 1
                 ] - alive[iteration_number, cv(), cv()]
               when 2 then alive[iteration_number, cv(), cv()]
               when 3 then 1
               else 0
            end
      )
)
select
   generation
 , listagg(
      case alive
         when 1 then 'X'
         when 0 then ' '
      end
   ) within group (
      order by x
   ) cells
from conway
group by generation, y
order by generation, y;

This single iteration takes around 22 seconds in my environment. 22 seconds? For one generation on a 100×100 grid, that seems awfully slow. And worse, each subsequent iteration takes progressively longer.

What to do about this? 🤔

Well, the first thing I often try is to add AUTOMATIC ORDER to the RULES clause, since this sometimes lets the optimizer find a better order of rule evaluation. Unfortunately it doesn't work here – for some unknown reason, adding AUTOMATIC ORDER to this very specific statement causes an "ORA-32607: invalid ITERATE value in MODEL clause". I don't know why, and I won't spend the time today to dig deeper.

Then I wanted to see if perhaps I could rephrase the calculations to do some of the pieces by iteratively building on previous results instead of summing everything from scratch in every cell.

So as a starting point in this rephrasing, I decided to eliminate the SUM aggregation. The SUM in the above query makes a sum of all neighbouring cells as well as current cell, which then necessitates subtracting current cell so the total calculation in the CASE gets the sum of all neighbouring cells only.

I can get the sum of all neighbouring cells in a different way by simply adding the values from each of the 8 cells manually by changing the RULES clause like this:

rules upsert all iterate (1)
(
   alive[iteration_number + 1, any, any] =
      case alive[iteration_number, cv()-1, cv()-1]
         + alive[iteration_number, cv()-1, cv()  ]
         + alive[iteration_number, cv()-1, cv()+1]
         + alive[iteration_number, cv()  , cv()-1]
         + alive[iteration_number, cv()  , cv()+1]
         + alive[iteration_number, cv()+1, cv()-1]
         + alive[iteration_number, cv()+1, cv()  ]
         + alive[iteration_number, cv()+1, cv()+1]
         when 2 then alive[iteration_number, cv(), cv()]
         when 3 then 1
         else 0
      end
)

I tried to execute this just to verify that the results were identical before I continued rephrasing the code.

And it turned out that simply doing this made a single iteration execute in around 70 milliseconds instead of 22 seconds! Around 300x faster!

No need to try and develop this further with complex building on intermediate results (as was my original plan) – it turned out that simply removing the aggregate and replacing it with simple cell lookups was more than fast enough.
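Why is the rewrite safe? Summing a 3×3 window and subtracting the centre cell gives exactly the same neighbour count as adding the 8 surrounding cells explicitly. A small Python sketch (outside the database, with a hypothetical random grid) demonstrates the equivalence:

```python
import random

random.seed(42)
N = 20
# A random 0/1 grid standing in for the ALIVE measure.
grid = {(x, y): random.randint(0, 1) for x in range(N) for y in range(N)}

def get(x, y):
    # Cells outside the grid count as 0, mirroring IGNORE NAV in the MODEL clause.
    return grid.get((x, y), 0)

for x in range(N):
    for y in range(N):
        # Method 1: 3x3 window sum minus the centre cell (the SUM aggregate approach).
        window = sum(get(x + dx, y + dy)
                     for dx in (-1, 0, 1) for dy in (-1, 0, 1)) - get(x, y)
        # Method 2: eight explicit neighbour lookups (the fast approach).
        lookups = (get(x-1, y-1) + get(x-1, y) + get(x-1, y+1)
                   + get(x,   y-1)             + get(x,   y+1)
                   + get(x+1, y-1) + get(x+1, y) + get(x+1, y+1))
        assert window == lookups
print("both methods agree on every cell")
```

Same arithmetic, but in the MODEL clause the two formulations apparently take very different code paths.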

Then I tried multiple iterations to see what would happen – whether the alternative solution would also become progressively slower with each added iteration:

  • ITERATE(1)
    SUM aggregate: 00:21.58
    8 x cell lookups: 00:00.09
  • ITERATE(2)
    SUM aggregate: 00:52.11
    8 x cell lookups: 00:00.14
  • ITERATE(3)
    SUM aggregate: 01:31.67
    8 x cell lookups: 00:00.20
  • ITERATE(4)
    SUM aggregate: 02:20.56
    8 x cell lookups: 00:00.27
  • ITERATE(5)
    SUM aggregate: 03:17.61
    8 x cell lookups: 00:00.34

And no, it did not. The time for the cell-lookup method grows linearly, adding about 0.07 seconds per iteration.

I notice an interesting pattern here:

Using the SUM aggregate method, iteration 1 takes roughly 20 seconds, iteration 2 adds roughly 30 seconds, iteration 3 adds roughly 40, iteration 4 roughly 50, and iteration 5 roughly 60. Each iteration becomes slower than the last.

ITERATE(1) upserts 10,000 generation one cells on top of the 10,000 generation zero cells in the table. ITERATE(2) adds another 10,000 generation two cells, and so on. So the first iteration works on 20,000 cells (generations 0 and 1), the second iteration on 30,000 cells (generations 0, 1 and 2), and so on. It looks very much like each iteration takes roughly one second per thousand cells in existence at that point.

My theory (though I have no proof beyond this empirical evidence) is that using SUM with a cell specification that picks cells via a "complex window" makes each individual SUM call go through all the cells and check whether each one falls within the window. That would explain why each iteration becomes slower: each iteration has more and more cells to check.
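The theory can be sanity-checked with a bit of arithmetic on the timings above (a Python sketch; the per-cell figure is derived from the measurements, not something measured directly):

```python
# Timings from the ITERATE(1)..ITERATE(5) tests above, in seconds.
sum_times    = [21.58, 52.11, 91.67, 140.56, 197.61]  # SUM aggregate (cumulative)
lookup_times = [0.09, 0.14, 0.20, 0.27, 0.34]         # 8 x cell lookups (cumulative)

# Iteration k upserts generation k, so it runs against (k + 1) * 10,000
# existing cells (generations 0..k). If each SUM call scans all cells,
# the cost of iteration k alone should be proportional to that count.
ms_per_cell = []
prev = 0.0
for k, total in enumerate(sum_times, start=1):
    increment = total - prev          # cost of iteration k alone
    cells = (k + 1) * 10_000          # cells existing during iteration k
    ms_per_cell.append(increment / cells * 1000)
    prev = total

print([round(r, 2) for r in ms_per_cell])  # each value close to 1 ms per cell
```

Every increment comes out close to one millisecond per existing cell, which fits the "scan all cells" theory, while the lookup method's increments stay near a constant 0.05–0.07 seconds per iteration regardless of how many cells exist.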

We humans can easily see that we only need to check cells where the generation is exactly one less and the x,y coordinates differ by at most one – but having a "window" with both "before" and "after" cells might make it so hard for the state machine to do an efficient job that it defaults to a code path that simply says "scan all cells and check if each cell satisfies the filter predicates".

The individual cell lookups, on the other hand, are presumably a very standard, simple code path in the state machine, so it is extremely optimized and gets exactly those cells it needs.

It’s only a theory so far and this is just a single test-case – your mileage may (and probably will) vary!

But I'd say to you anyway: if you have "relatively complex" cell referencing in the aggregate functions of your MODEL clause RULES, then please test thoroughly, including edge cases and larger datasets. If your tests reveal that it becomes progressively slower the larger the dataset is (and/or the more iterations you make), see if you can rewrite it to use simpler cell referencing.

Thanks to Samuel for the heads-up, or I’d never have guessed this 😇.