Optimal Addition Sequences for Integer Multiplication

2025-06-30

Most “modern” computers have built-in hardware for integer multiplication. Note the quotes around modern. The original IBM PC had hardware multiplication. The Sega Genesis had hardware multiplication. But older CPUs, especially embedded ones, often left it out. One notable example is the Zilog Z80, which was still produced until 2024 (the newer eZ80 does have multiplication). The Game Boy’s CPU is similar to the Z80 and likewise lacks multiplication.

But that doesn’t mean that these computers can’t multiply integers. You may remember from elementary school that you can do multiplication by repeated addition: a * 8 = a + a + a + a + a + a + a + a. This isn’t the most efficient way of doing it, though. You can instead do b = a + a, then c = b + b, then d = c + c. This uses 3 additions rather than 7.

You may also be familiar with using bit shifting to do this. a << n is equivalent to a * 2^n, so you could compute a << 3 in a single operation. This is true, but the Game Boy can only shift by 1 in a single instruction, so shifting isn’t actually any better than repeated doubling.

I wanted to write a program that found the optimal “route” to do any multiplication by a constant on a Game Boy. The rest of this post will describe how I did this.

Method

It’s easier to think about this problem as addition rather than multiplication. Instead of adding together copies of a to get a * n, you can add together 1’s to get n. This is an equivalent way of framing the problem, and it’s easier to explain. Define c(n) as the number of additions needed to reach n.

While thinking about how to implement it, my first thought was that this would be a good place to use dynamic programming. Given a + b = n, if you know c(a) and c(b), then one more addition gets you n, so c(n) is at most c(a) + c(b) + 1. The problem has overlapping subproblems. Since a and b are both smaller than n, you can calculate c(n) in increasing order of n and never have to repeat work.

To start, c(1) = 0, since 1 is the multiplicative identity. For any n > 1, loop over all the pairs of positive integers that add up to n. Calculate the cost for each pair (a, b) as c(a) + c(b) + 1. Out of all these costs, choose the minimum, and save that as the value for c(n). Loop up to whatever you choose as your maximum n (255 for me), and you’re done!
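As a rough sketch, this first attempt at the dynamic program might look like the following (Python is my choice here for illustration; the names are illustrative):

```python
# First attempt: c(1) = 0, and for n > 1,
# c(n) = min over all pairs a + b = n of c(a) + c(b) + 1.
MAX_N = 255
INF = float("inf")

cost = [INF] * (MAX_N + 1)
cost[1] = 0  # reaching 1 takes no additions
for n in range(2, MAX_N + 1):
    for a in range(1, n // 2 + 1):
        b = n - a
        cost[n] = min(cost[n], cost[a] + cost[b] + 1)
```

Running this makes the problem with it easy to see: every entry comes out as n - 1.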

Now, if you actually do this, you’ll notice an issue. The cost for every number is just n - 1, which is exactly what you get from the naive repeated addition earlier. In fact, it is doing that repeated addition. To understand why the above solution is wrong, consider the following example:

b=a+a
c=a+a
d=b+c
return

This is one way to multiply a by 4. Obviously, it is not optimal: the variables b and c store the exact same value, so there’s no need to compute it twice. You can add a check to the cost calculation, giving this revised formula:

c(n) = (a == b) ? (c(a) + 1) : (c(a) + c(b) + 1)
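As a small Python illustration of this check (pair_cost is a name I made up for it):

```python
def pair_cost(cost, a, b):
    # If both halves are the same value, it only needs to be computed once,
    # so the pair costs one addition on top of a single subchain.
    if a == b:
        return cost[a] + 1
    return cost[a] + cost[b] + 1

# With a tiny cost table where c(1) = 0 and c(2) = 1:
cost = [None, 0, 1]
print(pair_cost(cost, 2, 2))  # 4 = 2 + 2 costs c(2) + 1 = 2
print(pair_cost(cost, 1, 2))  # 3 = 1 + 2 costs c(1) + c(2) + 1 = 2
```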

This seems like a good solution, but it was really just one case of a larger bug.

This whole algorithm doesn’t work

I had actually written about half of this post before I realized this and had to go fix it. The issue is that the algorithm doesn’t handle overlapping calculations correctly. The a == b check fixes the most obvious case, but imagine something like this: 8 = 6 + 2. The algorithm will add together the cost of 6 and the cost of 2. In reality, the cost of 2 should be 0, because calculating 6 already calculated 2 as a subproblem.

Another implication of this is that c(n) can’t just track the smallest possible cost for n itself. The algorithm also needs to consider whether some other route of additions, even one that isn’t cheapest for n, would solve subproblems better.

Luckily, there is one conjecture that helps keep the runtime complexity under control: you only need to consider the smaller value itself as a subproblem, not the overlapping subproblems between the two halves. That sentence is confusing, so here’s an example:

n = x + 4

You are trying to get the cost of n, and are testing whether adding 4 to some subproblem x is a good route. Suppose calculating x also calculates 2, but not 4. You could use that 2 to make 4, adding 1 to the cost. But the route (x + 2) will also be considered later:

n = (x + 2) + 2

You know that the cost of x + 2 is at most 1 more than the cost of x, and the 2 at the end is free, since it has already been calculated along the way. So the second route costs at most as much as the first route, and it is also easier to compute.

A new algorithm

In addition to defining c(n) as the minimum number of additions needed to reach n, also define c(n, s) as the minimum number of additions needed to reach n in a way that also calculates s. c(n, s) is ∞ if it is impossible to compute s as a subproblem of computing n.

We should also define m(n) and m(n, s), which denote the set of every number calculated along the route for c(n) and c(n, s) respectively. m(n) and m(n, s) always include n itself. Any time c(n) or c(n, s) is stored, assume that m(n) or m(n, s) is stored alongside it.

Start the algorithm by doing basically the same thing as the old algorithm. After finding the minimum cost, look through m(n), and for every k in m(n), store c(n, k) = c(n).

Then, check whether you can take advantage of a subproblem. Assuming a is the larger number, calculate c(a, b) + 1, the cost of a route that computes b for free on the way to a. If this cost is smaller than the one you stored for c(n), replace it with the new cost, and also replace all the values for c(n, k) as shown above.

In theory, another step is needed for correctness. In practice, it didn’t change the outcome at all (I’m not fully sure why). After both of these steps, and still for each pair a and b, check each possible subproblem s.

Add a nested loop over s from 1 to n - 1. Calculate the cost as c(a, s) + c(b) + 1. If this is less than c(n, s), store it, and also update c(n, k) again for every k in m(n, s).

Adding subtraction

The Game Boy also supports subtraction, and we should take advantage of that to find better formulas. You can essentially copy the logic for addition and adapt it for subtraction. There is, however, one issue: since a and b can now be greater than n, we can’t just do the calculations in increasing order. An improvement late in the list might improve some value earlier in the list. In practice, you can solve this by looping over the whole list repeatedly until values stop changing. Since the situation I described is pretty rare, two passes turn out to be enough.
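The re-looping is a simple fixed-point iteration. Here is a toy Python illustration of just that pattern (the update rule here is a stand-in, not the real add/subtract step): a pass that only propagates an improvement one position per sweep, repeated until a full sweep changes nothing.

```python
def run_to_fixed_point(costs, pass_once):
    """Re-run pass_once over the table until a full pass makes no change."""
    passes = 1
    while pass_once(costs):
        passes += 1
    return passes

def backward_reach(costs):
    # Toy stand-in for the subtraction update: each entry may be improved
    # from the entry after it, mimicking n = (n + 1) - 1.
    changed = False
    for i in range(len(costs) - 1):
        if costs[i + 1] + 1 < costs[i]:
            costs[i] = costs[i + 1] + 1
            changed = True
    return changed

costs = [0, 9, 9, 9, 1]
run_to_fixed_point(costs, backward_reach)
# costs is now [0, 4, 3, 2, 1]
```

A single forward sweep would have left most of the table at 9, which is exactly why the real algorithm has to loop until stable.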

Conclusion

You can get my source code here

This algorithm was harder to implement than I thought it would be, but overall it was fun. There were a few things that I left out while writing. One of those is proving that conjecture from earlier. Another one is register allocation. On the Z80 (and the Game Boy), you really only have 5 or so general purpose registers to work with. The add instruction always outputs to the a register, and the first operand is always the a register. You have to do some moves in addition to the adds to make it all work, and this could change what the optimal pattern is. I’ll leave that as an exercise for the reader. What I will do is give you an example of what the assembly for the hardest multiplication would look like:

; Multiply a with 233
ld b,a      ; b = 1*a
add a,a
add a,a
add a,a     ; a = 8*a
ld c,a      ; c = 8*a
add a,a
add a,a     ; a = 32*a
ld d,a      ; d = 32*a
add a,a     ; a = 64*a
ld e,a      ; e = 64*a
add a,a     ; a = 128*a
add a,e     ; a = 192*a
add a,d     ; a = 224*a
add a,c     ; a = 232*a
add a,b     ; a = 233*a
ret

Table

A table of how many additions are needed for each multiplication.

  n   c(n)
  1    0
  2    1
  3    2
  4    2
  5    3
  6    3
  7    4
  8    3
  9    4
 10    4
 11    5
 12    4
 13    5
 14    5
 15    5
 16    4
 17    5
 18    5
 19    6
 20    5
 21    6
 22    6
 23    6
 24    5
 25    6
 26    6
 27    6
 28    6
 29    7
 30    6
 31    6
 32    5
 33    6
 34    6
 35    7
 36    6
 37    7
 38    7
 39    7
 40    6
 41    7
 42    7
 43    7
 44    7
 45    7
 46    7
 47    7
 48    6
 49    7
 50    7
 51    7
 52    7
 53    8
 54    7
 55    8
 56    7
 57    8
 58    8
 59    8
 60    7
 61    8
 62    7
 63    7
 64    6
 65    7
 66    7
 67    8
 68    7
 69    8
 70    8
 71    8
 72    7
 73    8
 74    8
 75    8
 76    8
 77    9
 78    8
 79    8
 80    7
 81    8
 82    8
 83    8
 84    8
 85    8
 86    8
 87    9
 88    8
 89    9
 90    8
 91    9
 92    8
 93    8
 94    8
 95    8
 96    7
 97    8
 98    8
 99    8
100    8
101    9
102    8
103    9
104    8
105    9
106    9
107    9
108    8
109    9
110    9
111    9
112    8
113    9
114    9
115    9
116    9
117    9
118    9
119    9
120    8
121    9
122    9
123    9
124    8
125    9
126    8
127    8
128    7
129    8
130    8
131    9
132    8
133    9
134    9
135    9
136    8
137    9
138    9
139   10
140    9
141    9
142    9
143    9
144    8
145    9
146    9
147    9
148    9
149   10
150    9
151   10
152    9
153    9
154   10
155    9
156    9
157   10
158    9
159    9
160    8
161    9
162    9
163    9
164    9
165    9
166    9
167   10
168    9
169   10
170    9
171   10
172    9
173   10
174   10
175   10
176    9
177   10
178   10
179   10
180    9
181   10
182   10
183   10
184    9
185   10
186    9
187   10
188    9
189    9
190    9
191    9
192    8
193    9
194    9
195    9
196    9
197   10
198    9
199   10
200    9
201   10
202   10
203   10
204    9
205   10
206   10
207   10
208    9
209   10
210   10
211   10
212   10
213   10
214   10
215   10
216    9
217   10
218   10
219   10
220   10
221   10
222   10
223   10
224    9
225   10
226   10
227   10
228   10
229   10
230   10
231   10
232   10
233   11
234   10
235   10
236   10
237   10
238   10
239   10
240    9
241   10
242   10
243   10
244   10
245   10
246   10
247   10
248    9
249   10
250   10
251   10
252    9
253   10
254    9
255   10