To find the VRAM destination for these routines, we need to compute
the row pointer into map RAM:
```
HL = _SCRN0 + 32 * (SCY/8 - 2)
```
The obvious approach of shifting `SCY/8 - 2` left 5 times (multiplying
by 32) in an 8-bit register won't work, since the result can overflow
8 bits. The first implementation avoided this by adding `SCY/8 - 2` to
`_SCRN0` 32 times in a loop.
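
The overflow and the loop workaround can be checked with a small model. This is only a Python sketch of the arithmetic, not the routine itself: it assumes `_SCRN0` is `0x9800` and that the row index `SCY/8 - 2` lies in 0..31, and the function names are hypothetical.

```python
SCRN0 = 0x9800  # assumed value of _SCRN0 (start of the first tile map)

def shift_left_8bit(row, count=5):
    """Shift an 8-bit value left, discarding bits that leave the register."""
    value = row & 0xFF
    for _ in range(count):
        value = (value << 1) & 0xFF
    return value

def row_pointer_by_loop(row):
    """Model of the first implementation: add the row index 32 times."""
    hl = SCRN0
    for _ in range(32):
        hl = (hl + row) & 0xFFFF  # HL is a 16-bit register pair
    return hl
```

Under these assumptions, rows 0 to 7 survive the 8-bit shift, but from row 8 onward the top bits are lost (`8 * 32 = 256` no longer fits in a byte), while the 16-bit loop still gives the right address.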
The current implementation loads `SCY/8 - 2` into a 16-bit register
pair, expands a 16-bit left-shift macro 5 times, and adds the result
to HL.
The loop-based approach took 802 * 4 cycles. The shift-based version
takes 25 * 4 cycles, roughly 32 times fewer.
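
The shift-based computation can be modelled the same way. The sketch below is in Python rather than assembly and is not the actual macro: it assumes `_SCRN0` is `0x9800`, a row index in 0..31, and a 16-bit shift built from a low-byte shift whose carry feeds the high byte (on this CPU that is commonly done with `sla` followed by `rl`, but the macro used here may differ). All names are hypothetical.

```python
SCRN0 = 0x9800  # assumed value of _SCRN0 (start of the first tile map)

def shift_left_16bit(hi, lo):
    """One left shift of a register pair: the low byte's top bit
    is carried into the bottom bit of the high byte."""
    carry = (lo >> 7) & 1
    lo = (lo << 1) & 0xFF
    hi = ((hi << 1) | carry) & 0xFF
    return hi, lo

def row_pointer_by_shift(row):
    """Multiply the row index by 32 with five 16-bit shifts, then add the base."""
    hi, lo = 0, row & 0xFF
    for _ in range(5):  # the macro is expanded 5 times, one per shift
        hi, lo = shift_left_16bit(hi, lo)
    return (SCRN0 + ((hi << 8) | lo)) & 0xFFFF
```

For every row in 0..31 this agrees with `_SCRN0 + 32 * row`, which is what the 32-iteration loop computed at far greater cost.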