Fusion 2048 running on an Arduino Mega with bitbanged video output

In this article, I will describe a program that allows an Arduino Mega to output video with 3-bit color to a VGA monitor at a resolution of 150 by 100 pixels with a refresh rate of around 54Hz. No external video driver chip is used, so the Arduino must generate the vertical and horizontal timing signals at the correct frequencies as well as output the video data. To demonstrate the Arduino's video output, I programmed the game Fusion, which is a variant of the game 2048 that uses the first 20 elements instead of the first 11 powers of two. I chose this variation because the two-letter element names work better with the low output resolution than the four-digit numbers.

The game consists of a 5 by 5 grid, where each cell in the grid can contain a tile corresponding to an element or no tile. The game is controlled by four buttons, each corresponding to a direction. When a button is pressed, all of the tiles slide in that direction. Adjacent tiles in the direction of the move of the same element are merged into the next element. Each merge increases the score, with merges of higher elements increasing the score by a larger amount. Additionally, a hydrogen tile is inserted in a random empty cell following every move. The original version can be found here

VGA timing

While most monitors support a number of formats, for this project, I chose the 640 by 480 pixel format. However, since the Arduino is neither fast enough nor has enough memory to use the full resolution, the Arduino actually outputs only 150 pixels across, which are interpreted by the monitor as 640 pixels. Theoretically, the Arduino could output all 480 lines, but this would produce pixels that are much wider than they are tall. Thus, each line is repeated four times, giving a total of 120 possible lines. However, the Arduino does not have enough memory for all 120 lines, so only the first 100 are used. The horizontal and vertical sync signals output by the Arduino are compliant with VGA standard for a resolution of 640 by 480, except for the refresh rate, which was reduced to 54Hz from 60Hz.
640x480 standard, 60Hz refresh rateArduino output, 54Hz refresh rate
PixelsTimeClock cycles (16MHz)Time
Sync pulse 96 3.813 μs 64 4.000 μs
Back porch (blanking interval after sync) 48 1.907 μs 36 2.250 μs
Visible area (video output enabled) 640 25.42 μs 450 28.13 μs
Front porch (blanking interval before sync) 16 0.635 μs 10 0.625 μs
Total 800 31.78 μs 560 35.00 μs

While it would be theoretically possible for the Arduino to output horizontal and vertical sync pulses with the correct timing, I chose to reduce the refresh rate to increase the resolution. Since 3 Arduino clock cycles are needed per pixel, if only 25.42 μs per line were available for the video data, only 135 pixels could be generated. These timings seem to work with the monitor I used, despite the non-standard refresh rate.

The horizontal sync pulse is generated by the Arduino's TIMER0 module, with pin 4 (OC0B) configured as a PWM output. The period of the PWM signal is 560 clock cycles, and the length of the pulse is 64 clock cycles. The COMPA (comparator A) interrupt triggers at the beginning of every cycle (and thus, at the beginning of the sync pulse) and prepares and outputs the video data for that line. The details of this are explained later.

Vertical timing

640x480 standard, 60Hz refresh rate Arduino output, 54Hz refresh rate
Lines Time Clock cycles (16MHz) Time
Sync pulse 2 63.56 μs 1120 70.00 μs
Back porch (blanking interval after sync) 33 1.049 ms 18480 1.155 ms
Visible area (video output enabled) 480 15.25 ms 268800 16.80 ms
Front porch (blanking interval before sync) 10 317.8 μs 5600 350.0 μs
Total 525 16.68 ms 294000 18.38 ms

The vertical sync pulse is generated by the Arduino's TIMER1 module, with pin 11 (OC1A) configured as a PWM output. The period of the PWM signal is 294000 clock cycles, and the length of the pulse is 11250 clock cycles. The COMPB (comparator B) interrupt triggers 19320 clock cycles after the beginning of the sync pulse, or 18200 clock cycles after the end of the sync pulse, and is responsible for enabling the video transmission. Since this is less than the 18480 clock cycles given above, the interrupt triggers midway through the last line in the back porch interval, ensuring that the next line is enabled. The COMPC (comparator C) interrupt triggers 243040 clock cycles after the beginning of the sync pulse, or 223440 clock cycles into the video frame, and is responsible for disabling the video transmission. This is much shorter than the actual time (268800 cycles) because only the first 400 of the 480 lines are actually used.

Video data storage

The Arduino outputs a picture with 3-bit color at a resolution of 150 by 100 pixels, giving a total of 15000 pixels. Since two pixels can be packed into one byte, all of the pixel information can be stored in 7500 bytes, which just barely fits into the memory of the Arduino Mega. Bits 0-2 give the color for the left pixel, and bits 4-6 give the color for the right pixel.

Video transmission

As explained above, the interrupt for comparator A of the TIMER0 module (TIMER0_COMPA_vect in the code) is responsible for outputting the video data. The code for the interrupt, including comments, is reproduced below.

push r0 ; Store registers we will modify to the stack in r0, 0x3f ; This is required when working with interrupts push r0 push r24 push r25 push r30 push r31 lds r24, flags ; Load the variable "flags" into r24 ; The least significant bit of flags indicates ; whether video transmission is enabled. During the ; front porch interval, sync pulse, and back porch ; interval, this bit is zero, disabling transmission sbrs r24, 0 ; If the least significant bit of r24 is cleared ... rjmp done ; ... jump to the end of the function. lds r30, idx ; Load the index of the first pixel in the current lds r31, idx+1 ; row ; Each line must be repeated four times in order to ; produce square-ish pixels. row_num keeps track of ; how many repetitions have been transmitted. lds r24, row_num ; Load the row index into r24. inc r24 ; Increment the row index sbrc r24, 2 ; If the row index is 4... adiw r30, 40 ; ... add 40 ... sbrc r24, 2 adiw r30, 35 ; ... and then add 35 to the Z pointer ; for a total of 75. Thus, Z now points to the ; next row sbrs r24, 2 ; Otherwise ... adiw r30, 0 ; ... add 0 ... sbrs r24, 2 adiw r30, 0 ; ... and then add 0 again ; Either way, we add something to Z. This ensures ; that we use the same number of cycles every time andi r24, 3 ; Truncate the top 6 bits of the row counter ; so it is never greater than 3 sts row_num, r24 ; Store the final row counter back into row_num sts idx, r30 ; Store the Z pointer back into the row index sts idx+1, r31 ldi r24, lo8(pixels) ; Z is only an index; it does not refer to an actual ldi r25, hi8(pixels) ; location in the RAM. We must add to the index the add r30, r24 ; first address of the pixel array to convert the adc r31, r25 ; index to an address in r24, 0x26 ; Since there can be some uncertainty in how the in r25, 0x26 ; TIMER0_COMPA interrupt fires, we must check the sbrc r24, 0 ; TCNT0 register to determine the state of the rjmp no_del ; internal counter. If the counter is not where we nop ; expect it to be, we introduce a delay to nop ; compensate. no_del: nop ... nop ; A total of 27 nops for a delay of 27 clock cycles ; This aligns the beginning of the video data with ; the end of the back porch interval ld r24, Z+ ; Load the byte pointed to by the Z pointer and ; increment the Z pointer to point to the next byte ; in the line. This takes two clock cycles out 2, r24 ; Output the byte to PORTA. Since only the lower ; three outputs of PORTA are enabled, the first pixel ; is output to the monitor. This takes one clock ; cycle. nop ; Do nothing for one cycle swap r24 ; Swap the upper and lower 4 bits of the current ; byte. This swaps the left and right pixels. ; Together with the nop, this takes two clock cycles. out 2, r24 ; Output the right pixel. This takes one cycle. Note ; that for both pixels, the time between pixels is 3 ; clock cycles. ... ld r24, Z+ ; This block is repeated a total of 75 times. Each out 2, r24 ; block produces two pixels, for a total of 150 nop ; pixels per line swap r24 out 2, r24 nop ; Finally, output 0 to PORTA. The video data is ldi r24, 0 ; blanked in preparation for the front porch interval out 2, r24 ; and the sync pulse. done: ; Restore the values we saved earlier pop r30 pop r31 pop r25 pop r24 pop r0 out 0x3f, r0 pop r0 reti ; Return from the interrupt

Game logic

Storing the grid

The 5 by 5 grid is stored as an array of 25 integers, grid, with 0 representing an empty cell, 1 representing hydrogen, 2 representing helium, and so on. Since motions of the tiles are animated, we also need a second array, new_grid, to store the new positions of the tiles immediately after a move, and another array, delta, to store the distance each tile must travel.

Moving left

Since moving left is a horizontal move, each row is independent of the other rows, so the program can iterate over each row and determine the next state separately. This is done by iterating over the tiles in that row, and keeping track of the next available empty spot, and the value of the last tile it placed. If the current tile cannot be merged into the previous tile, then the tile is put in the next available spot, and the index of the next available spot is incremented. A similar method is used when moving in other directions.

Animation

In order to produce the animations, the program keeps track of the state before a move as well as the distance each tile in the old state moved. Each frame, an offset counter is incremented, and the moved tiles are displaced by the value of this counter. Once a tile has reached its target position, it stops moving, ensuring that tiles only move their prescribed distance. After all tiles have reached their target positions, the game board is switched to the post-move state, so that merged tiles are shown with the correct values.

Randomly adding new tiles

After a move, the program selects a random open position in the grid and inserts a hydrogen tile in that location. However, generating random numbers on an Arduino can be difficult. The program uses a floating analog input as a source of randomness. However, this floating input can have any value and may not change much from one measurement to the next. Thus, the measured value is fed into an 8-bit cyclic redundancy check (CRC) algorithm. This ensures that even if two consecutive measurements are close together, the corresponding "random" values are not close together. The CRC is cumulative, meaning it is not reset before every analog measurement.

Comments

Popular posts from this blog

Improving and calibrating the capacitive water sensor

Lightsaber prop - first prototype

Turn a buck converter module into a buck-boost converter with only two components