Kernel program for driving ws2811 LEDs on a Raspberry Pi

August 09, 2023

In this article I describe a kernel program that can be loaded on a Raspberry Pi to control ws2811 addressable LEDs. Unlike other programs for driving these LEDs on a Raspberry Pi, this one can drive multiple strings in parallel. A long string of LEDs can therefore be broken up into shorter sections that are updated at the same time, which makes each update faster. The kernel program uses the Raspberry Pi's secondary memory interface (SMI) and direct memory access (DMA) hardware to drive the LEDs.

User-space programs can interact with the kernel program through the /dev/ws281x file. User-space programs must first use ioctl calls to configure the size and number of LED strings connected to the Raspberry Pi. Then, they can write the LED pixel data directly to the /dev/ws281x file. Finally, the user can either manually update the LEDs with an ioctl call, or configure the kernel program to automatically update after a write.

The source code for this project can be found on GitHub.

`ws2811` LED protocol

Each string of LEDs requires only one output on the Raspberry Pi. Normally, the Pi keeps the output low, which prepares the LEDs to receive data while they keep displaying the previous values. To write data, the Pi sends out a stream of pulses, where a zero is given by a short pulse (400ns on, 800ns off) and a one is given by a long pulse (800ns on, 400ns off). Data is sent out in 24-bit blocks, with one block for each LED. The first 8 bits of each block set the intensity of the green component, the next 8 bits set the red component, and the last 8 bits set the blue component. Each byte is transmitted with the most significant bit first.
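To make the bit order concrete, here is a minimal Python sketch of the encoding described above. The function names encode_pixel and pulse_timing_ns are made up for this illustration and are not taken from the kernel program.

```python
def encode_pixel(red, green, blue):
    """Return the 24 data bits for one LED: green, red, blue, each MSB first.

    Hypothetical helper for illustration only.
    """
    bits = []
    for component in (green, red, blue):
        if not 0 <= component <= 255:
            raise ValueError("colour components must be in 0..255")
        # Walk from bit 7 (most significant) down to bit 0.
        for shift in range(7, -1, -1):
            bits.append((component >> shift) & 1)
    return bits


def pulse_timing_ns(bit):
    """Return (high time, low time) in nanoseconds for a single data bit."""
    return (800, 400) if bit else (400, 800)


# A pure red pixel: the green byte is all zeros, the red byte all ones.
print(encode_pixel(255, 0, 0))
```

For a pure red pixel, the first 8 bits (green) are zero, the next 8 (red) are one, and the last 8 (blue) are zero.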

SMI

The secondary memory interface (SMI) can be used to write data to the Pi's output pins with precise timing. The SMI hardware is configured to update the output every 400 nanoseconds. This means that every 400ns, the SMI hardware takes a value from its first-in-first-out (FIFO) buffer and puts it on the output pins. Since each bit lasts 1200ns, there are 3 SMI writes per bit. The first write sets all outputs to 1, the second outputs the data, and the third write sets all outputs to 0. A one therefore keeps its output high for two writes (800ns), giving a long pulse, while a zero keeps it high for only one write (400ns), giving a short pulse. This can be seen in the above image, where each vertical line represents an SMI write.
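The timing above also shows why splitting a long string into parallel sections speeds up an update. The following Python sketch works through the arithmetic. The constant and function names are made up for this illustration, and the figures cover only the data pulses, not the low period between updates.

```python
SMI_SLOT_NS = 400    # one SMI write every 400 ns
SLOTS_PER_BIT = 3    # all high, data, all low
BITS_PER_LED = 24    # 8 bits each for green, red and blue


def smi_words_per_string(num_leds):
    """Number of SMI writes needed to send the data for one string."""
    return num_leds * BITS_PER_LED * SLOTS_PER_BIT


def data_time_us(num_leds):
    """Time in microseconds spent sending the data for one string."""
    return smi_words_per_string(num_leds) * SMI_SLOT_NS / 1000


# One string of 300 LEDs versus four parallel strings of 75 LEDs each.
print(data_time_us(300))  # 8640.0
print(data_time_us(75))   # 2160.0
```

Because all outputs are written in the same SMI word, four parallel strings of 75 LEDs take as long as a single string of 75 LEDs, which is a quarter of the time needed for one string of 300.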

In the above image, channel 1 receives 1100011 and channel 2 receives 1011010. To output the data to the SMI hardware, the program must first replace every 1 with 110 (long pulse) and every 0 with 100 (short pulse). Thus, channel 1 becomes 110110100100100110110 and channel 2 becomes 110100110110100110100. Then, the program must combine the two bitstreams into a sequence of 16-bit words, one word per SMI write, in which each channel occupies one bit position. In this example channel 1 is on bit 0 and channel 2 is on bit 2 of each word, so the sequence is (in hexadecimal):
0005 0005 0000 0005 0001 0000 0005 0004 0000 0005 0004 0000 0005 0000 0000 0005 0005 0000 0005 0001 0000
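
The expansion and interleaving can be sketched in a few lines of Python. This is an illustration only, not the code used in the kernel program: the function name expand_to_smi_words is made up, and the mapping of channel 1 to bit 0 and channel 2 to bit 2 is inferred from the hexadecimal sequence above.

```python
def expand_to_smi_words(channels):
    """Build the SMI word sequence for several parallel channels.

    channels maps an output bit position to a string of '0'/'1' data bits.
    Hypothetical helper for illustration only.
    """
    lengths = {len(bits) for bits in channels.values()}
    if len(lengths) != 1:
        raise ValueError("all channels must carry the same number of bits")

    # Word with every used output set to 1 (the first write of each bit).
    all_high = 0
    for pin in channels:
        all_high |= 1 << pin

    words = []
    for i in range(lengths.pop()):
        # Second write: each output carries its own data bit.
        data = 0
        for pin, bits in channels.items():
            if bits[i] == "1":
                data |= 1 << pin
        # Three writes per bit: all high, data, all low.
        words.extend([all_high, data, 0])
    return words


# Assumed mapping: channel 1 on bit 0, channel 2 on bit 2.
words = expand_to_smi_words({0: "1100011", 2: "1011010"})
print(" ".join(f"{w:04X}" for w in words))
```

Run with the two example bitstreams, this prints the same 21 words as listed above.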

DMA

In order to maintain the required precise timing even when the CPU is doing something else, the program must use the DMA hardware to automatically load the next values into the SMI buffer. The DMA hardware can load information from one part of the Pi's memory to another automatically and without CPU intervention. To prevent a buffer underflow or overflow, the DMA hardware is configured to load the next value when the number of bytes in the buffer goes below a given threshold.

Potential issues when using the program

The main issues or challenges I encountered when creating this program were:

The documentation for the SMI hardware is hard to find as it is not included in the datasheet of the BCM2835 chip. This made it difficult to debug issues with the program.
The SMI hardware may also be used by other programs or features of the Pi. This will interfere with the output of the program and cause the LEDs to not update or display the wrong colors. For example, in order to make the program work, I had to disable the video output of the Pi with tvservice --off.
Since the DMA has no access to the CPU caches, the program must acquire uncached memory where the DMA control information and the data to be transferred can be stored. However, the process my program uses for acquiring uncached memory only works on older versions of Linux. The program seems to work on Linux version 4.4. I have not found a way to acquire uncached memory on newer versions of Linux.
The Raspberry Pi has 16 DMA channels, which may be used by other programs. One must select an unused DMA channel, otherwise the program may interfere with other programs on the Pi and vice versa.

Matthias' Blog