Super Pixie (Unreleased)

The Smartest Arduino Display Ever


Read The Alpha Firmware (C/C++)


Super Pixie uses 106 RGB LEDs to display a full alphanumeric character set using vector graphics, allowing for advanced animation controlled over UART.

Table of Contents

  1. Massive Prototype
  2. Fast Line Rasterizer
  3. UART Chain
  4. Much, Much Smaller Prototype
  5. Fancy Transitions On Board
  6. Very Satisfying Numbers
  7. How About A Hundred More Characters?
  8. Chain Optimization
  9. Self Healing

Massive Prototype

Made from cut sections of 60 LED/meter strips and interlocking 3D printed parts, the original prototype stood about a foot tall, and featured 8 independent lines of 16 addressable LEDs. Having only 16 LEDs per line and driving each in parallel allows for frame rates well above 1,000 FPS:

$ \text{Time per LED} = \frac{24 \text{ bits}}{800,000 \text{ bits per second}} = 0.00003 \text{ seconds} = 30 \text{ µs} $

$ \text{Time for 16 LEDs} = 30\text{ µs} \times 16 + 50\text{ µs (latch)} = 530\text{ µs} $

$ \text{Maximum refresh rate} = \frac{1\text{ second}}{530\text{ µs}} = 1,886\text{ Hz} $

Of course, that doesn’t quite account for overhead in the actual C++ rendering process before data is transmitted to the LEDs, but I still achieved >600 FPS in practice.


Fast Line Rasterizer

Using a slightly modified version of Xiaolin Wu's line algorithm which can draw lines less than 1px in length or width, I convert vector font data in memory to an 8x16 raster image with anti-aliasing and subpixel positioning. Applying a position, scale, and rotation to a vector image is much simpler work than rotating the equivalent raster image, so very early on I had everything necessary for fancy animations and transitions.
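To give a feel for the anti-aliasing and subpixel positioning, here is a simplified Wu-style sketch. It is not the firmware's modified algorithm: it skips Wu's endpoint weighting and does nothing special for lines shorter than 1px. `Raster` and `draw_line_aa` are names made up for this example.

```cpp
#include <algorithm>
#include <cmath>
#include <utility>

constexpr int WIDTH = 8;    // raster size taken from the text
constexpr int HEIGHT = 16;

// Hypothetical coverage buffer: one brightness value (0..1) per pixel.
struct Raster {
    float px[HEIGHT][WIDTH] = {};

    void plot(int x, int y, float coverage) {
        if (x < 0 || x >= WIDTH || y < 0 || y >= HEIGHT) return;  // clip
        px[y][x] = std::min(1.0f, px[y][x] + coverage);
    }
};

// Walk the major axis one pixel at a time and split each step's coverage
// between the two nearest pixels on the minor axis.
void draw_line_aa(Raster& r, float x0, float y0, float x1, float y1) {
    const bool steep = std::fabs(y1 - y0) > std::fabs(x1 - x0);
    if (steep) { std::swap(x0, y0); std::swap(x1, y1); }    // make x the major axis
    if (x0 > x1) { std::swap(x0, x1); std::swap(y0, y1); }  // draw left to right

    const float dx = x1 - x0;
    const float gradient = (dx == 0.0f) ? 0.0f : (y1 - y0) / dx;

    const int x_start = static_cast<int>(std::lround(x0));
    const int x_end = static_cast<int>(std::lround(x1));
    float y = y0 + gradient * (static_cast<float>(x_start) - x0);

    for (int x = x_start; x <= x_end; ++x, y += gradient) {
        const int yi = static_cast<int>(std::floor(y));
        const float frac = y - static_cast<float>(yi);  // share owed to the farther pixel
        if (steep) {
            r.plot(yi, x, 1.0f - frac);
            r.plot(yi + 1, x, frac);
        } else {
            r.plot(x, yi, 1.0f - frac);
            r.plot(x, yi + 1, frac);
        }
    }
}
```

A horizontal line drawn at y = 2.5 lights rows 2 and 3 at half brightness each, which is the subpixel positioning effect.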

Now that I’d proved the ultra fast refresh rates and vector font method were possible, I wanted to design the communication method used to control each display.


UART Chain

A what? It’s a strange method, but Super Pixie chains UART ports together instead of using a more common bus like SPI or I2C. It’s still very performant, and has some distinct advantages of its own:

SPI

  • Fast AF
  • Uses three GPIO
  • Addressed with one extra GPIO per device (ehh)

I2C

  • Relatively slower, still plenty fast
  • Not self-addressing
  • Impossible to automatically discover physical device order

UART CHAIN

  • Whatever maximum common baud rate is possible for all devices in the chain (fast enough for my application)
  • Self-addressing
  • Uses only two GPIO for a chain of any valid length
  • Works out of box on any microcontroller with a UART or SoftSerial

Super Pixies inherently exist in a physical position relative to one another that must be known when showing data so that the chain correctly reads “hello world” and not “doll howler”. The simplest answer would be to just shift ASCII data down the line until latching it, but Super Pixies are a little more involved. They support custom vectors, different transitions, any color combo you want, a backlight, etc.

To handle this complexity, Super Pixies instead send tiny packets back and forth, which contain descriptors about their purpose and content. Instead of shifting “A” directly to a display, you’d send a packet telling the Super Pixie at that position to begin a transition to the vector of “A” already stored in its flash. It’s a few extra bytes, but the overhead is worth the flexibility.
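The text does not give the real packet layout, so the sketch below invents one: a destination address, a command byte, a length byte, then the payload. Everything here (`Packet`, `Command`, the field order, the command values) is hypothetical and only shows how a few descriptor bytes can wrap a character.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical command codes; the real firmware's values are not given in the text.
enum class Command : uint8_t { AssignAddress = 1, ShowChar = 2 };

struct Packet {
    uint8_t dest;                  // chain address of the target unit
    Command command;               // what the target should do
    std::vector<uint8_t> payload;  // command-specific data (assumed <= 255 bytes)
};

// Serialize as: [dest][command][payload length][payload bytes...]
std::vector<uint8_t> encode(const Packet& p) {
    std::vector<uint8_t> out;
    out.push_back(p.dest);
    out.push_back(static_cast<uint8_t>(p.command));
    out.push_back(static_cast<uint8_t>(p.payload.size()));
    out.insert(out.end(), p.payload.begin(), p.payload.end());
    return out;
}

// Parse one complete packet; returns false if the byte count is inconsistent.
bool decode(const std::vector<uint8_t>& bytes, Packet& out) {
    if (bytes.size() < 3) return false;
    const std::size_t length = bytes[2];
    if (bytes.size() != 3 + length) return false;
    out.dest = bytes[0];
    out.command = static_cast<Command>(bytes[1]);
    out.payload.assign(bytes.begin() + 3, bytes.end());
    return true;
}
```

With this made-up layout, telling unit 3 to show “A” costs four bytes on the wire instead of one.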

The MAIN unit of the chain (the user’s microcontroller that’s commanding Super Pixies) has a UART TX connected to the UART RX of the first device in the chain. Depending on the state of the chain device, it will either ingest or forward any bytes sent to it.

This unique control over the propagation of data allows for chain length discovery, automatic address assignment based on physical position within the chain, and pseudo-bidirectional communication between any two points with minimal latency. For a chain unit to respond back to MAIN, it has to send a packet addressed to MAIN to its child link on the right. The packet eventually cycles back to MAIN two hops later.

USER CONTROLLER   ------------- SUPER PIXIES -------------

+--------+        +--------+     +--------+     +--------+
| MAIN   |        | PIX 1  |     | PIX 2  |     | PIX 3  |
|        |        |        |     |        |     |        |
| RX  TX + -----> + RX  TX + --> + RX  TX + --> + RX  TX |
+-+------+        +--------+     +--------+     +-----+--+
  ^                                                   |
  |                  <--- RETURN LINE                 |
  +---------------------------------------------------+

To make them self-addressing, the following sequence runs at boot:

  1. MAIN sends an ASSIGN ADDRESS 1 packet @ BROADCAST
  2. PIX 1 receives this, assigns itself Address 1 (instead of default ADDRESS NULL)
  3. PIX 1 enables propagation, so that future bytes received are passed
  4. PIX 2 heard nothing yet
  5. MAIN sends an ASSIGN ADDRESS 2 packet @ BROADCAST
  6. PIX 1 ignores this (already assigned) and passes it on, so PIX 2 takes the address instead
  7. PIX 2 enables propagation
  8. This repeats until MAIN tries to assign a fourth address
  9. There are only three devices, so this packet will fully loop back to MAIN
  10. MAIN can’t have its address assigned (Permanent address of 0)
  11. MAIN now knows there are three devices in the chain based on final address assignment which failed.
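
The boot sequence above can be simulated in a few lines. This is a host-side model rather than firmware: `ChainUnit`, `discover_chain`, and the `ADDRESS_NULL` value of 255 are assumptions for illustration, and the "wire" is just a loop over a vector.

```cpp
#include <cstdint>
#include <vector>

constexpr uint8_t ADDRESS_NULL = 255;  // hypothetical "unassigned" sentinel

struct ChainUnit {
    uint8_t address = ADDRESS_NULL;
    bool propagate = false;  // when true, received bytes are passed to the next link

    // Handle a broadcast ASSIGN ADDRESS packet.
    // Returns true if the packet is forwarded to the next link.
    bool on_assign(uint8_t new_address) {
        if (address == ADDRESS_NULL) {
            address = new_address;  // first unassigned unit claims the address
            propagate = true;       // later packets pass through
            return false;           // this packet is ingested, not forwarded
        }
        return propagate;           // already assigned: ignore and pass along
    }
};

// MAIN's side: keep assigning addresses until a packet loops all the way back.
// Returns the number of units discovered.
int discover_chain(std::vector<ChainUnit>& chain) {
    for (uint8_t next = 1; next < ADDRESS_NULL; ++next) {
        bool returned_to_main = true;
        for (ChainUnit& unit : chain) {
            if (!unit.on_assign(next)) {
                returned_to_main = false;  // a unit ingested the packet
                break;
            }
        }
        if (returned_to_main) return next - 1;  // the failed assignment marks the end
    }
    return ADDRESS_NULL - 1;  // address space exhausted
}
```

For a three-unit chain, the fourth assignment passes through every unit and returns to MAIN, so `discover_chain` reports 3.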

Strobe Warning

Here’s a look at a UART Chain discovery process shown on a high speed camera, with debugging LEDs enabled. This normally happens within 150ms, and without any visual readout.

Once the chain is established, the MAIN controller can command any single unit by sending data to its physical address. Bytes propagate through every device in the chain with only a single byte of delay per unit. This delay is tiny, and imperceptible under normal circumstances. Below was my very first UART chain, where an ESP8266 is commanding three ESP32s to blink their LEDs in sequence.
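For a rough sense of scale, the per-unit delay can be estimated from the baud rate. The sketch assumes standard 8N1 framing (one start bit, eight data bits, one stop bit, so 10 bit times per byte). The baud rate is a free parameter, since the text does not state the one Super Pixie uses, and `chain_delay_us` is a made-up helper name.

```cpp
// Estimated added latency for a byte to pass through `units` chain devices,
// each holding it for one byte time. Assumes 8N1 framing: 10 bits per byte.
constexpr double chain_delay_us(int units, double baud) {
    return units * 10.0 * 1e6 / baud;
}
```

At an assumed 1,000,000 baud, three units would add about 30 µs end to end.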


Much, Much Smaller Prototype

While designing the UART chain method, I repurposed my own Pixie Chroma boards as debug displays, showing dots of various colors to indicate things like “is propagation enabled” or displaying their assigned addresses.

Originally, Super Pixie was meant to have an 8x16 display, but I had a weird realization: two Pixie Chroma PCBs make up a strangely spaced but TINY 7x15 matrix with some leftover LEDs below for debugging. Since I wrote the whole system around vectors instead of rasters, I just updated a few variables and suddenly my Pixie Chromas functioned as tiny Super Pixie displays, just with one less row and column.

I’d lost the high refresh rates since it was now 140 pixels on a single GPIO instead of 16 pixels on 8 (>8x slower) but that still resulted in 235 FPS. Now I was able to prototype both the rasterization code and the UART chain code in a single device.
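
That figure matches the theoretical ceiling from the same timing math used for the big prototype, assuming the same 800,000 bits per second data rate and 50 µs latch:

$ \text{Time for 140 LEDs} = 30\text{ µs} \times 140 + 50\text{ µs (latch)} = 4,250\text{ µs} $

$ \text{Maximum refresh rate} = \frac{1\text{ second}}{4,250\text{ µs}} \approx 235\text{ Hz} $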


Fancy Transitions On Board

Remember the scaling and rotation? Now I had them working along with self-addressing, so that you can send a single packet down the line with a number to show, and each Super Pixie will decide whether or not to flip to a new character onscreen. This transition is handled fully internally, meaning that after sending a single packet your microcontroller is free to do anything else while a transition is still occurring. Super Pixies handle the hundreds-of-FPS rendering for you.
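
Scaling and rotating a vector character boils down to transforming each endpoint before it is rasterized. The sketch below shows one way to do that and to blend between two poses as a transition progresses. `Vec2`, `Transform`, `apply`, and `blend` are hypothetical names, and the linear blend is an assumption rather than the firmware's actual easing.

```cpp
#include <cmath>

struct Vec2 {
    float x;
    float y;
};

struct Transform {
    Vec2 position{0.0f, 0.0f};  // offset in pixels
    float scale = 1.0f;
    float rotation = 0.0f;      // radians, counter-clockwise
};

// Scale, then rotate about the origin, then translate a single vector point.
Vec2 apply(const Transform& t, Vec2 p) {
    const float c = std::cos(t.rotation);
    const float s = std::sin(t.rotation);
    const float sx = p.x * t.scale;
    const float sy = p.y * t.scale;
    return Vec2{sx * c - sy * s + t.position.x,
                sx * s + sy * c + t.position.y};
}

// Linear blend between two poses; progress runs from 0 (a) to 1 (b).
Transform blend(const Transform& a, const Transform& b, float progress) {
    Transform out;
    out.position.x = a.position.x + (b.position.x - a.position.x) * progress;
    out.position.y = a.position.y + (b.position.y - a.position.y) * progress;
    out.scale = a.scale + (b.scale - a.scale) * progress;
    out.rotation = a.rotation + (b.rotation - a.rotation) * progress;
    return out;
}
```

Each frame of a transition would call `blend` with the current progress, run every endpoint through `apply`, and hand the results to the line rasterizer.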


Very Satisfying Numbers

I always love watching odometers turn past 9, or digital clocks change to 10:00. Smooth transitions make sure that anyone else as weird as myself gets a good show:


How About A Hundred More Characters?

A hundred more characters? What is this, The Simpsons? I added the full printable ASCII charset.


Chain Optimization

I was able to identify and reduce errors that caused chain de-sync or jitter. Now they even perform perfectly in slow motion:


Self Healing

If anything goes wrong with a given Super Pixie, the Watchdog Timer resets it. This causes it to lose its address, which can break comms. When this happens anywhere in the chain, the MAIN controller will no longer receive its own data back in time, causing it to quickly reset and reassign the chain units.
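
On the MAIN side, that check could be as small as a timestamp comparison. This is a sketch under assumptions: `LoopbackWatch` and its timeout are hypothetical, and the time values would come from something like a millisecond tick counter on the host microcontroller.

```cpp
#include <cstdint>

// Tracks when MAIN last saw its own data come back around the chain.
struct LoopbackWatch {
    uint32_t timeout_ms;       // hypothetical: how long MAIN waits for an echo
    uint32_t last_echo_ms = 0;

    explicit LoopbackWatch(uint32_t timeout) : timeout_ms(timeout) {}

    // Call whenever a packet sent by MAIN arrives back at MAIN's RX.
    void on_echo(uint32_t now_ms) { last_echo_ms = now_ms; }

    // True when the echo is overdue, meaning some unit has stopped forwarding.
    // Unsigned subtraction keeps this correct across tick counter wraparound.
    bool chain_broken(uint32_t now_ms) const {
        return (now_ms - last_echo_ms) > timeout_ms;
    }
};
```

When `chain_broken` returns true, MAIN would rerun the boot-time address assignment to rebuild the chain.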