Video on the PIC32 uMMB, an impossible task?

Video playback on a QVGA mobile display controlled by an 80MHz microcontroller with 32KB of RAM seems an impossible task. Yet it is "possible", and the solution, although it comes with a number of limitations, is relatively simple.

First, to understand the enormity of the challenge, let's take a look at the math.
A 320x240 (QVGA) display contains 76,800 pixels.
Each pixel is represented by 16 bits (2 bytes). This means that updating the content of the screen (1 frame) requires 153,600 bytes of data.
Luckily, the Mikroelektronika PIC32 uMMB board uses the 16-bit PMP port of the PIC32 to transfer an entire pixel (16 bits) at a time to the mobile display. Still, ~150KB of data must be fetched from somewhere.
If we consider a typical (US) video frame rate of 30 frames per second (fps), we must multiply those numbers by the same factor and we obtain 4.6MBytes/s (3.84MBytes/s at the typical European frame rate of 25fps).
There are several ways we can look at this number to try and make sense of it, but I think the most intuitive is to compare it to the system clock speed of the PIC32. At 80MHz, even assuming the best possible conditions (100% cache efficiency, one instruction per clock cycle), we have time to execute at best ~17 instructions per byte of data (80MHz / 4.6MBytes/s). It is immediately apparent that this is going to be a very tight fit, if possible at all.

Painting 30 fps

The first question should actually be whether the display can accept the data at such a high speed (the answer is yes, luckily), and only then whether the PIC32 PMP port is capable of such a high transfer rate. In fact it is possible to spend less than 30 system clock cycles to transfer a pixel worth of information (16 bits) to the display, even when we write our code in C!
To achieve this, you will have to avoid all unnecessary overhead. If we base the code on the standard Microchip Application Library (MAL) Graphics module (Primitives layer), there is simply no way we can use the PutPixel() function as is.
If you have ever inspected how the display driver (HX8347-D.c) eventually implements the PutPixel() function call:
    if(_clipRgn){
        if(x<_clipLeft)
            return;
        if(x>_clipRight)
            return;
        if(y<_clipTop)
            return;
        if(y>_clipBottom)
            return;
    }

    CS_LAT_BIT = 0;
    SetAddress(x,y);
    WriteData(_color);   
    CS_LAT_BIT = 1;

You know this would produce a very inefficient sequence.
In particular the critical issues are the following:
  1. The PutPixel() function assumes fully random access to the graphic display, but we need to paint the screen sequentially. Since the display controller automatically increments the address after each pixel update, there is no need to set the address over and over.
  2. The display is selected (CS_LAT_BIT=0) before and deselected (CS_LAT_BIT=1) after each pixel update, but when the entire frame is updated we can simply select it once at the beginning of the frame and deselect it only at the very end.
  3. There is no reason to check the clipping region. In fact we can program the display controller to open the entire display as a "window" for continuous streaming, so that the address pointer wraps around automatically at the end of each row and we never need to reset it when starting a new row.
With the considerations above we can now write an efficient frame update loop as follows:
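        // paint a frame
        CS_LAT_BIT = 0;             // select the display once per frame
        SetAddress( 0, 0 );         // set the address pointer once per frame
        RS_LAT_BIT = 1;             // data from here on...
        // for each line
        for( j=0; j < V_MAX; j++)
        {
            // paint each pixel of the line
            i = H_MAX;
            while( i--)
            {
                PMDIN1 = *pV++;     // send one 16-bit pixel through the PMP
                PMPWaitBusy();      // wait until the PMP transfer completes
            }
        }  // for each line
        // deselect the display
        CS_LAT_BIT = 1;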

This segment of code works for any given pair (H_MAX, V_MAX), transferring pixel data from a buffer (pointed to by pV), and can achieve the required 30fps rate. The question now is if, and how, we are going to be able to get the data into the buffer in the first place.

To compress or not to compress

Video data lends itself to compression since most of the information present in each frame is likely to be repeated in the following ones, or at least it will be similar and likely displaced by a finite amount.

Without compression, video files can be huge and most importantly the bandwidth required to transfer them in/out of media can quickly become prohibitive.

In our QVGA 16-bit example, 1 minute of video would require about 276MB of storage space: a large amount but, considering the current prices of FLASH memory, still acceptable (a 4GB card holds about 14 minutes of video and may cost less than USD 4).

But it is when we look at the bandwidth requirement, about 37Mbit/s, that we realize we have hit a wall. This is much larger than the 12Mbit/s (theoretical) of a Full Speed USB connection, so USB memory sticks are not an option. The other option for the uMMB board is to use an SD/MMC card, and here again we hit a hard wall: 25Mbit/s is the theoretical limit of the (SPI mode) interface available.


So compress we must in one way or another!


JPEG/MPEG

The Graphics Library offers support for a few image compression formats, including GIF and JPEG. The latter uses techniques very similar to those of video compression algorithms such as MJPEG and MPEG-1/2. Unfortunately, when we test the typical decompression speed of the library (even after removing the display update) we get approximately 500ms/frame, or 2fps.

Clearly it makes no sense to attempt a more complex (although more bandwidth-efficient) algorithm such as H.264 (aka MPEG-4 AVC).

Without dedicated hardware support, the PIC32 cannot handle any serious video decompression task on a full QVGA display.


YUV

There are two more options available to us at this point:
  1. Chroma (only) compression
  2. Decimation
The first method refers to the use of (stream/file) formats that separate the luminance (B&W) information from the chroma (color) information. The retina of our eyes contains many more "rods" (B&W sensors) than "cones" (color sensors), and even the most basic video systems (NTSC TV) have taken advantage of this characteristic to "trick" the human eye, considerably reducing the system bandwidth while keeping the perceived resolution the same. Several video formats, often referred to as YUV (e.g. YUV 4:2:2), represent a bridge between true video compression and raw data.
  • The Y (luminance) signal is typically sent with full resolution/bandwidth.
  • The U and V (chroma) components are sent only once per group of pixels (2 or 4 at a time).
Reconstructing each video frame is then reduced to simple arithmetic operations (and a few lookup tables) that obtain the RGB components once the Y, U and V elements of each pixel are separated.
Unfortunately, these formats are often defined as "planar": each of the three components (Y, U, V) is sent for the entire frame sequentially, first all the Ys, then all the Us, then all the Vs. This means that the decoder needs to buffer an entire frame's worth of data before it can "decompress" it. For a QVGA display (YUV 4:2:2) this means 76.8KB of Y data followed by 38.4KB of U data, assuming the V data is decoded on the fly. Even on the PIC32MX5/6/7 series, with 128KB of RAM, this (~115KB) is more than we can afford once we take into account file system buffers, stack, heap and audio buffering. On a PIC32 uMMB board, with a PIC32MX460 (32KB of RAM), the technique is completely out of reach.

So we have to fall back on a more extreme type of "compression": decimation. If in YUV 4:2:0 we supply U and V information only once for each 2x2 block of pixels, we can think of a decimated QVGA video as a case where the luminance information is decimated as well. Another way to look at it is that we will effectively transmit only a raw QQVGA image, scaled up (expanded) to fill the entire QVGA screen.

The loss of resolution will be noticeable, especially on static images, while highly animated and colorful scenes will suffer the least.

Audio Video Interleaved

If a raw QQVGA is our final choice for the video stream, we can now recompute our bandwidth needs.
  • Each frame will now be only 160x120 pixels, that is (only) 38,400 bytes.
  • At a rate of 30fps, that is 1,152,000 bytes per second, equivalent to approximately 140 system cycles of the PIC32 per pixel.
  • At a rate of 25fps, that is 960,000 bytes per second, equivalent to approximately 167 system cycles of the PIC32 per pixel.
Not only do we know we will have enough time to "paint" those pixels (scaled up in 2x2 blocks), but we also have a pretty good chance of being able to read them from actual mass storage media and possibly add an audio stream too!
In fact the mass storage media bandwidth has now become our greatest concern.
At roughly 7.7 to 9.2Mbit/s, 25fps and 30fps are still out of reach for a Full Speed USB mass storage interface (12Mbit/s theoretical, considerably less in practice), but an SD card will be up to the task even when we take into account all the overhead of a file system (we will use the Microchip MDD File System Library) and that of a standard file format such as AVI.
The AVI file format is quite an obvious choice at this point as the preferred "container".
If you have "digested" the WAV format and understood the key concept at its root (nested chunks), you will be comfortable opening and decoding an AVI file in no time.
An AVI file allows you to define two (or more) "streams" of data; think of them as alternating packets of data that provide the video and audio "content" of each frame, interleaved (the "I" in AVI).
    // Open the file and find the first "movi" chunk
    // (CHUNK is assumed to hold three 32-bit fields: ckID, ckSize, ckType)

    // 2. open AVI file
    if ( ( fp = FSfopen( fname, "r")) == NULL)
        return -3;  // cannot find the input file

    // 3. decode RIFF format chunks
    r = FSfread( (void*)&ck, sizeof(CHUNK), 1, fp);
    if ( r == 0)    return -100;    // unexpected eof

    // 4. check that type is correct
    if (( ck.ckID != RIFF_DWORD) || ( ck.ckType != AVI_DWORD))
        return -4;  // not a RIFF-AVI file

    // 5. look for a list chunk
    r = FSfread( (void*)&ck, sizeof(CHUNK), 1, fp);
    if ( r == 0)    return -100;    // unexpected eof

    if ( ck.ckID != LIST_DWORD)
        return -5;  // incorrect format, list chunk not found

    // 6. skip until you reach the "movi" chunk
    while ((( ck.ckID == LIST_DWORD) && (ck.ckType != MOVI_DWORD)) ||
            ( ck.ckID == JUNK_DWORD))
    {
        // skip this chunk: 4 bytes of its payload were already read
        // into ckType, and RIFF chunks are padded to an even size
        eof = FSfseek( fp, ((ck.ckSize + 1) & ~1) - 4, SEEK_CUR);
        if ( eof)   return -100;    // seek failed (unexpected eof)

        // fetch next chunk header
        r = FSfread( (void *)&ck, sizeof(CHUNK), 1, fp);
        if ( r == 0)    return -100;    // unexpected eof
    }

FFMPEG

Testing the actual application performance requires a bit of patience and some experimenting with various resolutions and frame rates.
FFMPEG is an open source command-line tool, supported on all major PC/Mac operating systems, that allows you to convert and re-format AVI files.
Here is the typical set of switches I used for a 15fps video with 22kHz 8-bit mono audio (the -t 10 switch limits the output to the first 10 seconds):

ffmpeg -i 'input.avi' -vcodec rawvideo -pix_fmt rgb565 -s qqvga -ar 22050 -ac 1 -acodec pcm_u8 -r 15 -t 10 output.avi

Audio Buffering Optimization

After very little experimenting with audio and video playback, you will soon realize that the most difficult thing is to keep the audio flowing smoothly. The tiniest interruption will be immediately noticed, while you will be hard pressed to notice a missed (skipped) video frame. So audio buffering has to be our top priority: we will devote as much RAM as is available to a queue of audio buffers, and we will be ready to skip a video frame whenever we find that our audio buffers are about to be starved.


MDD File System Optimization

Once you have all the pieces together you will notice that the maximum frame rate is still short of 10fps with audio and 20fps (or less) for pure video.
With simple debugging techniques (a few I/O lines and an oscilloscope/logic analyzer) you will soon be able to determine that the true bottleneck is still the mass storage media access. The MDD File System is actually very flexible and complete, but definitely not optimized for maximum reading speed. While its core was derived (many years ago) from the fileio.c and sdmmc.c files I published in my book, it has since been modified extensively to support sub-directories, FAT32, and to offer the maximum compatibility with different types of SD cards.
Luckily, we need to focus only on the FSfread() function. Comparing it (fsio.c) with my original book's version (fileio.c), you will notice that its main (inner) loop has been rewritten to transfer only one byte at a time from the "sector" buffer.
Since our video application reads an entire audio buffer or an entire video row (320 bytes) at a time, this heavily affects the overall performance of the system.
The snippet of code below shows the few lines of FSfread() that need to be modified so that as many bytes as possible are transferred in a single memcpy() call.
        // 1.2.4.1 see how many bytes we can extract right away
        left = dsk->sectorSize - pos;
        if ( left > 0)
        {
            // grab immediately as many as you can
            if ( len < left) left = len;  
            memcpy( pointer, dsk->buffer+pos, left);
            pointer += left;
            pos += left;
            len -= left;
            seek += left;
            readCount += left;
        }    
A second optimization comes from using 32-bit SPI transfers. This requires a simple modification to the MDD_Sector_Read() routine, as in the segment below:
            // 32-bit SPI mode for increased performance
            // enter 32-bit mode
            while ( SPISTATbits.SPIBUSY);
            u = SPICON1 & 0xFFFF;
            SPICON1 = 0;
            SPICON1 = u | 0x800;        // 32-bit mode
    
            for(index = 0; index < gMediaSectorSize; index+=4)      //Reads in a sector of data (512 bytes)
            {
                SPI2BUF = (unsigned) 0xFFFFFFFF;   
                while( !SPISTATbits.SPIRBF);            // wait transfer complete
                u = SPI2BUF;                // read the received value
                buffer[index]   = (char) (u>>24);
                buffer[index+1] = (char) (u>>16);
                buffer[index+2] = (char) (u>>8);
                buffer[index+3] = (char) (u);
            } // for 

            // back to 8-bit mode
            while ( SPISTATbits.SPIBUSY);
            u = SPICON1 & 0xFFFF;
            SPICON1 = 0;
            SPICON1 = u & (~0x800);

User Interface

The user interface provided in this example project is very basic. A minimalist menu system is used to select the desired file name from a list obtained from reading the root directory of the uSD card.
To ensure the application is compatible with the uMMB as well as the larger MMB MX4 and MX7 boards, the menu is controlled exclusively by the following touch commands:
  • Touching the upper third of the screen will move the "cursor" up the list
  • Touching the lower third of the screen will move the "cursor" down the list
  • Touching the middle / right side of the screen will select a file to play.
  • Playback can be interrupted at any time by touching the screen.

Downloads

Here is the zip file containing the project source files you will need to assemble an AVIPlayer for the uMMB board (including a simplified touch sensing interface):

AVIPlayer.zip

This application uses only the primitive level of the Microchip Graphics Library, so make sure to have it installed (MAL 2.0 or 2.01 was used in this project) next to the project and MMB lib in the recommended directory structure:
/AVIPlayer
/Microchip
/MMB

The MMB lib directory contains all the MikroE Multimedia Board support files, either adapted from the original Microchip Explorer16 examples or added to provide access to MikroE's unique features. For a more detailed description see the MMB page on this site.

Here is a ready-to-run .hex file that you can program directly into the uMMB board using the PICkit3 or any other compatible PIC32 programmer:

AVIPlayer.hex

Showtime.avi