The difference between PIC and FPGA

written by:
Bojan Jovanovic
Faculty of Electronic Engineering
University of Nis
A. Medvedeva 14
18000 Nis
Serbia
e-mail: bojan@elfak.ni.ac.rs

 

It is a fact that we are living in a thoroughly technological age. If we go back a few decades, we find “computing machines” that weighed tons, were very bulky, consumed large amounts of power, cost thousands of dollars and were really slow. For example, let’s take a look at the performance of one of the first general-purpose electronic computers, ENIAC, unveiled in 1946. It occupied an area of about 60 m², consumed an unbelievable 174 kW of power (turning it on reportedly caused power outages in the surrounding neighborhood) and weighed over 25 tons, all for a modest operating speed of 0.05 MIPS (Million Instructions Per Second). Since then, we have witnessed an amazing pace of technological improvement. Today’s processors are built into electronic devices that fit in your pocket. Their speed of operation reaches 159000 MIPS (Intel Core i7), their power consumption is measured in mW (or even μW), their area in mm², and their price in tens of dollars.

The question is: what is going to happen in the future? Will this pace of technological improvement continue? One answer to these questions can be found in the book “The Singularity Is Near” by the American futurist Raymond Kurzweil. There is also a documentary film with the same title. Briefly, Kurzweil states that the exponential trends of technological development, together with the expected advances in bio/nano-technology and artificial intelligence, can bring us some kind of self-aware machines and even the elimination of death (through the process of our own reengineering)! I don’t share his point of view myself, simply because the obvious rise of technology is accompanied by an equally obvious ethical decline of mankind, but I recommend that the interested reader at least watch the film.

Anyway, let’s stay in the present. Programmable logic devices are one of the technological benefits of our time. They are amazing indeed, offering a wide variety of advantages. They are often part of safety-critical or mission-critical embedded systems, where they help to cope with the problem of component obsolescence. This means that, if you have a programmable logic device on the PCB of your electronic system, you can update or upgrade it very easily. There is no need to enter the time- and money-consuming process of PCB redesign. All you need to do is reload the code of your programmable device to obtain an updated or even a brand new electronic system.

In the rest of this article I will focus on PIC (Peripheral Interface Controller) microcontrollers and Field Programmable Gate Arrays (FPGAs), two state-of-the-art technologies in the world of programmable devices. Although both are programmable, their working philosophies are quite different. So, where does the difference come from? To find the answer, let’s take a look under the hood of PICs and FPGAs and explore their architectures.

PIC organization

 

The PIC microcontroller contains the same main elements as any computer system (see Figure 1):

  • Processor
  • Memory
  • Input/Output

Unlike personal computers (PCs), where the above-mentioned elements are provided as separate chips, a PIC microcontroller has all of them on one chip. As you can see from Figure 1, they are connected to each other by three different buses: the Address, Data and Control buses. Depending on the PIC type, the width of these buses can vary from 8 up to 64 bits. Using them, the processor writes data to, or reads data from, the memory and I/O subsystems.

Figure 1. Basic PIC microcontroller organization

The processor (or Central Processing Unit, CPU) is in charge of all data manipulation and decision making. It contains the arithmetic logic unit (ALU) which, as its name suggests, performs arithmetic (addition, subtraction, multiplication) or logical (AND, OR, shifting or rotating left/right) operations on its inputs (see Figure 2).

Using the I/O subsystem, the PIC is able to communicate with the outside world.

The memory subsystem consists of two types of memory: Program Memory and Data Memory. The first stores the program code, i.e. the program instructions, whereas the second stores the data. Data Memory is of the RAM type and contains special registers, such as SFRs (Special Function Registers) and GPRs (General Purpose Registers), as well as the variables used during program execution.

Every PIC requires a precise timing reference, which is supplied by a clock oscillator and a crystal. In most older microcontrollers, the clock oscillator is external to the PIC and requires an extra chip. In most recent microcontrollers, the clock oscillator is usually incorporated within the PIC. The quartz crystal, however, because of its bulk, remains external to the chip.

When running your program code, the PIC microcontroller executes your program instruction by instruction. Several stages occur during one instruction cycle:

  • Instruction Fetch – IF,
  • Instruction Decode – ID,
  • Instruction Execute and Data Memory Read/Write – IE,
  • Write Back – WB

During the IF stage, the instruction fetcher reads an instruction from the Program Memory. PIC microcontrollers have a small, finite instruction set (usually up to 35 different instructions).

The instruction decoder decodes the instruction and thereby determines what type of instruction is being executed. If necessary, it also fetches data (ALU operands) from the Data Memory.

In the IE stage, the processor performs the ALU operation. Finally, in the WB stage, the PIC writes the result back to the Data Memory. After that, the PIC is ready to process the next instruction from the Program Memory. As shown in Figure 3, each stage requires one clock cycle. Consequently, executing one instruction of your program usually takes four clock cycles.


Figure 2. CPU architecture

 

Figure 3. Four-stage PIC instruction cycle

 

Since the stages that make up the PIC instruction cycle are mutually independent, the total time needed for instruction processing can be compressed as shown in Figure 4. While the program is running, the PIC uses all of its resources in parallel. While it fetches instruction n, the PIC decodes instruction n-1, executes instruction n-2 and writes back the result of instruction n-3, as highlighted in Figure 4. This is known as pipelining. With a four-stage pipeline the first instruction still needs four clock cycles, but each following instruction completes just one cycle later, so four instructions take only 4 + 3 = 7 clock cycles instead of the 4*4 = 16 cycles of Figure 3!

Figure 4. Four-stage pipeline PIC instruction cycle

There are other speed-increasing techniques, such as branch prediction, instruction reordering and register renaming, but they are beyond the scope of this article.

I would like to conclude this PIC section by emphasizing that the above is by no means an exhaustive analysis of the PIC architecture. It is just a simplified description that gives you a basic idea of how a PIC behaves.

FPGA organization

 

If we now switch to FPGAs and peek under the package, we will discover the architecture shown in Figure 5. As its name suggests, an FPGA is a two-dimensional array of configurable logic blocks (LBs, CLBs) with electrically programmable interconnections between them. Thanks to these programmable interconnections, FPGAs can be reprogrammed for a different functionality in a matter of milliseconds. The array is surrounded by programmable input/output blocks that connect the chip to the outside world.

In recent years, new types of resources (memory blocks, embedded multipliers) have been added to FPGAs, making them capable of implementing many high-performance computing applications.

Each CLB consists of a few slices and is tied to a switch matrix so that it can be connected to other CLBs. If we dive deeper into a slice, we find a few look-up tables (LUTs). A LUT can generate any logic function of up to six input bits. LUTs can also be configured as shift registers or as distributed SelectRAM memory. Besides LUTs, the slices contain flip-flops, carry logic blocks and multiplexers. One specific CLB structure is depicted in Figure 6.

The programmable routing in an FPGA provides connections among logic blocks and I/O blocks to complete a user-designed circuit. It consists of wires and programmable switches (which can be SRAM or anti-fuse based) that form the desired connections.

Figure 5. Basic FPGA architecture

To accommodate a wide variety of circuits, the interconnect structure must be flexible enough to support widely varying local and distant routing demands while meeting the design goals for speed and power consumption. There are a number of routing architecture strategies (hierarchical, island-style etc.). Currently, most commercial SRAM-based FPGAs use the island-style architecture. Its detailed organization is depicted in Figure 7.

To summarize: the ingredients of an FPGA are configurable logic blocks (CLBs) connected to each other by a matrix of programmable interconnections. Inside the CLBs there are a few configurable slices containing LUTs, carry blocks, flip-flops, multiplexers etc. Although the entire FPGA structure might seem pretty complicated, the fortunate thing is that, while designing for an FPGA, you don’t need to care about its internal structure. You only need to describe your design in a language that the FPGA CAD tools understand (one of the hardware description languages, HDLs), and they will do all the necessary work for you, from design compilation through to placement and routing.

So, if you describe the hardware structure of your simple design with the following line of code:

 

 y <= a or (b and c);

 

after successful compilation, the design will first be partitioned into smaller blocks that fit into CLBs. These blocks will then be placed into CLBs and connected to each other (during the routing phase) through the interconnection matrix. One possible FPGA implementation of the function y is depicted in Figure 8.

The main feature of FPGA-based designs is that all the used resources (CLBs) work in parallel.

 

 

 

Figure 6. Inside the Altera adaptive CLB

Figure 7. Detailed routing architecture of an island-style FPGA

Figure 8. One possible FPGA implementation scenario

 

PIC & FPGA mutual comparisons

 

 

Bearing in mind everything said above about the internal structures and processing particularities of PICs and FPGAs, we can now pit them against each other. The best way to do this is to give them the same task to process. Let the task be to implement the function F of three integer parameters, F(A,B,C), defined as follows:

 

F=(A*B+C)+(A+B),

 

where A and B are 8 bits wide, whereas C is a 16-bit operand.

To implement the function F on a PIC microcontroller, you need to describe it in C or assembly language, whereas for the FPGA implementation one of the hardware description languages (VHDL or Verilog) must be used. The C and VHDL codes of the function F are listed in Table 1.

As you can see from Table 1, to implement the function F on a PIC microcontroller it must be broken down into four sequential instructions. Each instruction needs to be fetched from the program memory (IF stage), decoded (ID stage) and executed (IE stage), and its intermediate result written back to the data memory (WB stage). Using the four-stage pipelining technique, you will have to wait 7 clock cycles to obtain the value of F.

When implementing the function F in an FPGA device, you need to determine a possible hardware structure for it and describe it in HDL code. Figure 9 shows the block diagram of the function F. After compilation of the HDL code, the function F will be partitioned into smaller blocks, which will be placed inside the FPGA’s CLBs and interconnected. All the hardware (CLBs) works in parallel, so you only need 2 clock cycles to obtain the bits of the function F: one clock cycle to store the operands A, B and C into the registers, and another to capture the result F in the output register.

So, the result of the speed comparison is: 7 PIC clock cycles versus 2 FPGA clock cycles for the same simple function F! Since FPGAs can generally be clocked at higher frequencies than PICs, the speed advantage of FPGA devices over PICs is even more pronounced.

Figure 9. Hardware structure of the function F

 

Table 1. C & VHDL codes for the PIC & FPGA implementation of the function F

C code

int f_function(int A, int B, int C) {
    int F, X, Y, Z;

    X = A * B;   /* product of the two 8-bit operands */
    Y = C + X;
    Z = A + B;
    F = Z + Y;

    return F;
}

VHDL code

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity f_function is
  port ( clk  : in  STD_LOGIC;
         A, B : in  STD_LOGIC_VECTOR (7 downto 0);
         C    : in  STD_LOGIC_VECTOR (15 downto 0);
         F    : out STD_LOGIC_VECTOR (15 downto 0)
       );
end f_function;

architecture Behavioral of f_function is

  signal A_reg : STD_LOGIC_VECTOR (7 downto 0);
  signal B_reg : STD_LOGIC_VECTOR (7 downto 0);
  signal C_reg : STD_LOGIC_VECTOR (15 downto 0);
  signal X     : STD_LOGIC_VECTOR (15 downto 0);
  signal Y     : STD_LOGIC_VECTOR (15 downto 0);
  signal Z     : STD_LOGIC_VECTOR (8 downto 0);  -- 9 bits, so the carry of A + B is kept
  signal F_reg : STD_LOGIC_VECTOR (15 downto 0);
  signal F_pom : STD_LOGIC_VECTOR (15 downto 0);

begin

  -- Combinational datapath: all four operations exist side by side in hardware
  X     <= A_reg * B_reg;
  Y     <= C_reg + X;
  Z     <= ('0' & A_reg) + ('0' & B_reg);
  F_reg <= Z + Y;

  -- Input and output registers, updated on the rising clock edge
  reg: process (clk) is
  begin
    if clk'event and clk = '1' then
      A_reg <= A;
      B_reg <= B;
      C_reg <= C;
      F_pom <= F_reg;
    end if;
  end process reg;

  F <= F_pom;

end Behavioral;

The significant speed advantage of FPGA devices over their PIC counterparts becomes even more pronounced with more demanding functions. For loops are a typical example. Let’s consider the following case of two nested for loops:

 

for (i=0; i<M; i++){
    for (j=0; j<N; j++){
        /* loop body: I instructions here */
    }
}

A PIC microcontroller will need roughly M*N*I clock cycles to process the loops, where I is the number of instructions inside the loop body.

When the same nested loops are implemented in an FPGA device, the CAD implementation tools unroll them (provided the loop bounds are constants known at compile time), so that M*N replicas of the loop body are placed in the FPGA’s CLBs, each replica processing the I instructions with the appropriate inputs. Consequently, processing the loops requires only a few FPGA clock cycles. However, this significant gain in processing speed is paid for with a much larger implementation area and with increased power consumption.

So, what can we conclude after all this? Designing for an FPGA requires a Hardware Description Language (HDL), while PIC compilers “understand” C or assembly language. HDLs are absolutely nothing like C. Whereas a C program is a sequential series of instructions that are processed line by line, an HDL describes a concurrent circuit whose parts all operate in parallel. It is a very different world, and if you try to build a circuit in an FPGA while thinking like a software developer, it will hurt.

A PIC is time-limited: in order to accomplish more work, you need more processor cycles. An FPGA, on the other hand, is space-limited: in order to accomplish more work, you merely add more circuits. If your FPGA isn’t big enough, you can buy a bigger one. The matrix of interconnections is the bottleneck of FPGA devices, limiting their speed of operation and increasing their power consumption.

Finally, it is worth mentioning that, even today, there is a synergy between FPGA devices and PIC microcontrollers. Your FPGA device can contain a built-in (embedded) “hard core” processor (like the IBM PowerPC in Figure 5), which you can program in C or assembly and use in parallel with the rest of the FPGA’s CLBs. In this way you can benefit from both worlds. It is also possible to implement “soft core” processors (like Xilinx’s PicoBlaze or Altera’s Nios) described in HDL. But, like the output of C-to-VHDL compilers, these cores tend to be a little bloated and slow.

One fresh idea is emerging on the horizon: to put an array of processors (Purpose Processor Array) onto a single silicon chip, interconnected with each other much like an FPGA’s CLBs. You can read more about it in the November 2012 issue of the EE Times magazine.

http://www.electronics-eetimes.com/en/french-startup-takes-on-fpgas-with-multicore-dsp-chip.html?cmp_id=7&news_id=222914154