# **QR Decomposition based FPGA using Householder Transformation**

**This Work By** 

Safaa S. Omran<sup>1</sup> and Ahmed K. Abdul-abbas<sup>2</sup>



QR decomposition is widely used in many fields as <u>data processing</u>, <u>image processing</u>, <u>communication systems</u>, multiple input multiple output (<u>MIMO</u>), <u>radar systems</u>, <u>linear algebra</u> and so on. QRD has been computed by using the Householder transformation, givens rotation and Gram Schmidt, these algorithms are mostly used and basic ways for computing a QRD.



### Outline:

- > Introduction:
- Householder Transformation
- Proposed Design
- Data presentation
- Result and Design Comparison
- > Conclusion



#### **Introduction:**

QR decomposition plays axial role in solving linear least square problems, computing solution of linear systems of equation and computing eigenvalues. Such problems appear in navigation to wireless communication systems. For a matrix A of size  $m \times n$ , QR decomposition is given by.

## A = QR

where Q is  $[m \times m]$  orthogonal matrix and R is  $[m \times n]$  upper triangle matrix. There are many methods to perform QR decomposition named Gram-Schmidt (GS), Givens Rotation (GR), Householder Transform



#### **Householder Transformation**

Householder reflections (transformations) is another method to decompose A matrix into Q and R matrixes

The steps for computation a QRD based on HT is as following:

Assume a matrix 
$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}$$
  
**a1**



#### **Householder Transformation**

$$V1 = a1 - sign(a_{11}) \left( \sqrt{\sum_{j=1}^{3} a_{j1}^{2}} \right) e1 \dots (1)$$
$$H1 = I - 2 \frac{V1 V1^{T}}{V1^{T} V1} \dots (2)$$

Where H is householder matrix.

$$A^{1} = A^{0} H1 = \begin{bmatrix} x & x & x \\ 0 & x & x \\ 0 & x & x \end{bmatrix} \dots (3)$$

A



#### **Householder Transformation**

$$R = \begin{bmatrix} x & x & x \\ 0 & x & x \\ 0 & 0 & x \end{bmatrix}$$

$$Q = [H2H1]^{-1} = H1H2$$
 .... (4)



#### **Proposed Design**

The main objective of this paper is a new design of QRD processor based on HT algorithm to achieve high speed of execution and low latency. This design based on new architecture to perform its purpose. Verilog HDL (Hardware Description Language) has been used to implement and test this work.



#### **Proposed Design**

The new design of this householder processor consists of eight main units and three secondary units. The main units are:





Figure 1: Block diagram of HHP.





Figure 2: Norms unit.





Figure 3: VTV unit.



Figure. 4: VVT unit.





Figure 2: flow chart of the HT process



#### **Data presentation**

Many applications rely upon fixed-point arithmetic and its inherentspeed advantage over floating-point. The fixed point method is used to represent the numbers system in this architecture for householder transformation processor. Figure-6 shows the fixed point format, where it is consisting of 32 bits, 20 bits (from 0 to 19) for fraction part and 11 bits (20 to 30) for integer part and last bit (31) for sign number.

| Sign  | Integer value |    | decimal fraction |   |
|-------|---------------|----|------------------|---|
| 1 bit | 11 bits       |    | 20 bits          |   |
| 31    | 30            | 20 | 19               | 0 |

Figure 6: Fixed point format.



#### **Result and Design Comparison**

#### Table 1: comparison with other work.

| Acco. \ work    | [3] | [12] | [13] | our |
|-----------------|-----|------|------|-----|
| Word length     | 16  | 16   | 16   | 32  |
| Latency (cycle) | 40  | 139  | 67   | 65  |
| pipeline        | yes | yes  | yes  | no  |



#### Conclusion

In this paper 32-bits HH processor is designed to perform QR decomposition based on Householder transformation algorithm by using Verilog HDL language. This design is implemented by using FPGA kit type of virtex – 7. **65** clock cycles are needed to perform QR process without pipelining. In future work to enhance this design with pipelining structure, it will be able to perform QR process every **17** clock cycles.



### Thank you for your listening



