Implementations of the Hough Transform on the Embedded Multicore Processors
Embedded multicore processors such as FPGAs and GPUs have lately attracted considerable attention for their computational capability and low power consumption. Recent FPGAs have hundreds of embedded DSP slices and block RAMs. For example, Xilinx Virtex-6 family FPGAs have DSP48E1 slices, which are configurable blocks equipped with fast multipliers, adders, pipeline registers, and so on. They also have 18-Kbit dual-port memories as block RAMs. Meanwhile, recent GPUs can be used for general-purpose computation. Users can develop parallel programs that run on GPUs using CUDA, a programming architecture provided by NVIDIA. The main contribution of this paper is to present two implementations of the Hough transform, one on the FPGA and one on the GPU. The first idea behind the implementations is the efficient use of DSP slices and block RAMs on the FPGA, and of the shared memory on the GPU. The second idea is to partition the voting space of the Hough transform so that the voting operation is performed in parallel. The implementation results show that the Hough transform for a 512×512 image with 33232 edge points can be done in 135.75 μs on the FPGA and 637.88 μs on the GPU. In contrast, a conventional CPU implementation runs in 37.10 ms. Thus, both implementations achieve a substantial speed-up over the CPU, by factors of roughly 273 and 58, respectively.
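The idea of partitioning the voting space can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `hough_votes`, its parameters, and the choice to partition along the angle axis are assumptions made for illustration, and the sequential loop over partitions merely stands in for hardware units or thread blocks that would run concurrently. The sketch uses the standard line parameterization rho = x cos(theta) + y sin(theta).

```python
import math

def hough_votes(edge_points, width, height, n_theta=180, n_parts=4):
    """Accumulate Hough votes, one theta partition at a time (hypothetical helper)."""
    # rho lies in [-diag, diag]; shift by diag to obtain a non-negative index.
    diag = int(math.ceil(math.hypot(width, height)))
    n_rho = 2 * diag + 1
    acc = [[0] * n_rho for _ in range(n_theta)]

    # Assumption: the voting space is split into disjoint bands of theta rows.
    part = (n_theta + n_parts - 1) // n_parts
    for p in range(n_parts):
        # Each band touches only its own accumulator rows, so the bands
        # could be processed in parallel without write conflicts.
        for t in range(p * part, min((p + 1) * part, n_theta)):
            theta = math.pi * t / n_theta
            c, s = math.cos(theta), math.sin(theta)
            for x, y in edge_points:
                rho = int(round(x * c + y * s))
                acc[t][rho + diag] += 1
    return acc, diag
```

Because every edge point casts exactly one vote per angle, each accumulator row sums to the number of edge points, and the result does not depend on how many partitions are used.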