MRTP: Mobile Ray Tracing Processor With Reconfigurable Stream Multi-Processors for High Datapath Utilization
This paper presents a mobile ray tracing processor (MRTP) with reconfigurable stream multi-processors (RSMPs) for high datapath utilization. The MRTP includes three RSMPs that operate in multiple instruction multiple data (MIMD) mode asynchronously to exploit instruction-level parallelism. Each RSMP...
Uloženo v:
| Vydáno v: | IEEE journal of solid-state circuits Ročník 47; číslo 2; s. 518 - 535 |
|---|---|
| Hlavní autoři: | , , |
| Médium: | Journal Article |
| Jazyk: | angličtina |
| Vydáno: |
New York, NY
IEEE
01.02.2012
Institute of Electrical and Electronics Engineers |
| Témata: | |
| ISSN: | 0018-9200, 1558-173X |
| On-line přístup: | Získat plný text |
| Tagy: |
Přidat tag
Žádné tagy, Buďte první, kdo vytvoří štítek k tomuto záznamu!
|
| Shrnutí: | This paper presents a mobile ray tracing processor (MRTP) with reconfigurable stream multi-processors (RSMPs) for high datapath utilization. The MRTP includes three RSMPs that operate in multiple instruction multiple data (MIMD) mode asynchronously to exploit instruction-level parallelism. Each RSMP is based on single instruction multiple thread (SIMT) architecture to exploit thread-level parallelism. An RSMP consists of twelve scalar processing elements (SPEs) that run multiple threads in parallel synchronously: twelve scalar threads or four vector threads depending on an operating mode. A low datapath utilization caused by a branch divergence in SIMT architecture is improved by 19.9% on average by reconfiguring twelve SPEs between scalar SIMT and vector SIMT with 0.1% area overheads. Special function instructions occupy only 2% ~ 8% of kernel instructions so that a partial special function unit (PSFU) is implemented instead of a large dedicated SFU. The access conflicts with a look-up table (LUT) caused by concurrent accesses of twelve SPEs are reduced by a table loader (TBLD). The TBLD monitors concurrent requests from twelve SPEs and reduces an access count to LUT by distributing a coefficient to multiple SPEs with only one read-access to LUT. MRTP with area of 4 × 4 mm 2 has been fabricated in 0.13 μm CMOS technology. MRTP achieves a peak performance of 673 K rays per second while consuming 156 mW at 100 MHz with V DD = 1.2 V . |
|---|---|
| ISSN: | 0018-9200 1558-173X |
| DOI: | 10.1109/JSSC.2011.2171417 |