Combining system scenarios and configurable memories to tolerate unpredictability

CONCEPCIÓN SANZ, MANUEL PRIETO and JOSÉ IGNACIO GÓMEZ
Dpto. de Arquitectura de Computadores y Automática, Universidad Complutense de Madrid
and
ANTONIS PAPANIKOLAOU, MIGUEL MIRANDA and FRANCKY CATTHOOR
Inter-University Microelectronics Center

Process variability and the dynamism of new applications increase the uncertainty of embedded systems and force designers to use pessimistic assumptions, which have a tremendous impact on both the performance and the energy consumption of their memory organizations. In this paper, we introduce an experimental framework that mitigates the effects of both sources of unpredictability. At compile time, extensive profiling helps us to detect system scenarios and bound application dynamism. At the organization level, we incorporate a heterogeneous memory architecture composed of several configurable memories. A calibration process and a run-time control system adapt the platform to the current application needs. Our approach significantly reduces the energy overhead associated with both variability and application dynamism (up to 60% according to our simulations) without compromising the timing constraints existing in our target domain of dynamic periodic multimedia applications.

Categories and Subject Descriptors: C.3 [Computer Systems Organization]: Special-Purpose and Application-Based Systems—Real-time and embedded systems; B.3.1 [Memory Structures]: Semiconductor Memories—Static memory (SRAM)

General Terms: Design

Additional Key Words and Phrases: Process variation, parametric yield, variability compensation

Authors' addresses: C. Sanz, M. Prieto and J.I. Gómez, Universidad Complutense, 28040 Madrid, Spain. A. Papanikolaou, M. Miranda and F. Catthoor, Inter-University Microelectronics Center, Kapeldreef 75, Leuven, Belgium.

1. INTRODUCTION

Typical embedded applications in areas such as personal communication, multimedia, ambient intelligence or 3D graphics are becoming dynamic and multitasked. These tasks are usually very data intensive, which makes the memory system the main target of the optimization process. However, given the dynamic character of the new applications, static analysis of the code does not accurately reveal crucial information such as the number of accesses to each data structure. This lack of precision may become dramatic when trying to meet the timing and power constraints that are so common in mobile embedded systems.

Deep-submicron technologies further aggravate the problem: intra-die process variability has a serious impact on the parametric yield.
More precisely, SRAM memories are among the components of a system most sensitive to variability [Wang et al. 2005a]. Timing can be severely degraded by variability, and memory modules become less predictable in terms of energy and latency. The combined effect of application dynamism and process variability introduces an unaffordable uncertainty into the system design process.

In this work we explore an integrated approach to managing process variation in the domain of dynamic applications. We combine application pre-processing at the compilation level with hardware monitoring and calibration at run-time to adapt the platform to the effects of variability and dynamic application behavior. Our goal is to maximize timing parametric yield and energy efficiency at the system level.

To deal with application dynamism, we reuse the concept of system scenarios presented in Palkovic [2007]. Likely occurring control flows are identified and independently optimized. At run-time, the system must identify the current scenario and enforce the design-time decisions, thus decreasing run-time overhead while avoiding worst-case assumptions.

Our memory layer consists of configurable memories, like the one proposed in Wang et al. [2005b]. Each memory has (at least) two working modes. By default all the memories are set in the slowest but most energy-efficient mode. If, due to variability, a given module is too slow to meet the timing constraints, it may be switched to a faster mode. The aim of this work is to find, for each application scenario, the most energy-efficient mode for each memory such that deadlines are met.

The remainder of this article is organized as follows. Section 2 outlines the related work. Section 3 provides some background information and describes the proposed compensation methodology. Section 4 discusses some simulation results. Finally, Section 5 summarizes our work.

2. RELATED WORK

The impact of random intra-die process variability on system operation is an issue of relatively recent interest. Borkar et al. [2004] have concluded that a major shift from deterministic to probabilistic or statistical design is needed, and they advocate robust module design as a solution to the problem. At the module design level, Kim et al. [2003] propose corner-point analysis, based on oversizing the circuits and using worst-case timing to guarantee that process variability will not destroy their functionality and internal synchronization. This incurs a penalty in performance and energy consumption. Our proposal lies at the system level, similar to the work of Kurdahi et al. [2006]. They propose to co-design the algorithm and the memory simultaneously in order to minimize memory failures. However, this may not be feasible for today's multi-application platforms, where configurable memories [Wang et al. 2005b] represent a more flexible alternative.

Regarding application dynamism, several authors have already studied system adaptation using information about the actual behavior of the application. For instance, Azevedo et al. [2002] introduce intra-task application checkpoints at compile time that indicate places where the processor speed and voltage should be re-calculated. We have adopted the scenario-based approach proposed by Palkovic [2007], which we extend in this paper to handle platform variability.
[Fig. 1: Configurable memory with two different settings: a slow low-energy one and a fast high-energy one. Normalized energy per access is plotted against normalized access delay; under variability each nominal energy-delay point spreads into a cloud.]

3. METHODOLOGY AND COMPENSATION MECHANISM

This section outlines how scenarios and configurable memories can be combined to tackle the uncertainty coming from application dynamism and process variability, yielding solutions that are energy efficient while respecting the timing constraints.

3.1 Tackling application unpredictability

The dynamism of current applications, which stems from their dependency on input data, can no longer be handled properly by fully static compile-time approaches, which lead to worst-case designs with severe energy penalties. Using the scenario concept, the application is pre-processed and characterized at design time for different operating conditions through extensive profiling. Its behavior is partitioned, so that control flows with similar properties are clustered into the same scenarios/classes and characterized. At run time, according to the actual input and the scenario knowledge, the appropriate version of the application is selected and executed. Strategies to detect scenarios have been studied in Palkovic [2007]. This technique makes applications more predictable in the short-term future in terms of memory accesses and execution time, which enables the application to meet real-time constraints and reduces energy consumption, since worst-case margins are largely avoided.

3.2 Tackling platform unpredictability

The platform itself is another significant source of unpredictability. Process variability has a negative impact on the access time and energy consumption of individual memory modules, making the actual values always higher than the nominal specifications. This may translate into timing violations and severe degradations in the parametric system-level yield. Configurable memories, as proposed in Wang et al. [2005b], can switch at run-time between different energy-delay operating modes. Although they also suffer from variability, its effects can be mitigated at the system level by controlling those operating modes.

Fig. 1 illustrates one of these memories, which offers two nominal operating points, although both of them become a range due to the effect of process variability (i.e., each mode has an associated cloud of points, and each memory configuration is characterized by one point within it). Based on these memories we extend the work presented in Papanikolaou et al. [2005] to the context of dynamic applications. Note that the effect of thermal degradation or aging may, in the long term, shift these clouds. While our methodology could be adapted to tolerate these dynamic variations by performing additional calibration phases (see Section 3.3), as a proof of concept, in this work we focus on static variations only.
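To make the platform model concrete, the following minimal Python sketch captures a configurable memory with two nominal modes and the calibrated points discovered at setup time. The class and field names, as well as the numeric operating points, are illustrative assumptions of ours rather than part of the original design flow:

```python
from dataclasses import dataclass

@dataclass
class OperatingPoint:
    delay_ns: float   # access delay of this mode
    energy_pj: float  # energy per access of this mode

@dataclass
class ConfigurableMemory:
    """A memory module with switchable energy-delay modes (cf. Fig. 1).

    `nominal` holds the datasheet operating points; `actual` holds the
    points measured during setup-time calibration, which variability
    typically shifts towards higher delay and energy than nominal.
    """
    name: str
    nominal: dict               # mode name -> OperatingPoint
    actual: dict                # mode name -> OperatingPoint (calibrated)
    mode: str = "LP"            # default: slowest, most energy-efficient mode

    def access_delay(self) -> float:
        return self.actual[self.mode].delay_ns

    def energy_per_access(self) -> float:
        return self.actual[self.mode].energy_pj

# Illustrative instance: variability pushed both modes above nominal.
mem_a = ConfigurableMemory(
    name="A",
    nominal={"LP": OperatingPoint(2.0, 1.0), "HS": OperatingPoint(1.0, 1.6)},
    actual={"LP": OperatingPoint(2.6, 1.1), "HS": OperatingPoint(1.2, 1.8)},
)
```

The gap between `nominal` and `actual` is precisely what the setup-time calibration of Section 3.3 measures and what the control algorithm exploits.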
3.3 Integrated approach

The methodology proposed here uses a mixed design-time/run-time approach to combine and integrate the techniques explained above. At compile time the application is thoroughly profiled with two main objectives: to identify the most likely system scenarios of each task and to perform an energy-aware data assignment. For every scenario we collect the number of accesses to each memory, together with its associated time and energy values. Afterwards, every relevant data structure is assigned to a specific memory of the potentially heterogeneous memory architecture. All the compile-time analysis is performed assuming that every memory will work at the nominal point of its most energy-efficient mode.

At setup time each memory is calibrated to discover the real working point of each mode. Fig. 1 shows that the actual operating point can be quite far from the nominal one in both energy and delay terms.

The core of our methodology is to find the optimal combination of memory modes for every scenario such that the application deadline is met while the energy consumption is minimized. The reasoning behind our proposal is illustrated with a simple example in Fig. 2. Consider a frame-based application whose execution can be split into two independent tasks (denoted as Task A and Task B), and assume their respective data structures are mapped onto two different memory modules (denoted as Modules A and B, respectively). According to compile-time information, both memories could be set in the low-energy mode, since their delays allow them to meet the clock-period target. However, under variability effects, the access delay of each module in the low-power mode is larger and the deadline would be violated. To tolerate this, our control mechanism switches the mode of Module B. This adjustment creates a delay slack between the clock cycle of Task B and the access delay of Module B, which can be used to accelerate Task B by raising the clock frequency for that task. The time saved allows us to relax the clock frequency of Task A, enabling Module A to operate in the low-power configuration while still meeting the overall timing constraints.

[Fig. 2: Task B is executed faster to allow Task A to be executed slower. Energy-delay positions of Modules A and B in the low-power (LP) and high-speed (HS) configurations are shown against the clock periods CP_LP and CP_HS; switching Module B to HS compensates for Module A staying in LP.]

We have developed a branch-and-bound algorithm to support this control mechanism; it finds the energy-optimal operating mode for each memory for a given system scenario and application deadline (a sketch of one possible formulation is given below). Applying the algorithm with different time constraints, we obtain a Pareto curve per system scenario that trades off performance and energy. Every point of the curve stores the memory-mode configuration for every task of the application. It is important to remark that this calibration and curve-generation process, which may take several seconds, is seldom performed (maybe even just once, the first time the system is powered on).

Finally, at run time, the current system scenario is detected and the related Pareto curve is loaded. The system picks the most energy-efficient point that meets the current application deadline (which can be expressed as a frame rate and may vary with user interaction). The related decisions are then enforced, i.e., each memory is switched to the mode indicated by the chosen Pareto point. In frame-based applications this look-up is usually carried out only at the beginning of each frame, so the overhead introduced is negligible.
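The article does not list the branch-and-bound algorithm itself, so the following Python sketch is only one plausible formulation under our reading: memories are assigned modes one at a time, partial time and energy are accumulated from the scenario's access counts and the calibrated operating points, and a branch is pruned when even its most optimistic completion cannot meet the deadline or beat the best energy found so far. The serialized cost model (time and energy as plain sums of access count times delay or energy) is a simplifying assumption of ours:

```python
import math

def optimal_modes(mems, accesses, deadline_ns):
    """Energy-optimal mode per memory for one scenario and one deadline.

    mems     : list of ConfigurableMemory (see the sketch in Section 3.2)
    accesses : dict name -> number of accesses in this scenario (profiling)
    """
    best = {"energy": math.inf, "modes": None}

    def optimistic_remainder(i):
        # Fastest possible time and cheapest possible energy for mems[i:].
        t = e = 0.0
        for m in mems[i:]:
            n = accesses[m.name]
            t += n * min(p.delay_ns for p in m.actual.values())
            e += n * min(p.energy_pj for p in m.actual.values())
        return t, e

    def branch(i, t, e, chosen):
        t_rem, e_rem = optimistic_remainder(i)
        if t + t_rem > deadline_ns or e + e_rem >= best["energy"]:
            return                      # prune: this branch cannot win
        if i == len(mems):
            best["energy"], best["modes"] = e, dict(chosen)
            return
        m = mems[i]
        n = accesses[m.name]
        # Try cheaper modes first so good incumbents appear early.
        for mode, pt in sorted(m.actual.items(), key=lambda kv: kv[1].energy_pj):
            chosen[m.name] = mode
            branch(i + 1, t + n * pt.delay_ns, e + n * pt.energy_pj, chosen)
        del chosen[m.name]

    branch(0, 0.0, 0.0, {})
    return best["modes"], best["energy"]

def pareto_curve(mems, accesses, deadlines):
    """Sweep candidate deadlines to build the per-scenario Pareto curve."""
    curve = []
    for d in sorted(deadlines):
        modes, energy = optimal_modes(mems, accesses, d)
        if modes is not None and (not curve or energy < curve[-1][1]):
            curve.append((d, energy, modes))  # (deadline, energy, mode map)
    return curve
```

On the two-module example of Fig. 2, such a search would consider the four LP/HS combinations and, when both modules in LP miss the deadline, would settle on the cheapest feasible single switch (Module B in HS) rather than the more expensive double switch.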
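At run time only a table look-up remains. Below is a minimal sketch of the per-frame control step; `detect_scenario` stands for a scenario-detection routine generated with the techniques of Palkovic [2007], and all other names are our own illustrative assumptions:

```python
def on_frame_start(frame, scenario_curves, detect_scenario, memories):
    """Enforce the design-time decisions at the beginning of each frame."""
    scenario = detect_scenario(frame)    # classify the incoming frame
    curve = scenario_curves[scenario]    # (deadline, energy, modes) tuples
    deadline = frame.deadline_ns         # may change with user interaction

    # Energy decreases as the deadline grows, so the cheapest feasible
    # point is the one with the largest deadline still within budget.
    feasible = [p for p in curve if p[0] <= deadline]
    point = (max(feasible, key=lambda p: p[0]) if feasible
             else min(curve, key=lambda p: p[0]))  # fall back to fastest point
    _, _, modes = point

    for mem in memories:                 # switch each memory to its mode
        mem.mode = modes[mem.name]
```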
4. RESULTS

The application used to validate the proposed methodology is a multitasked implementation of the MP3 decoder. The extensive profiling and the memory mapping were carried out using the ATOMIUM tool suite [ATOMIUM]. The memory characterization was performed by means of Monte Carlo transistor-level simulations using a 65nm BSIM model.

[Fig. 3: Energy savings due to scenarios and multimode memories.]

Fig. 3 decomposes the energy savings obtained by our technique for different application deadlines. Each bar shows the average energy consumed in the memory system. The first bar (BASE, worst-case) is used as the reference and corresponds to a baseline implementation of MP3 without scenarios. The memory system consists of configurable memories, but the control algorithm always assumes the worst-case behavior for each memory mode (i.e., the corner points of the variability clouds). The next bar (BASE, compensated) represents the same MP3 implementation and mapping, but with our calibration and control algorithms applied. Up to 30% energy savings can be obtained just by detecting the real working point of each memory at setup time and exploiting this information with our control algorithm. The next three bars stand for a different MP3 implementation, in which scenarios are detected and application dynamism is largely removed. This transformation saves 40% of the energy. If we consider the combined effect of our integrated approach (see the SA, compensated bar), the energy consumption is reduced by up to 60% compared to our reference implementation. Finally, the last bar represents an ideal lower bound: variability impact is ignored and every memory is assumed to work at its nominal point. Our results are very close to this infeasible point.

5. CONCLUSIONS

A methodology has been outlined which can turn an unpredictable application running on unreliable hardware into a self-adaptive system that meets real-time constraints with minimal energy overhead. The presented approach removes the need for worst-case design margins at both the application and the platform level. Application dynamism can be handled using scenarios, and platform unpredictability can be managed using multi-mode memories and a control algorithm. Simulation results reveal that the combination of these techniques can reduce the energy consumption of the memory organization by up to 60% without compromising the application deadlines.

ACKNOWLEDGMENTS

This research has been supported by the Spanish government through the research contracts CICYT-TIN 2005/5619 and Ingenio 2010 Consolider CSD2007-00050. C. Sanz was also supported by a Marie Curie Fellowship of the European Community.

REFERENCES

ATOMIUM. http://www.imec.be/design/atomium/.

Azevedo, A., Issenin, I., Cornea, R., Gupta, R., Dutt, N., Veidenbaum, A., and Nicolau, A. 2002. Profile-based dynamic voltage scheduling using program checkpoints. In DATE '02: Proceedings of the Conference on Design, Automation and Test in Europe. IEEE Computer Society, Washington, DC, USA, 168.

Borkar, S., Karnik, T., and De, V. 2004. Design and reliability challenges in nanometer technologies. In DAC '04: Proceedings of the 41st Annual Conference on Design Automation. ACM Press, New York, NY, USA, 75.
Kim, C., Roy, K., Hsu, S., Alvandpour, A., Krishnamurthy, R., and Borkar, S. 2003. A process variation compensating technique for sub-90nm dynamic circuits. In Symposium on VLSI Circuits, Digest of Technical Papers. 205–206.

Kurdahi, F. J., Eltawil, A. M., Park, Y.-H., Kanj, R. N., and Nassif, S. R. 2006. System-level SRAM yield enhancement. In ISQED '06: Proceedings of the 7th International Symposium on Quality Electronic Design. IEEE Computer Society, Washington, DC, USA, 179–184.

Palkovic, M. 2007. Enhanced applicability of loop transformations. Ph.D. thesis, IMEC.

Papanikolaou, A., Lobmaier, F., Wang, H., Miranda, M., and Catthoor, F. 2005. A system-level methodology for fully compensating process variability impact of memory organizations in periodic applications. In CODES+ISSS '05: Proceedings of the 3rd IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis. ACM Press, New York, NY, USA, 117–122.

Wang, H., Catthoor, F., Maex, K., Miranda, M., and Dehaene, W. 2005a. Systematic analysis of energy and delay impact of very deep submicron process variability effects in embedded SRAM modules. In DATE '05: Proceedings of the Conference on Design, Automation and Test in Europe. IEEE Computer Society, Washington, DC, USA, 914–919.

Wang, H., Miranda, M., Papanikolaou, A., and Catthoor, F. 2005b. Variable tapered Pareto buffer design and implementation techniques allowing run-time configuration for low-power embedded SRAMs. IEEE Trans. VLSI 13, 10, 1127–1135.

Received March 2007; revised September 2007; accepted December 2007