Overview
Computers have always evolved by integrating more complexity into smaller systems. In the 1990s, this led to
highly functional personal computers with supercomputer-like performance. Now, users interact with embedded
computers continuously and transparently (e.g., cell phones, cameras, cars). High-end embedded processors
provide the computational horsepower. Due to higher functional requirements, these processors have begun inheriting
high-performance techniques from their desktop counterparts, such as pipelining, caches, dynamic branch prediction,
and multithreading. Unfortunately, while these techniques perform well on average, their performance cannot be
analytically bounded, whereas analytically bounded performance is a key safety requirement for embedded systems with real-time tasks. In this project,
we are pioneering new directions for designing higher-performance real-time embedded systems without compromising safety.
- Virtual Simple Architecture (VISA) Framework.
The VISA framework is a combined static/dynamic approach to worst-case schedulability
analysis of real-time task-sets.
Statically, tasks' worst-case execution times (WCETs) are derived assuming a simple, analyzable processor.
Dynamically, the tasks actually run on an arbitrarily complex processor.
Normally, this would be unsafe, since tasks' WCETs are based on a different processor abstraction.
A novel dynamic checking approach and a dual-mode pipeline design assure overall safety.
- Safely Exploiting Multithreaded Processors to Tolerate Memory Latency in Real-Time Systems.
As frequency increases for high-end embedded processors, memory will become a performance bottleneck,
just as it is in desktop systems today. Dynamic switch-on-event multithreading
(quickly switching to a different task when the current task misses in the cache)
is a promising solution, especially considering that embedded systems are typically
rich in available threads, even more so than their desktop counterparts. However, there are currently no known
frameworks for analytically bounding the performance of dynamic multithreading.
Two related projects are underway in this strategic research area. First, we use deterministic
thread switching to capitalize on nearly all of the potential for overlapping computation and memory latency,
while yielding a simple closed-form test for determining whether or not a task-set is schedulable.
This project, for the first time, extends Liu and Layland's classic EDF test to handle computation/memory overlap.
Second, we are deriving safe yet tight bounds on the performance of dynamic switch-on-event multithreading itself.
- Real-Time Processors.
At a high level, a single-processor real-time system has three layers:
(1) the underlying processor architecture,
(2) static/dynamic worst-case timing analysis for deriving tasks' worst-case execution times (WCETs) on the processor, and
(3) scheduling algorithms which depend on WCETs.
Separated by rigid abstractions, these three layers have evolved independently, developed by researchers in
different specialties.
Abstraction compartmentalizes each specialty and thus manages complexity,
leading to prolific developments within each specialty.
But antiquated abstractions may also cause significant trends in one area to go unnoticed in another area,
passing up opportunities for leaps in performance, power, and cost.
We have begun a new project that "co-designs" the three layers rather than insulating them.
Virtual Simple Architecture (VISA) Framework
Embedded processors provide the computational horsepower for ubiquitous embedded systems,
from cell phones and automobiles to NASA's Mars rovers (pictured below). Embedded processors
are evolving and becoming increasingly complex, even borrowing high-performance microarchitectural
techniques from desktop processors such as Intel's Pentium processors. While these techniques
improve average performance, deriving worst-case execution times (WCETs) of software tasks
becomes intractable. Yet, having WCETs is essential for safely scheduling software tasks in
embedded systems with real-time constraints.
We developed a new framework, called Virtual Simple Architecture (VISA), for building timing-safe
systems on top of timing-unsafe hardware components (pictured below). The VISA framework exposes
a simple processor model to worst-case execution time analysis. Thus, WCETs are derived for
tasks assuming a simple processor. However, tasks are actually executed
on a complex processor. Strictly speaking, this is unsafe, since WCETs were not derived assuming
the complex processor. To address this, the progress of each task is continuously gauged to dynamically
confirm that the "proxy" WCETs are not exceeded. Typically, there are no problems: tasks execute
much faster than they would on the hypothetical simple processor.
Nonetheless, anomalies cannot be ruled out (e.g., pathological dynamic branch prediction
scenarios and speculation penalties). The gauging technique can detect dangerously slow progress
of a task, in which case the complex processor is dynamically downgraded to a simple mode of operation
that mimics the simple processor model, explicitly bounding the execution time of the task by its WCET.
Thus, the microarchitectural support is a complex processor with dual operating modes, a complex mode
and a simple mode. The complex mode typically executes tasks much faster than a simpler processor
would, freeing the processor for other tasks or enabling frequency/voltage to be drastically reduced
for power savings. The gauging technique plus simple mode ensures bounded timing in atypical cases.
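The gauging idea can be illustrated with a small simulation. The sketch below is a simplified, hypothetical scheme, not the checkpoint formulation used in the VISA papers. It assumes a task is split into sub-tasks with known simple-mode WCETs (`wcets`), that the task's reserved budget (`budget`) covers those WCETs plus a fixed mode-switch overhead (`switch_overhead`), and that a partially executed sub-task can always be finished in simple mode within its full simple-mode WCET. Each sub-task gets a checkpoint deadline chosen so that, if the checkpoint is missed, downgrading to simple mode still completes the task within the budget. All names are illustrative.

```python
from typing import List, Optional, Tuple


def checkpoint_deadlines(wcets: List[float], budget: float,
                         switch_overhead: float) -> List[float]:
    """Latest complex-mode completion time for each sub-task (hypothetical scheme).

    If sub-task i has not completed by deadlines[i], switching modes
    (switch_overhead) and finishing sub-tasks i..n-1 in simple mode
    (at most sum(wcets[i:]), by assumption) still ends no later than budget.
    """
    if budget < sum(wcets) + switch_overhead:
        raise ValueError("budget must cover simple-mode WCETs plus switch overhead")
    return [budget - switch_overhead - sum(wcets[i:]) for i in range(len(wcets))]


def run_with_gauging(wcets: List[float], complex_times: List[float],
                     budget: float, switch_overhead: float
                     ) -> Tuple[float, Optional[int]]:
    """Return (finish time or worst-case bound, sub-task index where simple mode began).

    complex_times[i] is the (hypothetical) time sub-task i takes in complex mode.
    """
    deadlines = checkpoint_deadlines(wcets, budget, switch_overhead)
    t = 0.0
    for i, (actual, deadline) in enumerate(zip(complex_times, deadlines)):
        if t + actual <= deadline:
            t += actual  # checkpoint met: keep running in complex mode
        else:
            # Watchdog fires at the missed checkpoint; downgrade to simple mode.
            # Remaining time is bounded by the switch plus simple-mode WCETs.
            bound = deadline + switch_overhead + sum(wcets[i:])
            return bound, i
    return t, None
```

For example, with three sub-tasks of simple-mode WCET 4, a budget of 14, and a switch overhead of 1, the checkpoints fall at 1, 5, and 9. Complex-mode times of 1, 2, and 2 finish at time 5 without switching; if the second sub-task takes 5 instead, its checkpoint is missed and the returned bound is exactly the budget of 14.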
Safely Exploiting Multithreaded Processors to Tolerate Memory Latency in Real-Time Systems
The peak frequency of embedded processors is increasing to meet the demand for more functional embedded systems.
As a result, embedded systems are now facing the same "memory wall" that has plagued desktop systems for years.
The "memory wall" refers to the widening processor-memory speed gap, which prevents performance from scaling with frequency.
A coarse-grain multithreaded processor can effectively hide long memory latencies by quickly switching to an alternate
task when the active task issues a memory request, improving overall throughput. However, in hard-real-time embedded systems,
this throughput gain cannot simply be exploited, because the schedulability of a task-set
(a guarantee that all tasks meet their deadlines) must be determined a priori using offline schedulability tests.
Any computation/memory overlap must be statically accounted for.
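To make the overlap concrete, here is a toy simulation of switch-on-event multithreading. It is an average-case illustration under stated assumptions (zero switch cost, memory transfers of different tasks proceeding concurrently, earliest-ready task runs next), not a bound and not the deterministic switching policy developed in this project. The task representation and function names are hypothetical.

```python
import heapq
from typing import List, Tuple

# Hypothetical task model: a list of (compute, memory) pairs, i.e., `compute`
# cycles in the pipeline followed by a `memory`-cycle transfer (e.g., a miss).
Task = List[Tuple[int, int]]


def serial_time(tasks: List[Task]) -> int:
    """No multithreading: the pipeline idles during every memory transfer."""
    return sum(c + m for task in tasks for c, m in task)


def switch_on_miss_time(tasks: List[Task]) -> int:
    """Coarse-grain multithreading: switch tasks whenever the active one misses.

    Assumes zero switch cost and that memory transfers of different tasks
    proceed concurrently. Among ready tasks, the one that became ready
    earliest runs next (ties broken by task id).
    """
    heap = [(0, tid, 0) for tid, task in enumerate(tasks) if task]
    heapq.heapify(heap)
    t = 0        # time at which the pipeline becomes free
    finish = 0   # latest completion time seen so far
    while heap:
        ready, tid, seg = heapq.heappop(heap)
        t = max(t, ready)              # pipeline idles if no task is ready
        compute, memory = tasks[tid][seg]
        t += compute                   # this task occupies the pipeline
        done = t + memory              # its transfer overlaps others' compute
        if seg + 1 < len(tasks[tid]):
            heapq.heappush(heap, (done, tid, seg + 1))
        finish = max(finish, done)
    return finish
```

Two identical tasks, each computing for 2 cycles three times with 6-cycle transfers after the first two computations, take 36 cycles back to back but finish in 20 cycles when the pipeline switches on each miss. A hard-real-time test has to bound this kind of overlap in advance rather than observe it.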
We developed a novel analytical framework that bounds the overlap between computation of a pipeline-resident task
and ongoing memory transfers of other tasks. We derive a simple closed-form schedulability test that depends only
on the aggregate computation (C) and memory (M) components of tasks. In particular, the technique does not require knowing
where memory transfers occur within and among tasks, and it avoids searching all task permutations for a
specific feasible schedule. To the best of our knowledge, this is the first work to provide the necessary formalism
for safely and tractably exploiting coarse-grain multithreaded processors to tolerate memory latency in hard-real-time
systems, exceeding the schedulability limits of classic real-time theory for uniprocessors.
Our techniques make it possible to capitalize on higher frequency embedded processors, despite the widening
processor-memory speed gap.
The analytical framework is pictured below. The closed-form test extends Liu and Layland's EDF test,
from their classic 1973 paper, to handle coarse-grain multithreading in a uniprocessor.
By (1) matching the simplicity of the original EDF test and (2) targeting multithreading for naturally
thread-rich embedded systems, our approach holds real promise for influencing real-time scheduling theory.
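For reference, the classic result being extended is easy to state: on a uniprocessor, a set of independent, preemptible periodic tasks whose deadlines equal their periods is schedulable by EDF if and only if the sum of execution time over period is at most 1. The sketch below applies it in two ways, using C and M from the text and an assumed period P per task: charging memory fully (execution time C + M), which is sufficient when the pipeline idles during stalls, and charging computation only, which is merely a necessary condition because only one task can compute at a time. The project's extended closed-form test is not reproduced here; any safe test that exploits overlap must still reject every task-set that fails the second check. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import List


@dataclass(frozen=True)
class RTTask:
    c: int  # computation time per job
    m: int  # memory-transfer (stall) time per job
    p: int  # period, assumed equal to the relative deadline


def utilization(tasks: List[RTTask], charge_memory: bool) -> Fraction:
    """Sum of per-task utilizations, with or without the memory component."""
    return sum((Fraction(t.c + (t.m if charge_memory else 0), t.p) for t in tasks),
               Fraction(0))


def classic_edf_no_overlap(tasks: List[RTTask]) -> bool:
    """Liu and Layland EDF test with execution time C + M (no overlap credited)."""
    return utilization(tasks, charge_memory=True) <= 1


def pipeline_capacity_ok(tasks: List[RTTask]) -> bool:
    """Necessary for any schedule: only one task computes at a time."""
    return utilization(tasks, charge_memory=False) <= 1
```

For tasks (C, M, P) = (2, 3, 10), (3, 3, 12), and (2, 2, 8), the no-overlap utilization is 3/2, so the classic test rejects the set, while the compute-only utilization is 7/10. Only a test that credits computation/memory overlap could admit such a set.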
We plan to deploy our techniques in a real system based on Ubicom's IP3023 embedded microprocessor
(Ubicom designs embedded microprocessors for wireless networking), among the first embedded microprocessors
with multithreading capability. The IP3023 is a 10-stage scalar pipeline with 8 hardware threads.
Ubicom graciously donated their high-end development board worth around $20K, pictured below.
Real-Time Processors
At a high level, a single-processor real-time system has three layers:
(1) the underlying processor architecture,
(2) static/dynamic worst-case timing analysis for deriving tasks' worst-case execution times (WCETs) on the processor, and
(3) scheduling algorithms which depend on WCETs.
Classically, the three layers have evolved separately.
We feel this insulating approach is obsolete.
We have begun a new project that "co-designs" the three layers,
opening up opportunities for leaps in performance, power, and cost of real-time systems.
Publications
Conference and Journal Papers
A. Anantaraman and E. Rotenberg.
Non-Uniform Program Analysis &
Repeatable Execution Constraints:
Exploiting Out-of-Order Processors in Real-Time Systems.
ACM SIGBED Review,
Volume 3, Number 1,
January 2006.
[pdf]
A. Anantaraman and E. Rotenberg.
Non-Uniform Program Analysis &
Repeatable Execution Constraints:
Exploiting Out-of-Order Processors in Real-Time Systems.
Work in Progress Session
for the 26th IEEE International Real-Time Systems Symposium (RTSS-26),
December 2005.
[pdf]
A. El-Haj-Mahmoud, A. S. AL-Zawawi, A. Anantaraman, and E. Rotenberg.
Virtual Multiprocessor:
An Analyzable, High-Performance Microarchitecture for Real-Time Computing.
Proceedings of the
2005 International Conference on Compilers, Architecture,
and Synthesis for Embedded Systems (CASES'05), pp. 213-224, September 2005.
[pdf]
K. Seth, A. Anantaraman, F. Mueller, and E. Rotenberg.
FAST: Frequency-Aware Static Timing Analysis.
ACM Transactions on Embedded Computing Systems (TECS),
5(1):200-224, February 2006.
A. Anantaraman, K. Seth, E. Rotenberg, and F. Mueller.
Enforcing Safety of Real-Time Schedules on Contemporary Processors
Using a Virtual Simple Architecture (VISA).
Proceedings of the
25th IEEE International Real-Time Systems Symposium (RTSS-25),
pp. 114-125,
December 2004.
[pdf]
A. El-Haj-Mahmoud and E. Rotenberg.
Safely Exploiting Multithreaded Processors to Tolerate Memory Latency
in Real-Time Systems.
Proceedings of the
2004 International Conference on Compilers, Architecture,
and Synthesis for Embedded Systems (CASES'04), pp. 2-13, September 2004.
[pdf]
K. Seth, A. Anantaraman, F. Mueller, and E. Rotenberg.
FAST: Frequency-Aware Static Timing Analysis.
Proceedings of the
24th IEEE International Real-Time Systems Symposium (RTSS-24), pp. 40-51, December 2003.
A. Anantaraman, K. Seth, K. Patil, E. Rotenberg, and F. Mueller.
Virtual Simple Architecture (VISA): Exceeding the Complexity Limit in Safe Real-Time Systems.
Proceedings of the
30th IEEE/ACM International Symposium on Computer Architecture (ISCA-30), pp. 350-361, June 2003.
[pdf]
J. Koppanalil, P. Ramrakhyani, S. Desai, A. Vaidyanathan, and E. Rotenberg.
A Case for Dynamic Pipeline Scaling.
Proceedings of the
5th International Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES'02), pp. 1-8, October 2002.
[pdf]
E. Rotenberg.
Using Variable-MHz Microprocessors to Efficiently Handle Uncertainty in Real-Time Systems.
Proceedings of the
34th IEEE/ACM International Symposium on Microarchitecture (MICRO-34), pp. 28-39, December 2001.
[pdf]
Technical Reports
A. Anantaraman, K. Seth, E. Rotenberg, and F. Mueller.
Exploiting VISA for Higher Concurrency in Safe Real-Time Systems.
Technical Report TR-2004-15, Department of Computer Science,
North Carolina State University, May 2004.
[pdf]
Ashwini Sidhaye, Paul Steinmetz, Eric Rotenberg, David Barrow, and Domenico Arpaia.
Collecting Memory Address Traces from an Ericsson Cell Phone and Estimating Cache Performance.
Technical Report CESR-TR-01-1,
Center for Embedded Systems Research,
Department of Electrical and Computer Engineering,
North Carolina State University,
August 2001.
[pdf]
Book Chapters
E. Rotenberg and A. Anantaraman.
Architecture of Embedded Microprocessors,
in Multiprocessor Systems-on-Chips.
Ahmed Jerraya and Wayne Wolf, Eds.
San Francisco, CA: Morgan Kaufmann Publishers, 2005, pp. 81-112.
Student Theses
A. A. El-Haj-Mahmoud.
Hard-Real-Time Multithreading:
A Combined Microarchitectural and Scheduling Approach.
Ph.D. Thesis,
Department of Electrical and Computer Engineering,
North Carolina State University,
May 2006.
[NCSU library: on-line thesis]
A. V. Anantaraman.
Analysis-Managed Processor (AMP):
Exceeding the Complexity Limit in Safe-Real-Time Systems.
Ph.D. Thesis,
Department of Electrical and Computer Engineering,
North Carolina State University,
April 2006.
[NCSU library: on-line thesis]
P. S. Ramrakhyani.
Dynamic Pipeline Scaling.
M.S. Thesis,
Department of Electrical and Computer Engineering,
North Carolina State University,
May 2003.
[pdf]
A. V. Anantaraman.
Reducing Frequency in Real-Time Systems via Speculation and Fall-Back Recovery.
M.S. Thesis,
Department of Electrical and Computer Engineering,
North Carolina State University,
April 2003.
[pdf]
Talks
Virtual Multiprocessor: An Analyzable, High-Performance Microarchitecture for Real-Time Computing.
Presented at CASES'05 by A. El-Haj-Mahmoud.
[ppt]
Enforcing Safety of Real-Time Schedules on Contemporary Processors Using a Virtual Simple Architecture (VISA).
Presented at RTSS-25 by A. V. Anantaraman.
[ppt]
[ppt - no animation]
[pdf]
Safely Exploiting Multithreaded Processors to Tolerate Memory Latency in Real-Time Systems.
Presented at CASES'04 by A. El-Haj-Mahmoud.
[ppt]
[ppt - no animation]
[pdf]
Virtual Simple Architecture (VISA): Exceeding the Complexity Limit in Safe Real-Time Systems.
Presented at ISCA-30 by E. Rotenberg.
[pdf]
A Case for Dynamic Pipeline Scaling.
Presented at CASES'02 by P. Ramrakhyani.
[pdf]
Using Variable-MHz Microprocessors to Efficiently Handle Uncertainty in Real-Time Systems.
Presented at MICRO-34 by E. Rotenberg.
[pdf]
Funding
This project is supported by NSF grants No. CCR-0207785 (Dynamic Superpipelining: Shaping Microarchitecture for Variable Frequency),
No. CCR-0208581 (Reducing Frequency via Speculation and Fall-Back Recovery), and
No. CCR-0310860 (Virtual Simple Architecture (VISA): Exceeding the Complexity Limit in Safe Real-Time Systems).
Funding and a development board were also provided by Ericsson.
Ubicom provided a development board.
Any opinions, findings, and conclusions or recommendations
expressed in this website and publications herein are those of the author(s) and
do not necessarily reflect the views of the National Science Foundation.