TUT-6: Perceptual Metrics for Image Quality Evaluation
Date: Sunday Afternoon, October 12
Presented by
Sheila S. Hemami, Cornell University and Thrasyvoulos N. Pappas, Northwestern University
Outline
- Compression/transmission image and video distortions
- Human visual system review
- Near-threshold metrics
- Supra-threshold metrics
- Structural Similarity metrics
- Metric performance comparisons, selection and general use and abuse
Material to be Distributed to Participants
We will provide lecture notes in the form of copies of our slides, as well as a CD-ROM with the example images shown in the tutorial.
Abstract
We will examine objective criteria for the evaluation of image quality that are based on models of visual perception. (We are only interested in images that are intended to be seen and interpreted by humans.) Our primary emphasis will be on image fidelity, i.e., how close an image is to a given original or reference image, but we will also discuss no-reference and limited-reference metrics. We will consider a number of applications (such as graphics, halftoning, displays) but our main focus will be on image and video compression and transmission.
We will examine the performance of all metrics in the context of image and video compression, considering a number of image compression techniques, including JPEG, SPIHT, the Safranek-Johnston perceptual image coder, and JPEG2000, as well as video compression techniques, such as MPEG-2 and H.264. We will consider realistic distortions that arise from compression and error concealment in image and video transmission applications. In order to better explore the space of distortions, we will also examine models for typical distortions encountered in video compression/transmission applications.
We will begin with a review of the human visual system, including physiology, function, and psychophysical approaches to characterization. We will discuss both models of the human visual system and characterizations of temporal and spatial vision that are particularly well suited both to signal processing-based analysis and to incorporation into image and video processing algorithms. In particular, we will describe the multi-channel model, frequency-based descriptions of visual processing, and non-traditional approaches to HVS characterization that yield results more applicable to imaging applications than traditional psychophysical characterizations (e.g., Watson et al., Hemami et al.).
We will then review near-threshold perceptual metrics, which have been developed over the last decade and a half, and will discuss their advantages over traditional mean-squared error (MSE) based metrics. Such metrics explicitly account for human visual system (HVS) sensitivity to noise by estimating thresholds above which distortion is just noticeable, and can successfully account for the spatial and temporal frequency sensitivity of the eye as well as contrast and luminance masking. We will take a closer look at metrics of wider applicability, such as those developed by Daly (93), Lubin (93), and Teo and Heeger (94), as well as several coding-specific metrics, such as the Safranek-Johnston metric (89) and the Watson DCT (93) and wavelet metrics (97), just to mention a few examples.
We will then consider supra-threshold metrics. Such metrics are important in an increasing number of applications, where there is a need to achieve very high compression ratios, or there are losses due to channel conditions. In such cases, a certain amount of perceived distortion is unavoidable, and hence there is a need to derive quantitative objective measures of perceived distortion. Neither vision nor metrics in the suprathreshold regime have been as extensively studied as those in the subthreshold regime. We will discuss Hemami and Chandler's approach to characterizing suprathreshold vision and the resulting metric, along with the performance of adapted near-threshold metrics in the suprathreshold regime.
Another class of metrics that has received a lot of attention recently is that of structural similarity metrics (Wang, Bovik, Sheikh, Simoncelli, 2004). MSE and traditional perceptual metrics are quite sensitive to spatial shifts, intensity shifts, contrast changes, and scale changes. In contrast, the Structural SIMilarity (SSIM) metrics model perception implicitly by taking into account the fact that the HVS is adapted for extracting structural information (relative spatial covariance) from images. As such, they have the potential to be much more effective in quantifying suprathreshold compression artifacts than traditional perceptual metrics, as these artifacts tend to distort the structure of an image. We will examine and compare different SSIM implementations both in the image space and the wavelet domain. We will also consider the complex wavelet SSIM (CWSSIM, Wang-Simoncelli'05), a translation-insensitive SSIM implementation, and a multi-scale weighted variant of the complex wavelet SSIM (WCWSSIM), with weights based on the human contrast sensitivity function (Brooks-Pappas'06). We will show that the latter provides the link with the traditional perceptual quality metrics, and will thus present a unified framework for perceptual and structural similarity metrics.
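As a rough illustration of the contrast between MSE and structural similarity, the following sketch evaluates the SSIM index of Wang et al. (2004) once over the whole image. This is a deliberate simplification: practical implementations compute the index over local windows and average the resulting map. The function names and the synthetic test images are assumptions made for this example only; the constants K1 = 0.01 and K2 = 0.03 follow the published SSIM definition.

```python
import numpy as np

def mse(x, y):
    """Mean-squared error between two images of equal shape."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    return float(np.mean((x - y) ** 2))

def global_ssim(x, y, dynamic_range=255.0, k1=0.01, k2=0.03):
    """SSIM index computed from whole-image statistics (one global window).

    Simplification for illustration: the published metric uses local
    sliding windows and averages the local indices.
    """
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    c1 = (k1 * dynamic_range) ** 2  # stabilizes the luminance term
    c2 = (k2 * dynamic_range) ** 2  # stabilizes the contrast/structure term
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = np.mean((x - mu_x) * (y - mu_y))
    numerator = (2.0 * mu_x * mu_y + c1) * (2.0 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(numerator / denominator)

# Synthetic reference: a horizontal luminance ramp (hypothetical test image).
reference = np.tile(np.linspace(50.0, 200.0, 64), (64, 1))

# Distortion 1: uniform intensity shift of +10 gray levels.
shifted = reference + 10.0

# Distortion 2: +/-10 checkerboard pattern, which has the same MSE as the shift.
rows, cols = np.indices(reference.shape)
checkerboard = ((rows + cols) % 2) * 2.0 - 1.0
noisy = reference + 10.0 * checkerboard

mse_shift, mse_noise = mse(reference, shifted), mse(reference, noisy)
ssim_shift, ssim_noise = global_ssim(reference, shifted), global_ssim(reference, noisy)

print("MSE  shift / checkerboard:", mse_shift, mse_noise)
print("SSIM shift / checkerboard:", ssim_shift, ssim_noise)
```

Both distortions have the same MSE, yet the intensity shift, which leaves the covariance structure untouched, receives the higher SSIM score. With whole-image statistics the gap is modest; local windows would be expected to separate the two cases more clearly.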
Throughout the tutorial, we will compare and contrast the performance of the metrics, including successes and failures. Besides traditional performance measures (e.g., correlation with subjective scores), we will demonstrate in-use performance, for example, by performing R-D optimized compression with a perceptual metric rather than traditional MSE. We will discuss appropriate and inappropriate in-use applications for the various metrics and suggest guidelines for metric selection.
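Correlation with subjective scores is commonly reported both as a linear (Pearson) and as a rank-order (Spearman) coefficient. The sketch below shows why the two can differ. The scores are made-up toy numbers chosen only to illustrate the computation, not measured data, and the helper names are assumptions for this example; the rank helper also assumes no tied values.

```python
import numpy as np

def pearson(a, b):
    """Linear correlation coefficient between two score vectors."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(np.corrcoef(a, b)[0, 1])

def ranks(values):
    """Rank of each entry (0 = smallest). Assumes no ties."""
    values = np.asarray(values, dtype=np.float64)
    order = np.argsort(values)
    result = np.empty(len(values), dtype=np.float64)
    result[order] = np.arange(len(values))
    return result

def spearman(a, b):
    """Rank-order correlation: Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))

# Toy data (NOT real measurements): subjective ratings for five images and
# the output of a hypothetical metric that is monotonic but nonlinear.
subjective_scores = [1.0, 2.0, 3.0, 4.0, 5.0]
metric_outputs = [0.1, 0.2, 0.4, 0.8, 1.6]

linear_corr = pearson(subjective_scores, metric_outputs)
rank_corr = spearman(subjective_scores, metric_outputs)

print("Pearson :", linear_corr)
print("Spearman:", rank_corr)
```

A metric that orders the images exactly as the observers do attains a perfect rank correlation even when its linear correlation falls short of one, which is one reason both figures are usually quoted together.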
Speaker Biography
Sheila S. Hemami (S'89, M'95, SM'03) received the B.S. degree (summa cum laude) in electrical engineering from the University of Michigan, Ann Arbor, in 1990, and the M.S. and Ph.D. degrees in electrical engineering from Stanford University, Stanford, CA, in 1992 and 1994, respectively.
She was with Hewlett-Packard Laboratories, Palo Alto, CA, in 1994. In 1995, she joined the faculty of the School of Electrical and Computer Engineering at Cornell University, Ithaca, NY, where she holds the title of Professor and directs the Visual Communications Lab.
Dr. Hemami received a National Science Foundation Early Career Development Award in 1997 and has received numerous teaching awards. She held the Kodak Term Professorship of Electrical Engineering at Cornell University from 1996 to 1999, and she was a Fulbright Distinguished Lecturer in 2001. She has served as Chair of the IEEE Image and Multidimensional Signal Processing Technical Committee and as an Associate Editor for the IEEE Transactions on Signal Processing, and is currently the Editor-in-Chief for the IEEE Transactions on Multimedia.
Thrasyvoulos N. Pappas (M'87, SM'95, F'06) received the S.B., S.M., and Ph.D. degrees in electrical engineering and computer science from MIT in 1979, 1982, and 1987, respectively. From 1987 until 1999, he was a Member of the Technical Staff at Bell Laboratories, Murray Hill, NJ. In 1999, he joined the Department of Electrical and Computer Engineering at Northwestern University as an associate professor. His research interests are in image and video quality and compression, perceptual models for image processing, model-based halftoning, image and video analysis, and multimedia signal processing.
Dr. Pappas is a Fellow of the IEEE and SPIE. He has served as an elected member of the Board of Governors of the Signal Processing Society of IEEE (2004-2007), chair of the IEEE Image and Multidimensional Signal Processing Technical Committee, associate editor of the IEEE Transactions on Image Processing, and technical program co-chair of ICIP-01 and the Symposium on Information Processing in Sensor Networks (IPSN-04). Since 1997 he has been co-chair of the SPIE/IS&T Conference on Human Vision and Electronic Imaging. He has also served as co-chair of the 2005 SPIE/IS&T Electronic Imaging Symposium.
