Wednesday, October 14th, 2009
Example MATLAB Code Testing SSIM and CW-SSIM
While learning about structural image quality techniques, I implemented some test code to experiment with them. Since I didn’t have the MATLAB Image Processing Toolbox conveniently available, the code shells out (calls external command-line programs) to a few ImageMagick tools, so watch out for that.
See my test code for the complex wavelet domain structural similarity metric (CW-SSIM). Note that I’m also using Eero Simoncelli’s steerable pyramid tools.
/image processing 〆
permalink
Tuesday, April 24th, 2007
CAPIDD
Check out my good friend Steve Hoelzer’s Master’s thesis on reducing blocking artifacts in DCT-coded images (like highly-compressed JPEGs). I enjoyed the clear and concise executive summary.
Friday, November 10th, 2006
Useful Free Software for Video Playback and Format Conversion on Windows
If you ever find yourself needing to do video playback and format conversion with a wide variety of video formats on Windows, I have a few key free pieces of software to recommend:
- VirtualDub: a video capture and processing tool that can very quickly read, manipulate, and write AVI files in many formats (VirtualDubMod, a spinoff, handles even more formats) (Wikipedia description)
- VirtualDub filters: the other great thing about VirtualDub is that there are many plugin filters available that implement a variety of video/image processing algorithms
- ffdshow: a codec (decoder and encoder) package that installs as a native Windows DirectShow filter, enabling playback of many modern video formats in Windows Media Player
- Auto Gordian Knot: a tool for converting DVD video content into XviD, DivX, or x264 MPEG-4 video
- MediaInfo: reveals the codecs used for video and audio contents within a video file
Friday, April 14th, 2006
Face Time
Creating Passionate Users is becoming one of my favorite websites. Yesterday’s post on why face-to-face matters notes how video chat is very close to face time but falls short in the eye contact department:
Video chat is better than any other form of non face-to-face, because you get facial expressions, tone of voice, body language, AND real-time responsiveness. But—he said there’s still a very unsettling feature for the brain because there’s really no way for BOTH speakers to make eye contact! … there’s no way to have the camera right in your face, in a place where you can still look into the other person’s eyes. Bottom line: You can see the camera or the person’s eyes… but not both.
I wonder if some fancy image processing could be applied so as to give the illusion of eye contact between both parties.
Sunday, March 19th, 2006
Info theory book (free online)
David MacKay provides online copies of his textbook Information Theory, Inference, and Learning Algorithms (in pdf, ps, djvu, & latex formats). You can also buy the dead-tree version.
It’s a very readable text compared to the other things I’ve previously read (or skimmed) on Information Theory.
(via Sotos)
Saturday, January 28th, 2006
ECE 432 Final Report on Face Recognition
Woo hoo! My final report for class is now complete and available in html and markdown text formats. The report describes the eigenface and fisherface techniques for facial recognition and includes MATLAB source code.
Wednesday, November 23rd, 2005
libjpeg is good
Recently for my master’s work, I found that a very nice implementation
of JPEG compression is available from the Independent JPEG
Group. The code and supporting
documents are quite nice and flexible. At
least one of my readers (Steve) will like to hear that it supports compressing
with a user-specified quantization table.
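For readers wondering what a quantization table actually controls, here is a minimal MATLAB sketch on a single 8x8 block. It is not libjpeg code: the table `Q` and the pixel data in `block` are made up for illustration, and the DCT matrix is built by hand so that no toolbox is needed.

```matlab
% Sketch of what a quantization table does to one 8x8 block.
% Q is a made-up table and "block" is made-up pixel data (assumptions);
% this is not libjpeg code.
k = (0:7)';
n = 0:7;
D = sqrt(2/8) * cos(pi * k * (2*n + 1) / 16);   % 8x8 DCT-II matrix
D(1, :) = sqrt(1/8);                            % DC row scaling

[cc, rr] = meshgrid(0:7, 0:7);
Q = 8 + 4*(rr + cc);            % coarser steps at higher frequencies
block = 100 + 10*rr + 5*cc;     % a smooth ramp of pixel values

coef = D * (block - 128) * D';          % level shift, then 2-D DCT
quantized = round(coef ./ Q);           % the integers that get coded
dequant = quantized .* Q;               % what the decoder recovers
recon = D' * dequant * D + 128;         % inverse DCT

max_coef_error = max(abs(coef(:) - dequant(:)));
num_nonzero = nnz(quantized);
```

Each coefficient is off by at most half of its table entry after the round trip, so larger entries in `Q` trade more error for fewer nonzero integers to code.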
Wednesday, October 26th, 2005
Illusions of Perception
The CVCL (Computational Visual Cognition Lab) at MIT presents a gallery
of perceptual image illusions. The hybrid faces are very
interesting. They combine high and low spatial frequency information to
create a face that changes with viewing distance. (via Ian Rowland
via reddit)
Doh! Right after posting this, I realized that Steve Hoelzer beat me to
the punch. Nice scoop, fellow reddit reader.
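The hybrid-image idea (low spatial frequencies from one picture, high spatial frequencies from another) can be sketched in a few lines of MATLAB. This is only an illustration under assumptions: the two inputs are synthetic sinusoidal patterns rather than the CVCL faces, and the Gaussian cutoff `sigma` is an arbitrary choice.

```matlab
% Hybrid-image sketch: low frequencies of A plus high frequencies of B.
% A and B are synthetic stand-ins (assumptions), not the CVCL face images.
N = 128;
[xx, yy] = meshgrid(1:N, 1:N);
A = 128 + 60*sin(2*pi*xx/64);          % coarse pattern (seen from far away)
B = 128 + 60*sin(2*pi*yy/4);           % fine pattern (seen up close)

% Separable Gaussian low-pass kernel; sigma is an assumed cutoff.
sigma = 4;
t = -ceil(3*sigma):ceil(3*sigma);
g = exp(-t.^2/(2*sigma^2));
g = g/sum(g);
lowpass = @(im) conv2(g, g, im, 'same');   % filter columns, then rows

hybrid = lowpass(A) + (B - lowpass(B));

% Viewing from a distance is roughly another low-pass filter:
far_view = lowpass(hybrid);
```

Displaying `hybrid` and `far_view` with `imagesc` and `colormap(gray)` should show the fine stripes of B dominating up close and the coarse pattern of A taking over once the detail is blurred away.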
Tuesday, March 29th, 2005
Image Processing Test Image: The Burger Girl
Here’s a test image I enjoy. Click the image to see an uncompressed 512x512 version (640KB PNG).

Sunday, March 20th, 2005
What makes an image look good?
I gave a presentation on image quality and some related topics
(global and local image phase, steerable pyramid wavelet transforms,
statistical modeling of natural images, and structural image quality).
Some of the most interesting questions resulting from the talk were:
- How should one interpret the diagram from the Phase & Perception of Blur paper? Specifically, what do the converging lines represent? My current interpretation is that they are equal-phase contours corresponding to a well-localized feature point at any scale.
- What is the Gaussian scale mixture (GSM) model? I hope to better explain and interpret this in an upcoming blog entry.
- How do SSIM and CW-SSIM compare to the latest perceptual error-based models of image quality (such as ones derived from the Watson paper)? A specific test could evaluate structural methods with images that are degraded by only a just-noticeable difference (JND). In other words, look at errors that are just visible at the threshold of human perception instead of the gross “suprathreshold” errors that we looked at before.
Wednesday, March 16th, 2005
Papers on Perceptual Image Quality Metrics, Image Phase, Subband Transforms, and Image Statistics
This entry documents the most interesting papers I’ve been reading and studying this quarter. I have sorted them into categories and then chronologically within each category to show the influence that early papers have on the newer ones.
Image Phase
1975 Kuglin and Hines, “The phase correlation image alignment method”
1979 Oppenheim, Lim, Kopec, and Pohlig, “Phase in speech and pictures”
1980 Hayes, Lim, and Oppenheim, “Signal reconstruction from phase or magnitude”
1999 Thomson, “Visual coding and the phase structure of natural scenes”
2000 Kovesi, “Phase congruency: A low-level image invariant”
2003 Wang and Simoncelli, “Local Phase Coherence and the Perception of Blur”
Subband Transforms: Steerable Pyramids
1991 Freeman and Adelson, “The design and use of steerable filters”
1991 Simoncelli, “Shiftable Multi-scale Transforms”
1995 Simoncelli, “The steerable pyramid: A flexible architecture for multi-scale derivative computation”
2000 Portilla, “A Parametric Texture Model based on Joint Statistics of Complex Wavelet Coefficients”
Statistical Image Modeling
2002 Srivastava, “On advances in statistical modeling of natural images”
2005 Simoncelli, “Statistical Modeling of Photographic Images”
2005 Wang, “Reduced-Reference Image Quality Assessment Using a Wavelet-Domain Natural Image Statistic Model”
Perceptual Image Quality
1998 Watson, “Toward a perceptual video-quality metric”
1998 Eckert, “Perceptual quality metrics applied to still image compression”
2001 Chen and Pappas, “Perceptual Coders and Perceptual Metrics”
2002 Wang, “Why is Image Quality Assessment So Difficult?”
2004 Pappas, “Perceptual Criteria for Image Quality Evaluation”
2004 Wang, “Image Quality Assessment: From Error Visibility to Structural Similarity”
2005 Wang, “Translation Insensitive Image Similarity in Complex Wavelet Domain”
Tuesday, March 15th, 2005
Eero Simoncelli’s “Statistical Modeling of Photographic Images”
Main idea:
Out of the huge set of possible images, a particular subset of likely images exists, and these images can be described using a probability model.
Three probability models are discussed:
- The Gaussian Model
  - pros
    - easy computations
    - single parameter
    - direct application to compression and noise removal
  - cons
    - unconstrained phase (can destroy image content)
    - doesn’t capture structure in most real images
- The Wavelet Marginal Model
  - pros
    - captures non-Gaussian histogram characteristics (with peaks at zero and long tails)
    - better fit (reduced entropy) leads to improved compression and noise removal
  - cons
    - important image information is still not captured
    - wavelet coefficients are not independent; their high-order statistics are correlated
- Wavelet Joint Models
  - pros
    - adapts to local variance
    - Gaussian scale mixture (GSM) model is useful
    - gives much improved noise removal results
  - cons
    - still can’t capture all image structure
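To make the GSM idea a little more concrete, here is a small MATLAB sketch of a scalar Gaussian scale mixture: a Gaussian variable multiplied by the square root of a random positive multiplier. The two-valued multiplier `z` is purely an illustrative assumption (it is not the prior used in the paper), and kurtosis is computed by hand to avoid any toolbox dependency.

```matlab
% Gaussian scale mixture (GSM) sketch: x = sqrt(z) .* u, with u Gaussian
% and z a random positive "local variance" multiplier.
% The two-valued z below is an illustrative assumption, not the prior
% used in the paper.
n = 200000;
u = randn(n, 1);                      % zero-mean, unit-variance Gaussian
z = 1 + 8*(rand(n, 1) > 0.5);         % z is 1 or 9 with equal probability
x = sqrt(z) .* u;                     % GSM sample

% Kurtosis computed by hand (no toolbox): 3 for a Gaussian, larger for
% heavy-tailed densities with a sharp peak at zero.
kurt = @(v) mean((v - mean(v)).^4) / mean((v - mean(v)).^2)^2;
k_gauss = kurt(u);
k_gsm = kurt(x);

% Dividing by the (here known) local standard deviation recovers a
% Gaussian variable, which is what adapting to local variance buys us.
k_normalized = kurt(x ./ sqrt(z));
```

For this particular choice of `z` the expected kurtosis of `x` is 3*41/25 = 4.92, noticeably above the Gaussian value of 3, which is the same peaked, long-tailed shape noted for wavelet coefficient histograms.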
Tuesday, March 15th, 2005
Feedback and Answers on SSIM
Thanks to my dedicated reader, Steve, for providing feedback to my recent entries on image quality using structural similarity. He had these ideas:
Start with a low quality image (such as one that is already blurry) and degrade it more. See if results still are good — does SSIM measure this further degradation in a reasonable way?
What happens with an image that is all noise and then gets distorted? There is no structure to start with.
I ran a quick test to check out the first idea. The results follow. Click the thumbnails to view full-sized images. The image on the left is the image that has been blurred once, while the one on the right has been blurred twice.

The additional blurring operation gave a MSE = 9.9 and a MSSIM = 0.975. Qualitatively, this result makes sense — I think we lost much more visual information with the original blur than this one.
In response to the second question (what if the original image is noise only), I found that the results depend on the type of distortion. Distortion by shifting the mean or stretching the contrast gave results similar to those obtained when using natural images (MSSIM = 0.998 or so).
However, it was interesting to look at the distortion caused by compressing the noise image with JPEG to reach an MSE of 60. At that error level, the JPEG algorithm couldn’t compress the noise image (shown below) very much. I can’t distinguish between the “original” and “degraded” images, so my intuition is that the compressed noise-only image has high image quality. The high MSSIM result of 0.952 agrees with that intuition.

Monday, March 14th, 2005
The Importance of Phase in Images
Many papers have suggested that phase information in an image is very important. A report from Alan Oppenheim in 1979 entitled Phase in Speech and Pictures demonstrated that much of the structural information in an image is preserved even when it is represented by phase alone.
He describes an experiment in which an image is decomposed into phase and magnitude parts using a Fourier transform, then the magnitude is set to unity, and an image is reconstructed from the remaining phase information.
The idea is that Fourier phase includes important information about the features and details in an image. The following figures show an original and the phase-only reconstruction of an example image. These were produced by the following MATLAB commands:
% start with a grayscale image stored in variable "im"
im = double(im);
im_fourier = fft2(im);
im_phase = angle(im_fourier);
% set the magnitude to unity and invert using the phase alone
im_reconstruct_from_phase = abs(ifft2(exp(1i*im_phase)));
% display original & reconstructed image in separate figures
% (scaled for visibility)
figure; imshow(im,[])
figure; imshow(im_reconstruct_from_phase.^.4,[])

Many of the high-frequency structures have been preserved in the phase-only image. Indeed, the transformation into a phase-only image can be approximately interpreted as a high pass filtering operation.
It turns out that the intelligibility of the phase-only representation depends on the magnitude “smoothness” of the signal being looked at. Since most natural images contain mostly low frequency content, their magnitude rolls off quickly at high frequency and this leads to the situation where the “high pass” interpretation of the phase-only transform holds.
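The “high pass” interpretation can be checked numerically: keeping only the phase is the same as dividing the spectrum by its own magnitude (wherever that magnitude is nonzero), which flattens the spectrum. The sketch below is an illustration under assumptions; `im` is a small synthetic image rather than a natural one.

```matlab
% Sketch: phase-only reconstruction as spectral whitening.
% "im" is a synthetic stand-in image (assumption): a bright square on a
% dark background, plus a smooth ramp.
N = 64;
[xx, yy] = meshgrid(1:N, 1:N);
im = double(xx > 20 & xx < 44 & yy > 20 & yy < 44) + xx/N;

F = fft2(im);
mag = abs(F);
ph = angle(F);

% Magnitude and phase together give back the original image.
im_back = real(ifft2(mag .* exp(1i*ph)));
recombine_error = max(abs(im_back(:) - im(:)));

% Phase-only: every frequency gets unit magnitude. This boosts the weak
% high frequencies relative to the strong low ones.
phase_only_spectrum = exp(1i*ph);
im_phase_only = real(ifft2(phase_only_spectrum));
spectrum_flatness = max(abs(abs(phase_only_spectrum(:)) - 1));

% Magnitude-only (zero phase) for comparison: feature positions are lost.
im_mag_only = real(ifft2(mag));
```

Viewing `im_phase_only` next to `im_mag_only` (for example with `imagesc`) is a quick way to see which half of the Fourier representation carries the edge locations.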
Thursday, March 10th, 2005
Overview of Zhou Wang’s “Image Quality Assessment: From Error Visibility to Structural Similarity”
The main idea in this paper (available here) is that human visual perception is built to understand a scene based on its structure, suggesting that this structural information is the key component of visual quality. A good way to measure image quality, then, is to quantify the degradation in the structure within a distorted image versus an original.
This is a change in the fundamental assumption from past image quality work. Previous approaches measure perceptual image quality assuming that image intensity is the key component of visual quality. These methods often measure intensity error and then penalize these errors according to visibility.
To get started, let’s go over some definitions of commonly used “image quality” terms and abbreviations.
- image quality: a field of study with goals of quantifying subjective human-perceived visual quality and developing objective measures that accurately predict subjective quality
- subjective image quality: human-perceived visual quality, often measured for a group of test subjects and reported as a mean opinion score (MOS)
- objective image quality: quantitative measures that can accurately predict subjective image quality
- full-reference: the complete undistorted original image is available
- no-reference or blind: only the distorted image is available
- reduced-reference: partial information (extracted features) about the original image is available
- MSE: mean squared error, the average of squared pixel intensity differences
- PSNR: peak signal-to-noise ratio
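As a quick illustration of the last two definitions, here is a minimal MATLAB sketch that computes MSE and PSNR for 8-bit data, using the usual PSNR definition 10*log10(L^2/MSE) with peak value L = 255. The two 3x3 arrays are made-up numbers, not real image data.

```matlab
% MSE and PSNR for 8-bit grayscale images (peak value L = 255).
% ref and deg are tiny made-up arrays used only for illustration.
ref = [ 52  55  61;  62  59  55;  63  65  66];
deg = [ 54  55  60;  60  59  57;  63  66  66];
L = 255;
err = double(ref) - double(deg);
mse_val = mean(err(:).^2);
psnr_val = 10*log10(L^2 / mse_val);    % in dB; infinite when mse_val is 0
```

Here the squared errors sum to 14 over 9 pixels, so `mse_val` is 14/9. PSNR is just a logarithmic rescaling of MSE, so it inherits the same weaknesses as a quality measure.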
Error-Sensitivity Approach
The assumption here is that the perceived distortion is directly related to the error signal. These approaches apply a sequence of steps: preprocessing to scale/align and account for human color perception, CSF (contrast sensitivity function) filtering to account for human spatial and temporal frequency response, channel decomposition into temporal and spatial subbands, error normalization according to a perceptual masking model, and error pooling to weight errors and come up with a single quality number.
Some common problems with these approaches have been emphasized in this paper, including:
- the quality definition problem: it’s not clear that error visibility corresponds well with image quality
- the supra-threshold problem: most perceptual studies have been evaluated with small errors, where the error produces a JND (just noticeable difference), and therefore the studies don’t account for large errors very well
- the natural image complexity problem: the images used to develop perceptual thresholds are very simple compared to natural images
- the cognitive interaction problem: foveation (where a person is likely to look in an image) and cognition of the image also lead to variable image quality perception
Structural Similarity Approach
The goal of the new approach is to “find a more direct way to compare the structures of the reference and the distorted signals.” The assumption is that humans extract structural information from images, not pixel intensities.
An image quality metric based on structural similarity can overcome many of the problems associated with the error-sensitivity method. The SSIM index is one specific implementation of a structural similarity approach — it is not the only possible architecture that uses the structural similarity paradigm, but it is interesting as a first example of structural similarity’s utility.
SSIM: An Example Structural Approach

Algorithm Description
The figure above shows a proposed image quality measurement system
that compares registered images x and y. The similarity
measure SSIM(x,y) is a function of luminance l(x,y),
contrast c(x,y), and structure s(x,y). Also, it is
necessary to include three constants (C1, C2, and C3) to prevent
unstable results when the denominators approach zero.
The average intensity (ux and uy) is used to define the luminance function
l(x,y) = (2*ux*uy + C1) / (ux^2 + uy^2 + C1).
The standard deviation (sx and sy) is used to define the contrast function
c(x,y) = (2*sx*sy + C2) / (sx^2 + sy^2 + C2).
The correlation (sxy) after removing the mean and normalizing by the standard deviation is used to represent structural similarity:
s(x,y) = (sxy + C3) / (sx*sy + C3).
Finally, the similarity is computed as a combination of the luminance, contrast, and structure terms in a general form
SSIM(x,y) = l(x,y)^a * c(x,y)^b * s(x,y)^g
where a > 0, b > 0, and g > 0 are parameters that determine the relative weighting of each term.
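To see how the three terms behave, here is a small MATLAB sketch that evaluates them once over whole signals (no sliding window). The signal `x` is made up, `y` is the same signal with a mean shift, and the constants borrow the values the paper settles on (K1 = 0.01, K2 = 0.03, L = 255, C3 = C2/2); all of these are choices made for illustration.

```matlab
% The three SSIM terms computed once over whole signals (no sliding
% window). x is a made-up test signal; y is x with a mean shift.
x = [10 20 30 40 50 60 70 80];
y = x + 8;                             % mean shift only
C1 = (0.01*255)^2;  C2 = (0.03*255)^2;  C3 = C2/2;

ux = mean(x);  uy = mean(y);
sx = std(x);   sy = std(y);            % unbiased (N-1) estimates
sxy = sum((x - ux).*(y - uy)) / (numel(x) - 1);

l = (2*ux*uy + C1) / (ux^2 + uy^2 + C1);   % luminance
c = (2*sx*sy + C2) / (sx^2 + sy^2 + C2);   % contrast
s = (sxy + C3) / (sx*sy + C3);             % structure
a = 1;  b = 1;  g = 1;                     % equal weighting
ssim_val = l^a * c^b * s^g;
```

Because a pure mean shift leaves the standard deviations and the correlation untouched, `c` and `s` both come out as 1 and only the luminance term `l` drops slightly below 1.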
For the specific implementation in this paper, SSIM is simplified by choosing a = b = g = 1 and C3 = C2/2, giving
SSIM(x,y) = ((2*ux*uy + C1)*(2*sxy + C2)) / ((ux^2 + uy^2 + C1)*(sx^2 + sy^2 + C2))
Local image statistics are measured in an 11x11 circular-symmetric Gaussian-weighted window around each pixel to generate an SSIM value for each pixel. A few other numbers are needed to fully define the parameters C1 and C2. The dynamic range of the pixels is defined as L (255 for 8-bit grayscale). Then, C1 and C2 are given as functions of L and some small constants K1 << 1 and K2 << 1:
C1 = (K1*L)^2
C2 = (K2*L)^2
In the paper, the author uses these settings: K1 = 0.01; K2 = 0.03. A single number representing overall image quality is computed by averaging the SSIM values to give a mean:
MSSIM(X,Y) = 1/M * sum( SSIM(:) ).
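The formulas above can be put together into a short MSSIM sketch that needs only base MATLAB (`conv2`). This is my own minimal version under assumptions, not the reference implementation from the paper: the test images are synthetic, the window uses a standard deviation of 1.5 samples, and border pixels are simply dropped.

```matlab
% MSSIM sketch using only base MATLAB (conv2), following the formulas
% above. The test images are synthetic assumptions.
N = 64;
[xx, yy] = meshgrid(1:N, 1:N);
X = 128 + 50*sin(2*pi*xx/16) .* cos(2*pi*yy/32);   % reference
Y = X + 8;                                          % mean-shifted copy

K1 = 0.01;  K2 = 0.03;  L = 255;
C1 = (K1*L)^2;  C2 = (K2*L)^2;

% 11x11 Gaussian window (standard deviation 1.5), normalized to sum 1,
% built as a separable 1-D kernel.
t = -5:5;
w = exp(-t.^2/(2*1.5^2));
w = w/sum(w);
blur = @(im) conv2(w, w, im, 'valid');  % 'valid' drops border pixels

ux = blur(X);            uy = blur(Y);
sx2 = blur(X.^2) - ux.^2;               % local variances
sy2 = blur(Y.^2) - uy.^2;
sxy = blur(X.*Y) - ux.*uy;              % local covariance

ssim_map = ((2*ux.*uy + C1) .* (2*sxy + C2)) ./ ...
           ((ux.^2 + uy.^2 + C1) .* (sx2 + sy2 + C2));
mssim = mean(ssim_map(:));
```

For this mean-shifted pair `mssim` stays very close to 1, and `ssim_map` can be displayed with `imagesc` to see where in the image the similarity drops.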
Test Results
Using the example MATLAB implementation referenced in the paper, I
compared MSSIM with mean-squared error (MSE) for a few images. The
following figure shows the test images I used. Also, there is a
high-resolution version (540kB).

From left-to-right starting across the top row, these images are
1. the original version
2. jpeg-compressed
3. blurred
4. added gaussian white noise
5. mean-shifted
6. contrast-stretched
All of these versions were created to give an equal mean-squared error (MSE) of 60. Since their visual quality clearly differs, this demonstrates that MSE does not correlate well with perceived quality. It is clear that the image quality of 2 and 3 is much worse than the others. Let’s see if MSSIM works better.
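For anyone wanting to build a similar equal-MSE test set, here is one possible way to scale two of the distortions to a target MSE. It is a sketch under assumptions rather than the procedure used for the images above: the reference is synthetic and clipping to the 0 to 255 range is ignored, so the MSE values come out exact.

```matlab
% Sketch: scaling two distortions to the same target MSE.
% The reference image is synthetic and clipping to [0, 255] is ignored
% (both assumptions), so the MSE values come out exact.
target_mse = 60;
N = 64;
[xx, yy] = meshgrid(1:N, 1:N);
ref = 128 + 40*sin(2*pi*xx/16) + 20*cos(2*pi*yy/8);

% Mean shift: a constant offset d gives MSE = d^2.
shifted = ref + sqrt(target_mse);

% Additive Gaussian white noise, rescaled so its mean square is exactly
% the target (a raw randn draw would only match on average).
noise = randn(N);
noise = noise * sqrt(target_mse / mean(noise(:).^2));
noisy = ref + noise;

mse = @(a, b) mean((a(:) - b(:)).^2);
mse_shifted = mse(ref, shifted);
mse_noisy = mse(ref, noisy);
```

Distortions such as JPEG compression or blur have no closed-form knob like this, so their strength would have to be tuned by trial and error until the MSE lands on the target.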
Table 1: Comparing Image Quality Measures
Image # MSE MSSIM
1 0 1.000
2 60 0.817
3 60 0.881
4 60 0.638
5 60 0.998
6 60 0.998
Structural similarity accurately predicts the high quality of images 5 and 6, the mean-shifted and contrast-stretched images.
It is interesting to discuss the results from image 4, the one with gaussian white noise added. MSSIM is the lowest for this image, contradicting my expectation that image 4 has a perceptual image quality somewhere between the worst images (2 and 3) and the best images (5 and 6). I wonder why this result didn’t match my expectations …
Anyway, I hope you enjoyed this summary. Please send me suggestions and/or comments.
Tuesday, February 22nd, 2005
Image Quality Assessment
I’m currently taking a course on digital video processing given by Prof. Thrasyvoulos Pappas, my advisor in the Image and Video Processing Laboratory (IVPL) at Northwestern.
For the course project, I’m studying objective image quality metrics, or the computation of a number that corresponds to the perceived quality of an image.
One image quality metric that is often used when comparing a reference and degraded image is the mean squared error (MSE), computed by simply averaging the squared differences between the reference and degraded image. For example, the degraded image could be a highly compressed version of the reference. While MSE is simple to understand and easy to compute, it does not achieve a good correspondence with perceived image quality.
Some interesting image quality methods have been proposed and tested recently. Junqing Chen from the IVPL evaluates metrics used when optimizing image compression, comparing MSE with subband, wavelet, and DCT-based metrics (see the SPIE paper).
Also, some very interesting work has come from Eero Simoncelli’s Laboratory for Computational Vision (LCV) at New York University. Zhou Wang’s work on his Structural SIMilarity (SSIM) index is the best approach I’ve found so far for quantitative evaluation of image quality for many different applications.
In upcoming blog entries, I hope to summarize and review some of the most interesting and influential papers that deal with image quality. I’ll start with Zhou Wang’s “Image Quality Assessment: From Error Visibility to Structural Similarity”. Stay tuned ….
Tuesday, August 3rd, 2004
Captcha
I heard about these a while ago, but forgot the name. A Captcha is a
test used to tell computers and humans apart. See the Wikipedia
definition and the Carnegie Mellon project.
Monday, May 31st, 2004
Face Detection
A prerequisite for face recognition is face detection. The Robotics Institute at CMU has a great demo with my submission.
Sunday, May 30th, 2004
Face Recognition
I’ve been working on face recognition for my ECE 432 computer vision class at NWU. Here are some images I’ve been working with:

They come from some photos I took in class one day and from my dad’s student websites.