Wednesday, October 14th, 2009

Example MATLAB Code Testing SSIM and CW-SSIM

While learning about structural image quality techniques, I implemented some test code to experiment with them. Since I didn’t have the MATLAB Image Processing Toolbox conveniently available, the code shells out (calls external command-line programs) to a few ImageMagick functions, so watch out for that.

See my test code for the complex wavelet domain structural similarity metric (CW-SSIM). Note that I’m also using Eero Simoncelli’s steerable pyramid tools.

/image processing   〆   permalink

Tuesday, April 24th, 2007

CAPIDD

Check out my good friend Steve Hoelzer’s Master’s thesis on reducing blocking artifacts in DCT-coded images (like highly-compressed JPEGs). I enjoyed the clear and concise executive summary.


Friday, November 10th, 2006

Useful Free Software for Video Playback and Format Conversion on Windows

If you ever find yourself needing to do video playback and format conversion with a wide variety of video formats on Windows, I have a few key free pieces of software to recommend:

  1. VirtualDub: a video capture and processing tool that can very quickly read, manipulate, and write AVI files in many formats (VirtualDubMod, a spinoff, handles even more formats) (Wikipedia description)
  2. VirtualDub filters: the other great thing about VirtualDub is that there are many plugin filters available that implement a variety of video/image processing algorithms
  3. ffdshow: a codec (decoder and encoder) package that installs as a native Windows DirectShow filter, enabling playback of many modern video formats in Windows Media Player
  4. Auto Gordian Knot: a tool for converting DVD video content into XviD or DivX or x264 MPEG4 video
  5. MediaInfo: reveals the codecs used for video and audio contents within a video file


Friday, April 14th, 2006

Face Time

Creating Passionate Users is becoming one of my favorite websites. Yesterday’s post on why face-to-face matters notes how video chat is very close to face time but lacks in the eye contact department:

Video chat is better than any other form of non face-to-face, because you get facial expressions, tone of voice, body language, AND real-time responsiveness. But—he said there’s still a very unsettling feature for the brain because there’s really no way for BOTH speakers to make eye contact! … there’s no way to have the camera right in your face, in a place where you can still look into the other person’s eyes. Bottom line: You can see the camera or the person’s eyes… but not both.

I wonder if some fancy image processing could be applied so as to give the illusion of eye contact between both parties.


Sunday, March 19th, 2006

Info theory book (free online)

David MacKay provides online copies of his textbook Information Theory, Inference, and Learning Algorithms (in pdf, ps, djvu, & latex formats). You can also buy the dead-tree version.

It’s a very readable text compared to the other things I’ve previously read (or skimmed) on Information Theory.

(via Sotos)


Saturday, January 28th, 2006

ECE 432 Final Report on Face Recognition

Woo hoo! My final report for class is now complete and available in html and markdown text formats. The report describes the eigenface and fisherface techniques for facial recognition and includes MATLAB source code.


Wednesday, November 23rd, 2005

libjpeg is good

Recently for my master’s work, I found that a very nice implementation of JPEG compression is available from the Independent JPEG Group. The code and supporting documents are quite nice and flexible. At least one of my readers (Steve) will like to hear that it supports compressing with a user-specified quantization table.


Wednesday, October 26th, 2005

Illusions of Perception

The CVCL (Computational Visual Cognition Lab) at MIT presents a gallery of perceptual image illusions. The hybrid faces are very interesting. They combine high and low spatial frequency information to create a face that changes with viewing distance. (via Ian Rowland via reddit)

Doh! Right after posting this, I realized that Steve Hoelzer beat me to the punch. Nice scoop, fellow reddit reader.


Tuesday, March 29th, 2005

Image Processing Test Image: The Burger Girl

Here’s a test image I enjoy. Click the image to see an uncompressed 512x512 version (640KB PNG).

Burger Girl


Sunday, March 20th, 2005

What makes an image look good?

I gave a presentation on image quality and some related topics (global and local image phase, steerable pyramid wavelet transforms, statistical modeling of natural images, and structural image quality).

Some of the most interesting questions resulting from the talk were:

  1. How should one interpret the diagram from the Phase & Perception of Blur paper — specifically, what do the converging lines represent? My current interpretation is that they are equal-phase contours corresponding to a well-localized feature point at any scale.

  2. What is the gaussian scale mixture (GSM) model? I hope to better explain and interpret this in an upcoming blog entry.

  3. How do SSIM and CW-SSIM compare to the latest perceptual error-based models of image quality (such as ones derived from the Watson paper)? A specific test could evaluate structural methods with images that are degraded by only a just-noticeable difference (JND). In other words, look at errors that are just visible at the threshold of human perception instead of the gross “suprathreshold” errors that we looked at before.


Wednesday, March 16th, 2005

Papers on Perceptual Image Quality Metrics, Image Phase, Subband Transforms, and Image Statistics

This entry documents the most interesting papers I’ve been reading and studying this quarter. I have sorted them into categories and then chronologically within each category to show the influence that the early papers have had on the newer ones.

Image Phase

1975 Kuglin and Hines, “The phase correlation image alignment method”
1979 Oppenheim, Lim, Kopec, and Pohlig, “Phase in speech and pictures”
1980 Hayes, Lim, and Oppenheim, “Signal reconstruction from phase or magnitude”
1999 Thomson, “Visual coding and the phase structure of natural scenes”
2000 Kovesi, “Phase congruency: A low-level image invariant”
2003 Wang and Simoncelli, “Local Phase Coherence and the Perception of Blur”

Subband Transforms: Steerable Pyramids

1991 Freeman and Adelson, “The design and use of steerable filters”
1991 Simoncelli, “Shiftable Multi-scale Transforms”
1995 Simoncelli, “The steerable pyramid: A flexible architecture for multi-scale derivative computation”
2000 Portilla, “A Parametric Texture Model based on Joint Statistics of Complex Wavelet Coefficients”

Statistical Image Modeling

2002 Srivastava, “On advances in statistical modeling of natural images”
2005 Simoncelli, “Statistical Modeling of Photographic Images”
2005 Wang, “Reduced-Reference Image Quality Assessment Using a Wavelet-Domain Natural Image Statistic Model”

Perceptual Image Quality

1998 Watson, “Toward a perceptual video-quality metric”
1998 Eckert, “Perceptual quality metrics applied to still image compression”
2001 Chen and Pappas, “Perceptual Coders and Perceptual Metrics”
2002 Wang, “Why is Image Quality Assessment So Difficult?”
2004 Pappas, “Perceptual Criteria for Image Quality Evaluation”
2004 Wang, “Image Quality Assessment: From Error Visibility to Structural Similarity”
2005 Wang, “Translation Insensitive Image Similarity in Complex Wavelet Domain”


Tuesday, March 15th, 2005

Eero Simoncelli’s “Statistical Modeling of Photographic Images”

Main idea:

Out of the huge set of possible images, a particular subset of likely images exists, and these images can be described using a probability model.

Three probability models are discussed:

  1. The Gaussian Model
    • pros
      • easy computations
      • single parameter
      • direct application to compression and noise removal
    • cons
      • unconstrained phase (can destroy image content)
      • doesn’t capture structure in most real images
  2. The Wavelet Marginal Model
    • pros
      • captures non-gaussian histogram characteristics (with peaks at zero and long tails)
      • better fit (reduced entropy) leads to improved compression and noise removal
    • cons
      • important image information is still not captured
      • wavelet coefficients are not independent — their high-order statistics are correlated
  3. Wavelet Joint Models
    • pros
      • adapts to local variance
      • gaussian scale mixture (GSM) model is useful
      • gives much improved noise removal results
    • cons
      • still can’t capture all image structure


Tuesday, March 15th, 2005

Feedback and Answers on SSIM

Thanks to my dedicated reader, Steve, for providing feedback to my recent entries on image quality using structural similarity. He had these ideas:

  1. Start with a low quality image (such as one that is already blurry) and degrade it more. See if the results are still good: does SSIM measure this further degradation in a reasonable way?

  2. What happens with an image that is all noise and then gets distorted? There is no structure to start with.

I ran a quick test to check out the first idea. The results follow. Click the thumbnails to view full-sized images. The image on the left is the image that has been blurred once, while the one on the right has been blurred twice.

Reference Image Degraded Image

The additional blurring operation gave an MSE of 9.9 and an MSSIM of 0.975. Qualitatively, this result makes sense: I think we lost much more visual information with the original blur than with this one.

In response to the second question (what if the original image is noise only), I found that the results depend on the type of distortion. Distortion by shifting the mean or stretching the contrast gave results similar to those obtained when using natural images (MSSIM = 0.998 or so).

However, it was interesting to look at the distortion caused by compressing the noise image using JPEG to achieve an MSE of 60. To stay at an MSE of 60, the JPEG algorithm couldn’t compress the noise image (shown below) very much. I can’t distinguish between the “original” and “degraded” images, so my intuitive understanding is that the compressed noise-only image has a high image quality. The high MSSIM result of 0.952 coincided well with my intuition.

Noise Image


Monday, March 14th, 2005

The Importance of Phase in Images

Many papers have suggested that phase information in an image is very important. A report from Alan Oppenheim in 1979 entitled Phase in Speech and Pictures demonstrated that much of the structural information in an image is preserved even when it is represented by phase alone.

He describes an experiment in which an image is decomposed into phase and magnitude parts using a Fourier transform, then the magnitude is set to unity, and an image is reconstructed from the remaining phase information.

The idea is that Fourier phase includes important information about the features and details in an image. The following figures show an original and the phase-only reconstruction of an example image. These were produced by the following MATLAB commands:

% start with a grayscale image stored in variable "im"
im_fourier = fft2(double(im));
im_phase = angle(im_fourier);

% set the magnitude to unity and reconstruct from the phase alone
im_reconstruct_from_phase = abs(ifft2(exp(1i*im_phase)));

% display original & reconstructed image in separate figures
% (the 0.4 exponent compresses the range for visibility)
figure; imshow(im,[])
figure; imshow(im_reconstruct_from_phase.^0.4,[])

original einstein phase-only einstein

Many of the high-frequency structures have been preserved in the phase-only image. Indeed, the transformation into a phase-only image can be approximately interpreted as a high pass filtering operation.

It turns out that the intelligibility of the phase-only representation depends on the magnitude “smoothness” of the signal being looked at. Since most natural images contain mostly low frequency content, their magnitude rolls off quickly at high frequency and this leads to the situation where the “high pass” interpretation of the phase-only transform holds.
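The “high pass” interpretation can be checked numerically. Setting the magnitude to unity is the same as dividing the spectrum by its own magnitude, which flattens it; for an image whose energy sits mostly at low frequencies, that shifts the energy balance toward high frequencies. The following is a rough sketch under my own assumptions: the test image is a synthetic ramp with a bright square (a stand-in, not the Einstein image), and the 4-bin radius used to define “high frequency” is arbitrary. It uses only base MATLAB/Octave functions.

```matlab
% Synthetic stand-in image (hypothetical): smooth ramp plus a bright square.
N = 64;
[xx, yy] = meshgrid(1:N, 1:N);
im = xx / N;
im(20:40, 20:40) = im(20:40, 20:40) + 1;

F = fft2(im);
mag = abs(F);

% Phase-only spectrum: same phase, unit magnitude.
% (eps guards against division by zero for a vanishing coefficient)
F_phase_only = F ./ (mag + eps);
im_phase_only = real(ifft2(F_phase_only));

% Distance of each FFT bin from DC, accounting for wrap-around.
[fx, fy] = meshgrid(0:N-1, 0:N-1);
fx = min(fx, N - fx);
fy = min(fy, N - fy);
high = (fx.^2 + fy.^2) > 4^2;    % "high" = more than 4 bins from DC

% Fraction of spectral energy in the high-frequency region.
energy = @(S) sum(abs(S(high)).^2) / sum(abs(S(:)).^2);
high_share_original   = energy(F);
high_share_phase_only = energy(F_phase_only);

fprintf('high-frequency energy share: original %.3f, phase-only %.3f\n', ...
        high_share_original, high_share_phase_only);
```

For a test image with a flat spectrum (white noise, for example) the two shares would be nearly equal, which is consistent with the point above that the interpretation depends on how the magnitude rolls off.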


Thursday, March 10th, 2005

Overview of Zhou Wang’s “Image Quality Assessment: From Error Visibility to Structural Similarity”

The main idea in this paper (available here) is that human visual perception is built to understand a scene based on its structure, which suggests that structural information is the key component of visual quality. A good way to measure image quality, then, is to quantify the degradation in the structure within a distorted image versus an original.

This is a change in the fundamental assumption from past image quality work. Previous approaches measure perceptual image quality assuming that image intensity is the key component of visual quality. These methods often measure intensity error and then penalize these errors according to visibility.

To get started, let’s go over some definitions of commonly used “image quality” terms and abbreviations.

  • image quality: a field of study with goals of quantifying subjective human-perceived visual quality and developing objective measures that accurately predict subjective quality
  • subjective image quality: human-perceived visual quality, often measured for a group of test subjects and reported as a mean opinion score (MOS)
  • objective image quality: quantitative measures that can accurately predict subjective image quality
  • full-reference: the complete undistorted original image is available
  • no-reference or blind: only the distorted image is available
  • reduced-reference: partial information (extracted features) about the original image is available
  • MSE: mean squared error, the average of squared pixel intensity differences
  • PSNR: peak signal-to-noise ratio
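As a concrete reference for the last two definitions, here is a minimal sketch of MSE and PSNR for 8-bit images. The 4x4 pixel values are made up for illustration, and the variable names are my own.

```matlab
% Hypothetical 8-bit reference image and a distorted copy (toy 4x4 data).
ref  = [52, 55, 61,  66;
        70, 61, 64,  73;
        63, 59, 55,  90;
        67, 61, 68, 104];
dist = ref + [ 2, -1,  0,  3;
              -2,  1,  1,  0;
               0, -3,  2,  1;
               1,  0, -1, -2];

L = 255;                               % peak value for 8-bit pixels
err = double(ref) - double(dist);      % per-pixel intensity difference
mse = mean(err(:).^2);                 % mean squared error
psnr_db = 10 * log10(L^2 / mse);       % peak signal-to-noise ratio in dB

fprintf('MSE = %.2f, PSNR = %.2f dB\n', mse, psnr_db);
```

PSNR is just MSE on a logarithmic scale relative to the peak value, so it inherits whatever strengths and weaknesses MSE has as a quality measure.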

Error-Sensitivity Approach

The assumption here is that the perceived distortion is directly related to the error signal. These approaches apply a sequence of steps consisting of: preprocessing to scale/align and account for human color perception, CSF (contrast sensitivity function) filtering to account for human spatial and temporal frequency response, channel decomposition into temporal and spatial subbands, error normalization according to a perceptual masking model, and error pooling to weight errors and come up with a single quality number.

Some common problems with these approaches have been emphasized in this paper, including:

  • the quality definition problem: it’s not clear that error visibility corresponds well with image quality
  • the supra-threshold problem: most perceptual studies have used small errors near the visibility threshold, where the error produces a JND (just noticeable difference), so the resulting models don’t account for large errors very well
  • the natural image complexity problem: the images used to develop perceptual thresholds are very simple compared to natural images
  • the cognitive interaction problem: foveation (where a person is likely to look in an image) and cognition of the image also lead to variable image quality perception

Structural Similarity Approach

The goal of the new approach is to “find a more direct way to compare the structures of the reference and the distorted signals.” The assumption is that humans extract structural information from images, not pixel intensities.

An image quality metric based on structural similarity can overcome many of the problems associated with the error-sensitivity method. The SSIM index is one specific implementation of a structural similarity approach — it is not the only possible architecture that uses the structural similarity paradigm, but it is interesting as a first example of structural similarity’s utility.

SSIM: An Example Structural Approach

Structural Similarity Diagram

Algorithm Description

The figure above shows a proposed image quality measurement system that compares registered images x and y. The similarity measure SSIM(x,y) is a function of luminance l(x,y), contrast c(x,y), and structure s(x,y). Also, it is necessary to include three constants (C1, C2, and C3) to prevent unstable results when the denominators approach zero.

The average intensity (ux and uy) is used to define the luminance function

l(x,y) = (2*ux*uy + C1) / (ux^2 + uy^2 + C1).

The standard deviation (sx and sy) is used to define the contrast function

c(x,y) = (2*sx*sy + C2) / (sx^2 + sy^2 + C2).

The covariance (sxy) of x and y, normalized by the two standard deviations, is used to represent structural similarity. This is the correlation between the signals after removing the mean and normalizing by the standard deviation:

s(x,y) = (sxy + C3) / (sx*sy + C3).

Finally, the similarity is computed as a combination of the luminance, contrast, and structure terms in a general form

SSIM(x,y) = l(x,y)^a * c(x,y)^b * s(x,y)^g

where a > 0, b > 0, and g > 0 are parameters that determine the relative weighting of each term.

For the specific implementation in this paper, SSIM is simplified by choosing a = b = g = 1 and C3 = C2/2, giving

               (2*ux*uy+C1)*(2*sxy+C2)      
SSIM(x,y) = -----------------------------
            (ux^2+uy^2+C1)*(sx^2+sy^2+C2)
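The simplification is easier to follow with the intermediate step written out. With a = b = g = 1 and C3 = C2/2, the contrast and structure terms collapse into a single factor. The LaTeX below is my own working, written with \mu and \sigma in place of the u and s used in the text (so \sigma_{xy} is sxy):

```latex
\begin{align*}
c(x,y)\,s(x,y)
  &= \frac{2\sigma_x\sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2}
     \cdot \frac{\sigma_{xy} + C_2/2}{\sigma_x\sigma_y + C_2/2} \\
  &= \frac{2\sigma_x\sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2}
     \cdot \frac{2\sigma_{xy} + C_2}{2\sigma_x\sigma_y + C_2} \\
  &= \frac{2\sigma_{xy} + C_2}{\sigma_x^2 + \sigma_y^2 + C_2},
\\[1ex]
\mathrm{SSIM}(x,y) = l(x,y)\,c(x,y)\,s(x,y)
  &= \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}
          {(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}.
\end{align*}
```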

Local image statistics are measured in an 11x11 circular-symmetric Gaussian-weighted window (standard deviation of 1.5 samples) around each pixel to generate an SSIM value for each pixel. A few other numbers are needed to fully define the parameters C1 and C2. The dynamic range of the pixels is defined as L (255 for 8-bit grayscale). Then, C1 and C2 are given as functions of L and some small constants K1 << 1 and K2 << 1.

C1 = (K1*L)^2
C2 = (K2*L)^2

In the paper, the authors use these settings: K1 = 0.01; K2 = 0.03. A single number representing overall image quality is computed by averaging the SSIM values over all M local windows to give a mean:

MSSIM(X,Y) = 1/M * sum( SSIM(:) ).
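Putting the pieces together, the computation can be sketched in a few lines of base MATLAB/Octave. This is my own illustrative sketch, not the reference implementation from the paper: the two test images are synthetic, the variable names are mine, and I use conv2 with the 'valid' option so that only windows lying fully inside the image contribute.

```matlab
% Two hypothetical test images: a smooth pattern and a noisy copy of it.
N = 64;
[xx, yy] = meshgrid(1:N, 1:N);
x = 128 + 100 * sin(2*pi*xx/32) .* cos(2*pi*yy/16);
y = x + 10 * randn(N);

K1 = 0.01;  K2 = 0.03;  L = 255;
C1 = (K1*L)^2;
C2 = (K2*L)^2;

% 11x11 Gaussian weighting window (sigma = 1.5), normalized to sum to 1.
[wx, wy] = meshgrid(-5:5, -5:5);
w = exp(-(wx.^2 + wy.^2) / (2 * 1.5^2));
w = w / sum(w(:));

% Local weighted statistics around each pixel.
local = @(a) conv2(a, w, 'valid');
ux  = local(x);
uy  = local(y);
sx2 = local(x.^2) - ux.^2;       % local variance of x (sx^2)
sy2 = local(y.^2) - uy.^2;       % local variance of y (sy^2)
sxy = local(x.*y) - ux.*uy;      % local covariance (sxy)

% Simplified SSIM (a = b = g = 1, C3 = C2/2) at each window position.
ssim_map = ((2*ux.*uy + C1) .* (2*sxy + C2)) ./ ...
           ((ux.^2 + uy.^2 + C1) .* (sx2 + sy2 + C2));

% MSSIM: mean over all M = numel(ssim_map) windows.
mssim = mean(ssim_map(:));
fprintf('MSSIM = %.3f\n', mssim);
```

Because of the 'valid' option, the SSIM map is 10 pixels smaller than the image in each dimension. Replacing y with x should give an MSSIM of 1.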

Test Results

Using the example MATLAB implementation referenced in the paper, I compared MSSIM with mean-squared error (MSE) for a few images. The following figure shows the test images I used. Also, there is a high-resolution version (540kB).

Test Images

From left to right, starting across the top row, these images are:

  1. the original version
  2. JPEG-compressed
  3. blurred
  4. with added Gaussian white noise
  5. mean-shifted
  6. contrast-stretched

All of these versions were created to give an equal mean-squared error (MSE) of 60, which clearly demonstrates that MSE does not correlate with perceived quality. It is clear that the image quality of 2 and 3 is much worse than that of the others. Let’s see if MSSIM works better.

Table 1: Comparing Image Quality Measures
Image #     MSE     MSSIM
  1          0      1.000
  2         60      0.817
  3         60      0.881
  4         60      0.638
  5         60      0.998
  6         60      0.998

Structural similarity accurately predicts the high quality of images 5 and 6, the mean-shifted and contrast-stretched images.

It is interesting to discuss the results from image 4, the one with gaussian white noise added. MSSIM is the lowest for this image, contradicting my expectation that image 4 has a perceptual image quality somewhere between the worst images (2 and 3) and the best images (5 and 6). I wonder why this result didn’t match my expectations …

Anyway, I hope you enjoyed this summary. Please send me suggestions and/or comments.


Tuesday, February 22nd, 2005

Image Quality Assessment

I’m currently taking a course on digital video processing given by Prof. Thrasyvoulos Pappas, my advisor in the Image and Video Processing Laboratory (IVPL) at Northwestern.

For the course project, I’m studying objective image quality metrics, or the computation of a number that corresponds to the perceived quality of an image.

One image quality metric that is often used when comparing a reference and degraded image is the mean squared error (MSE), computed by simply averaging the squared differences between the reference and degraded image. For example, the degraded image could be a highly compressed version of the reference. While MSE is simple to understand and easy to compute, it does not achieve a good correspondence with perceived image quality.
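One way to see the weakness of MSE is that very different kinds of distortion can be scaled to produce exactly the same value. The sketch below is an illustration under my own assumptions: the reference is a synthetic gradient, the target MSE of 60 is an arbitrary choice, and clipping and rounding to 8 bits are ignored. It builds a uniformly brightened image and a noisy image with identical MSE.

```matlab
% Hypothetical reference image: a synthetic gradient with values up to 200.
N = 64;
[xx, yy] = meshgrid(1:N, 1:N);
ref = 200 * (xx + yy) / (2*N);

target_mse = 60;
mse = @(a, b) mean((a(:) - b(:)).^2);

% Distortion 1: shift every pixel by the same amount.
% A constant offset d gives MSE = d^2 exactly.
shifted = ref + sqrt(target_mse);

% Distortion 2: additive noise, rescaled so its mean square hits the target.
noise = randn(N);
noise = noise * sqrt(target_mse / mean(noise(:).^2));
noisy = ref + noise;

mse_shifted = mse(ref, shifted);
mse_noisy   = mse(ref, noisy);
fprintf('MSE: shifted %.2f, noisy %.2f\n', mse_shifted, mse_noisy);
```

The shifted image is just slightly brighter, while the noisy one is visibly grainy, yet MSE assigns them the same score.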

Some interesting image quality methods have been proposed and tested recently. Junqing Chen from the IVPL evaluates metrics used when optimizing image compression, comparing MSE with subband, wavelet, and DCT-based metrics (see the SPIE paper).

Also, some very interesting work has come from Eero Simoncelli’s Laboratory for Computational Vision (LCV) at New York University. Zhou Wang’s work on his Structural SIMilarity (SSIM) index is the best approach I’ve found so far for quantitative evaluation of image quality for many different applications.

In upcoming blog entries, I hope to summarize and review some of the most interesting and influential papers that deal with image quality. I’ll start with Zhou Wang’s “Image Quality Assessment: From Error Visibility to Structural Similarity”. Stay tuned ….


Tuesday, August 3rd, 2004

Captcha

I heard about these a while ago, but forgot the name. A Captcha is a test used to tell computers and humans apart. See the Wikipedia definition and the Carnegie Mellon project.


Monday, May 31st, 2004

Face Detection

A prerequisite for face recognition is face detection. The Robotics Institute at CMU has a great demo with my submission.


Sunday, May 30th, 2004

Face Recognition

I’ve been working on face recognition for my ECE 432 computer vision class at NWU. Here are some images I’ve been working with:

a montage of the ALAN database

They come from some photos I took in class one day and from my dad’s student websites.
