Lens Distortion under water


Published on Thursday, July 11, 1996 by Gideon Ariel

NASA Technical Memorandum 104795

Evaluation of Lens Distortion Errors Using
An Underwater Camera System
For Video-Based Motion Analysis

Jeffrey Poliner
Lockheed Engineering & Sciences Company
Houston, Texas

Lauren Fletcher & Glenn K. Klute
Lyndon B. Johnson Space Center
Houston, Texas


Video-based motion analysis systems are widely employed to study human movement, using computers to capture, process, and analyze video data. This video data can be collected in any environment where cameras can be located.

The Anthropometry and Biomechanics Laboratory (ABL) at the Johnson Space Center is responsible for the collection and quantitative evaluation of human performance data for the National Aeronautics and Space Administration (NASA). One of the NASA facilities where human performance research is conducted is the Weightless Environment Training Facility (WETF). In this underwater facility, suited or unsuited crew members or subjects can be made neutrally buoyant by adding weights or buoyant foam at various locations on their bodies. Because it is underwater, the WETF poses unique problems for collecting video data. Primarily, cameras must be either waterproof or encased in a waterproof housing.

The video system currently used by the ABL is manufactured by Underwater Video Vault. This system consists of closed circuit video cameras (Panasonic WV-BL202) enclosed in a cylindrical case with a Plexiglas dome covering the lens. The dome, which is used to counter the magnifying effect of the water, is hypothesized to introduce distortion errors.

As with any data acquisition system, it is important for users to determine the accuracy and reliability of the system. Motion analysis systems have many possible sources of error inherent in the hardware, such as the resolution of recording, viewing and digitizing equipment, and lens distortion in a video-based motion analysis system. It is, therefore, of interest to determine the degree of this error in various regions of the lens. A previous study (Poliner, et al., 1993) developed a methodology for evaluating errors introduced by lens distortion. In that study, it was seen that errors near the center of the video image were relatively small and the error magnitude increased with the radial distance from the center. Both wide angle and standard lenses introduced some degree of barrel distortion (fig. 1).

Since the ABL conducts underwater experiments that involve evaluating crew members’ movements to understand and quantify the way they will perform in space, it is of interest to apply this methodology to the cameras used to record underwater activities. In addition to distortions from the lens itself, there will be additional distortions caused by the refractive properties of the interfaces between the water and camera lens.

This project evaluates the error caused by the lens distortion of the cameras used by the ABL in the WETF.


Data Collection

A grid was constructed from a sheet of 0.32 cm (0.125 in) Plexiglas. Thin black lines spaced 3.8 cm (1.5 in) apart were drawn vertically and horizontally on one side of the sheet. Both sides of the sheet were then painted with a WETF-approved white placite to give color contrast to the lines. The total grid size was 99.1 x 68.6 cm (39.0 x 27.0 in). The intersections of the 19 horizontal and 27 vertical lines defined a total of 513 points (fig. 2). The center point of the grid was marked for easy reference. Using Velcro, the grid was attached to a wooden frame, which was then attached to a stand and placed on the floor of the WETF pool.
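The grid geometry described above can be checked with a short sketch. It assumes the origin sits at the marked center point and uses the exact 1.5 in spacing (3.81 cm); the names `grid_points`, `SPACING_CM`, `N_VERTICAL`, and `N_HORIZONTAL` are illustrative and do not come from the study.

```python
# Sketch: known coordinates of the grid intersections, with the origin at the
# grid's center point. The 3.81 cm spacing is the 1.5 in value from the text.
SPACING_CM = 1.5 * 2.54   # 3.81 cm between adjacent lines
N_VERTICAL = 27           # vertical lines -> columns of points
N_HORIZONTAL = 19         # horizontal lines -> rows of points


def grid_points(spacing=SPACING_CM, n_cols=N_VERTICAL, n_rows=N_HORIZONTAL):
    """Return (x, y) coordinates in cm for every line intersection."""
    half_cols = (n_cols - 1) / 2
    half_rows = (n_rows - 1) / 2
    return [
        ((col - half_cols) * spacing, (row - half_rows) * spacing)
        for row in range(n_rows)
        for col in range(n_cols)
    ]


points = grid_points()
width = max(x for x, _ in points) - min(x for x, _ in points)
height = max(y for _, y in points) - min(y for _, y in points)
print(len(points), round(width, 1), round(height, 1))  # 513 99.1 68.6
```

The point count and overall dimensions agree with the 513 points and the 99.1 x 68.6 cm grid size reported in the text.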

At the heart of the Video Vault system was a Panasonic model WV-BL202 closed circuit video camera. The camera had been focused above water, according to the procedures described in and, and placed on the WETF floor facing the grid. Divers used voice cues from the test director for fine alignment of the camera with the center of the grid. By viewing the video on pool side monitors, the camera was positioned so that a predetermined region of the grid nearly filled the field of view. The distance from the camera to the grid was adjusted several times, ranging from 65.3 to 72.6 cm (25.7 to 28.6 in). Data collection consisted of videotaping the grid for at least 30 seconds in each of the positions, with each position considered a separate trial. Descriptions of the arrangements of the four trials are given in Table 1.

Distance refers to the distance from the camera to the grid. Image size was calculated by estimating the total number of grid units from the video. The distance from the outermost visible grid lines to the edge of the image was estimated to the nearest one-tenth of a grid unit. The distance and image size values are all in centimeters.

Data Analysis

An Ariel Performance Analysis System (APAS) was used to process the video data. Recorded images of the grid were played back on a VCR. A personal computer was used to grab and store the images on disk. For each trial, several frames were chosen from the recording and saved, as per APAS requirements. From these, analyses were performed on a single frame for each trial.

Because of the large number of points (up to 357) being digitized in each trial, the grid was subdivided into separate regions for digitizing and analysis. Each row was defined as a region and digitized separately.

An experienced operator digitized all points in the grid for each of the trials. Here digitizing refers to the process of the operator identifying the location of points of interest in the image with the use of a mouse-driven cursor. Often digitizing is used to refer to the process of grabbing an image from video format and saving it in digital format on the computer. Digitizing and subsequent processing resulted in X and Y coordinates for the points.

Part of the digitizing process involved identifying points of known coordinates as control (calibration) points. Digitization of these allows for calculation of the transformation relations from image space to actual coordinates. In this study, the four points diagonal from the center of the grid were used as the control points (points marked “X” in fig. 2). These were chosen because it was anticipated that errors would be smallest near the center of the image. Using control points which were in the distorted region of the image would have further complicated the results. The control points were digitized and their known coordinates were used to determine the scaling from screen units to actual coordinates.
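The text does not say how APAS computes the transformation from screen units to actual coordinates. The following is only a minimal sketch of one plausible approach: an independent least-squares scale and offset for each axis, with no rotation. The function names and the screen (pixel) positions are hypothetical. The known coordinates assume the control points lie one grid unit (3.81 cm) diagonally from the center.

```python
def fit_axis(screen, actual):
    """Least-squares fit of actual = scale * screen + offset for one axis."""
    n = len(screen)
    mean_s = sum(screen) / n
    mean_a = sum(actual) / n
    var_s = sum((s - mean_s) ** 2 for s in screen)
    cov = sum((s - mean_s) * (a - mean_a) for s, a in zip(screen, actual))
    scale = cov / var_s
    return scale, mean_a - scale * mean_s


def make_transform(control_screen, control_actual):
    """Build a screen -> actual mapping from digitized control points."""
    sx, ox = fit_axis([p[0] for p in control_screen], [p[0] for p in control_actual])
    sy, oy = fit_axis([p[1] for p in control_screen], [p[1] for p in control_actual])
    return lambda u, v: (sx * u + ox, sy * v + oy)


# Hypothetical screen (pixel) positions of the four control points; screen v
# increases downward, so the fitted Y scale comes out negative.
control_screen = [(300.0, 220.0), (340.0, 220.0), (300.0, 260.0), (340.0, 260.0)]
control_actual = [(-3.81, 3.81), (3.81, 3.81), (-3.81, -3.81), (3.81, -3.81)]

to_actual = make_transform(control_screen, control_actual)
print(to_actual(320.0, 240.0))  # center of the control square, close to (0, 0)
```

Because the scale is fitted only from points near the center, any distortion farther out appears directly as a difference between the calculated and known coordinates, which is the quantity the study measures.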

For trial 1, the coordinates ranged from 0 to approximately ±38.1 cm in the X direction and 0 to approximately ±30.48 cm in the Y direction. For trials 2 and 3, the ranges were 0 to ±34.29 cm in X and 0 to ±26.67 cm in Y. For trial 4, the range was 0 to ±34.29 cm in X and from -22.86 to +26.67 cm in Y. To remove the dependence of the data on the size of the grid, normalized coordinates were calculated by dividing the calculated X and Y coordinates by half the total image size in the X and Y directions, respectively. Table 1 lists these sizes for the four trials. Thus, normalized coordinates in both the X and Y directions were dimensionless and ranged approximately from -1 to +1 for all four trials.
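As a small illustration of the normalization step, the sketch below divides each coordinate by half the image size in its own direction. The trial 1 half-width of 40.2 cm is the value quoted later in the text. The image height is a made-up placeholder, because the Table 1 values are not reproduced here.

```python
def normalize(x_cm, y_cm, image_width_cm, image_height_cm):
    """Divide each coordinate by half the image size in its own direction."""
    return x_cm / (image_width_cm / 2), y_cm / (image_height_cm / 2)


# Trial 1: the half-width of 40.2 cm is quoted later in the text; the image
# height of 64.0 cm is a made-up placeholder, since Table 1 is not shown here.
TRIAL1_WIDTH_CM = 2 * 40.2
TRIAL1_HEIGHT_CM = 64.0  # hypothetical

nx, ny = normalize(38.1, -30.48, TRIAL1_WIDTH_CM, TRIAL1_HEIGHT_CM)
print(round(nx, 3), round(ny, 3))  # both magnitudes fall just inside 1
```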

For all trials, the error for each digitized point was calculated as the distance from the known coordinates of the point to the calculated coordinates.


Raw data from the four trials are presented in figure 3. Shown are graphs of the calculated normalized coordinates of points. Grid lines on the graphs do not necessarily correspond to the edges of the images.

For each trial, the error of each point was calculated as the distance between the calculated location (un-normalized) and the known location of that point. These error values were then normalized by calculating them as a percent of half the image size in the horizontal direction (trial 1, 40.2 cm; trial 2, 36.8 cm; trials 3 and 4, 36.2 cm). This dimension was chosen arbitrarily to be representative of the size of the image.
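The error measure just described amounts to a Euclidean distance scaled by the trial's half-width. A minimal sketch follows. The half-width values come from the text, while the function name and the sample digitized point are hypothetical.

```python
import math

# Half the horizontal image size for each trial, in cm (values from the text).
HALF_WIDTH_CM = {1: 40.2, 2: 36.8, 3: 36.2, 4: 36.2}


def error_percent(known, calculated, trial):
    """Distance between known and calculated points, as % of half image width."""
    distance_cm = math.hypot(calculated[0] - known[0], calculated[1] - known[1])
    return 100.0 * distance_cm / HALF_WIDTH_CM[trial]


# Hypothetical digitized result for the point known to lie at (34.29, 26.67) cm.
print(round(error_percent((34.29, 26.67), (33.49, 26.07), trial=1), 2))  # 2.49
```

In this made-up case, a 1 cm displacement in trial 1 corresponds to roughly 2.5% normalized error.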

Figure 4 presents contour plots of the normalized error as a function of the normalized X-Y location in the image for each of the trials. This type of graph, commonly used in land elevation maps, displays information three dimensionally. The coordinate axes represent two of the dimensions. Here, these were the X and Y coordinates of the points. The third dimension represents the value of interest as a function of the first two dimensions, in this case, the error as a function of the X and Y location. Curves were created by connecting points of identical value.

Interpreting these graphs is similar to interpreting a land map; peaks and valleys are displayed as closed contour lines. Once again, it was clear that errors were small near the center of the image and became progressively greater further away from the center.

The unevenness evident in some of these graphs can be partly attributed to splitting the image into separate regions for the purpose of digitizing. The control points were redigitized for each individual section. Since the control points were close to the center of the image, a small error in their digitization would be magnified for points further away from the center.

Another quantitative way of viewing this data was to examine how the error varied as a function of the radial distance from the center of the image. This distance was normalized by dividing by half the image size in the horizontal direction (trial 1, 40.2 cm; trial 2, 36.8 cm; trials 3 and 4, 36.2 cm). Figure 5 presents these data for each of the four trials.
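The normalized radial distance can be sketched in the same way. This assumes the coordinate origin coincides with the center of the image, and the function name is illustrative.

```python
import math


def normalized_radius(x_cm, y_cm, half_width_cm):
    """Radial distance from the image center, divided by half the image width."""
    return math.hypot(x_cm, y_cm) / half_width_cm


# Corner-most point of trial 1 (38.1, 30.48) cm with the 40.2 cm half-width.
r = normalized_radius(38.1, 30.48, 40.2)
print(round(r, 2))  # 1.21
```

Because the normalization uses the horizontal half-width only, points near the corners of the image can have R greater than 1.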

Linear and binomial (second-order polynomial) regressions were then fit to the data for each trial and for all data combined. The linear fit was of the form

Error = A0 + A1 R

where R was the radial distance from the center of the image (normalized), and A0 and A1 were the coefficients of the least-squares fit. The binomial fit was of the form

Error = B0 + B1 R + B2 R^2

where B0, B1, and B2 were the coefficients of the fit. The results of these least-squares fits are presented in table 2. The columns labeled "RC" are the squares of the statistical regression coefficients (r-square).
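For readers who want to reproduce this kind of fit, here is a minimal pure-Python sketch of first- and second-order least-squares fits with an r-square value. It solves the normal equations directly, which is an assumption about method, since the study does not say which software performed the regressions. The sample (R, error) pairs are invented for illustration and are not data from the study.

```python
def polyfit(r_values, errors, degree):
    """Least-squares polynomial fit; returns coefficients [c0, c1, ...]."""
    n = degree + 1
    # Normal equations: sum(R^(i+j)) * c_j = sum(R^i * error)
    a = [[sum(r ** (i + j) for r in r_values) for j in range(n)] for i in range(n)]
    b = [sum((r ** i) * e for r, e in zip(r_values, errors)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / a[i][i]
    return coeffs


def r_squared(r_values, errors, coeffs):
    """Coefficient of determination for a fitted polynomial."""
    mean_e = sum(errors) / len(errors)
    predicted = [sum(c * r ** k for k, c in enumerate(coeffs)) for r in r_values]
    ss_res = sum((e - p) ** 2 for e, p in zip(errors, predicted))
    ss_tot = sum((e - mean_e) ** 2 for e in errors)
    return 1.0 - ss_res / ss_tot


# Made-up (R, error %) pairs for illustration only; not data from the study.
R = [0.0, 0.25, 0.5, 0.75, 1.0, 1.2]
E = [0.1, 0.3, 0.9, 2.0, 3.6, 5.2]

a0, a1 = polyfit(R, E, 1)        # Error = A0 + A1 R
b0, b1, b2 = polyfit(R, E, 2)    # Error = B0 + B1 R + B2 R^2
print(r_squared(R, E, [a0, a1]), r_squared(R, E, [b0, b1, b2]))
```

Since the linear model is a special case of the second-order one, the second-order r-square can never be lower on the same data, so a higher value alone does not prove the underlying relation is quadratic.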



When reviewing these results, several points should be noted. First, this study utilized a two-dimensional analysis algorithm. A limitation of the study was that exactly four calibration points were required to define the scaling from screen coordinates to actual coordinates. The use of more than four points would likely result in less variability. Second, all coordinates and calculated errors were normalized to dimensions of the image. Although there were many possibilities for the choice of dimension (e.g., horizontal, vertical or diagonal image size; maximum horizontal, vertical, or diagonal coordinate; average of horizontal and vertical image size or maximum coordinate; etc.), the dimensions used to normalize were assumed to best represent the image size.

It is clear from these data that a systematic error caused by lens distortion occurred when using the underwater video system. Lens distortion errors were less than 1% from the center of the image up to radial distances equivalent to 25% of the horizontal image length (normalized R equal to 0.5). Errors were less than 5% for normalized R up to 1, an area covering most of the image.

There seemed to be some degree of random noise. This was evident in the scatter pattern seen in the graphs in figure 5. This error can most likely be attributed to the process of digitizing. There are factors which limit the ability to correctly digitize the location of a point, such as: if the point is more than one pixel in either or both dimensions, irregularly shaped points, a blurred image, shadows, etc. Because of these factors, positioning the cursor when digitizing was often a subjective decision.

Four trials were analyzed in this study. Although all the data were normalized, there were slight differences among the four trials (fig. 5 and table 2). These can most likely be attributed to the uncertainty in determining the grid size, which was estimated from the fraction of a grid unit from the outermost visible grid lines to the edge of the images.

Two types of regressions were fit to the data: linear and binomial. The interpretation of the coefficients of the linear regression can provide insight into the data. A1, the slope of the error-distance relation, represents the sensitivity of the error to the distance from the origin. Thus, it is a measure of the lens distortion. A0, the intercept of the linear relation, can be interpreted as the error at a distance of zero. If the relation being modeled were truly linear, this would be related to the random error not accounted for by lens distortion. However, in this case, it is not certain if the error-distance relation was linear. The RC values gave an indication of how good the fit was. The binomial curve fit seemed to more correctly represent the data. The interpretation of these coefficients, however, is not as straightforward.


This study has taken a look at one of the sources of error in video-based motion analysis using an underwater video system. It was demonstrated that errors from lens distortion could be as high as 5%. By avoiding the outermost regions of the lens, the errors can be kept to less than 0.5%.


Poliner, J., Wilmington, R.P., Klute, G.K., and Micocci, A. Evaluation of Lens Distortion for Video-Based Motion Analysis, NASA Technical Paper 3266, May 1993.
