Augmented Reality X-Ray Vision with Gesture Interaction

Augmented reality (AR) is an emerging technology that offers possibilities which are difficult for other technologies to match, and it is changing the way individuals view the world. Augmented reality X-Ray vision is one such emerging concept. While AR deals with virtual and real objects coexisting in the same space, AR X-Ray vision is a subdivision of this broad spectrum that provides a "see-through" view of real-world objects. In this paper, we thoroughly analyse the existing methodologies for AR X-Ray vision and propose a convenient method that is easy to implement. We create an X-Ray vision using the anaglyph technique and integrate it with the Leap Motion Controller, enabling gesture interaction to move the window through which the point of interest is viewed. The limitations of the suggested methodology are also discussed. The system lets the user perceive depth between two regions with just anaglyph glasses, without any head-mounted display. The approach can be extended to the medical field, where X-Ray vision is of increasing importance for viewing the layers of a patient's skin and bone, giving doctors and surgeons an approximate depth perception.


Introduction
The term "augmented reality" was coined in 1992 to refer to overlaying computer-presented material on top of the real world. Since early 2004, augmented reality [9] has seen the evolution of many promising computer vision algorithms which, together with powerful 3D render engines and other interactive tools [12], allow users to experience this augmented world in a better way. A few years ago, it would have been fair to say that augmented reality was the next big thing; now it is becoming as common as the mobile phone. The capabilities of new devices in recent years have contributed to the spread of augmented reality, not only in scientific research but also for development and commercial purposes. Nearly every other technology has integrated, or is trying to integrate, itself with AR. One interesting extension of the AR system is the idea of "X-Ray vision": the ability to "see through" one surface (the occluding region) to view another (the occluded region) [11].
Two visualization techniques are commonly seen: the edge overlay and the tunnel cutout [1]. The edge overlay visualization provides depth information that makes hidden objects appear to be behind walls rather than floating in front of them. It gives users additional depth cues and spatial awareness by applying edge detection to the occluding region and overlaying the detected edges on the occluded region. This translucent edge overlay conveys a sense of depth rather than making the occluded object appear to float on top. Its limitations are that if the background is too cluttered, too many edges are drawn and the remote scene can be difficult to see, while if the background is too plain or lacks contrast, few edges are drawn and little sense of depth is achieved. The tunnel cutout visualization displays a representation of the obstructing layers that may exist between the user and the remote scene by creating a tunnel-like effect. Its limitation is that if there are too many occluding regions between the user and the occluded objects, the tunnel may extend too far and much of the screen space is wasted. In [2], the authors use other salient features such as hue, luminosity and motion in addition to edges. This provides a richer context for the occluded object without impeding the user's ability to select objects in the occluded area, although the results indicate that the method demands a higher degree of adaptation from the user. Terms such as occlusion, binocular disparity, stipple, virtual hole and motion parallax are described in [3], which also discusses depth perception protocols such as verbal estimation, forced choice and perceptual matching. Anaglyph 3D reconstruction from a single image using depth maps is discussed in [4]. In this paper, that technique is extended from still images to video: the reconstruction is performed for every frame of the captured video.
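The edge-overlay idea described above can be sketched as follows. This is not the cited authors' implementation: it uses a crude gradient-magnitude edge map as a stand-in for a proper edge detector (such as Canny), and works on single-channel frames for brevity.

```python
import numpy as np

def simple_edges(gray: np.ndarray, thresh: int = 40) -> np.ndarray:
    """Crude gradient-magnitude edge map (stand-in for a real edge detector)."""
    gx = np.abs(np.diff(gray.astype(np.int32), axis=1, prepend=0))
    gy = np.abs(np.diff(gray.astype(np.int32), axis=0, prepend=0))
    return ((gx + gy) > thresh).astype(np.uint8) * 255

def edge_overlay(occluding_gray: np.ndarray, occluded_gray: np.ndarray,
                 alpha: float = 0.6) -> np.ndarray:
    """Blend edges of the occluding frame translucently onto the occluded
    frame, so the hidden scene appears to lie behind the occluding surface."""
    edges = simple_edges(occluding_gray)
    blended = occluded_gray.astype(np.float32) + alpha * edges
    return np.clip(blended, 0, 255).astype(np.uint8)

# toy frames: a bright rectangle as the "wall", a flat grey hidden scene
occluding = np.zeros((60, 80), dtype=np.uint8)
occluding[20:40, 20:60] = 255
occluded = np.full((60, 80), 80, dtype=np.uint8)
out = edge_overlay(occluding, occluded)
```

A cluttered occluding frame would produce many edges and obscure the remote scene, which is exactly the limitation noted above.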
Recent trends and advances are covered in [5,16], including head-mounted and see-through displays, projection displays, visualization and perceptual problems. How head-mounted displays are used to achieve Superman-like X-Ray vision is covered in [10,15,17], and a survey of AR and the importance of medical AR can be found in [10,14]. An evaluation of the Leap Motion Controller and its suitability for static and dynamic tracking is given in [6,13]; it explains the precision and reliability of the device and why it was chosen to implement gesture interaction in this project.

Problem Presentation
Our view of the real world need not be limited by what the naked eye can see: the physical world can be enhanced with digital media simply by augmenting it. The potential of augmented reality X-Ray vision is considerable. In this paper, we create a 3D reconstruction of a remote scene from a captured video feed, providing X-Ray vision and depth perception [8] for each frame. Finally, gesture interaction is incorporated into this AR system so that the user enjoys a better visual experience.

Proposed Methodology
Our approach involves the following steps:

Initial Setup
Live video feeds of the occluding region and the occluded region are captured by placing cameras at appropriate distances in front of these regions. The system design is shown in Figure 1.

Merging the Video Feeds
Figure 2 shows how the two videos are merged: the larger frame shows the occluding region, and the smaller window, surrounded by a black border, shows the occluded region. The video feed of the occluding region is obtained and the size of the window through which the occluded region is viewed is chosen. The pixels of the occluding region enclosed by the window with corners (x, y), (x+w, y), (x+w, y+w) and (x, y+w) are set to white by initializing all RGB components to 255, where (x, y) is the top-left corner of the window and w is its width. A similar operation is performed on the video feed of the occluded region, except that here the pixels outside the window are set to white. A bitwise AND is then performed between the corresponding pixels of the two feeds for each frame; since ANDing any value with 255 leaves it unchanged, this yields the overlaid visualization shown in Figure 2. Depth perception is then added as follows.
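The whiten-and-AND merge above can be sketched for a single frame as follows. This is a minimal NumPy illustration, not the paper's actual code; a real system would apply it to every frame captured from the two cameras.

```python
import numpy as np

def merge_feeds(occluding: np.ndarray, occluded: np.ndarray,
                x: int, y: int, w: int) -> np.ndarray:
    """Whiten the window region in the occluding frame, whiten everything
    outside the window in the occluded frame, then AND the two frames.
    ANDing with 255 is the identity, so each frame 'shows through' where
    the other was whitened."""
    front = occluding.copy()
    front[y:y+w, x:x+w] = 255                      # window -> white
    back = np.full_like(occluded, 255)             # start all white
    back[y:y+w, x:x+w] = occluded[y:y+w, x:x+w]    # keep only the window
    return front & back                            # per-pixel bitwise AND

# toy frames: occluding region is mid-grey, occluded region is dark
front = np.full((100, 100, 3), 150, dtype=np.uint8)
back = np.full((100, 100, 3), 40, dtype=np.uint8)
out = merge_feeds(front, back, x=20, y=20, w=30)
```

Inside the window the merged frame shows the occluded feed; everywhere else it shows the occluding feed.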

Implementing Anaglyph
As discussed, the anaglyph methodology is applied to every frame of the video feed. Two copies of each frame are made. Figure 3 shows the red filter applied to one copy, obtained by keeping the 'R' component of each pixel and discarding the 'G' and 'B' components. Similarly, the blue filter (Figure 4) is applied to the other copy by keeping only the 'B' component of each pixel. These two channel-filtered frames are superimposed using the depth map [7] and the displacement map with corresponding offsets. The superimposed frame is shown in Figure 5.

Gesture Interaction
The user should be able to move the window and view the occluded region at any desired position. This is achieved by integrating the system with the Leap Motion Controller, which enables gesture interaction: the user can move the window to any location within the frame. Any gesture could be mapped to this action; a forward-pointed forefinger has been used in this case to move the window around.
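The per-frame anaglyph step can be sketched as follows. This is a simplified NumPy illustration, not the paper's implementation: it assumes OpenCV-style BGR channel order and uses a single uniform horizontal offset as a stand-in for the per-pixel displacement derived from the depth map.

```python
import numpy as np

def anaglyph(frame: np.ndarray, offset: int = 5) -> np.ndarray:
    """Red/blue anaglyph of one BGR frame: one copy keeps only the red
    channel, the other keeps only the blue channel shifted horizontally
    by `offset` pixels (uniform stand-in for depth-map displacement),
    and the two are superimposed into a single output frame."""
    h, w, _ = frame.shape
    out = np.zeros_like(frame)
    out[:, :, 2] = frame[:, :, 2]                 # red copy, unshifted
    out[:, offset:, 0] = frame[:, :w - offset, 0] # blue copy, shifted right
    return out                                    # green stays zero

# toy BGR frame: blue component 100, red component 200
frame = np.zeros((10, 20, 3), dtype=np.uint8)
frame[:, :, 0] = 100
frame[:, :, 2] = 200
out = anaglyph(frame, offset=5)
```

Viewed through red/blue anaglyph glasses, the horizontal disparity between the two channels is what produces the stereoscopic depth impression.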

Results and Analysis
Initially, the whole system ran on a single thread. This caused it to crash within seconds, since it processed every pixel of each frame of both video feeds. The glitch was avoided by running the system in a multi-threaded environment, with each task on a separate thread, which improved the efficiency and effectiveness of the AR X-Ray system. Adopting the anaglyph methodology proved to be an easy alternative to existing methods such as the edge overlay and the tunnel cutout, but it was not very effective at providing accurate depth perception of the occluded region: anaglyphs provide stereoscopic vision but do not give the user exact depth. However, the method is a fast and simple way to render video, considering the complexity of the system. The Leap Motion Controller proved advantageous because of its high precision and sensitivity, giving the user the feeling of looking through one region into another via a rectangular hole in the occluding region, while letting them move the window around with just a fingertip.
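The single-thread-to-multi-thread change described above amounts to a producer-consumer split. The sketch below is not the authors' code: it uses placeholder work (`f * 2`) in place of the real per-pixel merge and anaglyph steps, with a bounded queue decoupling capture from processing so neither thread stalls the other.

```python
import queue
import threading

def capture(frames, out_q):
    """Producer: stands in for a camera thread pushing captured frames."""
    for f in frames:
        out_q.put(f)
    out_q.put(None)                 # sentinel: no more frames

def process(in_q, results):
    """Consumer: stands in for the per-frame merge/anaglyph work."""
    while True:
        f = in_q.get()
        if f is None:
            break
        results.append(f * 2)       # placeholder for real pixel processing

q = queue.Queue(maxsize=8)          # bounded queue decouples the two tasks
results = []
t1 = threading.Thread(target=capture, args=(range(5), q))
t2 = threading.Thread(target=process, args=(q, results))
t1.start(); t2.start()
t1.join(); t2.join()
```

In the real system there would be one such producer per camera, with further worker threads for merging, anaglyph generation and Leap Motion polling.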

Conclusion
This paper explains the implementation of the anaglyph methodology for depth perception. Other, more accurate methods for augmented reality X-Ray vision exist, but they have mostly remained theoretical or are complex to implement. The method we suggest is simple and easy to implement, but it does not provide accurate depth perception; our analysis indicates that its qualitative aspects need improvement. Nevertheless, the system is not affected by external factors such as luminosity, as these factors have no effect on the depth-map calculation.
All it requires is a video feed of reasonable quality, as this directly determines the quality of the generated anaglyph. With further improvements, the method can be extended to the medical field, where it could be used to improve visual saliency [8] for doctors.