Almost 2 million people in the UK suffer from some form of sight loss and around 300,000 are registered blind or partially sighted. Of these, 180,000 say that they never go outside on their own and are completely dependent on other people.
A guide dog would give independence to these people; however, they cost around £45,000 from birth to retirement, are in short supply and are not a perfect solution. A guide dog would have a typical working life of only 6 years before it must retire. The robotic solution is cheaper, more effective and more readily available.
Costing around £1000 to produce in its prototype form, the robotic guide dog fully replaces a conventional guide dog. It can easily be expanded to include GPS navigation and could last for many years. All of the parts needed to make it are easily sourced and the design is mechanically simple to keep production costs down and availability high.
Existing solutions for robotic navigation include LiDAR, ultrasound and forward facing cameras. These all have downsides - ultrasound has very low resolution, whereas LiDAR is very expensive and they all give a head on view of the surroundings, which is very rarely ideal.
The ideal solution would be to have a live, 3D, aerial image of the area surrounding a robot. The 3 dimensionality of the image would show the heights of objects off the ground, or an arbitrary reference. This height is what an obstacle is defined as and the image can easily be thresholded and navigated very similar to a game.
Evidently, it is impossible to have a true aerial image and it would be impractical to have a camera system on a pole because the height it would need to be would impact the mobility of the robot and the pole would block out around half of the image.
By modelling an aerial image as a square based pyramid, where the top point is a camera and the bottom face is the image produced by the camera, it is possible to slice the pyramid parallel to the bottom face to create a square frustum. The bottom face of this shape represents the same image and by placing cameras at each of the four corners on the top face, the same image is received, once it is stitched together.
Mounting these four cameras on a mast fully emulates the impossible aerial image and can remove the mounting pole from the image by using a stitching method to get data from all four cameras. The robot is also always stationary relative to the mast, allowing it to be easily identified in the aerial image and saving processing time of the important regions.
Because multiple cameras are used instead of just one, the array forms four stereoscopic pairs and each pair can create a disparity map, calculating the distance to the ground. If this distance is greater than a predefined constant, the area is considered an obstacle.
This method is far more accurate than, for example, boundary detection, with which tarmac joins on roads, patterned floors and rugs would all be deemed impassable obstacles. Using the definition of an obstacle as an area of non-traversable height, false positives are eliminated.
The square based pyramid/square frustum model is very simple to understand and demonstrates the whole system fairly accurately; however, it is more accurate to model each point of the square frustum as a square based pyramid. This is very hard to draw and much easier to visualise. The structure resembles an upside down molar.
The diagram on the left shows the true crossover of the cameras and the diagram on the right shows the aerial view of the captured images.
The central region is the area closest to the robot which is captured by up to three pairs of cameras (not four due to the pole). This area is the most important area because it is the first area to come in contact with the moving robot. Using three pairs around this area hugely increases the accuracy of the measurements and anomalies and noise can be statistically removed without losing any data or resolution.
All other areas have at least one pair of cameras collecting data from them, except for the four rectangular regions along each side - these areas are only reached by one camera and no depth information is retrieved from here.
It is important to note on this diagram that the centres of the overlapping squares are the positions of the cameras, not the corners of the squares, so the frustum arrangement would be rotated 45 degrees from the expected position, were it to be drawn on.
For processing, the guide dog uses 4 Beagleboards networked together with a netbook. This allows a computer to control each camera and process parts of the image before sending over the network for stereoscopy. A 150A surge motor controller is driven by the netbook which locates the user and processes the disparity maps to determine safe areas. With the 3D processing, the robot can process at around 12 fps; and without it, the user is located at the speed of the cameras (30fps). All code is written in C++ using the OpenCV libraries.
A small coded marker is placed on the foot of the user and is tracked by the front left camera. An algorithm running on the netbook locates this marker within a region of interest. This region is the expected location of the user, which helps speed up the processing and decreases the likelihood of interference.
The region is split into a further three sections. The top section is the 'forward' section and the bottom section is the 'backward' section. When the marker is located within these regions the robot will move forwards or backwards, if it is safe to do so. The central section stops the movement of the robot and measures the rotation of the marker and corrects so it is in line with the marker. This method prevents feedback oscillations and allows the robot to walk 'in stride' with the user without being controlled - the robot just follows a normal walk.
To check whether it is safe to move forwards or backwards, several regions are defined in the depth maps. Because the "aerial" image is always stationary relative to the robot, these regions are the same in every frame. The forwards and backwards regions are strips across the front and the back of the robot. There are two on each side, one in front of the other, with the first being very close to the front of the robot.
A sample top region disparity map. Colour represents depth, or distance.
A sample bottom region disparity map. Colour represents depth, or distance.