Almost 2 million people in the UK suffer from some form of sight loss and around 300,000 are registered blind or partially sighted. Of these, 180,000 say that they never go outside on their own and are completely dependent on other people.

A guide dog would give these people independence; however, guide dogs cost around £45,000 from birth to retirement, are in short supply and are not a perfect solution: a typical working life is only 6 years before the dog must retire. The robotic solution is cheaper, more effective and more readily available.

Robotic Guide Dog photo

Costing around £1000 to produce in its prototype form, the robotic guide dog fully replaces a conventional guide dog. It can easily be expanded to include GPS navigation and could last for many years. All of the parts needed to make it are easily sourced, and the design is mechanically simple to keep production costs down and availability high.

Existing solutions for robotic navigation include LiDAR, ultrasound and forward-facing cameras. These all have downsides: ultrasound has very low resolution, LiDAR is very expensive, and all of them give a head-on view of the surroundings, which is very rarely ideal.

The ideal solution would be a live, 3D, aerial image of the area surrounding the robot. The third dimension of the image would give the height of every object above the ground, or above an arbitrary reference. An obstacle is defined by this height, so the image can easily be thresholded and navigated, much like a game map.
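
As a minimal sketch of that thresholding step (assuming a per-pixel height map above the reference plane has already been produced, and picking an arbitrary 5 cm step limit), an obstacle mask can be generated with a single OpenCV threshold:

    #include <opencv2/core.hpp>
    #include <opencv2/imgproc.hpp>

    // Illustrative helper: turn a per-pixel height map (CV_32F, metres above
    // the reference plane) into a binary obstacle mask. Anything taller than
    // maxStep is treated as non-traversable.
    cv::Mat obstacleMask(const cv::Mat& heightMap, float maxStep = 0.05f)
    {
        cv::Mat mask;
        // 255 where the height exceeds the step the robot can climb, 0 elsewhere.
        cv::threshold(heightMap, mask, maxStep, 255.0, cv::THRESH_BINARY);
        mask.convertTo(mask, CV_8U);
        return mask;
    }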

Evidently, a true aerial image is impossible, and a camera system on a tall pole would be impractical: the height required would impair the robot's mobility, and the pole itself would block out around half of the image.

Square based pyramid

An aerial image can be modelled as a square-based pyramid, where the apex is the camera and the base is the image it produces. Slicing the pyramid parallel to the base creates a square frustum. The base of this frustum represents the same image, and by placing a camera at each of the four corners of the top face, the same image is received once the four views are stitched together.

Square frustum

Mounting these four cameras on a mast fully emulates the impossible aerial image, and the mounting pole can be removed from the image by stitching together data from all four cameras. The robot is also always stationary relative to the mast, so it can be easily identified in the aerial image, which saves processing time in the important regions.
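
One way to build that synthetic aerial image is to warp each camera's view onto the ground plane with a homography and composite the four results. The sketch below is a minimal illustration, not the project's exact code; it assumes the ground-plane homographies groundH[i] have already been found by calibration (for example with cv::findHomography against known floor points):

    #include <opencv2/core.hpp>
    #include <opencv2/imgproc.hpp>
    #include <vector>

    // Sketch: composite four calibrated camera views into one top-down image.
    // Assumes groundH[i] maps camera i's pixels onto a common ground-plane
    // image of size aerialSize, and that all views share the same type.
    cv::Mat stitchAerial(const std::vector<cv::Mat>& views,
                         const std::vector<cv::Mat>& groundH,
                         cv::Size aerialSize)
    {
        cv::Mat aerial = cv::Mat::zeros(aerialSize, views[0].type());
        for (size_t i = 0; i < views.size(); ++i)
        {
            cv::Mat warped;
            cv::warpPerspective(views[i], warped, groundH[i], aerialSize);
            // Simple blend: keep the brightest pixel where views overlap, so
            // areas hidden by the mast in one view can be filled by another.
            cv::max(aerial, warped, aerial);
        }
        return aerial;
    }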

Because multiple cameras are used instead of just one, the array forms four stereoscopic pairs, and each pair can create a disparity map, from which the height of each point above the ground reference is calculated. If this height is greater than a predefined constant, the area is considered an obstacle.
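
One way this step could look in OpenCV is sketched below; it is an illustration rather than the project's exact code, and the StereoSGBM parameters, the expectedGround map and the tolerance are assumptions. It computes a disparity map for one rectified pair and flags pixels whose disparity deviates too far from that of the flat ground:

    #include <opencv2/core.hpp>
    #include <opencv2/calib3d.hpp>
    #include <opencv2/imgproc.hpp>

    // Sketch: disparity map for one rectified stereo pair, then a binary
    // obstacle mask. expectedGround (CV_32F, same size) is the disparity the
    // flat ground would produce; pixels deviating by more than 'tolerance'
    // are flagged as obstacles.
    cv::Mat obstaclesFromPair(const cv::Mat& leftGrey, const cv::Mat& rightGrey,
                              const cv::Mat& expectedGround, float tolerance)
    {
        auto sgbm = cv::StereoSGBM::create(/*minDisparity=*/0,
                                           /*numDisparities=*/64,
                                           /*blockSize=*/9);
        cv::Mat disp16, disp;
        sgbm->compute(leftGrey, rightGrey, disp16);   // fixed-point, scaled by 16
        disp16.convertTo(disp, CV_32F, 1.0 / 16.0);

        cv::Mat deviation, mask;
        cv::absdiff(disp, expectedGround, deviation);
        cv::threshold(deviation, mask, tolerance, 255.0, cv::THRESH_BINARY);
        mask.convertTo(mask, CV_8U);
        return mask;
    }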

Camera mast on the Robotic Guide Dog

This method is far more accurate than, for example, boundary detection, with which tarmac joins on roads, patterned floors and rugs would all be deemed impassable obstacles. By defining an obstacle as an area of non-traversable height, these false positives are eliminated.

The square-based pyramid/square frustum model is simple to understand and describes the whole system fairly accurately; however, it is more accurate to model each corner of the frustum's top face as the apex of its own square-based pyramid. This is hard to draw but easier to visualise: the structure resembles an upside-down molar.

Diagram of overlapping images
Bird's-eye view diagram of overlapping images

The diagram on the left shows the true crossover of the cameras' fields of view, and the diagram on the right shows the aerial view of the captured images.

The central region is the area closest to the robot, captured by up to three pairs of cameras (not four, because of the pole). It is the most important area because it is the first to come into contact with the moving robot. Using three pairs here hugely increases the accuracy of the measurements, and anomalies and noise can be statistically removed without losing any data or resolution.
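
One simple way to exploit this redundancy, shown below as an illustrative sketch rather than the project's exact code, is a per-pixel median of the three disparity estimates, which discards a single anomalous value without blurring it into the result:

    #include <opencv2/core.hpp>
    #include <algorithm>

    // Sketch: fuse three CV_32F disparity maps covering the central region by
    // taking the per-pixel median, so one noisy or anomalous estimate is dropped.
    cv::Mat fuseCentral(const cv::Mat& a, const cv::Mat& b, const cv::Mat& c)
    {
        cv::Mat fused(a.size(), CV_32F);
        for (int y = 0; y < a.rows; ++y)
            for (int x = 0; x < a.cols; ++x)
            {
                float v[3] = { a.at<float>(y, x),
                               b.at<float>(y, x),
                               c.at<float>(y, x) };
                std::sort(v, v + 3);
                fused.at<float>(y, x) = v[1];   // middle value = median of three
            }
        return fused;
    }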

All other areas have at least one pair of cameras collecting data from them, except for the four rectangular regions, one along each side: these areas are reached by only one camera, so no depth information is retrieved from them.

It is important to note that in this diagram the centres of the overlapping squares, not their corners, mark the positions of the cameras, so the frustum arrangement would be rotated 45 degrees from the expected position were it drawn on the diagram.

Mounted Beagleboards

For processing, the guide dog uses four Beagleboards networked together with a netbook. This gives each camera its own computer, which processes its part of the image before sending it over the network for stereoscopy. The netbook locates the user, processes the disparity maps to determine safe areas and drives a motor controller rated for 150 A surge current. With the 3D processing the robot runs at around 12 fps; without it, the user is located at the full camera frame rate (30 fps). All code is written in C++ using the OpenCV libraries.

A small coded marker is placed on the user's foot and is tracked by the front-left camera. An algorithm running on the netbook locates this marker within a region of interest covering the expected location of the user, which speeds up the processing and decreases the likelihood of interference.
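
The marker detector itself isn't detailed here, but as an illustration of searching only within the region of interest, a known marker template could be located with OpenCV's template matching (the function and its parameters below are hypothetical):

    #include <opencv2/core.hpp>
    #include <opencv2/imgproc.hpp>

    // Illustration only: find the best match for a marker template inside a
    // region of interest of the front-left camera frame. Returns the match
    // centre in full-frame coordinates, or (-1,-1) if the score is too low.
    cv::Point findMarker(const cv::Mat& frame, const cv::Mat& markerTemplate,
                         const cv::Rect& roi, double minScore = 0.7)
    {
        cv::Mat result;
        cv::matchTemplate(frame(roi), markerTemplate, result, cv::TM_CCOEFF_NORMED);

        double best; cv::Point bestLoc;
        cv::minMaxLoc(result, nullptr, &best, nullptr, &bestLoc);
        if (best < minScore)
            return cv::Point(-1, -1);

        return roi.tl() + bestLoc +
               cv::Point(markerTemplate.cols / 2, markerTemplate.rows / 2);
    }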

Regions for movement based on marker position

The region is split into a further three sections. The top section is the 'forward' section and the bottom section is the 'backward' section; when the marker is located in one of these, the robot moves forwards or backwards, if it is safe to do so. The central section stops the robot, measures the rotation of the marker and turns until the robot is in line with it. This method prevents feedback oscillations and allows the robot to walk 'in stride' with the user without being explicitly controlled - the robot simply follows a normal walk.
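
In code, the per-frame decision might look something like the sketch below; the section boundaries, command names and dead-band are assumptions, not the project's actual values:

    #include <cmath>

    // Sketch of the per-frame following logic. 'markerY' is the marker's
    // vertical position inside the tracking region, 'markerAngle' its measured
    // rotation, and forwardLimit/backwardLimit the assumed section boundaries.
    enum class Drive { Forward, Backward, Turn, Stop };

    Drive followUser(int markerY, double markerAngle,
                     int forwardLimit, int backwardLimit,
                     bool safeAhead, bool safeBehind, double* turnOut)
    {
        if (markerY < forwardLimit)                 // marker in the top section
            return safeAhead ? Drive::Forward : Drive::Stop;
        if (markerY > backwardLimit)                // marker in the bottom section
            return safeBehind ? Drive::Backward : Drive::Stop;

        // Central section: stop and rotate until the robot lines up with the marker.
        *turnOut = markerAngle;
        return std::abs(markerAngle) > 0.05 ? Drive::Turn : Drive::Stop;
    }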

To check whether it is safe to move forwards or backwards, several regions are defined in the depth maps. Because the "aerial" image is always stationary relative to the robot, these regions are the same in every frame. The forwards and backwards regions are strips across the front and the back of the robot. There are two at each end, one in front of the other, with the first lying very close to the robot.

Sample top region
A sample top region disparity map. Colour represents depth, or distance.

Sample bottom region
A sample bottom region disparity map. Colour represents depth, or distance.

The region closest to the robot is the 'current' zone and the region after that is the 'next' zone. The current zone is essentially the ground the robot is on now, and the next zone is the ground the robot will be on in the next frame. Because they lie on a disparity map, the average of each region is the average distance to the ground within it, and the two zones can be compared to see whether the difference between them is traversable.

In practice, these zones are split into smaller cells so that the robot can navigate around smaller obstacles and so that the averaging doesn't smooth them away. A similar method is used for the sides, for when the robot rotates.
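
A sketch of that comparison is given below; the cell layout, the use of mean disparity and the maxStep threshold are assumptions for illustration:

    #include <opencv2/core.hpp>
    #include <cmath>
    #include <vector>

    // Sketch: the strip ahead of (or behind) the robot is divided into small
    // cells, and the mean disparity of each 'next' cell is compared with the
    // matching 'current' cell. A step larger than maxStep marks it as unsafe.
    std::vector<bool> safeCells(const cv::Mat& disparity,
                                const std::vector<cv::Rect>& currentCells,
                                const std::vector<cv::Rect>& nextCells,
                                double maxStep)
    {
        std::vector<bool> safe(nextCells.size());
        for (size_t i = 0; i < nextCells.size(); ++i)
        {
            double now  = cv::mean(disparity(currentCells[i]))[0];
            double next = cv::mean(disparity(nextCells[i]))[0];
            safe[i] = std::abs(next - now) <= maxStep;  // traversable difference?
        }
        return safe;
    }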

Safe region segmentation

The robot follows the user unless they are about to move into an unsafe zone. If this happens, the robot attempts to move around the hazard; if that is impossible, it stops. Feedback to the user is given by pulling on a lead, which emulates a conventional guide dog and allows current guide dog users to use the robot without retraining.

Guiding a user through a potentially hazardous environment is a difficult task, but my frustum-based camera method greatly simplifies the problem, allowing areas to be navigated quickly and accurately. The guide dog works as a direct replacement for a conventional guide dog and has proved the method.

As this is only a prototype, several changes will need to be made before production; the most significant of these is the size of the robot. Currently it is based around an electric wheelchair frame, which was an ideal platform for testing because it was structurally sound and had plenty of space for modifications and easy rewiring. The actual components would fit into a much smaller space, and now that the method has been proved I am considering other platforms that would be more discreet and more usable.

I really hope that the production of this robot goes ahead, as it addresses a serious need and has the potential to improve the quality of life of many vulnerable people while saving money. I also hope that the method presented here proves useful for other applications requiring accurate obstacle avoidance and local navigation.