Last month I participated in xView2 (2019), a computer vision competition designed to foster the development of models for natural disaster damage assessment. In this post I'll briefly touch upon some of the new, mainly image-processing-related, things that I learnt from this challenge.
For details on how I went about the entire challenge, please see my Jupyter notebooks.
Goal of the challenge
A natural disaster event, such as a hurricane or an earthquake, can cause damage to buildings, which will appear different before and after the event in a satellite image. For example, here is a pre-disaster image at some location:
and here is the same location after the disaster event:
It is evident that some of the buildings and woodland have been damaged, and the water level has risen. Human experts can go through examples like this to assess the damage to each building, but if the disaster zone is very large, doing so is too time-consuming and makes any disaster relief effort inefficient. The goal of the xView2 challenge is therefore to build a model that can do this assessment automatically and quickly. The output of such a model would look like this:
The buildings in it have been coloured in, with a different colour for each damage level: blue for no-damage, cyan for minor-damage, orange for major-damage, and red for destroyed. With such information, a search-and-rescue crew, for example, would know which houses should be given priority.
Loading and saving masks
Perhaps the most intuitive way to indicate where the buildings are in an image is to have a mask over the image: set all pixels that lie over a building to 1 and the rest to 0. At various stages during the challenge, the masks needed to be saved to or loaded from disk. There are several ways to do this. Pillow offers `PIL.Image.Image.save` and `PIL.Image.open`; Matplotlib has `plt.imsave` and `plt.imread`. These all behave slightly differently: some multiply/divide your array by 255, and some automatically broadcast a 1-channel array to a 3-channel one. For a comparison of various combinations of these, see here for details. If you just want to work with masks as 1-channel arrays, the best way I found was to use the two `PIL` functions. Suppose `mask` is a 1-channel array of type `uint8`; then save it to a png file with

```python
PIL.Image.fromarray(mask).save('mask.png')
```

and load it with

```python
np.array(PIL.Image.open('mask.png'))
```

This ensures that you are always working with a 1-channel array of type `uint8`.
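To make this concrete, here is a minimal round-trip sketch; the toy `mask` array is made up for illustration:

```python
import numpy as np
import PIL.Image

# A toy 1-channel uint8 mask: a 4x4 image with a 2x2 'building' in one corner.
mask = np.zeros((4, 4), dtype=np.uint8)
mask[:2, :2] = 1

# Round-trip through a PNG file.
PIL.Image.fromarray(mask).save('mask.png')
loaded = np.array(PIL.Image.open('mask.png'))

assert loaded.dtype == np.uint8 and loaded.ndim == 2  # still a 1-channel uint8 array
assert (loaded == mask).all()
```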
The `_label_cls` attribute of `ImageList` in fastai v1
Locating buildings in an image is essentially a segmentation task, where 'x' are images and 'y' are masks over the images that indicate where the buildings are. fastai v1 already provides the `SegmentationItemList` and `SegmentationLabelList` classes for such 'x' and 'y', respectively. Upon inspection, the former uses `open_image` and the latter uses `open_mask` to load the data. In particular, the former's class attribute `_label_cls` is the latter, so by providing a function that maps 'x' to 'y', you can make a `SegmentationLabelList` directly out of a `SegmentationItemList` via its `label_from_func` method.1
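As an illustration, here is a minimal sketch in the style of the fastai v1 camvid example; the directory layout and the `get_mask_fn` mapping are assumptions for this example, not the actual xView2 layout:

```python
from fastai.vision import SegmentationItemList

# Assumed layout: images under 'data/images', masks under 'data/masks',
# with each image 'foo.png' labelled by a mask of the same name.
def get_mask_fn(image_path):
    return image_path.parent.parent / 'masks' / image_path.name

src = (SegmentationItemList.from_folder('data/images')
       .split_by_rand_pct(0.2)
       # label_from_func builds the 'y' SegmentationLabelList via _label_cls
       .label_from_func(get_mask_fn, classes=['background', 'building']))
```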
Polygons
Another way to represent buildings is to use polygons. A polygon in this context is a sequence of points which, if you join them up one after another with straight lines, form the boundary of a building as it appears in the image. The first point and the last point in this sequence are the same, since the boundary is closed. Each point is represented by its x and y coordinates in the image: \( (x, y) \). So a polygon in general looks like this:
$$
[(x_{1}, y_{1}), (x_{2}, y_{2}), \ldots, (x_{1}, y_{1})]
$$
The packages that I found helpful for handling polygons include shapely, imantics, and OpenCV. Each holds these coordinates in a slightly different type, and some even in a different order, so be sure to check these details if you plan to use them together. In the xView2 dataset, polygons are given as well-known text (WKT) representations. These are just strings of expressions like the above and can be loaded with `shapely.wkt.loads`.
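For example, here is a short sketch; the WKT string is made up for illustration:

```python
import shapely.wkt

# A made-up well-known-text string; xView2 labels contain strings of this form.
wkt_string = 'POLYGON ((10 10, 40 10, 40 30, 10 30, 10 10))'
polygon = shapely.wkt.loads(wkt_string)

print(list(polygon.exterior.coords))  # [(10.0, 10.0), (40.0, 10.0), ...]
print(polygon.bounds)                 # (minx, miny, maxx, maxy)
```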
Converting Mask to Polygons
For classifying the damage level of each building, images of individual buildings are cropped out of the post-disaster image. A ResNet then looks at the image of each building and decides how damaged it is. These crops are also referred to as chips or polygon images. Before cropping, we need to know the range of a building's x and y coordinates, and to obtain these the mask first needs to be converted into polygons. I found the imantics package convenient for this:
```python
ps = imantics.Mask(mask).polygons()
```
If `mask` is an array of type `uint8`, then passing it to `imantics.Mask` creates a mask object, whose `polygons` method returns the polygons, each of which represents a masked area, i.e. a building in the image. So `ps` is a list of polygons.
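Here is a minimal sketch of getting from a mask to crop coordinates; the toy mask is made up, and I use the `points` attribute, which is how imantics exposes each polygon's \( (x, y) \) vertices:

```python
import numpy as np
import imantics

# Toy uint8 mask with one rectangular 'building'.
mask = np.zeros((100, 100), dtype=np.uint8)
mask[20:40, 30:70] = 1

ps = imantics.Mask(mask).polygons()

# Each element of ps.points is an (N, 2) array of (x, y) vertices.
for points in ps.points:
    xs, ys = points[:, 0], points[:, 1]
    # The building's bounding box, which is what the cropping step needs.
    print(xs.min(), xs.max(), ys.min(), ys.max())
```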
Converting Polygons to Mask
After each building's damage level has been classified, the damage levels are indicated on the mask. Instead of having just 0 or 1, the mask now has:
- 0 = no building
- 1 = undamaged/unclassified
- 2 = minor damage
- 3 = major damage
- 4 = destroyed
For each building, its polygon and its damage category are used to fill in the mask at the corresponding pixels. This can be done using OpenCV's `cv2.fillPoly`.2
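A minimal sketch, assuming each building comes as an (N, 2) integer array of \( (x, y) \) vertices together with its predicted damage class; the example footprint and class are made up:

```python
import numpy as np
import cv2

mask = np.zeros((100, 100), dtype=np.uint8)

# Made-up building footprint and predicted damage class (3 = major damage).
polygon = np.array([[30, 20], [70, 20], [70, 40], [30, 40]], dtype=np.int32)
damage_class = 3

# cv2.fillPoly takes a list of polygons; fill the footprint with the class value.
cv2.fillPoly(mask, [polygon], color=damage_class)
```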