With the continuously rising digital era, data is driving the world. Even the photographs or videos you capture on your mobile device are abundant storehouses of valuable data that can be stored, used, and analyzed.
Data from digital images have several real-world applications ranging from medical uses to automation. This is where semantic image segmentation comes into play. A computerized vision task, semantic segmentation helps in labelling specific regions and sections of an image in accordance to what is being shown through the image.
What is image segmentation?
The images we see are simple collections of pixels. Image segmentation is the process when we classify each pixel of an image into a certain class and thus can be viewed as a pixel classification problem. There are two types of segmentation techniques.
Semantic segmentation is the process where we classify each pixel of an image that belongs to a particular label/ class. There is no difference between separate instances of the same object. For example, if the image contains two apples, semantic segmentation will give all pixels of both apples the same name.
Instance segmentation is different from semantic segmentation as in that it gives each instance of a particular object in the image a unique label/ class. For example, if there are three monkeys in a picture, it will assign every monkey a different label and color whereas in semantic segmentation every object is assigned the same label and color.
What is semantic image segmentation and how is it different from instance segmentation?
All digital images are made up of pixels. Pixels determine the colour and the quality of the image. An image made using a smaller number of pixels appears blurred, rough, or pixelated when magnified. Whereas, images with a higher concentration of pixels per unit area maintain good quality, even when it is magnified. Pixels have associated electronic signals and digital values and are also logical units. Semantic image segmentation can be referred to as a data management process that involves linking or segmenting each pixel in an image to a class label.
To understand the concept better, consider the example of an image depicting a brown cat and a black cat sitting next to a green ball. Semantic segmentation would create a mask over each cat and another mask over the green ball, and label them as ‘cat’ and as ‘ball’ respectively. Each pixel that forms the image of either subject is linked to the respective label. Each pixel that forms the brown cat is linked to the label ‘cat’, each pixel that forms the black cat is also linked to the label ‘cat’ whereas, each pixel that forms the green ball gets linked to the label associated with the green ball.
To create a contrast, Instance segmentation would create individual labels for each cat.
How does Semantic Segmentation work?
There are multiple approaches based on deep learning that can be used to achieve semantic segmentation. One of the most basic approaches to semantic segmentation is the encoder-decoder method. The encoder consists of a set of layers that extract features from an image using filters. Encoders are usually trained beforehand in tasks such as image classification whereby it learns correlations from multiple sets of images.
The data collected by the encoder is utilized during the image segmentation process. The segmentation mask, which is the final product, is generated by the decoder. The decoder also handles the function of ensuring the generated mask bears resemblance to the pixel resolution of the input image.
Other approaches such as fully convolutional network-based semantic segmentation are also possible. FCN is an approach that utilizes a special type of artificial neural network that is capable of providing a segmented image of the original image where the required elements alone are highlighted as needed. Fully convolutional networks are used when the task requires defining the shape and location of a required object.
What are some important applications of Semantic Segmentation?
Images generated by imaging technology such as CT or MRI can be analyzed and understood better if semantic segmentation is used. The use of fast computer vision algorithms increases the efficiency of diagnoses made by visual observation. Saving time for doctors and streamlining the treatment process.
Self Driving Cars
Autonomous cars require the capability of identifying objects in their path to determine how fast they are going to their destination. Not being able to identify living beings or hazards on the road could endanger lives.
Photo and video editing
Modern photo and video editing tools are capable of editing the different areas of the input file. This involves separating the foreground and background of images.
Virtual fitting or makeup
Semantic segmentation is the technology that allows virtual fitting of clothes or accessories and virtual application of makeup possible by focusing only on the clothing, accessories, or makeup.