Facebook has announced that it will be open sourcing several of the computer vision tools it has developed, including DeepMask, SharpMask, and MultiPathNet.
All three tools were developed by the Facebook Artificial Intelligence Research (FAIR) team, with the goal of teaching computers to intelligently break down images and recognize objects, locations, and people. According to Facebook, the company hopes that opening up its tools will help the AI research community as a whole move forward with new advancements in computer vision.
“We’re making the code for DeepMask+SharpMask as well as MultiPathNet — along with our research papers and demos related to them — open and accessible to all, with the hope that they’ll help rapidly advance the field of machine vision,” Piotr Dollar, a researcher with FAIR, wrote in a blog post. “As we continue improving these core technologies we’ll continue publishing our latest results and updating the open source tools we make available to the community.”
What do they do?
In his post, Dollar went into detail about how exactly Facebook’s tools work, beginning with DeepMask. According to Dollar, DeepMask breaks images down into a grid-like series of patches. The program then looks at each patch to see if it contains any objects, and if so, how many. This process allows DeepMask to determine how many objects an image contains and what their general shapes are, but it does not capture any details about those objects. That is where SharpMask comes in.
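The patch-by-patch scan described for DeepMask can be illustrated with a small sketch. This is not Facebook's code: the names `iter_patches`, `toy_objectness`, and `patches_with_objects` are hypothetical, and the score used here (the fraction of nonzero pixels) is only a stand-in for the learned network that DeepMask actually applies to each patch.

```python
from typing import Iterator, List, Tuple

Image = List[List[int]]  # rows of grayscale pixel values


def iter_patches(image: Image, size: int, stride: int) -> Iterator[Tuple[int, int, Image]]:
    """Yield (top, left, patch) for every full size x size window on a regular grid."""
    height = len(image)
    width = len(image[0]) if height else 0
    for top in range(0, height - size + 1, stride):
        for left in range(0, width - size + 1, stride):
            patch = [row[left:left + size] for row in image[top:top + size]]
            yield top, left, patch


def toy_objectness(patch: Image) -> float:
    """Hypothetical stand-in for a learned score: fraction of nonzero pixels."""
    pixels = [p for row in patch for p in row]
    return sum(1 for p in pixels if p > 0) / len(pixels)


def patches_with_objects(image: Image, size: int, stride: int,
                         threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Return the grid positions whose patch scores at or above the threshold."""
    return [
        (top, left)
        for top, left, patch in iter_patches(image, size, stride)
        if toy_objectness(patch) >= threshold
    ]


# A 4x4 image with one bright 2x2 blob in the top-left corner.
image = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
hits = patches_with_objects(image, size=2, stride=2)
```

In this toy image, only the top-left patch is flagged. A real system would replace the pixel count with a trained model and would also predict a rough mask for each flagged patch.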
True to its name, SharpMask takes the vague data from DeepMask and sharpens it, bringing objects into focus and perfecting their shapes. Dollar said that SharpMask analyzes images pixel by pixel based on information already gathered from DeepMask.
“To capture general object shape, you have to have a high-level understanding of what you are looking at (DeepMask), but to accurately place the boundaries you need to look back at lower-level features all the way down to the pixels (SharpMask),” Dollar said. “In essence, we aim to make use of information from all layers of a network, with minimal additional overhead.”
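The idea of combining a coarse, high-level mask with pixel-level detail can be shown with a deliberately simplified sketch. It assumes the coarse mask is a low-resolution grid of 0s and 1s and uses a plain brightness threshold as the "low-level feature"; SharpMask itself uses learned features from earlier network layers, so the functions `upsample_nearest` and `refine_mask` below are hypothetical illustrations, not its actual method.

```python
from typing import List

Grid = List[List[int]]


def upsample_nearest(mask: Grid, factor: int) -> Grid:
    """Enlarge a coarse mask by repeating each cell factor times in both directions."""
    return [
        [value for value in row for _ in range(factor)]
        for row in mask
        for _ in range(factor)
    ]


def refine_mask(coarse_mask: Grid, pixels: Grid, factor: int, threshold: int) -> Grid:
    """Keep a pixel only if the coarse mask covers it AND the pixel itself is bright.

    The brightness test is an assumed stand-in for learned low-level features.
    """
    upsampled = upsample_nearest(coarse_mask, factor)
    return [
        [1 if covered and value > threshold else 0
         for covered, value in zip(mask_row, pixel_row)]
        for mask_row, pixel_row in zip(upsampled, pixels)
    ]


# The coarse mask says "something is in the top-left quarter of the image".
coarse = [
    [1, 0],
    [0, 0],
]
pixels = [
    [9, 9, 0, 9],
    [9, 0, 0, 0],
    [0, 0, 9, 0],
    [0, 0, 0, 0],
]
sharp = refine_mask(coarse, pixels, factor=2, threshold=5)
```

Here the refined mask keeps only the three bright pixels inside the coarse region, while bright pixels outside that region are ignored. That mirrors the division of labor in the quote: the high-level pass decides roughly where the object is, and the pixel-level pass decides where its boundary falls.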
Of course, simply figuring out an object’s shape is only the first step in computer vision, and the real challenge is recognizing what that shape represents. If DeepMask and SharpMask are the eyes, then MultiPathNet is the brain. MultiPathNet uses a deep learning neural network to examine the shapes created by DeepMask and SharpMask and assign meaning to them.
For example, DeepMask might look at an image and determine that it contains six misshapen blobs of some sort. SharpMask would then come in and determine that those six blobs actually have legs, feet, ears, and so on. Finally, MultiPathNet would analyze those shapes and recognize that they are actually six sheep standing in a field.
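The three-stage flow in the sheep example can be sketched as a small pipeline. Everything here is an assumption made for illustration: `Detection`, `run_pipeline`, and the three stub stages are hypothetical names, and the stubs return canned values instead of running any real model.

```python
from dataclasses import dataclass
from typing import Callable, List

Mask = List[List[int]]
Image = List[List[int]]


@dataclass
class Detection:
    mask: Mask   # refined object outline
    label: str   # what the classifier thinks the object is


def run_pipeline(
    image: Image,
    propose: Callable[[Image], List[Mask]],   # DeepMask-like stage: coarse blobs
    refine: Callable[[Image, Mask], Mask],    # SharpMask-like stage: sharper outlines
    classify: Callable[[Image, Mask], str],   # MultiPathNet-like stage: labels
) -> List[Detection]:
    """Run the three stages in order: propose, refine, then classify each mask."""
    detections = []
    for coarse in propose(image):
        sharp = refine(image, coarse)
        detections.append(Detection(mask=sharp, label=classify(image, sharp)))
    return detections


# Stub stages that mimic the example: six blobs, all recognized as sheep.
def stub_propose(image: Image) -> List[Mask]:
    return [[[1]] for _ in range(6)]


def stub_refine(image: Image, coarse: Mask) -> Mask:
    return coarse


def stub_classify(image: Image, mask: Mask) -> str:
    return "sheep"


results = run_pipeline([[0]], stub_propose, stub_refine, stub_classify)
labels = [d.label for d in results]
```

Passing the stages in as functions keeps the sketch independent of any particular model, so each stub could be swapped for a real implementation without changing `run_pipeline`.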