Having almost unlimited resources in Big Data and AI matters. That is the conclusion Wired's journalists draw from a study on Big Data algorithms published by Google and carried out in partnership with Carnegie Mellon University in Pittsburgh, USA.
Both partners wanted to know how to improve the efficiency of computer vision algorithms, the well-known image recognition technology.
To do so, the researchers put forward two hypotheses: either the algorithms themselves need to be improved, or the amount of data processed during training needs to be increased considerably.
A resource-intensive hypothesis
Google therefore dedicated 50 very powerful, and therefore very expensive, graphics processors to the research for two whole months. Following the study's protocol, the researchers compared the efficiency of computer vision algorithms trained on 1 million images with that of algorithms trained on 300 million images. The algorithms had to perform tasks such as recognizing the objects in these images.
The algorithms trained on the larger sample did perform better. Yes, increasing the amount of training data improves the accuracy of computer vision algorithms… by three percent.
So why invest in the mass storage and powerful graphics processors these algorithms require? According to the authors of the scientific article, the investment can be justified because this result could be improved by optimizing the algorithms for large-scale data sets.
Crucial algorithms for tomorrow’s economic markets
However small the percentage increase in performance, it can be a crucial advantage with the advent of technologies such as autonomous cars, advanced video surveillance or medical imaging.
According to Wired's reporters, this shift would benefit companies that own and manage vast amounts of data: Google, Facebook, Microsoft, Amazon, Apple and others.
When GAFAMs protect their data, startups are even more ingenious
Although they often share their technologies and some of their data by releasing them as open source, the GAFAMs generally behave like a dragon guarding its treasure when it comes to datasets for artificial intelligence. And when one of them does release a dataset to help improve language algorithms, it is one that is not critical to competition.
Following the publication of the study's results, Google and Carnegie Mellon University are advocating the use of royalty-free images. Such images could be made available by Facebook and Microsoft through their respective foundations.
This financially and energy-intensive approach is not, however, the best way to advance machine learning.
Data ownership is a recurring problem for startups specializing in this field. Recognizing objects in an image is much easier than detecting a disease, which requires access to medical records. Young companies therefore design their algorithms to work with limited means, aiming to reduce the amount of data needed to make those algorithms operational.
The economic divide between the tech giants and everyone else is therefore not insurmountable. The ability to innovate also depends on imagination and careful thought.
Source: Wired