For a recent project involving something similar to training a classification algorithm, I needed to label a small dataset of images (between 100 and 500 labeled items was enough). The labeling is not simple and involves some amount of human judgment, so I needed several people to label the dataset independently in order to get reliable results.
After pondering the idea of building a small GUI or web app through which other people could help me label the dataset, I realised that I would have to either host a service or send the source code around. In the latter case, centralizing the results would be difficult.
Google Forms?
Google Forms is a simple and practical tool. In a matter of minutes, you can build a fairly complex form with multiple-choice questions, sliders, grid questions, etc. Google Forms also supports images, as well as videos via YouTube. Form responses can be exported to a Google Sheet, which can in turn be exported to CSV or .xlsx.
An attentive reader might object that this solution is very much tied to the Google ecosystem in all possible ways. And they would be right: I’d be thrilled to have access to an equivalent open source tool with that many features. Of course, alternatives to Google Forms exist, but the ones I know of lack either the ease of use, the form sharing capabilities, or the feature that I believe to be essential and that I will present in this post: scripting capabilities.
That’s why I eventually resorted to using Google Forms for my data labeling, falling into the trap of a comfy, easy-to-use ecosystem. In this post, I’ll walk through a minimal example that consists of creating a form for labeling a dataset of cars and planes.
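To give an idea of where this is heading, here is a rough sketch of how the answers of several annotators could be merged once the responses have been exported to CSV. It is only an illustration under assumptions of mine: I assume the export has a leading Timestamp column followed by one column per image question, that each question is titled with the image file name, and that a simple majority vote is good enough. The function name majority_labels and the sample data are hypothetical.

```python
import csv
import io
from collections import Counter


def majority_labels(csv_file):
    """Return {question: (most common answer, agreement ratio)}.

    Assumes one row per annotator and one column per image question,
    plus a "Timestamp" column that is ignored.
    """
    votes = {}
    for row in csv.DictReader(csv_file):
        for question, answer in row.items():
            # Skip the timestamp, malformed extra fields and empty answers.
            if question is None or question == "Timestamp" or not answer:
                continue
            votes.setdefault(question, Counter())[answer] += 1

    result = {}
    for question, counter in votes.items():
        label, count = counter.most_common(1)[0]
        result[question] = (label, count / sum(counter.values()))
    return result


# Hypothetical export: three annotators, two images.
sample = io.StringIO(
    "Timestamp,image_001.jpg,image_002.jpg\n"
    "2024-01-01 10:00:00,car,plane\n"
    "2024-01-01 10:05:00,car,plane\n"
    "2024-01-01 10:10:00,plane,plane\n"
)
labels = majority_labels(sample)
for image, (label, agreement) in labels.items():
    print(f"{image}: {label} (agreement {agreement:.2f})")
```

The agreement ratio is kept next to each label because, with a task that depends on human judgment, items on which annotators disagree are probably worth a second look. With a real export, the io.StringIO sample would be replaced by a file opened with open(path, newline="").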
Get the data ready
To begin, you will need a Google Drive account to host your dataset. Upload the dataset to a new folder in your Drive. I’ll call mine data.