I see what you speak. NLP and Speech-to-Text (part two)
Our previous article (part one) took a closer look at NLP and its application to market research.
Today we will look at one branch of NLP: speech-to-text, a machine's ability to transcribe spoken language into written text.
In market research, we often use focus groups to better understand user opinions. These usually take the form of an interview or a group session, and processing the information they produce is not easy:
- A professional moderator has to lead the discussion to keep the participants' opinions unbiased.
- The recording has to be processed, and another professional has to create and structure a transcript that captures everyone's opinions.
- Finally, the transcribed data has to be turned into statistical data for market research purposes.
The above only scratches the surface of the entire process. It is usually slow and painful, and it adds to the cost of the service. Human errors during translation or transcription are also common.
Things get even harder when a single study involves several languages. We need to record all the interviews, transcribe them, find a translation agency, and then either process the final data ourselves or rely on a multilingual text-coding team. In short, it is a complicated workflow that adds considerable cost to the survey process.
Another interesting application in market research is having respondents record their answers by voice on desktop or mobile devices and then processing those recordings in a similar way. Here the process is even more complicated, because the respondents' data also has to be stored and processed.
Today, various NLP and speech-to-text APIs can turn audio into text, yet the technology is still not widely adopted in market research. Why is that? The pushback comes from several factors:
- Technological. The market research industry is large and slow to change. More and more market research (MR) companies are trying to innovate on the technology side, but they lack the technical specialists needed to bridge the industry with modern software development.
- Speech-to-text is a developing technology. Some aspects of NLP, and of speech-to-text in particular, still need to be researched and improved. One example is detecting when two people are talking and correctly attributing who said what (known as speaker diarization); this capability is still at an early stage, even for English.
- Data management and processing. Collecting, storing, and processing audio data in compliance with the GDPR raises various issues that can add to the complexity of the initial setup.
Overcoming these obstacles requires a market research team with a deep understanding of how the technology works. Only then is it possible to develop automation software that leverages the neural networks and machine learning algorithms behind NLP to improve these processes.
Here at Bright, we use speech-to-text technology in some of our products, such as audio questions in Forsta Surveys™. Interested in learning more? Don't hesitate to get in touch!