Need autonomous driving training data? ›

Wikipedia’s Text-to-Speech Project is the Latest Example of the Importance of Domain-Specific Human Insights

Wikipedia’s Text-to-Speech Project is the Latest Example of the Importance of Domain-Specific Human Insights

Seeking and utilizing domain-specific human insights is the only path forward to obtaining the necessary high-quality training data for machine learning and artificial intelligence. Not a believer yet? Fine—but Wikipedia is.

Recently, the free online encyclopedia announced a major initiative to make its website more accessible by adding text to speech, allowing users to have portions of the Wikipedia text read to them. They determined that nearly 125 million people—a quarter of their users—“need or prefer” text to be spoken.

So, how does Wikipedia convert over 38M articles (growing by 20K a month)? Simple. They will leverage their known community of wiki users to complete tasks, resulting in human insights Wikipedia will then use to train their speech-to-text system. For example, let’s take a poorly spoken sentence. A wiki user will flag this as a sentence they can’t understand, and then correct it. This is the power of having a known community with experience in a specific domain provide their insights around complex unstructured data problems.

Without wiki users (a domain-specific known community), Wikipedia would likely rely on an unknown community, which would result in poor quality and likely project failure. Imagine the impact: 125 million users would never be able to access Wikipedia information in a format that works best for them. That matters a lot, because today’s consumers expect personalization and optimization—if they can’t get what they need in the way they want it, they’ll look elsewhere.

Wikipedia is only one of thousands of companies who know their projects would be doomed without a domain-specific known community’s insights into complex unstructured data problems. There is no need to let crappy data ruin your projects; access a domain-specific known community who will provide high-quality training data. Project success!