Geo 101: What Is Unstructured Data?
Unstructured data comprises approximately 80 percent of all data in the world. Here's what marketers need to know about making sense of it.
This installment of our GeoMarketing 101 series breaks down what marketers need to know about unstructured data: the basics, plus a jumping-off point for exploring how far the technology may take us.
What Is Unstructured Data?
The simple answer is also a bit of a frustrating one: Unstructured data is functionally defined as, well, everything that’s not structured data.
So let’s break that down: Structured data is highly organized information that is easily uploaded into a traditional database — and, as such, it is easily crawled and detected by search algorithms. Think about Google’s Knowledge Graph: By adding concrete, structured data markup to their websites, marketers can make sure that “more of [their] site’s functional and visual elements appear directly in results and in Knowledge Graph cards.”
Unstructured data, on the other hand, includes all data that doesn’t fit into the structured category: It encompasses everything from audio — like voices and other kinds of sounds — to video and images that are not easy for a search engine to find and present. It can be textual or non-textual, but the determining factor is that it is not contained in a “traditional” database and/or easily read by a search algorithm.
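To make the contrast concrete, here is a minimal Python sketch. The business, its details and the review text are hypothetical, and the keyword lookup is only a stand-in for the much harder work real systems do on free text. The structured record uses schema.org vocabulary (LocalBusiness, PostalAddress), which is the kind of markup search engines read.

```python
import json

# Structured: a schema.org-style record with named fields.
# "Example Coffee Co." and its details are hypothetical.
structured = {
    "@context": "https://schema.org",
    "@type": "LocalBusiness",
    "name": "Example Coffee Co.",
    "address": {
        "@type": "PostalAddress",
        "addressLocality": "Brooklyn",
        "addressRegion": "NY",
    },
    "openingHours": "Mo-Fr 07:00-18:00",
}

# Serialized as JSON-LD, this is what would sit in a page's
# <script type="application/ld+json"> element.
markup = json.dumps(structured, indent=2)

# Unstructured: similar facts, but buried in free text (hypothetical review).
unstructured = (
    "Stopped by Example Coffee Co. in Brooklyn before work; "
    "they open at 7 on weekdays."
)


def city_from_structured(record):
    """Read the city straight from its named field."""
    return record["address"]["addressLocality"]


def city_from_text(text, known_cities):
    """Naive stand-in for text analysis: return the first known city mentioned."""
    lowered = text.lower()
    return next((city for city in known_cities if city.lower() in lowered), None)
```

The structured lookup is a single field access. The free-text lookup only works if someone already knows which cities to look for, which hints at why unstructured data is harder for a search algorithm to read.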
In the words of Margaret Rouse, writing for Search Business Analytics, “textual unstructured data is generated in media like email messages, PowerPoint presentations, Word documents, collaboration software and instant messages. Non-textual unstructured data is generated in media like JPEG images, MP3 audio files and Flash video files.” It also increasingly includes voice, as intelligent assistants proliferate.
It’s all a bit complicated. But in a recent conversation with GeoMarketing, Jordan Bitterman, CMO for IBM’s Watson Content & IoT Platform, offered a simple analogy: Think of structured data as a scripted moment, one that has been written down, saved or otherwise collected in an easily sharable, repeatable, widely accessible format. Unstructured data is the unscripted moment: something more nebulous, harder to grasp, original, that doesn’t fit a preconceived design or arrangement.
“Scripted versus unscripted is a good way to begin to understand what makes unstructured data important,” Bitterman said.
The Applications Of Unstructured Data
An estimated 80-85 percent of data in the world is unstructured data — and it’s reportedly growing at an astronomical rate.
Why? The simplest explanation is that, in the present era of mobile and IoT, people are generating more data, period — and on top of that, due to the proliferation of rich media, a lot of it requires a great deal more storage and/or can’t be catalogued by traditional database means.
“Rich data types include things such as pictures, music, movies, and x-rays,” explained IT strategist Robert Primmer in a post on his blog. Basically, “there is [a big difference] between the storage required for rich data and that required for plain text, and it does give us a sense of why analysts forecast so much more storage dedicated to unstructured versus structured data going forward.”
It’s easy to see why marketers want access to — and the ability to make sense of — this data: It has the potential to reveal countless things about people and their consumption habits, allowing marketers to get more targeted and more personal with their content.
After all, “using the data to get creative insight is at the top of [any marketer’s] list,” said IBM Chief Digital Officer Bob Lord following his presentation at the 2017 Ad Age Next conference. “What artificial intelligence will allow you to do is to get at that ‘dark data,’ that unknown data: unstructured data. Right now, as marketers, we [primarily] have access to ‘known data,’ structured data.”
Basically, there are billions of unstructured data points in the world, and no marketer can go through them all. Unstructured audio, voice, video and other data is difficult for search engines to make sense of, too. So it’s all about AI: The technology is progressing to the point where marketers can use artificial intelligence to determine what this “dark data” says about people, places and things when combined with existing data points.
As Paige Schaefer wrote in a blog post for data analysis platform Trifacta, analysts can begin to more easily “combine their current likely structured data with unstructured data, such as mapping social media with customer and sales automation data, for example.”
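As a rough illustration of that kind of mapping, the Python sketch below joins a small structured customer table to unstructured social posts. All records, handles and field names are hypothetical, and simple keyword matching stands in for the AI-driven analysis the article describes.

```python
from collections import Counter

# Structured data: customer records with fixed fields (hypothetical).
customers = [
    {"customer_id": 1, "handle": "@ana", "lifetime_spend": 420.0},
    {"customer_id": 2, "handle": "@ben", "lifetime_spend": 75.5},
]

# Unstructured data: free-text social posts (hypothetical).
posts = [
    {"handle": "@ana", "text": "Loving the new trail shoes, great grip!"},
    {"handle": "@ana", "text": "Trail run this weekend in my new SHOES"},
    {"handle": "@cat", "text": "Anyone tried these shoes?"},
]


def enrich(customers, posts, keyword):
    """Attach to each customer the number of their posts mentioning keyword."""
    needle = keyword.lower()
    mentions = Counter(
        post["handle"] for post in posts if needle in post["text"].lower()
    )
    # Copy each record so the original customer data is left untouched.
    return [{**c, "mentions": mentions.get(c["handle"], 0)} for c in customers]
```

Calling `enrich(customers, posts, "shoes")` returns the customer records with an added `mentions` count, so social activity can be read alongside spend. Posts from handles that are not in the customer table are simply ignored here.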
Essentially, the lesson is that marketers need to think about unstructured data as much as they think about structured data markup; after all, it makes up roughly 80 percent of all data in the world. Understanding it, and using AI to process it, can help marketers achieve considerably more of their goals.
“Consumers don’t need most of that data because structured data does a fine job for us as we conduct our online searches,” Bitterman explained. “But when you start getting down to what businesses actually want to accomplish, it goes so far beyond what a search engine is able to crawl.
“For instance, consider an image site such as Pinterest or Flickr or Instagram,” Bitterman continued. “A marketer wants to be able to identify what images seem to be getting uploaded the most at any given time. It gives them the insight to say, ‘Alright, it looks like Europe is popping and that’s an area of the world we need to focus on. It looks like the country of Honduras is popping. We gotta focus on that.’ You can do that from images in that example. That becomes a data set that previously you wouldn’t have access to, and with unstructured data, now you do.”