Image by @Twitter.

Recently I participated in a panel discussion on social media, big data and visualization at the Vancouver Enterprise Forum, along with Tommy Levi, Senior Data Scientist at Plenty of Fish; Stephen Ufford, Founder & CEO of Trulioo; and Bruno Aziza, VP, Worldwide Marketing at SiSense. Big data is one of the most promising – and hyped – trends in technology today. While notable companies like Facebook, Google and Netflix get most of the big data headlines, it’s quietly transforming entire industries behind the scenes, including retail, insurance and medical research. Most exciting of all, big data has the potential to improve our everyday lives by giving us insight into our social relationships, our habits, and the things we care about. That’s where big data collides with two other exciting fields: social media and data visualization. Here are some of the key takeaways from our conversation.

What is big data?

First off, let’s try to define what big data really is. It’s probably the most overused buzzword in business today, and like all buzzwords, its meaning sometimes gets lost in all the hoopla.

Those of us familiar with spreadsheets or basic databases like Microsoft Access might be tempted to think of big data as a really large data set, with many rows and columns. However, it’s a bit more complex than that. If a traditional database is a collection of data, then big data is a collection of collections of data. Usually, those different collections are in totally different formats, and it’s not obvious how to fit them together in a way that makes any sense. The collections themselves might not even be organized into tidy rows and columns. Instead, they could store “unstructured” data like the messy natural language we find in tweets and Facebook updates.

The “big” part is also important. Typically, big data problems can’t be solved with the computing resources that are available to most organizations. They require clusters of computers running special applications, and might take days or even weeks to complete.

How do you get started with big data?

First of all, you should collect data in a logical way. Tommy Levi of Plenty of Fish emphasized that data scientists can save a great deal of time and effort later on if they can anticipate the types of questions they’ll be asking, and structure their data accordingly. However, it’s not always possible to predict those future questions (after all, one of big data’s great promises is that it will surprise you with unexpected insights). Established companies often don’t even have the luxury of smart data collection, because their legacy systems have been accumulating data for years, if not decades.

Startup companies aren’t burdened by the constraints of legacy databases, but they have their own challenges. During our panel discussion, the point was raised that big data analysis can be prohibitively expensive for a young startup when every dollar counts. In many cases, startups and even larger companies can find insights in “small data” instead. Fortunately, the cost of data storage is dropping so fast that small companies should store all the data they can, right from day one.

Generally, the big data process begins with “cleaning” data sets and joining them together to make them useable. Only after you’ve taken data from different collections and connected it properly will you be able to deliver on big data’s big hype.
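To make the "clean, then join" idea concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the two sample collections, the field names (email, city, user_email, text) and the helper functions are assumptions, not anything discussed on the panel. It takes one collection stored as tidy CSV rows and another stored as loose JSON records, cleans the shared email field, and connects the two.

```python
import csv
import io
import json

# Hypothetical collection 1: a CSV export of customer records with messy emails.
CUSTOMERS_CSV = """email,city
 Alice@Example.com ,Vancouver
bob@example.com,Toronto
,Calgary
"""

# Hypothetical collection 2: social posts stored as one JSON object per line.
POSTS_JSONL = """{"user_email": "alice@example.com", "text": "Loving the new app"}
{"user_email": "BOB@example.com", "text": "Support was slow today"}
{"user_email": "carol@example.com", "text": "Hello"}
"""


def clean_email(raw):
    """Normalize an email so the same person matches across collections."""
    return raw.strip().lower()


def load_customers(text):
    """Read the CSV collection, dropping rows that have no usable email."""
    customers = {}
    for row in csv.DictReader(io.StringIO(text)):
        email = clean_email(row["email"])
        if email:
            customers[email] = row["city"].strip()
    return customers


def join_posts(customers, text):
    """Attach each post to its customer record; skip posts with no match."""
    joined = []
    for line in text.splitlines():
        if not line.strip():
            continue
        post = json.loads(line)
        email = clean_email(post["user_email"])
        if email in customers:
            joined.append(
                {"email": email, "city": customers[email], "text": post["text"]}
            )
    return joined


customers = load_customers(CUSTOMERS_CSV)
joined = join_posts(customers, POSTS_JSONL)
for record in joined:
    print(record)
```

Real data sets are far messier than this, and at big data scale the same steps would run across a cluster rather than in a single script, but the shape of the work is the same: agree on a shared key, clean it on both sides, then connect the records.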

Storytelling with social media and big data

Big data is only valuable if it tells a story. The fuller the story your data tells, the better you’ll be able to take advantage of that data. While recognizing a trend can help you make better decisions, understanding the cause behind that trend is even more valuable. Storytelling has been a sense-making tool for humans since we were huddled around fires in caves. The organizations that can use stories to make sense of big data are going to excel.

I’ve written about the importance of storytelling in social media before. In my mind, storytelling is where social media and big data truly intersect. One of the best ways to tell a story from social media is through effective data visualization. Consider this animated visualization of Twitter data from Hurricane Sandy, showing the spread of power outages throughout the northeast United States during the storm. The difference between “data art” and good data visualization comes down to story. The former just looks pretty, but the latter gives you some insight.

InMaps Social Media Data Visualization
Cameron Uganec’s LinkedIn network, visualized.

Visualization of our social media data takes the storytelling to another level, and gives us insights into our own lives that we might never achieve on our own. Social graph visualizations, for example, help us make sense of the social dynamics that are playing out around us. That kind of clarity can be very empowering. The LinkedIn InMap above visualizes my relationships on the social network, allowing me to see how I’m related to my friends and colleagues and how closely they’re related to each other.

Some complex relationship patterns and social groupings become obvious when they’re visualized. I can start to ask, how much overlap is there between my network of friends from university and my professional network? How many of my colleagues at HootSuite have connections to people I’ve met at my previous jobs? And who are the important social connectors in my life who bridge the gaps between all these different groups?
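For the curious, those questions can be roughly sketched in a few lines of Python. The names and groups below are invented, and this is not how InMaps works under the hood. It is also a deliberate simplification: it treats a "connector" as anyone who belongs to more than one group, whereas a real social graph analysis would look at the individual connections between people.

```python
# Hypothetical social groups; every name here is made up for illustration.
groups = {
    "university": {"Ana", "Ben", "Chloe", "Dev"},
    "current_job": {"Chloe", "Dev", "Eli", "Farah"},
    "previous_job": {"Dev", "Farah", "Gus"},
}


def overlap(groups, first, second):
    """People who appear in both of the named groups."""
    return groups[first] & groups[second]


def connectors(groups):
    """Map each person who belongs to more than one group to those groups."""
    membership = {}
    for group_name, members in groups.items():
        for person in members:
            membership.setdefault(person, set()).add(group_name)
    return {
        person: sorted(names)
        for person, names in membership.items()
        if len(names) > 1
    }


shared = overlap(groups, "university", "current_job")
bridges = connectors(groups)

print("University friends who are also colleagues:", sorted(shared))
for person in sorted(bridges):
    print(person, "bridges", ", ".join(bridges[person]))
```

A visualization answers the same questions at a glance, which is exactly why it is so useful once a network grows beyond a handful of people.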

The Nexalogy app in the HootSuite App Directory allows me to view “interest maps”, which visualize the concepts, hashtags, and links that are most relevant to the people in my social graph. At a glance, I can see what ideas matter to the people I care about. These empowering visualizations show that marketers and advertisers aren’t the only ones who can benefit from social media’s torrent of big data. When the exchange of information between social networks and their users is fair and transparent, individuals can view their own lives and social circles in an entirely new light.

Big data, social media and visualization are sure to remain hot topics for the foreseeable future. I’d love to hear your thoughts. How will our lifestyles be affected? Which industries are going to be disrupted? Please share your ideas in the comments!