Google Executive Chairman Eric Schmidt has suggested that machine learning will be the one commonality across every big startup over the next five years. The inference here is that machine learning, or AI, will be as revolutionary as the Internet, the mobile phone, the personal computer; heck, I’ll say it, as game-changing as sliced bread.
AI is responsible for many simple experiences we already take for granted: the Netflix “recommended for you” list and the Facebook feeds that happen to show travel deals for places we’ve been searching. Next come the soon-to-be-ubiquitous self-driving cars and the growth of devices that can help plan your life. Here, AI will be based on what machines have “learned” about your habits, preferences, and any other data they can connect with, whether that’s traffic patterns or consumer information. For generations to come, AI will be the thing people cannot live without, and the world before it will seem strange and primitive.
This is a sea change, and those who do not embrace AI will be left in the dust. Here are three key processes you need to understand and perfect in order to be part of the next revolution.
First, ask some questions you want to answer. For example: what is our lead-generation cost by vertical across multiple media channels? Then, using past data, ask what rules should govern tomorrow’s decisions. For example: increase spend on the most profitable media type in the top three performing industry sectors, a rule that might involve thousands of data points and hundreds of ongoing decisions. The next step is to identify the sources you want to bring together to form your first data pools for evaluation.
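To make that first question concrete, here is a minimal sketch, assuming a hypothetical leads.csv export with vertical, channel, spend, and leads columns (all names invented for illustration), of computing lead-generation cost per vertical and channel and surfacing the cheapest combinations:

```python
import pandas as pd

# Hypothetical export of media spend and lead counts; file and column names are assumptions.
df = pd.read_csv("leads.csv")  # columns: vertical, channel, spend, leads

# Cost per lead for every vertical/channel combination.
cpl = (
    df.groupby(["vertical", "channel"])[["spend", "leads"]]
      .sum()
      .assign(cost_per_lead=lambda g: g["spend"] / g["leads"])
      .sort_values("cost_per_lead")
)

# A crude rule for tomorrow's decisions: the three cheapest vertical/channel pairs.
print(cpl.head(3))
```

In practice a rule like this would be recomputed on a schedule as new spend and lead data arrive, which is exactly why the data pools that follow matter.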
1. Connect the Sources
Connecting the data sources is probably the most critical and most challenging process, because it must include internal systems that don’t talk to each other, legacy systems that have no API, vendor software, and offsite data sources. It will also include web data and IoT data that your internal systems and databases were never built to handle.
Some of this will be real-time data flowing constantly, like website analytics, while other data is more static, such as the physical locations of stores or customer homes. The critical challenge is threefold: understanding all the subtle details that must be pulled into the greater picture; identifying how to capture that data, sometimes once, sometimes on an ongoing basis; and then making it all work together in a way the machine can process and learn from, over and over again. And it won’t be like learning to ride a bike, learned once and known for life. As things change, flow, and adapt, there will be new data to seek, new capture strategies to identify, and new ways of communicating to discover.
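As a simple illustration, here’s a hedged sketch, with invented endpoints and file names, of pulling one constantly flowing source and one static source into a single frame:

```python
import pandas as pd
import requests

# "Flowing" source: a web analytics API polled on a schedule (URL is a placeholder).
events = pd.DataFrame(
    requests.get("https://analytics.example.com/api/events", timeout=10).json()
)

# Static source: store locations exported once from an internal system (schema assumed).
stores = pd.read_csv("store_locations.csv")  # columns: store_id, lat, lon, address

# Join the live events to the slow-moving reference data; assumes events carry a store_id.
combined = events.merge(stores, on="store_id", how="left")
print(combined.head())
```

The real work, of course, is everything this sketch hides: authentication, retries, legacy systems with no API at all, and keeping the join keys consistent as the sources change.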
2. Clean and Process
Once you’ve achieved the mammoth task of collecting the data, it must be normalized, deduplicated, cleaned, and processed to make it understandable. You will need to map or match the data sets based on time, location, or identifiable IDs, such as user IDs on mobile devices, or by using a business polygon, for example, attaching a mailing address to a business name at a physical location.
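As a rough illustration of normalizing, deduplicating, and matching, here is a sketch assuming two hypothetical extracts that share a user ID (all file and column names invented):

```python
import pandas as pd

# Two extracts describing the same users; schemas are assumptions for illustration.
web = pd.read_csv("web_sessions.csv")   # columns: user_id, ts, page
crm = pd.read_csv("crm_contacts.csv")   # columns: user_id, email, city

# Normalize: consistent types and casing on the key before matching.
for frame in (web, crm):
    frame["user_id"] = frame["user_id"].astype(str).str.strip().str.lower()

# Deduplicate: keep only the latest record per user.
web = web.sort_values("ts").drop_duplicates("user_id", keep="last")
crm = crm.drop_duplicates("user_id", keep="last")

# Match the two sets on the shared ID.
profiles = web.merge(crm, on="user_id", how="inner")
```

Matching on time or location follows the same pattern, just with fuzzier keys and tolerance windows instead of exact IDs.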
The process must also include an eye toward understanding the data: what it is and how it might apply to what you want to achieve, so that you know what is relevant. Often, cleaning the data involves tapping into third-party systems to verify and validate data points, such as confirming that the latitude and longitude of a mobile browser visiting a website falls within or near a specific business. Think of looking up directions on Waze while sitting in a Starbucks and having a Starbucks offer pop up onscreen.
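For the location case specifically, here is a minimal point-in-polygon sketch using the shapely library; the store footprint and visitor coordinates are invented for illustration:

```python
from shapely.geometry import Point, Polygon

# Hypothetical footprint of a single store, as (lon, lat) corners.
store = Polygon([
    (-122.4196, 37.7748),
    (-122.4190, 37.7748),
    (-122.4190, 37.7752),
    (-122.4196, 37.7752),
])

# Position reported by a mobile browser (also invented).
visitor = Point(-122.4193, 37.7750)

# Inside the polygon, or within roughly a block of it?
# 0.001 degrees is ~100 m at this latitude; a crude tolerance for a sketch.
if store.contains(visitor) or store.distance(visitor) < 0.001:
    print("Visitor is at or near the store; the offer is relevant.")
```

A production system would also validate the coordinates themselves against third-party data, since mobile location reports can be inaccurate.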
3. Test Once and Again
Success in any new application will only come with testing, testing again, and then retesting. In the case of AI, this includes analysis models and AI applications tested in real time. Initially, your efforts will focus on insights into cause and effect. The machine must be tested across many possible scenarios in order to understand what it has learned, and what it has not. If you want to get specific about the how, I recommend using a performance-improvement process like PDSA (Plan, Do, Study, Act) to keep you focused on continuous improvement of your models and processes.
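As a minimal sketch of that test-and-retest loop, here is a PDSA-flavored example on synthetic data using scikit-learn; the model, the candidate settings, and the data are all stand-ins for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# PLAN: define the scenarios the machine will be tested against (synthetic here).
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

best_score, best_c = 0.0, None
for c in (0.01, 0.1, 1.0, 10.0):
    # DO: try a candidate configuration.
    model = LogisticRegression(C=c, max_iter=1000).fit(X_train, y_train)
    # STUDY: measure what the machine actually learned on held-out scenarios.
    score = accuracy_score(y_test, model.predict(X_test))
    # ACT: keep what improved, then plan the next cycle.
    if score > best_score:
        best_score, best_c = score, c

print(f"best C={best_c}, held-out accuracy={best_score:.3f}")
```

The point is not the particular model but the cycle: every change is tested against scenarios the system has not seen, and the results feed the next round of planning.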
Everyone will be familiar with at least one accident involving a Google self-driving vehicle, in part because they happen so rarely that they become news, and in part because we want to follow the path of this machine-learning revolution. In many cases, the accident was the result of something the vehicle had not yet learned, which drives home the importance of testing and retesting until you are sure you have it right.
Data Is Alive
Think of data as alive, and think in terms of building a data ecosystem, and you are on the right track. You need to feed the system with data and keep it clean. You need to be sure you don’t have redundant, duplicate data that will throw off your results. And of course, once you start putting this operating data ecosystem together, you need someone, or something, smart running the place: algorithms, AI.
This ecosystem of data will grow your entire business if properly cared for and nurtured. It’s not an overnight project but one that has to be planned and well executed.
Once the baseline knowledge and skills have been established, the focus needs to turn toward predicting the future, because, after all, that is where the wow factor, the “can’t do without this” factor, the revolution, really comes in.