When AI works, it’s nothing short of brilliant, helping companies make or save tremendous amounts of money while delighting customers on an unprecedented scale. When it fails, the results can be devastating. Most AI models never make it out of testing, but those failures aren’t random. In Real World AI, Alyssa Simpson Rochwerger and Wilson Pang share dozens of AI stories from startups and global enterprises alike featuring personal experiences from people who have worked on global AI deployments that impact billions of people every day.

I recently caught up with Alyssa and Wilson to learn more about what inspired them to write the book, their favorite ideas they share with readers, and how they’ve applied those ideas.

Published with permission from the author.

What happened that made you decide to write the book? What was the exact moment when you realized these ideas needed to get out there?


Alyssa: Candidly, it was not my idea to write this book. I was “voluntold” to write it, and, perhaps classically as a woman in tech, my first reaction was: “What do I know? I couldn’t possibly write a book!” As the project took shape and I got deeper into it, I realized that perhaps I had learned some lessons along the way that were worth documenting and sharing with others.

Perhaps others can avoid the mistakes I’ve made, learn from my lessons, and have an easier path to launching responsible machine learning-based products into the world. It became a joy to reach out to the many (many) people who had helped me find my own way in the machine learning space, talk with them about their stories and lessons learned, and attempt to collect and organize it all into a cohesive manuscript. I am passionate about transparency and inclusivity when building and launching machine learning products; however, very, very few projects start with these principles in mind.

One moment in particular, which I’ve reflected on many times while writing this book and which kept me engaged to see this project through to fruition, came in 2017. I captured the story in chapter 9 of the book: I was sitting in a room full of men discussing the training-data approach for a voice assistant (one that most folks with a smartphone interact with daily) when I realized that they were encoding gender-biased responses into the algorithm.

It wasn’t intentional. There were simply no diverse perspectives in the room. I was the only woman, and when I brought up the bias that I saw, I didn’t get the warmest response. I had to fight a bit to make them see how harmful and narrow-minded (not to mention business-limiting) encoding that gender bias could be.

I hope this book empowers folks who might otherwise feel intimidated by machine learning projects to involve themselves and contribute. We need a more diverse set of voices in the machine learning community. I hope this book furthers that enduring effort.


Wilson: In 2017, I joined Trip.com Group as their Chief Data Officer. Trip.com Group provides a wide variety of online travel services: hotel booking, airline ticket booking, vacation packages, train tickets, local tours, restaurant recommendations, and many more. It has over 200 million monthly active users and a huge number of merchants (hotels, airlines, travel agencies, restaurants, etc.).

I was tasked with driving business growth by leveraging data and AI. It is a dream job for any data science leader. Why? There are so many scenarios where you can apply AI: you can build search algorithms for all kinds of bookings (hotel, flight, tour, and more), improve recommendations to inspire people for their next vacation, help merchants optimize their business by suggesting the right price, and leverage AI to provide great customer service. More importantly, we had the data to support building those AI applications.

However, implementing AI at scale across different business areas was very hard. There were questions and struggles from all teams: What is the right problem to solve? Why do we need business people involved? What can AI do, and what can’t it do? Whom should we hire to build the team? How do we measure the results?

With support from the chairman and CEO, we organized a 3.5-month AI bootcamp for all leaders (director and above), regardless of which department they came from. We shared not only basic AI concepts and actual use cases but also covered topics like organization design, success measurement, and many more. The bootcamp was well received, and afterwards our team started building successful AI applications together with the business in many different areas.

That made me think about writing a book: a book to share my own experience as well as the success (and failure) stories from different AI projects. Such a book could help people avoid a lot of the mistakes I had made and increase their confidence to deploy AI successfully.

After I joined Appen, I found that our company supports different AI use cases across many industries. While a lot of advanced customers have a sound approach to deploying AI confidently, we also see companies asking the same questions that were asked a few years ago at Trip.com. They all need the “AI bootcamp”! That was the moment I realized these ideas needed to get out.


What’s your favorite specific, actionable idea in the book?


Alyssa: Start by making sure you really, truly understand the problem you are trying to solve and why. That is often very hard. Then focus on the data. The data is usually the hardest part; the algorithm is the easy part. People don’t realize that.


Wilson: Pick the Right Measurement. This is my favorite idea and essential for AI success. You are what you measure, so pick your measurement carefully.

Back in 2009, I was lucky to be part of the effort to build a Search Science team at eBay. We started to leverage machine learning to help buyers find the products they wanted. We tried different machine learning models: models to rewrite buyers’ queries and models to rank the final search results. We then ran a series of A/B tests to assess the model results, with great success. Many of the models showed that buyer conversion had increased. Other teams, motivated by these successes, started putting in effort to increase their purchases per session. Everything looked rosy.

That is, until the finance team observed that those A/B test wins didn’t translate into increased revenue.

The initial try with AI in search science had failed. Our team was pulled into a war room to understand why, and we needed a solution fast. We were hurting revenue for the company at a time when it couldn’t afford to lose a single cent.

We dug deep into the search results for different queries and found one interesting phenomenon: very often, we ranked accessory items at the top. For example, many iPhone cases would rank at the top of the results when buyers searched the term “iPhone”. Although those accessories were popular on the site, they weren’t what the user had been searching for, so they created what we call “accessory pollution” and led to a bad user experience.

Aha! We had figured out why revenue had taken a dip; a $10 iPhone case represents much less revenue than a $300 iPhone. Our model was recommending the less expensive accessories when it should have been recommending the higher-priced phone.

Success, much of the time, is all about what you choose to measure.

We had started by measuring success as purchases per session. Our AI model succeeded against that goal but created a bad user experience and failed to deliver business growth. We needed to find a new solution with a different AI model and, even more importantly, a new way to measure the AI model’s success. Clearly, “purchases per session” created the wrong incentive for our AI models and our team. The lesson was obvious: be careful to pick the right measurement, because it will set the direction of your AI.
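The trap is easy to reproduce with a back-of-the-envelope sketch. In this hypothetical Python example (the session counts are invented for illustration; only the $10 case and $300 phone prices come from the story), an accessory-heavy ranking wins on purchases per session while losing on revenue per session:

```python
# Hypothetical illustration of the measurement lesson: a ranking that
# boosts purchases per session can still reduce revenue if it favors
# cheap accessories ($10 cases) over the item the buyer actually wanted
# ($300 phones). All session data below is invented.

def metrics(sessions):
    """Return (purchases per session, revenue per session)."""
    purchases = sum(len(s) for s in sessions)
    revenue = sum(sum(s) for s in sessions)
    n = len(sessions)
    return purchases / n, revenue / n

# Each session is the list of purchase prices made in that session.
baseline = [[300], [], [300], [], []]                 # fewer buys, phones win
accessory_heavy = [[10], [10], [10, 10], [300], []]   # more buys, mostly cases

pps_a, rps_a = metrics(baseline)
pps_b, rps_b = metrics(accessory_heavy)

print(f"baseline:  {pps_a:.1f} purchases/session, ${rps_a:.0f}/session")
print(f"accessory: {pps_b:.1f} purchases/session, ${rps_b:.0f}/session")
# Purchases per session goes up (0.4 -> 1.0) while revenue per session
# goes down ($120 -> $68): the A/B test "win" hides a revenue loss.
```

Tracking both metrics side by side in every A/B test would have surfaced the revenue dip before the finance team did, which is the point of picking the measurement carefully.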

What’s a story of how you’ve applied this lesson in your own life? What has this lesson done for you?


Alyssa: As a product manager, I ruthlessly prioritize product backlogs by understanding the customer value and business impact of each new idea or feature. Too often I see product roadmaps with vague “machine learning” or “artificial intelligence” features that are ill-defined and loose on exactly what they will do or how they will add value. I pick those apart with a fine-toothed comb. More often than not, those shiny-object, ill-defined items end up at the bottom of my priority list.

Alternatively, I’ve also used very quick-and-dirty machine learning-based approaches to clean data, prioritize tickets, or summarize unstructured information. It often doesn’t have to be perfect or fancy to extract 80% of the value. Focusing on what is good enough and on the customer value (not the accuracy of the algorithm) has, at times, meant generating delight and money faster than others.
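As one hypothetical illustration of that quick-and-dirty spirit (the tickets, threshold, and helper names here are all invented, and the method is a crude word-overlap heuristic rather than a trained model), a few lines of Python can flag near-duplicate support tickets well enough to be useful:

```python
# Deliberately quick-and-dirty sketch: flag near-duplicate tickets by
# bag-of-words Jaccard similarity. Good enough to surface obvious
# duplicates for triage, with no training and no dependencies.
import re

def tokens(text):
    """Lowercased word set: the crudest possible text representation."""
    return set(re.findall(r"[a-z]+", text.lower()))

def jaccard(a, b):
    """Word-overlap similarity between two strings, in [0, 1]."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def near_duplicates(tickets, threshold=0.5):
    """Return index pairs of tickets whose word overlap meets the threshold."""
    pairs = []
    for i in range(len(tickets)):
        for j in range(i + 1, len(tickets)):
            if jaccard(tickets[i], tickets[j]) >= threshold:
                pairs.append((i, j))
    return pairs

tickets = [
    "App crash when uploading a photo",
    "Crash when uploading photo in the app",
    "Password reset email never arrives",
]
print(near_duplicates(tickets))  # the first two tickets pair up
```

It would miss paraphrases that a real model would catch, but in the 80%-of-the-value spirit above, that trade-off is often fine.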



Wilson: At Appen, our mission is to help build better AI by creating large volumes of high-quality training data faster. To achieve that mission, we need to leverage both technology and our diverse global crowd, and we have a world-class technology team building our AI capabilities.

What is the right measurement for our own AI capabilities?

Our team decided that the goal for our AI capability is to improve crowd annotation efficiency without lowering annotation quality or introducing bias.

It has proven to be a big success. Our platform now has strong automation features like smart labelling and validation and smart match.

Smart labelling and validation is the capability to leverage AI for pre-labelling or quality validation. Our experience to date shows that use of automation translates to efficiency gains of 88% in automatic speech recognition and 92% in semantic image segmentation. It can also result in 3-6x faster completion of LiDAR annotation and optical character recognition (OCR) for document transcription.

Smart match analyzes crowd worker profiles, mines project needs, and uses machine learning to find the best crowd workers for each project. This in turn improves worker productivity and quality on the project.

All of those capabilities have made our Data Annotation Platform the industry-leading platform!