Illinois Institute of Technology (IIT), which is launching a degree in data science, wants to teach students a wide panoply of skills, said Shlomo Argamon, professor of computer science and director of the master's program in data science. Those skills include statistical theory, programming, the ability to build data models, and the ability to work with systems that can process large amounts of data. Data scientists also need the communication skills to talk with clients at the beginning of a project to understand their needs, and to present results at the end, he said.
“We want to produce well-rounded data scientists.”
Universities face a challenge in a new field like data science because few people are trained in it who can offer formal education. Besides teaching tools like R and Hadoop, the IIT program also teaches the fundamentals of distributed computation. Beyond its science faculty, the school is putting partnerships with industry in place, both to keep the program current on which technologies are most relevant at any given time and to provide projects and feedback.
“We are talking to people in industry who want to do adjunct teaching for some courses that are down to earth.”
Most of the incoming students have degrees in computer science, math, or science. A few applicants are transferring from business analytics programs.
Argamon thinks the focus on theory, and on the connection between theory and practice, will help graduates avoid skill obsolescence. The school is also developing individual continuing-education courses in computer science and in areas like mobile application development.
He acknowledges some pushback from students over the required communication courses.
“Students coming in for a data scientist course are not looking for a really narrow technical focus but a broader, interdisciplinary approach. But they may still not understand that it involves learning to talk with people who are not technical. I hope they will learn how important these skills can be.”
Students will work on real-world projects with businesses and business people who are not data scientists, he added.
Education to meet students’ needs
Professor Lisa Dierker at Wesleyan University, who taught a MOOC on statistics, said universities often do a poor job of adapting education to student needs.
“We train people to become us (academics) but only two percent do. What do students need to learn, what do they want to accomplish?”
Even her introductory courses teach data management, something many courses skip because it sounds too low-level. Data management requires a broad understanding of the project and the story it is meant to tell, she explained.
“When I send someone off to analyze data, if they can’t tell the story they don’t make good decisions about using the data.” Analysts have to be able to figure out which variables to use, what those variables should look like, and whether any data are missing. The data management work, including how to group data and how to handle missing data, is important for reaching useful and unbiased conclusions.
Dierker said it helps technical people to know the whole arc of a project. They don’t have to be as good at communication as the person who finally writes up the project, but they should know the goal when they are making decisions about the data.
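As a rough illustration of that point, here is a minimal Python sketch using made-up survey numbers; the data and the function names are hypothetical, not taken from Dierker's course. It shows how two common ways of handling missing values produce different answers from the same data.

```python
from statistics import mean

# Hypothetical survey responses (hours of exercise per week);
# None marks a respondent who skipped the question.
responses = [4, 6, None, 5, None, 7, 3, None]


def count_missing(values):
    """Return how many entries are missing (None)."""
    return sum(1 for v in values if v is None)


def mean_dropping_missing(values):
    """Complete-case analysis: average only the answers we actually have."""
    return mean(v for v in values if v is not None)


def mean_filling_zero(values):
    """Treat each missing answer as zero, a silent default that drags the mean down."""
    return mean(v if v is not None else 0 for v in values)


if __name__ == "__main__":
    print("missing:", count_missing(responses))
    print("mean, dropping missing:", mean_dropping_missing(responses))
    print("mean, missing as zero:", mean_filling_zero(responses))
```

Neither choice is automatically right. Dropping the missing answers is only safe if the people who skipped the question resemble those who answered, and deciding whether that holds is the kind of judgment that depends on knowing the story the data is meant to tell.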
“If you are trying to make the data tell a story, your story will be affected by the decisions you make in that very technical realm. We do literature reviews and teach students how to write about statistics. Not everyone has to be able to tell the story, but someone sure does. Someone has to be able to pull it together and explain what it means.”
Mu Sigma creates its own
Mu Sigma, a consultancy that does high-end data science work for leading American retailers, insurance companies and Microsoft Bing, started its own university after concluding that the kind of talent it was looking for didn’t exist.
Ambiga Dhiraj, who heads the university, refers to the company's staff as “decision scientists.” When recruiting, the company looks for a quantitative bent of mind, usually assessed with the GMAT, and the ability to communicate. It tests communication by putting 10 candidates in a group and asking them to discuss a controversial topic while she looks for clarity of thought and the ability to listen and to draw out details. The next skill Mu Sigma wants is an ability to synthesize.
“In the kind of work we do it is important for people to sift through large volumes of data and come up with insights they can communicate to end users.” Finally, she said, she looks for curiosity, tremendous amounts of it.
She agrees with Dierker on the importance of data management.
Mu Sigma offers its clients structured thinking and problem solving, but that’s a step toward the destination, not necessarily the starting point.
“Most business problems are not very well defined, they seem to be very muddy,” said Dhiraj. “You start out thinking you have a sales problem but might find it is not really sales but marketing or customer retention. Most business problems we encounter are fuzzy, so it is very, very critical to break it down in a structured manner to gain clarity on where you want to go. Then that should drive your analysis, or you could spend a lot of time on analysis that doesn’t lead to solving the right problem.”
Mu Sigma University begins with a two- to three-month program. It also offers some specialty courses, and after 12 to 18 months everyone comes back for advanced courses.
“After three years we believe we need a new category of team leaders to build decision scientist teams, so we have a set of programs for people moving from the individual decision scientist to team leaders.”
IIT and the value of classrooms
At IIT, the data science master's program hits some of the same themes.
“Data science brings a distinct set of challenges to business communication,” said Argamon. “How does one explain statistical evidence and analytical results without oversimplifying or creating confusion? Students need to learn how to weave results into a coherent story, how to explain statistical assumptions and caveats clearly, and how to create insight-producing data visualizations.”
The program’s practical work experience will help students learn how to define the analytical problem correctly, deal with the complexities and inconsistencies of real data, and communicate results in a clear, enlightening, and satisfying fashion to non-technical, non-academic clients.
He is not a fan of MOOCs.
“Learning data science is not just gaining a bunch of technical knowledge that can be conveyed through online lectures and assessed through multiple-choice exams. It requires development of soft skills of communication, planning, and design, skills that are best acquired and can only be assessed through personal interaction with teachers, advisers, and other students.”