Companies with customer experience management (CEM) programs rely heavily on customer feedback in making business decisions, including, setting strategy, compensating employees, allocating company resources, changing business processes, benchmarking best practices and developing employee training programs just to name a few. The quality of the customer feedback directly impacts the quality of these business decisions. Poor quality feedback will necessarily lead to sub-optimal business decisions. High quality feedback will lead to optimal business decisions. We typically measure quality of subjective ratings by two standards: reliability and validity.
To understand reliability and validity, you need to employ the use of psychometrics. Psychometrics is the science of educational and psychological measurement and is primarily concerned with the development and validation of measurement instruments like questionnaires. Psychometricians apply scientifically accepted standards when developing/evaluating their questionnaires. The interested reader can learn more about reliability and validity by reading Standards for Educational and Psychological Testing.
I recently wrote about assessing the reliability of customer experience management (CEM) programs. Reliability refers to the degree to which your CEM program delivers consistent results. There are different types of reliability, each providing different types of insights about your program. Inter-rater reliability indicates whether different customers who receive the same service (e.g., same Contacts within an Account) give similar feedback. Test-retest reliability indicates whether customers give similar ratings over time. Internal consistency reliability indicates whether an aggregated score (average of different items/questions) makes statistical sense. While assessing the reliability of your CEM program is essential, you need to evaluate the validity of your CEM program.
Validity refers to the degree to which customer ratings generated from your CEM program measure what they are intended to measure. Because of the subjective nature of customer feedback ratings, we rely on special types of evidence to determine the extent to which ratings reflect underlying customer attitudes/feelings. We want to answer the question: “Are we really measuring “satisfaction,” “customer loyalty,” or other important customer variables with our customer feedback tool?
Unlike reliability, there is no single statistic that provides an overall index of the validity of the customer ratings. Rather, the methods for gathering evidence of validity can be grouped into three different approaches: content-related, criterion-related, and construct-related. When determining the validity of the customer ratings, you will likely rely on a variety of these three methods:
1. Content-related Evidence
Content-related evidence is concerned with the degree to which the items in the customer survey are representative of a “defined universe” or “domain of content.” The domain of content typically refers to all possible questions that could have been used in the customer survey. The goal of content-related validity is to have a set of items that best represents all the possible questions that could be asked of the customer.
While there are many potential questions to ask your customers, research has shown that a only a few general customer experience (CX) questions are needed for a customer survey. These general CX questions represent broad customer touchpoints (e.g., product, customer support, account management, company communications) across the three phases of the customer lifecycle (marketing, sales and service).
2. Criterion-related Evidence
Criterion-related evidence is concerned with examining the statistical relationship (usually in the form of a correlation coefficient) between customer feedback (typically ratings) and another measure, or criterion. With this type of evidence, what the criterion is and how it is measured are of central importance. The main question to be addressed in criterion-related validity is how well the customer rating can predict the criterion.
The relationship between survey ratings and some external criteria (not measured through the customer survey) is calculated to provide evidence of criterion-related validity. This relationship between survey ratings and some external criterion is typically index using the Pearson correlation coefficient, but other methods for establishing this form of validity can be used. For example, we can compare survey ratings across two knowingly different groups that should result in different satisfaction levels (reflected by their survey ratings). Specifically, satisfaction ratings should vary across different “Service Warranty Levels” and “Account Sizes.” The different treatment of various customer groups should lead to different customer satisfaction results. In these types of comparisons, Analysis of Variance (ANOVA) can be employed on the data to compare various customer groups. The ANOVA allows you to determine if the observed differences across the customer segments are due to chance factors or are real and meaningful.
3. Construct-related Evidence
Construct-related evidence is concerned with the customer rating as a measurement of an underlying construct and is usually considered the ultimate form of validity. Unlike criterion-related validity, the primary focus is on the customer metric itself rather than on what the scale predicts. Construct-related evidence is derived from both previous validity strategies (e.g., content and criterion). The questions in the survey should be representative of all possible questions that could be asked (content-related validity) and survey ratings should be related to important criteria (criterion-related validity).
A high degree of correlation between the customer ratings and other scales/measures that purportedly measure the same/similar construct is evidence of construct-related validity (more specifically, convergent validity). Construct-related validity can also be evidenced by a low correlation between the customer ratings (e.g., customer loyalty) and other scales that measure a different construct (e.g., customer engagement) (more specifically, discriminant validity).
Difference Between Reliability and Validity
Applications of Reliability and Validity Analysis
The bottom line: a good customer survey provides information that is reliable, valid and useful. Applying psychometric standards like reliability and validity to customer feedback data can help you accomplish and reveal very useful things.
- Validate the customer feedback process: When CEM data are used for important business decision (e.g., resource allocation, incentive compensation), you need to demonstrate that the data reflect real customer perceptions about important areas. Here is a PowerPoint presentation of a validation report.
- Improve account management: Reliability analysis can help you understand the extent of agreement between different Contacts within the same Account. Your account management strategy will likely be different depending on the agreement between Contacts within Accounts. Should each relationship be managed separately? Should you approach the Accounts with a strong business or IT focus?
- Create/Evaluate customer metrics: Oftentimes, companies use composite scores (average over several questions) as a key customer metric. Reliability analysis will help you understand which questions can/should be combined together for your customer metric. Here is an article that illustrates reliability and validity in the context of developing three measures of customer loyalty.
- Modify surveys: Reliability analysis can help you identify which survey questions can be removed without loss of information; Additionally, reliability can help you group questions together that make the survey more meaningful to the respondent.
- Test popular theories: Reliability analysis has helped me show that the likelihood to recommend question (NPS) measures the same thing as overall satisfaction and likelihood to buy again. – see Customer Loyalty 2.0. That is why the NPS is not better than overall satisfaction or continue to buy in predicting business growth.
- Evaluate newly introduced concepts into the field: One of my biggest pet peeves in the CEM space is the introduction of new metrics/measures without any critical thinking behind the metric and what it really measures. For example, I have looked into the measurement of customer engagement via surveys and found that these measures literally have the exact same questions as our traditional measures of customer loyalty (e.g., recommend, buy again, overall satisfaction (please see Gallup’s Customer Engagement Overview Brochure for an example). Simply giving a different name to a set of questions does not make it a new variable. In fact, Gallup says nothing about the reliability of their Customer Engagement instrument.
The ultimate goal of reliability and validity analyses is to provide qualitative and quantitative evidence that the CEM program is delivering precise, consistent and meaningful scores regarding customers’ attitudes about their relationship with your company. Establishing the reliability and validating the CEM program needs to be one of the first research projects for any CEM program. A validation study of the CEM program builds confidence that such decisions lead to improved customer loyalty and business success. Validation efforts described here provide an objective assessment of the quality of the program to ensure the data reflect reliable, valid and useful customer feedback.
References (if you have a desire to learn about important measurement principles or suffer from insomnia)
Allen, M.J., & Yen, W. M. (2002). Introduction to Measurement Theory. Long Grove, IL: Waveland Press.
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16(3), 297-334.