Students Prefer Courses and Professors with Better Evaluations
Many colleges and universities use course evaluation systems, through which students evaluate a course while or after attending it. The results of these evaluations are often provided to students to assist them in class selection. I use data from Northwestern University to determine the year-to-year elasticity of enrollment in professor-course pairings with respect to a variety of course evaluation measures. This allows me to identify both the properties of class evaluations which predict, and perhaps motivate, student enrollment, and how changes in class evaluations are intercorrelated. I reach four conclusions: changes in all course evaluation variables are highly positively correlated; increases in enrollment are strongly predicted by improvements in reported course quality; reductions in intellectual challenge predict significant increases in enrollment for lower-level classes; and changes in reported student time investment are not significant predictors of changes in enrollment.
By “course,” I mean a topic which has a departmental number and is offered as a class. For example, ECON 311: Macroeconomics is a course. By “class,” I mean a specific instance of a particular course, taught by a particular professor during a particular quarter. Professor Adam Smith teaching ECON 311: Macroeconomics during the Fall Quarter of 2011 is a class. Students enroll in classes, not courses. Students evaluate classes, not courses. However, because the class will never be repeated in precisely the same way, the assumption is that the CTEC evaluations shed light on properties of the course, professor, or professor/course pairing. Because this paper appears in the Northwestern Undergraduate Research Journal, I achieve some brevity by assuming the reader is familiar with Northwestern-specific terminology and the layout of the CTEC course evaluation interface.
I analyze how enrollment shifts in response to evaluation shifts when course, professor, and quarter are all held constant. I exploit the fact that at Northwestern, many courses are frequently taught by the same professor in the same quarter. I focus on sets of three classes, where the years vary, but the professor, quarter, and course are held constant. By holding all these variables constant, I can construct a model which yields robust results:
E_t / E_{t-1} = a_0 + a_1 (C_{1,t-1} / C_{1,t-2}) + ... + a_n (C_{n,t-1} / C_{n,t-2}) + i.year + i.quarter
Where E is enrollment, C_n is the mean response for a given category of evaluation, and i.year and i.quarter represent fixed effects for year and quarter, respectively. Each year is treated as a distinct time period. The main problem with this model is that, by focusing on classes which exist in repetitive triplets, I necessarily skew and shrink the sample. Shrinking the sample is not a serious problem because the university offers so many classes. Thus, while the sample is significantly reduced, this does not prevent the discovery of significant correlations. In fact, most of the results to which I later refer as significant have a p-value well below 0.01.
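As a concrete illustration, the specification above can be sketched in Python with statsmodels. This is a minimal sketch on synthetic data: the variable names, the single evaluation category, and the data-generating process (a true elasticity of 0.5 plus year and quarter effects) are all hypothetical, not the actual CTEC data.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for the paper's data: one row per professor-course-quarter
# triplet, with year-over-year growth ratios. Hypothetical, for illustration.
rng = np.random.default_rng(0)
n = 300
years = rng.choice([2010, 2011, 2012], size=n)
quarters = rng.choice(["Fall", "Winter", "Spring"], size=n)
quality_growth = rng.uniform(0.8, 1.2, size=n)  # C_{1,t-1} / C_{1,t-2}

# Assumed data-generating process: enrollment growth responds to last year's
# change in reported Quality with elasticity 0.5, plus fixed effects.
year_fe = {2010: 0.00, 2011: 0.05, 2012: -0.03}
quarter_fe = {"Fall": 0.00, "Winter": 0.02, "Spring": -0.01}
enroll_growth = (
    0.5 * quality_growth
    + np.array([year_fe[y] for y in years])
    + np.array([quarter_fe[q] for q in quarters])
)

df = pd.DataFrame({
    "enroll_growth": enroll_growth,  # E_t / E_{t-1}
    "quality_growth": quality_growth,
    "year": years,
    "quarter": quarters,
})

# C(...) expands year and quarter into fixed-effect dummies.
fit = smf.ols("enroll_growth ~ quality_growth + C(year) + C(quarter)",
              data=df).fit()
print(round(fit.params["quality_growth"], 2))  # recovers the assumed 0.5
```

With noiseless synthetic data the regression recovers the assumed elasticity exactly; in the actual analysis, each evaluation-category ratio enters as an additional regressor.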
As for skewing the sample, this is a potential problem both regrettable and unavoidable. One especially difficult limitation in this type of analysis is survivorship bias. My results apply to classes that are repeated, and not to those that are canceled due to either unpopularity or the expectation of low enrollment. In theory, this survivorship bias would manifest itself in the form of a significant positive constant change in enrollment. However, no such significant positive constant change in enrollment is found in the regressions. Thus, the largest potentially sample-skewing phenomenon is not so strong as to invalidate the results by creating illusory trends.
Predictive and Descriptive Value of Class Evaluations
Throughout this paper, I allude to the ‘actual properties’ of a class. This idea is best explained through an example. The average amount of time that students spend on a class outside of lecture and lab hours is a quantifiable, objective fact. However, there is no reason that this number must be close to the average amount of time that students report spending on that class outside of lecture and lab hours: students might have psychological incentives to over-report or under-report the number of hours they spend on the course. They may also be very poor estimators of the number of hours they spend on the course, and the students who fill out the course evaluation might represent a highly skewed sample of all students who take the class.
Class evaluations might be inaccurate, and changes in class evaluations may not reflect actual changes in the student experience of a class. Until a study proving otherwise is produced for each of the course evaluation properties examined in this paper, I treat changes in class evaluations as predictive, rather than descriptive. One must avoid excessive speculation about any potential correlation between changes in class evaluations and changes in the actual properties they are meant to reflect. However, when backed by data, I propose ways that students may perceive these potential correlations. If changes in course evaluations had neither descriptive value nor perceived descriptive value, they would not predict enrollment, because students would not act in response to them.
Correlations Between Course Evaluation Shifts
Before studying the potential predictive value of shifts in course evaluation properties for shifts in enrollment, I examine the relationships between simultaneous shifts in course evaluation properties. I conduct 30 regressions, regressing each shift in evaluation means against each other shift in evaluation means, with year and quarter as control variables. Each of my regressions thus takes this form:
C_{x,t} / C_{x,t-1} = a_0 + a_1 (C_{y,t} / C_{y,t-1}) + i.year + i.quarter
Where C_{x,t} is the mean response for one evaluative category in time t and C_{y,t} is the mean response for another evaluative category in time t. The major finding is that, with control variables, changes in every evaluation summary variable are positively correlated with changes in every other evaluation summary variable, with significance at the 99 percent level. The correlation levels are shown in Table 1 and plotted in Graph 1.
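The battery of pairwise regressions can be sketched as a loop over ordered pairs of categories. The six category names and the synthetic co-movement (a shared component plus noise) below are assumptions for illustration, not the actual CTEC categories or data.

```python
import itertools

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical names for six evaluation summary categories; synthetic shifts
# that co-move through a shared component, for illustration only.
rng = np.random.default_rng(1)
n = 200
cats = ["instruction", "course", "learning", "challenge", "interest", "time"]
shared = rng.normal(1.0, 0.05, size=n)
df = pd.DataFrame({c: shared + rng.normal(0.0, 0.02, size=n) for c in cats})
df["year"] = rng.choice([2010, 2011, 2012], size=n)
df["quarter"] = rng.choice(["Fall", "Winter", "Spring"], size=n)

# One regression per ordered pair of categories: 6 * 5 = 30 regressions,
# each with year and quarter as control variables.
results = {}
for x, y in itertools.permutations(cats, 2):
    fit = smf.ols(f"{x} ~ {y} + C(year) + C(quarter)", data=df).fit()
    results[(x, y)] = (fit.params[y], fit.pvalues[y])

print(len(results))  # 30
```

Because the synthetic shifts share a common component, every pairwise slope comes out positive and highly significant, mirroring the pattern reported in Table 1.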
As the reader can observe, whenever one element of course evaluation increases, every other element of the course evaluation significantly increases as well. We can make some inferences as to why this is, which would be useful in seeking to understand how students respond to changes in these evaluation figures. One possibility is that changes in each of these factors represent highly similar, or highly related changes in the actual properties of the class. For example, it may be very unlikely that students learn more without the quality of instruction increasing, or that the time that students spend on the class increases without the class also becoming more intellectually challenging. This is the most intuitive result of these regressions, as it rests on the reasonable assumption that a more challenging course, requiring more time from students, might also teach them more, better command their interest, and be looked upon fondly by the students.
However, another possibility is that students treat each course evaluation question not only as an opportunity to remark upon the properties of the class specific to that question, but also as an opportunity to remark upon the overall qualities of the course. Put differently, a student might be so happy with a class that she gives it high scores for every category, even if it was not the most intellectually challenging or time-demanding class she has taken. This might also be because students are aware that professors can benefit from good teaching evaluations, and so they might wish to give professors overall “high marks” or “low marks.”
It should also be noted that shifts in Time are not highly correlated with shifts in any other variable. This includes shifts which one might reasonably expect to be correlated with shifts in the amount of time occupied by the course, including Challenge and Learning. This may indicate that changes in the Time variable correlate poorly with changes in the actual properties of the class.
Predictivity of Evaluation Shifts
I regress changes in enrollment against potentially predictive shifts in course evaluations. This is the core of the paper, as it identifies which course evaluation criteria are actually critical in affecting or predicting student decision-making. The results are shown in Table 2.
According to the regression, when simultaneously analyzing all shifts in evaluation criteria, only a change in Quality is statistically meaningful in predicting changes in enrollment.
This should not be confused with a result indicating that other shifts are statistically insignificant in their capacity to predict shifts in enrollment. Rather, this result shows that other shifts are insignificant if they are not accompanied by the corresponding increase in Quality, calculated in Table 1. Changes in Quality are the key predictors, and potentially the key drivers, of student enrollment. Changes in other evaluation criteria are important to the extent that they drive changes in Quality.
This might be considered an intuitive consequence of having a course evaluation criterion that simply describes the quality of the course. Quality is the value that the course possesses for those who might potentially take it. Other evaluation criteria are useful only if different individuals experience quality differently enough that the single measure is insufficient. The effectiveness of Quality as a predictor, well above that of other variables, indicates that students who use course evaluations to consider taking a course generally agree on what constitutes the quality of a course. However, this does not indicate that the entire student body has a shared concept of course quality. Instead, it indicates that if a student is considering taking a class and determining whether it will be of high quality, she will have a similar conception of quality as others who are considering the class. The average student considering enrollment in Cost/Benefit Analysis for Banking and Investing may have a very different idea of course quality from the average student considering enrollment in Post-Decolonization Poetics and its Discontents. Quality is in the eye of the beholder. But for the average class, beholders tend to agree on what quality means, and they take it seriously.
This affirms a core assumption of those who implement course evaluations, which is that class quality is a measurable value which predicts and describes the average student's experience. Otherwise, there would be no value in asking students whether they were satisfied with the class because the answer would not provide insight to the instructor or the course. The significant correlation between enrollment shifts and predictive shifts in Quality shows that course evaluations measure something which is genuinely important to students.
Predictive Value of Intellectual Challenge Shifts, and Upper-Level Class Effects
Before embarking on this project, I was assured by numerous students and faculty that there would likely be a negative correlation between changes in intellectual challenge and changes in enrollment, implying that students attempt to avoid enrolling in classes that would be difficult. However, as can be seen above, the data does not fulfill those assurances; intellectual challenge shifts do not significantly predict enrollment shifts for the average class.
I use this as an opportunity to examine how these elasticities vary based on the level of the class described. I run these regressions with an interaction dummy term for whether the course is at the 300 level. The results are shown in Table 3.
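An interaction-dummy specification of this kind can be sketched as follows. The data-generating process here is a hypothetical illustration: rising Challenge depresses enrollment growth for lower-level classes, while an offsetting interaction cancels the effect for 300-level classes, echoing the pattern the text describes.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic illustration; variable names and coefficients are assumptions.
rng = np.random.default_rng(2)
n = 400
challenge_growth = rng.uniform(0.9, 1.1, size=n)
upper = rng.integers(0, 2, size=n)  # dummy: 1 if a 300-level class

# Assumed process: negative elasticity (-0.8) for lower-level classes,
# fully offset (+0.8) by the interaction for 300-level classes.
enroll_growth = 1.0 - 0.8 * challenge_growth + 0.8 * upper * challenge_growth

df = pd.DataFrame({
    "enroll_growth": enroll_growth,
    "challenge_growth": challenge_growth,
    "upper": upper,
})

# "a * b" in a patsy formula expands to a + b + a:b (the interaction term).
fit = smf.ols("enroll_growth ~ challenge_growth * upper", data=df).fit()
print(round(fit.params["challenge_growth"], 2),
      round(fit.params["challenge_growth:upper"], 2))  # -0.8 0.8
```

The main coefficient captures the elasticity for 100- and 200-level classes; the interaction coefficient measures how much of that elasticity disappears at the 300 level.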
One can observe two significant findings. The first is that enrollment is highly negatively elastic to shifts in the intellectual difficulty of the course. However, some effect significantly reduces this elasticity for 300-level classes, such that changes in the reported intellectual difficulty of 300-level classes do not predict changes in enrollment. Graph 2 plots the elasticity of enrollment to Challenge by class level, further displaying that 300-level classes exhibit a much weaker elasticity of enrollment to intellectual challenge.
To speak plainly, students enroll in less difficult classes when enrolling in basic coursework. However, this pattern does not persist in more advanced coursework. To an economist, it is logical that students seek to avoid difficulty, including challenging courses. We can speculate as to why the aversion to intellectual challenge depends on class level. One notion is that students derive value from difficulty in 300-level classes which they do not derive from 100- and 200-level classes. This value could be associated with signaling theory: students obtain a strong and valuable signal from succeeding in a difficult course relevant to their major, while even the most difficult low-level course provides very little signaling value. Another option is that the intellectual difficulty associated with 100- and 200-level classes differs in type from that associated with 300-level classes; for example, 100- and 200-level classes might mainly assess students through many short essays and tests, while 300-level classes assess students through fewer, but more difficult, essays and tests. Students might prefer the latter type of intellectual challenge to the former. Students might also be content with difficult coursework that relates to their major, but avoid difficult coursework in other fields; thus a student majoring in mathematics, taking 300-level classes to complete the requirements for the major, would accept a difficult 300-level math class but avoid a difficult 200-level history class.
This result appears to vindicate, at least in part, those who have predicted that course evaluations enable students to enroll in easier courses. At the very least, it implies that students attempt to enroll in easier courses when selecting lower-level coursework. One might imagine that access to online course evaluations facilitates this goal. However, I do not have a data set for an institution without a course evaluation system, so I will avoid claiming to know with any certainty how students would choose classes at such an institution. Perhaps students mainly choose classes based on word-of-mouth about intellectual challenge, and course evaluations serve a predictive, rather than a determinative, role in enrollment.
Predictive Elasticity of Time
A particularly interesting and puzzling finding is that changes in Time are simply not predictive of changes in enrollment. If anything, a change in Time is a positive predictor of changes in enrollment, both by itself and through its association with increases in Quality. Given that students have limited time and many potential opportunities, one might expect students to avoid classes that require a large time investment. We can speculate as to why this is not the case. One potential explanation is that students do care about the time invested in coursework, but only to the extent that it makes the class difficult, so that the regressions make it appear that students care solely about difficulty. However, this assumes that changes in difficulty are closely tied to changes in time. As shown in Table 1 and Graph 1, this is not the case: changes in difficulty are no more closely associated with changes in time than changes in any other factor are.
Another possible explanation is that students use the time data in making enrollment decisions, but rely on some summary of the data other than the average. The fact that students are not provided with the average of reported time investments might encourage them to use alternative measurements. Here, I create a new variable representing the percentage of students who responded in either of the two lowest time categories. I regress changes in enrollment against changes in the percentage of students who report spending zero to seven hours on the class every week. As shown in Graph 3 and Table 4, which depict the corresponding regression, changes in enrollment do not correlate with potentially predictive changes in that percentage either. Again, only Quality remains a significant predictor of enrollment shifts.
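The low-time share described above is straightforward to compute from the binned survey responses. A minimal sketch, in which the bin labels and counts are assumptions rather than the actual CTEC categories or data:

```python
import pandas as pd

# Hypothetical CTEC-style response counts per weekly-time bin.
bins = ["0-3", "4-7", "8-11", "12-15", "16+"]
counts = pd.DataFrame(
    [[12, 30, 20, 5, 3],
     [8, 25, 28, 10, 4]],
    columns=bins,
    index=["year t-1", "year t"],
)

# Share of respondents in the two lowest time categories (0 to 7 hours).
low_share = counts[["0-3", "4-7"]].sum(axis=1) / counts.sum(axis=1)
print(low_share.round(2).tolist())  # [0.6, 0.44]
```

The year-over-year ratio of this share then substitutes for the mean-based Time ratio in the enrollment regression.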
An alternative explanation is that students do not perceive a strong correlation between the times reported by other students and the time that they may personally invest in the class. In other words, while students actually place value on a class occupying less of their time, they do not believe that the course evaluation acts as an effective means of predicting time investment. Students may have reason to believe this, due to the relatively low correlation between shifts in the Time evaluation and shifts in any other evaluation. This may reflect that the Time evaluation does not significantly correlate with the actual properties of the class. As discussed above, shifts in Time may serve as such a weak signal of actual changes in the class that students ignore the variable altogether, focusing instead on alternative measures of class difficulty such as Challenge.
Yet another reason that students seemingly disregard the Time variable might be found in the literature on how people cognitively engage with data. Tal and Wansink found that the inclusion of graphs increased the persuasive power of claims, even when the graphs did not communicate or imply additional information (Tal & Wansink, 2014). The persuasive power of graphs was attributed to their perception as scientific, granting a scientific imprimatur to the corresponding claims.
Thus, we might develop a long list of potential reasons why the Time variable operates differently from other variables. All of these reasons are due solely to the way that the variable is treated in this survey or displayed to students:
- Students are asked to answer the Time question by selecting an interval of hours, rather than a relative rating between 1 and 6.
- Time is placed at the end of the survey results page.
- The mean response is not provided for Time, and students choose not to calculate or use it themselves.
- A bar graph is not provided for Time, making the statistic seem less trustworthy.
Analyzing the relative impact of each of these factors is not possible with the available data. However, administrators could address these concerns by altering how course evaluation data are displayed to students, either by standardizing the display of this variable or, better yet, by experimenting with different display formats for different students. For this reason, I do not believe that student apathy toward the Time variable is a highly generalizable result that applies to other institutions. However, these findings show that student aversion to time-costly classes is not significant in all instances.
My results are consistent with the possibility that students never look at course evaluation surveys, responding instead to class reputations reflected by the evaluations. However, my findings are also consistent with the possibility that students heavily incorporate course evaluation surveys into the course selection process, and with any intermediate use of the evaluations in selecting courses. All of these potential student behavior patterns would create data in which changes in course evaluation surveys act as an effective proxy for changes in the perception or reputation of a class.
However, these findings demonstrate significant predictive value in course evaluations. This implies that changes in the reputation or perception of a class result in changes to its enrollment, and that changes in course evaluation levels can predict enrollment shifts. As referenced earlier, this vindicates the course evaluation project by showing that course evaluations measure generalizable and useful information about professor-course pairings. It also suggests that course evaluations may be more complex than they need to be, and could be reduced to two questions: one covering overall quality and the other covering challenge. Course evaluations structured in this way would retain predictive value for enrollment.
The fact that students avoid lower-level classes predicted to be more challenging, but do not avoid more difficult upper-level classes, is also a key finding. The most obvious implication is a further affirmation and clarification of signaling theory: students benefit from signaling the ability to do very difficult work in their field of study, but not from signaling the ability to do somewhat difficult, but still introductory, work in other fields.
Alex Gordon is a writer at a law firm in Chicago, where he works with immigrant researchers to prepare letters of support for their visa petitions. He graduated from Northwestern University in 2017 with a degree in Economics, Mathematics, and Higher Education. His email is Alex@AlexNGordon.com and he can be found on Twitter at @AlexNGordon.
Bergstrand, Kelly, and Scott V. Savage. "The Chalkboard Versus the Avatar." Teaching Sociology 41, no. 3 (2013): 294-306. doi:10.1177/0092055x13479949.
Bordon, Paola, and Chao Fu. "College-Major Choice to College-Then-Major Choice." The Review of Economic Studies 82, no. 4 (2015): 1247-288. doi:10.1093/restud/rdv023.
Figlio, David, Morton Schapiro, and Kevin Soter. "Are Tenure Track Professors Better Teachers?" The Review of Economics and Statistics 97, no. 4 (2013): 715-24. doi:10.3386/w19406.
Fournier, Gary M., and Tim R. Sass. "Take My Course, "Please": The Effects of the Principles Experience on Student Curriculum Choice." The Journal of Economic Education 31, no. 4 (2000): 323-39. doi:10.2307/1183146.
Hansen, W. Lee, and Allen C. Kelley. "Political Economy of Course Evaluations." The Journal of Economic Education 5, no. 1 (1973): 10-21. doi:10.2307/1182830.
Heckman, James, John Eric Humphries, Paul Lafontaine, and Pedro Rodriguez. "Taking the Easy Way Out: How the GED Testing Program Induces Students to Drop Out." Journal of Labor Economics 30, no. 4 (July 2012): 495-520. doi:10.3386/w14044.
Manski, Charles F., and David A. Wise. "College Choice in America." Journal of Policy Analysis and Management 3, no. 2 (December 1984): 221. doi:10.4159/harvard.9780674422285.c9.
Marsh, Herbert W., Jesse U. Overall, and Steven P. Kesler. "Class Size, Students' Evaluations, and Instructional Effectiveness." American Educational Research Journal 16, no. 1 (1979): 57-70. doi:10.2307/1162403.
Mirus, Rolf. "Some Implications of Student Evaluation of Teachers." The Journal of Economic Education 5, no. 1 (1973): 35-37. doi:10.2307/1182833.
Tal, Aner, and Brian Wansink. "Blinded with Science: Trivial Graphs and Formulas Increase Ad Persuasiveness and Belief in Product Efficacy." Public Understanding of Science 25, no. 1 (2014): 117-25. doi:10.1177/0963662514549688.
Vlieger, Pieter De, Brian Jacob, and Kevin Stange. "Measuring Instructor Effectiveness in Higher Education." Education Next, 2016, 68-74. doi:10.3386/w22998.
Weinbach, Robert W. "Manipulations of Student Evaluations: No Laughing Matter." Journal of Social Work Education 24, no. 1 (1988): 27-34. doi:10.1080/10437797.1988.10672094.
Weinberg, Bruce A., Belton M. Fleisher, and Masanori Hashimoto. "Evaluating Teaching in Higher Education." The Journal of Economic Education 40, no. 3 (2002): 227-61. doi:10.1080/13562510252756424.