The second reason is fairness for the sake of the validity of grades in signaling competence and mastery of knowledge in specific subjects [ 34 ]. Grades are relevant signals in educational-related decisions [ 35 , 36 ] and thereafter in application processes, e.
Grades of overweight students can result from a mix of prejudice and objective ratings of competence, leading to the underestimation of their academic proficiency and potential for skill development. If this is the case, employers and institutions risk unintentionally disregarding qualified candidates. We investigate the potential grading bias using data from the German National Educational Panel Study on students attending the seventh grade [ 38 ]. In this way, we aim to contribute to the literature by using recent high-quality nationally representative data on lower secondary education in Germany.
The German context differs from the US context in two ways. First, German students are placed in educational tracks already in secondary education [see 35 for more information on the German tracking system]. Second, obesity is comparatively widespread in the US but less common in Germany. It could be that obesity is a less visible characteristic in the German context, leading to a stronger stigma as the society may be less aware of discrimination against overweight people [ 40 ].
The paper is organized in the following way: We set the basis for the research hypotheses by outlining results from the previous literature about grading bias towards overweight people. After presenting the German National Educational Panel Study data, variables and our methods, we report the empirical findings of the hierarchical ordinal logistic regression models.
The article concludes with a discussion of the results and possible implications for teachers and schools.

In a study by Neumark-Sztainer and colleagues [ 26 ], half of the school staff thought that overweight was self-caused. About one fifth believed that overweight people were less tidy and less successful at work. Teachers say that overweight students are burdensome to have in the classroom [ 27 ] and, for this reason, might judge them more severely than normal weight students.
Some studies reveal a relationship between student BMI and school grades: After controlling for sociodemographic factors, overweight students have a lower grade point average than normal weight students [ 42 ], a lower average grade in the main subjects [ 30 ], and are less likely to report having mostly higher grades [ 31 ]. However, these studies do not necessarily prove teacher bias as they did not control for student level of subject-specific competence in the statistical models.
Therefore, the lower grades possibly express a lower competence of overweight compared to normal weight students, as suggested by a few studies about brain and memory functioning [ 43 — 45 ]. However, only a very low number of studies find a connection between overweight and lower competence measured by standardized test scores in a real-world setting [ 42 , 46 ].
Mostly, competences do not differ between overweight and normal weight students after adjusting for sociodemographic characteristics [ 47 — 50 ]. Other studies, using American samples, go further and partly account for competences as an alternative cause of grade differences to discrimination.
Sabia [ 51 ] reports an association between a higher BMI and a lower grade point average, controlling for general intelligence, in a sample of white females aged 14 to 17. But holding general intelligence constant may not rule out differences in subject-specific competences, as students also need to have enough discipline and motivation to study [ 52 ].
MacCann and Roberts [ 53 ], controlling for competence in math and vocabulary, found a lower average grade in various school subjects among obese students compared to normal weight students in the US. Competence scores for science and social studies were, however, not included and could have still biased measures of grade differences. Research of this quality is not available for countries with a smaller share of overweight people in the population.
Despite some heterogeneity in the research findings, most works found that BMI is negatively related to academic achievement. Moreover, several studies report that teachers might have prejudice against overweight students. Following these insights, and regarding our research question of whether overweight and obese students are graded more severely, we expect that overweight and obese students get lower grades than normal weight students, once adjusting for subject-specific competence and other individual characteristics (Hyp. 1).
In this work, we focus on two key subjects in secondary education, namely German and mathematics. If discrimination against overweight students is just an inherent feature of the teachers in their role of graders, we could predict a similar degree of bias in both subjects.
If instead the way in which students are commonly tested and the standards usually adopted by the teachers play a role as well, we could expect some differences in the grading bias across subjects. In mathematics, most exams are based on exercises with a precise solution, whereas in German teachers assess students more often with oral exams and open-ended questions, in which subjective elements have more room to affect the evaluation process.
If these characteristics matter, we should expect that overweight and obese students are graded less generously than comparable normal weight students, especially in German and less so in mathematics (Hyp. 2).

It has been argued that obesity reduces the perception of femininity by others but not as much as it reduces the perception of masculinity [ 54 ]. In subjects related to reading and writing activities, for example English, femininity is a favorable attribute, whereas in mathematics it is often considered an unfavorable attribute. Obese girls are regarded as less feminine and have a smaller femininity advantage in English classes than other girls.
Consequently, in comparison to normal-weight girls, it is likely they face a specific penalization [ 9 ]. Indeed, previous studies in the US found that obese girls face a higher grading penalty compared to normal-weight girls in English than in mathematics. For boys, no association between obesity and academic performance was found at all [ 9 ]. Following this logic, we hypothesize that the weight penalty regarding grades is higher among girls than among boys, especially in German (Hyp. 3).

We make use of the German National Educational Panel Study (NEPS), Starting Cohort 3, which provides information on students attending grade 7 in , who are repeatedly interviewed in the subsequent years [ 38 ].
The longitudinal data set is used in a cross-sectional manner by analyzing data from wave 3, because the variables of interest are not measured repeatedly in every wave. Education- and school-related variables as well as competences and socio-demographics are assessed via paper-and-pencil interviews (PAPI) or provided by official student lists.
Information on the parents is collected using computer-assisted telephone interviews (CATI) [ 55 , 56 ]. We restricted the sample by excluding students without matches in the parent and institution data set. If we had performed a complete-case analysis, we would have ended up with 2, students. Since relying on listwise deletion would lead to underpowered and possibly biased estimates, we applied a multiple imputation (MI) procedure to impute the missing values [ 57 ], which allowed us to rely on an analytical sample of 3, cases.
Under the assumption of random missing values conditional on the covariates, MI is able to minimize the bias, maximize the use of available information and obtain appropriate estimates of uncertainty around the point estimates [ 58 , 59 ]. To impute the missing values, we followed the canonical three steps.

The variables come from several sub-data sets of the NEPS. The sub-data sets contain information from the student lists, the students and parents, and standardized competence tests.
In the case of students, all time-variant variables were measured in the third wave, while the time-invariant variables (gender, native language) were recorded in the first wave. The information on parents is measured once in the first wave and updated in subsequent waves in case of changes.
The grades in mathematics and German are self-reported by the students. Taking into consideration the highly skewed distribution of the original variable and to maintain enough statistical power for the analysis, we recoded the original grades into four categories: low (poor, failing, and passing), medium-low (satisfactory), medium-high (good) and high (very good).
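The recoding described above can be illustrated with a minimal sketch. It assumes the conventional German 1 to 6 grade scale (1 = very good, 6 = failing); the source only names the verbal labels, so the numeric coding and the names `GRADE_CATEGORY` and `recode_grade` are hypothetical.

```python
# Assumed numeric coding of German school grades (1 = very good ... 6 = failing).
# The source only gives the verbal labels; the numbers are an illustration.
GRADE_CATEGORY = {
    1: "high",         # very good
    2: "medium-high",  # good
    3: "medium-low",   # satisfactory
    4: "low",          # passing
    5: "low",          # poor
    6: "low",          # failing
}


def recode_grade(grade: int) -> str:
    """Collapse a 1-6 grade into the four ordered categories of the analysis."""
    if grade not in GRADE_CATEGORY:
        raise ValueError(f"unexpected grade: {grade!r}")
    return GRADE_CATEGORY[grade]


if __name__ == "__main__":
    for g in range(1, 7):
        print(g, "->", recode_grade(g))
```

Collapsing the three lowest grades into one category keeps every cell populated, which is the statistical-power argument made in the text.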
Alternative but reasonable classifications lead to substantially similar results to the ones presented here.

The charts provide BMI percentiles for children grouped by gender and age in months, measured in the middle of the months. We assigned the BMI percentiles to children with the respective age in months rounded down.
The BMI is calculated as weight in kilograms divided by the square of height in meters [ 63 ]. Height and weight were self-reported by the students. Biologically implausible values of weight, height, and BMI were excluded using an external reference table [ 64 ]. Following previous contributions [ 63 ], the variable BMI is classified into four qualitatively distinct weight types: (1) Underweight (lower than the 5th percentile); (2) Normal weight (BMI greater than or equal to the 5th percentile and lower than the 85th percentile); (3) Overweight (higher than or equal to the 85th percentile but lower than the 95th percentile); (4) Obese (higher than or equal to the 95th percentile).
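As a minimal sketch of the two steps just described, the code below computes the BMI and maps an age- and gender-specific BMI percentile to the four weight types. The percentile is taken as an input here; the lookup in the growth charts is not reproduced, and the function names are hypothetical.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kilograms divided by height in meters squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def weight_type(bmi_percentile: float) -> str:
    """Map a BMI percentile (0-100) to the four categories used in the text.

    The percentile is assumed to come from an external age- and
    gender-specific growth chart, which is not part of this sketch.
    """
    if not 0 <= bmi_percentile <= 100:
        raise ValueError("percentile must lie between 0 and 100")
    if bmi_percentile < 5:
        return "underweight"
    if bmi_percentile < 85:
        return "normal weight"
    if bmi_percentile < 95:
        return "overweight"
    return "obese"


if __name__ == "__main__":
    print(round(bmi(50.0, 1.60), 2))
    print(weight_type(90.0))
```

The boundaries are half-open on the right, so a student exactly at the 85th percentile is classified as overweight and one exactly at the 95th as obese, matching the wording of the classification.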
Indeed, the growth charts we used are age- and gender-specific; moreover, muscularity should not be a major concern for children younger than 16 years old.

We control for subject-specific competence as an alternative explanation, besides teacher bias, for lower grades.
Subject-specific competence is measured by weighted maximum likelihood estimates (WLE), which are corrected for the position of the test domain in the test book. The domains are mathematics and reading competence. A negative WLE score signals below-average competence, a score of 0 average competence, and a positive score above-average competence.

Aside from competences, other variables need to be accounted for because they can influence both grades and weight.
Children with a higher conscientiousness show less risky health-related behavior [ 70 ] and have higher grades [ 71 ]. In general, personality traits are linked to over- and underweight [ 72 ] as well as to academic achievement [ 73 ].
Furthermore, migration background can be a confounder, since children with a migration origin are more often overweight than other children [ 74 ] and they also have a lower academic performance [ 13 ]. We control for native language as a proxy for migration background, which is not available in the data.
Lastly, attitudes about learning and objective behaviors could be the reason why overweight students receive lower grades. To rule out that they put less effort into learning activities than normal-weight students, we control for attitudes using attachment to school and for objective behaviors using homework duration. The complete list of control variables and their description is presented in Table 1.

With this strategy it is possible to assess to what extent overweight and obese students receive on average different marks compared to their most similar normal weight peers, in terms of demonstrated performance in a specific subject and other relevant traits.
For each school subject we estimated two model specifications, which can be succinctly described as follows:

Model 2 includes possible mediators that can explain the total teacher grading bias.

In the last part of the analysis, we investigate whether the grading bias due to overweight and obesity is heterogeneous across boys and girls, by introducing an interaction term between gender (SEX) and BMI. For the sake of simplicity, we omitted the main effects of the interacted variables in the notation; they are included in all models to avoid biased estimates, as suggested by Brambor et al.
In this way, we allow the standard errors to be dependent within classes and schools, taking into account that students within the same cluster are more similar to each other than students in different clusters, as a result of exposure to similar contexts. The statistical models are estimated separately for German and mathematics, and in each model, the subject-specific test score is included as a control variable.
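To make the ordered logit link concrete, here is a minimal sketch of how a linear predictor is turned into probabilities for the four grade categories. It uses the standard cumulative-logit parameterization, P(Y <= k) = logistic(tau_k - eta); the thresholds and predictor values are hypothetical, and class- and school-level random intercepts are assumed to be already added into `eta`. It is not the authors' estimation code.

```python
import math


def logistic(x: float) -> float:
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def category_probabilities(eta: float, thresholds: list[float]) -> list[float]:
    """Return P(Y = k) for each ordered category, lowest category first.

    Cumulative probabilities follow P(Y <= k) = logistic(tau_k - eta), so a
    higher linear predictor eta shifts mass towards higher grade categories.
    """
    if any(a >= b for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    cumulative = [logistic(tau - eta) for tau in thresholds] + [1.0]
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)
        previous = c
    return probs


if __name__ == "__main__":
    # Hypothetical thresholds for low / medium-low / medium-high / high.
    taus = [-1.0, 0.5, 2.0]
    for label, eta in [("lower predictor", -0.5), ("higher predictor", 0.5)]:
        print(label, [round(p, 3) for p in category_probabilities(eta, taus)])
```

In this parameterization a negative coefficient on a weight category lowers `eta` and therefore raises the probability of the lowest grade category.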
Accounting for potential non-linearities in the relationship between the competence variables and the outcomes does not alter the results in a significant way. As described above, each model was estimated on each of the 50 datasets generated from our imputation model, and the estimates were then combined in order to take into account the variability between the estimates across the imputed data.

Looking at the socio-demographic characteristics, we see that overweight and obese students are more often male, come more often from socio-economically disadvantaged families and are more likely to have a non-native background.
This could be related to the fact that, as we will show below, obese students have lower academic performance than normal weight ones. In terms of psychological traits, overweight and obese students display lower levels of extraversion, conscientiousness, agreeableness, and openness, and higher levels of neuroticism than normal weight students. However, given the relatively small share of obese students and the fact that some of these differences are rather small, the only statistically significant differences are found for extraversion and conscientiousness.
By contrast, students with different BMI do not differ significantly in terms of attachment to school and time spent on homework.

The upper-left graph shows kernel density estimates of the distribution of test scores in German, the upper-right graph in mathematics.

The patterns are similar across the two subjects: the incidence of children who received a low grade is largest among obese students, decreases but stays rather high among overweight students and is lower among normal weight students.
The opposite pattern is found when looking at high grades, but in this case the difference between normal weight and obese students is more pronounced than the difference between normal weight and overweight students.

To correct for the missing values on relevant covariates, we conducted the analysis on 50 multiply imputed datasets. The complete models are reported in S3 Table. However, since the two specifications lead to only minor differences in the estimated coefficients, in our comments we will focus on the results of the fully specified model.
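Combining estimates across multiply imputed datasets is commonly done with Rubin's rules. The source does not state the exact pooling routine, so the sketch below is an assumption about the general technique, with made-up numbers and a hypothetical function name.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates: list[float], std_errors: list[float]) -> tuple[float, float]:
    """Pool one coefficient across m imputed datasets with Rubin's rules.

    Returns the pooled point estimate and its pooled standard error.
    """
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need at least two estimates and matching standard errors")
    pooled = mean(estimates)
    within = mean([se ** 2 for se in std_errors])  # average sampling variance
    between = variance(estimates)                  # variance across imputations (m - 1)
    total = within + (1.0 + 1.0 / m) * between
    return pooled, math.sqrt(total)


if __name__ == "__main__":
    # Made-up coefficients and standard errors from three imputed datasets.
    est, se = pool_rubin([0.10, 0.20, 0.30], [0.10, 0.10, 0.10])
    print(f"pooled estimate = {est:.3f}, pooled SE = {se:.3f}")
```

The between-imputation term is what widens the pooled standard error relative to any single completed dataset, reflecting the uncertainty introduced by the missing values.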
The graph shows log-odds ratios from the 3-level hierarchical ordered logit model on mathematics grades (left) and German grades (right). Analysis conducted on 50 multiply imputed datasets.

Conversely, overweight and obese students receive on average lower grades from their teachers than normal weight students with the same level of subject-specific competence.
The disadvantages of obese students are larger than those of overweight students and are slightly more pronounced in German than in mathematics. We see that, on average, overweight and obese students are more likely to receive low and medium-low grades, whereas they have lower chances to obtain high or medium-high grades compared to equally competent normal weight students with analogous socio-demographic characteristics, psychological traits, and school-related attitudes and behavior.
The graph shows average partial effects from the 3-level hierarchical ordered logit model on mathematics grades (upper part) and German grades (lower part).

The largest differences among social groups are found at the lower end of the grade distribution and for medium-high grades. For instance, obese students have on average a probability of receiving a low grade from their teachers that is 8 (mathematics) to 9 (German) percentage points higher than that of equally competent normal weight students.
They also have a lower probability of receiving medium-high (7 to 12 percentage points) and high (3 to 4 percentage points) grades than comparable normal weight students. Overweight is also associated with a penalization in grading, but its magnitude is smaller, ranging between 2 and 5 percentage points. From a qualitative point of view, the estimated differences appear to be larger in German than in mathematics, although the confidence intervals around the point estimates overlap to a large extent, making it difficult to state that the effect size across subjects actually differs in the reference population.
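The percentage-point figures above are average partial effects. As a rough illustration of the idea, the sketch below averages, over a set of students, the change in the predicted probability of a low grade when an obesity coefficient is added to each student's linear predictor. All numbers and names are hypothetical; this is not the authors' code and ignores the multilevel structure.

```python
import math


def logistic(x: float) -> float:
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def prob_low_grade(eta: float, first_threshold: float) -> float:
    """P(grade = low) in an ordered logit: logistic(tau_1 - eta)."""
    return logistic(first_threshold - eta)


def ape_low_grade(etas: list[float], beta_obese: float, first_threshold: float) -> float:
    """Average partial effect of obesity on the probability of a low grade.

    etas holds each student's linear predictor as a normal weight student;
    beta_obese is the (log-odds) coefficient that is added for obese students.
    """
    if not etas:
        raise ValueError("need at least one student")
    diffs = [
        prob_low_grade(eta + beta_obese, first_threshold)
        - prob_low_grade(eta, first_threshold)
        for eta in etas
    ]
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    # Hypothetical linear predictors and coefficient, not estimates from the paper.
    etas = [-1.0, -0.2, 0.0, 0.4, 1.3]
    ape = ape_low_grade(etas, beta_obese=-0.5, first_threshold=-1.0)
    print(f"APE on P(low grade): {100 * ape:.1f} percentage points")
```

Because the logistic function is non-linear, the same coefficient produces different probability changes for different students, which is why the effect is averaged over the sample rather than read off the coefficient directly.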
We estimated additional models in which we included an interaction term between the BMI categories and test scores, with the aim of assessing whether the grading bias differs among students with higher or lower levels of academic proficiency (see S7 Table).
As a last step, we investigate whether the penalty in grading associated with being overweight differs by gender, as found by a previous study in the US [ 9 ]. Given the small number of obese students, for this analysis we were forced to merge the categories of overweight and obese students together, in order to gain statistical power.
In the last two rows we provide an estimate of the difference between these two effects, its corresponding standard error and information on whether it is statistically significant at different levels of confidence. Fig 4 (based on S6 Table) reports average predicted probabilities of obtaining a low grade by BMI and gender, derived from the same models. The results are straightforward and in contrast to previous evidence from the US.
Being overweight or obese is not related to any penalization in grading among female students. In mathematics, the qualitative pattern differs to some extent: although we detected statistically significant effects of being overweight only among males (ranging between 2 and 4 percentage points), the effect size among females is very similar.
The differences in weight penalty by gender are not statistically significant in mathematics. From previous studies we knew that overweight students get, on average, lower grades than their peers in school [ 30 , 31 ] but it was not clear if the differences in grades stem from discrimination or from actual differences in competences.
One should expect that students with the same demonstrated competence in a given subject, and same school-related attitudes and behavior will get similar grades from their teachers if fair grading is in place. We asked if lower mathematics and German grades are attributed to overweight compared to normal weight students, even when they have the same level of subject-specific competence, main psychological traits, and school-related attitudes and behavior.
The focus on students in German lower secondary education allows us to concentrate on a key developmental stage within an underexplored country context, in which obesity could lead to stronger stigma due to the comparatively lower incidence of this phenomenon. By means of hierarchical ordinal logistic regression models we have shown that both overweight and obese students receive on average lower grades than comparable normal weight students, thus corroborating our first hypothesis.
The penalization is not only statistically significant, but also substantial. Indeed, the effect size of being obese rather than normal weight is larger than that of being male against being female, a comparison that has attracted much more attention among scholars [e.
It is important to note two additional aspects.

Empirical evidence is instead less clear in supporting our second hypothesis, which states that the grading penalty should be larger in German than in mathematics. According to common wisdom, math teachers are expected to base decisions on grades on a certain score that is linked to pre-determined correct solutions.
Consequently, one might argue that math exams leave less room for interpretation regarding which grades the students should receive. Apart from that, teachers may set a variety of exams in the subject German, with differing scope for interpretation of whether a result is right or wrong. While dictation exercises are solved either correctly or incorrectly, there is no unique correct answer in essays and text comprehension exercises, and therefore the grade could incorporate subjective sources of bias.
We found that, from a qualitative point of view, the effect sizes associated with being obese or overweight across subjects follow the expected pattern. From a statistical inference point of view, however, the confidence intervals around the point estimates are rather large and overlap to a large extent, thus not supporting the hypothesis of differential effects across subjects.
Additional insights on the heterogeneity in the penalization of overweight and obese students come from the second analysis, in which we investigate whether gender moderates the effect of BMI. Following theoretical arguments on the role of femininity in contemporary Western societies and previous findings from the US, we expected a larger detrimental effect of obesity on grades among females.
Differently from what was observed in the US, in Germany it seems that body appearance linked to negative stereotypes amplifies the stricter grading standards to which males are already exposed in comparison to females. If the two stereotypes of boys being less diligent than girls [ 79 ] and overweight people being less disciplined than normal weight people are simultaneously present, this can lead to a double disadvantage for overweight boys.
For overweight girls, in contrast, the gender stereotype of being more diligent can counteract the weight stereotype of being less disciplined, leading to lower bias. The latter might be especially relevant in a subject like German, where femininity is a favorable attribute. Before concluding with policy recommendations, it is important to discuss some limitations. First, the grades are self-reported which means that the outcome variable can suffer from measurement error if the students intentionally or unintentionally reported wrong grades.
Yet, we do not believe this issue should be a major concern, for two reasons. First, although especially students with unfavorable grades might have social desirability biases and report a better grade to increase their self-esteem, self-reported grades were found to be very similar to grades in the report cards in previous studies [ 80 ]. If this is the case, our findings would be based on conservative estimates.
The second limitation lies in the cross-sectional nature of the study, which does not allow us to disentangle whether overweight causes lower grades or whether lower grades cause overweight by leading to stress eating [ 51 ]. Nevertheless, we believe that even if there are reciprocal effects, on theoretical grounds it is much more plausible that the largest effect is of BMI on grades and not vice versa.
Furthermore, additional models in which we adjusted for life satisfaction to rule out extreme stress provide similar results. The last limitation refers to the main assumption behind the grade-equation models, which posits that standardized test scores act as a good proxy for latent subject-specific competences. This assumption could be questioned if some unobserved factors associated with student BMI also affect the idiosyncratic performance in the standardized test.
We are not aware of specific studies that try to tackle this aspect in depth. Despite these limitations, we believe our study contributes to the literature by showing that overweight students have lower grades in German, which can be attributed neither to lower subject-specific competence nor to psychological characteristics or key school-related attitudes and behavior.
If institutions use grades as information on the competence of an applicant [ 82 , 83 ], they should not contain anything else than academic competence in the given subject to be a valid source [ 10 , 83 ]. But in reality, grades include other more subjective measures besides competences [ 10 ], and this can penalize students with the same competences but lower grades.
To decrease the influence of discrimination and physical appearance on grades, we point towards a variety of educational policies at different levels of intervention. At the meso- and micro-level, our findings point towards possible interventions on the grading policies and practices adopted by schools and teachers. These criteria should be stated in class and include what the students are expected to achieve and what the proof of achievement looks like [ 11 ].
This could enable them to detect and counteract discriminatory behavior. Although awareness training in the context of grading has not been researched so far, there are hints that raising awareness might lower bias in general: it was found that people who believe they are probably biased against women actually judge them in a less biased way [ 84 ].
Not only for overweight students, but also for other discriminated minorities, external and fixed standards as well as increased awareness of possible discrimination might offer the opportunity to reduce inequalities of opportunities in the educational context.

Abstract Discrimination and prejudice against overweight people is common in Western societies.

Funding: The author(s) received no specific funding for this work.
Introduction

Discrimination against overweight people in the labor market and in social life is well documented and increased over time [ 1 , 2 ].

Following these insights, and regarding our research question of whether overweight and obese students are graded more severely, we expect that overweight and obese students get lower grades than normal weight students, once adjusting for subject-specific competence and other individual characteristics (Hyp. 1).

If these characteristics matter, we should expect that overweight and obese students are graded less generously than comparable normal weight students, especially in German and less so in mathematics (Hyp. 2).

Following this logic, we hypothesize that the weight penalty regarding grades is higher among girls than among boys, especially in German (Hyp. 3).

Analytical design

Data

The other variables (age, gender, native language, school region, and type of secondary school attended) do not present any missing values and are therefore used in the imputation models as additional predictors.
Considering the amount of missing values and following recommendations from recent literature [ 61 ], we generated 50 imputations (completed datasets) under our chosen imputation model.

Fig 1. Fig 2. Fig 3.

Table 2.

Fig 4. Predicted probabilities of obtaining a low grade by BMI and gender.

Supporting information. S1 Table. S2 Table. S3 Table.
S4 Table. S5 Table. Average partial effect APE estimates reported in Fig 3. S6 Table. Average partial effect APE estimates reported in Fig 4. S7 Table. References 1. Changes in perceived weight discrimination among Americans: — through — Getting worse: The stigmatization of obese children. Obes Res. The role of automatic obesity stereotypes in real hiring discrimination.
J Appl Psychol. Obesity discrimination: The role of physical appearance, personal ideology, and anti-fat prejudice. Int J Obesity. Associations between overweight and obesity with bullying behaviors in school-aged children. Social marginalization of overweight children.

What if students were made aware of their potential biases, the authors wondered?
To test their idea, the authors conducted an experiment in pairs of large introductory courses in biology and American politics at Iowa State University last spring. All four sections were taught by white professors, allowing the researchers to eliminate effects from confounding racial biases. One section in each field was taught by a man and the other by a woman. Students were randomized within courses, not across courses, to receive different evaluation formats so that professors could be compared to themselves, not other professors who may actually be better teachers.
Unlike the standard evaluation, the treatment evaluation included the following language, which the researchers expected would mitigate gender biases:

Student evaluations of teaching play an important role in the review of faculty. Your opinions influence the review of instructors that takes place every year. Women and instructors of color are systematically rated lower in their teaching evaluations than white men, even when there are no actual differences in the instruction or in what students have learned. As you fill out the course evaluation please keep this in mind and make an effort to resist stereotypes about professors.

Among other questions, every student involved in the study was asked three questions on a five-point scale: an overall evaluation of teaching, a rating of teaching effectiveness, and an overall evaluation of the course.
Students were also asked about their gender, as is standard for Iowa State evaluations, allowing the researchers to examine that, as well. The authors guessed, based on existing literature, that male students would be more biased against female instructors than female students would be.
The authors controlled for students' expected grades in a course. What happened? The language seemed to have a small but significant, positive effect for female faculty members on all three questions -- and no effect for men. The answers to the overall evaluation of teaching were 0.
The difference in the means for the teaching effectiveness question were 0. For the overall evaluation of the course, the treatment condition was 0. There was some evidence of an effect for male students rating female professors on overall rating of the course and the instructor, but not teaching effectiveness. A more advanced analysis reduced this effect, however. There is no evidence of a similar effect on the evaluation of male instructors. Given the outsized role SET play in the evaluation, hiring and promotion of faculty, the possibility of mitigating this amount of possible bias in evaluations is striking.
A note of caution, however. The working paper on that topic involves professors granted tenure over 11 years at the University of Colorado at Boulder. Together, the professors taught more than 6, courses to undergraduates and graduate students, and their many SETs of course included proxies for teaching effectiveness -- namely overall ratings for course and instructor.
All such questions are subjective and self-reported by students, the paper notes. The authors were most concerned with the possibility that professors teach different courses after tenure. David A.
First, I do a permutation test where I randomly assign stereotypes to teachers. Second, I restrict the data set to classes where assignment to peers is statistically independent for all student characteristics by gender. A potential concern is that IAT scores may be affected by exposure to the same cohorts of students. Indeed, the IAT is expected to be the combination of a trait stable over time, capturing individual stereotypes, and occasion-specific variation and noise that may be affected by conditions while taking the test, and stimuli received by the subject in the period right before the test.
This figure shows the timeline of data collected for the three cohorts of students. They graduated from middle school between and Teachers were surveyed between October and March Standardized tests are administered at the end of grade 8. Reverse causality seems unlikely for several reasons.
I can provide supporting evidence against this issue by showing that the results are unchanged when I restrict the sample to the last cohort of students, who graduated after their teachers took the IAT; results are presented in Section V.
Furthermore, teachers with more stereotypes are not systematically assigned to students with different characteristics, such as family background and standardized test scores in math see Table IV and Online Appendix Table A.
Teachers included in our analysis have been teaching for 20 years on average (with a median of 22 years), and over that time they were exposed to hundreds of students. I document the impact of math and literature teachers in Panels A and B, respectively. By the age of 14, girls lag 0. This table reports OLS estimates of equation 1, where the dependent variable is the math or reading standardized test score in grade 8 in Panels A and B, respectively, and the explanatory variables refer to math teachers in Panel A and literature teachers in Panel B.
Standard errors in parentheses are robust and clustered at the teacher level in Panel A and in Panel B. Individual controls include education of the mother, occupation of the father, immigrant dummy, generation of immigration, and their interactions with the gender of the student. Ceteris paribus, female students assigned to female teachers have slightly albeit insignificantly higher math performance in grade 8 compared with their classmates.
However, other studies find that having a teacher of the same gender helps improve performance, especially at the college level (Dee; Carrell, Page, and West). In Online Appendix Table A.VI, I split the sample by teacher gender. However, what seems to matter the most is whether the teacher has gender stereotypes. To give a clearer interpretation, Online Appendix Table A. The gender gap in the classroom is around 0. Are teachers with stronger stereotypes worse instructors or are they helping boys learn math?
I investigate the effect of teacher bias by estimating equation 2 directly, comparing students of the same gender within the same school and cohort but assigned to different classes. The linear approximation presented in Table V seems to adequately represent the data. Table VI, column 5 mirrors Figure III: it presents the results of the regression analysis and shows that girls are lagging behind when assigned to teachers with stronger implicit associations, while boys are not affected by teacher stereotypes.
The results are robust to the inclusion of the controls as in Table V. In this specification, the characteristics of teachers are not absorbed by class fixed effects and therefore controls at the teacher level are particularly relevant and column 5 is the preferred specification.
This figure shows the effect of teacher stereotypes on student achievement by gender. The variable on the y -axis is the residualized standardized test score in grade 8, after controlling for school by cohort fixed effects, and student- and teacher-level controls. The variable on the x -axis is the raw IAT score. A higher value of implicit bias indicates a stronger association between scientific-males and humanistic-females.
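The residualization behind the figure's y-axis can be sketched as follows. This is a simplified illustration, not the paper's code: it removes only group means (the within transformation for school-by-cohort fixed effects) and leaves out the student- and teacher-level controls. The function name `residualize` and the cell labels are hypothetical.

```python
from collections import defaultdict


def residualize(values, groups):
    """Subtract each group's mean from its members (within transformation).

    `groups` holds one label per observation, e.g. a school-by-cohort cell.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for value, group in zip(values, groups):
        totals[group] += value
        counts[group] += 1
    return [value - totals[group] / counts[group]
            for value, group in zip(values, groups)]


# Toy example with two hypothetical school-by-cohort cells.
scores = [0.2, -0.1, 0.5, 0.3]
cells = ["school1-cohortA", "school1-cohortA",
         "school2-cohortA", "school2-cohortA"]
residuals = residualize(scores, cells)
```

After this step, the residuals average to zero within each cell, so any remaining variation comes from differences across classes within the same school and cohort.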
The regression includes student and teacher controls. This table reports OLS estimates of equation 2 , where the dependent variable is math standardized test score in grade 8. The number of fixed effects school by cohort is I examine which students are the most affected by teacher stereotypes, considering their background characteristics and the time of exposure to their teachers.
VIII shows that the effect of implicit stereotypes is stronger for female students who started middle school at the middle or lower end of the initial ability distribution. Why do girls with a lower initial level of ability suffer the most from interacting with biased teachers?
Indeed, male students are not influenced by teacher stereotypes and, among girls, those most strongly affected have lower initial math achievement and are at higher risk of confirming the negative expectations of their group. Online Appendix E presents a conceptual framework that illustrates how teacher stereotypes can differentially affect effort and outcomes of students at the bottom and the top of the ability distribution.
To investigate this further, I analyze the differential effect according to the quantity of interaction time between a teacher and their students. The last two columns of Online Appendix Table A.VIII analyze whether there are heterogeneous effects in terms of years of exposure and hours each week. After three years of exposure, girls are lagging 0. Consistent with this result, Online Appendix Table A. I exploit two different samples.
First, I use test score in grade 6, administered a few months after assignment to middle school teachers columns 1 — 3 and collected only up to —13 and reported only for those teachers who took the test in Second, I exploit the fact that some classes were assigned to a new teacher at the beginning of grade 8 columns 4 — 6.
In both cases, the point estimates are indistinguishable from 0. X shows that girls are around 2. Second, in Online Appendix Table A. XI, I show the effect of the main specification presented in Table V separately by cohorts of students who graduated before teachers took the IAT school years — and for the cohort of students who graduated after teachers took the IAT school year Reassuringly for the potential reverse causality concerns expressed in Section IV. C , results are statistically indistinguishable and the point estimate is larger for the last cohort of students.
Third, Online Appendix Figure A. In 5 out of 1, permutations, I find a coefficient smaller than the one in Table V. Finally, in Online Appendix Table A. XII, I restrict the sample to schools by cohorts where Pearson chi-square tests suggest statistical independence of all student characteristics gender, education of the mother, occupation of the father, immigrant dummy, generation of immigration and of all student characteristics by gender.
In this additional robustness check, results are also not affected. Girls outperform boys in reading by 0. Table V, Panel B focuses on the impact of literature teacher stereotypes on reading performance. Although the point estimate is negative, the gender stereotypes of literature teachers do not statistically significantly affect this gap. Online Appendix Table A.VI, Panel B shows that the negative point estimate is mainly driven by male teachers, but even for this subsample of teachers, the effect is not statistically significant at conventional levels.
XIII investigates the impact of teacher stereotypes, considering the implicit IAT of literature and math teachers and restricting the sample to those classes for which these scores are jointly available. The implicit stereotypes of literature teachers do not have a significant impact on math columns 1 — 4 or on reading standardized test scores columns 5 — 8.
Indeed, being assigned to a math teacher with stronger implicit stereotypes seems to have a negative, although indistinguishable from 0, effect on performance in reading, suggesting that female students do not simply substitute their effort in math for more effort devoted to studying literature.
There are several potential explanations for these results. First, it could be due to a measurement issue. Improvements in math may be easier to detect and measure on multiple choice tests. Standardized test scores in reading may be less elastic in capturing improvements during middle school, after basic literacy is completed, while standardized test scores in math may be closely related to specific learning during more recent school years.
A significant impact on math standardized test scores accompanied by no impact or a smaller impact in reading is a common result in the literature Bettinger ; Levitt et al. Furthermore, Gender-Science IAT scores do not allow one to distinguish between the stereotype that women are bad at math and men are bad at reading. If the former association is more salient, this test may be better at detecting stereotypes in the scientific field and therefore it may have higher predictive power for math.
Second, students may need less support and interaction with their math teachers compared to their literature teachers to perform well on the respective tests. Both math and literature teachers with stronger implicit associations may end up interacting less, in terms of quantity or quality, with students of the stigmatized group, which is consistent with findings on the role of interaction between managers and minority workers by Glover, Pallais, and Pariente. However, only girls may be negatively affected because in math the support and explanation of teachers may be crucial for learning.
Third, math skills are likely to be mainly taught in school, whereas reading is more likely to be supplemented by parents or other caregivers at home. Hence, teacher stereotypes may matter more in subjects almost exclusively taught by teachers versus other adults. This is disproportionately true for math performance, as I show in Online Appendix D.
High school track choice is the first crucial career decision in the Italian schooling system. There are three main types of high school: academic, technical, and vocational. Each family receives a formal letter from the school with the subtrack recommended by the teachers, mainly driven by math and literature teachers who interact the most with students at school.
Table II documents that girls are less likely than boys, on average, to be recommended to the vocational track (7 percentage points) and the scientific track (4 percentage points). Both math and literature teachers consider motivation and interest the most important factors, followed by grades given to students and involvement of parents in school activities.
Given that girls tend to have higher academic motivation, the evidence of fewer recommendations toward vocational school for girls is not surprising. In this section, I explore the impact of teacher stereotypes on track choice at the end of middle school using an ordered logit, and then I focus on the choice of the vocational track and scientific academic track using a linear probability model.
Table VII, Panel A reports fixed effects ordered logit estimates using the BUC estimator (Baetschmann, Staub, and Winkelmann), in which the dependent variable takes value 1 for the vocational track, value 2 for intermediate tracks (technical and non-top-tier academic), and value 3 for top-tier high school (scientific and classical).
These three categories are created by grouping students according to their average test scores in grade 8 (before tracking), as can be clearly seen in Online Appendix Figure A. Although all students are supposed to take the test, those who go to high school without taking the test are disproportionately represented among students enrolled in the vocational track. In column 6, I include the quadratic of the math standardized test score in grade 8 as a potential mediator, given the results in Section V.
Including these controls does not affect the point estimate. This table reports fixed effects ordered logit estimates (BUC estimators following Baetschmann, Staub, and Winkelmann), where the dependent variable takes value 1 for the vocational track, value 2 for intermediate tracks (technical and non-top-tier academic), and value 3 for top-tier high school (scientific and classical).
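To make the coding of the ordered outcome concrete, the sketch below shows the data-expansion ("blow-up") step that, as I understand it, underlies the BUC estimator: each observation is copied once per cutoff and the ordered outcome is dichotomized at that cutoff. The field names (`class_id`, `track`, `cutoff`, `above`, `stratum`) are hypothetical, and the subsequent conditional logit estimation and clustering of standard errors are not shown.

```python
def blow_up(records, n_categories=3):
    """Expand each record into K-1 copies, one per cutoff.

    For cutoff k in 2..K, the binary outcome `above` equals 1 when the
    ordered outcome `track` is at least k, and 0 otherwise.
    """
    expanded = []
    for record in records:
        for cutoff in range(2, n_categories + 1):
            copy = dict(record)
            copy["cutoff"] = cutoff
            copy["above"] = 1 if record["track"] >= cutoff else 0
            # Assumption: each (fixed-effect group, cutoff) pair forms its
            # own stratum in the conditional logit estimated afterwards.
            copy["stratum"] = (record["class_id"], cutoff)
            expanded.append(copy)
    return expanded


# Toy records: 1 = vocational, 2 = intermediate, 3 = top-tier track.
students = [
    {"class_id": "c1", "track": 1},
    {"class_id": "c1", "track": 2},
    {"class_id": "c1", "track": 3},
]
expanded_sample = blow_up(students)
```

With three categories, every student appears twice in the expanded sample: once for the split between vocational and all higher tracks, and once for the split between top-tier and all lower tracks.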
Standard errors in parentheses are robust and clustered at the class level. Columns 5 and 6 restrict the sample only to those students with a standardized test score in grade 8, while column 7 includes only students for whom we have information about both the math and literature teacher.
The last two columns of Table VII, Panel A provide evidence of the absence of a statistically significant impact of literature teachers on track choice. This result seems to support the idea that girls may be more vulnerable to the gender stereotypes of their math teachers than boys are to the gender stereotypes of their literature teachers.
To delve deeper into the choice of the field of study, I provide evidence on the effect of teacher stereotypes at the bottom (vocational track) and the top (scientific track) of the ability distribution using a linear probability model with the same structure as Table VII. This gap slightly increases when we include student-level controls (column 3).
One standard deviation higher teacher stereotypes increases the probability of attending a vocational track for girls with respect to boys by around 2 percentage points, which corresponds to an increase of When I restrict the sample to students who took the standardized test in grade 8, the point estimate decreases by around one-third and is no longer statistically significant at the conventional level (Panel A, column 5).
The effect is mainly driven by students who did not take the test in grade 8. Column 6 shows that including the squared polynomial of the standardized test score absorbs most of the residual effect of math teacher stereotypes on the choice of vocational track. The last two columns of Table VIII show that literature teacher stereotypes have no significant effect on track choice.
This table reports OLS estimates of equation 1 , where the dependent variable is the high school track choice. Columns 5 and 6 restrict the sample only to those students who took the standardized test in grade 8, while column 7 includes only students for whom we have information about both the math and literature teacher.
The impact of both math and literature teacher stereotypes is statistically indistinguishable from 0, although the point estimate is negative and close to 1 percentage point for math teachers. XV shows the results estimating equation 2 , with school by cohort instead of class fixed effects. Column 2 confirms the previous evidence of an impact on female students of math teacher stereotypes in terms of choice of vocational training. Part of the gender gap within class captured in the specification using class fixed effects is due to the lower probability of boys choosing the vocational track when assigned to teachers with stronger gender stereotypes.
Table IX provides evidence of a similar pattern compared to Table VIII in terms of magnitude of the effect of math and literature teacher stereotypes on track recommendation for vocational Panel A and scientific Panel B paths, although the effect is generally slightly smaller and less precisely estimated compared to the one on track choice.
Track recommendation is a joint decision of the math and literature teachers, so the influence of each teacher's own bias may be attenuated. XVI includes school-by-cohort fixed effects and suggests that girls may be slightly less likely to be recommended to the scientific track, even if the result is not robust to the inclusion of all sets of controls. This table reports OLS estimates of equation 1, where the dependent variable is the high school track recommendation of teachers.
To sum up, math teacher stereotypes have a substantial impact on track choice mainly by inducing more girls to self-select into the vocational track. The impact on scientific track is negative, but generally indistinguishable from 0. The scientific track is chosen by girls with high achievement test scores whose performance was not affected by teacher bias, as shown in the analysis of heterogeneous effects in Section V.
Girls at the top of the math ability distribution are likely to have other academic-oriented role models in addition to their math teacher and a lower vulnerability to gender stereotypes. Literature teacher stereotypes have no significant effects on track choice of boys or girls.
As discussed already, a potential explanation is that girls may be more vulnerable to the gender stereotypes of their math teachers than boys are to the gender stereotypes of their literature teachers. According to findings in social psychology, the development of academic self-concept begins in childhood and is strongly influenced in the period after elementary school by stereotypes communicated by parents and teachers (Ertl, Luttenberger, and Paechter). Students may believe that their own signal of ability and the signal received from teachers carry relevant information.
However, if the signal received from teachers is biased by gender stereotypes, female students, for example, may develop a lower self-assessment of their ability in the scientific field and potentially invest less in their STEM education. This idea is consistent with the stereotype threat theory developed in the social psychological literature (Steele and Aronson), according to which individuals at risk of confirming widely known negative stereotypes reduce their confidence and underperform in fields in which their group is ability-stigmatized (Spencer, Steele, and Quinn). In Online Appendix E, I present a conceptual framework that develops the intuition for the stereotype threat theory.
Female students are generally found to be more critical about their abilities in math than male students even if they have the same grades, as shown in PISA tests as well OECD However, girls are 4. In classes assigned to math teachers with a one standard deviation higher IAT score, the gender gap in self-confidence increases by 4. Adding student-level controls interacted with pupil gender does not substantially affect the point estimate of interest columns 3 and 4 , Panel A. This table reports OLS estimates of equation 1 , where the dependent variable is self-stereotypes in grade 8.
A , I provide evidence that the gender gap in math performance increases during middle school in classes assigned to a more biased teacher. Hence, in Table X columns 5 and 6 , I also control for the mediating role of performance measured at the end of middle school to analyze whether gender gap in own assessment is merely due to different performance in grade 8.
I find that gaps in self-confidence are only slightly reduced. Teacher stereotypes seem to have an additional impact on math self-confidence, on top of performance on standardized tests, that may have detrimental effects on investment choices in education and occupation. In Table X, Panels B and C, I focus on the impact of teacher stereotypes on self-confidence in reading and all other subjects.
Girls have slightly higher self-confidence in literature, although the point estimate is indistinguishable from 0. There is no impact on other subjects. The effects are substantively unchanged when controls at the individual level columns 3 and 4 , at the teacher level column 6 , and for the standardized test score in grade 8 are included.
Finally, in column 7 of each panel, I analyze the effect of both math and literature teacher stereotypes, while in column 8 I focus only on the impact of literature teacher stereotypes. Gender stereotypes of literature teachers slightly decrease the gender gap in self-confidence in reading, and they have no statistically significant effect on math and other subjects. This result is important for at least two reasons. First, it shows that self-confidence is affected by social conditioning from teachers.
Second, this is an important mechanism to understand the effect of teacher stereotypes on math performance and track choice of female students. I find that classes assigned to a math teacher who believes there are gender differences in math ability have a substantially larger gender gap in math performance, in the same direction as the results reported by Alan, Ertac, and Mumcu. The impact of the IAT score on student achievement is not significantly affected when I control for reported bias (column 4).
This evidence seems to support the distinctiveness of implicit and explicit cognition Greenwald, McGhee, and Schwartz in the context of teacher gender stereotypes. Previous literature has shown the importance of gender bias in grading i. A natural question is whether implicit associations affect bias in grading of teachers. I have information only on grades given by teachers at the end of the semester.
As shown in Online Appendix Table A. XVIII, girls get higher grades on average compared with boys in both math and literature when we control for the standardized test score in the same grade. Girls assigned to teachers with more stereotypes get a slightly lower grade, but the effect is small and indistinguishable from 0. However, it should be considered that grades are a categorical variable from 2 to 10, where 6 is the pass grade. As shown in Online Appendix Figure A.
VII, there is a high bunching at the pass grade, especially for math, and almost half of the students obtain the same grade in math. There is little variability in teacher-assigned grades at the bottom of the distribution, where the effect of teacher stereotypes on standardized test scores is stronger. Additional outcomes on retention rates are reported in Online Appendix F.
In most OECD countries, women outnumber men in tertiary education, but they are by far a minority in highly paid fields such as science, technology, engineering, and math, especially when excluding teaching careers. Culture and social conditioning have a strong impact on the development of skills and educational choices. Girls, especially those with lower initial skills, are lagging behind when assigned to teachers with stronger math-male and literature-female implicit associations.
Boys, the group not ability-stigmatized in terms of math performance, are not affected by teacher stereotypes. The effects on reading are asymmetric, and literature teacher stereotypes do not affect the gender gap in reading. Math teacher stereotypes influence high school track choice, inducing more female students to attend an easier high school.
Furthermore, they foster low expectations about their own ability and lead to girls' underconfidence in male-typed domains. Indeed, girls are more likely to consider themselves bad at math at the end of middle school if they are assigned to a teacher with stronger stereotypes, even controlling for their ability measured by standardized test scores. These findings are consistent with a model whereby ability-stigmatized groups underassess their own ability and underperform, fulfilling negative expectations about their achievements.
Implicit associations can form an unintended and invisible barrier to equal opportunity. These results raise the question of which kind of policies should be implemented to alleviate the effects of gender stereotypes. The implicit stereotypes, measured by IAT score at this stage of development, should not be used to make high-stakes decisions, such as hiring or firing. IAT scores are educational tools to develop awareness of implicit preferences and stereotypes, and they should not have normative ground (Tetlock and Mitchell). However, one set of potential policies may be aimed at informing people about their own bias or training them to ensure equal behavior toward all students, especially within the schooling context (Alesina et al.).
An alternative way to fight against the negative consequences of stereotypes is reducing vulnerability to these stereotypes by increasing the self-confidence of girls in math or providing alternative role models—as done in the context of Indian elections, where exposure to female leaders weakens gender stereotypes in the home and public spheres Beaman et al. More research is needed to investigate the impact of both types of policies.
Code replicating tables and figures in this article can be found in Carlana , in the Harvard Dataverse, doi: Elena De Gioannis and Giulia Tomaselli provided invaluable help with data collection. I am grateful to all principals and teachers of schools involved in this research for their collaboration in data collection. This research project was approved by the Ethics Committee of Bocconi University on September 14, For instance, Nosek et al.
Stereotypes are mental constructs based on overgeneralized representations of differences between groups Bordalo et al. Students are assigned to the same group of peers from grade 6 to grade 8. Teachers are assigned to classes and follow students during all years of middle school, with few exceptions due to retirement or transfers. For the sample of students without missing data on test scores, the impact is smaller and indistinguishable from 0. Data on test scores are missing if the student did not take the test or if the school did not provide the correct match between administrative data from the Ministry of Education and INVALSI.
There are a few exceptions: students may be transferred to a different school by their parents or be required by their teachers to repeat a grade. The D. March 20, , n. An analysis of Ferrer-Esteban shows that ability grouping across classes within schools occurs almost exclusively in the south of Italy.
All schools in my sample are from the north of Italy. Students can be enrolled in school from 30 to 43 hours a week, and therefore the amount of time they spend with teachers varies. For instance, they spend six to nine hours with the math teacher. In some classes, literature teachers also teach history and geography so they spend more time with students.
The number of hours per week spent with the literature teacher varies from 5 to The test in grade 6 was administered only up to the school year — All students are supposed to take the test, unless they are absent from school on the day of the test. It may also happen that the school misreports the code that allows one to match the test score with the administrative data from the Ministry of Education. In Italy, standardized test score data have never been matched with labor market outcomes.
In schools, I obtained the authorization of the principal to administer the survey to teachers, but only 91 principals completed the formal authorization to give me access to data from INVALSI without mistakes. The data collection was also conducted for ongoing work studying teacher race stereotypes Alesina et al. Around half the students are first-generation and half are second-generation immigrants. Only four math teachers started the questionnaire and then did not finish it, claiming either that they were not expecting such a long survey or that they could not understand the purpose of the IAT.
The report was delivered to schools during the summer of , after the middle school graduation of the cohort. The order of the tasks was randomized at the individual level and in Online Appendix Table C. I I provide evidence that the impact of the order of the blocks is small in magnitude. However, in all regressions, I control for ordering factors, but they do not have a statistically or economically significant effect on the estimates. In the context of implicit racial bias, studies have shown the relevance of IAT scores in affecting job performance of minorities Glover, Pallais, and Pariente and call-back rates of job applicants Rooth For instance, implicit racial associations have been shown to decrease after subjects viewed pictures of admired African Americans and disliked white Americans Dasgupta and Greenwald The specific questions are reported in Online Appendix C.
Individual-level data are anonymous and I obtained the authorization from each school principal to access data from their school. The data from the Italian Ministry of Education is available only up to the school year — The standardized test score in grade 6 is available only up to — The test was not administered after that year.
The specific question is reported in Online Appendix C. As discussed already, 11 principals did not complete the formal authorization to give me access to all data without mistakes. Furthermore, I have to exclude teachers who did not teach in grade 8 and for whom I do not have student outcomes.
Finally, three math teachers and nine literature teachers did not complete the Gender-Science IAT test. II shows the balance table of the differences between the sample of teachers matched and the other teachers who completed the IAT. As expected, teachers not matched are around 9 years younger and 35 percent less likely to have a full-time contract (tenured position), and they have 11 years less experience in teaching.
However, not only the average but also the entire distribution of implicit gender bias of the matched and not-matched teachers is extremely close exact p -value of Kolmogorov-Smirnov:. In the article by Nosek et al. I only consider classes with at least 10 students with standardized test scores. In some schools, more than one recommendation is given to students.
I consider whether at least one of the choices recommended was scientific or vocational. This information includes only the cohorts who began grade 6 from —12 and for which I collected information on the teacher assignment for all three years of middle school.
Two or three different classes can be assigned to the same teacher. I discuss the exogeneity of student assignment to teachers in Section IV. Glover, Pallais, and Pariente, while analyzing the impact of manager implicit bias on minority workers, suggest that we may expect an attenuation bias of approximately a factor of 1. The gender stereotypical representativeness in math at the top and bottom of the ability distribution is substantially stronger in Italy compared to the United States, where there are slightly fewer gender stereotypes Nosek et al.
In reading, there are no substantial differences between the two countries. Italy is a country with low labor market participation for women but substantial geographic variation across regions. In each school, usually only one professor is in charge of math Olympics and anecdotally this teacher is highly motivated and passionate.
Indeed, as shown in Online Appendix Table A. III, teachers in charge of math Olympics induce greater improvements in test scores of their students. Similarly, teachers with tenure and more experience tend to have students with higher scores in standardized tests.
There is a higher likelihood of obtaining a degree with honors for teachers born in the south that may partially drive the correlation between IAT and degree with honors. In Italy, parents dislike being assigned to a teacher with a temporary contract who may have little experience and may change during the years of middle school.
Unfortunately, for confidentiality reasons I only obtained the standardized test scores in grade 5 for those students who did not change school code between elementary and middle school. There are few students for whom I have this information, and it is not a random sample: they are slightly more likely to be female and less likely to have highly educated mothers. The test-retest reliability of the IAT is generally considered satisfactory in social psychology, with a correlation of 0.
This result is comparable to several other countries Fryer and Levitt ; Bharadwaj et al. In Online Appendix Figure A. According to a meta-analysis performed on studies in several countries, gender gaps in mathematics are around 0. The average gender gap without controlling for class fixed effects is substantially invariant 0. Most of the variation in math performance is within classes. These effects are related to exposure during a three-year period, with the exception of classes that changed teacher during middle school.
The next section focuses on exposure for shorter time periods, exploiting data on standardized test scores in grade 6 when available. It should be noticed, however, that most teachers in Italian middle schools are women, in both math and literature. There is little variation in the gender of teachers and potentially substantial self-selection into the teaching profession, which differs by gender.
Ideally, I should have created the terciles according to the test score in grade 5, before students were assigned to middle school teachers. This is available only for a few students per class. I build the terciles using test scores in grade 6 for the cohort before because this test was not administered after that year. Although the point estimates are statistically indistinguishable from 0, the negative effect is bigger in magnitude for girls for whom I do not have official information on their parental background.
This is more likely to happen for low-performing students whose parents do not report information about jobs and education to the school. This conceptual framework is an extension of the stereotype threat model presented by Dee I observe the assignment of teachers to students since Hence, for the first two cohorts of students I do not know their teacher for at least one year. I assume they had the same teacher throughout middle school since their teachers have been working in the school for at least six consecutive years.
The impact is similar when excluding these classes. Data are either present or missing for both test scores in math and literature, with the exception of 0. Unfortunately, there are no cohorts of students exposed for the first time to teachers after they took the IAT.
However, the fact that the results are, if anything, stronger for the last cohort of students is reassuring with respect to potential reverse causality. The results are confirmed by the robustness checks reported in Online Appendix Figure A. Students in different tracks have, in most cases, little to no interaction during the school day. Teachers recommend students to a specific subtrack e.
For data on the lower academic motivation of boys with respect to girls, see Carlana, La Ferrara, and Pinotti. Students in the scientific and classical academic tracks have substantially higher average performance in grade 8 before tracking than those in other academic tracks, who perform similarly to students in the technical track.
Students in the vocational track have substantially lower performance on average. The results are summarized in Online Appendix Figure A. As discussed in Section V. A and Online Appendix Table A. X, teacher stereotypes do not have a statistically significant impact on the probability of taking the test in grade 8. The results are substantively invariant when I consider classical and scientific tracks jointly in the linear probability model.
For the subsample of students for whom I have data on both math and literature teachers (column 7), the impact of teacher stereotypes on scientific track choice is negative and statistically significant. The impact on vocational track choice is also stronger.
Have you been harsh to one student and more lenient to another? The implications are profound and disturbing: we may have perpetuated inequities in our classrooms and schools for years without realizing it.
Our use of inaccurate and inequitable grading may have barred students from getting into the college they wanted, kept them out of honors classes, and prevented them from graduating. Examine our systems and be willing to let go of an industrial model of grading (the idea that only some can achieve success and meet expectations on a curve) in favor of a more 21st-century viewpoint where everyone can achieve success given the right supports and opportunities.
The answer? If we remove elements like behavior and compliance (which can be dealt with in a more restorative context) and grade solely on mastery and growth, we can free students from the shackles of subjectivity, bias, and evaluation on anything but their pure demonstrations of learning. In one classroom, a student may be doing terribly because they missed a few homework assignments.
In another, they may be doing well while missing the same few assignments. Managing ever-shifting and uneven grading policies through the school year can make it difficult for young learners to meet expectations and succeed. While some may argue that managing a multitude of expectations is good preparation for life, our young learners, particularly those who need to learn how to succeed, may need some consistency.
When schools work together to establish clear learning objectives, clear evaluation systems, and overall consistency, students are better able to navigate and drive their success. There is so much debate about the zero. While some argue that doing nothing warrants a zero, other educators use the zero for non-compliance, absence, behavior issues, and non-mastery. On a point scale, where A, B, C, and D are 10 points apart, the zero puts the F over 60 points lower.
A 50 or 55 is still an F. Do we need to grade using the harshest F possible?
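To make the arithmetic behind the zero concrete, here is a minimal sketch. It assumes a 100-point scale with letter cutoffs at 90, 80, 70, and 60; these cutoffs, the sample scores, and the helper names `letter_grade` and `average_with_floor` are illustrative assumptions, not taken from any particular school's policy.

```python
from statistics import mean

# Assumed cutoffs on a 100-point scale (illustrative only).
CUTOFFS = [(90, "A"), (80, "B"), (70, "C"), (60, "D")]


def letter_grade(score):
    """Map a numeric score to a letter using the assumed cutoffs."""
    for cutoff, letter in CUTOFFS:
        if score >= cutoff:
            return letter
    return "F"


def average_with_floor(scores, floor=0):
    """Average the scores after raising any score below `floor` up to it."""
    return mean([max(s, floor) for s in scores])


# Hypothetical student: three solid assignments and one missing one.
scores = [85, 90, 0, 88]

with_zero = average_with_floor(scores)             # missing work counted as 0
with_floor = average_with_floor(scores, floor=50)  # missing work counted as 50

print(with_zero, letter_grade(with_zero))
print(with_floor, letter_grade(with_floor))
```

Under these assumptions, both a 0 and a 50 are an F on the individual assignment, but the single zero pulls the overall average down by more than a full letter grade compared with the 50.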
The truth is, grades cannot
Attaining that level of performance would challenge the most talented students and may be impossible for most others, especially those who struggle in learning.
The watchdog said teachers should mitigate the risk of bias. Its Tory chair warned of the potential for grade inflation and a lack of consistency between schools in a letter to the education secretary.
Some evidence suggests that LGBTQ professors fare worse than their peers in general, but there is almost no research on other intersectional identities, including disability and pregnancy, the researchers say. Going forward, Kreitzer and Sweet-Cushman interpreting SET results. Instead of general comments, assessments should ask for student feedback. The authors also caution in
Homework grading is gender biased against boys. This gender bias is a primary cause of the educational crisis with regard to boys. The most plausible explanation is that the gender grading gap is due to gender differences in non-cognitive skills, such as in-class behavior and homework. In this paper, we first present novel evidence of grading bias against women at Then, by random assignment of the gender of the grader.