Thoughts on the Bias Busters Workshop

I have produced this web page to share my thoughts about a workshop that Google and CMU developed and that CSE presented several times called "Bias Busters." I have not included the usual header that identifies this with the Paul G. Allen School because this is an explanation of my personal opinions and not something that should be construed as coming from the school.

I am pleased to report that CSE will not be offering this workshop in the future. Our staff are working on a new workshop that will leverage different materials and have a different focus. I will add comments about that new workshop when the details are finalized, but in the meantime, I want to speak out about this workshop that is still being offered at CMU and that has been adopted at other universities. I believe the workshop is ill-conceived and I worry that it could have a negative effect on individuals who participate and on the broader community. I don't doubt that the designers of this workshop mean well, but this is not the way to build an inclusive community. I encourage anyone who is swayed by my critique to join me in encouraging all universities to stop using this approach.

An Ill-Conceived Workshop

The first workshop that Google developed was on unconscious bias (sometimes called implicit bias). The workshop relied heavily on the implicit association test (IAT) that supposedly reveals biases that people aren't even aware of. The idea that we all have hidden biases fits nicely with a political narrative of systemic oppression of women and minorities, but the science has failed to confirm that narrative. For an excellent analysis of this, check out Jesse Singal's thorough discussion of the IAT in his NY Magazine article titled Psychology's Favorite Tool for Measuring Racism Isn't Up to the Job.

Science has also failed to confirm the idea that there is pervasive discrimination against women and minorities in STEM disciplines. It was easy to find discrimination thirty and forty years ago with experiments that used identical resumes but rotated names implying different genders and races, but when Cornell researchers tried the experiment again in 2015, they found a two-to-one preference for women over men in STEM faculty hiring.

What I find most fascinating about this study is the response to it. It provides a microcosm of the challenges we face in trying to explore this issue calmly and rationally. Some latched on to the study as proof that women now have it better than men, which is not what the researchers believe. They have found evidence of differences in other areas and in a follow-up study they found no gender differences when the resumes were made stronger. Others dismissed the study because it didn't match their sense of reality, although they offered few scientific critiques. As Wendy Williams said of these critics of her study, "Acting as though the study is worthless because it came out in a way that you personally don't like—that's not science. That's religion."

Another fascinating response came in the form of a new study that looked at actual STEM hiring in computer science. The study is titled Gender, Productivity, and Prestige in Computer Science Faculty Hiring Networks, by Way, Larremore, and Clauset. The researchers failed to find evidence of discrimination against women: once you account for factors such as the prestige of the institutions where applicants got their degrees and the quality of their publication records, gender was not a significant predictor. And yet the spin about this study was all about discrimination and how it must be baked into the process before the interview, because we are all certain the discrimination is there even when there is no scientific evidence to support that belief. As the Science article mentions in reporting this result, the situation is "complex." That's a polite way of saying that the study failed to find the evidence of discrimination they were looking for.

One source of complexity, then, is the question of whether the discrimination even exists. The assumption of bias is such a passionately held belief of some (almost religious, as Wendy Williams said) that they ignore alternative explanations offered by researchers such as Lee Jussim of Rutgers. But even if you grant that bias exists, the evidence suggests that workshops like these produce no change in behavior. That was the conclusion of a new study that includes as one of its authors Brian Nosek, who is described as "one of the three founders of the IAT" in a Chronicle of Higher Education article.

That leaves us with unconscious bias that might not exist and workshops that fail to change any behaviors even if it does exist.

A Biased Workshop

This workshop was originally developed for Google employees based on materials from the now defunct Ada Initiative. CMU then adapted the materials for use in universities as part of their Bias Busters workshop. Staff members from CSE attended a CMU training workshop and adapted materials for use in our version. The great irony is that these materials are deeply biased.

Google usually demonstrates a commitment to open inquiry and the use of science as a tool to explore fundamental questions, but in their slides for this workshop, they say that, "Debating whether bias exists at your organization" is "Off-topic for this session." You could imagine that they are simply trying to keep the workshop on track and are open to other ideas, but when their engineer James Damore wrote a 10-page essay offering biological explanations for some of the gender differences at Google, they fired him. Google's CEO Sundar Pichai was concerned that Damore was "advancing harmful gender stereotypes in our workplace." Damore submitted his memo in response to a request for feedback on the workshops he had attended, including a version of this workshop. By firing Damore, Google has made it clear that they do not allow free inquiry among their employees. Universities should be careful not to emulate this intolerance.

The folks at CMU should know better, but their version also suffers from extreme bias. They list as their first goal to "Review Research." This gives their workshop an appearance of scientific objectivity, and yet they don't link to either of the Cornell studies or the "complex" study from Way, et al. Without much trouble I found two other major papers that they do not mention at all:

Understanding Current Causes of Women's Underrepresentation in Science: The abstract for this 2010 study from the National Academy of Sciences mentions that, "Claims that women scientists suffer discrimination in these arenas rest on a set of studies undergirding policies and programs aimed at remediation. More recent and robust empiricism, however, fails to support assertions of discrimination in these domains. To better understand women's underrepresentation in math-intensive fields and its causes, we reprise claims of discrimination and their evidentiary bases. Based on a review of the past 20 y of data, we suggest that some of these claims are no longer valid and, if uncritically accepted as current causes of women's lack of progress, can delay or prevent understanding of contemporary determinants of women's underrepresentation."

Not Lack of Ability but More Choice: Individual and Gender Differences in Choice of Careers in Science, Technology, Engineering, and Mathematics: This 2013 study from Psychological Science mentions in its abstract that, "Results revealed that mathematically capable individuals who also had high verbal skills were less likely to pursue STEM careers than were individuals who had high math skills but moderate verbal skills. One notable finding was that the group with high math and high verbal ability included more females than males."

The CMU site also includes a link to a 2003 paper titled "Exploring the color of glass" that I discuss in the next section. Of the three papers I explore on this topic, CMU has included the oldest and most flawed.

I would hope that at this point some readers would begin to think, "But that's not fair because you're only looking for studies that support your argument." That's exactly right. When I read over the Google and CMU materials, I became emotionally uncomfortable and in a classic case of motivated reasoning, I went in search of studies that supported my sense of skepticism. That's the problem with approaching these questions from a perspective of advocacy rather than engaging in a dispassionate exploration of the truth. Google and CMU have done the same thing. Google even fired the one employee who dared to question their assumptions.

I am not an expert in this field, so everyone should take my comments with a huge grain of salt. But if I can easily find five strong studies that question the underlying premise of this workshop, then you have to wonder what process Google and CMU followed in putting together their materials. I happen to think that all five of the studies that I have cited are more solid than any of the studies cited by CMU, but I will leave that to others to judge for themselves.

A Biased Poster

NY Times columnist David Brooks correctly identified the tension between individuals like James Damore who are searching for scientific truth and diversity staff who see their mission as improving the culture in tech so that currently underrepresented groups feel welcome. "It takes a little subtlety to harmonize these strands," he wrote, "but it's doable."

"Of course subtlety is in hibernation in modern America," Brooks goes on to say (he points out that Sundar Pichai didn't even try), and workshops are more effective when they present simple and easy-to-understand ideas. This increases the likelihood of polarization as each side distills their complex evidence into simple "facts."

Suppose, for example, that I want to convince people that discrimination against women in STEM no longer exists and that it has, in fact, gone in the other direction. To make my point, I might cite the first Cornell study by saying that, "Women are 2x more likely to be hired as STEM faculty than men." Nobody would be more upset about such a gross simplification than the authors of the study who were careful not to make broad claims that aren't supported by their data. And yet, this is exactly what advocacy groups do in presenting their perspective.

Our workshop included a slide that was based on a poster from the University of Arizona titled Avoiding gender bias in reference writing. The CMU site lists this as one of their resources. This poster and the slide based on it refer to many supposed facts that I found hard to believe. Fact checking can be exasperating because it takes a great deal of time and in a case like this involves reading multiple long papers. Those who aren't interested in delving into the weeds can safely skip the rest of this section and its lengthy explanations, but I felt it was a useful exercise to drill down on this slide.

The slide used in the CSE workshop was titled "Gender Bias in Academic Letters" and made definitive statements that sound like facts. In doing so, we were mirroring the language of the poster from the University of Arizona. That poster lists two studies as the source of their various claims. The first is a paper by Trix and Psenka from 2003 titled Exploring the color of glass: letters of recommendation for female and male medical faculty that looks at 312 letters written for a medical school between 1992 and 1995 (over twenty years ago). The other study is from 2009 written by Madera, Hebl, and Martin titled Gender and Letters of Recommendation for Academia: Agentic and Communal Differences.

Most of the observations on the poster come from the 2003 paper. The 2009 paper begins by citing five significant flaws with that study and mentioning a 2007 paper by Schmader, Whitehead, and Wysocki titled A Linguistic Comparison of Letters of Recommendation for Male and Female Chemistry and Biochemistry Job Applicants, which was able to replicate some but not most of the claims from the 2003 paper. It is odd that the creator of the poster did not consider the 2007 study because, as that paper notes, "Although the findings of Trix and Psenka (2003) are provocative, one limitation of this study is that most of their comparisons were not statistically analyzed to provide information on the reliability of these differences."

Let me go through each of the claims included in the workshop slide:

Letters for women are 7x more likely to mention (irrelevant) personal life details. The 2003 paper mentions the idea of an "irrelevancy" and includes it as one kind of "doubt raiser." The paper provides no data on the incidence of such comments in the letters they examined, so I find no support for this claim. The incidence can't be very high given that only 16% of letters were labeled as having any of the "doubt raiser" characteristics.

Letters for women are 2x more likely to raise doubts about the candidate. This is a fair reading of the 2003 paper, which reports in its Figure 2 that 24% of the letters for women versus 12% of the letters for men contained doubt raisers. But this result was not replicated in the 2007 study, which says that, "in contrast to the findings of Trix and Psenka (2003), letters for female candidates to jobs in chemistry and biochemistry did not contain significantly more tentative language and did not overemphasize teaching and hard work over research and ability."

Letters for women contain more "effort" related words, while letters for men contain more "ability" related words. This is apparently a reference to the frequency of what the 2003 study refers to as "grindstone words" that they found to have a frequency of 34% for women and 23% for men. The 2007 study found no statistical difference in the use of such words for men versus women, although the one result it was able to replicate was the use of "standout" words which are related to grindstone words (fewer grindstone words in the presence of standout words). They found that men had on average 0.7% standout words versus 0.6% for women.

Letters for men are 16% longer than letters for women. This statistic is cited in the 2003 paper, but was not replicated in the 2007 study that found that, "no significant gender differences emerged for any of the following LIWC generated language-use dimensions: length of letters, negative feeling words, positive feeling words, tentative words, certainty words, or achievement words."

Letters for women are 2.5x more likely to make "minimum assurances" ("she can do the job") rather than ringing endorsements ("she's the best for the job"). This comes from Figure 1 of the 2003 paper that indicates that 15% of the letters for women were of this type versus 6% for male applicants. This is not a result that is easy to explore because it was based on a subjective reading of the letters to categorize them. Once again the 2007 study's failure to find differences for various categories of words seems relevant.

Letters for men are 4x more likely to mention publications and 2x more likely to have multiple references to research. I can find no reference to the 4x mention of publications for men versus women. The 2003 paper says, "we found that roughly the same personage [sic] of letters for women (48%) as for men (50%) mentioned 'research' at least once." The second statistic is consistent with the paper in that it says, "35 percent of those for women that mentioned 'research' at least once, mentioned it multiple times. In contrast, 62 percent of the letters for men that mentioned 'research' at least once, mentioned it multiple times," although it ignores the half of the letters that had no mention of research at all.
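
To make that last caveat concrete, here is a minimal sketch of the arithmetic, using only the percentages quoted above from the 2003 paper. The variable names are my own, and treating the paper's rounded percentages as exact values is an assumption, so the results should be read as approximate.

```python
# Percentages quoted from the 2003 paper (rounded in the source; treated as exact here).
# Share of letters that mention "research" at least once.
mention_research = {"women": 0.48, "men": 0.50}
# Of the letters that mention it at least once, the share that mention it multiple times.
multiple_given_mention = {"women": 0.35, "men": 0.62}

# Share of ALL letters that mention "research" multiple times.
overall_multiple = {
    group: mention_research[group] * multiple_given_mention[group]
    for group in mention_research
}

# Men-to-women ratio among only those letters that mention research at all.
conditional_ratio = multiple_given_mention["men"] / multiple_given_mention["women"]
# Men-to-women ratio across all letters, including those that never mention research.
overall_ratio = overall_multiple["men"] / overall_multiple["women"]

for group, share in overall_multiple.items():
    print(f"{group}: {share:.1%} of all letters mention research multiple times")
print(f"men/women ratio among letters mentioning research: {conditional_ratio:.2f}")
print(f"men/women ratio across all letters: {overall_ratio:.2f}")
```

Under these assumptions, roughly 17% of all letters for women and 31% of all letters for men mention research more than once, and the ratio comes out near 1.8 on either basis rather than exactly 2.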

The 2009 study provides evidence that women are more often associated with "communal" words and men with "agency" words and that this can lead to a lower rating by STEM faculty. It is a useful study worthy of consideration, but it isn't relevant to this slide because it doesn't lend itself to simple sound bites with a numeric multiplier.

Although the 2007 study replicated the increased incidence of standout words for men, it is worth noting how they begin their summary section. Keep in mind that they described the 2003 paper as "provocative." They say, "Overall, the results of the current study revealed more similarity in the letters written for male and female job candidates than differences. Male and female candidates had similar levels of qualifications and this was reflected in their letters of recommendation. Letters written for women included language that was just as positive and placed equivalent emphasis on ability, achievement, and research." There is a "However" that comes later about the incidence of standout language, but their major conclusion is that they were not seeing the same patterns that Trix and Psenka reported in their 2003 paper.

I don't understand how any serious scientist with all three of these studies available to them could produce the kind of poster put out by the University of Arizona. This is a clear example of confirmation bias, picking only conclusions that match the desire to convince a reader that gender bias exists. The CMU resources include a link to an NCWIT poster that makes less simplistic claims but that is also based on the flawed 2003 Trix and Psenka paper.

A Dangerous Workshop

For months I have considered the question of how this workshop might be fixed. The more I think about it, the more I am convinced that it is fatally flawed. It demonizes "bias" as something undesirable and it advocates building a more inclusive community by "busting" those who exhibit bias. These two pillars are not enough to build a workshop around because both ideas are problematic.

Bias is not bad. A person with no biases is a person with no brain. We invoke biases when we select which students to admit to our degree programs and which faculty and staff to hire. We apply biases when we give grades to our students and judge whether a research paper has merit. We exhibit bias when we scold a fellow faculty or staff member for making remarks that are insensitive or rude. As a community we have expressed our strong bias in favor of diversity and inclusion by publishing a statement of our principles.

The idea that we might have unconscious bias that we aren't aware of allows one to make a workshop that is at least entertaining in the same way that being shown optical illusions and M.C. Escher drawings captivates us. But as I mentioned earlier, the scientific evidence for unconscious bias is weak and this workshop focuses on other aspects of bias.

My guess is that Google and CMU wanted to leverage the popularity of the unconscious bias workshop to motivate this workshop as the next logical step. But this workshop looks at real-world scenarios that are unlikely to have a connection to unconscious bias. In their training manual, Google says to "Start to collect examples of unconscious bias" by sending out an email asking people to, "Think about a time when you were either the subject or a bystander of unconscious bias." Our staff sent out an email with exactly this wording. Researchers in the field have difficulty identifying when a social interaction involves unconscious bias. It is absurd to believe that average people with no training can figure this out just by observation, let alone that they could accurately remember the details and nuance of the incident.

When I asked about this, I was told that we are talking about "subtle bias." That might not sound like an important distinction, but I believe it is. In the scenarios our staff picked, it seems clear that individuals are expressing ideas they hold consciously. When a student says that women aren't as interested in CS or that they are more interested in design than coding or that they are likely to get extra help because they are women, they are expressing personal beliefs, not manifesting some unconscious bias beyond their perception. This workshop sounds far less appealing when it is described as "Belief Busting," but that seems like the more accurate title.

The problem with belief busting is that you never know which beliefs to bust and which to leave alone. The CMU site has resources specifically for race, gender, and age, but not for other forms of bias. I have witnessed Trump supporters and deeply religious individuals being openly ridiculed in CSE, but those biases were not included in our workshop. Even if I could personally choose what biases would be busted, I would still object because the workshop is inherently political in that it identifies particular biases as being problematic. What seems like bias to one person can feel like resistance or patriotism or truth or faith or tradition or a righteous crusade to someone else.

This workshop also ignores the important distinction between the professional space where we work and learn and the personal space where we socialize and discuss politics. Erwin Chemerinsky and Howard Gillman describe in their book Free Speech on Campus why the First Amendment requires a public university like UW to respect this distinction and why private universities should as well even if they are not required by law to do so. Many of the scenarios in our workshop involved private conversations between individuals not "on the clock" or in a classroom. We should tread lightly in trying to micromanage such conversations. We can impose some restrictions in the professional space, although even there they should be content neutral, emphasizing general guidelines such as mutual respect and willingness to listen to the concerns of others when our comments cause emotional distress.

I also object to the notion of "busting" individuals. The Google instruction manual says that, "At the end of the training, learners will be able to:

Recognize when action is needed
Identify and understand actions to combat unconscious bias
Commit to taking action in particular situations
Take action to combat unconscious bias"

Our staff adopted this wording in an invitation to the June 2017 offering that described it as "an anti-bias workshop designed to help you counteract everyday acts of bias." What exactly does it mean to "counteract" acts of bias? If someone like James Damore were to openly express the ideas that got him fired from Google, should he be busted? This workshop seems to be teaching you how to do so. As one participant in a previous offering of the CSE workshop is quoted as saying, "The intervention strategies we got are great. Practicing intervening is very useful." The Google language is even more disturbing because it invokes military terminology, saying that we are "combating" people's unconscious bias. The big question is whether busting people helps to build community.

Evidence suggests that young people are increasingly willing to silence individuals who make offensive remarks. We have to resist that trend and remind our students of the value of free and open inquiry. James Damore might be unwelcome at Google, but he should not be fired or silenced or busted if he or someone like him ends up in our community. I'm sure that some people are passionate enough about righting the wrongs of the past that they wouldn't mind if we were to show a little intolerance towards those who advance what some consider to be "harmful gender stereotypes," but you can't fight intolerance with intolerance and we have a responsibility as a university to tolerate all viewpoints.

I have tolerated intolerance my entire adult life as I have struggled to be an openly gay man in a country that until recently was not very friendly to LGBT folks. I feel tempted now that we are in power to punish people who don't agree with me, but forcing them into silence does me no good. For example, if someone in CSE believes that homosexuality is a sin, I want that person to feel comfortable saying so openly. That is the only way I could hope to explore the issue with that person, as I did several times at Stanford when I moderated dorm discussions on homosexuality and Christianity.

Many of us are concerned about the lack of diversity in computer science and we are often desperate to do something, anything, to try to make things better. But sometimes a poorly crafted solution can cause harm. As the NY Magazine article points out, many people have reported strong emotional reactions to their IAT results. Even the designers of this test admit that it should not be used as a diagnostic tool that can measure your level of bias the way a blood test can measure your cholesterol level. It is irresponsible for any university to encourage people to take this test, which has a veneer of scientific validity and might lead individuals to conclude that they are sexist or racist. It is exactly because I am so concerned about sexism and racism that I don't want to encourage people to subject themselves to this flawed instrument that has more noise than signal when applied to an individual.

We are living in a time when our country is deeply divided on many issues. In this politically charged climate, universities can't afford to offer a workshop whose claims have not been properly vetted. It would not be fair to the participants and it opens the door to bitter infighting.

Final Thoughts

I have been watching how people have reacted to Google's firing of James Damore and it concerns me deeply. Overnight he became a hero to some and a devil to others. I saw the same reaction when I was fired by Stanford in 1991 for daring to question the war on drugs. For a while Damore described himself as "fired for truth." I'm not convinced that any one of us has cornered the market on truth, but I would say he was fired for honesty and that should concern all of us. We need to ensure that universities remain a place where people don't have to keep their opinions to themselves for fear of being overheard and punished.

The one aspect of the workshop that appeals to me is the concern that some individuals in our community might hear comments that upset them and make them feel unwelcome. Certainly women and minorities have heard such comments and I don't want to diminish the negative impact such comments can have. I wrote about my own painful experiences in a 1987 Stanford Daily article that was controversial at the time for possibly sharing "too much information." But I don't believe the answer is to shut people down.

Based on my own life experiences I have deep sympathy for those who experience these hurtful comments. My life experience has also taught me that often these comments happen because others don't understand how their words can cause pain. But I don't want universities deciding which categories of hurtful comments are to be addressed and I certainly don't want them "busted." I want more dialog, not less. And sometimes we find that others really did mean that hurtful thing they said. In those cases I try to find a way to live respectfully with that person, recognizing, for example, that they might consider homosexuality a sin, but that we can still work together in harmony towards shared goals. To answer Rodney King's famous question, we can all just get along, but through understanding and respect, not through "intervention" and "busting."

I am sincerely grateful that CSE has decided not to offer this workshop in the future. If you share the concerns I have, then I suggest that you encourage other universities to drop the workshop as well. And no matter what you think about this workshop, I hope that each of us makes a commitment to try to make our universities places where everyone feels welcome and to carve out safe spaces for discussion of all ideas, even those ideas that we find personally upsetting.

Stuart Reges

Last modified: Mon Oct 16 09:44:00 PDT 2017