Do we talk and write about men more than women?

As I’ve been researching the gendered nature of bossy, I’ve gotten a lot of important feedback from fellow linguists, who have helped to strengthen the argument in favor of viewing bossy as a word that is applied to women and girls more than men and boys.

One particularly interesting point was raised by Lesley Jeffries (Professor of English Language at the University of Huddersfield). She commented that “One argument that you didn’t even use is that women are usually less mentioned in corpora than men”. I did not mention this, and if I could have provided evidence of it at the time, it would have impacted how I presented my findings on bossy. Specifically, it would mean that women were called bossy more than men were even though men were given more attention by speakers and writers in the corpus (suggesting bossy is even more gendered than I found originally).
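To make that reasoning concrete, here is a minimal sketch using invented numbers (they are placeholders, not figures from the bossy data or from any corpus). The function name `gender_skew` and its parameters are hypothetical. It compares the raw female-to-male ratio of a label with the ratio once each count is divided by how often that group is mentioned at all.

```python
def gender_skew(label_f, label_m, mentions_f, mentions_m):
    """Return (raw_ratio, normalized_ratio) for a label applied to two groups.

    raw_ratio:        how many times more often the label is applied to women
    normalized_ratio: the same comparison per mention of each group
    """
    raw_ratio = label_f / label_m
    # Equivalent to (label_f / mentions_f) / (label_m / mentions_m),
    # rearranged so the arithmetic stays exact for whole-number counts.
    normalized_ratio = (label_f * mentions_m) / (label_m * mentions_f)
    return raw_ratio, normalized_ratio


# Invented counts: the label is applied to women 30 times and to men 10 times,
# while women are mentioned 1,000 times and men 2,000 times overall.
raw, normalized = gender_skew(30, 10, 1000, 2000)
print(f"raw ratio: {raw}, per-mention ratio: {normalized}")
```

With these placeholder counts the per-mention ratio comes out at twice the raw ratio, which is the sense in which a base rate favoring men would make a word like bossy look even more gendered.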

Beyond that, though, there is another, more staggering implication: if women are mentioned less frequently in corpora, which attempt to represent (to the best of our ability) language use in some domain, then we systematically pay less attention to one group of human beings than to another. I don’t put it this way because I think I’m the first person to come up with the idea. The list of people who have made this very claim (including Jeffries on this very blog), sometimes using other types of data (e.g., the number of women featured in Hollywood films), would stretch on for miles.

Nonetheless, when I went looking for a thorough study to confirm Jeffries’ point and my own assumption at the time that she was correct, I couldn’t find one, so I thought I’d go ahead and do my own for three purposes: (1) to illustrate the under-representation of women and girls in our discourse, (2) to provide a resource for researchers (especially linguists) who might be looking for some reasonably thorough data to support this point, and (3) to attempt to tease apart a couple of issues related to male forms being used as ‘generics’ and the impact of this on over-representation of men/boys in corpora (if you find that confusing, I explain it more thoroughly below).

So without further ado, the table that illustrates what most of my readers probably already knew, and what all of us probably wish wasn’t true.


Now allow me to provide a little bit more explanation of what you see in the table. If you’re interested in the full data set, I’ve made it available for download here.

First of all, looking at the leftmost column, you will see a list of categories of references to men/boys and women/girls:

  • Pronouns include things like he and she.
  • Human ref includes words like man, female, and girl.
  • Informal ref includes a small sample of words for referring to people informally, like guy, dude, and chick.
  • Titles/respect markers include titles like Mr. and Mrs. as well as markers of respect such as ma’am.
  • Relationships includes gendered words for familial and personal relationships like brother, wife, girlfriend, and fathers.
  • Royalty refers to the gendered terms used for referencing royalty, such as king and princesses.
  • -man/-woman includes compound nouns made with these endings, like spokesman and congresswoman.
  • -boy/-girl similarly includes compound nouns with these endings, like schoolgirl and cowboys.

If you would like to see the full list of words I actually used, which is not necessarily comprehensive but is intended to provide a principled sample of language, download the full dataset.

In each category, I have tried to include a balance of male and female equivalents. However, this is not always as simple as it might seem. Take, for example, the words man and woman. While the two terms are in many instances used for equivalent purposes, man has usages that may be represented in the corpus but that diverge from its meaning as “opposite of woman”. Below, I explore one of these usages, man as a generic term for humans (or ‘mankind’). Man is also used as an expression of exasperation or excitement, as in “Oh man, this blog post is really beating a dead horse”.

Across the top, you will see four abbreviations (COCA, GloWbE, BNC-Men, and BNC-Women). These refer to the four corpora that I used. COCA, or the Corpus of Contemporary American English, contains transcriptions from television shows, newspaper articles, magazine articles, fiction, and academic publications from the 1990s to the present, all from the United States. GloWbE, or the Corpus of Global Web-based English, contains web-based writing collected from writers in 20 different English-speaking countries. BNC-Men and BNC-Women refer to sub-parts of the BNC64, which is itself a small section of the British National Corpus. The BNC64 contains the conversational speech of 32 men and 32 women who are all from the United Kingdom and are closely matched for other demographic characteristics.
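For readers who want to try a similar count on their own texts, the following is a minimal sketch of tallying gendered terms by category and computing a male-to-female ratio. It is not the procedure behind the table: the word lists are a small illustrative assumption (a few of the examples named above plus obvious counterparts), the function names are hypothetical, and the tokenizer is deliberately crude.

```python
import re
from collections import Counter

# Illustrative word lists only (an assumption of this sketch): a few of the
# examples named above plus obvious counterparts, NOT the full dataset lists.
CATEGORIES = {
    "Pronouns": {
        "male": {"he", "him", "his"},
        "female": {"she", "her", "hers"},
    },
    "Human ref": {
        "male": {"man", "men", "boy", "boys"},
        "female": {"woman", "women", "girl", "girls"},
    },
    "Relationships": {
        "male": {"brother", "husband", "father"},
        "female": {"sister", "wife", "mother"},
    },
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def count_by_category(text):
    """Return {category: {"male": n, "female": n}} for one text."""
    token_counts = Counter(tokenize(text))
    return {
        category: {
            gender: sum(token_counts[word] for word in words)
            for gender, words in groups.items()
        }
        for category, groups in CATEGORIES.items()
    }


def male_to_female_ratio(counts):
    """Male mentions per female mention, or None if there are no female mentions."""
    if counts["female"] == 0:
        return None
    return counts["male"] / counts["female"]


sample = "He told his brother that she and her sister met a man."
for category, counts in count_by_category(sample).items():
    print(category, counts, male_to_female_ratio(counts))
```

A plain string match like this cannot separate man as “opposite of woman” from man as a generic or an exclamation, which is exactly the kind of divergence discussed above. Comparing corpora of different sizes would also require normalizing the counts (for example, per million words).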

General findings

Women are mentioned less frequently than men in all four corpora. The ratio is about 2:1 in GloWbE, COCA, and BNC-Men. Although women talk about women/girls more than men talk about women/girls, women still talk about men/boys slightly more than about women/girls, as the BNC-Women results demonstrate. Thus, much of the over-representation of men and boys likely has to do with men being over-represented as speakers and writers in COCA and GloWbE as well.

If there’s any category where women may have an advantage over men in terms of frequency of mention, it’s familial and interpersonal relationships, suggesting that women are more likely to be talked about in their roles as wives, girlfriends, sisters, and mothers than men are to be talked about in their respective roles. Compare this to the fact that women are mentioned far less frequently using the usually professional terms created by compounds with -man or -woman (for example, salesman, congressman, spokesman). This comparison suggests the lingering effects of very traditional gender roles, noted also by Douglas Biber (Regents’ Professor of English at Northern Arizona University) and his colleagues in the Longman Grammar of Spoken and Written English (see pp. 312-314), who report a ratio of 31:2 for words ending in -man and -woman in their corpus. When women and girls are mentioned frequently, it is in roles not ascribed societal power. Of course there’s nothing inherently wrong with being a sister or a mother, but this does suggest that women are being ascribed a particular ‘place’ in society that is not equal in status to men’s.

What about generic use?

Another way in which men are more frequently represented than women is that male terms are used generically for both women and men in English (and in many other languages). One example involves professions whose female form is marked with the suffix -ess, such as waitress, poetess, and actress. Since the unmarked male terms (waiter, poet, actor) are also often applied to women, I did not include them in the comparison.

However, there are a few terms included in my comparison that are sometimes used in this unigendered fashion (that is, as if they applied to both genders): he, his, him, and man, all of which are quite common in the corpus. Compare the unigendered usage of these words (the first list below) to the male-gendered usage (the second list):


  • Every teacher should be thankful that he gets the opportunity to teach.
  • If someone knocks at the door, let him in.
  • Every employee should respect his supervisor.
  • Man has long dreamed of travelling through space.


  • Mr. Johnson should be thankful that he gets the opportunity to teach.
  • If a young man knocks at the door, let him in.
  • Josh should respect his supervisor.
  • That man over there has long dreamed of travelling through space.

We might want to know (as Sarah Shulist, Assistant Professor of Anthropology at MacEwan University, asked me on Twitter) to what extent the findings are a result of the (somewhat old-fashioned and often prescriptive) use of certain terms as unigendered. If that use were widespread, it would suggest that, rather than talking about men more, we simply use linguistic forms that have multiple meanings. I need to point out here, though, that the idea that unigendered man, his, him, and he reference both genders equally has long been disputed by psycholinguistic evidence, which shows that even when a unigendered meaning is intended, we tend to interpret such uses as referring to someone who is male. As Shulist pointed out, this doesn’t mean unigendered use is not a problem; it’s just a different type of problem. Specifically, if we use male-gendered terms when we intend to reference both men and women or girls and boys, then we erase women/girls from our discourse, even if that’s not necessarily our intention.

In order to estimate how many instances of man, his, him, and he were unigendered, I took a random sample of 100 instances of these four terms in COCA. I chose COCA because it contains a variety of types of language, such as academic writing and television speech, and I suspect that usage differs across contexts, especially as a result of conscious efforts to reduce sexist language in academic writing (such as the Linguistic Society of America’s guidelines on nonsexist language). What I found is presented in Table 2 below (and the raw data for each term is available for inspection here: he, him, his, and man).


These findings suggest that unigendered usage of prototypically masculine terms is infrequent. As a result, it’s probably safe to assume that the findings above, which showed that men/boys were more frequently referred to than women/girls, are primarily the result of speakers and writers referring to men/boys more frequently. In other words, the influence of unigendered usage is negligible. What’s really driving the greater use of male-gendered words is more frequent mention of men and boys.
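The sampling logic described above can be sketched roughly as follows. This is an illustration under stated assumptions rather than the exact procedure used: the function names are hypothetical, the concordance lines are stand-ins, the hand-coded labels are invented, and the figure of 3 generic uses in 100 is a placeholder, not a value from Table 2.

```python
import random


def draw_sample(concordance_lines, k=100, seed=0):
    """Draw a reproducible random sample of up to k concordance lines."""
    rng = random.Random(seed)
    k = min(k, len(concordance_lines))
    return rng.sample(concordance_lines, k)


def generic_share(labels):
    """Proportion of hand-coded labels that are 'generic' (vs. 'male')."""
    if not labels:
        raise ValueError("no labels to summarize")
    return labels.count("generic") / len(labels)


def adjusted_male_count(raw_count, share_generic):
    """Estimate how many hits refer to males after removing generic uses."""
    return raw_count * (1 - share_generic)


# Stand-in concordance lines for one search term (hypothetical data).
hits = [f"concordance line {i} containing 'he'" for i in range(500)]
sample = draw_sample(hits, k=100)

# Invented hand-coding of the sample: 3 generic uses, 97 male-specific uses.
labels = ["generic"] * 3 + ["male"] * 97
share = generic_share(labels)

print(f"sampled {len(sample)} lines, generic share = {share:.2f}")
print(f"of 1000 raw hits, roughly {adjusted_male_count(1000, share):.0f} would be male-specific")
```

When the generic share is this small, subtracting it barely changes the male counts, which is why unigendered usage cannot account for a gap on the order of 2:1.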

What’s missing?

Even though I hope to have provided a reasonably good sample, there are of course many other terms that refer to men/boys and women/girls. I’ll give a few examples here, but there are more.

One notable place where I have not included things is the informal references category, where terms like broad, doll, baby, and other things (mostly references to women/girls) are not included because they have too much overlap with non-human references. In other words, doll used as a noun can be a woman or it can be a plaything; there’s no way for a computer to tell the difference really, and hence it’s not included in my analysis.

Another unfortunate absence is homosexual men and women. Although there is a clear non-pejorative term for female homosexuals, lesbian, the same is not really true for men, other than possibly the phrase gay men, but this is often used in the phrase gay men and women. Homophobia has bestowed upon us plenty of disgusting slurs for gay men and women, but without a frequently used counterpart for lesbian I chose not to include references to homosexual men and women in the analysis.

Another major absence is first names, which quite often clearly specify someone of a particular gender. Unfortunately, as I discovered in my attempts to include these in my analysis, there are a number of problems. First, reflecting patriarchal structures of family naming, men’s first names often happen to be last names too (e.g., James), so searching for them is made complicated by this. Second, in the United States at least, in the best available data I could find, men seem to have less diversity of first names than women do, as the most popular names for men account for a much greater percentage of the male population than is true for women. Even if an acceptable sample of names could be pieced together, it is difficult to know what data source to use. Even if a comprehensive list of first names of people living in the US at the time of text production were available, it would not be necessarily representative of the language in these corpora (even COCA, which is drawn from the US) since many now dead figures would feature prominently (for example, a name like Adolf might be frequent in the corpus but not in the US Census for obvious reasons) and we can’t necessarily assume that fictional characters are named following the same tendencies of real people. It would be nice to know if reference to male or female first names is more common, but I’m unaware of any way to do this in a principled fashion. Perhaps a reader can point me to a good tool for this.

Final thoughts

Overall, I think it’s hard to refute the fact that men and boys are mentioned more frequently than women and girls in our discourse, particularly in the types of public discourse found in newspapers, magazines, and fiction, as well as in internet and academic writing, but also in our informal conversations. There are two strong reasons for this. The first is that regardless of who is speaking or writing, we tend to talk about men and boys more. However, this tendency is even more pronounced when the speaker is male. Male speakers are less likely than female speakers to talk about women and girls. This leads to our second reason, which is that men are over-represented as speakers and writers in the production of media, and thus their stronger tendency to talk more about men and boys is also over-represented in public discourse. Another, less frequent occurrence is reference to people in general using what are typically male-gendered terms like man (as in “Man has long dreamed of travelling through space”). This usage is certainly still present, but it can only explain a small share of the more frequent use of terms for men and boys.

The answer to the question in the title then is: Yes! Unfortunately, our discourse contains far more mentions of men and boys than of women and girls (not a terribly novel or surprising finding for many readers I’m sure), and sometimes we even talk about people in general as if they were all male. This both creates and sustains a world in which men maintain positions of power and prominence, while women when they are mentioned at all are cast in roles as sisters, wives, mothers, and girlfriends. I guess the important question now is: what are we going to do about it?

UPDATE (24 May 2014): I stumbled upon a paper by Paul Baker (Professor, Linguistics and English Language at Lancaster University), which provides a thorough description of the under-representation of women in four British corpora over time (starting in 1931). Baker finds some increase in the relative representation of women but still finds they are under-represented and described in stereotypical ways.

4 comments on “Do we talk and write about men more than women?”
  1. Tea_Talks says:

    Reblogged this on Tea Talks.

  2. Thanks Nic – great work! I wish every throwaway remark I made produced a nicely controlled and clear piece of research like this without me having to lift a finger;) I will certainly be pointing students towards your study as an exemplar in rigour and clarity which they would do well to emulate!

  3. Alon Lischinsky says:


    Note for example that the word woman contains man suggesting it is some type of deviation from the generic man (note the same for male and female).

    Not really, although I am sure that many speakers come to the same folk-etymological conclusion you do. In Old English, mann was a gender-neutral term (like the modern German or Swedish pronoun man), and the marked term for ‘male’ was wer. But, most importantly, male (< OFr maslle < L masculus) and female (< OFr femelle < VL femella, diminutive of classical L femina) are etymologically unrelated. The resemblance is coincidental.

  4. Lesley, thanks for your original comment and your kind words here.

    Alon, thanks for the reality check, particularly concerning male / female, which I totally botched here. However, I think woman is a bit more murky, although perhaps my original statement needs more refinement if it is to be made into an accurate claim. The OED suggests that man has long carried the implication that it was referring primarily to male humans. For example, the OED states that “Man was considered until the 20th cent. to include women by implication, though referring primarily to males.” In addition, they estimate the death of wer to have occurred by the end of the 13th century, suggesting man would have had to take over as primary means of referencing males. Looking over the OED examples, I see a male specific reference as far back as 1225. Even if we assume that the morphological formation of woman from wife and man predates all notion of man referencing males exclusively, I think we can still view the subsequent take over of man to mean male human as a way of placing males in the sort of default human position (which would make females deviations from them by logical extension).

    Nonetheless, I’ve deleted the sentence you quoted from the post just to avoid as many questionable claims as I can.
