This post contains a small amount of profanity for pseudo-scientific purposes.
If I were to ask you for the first word that comes into your mind when I say "stepmother," I would guess that for most of you the answer would be "evil" or "wicked." In the field of linguistics, the phrases "evil stepmother" and "wicked stepmother" are known as bigrams.
A bigram is a sequence of two adjacent "tokens" (such as words or letters) in spoken or written text, and the ones linguists pay attention to are the pairs that appear together frequently. Longer groupings of three, four or more tokens are studied too, and the whole family is collectively referred to as ngrams. In this article we're focusing on two-word combos: specifically, the most popular adjective + noun bigrams from various sources as they pertain to the rental industry.
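For the technically curious, here's a minimal Python sketch of what "counting bigrams" means. It's a toy example under simple assumptions (a made-up sentence, split on spaces, no punctuation or capitalization handling), not how any of the corpora below are actually built: each word is paired with the word that follows it, and a Counter tallies how often each pair shows up.

```python
from collections import Counter

# Toy sentence invented for illustration; real corpora hold billions of words.
text = "the wicked stepmother met the wicked witch and the new tenant"
words = text.split()

# zip(words, words[1:]) pairs every word with its right-hand neighbour,
# which yields exactly the bigrams of the text, in order.
bigrams = Counter(zip(words, words[1:]))

# The most frequent pairs come first.
print(bigrams.most_common(3))
```

A text of n words always yields n - 1 bigrams, so even a short sentence produces plenty of pairs; it's the repeated ones, like "the wicked" here, that stand out.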
Adjective + noun bigrams are sneaky little things. On their own they seem quite harmless, but hearing certain combinations over and over can shape your perception of things before you've had any hands-on experience with them. Stepmothers around the world have been forced for decades to work extra hard to overcome the media's constant repetition of "wicked stepmother."
Let's look at another example: "the child went to school." That's a pretty neutral statement. But if I say "the child went to public school," or private school, military school, special school, urban school, boarding school (you get the picture), suddenly the reader assumes they know a whole lot more about the child in question, simply because I swapped out a single adjective in the bigram. Another example: healthcare, as opposed to universal healthcare or managed healthcare.
The study of linguistics is inherently tied to other scientific fields such as cognitive science and sociology. Our use of language as a species is one of the things that sets us apart from all other living things. Language changes, it absorbs, it blends. It reflects the changing values of a society. One particular branch of linguistics is corpus linguistics.
A corpus (plural: corpora) is a collection of texts that can be statistically analyzed for word frequency. I've done brief corpus studies before using Twitter as my source of texts. Here's one on apartment hunting and another on landlords. But there are other corpora out there, and within them is a whole lot of discussion about landlords, apartments and tenants spanning over 200 years of the written word.
Choosing the Corpora
In order to do this correctly I'd need multiple different corpora to compare.
I would need to be able to run a multi-word search using specific parts of speech across each corpus. This ruled out one potential option, an ngram-enabled text collection of Reddit posts, but left me with four other options.
The first one I used is probably the best-known ngram-indexed corpus out there: the ngram search at Google Books. This corpus is drawn from the books Google has scanned into its database, a selection of texts ranging from 1800 to 2000, and it includes both British English and American English. However, it is widely acknowledged to be a flawed corpus because of the optical character recognition (OCR) technology used to digitize the books. OCR was still an infant technology when Google started digitizing, and it's still pretty buggy. So I needed another source. And since the Google Books corpus cuts off at the year 2000, I wanted something more modern. I found it courtesy of Mark Davies at Brigham Young University.
At Mark's site, English Corpora, I found a whole raft of options to choose from. I selected three: the iWeb Corpus, the TV Corpus and the News on the Web Corpus. The iWeb corpus is a collection of websites captured in 2017, selected for their consistent length for use in statistical language analysis; it contains about 14.5 billion words. The TV corpus contains the scripts of 75,000 TV episodes aired between 1950 and 2018. The News corpus contains text from online newspapers and magazines from 2010 to the present day: about 8.1 billion words and growing.
Together, these four corpora would give me a cross-section of most forms of the English language: formal, commercial, informal and informational.
I checked each of these four corpora against the following five bigram search criteria, recording the top 10 results from each:
- Adjective + landlord
- Adjective + landlady
- Adjective + slumlord
- Adjective + apartment
- Adjective + tenant
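Each of those searches boils down to the same recipe: find every adjective sitting directly in front of the target noun, count them, and keep the top 10. Here's a rough Python illustration of that recipe under stated assumptions: the tokens are hand-tagged with a simplified, hypothetical tag set (ADJ, NOUN and so on), and top_adjectives is a made-up helper name. The real corpus sites do their own part-of-speech tagging and have their own query syntax, which this sketch doesn't attempt to reproduce.

```python
from collections import Counter

# Hypothetical hand-tagged (word, tag) pairs; a real corpus interface
# tags its texts for you with its own tag set.
tagged = [
    ("our", "PRON"), ("new", "ADJ"), ("landlord", "NOUN"), ("is", "VERB"),
    ("a", "DET"), ("good", "ADJ"), ("landlord", "NOUN"), (",", "PUNCT"),
    ("unlike", "ADP"), ("the", "DET"), ("new", "ADJ"), ("landlady", "NOUN"),
    ("or", "CCONJ"), ("our", "PRON"), ("new", "ADJ"), ("landlord", "NOUN"),
]

def top_adjectives(tokens, noun, n=10):
    """Return the n most frequent adjectives directly preceding `noun`."""
    counts = Counter()
    # Walk through every adjacent pair of tagged tokens.
    for (w1, t1), (w2, t2) in zip(tokens, tokens[1:]):
        # Keep only adjective + noun pairs whose noun is the one searched for.
        if t1 == "ADJ" and t2 == "NOUN" and w2.lower() == noun:
            counts[w1.lower()] += 1
    return counts.most_common(n)

print(top_adjectives(tagged, "landlord"))
```

Note that "new landlady" is ignored when searching for "landlord", which is why each noun in the list above needed its own search.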
Below are the adjectives that came up in the top 10 results in at least one corpus. The words with an asterisk (*) appeared in at least 3 of them.
Landlord: absentee, best, biggest, commercial, current, damn, dead, English, evil, fucking, good*, great, Irish, largest, local, new*, old, own, potential, previous, private, prospective, residential, social, stupid, such, superior, worst
Landlady: antagonistic, buxom, current, dotty, eccentric, elderly, fat, former, genial, German, good, Irish, Italian, kind, little, mean, modern, mutual, new*, nice, nice/st, nos(e)y, notorious, old*, poor, strict, stupid, widowed, worthy
Slumlord: biggest, capitalist, evil, fellow, female, filthy, fucking, greedy, illegal, inexperienced, Irish, Jewish, largest, local, notorious*, racist, secret, sleazy, small-time, unconscionable, unscrupulous, vindictive, wealthy, white
Apartment: adjoining, empty, furnished, great, high-rise, inner, large, little, luxury, modern, nearby, new*, nice, old, one-bedroom*, own, private, rented, residential, same, small*, spacious, three-bedroom, tiny, two-bedroom, whole, wrong
Tenant: best, current*, existing, favorite, first, former, good*, incoming, joint, largest, main, major, new*, only, original, other, outgoing, particular, perfect, potential, present, previous*, prospective, single, such
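The asterisk rule is simple enough to sketch too. Below, combine is a hypothetical helper that merges each corpus's top list and stars any adjective found in at least 3 of the 4 corpora. The per-corpus lists are invented for illustration (and shortened from 10 entries to 4), so they are not the actual results behind the lists above.

```python
from collections import Counter

# Invented, shortened top lists for one search (e.g. adjective + tenant).
# Illustrative only: these are NOT the real per-corpus results.
top_by_corpus = {
    "Google Books": ["new", "good", "previous", "former"],
    "iWeb": ["new", "current", "good", "prospective"],
    "TV": ["new", "current", "previous", "perfect"],
    "News on the Web": ["current", "new", "previous", "good"],
}

def combine(top_lists, threshold=3):
    """Alphabetical union of all adjectives, starring those in >= threshold corpora."""
    # set() counts each adjective at most once per corpus.
    seen_in = Counter(word for words in top_lists.values() for word in set(words))
    return [word + "*" if seen_in[word] >= threshold else word
            for word in sorted(seen_in)]

print(", ".join(combine(top_by_corpus)))
```

Because each adjective is counted once per corpus, a word repeated within a single list can't earn an asterisk on its own; it has to turn up across several different kinds of text.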
If you would like to see the full list including which words appeared in which corpora, I've done up a spreadsheet here.
So obviously this is just a bit of trivial natter. I've not run many numbers and haven't compared these words against other statistically significant ngrams to see if they could actually have an impact. But we can tease out some interesting things from the results.
There's a lot of focus on the transitory nature of apartment living. "Old" and "new" appear frequently across the lists, and together with turnover words such as "potential," "prospective" and "previous," they're all reminders that rental housing is a short-lived arrangement. However, we have to remember that "old" could be the tail end of the phrase "X-year-old," a frequent modifier in news articles.
I also see many references to nationality, ethnicity and race. We've got English landlords, German and Italian landladies, white and Jewish slumlords, and Irish landlords, landladies and slumlords alike.
Adjectives used to describe landladies focus far more on their physical appearance (fat, buxom, little) or their personality (occasionally genial, but more often dotty, eccentric, nosy or strict) than the adjectives used to describe their male counterparts. It's also noteworthy that "female slumlord" turns up often enough to make the list.
Descriptive terms for landlords and slumlords overlap a lot, with the landlord words trending far more positive. I did note that the most negative terms, such as stupid, nosy, damn, worst and of course f***ing, all came from the TV corpus alone. When it comes to judging landladies, however, the news is nearly as judgmental, serving as the sole source for "dotty" and "antagonistic."
For landlords and tenants alike, there's a preoccupation with size: biggest, largest, main and major all make appearances. When it comes to apartments, however, we're obsessed with how small they are. "Small" appeared in front of "apartment" in all 4 corpora, and we also see "tiny" and "little" on the list. But we also see "large" and "spacious," "luxury" and "modern."
All told, the only ones who come off well in this adjective rally are the tenants, who shine with words like "perfect," "best," "favorite" and "first," and no noticeably negative words in the list.
The real question, of course, is the impact these word combinations have on our perceptions of others when we see and hear them over and over again. Word combinations can easily become stereotypes, which in turn lead to bias and incorrect assumptions. The "wicked stepmother" bigram has been blamed for everything from psychological harm to false murder convictions. We have to ask ourselves what the "small apartment" and "notorious slumlord" bigrams have done to shape the rental industry, for good or ill.
Can you think of other renting-related bigrams you see or hear a lot? Let me know in the comments!
RentConfident is a Chicago startup that provides renters with the in-depth information they need to choose safe apartments. Help us reach more renters! Like, Share and Retweet us!