Informal Writing in Social Media Posts: Deviation Patterns by Age and Gender
Social media is a relatively new creation when considering the full history of language. Alongside this creation, a largely unnoticed mode of communication has flourished: informal writing. For the purposes of this research, I’ll refer to this informal writing as “non-standard” or as a “deviation” from what would normally be considered proper or standard writing. Social media platforms have created a place for people to write informally; in other words, they have provided a place for people to emulate their social speech through written text. Speech, in this context, refers to all the nuances that come with understanding a message (e.g. tone of voice, body language, sarcasm). The purpose of this research is to show how often deviation is performed, how these deviations vary by age and sex, and why people may be performing specific deviations.
I propose three hypotheses about what the data will uncover. The first is that the youngest age group (13-17 years) will perform the highest percentage of deviation in their writing, and that these deviations will primarily be in the form of capitalization. This is not to suggest that they won’t also perform other deviations frequently, just that their capitalization deviation frequency will be higher than that of any other group. A second hypothesis is that the frequency of deviation between the sexes will vary across age groups. In other words, one sex may have a higher deviation percentage in a given age group but a lower deviation percentage in a different age group. A third hypothesis is that the oldest age group (26-40 years) will use informal contractions more than the other three categories of deviation. Informal contractions are (arguably) the most used informal element of speech. They are used so unconsciously that people rarely notice using them anymore. Since these contractions have become so common in speech, I believe the oldest age group will also use them more frequently in informal writing. This is just an assumption, though, as there is no way to tell whether the subject intended to deviate with the informal contraction or used it unconsciously. It’s also important to note that the most popular informal contractions are not autocorrected or red-underlined by spellcheck.
There are also two assumptions this research must make. First, the writer of the text understands that they are deviating from standard English to write in a non-standard form. If what I believe about the unconscious use of informal contractions is true, then it’s difficult to claim deviation; however, since I have no way of knowing whether the informal contractions were used unconsciously, they will be marked as deviations for the purposes of this research. Deviation is not the same thing as making an error, so any non-standard usage that is not a conscious decision would not be considered deviation. The second assumption is that the sample used to create these data is representative of social media populations as a whole, which may or may not be the case.
Method
Data gathering was the first step in providing the material necessary to test the hypotheses. It was determined that each group would need the same number of subjects, with equal numbers of males and females. This study has twelve (12) males and twelve (12) females in each of three age groups (13-17, 18-25, and 26-40), for 72 total subjects. Five (5) of the most recent posts from each subject’s account were documented for analysis, for a total of 360 social media posts. These posts, however, needed to be substantive in order to count toward the data in this study. An example of a non-substantive post would be a one-word caption the subject gave to a picture they had posted. These types of posts were excluded because they would misrepresent the scope of this research. Any other post that used words to form some type of complete thought was documented for analysis. Also, if a subject did not have five substantive posts on their account, they were not selected for the study. This meant that each subject’s account needed to be vetted for content before selection.
To find study participants, I searched for keywords or issues that certain age groups would most likely be talking about. Once I found a subject and could estimate their age accurately, I used their friends/followers list to find other subjects in the same age group until I had twenty-four subjects in each age group. To maintain the integrity of the data, I drew subjects from the friends/followers lists of multiple individuals. Not doing this could have skewed the data, since groups of friends may follow each other in their trends. By finding three or four initial subjects not known to each other, and then finding other subjects through those initial three or four, I was able to maintain the integrity of the data. This process was repeated for each age group until enough subjects were found.
There are three other notes that should be made regarding data gathering. Number one: the subjects had no knowledge of my data gathering, nor did I request friendship with, follow, or converse with them in any manner before, during, or after the gathering of data. This was meant not only to avoid liability issues but also to protect the integrity of the data. Number two: the age of each subject needed to be estimated in order to place them into an age group. This was done based on the subject’s physical appearance and the content of their posts. The ages are strictly estimations, though I’m confident in their accuracy. At no time have the actual ages of the subjects been known. Number three: self-identification of sex and sexual orientation were not taken into consideration for categorizing deviations. Socioeconomic and sociocultural status were likewise not taken into consideration; while this distinction would have allowed for more clarity and additional hypotheses, it was not feasible given time constraints.
Once all subjects were identified and their posts collected, the total number of words written by each subject was documented in a table. I categorized four different types of deviation for this study: acronyms (e.g. BTW), capitalization (e.g. david), informal contractions (e.g. gonna), and non-standard spelling (e.g. todae instead of today). The first three categories are straightforward; however, in the interest of simplicity, I found it necessary to group several variations of deviation into the “non-standard spelling” category. These include lengthened words, shortened words, abbreviated words, and slang words. Lengthened words are any words that maintain proper spelling but add extra letters at some point in the word (e.g. reaaaaaallllyyy). Shortened words follow the same spelling convention but are cut off before the word is finished (e.g. bro). Abbreviated words remove certain letters to shorten the word (e.g. u instead of you; frgt instead of forgot). Instances of these four categories (acronyms, capitalization, informal contractions, and non-standard spelling) were counted and documented in the corresponding table for each subject.
For data calculation, I divided the number of deviations in each category by the total number of words written by the subject; for example, if there were four capitalization deviations out of 300 total written words, I would divide 4 by 300 for a 1.3% deviation rate in capitalization. I performed this calculation for each category. I then calculated a total deviation percentage by dividing the total number of deviations (acronyms, capitalization, informal contractions, and non-standard spelling combined) by the subject’s total word count. This process was completed for each subject, and the percentages were noted in a table. Once all individual subjects had been calculated, I calculated group deviation percentages for each category and for deviation as a whole. This calculation was completed by counting the total number of deviations for each category in each of the six groups and dividing by the total number of words written by that group. For a hypothetical example, if the 13-17 year old female group wrote a total of 2500 words and had 100 capitalization deviations, I would divide 100 by 2500 for a group deviation percentage of 4%. Once all calculations for each subject and each group were completed, I followed the same calculation to find the total deviation percentages of all the groups combined. I then used these percentages to form the findings and inferences in this research report.
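The calculation described above can be sketched in Python. This is a minimal illustration under stated assumptions, not a reproduction of any tool used in this study: the `Subject` record, the function names, and all numbers other than the two worked examples from the paragraph above (4 deviations in 300 words; 100 deviations in 2500 words) are hypothetical. The point it makes concrete is that group percentages pool the raw counts (total deviations over total words) rather than averaging individual percentages, so subjects who wrote more carry more weight.

```python
from dataclasses import dataclass, field

# The four deviation categories used in this study.
CATEGORIES = ("acronyms", "capitalization", "informal_contractions", "nonstandard_spelling")


@dataclass
class Subject:
    """Hypothetical record: total words written across a subject's five posts,
    plus the number of deviations counted in each category."""
    total_words: int
    deviations: dict = field(default_factory=dict)

    def count(self, category: str) -> int:
        # A category with no recorded deviations counts as zero.
        return self.deviations.get(category, 0)


def rate(deviations: int, words: int) -> float:
    """Deviation percentage: deviations divided by words written, times 100."""
    return 100.0 * deviations / words


def subject_rates(s: Subject) -> dict:
    """Per-category percentages for one subject, plus a 'total' over all four categories."""
    rates = {c: rate(s.count(c), s.total_words) for c in CATEGORIES}
    rates["total"] = rate(sum(s.count(c) for c in CATEGORIES), s.total_words)
    return rates


def group_rates(subjects: list) -> dict:
    """Group percentages: sum the raw counts and word totals across the group first,
    then divide (pooling), instead of averaging each subject's percentage."""
    words = sum(s.total_words for s in subjects)
    rates = {c: rate(sum(s.count(c) for s in subjects), words) for c in CATEGORIES}
    rates["total"] = rate(sum(s.count(c) for s in subjects for c in CATEGORIES), words)
    return rates


# Worked example from the text: 4 capitalization deviations in 300 words.
example = Subject(total_words=300, deviations={"capitalization": 4})
print(round(subject_rates(example)["capitalization"], 1))  # 1.3

# Hypothetical two-subject group totaling 2500 words and 100 capitalization deviations.
group = [Subject(1000, {"capitalization": 30}), Subject(1500, {"capitalization": 70})]
print(group_rates(group)["capitalization"])  # 4.0
```

Averaging the two hypothetical subjects’ individual capitalization rates (3.0% and about 4.67%) would instead give about 3.83%, which is why pooled counts and averaged percentages should not be mixed when comparing groups. Running `group_rates` over all 72 subjects gives the combined percentages for all groups.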
Results and Analysis
Informal Contractions
Informal contraction usage does decline with age, though the decline is minimal. The consistency in informal contraction deviation suggests that there is no social pressure to eliminate these deviations in informal writing. People often speak using informal contractions, so it comes as little surprise that they are also accepted in informal text. If informal contractions were considered non-standard in speech, one could expect their deviation percentage to decrease with age as well. One of my hypotheses was that informal contractions would be the most prominent deviation in the 26-40 age groups, but this was not the case: non-standard spelling was actually performed more frequently.
Non-Standard Spelling
The 13-17 male group has a very high deviation percentage in non-standard spelling (5.1%) compared to the other groups, as well as a higher percentage in the other three deviation categories. With the exception of the 13-17 year old males, non-standard spelling remains relatively constant across the age and sex groups. Probably the most surprising information in the data table is the lack of spelling deviation by the 13-17 year old females, who align more closely with the two older male groups and the 26-40 female group.
Acronyms
Acronym usage never rises above 1% for any group, making it a relative non-factor in terms of deviation.
Capitalization
Capitalization deviation is the most frequent deviation, which supports my first hypothesis. The 13-17 male group in particular has a very high percentage compared to the other groups. Capitalization also shows the largest decline in deviation with age. Females 13-17 deviate in capitalization 4.6% of the time, while males of the same age deviate 6.6% of the time; all other groups are under 3%. This suggests that the need to adhere to capitalization standards increases as people age. The motivations behind this adherence likely stem from social pressures (e.g. wanting to present as better educated or to look less immature).
Interestingly, the drop-off in capitalization deviation between the two youngest male groups (13-17 and 18-25) is 4.5 percentage points, the largest drop-off between any two adjacent age groups. Males between the ages of 26 and 40 also have the lowest capitalization deviation percentage of any group in the study, at 0.1%. The 26-40 male and female groups both had a higher non-standard spelling frequency than capitalization frequency; they were the only groups to deviate more in a category other than capitalization.
Big Picture
My second hypothesis was that the frequency of deviation between the sexes would vary across age groups. This hypothesis proved true, and the most surprising instance was the increase in non-standard spelling between the 13-17 and 18-25 female groups: the 13-17 group was at 0.8% and the 18-25 group at 2.0%. This was the only instance in which a deviation percentage increased from one male or female group to the next older group of the same sex. The 13-17 year old males deviate more than any other group in every category, with a total deviation rate of 13.9%. This is double the rate of the next closest group; interestingly, the 26-40 year old males have the lowest total deviation rate, at 1.4%. The data show that females have a steady decline in deviation but never reach the same heights as males at the peak deviation age.
Table 1.2
When performing an initial read-through of all the posts analyzed in this research, I assumed the percentages of deviation would be much higher for each group, save the 26-40 male and female groups. The relatively low percentages of deviation were a surprise given all the conscious “errors” I had seen before data analysis. The most likely reason is that I’m not used to reading social media posts, which made the posts seem riddled with non-standard writing. Much of it stuck out like a sore thumb, so the initial reading made it appear worse than it really was (see Table 1.3 below for a visual of deviation v. non-deviation). A second note on data gathering concerns how often I skipped over common informal contractions (e.g. gonna, wanna). It was more difficult for me to notice these deviations because I personally use them in informal writing and texting, so I had to take special care to find the more common informal contractions in the social media posts.
Table 1.3
Sex had very little to do with deviation percentages, except that the non-standard spelling deviation for females 13-17 was much lower (averaging out similarly to the other groups) than that of males 13-17. I can think of no explanation as to why this deviation percentage would be so low compared to that of the males 13-17.
Inferences
I would argue that non-standard spelling and informal contraction usage stem from the writer’s desire to give the text tone and phonetic presentation. It’s a way of allowing readers to see the tone in the text. An example from a 13-17 year old male subject shows this when he writes that he’s “puttin’ on da pressha.” Notice also his conscious shortening of the word putting by placing an apostrophe after the letter [n]. His placement of the apostrophe shows that his shortening of the word had nothing to do with typing fewer keystrokes; it was meant to express the way he would say it aloud. The word da represents the word the; da is a popular replacement for the in informal writing and speech, and it’s featured in a number of hip-hop songs. Pressha is meant to be the word pressure. The spelling of this word is interesting in that its first syllable follows proper spelling conventions, while the rest of the word is written differently so that the reader will deviate from the regular pronunciation. It can be assumed that this sentence was written to evoke a specific phonetic deviation, one that he likely would have expressed in speech had he been speaking aloud. I make this inference because children of this age are often trying to find their own identity and/or present a certain identity to their peers. Deviation in both spoken and written forms expresses a rebellious attitude, and those in the 13-17 year age group typically view rebellion as a factor in being labeled a cool kid and working their way into adolescence.3 Another supporting feature of this influence is the way people attempt to express tone in their social media posts. Typically, written texts are more formal in nature and aren’t meant to express emotion or changes in tone. This changes entirely, though, with the informality of social media posts, as writers feel the need to express their tone through the text.
One reason for this is that the text is all they have to express their tone, with the exception of punctuation, which is beyond the scope of this research. In Gretchen McCulloch’s book, Because Internet: Understanding the New Rules of Language, she states, “We’re creating new rules for typographical tone of voice. Not the kind of rules that are imposed from on high, but the kind of rules that emerge from the collective practice of a couple billion social monkeys – rules that enliven our social interactions.” From this, we see that tone is meant to be expressed through the deviations performed in social media posts, which typically take the form of non-standard spelling and informal contractions. If the male subject were to write “putting on the pressure,” the message would not be read as intended because readers couldn’t see any of the tone or phonetics in the writing. Given this, one might expect deviation percentages to be higher in all groups, as tone of voice is an important marker in speech; however, the percentages remain rather low except in the 13-17 male group.
Capitalization deviation follows a similar line to that of non-standard spelling for the 13-17 age groups. A lack of capitalization is viewed as rebellious in nature. While it could also reflect an unwillingness to put forth the effort to hit the shift key, this study found no evidence that those writing on social media are typing for speed or ease; rather, it seems to be a way of sending their own message with their own flair. The results of this research show that capitalization deviation decreases as subjects age, which also supports the belief that not capitalizing words stems from a search for identity and/or a way to follow the cool social norms of peers. If this were not the case, older age groups would, in theory, maintain a similar capitalization deviation.
One explanation for the decline in total deviation as people age is the process of maturity and the desire to appear more professional and intelligent. As people age, they feel more pressure to adhere to standard writing forms. This pressure comes with the desire to appear more marketable in the professional world, which is especially important now that employers may examine the social media posts and activity of potential employees. Not only do people wish to appear professional to potential employers, they also wish to present as intelligent to their peers. At some point in a subject’s life (seemingly between the ages of 18 and 25), their thinking shifts from needing to deviate from standard writing forms to needing to follow them. This points to the way maturity affects literacy presentation. The finding is interesting because it suggests that those performing non-standard deviations are conscious of their decision to deviate. Those reading the posts understand that the individual is using deviations, and readers likely wouldn’t assume those deviations are performed because of a lack of intelligence. Qualitative data and knowledge of psychological theory would be required to confirm the inferences I’ve made from the results of this research.
Conclusion
Understanding informal writing on social media is a difficult undertaking. The primary aim of this research wasn’t to find out why people write informally, but which age and sex groups were writing informally with the most prevalence. It’s clear from the data that deviation percentages decrease with age and that total deviation is relatively low within each group. The only truly surprising difference in deviation between males and females came with the 13-17 year old subjects. Females rarely performed non-standard spelling, while males of the same age group performed non-standard spelling over 5% of the time. This non-standard spelling by 13-17 year old males also raised their total deviation percentage to 13.9%, double that of the 13-17 year old females. Capitalization had the highest deviation percentage with all groups combined, at 2.48%. Non-standard spelling was second, at 1.46% (refer to Table 1.3 for a visual). The inferences in the previous section propose that much of the reasoning behind informal writing comes from how writers think they will be perceived through the text. The phonetic presentation and tone of words aim to put the writer’s voice in the mind of the reader, which primarily connects to informal contractions and non-standard spellings. The writer in these instances wants the reader to get a certain impression or feeling from the text, so they alter the presentation of the text to form that impression. One set of data I could have gathered to support qualitative analysis would have been a count of all the deviations that represented a change in tone or phonetic presentation. Not all spelling deviations change the tone, so this percentage would have been different and could have offered an additional avenue of analysis. Capitalization deviation could come from rebellion, laziness, or individuality.
Individuality seems unlikely given how frequently subjects used this deviation, and laziness also seems unlikely, since nothing has been shown to suggest that people are trying to get messages out more quickly. Perhaps it’s as simple as people of that age wanting to feel like they’re going against the rules, which would support Finders’ thoughts on the process of adolescence. These inferences cannot be confirmed without more quantitative and qualitative data, which I will offer as a continuing study to the research in this report.
Finders, Margaret J. Just Girls: Hidden Literacies and Life in Junior High. New York: Teachers College Press, 1997. Print.
McCulloch, Gretchen. Because Internet: Understanding the New Rules of Language. New York: Riverhead Books, 2019. Print.
TOMMY BROWN is a senior pursuing a Baccalaureate in English with a minor in History. Selected by Professor David Bowie.