AI Markers in Academic Writing: The Case of Dash Overuse

By Eben van Tonder and Christa van Tonder-Berger, 16 April 2026

Introduction

The rapid adoption of generative AI in academic writing has created a parallel challenge for supervisors, editors, and institutions: identifying when text has been produced or heavily shaped by AI tools. One recurring stylistic feature is the unusually frequent use of dashes. Two kinds exist. The em dash (—) is so called because its length traditionally matches the width of the letter M in a given font; American writing favours it. The en dash (–) is so called because it matches the width of the letter N; British writers tend to prefer it. This article examines whether such usage is appropriate in English writing, whether it can function as a marker of AI involvement, and how users can instruct AI systems to avoid this pattern.

Dash Usage in Standard English Writing

Authoritative style guides agree that dashes are a secondary punctuation device. The Chicago Manual of Style states that em dashes are used to indicate a sudden break in thought or to set off an amplifying or explanatory element, but should not replace standard sentence structure such as commas or full stops [1]. Similarly, the Oxford Style Manual emphasises that overuse of dashes leads to informal and potentially unclear prose [2]. The Associated Press Stylebook also limits dash usage to specific functions such as abrupt change or emphasis, discouraging frequent use in continuous text [3].

Empirical studies of academic writing support this position. Research on academic prose shows a preference for hypotactic structures, where logical relationships are expressed through conjunctions and subordinate clauses rather than punctuation shortcuts [4]. Excessive reliance on punctuation such as dashes disrupts this structure and reduces clarity.

Observed Pattern in AI Generated Text

Analyses of large language model outputs have identified consistent stylistic regularities. AI generated text tends to favour high information density per sentence, layered clause structures, and reduced use of explicit logical connectors.

These tendencies are documented in studies examining stylistic fingerprints of machine generated text, which note increased structural uniformity and predictable clause construction patterns [5].

The dash becomes a convenient mechanism within this system. It allows the model to attach qualifying or explanatory segments without restructuring the sentence. This aligns with findings that language models optimise for local coherence rather than global stylistic variation [6].

Why Dash Overuse Emerges

The overuse of dashes is not explicitly programmed. It emerges from three interacting factors:

First, training data composition. Language models are trained on mixed corpora including technical documentation, blogs, and explanatory texts, where dashes are more common than in formal academic writing [7].

Second, compression of ideas. AI systems tend to compress multiple related ideas into a single sentence, and the dash functions as a low-cost structural bridge between them. The model produces ideas in layers: main point, qualification, example, contrast. Instead of splitting these layers into separate sentences, it packs them into one, and the dash becomes the easiest way to attach them.

Third, avoidance of syntactic complexity. Studies in computational linguistics show that language models often avoid deeply nested subordinate clauses, instead preferring flatter structures with inserted segments [8]. The dash supports this preference. It also lets the model avoid strong rhetorical commitment. Instead of writing “This causes X because Y”, the model often writes “This causes X — particularly when Y is present — which leads to Z.” The dash allows hedging without fully restructuring the sentence.

Is Dash Overuse a Reliable AI Marker?

Dash overuse alone is not definitive proof of AI involvement. However, it is a statistically meaningful indicator when combined with other features. Research on AI detection highlights that markers are probabilistic rather than absolute. Features such as repetitive sentence structure, balanced phrasing, and reduced syntactic variation are more reliable when evaluated together [5][9].
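
As an illustration of how dash frequency could be turned into one measurable feature among several, the following minimal sketch counts em and en dashes per 1,000 words. The function name and the per-1,000-words normalisation are assumptions made for this example and are not taken from the cited studies. The count also includes en dashes in legitimate number ranges, so the result should be read as a rough signal and not as a verdict.

```python
import re

# Unicode code points for the two dash characters discussed in this article.
EM_DASH = "\u2014"
EN_DASH = "\u2013"


def dash_rate_per_1000_words(text: str) -> float:
    """Return the number of em and en dashes per 1,000 words of text.

    Hypothetical helper for illustration only. It does not distinguish
    parenthetical dashes from en dashes used in number ranges.
    """
    words = re.findall(r"\b\w+\b", text)
    if not words:
        return 0.0
    dashes = text.count(EM_DASH) + text.count(EN_DASH)
    return 1000.0 * dashes / len(words)


sample = "This causes X \u2014 particularly when Y is present \u2014 which leads to Z."
rate = dash_rate_per_1000_words(sample)
print(f"{rate:.1f} dashes per 1,000 words")
```

A single short sentence gives an inflated rate, so a figure of this kind is only meaningful over longer passages and alongside the other features mentioned above.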

In professionally edited academic writing, excessive dash usage is typically corrected. Its presence therefore often indicates either a lack of editorial control or reliance on unrefined AI output.

Practical Implications for Academic Editing

For editorial professionals, dash overuse provides a useful diagnostic entry point. It signals areas where the text may require restructuring rather than simple proofreading. The objective is to replace dashes with clear sentence boundaries, explicit logical connectors, and proper subordinate clauses, which restores conventional academic style and improves readability.
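
An editor could locate candidate sentences for restructuring with a short script. The sketch below is one possible approach and not an established tool: it splits text on sentence-ending punctuation and reports each sentence that contains more dashes than a chosen threshold. The function name, the threshold parameter, and the simple sentence splitter are all assumptions made for this example.

```python
import re

# Matches an en dash (U+2013) or an em dash (U+2014).
DASH_PATTERN = re.compile("[\u2013\u2014]")
# Naive sentence boundary: whitespace that follows ., ! or ?
SENTENCE_SPLIT = re.compile(r"(?<=[.!?])\s+")


def flag_dash_sentences(text: str, max_dashes: int = 0) -> list[tuple[int, int, str]]:
    """Return (sentence number, dash count, sentence) for flagged sentences.

    Hypothetical helper: a sentence is flagged when its dash count
    exceeds max_dashes. Abbreviations such as "e.g." will confuse the
    naive splitter, so results need a human check.
    """
    flagged = []
    sentences = SENTENCE_SPLIT.split(text.strip())
    for number, sentence in enumerate(sentences, start=1):
        count = len(DASH_PATTERN.findall(sentence))
        if count > max_dashes:
            flagged.append((number, count, sentence))
    return flagged


draft = (
    "Dashes are rare in edited prose. "
    "This causes X \u2014 when Y is present \u2014 which leads to Z. "
    "The final sentence is plain."
)
for number, count, sentence in flag_dash_sentences(draft):
    print(f"Sentence {number} has {count} dash(es): {sentence}")
```

The script only points to sentences. Deciding whether to split a sentence, add a connector, or build a subordinate clause remains an editorial judgement.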

Instructions to Control Dash Usage in AI Output

Users can significantly reduce dash overuse by providing explicit constraints on the output. Effective instructions to the AI model include phrases such as:

“Do not use dashes in any form. Replace them with full stops or conjunctions.”

“Limit punctuation to commas and full stops unless strictly necessary.”

“Use formal academic sentence structure with explicit connectors such as ‘because’, ‘therefore’, and ‘however’.”

“Split complex sentences into shorter sentences rather than inserting clauses with punctuation.”

Studies on prompt engineering confirm that specific stylistic constraints can significantly alter output patterns in large language models [10].
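
For users who send prompts programmatically, the instructions above can be assembled into a reusable preamble and the returned text checked afterwards. The following is a minimal sketch under stated assumptions: the names build_prompt and violates_dash_rule are hypothetical, no particular AI service or API is assumed, and the call to the model itself is deliberately left out.

```python
# The four instructions quoted above, stored as reusable constraints.
STYLE_CONSTRAINTS = [
    "Do not use dashes in any form. Replace them with full stops or conjunctions.",
    "Limit punctuation to commas and full stops unless strictly necessary.",
    "Use formal academic sentence structure with explicit connectors "
    "such as 'because', 'therefore', and 'however'.",
    "Split complex sentences into shorter sentences rather than "
    "inserting clauses with punctuation.",
]


def build_prompt(task: str, constraints: list[str] = STYLE_CONSTRAINTS) -> str:
    """Prefix a writing task with numbered style rules (hypothetical helper)."""
    lines = ["Follow these style rules in your output:"]
    lines += [f"{i}. {rule}" for i, rule in enumerate(constraints, start=1)]
    lines += ["", "Task:", task]
    return "\n".join(lines)


def violates_dash_rule(output: str) -> bool:
    """Return True if the text contains an en dash or an em dash."""
    return any(dash in output for dash in ("\u2013", "\u2014"))


prompt = build_prompt("Summarise the findings of the attached study.")
print(prompt)
```

Checking the returned text with violates_dash_rule gives a simple way to confirm whether the constraints held for a given response and whether a retry or a manual edit is needed.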

Conclusion

Frequent use of dashes is not consistent with established English academic writing standards. It is a useful marker of AI generated or insufficiently edited text. The phenomenon arises from structural tendencies in language models, particularly the compression of ideas and avoidance of syntactic complexity.

For academic and professional contexts, controlling this feature through explicit instruction and editorial intervention is both necessary and effective.

References

[1] Chicago Manual of Style. 17th ed. University of Chicago Press, 2017.
[2] Oxford Style Manual. Oxford University Press, 2016.
[3] Associated Press Stylebook. AP, latest edition.
[4] Biber, D. et al. Longman Grammar of Spoken and Written English. Pearson, 1999.
[5] Gehrmann, S., Strobelt, H., Rush, A. “GLTR: Statistical Detection and Visualization of Generated Text.” ACL, 2019.
[6] Holtzman, A. et al. “The Curious Case of Neural Text Degeneration.” ICLR, 2020.
[7] Brown, T. et al. “Language Models are Few-Shot Learners.” NeurIPS, 2020.
[8] Manning, C., Schütze, H. Foundations of Statistical Natural Language Processing. MIT Press, 1999.
[9] Mitchell, E. et al. “DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature.” ICML, 2023.
[10] Reynolds, L., McDonell, K. “Prompt Programming for Large Language Models.” arXiv, 2021.