This is a small public demonstration of artificial intelligence (AI) co-writing in classroom contexts. The essays collected here respond to a proposal-writing assignment from the first-year composition course at the University of Texas at Austin. Each sample includes the final revised essay along with the prompts used and the text generated by the AI. Each essay was graded and evaluated for robot-ness (i.e., did the prose seem artificial?) by three experienced first-year composition instructors, and each was also scored with the Turnitin plagiarism detection software. Details on the results and links to each of the essays are provided below.
| Essay | Essay Words | Generated Words | Turnitin Score | Avg. Grade | Avg. Robotness* |
|---|---|---|---|---|---|
| Essay 1 | 1,377 | 10,532 | 2% | 80.0% | 3.83 |
| Essay 2 | 2,068 | 24,685 | 6% | 88.3% | 2.00 |
| Essay 3 | 1,542 | 4,730 | 1% | 77.7% | 4.17 |
| Essay 4 | 1,303 | 6,198 | 3% | 80.0% | 4.00 |
| Essay 5 | 1,510 | 11,359 | 3% | 88.3% | 3.00 |
\* 5 = very roboty; 1 = not roboty at all. (All essays posted with permission.)
Although this is a small public demonstration and not any kind of experimental study, the results are interesting in a few ways:
- Writers were able to use AI to co-write reasonable submissions to the assignment.
- Writers reported being frustrated by how much more time co-writing took and felt that their co-written essays were not as good as what they would have produced writing alone.
- The essays that scored best on average grade and perceived humanness were those built from the largest amounts of generated text. This suggests that revision is an essential skill for effectively leveraging AI co-writing tools.
- The generated text is unique enough that it was not flagged as potential plagiarism by Turnitin.
AI Co-Writing Activity: This co-writing activity asked students to use AI writing technologies (mostly GPT-J) to produce an essay in response to a standard first-year writing assignment from the University of Texas at Austin. The assignment asks writers to develop a proposal addressing a local issue at the University of Texas or in the City of Austin. The co-writing activity parameters allowed students to (see the generation sketch after this list):
- Craft multiple unique prompts and generate multiple AI outputs
- Combine and revise generated outputs
- Make edits for readability and to reduce redundancy
- Make edits to correct AI hallucinations (where AI invents fake information)
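The sketch below illustrates the kind of prompt-and-generate loop the activity describes, using the public GPT-J checkpoint through the Hugging Face `transformers` pipeline. The model name, sample prompt, and generation settings here are illustrative assumptions, not the exact interface or parameters the students used.

```python
# Minimal sketch of the prompt-and-generate step, assuming the public
# EleutherAI GPT-J checkpoint served through Hugging Face transformers.
# (The students' actual tooling and settings may have differed.)
from transformers import pipeline

# Loading GPT-J requires roughly 24 GB of weights; swap in "gpt2"
# for a quick local test of the same workflow.
generator = pipeline("text-generation", model="EleutherAI/gpt-j-6b")

# A hypothetical proposal-style prompt; writers crafted many variants like this.
prompt = (
    "Write a proposal addressing bicycle safety on the University of Texas "
    "at Austin campus. Begin by describing the problem: "
)

# Generate several candidate continuations to combine, revise, and fact-check
# (e.g., to remove hallucinated details) by hand.
drafts = generator(
    prompt,
    max_new_tokens=200,
    num_return_sequences=3,
    do_sample=True,
    temperature=0.9,
)
for i, draft in enumerate(drafts, start=1):
    print(f"--- Draft {i} ---\n{draft['generated_text']}\n")
```

In practice, writers repeated this step with many different prompts, then stitched the strongest passages together and edited them for readability, redundancy, and factual accuracy.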
This is a follow-up exercise that builds on a prior experiment exploring the feasibility of using AI to cheat on student writing assignments. The original activity was designed to assess whether students could efficiently use current AI writing technology to generate essays in response to a well-designed writing prompt. Students overwhelmingly reported that AI co-writing was far more difficult and more time-consuming than traditional writing. The AI co-writing demonstration presented here allowed for additional rounds of edits and recruited additional writers beyond the original class.
AI Essay Grading: Three experienced first-year writing instructors were provided with each of the five sample essays (excluding the generated text). They were told that the batch of five included an unspecified mix of AI-generated and authentically human essays. Each instructor independently graded each essay and assigned a “robotness” score (5 = very roboty; 1 = not roboty at all). Each grader reported feeling that they were being tricked (which they were), but also an obligation to identify the possibly human writers. Determining what constituted “robot-ness” and which markers to look for was an individual process that varied across evaluators: although the final robotness scores showed parallels across evaluations, each evaluator ultimately operated with a slightly different notion of what to look for and noted that this affected how they read and thought about the writing.
Project Contributors: S. Scott Graham, Casey Boyle, Hannah R. Hopkins, Ian Ferris, Tristan Hanson, Maclain Scott, Emma Allen, Lisa Winningham, Walker Kohler