Automated essay evaluation represents a practical solution to the time-consuming activity of manually grading students’ essays. Over the last 50 years, many challenges have arisen in the field, including how to evaluate semantic content, provide automated feedback, and determine the validity and reliability of grades. In this paper we provide a comparison of 21 state-of-the-art approaches to automated essay evaluation and highlight their weaknesses and the open challenges in the field. We conclude with the finding that the field has developed to the point where the systems provide meaningful feedback on students’ writing and represent a useful complement (not a replacement) to human scoring.
Povzetek: Automated essay evaluation is a practical solution to the time-consuming manual grading of students’ essays. Over the last fifty years, many challenges have arisen in the field of automated essay evaluation, among them evaluating the semantics of the text, providing automated feedback, and determining the validity and reliability of grades. In this article we compare 21 current systems for automated essay evaluation and point out their weaknesses and the open problems in the field. We conclude that the field has developed to the point where the systems offer meaningful feedback and represent a useful complement (not a replacement) to human scoring.
Keywords: automated essay evaluation, automated scoring, natural language processing
An essay is a short literary composition on a particular theme or subject, usually in prose and generally analytic, speculative, or interpretative. Researchers consider essays the most useful tool for assessing learning outcomes. Essays give students an opportunity to demonstrate their range of skills and knowledge, including higher-order thinking skills such as synthesis and analysis. However, grading students’ essays is a time-consuming, labor-intensive, and expensive activity for educational institutions. Because teachers are burdened with hours of grading written assignments, they assign fewer essays, thereby limiting the practice students need to reach their writing goals. This contradicts the aim of making students better writers, for which they need to rehearse their skill by writing as much as possible.
A practical solution to many of the problems associated with manual grading is an automated system for essay evaluation. Shermis and Burstein define the automated essay evaluation (AEE) task as the process of evaluating and scoring written prose via computer programs. AEE is a multi-disciplinary field that incorporates research from computer science, cognitive psychology, educational measurement, linguistics, and writing research. Researchers from all these fields contribute to its development: computer scientists develop attributes and implement AEE systems, writing researchers and teachers provide constructive criticism during development, and cognitive psychologists’ expert opinion is considered when modeling the attributes. Psychometric evaluations provide crucial information about the reliability and validity of the systems as well.
In Figure 1 we illustrate the procedure of automated essay evaluation. As shown in the figure, most existing systems use a substantially large set of prompt-specific essays (i.e., a set of essays on the same topic). Expert human graders score these essays, e.g. on a scale from 1 to 6, to construct the learning set. This set is used to develop and tune the scoring model of the AEE system. Using this scoring model (shown as the black box in Figure 1), the AEE system assigns scores to new, ungraded essays. The performance of the scoring model is typically validated by calculating how well it “replicates” the scores assigned by the expert human graders.
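The train/score/validate cycle above can be sketched in a few lines of code. The following is a deliberately minimal, hypothetical illustration: a single toy attribute (essay length) and ordinary least-squares regression stand in for the rich linguistic attributes and more sophisticated models that real AEE systems use inside the “black box”, and exact agreement with human scores stands in for the fuller psychometric validation the literature describes.

```python
# Hypothetical sketch of the AEE pipeline: learn a scoring model from a
# human-graded learning set, score essays on the 1-6 scale, and validate
# by agreement with the human graders. Not any real system's model.

def feature(essay):
    return len(essay.split())            # toy attribute: word count

def train(essays, human_scores):
    """Fit score ~ a * feature(essay) + b by ordinary least squares."""
    xs = [feature(e) for e in essays]
    n = len(xs)
    mx, my = sum(xs) / n, sum(human_scores) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, human_scores))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx                # slope, intercept

def score(model, essay):
    a, b = model
    return min(6, max(1, round(a * feature(essay) + b)))  # clamp to 1..6

def exact_agreement(model, essays, human_scores):
    """Share of essays where the system replicates the human score."""
    hits = sum(score(model, e) == h for e, h in zip(essays, human_scores))
    return hits / len(essays)

# Tiny synthetic "prompt-specific" learning set: longer essays received
# higher human scores, so the toy feature is perfectly predictive here.
learning_set = [("word " * (50 * s), s) for s in range(1, 7)]
essays, human = zip(*learning_set)
model = train(list(essays), list(human))
print(exact_agreement(model, list(essays), list(human)))  # 1.0 on this toy data
```

In practice, validation on the *training* essays (as done here for brevity) overstates performance; AEE evaluations hold out a separate set of human-scored essays and often report agreement statistics such as quadratic weighted kappa rather than exact agreement alone.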
Automated essay evaluation has been a real and viable alternative, as well as a complement, to human scoring for the last 50 years. The widespread adoption of the Internet, word-processing software, and natural language processing (NLP) stimulated the later development of AEE systems. Research in automated evaluation was first motivated by time and cost savings, but in recent years the focus has moved to developing attributes that address the writing construct (i.e., the various aspects of writing describing “what” and “how” students are writing). Researchers are also focusing on providing comprehensive feedback to students, evaluating semantic content, developing AEE systems for languages other than English, and increasing the validity and reliability of AEE systems.