In my last post I lamented my skills as an examiner in the written format, and suggested that I might need to gather some feedback on my exams. So I did! In this post I’m going to outline how, and what the results were. Note to readers: this one gets a bit long and statsy. Go forewarned.
My last post was really the start of the feedback process, and was actually a good start as it allowed me to figure out exactly what I wanted to know. In bullet point form:
- How students felt about their exam score.
- Possible reasons that they received that score.
- Possible effects that their score may have.
- If students felt the exam was a fair test.
- How the exam could be improved.
From these basic ideas I generated a list of questions and had one of our teaching assistants translate them into Korean. I chose to use the L1 to maximize returns, in the belief that it would help students understand and complete the form better. I then created a Google Form (very simple!) using the questions and a Likert-like 1-5 scale from strongly disagree to strongly agree. I also chose numerical responses to minimize the translation needed on my part, and because they would give me some stats to play with. I sent a link to the form to students via Kakao Talk. I got a fairly useful 40 responses out of 57, perhaps helped by the carrot of free drink prizes from the university cafe. This is what the results said.
How did students feel about their exams?
- I was happy with the result of my WRITTEN exam (mean = 3.56, SD=1.16)
- I was happy with the result of my SPEAKING exam (mean = 3.54, SD=1.02)
Student responses indicated that they were similarly happy with their written exam score and their speaking exam score. This was not what I expected, given what I thought was the relative difficulty of the exams and my happiness with student performances. However, I then remembered that I'd adjusted the written scores upward to fit them to a curve, and wondered if this was the reason. I decided to look at mean exam scores for some extra insight. Looking at the numbers, though, I came across a small problem: I'm dealing with three different classes' exams and survey responses, without knowing whether the classes are represented equally within the responses. Given that their exams were of different difficulties, this is a potential source of dodgy calculations.
Nevertheless, there’s no choice but to lump all of these scores together to give us the following:
- Raw mean written score = 35.00 (sd=8.71)
- Adjusted mean written score = 38.35 (sd=8.71)
- Mean speaking score = 42.62 (sd=8.82).
Even after the upward adjustment, speaking scores were still more than 4 points higher, yet students were similarly happy with both. I wonder if the adjustment, plus some fairly generous grading in places (see last post), caused students to receive better written scores than they expected. Sadly the data was collected anonymously, otherwise it would be interesting to see how these responses plotted against exam scores. Maybe students just have lower expectations for the written exam; whatever the explanation, this is a strange enough phenomenon to merit further investigation.
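One small thing the numbers above do confirm: the raw and adjusted written scores have the same SD (8.71), which is exactly what you'd see if the curve adjustment was a flat upward shift. Adding a constant to every score moves the mean but leaves the spread untouched. A minimal sketch, using invented scores (these are not the real exam data):

```python
from statistics import mean, stdev

# Hypothetical raw written scores, for illustration only
raw = [28, 31, 35, 40, 22, 44, 38, 36, 30, 46]

adjustment = 3.35  # flat shift matching the 35.00 -> 38.35 gap above
adjusted = [s + adjustment for s in raw]

# The mean moves by exactly the adjustment; the SD does not change
print(round(mean(adjusted) - mean(raw), 2))   # the shift itself
print(round(stdev(raw), 2), round(stdev(adjusted), 2))  # identical spread
```

If the adjustment had instead stretched or compressed the distribution (say, multiplying by a factor), the SDs would differ, so matching SDs are weak evidence the curve was a simple additive one.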
The second measure in the feeling category was about the effect on student confidence:
- The WRITTEN exam made me feel more confident about my English (mean=3.38, sd=0.97)
- The SPEAKING exam made me feel more confident about my English (mean=3.92, sd=0.89)
These results were much more predictable, though it's perhaps a little odd that so many students felt the written exam improved their confidence when just one of them got an A. Still, the written exam result wasn't entirely positive, and rightly so. One thing that springs to mind is that I didn't actually issue letter grades for this exam. Perhaps I should have, to give students a better idea of what I thought about their performance.
Reasons and effects
The second thing that I wanted to know was why students received their scores and how they might respond to their results. I devised three questions for each exam which tried to get at the amount of preparation students had done for the exam and the amount of effort they expended generally. With the responses included, these were:
- I studied more than one hour for the WRITTEN exam (mean=3.33, sd=1.31)
- I used Anki (or another similar app) often this semester (mean=2.56, sd=1.08)
- I take careful notes of new language from the board (mean 3.69, sd=1.01)
- I practiced more than one hour for the SPEAKING exam (mean=3.33, sd=1.16)
- I have done English Cafe 3 times or more this semester (mean=3.56, sd=1.48)
- I try to practise English speaking outside of class or English Cafe (mean=3.35, sd=1.05)
The first three items generally relate to the written exam and recommended behaviours. The second three relate to the speaking exam and out-of-class practice (English Cafe is the optional conversation slot that students can sign up for with teachers). The first set suggests that students didn't prepare a great deal for their written exam, either in the period immediately before it or during the half-semester using the spaced repetition system that I recommended. The response to the third question rings true to what I see in class, namely that notes are diligently taken, and the scores further my suspicion that these notes are then promptly forgotten as soon as students get out of the door. From these results, I clearly need to think more about how to get students to maintain the language that we encounter in class, but that's for another post.
For the speaking exam, preparation is again reasonably low, though these results suggest that more than half of the students spent over an hour on it, and a surprising number claim that they try to practise English outside of class. I'd be interested to know what form this practice takes.
To survey the effects of the exam, I asked some similar questions:
- Because of my WRITTEN exam score, I will try to use Anki (or another similar app) (mean=3.28, sd=1.04)
- Because of my WRITTEN exam score, I will try to take better notes in class (mean=4.10, sd=0.70)
- Because of my SPEAKING exam score, I will try to practise speaking more outside class (mean=4.05, sd=0.85)
The results for the written exam are rather interesting, in that the behaviour students already rate themselves as doing well (note-taking) is also the one they most say needs improving (though I suppose I can't rule out that it was the students who responded negatively to the previous note-taking question who are responding positively to this one). What I am potentially getting into here is the difficulty of changing ingrained practices: a lack of genuine engagement with language and perhaps an over-reliance on cramming rather than long-term learning. This was what I'd hoped to combat by recommending the app, which would also have allowed me to do a lot more work with lexis. Here, however, the students seem to be choosing the path of least resistance. While there is a chance that the app I recommended simply doesn't fit well with these students, I see this as indicative of an underlying culture of shallow and temporary learning that I would like to do my best to change.
Was it a fair test?
Perhaps the main motivation for writing my last post was the fear that as an examiner I was letting my students down, causing their scores to be lower than they actually deserved. I'd hate for the trust I have built up with these groups to be damaged by a poorly written exam. The following questions were, therefore, an attempt to see how students evaluated the exam and their own performance.
- I thought that the WRITTEN exam was a fair test of class content (mean=4.27, sd=0.86)
- If I had studied harder, I could have got a higher score on the WRITTEN exam. (mean=4.46, sd=0.75)
- I thought that the SPEAKING exam was a fair test of class content (mean=4.42, sd=0.82)
This is a pleasing set of results for my peace of mind. They include some of the highest mean scores in the survey, so at least in the students' minds (much more so than mine) I am a fair and competent examiner. The second question also shows that they tend to attribute their low scores to their own effort rather than to deficiencies in the exam, though this might reflect a less critical view of exams in general. For each exam, only two students disagreed that it was a fair test. Still, the largely positive response suggests that I haven't irreparably damaged my relationship with the group. This in no way excuses me from making improvements, though.
Improvements for future exams
Finally, I wanted to know how students thought that I could improve the exam. I also wanted their view on my idea that the exam could feature slightly extended writing pieces in order to get away from the kind of half-open questions that plagued this exam.
- I would prefer more extended writing/communication in the WRITTEN exam, and fewer vocabulary and grammar questions. (mean=4.26, sd=1.06)
While there’s a bit of variation in answers here, students seem to be more positive than negative about this. I’m undergoing a bit of a shift in thinking about writing at the moment anyway, and trying to include a few more writing assignments in class, so my next exam could/should include a writing section.
Finally I included an open field for students to suggest improvements to the written and spoken exams. Suggestions included less grammar (funny as there really wasn’t much – my students perhaps view grammar differently to how I do), and there were comments that the listening section was too heavily weighted (which I might agree with) and that the questions started very suddenly (an easy fix). One student picked up on the fact that the written questions were too open, and another claimed that he couldn’t see the pictures well.
Speaking-wise there wasn’t much of interest except for a request to see the time, which I will definitely try to organize for the next exam.
Reflection on Reflection
All in all I’m reasonably happy with the way that this went. I learned a lot from it, and I hope it also gave the students a sense of agency in deciding how they are examined. I also hope that doing the survey helped students to reflect on their own behaviour, attribute their successes and failures to the right reasons and hopefully do something differently next time. As for what I might do differently again, the one change that springs to mind is to try to collect feedback with names – it would be very interesting to see how responses correlated with actual exam scores, and also to do this for individual classes rather than all of my students as a group.
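If I do collect named responses next time, the correlation check I have in mind is straightforward: pair each student's 1-5 happiness rating with their exam score and compute a correlation coefficient. A minimal sketch in plain Python, using invented paired data (neither the ratings nor the scores below come from the actual survey):

```python
# Hypothetical paired data: happiness rating (1-5) vs written exam score
ratings = [2, 3, 3, 4, 4, 5, 3, 4, 2, 5]
scores = [25, 30, 33, 38, 40, 47, 31, 42, 28, 45]

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

print(round(pearson(ratings, scores), 2))
```

A strong positive coefficient would suggest happiness simply tracks the score received; a weak one would point toward something more interesting, like differing expectations between students or classes. With named data this could also be run per class rather than across all three lumped together.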
Thanks very much for reading if you got this far. If you’d like to try this yourself, please feel free to use the Google Form linked above for your own investigations, and if there’s anything you’d like to chat about please do leave a comment below. If you do try something like this, I’d be very keen to know how it turned out.