How do we find the right level for the assessments we set?

One of the things that I am currently exploring is ‘how can we ensure that assessment occurs at the correct level?’. This is a topic that has cropped up a number of times recently in my work with both internal and external colleagues. Although I am interested in the national and institutional issues relating to standards setting, here I am only considering the micro-level, i.e. that which is within the control of a module/course tutor (or equivalent).

It has been clear for some time that more focus on research into the educator experience of assessment is needed (see for example: Bearman et al., 2016; Norton, Floyd, & Norton, 2019), so it is perhaps not surprising that there is very little written on how academics can calibrate levels. Sally Brown’s paper has been my most useful find. In her exploration of postgraduate expectations, Brown (2014) reported that interviewees distinguished this level of study from others by such things as: the demand of the learning outcomes; the depth of skill required; the level of application evident; the scale and scope of the coverage; and the level of autonomy required. Other participants in the study perceived little difference in level between postgraduate and undergraduate study. Ultimately this paper showed the ‘fuzziness’ around defining level.

It seems the laboured accuracy of the many national and even institutional standards documents does not always cascade into practice. I would speculate that this may be because of the great number of factors that must be considered in designing assessment, leading to a displacement of an educator’s focus on level (a case of too many things to consider, perhaps). It may be because of the academic language used to define standards (do we have a common understanding of ‘critical thinking’?). It may be due to a lack of training and support in relation to standard setting. It could be that levelness, in practice, is defined by the situated academic community with only loose links to standards frameworks, which get a little lost in translation. There may of course be other things going on. Research would help move this speculation into something clearer.

All of this really matters from a quality and standards perspective, but it may also be important from a staff well-being perspective. Myyry et al. (2019) highlight the role of emotion in the assessment process; they demonstrate the presence of anxiety around making judgments in the Finnish context. Citing the work of Nevgi and Lofstrom, they state that “one of the most stressful of academics’ roles is to be a gatekeeper of academic standards” (Myyry et al., 2019, p.3). Presumably, if there is confusion around those standards, then the sense of anxiety would be compounded.

Learning outcomes are almost ubiquitous in HE. Often underpinned by Bloom’s taxonomy and other knowledge hierarchies, they can of course give a strong indication of what assessment needs to achieve; they can indicate levelness. However, so often I hear ‘is it appropriate to describe at level seven?’, ‘how can a postgrad course have outcomes which start with “identify”?’, or ‘can level four students be expected to create or evaluate?’. When using learning outcomes to help set levels we need to look at the bigger programme picture and not be slaves to the active verbs. We need to ask broadly, what do we want students to be able to do? At level seven this may involve some description of a new topic, ahead of some deeper analysis or more specialised research. At level four we may want to introduce some evaluation, but with a different expectation of the depth of performance compared to a final year evaluation task. Learning outcomes can be enormously helpful in setting levels, but alone they will not suffice.

For now, I am collating practical advice on how level can be established and understood with greater confidence during the process of assessment development. Here’s my list of ten ideas, but I would very much welcome more suggestions …

  1. Cross-team review events (calibration between disciplines). By working with others from different teams to explore the features of proposed assessment designs during the development phase (e.g. Engineering colleagues with Business colleagues), there is an opportunity to see past the content and to step back and look at the level of demand. Conversations where assessment is discussed with colleagues less familiar with the topic material can require a focus on first principles regarding what the assessment is actually seeking to test. This is a process of making the familiar strange.
  2. Teams for assessment (don’t work alone). Wherever possible, work in a team to develop assessments, allowing mutual challenge and the opportunity to refine designs through discussion. It is especially important that each member of the team is open to constructive dialogue, and that all voices are heard. The avoidance of lone working around assessment may also alleviate some of the anxiety previously mentioned (Myyry et al., 2019).
  3. Use original source frameworks. To gain confidence in working with different levels of assessment, it is helpful, if not essential, to be familiar with the standards that are used to measure level, whether this is the FHEQ or another national standard. It is also constructive to maintain familiarity with institutional level descriptors, graduate outcomes or other equivalent documentation. Relying on colleagues alone for information on levelness can lead to inflation or depression of standards. If these documents are not used often they can be easily forgotten; a quick re-familiarisation when designing or redesigning an assessment can help.
  4. Get involved in external examining. Working cross-institutionally wherever possible can expose academics to the interpretation of different levels in other institutions. This can be affirming, or in some cases it may result in changes being made back at the home institution. As an extension of this, any opportunity for external networking or professional development can also assist in calibrating with the rest of the sector.
  5. Engage new staff. New staff can bring fresh perspectives to any teaching and assessment challenge. They bring a fresh lens, with an outlook which has not been acculturated into the institution’s existing ways of working. Where newcomers are recent graduates themselves, they may be able to calibrate assessment designs against their own educational experience. In itself this is not enough of a basis for standard setting, but it can surface issues where there is not a sense of similarity between what is being asked and what the new staff member has previously encountered. Second, new staff can be very keen to learn how to mark and to pitch assessment at the right level. They are, in my experience, often willing to use original source documents in an attentive manner to check that they have it right. They do not take for granted the tacit knowledge that an experienced colleague may have, and so can be very thought-provoking in a discussion about assessment, by asking clarifying questions.
  6. Review how an assessment went. Each time an assessment is used within the academic cycle it is important to get feedback from students about how they found it. Feedback from students about which assessments they find easier and which more difficult, with an exploration of why this is the case, can shine a light on issues of levelness and provide a trigger to review specific aspects. While comments from external examiners can be incredibly helpful, students themselves are likely to offer helpful comment too.
  7. Analysis of questions (particularly for exams). For some assessments, such as multiple choice exams, it may be possible and appropriate to systematically analyse the performance of students on different questions to test the level of discrimination, as a proxy for demand. When doing this, care is needed to ensure that the proxy does not instead reflect other factors, such as a student’s prior study experiences, social advantage or cultural background. Through item analysis tutors can identify which questions are most discriminating and which are least challenging. Future exams may be changed in light of this information, and tutors can develop a sense of question difficulty. Ultimately item analysis is not an objective test against the external standards, so care needs to be taken that any actions arising from looking at the performance of a cohort are also informed by level descriptors.
  8. Explore the demands of surrounding levels. It can be helpful for both staff and students to be able to clearly define what ‘extra’ is needed as students progress through the levels. So, for example, what do students moving out of first year need to do to maintain or improve their grades at the next level of study? Internal communications within course teams can help define this in ways that would be relevant to a specific course or discipline, but for anyone teaching at level four it may also be important to ask those questions of level three courses. It’s easy to lose touch with standards in schools and colleges and rely on hearsay, family, friends and neighbours, but developing an appreciation of what is really happening before students enrol can again be a form of calibration.
  9. Review standards across time. In one particular module I have close sight of, but do not teach, I have noticed a significant increase in the quality of work and the performance of students. This is mainly due to the use of exemplars and iteratively developed assessment support. As a slight outsider, I was able to see the change in the module standards over time. A new colleague coming in to the associated teaching team may see this very high standard as the norm, rather than as something which exceeds the level requirements. There could be a risk of level inflation. Taking more of a longitudinal view of the level of assessment on a programme can help ensure that there is not standards creep in staff expectations (that is, creep either up or down). Looking only at marks will not tell the whole story; putting past student submissions next to more recent ones will facilitate a productive discussion about level.
  10. Consider a marking rubric. Marking rubrics have gained traction in recent years as a helpful tool for educators and students alike. Well-developed assessment criteria in the form of a rubric can guide student and staff level expectations (for a fuller exploration of different types of rubrics see Dawson (2017)).

      From a tutor perspective, the act of explicitly articulating detailed requirements for student performance at a specific level of study can help with the articulation of standards. It can turn broad, fuzzy standards into something more tangible in context. It is essential that the rubric is formed in conjunction with standards documents (or a good knowledge of these). A rubric formed on intuition alone will do nothing to bring about level confidence; indeed it could continue to reinforce level misalignment.

      It is important to flag that there are many questions around the practicalities of rubrics, such as: what makes a good rubric? And how can staff and students be supported to make best use of rubrics? (see for example Chan & Ho, 2019; Kennard & Arnold, 2016). There are also some bigger critical questions about their place in higher education, for example regarding their potential to work against an open-ended learning process (see Carson, 2019). I have added rubrics here as a pragmatic tool, but with a strong encouragement for colleagues to explore the literature and to engage in thinking about their limitations as well as their potential to help.

 

Bearman, M., Dawson, P., Bennett, S., Hall, M., & Molloy, E. (2016). Support for assessment practice: developing the Assessment Design Decisions Framework. Teaching in Higher Education, 21(5), 545–556. https://doi.org/10.1080/13562517.2016.1160217

Brown, S. (2014). What are the perceived differences between assessing at Masters level and undergraduate level assessment? Some findings from an NTFS-funded project. Innovations in Education and Teaching International, 51(3), 265–276. https://doi.org/10.1080/14703297.2013.796713

Carson, J. T. (2019). Blueprints of distress?: Why quality assurance frameworks and disciplinary education cannot sustain a 21st-century education. Teaching in Higher Education, 1–10. https://doi.org/10.1080/13562517.2019.1602762

Chan, Z., & Ho, S. (2019). Good and bad practices in rubrics: the perspectives of students and educators. Assessment & Evaluation in Higher Education, 44(4), 533–545. https://doi.org/10.1080/02602938.2018.1522528

Dawson, P. (2017). Assessment rubrics: towards clearer and more replicable design, research and practice. Assessment and Evaluation in Higher Education, 42(3), 347–360. https://doi.org/10.1080/02602938.2015.1111294

Kennard, C., & Arnold, L. (2016). Staff and student experiences of electronic marking and feedback. International Journal of Assessment and Evaluation, 23(3).

Myyry, L., Karaharju-Suvanto, T., Vesalainen, M., Virtala, A.-M., Raekallio, M., Salminen, O., … Nevgi, A. (2019). Experienced academics’ emotions related to assessment. Assessment & Evaluation in Higher Education, 1–13. https://doi.org/10.1080/02602938.2019.1601158

Norton, L., Floyd, S., & Norton, B. (2019). Lecturers’ views of assessment design, marking and feedback in higher education: a case for professionalisation? Assessment & Evaluation in Higher Education, 1–13. https://doi.org/10.1080/02602938.2019.1592110


2 comments

  1. Another thoughtful article Lydia. I know I’m only responding to one aspect of your post – writing level appropriate outcomes – but I am reminded of a super Jisc project we were involved in some years ago. The project was led by the University of Gloucestershire and we did the techie stuff. It was only ever meant to be an exploratory project but I think that what we created was really useful – and may still be useful (though maybe a little out-of-date).

    Essentially the tool we built provides a framework of commonly used verbs and validated examples of them in use across a range of academic levels (3-8) from FHEQ, SEEC, EQF etc.

    The idea was that someone could look for an example of e.g. Analyse at level 7, and be presented with a number of pre-validated outcomes which could be used to help someone write a new outcome or check the levelness of an existing outcome. We still maintain the tool for people to play with. It’s here https://cogent.pebblepad.co.uk/

    The best person to chat with about its inception is Phil Gravestock, now at University of Wolverhampton.
