Among the most important roles of instructors in higher education is certifying that each learner in their course has achieved a particular standard in relation to the intended outcomes of the course, and that this achievement is a valid and reliable measurement of the learner's true ability. The importance of this determination reflects how student achievement data are used not only in summative course assessments, but also in predicting future success, awarding scholarships, and determining acceptance into competitive programs (Baird, Andrich, Hopfenbeck, & Stobart, 2017; Guskey & Link, 2019). For this accounting of learning to accurately reflect the goals of a given course, the assessment strategy must be aligned with the course learning outcomes (Biggs & Tang, 2011). However, Broadfoot (2016) and Pellegrino and Quellmalz (2010) argue that the goals and intended outcomes of higher education have changed as society has become more saturated with digital technologies, and that these changes have precipitated parallel shifts in both the cognitive and affective competencies required of digitally literate citizens. Predictably, this has widened the gap between traditional assessment structures, which prioritize validity, reliability, fairness, and objectivity through psychometric analyses, and the modern goals of higher education, which prioritize more affective constructs such as cooperation, empathy, creativity, and inquiry.
Complicating this problem is the trend, accelerated by the COVID-19 pandemic, towards the use of digital technologies to create, administer, and score assessments. Typically, digital technologies are used to increase the efficiency and objectivity of test administration through automated scoring of selected-response tests, reinforcing traditional assessment structures (Benjamin, 2019). However, as Shute et al. (2016) argue, digital technologies could instead be used to drive innovation in assessment practice while balancing the need for both quantitative and qualitative approaches to assessment.
This gap between outcomes and assessments can also be considered in light of the model of assessment described by the National Research Council in the United States (2001). The authors describe the assessment triangle with three interdependent components:

- cognition: a model of how learners represent knowledge and develop competence in the domain;
- observation: tasks or situations designed to elicit evidence of learner competence; and
- interpretation: a method for drawing inferences from the evidence observed.
If any one of these components is lacking, the quality of the assessment process will be compromised. Accordingly, both the observation and the inference must align with the cognition component of the model. If, according to Timmis et al. (2016), the cognition component has changed with the increase in the use of digital technologies in society and in higher education, then it follows that the ways in which we observe learner competence and infer a summative grade from the observation must also change.
Assessment practices in higher education tend to focus on summative, selected-response instruments designed to maximize efficiency in administration and objectivity in scoring (Lipnevich, Guskey, Murano, & Smith, 2020), both of which are beneficial to instructors and learners. While selected-response tests have notable benefits for learners, instructors, and institutions, including presumed greater objectivity in scoring, the potential for robust psychometric qualities, and more defensible grading practices, there are also downsides, including learner anxiety, the tendency of learners to cram for exams, misalignment between learning outcomes and assessments, and 'teaching to the test' (Broadfoot, 2016; Gerritsen-van Leeuwenkamp, Joosten-ten Brinke, & Kester, 2017). While these classroom assessments are designed to mimic the psychometric rigour of large-scale assessments, such as the Scholastic Aptitude Test (SAT), they may lack evidence of validity and reliability (Broadfoot, 2016; Lipnevich et al., 2020). Researchers have called for instructors to reconsider their assessment practices (Timmis et al., 2016), recognizing that the aims of higher education in the 21st century have shifted from a predominantly top-down transmission model requiring graduates to demonstrate knowledge mastery in the cognitive domain to a model demanding that graduates demonstrate skills and attitudes in non-cognitive domains such as cooperation, problem-solving, creativity, and empathy, for the assessment of which selected-response instruments are ill-suited (Broadfoot, 2016). Encouraging a broader range of assessment structures will require paradigm shifts in pedagogy, assessment, and the use of technology that centre a relational, human-centred approach to transformative learning experiences for learners (Black & Wiliam, 1998).
Understanding how instructors think about and implement assessment structures and also how those assessment structures impact learners can help stakeholders plan for assessment in the 21st century.
Psychometricians consider the validity and reliability of measurement instruments to be critical to their usefulness in supporting inferences about what examinees know or can do in relation to particular constructs (Finch & French, 2019). Large-scale assessments (LSAs) such as the SAT, the Graduate Record Examination (GRE), and other tests are used to make admissions decisions in colleges and universities (The College Board, 2015), while licensing examinations such as the National Council Licensure Examination (NCLEX) for nurses in the United States and Canada are used to ensure that graduates of nursing programs have the requisite knowledge and skills to be licensed as Registered Nurses (Wendt & Alexander, 2007). These LSAs undergo thorough vetting and rigorous statistical analysis to ensure that they are valid and reliable predictors of success in college or in the nursing profession. LSAs are necessarily context independent, meaning that they are intended to provide useful information about an examinee regardless of who they are or where and when they complete the assessment (Baird et al., 2017). If these examinations were shown to be invalid or unreliable, the consequences would be severe, undermining both admissions decisions for entry into higher education and licensure decisions in the nursing profession, where lives are genuinely at stake. These are only two of many LSAs used as gatekeepers into programs or professions.
Black and Wiliam (1998) and Guskey and Link (2019) report that classroom teachers, when given the opportunity, attempt to emulate these summative LSAs by building their own or using publisher-created assessment items and instruments. Unfortunately, these instructor- or publisher-created instruments have not been vetted through the same degree of psychometric analyses or refinement as typical LSAs, meaning that much of the assessment practice in higher education may be based on untested assumptions of validity and reliability. Further compounding this problem is that classroom-based summative assessments are used in much the same way as LSAs in that they serve as gatekeepers for learners hoping to progress through a degree program in a timely manner.
The use of technology in higher education has been particularly noticeable in assessment practices, with instructors relying on the ease of administration and efficiency of scoring of selected-response tests to determine learners' grades (Broadfoot, 2016), but technology has been slower to drive significant innovation in novel assessment structures (Pellegrino & Quellmalz, 2010). This does not mean, however, that technology lacks affordances that may empower new ways of thinking about how grading decisions are made. For example, the use of learner blogs may create more opportunities for metacognitive reflection or for formative self- and peer assessment. Researchers caution, however, that the increased use of technology in assessment will require careful thought about the ethical use of data, especially as surveillance tools have begun to proliferate in the field (Oldfield, Broadfoot, Sutherland, & Timmis, n.d.).
Based on the purposes and questions noted above, a mixed-methods approach is anticipated: surveying instructors and learners in higher education to learn about their views on assessment, with opportunities for open-ended responses to gather further context. Following analysis of the quantitative data, follow-up interviews with selected instructors and learners may be conducted to explore beliefs and experiences more deeply.
The pivot to remote teaching in the spring of 2020 due to the COVID-19 pandemic created conditions that led many instructors to reconsider the design and structure of their courses, including how they assess learners. Their sudden reliance on technology to administer exams revealed significant gaps in what had become traditional modes of assessment. While there will almost certainly be some post-pandemic reversion to the norm, this presents an opportunity to explore the topic of assessment with both instructors and learners. Understanding both how instructors think about assessment and how learners are impacted by assessment decisions will be critical to informing assessment practices and policies as higher education emerges from the pandemic and moves forward into the 21st century.
I am in my third year (part-time) of my Ph.D. program in the Educational Technology area in the Department of Curriculum and Instruction at the University of Victoria and a graduate research affiliate of the Technology Integration and Evaluation (TIE) Research Lab. My coursework has included Advanced Research Methods, Education Action Research (UBC), and Test Theory (UAlberta). My program supervisor is Dr. Valerie Irvine, Co-director of the Technology Integration and Evaluation Lab at the University of Victoria. In addition, I am Manager of Online Learning and Instructional Technology at a different Western Canadian university, where I support faculty in designing and deploying transformative online learning experiences that focus on rich Communities of Inquiry. I am also a member of the board of the Open/Technology in Education, Society, and Scholarship Association (OTESSA), which is a member association of the Federation for the Humanities and Social Sciences and a participating member of the annual Congress of the Humanities and Social Sciences.
Baird, J.-A., Andrich, D., Hopfenbeck, T. N., & Stobart, G. (2017). Assessment and learning: Fields apart? Assessment in Education: Principles, Policy & Practice, 24(3), 317–350. https://doi.org/10/gf3brt
Benjamin, R. (2019). Race after technology: Abolitionist tools for the new Jim Code. Medford, MA: Polity.
Biggs, J., & Tang, C. (2011). Teaching for quality learning at university: What the student does (4th ed.). New York: Society for Research into Higher Education & Open University Press.
Black, P., & Wiliam, D. (1998). Assessment and Classroom Learning. Assessment in Education: Principles, Policy & Practice, 5(1), 7–74. https://doi.org/10/fpnss4
Broadfoot, P. (2016). Assessment for Twenty-First-Century Learning: The Challenges Ahead. In M. J. Spector, B. B. Lockee, & M. D. Childress (Eds.), Learning, Design, and Technology (pp. 1–23). https://doi.org/10.1007/978-3-319-17727-4_64-1
National Research Council. (2001). Knowing what students know: The science and design of educational assessment. Washington, DC: National Academies Press. https://doi.org/10.17226/10019
Finch, W. H., & French, B. F. (2019). Educational and psychological measurement. New York, NY: Routledge.
Gerritsen-van Leeuwenkamp, K. J., Joosten-ten Brinke, D., & Kester, L. (2017). Assessment quality in tertiary education: An integrative literature review. Studies in Educational Evaluation, 55, 94–116. https://doi.org/10/ghjbhx
Guskey, T. R., & Link, L. J. (2019). Exploring the factors teachers consider in determining students' grades. Assessment in Education: Principles, Policy & Practice, 26(3), 303–320. https://doi.org/10/ghg8j7
Lipnevich, A. A., Guskey, T. R., Murano, D. M., & Smith, J. K. (2020). What do grades mean? Variation in grading criteria in American college and university courses. Assessment in Education: Principles, Policy & Practice, 27(5), 480–500. https://doi.org/10/ghjw3k
Oldfield, A., Broadfoot, P., Sutherland, R., & Timmis, S. (n.d.). Assessment in a Digital Age: A research review. Retrieved from Graduate School of Education, University of Bristol website: https://www.bristol.ac.uk/media-library/sites/education/documents/researchreview.pdf
Pellegrino, J. W., & Quellmalz, E. S. (2010). Perspectives on the Integration of Technology and Assessment. Journal of Research on Technology in Education, 43(2), 119–134. https://doi.org/10/ggfh8z
Shute, V. J., Leighton, J. P., Jang, E. E., & Chu, M.-W. (2016). Advances in the Science of Assessment. Educational Assessment, 21(1), 34–59. https://doi.org/10/gfgtrs
The College Board. (2015). Test Specifications for the Redesigned SAT (p. 210).
Timmis, S., Broadfoot, P., Sutherland, R., & Oldfield, A. (2016). Rethinking assessment in a digital age: Opportunities, challenges and risks. British Educational Research Journal, 42(3), 454–476. https://doi.org/10/gftz95
Wendt, A., & Alexander, M. (2007). Toward a standardized and evidence-based continued competence assessment for registered nurses. JONA's Healthcare Law, Ethics, and Regulation, 9(3), 74–86. https://doi.org/10/dhcs35