The California-based Smarter Balanced Assessment Consortium is a member-led public organization that provides assessment systems to educators working in K-12 and higher education. The organization, which was founded in 2010, partners with state education agencies to develop innovative, standards-aligned assessment systems. Smarter Balanced supports educators with tools, lessons and resources, including formative, interim and summative assessments, that help educators identify learning opportunities and strengthen student learning.

Smarter Balanced is committed to evolution and innovation in an ever-changing educational landscape. Through a collaboration with IBM Consulting®, it aims to explore a principled approach for the use of artificial intelligence (AI) in educational assessments. The collaboration was announced in early 2024 and is ongoing.

Defining the challenge

Traditional skills assessments for K-12 students, including standardized tests and structured quizzes, have been criticized on several equity grounds. If implemented responsibly, AI has the transformative potential to offer personalized learning and evaluation experiences that make assessments fairer across student populations, including marginalized groups. The central challenge, then, is to define what responsible implementation and governance of AI look like in a school setting.

As a first step, Smarter Balanced and IBM Consulting created a multidisciplinary advisory panel that brings together experts in educational measurement, artificial intelligence, and AI ethics and policy, along with educators. The panel’s goal is to develop guiding principles for embedding accuracy and fairness into the use of AI for educational measurement and learning resources. Some of the advisory panel’s considerations are outlined below.

Leading with human-centered design

Using design thinking frameworks helps organizations craft a human-centric approach to technology implementation. Three human-centered principles guide design thinking: a focus on user outcomes, restless reinvention and empowerment of diverse teams. This framework helps ensure that stakeholders are strategically aligned and responsive to functional and non-functional organizational governance requirements. Design thinking enables developers and stakeholders to deeply understand user needs, ideate innovative solutions and prototype iteratively.

This methodology is invaluable for identifying and assessing risks early in the development process and for facilitating the creation of AI models that are trustworthy and effective. By continuously engaging with diverse communities of domain experts and other stakeholders and incorporating their feedback, design thinking helps build AI solutions that are technologically sound, socially responsible and human-centered.

Incorporating diversity

For the Smarter Balanced project, the combined teams established a think tank made up of a diverse set of subject-matter experts and thought leaders. The group included experts in educational assessment and law, as well as neurodivergent people, students, people with accessibility challenges and others.

“The Smarter Balanced AI think tank is about ensuring that AI is trustworthy and responsible and that our AI enhances learning experiences for students,” said think tank member Charlotte Dungan, Program Architect of AI Bootcamps for the Mark Cuban Foundation.

The goal of the think tank is not to fold its members’ expertise, viewpoints and lived experiences into the governance framework once, in a “one-and-done” way, but to incorporate them iteratively. The approach mirrors a key principle of AI ethics at IBM: the purpose of AI is to augment human intelligence, not replace it. Systems that incorporate ongoing input, evaluation and review by diverse stakeholders can better foster trust and promote equitable outcomes, ultimately creating a more inclusive and effective educational environment.

Such systems are crucial for creating fair and effective educational assessments in grade school settings. Diverse teams bring a wide array of perspectives, experiences and cultural insights that are essential to developing AI models representative of all students. This inclusivity helps minimize bias and build AI systems that do not inadvertently perpetuate inequalities or overlook the unique needs of different demographic groups. It also reflects another key principle of AI ethics at IBM: the importance of diversity in AI isn’t opinion, it’s math.

Exploring student-centered values

One of the first efforts that Smarter Balanced and IBM Consulting undertook as a group was to identify the human values we want to see reflected in AI models. Because this is not a new ethical question, we landed on a set of values and definitions that map to IBM’s AI pillars, the fundamental properties of trustworthy AI:

  • Explainability: Functions and outcomes that can be explained in non-technical terms
  • Fairness: Treating people equitably
  • Robustness: Security, reliability and resistance to adversarial attacks
  • Transparency: Disclosure of AI usage, functionality and data use
  • Data privacy: Disclosure and safeguarding of users’ privacy and data rights

Operationalizing these values in any organization is a challenge. In an organization that assesses students’ skill sets, the bar is even higher. But the potential benefits of AI make this work worthwhile: “With generative AI, we have an opportunity to engage students better, assess them accurately with timely and actionable feedback, and build in 21st-century skills that are actively enhanced with AI tools, including creativity, critical thinking, communication strategies, social-emotional learning and growth mindset,” said Dungan. The next step, now underway, is to explore and define the values that will guide the use of AI in assessing children and young learners.

Questions the teams are grappling with include:

  • What values-driven guardrails are necessary to foster these skills responsibly?
  • How will they be operationalized and governed, and who should be responsible?
  • What instructions do we give to practitioners building these models?
  • What functional and non-functional requirements are necessary, and at what level of strength?

Exploring layers of effect and disparate impact

For this exercise, we used a design thinking framework called Layers of Effect, one of several frameworks that IBM® Design for AI has donated to the open source community Design Ethically. The Layers of Effect framework asks stakeholders to consider the primary, secondary and tertiary effects of their products or experiences.

  • Primary effects describe the intended, known effects of the product, in this case an AI model. For example, a social media platform’s primary effect might be to connect users around similar interests.
  • Secondary effects are less intentional but can quickly become relevant to stakeholders. Sticking with the social media example, a secondary effect might be the platform’s value to advertisers.
  • Tertiary effects are unintended or unforeseen effects that become apparent over time, such as a social media platform’s tendency to reward enraging posts or falsehoods with higher views.

For this use case, the primary (desired) effect of the AI-enhanced assessment system is a more equitable, representative and effective tool that improves learning outcomes across the educational system.

Secondary effects might include greater efficiency and the collection of relevant data that helps allocate resources where they are most needed.

Tertiary effects are unintended, though some may be foreseeable. This is where stakeholders must explore what potential unintended harm might look like.

The teams identified five categories of potential high-level harm:

  • Harmful bias that fails to account for or support students from vulnerable populations who may need extra resources and perspectives to meet their diverse needs.
  • Cybersecurity and personally identifiable information (PII) risks in school systems that lack adequate procedures for their devices and networks.
  • Lack of governance and guardrails to ensure that AI models continue to behave as intended.
  • Lack of appropriate communication to parents, students, teachers and administrative staff about the intended use of AI systems in schools. These communications should describe protections against inappropriate use and explain people’s agency, such as how to opt out.
  • Limited off-campus connectivity that might reduce access to technology and the subsequent use of AI, particularly in rural areas.

Initially applied in legal cases, disparate impact assessments help organizations identify potential biases. These assessments explore how seemingly neutral policies and practices can disproportionately affect members of protected classes, that is, people susceptible to discrimination based on race, religion, gender and other characteristics. Such assessments have proven effective in developing policies related to hiring, lending and healthcare. In our education use case, we sought to consider cohorts of students who might experience inequitable outcomes from assessments because of their circumstances.
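A common way to quantify a disparate impact comparison is to divide each cohort's rate of favorable outcomes by the rate of a reference group. The Python sketch below is illustrative only and is not the method used by Smarter Balanced or IBM. It assumes aggregate counts per cohort, treats a "favorable outcome" as something like a proficient score, and borrows the 0.8 cutoff of the "four-fifths rule" from US employment-selection guidance as a default threshold. The class, function and cohort names, and the counts, are hypothetical.

```python
from collections.abc import Iterable
from dataclasses import dataclass


@dataclass(frozen=True)
class CohortOutcome:
    """Aggregate assessment outcomes for one student cohort (hypothetical schema)."""
    name: str
    favorable: int  # students with the favorable outcome, e.g. scored proficient
    total: int      # students assessed in this cohort

    @property
    def rate(self) -> float:
        """Share of the cohort that received the favorable outcome."""
        if self.total <= 0:
            raise ValueError(f"cohort {self.name!r} has no assessed students")
        return self.favorable / self.total


def disparate_impact_ratios(
    cohorts: Iterable[CohortOutcome], reference: CohortOutcome
) -> dict[str, float]:
    """Divide each cohort's favorable rate by the reference cohort's rate."""
    ref_rate = reference.rate
    if ref_rate == 0:
        raise ValueError("reference cohort has a zero favorable rate")
    return {c.name: c.rate / ref_rate for c in cohorts}


def flag_adverse_impact(ratios: dict[str, float], threshold: float = 0.8) -> list[str]:
    """Names of cohorts whose ratio falls below the threshold (0.8 = four-fifths rule)."""
    return sorted(name for name, ratio in ratios.items() if ratio < threshold)


if __name__ == "__main__":
    # Hypothetical counts for illustration only; not real assessment data.
    reference = CohortOutcome("reference", favorable=80, total=100)
    cohorts = [
        reference,
        CohortOutcome("cohort_a", favorable=60, total=100),
        CohortOutcome("cohort_b", favorable=70, total=100),
    ]
    ratios = disparate_impact_ratios(cohorts, reference)
    print(flag_adverse_impact(ratios))  # prints ['cohort_a']
```

With these hypothetical counts, cohort_a's ratio is 0.60 / 0.80 = 0.75, so it falls below the threshold, while cohort_b's ratio of about 0.875 does not. A ratio below the threshold is a signal for closer review rather than proof of bias; small cohorts in particular can produce noisy rates.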

The groups identified as most susceptible to potential harm included:

  • Those who struggle with mental health
  • Those from varied socioeconomic backgrounds, including those who are unhoused
  • Those whose dominant language is not English
  • Those with cultural considerations beyond language
  • Those who are neurodivergent or have accessibility issues

As a collective, we will next use additional design thinking frameworks, such as ethical hacking, to explore how to mitigate these harms. We will also detail minimum requirements for organizations seeking to use AI in student assessments.

In conclusion

This is a bigger conversation than just IBM and Smarter Balanced. We are publishing our process because we believe those experimenting with new uses for AI should consider the unintended effects of their models. We want to help ensure that AI models being built for education serve the needs not just of a few, but of society as a whole, with all its diversity.

“We see this as an opportunity to use a principled approach and develop student-centered values that will help the educational measurement community adopt trustworthy AI. By detailing the process that is being used by this initiative, we hope to help organizations that are considering AI-powered educational assessments have better, more granular conversations about the use of responsible AI in educational measurement.” 

— Rochelle Michel, Deputy Executive Program Officer, Smarter Balanced.
