New toolkit aims to help teams create responsible human-AI experiences


Microsoft has released the Human-AI eXperience (HAX) Toolkit, a set of practical tools to help teams strategically plan for and responsibly implement best practices when building artificial intelligence technologies that interact with people.

The toolkit comes as AI-infused products and services, such as virtual assistants, route planners, autocomplete, recommendations and reminders, are becoming increasingly popular and useful for many people. But these applications have the potential to do things that aren’t helpful, like misunderstand a voice command or misinterpret an image. In some cases, AI systems can demonstrate disruptive behaviors or even cause harm.

Such negative outcomes are one reason AI developers have pushed for responsible AI guidance. Efforts to support responsible practices have traditionally focused on improving algorithms and models, but there is a critical need to also make responsible AI resources accessible to the practitioners who design the applications people use. The HAX Toolkit provides practical tools that translate human-AI interaction knowledge into actionable guidance.

“Human-centeredness is really all about ensuring that what we build and how we build it begins and ends with people in mind,” said Saleema Amershi, senior principal research manager at Microsoft Research. “We started the HAX Toolkit to help AI creators take this approach when building AI technologies.”

The toolkit currently consists of four components designed to assist teams throughout the design process, from planning to testing:

  • The Guidelines for Human-AI Interaction provide best practices for how AI applications should interact with people.
  • The HAX Workbook helps teams prioritize guidelines and plan the time and resources needed to address high-priority items.
  • The HAX Design Patterns offer flexible solutions for addressing common problems that come up when designing human-AI systems. The HAX Design Library is a searchable database of the design patterns and implementation examples.
  • The HAX Playbook helps teams identify and plan for otherwise unanticipated errors, such as transcription errors or false positives.

Humans collaborating to build better AI

The idea for the HAX Toolkit evolved from the set of 18 guidelines, which are based on more than 20 years of research and were first published in a 2019 CHI paper. As the team began sharing those guidelines, they learned that additional tools could help multidisciplinary teams plan for and implement AI systems aligned with the principles the guidelines reflect.

Saleema Amershi is a senior principal research manager at Microsoft Research.

The workbook came about because many teams were not exactly sure how to bring the guidelines into their workflow early enough to have a real impact. It is intended to bring clarity when teams “have an idea about a feature or product but have not defined it completely,” said Mihaela Vorvoreanu, director of UX Research and Responsible AI (RAI) Education for Microsoft’s AI Ethics and Effects in Engineering and Research (Aether) Committee, which collaborated with Microsoft Research to create the toolkit.

Most importantly, Vorvoreanu said, the workbook should be used by a multidisciplinary team, including data scientists, engineers, product teams, designers and others who will work on a project.

“You need to be together to have this conversation,” said Vorvoreanu, who along with Amershi leads the HAX Toolkit project. “The workbook gives you a common vocabulary for different disciplines to talk to each other so you’re able to communicate, collaborate and create things together.”

Mihaela Vorvoreanu is the director of UX Research and Responsible AI Education for Aether.

That was certainly true for Priscila Angulo Lopez, a senior data scientist on Microsoft’s Enterprise and Security Data and Intelligence team, who said, “The workbook was the only session where all the disciplines got together to discuss the machine learning model. It gave us a single framework for vocabulary to discuss these problems. It was one of the best uses of our time.”

In that session, the team collectively discovered a blind spot: a solution they had in place would not, in practice, solve the problem it was supposed to solve for the user. Catching it at that stage saved significant time and resources.

Justin Wagle, a principal data science manager for Modern Life Experiences, piloted the workbook for a feature called Flagged Search in the Family Safety app. He said it helped the team think through ethical and sociotechnical impact.

“It helped us all – data science, product and design – collaborate in a way that abstracted us from all the technicality of implementing machine learning,” he said. “We can talk about these very technical things but it comes down to what that means for the user.”

He said the workbook also helped the team articulate to consumers exactly how the system works, as well as discover where the system can go wrong and how to mitigate those failures. It’s now part of the process for every project on his team.

Specific guidance and practical tools help teams now

The HAX team set out to differentiate the toolkit from existing human-AI interaction resources, which lean toward tutorials. Instead, the toolkit gives teams specific guidelines and practical tools they can start using now and throughout the development lifecycle.

For example, the guidelines are divided into four groups, based on when they are most relevant to an interaction with an AI system: initially; during interaction; when the AI system gets something wrong and needs to be redirected; and over time.

The Guidelines for Human-AI Interaction.

In the “Initially” group, there are two guidelines: 1) Make clear what the system can do and 2) Make clear how well the system can do what it can do.

Julie Stanford, a lecturer in the computer science program at Stanford University and a principal at a design firm, used these two guidelines to communicate clearly with a client, based on data her firm had gathered. It turned out that users of the client’s product expected the product to learn from its mistakes, something it was not programmed to do.

In the case of Stanford’s client, an introductory blurb might be one way to help users better understand the product’s capabilities. An introductory blurb is one of several design patterns that can be used to implement Guideline 1. The toolkit has 33 design patterns for eight of the 18 guidelines.
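
To make the introductory blurb pattern concrete, here is a minimal TypeScript sketch. The feature name, capability lists, and wording are hypothetical illustrations, not taken from the toolkit itself.

// A minimal sketch of the introductory blurb pattern for Guideline 1:
// surface what an AI feature can and cannot do before the user relies on it.
// All names and strings here are assumptions, not from the HAX Toolkit.

interface FeatureIntro {
  name: string;
  canDo: string[];     // capabilities, to set accurate expectations up front
  cannotDo: string[];  // known limits, stated just as plainly
}

function renderIntroBlurb(intro: FeatureIntro): string {
  const can = intro.canDo.map((item) => `  • ${item}`).join("\n");
  const cant = intro.cannotDo.map((item) => `  • ${item}`).join("\n");
  return `${intro.name} can:\n${can}\n${intro.name} can't yet:\n${cant}`;
}

// For a product like the one Stanford's client shipped, stating that the
// system does not learn from corrections heads off the mismatched
// expectation her firm observed in its user data.
console.log(renderIntroBlurb({
  name: "Smart Suggestions",
  canDo: ["Suggest replies based on the message you are reading"],
  cannotDo: ["Learn from corrections you make to its suggestions"],
}));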

The design patterns provide proven solutions to specific problems, so that people do not have to “reinvent the wheel and try to create their own processes,” Amershi said.

“This is how we generally work. We take a human-centered approach to the tools we’re creating ourselves. We ask what are people most struggling with? What will unblock people the fastest? What will have the biggest impact?”

The HAX Design Library contains the patterns, as well as specific examples of how those patterns have been implemented by others. It can be filtered by guideline, product category and application type.

“We are asking people to submit examples and patterns,” Vorvoreanu said. “We’re hoping this design library is going to be a community library where people keep contributing and adding examples.”

The final tool in the toolkit, the playbook, helps teams anticipate what might go wrong with an AI system by generating the failure scenarios most common for a given type of design. For example, two of the most common errors for an AI-powered search feature that takes speech as input are transcription errors and failures caused by background noise.

“It can be difficult to know when it can fail until it encounters a failure situation,” Vorvoreanu said. “The playbook helps a team proactively and systematically explore what types of failures might occur.”
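
As a concrete illustration of where that exercise can lead, the TypeScript sketch below maps a few assumed failure modes of a speech-driven search feature to user-facing mitigations. The failure types, confidence threshold, and messages are hypothetical, not part of the HAX Playbook.

// An illustrative mitigation plan of the kind a playbook session might
// produce for a speech-driven search feature. The failure types, the
// confidence threshold, and the messages are all assumptions.

type SpeechFailure = "no_speech_detected" | "low_confidence" | "no_results";

const mitigations: Record<SpeechFailure, string> = {
  no_speech_detected: "Prompt the user to retry or switch to typed input.",
  low_confidence: "Show the transcript and ask the user to confirm or edit it.",
  no_results: "Explain that nothing matched and suggest broader search terms.",
};

interface TranscriptionResult {
  text: string;
  confidence: number; // 0..1, as reported by a hypothetical speech recognizer
}

// Classifies input-stage failures; "no_results" would be detected later,
// after the search itself runs.
function classifyFailure(result: TranscriptionResult): SpeechFailure | null {
  if (result.text.trim() === "") return "no_speech_detected";
  if (result.confidence < 0.6) return "low_confidence"; // assumed threshold
  return null; // confident transcript: proceed with the search
}

// Example: a noisy-room transcription comes back garbled with low confidence,
// triggering the confirm-the-transcript mitigation.
const failure = classifyFailure({ text: "weather in Seatle", confidence: 0.42 });
if (failure !== null) {
  console.log(mitigations[failure]);
}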

Flexible applications nurture continuous improvements

Julie Stanford learned about the guidelines during a talk Amershi gave at Stanford University. She has since incorporated them into her Design for AI course.

“I felt the guidelines were so robust and thoughtful that I would make them a cornerstone of the user interface design portion of that class,” she said.

In the first part of the course, students examine existing AI experiences online, then evaluate them against the guidelines. Later, they use the guidelines to design an AI addition for an existing project, prioritizing the guidelines most relevant to their work.

Stanford said the guidelines gave the students a common language and helped them to see issues they may not have noticed within the AI experiences they have every day. And when it came time to grade the students’ design work, Stanford had a comprehensive, fair way of measuring whether they had met their goals.

“It’s a really flexible tool both for teaching and practicing design,” she said.

The HAX team encourages users to share their feedback on the Contact us page and submit examples to the HAX Design Library, so the HAX community can learn together.

“We are hoping this can be a trusted resource where people can go to find tools throughout the end-to-end process,” Amershi said. “We will continue to update and create new tools as we continue to learn and work in this space.”

Explore the tools by visiting the HAX Toolkit website.

To learn more about the HAX Toolkit, join Amershi and Vorvoreanu for a webinar on July 21 at 10 am PT. Register for the webinar.
