Sam, A Proactive Music Assistant

The before and after

I challenged myself to explore how we could use advances in conversational technology to create better music listening experiences for multiple people. I assumed that people's needs would end at an assistant that could automatically play songs that match their tastes and maintain the current vibe. What the project showed me was that a conversational assistant has the power to impact the social experience and help people discover new music, create memories and connections, deepen their music knowledge, and help them reach an aspirational mood.

Organization

Design Researcher @ CMU HCII
Jan 2021 - Present
Advising from

Prof. Nik Martelaro (CMU)
Sarah Mennicken (Spotify Research)

My hats

Concept development
UX research
VUI prototyping
Mobile UI design

Tools

Figma
Self-designed WoZ prototype

Project overview

The challenge

How might we curate enjoyable group music listening experiences through the use of a proactive conversational assistant?

The solution

How our proactive assistant interacts with users

See examples of the different proactivity levels

Creating new, fun music moments and deepening knowledge

Sam can send listeners relevant information about the music they are listening to (e.g., fun facts or upcoming concerts), create individual "just for you" moments even in multi-person settings, and use personal artifacts to suggest fun music recommendations.

Curating music based on the activity

Sam uses the user's current activity and perceived cognitive load to choose what to play. It also adjusts other elements of the music such as the volume to an appropriate level for the activity.
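Sam's actual decision logic isn't reproduced in this case study; purely as an illustration, here is a minimal Python sketch (all activity labels, playlist names, and volume values are hypothetical) of how activity and perceived cognitive load could drive the playlist choice and volume level.

```python
# Hypothetical sketch: map a detected activity and perceived cognitive load
# to a playlist choice and a volume level. All labels and values are
# illustrative, not Sam's actual logic.
from dataclasses import dataclass

@dataclass
class MusicSelection:
    playlist: str
    volume_percent: int

# Illustrative defaults per activity (playlist name, baseline volume).
ACTIVITY_DEFAULTS = {
    "cooking": MusicSelection("Upbeat Kitchen Mix", 60),
    "working": MusicSelection("Instrumental Focus", 35),
    "cleaning": MusicSelection("High-Energy Pop", 70),
}

def curate_for_activity(activity: str, cognitive_load: float) -> MusicSelection:
    """Pick a playlist and volume for the current activity.

    cognitive_load is assumed to be a 0.0-1.0 estimate; higher load
    lowers the volume so the music stays in the background.
    """
    base = ACTIVITY_DEFAULTS.get(activity, MusicSelection("Chill Favorites", 45))
    # Scale volume down by up to half when the group seems busy or focused.
    adjusted_volume = int(base.volume_percent * (1.0 - 0.5 * cognitive_load))
    return MusicSelection(base.playlist, adjusted_volume)

if __name__ == "__main__":
    print(curate_for_activity("working", cognitive_load=0.8))
    # MusicSelection(playlist='Instrumental Focus', volume_percent=21)
```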

Reaching an aspirational state or desired mood

Sam utilizes people's listening habits to create music experiences that help users lean in to positive moods, break out of negative moods, or reach any other desired state of mind.

Building connections

Sam can curate music experiences based on similarities in music tastes between the different users to create memorable moments.

A framework for the future

Using the results from our WoZ research studies, we developed a generalized framework to inform the design and development of a proactive voice-based music assistant and to guide future design efforts in this space.

Framework for translating context detection into proactive action
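The framework itself is shown in the figure above; as a rough, non-authoritative sketch of the idea, the Python snippet below (all names and rules are illustrative placeholders) frames it as a pipeline that turns detected context signals into a proactive action, or into waiting when acting would interrupt the group.

```python
# Hypothetical sketch of the framework's shape: detected context signals
# feed a decision about *what* proactive action to take and *how* to
# deliver it. Names and rules are illustrative placeholders.
from dataclasses import dataclass
from typing import Optional

@dataclass
class ContextSignals:
    activity: Optional[str]      # e.g. "cooking", "working"
    group_conversing: bool       # is the group mid-conversation?
    mood: Optional[str]          # e.g. "energetic", "tired"
    shared_artist_playing: bool  # does the current track match several tastes?

@dataclass
class ProactiveAction:
    kind: str      # "play", "adjust_volume", "share_fun_fact", "wait"
    detail: str

def translate(ctx: ContextSignals) -> ProactiveAction:
    """Turn detected context into a proactive action (illustrative rules)."""
    if ctx.group_conversing:
        # Don't interrupt conversation with speech; queue info for later.
        return ProactiveAction("wait", "hold non-urgent info until a lull")
    if ctx.shared_artist_playing:
        return ProactiveAction("share_fun_fact", "mention the artist's upcoming concert")
    if ctx.activity == "working":
        return ProactiveAction("adjust_volume", "lower volume for focused work")
    if ctx.mood == "tired":
        return ProactiveAction("play", "queue a gently uplifting playlist")
    return ProactiveAction("wait", "not enough context to act")
```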

Process


Understanding the need

The average person spends 14% of their waking life listening to music, and 25% of U.S. adults now own a smart speaker

Playing music is one of the most popular activities with a voice assistant; however, the process is still a very transactional experience.

We wanted to explore a better way to deliver an immersive music-listening experience with voice assistants: using contextual information to proactively engage people with the right music at the right time.

Proactivity presents many opportunities, but not without its own set of challenges

Chief among these are understanding what level of proactivity is most appropriate, knowing when to interrupt a user, curating a personalized experience without making the user uncomfortable, and deciding how best to converse with the user.

Proactivity in multi-person contexts is relatively unexplored

In addition to the general obstacles of proactivity, operating in multi-person contexts presents further challenges, such as determining when to interrupt a conversation, understanding the social dynamics at play, and deciding whom to address and how to address them in different contexts.

What was our goal?

Create a better music experience with voice assistants by using contextual information to deliver the right music at the right time.

Starting at the heart

I began this exploration with my team by focusing on learning more about the heart of the experience we set out to design: the music

How can we understand people's relationship with music in multi-person contexts?

5 household interviews

Why household interviews?

We conducted group interviews with different households to understand how different contexts within the home, social dynamics, group activity goals, and currently available platforms shape group listening experiences. We deliberately chose group interviews over individual ones so we could observe the implicit dynamics between members of each household.

Key insights

01. People care about music supporting their group activity and maintaining the social environment.

02. People found it difficult to curate and play the right music without much effort.

03. People want music to improve the enjoyability of a group experience.

04. People want to listen to music they know they will enjoy.


Experimenting with proactivity

We wanted to dive right in and start exploring how we could address the broad needs we found through a proactive voice assistant. Our first goal was to understand...

What is the right level of proactivity and when is it the right time to be proactive?

5 scripted enactments

Why scripted enactments?

I proposed a scripted enactment method because it was an easy way to get participants immersed in future voice assistant scenarios where we could distinctly control and compare different levels of proactivity.

Methodology

Our participants enacted different scenarios for multi-person interaction (cooking in the kitchen, working in the dining room, and cleaning in the living room). In each scenario, I simulated a voice assistant and controlled the music by sharing computer audio over a Zoom call.

The 3 levels of proactivity

I helped us define the levels of proactivity by recommending we vary them based on the estimated confidence the voice assistant has in the actions it is taking (a lower level of proactivity is assumed to use less personal data and deliver less overall value). In all scenarios, the assistant engages the participants.

Low proactivity

In the low-proactivity scenario, we had the agent ask the participant what they wanted to listen to and confirm all actions before taking them. The assistant would not make any music recommendations.

Medium proactivity

In the medium-proactivity scenario, we had the agent recommend songs, playlists, volume levels, and speaker changes to the participant (based on prior knowledge and contextual factors), but still confirm all actions before taking them.

High proactivity

In the high-proactivity scenario, we had the agent simply announce that it was changing the music, volume level, or speaker (based on prior knowledge and contextual factors) without confirming with the participant. We also had the agent explicitly explain its actions.
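As a rough sketch of how these three study conditions could be encoded (not the study's actual implementation, and the thresholds below are invented), the wizard's estimated confidence selects a proactivity level, which in turn determines whether an action needs confirmation:

```python
# Hypothetical encoding of the three proactivity conditions. The confidence
# thresholds are illustrative placeholders, not values used in the study.
from enum import Enum

class Proactivity(Enum):
    LOW = "ask what to play; make no recommendations"
    MEDIUM = "recommend music and settings, but confirm every action"
    HIGH = "act without confirmation, then briefly explain the action"

def level_from_confidence(confidence: float) -> Proactivity:
    """Higher estimated confidence (more personal data, more value)
    maps to a more proactive agent."""
    if confidence >= 0.8:
        return Proactivity.HIGH
    if confidence >= 0.5:
        return Proactivity.MEDIUM
    return Proactivity.LOW

def needs_confirmation(level: Proactivity) -> bool:
    # Only the high-proactivity agent acts first and explains afterwards.
    return level is not Proactivity.HIGH
```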


Key insights

01. People need a sense of control over the experience, confirming the assistant's decisions, because they did not trust it to understand their needs based on context and past behavior

02. People want the assistant to be brief and relevant when it communicates with them.

03. People value when the assistant minimizes the effort needed to curate an appropriate music experience (recommending music, using music to transition between activities, and helping them explore new music)

04. People want to be able to control what data an assistant can use, but do not want the assistant to explicitly mention how it is using that data when interacting with them

Remote interaction research

Our "voice assistant"

We used a box, two smartphones connected to Zoom calls (to see our participants), a lapel mic (to hear our participants), and a Bluetooth speaker (to play music and voice messages) to create our homemade smart voice assistant.

In-Home Study 1

Methodology

2-hour sessions every day for 6 days
We curated music experiences for two roommates in their kitchen, living, and dining areas.

Simulating the assistant
We used contextual cues (activities, mood, utterances, etc.) captured by the mic and cameras to proactively play and recommend music and deliver music-related information (fun facts, upcoming concerts, etc.). Each day of the study, we increased the "level of proactivity" and added features to simulate a learning assistant. We also captured participants' Spotify profiles prior to the study to better understand their music tastes.

Our control center

We used a custom interface connected to Spotify that allowed us to play music and voice messages for our participants through a Zoom call.
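The control center itself was a custom tool; as a minimal sketch of the kind of wizard controls it needed, the snippet below uses the Spotify Web API via the spotipy library to start a playlist on a chosen device and nudge the volume. The playlist URI, device name, and scopes are placeholders, and this is not the interface we actually built.

```python
# Minimal Wizard-of-Oz control sketch using the Spotify Web API via spotipy.
# The playlist URI, device name, and scopes shown are placeholders; the
# study's actual control interface was a custom tool built on the same API.
import spotipy
from spotipy.oauth2 import SpotifyOAuth

sp = spotipy.Spotify(auth_manager=SpotifyOAuth(
    scope="user-read-playback-state user-modify-playback-state"))

def play_on_device(context_uri: str, device_name: str) -> None:
    """Start a playlist/album on the named Spotify Connect device."""
    devices = sp.devices()["devices"]
    target = next((d for d in devices if d["name"] == device_name), None)
    if target is None:
        raise RuntimeError(f"Device '{device_name}' is not available")
    sp.start_playback(device_id=target["id"], context_uri=context_uri)

def set_volume(percent: int) -> None:
    """Nudge playback volume, e.g. lower it when the group starts talking."""
    sp.volume(max(0, min(100, percent)))

# Example wizard actions during a session (placeholder URI):
# play_on_device("spotify:playlist:XXXXXXXXXXXXXXXXXXXXXX", "Kitchen Speaker")
# set_volume(35)
```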

Read my thoughts on the study design
Our team working hard to derive key insights

Key insights

01. People value the assistant using contextual information to improve its proactive music recommendations

02. People want the assistant to message them denser information so they can digest it when and how they want

03. People need the assistant to create individualized music-listening experiences, even in multi-person settings

04. People want the assistant to fully control the music experience, but to confirm changes that impact their cognitive load

How do our insights hold up with a completely new group of participants?

We wanted to see if what we learned applied more broadly to other multi-person households by testing an improved agent and setup in a more complex environment!

In-Home Study 2

Methodology

2-hour sessions every day for 4 days
We curated music experiences for four roommates in their kitchen, office, living, and dining areas.

For this study we also used...
1. An application that provides people with more control over the hypothetical data being used by the system and information about the value delivered
2. An upgraded setup kit
3. A voice assistant with a "friendly" personality, to test its impact on the enjoyability of the experience

Voice assistant v2

We upgraded the design of our voice assistant to make it easier for our participants to set up and transport while giving us better video feeds.

Our application

I designed an application that walked users through the data we were capturing and the sensors that would be needed, and allowed them to choose their agent preferences.
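The application's actual screens and schema aren't shown here; as a hypothetical illustration of the kind of record it produced, here is a small Python data model of household data permissions and agent preferences (all field names and defaults are invented).

```python
# Hypothetical data model for the onboarding app's output: which data
# sources a household allows and how the agent should behave. Field names
# are illustrative, not the app's actual schema.
from dataclasses import dataclass, field

@dataclass
class DataPermissions:
    microphone_audio: bool = False     # utterances, conversation lulls
    camera_activity: bool = False      # activity / mood cues
    spotify_history: bool = False      # listening habits and tastes
    calendar_events: bool = False      # upcoming concerts, busy times

@dataclass
class AgentPreferences:
    proactivity: str = "medium"              # "low" | "medium" | "high"
    personality: str = "friendly"            # tested in In-Home Study 2
    quiet_hours: tuple = ("22:00", "08:00")  # no unprompted speech overnight
    permissions: DataPermissions = field(default_factory=DataPermissions)
```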

Synthesizing our new learnings

Key insights

01. Mood is the most informative factor in understanding a user’s music listening preferences, but is also the most difficult to accurately and consistently interpret

02. People want to use music to strengthen social connections, but want to minimize their own vulnerability

03. Listeners liked additional information to help them connect to the music, but context dictated their preferred method of delivery

04. Adopting a friendly and supportive voice and tone increases the overall effectiveness of the agent

The larger context and future work

Attract new customers

Using context-awareness and proactivity increases the value users gain when listening to music with a voice assistant by curating a more personalized social experience - a unique value to attract new customers.

A small piece of a larger connected home

The data gained from this proactive assistant can be integrated into a larger smart-home context, providing new opportunities for value differentiation.

I am writing a paper

I am continuing to advise future work and experiments on this topic, with an eye toward publishing a paper in the HCI community.

