KinVoices

Older adult smiling and interacting with KinVoice from an Amazon Echo smart speaker device on the kitchen table

Overview

With voice user interfaces (VUIs) becoming ubiquitous and speech synthesis technology maturing, it is now possible to synthesise AI voices that resemble our friends and relatives, whom we collectively call ‘kin’, and use them on VUIs. However, how to design such interfaces and how the familiarity of kin voices affects user perceptions remain under-explored.

Check out the video overview:

Role and Results

I led this research:

Created the concept
Developed the interface and custom API
Conducted user studies
Wrote and published the paper at CSCW 2021 (journal article)

Assistance in development: Tamil Selvan Gunasekaran

KinVoice System Implementation

As an example application, we implemented a prototype, KinVoice, on Amazon Echo Dot devices that enables users to set and receive reminders. It issues the reminders in AI-generated voices based on the voices of family members and friends.

When the user sets a reminder, KinVoice retrieves the reminder message, day, and time from the user. Then, it updates the Alexa Developer Console server, which helps to keep track of and issue the reminder. The reminder data is also posted to a custom-made Django framework API that is hosted on a Google Cloud server. Using the Real-Time Voice Cloning tool and a speech recording sample of the user’s kin, the API generates the reminder message as an audio file in the kin's voice and stores the file in an Amazon S3 bucket. When the reminder is issued, the Echo Dot announces that there is a reminder for the user and prompts the user to ask for the reminder message to be played. Finally, KinVoice plays the reminder audio file from the S3 bucket when the user asks. The message can be played at any time after the reminder is issued, until the next reminder is issued.
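The flow described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the actual KinVoice code: every name in it (Reminder, KinVoiceService, fake_synthesize) is hypothetical, the in-memory dictionaries stand in for the Alexa reminder service and the S3 bucket, and fake_synthesize stands in for the Real-Time Voice Cloning step.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Reminder:
    user_id: str
    message: str
    due: datetime
    kin_name: str  # whose voice the reminder should be spoken in


def fake_synthesize(message: str, kin_name: str) -> bytes:
    """Hypothetical stand-in for voice cloning: returns tagged bytes, not audio."""
    return f"[{kin_name}] {message}".encode("utf-8")


class KinVoiceService:
    """Hypothetical sketch of the reminder flow; all storage is in memory."""

    def __init__(self, synthesize: Callable[[str, str], bytes]) -> None:
        self._synthesize = synthesize
        # Reminders that are set but not yet issued (stand-in for the reminder service).
        self._pending: Dict[int, Tuple[Reminder, bytes]] = {}
        # Audio of the most recently issued reminder per user (stand-in for the bucket).
        self._latest: Dict[str, bytes] = {}
        self._next_id = 0

    def set_reminder(self, reminder: Reminder) -> int:
        # The audio is generated when the reminder is set, not when it is issued.
        audio = self._synthesize(reminder.message, reminder.kin_name)
        self._next_id += 1
        self._pending[self._next_id] = (reminder, audio)
        return self._next_id

    def issue_due(self, now: datetime) -> List[str]:
        """Issue every reminder that is due and return the spoken announcements."""
        announcements = []
        due_first = sorted(self._pending.items(), key=lambda item: item[1][0].due)
        for reminder_id, (reminder, audio) in due_first:
            if reminder.due <= now:
                # A newly issued reminder replaces the previously playable one.
                self._latest[reminder.user_id] = audio
                announcements.append("You have a reminder. Ask me to play it.")
                del self._pending[reminder_id]
        return announcements

    def play(self, user_id: str) -> Optional[bytes]:
        # Playable any time after issue, until the next reminder is issued.
        return self._latest.get(user_id)
```

In this sketch, calling play before any reminder has been issued returns None, and each newly issued reminder overwrites the previous clip for that user, which mirrors the "until the next reminder is issued" behaviour described above.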

Key Findings

User Perception

The voices of friends and family promoted the feeling of connection (co-presence), social presence and telepresence.
The voices were persuasive, credible, and charismatic.
The voices were likeable, safe, and eerie (drew attention to the interface).

CSCW Conference 2021 Video Presentation

Samantha Chan

Postdoctoral Fellow
