Upon joining the central Alexa Voice Design team in late 2015, my first order of business was to tackle a complicated systems design problem: how could we adapt a primarily single-threaded, voice-forward, reactive experience to allow proactive notifications? There was a great deal of caution surrounding the effort, as our product had been invited into homes, and we did not want to violate that trust.
Further complicating matters was the then-unannounced Echo Show, which completely changed the interaction paradigm. While Echo had not supported large-scale multitasking, the Echo Show introduced the concept of navigation and a sort of back stack. The Notifications patterns also had to be exposed to third-party Alexa Skills partners, but in a way that prevented abuse of the trust the Alexa system had earned in the home.
I drove the design side of this engagement from my start on the Alexa team until my departure in late 2016, with Carl Mekala as my product management partner. The feature was announced in Spring 2017, in tandem with the related Communications feature and the release of the Echo Show.
Defining the problem
At first engagement, I was simply asked to help my product management partner define the interactions for a proposed Alexa notification system. There were five levels of proposed interruptions at the time, and details about the Messaging feature were only just emerging. We began by defining scenarios for each of the proposed notification types, and I synthesized this information into interaction flows that helped streamline our solution.
Shortly after we began this engagement, the Echo Show entered the picture as our first graphically enabled Alexa device. This was particularly problematic because Echo devices up to that point did not generally support multitasking. Graphical systems like computers are expected to support a more sophisticated model in which two or more tasks can coexist on the screen at once, such as a timer running alongside browsing. Once those tasks are visible, they must also be accessible via voice.
An invisible design system
I began to identify changes that would be needed to our core interaction patterns to enable this new paradigm. The inputs to my process were an existing Excel spreadsheet that documented the Echo interaction model as shipped, the output of a cross-device design charette from the team working on the Echo Show, and pre-release documentation for tented projects like the Echo Show and the Ford partnership.
I proceeded to define a cross-platform design pattern library to synthesize the work and patterns required. This work involved several generative design exercises, including:
- A whitepaper identifying four key device interaction modes, and their strengths and weaknesses:
- Voice-forward (e.g. Echo Show) – most core tasks can be completed with voice; the screen may extend core tasks and provides a notification framework
- Screen-forward (e.g. Fire TV) – not all tasks can be completed with voice
- Voice-only (e.g. Echo) – all tasks are completed with voice only, no assumed visual contact
- Constrained (e.g. smartwatches, automotive) – assume customer is in a context-heavy situation and device capabilities are limited
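The four modes above amount to a small capability taxonomy. A minimal sketch of how they might be modeled is below; the class and attribute names are hypothetical illustrations distilled from the descriptions in the list, not anything from the actual whitepaper:

```python
from dataclasses import dataclass
from enum import Enum, auto

class InteractionMode(Enum):
    VOICE_FORWARD = auto()   # e.g. Echo Show
    SCREEN_FORWARD = auto()  # e.g. Fire TV
    VOICE_ONLY = auto()      # e.g. Echo
    CONSTRAINED = auto()     # e.g. smartwatch, automotive

@dataclass(frozen=True)
class ModeCapabilities:
    all_tasks_by_voice: bool      # can every core task be completed by voice?
    has_screen: bool
    assumes_visual_contact: bool  # can the design assume the customer is looking?
    context_heavy: bool           # is customer attention limited by the situation?

# Hypothetical capability table, one row per mode in the whitepaper.
CAPABILITIES = {
    InteractionMode.VOICE_FORWARD:  ModeCapabilities(True,  True,  False, False),
    InteractionMode.SCREEN_FORWARD: ModeCapabilities(False, True,  True,  False),
    InteractionMode.VOICE_ONLY:     ModeCapabilities(True,  False, False, False),
    InteractionMode.CONSTRAINED:    ModeCapabilities(False, True,  False, True),
}
```

A table like this makes the trade-offs explicit: for example, any pattern that relies on `assumes_visual_contact` is ruled out for voice-only devices at a glance.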
- The creation of a new Interruption Matrix to show how tasks should behave on each platform type.
Driving platform-wide change
Getting these changes implemented was a long road: the patterns for interrupted activities were a firmware feature, and they were implemented differently from domain to domain, such as music and home automation. Once my initial patterns were proposed, I worked to identify a key development partner similarly passionate about streamlining these cross-device patterns. This development partner helped vet the feasibility of the design and connected me with teams who became additional stakeholders in the work.
My work on the Notifications feature broke down into several key efforts:
- My device landscape whitepaper
- The updated Activity Model and Interruption Model for all Alexa devices. (The previous interruption model was defined for Echo only.)
- Notifications VUI (all retrieval intents and delivery prompts)
- Do Not Disturb VUI (all control intents and behaviors)
- End-to-end storyboards that incorporated VUI, audio, and visuals in context for approvals through SVP
In addition, I worked closely with partners in visual design and, especially, sound design. We collaborated to identify when sounds would be appropriate, when sounds should replace voice prompts, and how to interrupt delicately without being too disruptive in the home. We also worked hand-in-hand with the Communications domain, whose calling and messaging features needed to stay consistent with the overall Notifications design.
This work spanned over a year due to the high number of stakeholders, hardware release cycles, and the complexity of the implementation.
What is an “Interruption Model”?
This was the term we used to describe a design taxonomy for the sound, visual, and VUI behaviors that apply when a new event occurs on a device while it is in use. For example, on the iPhone, you could say the interruption model offers a few basic patterns: banner notifications, notification tray entries, and app badging. Each of these is applied in specific circumstances.
Our Interruption Model took the form of a matrix: on one axis we mapped the possible activity types (playing media, etc.) and on the other axis we mapped the types of interruptions that could occur (e.g., an incoming call). Since there are dozens of individual intents and events, we first had to develop a coherent activity model to which all existing Alexa actions could be mapped. For example, "passive media" is an activity type that applies to music, video, or podcasts. Both the Activity Model and the Interruption Model required intensive vetting across dozens of stakeholders on multiple domain teams.
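The matrix described above is essentially a lookup from (activity, interruption) pairs to a prescribed behavior. A minimal sketch follows; every activity, interruption, and behavior name here is a hypothetical example for illustration, since the real model covered dozens of intents and events across domains:

```python
from enum import Enum, auto

class Activity(Enum):
    IDLE = auto()
    PASSIVE_MEDIA = auto()   # music, video, podcasts
    ACTIVE_TASK = auto()     # e.g. a voice dialog in progress

class Interruption(Enum):
    INCOMING_CALL = auto()
    NOTIFICATION = auto()

class Behavior(Enum):
    INTERRUPT_NOW = auto()    # pause the activity and deliver immediately
    CHIME_AND_QUEUE = auto()  # play a subtle sound, hold for later retrieval
    QUEUE_SILENTLY = auto()   # visual indicator only, no sound

# Hypothetical matrix entries: one cell per (activity, interruption) pair.
INTERRUPTION_MATRIX = {
    (Activity.IDLE, Interruption.INCOMING_CALL):          Behavior.INTERRUPT_NOW,
    (Activity.PASSIVE_MEDIA, Interruption.INCOMING_CALL): Behavior.INTERRUPT_NOW,
    (Activity.PASSIVE_MEDIA, Interruption.NOTIFICATION):  Behavior.CHIME_AND_QUEUE,
    (Activity.ACTIVE_TASK, Interruption.NOTIFICATION):    Behavior.QUEUE_SILENTLY,
}

def behavior_for(activity, interruption, default=Behavior.QUEUE_SILENTLY):
    """Look up the prescribed behavior, falling back to the least
    disruptive option for unmapped cells."""
    return INTERRUPTION_MATRIX.get((activity, interruption), default)
```

Encoding the model this way also makes the vetting burden concrete: every new activity type or interruption type adds a full row or column of cells that stakeholders must agree on.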
Much of this work remains only partially released. Further process examples will be shared as details become public, to avoid disclosing roadmap.
Storyboards were not in common use on the Alexa team when I joined, so I stayed in relatively low fidelity to optimize for speed and minimize lost work. These storyboards were used to draw attention to the interaction "cliffs" encountered when transitioning between locations and devices.
Documentation and responses to the feature will be linked here once made public.