SpeakEasy

Feb 2020

Slide generation as you speak. That was the concept we came up with in the 4th hour. It’s visual and plausible, and it uses some interesting technology.

Inspiration

The motivation behind this project was to reduce people’s dependence on slides when presenting while still providing an engaging visual aid. We get into bad habits with a prepared slide deck: we’re less dynamic, more wooden, and oh so tempted to glance at the slides. We also tend to put too much information on a slide.

All of these bad habits are prevented with SpeakEasy. You can’t rely on it as a memory crutch because it’s different every time. It encourages you to speak freely while still emphasizing your points visually and with bold headers.

How we built it

We created a pipeline that passes raw audio to the Google Cloud Speech-to-Text API, which returns plain text. We then apply various natural language processing techniques (including IBM Watson) to that text to produce a semantic analysis, from which we extract the key information used to format our slides.

For instance, the subject of a sentence can provide a header, and the verb-noun pairs corresponding to that subject become the bullet points beneath it. We also run a level of emotion analysis to choose text colors.
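The grouping step described above could look roughly like the sketch below. It assumes the NLP stage has already produced (subject, verb, noun) triples and a single emotion label. The `Slide` class, the `build_slides` function, and the `EMOTION_COLORS` palette are hypothetical names made up for illustration, not the project's actual code.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical palette mapping an emotion label to a CSS color name.
EMOTION_COLORS = {"joy": "gold", "anger": "crimson", "sadness": "steelblue"}


@dataclass
class Slide:
    header: str
    bullets: List[str] = field(default_factory=list)
    color: str = "black"


def build_slides(triples: List[Tuple[str, str, str]], emotion: str = "neutral") -> List[Slide]:
    """Group (subject, verb, noun) triples into one slide per subject."""
    color = EMOTION_COLORS.get(emotion, "black")  # unknown emotions fall back to black
    slides = {}  # dicts keep insertion order, so slides follow speaking order
    for subject, verb, noun in triples:
        if subject not in slides:
            # The subject becomes the slide header.
            slides[subject] = Slide(header=subject.title(), color=color)
        # Each verb-noun pair becomes a bullet under that subject.
        slides[subject].bullets.append(f"{verb} {noun}")
    return list(slides.values())


# Assumed sample input, standing in for the output of the semantic analysis.
triples = [
    ("gatsby", "renders", "slides"),
    ("gatsby", "reloads", "pages"),
    ("watson", "analyzes", "text"),
]
slides = build_slides(triples, emotion="joy")
```

Here the two "gatsby" triples collapse into one slide with two bullets, and "watson" gets its own slide. In the real pipeline the triples would come from the parser rather than a hard-coded list.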

From this, we build an internal AST of typed nodes, which is then converted into MDX, an extended Markdown format that can embed React components.
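As a rough illustration of the AST-to-MDX step, the sketch below renders a tiny tree of typed nodes into an MDX string. The node classes (`Header`, `Bullet`, `SlideNode`) and the `<Slide>` React component with its `color` prop are assumptions for the example. The original node types and components are not described in this write-up.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Header:
    text: str


@dataclass
class Bullet:
    text: str


@dataclass
class SlideNode:
    header: Header
    bullets: List[Bullet]
    color: str = "black"


def render(node) -> str:
    """Render one AST node to MDX, dispatching on the node type."""
    if isinstance(node, Header):
        return f"# {node.text}"
    if isinstance(node, Bullet):
        return f"- {node.text}"
    if isinstance(node, SlideNode):
        bullets = "\n".join(render(b) for b in node.bullets)
        # <Slide> is a hypothetical React component; the blank lines let the
        # Markdown nested inside the JSX tag be parsed as Markdown.
        return (
            f'<Slide color="{node.color}">\n\n'
            f"{render(node.header)}\n\n"
            f"{bullets}\n\n"
            "</Slide>\n"
        )
    raise TypeError(f"unknown node type: {type(node).__name__}")


ast = SlideNode(
    header=Header("Gatsby"),
    bullets=[Bullet("renders slides"), Bullet("reloads pages")],
    color="gold",
)
mdx = render(ast)
print(mdx)
```

Writing the resulting string to an `.mdx` file in a directory that the site watches is what would let a hot-reloading dev server pick up each new slide.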

We then hot-reload these files into Gatsby (a React-based framework).

Challenges we ran into

Gatsby is not well documented when interacting with MDX
It was our first time using the Google Cloud Speech-to-Text API

Accomplishments that we’re proud of

Designing and executing an innovative idea in a short period
Creating a joyful and engaging user experience
GIFs
Automatic semantic analysis and formatting
Emoji hot-loading

What we learned

Gatsby is super fast and, when not using mdx-themes, is an easy-to-use tool.
How to use the Speech-to-Text API with an endless stream.

What we should do next

Negative Latency with predictive descriptions
Improved speech recognition
Embedded image lookup
Improved slide transitions

Early Demo

A quick demonstration put together to pitch the concept with a few hours left to go. It is, perhaps unsurprisingly, a little rough around the edges.