Android Things: Adding Google Assistant

With the growth of the Internet of Things (IoT), developers and engineers have had to rethink how users interact with devices on a day-to-day basis.

While screens work well for websites and most apps, devices that interact with the physical world can be tedious to operate when they require multiple buttons or a screen. One way around this is to enable voice control on your devices.

In this tutorial you will learn about Google Assistant and how you can add it to your Android Things IoT devices.

If you need a little background on Android Things before you start, check out some of my other posts here on ThemeKeeper Tuts+.

  • Introduction to Android Things, by Paul Trebilcox-Ruiz
  • Android Things: Your First Project, by Paul Trebilcox-Ruiz

Assistant SDK

The Google Assistant SDK allows you to add voice controls with keyword detection, natural language processing, and other machine learning features to your IoT devices. There’s a lot that can be done with the Assistant SDK, but this tutorial focuses on the basics: how to include it on your Android Things devices in order to ask questions, get information, and interact with standard “out of the box” Assistant functionality.

As for hardware requirements, you have a few options. You can use a Raspberry Pi flashed with Android Things together with an AIY Voice Kit.

Or you can use a standard speaker with an AUX connector and a USB microphone.

Additionally, you can use any other I²S hardware configuration. While we won’t discuss I²S in detail in this tutorial, it’s worth noting that the Voice Kit uses this protocol. Once you have a microphone and speaker set up, you will also need to add a button to your device. This button needs to report two states: pressed and released. You can accomplish this with a multi-pronged arcade button, or with a standard button that has a pull-down resistor attached to one of its poles.
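Mechanical buttons "bounce," so a single press can produce several rapid pressed/released transitions. The Button driver used later in this tutorial handles this for you through setDebounceDelay(), but the idea can be sketched in plain Java: a raw reading is only accepted as the new state once it has stayed stable for a debounce window. This is a hypothetical illustration, not the Android Things driver; the class name DebouncedButton and the millisecond timestamps are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: turns noisy raw button readings into clean pressed/released events.
// This is NOT the Android Things Button driver; it only illustrates the debouncing idea.
class DebouncedButton {
    private final long debounceMs;
    private boolean stableState = false;    // last accepted state (false = released)
    private boolean candidateState = false; // most recent raw reading
    private long candidateSinceMs = 0;      // when the raw reading last changed
    private final List<Boolean> events = new ArrayList<>();

    DebouncedButton(long debounceMs) {
        this.debounceMs = debounceMs;
    }

    // Feed one raw reading (true = pressed) taken at timeMs.
    void sample(boolean pressed, long timeMs) {
        if (pressed != candidateState) {
            // Raw value changed: restart the stability timer.
            candidateState = pressed;
            candidateSinceMs = timeMs;
        }
        // Accept the new state only after it has been stable for debounceMs.
        if (candidateState != stableState && timeMs - candidateSinceMs >= debounceMs) {
            stableState = candidateState;
            events.add(stableState);
        }
    }

    // Accepted state changes in order: true = pressed, false = released.
    List<Boolean> events() {
        return events;
    }
}
```

With a 20 ms window, a bounce such as pressed at 0 ms, released at 5 ms, pressed again at 10 ms produces a single "pressed" event once the reading has held steady until 30 ms.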

Credentials

Once you have hooked up your hardware, it’s time to add the Assistant SDK to your device. First, you will need to create a new credentials file for your device. You can find the instructions for this in the Google Assistant docs. Once you have your credentials.json file, you will need to place it into the res/raw directory of your Android Things module.

After your credentials are created with Google, you will need to declare some permissions for your app. Open the AndroidManifest.xml file and add the following lines within the manifest tag, but before the application tag.

<uses-permission android:name="android.permission.RECORD_AUDIO" />
<uses-permission android:name="android.permission.MODIFY_AUDIO_SETTINGS" />
<uses-permission android:name="android.permission.INTERNET" />
<uses-permission android:name="com.google.android.things.permission.MANAGE_AUDIO_DRIVERS" />

It’s worth noting that you will need to restart your device after installing the app with these permissions in order for them to be granted.

Next, you will need to copy the gRPC module, which your app uses to communicate with the Google Assistant API, into your project. This gets a little tricky, so the best place to get it is the Google Assistant sample app for Android Things, which can be found in the Android Things GitHub organization. You will then need to update your settings.gradle file to include the new module.

include ':mobile', ':things', ':grpc'

After updating settings.gradle, add the module as a dependency of your things module by adding the following lines to the things module’s build.gradle file. These lines also include Google’s button driver (you will need it to activate the microphone) and the optional Voice Hat driver, if you are using that hardware.

compile project(':grpc')
compile 'com.google.android.things.contrib:driver-button:0.4'

//optional
compile 'com.google.android.things.contrib:driver-voicehat:0.2'

You’ll also need to add the protobuf Gradle plugin to the buildscript dependencies in your project-level build.gradle file.

classpath "com.google.protobuf:protobuf-gradle-plugin:0.8.0"

Next, let’s include the oauth2 library in our project by opening the things module’s build.gradle file and adding the following under the dependencies node:

compile('com.google.auth:google-auth-library-oauth2-http:0.6.0') {
    exclude group: 'org.apache.httpcomponents', module: 'httpclient'
}

You may run into conflicts here if your project has the Espresso dependency, with an error message similar to this:

Warning:Conflict with dependency 'com.google.code.findbugs:jsr305' in project ':things'. Resolved versions for app (1.3.9) and test app (2.0.1) differ. See http://g.co/androidstudio/app-test-app-conflict for details. 

If so, just remove the Espresso dependency from build.gradle.

After you have synced your project, create a new class named Credentials.java to access your credentials.

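import android.content.Context;
import com.google.auth.oauth2.UserCredentials;
import org.json.JSONException;
import org.json.JSONObject;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class Credentials {
    // Reads credentials.json from res/raw and builds UserCredentials from its fields.
    static UserCredentials fromResource(Context context, int resourceId)
            throws IOException, JSONException {
        try (InputStream is = context.getResources().openRawResource(resourceId)) {
            // available() is not guaranteed to report the full size, so read until EOF.
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[1024];
            int read;
            while ((read = is.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
            JSONObject json = new JSONObject(out.toString("UTF-8"));
            return new UserCredentials(json.getString("client_id"),
                    json.getString("client_secret"),
                    json.getString("refresh_token"));
        }
    }
}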

Embedded Assistant Helper Class

Once your Credentials.java class is created, it’s time to create a new class named EmbeddedAssistant.java. This helper class was originally written by engineers at Google to make it easy to wrap the Google Assistant for Android Things. Although you could simply drop this class into your project and use it as is, it’s worth diving into it to understand how it actually works.

The first thing you will do is create two inner abstract classes that will be used for handling callbacks in the conversation and requests to the Assistant API.

public class EmbeddedAssistant {

    public static abstract class RequestCallback {
        public void onRequestStart() {}
        public void onAudioRecording() {}
        public void onSpeechRecognition(String utterance) {}
    }

    public static abstract class ConversationCallback {
        public void onResponseStarted() {}
        public void onResponseFinished() {}
        public void onConversationEvent(EventType eventType) {}
        public void onAudioSample(ByteBuffer audioSample) {}
        public void onConversationError(Status error) {}
        public void onError(Throwable throwable) {}
        public void onVolumeChanged(int percentage) {}
        public void onConversationFinished() {}
    }
}

Once your two inner classes are written, define the following member variables at the top of your class. Most of these will be initialized later in this file; they keep track of device state and interactions with the Assistant API.

private static final String ASSISTANT_API_ENDPOINT = "embeddedassistant.googleapis.com";
private static final int AUDIO_RECORD_BLOCK_SIZE = 1024;

private RequestCallback mRequestCallback;
private ConversationCallback mConversationCallback;

//Used for push-to-talk functionality
private ByteString mConversationState;
private AudioInConfig mAudioInConfig;
private AudioOutConfig mAudioOutConfig;
private AudioTrack mAudioTrack;
private AudioRecord mAudioRecord;
private int mVolume = 100; // Default to maximum volume.

private UserCredentials mUserCredentials;

private MicrophoneMode mMicrophoneMode;
private HandlerThread mAssistantThread;
private Handler mAssistantHandler;

// gRPC client and stream observers.
private int mAudioOutSize; // Tracks the size of audio responses to determine when it ends.
private EmbeddedAssistantGrpc.EmbeddedAssistantStub mAssistantService;
private StreamObserver<ConverseRequest> mAssistantRequestObserver;
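To get a feel for AUDIO_RECORD_BLOCK_SIZE, you can work out how much audio each 1024-byte block holds. Assuming the 16 kHz sample rate and 16-bit (2-byte) mono PCM input that are configured later in this tutorial, a block holds 512 samples, or 32 ms of audio. The helper below is a hypothetical illustration of that arithmetic and is not part of EmbeddedAssistant.

```java
// Hypothetical helper: how many milliseconds of PCM audio fit in one buffer.
class PcmMath {
    // One frame = one sample per channel, so bytesPerFrame = bytesPerSample * channels.
    static double blockDurationMs(int blockBytes, int sampleRateHz, int bytesPerSample, int channels) {
        int bytesPerFrame = bytesPerSample * channels;
        double frames = (double) blockBytes / bytesPerFrame;
        return frames * 1000.0 / sampleRateHz;
    }

    public static void main(String[] args) {
        // 1024 bytes of 16 kHz, 16-bit mono audio: 512 frames -> 32 ms
        System.out.println(blockDurationMs(1024, 16000, 2, 1)); // prints 32.0
    }
}
```

A smaller block sends audio to the API more often at the cost of more request messages; with stereo input, the same 1024 bytes would cover only 16 ms.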

Handling API Responses

The fields above include a StreamObserver<ConverseRequest> for sending requests to the Assistant API, but you will also need one for responses. Its onNext() method consists of a switch statement that checks the type of each response and handles it accordingly.

private StreamObserver<ConverseResponse> mAssistantResponseObserver =
    new StreamObserver<ConverseResponse>() {
        @Override
        public void onNext(ConverseResponse value) {
            switch (value.getConverseResponseCase()) {

The first case checks whether the user has finished speaking and, if so, uses the ConversationCallback to let the rest of the class know that a response is imminent.

case EVENT_TYPE:
    mConversationCallback.onConversationEvent(value.getEventType());
    if (value.getEventType() == EventType.END_OF_UTTERANCE) {
        mConversationCallback.onResponseStarted();
    }
    break;

The next case updates the conversation state, volume, and microphone mode, and passes any recognized speech text to the RequestCallback.

case RESULT:
    // Update state.
    mConversationState = value.getResult().getConversationState();
    
    // Update volume.
    if (value.getResult().getVolumePercentage() != 0) {
        int volumePercentage = value.getResult().getVolumePercentage();
        mVolume = volumePercentage;
        mAudioTrack.setVolume(AudioTrack.getMaxVolume()
                * volumePercentage / 100.0f);
        mConversationCallback.onVolumeChanged(volumePercentage);
    }
    
    if (value.getResult().getSpokenRequestText() != null &&
            !value.getResult().getSpokenRequestText().isEmpty()) {
        mRequestCallback.onSpeechRecognition(value.getResult()
                .getSpokenRequestText());
    }
    
    // Update microphone mode.
    mMicrophoneMode = value.getResult().getMicrophoneMode();
    break;
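The volume update maps the percentage reported by the Assistant onto AudioTrack’s gain range, where AudioTrack.getMaxVolume() is the maximum gain: gain = maxGain × percentage / 100. The sketch below isolates that conversion in plain Java so it can be checked on its own. The method name toGain is hypothetical, and the clamping to the range 0 to 100 is an added assumption that the original class does not perform.

```java
// Hypothetical helper mirroring the volume math in the RESULT case:
// gain = maxGain * percentage / 100
class VolumeMath {
    static float toGain(int volumePercentage, float maxGain) {
        // Assumption: clamp out-of-range values; the original code does not do this.
        int clamped = Math.max(0, Math.min(100, volumePercentage));
        return maxGain * clamped / 100.0f;
    }
}
```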

The third case will take an audio result and play it back for the user.

case AUDIO_OUT:
    if (mAudioOutSize <= value.getAudioOut().getSerializedSize()) {
        mAudioOutSize = value.getAudioOut().getSerializedSize();
    } else {
        mAudioOutSize = 0;
        onCompleted();
    }
    
    final ByteBuffer audioData =
            ByteBuffer.wrap(value.getAudioOut().getAudioData().toByteArray());
    mAudioTrack.write(audioData, audioData.remaining(),
            AudioTrack.WRITE_BLOCKING);
    mConversationCallback.onAudioSample(audioData);
    break;
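The AUDIO_OUT case relies on a size heuristic to detect the end of a spoken response: as long as each audio chunk is at least as large as the previous one, the response is assumed to be ongoing, and the first smaller chunk is treated as the last one, which triggers onCompleted(). Note that, as written, onCompleted() runs before that final chunk is written to the AudioTrack. The plain-Java model below isolates the heuristic so you can see how it behaves on a sequence of chunk sizes; the class and method names are hypothetical.

```java
// Hypothetical model of the AUDIO_OUT end-of-response heuristic.
class AudioOutTracker {
    private int lastSize = 0; // plays the role of mAudioOutSize

    // Returns true when this chunk is considered the last one of the response.
    boolean onChunk(int serializedSize) {
        if (lastSize <= serializedSize) {
            lastSize = serializedSize; // still receiving full-size chunks
            return false;
        }
        lastSize = 0; // smaller chunk: assume the response ended, reset for the next one
        return true;
    }
}
```

One consequence of this design is that if the final chunk happens to be as large as the ones before it, this check does not detect the end of the response.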

The final case will simply forward errors that occurred during the conversation process.

case ERROR:
    mConversationCallback.onConversationError(value.getError());
    break;

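The final two methods of this observer handle errors and clean up when a conversation response completes. If the Assistant expects the user to keep talking (MicrophoneMode.DIALOG_FOLLOW_ON), onCompleted() automatically starts a new request; otherwise, it reports that the conversation has finished.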

@Override
public void onError(Throwable t) {
    mConversationCallback.onError(t);
}

@Override
public void onCompleted() {
    mConversationCallback.onResponseFinished();
    if (mMicrophoneMode == MicrophoneMode.DIALOG_FOLLOW_ON) {
        // Automatically start a new request
        startConversation();
    } else {
        // The conversation is done
        mConversationCallback.onConversationFinished();
    }
}

Streaming Audio

Next, you will need to create a Runnable that will handle audio streaming on a different thread.

private Runnable mStreamAssistantRequest = new Runnable() {
    @Override
    public void run() {
        ByteBuffer audioData = ByteBuffer.allocateDirect(AUDIO_RECORD_BLOCK_SIZE);
        int result = mAudioRecord.read(audioData, audioData.capacity(),
                AudioRecord.READ_BLOCKING);
        if (result < 0) {
            return;
        }
        mRequestCallback.onAudioRecording();
        mAssistantRequestObserver.onNext(ConverseRequest.newBuilder()
                .setAudioIn(ByteString.copyFrom(audioData))
                .build());
        mAssistantHandler.post(mStreamAssistantRequest);
    }
};
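The Runnable keeps streaming by re-posting itself to mAssistantHandler after each block, and stopConversation() ends the loop with removeCallbacks(). The sketch below reproduces that self-re-posting pattern in plain Java. An Executor stands in for the HandlerThread, an InputStream stands in for AudioRecord, a Consumer<byte[]> stands in for the gRPC request observer, and a volatile flag stands in for removeCallbacks(); all of these substitutions are assumptions for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.concurrent.Executor;
import java.util.function.Consumer;

// Hypothetical plain-Java analog of mStreamAssistantRequest: read one block,
// forward it, then re-submit itself to the same executor.
class BlockStreamer implements Runnable {
    private final InputStream source;        // stands in for mAudioRecord
    private final Consumer<byte[]> sink;     // stands in for mAssistantRequestObserver.onNext()
    private final Executor executor;         // stands in for mAssistantHandler
    private final int blockSize;             // like AUDIO_RECORD_BLOCK_SIZE
    private volatile boolean running = true; // stands in for removeCallbacks()

    BlockStreamer(InputStream source, Consumer<byte[]> sink, Executor executor, int blockSize) {
        this.source = source;
        this.sink = sink;
        this.executor = executor;
        this.blockSize = blockSize;
    }

    // May be called from any thread to end the loop, like stopConversation().
    void stop() {
        running = false;
    }

    @Override
    public void run() {
        if (!running) {
            return;
        }
        byte[] block = new byte[blockSize];
        int read;
        try {
            read = source.read(block);
        } catch (IOException e) {
            return; // a failed read ends the loop
        }
        if (read < 0) {
            return; // end of input
        }
        sink.accept(Arrays.copyOf(block, read)); // forward only the bytes actually read
        executor.execute(this);                  // like mAssistantHandler.post(mStreamAssistantRequest)
    }
}
```

On a device-like setup you would pass Executors.newSingleThreadExecutor(), which, like a HandlerThread, runs one task at a time in submission order, so blocks are forwarded in the order they were read.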

Creating the Assistant

Now that your member variables are defined, it’s time to go over the framework for creating the EmbeddedAssistant. You will need to retrieve the credentials for your app using the Credentials.java class that you created earlier.

public static UserCredentials generateCredentials(Context context, int resourceId)
        throws IOException, JSONException {
    return Credentials.fromResource(context, resourceId);
}

In order to instantiate itself, this class uses a private constructor and the builder pattern.

private EmbeddedAssistant() {}

public static class Builder {
    private EmbeddedAssistant mEmbeddedAssistant;
    private int mSampleRate;

    public Builder() {
        mEmbeddedAssistant = new EmbeddedAssistant();
    }

The Builder inner class contains multiple methods for initializing the values within the EmbeddedAssistant class, such as sample rate, volume, and user credentials. When build() is called, it throws an exception if any required value is missing; otherwise, it sets all of the defined values on the EmbeddedAssistant and configures the audio objects needed for operation.

    public Builder setRequestCallback(RequestCallback requestCallback) {
        mEmbeddedAssistant.mRequestCallback = requestCallback;
        return this;
    }

    public Builder setConversationCallback(ConversationCallback responseCallback) {
        mEmbeddedAssistant.mConversationCallback = responseCallback;
        return this;
    }

    public Builder setCredentials(UserCredentials userCredentials) {
        mEmbeddedAssistant.mUserCredentials = userCredentials;
        return this;
    }

    public Builder setAudioSampleRate(int sampleRate) {
        mSampleRate = sampleRate;
        return this;
    }

    public Builder setAudioVolume(int volume) {
        mEmbeddedAssistant.mVolume = volume;
        return this;
    }

    public EmbeddedAssistant build() {
        if (mEmbeddedAssistant.mRequestCallback == null) {
            throw new NullPointerException("There must be a defined RequestCallback");
        }
        if (mEmbeddedAssistant.mConversationCallback == null) {
            throw new NullPointerException("There must be a defined ConversationCallback");
        }
        if (mEmbeddedAssistant.mUserCredentials == null) {
            throw new NullPointerException("There must be provided credentials");
        }
        if (mSampleRate == 0) {
            throw new NullPointerException("There must be a defined sample rate");
        }
        final int audioEncoding = AudioFormat.ENCODING_PCM_16BIT;

        // Construct audio configurations.
        mEmbeddedAssistant.mAudioInConfig = AudioInConfig.newBuilder()
                .setEncoding(AudioInConfig.Encoding.LINEAR16)
                .setSampleRateHertz(mSampleRate)
                .build();
        mEmbeddedAssistant.mAudioOutConfig = AudioOutConfig.newBuilder()
                .setEncoding(AudioOutConfig.Encoding.LINEAR16)
                .setSampleRateHertz(mSampleRate)
                .setVolumePercentage(mEmbeddedAssistant.mVolume)
                .build();

        // Construct AudioRecord & AudioTrack
        AudioFormat audioFormatOutputMono = new AudioFormat.Builder()
                .setChannelMask(AudioFormat.CHANNEL_OUT_MONO)
                .setEncoding(audioEncoding)
                .setSampleRate(mSampleRate)
                .build();
        int outputBufferSize = AudioTrack.getMinBufferSize(audioFormatOutputMono.getSampleRate(),
                audioFormatOutputMono.getChannelMask(),
                audioFormatOutputMono.getEncoding());
        mEmbeddedAssistant.mAudioTrack = new AudioTrack.Builder()
                .setAudioFormat(audioFormatOutputMono)
                .setBufferSizeInBytes(outputBufferSize)
                .build();
        mEmbeddedAssistant.mAudioTrack.setVolume(mEmbeddedAssistant.mVolume *
                AudioTrack.getMaxVolume() / 100.0f);
        mEmbeddedAssistant.mAudioTrack.play();

        AudioFormat audioFormatInputMono = new AudioFormat.Builder()
                .setChannelMask(AudioFormat.CHANNEL_IN_MONO)
                .setEncoding(audioEncoding)
                .setSampleRate(mSampleRate)
                .build();
        int inputBufferSize = AudioRecord.getMinBufferSize(audioFormatInputMono.getSampleRate(),
                audioFormatInputMono.getChannelMask(),
                audioFormatInputMono.getEncoding());
        mEmbeddedAssistant.mAudioRecord = new AudioRecord.Builder()
                .setAudioSource(AudioSource.VOICE_RECOGNITION)
                .setAudioFormat(audioFormatInputMono)
                .setBufferSizeInBytes(inputBufferSize)
                .build();

        return mEmbeddedAssistant;
    }
}

Connecting to the Assistant API

After the EmbeddedAssistant has been created, the connect() method will need to be called in order to connect to the Assistant API.

public void connect() {
    mAssistantThread = new HandlerThread("assistantThread");
    mAssistantThread.start();
    mAssistantHandler = new Handler(mAssistantThread.getLooper());

    ManagedChannel channel = ManagedChannelBuilder.forTarget(ASSISTANT_API_ENDPOINT).build();
    mAssistantService = EmbeddedAssistantGrpc.newStub(channel)
            .withCallCredentials(MoreCallCredentials.from(mUserCredentials));
}

After you have connected to the API, you will use two methods for starting and stopping conversations. These methods will post Runnable objects to mAssistantHandler in order to pass conversation state objects to the request and response streams.

public void startConversation() {
    mAudioRecord.startRecording();
    mRequestCallback.onRequestStart();
    mAssistantHandler.post(new Runnable() {
        @Override
        public void run() {
            mAssistantRequestObserver = mAssistantService.converse(mAssistantResponseObserver);
            ConverseConfig.Builder converseConfigBuilder = ConverseConfig.newBuilder()
                    .setAudioInConfig(mAudioInConfig)
                    .setAudioOutConfig(mAudioOutConfig);
            if (mConversationState != null) {
                converseConfigBuilder.setConverseState(ConverseState.newBuilder()
                        .setConversationState(mConversationState)
                        .build());
            }
            mAssistantRequestObserver.onNext(
                    ConverseRequest.newBuilder()
                            .setConfig(converseConfigBuilder.build())
                            .build());
        }
    });
    mAssistantHandler.post(mStreamAssistantRequest);
}

public void stopConversation() {
    mAssistantHandler.post(new Runnable() {
        @Override
        public void run() {
            mAssistantHandler.removeCallbacks(mStreamAssistantRequest);
            if (mAssistantRequestObserver != null) {
                mAssistantRequestObserver.onCompleted();
                mAssistantRequestObserver = null;
            }
        }
    });

    mAudioRecord.stop();
    mAudioTrack.play();
    mConversationCallback.onConversationFinished();
}

Shutting Down

Finally, the destroy() method will be used for teardown when your app is closing and no longer needs to access the Assistant API.

public void destroy() {
    mAssistantHandler.post(new Runnable() {
        @Override
        public void run() {
            mAssistantHandler.removeCallbacks(mStreamAssistantRequest);
        }
    });
    mAssistantThread.quitSafely();
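    if (mAudioRecord != null) {
        mAudioRecord.stop();
        mAudioRecord.release(); // free the native recording resources
        mAudioRecord = null;
    }
    if (mAudioTrack != null) {
        mAudioTrack.stop();
        mAudioTrack.release(); // free the native playback resources
        mAudioTrack = null;
    }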
}

Using the Assistant

Once your helper classes are fleshed out, it’s time to use them. You will do this by editing your Android Things MainActivity class to interact with the EmbeddedAssistant and the hardware that controls the Google Assistant. First, implement the Button.OnButtonEventListener interface in your Activity.

public class MainActivity extends Activity implements Button.OnButtonEventListener {

Next, add the member variables and constants that your app requires. These values control the debounce delay of the button that triggers the Assistant, the volume, the audio format, the UserCredentials object loaded by the Credentials class you created earlier, and the hardware for your device.

private static final int BUTTON_DEBOUNCE_DELAY_MS = 20;
private static final String PREF_CURRENT_VOLUME = "current_volume";
private static final int SAMPLE_RATE = 16000;
private static final int ENCODING = AudioFormat.ENCODING_PCM_16BIT;
private static final int DEFAULT_VOLUME = 100;

private int initialVolume = DEFAULT_VOLUME;

private static final AudioFormat AUDIO_FORMAT_STEREO =
        new AudioFormat.Builder()
                .setChannelMask(AudioFormat.CHANNEL_IN_STEREO)
                .setEncoding(ENCODING)
                .setSampleRate(SAMPLE_RATE)
                .build();

// Hardware peripherals.
private VoiceHat mVoiceHat;
private Button mButton;
private EmbeddedAssistant mEmbeddedAssistant;
private UserCredentials userCredentials;

Once you have your constants defined, you will need to create two callback objects that handle conversations and requests with the Assistant.

private ConversationCallback mConversationCallback = new ConversationCallback() {
    @Override
    public void onConversationEvent(EventType eventType) {}

    @Override
    public void onAudioSample(ByteBuffer audioSample) {}

    @Override
    public void onConversationError(Status error) {}

    @Override
    public void onError(Throwable throwable) {}

    @Override
    public void onVolumeChanged(int percentage) {
        SharedPreferences.Editor editor = PreferenceManager
                .getDefaultSharedPreferences(MainActivity.this)
                .edit();
        editor.putInt(PREF_CURRENT_VOLUME, percentage);
        editor.apply();
    }

    @Override
    public void onConversationFinished() {}
};

private RequestCallback mRequestCallback = new RequestCallback() {
    @Override
    public void onRequestStart() {
        //starting assistant request, enable microphones
    }

    @Override
    public void onSpeechRecognition(String utterance) {}
};

In mConversationCallback, you will notice that we save the new volume percentage in SharedPreferences. This keeps your device’s volume consistent for your users, even across reboots.
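SharedPreferences is Android-specific, but the underlying idea, writing the latest volume to persistent storage and falling back to a default when nothing has been saved yet, can be sketched in plain Java with java.util.Properties and a file. The class name VolumeStore and the file location are hypothetical; on the device you would keep using SharedPreferences as shown in this tutorial.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Hypothetical plain-Java analog of persisting the volume with SharedPreferences.
class VolumeStore {
    static final String PREF_CURRENT_VOLUME = "current_volume";
    static final int DEFAULT_VOLUME = 100;

    private final Path file; // hypothetical storage location

    VolumeStore(Path file) {
        this.file = file;
    }

    // Counterpart of editor.putInt(PREF_CURRENT_VOLUME, percentage) followed by editor.apply().
    void save(int percentage) {
        Properties props = new Properties();
        props.setProperty(PREF_CURRENT_VOLUME, Integer.toString(percentage));
        try (OutputStream out = Files.newOutputStream(file)) {
            props.store(out, null);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Counterpart of preferences.getInt(PREF_CURRENT_VOLUME, DEFAULT_VOLUME).
    int load() {
        if (!Files.exists(file)) {
            return DEFAULT_VOLUME; // nothing saved yet, e.g. on first boot
        }
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(file)) {
            props.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String value = props.getProperty(PREF_CURRENT_VOLUME);
        return value == null ? DEFAULT_VOLUME : Integer.parseInt(value);
    }
}
```

Unlike SharedPreferences, this sketch rewrites the whole file on every save, which is acceptable here only because it stores a single value.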

You will initialize everything needed for using the Assistant API in onCreate() by calling a set of helper methods that are defined over the rest of this tutorial.

@Override
protected void onCreate(Bundle savedInstanceState) {
    super.onCreate(savedInstanceState);

    initVoiceHat();
    initButton();
    initVolume();
    initUserCredentials();
    initEmbeddedAssistant();
}

The first helper method is initVoiceHat(). If the Voice Hat shield is attached to a Raspberry Pi, this method initializes it so that users can use the attached microphone and speaker. If a Voice Hat is not attached, a standard AUX speaker and USB microphone can be used instead, and audio will be routed to them automatically. The Voice Hat uses I²S to handle audio peripherals on the bus and is wrapped by a driver class written by Google.

private void initVoiceHat() {
    PeripheralManagerService pioService = new PeripheralManagerService();
    List<String> i2sDevices = pioService.getI2sDeviceList();
    if (i2sDevices.size() > 0) {
        try {
            mVoiceHat = new VoiceHat(
                    BoardDefaults.getI2SDeviceForVoiceHat(),
                    BoardDefaults.getGPIOForVoiceHatTrigger(),
                    AUDIO_FORMAT_STEREO
            );
            mVoiceHat.registerAudioInputDriver();
            mVoiceHat.registerAudioOutputDriver();
        } catch (IllegalStateException e) {
            // Unsupported board: fall back to the default audio devices.
        }
    }
}

In this sample, the Assistant is triggered by a physical button, which is initialized and configured like so:

private void initButton() {
    try {
        mButton = new Button(BoardDefaults.getGPIOForButton(),
                Button.LogicState.PRESSED_WHEN_LOW);
        mButton.setDebounceDelay(BUTTON_DEBOUNCE_DELAY_MS);
        mButton.setOnButtonEventListener(this);
    } catch (IOException e) {
        // The button's GPIO pin could not be opened, so no button events will be delivered.
    }
}

When the button is pressed, the assistant will start listening for a new conversation.

@Override
public void onButtonEvent(Button button, boolean pressed) {
    if (pressed) {
        mEmbeddedAssistant.startConversation();
    }
}
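The handler above only reacts to presses. Since the button also reports a released state, one option for push-to-talk behavior is to end the request when the button is released. The plain-Java sketch below models that dispatch with a hypothetical Conversation interface standing in for the two EmbeddedAssistant methods, startConversation() and stopConversation(); it is an assumption about how you might wire it, not code from Google’s sample.

```java
// Hypothetical stand-in for the two EmbeddedAssistant methods used here.
interface Conversation {
    void startConversation();
    void stopConversation();
}

// Hypothetical push-to-talk dispatcher: press starts listening, release ends the request.
class PushToTalk {
    private final Conversation conversation;
    private boolean active = false; // ignore repeated events in the same state

    PushToTalk(Conversation conversation) {
        this.conversation = conversation;
    }

    // Mirrors onButtonEvent(Button button, boolean pressed).
    void onButtonEvent(boolean pressed) {
        if (pressed && !active) {
            active = true;
            conversation.startConversation();
        } else if (!pressed && active) {
            active = false;
            conversation.stopConversation();
        }
    }
}
```

In the Activity, the equivalent change would be an else branch in onButtonEvent() that calls mEmbeddedAssistant.stopConversation() when pressed is false.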

You can find more information about GPIO and Android Things in my tutorial about input and output with Android Things.

  • Android Things: Peripheral Input/Output, by Paul Trebilcox-Ruiz

Since we stored volume information in our device’s SharedPreferences, we can access it directly to initialize the device’s volume.

private void initVolume() {
    SharedPreferences preferences = PreferenceManager.getDefaultSharedPreferences(this);
    initialVolume = preferences.getInt(PREF_CURRENT_VOLUME, DEFAULT_VOLUME);
}

The Assistant SDK requires authentication. Luckily, we created a method in the EmbeddedAssistant class earlier in this tutorial specifically for this situation.

private void initUserCredentials() {
    userCredentials = null;
    try {
        userCredentials = EmbeddedAssistant.generateCredentials(this, R.raw.credentials);
    } catch (IOException | JSONException e) {
        // Leaves userCredentials null, which makes EmbeddedAssistant.Builder.build() throw.
    }
}

The final helper method that was called in onCreate() will initialize the EmbeddedAssistant object and connect it to the API.

private void initEmbeddedAssistant() {
    mEmbeddedAssistant = new EmbeddedAssistant.Builder()
            .setCredentials(userCredentials)
            .setAudioSampleRate(SAMPLE_RATE)
            .setAudioVolume(initialVolume)
            .setRequestCallback(mRequestCallback)
            .setConversationCallback(mConversationCallback)
            .build();

    mEmbeddedAssistant.connect();
}

The last thing that you will need to do is properly tear down your peripherals by updating the onDestroy() method in your Activity.

@Override
protected void onDestroy() {
    super.onDestroy();
    if (mButton != null) {
        try {
            mButton.close();
        } catch (IOException e) {}

        mButton = null;
    }

    if (mVoiceHat != null) {
        try {
            mVoiceHat.unregisterAudioOutputDriver();
            mVoiceHat.unregisterAudioInputDriver();
            mVoiceHat.close();
        } catch (IOException e) {}
        mVoiceHat = null;
    }
    mEmbeddedAssistant.destroy();
}

After all of this, you should be able to interact with your Android Things device as if it were a Google Home!

Conclusion

In this tutorial, you learned about the Google Assistant and how it can be added to your Android Things applications. This feature gives your users a new way of interacting with and controlling your device, as well as access to the many features available from Google. It is only one of the many features that can go into an Android Things app to help you create new and amazing devices for your users.

While you’re here, check out some of my other posts on Android Things on ThemeKeeper Tuts+!

  • Android Things and Machine Learning, by Paul Trebilcox-Ruiz
  • Android Things: Understanding and Writing Drivers, by Paul Trebilcox-Ruiz
  • Android Things: Creating a Cloud-Connected Doorman, by Paul Trebilcox-Ruiz