
6.2 Speech recognition and natural language understanding



📘 Chapter: Speech Recognition & NLU for Conversational Robotics

🔹 Introduction

Conversational robotics enables robots to understand human speech and respond to it naturally. This involves two main components:

  1. Speech Recognition (ASR) – Converts spoken words into text.
  2. Natural Language Understanding (NLU) – Interprets text to extract meaning, intent, and context.

This chapter explains how these components work together to enable interactive, voice-controlled robots.


🧠 1. Speech Recognition (ASR)

Definition:

Automatic Speech Recognition (ASR) converts audio signals from microphones into readable text.

Key Steps:

  • Audio signal acquisition (mic input)
  • Preprocessing (noise reduction, normalization)
  • Feature extraction (MFCC, spectrogram); see the sketch after the tool list
  • Model inference (deep learning or HMM-based acoustic models)
  • Output as text

Popular ASR Tools:

  • Google Speech-to-Text
  • Microsoft Azure Speech
  • NVIDIA Riva
  • Open-source: Mozilla DeepSpeech, OpenAI Whisper
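To make the feature-extraction step concrete, here is a minimal sketch using the librosa library (the library choice and the file name command.wav are illustrative assumptions; any MFCC implementation works):

import librosa

# Load the recording at 16 kHz, a sample rate many ASR models expect
# ("command.wav" is a placeholder file name)
waveform, sample_rate = librosa.load("command.wav", sr=16000)

# Compute 13 Mel-frequency cepstral coefficients per analysis frame
mfccs = librosa.feature.mfcc(y=waveform, sr=sample_rate, n_mfcc=13)
print(mfccs.shape)  # (13, number_of_frames)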

🧠 2. Natural Language Understanding (NLU)

Definition:

NLU processes text to determine intent, entities, and context.

Key Steps:

  • Tokenization & normalization
  • Part-of-speech tagging
  • Named Entity Recognition (NER)
  • Intent classification; a toy classifier sketch follows the tool list
  • Dialogue state tracking

Popular NLU Tools:

  • Rasa NLU
  • Dialogflow
  • LUIS (Microsoft)
  • OpenAI GPT / Transformer-based models
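As an illustration of intent classification, the following toy sketch trains a classifier with scikit-learn (an assumption made for this example; the tools above provide production-grade equivalents). The training phrases and intent labels are invented:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

training_phrases = [
    "hello there", "hi robot", "good morning",         # greet
    "move to the left", "go forward", "turn right",    # move
    "pick up the red box", "grab the cup", "lift it",  # pick
]
intent_labels = ["greet", "greet", "greet",
                 "move", "move", "move",
                 "pick", "pick", "pick"]

# TF-IDF turns each phrase into a sparse vector; logistic regression
# then learns a decision boundary between the intent classes.
classifier = make_pipeline(TfidfVectorizer(), LogisticRegression())
classifier.fit(training_phrases, intent_labels)

print(classifier.predict(["please pick up the box"]))  # likely ['pick']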

🌀 3. Conversational Robotics Pipeline (Workflow Diagram)

Microphone Input (Audio)
        ↓
Speech Recognition (ASR)
        ↓
Text Output (Transcript)
        ↓
Natural Language Understanding (NLU)
  ├── Extract Intent
  ├── Extract Entities
  └── Context Processing
        ↓
Dialogue Manager / Decision Logic
        ↓
Robot Action / Response
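The pipeline can also be read as a chain of function calls. The sketch below stubs out each stage with hard-coded return values so the data flow is visible; none of these names are a real API:

def recognize_speech(audio_bytes):
    """ASR stage: a real system would call an ASR engine here."""
    return "pick up the red box"  # stubbed transcript

def understand(text):
    """NLU stage: a real system would call an NLU model here."""
    return {"intent": "pick", "entities": [{"object": "box", "color": "red"}]}  # stubbed parse

def decide(nlu_result):
    """Dialogue manager: map the parsed intent to a robot action."""
    return "execute_" + nlu_result["intent"]

audio = b""  # would come from the microphone
print(decide(understand(recognize_speech(audio))))  # execute_pick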

🎯 4. Integration in Robots

Steps:

  1. Audio Input: Capture voice using a microphone array.
  2. ASR Module: Convert speech to text in real time.
  3. NLU Module: Analyze text, extract commands and intent.
  4. Decision / Dialogue Manager: Determine robot response.
  5. Action Execution: Robot moves, speaks, or interacts accordingly.

Example Commands:

  • “Pick up the red box” → Robot picks object.
  • “Move to the left” → Robot navigates left.
  • “Tell me the temperature” → Robot speaks sensor reading.
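One simple way to implement the decision logic for commands like these is a dispatch table that maps each detected intent to a handler. The sketch below is hypothetical: the handler bodies just print, where a real robot would call its manipulation, navigation, or sensing subsystems:

def handle_pick(entities):
    print("Picking up:", entities)        # would call the arm controller

def handle_move(entities):
    print("Navigating:", entities)        # would call the motion planner

def handle_report_temperature(entities):
    print("Temperature: 22.5 C")          # would read a real sensor

INTENT_HANDLERS = {
    "pick": handle_pick,
    "move": handle_move,
    "report_temperature": handle_report_temperature,
}

def dispatch(intent, entities):
    handler = INTENT_HANDLERS.get(intent)
    if handler is None:
        print("Sorry, I did not understand that.")  # fallback response
    else:
        handler(entities)

dispatch("pick", [{"object": "box", "color": "red"}])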

🧩 5. Python Example using SpeechRecognition + Rasa NLU

import speech_recognition as sr
from rasa_nlu.model import Interpreter

# Initialize the speech recognizer
recognizer = sr.Recognizer()

# Load a trained Rasa NLU interpreter from disk
interpreter = Interpreter.load("models/nlu")

# Capture audio from the default microphone
with sr.Microphone() as source:
    print("Say something:")
    audio = recognizer.listen(source)

# Convert the recorded speech to text (Google Web Speech API)
text = recognizer.recognize_google(audio)
print("You said:", text)

# NLU processing: extract intent and entities from the transcript
result = interpreter.parse(text)
print("Intent:", result["intent"])
print("Entities:", result["entities"])

🏭 6. Real-World Applications

  • Home assistant robots (Alexa, Google Home)
  • Customer service robots
  • Elderly care robots
  • Educational robots
  • Industrial voice-controlled machines

📝 7. Self Assignment

Tasks:

  1. Set up a microphone input in Python.
  2. Use Google Speech-to-Text API to transcribe speech.
  3. Install Rasa NLU and create basic intents (greet, move, pick).
  4. Connect ASR output to Rasa NLU to detect intent (see the sketch after this list).
  5. Print detected intent and entities for 5 sample voice commands.
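For task 4, one possible approach (assuming a trained Rasa model is served locally via rasa run --enable-api, which exposes the /model/parse endpoint on port 5005 by default) is to post the ASR transcript to the server:

import requests

def detect_intent(text):
    """Send a transcript to a locally running Rasa server and return its parse."""
    response = requests.post(
        "http://localhost:5005/model/parse",  # Rasa HTTP API endpoint
        json={"text": text},
    )
    response.raise_for_status()
    return response.json()

result = detect_intent("move to the left")
print("Intent:", result["intent"]["name"])
print("Entities:", result["entities"])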

❓ 8. MCQs (Multiple Choice Questions)

Q1. What does ASR stand for?

A. Automatic Speech Recognition
B. Artificial Syntax Response
C. Audio Signal Relay
D. Automated System Robot

Q2. What does NLU extract?

A. Intent and entities
B. Only grammar
C. File formats
D. Hardware signals

Q3. Which Python library can capture audio?

A. speech_recognition
B. numpy
C. matplotlib
D. pandas

Q4. What does a robot’s dialogue manager decide?

A. Hardware maintenance
B. Robot actions / responses
C. WiFi settings
D. Camera resolution

Q5. What does intent classification in NLU do?

A. Translates audio to text
B. Determines user’s goal from text
C. Captures images
D. Measures distance


Correct Answers

  1. A – Automatic Speech Recognition
  2. A – Intent and entities
  3. A – speech_recognition
  4. B – Robot actions / responses
  5. B – Determines user’s goal from text

This completes the Speech Recognition & NLU chapter for Conversational Robotics.