Compare commits

...

13 Commits

Author SHA1 Message Date
Terrence 2be6217b1f v0.3.1 2024-10-03 06:41:16 +08:00
Terrence 879f1cc21e reconstruct application 2024-10-03 06:39:22 +08:00
Terrence e59be04394 update to 4MB partition 2024-10-01 15:58:03 +08:00
Terrence d26e8d25ff support ML307, new version 0.3.0 2024-10-01 14:16:12 +08:00
Terrence 8e9be5abc7 add websocket protocol 2024-09-26 16:19:54 +08:00
Terrence 7fd72aa8e2 add more wake word packets 2024-09-26 16:19:06 +08:00
Terrence 0396b4a91c fix bugs 2024-09-25 03:44:28 +08:00
Terrence 53b08843d4 add vad to detection and communication 2024-09-17 11:26:07 +08:00
Terrence 797f9c2515 start AP if WiFi station fails to connect 2024-09-15 14:03:11 +08:00
Terrence cebe41c2d0 update opus encoder version 2024-09-14 15:00:48 +08:00
Terrence e46016b3fc add testing 2024-09-14 14:58:03 +08:00
Terrence 140ed56ee9 add notes on RGB LED behavior 2024-09-12 21:48:47 +08:00
Terrence 1093bce089 add usage to readme 2024-09-12 19:53:14 +08:00
31 changed files with 2177 additions and 618 deletions

CMakeLists.txt (modified)

@@ -4,7 +4,7 @@
# CMakeLists in this exact order for cmake to work correctly
cmake_minimum_required(VERSION 3.16)
set(PROJECT_VER "0.2.0")
set(PROJECT_VER "0.3.1")
include($ENV{IDF_PATH}/tools/cmake/project.cmake)
project(xiaozhi)

README.md (modified)

@@ -1,6 +1,94 @@
# Hello, Xiaozhi
# Xiaozhi AI Chatbot
Build your own AI chat companion with ESP32 + SenseVoice + Qwen72B
BiliBili video introduction: [Build your own AI chat companion with ESP32 + SenseVoice + Qwen72B](https://www.bilibili.com/video/BV11msTenEH3/?share_source=copy_web&vd_source=ee1aafe19d6e60cf22e60a93881faeba)
This is 虾哥's first hardware project.
## Project Goals
This project is developed on Espressif's ESP-IDF.
It is an open-source project intended mainly for educational purposes. We hope it helps more people get started with AI hardware development and learn how to bring today's rapidly evolving large language models onto real hardware devices. Whether you are a student interested in AI or a developer exploring new technologies, this project offers valuable hands-on experience.
Everyone is welcome to take part in developing and improving the project. If you have any ideas or suggestions, please open an issue or join the group chat.
QQ group for learning and discussion: 946599635
## Implemented Features
- Wi-Fi provisioning
- Wake-up and interruption via the BOOT button
- Offline voice wake-up (Espressif solution)
- Streaming voice chat (WebSocket protocol)
- Speech recognition in 5 languages: Mandarin, Cantonese, English, Japanese and Korean (SenseVoice solution)
- Voiceprint recognition, i.e. identifying who is calling the AI's name ([3D Speaker project](https://github.com/modelscope/3D-Speaker))
- Large-model TTS (Volcengine solution; Alibaba Cloud integration in progress)
- Configurable prompt and voice (custom characters)
- Free access to Qwen2.5 72B and Doubao models (limited by capacity and quota; rate limits may apply as usage grows)
- Self-summary after each conversation round to build a memory
- Optional LCD display showing signal strength (Chinese subtitles may be shown later)
- Support for the ML307 Cat.1 4G module (optional)
## Hardware
To make collaboration easier, all hardware materials currently live in a Feishu document:
[The Xiaozhi AI Chatbot Encyclopedia](https://ccnphfhqs21z.feishu.cn/wiki/F5krwD16viZoF0kKkvDcrZNYnhb?from=from_copylink)
The second-version wiring diagram:
![Second-version wiring diagram](docs/wiring2.jpg)
## Firmware
### Flashing Without a Development Environment
If this is your first time, we recommend skipping the development environment setup and flashing a prebuilt firmware image instead.
Click [here](https://github.com/78/xiaozhi-esp32/releases) to download the latest firmware.
The firmware connects to a test server kindly provided by the author. It is currently free to use; please do not use it for commercial purposes.
### Setting Up the Development Environment
- Cursor or VSCode
- Install the ESP-IDF extension and select SDK version 5.3 or above
- Ubuntu is preferable to Windows: builds are faster and there are no driver headaches
### Configuring the Project and Building the Firmware
- Only the ESP32-S3 is supported for now (at least 8 MB Flash and 2 MB PSRAM). Note that the default configuration assumes 8 MB PSRAM; if your board has 2 MB PSRAM you must change the configuration, otherwise the PSRAM will not be detected.
- Set the OTA Version URL to `https://api.tenclass.net/xiaozhi/ota/`
- Set the WebSocket URL to `wss://api.tenclass.net/xiaozhi/v1/`
- Set the WebSocket Access Token to `test-token`
- If your INMP441 and MAX98357 wiring differs from the defaults, adjust the GPIO configuration
- Once everything is configured, build the firmware (the sketch below shows where these values end up in the code)
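The three server settings above are ordinary Kconfig options. The hedged sketch below is condensed from main/Application.cc in this changeset (the helper name `ApplyServerSettings` is invented for illustration) and shows where each value is consumed, so a misconfigured URL or token can be traced by searching for the matching `CONFIG_` symbol.

```cpp
// Illustrative sketch only (condensed from main/Application.cc in this changeset):
// where the menuconfig values are consumed. CONFIG_* symbols come from Kconfig.
void ApplyServerSettings(FirmwareUpgrade& firmware_upgrade, WebSocket* ws_client) {
    // OTA Version URL: used for the version check and firmware download
    firmware_upgrade.SetCheckVersionUrl(CONFIG_OTA_VERSION_URL);
    firmware_upgrade.SetHeader("Device-Id", SystemInfo::GetMacAddress().c_str());

    // WebSocket Access Token (sent as a Bearer token) and WebSocket URL
    std::string token = "Bearer " + std::string(CONFIG_WEBSOCKET_ACCESS_TOKEN);
    ws_client->SetHeader("Authorization", token.c_str());
    ws_client->SetHeader("Device-Id", SystemInfo::GetMacAddress().c_str());
    ws_client->Connect(CONFIG_WEBSOCKET_URL);
}
```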
## Configuring Wi-Fi (skip for the 4G version)
Wire the board as described above and flash the firmware. After power-up, the RGB LED on the board blinks blue (on some boards the RGB LED's switch pad must be soldered before it will light) and the device enters provisioning mode.
Turn on Wi-Fi on your phone, connect to the device hotspot `Xiaozhi-xxxx`, then open `http://192.168.4.1` in a browser to reach the provisioning page.
Select your router's Wi-Fi network, enter the password and click Connect. The device reboots automatically after 3 seconds and then connects to your router.
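This behavior comes from the "start AP if WiFi station fails to connect" commit in this comparison. The sketch below is condensed (not verbatim) from `Application::Start()` in this changeset: station mode is tried first, and the configuration access point is only brought up when that fails.

```cpp
// Condensed from Application::Start() in this changeset (not verbatim).
auto& wifi_station = WifiStation::GetInstance();
wifi_station.Start();
if (!wifi_station.IsConnected()) {
    auto& builtin_led = BuiltinLed::GetInstance();
    builtin_led.SetBlue();
    builtin_led.Blink(1000, 500);              // slow blue blink = provisioning mode
    auto& wifi_ap = WifiConfigurationAp::GetInstance();
    wifi_ap.SetSsidPrefix("Xiaozhi");          // the hotspot shows up as Xiaozhi-xxxx
    wifi_ap.Start();                           // serves the page at http://192.168.4.1
    return;                                    // stay in provisioning mode until reboot
}
```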
## Testing Whether the Device Is Connected
Once the device has connected to the router, the green LED blinks once. Now say "你好,小智" (Hello, Xiaozhi): the device first lights up blue (connecting to the server), then green while it plays the voice reply.
If the blue LED never lights up, the microphone has a problem; check the wiring.
If the green LED never lights up, or the blue LED stays on, the device has not reached the server; check the Wi-Fi connection.
If the device is connected to Wi-Fi but there is no sound, check that the speaker is wired correctly.
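The LED colors above map directly to chat states in the firmware; a condensed view of `Application::SetChatState()` from this changeset:

```cpp
// Condensed from Application::SetChatState(): LED indication per chat state.
switch (chat_state_) {
    case kChatStateIdle:             builtin_led.TurnOff();                                   break;
    case kChatStateConnecting:       builtin_led.SetBlue();  builtin_led.TurnOn();            break;
    case kChatStateListening:        builtin_led.SetRed();   builtin_led.TurnOn();            break;
    case kChatStateSpeaking:         builtin_led.SetGreen(); builtin_led.TurnOn();            break;
    case kChatStateWakeWordDetected: builtin_led.SetBlue();  builtin_led.TurnOn();            break;
    case kChatStateUpgrading:        builtin_led.SetGreen(); builtin_led.StartContinuousBlink(100); break;
}
```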
On firmware newer than v0.2.1 you can also press the button wired to GPIO 1 (active low) to run a recording test.
## Configuring the Device
If the steps above succeed, the device announces its device ID. Go to the [Xiaozhi test server control panel](https://xiaozhi.tenclass.net/) and add the device.
For detailed usage instructions and notes about the test server, see the [Xiaozhi test server help page](https://xiaozhi.tenclass.net/help).

docs/wiring.jpg (new binary file, 121 KiB; not shown)
docs/wiring2.jpg (new binary file, 72 KiB; not shown)

main/Application.cc (modified)

@@ -1,452 +1,448 @@
#include "Application.h"
#include "BuiltinLed.h"
#include "WifiStation.h"
#include <BuiltinLed.h>
#include <TlsTransport.h>
#include <Ml307SslTransport.h>
#include <WifiConfigurationAp.h>
#include <WifiStation.h>
#include <SystemInfo.h>
#include <cstring>
#include "esp_log.h"
#include "model_path.h"
#include "SystemInfo.h"
#include "cJSON.h"
#include <esp_log.h>
#include <cJSON.h>
#include <driver/gpio.h>
#include "Application.h"
#define TAG "Application"
Application::Application() {
Application::Application()
: button_((gpio_num_t)CONFIG_BOOT_BUTTON_GPIO)
#ifdef CONFIG_USE_ML307
, ml307_at_modem_(CONFIG_ML307_TX_PIN, CONFIG_ML307_RX_PIN, 4096),
http_(ml307_at_modem_),
firmware_upgrade_(http_)
#else
, http_(),
firmware_upgrade_(http_)
#endif
#ifdef CONFIG_USE_DISPLAY
, display_(CONFIG_DISPLAY_SDA_PIN, CONFIG_DISPLAY_SCL_PIN)
#endif
{
event_group_ = xEventGroupCreate();
audio_encode_queue_ = xQueueCreate(100, sizeof(iovec));
audio_decode_queue_ = xQueueCreate(100, sizeof(AudioPacket*));
srmodel_list_t *models = esp_srmodel_init("model");
for (int i = 0; i < models->num; i++) {
ESP_LOGI(TAG, "Model %d: %s", i, models->model_name[i]);
if (strstr(models->model_name[i], ESP_WN_PREFIX) != NULL) {
wakenet_model_ = models->model_name[i];
} else if (strstr(models->model_name[i], ESP_NSNET_PREFIX) != NULL) {
nsnet_model_ = models->model_name[i];
}
}
opus_encoder_.Configure(CONFIG_AUDIO_INPUT_SAMPLE_RATE, 1);
opus_decoder_ = opus_decoder_create(opus_decode_sample_rate_, 1, NULL);
if (opus_decode_sample_rate_ != CONFIG_AUDIO_OUTPUT_SAMPLE_RATE) {
opus_resampler_.Configure(opus_decode_sample_rate_, CONFIG_AUDIO_OUTPUT_SAMPLE_RATE);
}
firmware_upgrade_.SetCheckVersionUrl(CONFIG_OTA_VERSION_URL);
firmware_upgrade_.SetHeader("Device-Id", SystemInfo::GetMacAddress().c_str());
firmware_upgrade_.SetPostData(SystemInfo::GetJsonString());
}
Application::~Application() {
if (afe_detection_data_ != nullptr) {
esp_afe_sr_v1.destroy(afe_detection_data_);
}
if (afe_communication_data_ != nullptr) {
esp_afe_vc_v1.destroy(afe_communication_data_);
}
if (wake_word_encode_task_stack_ != nullptr) {
free(wake_word_encode_task_stack_);
}
for (auto& pcm : wake_word_pcm_) {
free(pcm.iov_base);
}
for (auto& opus : wake_word_opus_) {
free(opus.iov_base);
}
if (opus_decoder_ != nullptr) {
opus_decoder_destroy(opus_decoder_);
}
if (audio_encode_task_stack_ != nullptr) {
free(audio_encode_task_stack_);
}
if (audio_decode_task_stack_ != nullptr) {
free(audio_decode_task_stack_);
}
vQueueDelete(audio_decode_queue_);
vQueueDelete(audio_encode_queue_);
vEventGroupDelete(event_group_);
}
void Application::CheckNewVersion() {
// Check if there is a new firmware version available
firmware_upgrade_.CheckVersion();
if (firmware_upgrade_.HasNewVersion()) {
// Wait for the chat state to be idle
while (chat_state_ != kChatStateIdle) {
vTaskDelay(100);
}
SetChatState(kChatStateUpgrading);
firmware_upgrade_.StartUpgrade([this](int progress, size_t speed) {
#ifdef CONFIG_USE_DISPLAY
char buffer[64];
snprintf(buffer, sizeof(buffer), "Upgrading...\n %d%% %zuKB/s", progress, speed / 1024);
display_.SetText(buffer);
#endif
});
// If the upgrade succeeds, the device reboots and never reaches here
ESP_LOGI(TAG, "Firmware upgrade failed...");
SetChatState(kChatStateIdle);
} else {
firmware_upgrade_.MarkCurrentVersionValid();
}
}
#ifdef CONFIG_USE_DISPLAY
#ifdef CONFIG_USE_ML307
static std::string csq_to_string(int csq) {
if (csq == -1) {
return "No network";
} else if (csq >= 0 && csq <= 9) {
return "Very bad";
} else if (csq >= 10 && csq <= 14) {
return "Bad";
} else if (csq >= 15 && csq <= 19) {
return "Fair";
} else if (csq >= 20 && csq <= 24) {
return "Good";
} else if (csq >= 25 && csq <= 31) {
return "Very good";
}
return "Invalid";
}
#else
static std::string rssi_to_string(int rssi) {
if (rssi >= -55) {
return "Very good";
} else if (rssi >= -65) {
return "Good";
} else if (rssi >= -75) {
return "Fair";
} else if (rssi >= -85) {
return "Poor";
} else {
return "No network";
}
}
#endif
void Application::UpdateDisplay() {
while (true) {
if (chat_state_ == kChatStateIdle) {
#ifdef CONFIG_USE_ML307
std::string network_name = ml307_at_modem_.GetCarrierName();
int signal_quality = ml307_at_modem_.GetCsq();
if (signal_quality == -1) {
network_name = "No network";
} else {
ESP_LOGI(TAG, "%s CSQ: %d", network_name.c_str(), signal_quality);
display_.SetText(network_name + "\n" + csq_to_string(signal_quality) + " (" + std::to_string(signal_quality) + ")");
}
#else
auto& wifi_station = WifiStation::GetInstance();
int8_t rssi = wifi_station.GetRssi();
display_.SetText(wifi_station.GetSsid() + "\n" + rssi_to_string(rssi) + " (" + std::to_string(rssi) + ")");
#endif
}
vTaskDelay(pdMS_TO_TICKS(10 * 1000));
}
}
#endif
void Application::Start() {
auto& builtin_led = BuiltinLed::GetInstance();
#ifdef CONFIG_USE_ML307
builtin_led.SetBlue();
builtin_led.StartContinuousBlink(100);
ml307_at_modem_.SetDebug(false);
ml307_at_modem_.SetBaudRate(921600);
// Print the ML307 modem information
std::string module_name = ml307_at_modem_.GetModuleName();
ESP_LOGI(TAG, "ML307 Module: %s", module_name.c_str());
#ifdef CONFIG_USE_DISPLAY
display_.SetText(std::string("Wait for network\n") + module_name);
#endif
ml307_at_modem_.ResetConnections();
ml307_at_modem_.WaitForNetworkReady();
ESP_LOGI(TAG, "ML307 IMEI: %s", ml307_at_modem_.GetImei().c_str());
ESP_LOGI(TAG, "ML307 ICCID: %s", ml307_at_modem_.GetIccid().c_str());
#else
// Try to connect to WiFi, if failed, launch the WiFi configuration AP
auto& wifi_station = WifiStation::GetInstance();
#ifdef CONFIG_USE_DISPLAY
display_.SetText(std::string("Connect to WiFi\n") + wifi_station.GetSsid());
#endif
builtin_led.SetBlue();
builtin_led.StartContinuousBlink(100);
wifi_station.Start();
if (!wifi_station.IsConnected()) {
builtin_led.SetBlue();
builtin_led.Blink(1000, 500);
auto& wifi_ap = WifiConfigurationAp::GetInstance();
wifi_ap.SetSsidPrefix("Xiaozhi");
#ifdef CONFIG_USE_DISPLAY
display_.SetText(wifi_ap.GetSsid() + "\n" + wifi_ap.GetWebServerUrl());
#endif
wifi_ap.Start();
return;
}
#endif
audio_device_.OnInputData([this](const int16_t* data, int size) {
#ifdef CONFIG_USE_AFE_SR
if (audio_processor_.IsRunning()) {
audio_processor_.Input(data, size);
}
if (wake_word_detect_.IsDetectionRunning()) {
wake_word_detect_.Feed(data, size);
}
#else
std::vector<int16_t> pcm(data, data + size);
Schedule([this, pcm = std::move(pcm)]() {
if (chat_state_ == kChatStateListening) {
std::lock_guard<std::mutex> lock(mutex_);
audio_encode_queue_.emplace_back(std::move(pcm));
cv_.notify_all();
}
});
#endif
});
// Initialize the audio device
audio_device_.Start(CONFIG_AUDIO_INPUT_SAMPLE_RATE, CONFIG_AUDIO_OUTPUT_SAMPLE_RATE);
audio_device_.OnStateChanged([this]() {
if (audio_device_.playing()) {
SetChatState(kChatStateSpeaking);
} else {
// Check if communication is still running
if (xEventGroupGetBits(event_group_) & COMMUNICATION_RUNNING) {
SetChatState(kChatStateListening);
} else {
SetChatState(kChatStateIdle);
}
}
});
// OPUS encoder / decoder use a lot of stack memory
const size_t opus_stack_size = 4096 * 8;
audio_encode_task_stack_ = (StackType_t*)malloc(opus_stack_size);
xTaskCreateStatic([](void* arg) {
audio_encode_task_ = xTaskCreateStatic([](void* arg) {
Application* app = (Application*)arg;
app->AudioEncodeTask();
vTaskDelete(NULL);
}, "opus_encode", opus_stack_size, this, 1, audio_encode_task_stack_, &audio_encode_task_buffer_);
audio_decode_task_stack_ = (StackType_t*)malloc(opus_stack_size);
xTaskCreateStatic([](void* arg) {
xTaskCreate([](void* arg) {
Application* app = (Application*)arg;
app->AudioDecodeTask();
}, "opus_decode", opus_stack_size, this, 1, audio_decode_task_stack_, &audio_decode_task_buffer_);
app->AudioPlayTask();
vTaskDelete(NULL);
}, "play_audio", 4096 * 2, this, 5, NULL);
auto& builtin_led = BuiltinLed::GetInstance();
// Blink the LED to indicate the device is connecting
builtin_led.SetBlue();
builtin_led.BlinkOnce();
WifiStation::GetInstance().Start();
// Check if there is a new firmware version available
firmware_upgrade_.CheckVersion();
if (firmware_upgrade_.HasNewVersion()) {
builtin_led.TurnOn();
firmware_upgrade_.StartUpgrade();
// If the upgrade succeeds, the device reboots and never reaches here
ESP_LOGI(TAG, "Firmware upgrade failed...");
builtin_led.TurnOff();
} else {
firmware_upgrade_.MarkValid();
}
#ifdef CONFIG_USE_AFE_SR
wake_word_detect_.OnVadStateChange([this](bool speaking) {
Schedule([this, speaking]() {
auto& builtin_led = BuiltinLed::GetInstance();
if (chat_state_ == kChatStateListening) {
if (speaking) {
builtin_led.SetRed(32);
} else {
builtin_led.SetRed(8);
}
builtin_led.TurnOn();
}
});
});
StartCommunication();
StartDetection();
wake_word_detect_.OnWakeWordDetected([this]() {
Schedule([this]() {
if (chat_state_ == kChatStateIdle) {
// Encode the wake word data and start websocket client at the same time
// They both consume a lot of time (700ms), so we can do them in parallel
wake_word_detect_.EncodeWakeWordData();
SetChatState(kChatStateConnecting);
if (ws_client_ == nullptr) {
StartWebSocketClient();
}
if (ws_client_ && ws_client_->IsConnected()) {
auto encoded = wake_word_detect_.GetWakeWordStream();
// Send the wake word data to the server
ws_client_->Send(encoded.data(), encoded.size(), true);
opus_encoder_.ResetState();
// Send a ready message to tell the server that the wake word data has been sent
SetChatState(kChatStateWakeWordDetected);
// If connected, the hello message is already sent, so we can start communication
audio_processor_.Start();
ESP_LOGI(TAG, "Audio processor started");
} else {
SetChatState(kChatStateIdle);
}
} else if (chat_state_ == kChatStateSpeaking) {
break_speaking_ = true;
}
// Resume detection
wake_word_detect_.StartDetection();
});
});
wake_word_detect_.StartDetection();
audio_processor_.OnOutput([this](std::vector<int16_t>&& data) {
Schedule([this, data = std::move(data)]() {
if (chat_state_ == kChatStateListening) {
std::lock_guard<std::mutex> lock(mutex_);
audio_encode_queue_.emplace_back(std::move(data));
cv_.notify_all();
}
});
});
#endif
// Blink the LED to indicate the device is running
builtin_led.SetGreen();
builtin_led.BlinkOnce();
xEventGroupSetBits(event_group_, DETECTION_RUNNING);
button_.OnClick([this]() {
Schedule([this]() {
if (chat_state_ == kChatStateIdle) {
SetChatState(kChatStateConnecting);
StartWebSocketClient();
if (ws_client_ && ws_client_->IsConnected()) {
opus_encoder_.ResetState();
#ifdef CONFIG_USE_AFE_SR
audio_processor_.Start();
#endif
SetChatState(kChatStateListening);
ESP_LOGI(TAG, "Communication started");
} else {
SetChatState(kChatStateIdle);
}
} else if (chat_state_ == kChatStateSpeaking) {
break_speaking_ = true;
} else if (chat_state_ == kChatStateListening) {
if (ws_client_ && ws_client_->IsConnected()) {
ws_client_->Close();
}
}
});
});
xTaskCreate([](void* arg) {
Application* app = (Application*)arg;
app->MainLoop();
vTaskDelete(NULL);
}, "main_loop", 4096 * 2, this, 5, NULL);
// Launch a task to check for new firmware version
xTaskCreate([](void* arg) {
Application* app = (Application*)arg;
app->CheckNewVersion();
vTaskDelete(NULL);
}, "check_new_version", 4096 * 2, this, 1, NULL);
#ifdef CONFIG_USE_DISPLAY
// Launch a task to update the display
xTaskCreate([](void* arg) {
Application* app = (Application*)arg;
app->UpdateDisplay();
vTaskDelete(NULL);
}, "update_display", 4096, this, 1, NULL);
#endif
}
void Application::Schedule(std::function<void()> callback) {
std::lock_guard<std::mutex> lock(mutex_);
main_tasks_.push_back(callback);
cv_.notify_all();
}
// The Main Loop controls the chat state and websocket connection
// If other tasks need to access the websocket or chat state,
// they should use Schedule to call this function
void Application::MainLoop() {
while (true) {
std::unique_lock<std::mutex> lock(mutex_);
cv_.wait(lock, [this]() {
return !main_tasks_.empty();
});
auto task = std::move(main_tasks_.front());
main_tasks_.pop_front();
lock.unlock();
task();
}
}
void Application::SetChatState(ChatState state) {
auto& builtin_led = BuiltinLed::GetInstance();
const char* state_str[] = {
"idle",
"connecting",
"listening",
"speaking",
"wake_word_detected",
"testing",
"upgrading",
"unknown"
};
chat_state_ = state;
ESP_LOGI(TAG, "STATE: %s", state_str[chat_state_]);
auto& builtin_led = BuiltinLed::GetInstance();
switch (chat_state_) {
case kChatStateIdle:
ESP_LOGI(TAG, "Chat state: idle");
builtin_led.TurnOff();
break;
case kChatStateConnecting:
ESP_LOGI(TAG, "Chat state: connecting");
builtin_led.SetBlue();
builtin_led.TurnOn();
break;
case kChatStateListening:
ESP_LOGI(TAG, "Chat state: listening");
builtin_led.SetRed();
builtin_led.TurnOn();
break;
case kChatStateSpeaking:
ESP_LOGI(TAG, "Chat state: speaking");
builtin_led.SetGreen();
builtin_led.TurnOn();
break;
case kChatStateWakeWordDetected:
ESP_LOGI(TAG, "Chat state: wake word detected");
builtin_led.SetBlue();
builtin_led.TurnOn();
break;
case kChatStateUpgrading:
builtin_led.SetGreen();
builtin_led.StartContinuousBlink(100);
break;
}
const char* state_str[] = { "idle", "connecting", "listening", "speaking", "wake_word_detected", "unknown" };
std::lock_guard<std::recursive_mutex> lock(mutex_);
if (ws_client_ && ws_client_->IsConnected()) {
cJSON* root = cJSON_CreateObject();
cJSON_AddStringToObject(root, "type", "state");
cJSON_AddStringToObject(root, "state", state_str[chat_state_]);
char* json = cJSON_PrintUnformatted(root);
std::lock_guard<std::mutex> lock(mutex_);
ws_client_->Send(json);
cJSON_Delete(root);
free(json);
}
}
void Application::StartCommunication() {
afe_config_t afe_config = {
.aec_init = false,
.se_init = true,
.vad_init = false,
.wakenet_init = false,
.voice_communication_init = true,
.voice_communication_agc_init = true,
.voice_communication_agc_gain = 10,
.vad_mode = VAD_MODE_3,
.wakenet_model_name = NULL,
.wakenet_model_name_2 = NULL,
.wakenet_mode = DET_MODE_90,
.afe_mode = SR_MODE_HIGH_PERF,
.afe_perferred_core = 0,
.afe_perferred_priority = 5,
.afe_ringbuf_size = 50,
.memory_alloc_mode = AFE_MEMORY_ALLOC_MORE_PSRAM,
.afe_linear_gain = 1.0,
.agc_mode = AFE_MN_PEAK_AGC_MODE_2,
.pcm_config = {
.total_ch_num = 1,
.mic_num = 1,
.ref_num = 0,
.sample_rate = CONFIG_AUDIO_INPUT_SAMPLE_RATE,
},
.debug_init = false,
.debug_hook = {{ AFE_DEBUG_HOOK_MASE_TASK_IN, NULL }, { AFE_DEBUG_HOOK_FETCH_TASK_IN, NULL }},
.afe_ns_mode = NS_MODE_SSP,
.afe_ns_model_name = NULL,
.fixed_first_channel = true,
};
afe_communication_data_ = esp_afe_vc_v1.create_from_config(&afe_config);
xTaskCreate([](void* arg) {
Application* app = (Application*)arg;
app->AudioCommunicationTask();
}, "audio_communication", 4096 * 2, this, 5, NULL);
}
void Application::StartDetection() {
afe_config_t afe_config = {
.aec_init = false,
.se_init = true,
.vad_init = false,
.wakenet_init = true,
.voice_communication_init = false,
.voice_communication_agc_init = false,
.voice_communication_agc_gain = 10,
.vad_mode = VAD_MODE_3,
.wakenet_model_name = wakenet_model_,
.wakenet_model_name_2 = NULL,
.wakenet_mode = DET_MODE_90,
.afe_mode = SR_MODE_HIGH_PERF,
.afe_perferred_core = 0,
.afe_perferred_priority = 5,
.afe_ringbuf_size = 50,
.memory_alloc_mode = AFE_MEMORY_ALLOC_MORE_PSRAM,
.afe_linear_gain = 1.0,
.agc_mode = AFE_MN_PEAK_AGC_MODE_2,
.pcm_config = {
.total_ch_num = 1,
.mic_num = 1,
.ref_num = 0,
.sample_rate = CONFIG_AUDIO_INPUT_SAMPLE_RATE
},
.debug_init = false,
.debug_hook = {{ AFE_DEBUG_HOOK_MASE_TASK_IN, NULL }, { AFE_DEBUG_HOOK_FETCH_TASK_IN, NULL }},
.afe_ns_mode = NS_MODE_SSP,
.afe_ns_model_name = NULL,
.fixed_first_channel = true,
};
afe_detection_data_ = esp_afe_sr_v1.create_from_config(&afe_config);
xTaskCreate([](void* arg) {
Application* app = (Application*)arg;
app->AudioFeedTask();
}, "audio_feed", 4096 * 2, this, 5, NULL);
xTaskCreate([](void* arg) {
Application* app = (Application*)arg;
app->AudioDetectionTask();
}, "audio_detection", 4096 * 2, this, 5, NULL);
}
void Application::AudioFeedTask() {
int chunk_size = esp_afe_vc_v1.get_feed_chunksize(afe_detection_data_);
int16_t buffer[chunk_size];
ESP_LOGI(TAG, "Audio feed task started, chunk size: %d", chunk_size);
while (true) {
audio_device_.Read(buffer, chunk_size);
auto event_bits = xEventGroupGetBits(event_group_);
if (event_bits & DETECTION_RUNNING) {
esp_afe_sr_v1.feed(afe_detection_data_, buffer);
} else if (event_bits & COMMUNICATION_RUNNING) {
esp_afe_vc_v1.feed(afe_communication_data_, buffer);
}
}
vTaskDelete(NULL);
}
void Application::StoreWakeWordData(uint8_t* data, size_t size) {
// Store the audio data in wake_word_pcm_
auto iov = (iovec){
.iov_base = heap_caps_malloc(size, MALLOC_CAP_SPIRAM),
.iov_len = size
};
memcpy(iov.iov_base, data, size);
wake_word_pcm_.push_back(iov);
// remove the oldest packet if the size is larger than 50, about 2 seconds
if (wake_word_pcm_.size() > 50) {
heap_caps_free(wake_word_pcm_.front().iov_base);
wake_word_pcm_.pop_front();
}
}
void Application::EncodeWakeWordData() {
wake_word_opus_.clear();
if (wake_word_encode_task_stack_ == nullptr) {
wake_word_encode_task_stack_ = (StackType_t*)malloc(4096 * 8);
}
wake_word_encode_task_ = xTaskCreateStatic([](void* arg) {
Application* app = (Application*)arg;
auto start_time = esp_timer_get_time();
// encode detect packets
OpusEncoder* encoder = new OpusEncoder();
encoder->Configure(CONFIG_AUDIO_INPUT_SAMPLE_RATE, 1, 60);
encoder->SetComplexity(2);
for (auto& pcm: app->wake_word_pcm_) {
encoder->Encode(pcm, [app](const iovec opus) {
iovec iov = {
.iov_base = heap_caps_malloc(opus.iov_len, MALLOC_CAP_SPIRAM),
.iov_len = opus.iov_len
};
memcpy(iov.iov_base, opus.iov_base, opus.iov_len);
app->wake_word_opus_.push_back(iov);
});
heap_caps_free(pcm.iov_base);
}
app->wake_word_pcm_.clear();
auto end_time = esp_timer_get_time();
ESP_LOGI(TAG, "Encode wake word data opus packets: %d in %lld ms", app->wake_word_opus_.size(), (end_time - start_time) / 1000);
xEventGroupSetBits(app->event_group_, DETECT_PACKETS_ENCODED);
delete encoder;
vTaskDelete(NULL);
}, "encode_detect_packets", 4096 * 8, this, 1, wake_word_encode_task_stack_, &wake_word_encode_task_buffer_);
}
void Application::SendWakeWordData() {
for (auto& opus: wake_word_opus_) {
ws_client_->Send(opus.iov_base, opus.iov_len, true);
heap_caps_free(opus.iov_base);
}
wake_word_opus_.clear();
}
void Application::AudioDetectionTask() {
auto chunk_size = esp_afe_sr_v1.get_fetch_chunksize(afe_detection_data_);
ESP_LOGI(TAG, "Audio detection task started, chunk size: %d", chunk_size);
while (true) {
xEventGroupWaitBits(event_group_, DETECTION_RUNNING, pdFALSE, pdTRUE, portMAX_DELAY);
auto res = esp_afe_sr_v1.fetch(afe_detection_data_);
if (res == nullptr || res->ret_value == ESP_FAIL) {
ESP_LOGE(TAG, "Error in fetch");
if (res != nullptr) {
ESP_LOGI(TAG, "Error code: %d", res->ret_value);
}
continue;
}
// Store the wake word data for voice recognition, like who is speaking
StoreWakeWordData((uint8_t*)res->data, res->data_size);
if (res->wakeup_state == WAKENET_DETECTED) {
xEventGroupClearBits(event_group_, DETECTION_RUNNING);
SetChatState(kChatStateConnecting);
// Encode the wake word data and start websocket client at the same time
// They both consume a lot of time (700ms), so we can do them in parallel
EncodeWakeWordData();
StartWebSocketClient();
// Here the websocket is done, and we also wait for the wake word data to be encoded
xEventGroupWaitBits(event_group_, DETECT_PACKETS_ENCODED, pdTRUE, pdTRUE, portMAX_DELAY);
std::lock_guard<std::recursive_mutex> lock(mutex_);
if (ws_client_ && ws_client_->IsConnected()) {
// Send the wake word data to the server
SendWakeWordData();
// Send a ready message to tell the server that the wake word data has been sent
SetChatState(kChatStateWakeWordDetected);
opus_encoder_.ResetState();
// If connected, the hello message is already sent, so we can start communication
xEventGroupSetBits(event_group_, COMMUNICATION_RUNNING);
ESP_LOGI(TAG, "Start communication after wake word detected");
} else {
SetChatState(kChatStateIdle);
xEventGroupSetBits(event_group_, DETECTION_RUNNING);
}
}
}
}
void Application::AudioCommunicationTask() {
int chunk_size = esp_afe_vc_v1.get_fetch_chunksize(afe_communication_data_);
ESP_LOGI(TAG, "Audio communication task started, chunk size: %d", chunk_size);
while (true) {
xEventGroupWaitBits(event_group_, COMMUNICATION_RUNNING, pdFALSE, pdTRUE, portMAX_DELAY);
auto res = esp_afe_vc_v1.fetch(afe_communication_data_);
if (res == nullptr || res->ret_value == ESP_FAIL) {
ESP_LOGE(TAG, "Error in fetch");
if (res != nullptr) {
ESP_LOGI(TAG, "Error code: %d", res->ret_value);
}
continue;
}
// Check if the websocket client is disconnected by the server
{
std::lock_guard<std::recursive_mutex> lock(mutex_);
if (ws_client_ == nullptr || !ws_client_->IsConnected()) {
if (ws_client_ != nullptr) {
delete ws_client_;
ws_client_ = nullptr;
}
if (audio_device_.playing()) {
audio_device_.Break();
}
SetChatState(kChatStateIdle);
xEventGroupSetBits(event_group_, DETECTION_RUNNING);
xEventGroupClearBits(event_group_, COMMUNICATION_RUNNING);
continue;
}
}
if (chat_state_ == kChatStateListening) {
// Send audio data to server
iovec data = {
.iov_base = malloc(res->data_size),
.iov_len = (size_t)res->data_size
};
memcpy(data.iov_base, res->data, res->data_size);
xQueueSend(audio_encode_queue_, &data, portMAX_DELAY);
}
}
BinaryProtocol* Application::AllocateBinaryProtocol(const uint8_t* payload, size_t payload_size) {
auto last_timestamp = 0;
auto protocol = (BinaryProtocol*)heap_caps_malloc(sizeof(BinaryProtocol) + payload_size, MALLOC_CAP_SPIRAM);
protocol->version = htons(PROTOCOL_VERSION);
protocol->type = htons(0);
protocol->reserved = 0;
protocol->timestamp = htonl(last_timestamp);
protocol->payload_size = htonl(payload_size);
assert(sizeof(BinaryProtocol) == 16);
memcpy(protocol->payload, payload, payload_size);
return protocol;
}
void Application::AudioEncodeTask() {
ESP_LOGI(TAG, "Audio encode task started");
while (true) {
iovec pcm;
xQueueReceive(audio_encode_queue_, &pcm, portMAX_DELAY);
// Encode audio data
opus_encoder_.Encode(pcm, [this](const iovec opus) {
std::lock_guard<std::recursive_mutex> lock(mutex_);
if (ws_client_ && ws_client_->IsConnected()) {
ws_client_->Send(opus.iov_base, opus.iov_len, true);
}
std::unique_lock<std::mutex> lock(mutex_);
cv_.wait(lock, [this]() {
return !audio_encode_queue_.empty() || !audio_decode_queue_.empty();
});
free(pcm.iov_base);
}
}
if (!audio_encode_queue_.empty()) {
auto pcm = std::move(audio_encode_queue_.front());
audio_encode_queue_.pop_front();
lock.unlock();
void Application::AudioDecodeTask() {
while (true) {
AudioPacket* packet;
xQueueReceive(audio_decode_queue_, &packet, portMAX_DELAY);
// Encode audio data
opus_encoder_.Encode(pcm, [this](const uint8_t* opus, size_t opus_size) {
auto protocol = AllocateBinaryProtocol(opus, opus_size);
Schedule([this, protocol, opus_size]() {
if (ws_client_ && ws_client_->IsConnected()) {
ws_client_->Send(protocol, sizeof(BinaryProtocol) + opus_size, true);
}
heap_caps_free(protocol);
});
});
} else if (!audio_decode_queue_.empty()) {
auto packet = std::move(audio_decode_queue_.front());
audio_decode_queue_.pop_front();
lock.unlock();
if (packet->type == kAudioPacketTypeData) {
int frame_size = opus_decode_sample_rate_ / 1000 * opus_duration_ms_;
packet->pcm.resize(frame_size);
@@ -458,14 +454,79 @@ void Application::AudioDecodeTask() {
}
if (opus_decode_sample_rate_ != CONFIG_AUDIO_OUTPUT_SAMPLE_RATE) {
int target_size = frame_size * CONFIG_AUDIO_OUTPUT_SAMPLE_RATE / opus_decode_sample_rate_;
int target_size = opus_resampler_.GetOutputSamples(frame_size);
std::vector<int16_t> resampled(target_size);
opus_resampler_.Process(packet->pcm.data(), frame_size, resampled.data(), target_size);
opus_resampler_.Process(packet->pcm.data(), frame_size, resampled.data());
packet->pcm = std::move(resampled);
}
std::lock_guard<std::mutex> lock(mutex_);
audio_play_queue_.push_back(packet);
cv_.notify_all();
}
}
}
void Application::HandleAudioPacket(AudioPacket* packet) {
switch (packet->type)
{
case kAudioPacketTypeData: {
if (skip_to_end_) {
break;
}
audio_device_.QueueAudioPacket(packet);
// This will block until the audio device has finished playing the audio
audio_device_.OutputData(packet->pcm);
if (break_speaking_) {
break_speaking_ = false;
skip_to_end_ = true;
// Play a silence and skip to the end
int frame_size = opus_decode_sample_rate_ / 1000 * opus_duration_ms_;
std::vector<int16_t> silence(frame_size);
bzero(silence.data(), silence.size() * sizeof(int16_t));
audio_device_.OutputData(silence);
}
break;
}
case kAudioPacketTypeStart:
Schedule([this]() {
SetChatState(kChatStateSpeaking);
});
break;
case kAudioPacketTypeStop:
skip_to_end_ = false;
Schedule([this]() {
SetChatState(kChatStateListening);
});
break;
case kAudioPacketTypeSentenceStart:
ESP_LOGI(TAG, "<< %s", packet->text.c_str());
break;
case kAudioPacketTypeSentenceEnd:
break;
default:
ESP_LOGI(TAG, "Unknown packet type: %d", packet->type);
break;
}
delete packet;
}
void Application::AudioPlayTask() {
ESP_LOGI(TAG, "Audio play task started");
while (true) {
std::unique_lock<std::mutex> lock(mutex_);
cv_.wait(lock, [this]() {
return !audio_play_queue_.empty();
});
auto packet = std::move(audio_play_queue_.front());
audio_play_queue_.pop_front();
lock.unlock();
HandleAudioPacket(packet);
}
}
@@ -484,13 +545,19 @@ void Application::SetDecodeSampleRate(int sample_rate) {
void Application::StartWebSocketClient() {
if (ws_client_ != nullptr) {
ESP_LOGW(TAG, "WebSocket client already exists");
delete ws_client_;
}
std::string token = "Bearer " + std::string(CONFIG_WEBSOCKET_ACCESS_TOKEN);
ws_client_ = new WebSocketClient();
#ifdef CONFIG_USE_ML307
ws_client_ = new WebSocket(new Ml307SslTransport(ml307_at_modem_, 0));
#else
ws_client_ = new WebSocket(new TlsTransport());
#endif
ws_client_->SetHeader("Authorization", token.c_str());
ws_client_->SetHeader("Device-Id", SystemInfo::GetMacAddress().c_str());
ws_client_->SetHeader("Protocol-Version", std::to_string(PROTOCOL_VERSION).c_str());
ws_client_->OnConnected([this]() {
ESP_LOGI(TAG, "Websocket connected");
@@ -498,8 +565,7 @@ void Application::StartWebSocketClient() {
// Send hello message to describe the client
// keys: message type, version, wakeup_model, audio_params (format, sample_rate, channels)
std::string message = "{";
message += "\"type\":\"hello\", \"version\":\"1.0\",";
message += "\"wakeup_model\":\"" + std::string(wakenet_model_) + "\",";
message += "\"type\":\"hello\",";
message += "\"audio_params\":{";
message += "\"format\":\"opus\", \"sample_rate\":" + std::to_string(CONFIG_AUDIO_INPUT_SAMPLE_RATE) + ", \"channels\":1";
message += "}}";
@@ -507,21 +573,26 @@ void Application::StartWebSocketClient() {
});
ws_client_->OnData([this](const char* data, size_t len, bool binary) {
auto packet = new AudioPacket();
if (binary) {
auto header = (AudioDataHeader*)data;
packet->type = kAudioPacketTypeData;
packet->timestamp = ntohl(header->timestamp);
auto protocol = (BinaryProtocol*)data;
auto payload_size = ntohl(header->payload_size);
auto packet = new AudioPacket();
packet->type = kAudioPacketTypeData;
packet->timestamp = ntohl(protocol->timestamp);
auto payload_size = ntohl(protocol->payload_size);
packet->opus.resize(payload_size);
memcpy(packet->opus.data(), data + sizeof(AudioDataHeader), payload_size);
memcpy(packet->opus.data(), protocol->payload, payload_size);
std::lock_guard<std::mutex> lock(mutex_);
audio_decode_queue_.push_back(packet);
cv_.notify_all();
} else {
// Parse JSON data
auto root = cJSON_Parse(data);
auto type = cJSON_GetObjectItem(root, "type");
if (type != NULL) {
if (strcmp(type->valuestring, "tts") == 0) {
auto packet = new AudioPacket();
auto state = cJSON_GetObjectItem(root, "state");
if (strcmp(state->valuestring, "start") == 0) {
packet->type = kAudioPacketTypeStart;
@@ -537,19 +608,35 @@ void Application::StartWebSocketClient() {
packet->type = kAudioPacketTypeSentenceStart;
packet->text = cJSON_GetObjectItem(root, "text")->valuestring;
}
std::lock_guard<std::mutex> lock(mutex_);
audio_decode_queue_.push_back(packet);
cv_.notify_all();
} else if (strcmp(type->valuestring, "stt") == 0) {
auto text = cJSON_GetObjectItem(root, "text");
if (text != NULL) {
ESP_LOGI(TAG, ">> %s", text->valuestring);
}
}
}
cJSON_Delete(root);
}
xQueueSend(audio_decode_queue_, &packet, portMAX_DELAY);
});
ws_client_->OnError([this](int error) {
ESP_LOGE(TAG, "Websocket error: %d", error);
});
ws_client_->OnClosed([this]() {
ESP_LOGI(TAG, "Websocket closed");
ws_client_->OnDisconnected([this]() {
ESP_LOGI(TAG, "Websocket disconnected");
Schedule([this]() {
#ifdef CONFIG_USE_AFE_SR
audio_processor_.Stop();
#endif
delete ws_client_;
ws_client_ = nullptr;
SetChatState(kChatStateIdle);
});
});
if (!ws_client_->Connect(CONFIG_WEBSOCKET_URL)) {

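The reconstructed Application.cc above funnels all chat-state and WebSocket access through `Schedule()` and `MainLoop()`: other tasks post a closure instead of touching shared state directly. A minimal, self-contained sketch of that pattern (the `Dispatcher` class name is illustrative, not from the codebase):

```cpp
// Minimal sketch of the Schedule()/MainLoop() pattern used in Application.cc.
// The "Dispatcher" name is illustrative; the real methods live on Application.
#include <condition_variable>
#include <functional>
#include <list>
#include <mutex>

class Dispatcher {
public:
    // Called from any task: queue a closure for the main loop to run.
    void Schedule(std::function<void()> task) {
        std::lock_guard<std::mutex> lock(mutex_);
        tasks_.push_back(std::move(task));
        cv_.notify_all();
    }

    // Runs in its own FreeRTOS task; executes queued closures one at a time,
    // so chat state and the WebSocket client are only touched from here.
    void MainLoop() {
        while (true) {
            std::unique_lock<std::mutex> lock(mutex_);
            cv_.wait(lock, [this] { return !tasks_.empty(); });
            auto task = std::move(tasks_.front());
            tasks_.pop_front();
            lock.unlock();              // run the closure without holding the lock
            task();
        }
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::list<std::function<void()>> tasks_;
};
```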
main/Application.h (modified)

@@ -2,24 +2,60 @@
#define _APPLICATION_H_
#include "AudioDevice.h"
#include "OpusEncoder.h"
#include "OpusResampler.h"
#include "WebSocketClient.h"
#include "FirmwareUpgrade.h"
#include <OpusEncoder.h>
#include <OpusResampler.h>
#include <WebSocket.h>
#include <Ml307AtModem.h>
#include <Ml307Http.h>
#include <EspHttp.h>
#include "opus.h"
#include "resampler_structs.h"
#include "freertos/event_groups.h"
#include "freertos/queue.h"
#include "freertos/task.h"
#include "esp_afe_sr_models.h"
#include "esp_nsn_models.h"
#include <opus.h>
#include <resampler_structs.h>
#include <freertos/event_groups.h>
#include <freertos/task.h>
#include <mutex>
#include <list>
#include <condition_variable>
#include "Display.h"
#include "FirmwareUpgrade.h"
#ifdef CONFIG_USE_AFE_SR
#include "WakeWordDetect.h"
#include "AudioProcessor.h"
#endif
#include "Button.h"
#define DETECTION_RUNNING 1
#define COMMUNICATION_RUNNING 2
#define DETECT_PACKETS_ENCODED 4
#define PROTOCOL_VERSION 2
struct BinaryProtocol {
uint16_t version;
uint16_t type;
uint32_t reserved;
uint32_t timestamp;
uint32_t payload_size;
uint8_t payload[];
} __attribute__((packed));
enum AudioPacketType {
kAudioPacketTypeUnkonwn = 0,
kAudioPacketTypeStart,
kAudioPacketTypeStop,
kAudioPacketTypeData,
kAudioPacketTypeSentenceStart,
kAudioPacketTypeSentenceEnd
};
struct AudioPacket {
AudioPacketType type = kAudioPacketTypeUnkonwn;
std::string text;
std::vector<uint8_t> opus;
std::vector<int16_t> pcm;
uint32_t timestamp;
};
enum ChatState {
@@ -27,7 +63,8 @@ enum ChatState {
kChatStateConnecting,
kChatStateListening,
kChatStateSpeaking,
kChatStateWakeWordDetected
kChatStateWakeWordDetected,
kChatStateUpgrading
};
class Application {
@@ -47,28 +84,38 @@ private:
Application();
~Application();
Button button_;
AudioDevice audio_device_;
#ifdef CONFIG_USE_AFE_SR
WakeWordDetect wake_word_detect_;
AudioProcessor audio_processor_;
#endif
#ifdef CONFIG_USE_ML307
Ml307AtModem ml307_at_modem_;
Ml307Http http_;
#else
EspHttp http_;
#endif
FirmwareUpgrade firmware_upgrade_;
std::recursive_mutex mutex_;
WebSocketClient* ws_client_ = nullptr;
esp_afe_sr_data_t* afe_detection_data_ = nullptr;
esp_afe_sr_data_t* afe_communication_data_ = nullptr;
#ifdef CONFIG_USE_DISPLAY
Display display_;
#endif
std::mutex mutex_;
std::condition_variable_any cv_;
std::list<std::function<void()>> main_tasks_;
WebSocket* ws_client_ = nullptr;
EventGroupHandle_t event_group_;
char* wakenet_model_ = NULL;
char* nsnet_model_ = NULL;
volatile ChatState chat_state_ = kChatStateIdle;
volatile bool break_speaking_ = false;
bool skip_to_end_ = false;
// Audio encode / decode
TaskHandle_t audio_feed_task_ = nullptr;
TaskHandle_t audio_encode_task_ = nullptr;
StaticTask_t audio_encode_task_buffer_;
StackType_t* audio_encode_task_stack_ = nullptr;
QueueHandle_t audio_encode_queue_ = nullptr;
TaskHandle_t audio_decode_task_ = nullptr;
StaticTask_t audio_decode_task_buffer_;
StackType_t* audio_decode_task_stack_ = nullptr;
QueueHandle_t audio_decode_queue_ = nullptr;
std::list<std::vector<int16_t>> audio_encode_queue_;
std::list<AudioPacket*> audio_decode_queue_;
std::list<AudioPacket*> audio_play_queue_;
OpusEncoder opus_encoder_;
OpusDecoder* opus_decoder_ = nullptr;
@@ -77,26 +124,22 @@ private:
int opus_decode_sample_rate_ = CONFIG_AUDIO_OUTPUT_SAMPLE_RATE;
OpusResampler opus_resampler_;
TaskHandle_t wake_word_encode_task_ = nullptr;
StaticTask_t wake_word_encode_task_buffer_;
StackType_t* wake_word_encode_task_stack_ = nullptr;
std::list<iovec> wake_word_pcm_;
std::vector<iovec> wake_word_opus_;
TaskHandle_t check_new_version_task_ = nullptr;
StaticTask_t check_new_version_task_buffer_;
StackType_t* check_new_version_task_stack_ = nullptr;
void MainLoop();
void Schedule(std::function<void()> callback);
BinaryProtocol* AllocateBinaryProtocol(const uint8_t* payload, size_t payload_size);
void SetDecodeSampleRate(int sample_rate);
void SetChatState(ChatState state);
void StartDetection();
void StartCommunication();
void StartWebSocketClient();
void StoreWakeWordData(uint8_t* data, size_t size);
void EncodeWakeWordData();
void SendWakeWordData();
void CheckNewVersion();
void UpdateDisplay();
void AudioFeedTask();
void AudioDetectionTask();
void AudioCommunicationTask();
void AudioEncodeTask();
void AudioDecodeTask();
void AudioPlayTask();
void HandleAudioPacket(AudioPacket* packet);
};
#endif // _APPLICATION_H_
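The new `BinaryProtocol` header above replaces the old `AudioDataHeader`; every multi-byte field travels in network byte order (compare `AllocateBinaryProtocol` and the `OnData` handler in Application.cc). A hedged sketch of unpacking a received frame, assuming the `BinaryProtocol` struct defined above:

```cpp
// Sketch only: unpack a BinaryProtocol frame received over the WebSocket.
// Field layout follows the struct above; byte order mirrors the ntohs/ntohl
// calls in Application.cc. arpa/inet.h is provided by lwIP on ESP-IDF.
#include <arpa/inet.h>
#include <cstddef>
#include <cstdint>
#include <vector>

struct ParsedFrame {
    uint16_t version;
    uint16_t type;
    uint32_t timestamp;
    std::vector<uint8_t> payload;
};

static bool ParseBinaryProtocol(const uint8_t* data, size_t len, ParsedFrame& out) {
    if (len < sizeof(BinaryProtocol)) {
        return false;                               // not even a full 16-byte header
    }
    auto header = reinterpret_cast<const BinaryProtocol*>(data);
    out.version   = ntohs(header->version);
    out.type      = ntohs(header->type);
    out.timestamp = ntohl(header->timestamp);
    uint32_t payload_size = ntohl(header->payload_size);
    if (len < sizeof(BinaryProtocol) + payload_size) {
        return false;                               // truncated payload
    }
    out.payload.assign(header->payload, header->payload + payload_size);
    return true;
}
```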

main/AudioDevice.cc (modified)

@@ -1,18 +1,15 @@
#include "AudioDevice.h"
#include "esp_log.h"
#include <esp_log.h>
#include <cstring>
#define TAG "AudioDevice"
AudioDevice::AudioDevice() {
audio_play_queue_ = xQueueCreate(100, sizeof(AudioPacket*));
}
AudioDevice::~AudioDevice() {
vQueueDelete(audio_play_queue_);
if (audio_play_task_ != nullptr) {
vTaskDelete(audio_play_task_);
if (audio_input_task_ != nullptr) {
vTaskDelete(audio_input_task_);
}
if (rx_handle_ != nullptr) {
ESP_ERROR_CHECK(i2s_channel_disable(rx_handle_));
@@ -37,8 +34,8 @@ void AudioDevice::Start(int input_sample_rate, int output_sample_rate) {
xTaskCreate([](void* arg) {
auto audio_device = (AudioDevice*)arg;
audio_device->AudioPlayTask();
}, "audio_play", 4096 * 4, this, 5, &audio_play_task_);
audio_device->InputTask();
}, "audio_input", 4096 * 2, this, 5, &audio_input_task_);
}
void AudioDevice::CreateDuplexChannels() {
@@ -76,10 +73,10 @@ void AudioDevice::CreateDuplexChannels() {
},
.gpio_cfg = {
.mclk = I2S_GPIO_UNUSED,
.bclk = (gpio_num_t)CONFIG_AUDIO_DEVICE_I2S_GPIO_BCLK,
.ws = (gpio_num_t)CONFIG_AUDIO_DEVICE_I2S_GPIO_WS,
.dout = (gpio_num_t)CONFIG_AUDIO_DEVICE_I2S_GPIO_DOUT,
.din = (gpio_num_t)CONFIG_AUDIO_DEVICE_I2S_GPIO_DIN,
.bclk = (gpio_num_t)CONFIG_AUDIO_DEVICE_I2S_MIC_GPIO_BCLK,
.ws = (gpio_num_t)CONFIG_AUDIO_DEVICE_I2S_MIC_GPIO_WS,
.dout = (gpio_num_t)CONFIG_AUDIO_DEVICE_I2S_SPK_GPIO_DOUT,
.din = (gpio_num_t)CONFIG_AUDIO_DEVICE_I2S_MIC_GPIO_DIN,
.invert_flags = {
.mclk_inv = false,
.bclk_inv = false,
@@ -127,9 +124,9 @@ void AudioDevice::CreateSimplexChannels() {
},
.gpio_cfg = {
.mclk = I2S_GPIO_UNUSED,
.bclk = (gpio_num_t)CONFIG_AUDIO_DEVICE_I2S_GPIO_BCLK,
.ws = (gpio_num_t)CONFIG_AUDIO_DEVICE_I2S_GPIO_WS,
.dout = (gpio_num_t)CONFIG_AUDIO_DEVICE_I2S_GPIO_DOUT,
.bclk = (gpio_num_t)CONFIG_AUDIO_DEVICE_I2S_SPK_GPIO_BCLK,
.ws = (gpio_num_t)CONFIG_AUDIO_DEVICE_I2S_SPK_GPIO_WS,
.dout = (gpio_num_t)CONFIG_AUDIO_DEVICE_I2S_SPK_GPIO_DOUT,
.din = I2S_GPIO_UNUSED,
.invert_flags = {
.mclk_inv = false,
@@ -147,7 +144,7 @@ void AudioDevice::CreateSimplexChannels() {
std_cfg.gpio_cfg.bclk = (gpio_num_t)CONFIG_AUDIO_DEVICE_I2S_MIC_GPIO_BCLK;
std_cfg.gpio_cfg.ws = (gpio_num_t)CONFIG_AUDIO_DEVICE_I2S_MIC_GPIO_WS;
std_cfg.gpio_cfg.dout = I2S_GPIO_UNUSED;
std_cfg.gpio_cfg.din = (gpio_num_t)CONFIG_AUDIO_DEVICE_I2S_GPIO_DIN;
std_cfg.gpio_cfg.din = (gpio_num_t)CONFIG_AUDIO_DEVICE_I2S_MIC_GPIO_DIN;
ESP_ERROR_CHECK(i2s_channel_init_std_mode(rx_handle_, &std_cfg));
ESP_LOGI(TAG, "Simplex channels created");
}
@@ -180,57 +177,22 @@ int AudioDevice::Read(int16_t* dest, int samples) {
return samples;
}
void AudioDevice::QueueAudioPacket(AudioPacket* packet) {
xQueueSend(audio_play_queue_, &packet, portMAX_DELAY);
void AudioDevice::OnInputData(std::function<void(const int16_t*, int)> callback) {
on_input_data_ = callback;
}
void AudioDevice::AudioPlayTask() {
while (true) {
AudioPacket* packet;
xQueueReceive(audio_play_queue_, &packet, portMAX_DELAY);
void AudioDevice::OutputData(std::vector<int16_t>& data) {
Write(data.data(), data.size());
}
switch (packet->type)
{
case kAudioPacketTypeStart:
playing_ = true;
breaked_ = false;
if (on_state_changed_) {
on_state_changed_();
}
break;
case kAudioPacketTypeStop:
playing_ = false;
if (on_state_changed_) {
on_state_changed_();
}
break;
case kAudioPacketTypeSentenceStart:
ESP_LOGI(TAG, "Playing sentence: %s", packet->text.c_str());
break;
case kAudioPacketTypeSentenceEnd:
if (breaked_) { // Clear the queue
AudioPacket* p;
while (xQueueReceive(audio_play_queue_, &p, 0) == pdTRUE) {
delete p;
}
breaked_ = false;
playing_ = false;
}
break;
case kAudioPacketTypeData:
Write(packet->pcm.data(), packet->pcm.size());
break;
default:
ESP_LOGE(TAG, "Unknown audio packet type: %d", packet->type);
void AudioDevice::InputTask() {
int duration = 30;
int input_frame_size = input_sample_rate_ / 1000 * duration;
int16_t input_buffer[input_frame_size];
while (true) {
int samples = Read(input_buffer, input_frame_size);
if (samples > 0) {
on_input_data_(input_buffer, samples);
}
delete packet;
}
}
void AudioDevice::OnStateChanged(std::function<void()> callback) {
on_state_changed_ = callback;
}
void AudioDevice::Break() {
breaked_ = true;
}

main/AudioDevice.h (modified)

@@ -1,76 +1,44 @@
#ifndef _AUDIO_DEVICE_H
#define _AUDIO_DEVICE_H
#include "opus.h"
#include "freertos/FreeRTOS.h"
#include "freertos/queue.h"
#include "freertos/event_groups.h"
#include "driver/i2s_std.h"
#include <freertos/FreeRTOS.h>
#include <freertos/event_groups.h>
#include <driver/i2s_std.h>
#include <vector>
#include <string>
#include <functional>
enum AudioPacketType {
kAudioPacketTypeUnkonwn = 0,
kAudioPacketTypeStart,
kAudioPacketTypeStop,
kAudioPacketTypeData,
kAudioPacketTypeSentenceStart,
kAudioPacketTypeSentenceEnd
};
struct AudioPacket {
AudioPacketType type = kAudioPacketTypeUnkonwn;
std::string text;
std::vector<uint8_t> opus;
std::vector<int16_t> pcm;
uint32_t timestamp;
};
struct AudioDataHeader {
uint32_t version;
uint32_t reserved;
uint32_t timestamp;
uint32_t payload_size;
} __attribute__((packed));
class AudioDevice {
public:
AudioDevice();
~AudioDevice();
void Start(int input_sample_rate, int output_sample_rate);
int Read(int16_t* dest, int samples);
void Write(const int16_t* data, int samples);
void QueueAudioPacket(AudioPacket* packet);
void OnStateChanged(std::function<void()> callback);
void Break();
void OnInputData(std::function<void(const int16_t*, int)> callback);
void OutputData(std::vector<int16_t>& data);
int input_sample_rate() const { return input_sample_rate_; }
int output_sample_rate() const { return output_sample_rate_; }
bool duplex() const { return duplex_; }
bool playing() const { return playing_; }
private:
bool playing_ = false;
bool breaked_ = false;
bool duplex_ = false;
int input_sample_rate_ = 0;
int output_sample_rate_ = 0;
i2s_chan_handle_t tx_handle_ = nullptr;
i2s_chan_handle_t rx_handle_ = nullptr;
QueueHandle_t audio_play_queue_ = nullptr;
TaskHandle_t audio_play_task_ = nullptr;
TaskHandle_t audio_input_task_ = nullptr;
EventGroupHandle_t event_group_;
std::function<void()> on_state_changed_;
std::function<void(const int16_t*, int)> on_input_data_;
void CreateDuplexChannels();
void CreateSimplexChannels();
void AudioPlayTask();
void InputTask();
int Read(int16_t* dest, int samples);
void Write(const int16_t* data, int samples);
};
#endif // _AUDIO_DEVICE_H
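With this change AudioDevice no longer owns a playback queue: `Read()`/`Write()` become private, microphone data is delivered through `OnInputData()`, and playback goes through the blocking `OutputData()`. A minimal usage sketch, following how Application.cc wires it up (the `CONFIG_AUDIO_*` macros are the project's Kconfig values):

```cpp
// Minimal usage sketch of the reworked AudioDevice (callback-driven input).
AudioDevice audio_device;

// Register the input callback before Start(); it is invoked from the internal
// audio_input task with roughly 30 ms PCM frames (see InputTask above).
audio_device.OnInputData([](const int16_t* data, int samples) {
    // Forward the frame to wake word detection / the audio processor here.
});

audio_device.Start(CONFIG_AUDIO_INPUT_SAMPLE_RATE, CONFIG_AUDIO_OUTPUT_SAMPLE_RATE);

// Playback: OutputData() writes the samples straight to I2S and blocks until done.
std::vector<int16_t> pcm(480, 0);   // e.g. 30 ms of silence at 16 kHz
audio_device.OutputData(pcm);
```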

main/AudioProcessor.cc (new file, 106 lines)

@@ -0,0 +1,106 @@
#include "AudioProcessor.h"
#include <esp_log.h>
#define PROCESSOR_RUNNING 0x01
static const char* TAG = "AudioProcessor";
AudioProcessor::AudioProcessor()
: afe_communication_data_(nullptr) {
event_group_ = xEventGroupCreate();
afe_config_t afe_config = {
.aec_init = false,
.se_init = true,
.vad_init = false,
.wakenet_init = false,
.voice_communication_init = true,
.voice_communication_agc_init = true,
.voice_communication_agc_gain = 10,
.vad_mode = VAD_MODE_3,
.wakenet_model_name = NULL,
.wakenet_model_name_2 = NULL,
.wakenet_mode = DET_MODE_90,
.afe_mode = SR_MODE_HIGH_PERF,
.afe_perferred_core = 0,
.afe_perferred_priority = 5,
.afe_ringbuf_size = 50,
.memory_alloc_mode = AFE_MEMORY_ALLOC_MORE_PSRAM,
.afe_linear_gain = 1.0,
.agc_mode = AFE_MN_PEAK_AGC_MODE_2,
.pcm_config = {
.total_ch_num = 1,
.mic_num = 1,
.ref_num = 0,
.sample_rate = CONFIG_AUDIO_INPUT_SAMPLE_RATE,
},
.debug_init = false,
.debug_hook = {{ AFE_DEBUG_HOOK_MASE_TASK_IN, NULL }, { AFE_DEBUG_HOOK_FETCH_TASK_IN, NULL }},
.afe_ns_mode = NS_MODE_SSP,
.afe_ns_model_name = NULL,
.fixed_first_channel = true,
};
afe_communication_data_ = esp_afe_vc_v1.create_from_config(&afe_config);
xTaskCreate([](void* arg) {
auto this_ = (AudioProcessor*)arg;
this_->AudioProcessorTask();
vTaskDelete(NULL);
}, "audio_communication", 4096 * 2, this, 5, NULL);
}
AudioProcessor::~AudioProcessor() {
if (afe_communication_data_ != nullptr) {
esp_afe_vc_v1.destroy(afe_communication_data_);
}
vEventGroupDelete(event_group_);
}
void AudioProcessor::Input(const int16_t* data, int size) {
input_buffer_.insert(input_buffer_.end(), data, data + size);
auto chunk_size = esp_afe_vc_v1.get_feed_chunksize(afe_communication_data_);
while (input_buffer_.size() >= chunk_size) {
auto chunk = input_buffer_.data();
esp_afe_vc_v1.feed(afe_communication_data_, chunk);
input_buffer_.erase(input_buffer_.begin(), input_buffer_.begin() + chunk_size);
}
}
void AudioProcessor::Start() {
xEventGroupSetBits(event_group_, PROCESSOR_RUNNING);
}
void AudioProcessor::Stop() {
xEventGroupClearBits(event_group_, PROCESSOR_RUNNING);
}
bool AudioProcessor::IsRunning() {
return xEventGroupGetBits(event_group_) & PROCESSOR_RUNNING;
}
void AudioProcessor::OnOutput(std::function<void(std::vector<int16_t>&& data)> callback) {
output_callback_ = callback;
}
void AudioProcessor::AudioProcessorTask() {
int chunk_size = esp_afe_vc_v1.get_fetch_chunksize(afe_communication_data_);
ESP_LOGI(TAG, "Audio communication task started, chunk size: %d", chunk_size);
while (true) {
xEventGroupWaitBits(event_group_, PROCESSOR_RUNNING, pdFALSE, pdTRUE, portMAX_DELAY);
auto res = esp_afe_vc_v1.fetch(afe_communication_data_);
if (res == nullptr || res->ret_value == ESP_FAIL) {
if (res != nullptr) {
ESP_LOGI(TAG, "Error code: %d", res->ret_value);
}
continue;
}
if (output_callback_) {
output_callback_(std::vector<int16_t>(res->data, res->data + res->data_size / sizeof(int16_t)));
}
}
}

main/AudioProcessor.h (new file, 33 lines)

@@ -0,0 +1,33 @@
#ifndef AUDIO_PROCESSOR_H
#define AUDIO_PROCESSOR_H
#include <esp_afe_sr_models.h>
#include <freertos/FreeRTOS.h>
#include <freertos/task.h>
#include <freertos/event_groups.h>
#include <string>
#include <vector>
#include <functional>
class AudioProcessor {
public:
AudioProcessor();
~AudioProcessor();
void Input(const int16_t* data, int size);
void Start();
void Stop();
bool IsRunning();
void OnOutput(std::function<void(std::vector<int16_t>&& data)> callback);
private:
EventGroupHandle_t event_group_ = nullptr;
esp_afe_sr_data_t* afe_communication_data_ = nullptr;
std::vector<int16_t> input_buffer_;
std::function<void(std::vector<int16_t>&& data)> output_callback_;
void AudioProcessorTask();
};
#endif
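A short usage sketch for the new AudioProcessor, following how Application.cc drives it: raw microphone PCM goes in through `Input()`, AFE-processed frames come back through the `OnOutput()` callback, and `Start()`/`Stop()` gate the internal fetch task. The `audio_device` below is the AudioDevice from this changeset.

```cpp
// Usage sketch for AudioProcessor, mirroring the wiring in Application.cc.
AudioProcessor audio_processor;

// Processed (voice-communication AFE) frames arrive on this callback, invoked
// from the internal audio_communication task.
audio_processor.OnOutput([](std::vector<int16_t>&& data) {
    // Queue the frame for Opus encoding and sending to the server here.
});

// Feed raw microphone PCM; it is buffered internally until a full AFE chunk
// is available. Only feed while the processor is running.
audio_device.OnInputData([&](const int16_t* data, int samples) {
    if (audio_processor.IsRunning()) {
        audio_processor.Input(data, samples);
    }
});

audio_processor.Start();   // begin fetching processed audio when a chat starts
// ... conversation ...
audio_processor.Stop();    // stop fetching when the connection closes
```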

main/Button.cc (new file, 67 lines)

@@ -0,0 +1,67 @@
#include "Button.h"
#include <esp_log.h>
static const char* TAG = "Button";
Button::Button(gpio_num_t gpio_num) : gpio_num_(gpio_num) {
button_config_t button_config = {
.type = BUTTON_TYPE_GPIO,
.long_press_time = 3000,
.short_press_time = 100,
.gpio_button_config = {
.gpio_num = gpio_num,
.active_level = 0
}
};
button_handle_ = iot_button_create(&button_config);
if (button_handle_ == NULL) {
ESP_LOGE(TAG, "Failed to create button handle");
return;
}
}
Button::~Button() {
if (button_handle_ != NULL) {
iot_button_delete(button_handle_);
}
}
void Button::OnPress(std::function<void()> callback) {
on_press_ = callback;
iot_button_register_cb(button_handle_, BUTTON_PRESS_DOWN, [](void* handle, void* usr_data) {
Button* button = static_cast<Button*>(usr_data);
if (button->on_press_) {
button->on_press_();
}
}, this);
}
void Button::OnLongPress(std::function<void()> callback) {
on_long_press_ = callback;
iot_button_register_cb(button_handle_, BUTTON_LONG_PRESS_START, [](void* handle, void* usr_data) {
Button* button = static_cast<Button*>(usr_data);
if (button->on_long_press_) {
button->on_long_press_();
}
}, this);
}
void Button::OnClick(std::function<void()> callback) {
on_click_ = callback;
iot_button_register_cb(button_handle_, BUTTON_SINGLE_CLICK, [](void* handle, void* usr_data) {
Button* button = static_cast<Button*>(usr_data);
if (button->on_click_) {
button->on_click_();
}
}, this);
}
void Button::OnDoubleClick(std::function<void()> callback) {
on_double_click_ = callback;
iot_button_register_cb(button_handle_, BUTTON_DOUBLE_CLICK, [](void* handle, void* usr_data) {
Button* button = static_cast<Button*>(usr_data);
if (button->on_double_click_) {
button->on_double_click_();
}
}, this);
}

main/Button.h (new file, 28 lines)

@@ -0,0 +1,28 @@
#ifndef BUTTON_H_
#define BUTTON_H_
#include <driver/gpio.h>
#include <iot_button.h>
#include <functional>
class Button {
public:
Button(gpio_num_t gpio_num);
~Button();
void OnPress(std::function<void()> callback);
void OnLongPress(std::function<void()> callback);
void OnClick(std::function<void()> callback);
void OnDoubleClick(std::function<void()> callback);
private:
gpio_num_t gpio_num_;
button_handle_t button_handle_;
std::function<void()> on_press_;
std::function<void()> on_long_press_;
std::function<void()> on_click_;
std::function<void()> on_double_click_;
};
#endif // BUTTON_H_
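Button is a thin wrapper around the iot_button component; a minimal usage sketch, following how Application.cc binds the BOOT button (`CONFIG_BOOT_BUTTON_GPIO` is the project's Kconfig option):

```cpp
// Minimal usage sketch of the Button wrapper (iot_button underneath).
Button button((gpio_num_t)CONFIG_BOOT_BUTTON_GPIO);   // configured as active-low

button.OnClick([]() {
    // Single click: Application.cc uses this to start, interrupt or stop a chat.
});

button.OnLongPress([]() {
    // Long press fires after 3000 ms (see button_config in Button.cc).
});
```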

main/CMakeLists.txt (modified)

@@ -1,9 +1,17 @@
set(SOURCES "AudioDevice.cc"
"FirmwareUpgrade.cc"
"SystemInfo.cc"
"SystemReset.cc"
"Application.cc"
"Display.cc"
"Button.cc"
"main.cc"
)
if(CONFIG_USE_AFE_SR)
list(APPEND SOURCES "AudioProcessor.cc" "WakeWordDetect.cc")
endif()
idf_component_register(SRCS ${SOURCES}
INCLUDE_DIRS "."
)
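AudioProcessor.cc and WakeWordDetect.cc are only compiled when `CONFIG_USE_AFE_SR` is enabled, and the C++ side mirrors that with preprocessor guards; a condensed illustration based on Application.h/.cc in this changeset:

```cpp
// Condensed from Application.h / Application.cc in this changeset: the AFE
// members and every line that touches them are guarded by CONFIG_USE_AFE_SR,
// matching the conditional source list in main/CMakeLists.txt above.
#ifdef CONFIG_USE_AFE_SR
#include "WakeWordDetect.h"
#include "AudioProcessor.h"
#endif

class Application {
    // ...
#ifdef CONFIG_USE_AFE_SR
    WakeWordDetect wake_word_detect_;
    AudioProcessor audio_processor_;
#endif
};
```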

main/Display.cc (new file, 139 lines)

@@ -0,0 +1,139 @@
#include "Display.h"
#include <esp_log.h>
#include <esp_err.h>
#include <esp_lcd_panel_ops.h>
#include <esp_lcd_panel_vendor.h>
#include <esp_lvgl_port.h>
#include <string>
#include <cstdlib>
#define TAG "Display"
#ifdef CONFIG_USE_DISPLAY
Display::Display(int sda_pin, int scl_pin) : sda_pin_(sda_pin), scl_pin_(scl_pin) {
ESP_LOGI(TAG, "Display Pins: %d, %d", sda_pin_, scl_pin_);
i2c_master_bus_config_t bus_config = {
.i2c_port = I2C_NUM_0,
.sda_io_num = (gpio_num_t)sda_pin_,
.scl_io_num = (gpio_num_t)scl_pin_,
.clk_source = I2C_CLK_SRC_DEFAULT,
.glitch_ignore_cnt = 7,
.intr_priority = 1,
.trans_queue_depth = 0,
.flags = {
.enable_internal_pullup = 1,
},
};
ESP_ERROR_CHECK(i2c_new_master_bus(&bus_config, &i2c_bus_));
// SSD1306 config
esp_lcd_panel_io_i2c_config_t io_config = {
.dev_addr = 0x3C,
.on_color_trans_done = nullptr,
.user_ctx = nullptr,
.control_phase_bytes = 1,
.dc_bit_offset = 6,
.lcd_cmd_bits = 8,
.lcd_param_bits = 8,
.flags = {
.dc_low_on_data = 0,
.disable_control_phase = 0,
},
.scl_speed_hz = 400 * 1000,
};
ESP_ERROR_CHECK(esp_lcd_new_panel_io_i2c_v2(i2c_bus_, &io_config, &panel_io_));
ESP_LOGI(TAG, "Install SSD1306 driver");
esp_lcd_panel_dev_config_t panel_config = {};
panel_config.reset_gpio_num = -1;
panel_config.bits_per_pixel = 1;
esp_lcd_panel_ssd1306_config_t ssd1306_config = {
.height = CONFIG_DISPLAY_HEIGHT
};
panel_config.vendor_config = &ssd1306_config;
ESP_ERROR_CHECK(esp_lcd_new_panel_ssd1306(panel_io_, &panel_config, &panel_));
ESP_LOGI(TAG, "SSD1306 driver installed");
// Reset the display
ESP_ERROR_CHECK(esp_lcd_panel_reset(panel_));
if (esp_lcd_panel_init(panel_) != ESP_OK) {
ESP_LOGE(TAG, "Failed to initialize display");
return;
}
ESP_LOGI(TAG, "Initialize LVGL");
lvgl_port_cfg_t port_cfg = ESP_LVGL_PORT_INIT_CONFIG();
lvgl_port_init(&port_cfg);
const lvgl_port_display_cfg_t display_cfg = {
.io_handle = panel_io_,
.panel_handle = panel_,
.buffer_size = 128 * CONFIG_DISPLAY_HEIGHT,
.double_buffer = true,
.hres = 128,
.vres = CONFIG_DISPLAY_HEIGHT,
.monochrome = true,
.rotation = {
.swap_xy = 0,
.mirror_x = 0,
.mirror_y = 0,
},
.flags = {
.buff_dma = 0,
.buff_spiram = 0,
},
};
disp_ = lvgl_port_add_disp(&display_cfg);
lv_disp_set_rotation(disp_, LV_DISP_ROT_180);
// Set the display to on
ESP_LOGI(TAG, "Turning display on");
ESP_ERROR_CHECK(esp_lcd_panel_disp_on_off(panel_, true));
ESP_LOGI(TAG, "Display Loading...");
if (lvgl_port_lock(0)) {
label_ = lv_label_create(lv_disp_get_scr_act(disp_));
lv_label_set_text(label_, "Initializing...");
lv_obj_set_width(label_, disp_->driver->hor_res);
lv_obj_set_height(label_, disp_->driver->ver_res);
lv_obj_set_style_text_line_space(label_, 0, 0);
lv_obj_set_style_pad_all(label_, 0, 0);
lv_obj_set_style_outline_pad(label_, 0, 0);
lvgl_port_unlock();
}
}
Display::~Display() {
if (label_ != nullptr) {
lvgl_port_lock(0);
lv_obj_del(label_);
lvgl_port_unlock();
}
if (disp_ != nullptr) {
lvgl_port_deinit();
esp_lcd_panel_del(panel_);
esp_lcd_panel_io_del(panel_io_);
i2c_master_bus_reset(i2c_bus_);
}
}
void Display::SetText(const std::string &text) {
if (label_ != nullptr) {
text_ = text;
lvgl_port_lock(0);
// Change the text of the label
lv_label_set_text(label_, text_.c_str());
lvgl_port_unlock();
}
}
#endif

main/Display.h (new file, 32 lines)

@@ -0,0 +1,32 @@
#ifndef DISPLAY_H
#define DISPLAY_H
#include <driver/i2c_master.h>
#include <esp_lcd_panel_io.h>
#include <esp_lcd_panel_ops.h>
#include <lvgl.h>
#include <string>
class Display {
public:
Display(int sda_pin, int scl_pin);
~Display();
void SetText(const std::string &text);
private:
int sda_pin_;
int scl_pin_;
i2c_master_bus_handle_t i2c_bus_ = nullptr;
esp_lcd_panel_io_handle_t panel_io_ = nullptr;
esp_lcd_panel_handle_t panel_ = nullptr;
lv_disp_t *disp_ = nullptr;
lv_obj_t *label_ = nullptr;
std::string text_;
};
#endif
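Display drives an SSD1306 OLED over I2C and renders a single LVGL label; usage is just the constructor with the I2C pins plus `SetText()`, as Application.cc does for status lines (`CONFIG_DISPLAY_*` are the project's Kconfig options, and the sample string is illustrative):

```cpp
// Usage sketch of the Display class (SSD1306 over I2C, one LVGL label).
#ifdef CONFIG_USE_DISPLAY
Display display(CONFIG_DISPLAY_SDA_PIN, CONFIG_DISPLAY_SCL_PIN);

// SetText() replaces the whole label; Application.cc writes multi-line status
// strings such as the SSID plus signal quality.
display.SetText("MyWiFi\nVery good (-54)");
#endif
```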

main/FirmwareUpgrade.cc (new file, 259 lines)

@@ -0,0 +1,259 @@
#include "FirmwareUpgrade.h"
#include "SystemInfo.h"
#include <cJSON.h>
#include <esp_log.h>
#include <esp_partition.h>
#include <esp_http_client.h>
#include <esp_ota_ops.h>
#include <esp_app_format.h>
#include <vector>
#include <sstream>
#include <algorithm>
#define TAG "FirmwareUpgrade"
FirmwareUpgrade::FirmwareUpgrade(Http& http) : http_(http) {
}
FirmwareUpgrade::~FirmwareUpgrade() {
}
void FirmwareUpgrade::SetCheckVersionUrl(std::string check_version_url) {
check_version_url_ = check_version_url;
}
void FirmwareUpgrade::SetPostData(const std::string& post_data) {
post_data_ = post_data;
}
void FirmwareUpgrade::SetHeader(const std::string& key, const std::string& value) {
headers_[key] = value;
}
void FirmwareUpgrade::CheckVersion() {
std::string current_version = esp_app_get_description()->version;
ESP_LOGI(TAG, "Current version: %s", current_version.c_str());
if (check_version_url_.length() < 10) {
ESP_LOGE(TAG, "Check version URL is not properly set");
return;
}
for (const auto& header : headers_) {
http_.SetHeader(header.first, header.second);
}
if (post_data_.empty()) {
http_.Open("GET", check_version_url_);
} else {
http_.SetHeader("Content-Type", "application/json");
http_.SetContent(post_data_);
http_.Open("POST", check_version_url_);
}
auto response = http_.GetBody();
http_.Close();
// Response: { "firmware": { "version": "1.0.0", "url": "http://" } }
// Parse the JSON response and check if the version is newer
// If it is, set has_new_version_ to true and store the new version and URL
cJSON *root = cJSON_Parse(response.c_str());
if (root == NULL) {
ESP_LOGE(TAG, "Failed to parse JSON response");
return;
}
cJSON *firmware = cJSON_GetObjectItem(root, "firmware");
if (firmware == NULL) {
ESP_LOGE(TAG, "Failed to get firmware object");
cJSON_Delete(root);
return;
}
cJSON *version = cJSON_GetObjectItem(firmware, "version");
if (version == NULL) {
ESP_LOGE(TAG, "Failed to get version object");
cJSON_Delete(root);
return;
}
cJSON *url = cJSON_GetObjectItem(firmware, "url");
if (url == NULL) {
ESP_LOGE(TAG, "Failed to get url object");
cJSON_Delete(root);
return;
}
firmware_version_ = version->valuestring;
firmware_url_ = url->valuestring;
cJSON_Delete(root);
// Check if the version is newer, for example, 0.1.0 is newer than 0.0.1
has_new_version_ = IsNewVersionAvailable(current_version, firmware_version_);
if (has_new_version_) {
ESP_LOGI(TAG, "New version available: %s", firmware_version_.c_str());
} else {
ESP_LOGI(TAG, "Current is the latest version");
}
}
void FirmwareUpgrade::MarkCurrentVersionValid() {
auto partition = esp_ota_get_running_partition();
if (strcmp(partition->label, "factory") == 0) {
ESP_LOGI(TAG, "Running from factory partition, skipping");
return;
}
ESP_LOGI(TAG, "Running partition: %s", partition->label);
esp_ota_img_states_t state;
if (esp_ota_get_state_partition(partition, &state) != ESP_OK) {
ESP_LOGE(TAG, "Failed to get state of partition");
return;
}
if (state == ESP_OTA_IMG_PENDING_VERIFY) {
ESP_LOGI(TAG, "Marking firmware as valid");
esp_ota_mark_app_valid_cancel_rollback();
}
}
void FirmwareUpgrade::Upgrade(const std::string& firmware_url) {
ESP_LOGI(TAG, "Upgrading firmware from %s", firmware_url.c_str());
esp_ota_handle_t update_handle = 0;
auto update_partition = esp_ota_get_next_update_partition(NULL);
if (update_partition == NULL) {
ESP_LOGE(TAG, "Failed to get update partition");
return;
}
ESP_LOGI(TAG, "Writing to partition %s at offset 0x%lx", update_partition->label, update_partition->address);
bool image_header_checked = false;
std::string image_header;
if (!http_.Open("GET", firmware_url)) {
ESP_LOGE(TAG, "Failed to open HTTP connection");
return;
}
size_t content_length = http_.GetBodyLength();
if (content_length == 0) {
ESP_LOGE(TAG, "Failed to get content length");
http_.Close();
return;
}
char buffer[4096];
size_t total_read = 0, recent_read = 0;
auto last_calc_time = esp_timer_get_time();
while (true) {
int ret = http_.Read(buffer, sizeof(buffer));
if (ret < 0) {
ESP_LOGE(TAG, "Failed to read HTTP data: %s", esp_err_to_name(ret));
http_.Close();
return;
}
// Calculate speed and progress every second
recent_read += ret;
total_read += ret;
if (esp_timer_get_time() - last_calc_time >= 1000000 || ret == 0) {
size_t progress = total_read * 100 / content_length;
ESP_LOGI(TAG, "Progress: %zu%% (%zu/%zu), Speed: %zuB/s", progress, total_read, content_length, recent_read);
if (upgrade_callback_) {
upgrade_callback_(progress, recent_read);
}
last_calc_time = esp_timer_get_time();
recent_read = 0;
}
if (ret == 0) {
break;
}
if (!image_header_checked) {
image_header.append(buffer, ret);
if (image_header.size() >= sizeof(esp_image_header_t) + sizeof(esp_image_segment_header_t) + sizeof(esp_app_desc_t)) {
esp_app_desc_t new_app_info;
memcpy(&new_app_info, image_header.data() + sizeof(esp_image_header_t) + sizeof(esp_image_segment_header_t), sizeof(esp_app_desc_t));
ESP_LOGI(TAG, "New firmware version: %s", new_app_info.version);
auto current_version = esp_app_get_description()->version;
if (memcmp(new_app_info.version, current_version, sizeof(new_app_info.version)) == 0) {
ESP_LOGE(TAG, "Firmware version is the same, skipping upgrade");
http_.Close();
return;
}
if (esp_ota_begin(update_partition, OTA_WITH_SEQUENTIAL_WRITES, &update_handle)) {
esp_ota_abort(update_handle);
http_.Close();
ESP_LOGE(TAG, "Failed to begin OTA");
return;
}
image_header_checked = true;
}
}
auto err = esp_ota_write(update_handle, buffer, ret);
if (err != ESP_OK) {
ESP_LOGE(TAG, "Failed to write OTA data: %s", esp_err_to_name(err));
esp_ota_abort(update_handle);
http_.Close();
return;
}
}
http_.Close();
esp_err_t err = esp_ota_end(update_handle);
if (err != ESP_OK) {
if (err == ESP_ERR_OTA_VALIDATE_FAILED) {
ESP_LOGE(TAG, "Image validation failed, image is corrupted");
} else {
ESP_LOGE(TAG, "Failed to end OTA: %s", esp_err_to_name(err));
}
return;
}
err = esp_ota_set_boot_partition(update_partition);
if (err != ESP_OK) {
ESP_LOGE(TAG, "Failed to set boot partition: %s", esp_err_to_name(err));
return;
}
ESP_LOGI(TAG, "Firmware upgrade successful, rebooting in 3 seconds...");
vTaskDelay(pdMS_TO_TICKS(3000));
esp_restart();
}
void FirmwareUpgrade::StartUpgrade(std::function<void(int progress, size_t speed)> callback) {
upgrade_callback_ = callback;
Upgrade(firmware_url_);
}
std::vector<int> FirmwareUpgrade::ParseVersion(const std::string& version) {
std::vector<int> versionNumbers;
std::stringstream ss(version);
std::string segment;
while (std::getline(ss, segment, '.')) {
versionNumbers.push_back(std::stoi(segment));
}
return versionNumbers;
}
bool FirmwareUpgrade::IsNewVersionAvailable(const std::string& currentVersion, const std::string& newVersion) {
std::vector<int> current = ParseVersion(currentVersion);
std::vector<int> newer = ParseVersion(newVersion);
for (size_t i = 0; i < std::min(current.size(), newer.size()); ++i) {
if (newer[i] > current[i]) {
return true;
} else if (newer[i] < current[i]) {
return false;
}
}
return newer.size() > current.size();
}
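Taken together, FirmwareUpgrade is meant to be driven in a check-then-upgrade sequence. A minimal sketch of that flow, assuming the application supplies an `Http` implementation and that `CONFIG_OTA_VERSION_URL` comes from the Kconfig option shown later in this changeset (the `CheckAndUpgrade` wrapper is illustrative):

```cpp
// Illustrative flow only; error handling and the concrete Http object are
// application-specific.
#include <esp_log.h>
#include "FirmwareUpgrade.h"
#include "SystemInfo.h"

void CheckAndUpgrade(Http& http) {
    FirmwareUpgrade upgrade(http);
    upgrade.MarkCurrentVersionValid();                // cancel rollback once the current image has booted fine
    upgrade.SetCheckVersionUrl(CONFIG_OTA_VERSION_URL);
    upgrade.SetPostData(SystemInfo::GetJsonString()); // report board info when checking the version
    upgrade.CheckVersion();
    if (upgrade.HasNewVersion()) {
        upgrade.StartUpgrade([](int progress, size_t speed) {
            ESP_LOGI("main", "OTA progress: %d%% (%zu B/s)", progress, speed);
        });
    }
}
```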

38
main/FirmwareUpgrade.h Normal file
View File

@@ -0,0 +1,38 @@
#ifndef _FIRMWARE_UPGRADE_H
#define _FIRMWARE_UPGRADE_H
#include <functional>
#include <string>
#include <map>
#include <Http.h>
class FirmwareUpgrade {
public:
FirmwareUpgrade(Http& http);
~FirmwareUpgrade();
void SetCheckVersionUrl(std::string check_version_url);
void SetPostData(const std::string& post_data);
void SetHeader(const std::string& key, const std::string& value);
void CheckVersion();
bool HasNewVersion() { return has_new_version_; }
void StartUpgrade(std::function<void(int progress, size_t speed)> callback);
void MarkCurrentVersionValid();
private:
Http& http_;
std::string check_version_url_;
bool has_new_version_ = false;
std::string firmware_version_;
std::string firmware_url_;
std::string post_data_;
std::map<std::string, std::string> headers_;
void Upgrade(const std::string& firmware_url);
std::function<void(int progress, size_t speed)> upgrade_callback_;
std::vector<int> ParseVersion(const std::string& version);
bool IsNewVersionAvailable(const std::string& currentVersion, const std::string& newVersion);
};
#endif // _FIRMWARE_UPGRADE_H

View File

@@ -1,14 +1,20 @@
menu "Xiaozhi Assistant"
config OTA_VERSION_URL
string "OTA Version URL"
default "https://api.tenclass.net/xiaozhi/ota/"
help
The application will access this URL to check for updates.
config WEBSOCKET_URL
string "Websocket URL"
default "wss://"
default "wss://api.tenclass.net/xiaozhi/v1/"
help
Communication with the server through websocket after wake up.
config WEBSOCKET_ACCESS_TOKEN
string "Websocket Access Token"
default ""
default "test-token"
help
Access token for websocket communication.
@@ -24,29 +30,29 @@ config AUDIO_OUTPUT_SAMPLE_RATE
help
Audio output sample rate.
config AUDIO_DEVICE_I2S_GPIO_BCLK
int "I2S GPIO BCLK"
default 5
help
GPIO number of the I2S BCLK.
config AUDIO_DEVICE_I2S_GPIO_WS
config AUDIO_DEVICE_I2S_MIC_GPIO_WS
int "I2S GPIO WS"
default 4
help
GPIO number of the I2S WS.
config AUDIO_DEVICE_I2S_GPIO_DOUT
int "I2S GPIO DOUT"
config AUDIO_DEVICE_I2S_MIC_GPIO_BCLK
int "I2S GPIO BCLK"
default 5
help
GPIO number of the I2S BCLK.
config AUDIO_DEVICE_I2S_MIC_GPIO_DIN
int "I2S GPIO DIN"
default 6
help
GPIO number of the I2S DOUT.
config AUDIO_DEVICE_I2S_GPIO_DIN
int "I2S GPIO DIN"
default 3
help
GPIO number of the I2S DIN.
config AUDIO_DEVICE_I2S_SPK_GPIO_DOUT
int "I2S GPIO DOUT"
default 7
help
GPIO number of the I2S DOUT.
config AUDIO_DEVICE_I2S_SIMPLEX
bool "I2S Simplex"
@@ -54,18 +60,77 @@ config AUDIO_DEVICE_I2S_SIMPLEX
help
Enable I2S Simplex mode.
config AUDIO_DEVICE_I2S_MIC_GPIO_BCLK
int "I2S MIC GPIO BCLK"
default 11
config AUDIO_DEVICE_I2S_SPK_GPIO_BCLK
int "I2S SPK GPIO BCLK"
default 15
depends on AUDIO_DEVICE_I2S_SIMPLEX
help
GPIO number of the I2S MIC BCLK.
config AUDIO_DEVICE_I2S_MIC_GPIO_WS
int "I2S MIC GPIO WS"
default 10
config AUDIO_DEVICE_I2S_SPK_GPIO_WS
int "I2S SPK GPIO WS"
default 16
depends on AUDIO_DEVICE_I2S_SIMPLEX
help
GPIO number of the I2S MIC WS.
config BOOT_BUTTON_GPIO
int "Boot Button GPIO"
default 0
help
GPIO number of the boot button.
config USE_AFE_SR
bool "Use Espressif AFE SR"
default y
help
Use AFE SR for wake word detection.
config USE_ML307
bool "Use ML307"
default n
help
Use ML307 as the modem.
config ML307_RX_PIN
int "ML307 RX Pin"
default 11
depends on USE_ML307
help
GPIO number of the ML307 RX.
config ML307_TX_PIN
int "ML307 TX Pin"
default 12
depends on USE_ML307
help
GPIO number of the ML307 TX.
config USE_DISPLAY
bool "Use Display"
default n
help
Use Display.
config DISPLAY_HEIGHT
int "Display Height"
default 32
depends on USE_DISPLAY
help
Display height in pixels.
config DISPLAY_SDA_PIN
int "Display SDA Pin"
default 41
depends on USE_DISPLAY
help
GPIO number of the Display SDA.
config DISPLAY_SCL_PIN
int "Display SCL Pin"
default 42
depends on USE_DISPLAY
help
GPIO number of the Display SCL.
endmenu
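These options become `CONFIG_*` macros at build time. A small, purely illustrative sketch of reading them from firmware code:

```cpp
// Sketch only: the CONFIG_* macros below are generated from the Kconfig
// entries above; none of this code appears in the diff itself.
#include <esp_log.h>

void LogBoardConfig() {
    ESP_LOGI("config", "boot button on GPIO %d", CONFIG_BOOT_BUTTON_GPIO);
#ifdef CONFIG_USE_DISPLAY
    ESP_LOGI("config", "display: SDA %d, SCL %d, height %d",
             CONFIG_DISPLAY_SDA_PIN, CONFIG_DISPLAY_SCL_PIN, CONFIG_DISPLAY_HEIGHT);
#endif
#ifdef CONFIG_USE_ML307
    ESP_LOGI("config", "ML307 modem: TX %d, RX %d", CONFIG_ML307_TX_PIN, CONFIG_ML307_RX_PIN);
#endif
}
```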

218
main/SystemInfo.cc Normal file
View File

@@ -0,0 +1,218 @@
#include "SystemInfo.h"
#include <freertos/task.h>
#include <esp_log.h>
#include <esp_flash.h>
#include <esp_mac.h>
#include <esp_chip_info.h>
#include <esp_system.h>
#include <esp_partition.h>
#include <esp_app_desc.h>
#include <esp_ota_ops.h>
#define TAG "SystemInfo"
size_t SystemInfo::GetFlashSize() {
uint32_t flash_size;
if (esp_flash_get_size(NULL, &flash_size) != ESP_OK) {
ESP_LOGE(TAG, "Failed to get flash size");
return 0;
}
return (size_t)flash_size;
}
size_t SystemInfo::GetMinimumFreeHeapSize() {
return esp_get_minimum_free_heap_size();
}
size_t SystemInfo::GetFreeHeapSize() {
return esp_get_free_heap_size();
}
std::string SystemInfo::GetMacAddress() {
uint8_t mac[6];
esp_read_mac(mac, ESP_MAC_WIFI_STA);
char mac_str[18];
snprintf(mac_str, sizeof(mac_str), "%02x:%02x:%02x:%02x:%02x:%02x", mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
return std::string(mac_str);
}
std::string SystemInfo::GetChipModelName() {
return std::string(CONFIG_IDF_TARGET);
}
std::string SystemInfo::GetJsonString() {
/*
{
"flash_size": 4194304,
"psram_size": 0,
"minimum_free_heap_size": 123456,
"mac_address": "00:00:00:00:00:00",
"chip_model_name": "esp32s3",
"chip_info": {
"model": 1,
"cores": 2,
"revision": 0,
"features": 0
},
"application": {
"name": "my-app",
"version": "1.0.0",
"compile_time": "2021-01-01T00:00:00Z"
"idf_version": "4.2-dev"
"elf_sha256": ""
},
"partition_table": [
"app": {
"label": "app",
"type": 1,
"subtype": 2,
"address": 0x10000,
"size": 0x100000
}
],
"ota": {
"label": "ota_0"
}
}
*/
std::string json = "{";
json += "\"flash_size\":" + std::to_string(GetFlashSize()) + ",";
json += "\"minimum_free_heap_size\":" + std::to_string(GetMinimumFreeHeapSize()) + ",";
json += "\"mac_address\":\"" + GetMacAddress() + "\",";
json += "\"chip_model_name\":\"" + GetChipModelName() + "\",";
json += "\"chip_info\":{";
esp_chip_info_t chip_info;
esp_chip_info(&chip_info);
json += "\"model\":" + std::to_string(chip_info.model) + ",";
json += "\"cores\":" + std::to_string(chip_info.cores) + ",";
json += "\"revision\":" + std::to_string(chip_info.revision) + ",";
json += "\"features\":" + std::to_string(chip_info.features);
json += "},";
json += "\"application\":{";
auto app_desc = esp_app_get_description();
json += "\"name\":\"" + std::string(app_desc->project_name) + "\",";
json += "\"version\":\"" + std::string(app_desc->version) + "\",";
json += "\"compile_time\":\"" + std::string(app_desc->date) + "T" + std::string(app_desc->time) + "Z\",";
json += "\"idf_version\":\"" + std::string(app_desc->idf_ver) + "\",";
char sha256_str[65];
for (int i = 0; i < 32; i++) {
snprintf(sha256_str + i * 2, sizeof(sha256_str) - i * 2, "%02x", app_desc->app_elf_sha256[i]);
}
json += "\"elf_sha256\":\"" + std::string(sha256_str) + "\"";
json += "},";
json += "\"partition_table\": [";
esp_partition_iterator_t it = esp_partition_find(ESP_PARTITION_TYPE_ANY, ESP_PARTITION_SUBTYPE_ANY, NULL);
while (it) {
const esp_partition_t *partition = esp_partition_get(it);
json += "{";
json += "\"label\":\"" + std::string(partition->label) + "\",";
json += "\"type\":" + std::to_string(partition->type) + ",";
json += "\"subtype\":" + std::to_string(partition->subtype) + ",";
json += "\"address\":" + std::to_string(partition->address) + ",";
json += "\"size\":" + std::to_string(partition->size);
json += "},";
it = esp_partition_next(it);
}
json.pop_back(); // Remove the last comma
json += "],";
json += "\"ota\":{";
auto ota_partition = esp_ota_get_running_partition();
json += "\"label\":\"" + std::string(ota_partition->label) + "\"";
json += "}";
// Close the JSON object
json += "}";
return json;
}
esp_err_t SystemInfo::PrintRealTimeStats(TickType_t xTicksToWait) {
#define ARRAY_SIZE_OFFSET 5
TaskStatus_t *start_array = NULL, *end_array = NULL;
UBaseType_t start_array_size, end_array_size;
configRUN_TIME_COUNTER_TYPE start_run_time, end_run_time;
esp_err_t ret;
uint32_t total_elapsed_time;
//Allocate array to store current task states
start_array_size = uxTaskGetNumberOfTasks() + ARRAY_SIZE_OFFSET;
start_array = (TaskStatus_t*)malloc(sizeof(TaskStatus_t) * start_array_size);
if (start_array == NULL) {
ret = ESP_ERR_NO_MEM;
goto exit;
}
//Get current task states
start_array_size = uxTaskGetSystemState(start_array, start_array_size, &start_run_time);
if (start_array_size == 0) {
ret = ESP_ERR_INVALID_SIZE;
goto exit;
}
vTaskDelay(xTicksToWait);
//Allocate array to store tasks states post delay
end_array_size = uxTaskGetNumberOfTasks() + ARRAY_SIZE_OFFSET;
end_array = (TaskStatus_t*)malloc(sizeof(TaskStatus_t) * end_array_size);
if (end_array == NULL) {
ret = ESP_ERR_NO_MEM;
goto exit;
}
//Get post delay task states
end_array_size = uxTaskGetSystemState(end_array, end_array_size, &end_run_time);
if (end_array_size == 0) {
ret = ESP_ERR_INVALID_SIZE;
goto exit;
}
//Calculate total_elapsed_time in units of run time stats clock period.
total_elapsed_time = (end_run_time - start_run_time);
if (total_elapsed_time == 0) {
ret = ESP_ERR_INVALID_STATE;
goto exit;
}
printf("| Task | Run Time | Percentage\n");
//Match each task in start_array to those in the end_array
for (int i = 0; i < start_array_size; i++) {
int k = -1;
for (int j = 0; j < end_array_size; j++) {
if (start_array[i].xHandle == end_array[j].xHandle) {
k = j;
//Mark that the tasks have been matched by overwriting their handles
start_array[i].xHandle = NULL;
end_array[j].xHandle = NULL;
break;
}
}
//Check if matching task found
if (k >= 0) {
uint32_t task_elapsed_time = end_array[k].ulRunTimeCounter - start_array[i].ulRunTimeCounter;
uint32_t percentage_time = (task_elapsed_time * 100UL) / (total_elapsed_time * CONFIG_FREERTOS_NUMBER_OF_CORES);
printf("| %-16s | %8lu | %4lu%%\n", start_array[i].pcTaskName, task_elapsed_time, percentage_time);
}
}
//Print unmatched tasks
for (int i = 0; i < start_array_size; i++) {
if (start_array[i].xHandle != NULL) {
printf("| %s | Deleted\n", start_array[i].pcTaskName);
}
}
for (int i = 0; i < end_array_size; i++) {
if (end_array[i].xHandle != NULL) {
printf("| %s | Created\n", end_array[i].pcTaskName);
}
}
ret = ESP_OK;
exit: //Common return path
free(start_array);
free(end_array);
return ret;
}
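A short sketch of how SystemInfo can back a periodic diagnostics loop, similar to what main.cc below already logs (creating the task that runs it is left to the application):

```cpp
// Sketch only: periodic heap/CPU diagnostics built on SystemInfo.
#include <esp_log.h>
#include <freertos/FreeRTOS.h>
#include <freertos/task.h>
#include "SystemInfo.h"

void DiagnosticsLoop() {
    while (true) {
        vTaskDelay(pdMS_TO_TICKS(10000));
        ESP_LOGI("diag", "free heap: %zu, minimum free heap: %zu",
                 SystemInfo::GetFreeHeapSize(), SystemInfo::GetMinimumFreeHeapSize());
        // Uncomment to dump per-task CPU usage over a one-second window:
        // SystemInfo::PrintRealTimeStats(pdMS_TO_TICKS(1000));
    }
}
```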

20
main/SystemInfo.h Normal file
View File

@@ -0,0 +1,20 @@
#ifndef _SYSTEM_INFO_H_
#define _SYSTEM_INFO_H_
#include <string>
#include <esp_err.h>
#include <freertos/FreeRTOS.h>
class SystemInfo {
public:
static size_t GetFlashSize();
static size_t GetMinimumFreeHeapSize();
static size_t GetFreeHeapSize();
static std::string GetMacAddress();
static std::string GetChipModelName();
static std::string GetJsonString();
static esp_err_t PrintRealTimeStats(TickType_t xTicksToWait);
};
#endif // _SYSTEM_INFO_H_

View File

@@ -1,10 +1,10 @@
#include "SystemReset.h"
#include "esp_log.h"
#include "nvs_flash.h"
#include "driver/gpio.h"
#include "esp_partition.h"
#include "esp_system.h"
#include "freertos/FreeRTOS.h"
#include <esp_log.h>
#include <nvs_flash.h>
#include <driver/gpio.h>
#include <esp_partition.h>
#include <esp_system.h>
#include <freertos/FreeRTOS.h>
#define TAG "SystemReset"

203
main/WakeWordDetect.cc Normal file
View File

@@ -0,0 +1,203 @@
#include <esp_log.h>
#include <model_path.h>
#include "WakeWordDetect.h"
#include "Application.h"
#define DETECTION_RUNNING_EVENT 1
#define WAKE_WORD_ENCODED_EVENT 2
static const char* TAG = "WakeWordDetect";
WakeWordDetect::WakeWordDetect()
: afe_detection_data_(nullptr),
wake_word_pcm_(),
wake_word_opus_() {
event_group_ = xEventGroupCreate();
srmodel_list_t *models = esp_srmodel_init("model");
for (int i = 0; i < models->num; i++) {
ESP_LOGI(TAG, "Model %d: %s", i, models->model_name[i]);
if (strstr(models->model_name[i], ESP_WN_PREFIX) != NULL) {
wakenet_model_ = models->model_name[i];
}
}
afe_config_t afe_config = {
.aec_init = false,
.se_init = true,
.vad_init = true,
.wakenet_init = true,
.voice_communication_init = false,
.voice_communication_agc_init = false,
.voice_communication_agc_gain = 10,
.vad_mode = VAD_MODE_3,
.wakenet_model_name = wakenet_model_,
.wakenet_model_name_2 = NULL,
.wakenet_mode = DET_MODE_90,
.afe_mode = SR_MODE_HIGH_PERF,
.afe_perferred_core = 0,
.afe_perferred_priority = 5,
.afe_ringbuf_size = 50,
.memory_alloc_mode = AFE_MEMORY_ALLOC_MORE_PSRAM,
.afe_linear_gain = 1.0,
.agc_mode = AFE_MN_PEAK_AGC_MODE_2,
.pcm_config = {
.total_ch_num = 1,
.mic_num = 1,
.ref_num = 0,
.sample_rate = CONFIG_AUDIO_INPUT_SAMPLE_RATE
},
.debug_init = false,
.debug_hook = {{ AFE_DEBUG_HOOK_MASE_TASK_IN, NULL }, { AFE_DEBUG_HOOK_FETCH_TASK_IN, NULL }},
.afe_ns_mode = NS_MODE_SSP,
.afe_ns_model_name = NULL,
.fixed_first_channel = true,
};
afe_detection_data_ = esp_afe_sr_v1.create_from_config(&afe_config);
xTaskCreate([](void* arg) {
auto this_ = (WakeWordDetect*)arg;
this_->AudioDetectionTask();
vTaskDelete(NULL);
}, "audio_detection", 4096 * 2, this, 5, NULL);
}
WakeWordDetect::~WakeWordDetect() {
if (afe_detection_data_ != nullptr) {
esp_afe_sr_v1.destroy(afe_detection_data_);
}
if (wake_word_encode_task_stack_ != nullptr) {
free(wake_word_encode_task_stack_);
}
vEventGroupDelete(event_group_);
}
void WakeWordDetect::OnWakeWordDetected(std::function<void()> callback) {
wake_word_detected_callback_ = callback;
}
void WakeWordDetect::OnVadStateChange(std::function<void(bool speaking)> callback) {
vad_state_change_callback_ = callback;
}
void WakeWordDetect::StartDetection() {
xEventGroupSetBits(event_group_, DETECTION_RUNNING_EVENT);
}
void WakeWordDetect::StopDetection() {
xEventGroupClearBits(event_group_, DETECTION_RUNNING_EVENT);
}
bool WakeWordDetect::IsDetectionRunning() {
return xEventGroupGetBits(event_group_) & DETECTION_RUNNING_EVENT;
}
void WakeWordDetect::Feed(const int16_t* data, int size) {
input_buffer_.insert(input_buffer_.end(), data, data + size);
auto chunk_size = esp_afe_sr_v1.get_feed_chunksize(afe_detection_data_);
while (input_buffer_.size() >= chunk_size) {
esp_afe_sr_v1.feed(afe_detection_data_, input_buffer_.data());
input_buffer_.erase(input_buffer_.begin(), input_buffer_.begin() + chunk_size);
}
}
void WakeWordDetect::AudioDetectionTask() {
auto chunk_size = esp_afe_sr_v1.get_fetch_chunksize(afe_detection_data_);
ESP_LOGI(TAG, "Audio detection task started, chunk size: %d", chunk_size);
while (true) {
xEventGroupWaitBits(event_group_, DETECTION_RUNNING_EVENT, pdFALSE, pdTRUE, portMAX_DELAY);
auto res = esp_afe_sr_v1.fetch(afe_detection_data_);
if (res == nullptr || res->ret_value == ESP_FAIL) {
if (res != nullptr) {
ESP_LOGI(TAG, "Error code: %d", res->ret_value);
}
continue;
}
// Store the wake word data for voice recognition, like who is speaking
StoreWakeWordData((uint16_t*)res->data, res->data_size / sizeof(uint16_t));
// VAD state change
if (vad_state_change_callback_) {
if (res->vad_state == AFE_VAD_SPEECH && !is_speaking_) {
is_speaking_ = true;
vad_state_change_callback_(true);
} else if (res->vad_state == AFE_VAD_SILENCE && is_speaking_) {
is_speaking_ = false;
vad_state_change_callback_(false);
}
}
if (res->wakeup_state == WAKENET_DETECTED) {
ESP_LOGI(TAG, "Wake word detected");
StopDetection();
if (wake_word_detected_callback_) {
wake_word_detected_callback_();
}
}
}
}
void WakeWordDetect::StoreWakeWordData(uint16_t* data, size_t samples) {
// store audio data to wake_word_pcm_
std::vector<int16_t> pcm(data, data + samples);
wake_word_pcm_.emplace_back(std::move(pcm));
// keep about 2 seconds of data, detect duration is 32ms (sample_rate == 16000, chunksize == 512)
while (wake_word_pcm_.size() > 2000 / 32) {
wake_word_pcm_.pop_front();
}
}
void WakeWordDetect::EncodeWakeWordData() {
if (wake_word_encode_task_stack_ == nullptr) {
wake_word_encode_task_stack_ = (StackType_t*)malloc(4096 * 8);
}
wake_word_encode_task_ = xTaskCreateStatic([](void* arg) {
auto this_ = (WakeWordDetect*)arg;
auto start_time = esp_timer_get_time();
// encode detect packets
OpusEncoder* encoder = new OpusEncoder();
encoder->Configure(CONFIG_AUDIO_INPUT_SAMPLE_RATE, 1, 60);
encoder->SetComplexity(0);
this_->wake_word_opus_.resize(4096 * 4);
size_t offset = 0;
for (auto& pcm: this_->wake_word_pcm_) {
encoder->Encode(pcm, [this_, &offset](const uint8_t* opus, size_t opus_size) {
size_t protocol_size = sizeof(BinaryProtocol) + opus_size;
if (offset + protocol_size < this_->wake_word_opus_.size()) {
auto protocol = (BinaryProtocol*)(&this_->wake_word_opus_[offset]);
protocol->version = htons(PROTOCOL_VERSION);
protocol->type = htons(0);
protocol->reserved = 0;
protocol->timestamp = 0;
protocol->payload_size = htonl(opus_size);
memcpy(protocol->payload, opus, opus_size);
offset += protocol_size;
}
});
}
this_->wake_word_pcm_.clear();
this_->wake_word_opus_.resize(offset);
auto end_time = esp_timer_get_time();
ESP_LOGI(TAG, "Encode wake word opus: %zu bytes in %lld ms", this_->wake_word_opus_.size(), (end_time - start_time) / 1000);
xEventGroupSetBits(this_->event_group_, WAKE_WORD_ENCODED_EVENT);
delete encoder;
vTaskDelete(NULL);
}, "encode_detect_packets", 4096 * 8, this, 1, wake_word_encode_task_stack_, &wake_word_encode_task_buffer_);
}
const std::string&& WakeWordDetect::GetWakeWordStream() {
xEventGroupWaitBits(event_group_, WAKE_WORD_ENCODED_EVENT, pdTRUE, pdTRUE, portMAX_DELAY);
return std::move(wake_word_opus_);
}
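The class is callback-driven: PCM is pushed in with `Feed()`, and results come back through the two callbacks. A wiring sketch, where `ReadMicData()` and `SendToServer()` are hypothetical placeholders for the application's audio capture and network code:

```cpp
// Wiring sketch only; capture and upload helpers are placeholders.
#include <esp_log.h>
#include <string>
#include <vector>
#include "WakeWordDetect.h"

void ReadMicData(std::vector<int16_t>& frame);   // placeholder: fill with 16 kHz PCM
void SendToServer(const std::string& data);      // placeholder: upload to the server

static WakeWordDetect detect;

void SetupWakeWord() {
    detect.OnVadStateChange([](bool speaking) {
        ESP_LOGI("app", "VAD: %s", speaking ? "speech" : "silence");
    });
    detect.OnWakeWordDetected([]() {
        detect.EncodeWakeWordData();               // encode the buffered wake word audio
        auto stream = detect.GetWakeWordStream();  // blocks until encoding finishes
        SendToServer(stream);                      // e.g. for speaker verification
        detect.StartDetection();                   // resume detection afterwards
    });
    detect.StartDetection();
}

void AudioLoop() {
    std::vector<int16_t> frame(512);               // 32 ms at 16 kHz
    while (true) {
        ReadMicData(frame);
        detect.Feed(frame.data(), frame.size());
    }
}
```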

50
main/WakeWordDetect.h Normal file
View File

@@ -0,0 +1,50 @@
#ifndef WAKE_WORD_DETECT_H
#define WAKE_WORD_DETECT_H
#include <esp_afe_sr_models.h>
#include <esp_nsn_models.h>
#include <freertos/FreeRTOS.h>
#include <freertos/task.h>
#include <freertos/event_groups.h>
#include <list>
#include <string>
#include <vector>
#include <functional>
class WakeWordDetect {
public:
WakeWordDetect();
~WakeWordDetect();
void Feed(const int16_t* data, int size);
void OnWakeWordDetected(std::function<void()> callback);
void OnVadStateChange(std::function<void(bool speaking)> callback);
void StartDetection();
void StopDetection();
bool IsDetectionRunning();
void EncodeWakeWordData();
const std::string&& GetWakeWordStream();
private:
esp_afe_sr_data_t* afe_detection_data_ = nullptr;
char* wakenet_model_ = NULL;
std::vector<int16_t> input_buffer_;
EventGroupHandle_t event_group_;
std::function<void()> wake_word_detected_callback_;
std::function<void(bool speaking)> vad_state_change_callback_;
bool is_speaking_ = false;
TaskHandle_t wake_word_encode_task_ = nullptr;
StaticTask_t wake_word_encode_task_buffer_;
StackType_t* wake_word_encode_task_stack_ = nullptr;
std::list<std::vector<int16_t>> wake_word_pcm_;
std::string wake_word_opus_;
void StoreWakeWordData(uint16_t* data, size_t size);
void AudioDetectionTask();
};
#endif

View File

@@ -1,24 +1,14 @@
## IDF Component Manager Manifest File
dependencies:
78/esp-builtin-led: "^1.0.0"
78/esp-wifi-connect: "^1.0.0"
78/esp-ota: "^1.0.0"
78/esp-websocket: "^1.0.0"
78/esp-opus-encoder: "^1.0.0"
78/esp-builtin-led: "^1.0.2"
78/esp-wifi-connect: "^1.1.0"
78/esp-opus-encoder: "^1.0.2"
78/esp-ml307: "^1.1.1"
espressif/esp-sr: "^1.9.0"
espressif/button: "^3.3.1"
lvgl/lvgl: "^8.4.0"
esp_lvgl_port: "^1.4.0"
## Required IDF version
idf:
version: ">=5.3"
# # Put list of dependencies here
# # For components maintained by Espressif:
# component: "~1.0.0"
# # For 3rd party components:
# username/component: ">=1.0.0,<2.0.0"
# username2/component2:
# version: "~1.0.0"
# # For transient dependencies `public` flag can be set.
# # `public` flag doesn't have an effect dependencies of the `main` component.
# # All dependencies of `main` are public by default.
# public: true
description: "An AI voice assistant for ESP32"
url: "https://github.com/78/xiaozhi-esp32"

View File

@@ -1,19 +1,15 @@
#include <cstdio>
#include <esp_log.h>
#include <esp_err.h>
#include <nvs.h>
#include <nvs_flash.h>
#include <driver/gpio.h>
#include <esp_event.h>
#include "esp_log.h"
#include "esp_err.h"
#include "nvs.h"
#include "nvs_flash.h"
#include "driver/gpio.h"
#include "WifiConfigurationAp.h"
#include "Application.h"
#include "SystemInfo.h"
#include "SystemReset.h"
#include "BuiltinLed.h"
#define TAG "main"
#define STATS_TICKS pdMS_TO_TICKS(1000)
extern "C" void app_main(void)
{
@@ -32,29 +28,15 @@ extern "C" void app_main(void)
}
ESP_ERROR_CHECK(ret);
// Get the WiFi configuration
nvs_handle_t nvs_handle;
ret = nvs_open("wifi", NVS_READONLY, &nvs_handle);
// If the WiFi configuration is not found, launch the WiFi configuration AP
if (ret != ESP_OK) {
auto& builtin_led = BuiltinLed::GetInstance();
builtin_led.SetBlue();
builtin_led.Blink(1000, 500);
WifiConfigurationAp::GetInstance().Start("Xiaozhi");
return;
}
nvs_close(nvs_handle);
// Otherwise, launch the application
Application::GetInstance().Start();
// Dump CPU usage every 10 second
while (true) {
vTaskDelay(10000 / portTICK_PERIOD_MS);
// SystemInfo::PrintRealTimeStats(STATS_TICKS);
int free_sram = heap_caps_get_minimum_free_size(MALLOC_CAP_INTERNAL);
ESP_LOGI(TAG, "Free heap size: %u minimal internal: %u", SystemInfo::GetFreeHeapSize(), free_sram);
// SystemInfo::PrintRealTimeStats(pdMS_TO_TICKS(1000));
int free_sram = heap_caps_get_free_size(MALLOC_CAP_INTERNAL);
int min_free_sram = heap_caps_get_minimum_free_size(MALLOC_CAP_INTERNAL);
ESP_LOGI(TAG, "Free internal: %u minimal internal: %u", free_sram, min_free_sram);
}
}

View File

@@ -3,7 +3,7 @@
nvs, data, nvs, 0x9000, 0x4000,
otadata, data, ota, 0xd000, 0x2000,
phy_init, data, phy, 0xf000, 0x1000,
model, data, spiffs, 0x100000, 1M,
factory, app, factory, 0x200000, 2M,
ota_0, app, ota_0, 0x400000, 2M,
ota_1, app, ota_1, 0x600000, 2M,
model, data, spiffs, 0x10000, 0xF0000,
factory, app, factory, 0x200000, 4M,
ota_0, app, ota_0, 0x600000, 4M,
ota_1, app, ota_1, 0xA00000, 4M,

7
partitions_4M.csv Normal file
View File

@@ -0,0 +1,7 @@
# ESP-IDF Partition Table
# Name, Type, SubType, Offset, Size, Flags
nvs, data, nvs, 0x9000, 0x4000,
otadata, data, ota, 0xd000, 0x2000,
phy_init, data, phy, 0xf000, 0x1000,
model, data, spiffs, 0x10000, 0xF0000,
factory, app, factory, 0x100000, 3M,

View File

@@ -3,22 +3,11 @@ CONFIG_BOOTLOADER_LOG_LEVEL_NONE=y
CONFIG_BOOTLOADER_SKIP_VALIDATE_ALWAYS=y
CONFIG_BOOTLOADER_APP_ROLLBACK_ENABLE=y
CONFIG_ESP_DEFAULT_CPU_FREQ_MHZ_240=y
CONFIG_SPIRAM=y
CONFIG_SPIRAM_MODE_OCT=y
CONFIG_SPIRAM_SPEED_80M=y
CONFIG_SPIRAM_MALLOC_ALWAYSINTERNAL=4096
CONFIG_SPIRAM_TRY_ALLOCATE_WIFI_LWIP=y
CONFIG_SPIRAM_MALLOC_RESERVE_INTERNAL=32768
CONFIG_SPIRAM_MEMTEST=n
CONFIG_HTTPD_MAX_REQ_HDR_LEN=2048
CONFIG_HTTPD_MAX_URI_LEN=2048
CONFIG_PARTITION_TABLE_CUSTOM=y
CONFIG_PARTITION_TABLE_CUSTOM_FILENAME="partitions.csv"
CONFIG_PARTITION_TABLE_FILENAME="partitions.csv"
CONFIG_PARTITION_TABLE_OFFSET=0x8000
CONFIG_USE_WAKENET=y

View File

@@ -0,0 +1,6 @@
CONFIG_ESPTOOLPY_FLASHSIZE_16MB=y
CONFIG_PARTITION_TABLE_CUSTOM=y
CONFIG_PARTITION_TABLE_CUSTOM_FILENAME="partitions_4M.csv"
CONFIG_PARTITION_TABLE_OFFSET=0x8000

View File

@@ -2,6 +2,17 @@
CONFIG_ESPTOOLPY_FLASHSIZE_16MB=y
CONFIG_ESPTOOLPY_FLASHMODE_QIO=y
CONFIG_ESP_DEFAULT_CPU_FREQ_MHZ_240=y
CONFIG_SPIRAM=y
CONFIG_SPIRAM_MODE_OCT=y
CONFIG_SPIRAM_SPEED_80M=y
CONFIG_SPIRAM_MALLOC_ALWAYSINTERNAL=4096
CONFIG_SPIRAM_TRY_ALLOCATE_WIFI_LWIP=y
CONFIG_SPIRAM_MALLOC_RESERVE_INTERNAL=32768
CONFIG_SPIRAM_MEMTEST=n
CONFIG_MBEDTLS_EXTERNAL_MEM_ALLOC=y
CONFIG_ESP32S3_INSTRUCTION_CACHE_32KB=y
CONFIG_ESP32S3_DATA_CACHE_64KB=y
CONFIG_ESP32S3_DATA_CACHE_LINE_64B=y

160
websocket.md Normal file
View File

@@ -0,0 +1,160 @@
# AI Voice Interaction Protocol
## 1. Connection Setup and Authentication
When the client connects to the server over WebSocket, it must include the following HTTP headers:
- `Authorization`: Bearer token, in the form "Bearer <access_token>"
- `Device-Id`: the device's MAC address
- `Protocol-Version`: protocol version number, currently 2
WebSocket URL: `wss://api.tenclass.net/xiaozhi/v1`
## 2. Binary Data
Binary data sent by the client uses a fixed-header protocol, as follows:
```cpp
struct BinaryProtocol {
uint16_t version; // binary protocol version, currently 2
uint16_t type; // message type: 0 = audio stream data, 1 = JSON
uint32_t reserved; // reserved field
uint32_t timestamp; // timestamp, reserved for echo cancellation; can also be used to order packets over unreliable UDP transport
uint32_t payload_size; // payload size
uint8_t payload[]; // audio data (Opus, or whatever audio format was negotiated), or wrapped JSON
} __attribute__((packed));
```
Note: all multi-byte integer fields use network byte order (big-endian).
Currently both binary data and JSON travel over the same WebSocket connection. In a future real-time conversation mode, binary audio data may be carried over UDP instead; the hello message can be extended to negotiate this.
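As a concrete illustration of the header layout and byte-order rule above, a client might wrap one Opus packet like this (sketch only; how the resulting frame is sent depends on the WebSocket client in use):

```cpp
// Sketch: serialize one Opus packet into a BinaryProtocol frame.
#include <arpa/inet.h>   // htons/htonl
#include <cstdint>
#include <cstring>
#include <string>

std::string PackAudioFrame(const uint8_t* opus, size_t opus_size) {
    std::string frame(sizeof(BinaryProtocol) + opus_size, 0);
    auto* p = reinterpret_cast<BinaryProtocol*>(&frame[0]);
    p->version = htons(2);            // binary protocol version
    p->type = htons(0);               // 0: audio stream data
    p->reserved = 0;
    p->timestamp = 0;                 // reserved (AEC / UDP ordering)
    p->payload_size = htonl(opus_size);
    memcpy(p->payload, opus, opus_size);
    return frame;                     // send as a binary WebSocket message
}
```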
## 3. Audio Data Transmission
- Client to server: Opus-encoded audio data sent with the binary protocol
- Server to client: Opus-encoded audio data sent with the binary protocol, in the same format the client uses
Audio packets with a payload_size of 0 may appear as sentence-boundary markers; they can be ignored, but must not be treated as errors.
## 4. Handshake Message
After the connection is established, the client sends a JSON "hello" message to initialize the server-side audio decoder.
There is no need to wait for a server response; audio data can be sent immediately afterwards.
```json
{
"type": "hello",
"response_mode": "auto",
"audio_params": {
"format": "opus",
"sample_rate": 16000,
"channels": 1
}
}
```
The response mode `response_mode` can be either `auto` or `manual`:
`auto`: automatic response mode; the server runs VAD on the audio in real time and decides on its own when to start responding.
`manual`: manual response mode; the server may respond when the client's state changes from `listening` to `idle`.
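A sketch of composing the hello message with cJSON, which the firmware already uses elsewhere; sending the resulting string as a text frame is left to the WebSocket client:

```cpp
// Sketch only: build the hello message from section 4 with cJSON.
#include <cJSON.h>
#include <string>

std::string BuildHelloMessage(const char* response_mode /* "auto" or "manual" */) {
    cJSON* root = cJSON_CreateObject();
    cJSON_AddStringToObject(root, "type", "hello");
    cJSON_AddStringToObject(root, "response_mode", response_mode);
    cJSON* audio = cJSON_AddObjectToObject(root, "audio_params");
    cJSON_AddStringToObject(audio, "format", "opus");
    cJSON_AddNumberToObject(audio, "sample_rate", 16000);
    cJSON_AddNumberToObject(audio, "channels", 1);
    char* printed = cJSON_PrintUnformatted(root);
    std::string json(printed);
    cJSON_free(printed);
    cJSON_Delete(root);
    return json;   // send as a text WebSocket message right after connecting
}
```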
## 5. State Updates
The client sends a JSON message whenever its state changes:
```json
{
"type": "state",
"state": "<新状态>"
}
```
Possible state values sent by the client include: `idle`, `wake_word_detected`, `listening`, `speaking`
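A state update is a two-field JSON message; a minimal cJSON sketch:

```cpp
// Sketch: format a state update ("idle", "wake_word_detected", "listening", "speaking").
#include <cJSON.h>
#include <string>

std::string BuildStateMessage(const char* state) {
    cJSON* root = cJSON_CreateObject();
    cJSON_AddStringToObject(root, "type", "state");
    cJSON_AddStringToObject(root, "state", state);
    char* printed = cJSON_PrintUnformatted(root);
    std::string json(printed);
    cJSON_free(printed);
    cJSON_Delete(root);
    return json;
}
```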
Examples:
1. Push-to-talk (`response_mode` is `manual`):
- When the talk button is pressed and the server is not yet connected, the client connects to the server while encoding and buffering the current audio. Once connected, the client sets its state to `listening` and sends the buffered audio right after the hello message.
- When the talk button is pressed and the server is already connected, the client sets its state to `listening` and streams audio.
- When the talk button is released, the state changes to `idle` and the server starts recognition.
- When the server starts responding, it pushes `stt` and `tts` messages.
- When the client starts playing audio, it sets its state to `speaking`.
- When the client finishes playing audio, it sets its state to `idle`.
- Pressing the talk button while in the `speaking` state immediately stops the current playback and changes the state to `listening`.
2. Voice wake-up, turn-by-turn conversation (`response_mode` is `auto`):
- The client connects to the server, sends the hello message, sends the wake word audio, then sends the state `wake_word_detected`; the server starts responding.
- When the client starts playing audio, it sets its state to `speaking` and does not send audio data.
- When the client finishes playing audio, it sets its state to `listening` and sends audio data.
- When the server's VAD decides it is time to respond, it pushes `stt` and `tts` messages.
- When the client receives `tts`.`start`, it starts playback and sets its state to `speaking`.
- When the client receives `tts`.`stop`, it stops playback and sets its state to `listening`.
3. Voice wake-up, real-time conversation (`response_mode` is `real_time`):
- The client connects to the server, sends the hello message, sends the wake word audio, then sends the state `wake_word_detected`; the server starts responding.
- When the client starts playing audio, it sets its state to `speaking`.
- When the client finishes playing audio, it sets its state to `listening`.
- The client sends audio data in both the `speaking` and `listening` states.
- When the server's VAD decides it is time to respond, it pushes `stt` and `tts` messages.
- When the client receives `stt`, it sets its state to `listening`; if audio is currently playing, playback stops after the current sentence finishes.
- When the client receives `tts`.`start`, it starts playback and sets its state to `speaking`.
- When the client receives `tts`.`stop`, it stops playback and sets its state to `listening`.
## 6. Server-to-Client Messages
### 6.1 Speech Recognition Result (STT)
```json
{
"type": "stt",
"text": "<识别出的文本>"
}
```
### 6.2 Text-to-Speech (TTS)
TTS start:
```json
{
"type": "tts",
"state": "start",
"sample_rate": 24000
}
```
Sentence start:
```json
{
"type": "tts",
"state": "sentence_start",
"text": "你在干什么呀?"
}
```
Sentence end:
```json
{
"type": "tts",
"state": "sentence_end"
}
```
TTS stop:
```json
{
"type": "tts",
"state": "stop"
}
```
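On the client these messages can be dispatched by their `type` and `state` fields. A cJSON-based sketch; the `On*` handlers stand in for the application's own logic:

```cpp
// Sketch only: route server JSON messages; the On* handlers are placeholders.
#include <cJSON.h>
#include <cstring>
#include <string>

void OnSttText(const char* text);   // placeholder
void OnTtsStart();                  // placeholder
void OnTtsStop();                   // placeholder

void HandleServerMessage(const std::string& payload) {
    cJSON* root = cJSON_Parse(payload.c_str());
    if (root == nullptr) {
        return;                     // ignore malformed messages
    }
    cJSON* type = cJSON_GetObjectItem(root, "type");
    if (cJSON_IsString(type)) {
        if (strcmp(type->valuestring, "stt") == 0) {
            cJSON* text = cJSON_GetObjectItem(root, "text");
            if (cJSON_IsString(text)) {
                OnSttText(text->valuestring);
            }
        } else if (strcmp(type->valuestring, "tts") == 0) {
            cJSON* state = cJSON_GetObjectItem(root, "state");
            if (cJSON_IsString(state)) {
                if (strcmp(state->valuestring, "start") == 0) {
                    OnTtsStart();   // client switches to "speaking"
                } else if (strcmp(state->valuestring, "stop") == 0) {
                    OnTtsStop();    // client returns to "listening" or "idle"
                }
            }
        }
    }
    cJSON_Delete(root);
}
```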
## 7. Connection Management
- When the client detects that the WebSocket connection has been lost, it should stop audio playback and reset itself to the idle state
- After a disconnect, the client reconnects on demand (for example, on a button press or voice wake-up), as shown in the sketch below
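A sketch of the disconnect handling just described; every function here is a placeholder for application logic:

```cpp
// Sketch only: all functions below are placeholders.
bool IsWebSocketConnected();
void ConnectWebSocket();
void StopAudioPlayback();
void SetDeviceState(const char* state);

void OnWebSocketDisconnected() {
    StopAudioPlayback();            // drop any TTS audio still queued
    SetDeviceState("idle");         // reset the local state machine
}

void OnWakeWordOrButtonPress() {
    if (!IsWebSocketConnected()) {
        ConnectWebSocket();         // reconnect only when actually needed
    }
    SetDeviceState("listening");
}
```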
This document summarizes the main aspects of the WebSocket communication protocol.