How can I control microphone capture volume on Linux from the command line without disconnecting the listening script?
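I have a robot that runs speech recognition and a speech synthesizer at the same time. The problem is that the recognizer picks up the robot's own voice.
While the speech recognition script is running, I want to add a couple of simple lines to the speech synthesizer that mute the microphone immediately before it speaks and unmute it when it finishes.
I found two ways to do this:
Method one:
Mute: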
`pactl set-source-mute alsa_input.usb-GeneralPlus_USB_Audio_Device-00.analog-mono 1`
Unmute:
`pactl set-source-mute alsa_input.usb-GeneralPlus_USB_Audio_Device-00.analog-mono 0`
Method two:
Mute:
`amixer set Capture 0%`
Unmute:
`amixer set Capture 100%`
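For reference, the "pair of simple lines" around the synthesizer call could be packaged as a context manager. This is only a minimal sketch that wraps the two `pactl` commands from method one; `muted_source` and its `run` parameter are hypothetical names introduced here, and the sketch does nothing by itself to change how the recognizer reacts to the mute.

```python
import subprocess
from contextlib import contextmanager


@contextmanager
def muted_source(source_name, run=subprocess.run):
    """Mute a PulseAudio source for the duration of the `with` block.

    `run` defaults to subprocess.run; it is a parameter only so the
    commands can be inspected without a sound server (an assumption of
    this sketch, not something from the original scripts).
    """
    run(["pactl", "set-source-mute", source_name, "1"], check=True)
    try:
        yield
    finally:
        # Always unmute, even if the speech call raises.
        run(["pactl", "set-source-mute", source_name, "0"], check=True)


# Usage inside the synthesizer (speak() is a hypothetical function):
#
# SOURCE = "alsa_input.usb-GeneralPlus_USB_Audio_Device-00.analog-mono"
# with muted_source(SOURCE):
#     speak("Hello")
```

Putting the unmute in a `finally` clause means the microphone is not left muted if the synthesizer fails halfway through a sentence.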
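The problem is that both of these are a one-way street. Either method mutes the microphone successfully, but once it is "muted" it never comes back. The only way I can get it listening again is to kill the speech recognizer and restart it.
I suspect that issuing the mute command, or setting the volume level to 0%, does something to the listening script's connection to the microphone, so the microphone is not really muted; rather, the connection is cut.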
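One way to test that suspicion is to ask PulseAudio whether the source still reports itself as muted after the unmute command has run. The sketch below assumes PulseAudio 15 or later (where `pactl get-source-mute` is available) and English-language output of the form `Mute: yes` / `Mute: no`; `parse_mute_state` and `source_is_muted` are hypothetical helper names.

```python
import os
import subprocess


def parse_mute_state(pactl_output):
    """Return True if the text contains a 'Mute: yes' line, False for 'Mute: no'."""
    for line in pactl_output.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() == "mute":
            return value.strip().lower() == "yes"
    raise ValueError("no 'Mute:' line found in pactl output")


def source_is_muted(source_name, run=subprocess.run):
    """Query the mute flag of a PulseAudio source via pactl."""
    # Force the C locale so the output is not translated (assumption:
    # pactl honours LC_ALL for its messages).
    env = dict(os.environ, LC_ALL="C")
    result = run(
        ["pactl", "get-source-mute", source_name],
        check=True, capture_output=True, text=True, env=env,
    )
    return parse_mute_state(result.stdout)
```

If the source reports `Mute: no` after unmuting but the recognizer still hears nothing, the problem is more likely in the recognizer's audio stream than in the mute flag itself.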
The speech recognition script has the following relevant code for connecting to the microphone:

    import speech_recognition as sr

    r = sr.Recognizer()
    with sr.Microphone() as source:
        print("Calibrating microphone")
        r.adjust_for_ambient_noise(source, duration=4)

    # ... and later, in a `while True` loop:
    while True:
        with sr.Microphone() as source:
            try:
                SpeakText = r.recognize_sphinx(r.listen(source))
            except sr.UnknownValueError:
                ...  # .... other code ....

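I will also include as much as I can of the `__init__.py` file that is imported when `speech_recognition` is imported. (The limit is 30k characters, and the init file is 85k...)
I am hoping someone can suggest an alternative way to mute/unmute the microphone that does not disconnect my speech recognition script, preferably one that does not involve GPIO ports and a signal-cutting relay...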
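I also tried simply passing a message to the speech recognition script (via ZeroMQ) telling it to "plug its ears" while the synthesizer is speaking. However, because some parts are not thread-safe and the listener does not notice any incoming messages while it is listening, the message is not received until after the speech has already started and stopped.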
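For illustration, one possible shape for the "plug its ears" idea is to let the library listen on its own background thread and gate the results with a thread-safe flag, leaving the main thread free to receive the ZeroMQ messages. This is a sketch under assumptions, not a tested fix: `EarPlug`, `start_listening`, and `handle_audio` are hypothetical names, while `Recognizer.listen_in_background` is the library's own background-listening call. A phrase that starts while the flag is set but finishes after it is cleared would still get through.

```python
import threading


class EarPlug:
    """Thread-safe flag that drops captured audio while the robot speaks."""

    def __init__(self):
        self._speaking = threading.Event()

    def plug(self):
        """Call just before the synthesizer starts talking."""
        self._speaking.set()

    def unplug(self):
        """Call once the synthesizer has finished."""
        self._speaking.clear()

    def wrap(self, handler):
        """Return a callback that forwards audio only while unplugged."""
        def callback(recognizer, audio):
            if self._speaking.is_set():
                return  # discard audio captured during speech
            handler(recognizer, audio)
        return callback


def start_listening(ear_plug, handle_audio):
    """Start background listening; returns the library's stop function."""
    # Imported lazily so the gate above can be used without audio hardware.
    import speech_recognition as sr

    r = sr.Recognizer()
    mic = sr.Microphone()
    with mic as source:
        r.adjust_for_ambient_noise(source, duration=4)
    # The library opens the microphone itself on its listener thread.
    return r.listen_in_background(mic, ear_plug.wrap(handle_audio))
```

The main loop would then call `ear_plug.plug()` and `ear_plug.unplug()` when the corresponding messages arrive, and `handle_audio(recognizer, audio)` would do the `recognize_sphinx` call.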
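Any help is appreciated, thanks.
Here is the full code that runs inside the recognizer, for the brave and dedicated souls willing to venture in. It is well commented, so here it is: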

    #!/usr/bin/env python3

    """Library for performing speech recognition, with support for several engines and APIs, online and offline."""

    import io
    import os
    import sys
    import subprocess
    import wave
    import aifc
    import math
    import audioop
    import collections
    import json
    import base64
    import threading
    import platform
    import stat
    import hashlib
    import hmac
    import time
    import uuid

    __author__ = "Anthony Zhang (Uberi)"
    __version__ = "3.8.1"
    __license__ = "BSD"

    try:  # attempt to use the Python 2 modules
        from urllib import urlencode
        from urllib2 import Request, urlopen, URLError, HTTPError
    except ImportError:  # use the Python 3 modules
        from urllib.parse import urlencode
        from urllib.request import Request, urlopen
        from urllib.error import URLError, HTTPError

    class WaitTimeoutError(Exception): pass

    class RequestError(Exception): pass

    class UnknownValueError(Exception): pass

    class AudioSource(object):
        def __init__(self):
            raise NotImplementedError("this is an abstract class")

        def __enter__(self):
            raise NotImplementedError("this is an abstract class")

        def __exit__(self, exc_type, exc_value, traceback):
            raise NotImplementedError("this is an abstract class")

    class Microphone(AudioSource):
        """
        Creates a new ``Microphone`` instance, which represents a physical microphone on the computer. Subclass of ``AudioSource``.
        This will throw an ``AttributeError`` if you don't have PyAudio 0.2.11 or later installed.
        If ``device_index`` is unspecified or ``None``, the default microphone is used as the audio source. Otherwise, ``device_index`` should be the index of the device to use for audio input.
        A device index is an integer between 0 and ``pyaudio.get_device_count() - 1`` (assume we have used ``import pyaudio`` beforehand) inclusive. It represents an audio device such as a microphone or speaker. See the `PyAudio documentation <http://people.csail.mit.edu/hubert/pyaudio/docs/>`__ for more details.
        The microphone audio is recorded in chunks of ``chunk_size`` samples, at a rate of ``sample_rate`` samples per second (Hertz). If not specified, the value of ``sample_rate`` is determined automatically from the system's microphone settings.
        Higher ``sample_rate`` values result in better audio quality, but also more bandwidth (and therefore, slower recognition). Additionally, some CPUs, such as those in older Raspberry Pi models, can't keep up if this value is too high.
        Higher ``chunk_size`` values help avoid triggering on rapidly changing ambient noise, but also makes detection less sensitive. This value, generally, should be left at its default.
        """
        def __init__(self, device_index=None, sample_rate=None, chunk_size=1024):
            assert device_index is None or isinstance(device_index, int), "Device index must be None or an integer"
            assert sample_rate is None or (isinstance(sample_rate, int) and sample_rate > 0), "Sample rate must be None or a positive integer"
            assert isinstance(chunk_size, int) and chunk_size > 0, "Chunk size must be a positive integer"

            # set up PyAudio
            self.pyaudio_module = self.get_pyaudio()
            audio = self.pyaudio_module.PyAudio()
            try:
                count = audio.get_device_count()  # obtain device count
                if device_index is not None:  # ensure device index is in range
                    assert 0 <= device_index < count, "Device index out of range ({} devices available; device index should be between 0 and {} inclusive)".format(count, count - 1)
                if sample_rate is None:  # automatically set the sample rate to the hardware's default sample rate if not specified
                    device_info = audio.get_device_info_by_index(device_index) if device_index is not None else audio.get_default_input_device_info()
                    assert isinstance(device_info.get("defaultSampleRate"), (float, int)) and device_info["defaultSampleRate"] > 0, "Invalid device info returned from PyAudio: {}".format(device_info)
                    sample_rate = int(device_info["defaultSampleRate"])
            except Exception:
                audio.terminate()
                raise

            self.device_index = device_index
            self.format = self.pyaudio_module.paInt16  # 16-bit int sampling
            self.SAMPLE_WIDTH = self.pyaudio_module.get_sample_size(self.format)  # size of each sample
            self.SAMPLE_RATE = sample_rate  # sampling rate in Hertz
            self.CHUNK = chunk_size  # number of frames stored in each buffer

            self.audio = None
            self.stream = None

        @staticmethod
        def get_pyaudio():
            """
            Imports the pyaudio module and checks its version. Throws exceptions if pyaudio can't be found or a wrong version is installed
            """
            try:
                import pyaudio
            except ImportError:
                raise AttributeError("Could not find PyAudio; check installation")
            from distutils.version import LooseVersion
            if LooseVersion(pyaudio.__version__) < LooseVersion("0.2.11"):
                raise AttributeError("PyAudio 0.2.11 or later is required (found version {})".format(pyaudio.__version__))
            return pyaudio

        @staticmethod
        def list_microphone_names():
            """
            Returns a list of the names of all available microphones. For microphones where the name can't be retrieved, the list entry contains ``None`` instead.
            The index of each microphone's name is the same as its device index when creating a ``Microphone`` instance - indices in this list can be used as values of ``device_index``.
            """
            audio = Microphone.get_pyaudio().PyAudio()
            try:
                result = []
                for i in range(audio.get_device_count()):
                    device_info = audio.get_device_info_by_index(i)
                    result.append(device_info.get("name"))
            finally:
                audio.terminate()
            return result

        def __enter__(self):
            assert self.stream is None, "This audio source is already inside a context manager"
            self.audio = self.pyaudio_module.PyAudio()
            try:
                self.stream = Microphone.MicrophoneStream(
                    self.audio.open(
                        input_device_index=self.device_index, channels=1, format=self.format,
                        rate=self.SAMPLE_RATE, frames_per_buffer=self.CHUNK, input=True,  # stream is an input stream
                    )
                )
            except Exception:
                self.audio.terminate()
                raise
            return self

        def __exit__(self, exc_type, exc_value, traceback):
            try:
                self.stream.close()
            finally:
                self.stream = None
                self.audio.terminate()

        class MicrophoneStream(object):
            def __init__(self, pyaudio_stream):
                self.pyaudio_stream = pyaudio_stream

            def read(self, size):
                return self.pyaudio_stream.read(size, exception_on_overflow=False)

            def close(self):
                try:
                    # sometimes, if the stream isn't stopped, closing the stream throws an exception
                    if not self.pyaudio_stream.is_stopped():
                        self.pyaudio_stream.stop_stream()
                finally:
                    self.pyaudio_stream.close()

    class AudioFile(AudioSource):
        """
        Creates a new ``AudioFile`` instance given a WAV/AIFF/FLAC audio file ``filename_or_fileobject``. Subclass of ``AudioSource``.
        If ``filename_or_fileobject`` is a string, then it is interpreted as a path to an audio file on the filesystem. Otherwise, ``filename_or_fileobject`` should be a file-like object such as ``io.BytesIO`` or similar.
        Note that functions that read from the audio (such as ``recognizer_instance.record`` or ``recognizer_instance.listen``) will move ahead in the stream. For example, if you execute ``recognizer_instance.record(audiofile_instance, duration=10)`` twice, the first time it will return the first 10 seconds of audio, and the second time it will return the 10 seconds of audio right after that. This is always reset to the beginning when entering an ``AudioFile`` context.
        WAV files must be in PCM/LPCM format; WAVE_FORMAT_EXTENSIBLE and compressed WAV are not supported and may result in undefined behaviour.
        Both AIFF and AIFF-C (compressed AIFF) formats are supported.
        FLAC files must be in native FLAC format; OGG-FLAC is not supported and may result in undefined behaviour.
        """
        def __init__(self, filename_or_fileobject):
            assert isinstance(filename_or_fileobject, (type(""), type(u""))) or hasattr(filename_or_fileobject, "read"), "Given audio file must be a filename string or a file-like object"
            self.filename_or_fileobject = filename_or_fileobject
            self.stream = None
            self.DURATION = None

            self.audio_reader = None
            self.little_endian = False
            self.SAMPLE_RATE = None
            self.CHUNK = None
            self.FRAME_COUNT = None

        def __enter__(self):
            assert self.stream is None, "This audio source is already inside a context manager"
            try:
                # attempt to read the file as WAV
                self.audio_reader = wave.open(self.filename_or_fileobject, "rb")
                self.little_endian = True  # RIFF WAV is a little-endian format (most ``audioop`` operations assume that the frames are stored in little-endian form)
            except (wave.Error, EOFError):
                try:
                    # attempt to read the file as AIFF
                    self.audio_reader = aifc.open(self.filename_or_fileobject, "rb")
                    self.little_endian = False  # AIFF is a big-endian format
                except (aifc.Error, EOFError):
                    # attempt to read the file as FLAC
                    if hasattr(self.filename_or_fileobject, "read"):
                        flac_data = self.filename_or_fileobject.read()
                    else:
                        with open(self.filename_or_fileobject, "rb") as f: flac_data = f.read()

                    # run the FLAC converter with the FLAC data to get the AIFF data
                    flac_converter = get_flac_converter()
                    if os.name == "nt":  # on Windows, specify that the process is to be started without showing a console window
                        startup_info = subprocess.STARTUPINFO()
                        startup_info.dwFlags |= subprocess.STARTF_USESHOWWINDOW  # specify that the wShowWindow field of `startup_info` contains a value
                        startup_info.wShowWindow = subprocess.SW_HIDE  # specify that the console window should be hidden
                    else:
                        startup_info = None  # default startupinfo
                    process = subprocess.Popen([
                        flac_converter,
                        "--stdout", "--totally-silent",  # put the resulting AIFF file in stdout, and make sure it's not mixed with any program output
                        "--decode", "--force-aiff-format",  # decode the FLAC file into an AIFF file
                        "-",  # the input FLAC file contents will be given in stdin
                    ], stdin=subprocess.PIPE, stdout=subprocess.PIPE, startupinfo=startup_info)
                    aiff_data, _ = process.communicate(flac_data)
                    aiff_file = io.BytesIO(aiff_data)
                    try:
                        self.audio_reader = aifc.open(aiff_file, "rb")
                    except (aifc.Error, EOFError):
                        raise ValueError("Audio file could not be read as PCM WAV, AIFF/AIFF-C, or Native FLAC; check if file is corrupted or in another format")
                    self.little_endian = False  # AIFF is a big-endian format
            assert 1 <= self.audio_reader.getnchannels() <= 2, "Audio must be mono or stereo"
            self.SAMPLE_WIDTH = self.audio_reader.getsampwidth()

            # 24-bit audio needs some special handling for old Python versions (workaround for https://bugs.python.org/issue12866)
            samples_24_bit_pretending_to_be_32_bit = False
            if self.SAMPLE_WIDTH == 3:  # 24-bit audio
                try: audioop.bias(b"", self.SAMPLE_WIDTH, 0)  # test whether this sample width is supported (for example, ``audioop`` in Python 3.3 and below don't support sample width 3, while Python 3.4+ do)
                except audioop.error:  # this version of audioop doesn't support 24-bit audio (probably Python 3.3 or less)
                    samples_24_bit_pretending_to_be_32_bit = True  # while the ``AudioFile`` instance will outwardly appear to be 32-bit, it will actually internally be 24-bit
                    self.SAMPLE_WIDTH = 4  # the ``AudioFile`` instance should present itself as a 32-bit stream now, since we'll be converting into 32-bit on the fly when reading

            self.SAMPLE_RATE = self.audio_reader.getframerate()
            self.CHUNK = 4096
            self.FRAME_COUNT = self.audio_reader.getnframes()
            self.DURATION = self.FRAME_COUNT / float(self.SAMPLE_RATE)
            self.stream = AudioFile.AudioFileStream(self.audio_reader, self.little_endian, samples_24_bit_pretending_to_be_32_bit)
            return self

        def __exit__(self, exc_type, exc_value, traceback):
            if not hasattr(self.filename_or_fileobject, "read"):  # only close the file if it was opened by this class in the first place (if the file was originally given as a path)
                self.audio_reader.close()
            self.stream = None
            self.DURATION = None

        class AudioFileStream(object):
            def __init__(self, audio_reader, little_endian, samples_24_bit_pretending_to_be_32_bit):
                self.audio_reader = audio_reader  # an audio file object (e.g., a `wave.Wave_read` instance)
                self.little_endian = little_endian  # whether the audio data is little-endian (when working with big-endian things, we'll have to convert it to little-endian before we process it)
                self.samples_24_bit_pretending_to_be_32_bit = samples_24_bit_pretending_to_be_32_bit  # this is true if the audio is 24-bit audio, but 24-bit audio isn't supported, so we have to pretend that this is 32-bit audio and convert it on the fly

            def read(self, size=-1):
                buffer = self.audio_reader.readframes(self.audio_reader.getnframes() if size == -1 else size)
                if not isinstance(buffer, bytes): buffer = b""  # workaround for https://bugs.python.org/issue24608

                sample_width = self.audio_reader.getsampwidth()
                if not self.little_endian:  # big endian format, convert to little endian on the fly
                    if hasattr(audioop, "byteswap"):  # ``audioop.byteswap`` was only added in Python 3.4 (incidentally, that also means that we don't need to worry about 24-bit audio being unsupported, since Python 3.4+ always has that functionality)
                        buffer = audioop.byteswap(buffer, sample_width)
                    else:  # manually reverse the bytes of each sample, which is slower but works well enough as a fallback
                        buffer = buffer[sample_width - 1::-1] + b"".join(buffer[i + sample_width:i:-1] for i in range(sample_width - 1, len(buffer), sample_width))

                # workaround for https://bugs.python.org/issue12866
                if self.samples_24_bit_pretending_to_be_32_bit:  # we need to convert samples from 24-bit to 32-bit before we can process them with ``audioop`` functions
                    buffer = b"".join(b"\x00" + buffer[i:i + sample_width] for i in range(0, len(buffer), sample_width))  # since we're in little endian, we prepend a zero byte to each 24-bit sample to get a 32-bit sample
                    sample_width = 4  # make sure we treat the buffer as 32-bit audio now, after converting it from 24-bit audio
                if self.audio_reader.getnchannels() != 1:  # stereo audio
                    buffer = audioop.tomono(buffer, sample_width, 1, 1)  # convert stereo audio data to mono
                return buffer

    class AudioData(object):
        """
        Creates a new ``AudioData`` instance, which represents mono audio data.
        The raw audio data is specified by ``frame_data``, which is a sequence of bytes representing audio samples. This is the frame data structure used by the PCM WAV format.
        The width of each sample, in bytes, is specified by ``sample_width``. Each group of ``sample_width`` bytes represents a single audio sample.
        The audio data is assumed to have a sample rate of ``sample_rate`` samples per second (Hertz).
        Usually, instances of this class are obtained from ``recognizer_instance.record`` or ``recognizer_instance.listen``, or in the callback for ``recognizer_instance.listen_in_background``, rather than instantiating them directly.
        """
        def __init__(self, frame_data, sample_rate, sample_width):
            assert sample_rate > 0, "Sample rate must be a positive integer"
            assert sample_width % 1 == 0 and 1 <= sample_width <= 4, "Sample width must be between 1 and 4 inclusive"
            self.frame_data = frame_data
            self.sample_rate = sample_rate
            self.sample_width = int(sample_width)

        def get_segment(self, start_ms=None, end_ms=None):
            """
            Returns a new ``AudioData`` instance, trimmed to a given time interval. In other words, an ``AudioData`` instance with the same audio data except starting at ``start_ms`` milliseconds in and ending ``end_ms`` milliseconds in.
            If not specified, ``start_ms`` defaults to the beginning of the audio, and ``end_ms`` defaults to the end.
            """

*Truncated due to character limit*