Transcribe MP3 audio file with Bing Speech API (speech to text)


I have a long recording (an hour+) in MP3 format. This is the information ffmpeg reports for the audio file:

    [mp3 @ 000001fe666da320] Skipping 0 bytes of junk at 58650.
    [mjpeg @ 000001fe666effe0] Changing bps to 8
    [mp3 @ 000001fe666da320] Estimating duration from bitrate, this may be inaccurate
    Input #0, mp3, from '1.mp3':
      Duration: 00:57:18.52, start: 0.000000, bitrate: 192 kb/s
        Stream #0:0: Audio: mp3, 44100 Hz, mono, s16p, 192 kb/s
        Stream #0:1: Video: mjpeg, yuvj420p(pc, bt470bg/unknown/unknown), 1300x1370, 90k tbr, 90k tbn, 90k tbc

I want to use the Bing Speech API (Microsoft Project Oxford / Cognitive Services Speech API) to transcribe this file (speech to text).

I believe this is achievable using the code below.

Option 1: Before sending audio data, you must first send a SpeechAudioFormat descriptor that describes the layout and format of the raw audio data, via the DataRecognitionClient's SendAudioFormat() method. Can anyone provide a code sample for this option?

Option 2: Converting the file to a format the target accepts. I have done this with ffmpeg and got:

      Duration: 00:57:23.67, bitrate: 256 kb/s
        Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, 1 channels, s16, 256 kb/s

As I understand the documentation, this should be acceptable: "The audio must be PCM, mono, 16-bit sample, with a sample rate of 8000 Hz or 16000 Hz."
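The quoted requirement can be checked against the converted file's header before anything is uploaded. The following is a minimal sketch, assuming a standard RIFF/WAVE file on a seekable stream; the names WavFormatCheck and IsAcceptedFormat are made up for illustration and are not part of the Speech SDK.

```csharp
using System;
using System.IO;
using System.Text;

public static class WavFormatCheck
{
    // Returns true when the stream holds a RIFF/WAVE file whose "fmt " chunk
    // describes PCM, mono, 16-bit audio at 8000 Hz or 16000 Hz.
    // Assumes a seekable stream (FileStream, MemoryStream).
    public static bool IsAcceptedFormat(Stream wav)
    {
        using (var reader = new BinaryReader(wav, Encoding.ASCII, leaveOpen: true))
        {
            if (Encoding.ASCII.GetString(reader.ReadBytes(4)) != "RIFF") return false;
            reader.ReadUInt32(); // overall RIFF size, not needed here
            if (Encoding.ASCII.GetString(reader.ReadBytes(4)) != "WAVE") return false;

            // Walk the chunks until "fmt " is found; any other chunk is skipped.
            while (wav.Position + 8 <= wav.Length)
            {
                string chunkId = Encoding.ASCII.GetString(reader.ReadBytes(4));
                uint chunkSize = reader.ReadUInt32();

                if (chunkId == "fmt ")
                {
                    ushort audioFormat = reader.ReadUInt16();   // 1 means PCM
                    ushort channels = reader.ReadUInt16();
                    uint sampleRate = reader.ReadUInt32();
                    reader.ReadUInt32();                        // byte rate
                    reader.ReadUInt16();                        // block align
                    ushort bitsPerSample = reader.ReadUInt16();

                    return audioFormat == 1
                        && channels == 1
                        && bitsPerSample == 16
                        && (sampleRate == 8000 || sampleRate == 16000);
                }

                // RIFF chunks are padded to an even number of bytes.
                wav.Seek(chunkSize + (chunkSize & 1), SeekOrigin.Current);
            }

            return false; // no "fmt " chunk found
        }
    }
}
```

Walking the chunks instead of reading fixed offsets is deliberate: a converter may write extra chunks, so the header is not guaranteed to have a fixed size. A file that passes this check only matches the documented format; it says nothing about length limits on the service side.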

I tried to send this audio to the server, but it did not reply. Am I on the right track? What is the maximum buffer size?

Do you see any other, maybe easier, option to get the audio file transcribed?

    // Requires: using System.IO;
    private void SendAudioHelper(string wavFileName)
    {
        using (FileStream fileStream = new FileStream(wavFileName, FileMode.Open, FileAccess.Read))
        {
            int bytesRead = 0;
            byte[] buffer = new byte[1024];

            try
            {
                do
                {
                    // Get more audio data to send into the byte buffer.
                    bytesRead = fileStream.Read(buffer, 0, buffer.Length);

                    // Send the audio data to the service.
                    this.dataClient.SendAudio(buffer, bytesRead);
                }
                while (bytesRead > 0);
            }
            finally
            {
                // We are done sending audio. Final recognition results will arrive in the OnResponseReceived event call.
                this.dataClient.EndAudio();
            }
        }
    }

There is a limit of 15 seconds when you use the REST implementation. The SDK has a limit of 2 minutes.

Bing Speech team
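Given the 2-minute SDK limit, one possible workaround is to cut the converted PCM data into pieces shorter than the limit and send each piece as its own recognition request. The sketch below only does the arithmetic for that split and makes no SDK calls. PcmChunker, BytesPerSecond and Split are hypothetical names, and whether the service handles consecutive pieces well is an assumption that would need testing.

```csharp
using System;
using System.Collections.Generic;

public static class PcmChunker
{
    // Raw PCM bytes per second: sample rate * channels * bytes per sample.
    public static int BytesPerSecond(int sampleRate, int channels, int bitsPerSample)
    {
        return sampleRate * channels * (bitsPerSample / 8);
    }

    // Splits dataLength bytes of PCM data into (offset, length) ranges,
    // each covering at most maxSeconds of audio and aligned to whole sample frames.
    public static List<(long Offset, long Length)> Split(
        long dataLength, int bytesPerSecond, int blockAlign, int maxSeconds)
    {
        long maxBytes = (long)bytesPerSecond * maxSeconds;
        maxBytes -= maxBytes % blockAlign; // never cut inside a sample frame
        if (maxBytes <= 0)
        {
            throw new ArgumentException("Chunk size must be at least one sample frame.");
        }

        var ranges = new List<(long Offset, long Length)>();
        for (long offset = 0; offset < dataLength; offset += maxBytes)
        {
            ranges.Add((offset, Math.Min(maxBytes, dataLength - offset)));
        }
        return ranges;
    }
}
```

For the converted file in the question, BytesPerSecond(16000, 1, 16) gives 32000 bytes per second, which matches the 256 kb/s that ffmpeg reports. Cutting at fixed offsets can split a word in half, so cutting at pauses would probably give better transcripts. Each piece also lacks a WAV header of its own, so its format would have to be declared separately, for example through the SendAudioFormat() route described in Option 1.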

