microsoft cognitive - Transcribe MP3 audio file with Bing Speech API (speech to text)


I have a long recording (an hour or so) in MP3 format. This is the information ffmpeg reports for the audio file:

    [mp3 @ 000001fe666da320] Skipping 0 bytes of junk at 58650.
    [mjpeg @ 000001fe666effe0] Changing bps to 8
    [mp3 @ 000001fe666da320] Estimating duration from bitrate, this may be inaccurate
    Input #0, mp3, from '1.mp3':
      Duration: 00:57:18.52, start: 0.000000, bitrate: 192 kb/s
        Stream #0:0: Audio: mp3, 44100 Hz, mono, s16p, 192 kb/s
        Stream #0:1: Video: mjpeg, yuvj420p(pc, bt470bg/unknown/unknown), 1300x1370, 90k tbr, 90k tbn, 90k tbc

I would like to use the Bing Speech API (Microsoft Project Oxford / Cognitive Services Speech API) to transcribe the file (speech to text).

I believe this is achievable using the code shown at the end of this question.

Option 1: Before sending any audio data, you must first send a SpeechAudioFormat descriptor that describes the layout and format of the raw audio data, via the DataRecognitionClient's SendAudioFormat() method. Can anyone provide a code sample for this option?

Option 2: Convert the file to a format the service accepts. I have done this with ffmpeg and got:

    Duration: 00:57:23.67, bitrate: 256 kb/s
        Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, 1 channels, s16, 256 kb/s

As I understand the documentation, this should be acceptable: the audio must be PCM, mono, 16-bit samples, with a sample rate of 8000 Hz or 16000 Hz.
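One way to double-check the converted file against those requirements is to read the WAV header directly. The following is only a sketch: the class name WavFormatCheck and the method IsAcceptable are hypothetical (they are not part of the Speech SDK), and it assumes a standard RIFF/WAVE file on a seekable stream.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not part of the Speech SDK.
public static class WavFormatCheck
{
    // Returns true when the "fmt " chunk describes PCM, mono, 16-bit audio
    // at 8000 Hz or 16000 Hz. Assumes a seekable RIFF/WAVE stream.
    public static bool IsAcceptable(Stream stream)
    {
        using (var reader = new BinaryReader(stream, Encoding.ASCII, leaveOpen: true))
        {
            if (ReadTag(reader) != "RIFF") return false;
            reader.ReadUInt32();                          // overall RIFF size, unused here
            if (ReadTag(reader) != "WAVE") return false;

            // Walk the chunk list until the "fmt " chunk is found.
            while (stream.Position + 8 <= stream.Length)
            {
                string chunkId = ReadTag(reader);
                uint chunkSize = reader.ReadUInt32();
                if (chunkId == "fmt ")
                {
                    if (chunkSize < 16) return false;
                    ushort formatTag = reader.ReadUInt16();   // 1 means uncompressed PCM
                    ushort channels = reader.ReadUInt16();
                    uint sampleRate = reader.ReadUInt32();
                    reader.ReadUInt32();                      // byte rate, unused here
                    reader.ReadUInt16();                      // block align, unused here
                    ushort bitsPerSample = reader.ReadUInt16();
                    return formatTag == 1 && channels == 1 && bitsPerSample == 16
                        && (sampleRate == 8000 || sampleRate == 16000);
                }
                // Skip other chunks; chunk bodies are padded to an even length.
                stream.Seek(chunkSize + (chunkSize & 1), SeekOrigin.Current);
            }
            return false;
        }
    }

    private static string ReadTag(BinaryReader reader)
    {
        return Encoding.ASCII.GetString(reader.ReadBytes(4));
    }
}
```

The ffmpeg output above (pcm_s16le, 16000 Hz, 1 channel) would pass this check, and its 256 kb/s bitrate matches 16000 samples/s × 16 bits × 1 channel.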

I tried to send the audio to the server, but it did not reply. Am I on the right track? Is there a maximum buffer size?

Do you see any other, maybe easier, way to get the audio file transcribed?

    private void SendAudioHelper(string wavFileName)
    {
        using (FileStream fileStream = new FileStream(wavFileName, FileMode.Open, FileAccess.Read))
        {
            int bytesRead = 0;
            byte[] buffer = new byte[1024];

            try
            {
                do
                {
                    // Get more audio data to send into the byte buffer.
                    bytesRead = fileStream.Read(buffer, 0, buffer.Length);

                    // Send the audio data to the service.
                    this.dataClient.SendAudio(buffer, bytesRead);
                }
                while (bytesRead > 0);
            }
            finally
            {
                // We are done sending audio. Final recognition results will arrive in the OnResponseReceived event call.
                this.dataClient.EndAudio();
            }
        }
    }

There is a limit of 15 seconds when you use the REST implementation. The SDK has a limit of 2 minutes. A recording of almost an hour is well beyond both limits.

Bing Speech team
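Given the 2-minute SDK limit, one possible workaround is to cut the converted PCM data into segments shorter than the limit and send each segment in its own recognition session. The sketch below only does the arithmetic for the segment boundaries. ChunkPlanner and Plan are hypothetical names, the 110-second segment length in the usage note is an assumed safety margin rather than a documented value, and cutting at fixed offsets can split a word in two.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the Speech SDK.
public static class ChunkPlanner
{
    // Splits dataBytes of raw PCM audio into (offset, length) byte ranges,
    // each covering at most maxSeconds of audio. Ranges are aligned to whole
    // sample frames because maxChunkBytes is a multiple of the frame size.
    public static List<Tuple<long, long>> Plan(
        long dataBytes, int sampleRate, int channels, int bitsPerSample, int maxSeconds)
    {
        int bytesPerFrame = channels * bitsPerSample / 8;
        long maxChunkBytes = (long)sampleRate * bytesPerFrame * maxSeconds;

        var chunks = new List<Tuple<long, long>>();
        for (long offset = 0; offset < dataBytes; offset += maxChunkBytes)
        {
            long length = Math.Min(maxChunkBytes, dataBytes - offset);
            chunks.Add(Tuple.Create(offset, length));
        }
        return chunks;
    }
}
```

For the converted file above (00:57:23.67 at 16000 Hz, mono, 16-bit, so roughly 110,197,440 bytes of audio data), `ChunkPlanner.Plan(110197440, 16000, 1, 16, 110)` yields 32 segments. How each segment is then fed to the client, and whether each needs its own WAV header, is not covered here.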

