'Media Format' for '.caf' file in Amazon Transcribe
I have a React Native (Expo) app that captures audio using the expo-av library. It then uploads the audio file to Amazon S3, and the file is then transcribed with Amazon Transcribe. For Android, I save the audio as an '.m4a' file and call the Amazon Transcribe API like this:
transcribe_client.start_transcription_job(
    TranscriptionJobName=job_name,
    Media={'MediaFileUri': file_uri},
    MediaFormat='mp4',
    LanguageCode='en-US')
What should 'MediaFormat' be for files uploaded from an iOS device, which are typically '.caf' files? Amazon Transcribe only accepts these media formats:
MP3, MP4, WAV, FLAC, AMR, OGG, and WebM
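For reference, this is the extension-to-MediaFormat mapping I am working from (my own summary of the list above; '.m4a' maps to 'mp4' since it is an MP4 audio container). Note that '.caf' has no valid value:

# Extension-to-MediaFormat mapping derived from the list above.
# '.caf' is deliberately absent: Transcribe does not accept it.
MEDIA_FORMAT_BY_EXTENSION = {
    '.mp3': 'mp3',
    '.mp4': 'mp4',
    '.m4a': 'mp4',   # MP4 audio container
    '.wav': 'wav',
    '.flac': 'flac',
    '.amr': 'amr',
    '.ogg': 'ogg',
    '.webm': 'webm',
}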
Possible solutions:

- Create an API that does the conversion for you. You can easily build one with the FFMPEG Python library (a minimal sketch appears at the end of this answer).
- Use a ready-made API. With the cloudconvert API you can convert files easily, but it is a paid service.
- Use a different library to record audio on iOS. There is a module called react-native-record-audio-ios, made entirely for iOS, that records audio as .caf, .m4a, or .wav.
- Convert the file with the LAME API. As said here, you could convert a .caf file into an .mp3 by creating a native module that runs something like this:
#include <stdio.h>
#include <lame/lame.h>

// NOTE: this snippet treats the input as raw interleaved 16-bit PCM;
// a real .caf file has a Core Audio header (see the note below).
FILE *pcm = fopen("file.caf", "rb");
FILE *mp3 = fopen("file.mp3", "wb");

const int PCM_SIZE = 8192;
const int MP3_SIZE = 8192;
short int pcm_buffer[PCM_SIZE*2];
unsigned char mp3_buffer[MP3_SIZE];
int read, write;

// Configure the encoder: 44.1 kHz input, default VBR quality.
lame_t lame = lame_init();
lame_set_in_samplerate(lame, 44100);
lame_set_VBR(lame, vbr_default);
lame_init_params(lame);

do {
    read = fread(pcm_buffer, 2*sizeof(short int), PCM_SIZE, pcm);
    if (read == 0)
        write = lame_encode_flush(lame, mp3_buffer, MP3_SIZE);   // drain the encoder at EOF
    else
        write = lame_encode_buffer_interleaved(lame, pcm_buffer, read, mp3_buffer, MP3_SIZE);
    fwrite(mp3_buffer, write, 1, mp3);
} while (read != 0);

lame_close(lame);
fclose(mp3);
fclose(pcm);
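Note that this snippet feeds the file to LAME as if it were raw interleaved 16-bit PCM; a real .caf file has a Core Audio header, so in practice you would first decode it to PCM (for example with Core Audio's ExtAudioFile API) and link the module against libmp3lame (-lmp3lame).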
- Creating a native module that runs this Objective-C code:
-(void) convertToWav
{
// set up an AVAssetReader to read from the iPod Library
NSString *cafFilePath=[[NSBundle mainBundle]pathForResource:@"test" ofType:@"caf"];
NSURL *assetURL = [NSURL fileURLWithPath:cafFilePath];
AVURLAsset *songAsset = [AVURLAsset URLAssetWithURL:assetURL options:nil];
NSError *assetError = nil;
AVAssetReader *assetReader = [AVAssetReader assetReaderWithAsset:songAsset error:&assetError];
if (assetError) {
NSLog (@"error: %@", assetError);
return;
}
AVAssetReaderOutput *assetReaderOutput = [AVAssetReaderAudioMixOutput
assetReaderAudioMixOutputWithAudioTracks:songAsset.tracks
audioSettings: nil];
if (! [assetReader canAddOutput: assetReaderOutput]) {
NSLog (@"can't add reader output... die!");
return;
}
[assetReader addOutput: assetReaderOutput];
NSString *title = @"MyRec";
NSArray *docDirs = NSSearchPathForDirectoriesInDomains (NSDocumentDirectory, NSUserDomainMask, YES);
NSString *docDir = [docDirs objectAtIndex: 0];
NSString *wavFilePath = [[docDir stringByAppendingPathComponent:title] stringByAppendingPathExtension:@"wav"];
if ([[NSFileManager defaultManager] fileExistsAtPath:wavFilePath])
{
[[NSFileManager defaultManager] removeItemAtPath:wavFilePath error:nil];
}
NSURL *exportURL = [NSURL fileURLWithPath:wavFilePath];
AVAssetWriter *assetWriter = [AVAssetWriter assetWriterWithURL:exportURL
fileType:AVFileTypeWAVE
error:&assetError];
if (assetError)
{
NSLog (@"error: %@", assetError);
return;
}
AudioChannelLayout channelLayout;
memset(&channelLayout, 0, sizeof(AudioChannelLayout));
channelLayout.mChannelLayoutTag = kAudioChannelLayoutTag_Stereo;
NSDictionary *outputSettings = [NSDictionary dictionaryWithObjectsAndKeys:
[NSNumber numberWithInt:kAudioFormatLinearPCM], AVFormatIDKey,
[NSNumber numberWithFloat:44100.0], AVSampleRateKey,
[NSNumber numberWithInt:2], AVNumberOfChannelsKey,
[NSData dataWithBytes:&channelLayout length:sizeof(AudioChannelLayout)], AVChannelLayoutKey,
[NSNumber numberWithInt:16], AVLinearPCMBitDepthKey,
[NSNumber numberWithBool:NO], AVLinearPCMIsNonInterleaved,
[NSNumber numberWithBool:NO],AVLinearPCMIsFloatKey,
[NSNumber numberWithBool:NO], AVLinearPCMIsBigEndianKey,
nil];
AVAssetWriterInput *assetWriterInput = [AVAssetWriterInput assetWriterInputWithMediaType:AVMediaTypeAudio
outputSettings:outputSettings];
if ([assetWriter canAddInput:assetWriterInput])
{
[assetWriter addInput:assetWriterInput];
}
else
{
NSLog (@"can't add asset writer input... die!");
return;
}
assetWriterInput.expectsMediaDataInRealTime = NO;
[assetWriter startWriting];
[assetReader startReading];
AVAssetTrack *soundTrack = [songAsset.tracks objectAtIndex:0];
CMTime startTime = CMTimeMake (0, soundTrack.naturalTimeScale);
[assetWriter startSessionAtSourceTime: startTime];
__block UInt64 convertedByteCount = 0;
dispatch_queue_t mediaInputQueue = dispatch_queue_create("mediaInputQueue", NULL);
[assetWriterInput requestMediaDataWhenReadyOnQueue:mediaInputQueue
usingBlock: ^
{
while (assetWriterInput.readyForMoreMediaData)
{
CMSampleBufferRef nextBuffer = [assetReaderOutput copyNextSampleBuffer];
if (nextBuffer)
{
// append the decoded PCM buffer to the writer
[assetWriterInput appendSampleBuffer: nextBuffer];
convertedByteCount += CMSampleBufferGetTotalSampleSize (nextBuffer);
CMTime progressTime = CMSampleBufferGetPresentationTimeStamp(nextBuffer);
CMTime sampleDuration = CMSampleBufferGetDuration(nextBuffer);
if (CMTIME_IS_NUMERIC(sampleDuration))
progressTime = CMTimeAdd(progressTime, sampleDuration);
float dProgress = CMTimeGetSeconds(progressTime) / CMTimeGetSeconds(songAsset.duration);
NSLog(@"%f", dProgress);
// copyNextSampleBuffer returns a retained buffer, so release it
CFRelease(nextBuffer);
}
else
{
[assetWriterInput markAsFinished];
// finish writing, otherwise the WAV file is never finalized
[assetWriter finishWritingWithCompletionHandler:^{}];
[assetReader cancelReading];
break;
}
}
}];
}
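To call convertToWav from JavaScript you would still have to expose it through a React Native native module (for example with RCT_EXPORT_MODULE and RCT_EXPORT_METHOD from RCTBridgeModule).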
However, as mentioned here:
Since the iPhone shouldn't really be used for processor intensive things such as audio conversion.
So I recommend the third solution (recording with react-native-record-audio-ios), since it is simpler and doesn't look like a heavy task for the iPhone's processor.
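As for the first solution, here is a minimal sketch of the conversion logic such an API would wrap, assuming the ffmpeg binary is installed on the server and boto3 credentials are configured (the function, bucket, and key names are placeholders):

import subprocess

import boto3


def convert_and_transcribe(caf_path, bucket, key, job_name):
    """Convert a .caf recording to .mp3 with ffmpeg, then transcribe it."""
    mp3_path = caf_path.rsplit('.', 1)[0] + '.mp3'
    # ffmpeg infers input/output formats from the file extensions;
    # -y overwrites the output file if it already exists.
    subprocess.run(['ffmpeg', '-y', '-i', caf_path, mp3_path], check=True)

    # Upload the converted file and start the transcription job with a
    # MediaFormat value that Transcribe accepts.
    boto3.client('s3').upload_file(mp3_path, bucket, key)
    boto3.client('transcribe').start_transcription_job(
        TranscriptionJobName=job_name,
        Media={'MediaFileUri': f's3://{bucket}/{key}'},
        MediaFormat='mp3',
        LanguageCode='en-US')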