'Media Format' for '.caf' file in Amazon Transcribe
I have a React Native (Expo) app that captures audio using the expo-av library. It then uploads the audio file to Amazon S3, and the file is then transcribed with Amazon Transcribe. For Android, I save the audio as an '.m4a' file and call the Amazon Transcribe API like this:
transcribe_client.start_transcription_job(
    TranscriptionJobName=job_name,
    Media={'MediaFileUri': file_uri},
    MediaFormat='mp4',
    LanguageCode='en-US')
What should 'MediaFormat' be for files uploaded from an iOS device, which are typically '.caf' files? Amazon Transcribe only accepts these media formats:
MP3, MP4, WAV, FLAC, AMR, OGG, and WebM
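For reference, this is the extension-to-MediaFormat mapping I am working from (my own summary of the list above; '.m4a' maps to 'mp4' since it is an MP4 audio container). Note that '.caf' has no valid value:

# Extension-to-MediaFormat mapping derived from the list above.
# '.caf' is deliberately absent: Transcribe does not accept it.
MEDIA_FORMAT_BY_EXTENSION = {
    '.mp3': 'mp3',
    '.mp4': 'mp4',
    '.m4a': 'mp4',   # MP4 audio container
    '.wav': 'wav',
    '.flac': 'flac',
    '.amr': 'amr',
    '.ogg': 'ogg',
    '.webm': 'webm',
}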
Possible solutions:

- Create an API that does the conversion for you. You can easily build one with the FFMPEG Python library (a minimal sketch appears at the end of this answer).
- Use a ready-made API. With the cloudconvert API you can convert files easily, but it is a paid service.
- Use a different library to record audio on iOS. There is a module called react-native-record-audio-ios, made entirely for iOS, that records audio as .caf, .m4a, or .wav.
- Convert the file with the LAME API. As said here, you could convert a .caf file into an .mp3 by creating a native module that runs something like this:
#include <stdio.h>
#include <lame/lame.h>

// NOTE: this snippet treats the input as raw interleaved 16-bit PCM;
// a real .caf file has a Core Audio header (see the note below).
FILE *pcm = fopen("file.caf", "rb");
FILE *mp3 = fopen("file.mp3", "wb");

const int PCM_SIZE = 8192;
const int MP3_SIZE = 8192;
short int pcm_buffer[PCM_SIZE*2];
unsigned char mp3_buffer[MP3_SIZE];
int read, write;

// Configure the encoder: 44.1 kHz input, default VBR quality.
lame_t lame = lame_init();
lame_set_in_samplerate(lame, 44100);
lame_set_VBR(lame, vbr_default);
lame_init_params(lame);

do {
    read = fread(pcm_buffer, 2*sizeof(short int), PCM_SIZE, pcm);
    if (read == 0)
        write = lame_encode_flush(lame, mp3_buffer, MP3_SIZE);   // drain the encoder at EOF
    else
        write = lame_encode_buffer_interleaved(lame, pcm_buffer, read, mp3_buffer, MP3_SIZE);
    fwrite(mp3_buffer, write, 1, mp3);
} while (read != 0);

lame_close(lame);
fclose(mp3);
fclose(pcm);
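Note that this snippet feeds the file to LAME as if it were raw interleaved 16-bit PCM; a real .caf file has a Core Audio header, so in practice you would first decode it to PCM (for example with Core Audio's ExtAudioFile API) and link the module against libmp3lame (-lmp3lame).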
- Creating a native module that runs this Objective-C code:
-(void) convertToWav
{
// set up an AVAssetReader to read from the iPod Library
NSString *cafFilePath=[[NSBundle mainBundle]pathForResource:@"test" ofType:@"caf"];
NSURL *assetURL = [NSURL fileURLWithPath:cafFilePath];
AVURLAsset *songAsset = [AVURLAsset URLAssetWithURL:assetURL options:nil];
NSError *assetError = nil;
AVAssetReader *assetReader = [AVAssetReader assetReaderWithAsset:songAsset error:&assetError];
if (assetError) {
NSLog (@"error: %@", assetError);
return;
}
AVAssetReaderOutput *assetReaderOutput = [AVAssetReaderAudioMixOutput
assetReaderAudioMixOutputWithAudioTracks:songAsset.tracks
audioSettings: nil];
if (! [assetReader canAddOutput: assetReaderOutput]) {
NSLog (@"can't add reader output... die!");
return;
}
[assetReader addOutput: assetReaderOutput];
NSString *title = @"MyRec";
NSArray *docDirs = NSSearchPathForDirectoriesInDomains (NSDocumentDirectory, NSUserDomainMask, YES);
NSString *docDir = [docDirs objectAtIndex: 0];
NSString *wavFilePath = [[docDir stringByAppendingPathComponent:title] stringByAppendingPathExtension:@"wav"];
if ([[NSFileManager defaultManager] fileExistsAtPath:wavFilePath])
{
[[NSFileManager defaultManager] removeItemAtPath:wavFilePath error:nil];
}
NSURL *exportURL = [NSURL fileURLWithPath:wavFilePath];
AVAssetWriter *assetWriter = [AVAssetWriter assetWriterWithURL:exportURL
fileType:AVFileTypeWAVE
error:&assetError];
if (assetError)
{
NSLog (@"error: %@", assetError);
return;
}
AudioChannelLayout channelLayout;
memset(&channelLayout, 0, sizeof(AudioChannelLayout));
channelLayout.mChannelLayoutTag = kAudioChannelLayoutTag_Stereo;
NSDictionary *outputSettings = [NSDictionary dictionaryWithObjectsAndKeys:
[NSNumber numberWithInt:kAudioFormatLinearPCM], AVFormatIDKey,
[NSNumber numberWithFloat:44100.0], AVSampleRateKey,
[NSNumber numberWithInt:2], AVNumberOfChannelsKey,
[NSData dataWithBytes:&channelLayout length:sizeof(AudioChannelLayout)], AVChannelLayoutKey,
[NSNumber numberWithInt:16], AVLinearPCMBitDepthKey,
[NSNumber numberWithBool:NO], AVLinearPCMIsNonInterleaved,
[NSNumber numberWithBool:NO],AVLinearPCMIsFloatKey,
[NSNumber numberWithBool:NO], AVLinearPCMIsBigEndianKey,
nil];
AVAssetWriterInput *assetWriterInput = [AVAssetWriterInput assetWriterInputWithMediaType:AVMediaTypeAudio
outputSettings:outputSettings];
if ([assetWriter canAddInput:assetWriterInput])
{
[assetWriter addInput:assetWriterInput];
}
else
{
NSLog (@"can't add asset writer input... die!");
return;
}
assetWriterInput.expectsMediaDataInRealTime = NO;
[assetWriter startWriting];
[assetReader startReading];
AVAssetTrack *soundTrack = [songAsset.tracks objectAtIndex:0];
CMTime startTime = CMTimeMake (0, soundTrack.naturalTimeScale);
[assetWriter startSessionAtSourceTime: startTime];
__block UInt64 convertedByteCount = 0;
dispatch_queue_t mediaInputQueue = dispatch_queue_create("mediaInputQueue", NULL);
[assetWriterInput requestMediaDataWhenReadyOnQueue:mediaInputQueue
usingBlock: ^
{
while (assetWriterInput.readyForMoreMediaData)
{
CMSampleBufferRef nextBuffer = [assetReaderOutput copyNextSampleBuffer];
if (nextBuffer)
{
// append the decoded PCM buffer to the writer
[assetWriterInput appendSampleBuffer: nextBuffer];
convertedByteCount += CMSampleBufferGetTotalSampleSize (nextBuffer);
CMTime progressTime = CMSampleBufferGetPresentationTimeStamp(nextBuffer);
CMTime sampleDuration = CMSampleBufferGetDuration(nextBuffer);
if (CMTIME_IS_NUMERIC(sampleDuration))
progressTime = CMTimeAdd(progressTime, sampleDuration);
float dProgress = CMTimeGetSeconds(progressTime) / CMTimeGetSeconds(songAsset.duration);
NSLog(@"%f", dProgress);
// copyNextSampleBuffer returns a retained buffer, so release it
CFRelease(nextBuffer);
}
else
{
[assetWriterInput markAsFinished];
// finish writing, otherwise the WAV file is never finalized
[assetWriter finishWritingWithCompletionHandler:^{}];
[assetReader cancelReading];
break;
}
}
}];
}
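To call convertToWav from JavaScript you would still have to expose it through a React Native native module (for example with RCT_EXPORT_MODULE and RCT_EXPORT_METHOD from RCTBridgeModule).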
However, as mentioned here:
Since the iPhone shouldn't really be used for processor intensive things such as audio conversion.
So I recommend the third solution (recording with react-native-record-audio-ios), since it is simpler and doesn't look like a heavy task for the iPhone's processor.
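As for the first solution, here is a minimal sketch of the conversion logic such an API would wrap, assuming the ffmpeg binary is installed on the server and boto3 credentials are configured (the function, bucket, and key names are placeholders):

import subprocess

import boto3


def convert_and_transcribe(caf_path, bucket, key, job_name):
    """Convert a .caf recording to .mp3 with ffmpeg, then transcribe it."""
    mp3_path = caf_path.rsplit('.', 1)[0] + '.mp3'
    # ffmpeg infers input/output formats from the file extensions;
    # -y overwrites the output file if it already exists.
    subprocess.run(['ffmpeg', '-y', '-i', caf_path, mp3_path], check=True)

    # Upload the converted file and start the transcription job with a
    # MediaFormat value that Transcribe accepts.
    boto3.client('s3').upload_file(mp3_path, bucket, key)
    boto3.client('transcribe').start_transcription_job(
        TranscriptionJobName=job_name,
        Media={'MediaFileUri': f's3://{bucket}/{key}'},
        MediaFormat='mp3',
        LanguageCode='en-US')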