How to fix code converted to C / NDK / JNI being less efficient than the original Java code
This is my first, not-so-deep dive into the NDK.
For performance reasons, I want to rewrite this code with the NDK. My C file looks like this:
#include <jni.h>
#include <stdbool.h>
#include <stdio.h>
#include <time.h>
#include <android/log.h>
JNIEXPORT jbyteArray JNICALL
Java_com_company_app_tools_NV21FrameRotator_rotateNV21(JNIEnv *env, jclass thiz, jbyteArray data, jbyteArray output, jint width, jint height, jint rotation) {
    clock_t start, end;
    double cpu_time_used;
    start = clock();
    jbyte *dataPtr = (*env)->GetByteArrayElements(env, data, NULL);
    jbyte *outputPtr = (*env)->GetByteArrayElements(env, output, NULL);
    unsigned int frameSize = width * height;
    bool swap = rotation % 180 != 0;
    bool xflip = rotation % 270 != 0;
    bool yflip = rotation >= 180;
    for (unsigned int j = 0; j < height; j++) {
        for (unsigned int i = 0; i < width; i++) {
            unsigned int yIn = j * width + i;
            unsigned int uIn = frameSize + (j >> 1u) * width + (i & ~1u);
            unsigned int vIn = uIn + 1;
            unsigned int wOut = swap ? height : width;
            unsigned int hOut = swap ? width : height;
            unsigned int iSwapped = swap ? j : i;
            unsigned int jSwapped = swap ? i : j;
            unsigned int iOut = xflip ? wOut - iSwapped - 1 : iSwapped;
            unsigned int jOut = yflip ? hOut - jSwapped - 1 : jSwapped;
            unsigned int yOut = jOut * wOut + iOut;
            unsigned int uOut = frameSize + (jOut >> 1u) * wOut + (iOut & ~1u);
            unsigned int vOut = uOut + 1;
            outputPtr[yOut] = (jbyte) (0xff & dataPtr[yIn]);
            outputPtr[uOut] = (jbyte) (0xff & dataPtr[uIn]);
            outputPtr[vOut] = (jbyte) (0xff & dataPtr[vIn]);
        }
    }
    /* ReleaseByteArrayElements takes the Java array and the element pointer. */
    (*env)->ReleaseByteArrayElements(env, data, dataPtr, 0);
    (*env)->ReleaseByteArrayElements(env, output, outputPtr, 0);
    end = clock();
    cpu_time_used = ((double) (end - start)) / CLOCKS_PER_SEC;
    /* Bounded formatting: a 10-byte buffer overflows once the value reaches 100 ms. */
    char str[32];
    snprintf(str, sizeof(str), "%f", cpu_time_used * 1000);
    __android_log_write(ANDROID_LOG_ERROR, "NV21FrameRotator", str);
    return output;
}
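Both code snippets (the linked Java one and the C one above) work fine, but when I measure the processing duration on the same device, the Java version takes about 7 ms and the C version 12-13 ms (both as reported by the Java-side log). Shouldn't the C version be faster? Why isn't it? Where is the catch?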
long micros = System.nanoTime() / 1000;
// ~7ms,Java
//data = rotateNV21(inputData,width,height,rotateCameraDegrees);
// ~12-13ms,C
NV21FrameRotator.rotateNV21(inputData,rotateCameraDegrees);
Log.d(TAG,"Last frame processing duration: " + (System.nanoTime() / 1000 - micros) + "µs");
PS. The Java log sometimes shows a shorter duration than the one measured with clock() inside the native C file. Sample log:
NV21FrameRotator: 7.942000
NV21RotatorJava: Last frame processing duration: 7403µs
NV21FrameRotator: 7.229000
NV21RotatorJava: Last frame processing duration: 7166µs
NV21FrameRotator: 16.918000
NV21RotatorJava: Last frame processing duration: 20644µs
NV21FrameRotator: 19.594000
NV21RotatorJava: Last frame processing duration: 20479µs
NV21FrameRotator: 9.484000
NV21RotatorJava: Last frame processing duration: 7274µs
Edit: compile_commands.json for armeabi-v7a (an old device; it is the only ABI I am building for)
[
{
"directory": "...app/.cxx/cmake/basicRelease/armeabi-v7a","command": "...sdk\\ndk\\21.0.6113669\\toolchains\\llvm\\prebuilt\\windows-x86_64\\bin\\clang.exe --target=armv7-none-linux-androideabi21 --gcc-toolchain=...sdk/ndk/21.0.6113669/toolchains/llvm/prebuilt/windows-x86_64 --sysroot=...sdk/ndk/21.0.6113669/toolchains/llvm/prebuilt/windows-x86_64/sysroot -DNV21FrameRotator_EXPORTS -g -DANDROID -fdata-sections -ffunction-sections -funwind-tables -fstack-protector-strong -no-canonical-prefixes -D_FORTIFY_SOURCE=2 -march=armv7-a -mthumb -Wformat -Werror=format-security -Oz -DNDEBUG -fPIC -o CMakeFiles\\NV21FrameRotator.dir\\NV21FrameRotator.c.o -c ...app\\src\\main\\cpp\\NV21FrameRotator.c","file": "...app\\src\\main\\cpp\\NV21FrameRotator.c"
}
]
CMakeFile:
cmake_minimum_required(VERSION 3.4.1)
add_library(NV21FrameRotator SHARED
NV21FrameRotator.c)
find_library(log-lib
log )
target_link_libraries(NV21FrameRotator
${log-lib} )
Solution
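JNI has considerable overhead, especially when passing non-POD types or buffers, so calling a JNI function can often be much slower than the Java version.
Consider passing a direct java.nio.ByteBuffer instead of a byte array, to avoid a possible copy of the array contents.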
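To see how much of the gap is really JNI overhead, it can help to time the native loop with a wall clock and compare that figure with the Java-side one. The clock() call used in the question reports CPU time consumed by the whole process rather than elapsed time, so in a multi-threaded app it can read higher or lower than what System.nanoTime() sees. The sketch below is one possible approach, assuming POSIX clock_gettime with CLOCK_MONOTONIC is available; now_ns and elapsed_ms are hypothetical helper names, not part of the original code.

```c
/* Assumption: a POSIX libc; the define exposes clock_gettime in strict C modes. */
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

/* Monotonic wall-clock time in nanoseconds (hypothetical helper). */
static int64_t now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t) ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Elapsed milliseconds between two now_ns() readings (hypothetical helper). */
static double elapsed_ms(int64_t start_ns, int64_t end_ns) {
    return (double) (end_ns - start_ns) / 1e6;
}
```

Reading now_ns() just before and just after the loops, and logging elapsed_ms(start, end), gives a number that can be compared directly with the duration logged on the Java side; the difference between the two is roughly the cost of the JNI call and the array access.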
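- Compare the performance of Java and C on a real device; emulators do not produce reliable results.
- Compare Java and C in a release build. C built for debug is slow, while Java still gets full JIT (and AOT) optimization.
- You may want to look for the optimization options that best fit your scenario. By default, release builds are compiled with -Oz. To prefer speed over size, you can add this to build.gradle:
  android { buildTypes { release { externalNativeBuild.cmake.cFlags "-O3" } } }
- Your C code (and, in fact, the original Java code) leaves room for some optimization. The main inefficiency, as far as I can see, is that each U and V value is recomputed four times. The simple fix is to split the loops.
- A further optimization avoids multiplication in the inner loop (it can be eliminated in the outer loop as well, but the effect is negligible):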
#include <jni.h>
#include <stdbool.h>
#include <stdio.h>
#include <time.h>
#include <android/log.h>
JNIEXPORT jbyteArray JNICALL
Java_com_company_app_tools_NV21FrameRotator_rotateNV21(JNIEnv *env, jclass thiz, jbyteArray data, jbyteArray output, jint width, jint height, jint rotation) {
    clock_t start, end;
    double cpu_time_used;
    start = clock();
    jbyte *dataPtr = (*env)->GetByteArrayElements(env, data, NULL);
    jbyte *outputPtr = (*env)->GetByteArrayElements(env, output, NULL);
    unsigned int frameSize = width * height;
    bool swap = rotation % 180 != 0;
    bool xflip = rotation % 270 != 0;
    bool yflip = rotation >= 180;
    unsigned int wOut = swap ? height : width;
    unsigned int hOut = swap ? width : height;

    /* Luma plane: one byte per pixel, output index advanced incrementally. */
    unsigned int yIn = 0;
    for (unsigned int j = 0; j < height; j++) {
        unsigned int iSwapped = swap ? j : 0;
        unsigned int jSwapped = swap ? 0 : j;
        unsigned int iOut = xflip ? wOut - iSwapped - 1 : iSwapped;
        unsigned int jOut = yflip ? hOut - jSwapped - 1 : jSwapped;
        unsigned int yOut = jOut * wOut + iOut;
        for (unsigned int i = 0; i < width; i++) {
            outputPtr[yOut] = dataPtr[yIn];
            if (swap) {
                yOut += yflip ? -wOut : wOut;
            } else {
                yOut += xflip ? -1 : 1;
            }
            yIn++;
        }
    }

    /* Chroma plane: one interleaved V/U pair per 2x2 block, copied once. */
    unsigned int uIn = frameSize;
    for (unsigned int j = 0; j < height; j += 2) {
        unsigned int iSwapped = swap ? j : 0;
        unsigned int jSwapped = swap ? 0 : j;
        unsigned int iOut = xflip ? wOut - iSwapped - 1 : iSwapped;
        unsigned int jOut = yflip ? hOut - jSwapped - 1 : jSwapped;
        unsigned int uOut = frameSize + (jOut / 2) * wOut + (iOut & ~1u);
        for (unsigned int i = 0; i < width; i += 2) {
            unsigned int vIn = uIn + 1;
            unsigned int vOut = uOut + 1;
            outputPtr[uOut] = dataPtr[uIn];
            outputPtr[vOut] = dataPtr[vIn];
            if (swap) {
                uOut += yflip ? -wOut : wOut;
            } else {
                uOut += xflip ? -2 : 2;
            }
            uIn += 2;
        }
    }

    /* The input was only read, so JNI_ABORT skips copying it back. */
    (*env)->ReleaseByteArrayElements(env, data, dataPtr, JNI_ABORT);
    (*env)->ReleaseByteArrayElements(env, output, outputPtr, 0);
    end = clock();
    cpu_time_used = ((double) (end - start)) / CLOCKS_PER_SEC;
    __android_log_print(ANDROID_LOG_ERROR, "NV21FrameRotator", "%.1f ms", cpu_time_used * 1000);
    return output;
}
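When loops are restructured like this, it is easy to introduce an off-by-one error in the chroma plane, so a desktop check against the per-pixel formula can be useful. The following is a minimal sketch under stated assumptions: it works on plain byte buffers rather than JNI arrays, it requires even width and height, and rotate_nv21_reference, rotate_nv21_split and frames_match are hypothetical names introduced here. The split version keeps the multiplications for readability; it only illustrates copying each chroma pair once per 2x2 block.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Per-pixel formula from the question, on plain buffers (no JNI). */
static void rotate_nv21_reference(const uint8_t *in, uint8_t *out,
                                  unsigned width, unsigned height, int rotation) {
    unsigned frameSize = width * height;
    bool swap = rotation % 180 != 0;
    bool xflip = rotation % 270 != 0;
    bool yflip = rotation >= 180;
    unsigned wOut = swap ? height : width;
    unsigned hOut = swap ? width : height;
    for (unsigned j = 0; j < height; j++) {
        for (unsigned i = 0; i < width; i++) {
            unsigned iSwapped = swap ? j : i;
            unsigned jSwapped = swap ? i : j;
            unsigned iOut = xflip ? wOut - iSwapped - 1 : iSwapped;
            unsigned jOut = yflip ? hOut - jSwapped - 1 : jSwapped;
            unsigned uIn = frameSize + (j >> 1) * width + (i & ~1u);
            unsigned uOut = frameSize + (jOut >> 1) * wOut + (iOut & ~1u);
            out[jOut * wOut + iOut] = in[j * width + i];
            out[uOut] = in[uIn];         /* rewritten for all 4 pixels of a block */
            out[uOut + 1] = in[uIn + 1];
        }
    }
}

/* Same mapping with the loops split: luma per pixel, chroma once per block. */
static void rotate_nv21_split(const uint8_t *in, uint8_t *out,
                              unsigned width, unsigned height, int rotation) {
    unsigned frameSize = width * height;
    bool swap = rotation % 180 != 0;
    bool xflip = rotation % 270 != 0;
    bool yflip = rotation >= 180;
    unsigned wOut = swap ? height : width;
    unsigned hOut = swap ? width : height;
    for (unsigned j = 0; j < height; j++) {
        for (unsigned i = 0; i < width; i++) {
            unsigned iSwapped = swap ? j : i;
            unsigned jSwapped = swap ? i : j;
            unsigned iOut = xflip ? wOut - iSwapped - 1 : iSwapped;
            unsigned jOut = yflip ? hOut - jSwapped - 1 : jSwapped;
            out[jOut * wOut + iOut] = in[j * width + i];
        }
    }
    for (unsigned j = 0; j < height; j += 2) {
        for (unsigned i = 0; i < width; i += 2) {
            unsigned iSwapped = swap ? j : i;
            unsigned jSwapped = swap ? i : j;
            unsigned iOut = xflip ? wOut - iSwapped - 1 : iSwapped;
            unsigned jOut = yflip ? hOut - jSwapped - 1 : jSwapped;
            unsigned uIn = frameSize + (j >> 1) * width + i;
            unsigned uOut = frameSize + (jOut >> 1) * wOut + (iOut & ~1u);
            out[uOut] = in[uIn];
            out[uOut + 1] = in[uIn + 1];
        }
    }
}

/* Runs both versions on a synthetic frame and compares the outputs. */
static bool frames_match(unsigned width, unsigned height, int rotation) {
    size_t size = (size_t) width * height * 3 / 2;
    uint8_t *in = malloc(size);
    uint8_t *a = calloc(size, 1);
    uint8_t *b = calloc(size, 1);
    bool ok = in && a && b;
    if (ok) {
        for (size_t k = 0; k < size; k++) in[k] = (uint8_t) (k * 31u + 7u);
        rotate_nv21_reference(in, a, width, height, rotation);
        rotate_nv21_split(in, b, width, height, rotation);
        ok = memcmp(a, b, size) == 0;
    }
    free(in);
    free(a);
    free(b);
    return ok;
}
```

Calling frames_match for each of 0, 90, 180 and 270 degrees on a few small even-sized frames is a cheap way to confirm that a faster variant, such as the incremental-index version above, still produces the same bytes before it is measured on the device.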