TensorFlow Lite, the Framework for Mobile Devices, Gets a Major Update

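TensorFlow Lite, TensorFlow's framework for mobile devices, has received a major update: developers can now use the GPU on phones and other mobile devices to speed up model inference.
For face contour detection, inference with the new GPU backend is noticeably faster than on the CPU: roughly 4x on the Pixel 3 and Samsung S9, and about 6x on the iPhone 7.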
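Why support GPUs?
Running inference with compute-intensive machine learning models takes a lot of resources.
Mobile devices, however, have limited processing power and power budgets. TensorFlow Lite already offers several ways to accelerate inference, such as converting a model to a fixed-point (quantized) model, but these always involve some concession in model performance or accuracy.
Using the GPU to accelerate the original floating-point model is an alternative that avoids the extra complexity and potential accuracy loss of quantization.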
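To make the accuracy trade-off of fixed-point conversion concrete, here is a minimal sketch of 8-bit affine quantization: a float is mapped to one of 256 levels and back, and the round trip loses a little precision. This is an illustration only, not TensorFlow Lite's converter; the class name QuantizationSketch, the value range [-1, 1], and the sample weights are all assumptions made for the example.

```java
public class QuantizationSketch {
    // Maps a float in [min, max] to one of 256 levels (0..255), clamping out-of-range values.
    public static int quantize(float x, float min, float max) {
        float scale = (max - min) / 255f;
        int q = Math.round((x - min) / scale);
        return Math.max(0, Math.min(255, q));
    }

    // Maps a level back to an approximate float; the result differs from the
    // original by at most half a quantization step for in-range inputs.
    public static float dequantize(int q, float min, float max) {
        float scale = (max - min) / 255f;
        return min + q * scale;
    }

    public static void main(String[] args) {
        float min = -1f, max = 1f;                    // hypothetical weight range
        float[] weights = {-0.73f, 0.001f, 0.4999f};  // hypothetical weights
        for (float w : weights) {
            int q = quantize(w, min, max);
            float back = dequantize(q, min, max);
            System.out.printf("%.4f -> %d -> %.4f (error %.4f)%n",
                    w, q, back, Math.abs(w - back));
        }
    }
}
```

Each weight comes back slightly changed. Running the floating-point model on the GPU avoids this rounding step entirely.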
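Inside Google, the GPU backend has been tested in products for several months, and the results confirm that it does speed up inference for complex networks.
In the Pixel 3's Portrait mode, running TensorFlow Lite on the GPU made the foreground-background segmentation model used for matting and background blur more than 4x faster than on the CPU, and the new depth estimation model more than 10x faster.
In YouTube Stories, which can add text, filters, and other effects to video, and in Playground Stickers, Google's camera AR feature, the real-time video segmentation model runs 5-10x faster across a variety of phones.
Across different deep neural network models, the new GPU backend is typically 2-7x faster than floating-point CPU execution. The benchmark covered 4 public models and 2 internal Google models.
GPU acceleration matters most for more complex neural network models, such as dense prediction/segmentation or classification tasks.
On relatively small models the speedup is less pronounced, and running on the CPU can be the better choice because it avoids the latency cost inherent in memory transfers.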
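The small-model caveat can be sketched as a simple cost model: GPU latency is a fixed transfer overhead plus the accelerated compute time, so the GPU only pays off once compute dominates. The class name DelegateCostSketch and every number below (the 4x speedup, the 3 ms overhead, the model timings) are hypothetical values chosen for illustration, not measurements.

```java
public class DelegateCostSketch {
    // Estimated GPU latency: fixed memory-transfer overhead plus accelerated compute.
    public static double gpuLatencyMs(double cpuComputeMs, double computeSpeedup,
                                      double transferOverheadMs) {
        return transferOverheadMs + cpuComputeMs / computeSpeedup;
    }

    // True when the estimated GPU latency beats simply staying on the CPU.
    public static boolean gpuWins(double cpuComputeMs, double computeSpeedup,
                                  double transferOverheadMs) {
        return gpuLatencyMs(cpuComputeMs, computeSpeedup, transferOverheadMs) < cpuComputeMs;
    }

    public static void main(String[] args) {
        double speedup = 4.0;   // hypothetical raw compute speedup on the GPU
        double overhead = 3.0;  // hypothetical transfer overhead in milliseconds
        System.out.println("small model (2 ms on CPU), GPU wins: "
                + gpuWins(2.0, speedup, overhead));
        System.out.println("large model (80 ms on CPU), GPU wins: "
                + gpuWins(80.0, speedup, overhead));
    }
}
```

Under these assumed numbers the small model is better left on the CPU, while the large model benefits from the GPU despite the transfer cost.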
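How to use it?
For Android devices (using Java), Google has released a complete Android Archive (AAR) that includes TensorFlow Lite with the GPU backend.
Edit your Gradle file to use this AAR in place of the current release, then add the following snippet to your Java initialization code.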
    // Initialize interpreter with GPU delegate.
    GpuDelegate delegate = new GpuDelegate();
    Interpreter.Options options = (new Interpreter.Options()).addDelegate(delegate);
    Interpreter interpreter = new Interpreter(model, options);

    // Run inference.
    while (true) {
      writeToInputTensor(inputTensor);
      interpreter.run(inputTensor, outputTensor);
      readFromOutputTensor(outputTensor);
    }

    // Clean up.
    delegate.close();

On iOS devices (using C++), first download the binary release of TensorFlow Lite. Then change your code to call ModifyGraphWithDelegate() after creating the model.

    // Initialize interpreter with GPU delegate.
    std::unique_ptr<Interpreter> interpreter;
    InterpreterBuilder(model, op_resolver)(&interpreter);
    auto* delegate = NewGpuDelegate(nullptr);  // default config
    if (interpreter->ModifyGraphWithDelegate(delegate) != kTfLiteOk) return false;

    // Run inference.
    while (true) {
      WriteToInputTensor(interpreter->typed_input_tensor<float>(0));
      if (interpreter->Invoke() != kTfLiteOk) return false;
      ReadFromOutputTensor(interpreter->typed_output_tensor<float>(0));
    }

    // Clean up.
    interpreter = nullptr;
    DeleteGpuDelegate(delegate);
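(For more usage guidance, see the official TensorFlow tutorials; links are at the end of this article.)
Still evolving
What has been released so far is only a developer preview of TensorFlow Lite's GPU backend.
The new GPU backend uses OpenGL ES 3.1 Compute Shaders on Android devices and Metal Compute Shaders on iOS.
The set of supported GPU operations is still small: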
ADD v1, AVERAGE_POOL_2D v1, CONCATENATION v1, CONV_2D v1, DEPTHWISE_CONV_2D v1-2, FULLY_CONNECTED v1, LOGISTIC v1, MAX_POOL_2D v1, MUL v1, PAD v1, PRELU v1, RELU v1, RELU6 v1, RESHAPE v1, RESIZE_BILINEAR v1, SOFTMAX v1, STRIDED_SLICE v1, SUB v1, TRANSPOSE_CONV v1
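Because the supported set is limited, it can help to check a model's operations against the list before relying on the GPU backend. The sketch below does that with a plain set lookup. It is not part of the TensorFlow Lite API: the class name GpuOpCoverageSketch and the example model are hypothetical, and the check ignores op versions for simplicity.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class GpuOpCoverageSketch {
    // Operation names taken from the list above (versions ignored in this sketch).
    static final Set<String> SUPPORTED = new HashSet<>(Arrays.asList(
            "ADD", "AVERAGE_POOL_2D", "CONCATENATION", "CONV_2D", "DEPTHWISE_CONV_2D",
            "FULLY_CONNECTED", "LOGISTIC", "MAX_POOL_2D", "MUL", "PAD", "PRELU",
            "RELU", "RELU6", "RESHAPE", "RESIZE_BILINEAR", "SOFTMAX",
            "STRIDED_SLICE", "SUB", "TRANSPOSE_CONV"));

    // Returns the ops in modelOps that are not in the supported set,
    // in first-seen order and without duplicates.
    public static Set<String> unsupportedOps(List<String> modelOps) {
        Set<String> missing = new LinkedHashSet<>();
        for (String op : modelOps) {
            String name = op.toUpperCase(Locale.ROOT);
            if (!SUPPORTED.contains(name)) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Hypothetical op list for an imaginary model.
        List<String> modelOps = Arrays.asList("CONV_2D", "RELU6", "GATHER", "SOFTMAX");
        System.out.println("Ops outside the GPU list: " + unsupportedOps(modelOps));
    }
}
```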
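According to the TensorFlow team, future work will expand operation coverage, further optimize performance, and evolve and finalize the API.
The full open-source release is planned for later in 2019.
Links
Tutorial: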
https://www.tensorflow.org/lite/performance/gpu
Full documentation:
https://www.tensorflow.org/lite/performance/gpu_advanced
Blog post:
https://medium.com/tensorflow/tensorflow-lite-now-faster-with-mobile-gpus-developer-preview-e15797e6dee7
