Authors: Jiang Wei, Gong Yang, Zhou Tao, Wang Bin
Development Background
Lingjian was founded in 2015 and is headquartered in Shanghai, with more than 20 branches across China. It holds 100 software copyrights, 91 registered trademarks, and 35 invention patents. The company serves consumer healthcare businesses such as dental clinics and medical aesthetics institutions, providing them with integrated operations and management solutions.
Lingjian holds both domestic and international ISO/IEC 27001 certification, the Ministry of Public Security Level 3 classified protection certification, and the Ministry of Industry and Information Technology Level 3 communications network security protection certification. It supports the digital operations of consumer healthcare institutions and helps them build a healthy business closed loop of increasing revenue, avoiding risk, reducing costs, and improving efficiency.
Lingjian's offering is built around SaaS and has three main product lines:
- e Dental software provides dental institutions with SaaS services covering their entire business process, including single-store and chain management, electronic medical records, customer relationship management, purchasing, sales and inventory management, intelligent marketing, BI business intelligence analysis, and imaging integration.
- The e Dental mall connects to 1,000+ well-known domestic and international consumables brands upstream in the supply chain, offers a curated selection of nearly 20,000 consumable products, and gives dental institutions a one-stop consumables purchasing service.
- Lingjian Yuejian, built on the concept of "accurate diagnosis, accurate orthodontics, and accurate monitoring", is a new-generation invisible orthodontics solution provider. It has launched several product series, such as Yuejian adult orthodontics and Yueya children's early orthodontics.
As a leading provider to the dental industry, Lingjian pursues excellence in both technology and service. It provides orthodontic algorithms to dental clinics: during routine consultations, dentists photograph the patient's teeth, and the algorithm returns corresponding diagnosis and treatment suggestions, which improves dentists' efficiency.
Platform Features and Business Pain Points
Dental clinics keep relatively fixed working hours, usually from 08:30 to 18:00. They are busier on holidays, and over the span of a month the traffic shows clear peaks and troughs.
In the early days, Lingjian purchased a batch of GPU machines on the cloud to deploy its algorithms and serve external requests. However, it ran into a number of problems in use, mainly the following:
- Low resource utilization and wasted cost: The machines were held on a monthly subscription, yet there was no business traffic outside working hours. Multiple GPU machines were held to cover possible peaks, so utilization was low during off-peak periods. Overall resource utilization stayed at around 5%.
- Slow response during peak periods: The capacity to absorb sudden traffic was insufficient. Traffic above expectations raised the service load and lengthened response times, which directly affected the end customers' experience. At business peaks, there were cases where a single request queued for 10 minutes.
- Insufficient monitoring and troubleshooting capabilities: The orthodontic algorithm is iterated continuously, and during rollouts requests occasionally hung or the program raised errors. Because monitoring and alerting were inadequate, the team could not detect these problems proactively and often had to rely on feedback from the stores using the service, which slowed down algorithm optimization.
In addition, frequent operations actions and continuous platform building created a lot of daily work for the operations engineers and added sources of system instability. GPU technology has also developed rapidly in recent years, so Lingjian's engineers had to keep investing significant effort in this area. To address these problems, they began looking for a better and more efficient solution on the cloud.
Solution
The Lingjian technical team had been looking for a better solution that would improve cost, service experience, and operations efficiency. After comparing multiple Alibaba Cloud products, they settled on Function Compute.
Alibaba Cloud Function Compute is an event-driven, fully managed computing service. With Function Compute, customers do not need to manage infrastructure such as servers; they only need to write code and upload it. Function Compute prepares the computing resources, runs the code elastically and reliably, and provides log query, performance monitoring, alerting, and other functions.
In addition to traditional CPU compute, Function Compute also supports GPU compute. Following the serverless model, it provides GPU resources allocated on demand, addressing the pain points of long-held GPU resources: low utilization, high cost, and poor elasticity. It gives customers a more convenient and efficient GPU computing service that can carry accelerated workloads such as AI model inference, AI model training, accelerated audio and video production, and graphics and image acceleration.
The Function Compute GPU resource architecture diagram is as follows:
Function Compute GPU uses virtualization to provide strong isolation of compute, GPU memory, and faults, while remaining 100% compatible with native applications. A two-level resource pool ensures that compute is supplied quickly. The GPU resource pool is held by the Function Compute platform, so customers pay only for what they use and do not pay for idle resources.
The Function Compute GPU resource request model is as follows:
After a GPU function is deployed, customers can enable reserved GPU instances by configuring an auto-scaling policy for them, which provides the infrastructure needed for real-time inference scenarios. The Function Compute platform scales the reserved GPU instances horizontally (HPA) based on the scaling metrics the customer configures. Customer requests are routed to reserved GPU instances first. Because those instances are already initialized, the platform shields callers from cold starts and the service keeps its low-latency response. The platform also integrates observability, logging, monitoring, and alerting, which simplifies troubleshooting and daily operations work.
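The scaling behavior described above can be illustrated with a minimal sketch. It combines a scheduled minimum during clinic hours with the generic target-tracking formula used by horizontal autoscalers. This is not Function Compute's actual algorithm or API: the function names, the 08:00 warm-up time, the instance counts, and the utilization target are all hypothetical values chosen for illustration.

```python
import math
from datetime import time


def scheduled_minimum(now: time,
                      open_at: time = time(8, 0),    # assumed warm-up slightly before 08:30
                      close_at: time = time(18, 0),
                      busy_min: int = 2,             # hypothetical floor during clinic hours
                      idle_min: int = 0) -> int:
    """Return the minimum number of reserved instances for this time of day."""
    return busy_min if open_at <= now < close_at else idle_min


def desired_instances(current: int,
                      observed_util: float,
                      target_util: float,
                      floor: int,
                      ceiling: int) -> int:
    """Generic target-tracking rule: scale so utilization moves toward the target.

    desired = ceil(current * observed / target), clamped to [floor, ceiling].
    """
    if target_util <= 0:
        raise ValueError("target_util must be positive")
    # round() guards against floating-point noise such as 3.0000000000000004
    scaled = math.ceil(round(current * observed_util / target_util, 6))
    return max(floor, min(ceiling, scaled))


# Example: 09:00, two instances running at 75% utilization against a 50% target.
floor = scheduled_minimum(time(9, 0))
print(desired_instances(current=2, observed_util=0.75, target_util=0.5,
                        floor=floor, ceiling=10))
```

The scheduled floor reflects the fixed clinic hours described earlier, while the metric-based term handles bursts that the schedule does not anticipate.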
Finally, after a series of verifications by the Lingjian technical team, the final Function Compute architecture diagram is as follows:
The architecture is simple, and the business process is as follows:
- The customer packages the orthodontic algorithm as a standard image and pushes it to Alibaba Cloud Container Registry (ACR);
- When the front end issues an orthodontic request and an instance needs to be initialized, FC pulls the image from ACR, allocates the underlying GPU resources, initializes the instance, and deploys the algorithm application;
- The orthodontic calculation request is sent to the newly created GPU application, which performs the calculation and returns the result.
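As an illustration of what the image in the first step might contain, here is a minimal sketch of an HTTP inference service written with the Python standard library. It assumes the container is expected to expose an HTTP server; the port (9000), the `run_orthodontic_model` placeholder, and the response fields are hypothetical and do not describe Lingjian's actual service or the exact Function Compute contract.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def run_orthodontic_model(image_bytes: bytes) -> dict:
    """Hypothetical placeholder for the real GPU model call.

    A real implementation would decode the photo and run inference on the GPU.
    """
    return {"received_bytes": len(image_bytes), "suggestion": "placeholder"}


class InferenceHandler(BaseHTTPRequestHandler):
    """Accepts a POSTed photo and returns the model output as JSON."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        payload = json.dumps(run_orthodontic_model(body)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port: int = 9000) -> None:
    """Start the server; the port is an assumption, check the platform docs."""
    HTTPServer(("0.0.0.0", port), InferenceHandler).serve_forever()
```

Calling `serve()` as the container's entry point would start the service. Loading the model once at process start, rather than per request, keeps the initialization cost in the instance start-up phase described in the second step.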
Results and Advantages
By moving the GPU workload onto Function Compute, the Lingjian technical team resolved the problems it had previously encountered:
- Cost optimization: Function Compute's pay-as-you-go billing charges for actual request processing time, which minimizes the cost of holding resources. Compared with the earlier monthly GPU subscription, the cost fell by about 90%.
- Business experience during peak periods: By bringing up resources ahead of business peaks and scaling elastically on demand for bursts, back-end resources are supplied in time. Since the move to Function Compute, stores no longer see long queues, which has greatly improved both store efficiency and user experience.
- Efficient operations: With Function Compute's built-in monitoring, logging, and alerting, the team can follow the overall state of the business in real time. With monitoring alerts configured, exceptions trigger push notifications as soon as they occur, and with the help of the complete logging system and Function Compute's professional technical team, program problems are located and resolved promptly.
In addition, deploying on Function Compute gives the whole system good scalability. Future business growth no longer raises concerns about planning core GPU resources, which lays a solid foundation for the sustainable development of the business.
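The relationship between the roughly 5% utilization reported earlier and the roughly 90% cost reduction can be checked with a back-of-the-envelope calculation. The sketch below is illustrative only: the GPU count, the hourly price, and the assumption that pay-as-you-go capacity costs twice as much per busy hour as held capacity are hypothetical, not figures from Lingjian or Alibaba Cloud.

```python
def monthly_cost_held(gpus: int, hours_in_month: float,
                      held_price_per_gpu_hour: float) -> float:
    """Cost of holding GPUs all month, whether or not they are busy."""
    return gpus * hours_in_month * held_price_per_gpu_hour


def monthly_cost_on_demand(gpus: int, hours_in_month: float, utilization: float,
                           held_price_per_gpu_hour: float,
                           on_demand_premium: float) -> float:
    """Cost when only busy GPU-hours are billed, at a per-hour premium."""
    busy_gpu_hours = gpus * hours_in_month * utilization
    return busy_gpu_hours * held_price_per_gpu_hour * on_demand_premium


def savings_fraction(held: float, on_demand: float) -> float:
    """Fraction of the held cost that is saved by paying only for usage."""
    return 1.0 - on_demand / held


# Hypothetical inputs: 4 GPUs, 720 hours, price 10.0 per GPU-hour,
# 5% utilization (from the text), 2x pay-as-you-go premium (assumed).
held = monthly_cost_held(4, 720, 10.0)
on_demand = monthly_cost_on_demand(4, 720, 0.05, 10.0, 2.0)
print(f"held={held:.0f} on_demand={on_demand:.0f} "
      f"savings={savings_fraction(held, on_demand):.0%}")
```

Under these assumptions the savings fraction equals 1 minus utilization times premium, so pay-as-you-go stays cheaper as long as the premium is below 1 / utilization (20x at 5% utilization).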
Summary & Outlook
By migrating its core application to the Function Compute platform, the Lingjian technical team not only met the challenges brought by business growth, but also significantly optimized its cost structure and accelerated its development and operations processes, achieving unprecedented agility and efficiency.
Looking ahead, Lingjian's technical team will continue to deepen its cooperation with Function Compute. As the company's business expands, more application scenarios are expected to benefit from Function Compute's elastic scaling, low cost, and high efficiency. The team plans to prioritize the Function Compute architecture when deploying new services, in order to further shorten time to market, improve user experience, and keep reducing operating costs.
Alibaba Cloud Function Compute also looks forward to working with Lingjian to explore more efficient and smarter healthcare service solutions and to support the digital transformation of the healthcare industry. Through close cooperation between the two parties, Lingjian can better serve patients and medical practitioners and help the industry become more intelligent and efficient.