Operation and maintenance analysis

Operation and maintenance

  • Operation and maintenance here refers to Internet operation and maintenance. It usually belongs to the technical department and, together with R&D, testing, and system management, forms the four departments that provide technical support for Internet products. The exact division differs between domestic and foreign companies and between large and small companies.
  • An Internet product typically goes through the following stages: the product manager analyzes requirements, the R&D department develops, the testing department tests, and the operation and maintenance department deploys and releases the product and then operates and maintains it over the long term.
  • In essence, operation and maintenance means managing networks, servers, and services at every stage of their life cycle so that cost, stability, and efficiency reach a state that all parties agree on and accept.

Operation and maintenance responsibilities

  • In start-up companies, the operation and maintenance department and the system department are generally merged: the same group of people does the related work, and the boundary between the two may not be obvious. Large companies place higher demands on operation and maintenance and need a finer division of labor. The low-level work related to machine rooms, networks, and operating systems is therefore split off into the system management department, while the upper-level work related to application products becomes the responsibility of the operation and maintenance department. The following looks at the responsibilities of operation and maintenance in large Internet companies with a finer division of labor, first from the perspective of the Internet product life cycle and then from the technologies involved.

Product Lifecycle

  • The responsibilities of operation and maintenance cover the whole product life cycle: design, release, operation and maintenance, change and upgrade, and finally taking the product offline. The responsibilities at each stage are:

  • Before product release
    At this stage, the operation and maintenance engineer takes part in product design and prepares the product for operational onboarding. This mainly includes:
    (1) Becoming familiar with the product's business.
    (2) Evaluating whether the product architecture is sound, including whether there are single points of failure, whether it is fault tolerant, and whether components are tightly coupled. The engineer should also offer reasonable design suggestions so that the product meets the basic requirements for going online and running stably.
    (3) Resource evaluation, including the server and network resources required and how they are distributed, as well as reviewing whether the product's resource budget request is reasonable and controlling service costs.
    (4) Getting resources in place: preparing the requested servers, base environment, and domain names.

  • Product release
    At this stage, the operation and maintenance engineer is responsible for the release itself: integrating the software with system and hardware resources into a product that serves external users.
    Updates to services that are already online also count as releases. Such releases should generally be performed online, completing the upgrade without interrupting external service. For large and complex changes, the service is sometimes suspended and restored once deployment is complete, but the operation and maintenance engineer should use technical means to avoid this wherever possible.

  • Product operation and maintenance
    The main tasks at this stage include:
    (1) Monitoring: monitor the running state of the service in real time so that abnormal behavior and abnormal resource consumption are discovered promptly; produce daily reports on important service metrics to evaluate the overall condition of the service and business and to uncover hidden risks.
    (2) Troubleshooting: handle any service abnormality promptly so that problems do not escalate or even interrupt the service. To make this possible, operation and maintenance engineers prepare handling plans in advance for the various kinds of service abnormality, such as machine room or network failures and program bugs. When a problem occurs, the plan is executed automatically or manually to stop the loss. Beyond everyday minor failures, engineers also need to plan disaster recovery for damage of varying severity, including large-scale machine room failures caused by force majeure such as earthquakes and accidental deletion of an online product that damages it fatally.
    (3) Capacity management: resource assessment, capacity expansion, machine room migration, and traffic scheduling as the service grows, covering both planning and implementation.
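
The link between monitoring and pre-written handling plans described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the metric names, thresholds, and plan names are hypothetical, and a real setup would read metrics from a monitoring platform rather than from a dictionary.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    metric: str       # hypothetical metric name, e.g. "error_rate"
    threshold: float  # value above which the metric counts as abnormal
    plan: str         # name of the pre-written handling plan to execute


# Hypothetical rules; real thresholds would come from the service's history.
RULES = [
    Rule("error_rate", 0.05, "roll_back_last_release"),
    Rule("cpu_usage", 0.90, "add_capacity"),
    Rule("packet_loss", 0.02, "switch_traffic_to_other_room"),
]


def plans_to_run(sample, rules=RULES):
    """Return the plans whose metric exceeds its threshold in this sample.

    Metrics missing from the sample are treated as 0.0 (not abnormal).
    """
    return [r.plan for r in rules if sample.get(r.metric, 0.0) > r.threshold]


if __name__ == "__main__":
    print(plans_to_run({"error_rate": 0.08, "cpu_usage": 0.40}))
```

Keeping the rules as data rather than code means a plan can be added or a threshold tuned without touching the checking logic, and the returned plan names can be handed either to an automatic executor or to the engineer on duty.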

  • Product performance/cost optimization
    The most important aspect of a product's external service is user experience, and the most important parts of user experience are availability and response speed. Supporting a highly available, fast user experience with the most reasonable amount of resources (machines, bandwidth, and so on) is also an important responsibility of the operation and maintenance engineer.


  • Product offline
    Internet products that develop well stay online indefinitely, but Internet products iterate rapidly, and many incubated products are eventually eliminated. These products need to be taken offline. At this stage, the operation and maintenance engineer mainly handles resource recovery, returning machines, network capacity, and other resources to the resource pool for other services to use.

Technical direction of operation and maintenance

  • The responsibilities of operation and maintenance across the product life cycle are important and broad, but the engineer's work does not stop there. Engineers also need to summarize the problems met in daily work, distill them into technical directions, and develop the related tools and platforms to support and optimize the business and make operation and maintenance more efficient. The related technical work mainly includes:
  • Service monitoring technology: developing and applying monitoring platforms, and ensuring that service monitoring is accurate, timely, and comprehensive
  • Service fault management: designing fault-handling plans, executing plans automatically, and summarizing faults and feeding them back into product and system design to improve product stability
  • Service capacity management: measuring service capacity and planning machine room construction, capacity expansion, migration, etc.
  • Service performance optimization: improving service performance and response speed from every direction, including network, operating system, application, and client optimization, to improve user experience
  • Service global traffic scheduling: taking in the service's traffic and distributing it across machine rooms according to their capacity and service status
  • Service task scheduling: scheduling, triggering, and monitoring the status of the service's scheduled and ad hoc tasks
  • Service security: including service access security, anti-attack, authority control, etc.
  • Data transmission technology: researching, developing, and applying transmission technologies such as P2P, and solving problems such as transferring large volumes of data over long distances
  • Automatic service release and deployment: developing deployment platforms and tools, and using them to release services safely and efficiently
  • Service cluster management: including service server management, large-scale cluster management, etc.
  • Service cost optimization: reduce the resources used by the service operation as much as possible, and reduce the service operation cost
  • Database Management (DBA): Through the design, development and management of high-performance database clusters, database services are made more stable, more efficient and easier to manage.
  • Platform-based development: developing and managing platforms similar to Docker and Google Borg, and the technology for onboarding services onto them
  • Development and optimization of distributed storage platforms: developing distributed storage platforms similar to Google GFS and onboarding services onto them
  • And so on. All work related to service quality, efficiency, cost, and security, together with the technologies, components, tools, and platforms involved, falls within the technical scope of operation and maintenance. Doing each technical direction well and completing the corresponding components, tools, and platforms helps operation and maintenance fulfill its responsibilities and plays a key role in the development of the business.
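
As one concrete illustration of the global traffic scheduling direction listed above, the sketch below splits incoming traffic across machine rooms in proportion to their capacity and skips rooms that are unhealthy. The room names, capacities, and the proportional policy are assumptions made for the example; real schedulers also weigh factors such as latency and cost.

```python
def schedule_traffic(total_qps, rooms):
    """Split total_qps across healthy rooms in proportion to capacity.

    rooms maps a room name to a (capacity_qps, healthy) pair.
    """
    healthy = {name: cap for name, (cap, ok) in rooms.items() if ok and cap > 0}
    total_capacity = sum(healthy.values())
    if total_capacity == 0:
        raise RuntimeError("no healthy room available")
    if total_qps > total_capacity:
        # Demand exceeds what the healthy rooms can carry: a capacity problem.
        raise RuntimeError("demand exceeds healthy capacity")
    return {name: total_qps * cap / total_capacity for name, cap in healthy.items()}


if __name__ == "__main__":
    # Hypothetical rooms: room_c is marked unhealthy and receives no traffic.
    rooms = {
        "room_a": (6000, True),
        "room_b": (3000, True),
        "room_c": (3000, False),
    }
    print(schedule_traffic(4500, rooms))
```

Raising an error when demand exceeds healthy capacity is a deliberate choice in this sketch: it turns a scheduling request into a capacity-management signal instead of silently overloading a room.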

Skills and qualities

  • Operation and maintenance is built on technology and uses technology to ensure that products deliver higher-quality service. Its responsibilities and its position in the business mean that an operation and maintenance engineer needs both broad knowledge and deep technical ability:
  • Solid basic computer knowledge, including computer system architecture, operating system, network technology, etc.;
  • Familiarity with common applications and infrastructure, including operating systems, networks, security, storage, CDN, and databases, together with an understanding of the principles behind them;
  • Programming ability: good programming skills are required for everything from small operation and maintenance tools to large operation and maintenance systems and platforms;
  • Data analysis ability: able to sort and analyze various data of system operation, find problems and find solutions;
  • Rich system knowledge, including system tools, typical system architecture, common platform selection, etc.;
  • Ability to comprehensively utilize tools and platforms;
  • The complexity of the work also places demands on the soft skills of operation and maintenance engineers:
  • Time management ability, especially the processing ability of fragmented time;
  • A calm temperament: staying composed in an emergency;
  • Communication and teamwork: much of the work crosses departments and job types, so engineers need to communicate well and work well in a team;
  • Being both bold and careful: bold enough to innovate and leave the usual path, especially in newer styles of operation and maintenance where innovation drives progress; careful, because operation and maintenance engineers hold the highest privileges on production systems;
  • Initiative and execution: actively learning operation and maintenance technology from home and abroad and applying it at work to improve quality and efficiency.

Development method

  • The daily work of a business operation and maintenance engineer includes:
  • Monitor online service quality
  • Responding to abnormalities/handling sudden failures
  • Online release/upgrade products
  • Coordinate with the development and testing of corresponding product lines to deal with product problems
  • Distilling experience: based on the problems met at work and on data analysis, turn operation and maintenance experience into methodologies, tools, systems, and platforms; draw up improvement plans and carry them out in the various technical directions; and feed the results back into daily work to improve operation and maintenance efficiency and product value.

Platform tools

  • The operation and maintenance platform and tools used by operation and maintenance engineers include:

  • Web server: apache, tomcat, nginx, lighttpd
    apache : Apache HTTP Server is one of the most widely used web server programs in the world. It runs on almost all widely used computer platforms, and its cross-platform support and security have made it one of the most popular server-side web programs. It is fast, reliable, and extensible through a simple API, which allows interpreters such as Perl and Python to be compiled into the server.
    tomcat : Tomcat began as a core project of the Apache Software Foundation's Jakarta project and was developed jointly by Apache, Sun, and other companies and individuals. Thanks to Sun's participation and support, new Servlet and JSP specifications have been reflected in Tomcat promptly; Tomcat 5, for example, supports the Servlet 2.4 and JSP 2.0 specifications. Because Tomcat is technically advanced, stable, and free of charge, it is popular with Java developers and recognized by software vendors, which has made it a popular web application server.
    nginx : Nginx (engine x) is a high-performance HTTP server and reverse proxy that can also proxy IMAP/POP3/SMTP mail traffic. It was developed by Igor Sysoev for Rambler.ru (Russian: Рамблер), then the second most visited site in Russia. The first public version, 0.1.0, was released on October 4, 2004.
    lighttpd : Lighttpd is an open source web server whose development is led from Germany. Its goal is to provide a secure, fast, standards-compliant, and flexible web server environment for high-performance websites. It has very low memory overhead, low CPU usage, good performance, and a rich set of modules.

  • Monitoring: nagios, ganglia, cacti, zabbix
    nagios : Nagios is a free, open source monitoring tool that tracks the status of Windows, Linux, and Unix hosts, network equipment such as switches and routers, printers, and more. When a system or service enters an abnormal state, it immediately notifies the website's operation and maintenance staff by email or SMS, and it sends a recovery notification by email or SMS once the state returns to normal.
    Ganglia : Ganglia is an open source cluster monitoring project started at UC Berkeley and designed to measure thousands of nodes. Its core consists of gmond, gmetad, and a web front end. It is mainly used to monitor system performance metrics such as CPU, memory, disk utilization, I/O load, and network traffic. Its graphs make the working state of each node easy to see, which helps in adjusting and allocating system resources sensibly and in improving overall system performance.
    cacti : Cacti is a graphical network traffic monitoring and analysis tool built on PHP, MySQL, SNMP, and RRDTool.
    zabbix : Zabbix ([ˈzæbiks]) is an enterprise-class open source solution that provides distributed system monitoring and network monitoring through a web interface.
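
The alert-then-recovery behavior described for Nagios above (notify when a check turns abnormal, notify again when it returns to normal) can be sketched as follows. This is not how Nagios is implemented; the class name, state labels, and message format are hypothetical, and retry logic is left out.

```python
class StateChangeNotifier:
    """Send a message only when a check changes state, not on every sample."""

    def __init__(self, send):
        self.send = send      # callable that delivers a message (email, SMS, ...)
        self.last_state = {}  # check name -> "OK" or "CRITICAL"

    def report(self, check, ok):
        state = "OK" if ok else "CRITICAL"
        # A check that has never been seen is assumed to have been healthy.
        previous = self.last_state.get(check, "OK")
        self.last_state[check] = state
        if state == previous:
            return  # no change: stay quiet to avoid flooding the on-call engineer
        if state == "CRITICAL":
            self.send(f"PROBLEM: {check} is CRITICAL")
        else:
            self.send(f"RECOVERY: {check} is OK")


if __name__ == "__main__":
    notifier = StateChangeNotifier(print)
    for healthy in (True, False, False, True):
        notifier.report("http", healthy)
```

Notifying on transitions rather than on every failed sample is what keeps a long outage from producing a stream of identical alarms.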

  • Automatic deployment: ansible, sshpt, salt
    ansible : Ansible is an automation tool for operation and maintenance, written in Python. It combines strengths of earlier tools (puppet, cfengine, chef, func, fabric) and supports batch system configuration, batch program deployment, and batch command execution.
    sshpt : sshpt (SSH Power Tool) is a Python tool that runs commands and copies files over SSH on many hosts in parallel.
    salt : Salt (SaltStack) is an open source, Python-based tool for remote execution and configuration management across large numbers of servers.

  • Configuration management: puppet, cfengine
    cfengine : CFEngine (configuration engine) is a UNIX management tool that aims to automate simple management tasks and make difficult ones easier. It suits environments ranging from a single host to clusters of tens of thousands of hosts. As of version 2.2, the largest known installation under general management was about 20,000 machines.
    puppet : Puppet is an open source configuration management tool. Administrators declare the desired state of each system, and an agent on every managed node brings the node into that state.

  • Load balancing: lvs, haproxy, nginx
    lvs : LVS (Linux Virtual Server) is a virtual server cluster system that implements load-balancing clusters on Unix/Linux platforms. The project was founded by Dr. Zhang Wensong in May 1998.
    haproxy : HAProxy is a free and open source software written in C language that provides high availability, load balancing, and application proxy based on TCP and HTTP.
    nginx : Described under Web server above. As a reverse proxy, nginx can also distribute incoming requests across a group of backend servers.
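
To make the idea of load balancing concrete, here is a minimal weighted round-robin sketch. It does not reproduce the scheduling algorithms of LVS, HAProxy, or nginx; the class, its methods, and the backend addresses are hypothetical and only show the general technique of rotating through backends while skipping ones marked as down.

```python
import itertools


class RoundRobinBalancer:
    """Rotate through backends; a backend with weight 2 appears twice per round."""

    def __init__(self, backends):
        # backends: list of (address, weight) pairs with integer weights
        expanded = [addr for addr, weight in backends for _ in range(weight)]
        if not expanded:
            raise ValueError("at least one backend with positive weight is required")
        self._slots = len(expanded)
        self._cycle = itertools.cycle(expanded)
        self._down = set()

    def mark_down(self, addr):
        self._down.add(addr)

    def mark_up(self, addr):
        self._down.discard(addr)

    def pick(self):
        # Try each slot at most once per call so a fully failed pool raises
        # instead of looping forever.
        for _ in range(self._slots):
            addr = next(self._cycle)
            if addr not in self._down:
                return addr
        raise RuntimeError("no healthy backend")


if __name__ == "__main__":
    lb = RoundRobinBalancer([("10.0.0.1:80", 2), ("10.0.0.2:80", 1)])
    print([lb.pick() for _ in range(6)])
```

Expanding weights into repeated slots is the simplest way to express weighting; it sends requests to a heavy backend in bursts, which smoother weighted algorithms avoid.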

  • Transfer Tool: Scribe, Flume
    Scribe : Scribe is a log collection system open-sourced by Facebook, where it has been widely used internally. It collects logs from various log sources and stores them on a central storage system (such as NFS or a distributed file system) for centralized statistical analysis and processing.
    flume : Flume is a highly available, highly reliable, distributed system from Cloudera for collecting, aggregating, and transporting massive volumes of log data. Flume supports custom data senders in the logging system for collecting data, and it can also apply simple processing to the data and write it to a variety of customizable data receivers.

  • Backup tools: rsync, wget
    rsync : rsync (remote sync) is a data mirroring and backup tool for Linux systems. It performs fast incremental transfers and can copy files locally or synchronize them with remote hosts over SSH or the rsync protocol.
    wget : wget is a free tool that automatically downloads files from the Internet. It supports downloading through the three most common TCP/IP protocols of HTTP, HTTPS, and FTP, and can use HTTP proxy. The name "wget" comes from the combination of "World Wide Web" and "get".

  • Database: mysql, oracle, sqlserver
    mysql : MySQL is a relational database management system (RDBMS) developed by the Swedish company MySQL AB and now owned by Oracle. It is one of the most popular relational database management systems and is especially widely used in web applications.
    oracle : Oracle Database, also called Oracle RDBMS or simply Oracle, is Oracle Corporation's relational database management system and has long held a leading position in the database field. It is portable, easy to use, and feature rich, runs in environments from microcomputers to large systems, and offers an efficient, reliable, high-throughput database solution.
    sqlserver : SQL Server is a relational database management system (RDBMS) developed and promoted by Microsoft. SQL (Structured Query Language) is the language used to communicate with databases; ANSI (American National Standards Institute) defines it as the standard language for relational database management systems.

  • Distributed platforms: hdfs, mapreduce, spark, storm, hive
    hdfs : The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. It has much in common with existing distributed file systems, but the differences are also significant. HDFS is highly fault tolerant and suited to deployment on inexpensive machines. It provides high-throughput data access, which makes it a good fit for applications with large data sets. HDFS relaxes some POSIX requirements to allow streaming access to file system data. It was originally built as infrastructure for the Apache Nutch search engine project and is part of the Apache Hadoop Core project.
    mapreduce : MapReduce is a parallel computing model and method for large-scale data processing that was first proposed by Google. Google's original intention of designing MapReduce was mainly to solve the parallel processing of large-scale web page data in its search engine. After inventing MapReduce, Google first used it to rewrite the Web document index processing system in its search engine. However, because MapReduce can be universally applied to many large-scale data calculation problems, since the invention of MapReduce, Google has further applied it to many large-scale data processing problems. There are tens of thousands of different algorithm problems and programs in Google that use MapReduce for processing.
    spark : Apache Spark is a fast, general-purpose computing engine designed for large-scale data processing. It is an open source, general parallel framework similar to Hadoop MapReduce, developed by the AMP Lab at the University of California, Berkeley. Spark keeps the advantages of Hadoop MapReduce, but unlike MapReduce it can hold the intermediate output of a job in memory instead of reading and writing HDFS, so it is better suited to iterative algorithms such as those used in data mining and machine learning.
    Storm : Storm makes it easy to write and scale complex real-time computations on a cluster of computers. Storm is to real-time processing what Hadoop is to batch processing. It guarantees that every message will be processed, and it is fast: a small cluster can process millions of messages per second. It also allows development in any programming language.
    hive : Hive is a data warehouse tool built on Hadoop that is used to extract, transform, and load data, and it provides a mechanism for storing, querying, and analyzing large-scale data held in Hadoop. Hive maps structured data files to database tables and offers SQL query capability by converting SQL statements into MapReduce jobs. Its advantage is a low learning cost: simple MapReduce statistics can be produced quickly with SQL-like statements, with no need to write dedicated MapReduce applications. Hive is well suited to statistical analysis in a data warehouse.
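
The MapReduce model mentioned above is easier to picture with a tiny single-process example. The word count below imitates the map, shuffle, and reduce phases in plain Python; it is a teaching sketch with hypothetical function names, not Hadoop code, and it runs on one machine rather than a cluster.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["a b a", "b c"]))
```

In a real framework the map calls run in parallel on different machines and the shuffle moves data across the network; the point of the model is that the programmer writes only the map and reduce functions.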

  • Distributed databases: hbase, cassandra, redis, MongoDB
    hbase : HBase is a distributed, column-oriented open source database whose design derives from the Google paper "Bigtable: A Distributed Storage System for Structured Data" by Fay Chang et al. Just as Bigtable builds on the distributed storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Hadoop. HBase is a sub-project of the Apache Hadoop project. Unlike a typical relational database, it is suited to storing unstructured data, and it is column-based rather than row-based.
    cassandra : Cassandra is an open source distributed NoSQL database system. It was originally developed by Facebook to store simply structured data such as inbox messages, and it combines the data model of Google Bigtable with the fully distributed architecture of Amazon Dynamo. Facebook released Cassandra as open source in 2008. Since then, thanks to its good scalability, it has been adopted by well-known Web 2.0 sites such as Digg and Twitter and has become a popular distributed store for structured data.
    redis : Redis (Remote Dictionary Server) is an open source key-value database written in ANSI C. It is accessed over the network, keeps data in memory with optional log-based persistence, and offers client APIs in many languages. VMware sponsored Redis development from March 15, 2010, and Pivotal has sponsored it since May 2013.
    mongodb : MongoDB sits between relational and non-relational databases: among non-relational databases it is the most feature rich and the most similar to a relational database. It supports a very loose data structure, the JSON-like BSON format, so it can store fairly complex data types. Its most notable feature is a powerful query language, with a syntax somewhat like an object-oriented query language, that covers most of what single-table queries do in a relational database and also supports indexes on the data.

  • Container: lxc, docker
    lxc : LXC is short for Linux Containers. It provides lightweight virtualization that isolates processes and resources without the instruction interpretation and other complexities of full virtualization. The idea is loosely comparable to a namespace in C++. A container divides the resources managed by a single operating system into isolated groups, which helps balance conflicting resource demands between those groups.
    docker : Docker is an open source application container engine. Developers package an application and its dependencies into a portable image and then run it on any popular Linux or Windows machine; it can also be used for virtualization. Containers are fully sandboxed and have no interfaces to one another.

  • Virtualization: openstack, xen, kvm
    openstack : OpenStack is a cloud platform management project rather than a single piece of software. It consists of several main components, each of which handles specific tasks. OpenStack is an open source project that aims to provide software for building and managing public and private clouds. Its community has included more than 130 companies and 1,350 developers, who use OpenStack as a common front end for infrastructure-as-a-service resources. The project's primary goal is to simplify cloud deployment and make it scale well.
    xen : Xen is an open source virtual machine monitor developed at the University of Cambridge. It aims to run up to 100 full-featured operating systems on a single computer. Guest operating systems must be explicitly modified ("ported") to run on Xen, although compatibility with user applications is preserved. This lets Xen achieve high-performance virtualization without special hardware support.
    kvm : In the virtualization context, KVM stands for Kernel-based Virtual Machine; it should not be confused with keyboard-video-mouse (KVM) switches used for remote console access, which share the abbreviation. KVM is a virtualization module in the Linux kernel that turns the kernel into a hypervisor. It relies on hardware virtualization extensions such as Intel VT-x or AMD-V and runs unmodified guest operating systems.

  • Security: kerberos, selinux, acl, iptables
    kerberos : Kerberos is a network authentication protocol designed to provide strong authentication for client/server applications through a key-based system. Authentication does not depend on the host operating system's own authentication, does not require trust based on host addresses, and does not require every host on the network to be physically secure; it assumes that packets on the network can be read, modified, and inserted at will. Under these conditions, Kerberos acts as a trusted third-party authentication service and performs authentication with conventional cryptography such as shared keys.
    selinux : SELinux (Security-Enhanced Linux) is an implementation of mandatory access control for Linux developed by the US National Security Agency (NSA) with the help of the Linux community. Under its restrictions, a process can access only the files it needs for its task. SELinux is installed by default on Fedora and Red Hat Enterprise Linux and is available as a package on other distributions.
    acl : Access control list (ACL) is an access control technology based on packet filtering. It can filter data packets on the interface according to set conditions, allowing them to pass or discard. Access control lists are widely used in routers and Layer 3 switches. With the aid of access control lists, users' access to the network can be effectively controlled, thereby ensuring network security to the greatest extent.
    iptables : iptables is the user-space tool for configuring the IP packet filtering rules of the netfilter framework built into the Linux kernel. Whether a Linux system is connected to the Internet or a LAN, acts as a server, or serves as a proxy between a LAN and the Internet, iptables allows finer control over IP packet filtering and firewall configuration.
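
Packet-filtering access control of the kind described for ACLs above usually works on a first-match basis: rules are checked in order, and the first rule that matches decides whether the packet is permitted or dropped. The sketch below shows that logic with Python's standard ipaddress module. The rule list and the implicit final deny are assumptions for the example, and no real router or iptables syntax is implied.

```python
import ipaddress

# Hypothetical ordered rules: (action, source network, destination port).
RULES = [
    ("permit", "10.0.0.0/8", 22),  # internal hosts may use SSH
    ("deny", "0.0.0.0/0", 22),     # everyone else may not
    ("permit", "0.0.0.0/0", 80),   # HTTP is open to all
]


def decide(src_ip, dst_port, rules=RULES):
    """Return the action of the first matching rule, or "deny" if none match."""
    src = ipaddress.ip_address(src_ip)
    for action, network, port in rules:
        if dst_port == port and src in ipaddress.ip_network(network):
            return action
    return "deny"  # implicit default: drop anything not explicitly permitted


if __name__ == "__main__":
    for ip, port in [("10.1.2.3", 22), ("8.8.8.8", 22), ("8.8.8.8", 80)]:
        print(ip, port, decide(ip, port))
```

Because the first match wins, rule order matters: placing the broad deny for port 22 before the permit for 10.0.0.0/8 would lock out the internal hosts as well.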

  • Problem tracing: netstat, top, tcpdump, last
    netstat : netstat is a console command and a very useful tool for monitoring TCP/IP networks. It displays the routing table, active network connections, and the status of each network interface. It also shows statistics for the IP, TCP, UDP, and ICMP protocols and is commonly used to check the network connections on each port of a machine.
    top : The top command is one of the most popular Unix/Linux performance tools. System administrators can run the top command to monitor the process and overall performance of Linux.
    tcpdump : tcpdump captures the packets transmitted on a network in full so that they can be analyzed. It supports filtering by network layer, protocol, host, network, or port, and provides logical operators such as and, or, and not to help remove irrelevant traffic.
    last : last lists the users who have logged in since the login record file was created, which on many systems means since the start of the month because the file is rotated monthly.

Broadly speaking, any open source software can serve as a platform or tool for operation and maintenance engineers, as can the platforms that operation and maintenance teams develop themselves in the various technical directions.

Career Development

By way of working, operation and maintenance engineers fall into several categories:

  • Operation and maintenance engineer/operation and maintenance development engineer:
    Responsible for operating and maintaining a specific product line, and also needs development skills. This engineer goes deep into the business, understands its pain points and problems better than anyone, and develops or optimizes platforms, tools, and methods accordingly. The role gives exposure to many well-designed system architectures and builds the ability to compare their strengths and weaknesses. How well the engineer masters the business determines the part they play in its development. The long-term path is to become an architect of large systems.

  • Operation and maintenance platform R&D engineer:
    Specializes in developing common operation and maintenance platforms and technologies. The role requires some product-line operation and maintenance experience, or the ability to gather operation and maintenance requirements from product lines. It demands strong R&D skills and rigorous system design, along with an understanding of user needs, in order to build operation and maintenance products that suit the services and give operation and maintenance engineers a good experience. The long-term path is to become a technical expert in a vertical technical field.

  • Database R&D engineer/database engineer:
    The database direction is a relatively specialized area of operation and maintenance technology. Because databases are so important to the business, dedicated positions are usually required, and the industry has deep research and accumulated experience in this area. The main directions include database kernels and cloud databases. The long-term path is to become a technical expert in the database field or a database architect.

  • Operation and maintenance manager:
    In their daily work, operation and maintenance engineers usually need to coordinate with multiple R&D (RD) and QA colleagues, which demands strong coordination skills and the ability to drive work forward. Engineers who combine technical depth with these abilities are well suited to move into management. The long-term path leads to management positions in the technical department, up to CTO or even CEO.

Once engineers in any of these directions reach a certain level, the boundaries between them blur. They need strong operation and maintenance, architecture, programming, and algorithm skills all at once, which makes this a very demanding profession.

Operation and maintenance industry prospects

  • From an industry perspective, as the Internet in China develops rapidly, websites grow in scale, and architectures become more complex, the need for full-time website operation and maintenance engineers and website architects is becoming more and more urgent. Experienced, outstanding operation and maintenance personnel are in especially high demand, and their value grows with experience.
  • From a personal point of view, the technical depth and requirements of operation and maintenance work keep rising. Operation and maintenance engineers are also the people most familiar with the company's applications and architecture, and they are receiving more and more attention.
  • Internet operation and maintenance is a comprehensive technical position that spans multiple disciplines (network, system, development, security, application architecture, storage, etc.), which gives operation and maintenance engineers ample room to develop their skills and technical abilities.
  • Experience in operation and maintenance work is becoming very important and forms part of an individual's core competitiveness. Excellent operation and maintenance engineers are very good at solving problems at every level and can think about the system as a whole.
  • Because the operation and maintenance position covers such a broad range of knowledge, it is easy to cultivate or apply a personal specialty or interest, such as the kernel, networking, development, or databases, and to become deeply proficient in that area.
  • At present, demand for operation and maintenance talent at home and abroad is very urgent, and the salaries of operation and maintenance engineers have risen accordingly, to a level equal to or even higher than those in technical departments such as R&D and testing.

International conferences

The following are some international conferences related to operation and maintenance, where operation and maintenance and related technologies are discussed and shared.

Operation and maintenance related international conferences
NSDI '14
Percona Live
O'Reilly Velocity
fcw'14
LISA '14
35th IEEE S&P 2014
SIGMOD/PODS ’14
OSDI '14
OOW '13
SREcon

There is far more to operation and maintenance than can be covered here. What we know is only the tip of the iceberg, and the field has no limits.

Origin blog.csdn.net/qq_49296785/article/details/108789537