How can a large-scale, high-quality financial knowledge graph be built automatically?

A knowledge graph (KG) is, at large scale, essentially a semantic network containing entities, concepts, and the semantic relationships among them. It has become a cornerstone of cognitive intelligence and a core technology in the development of artificial intelligence: it lets machines understand language, makes explainable AI possible, can significantly enhance machine learning, and offers an important way to move beyond the limits of purely data-driven approaches.

In recent years, AI technologies represented by knowledge graphs have been reaching more and more fields, and many companies have elevated AI to a core business strategy. In the financial sector, AI technologies such as knowledge graphs are now enabling more and more business and making finance smarter.


Challenges and opportunities of financial KGs

In early explorations of financial knowledge graph construction, it was believed that only things closely related to stocks, futures, listed companies, and finance mattered. In practice, however, almost everything is related to finance in some sense: a sudden tornado in a region may affect crop yields, which in turn affects shipments of agricultural machinery, and ultimately the share price of a listed company that produces agricultural engines.

This kind of association analysis is exactly what we expect from financial intelligence, and such deep correlation analysis easily goes beyond the boundaries of any knowledge system defined in advance by experts. So, in a sense, building a financial knowledge base means building a base of general associated knowledge, and it faces the same challenges as a general-purpose knowledge base.

In addition, the diverse needs of very complex business systems at huge scale, and the limited resources a company can invest, bring further challenges to building a financial knowledge graph, especially on the data side: once you narrow down to a specific financial scenario, the data may be very sparse, unevenly distributed, and of poor quality.

On the other hand, financial scenarios that lack data often have knowledgeable experts, and scenarios with little structured data often have rich text data; these are opportunities for a financial knowledge graph. Together with the current variety of deep-learning methods and the high-quality knowledge graphs already available on the Internet that can be fully reused, they provide favorable conditions for graph construction.


Does building a large-scale financial KG require automation?

Knowledge graphs grew out of traditional knowledge engineering. In the late 1970s, knowledge engineering relied mainly on domain experts to describe an ontology for a field and to express and acquire knowledge by hand. Clearly, the data in today's financial scenarios is far too large for that; an automated, data-driven, bottom-up approach is needed to build the graph efficiently.

Building a knowledge graph involves three key elements: people, the initiators of the whole construction effort, who label data and perform final validation; models, the now widely adopted construction method, mainly machine learning models; and data, the labeled or unlabeled data that the models consume.

Automated construction of a large-scale knowledge graph must balance these three elements: control labor costs and acquire knowledge at scale while ensuring the graph's quality, so as to build a general-purpose, lightweight, inexpensive knowledge graph. Based on current experience in industry and academia, here are some basic principles.


End-to-end models beat pipeline solutions

"End to end" means the whole process, from raw data input to task output, for both training and prediction, is done inside a single model. A pipeline solution, by contrast, decomposes the task into multiple stages executed in sequence, each consuming the previous stage's output; this easily causes errors to accumulate and propagate across stages, hurting final accuracy. When accuracy is similar, prefer the end-to-end solution: it reduces the labor cost of feature engineering and avoids error propagation.


With huge amounts of data, unsupervised methods are more appropriate

The choice between supervised and unsupervised methods is conditional: when huge amounts of data are available, unsupervised methods are more appropriate. In recent years, industry has developed unsupervised methods for large-scale vocabulary mining, with good results especially in entity recognition. Integrating a variety of statistical features is critical to good results; note that here the features matter more than the model.

Making any field intelligent usually begins with mining that field's vocabulary, and the financial sector is no exception. This is similar to how people learn: to get to know a new area, a person first learns its basic vocabulary and the relations between terms, such as hypernyms and hyponyms, synonyms, and abbreviations. Giving machines this vocabulary knowledge usually requires unsupervised methods, because many scenarios lack labeled data but have abundant text; as long as the text is large enough, unsupervised methods can mine the field's vocabulary efficiently and accurately.
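The article does not give a concrete method, but one common unsupervised vocabulary-mining technique scores adjacent word pairs by pointwise mutual information (PMI) and keeps high-scoring pairs as candidate domain terms. The sketch below is a toy illustration of that idea only; production systems also use left/right branching entropy, longer n-grams, and frequency filtering.

```python
import math
from collections import Counter

def mine_phrases(corpus, min_pmi=1.0, min_count=2):
    """Toy unsupervised phrase miner: score adjacent word pairs by PMI
    and keep the high-scoring, sufficiently frequent ones as candidate
    domain vocabulary."""
    unigrams, bigrams = Counter(), Counter()
    total = 0
    for sent in corpus:
        words = sent.split()
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
        total += len(words)
    phrases = {}
    for (a, b), n_ab in bigrams.items():
        if n_ab < min_count:
            continue
        # PMI = log P(a,b) / (P(a) * P(b))
        pmi = math.log((n_ab / total) / ((unigrams[a] / total) * (unigrams[b] / total)))
        if pmi >= min_pmi:
            phrases[f"{a} {b}"] = round(pmi, 2)
    return phrases
```

Run over a large financial corpus, pairs like "interest rate" surface as domain terms because they co-occur far more often than chance predicts.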


Make full use of behavioral data

Besides mining text and converting existing relational tables, a knowledge graph can also be built by mining user behavior data. E-commerce, search, and similar scenarios have rich user behavior data, and search logs in particular can help establish relationships between words. For example, people who search for "Fintech" tend to open documents about financial technology, so "Fintech" is most likely a synonym of "financial technology".

Many companies have internal search platforms, which have the same value for building a knowledge graph. Fully mining user behavior data such as search logs to extract relationships between terms helps construct the graph.
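As a toy sketch of the click-log idea (my illustration, not a method from the article): if two queries lead users to click largely the same set of documents, they are candidate synonyms. Here the overlap is measured with Jaccard similarity over click sets.

```python
from collections import defaultdict

def synonym_candidates(click_log, min_jaccard=0.5):
    """Toy search-log miner: queries whose users click mostly the same
    documents are likely synonyms (e.g. "fintech" vs "financial
    technology"). click_log is a list of (query, clicked_doc) pairs."""
    clicks = defaultdict(set)
    for query, doc in click_log:
        clicks[query].add(doc)
    pairs = []
    queries = sorted(clicks)
    for i, q1 in enumerate(queries):
        for q2 in queries[i + 1:]:
            inter = clicks[q1] & clicks[q2]
            union = clicks[q1] | clicks[q2]
            if len(inter) / len(union) >= min_jaccard:
                pairs.append((q1, q2))
    return pairs
```

Real systems would weight clicks by frequency and dwell time and filter out navigational queries, but the core signal is the same.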

 
Statistical models need to be combined with symbolic knowledge

Compared with a statistical model alone, combining statistical models with symbolic knowledge is more effective. The financial sector has rich symbolic knowledge, such as expert rules, and this knowledge can enhance the statistical model. For example, when tagging an entity we can impose some prior constraints; to give a simple example, if X is a person, X is certainly not a book, and if X is an entrepreneur, X is certainly a person. Such constraints are essentially symbolic knowledge, and making full use of structured prior knowledge in the form of constraints is a key idea for effectively improving model performance.
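One simple way to realize the person/book constraint above (a post-hoc sketch of my own, not the article's implementation; real systems often encode such rules as soft constraints during training or inference, e.g. with constrained decoding) is to zero out label scores that a symbolic rule forbids given the model's top prediction:

```python
def apply_constraints(scores, exclusions):
    """Toy constraint layer: zero out label scores that a symbolic rule
    forbids given the top-scoring label. E.g. if the model is most
    confident the entity is a Person, the Book label is excluded,
    because a person cannot be a book."""
    top = max(scores, key=scores.get)
    adjusted = dict(scores)
    for banned in exclusions.get(top, []):
        adjusted[banned] = 0.0
    return adjusted
```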

Symbolic knowledge can also be used to construct attention mechanisms. Attention is now important in deep learning models; simply put, it assigns weights. For example, in "She has been with Apple for ten years", when we tag the word "Apple", the "mobile phone company" label is more suitable than the "fruit" label. Using symbolic knowledge to construct attention mechanisms inside a deep model helps achieve genuine knowledge guidance and better results.
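A minimal sketch of knowledge-guided attention, under my own simplifying assumptions (a real model would learn these weights inside a neural network): raw attention scores over context words are boosted by a prior whenever the word appears in a symbolic lexicon, then renormalized. For the "Apple" example, an employment lexicon containing words like "with" pushes weight toward context that signals the company sense.

```python
def knowledge_attention(context_scores, prior_terms, boost=2.0):
    """Toy attention reweighting: multiply the raw score of each context
    word by a boost when it appears in a symbolic knowledge lexicon,
    then renormalize so the weights sum to 1."""
    raw = {w: s * (boost if w in prior_terms else 1.0)
           for w, s in context_scores.items()}
    z = sum(raw.values())
    return {w: s / z for w, s in raw.items()}
```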


Indirect knowledge guidance beats purely direct data-driven learning

Deep learning models are essentially directly data-driven, but in some cases it works better to first mine patterns from the data and then integrate those patterns into the deep model. For example, some work on relation extraction models the task purely as relation classification; but one can also mine descriptive keywords from the corpus, using a topic model to extract keywords related to each relation label, and use those keywords to enrich the relation label descriptions, thereby significantly improving the accuracy of relation extraction.
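To make the keyword idea concrete, here is a toy stand-in of my own devising: mined trigger keywords per relation label vote directly on a sentence. The article's approach feeds such keyword features into a neural classifier; this sketch only shows why the mined keywords carry signal.

```python
def classify_relation(sentence, relation_keywords):
    """Toy keyword-enhanced relation classifier: count how many mined
    trigger keywords for each relation label appear in the sentence and
    pick the label with the most hits (ties broken alphabetically)."""
    words = set(sentence.lower().split())
    best, best_hits = None, -1
    for relation in sorted(relation_keywords):
        hits = len(words & relation_keywords[relation])
        if hits > best_hits:
            best, best_hits = relation, hits
    return best
```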


Graph models can enhance explainability

Graph models are general-purpose, highly expressive, interpretable, controllable, and easy to adjust. Interpretability determines whether humans can trust the decisions of an AI system. In financial scenarios such as smart investment decisions, even if the AI's decisions are more than 90% accurate, the investment manager or user will probably still hesitate if the system cannot give reasons for its decisions.

 

Expert knowledge can serve as seed samples

On the question of labeled samples: when you have both an expert-built knowledge system and automated construction, using small-scale expert knowledge as seed samples for the data-driven process is an important idea that effectively reduces manual annotation and lowers the cost of model construction.
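A common realization of this idea is distant supervision. The sketch below is my illustration, not the article's system: a handful of expert-curated (entity, label) seed pairs auto-label raw sentences, producing training data with no manual annotation (at the cost of some label noise, which later filtering must handle).

```python
def distant_label(sentences, seed_pairs):
    """Toy distant supervision: expert seed pairs (entity -> label)
    auto-label every raw sentence that mentions a seed entity, yielding
    (sentence, entity, label) training examples."""
    labeled = []
    for sent in sentences:
        for entity, label in seed_pairs.items():
            if entity in sent:
                labeled.append((sent, entity, label))
    return labeled
```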

In addition, for practical deployment, a composite architecture is very important, for example combining statistical models with rules to handle the challenge that unevenly distributed samples pose to a single model. Crowdsourced validation is essential, because some knowledge can only be verified as correct or not by humans. For updating the knowledge graph, Internet hot spots can drive updates: only entities trending on the Internet are likely to have changing facts, while entities that are not trending (such as the historical figure Qin Shi Huang) generally do not change.


Source: www.cnblogs.com/chenyusheng0803/p/12109839.html