13 pitfalls I encountered when calling third-party interfaces

Foreword

In day-to-day work, our projects often need to call third-party APIs to fetch or report data.

So what problems come up when calling a third-party API, and how can they be solved?

This article walks through the pitfalls I have run into with third-party APIs. I hope it will be helpful to you.


1 The domain name cannot be accessed

Generally, when we integrate with a third-party platform's API for the first time, we first call it from a browser or Postman to see whether the interface is reachable.

Some people may think this step is unnecessary.

Actually, it is not.

It is possible that when you call the third-party API, their interface really is down and they don't know it yet.

Another, more important question is whether your work network can reach this external interface at all.

For security reasons, some companies put the intranet development environment behind a firewall or impose other restrictions, so that only external interfaces on an IP whitelist can be accessed.

If you find that the domain name cannot be reached from the development environment, ask your operations colleagues to add it to the IP whitelist.

2 Signature error

Many third-party API interfaces usually add digital signature (sign) verification in order to prevent others from tampering with data.

sign = md5(multiple parameter concatenation + key)

When first connecting to the third-party platform interface, you will encounter problems such as parameter errors and signature errors.

Among them, parameter errors are easier to solve, and the focus is on the problem of signature errors.

Signatures are generated by some algorithm.

For example: join each parameter name and its value with a colon; if there are multiple parameters, sort them alphabetically by name and concatenate them. Then append the salt (i.e. the key) and run the result through MD5 to generate the signature.
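The example scheme above can be sketched roughly as follows. This is only an illustration: the class and method names are made up, and the separator, sort order, character encoding, and hex casing are assumptions. The real rules always come from the platform's documentation.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper; not taken from any particular platform's SDK.
final class SignUtil {

    // Joins "name:value" pairs, sorted by parameter name in ascending order.
    static String canonical(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : new TreeMap<>(params).entrySet()) {
            sb.append(e.getKey()).append(':').append(e.getValue());
        }
        return sb.toString();
    }

    // sign = md5(sorted parameters + key), returned as lowercase hex.
    static String sign(Map<String, String> params, String key) {
        return md5Hex(canonical(params) + key);
    }

    static String md5Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

Because the parameters are copied into a TreeMap before concatenation, the order in which the caller adds them does not change the result.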

If there are multiple parameters and you sort them in reverse alphabetical order, the generated signature will be wrong.

If you use the development environment's key in the production environment, the signature generated in production will also be wrong.

If the third-party platform requires MD5 to be applied 3 times to produce the final signature but you apply it only once, the signature will be wrong as well.

That is why signatures are one of the more troublesome parts of joint debugging.

Ideally the third-party platform provides an SDK that generates the signature. If not, you have to hand-write the signature algorithm from their documentation.

3 Signature expired

After the previous step, the signature is correct and we can fetch data from the third-party platform normally.

But you may find that when you send the same request again 15 minutes later, it fails.

When the third-party platform designed the interface, it added timestamp verification to the signature: the same request is allowed to return data within 15 minutes, and after that it fails directly.

This design is for security.

It prevents brute-force attacks in which someone uses tools to keep forging signatures and calling the interface; with an exhaustive search, the verification would eventually pass one day.

sign = md5(multiple parameter concatenation + key + timestamp)

Therefore, timestamp verification needs to be added.

If this happens, don't panic; just send a new request.
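To see why a new request works while replaying the old one does not, here is a minimal sketch of the kind of check the platform side might perform. The class name, the millisecond timestamps, and the tolerance for clock skew in both directions are assumptions; only the 15-minute window comes from the text.

```java
import java.time.Duration;

// Hypothetical server-side check; real platforms may implement this differently.
final class TimestampCheck {

    // Validity window from the text; the real value is platform-specific.
    static final Duration WINDOW = Duration.ofMinutes(15);

    // True when the request timestamp is within the window of the server's current time.
    // Math.abs also tolerates a timestamp slightly ahead of the server clock (assumption).
    static boolean isFresh(long requestMillis, long serverNowMillis) {
        return Math.abs(serverNowMillis - requestMillis) <= WINDOW.toMillis();
    }
}
```

A new request carries the current timestamp (and a signature computed over it), so it falls inside the window again.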

4 The interface suddenly does not return data

Suppose you call a third-party platform's API to query data, and at first it always returns data.

Then suddenly one day no data is returned.

Yet the API itself responds normally.

Don't be surprised: the third-party platform may have deleted the data.

After I integrated a third-party API and deployed to the test environment, I found that their interface returned no data. The reason was that one day they had deleted all the data in their test environment.

Therefore, before deploying to the test environment, agree with the other party on which data will be used for testing and must not be deleted.

5 Token invalidation

Some platforms require you to call a separate API to obtain a token before calling their business APIs, and then carry that token in the request header.

To obtain the token, we need to pass in information such as the account, password, and key. This information is different for each integrating party.

Before requesting other APIs, should we call the token interface in real time every time? Or request a token once, cache it in Redis, and read it from Redis afterwards?

Obviously we prefer the latter: if we fetch a token in real time before every request, each business call costs two requests, which hurts performance.

But caching the token in Redis brings another problem: token invalidation.

A token obtained from the third-party platform's token interface generally has a validity period, such as 1 day or 1 month.

During the validity period, the API can be accessed normally. Once the token's validity period has passed, access is denied.

That seems easy to handle: can't we just set the Redis expiration time to match the token's validity period?

The idea is good, but there are problems.

How can you guarantee that the server time of your system is exactly the same as the server time of the third-party platform?

I once integrated with a large company that provided a token interface which returned the same token value for every request made within 30 days. Only after more than 30 days did it return a new one.

Here is what can happen: your system's clock runs fast and the third-party platform's clock runs slow. After 30 days by your clock, the Redis entry expires, your system calls the token interface, still gets the old token, and writes it back to Redis.

A little later the token expires on the platform's side, but your system keeps using the old token to call the platform's other APIs, and every call fails. It would take another 30 days for the cached token to expire and a new one to be fetched, which is far too long.

To solve this, you need to catch the token-invalid error. If a call to another API reports that the token is invalid, immediately call the token interface again and update Redis with the new token.

This can basically solve the token invalidation problem, and also ensure the stability and performance of accessing other interfaces as much as possible.
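The catch-and-refresh idea might look something like the sketch below. Everything here is illustrative: TokenClient and TokenInvalidException are made-up names, the in-memory field merely stands in for the Redis entry, and how a platform actually signals an invalid token (error code, HTTP status, and so on) depends on its documentation.

```java
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical wrapper around calls that need a token.
final class TokenClient {

    // Hypothetical exception a business call throws when the platform rejects the token.
    static class TokenInvalidException extends RuntimeException {}

    private final Supplier<String> tokenFetcher; // calls the platform's token interface
    private volatile String cachedToken;          // stand-in for the Redis entry

    TokenClient(Supplier<String> tokenFetcher) {
        this.tokenFetcher = tokenFetcher;
    }

    <T> T call(Function<String, T> businessCall) {
        if (cachedToken == null) {
            cachedToken = tokenFetcher.get();
        }
        try {
            return businessCall.apply(cachedToken);
        } catch (TokenInvalidException e) {
            // Token rejected: fetch a new one right away, update the cache, retry once.
            cachedToken = tokenFetcher.get();
            return businessCall.apply(cachedToken);
        }
    }
}
```

The business call is retried only once after the refresh, so a platform that keeps rejecting the token cannot trap the caller in a loop.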

6 Interface timeout

After the system goes live, the problem most likely to occur when calling a third-party API is interface timeout.

The link between our system and the external system is long and complicated, and problems at many points along it can affect the API's response time.

As the caller, besides reporting the timeout to the provider and asking them to optimize the interface, a more effective measure on our side may be to add a failure retry mechanism to the interface call.

For example:

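// retryCount counts failed attempts: 1 initial call plus up to 3 retries.
int retryCount = 0;
do {
   try {
      doPost();
      break; // success, stop retrying
   } catch (Exception e) {
      log.warn("API call failed", e);
      retryCount++;
   }
} while (retryCount <= 3);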

If the interface call fails, the program automatically retries immediately, up to 3 times.

If a retry succeeds, the API call counts as a success.

If it still fails after 3 retries, the API call counts as a failure.

7 The interface returns 500

Occasionally, a third-party API returns 500 for certain parameter values.

For example: some APIs do not validate parameters thoroughly, and a few required fields are never checked for being empty.

If a request from our system happens to omit such a parameter, the other party may hit an NPE (NullPointerException), and the interface will most likely return 500.

Another situation is that there is an internal bug in the API interface. Different parameters are passed in, and different conditional branch logic is taken. When a certain branch is taken, the interface logic is abnormal, which may cause the interface to return 500.

In this case, it is useless to retry the interface. You can only contact the third-party API interface provider to report related problems and ask them to investigate the specific reasons.

They might fix the bug, or fix the data, to fix the problem.

8 The interface returns 404

If you find in the system log that a third-party API you call returns 404, that is a nasty pitfall.

If the third-party API has not gone live yet, they have most likely renamed the interface without notifying you in time.

In this case, you are entitled to give them a hard time.

The other situation is that the third-party API is already live and could be called normally at first.

The third party has not changed the interface address.

Then one day you suddenly find that calls to it return 404 anyway.

In this case, the problem is most likely in their gateway: either the latest configuration has not taken effect, or a gateway configuration change caused it.

In a word: a pitfall.

9 The interface returns less data

I once used a third-party API to query data page by page. The integration went smoothly, but after going live I found that data was missing from their interface.

The cause turned out to be that the paging query interface returned an incorrect total page count, smaller than the actual value.

Some readers may be curious how I discovered such a weird problem.

We used to call the third-party API to query category data page by page and save it to our third-party category table.

Suddenly one day, the product team reported that a third-party category could not be found in the category tree.

After I confirmed it, I found that it was really not there.

From the response log of calling the third-party API interface, no data of this category can be found.

This is a paging query interface. The data was spread over more than a dozen pages, but the category we wanted was in none of them.

The previous approach was to call the API once for the first page, which also returned the total page count, and then loop over the remaining pages based on that total.

I guessed at the time that there might be a problem with the total number of pages returned by their interface.

Therefore, the interface call logic can be changed to this:

  • Starting from the first page, increase the page number by 1 on each call. After each call, check whether the number of records returned is smaller than pageSize.

  • If it is not smaller, make the next call.

  • If it is smaller, this is the last page, and the calls can stop.
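The loop described in these steps can be sketched as follows. PageReader and the fetchPage parameter are hypothetical names; fetchPage stands in for the real call to the third-party paging query interface.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

// Hypothetical helper that ignores the reported total page count entirely.
final class PageReader {

    // fetchPage takes (pageNo, pageSize) and returns the records of that page.
    static <T> List<T> fetchAll(BiFunction<Integer, Integer, List<T>> fetchPage, int pageSize) {
        List<T> all = new ArrayList<>();
        int pageNo = 1;
        while (true) {
            List<T> page = fetchPage.apply(pageNo, pageSize);
            all.addAll(page);
            if (page.size() < pageSize) {
                break; // short (or empty) page: this was the last one
            }
            pageNo++;
        }
        return all;
    }
}
```

When the record count is an exact multiple of pageSize, this makes one extra call that returns an empty page, which then ends the loop. A real job would probably also cap the number of pages, in case the interface keeps returning full pages.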

After verification, this approach did retrieve the data for that category, which shows that the total page count returned by the third-party paging query interface was smaller than the actual value.

10 Secretly changed parameters

I once called a platform's API to obtain the status of indicators. According to the agreement between the two parties, there were two statuses: normal and disabled.

We then saved the status to our indicator table.

Later, both systems went live and ran for several months.

Suddenly one day, a user reported that a piece of data had clearly been deleted, yet it could still be found on the page.

I checked the indicator table on our side and found that its status was normal.

Then I checked the log of our calls to the platform's API and found that the returned status of the indicator was: off-shelf.

What?

What status is this?

After talking to the platform's developers, I learned that they had changed the status enumeration, adding several values such as on-shelf and off-shelf, and had not notified us.

That is the pitfall.

Judging from our code here, if the state is not disabled, it is considered to be a normal state.

The off-shelf state is automatically judged as a normal state.
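A small sketch makes the effect visible. The status strings and the class are illustrative only. The first method mirrors the check described above; the second is one possible stricter alternative, which is an assumption of this sketch and not something our code did at the time.

```java
// Hypothetical status check; the real values come from the platform's documentation.
final class IndicatorStatus {

    static final String NORMAL = "normal";
    static final String DISABLED = "disabled";

    // The check described above: anything that is not "disabled" counts as normal.
    static boolean isNormalLoose(String status) {
        return !DISABLED.equals(status);
    }

    // A stricter alternative: only the agreed "normal" value counts as normal.
    static boolean isNormalStrict(String status) {
        return NORMAL.equals(status);
    }
}
```

With the loose check, an unannounced value such as "off-shelf" is treated as normal; with the strict check it is not, so an unknown value would at least not be displayed by mistake.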

After communicating with the other party, they confirmed that the off-shelf status is abnormal, and indicators should not be displayed. They changed the data and temporarily solved the problem with this indicator.

Later, they changed the enum value back to the previous state according to the interface documentation.

11 Interfaces are hit and miss

I don't know whether you have run into this: a third-party interface that works one moment and fails the next.

5 minutes ago, the interface could return data normally.

After 5 minutes, the interface returns 503 Unavailable.

After a few more minutes, the interface can return data normally again.

In this case, there is a high probability that the third-party platform is restarting the service. During the restart process, the service may be temporarily unavailable.

There is another situation: the third-party interface is deployed on multiple service nodes, and some of those nodes are down. That also makes requests succeed at some moments and fail at others.

In addition, there is one more situation: the gateway configuration is not updated in time, and offline services are not removed from it.

As a result, when the gateway forwards a request to an offline service, the service is unavailable; when it forwards a request to a healthy service, the call returns normally.

If you encounter this problem, you should report the problem to the third-party platform as soon as possible, and then add a retry mechanism for interface failure.

12 Inconsistency between documentation and interface logic

I once used a query API provided by a third-party platform whose interface document clearly stated that a dr field represents the deletion status.

With this field, when we synchronize the platform's category data, we can tell which records have been deleted and delete the corresponding data on our side in time.

Later, I found that some categories had been deleted on their side, but not on our side.

What is the situation?

The logic of the code is very simple. I reviewed the code and found no bugs. Why does this happen?

After tracing the logs, I found that when we called the third-party platform's category query interface, the other party simply did not return the deleted categories to us.

That is to say, the dr field in the interface document is useless, and the interface document is inconsistent with the interface logic.

I suspect many readers have run into this problem.

If you want to solve this problem, there are two main solutions:

  1. The third-party platform modifies the interface logic according to the document and returns the deleted status.

  2. After calling the category query interface, our system compares category codes: if the code of a category in our database is not in the interface's return value, that category is deleted.
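The second solution boils down to a set difference on category codes. A minimal sketch, with hypothetical names, assuming both sides can be reduced to sets of codes:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper for the code comparison in solution 2.
final class CategorySync {

    // Codes stored locally that the interface no longer returns; these are the ones to delete.
    static Set<String> codesToDelete(Set<String> localCodes, Set<String> remoteCodes) {
        Set<String> stale = new HashSet<>(localCodes); // copy, so the input is not modified
        stale.removeAll(remoteCodes);
        return stale;
    }
}
```

This is only safe when remoteCodes comes from a complete, successful fetch of all pages. If a call fails or a page is missing, categories that still exist would be deleted by mistake.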

13 Account in arrears

We have called Baidu's bill identification interface, which can automatically identify the invoice information and obtain information such as the invoice number and amount.

The integration had been done by another colleague, who later left the company.

The invoice recognition function has been launched and has been used for a long time without any problems.

Later, one day, the production environment users reported that the invoice could not be recognized.

I checked the logs of related services and found no abnormalities, which is strange.

I opened the code and took a closer look, and found that the colleague's code called the third-party API interface. When receiving the response data, it was directly converted into an object, and the string returned at that time was not printed.

Could it be that there is a problem with the return value of the interface?

Later, I added a log and printed out the real return value of the interface.

That revealed the cause: the account was in arrears.

When an error occurs, Baidu's API returns a data structure whose fields do not match the entity class my former colleague had defined, so the entity could not capture them.

This is no small pitfall.

When receiving data from a third-party API, try to receive the return value as a string first and then convert the string into the corresponding entity class. Be sure to print the return value in the log so that problems are easier to locate later.

Do not directly use the entity object to receive the return value. For some API interfaces, if different exceptions occur, the returned data structure is quite different.

Some abnormal results may be directly returned by their gateway system, and some abnormal results may be returned by their business system.
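The advice above, receiving the raw string, logging it, and only then converting it, might be sketched like this. ResponseReader and the converter parameter are hypothetical; the converter stands in for whatever JSON library maps the string to an entity class, and java.util.logging is used only to keep the sketch free of dependencies.

```java
import java.util.function.Function;
import java.util.logging.Logger;

// Hypothetical helper: log the raw response body before converting it.
final class ResponseReader {

    private static final Logger LOG = Logger.getLogger(ResponseReader.class.getName());

    static <T> T parse(String rawBody, Function<String, T> converter) {
        // The raw string is logged first, so it is on record even if conversion fails.
        LOG.info("third-party response body: " + rawBody);
        try {
            return converter.apply(rawBody);
        } catch (RuntimeException e) {
            LOG.warning("could not convert response body: " + rawBody);
            throw e;
        }
    }
}
```

With this in place, an unexpected body such as an arrears notice shows up in the log as it was returned. In a real system, sensitive fields may need to be masked before logging.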

In fact, we have encountered other pitfalls before, such as calling the classification tree query interface, but the data returned by the third party has duplicate ids. How should we deal with this abnormal data?

We call a third-party API in a loop inside a job to fetch data. If one of the calls fails, should we catch the exception with try/catch and continue with the subsequent calls, or terminate the current program directly? If we use try/catch, how do we ensure data consistency? If we terminate the program, how do we handle the subsequent process?


Origin blog.csdn.net/weixin_44045828/article/details/130333958