Browser cache and vue-cli cache strategy

Cache categories:

HTTP caches are divided into private caches and shared caches according to where the response is stored.
A private cache is bound to a specific client, typically a browser cache. Because its stored responses are not shared with other clients, a private cache can store responses personalized for that user.

A shared cache sits between the client and the server and stores responses that can be shared among users. Shared caches can be further subdivided into proxy caches and managed caches.

Proxy caches: in addition to access control, some proxies implement caching to reduce network traffic. These proxies are usually not managed by the service developer, so they must be controlled through appropriate means such as HTTP headers.

Managed caches are explicitly deployed by service developers to reduce origin server load and deliver content efficiently. Examples include reverse proxies, CDNs, and service workers combined with the Cache API. The characteristics of a managed cache vary with the product deployed. In most cases, its behavior can be controlled through the Cache-Control header together with a configuration file or dashboard. For example, the HTTP caching specification does not itself define a way to explicitly delete a cached response, but with a managed cache, stored responses can be deleted at any time through dashboard actions, API calls, restarts, and so on.

Let's focus on browser caching.

What is browser caching?

Browser caching is the browser storing resources obtained through HTTP requests locally. When the client requests a resource from the server, the request first reaches the browser cache. If the browser holds a copy of the requested resource, it can serve that copy directly instead of fetching the resource from the origin server. The browser's caching mechanism is driven by the cache identifiers in the response headers of the HTTP request, so the browser's cache behavior is actually configured on the server that serves the resource.

Browser caches generally store only responses to GET requests; responses to other request methods are typically not cached.

Why use caching?

  1. Caching reduces redundant data transmission and saves network resources
  2. Caching reduces server pressure
  3. Caching speeds up page presentation

Browser caching process

The browser cache comes into play from the second request onward:

  1. When the browser loads a resource for the first time, the server returns 200, and the response header carries the resource's cache parameters: Expires/Cache-Control, Last-Modified/ETag, and so on. The browser downloads the resource file from the server and caches it together with the response header and the time the response was returned;

  2. The next time the resource is accessed, the browser first checks whether its local copy has expired. If it has not expired, the mandatory cache is hit: no request is sent to the server, and the file is read directly from the local cache. If the cache has expired, the browser sends a request to the server with If-None-Match and If-Modified-Since in the request header;

  3. After receiving the request, the server uses the values of If-None-Match (ETag, which takes priority) and If-Modified-Since (Last-Modified) to judge whether the requested file has been modified. If the values match, the file has not been modified, the negotiation cache is hit, and 304 is returned. If they do not match, the file has changed, and the server returns the new resource file with new ETag and Last-Modified values and status 200;
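
The client side of the three steps above can be sketched roughly as follows. This is a minimal illustration rather than how any browser is actually implemented: `CachedResponse` and `plan_request` are hypothetical names, and freshness is reduced to a single `max_age` value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CachedResponse:
    """Hypothetical record of what the browser stored on the first request."""
    stored_at: float                    # time the response was returned (epoch seconds)
    max_age: Optional[int]              # freshness lifetime in seconds, None if absent
    etag: Optional[str] = None
    last_modified: Optional[str] = None


def plan_request(entry: Optional[CachedResponse], now: float) -> dict:
    """Decide what to do for the next request to the same resource."""
    if entry is None:
        # First load: nothing cached, plain request to the server.
        return {"action": "fetch", "headers": {}}
    if entry.max_age is not None and now - entry.stored_at < entry.max_age:
        # Not expired: mandatory cache hit, no request is sent.
        return {"action": "use-cache", "headers": {}}
    # Expired: send the validators so the server can answer 304 or 200.
    headers = {}
    if entry.etag:
        headers["If-None-Match"] = entry.etag
    if entry.last_modified:
        headers["If-Modified-Since"] = entry.last_modified
    return {"action": "revalidate", "headers": headers}
```

With an entry stored at time 1000 and `max_age=300`, a request at time 1100 is answered from the cache, while a request at time 1400 is sent to the server with both validators attached.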

Browser caching policy

Generally, browser caching strategies are divided into two types: mandatory caching (also called forced or strong caching; Expires, Cache-Control) and negotiation caching (Last-Modified, ETag). Both are implemented by setting HTTP headers.

Mandatory caching
For mandatory caching, if the cached data has not expired (that is, the Cache-Control: max-age or Expires lifetime has not passed, or the heuristic cache lifetime has not passed), no request is sent to the server and the resource is read directly from the browser cache.

When the mandatory cache is hit, the HTTP status code is 200. This gives the fastest page loads and the best performance, but if the resource is modified on the server during this period, the page will not get the latest version. This is a situation often encountered in development: you change a style, refresh the page, and the change does not take effect because the mandatory cache is still being used. A forced refresh with Ctrl + F5 fixes it.

Mandatory caching is implemented with two HTTP headers: Expires and Cache-Control.

  1. Expires: the cache expiration time set in the response header. It specifies the point in time, according to the server's clock, at which the resource expires. When the browser loads the resource again, the mandatory cache is hit if the current time is still before the expiration time.
    Expires: Thu, 09 Dec 2021 04:40:13 GMT
  2. Cache-Control: specify the max-age directive, which indicates that the cached content becomes invalid after a given number of seconds.
    Cache-Control: max-age=300

    This means that if the resource is loaded again within 5 minutes after the response was correctly returned, the mandatory cache is hit.

| Cache-Control directive | Effect |
| --- | --- |
| public | The response may be stored by any cache, including the client and proxy servers |
| private | The response may be stored only by the client's private cache |
| max-age=30 | The response is fresh for 30 seconds; after that the cache must go back to the server |
| s-maxage=30 | Overrides max-age with the same meaning, but applies only to shared caches such as proxy servers |
| no-store | The response must not be stored in any cache |
| no-cache | The response may be stored, but must be revalidated with the server before every reuse |
| max-stale=30 | Request directive: the client accepts a cached response that has been expired for up to 30 seconds |
| min-fresh=30 | Request directive: the client wants a response that will stay fresh for at least another 30 seconds |
| must-revalidate | Once the cached response has expired, the cache must revalidate it with the server before reusing it |

The difference between the two:

  1. Expires is a product of HTTP/1.0, and Cache-Control is a product of HTTP/1.1;
  2. When both are present, Cache-Control takes priority over Expires;
  3. Expires is still useful in environments that do not support HTTP/1.1. Today it exists mainly for backward compatibility;
  4. Expires is a specific point in server time. If the client and server clocks differ significantly, cache hits may not behave as the developer expects. Cache-Control specifies a duration, which is easier to control;
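
The priority rule can be illustrated with a small sketch that computes how long a response stays fresh. The function names `parse_cache_control` and `freshness_lifetime` are made up for this example, and the parsing is deliberately simplified (it does not handle commas inside quoted directive values).

```python
from email.utils import parsedate_to_datetime
from typing import Dict, Optional


def parse_cache_control(value: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control header into {directive: argument-or-None}."""
    directives: Dict[str, Optional[str]] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def freshness_lifetime(headers: Dict[str, str]) -> Optional[float]:
    """Freshness lifetime in seconds; Cache-Control: max-age wins over Expires."""
    cc = parse_cache_control(headers.get("Cache-Control", ""))
    if cc.get("max-age") is not None:
        return float(cc["max-age"])
    if "Expires" in headers and "Date" in headers:
        # Expires is an absolute time, so it is measured against the Date header.
        expires = parsedate_to_datetime(headers["Expires"])
        date = parsedate_to_datetime(headers["Date"])
        return (expires - date).total_seconds()
    return None  # no explicit lifetime set
```

Measuring Expires against the response's own Date header, as done here, is one way to sidestep the clock-difference problem described in point 4.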

This raises a question: if a cacheable response carries no cache information at all, will the browser simply not cache it? Of course not. In this case, the browser's default caching strategy, heuristic caching, is triggered.

Heuristic caching

If a cacheable response has neither Expires nor Cache-Control set, but does carry Last-Modified in its response header, the browser falls back to a default strategy: heuristic caching. How long the response is reused is up to the implementation, but the specification recommends about 10% of the time elapsed since the resource was last modified, that is, cache duration = (Date - Last-Modified) * 0.1.
Most browsers implement this today, though the details differ slightly between them.
Note: The browser's heuristic caching strategy will only be activated if the server does not return an explicit caching strategy.

Heuristic caching can create serious caching problems. Suppose there is a file that does not have a cache time set, and the previous version was updated a month ago. After this release, it may take 3 days for users to see the new content. If the resource is also cached in the CDN, the problem will be even more serious.
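
The 3-day figure follows directly from the 10% rule. A minimal sketch of the calculation is shown here; `heuristic_lifetime` is a hypothetical helper, and real browsers may use a different fraction or apply additional limits.

```python
from email.utils import parsedate_to_datetime


def heuristic_lifetime(date: str, last_modified: str, fraction: float = 0.1) -> float:
    """Heuristic cache duration in seconds: (Date - Last-Modified) * fraction."""
    delta = parsedate_to_datetime(date) - parsedate_to_datetime(last_modified)
    # A Last-Modified in the future is treated as zero elapsed time.
    return max(delta.total_seconds(), 0.0) * fraction


# A file last modified 30 days before the response was served.
seconds = heuristic_lifetime(
    date="Thu, 09 Dec 2021 04:40:13 GMT",
    last_modified="Tue, 09 Nov 2021 04:40:13 GMT",
)
days = seconds / 86400  # roughly 3 days
```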

Negotiation cache
With the negotiation cache, the server returns the data on the first request, and the response header contains Cache-Control: max-age and Expires, or Cache-Control: no-cache. On subsequent requests, if the Cache-Control: max-age or Expires lifetime has passed, or Cache-Control: no-cache was set, the browser negotiates with the server, which checks whether the resource has been modified or updated.

  • If the resources on the server side have not been modified, a 304 status code will be returned, telling the browser that the data in the cache can be used, thus reducing the data transmission pressure on the server.
  • If the data is updated, a 200 status code will be returned, and the server will return the updated resource and cache information together.

Negotiation caching is implemented with two pairs of headers:

  1. ETag and If-None-Match
    ETag is returned by the server in the response header the last time the resource was loaded, and is a unique identifier for the resource. The ETag is regenerated whenever the resource changes. The next time the browser requests the resource, it puts the previously returned ETag value into the If-None-Match request header. The server compares that value with the ETag of the current resource file. If they are the same, the resource file has not changed and the negotiation cache is hit. If they differ, the server returns the data directly.
  2. Last-Modified and If-Modified-Since
    Last-Modified is the last modification time of the resource file. The server returns it in the response header, and the browser saves the value and puts it into the If-Modified-Since request header the next time it requests the resource. The server compares it with the file's current modification time; if they are the same, the negotiation cache is hit. If they differ, the server returns the data directly.

The difference between the two:

  1. In terms of mechanism, ETag is a unique identifier for the resource, while Last-Modified is the last modification time of the file;
  2. In terms of accuracy, ETag is better than Last-Modified. Last-Modified has a resolution of one second, so if a file changes several times within one second, Last-Modified cannot reflect the change, whereas the ETag changes every time. Behind a load balancer, the Last-Modified values generated by different servers may also be inconsistent.
    If a file is regenerated periodically with identical content, Last-Modified changes each time, so the server returns the full resource even though nothing changed. ETag can tell that the content is the same, so the server returns 304 and the cache is used.
  3. In terms of performance, ETag is inferior to Last-Modified. Last-Modified only needs to record a time, while ETag requires the server to compute a hash value;
  4. In terms of priority, the server checks ETag first.
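
The server side of the negotiation can be sketched as follows. This is a simplified illustration, not the behavior of any particular server: `make_etag` and `respond` are hypothetical names, the ETag is assumed to be a truncated SHA-256 of the body, and Last-Modified is compared as an exact string, mirroring the "same or different" description above.

```python
import hashlib
from typing import Optional, Tuple


def make_etag(body: bytes) -> str:
    """Derive a content-based ETag (assumption: truncated SHA-256 digest)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes, last_modified: str,
            if_none_match: Optional[str] = None,
            if_modified_since: Optional[str] = None) -> Tuple[int, bytes]:
    """Return (status, payload) for a possibly conditional request."""
    if if_none_match is not None:
        # ETag takes priority: If-Modified-Since is ignored when it is present.
        if if_none_match == make_etag(body):
            return 304, b""
        return 200, body
    if if_modified_since is not None and if_modified_since == last_modified:
        return 304, b""
    return 200, body
```

If the content changes within the same second, `last_modified` stays the same but the ETag differs, so a request carrying both validators still gets the new body with status 200.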

Storage location

From the perspective of storage location, the browser cache is divided into four types, each with its own priority. The caches are searched in order, and the network is requested only when none of them hits.

  1. Service Worker: an independent thread running in the background of the browser, which can be used to implement caching. Because a Service Worker can intercept requests, HTTPS must be used to ensure security. The Service Worker cache differs from the browser's other built-in cache mechanisms: it lets us freely control which files to cache, how to match the cache, and how to read the cache, and the cache is persistent.
  2. Memory Cache: the cache held in memory. It mainly contains resources already fetched by the current page, such as styles, scripts, and images that the page has downloaded. Reading from memory is faster than reading from disk, but the memory cache is short-lived and is released along with the process. Once the tab is closed, the cache in memory is released.

    When caching resources, the Memory Cache does not look at the value of the Cache-Control header in the HTTP response. In addition, matching a resource involves not only the URL but also other characteristics such as Content-Type and CORS.

  3. Disk Cache: the cache stored on the hard disk. Reads are slower, but anything can be stored on disk, so it beats the Memory Cache in both capacity and retention time.

    The Disk Cache uses the fields in the HTTP header to decide which resources need to be cached, which can be used directly without a request, and which have expired and must be requested again. Even across sites, once a resource at a given address has been cached on disk, it is not requested again. Most cached content comes from the Disk Cache.

  4. Push Cache: part of HTTP/2. It is used only when none of the three caches above is hit. It exists only within the session and is released when the session ends. Its cache time is also very short, only about 5 minutes in Chrome, and it does not strictly follow the cache directives in the HTTP header.

Regarding the memory cache and disk cache:
When you visit a page for the first time, or with Disable cache turned on in the browser (in the Network panel, next to Preserve log), the Size column shows the resource size.
With Disable cache turned off, visit the page again and the Size column shows (memory cache) or (disk cache).

| Status | Size column | Description |
| --- | --- | --- |
| 200 | from memory cache | The server is not contacted and the cached copy is read directly from memory. The data lives in memory, so once the process is killed (that is, the browser is closed) it no longer exists. This method can only cache derived resources |
| 200 | from disk cache | No network request is made and the cached copy is read directly from disk. The data still exists after the process is killed |
| 200 | resource size | The latest resource is downloaded from the server |
| 304 | packet size | The request is sent with If-None-Match (ETag, which takes priority) and If-Modified-Since (Last-Modified). After comparing, the server finds that the resource has not been updated and returns 304, and the data is then read from the cache |

Control of browser cache by user behavior

  1. Entering a URL in the address bar and following links are normal user behaviors and use the browser caching mechanism as usual;
  2. F5 refresh: the browser sets Cache-Control: max-age=0 in the request header, skips the mandatory cache check, and goes straight to the negotiation cache check;
  3. Ctrl+F5 refresh: the browser sets Cache-Control: no-cache in the request header, skips both the mandatory cache and the negotiation cache, and pulls the resource directly from the server.
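
The difference between the two refresh types can be expressed as the request headers each one produces. The sketch below only restates the list above; `refresh_headers` and the action names "refresh" and "hard-refresh" are labels invented for this example, and the exact headers vary between browsers.

```python
from typing import Dict, Optional


def refresh_headers(action: str, etag: Optional[str] = None,
                    last_modified: Optional[str] = None) -> Dict[str, str]:
    """Request headers for an F5 ("refresh") or Ctrl+F5 ("hard-refresh") reload."""
    if action == "hard-refresh":
        # Ctrl+F5: no validators are sent, so the server cannot answer 304.
        return {"Cache-Control": "no-cache"}
    if action != "refresh":
        raise ValueError(f"unknown action: {action}")
    # F5: the mandatory cache is skipped, but the stored validators are sent.
    headers = {"Cache-Control": "max-age=0"}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers
```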

Three-level cache principle (cache lookup priority)

  1. First look in memory; if the resource is there, load it directly.
  2. If it is not in memory, look on the hard disk; if it is there, load it directly.
  3. If it is not on the hard disk either, make a network request.
  4. The requested resource is then cached to both the hard disk and memory.
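
The four steps above can be sketched as a small lookup class. This is an illustration only: `ThreeLevelCache` is a hypothetical name, the "disk" is a plain dictionary standing in for real on-disk storage, and `fake_fetch` is a stand-in for the network.

```python
from typing import Callable, Dict, List, Tuple


class ThreeLevelCache:
    def __init__(self, fetch: Callable[[str], bytes]):
        self.memory: Dict[str, bytes] = {}
        self.disk: Dict[str, bytes] = {}   # stand-in for on-disk storage
        self.fetch = fetch

    def get(self, url: str) -> Tuple[bytes, str]:
        """Return (body, source) following memory -> disk -> network."""
        if url in self.memory:
            return self.memory[url], "memory"
        if url in self.disk:
            return self.disk[url], "disk"
        body = self.fetch(url)
        # Step 4: store the response in both the disk and memory caches.
        self.disk[url] = body
        self.memory[url] = body
        return body, "network"

    def close_tab(self) -> None:
        """Memory is released with the tab; the disk copy survives."""
        self.memory.clear()


calls: List[str] = []


def fake_fetch(url: str) -> bytes:
    """Stand-in for a network request; records each URL it is asked for."""
    calls.append(url)
    return b"data:" + url.encode()
```

Requesting the same URL twice hits the network once and memory once; after `close_tab()`, the next request is served from the disk copy without another network call.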

vue-cli caching strategy

Since the bundled js, css, and image files generally have a hash value in their names, a change in content changes the hash and therefore the file name, so the new file is naturally fetched. We can therefore set mandatory caching for such files: as long as the file name stays the same, the cached copy keeps being used.

However, html files must not use mandatory caching. If mandatory caching is set for html, the html will not be updated within the cache's validity period. If the html is not updated, the names of the js, css, and other files it references are not updated either, so the whole site is not updated and the user can only clear the cache. For html files, we can therefore use the negotiation cache or disable caching altogether. Because the html file itself is relatively small, it can simply be set to not cache, and the nginx configuration is as follows.
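
Why hashed file names make mandatory caching safe can be seen in a small sketch. The helper `hashed_name` is hypothetical, and the 8-character MD5 prefix is only an assumption chosen to resemble names such as app.1a2b3c4d.js; the actual hashing is done by the build tool.

```python
import hashlib
from pathlib import PurePosixPath


def hashed_name(filename: str, content: bytes, length: int = 8) -> str:
    """Insert a content hash before the extension, e.g. app.js -> app.<hash>.js."""
    digest = hashlib.md5(content).hexdigest()[:length]
    path = PurePosixPath(filename)
    return f"{path.stem}.{digest}{path.suffix}"


old = hashed_name("app.js", b"console.log('v1')")
new = hashed_name("app.js", b"console.log('v2')")
same = hashed_name("app.js", b"console.log('v1')")
```

Here `old` and `same` are identical, so a cached copy can be reused indefinitely, while `new` is a different URL that the browser has never cached and must fetch.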

server {
	listen 80;
	server_name yourdomain.com;
	location / {
		try_files $uri $uri/ /index.html;
		root /yourdir/;
		index index.html index.htm;

		if ($request_filename ~* .*\.(js|css|woff|png|jpg|jpeg)$)
		{
			expires    100d;  # cache js, css, and images for 100 days
			#add_header Cache-Control "max-age=8640000";  # or set max-age instead
		}

		if ($request_filename ~* .*\.(?:htm|html)$)
		{
			add_header Cache-Control "no-store";  # do not cache html; or: add_header Cache-Control "private, no-store, no-cache, must-revalidate, proxy-revalidate"
		}
	}
}
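
To check which files fall under which rule, the two patterns from the configuration can be mirrored in a short script. This is only a sketch for reasoning about the rules, not part of the nginx setup: `cache_control_for` is a hypothetical helper, and it returns the Cache-Control value that the corresponding rule is expected to produce (100 days is 8640000 seconds).

```python
import re
from typing import Optional

# Same patterns as the two "if" blocks, matched case-insensitively like "~*".
STATIC_RE = re.compile(r".*\.(js|css|woff|png|jpg|jpeg)$", re.IGNORECASE)
HTML_RE = re.compile(r".*\.(?:htm|html)$", re.IGNORECASE)
HUNDRED_DAYS = 100 * 24 * 60 * 60  # seconds


def cache_control_for(filename: str) -> Optional[str]:
    """Cache-Control value expected for a file, or None if no rule matches."""
    if STATIC_RE.match(filename):
        return f"max-age={HUNDRED_DAYS}"
    if HTML_RE.match(filename):
        return "no-store"
    return None
```

Files that match neither pattern, such as a .json file, get no explicit cache headers from this configuration and are therefore subject to heuristic caching.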



Origin blog.csdn.net/weixin_39964419/article/details/126851863