Basic IO under Linux (required for beginners)

Contents

1. Simple use of the C library functions for file operations

2. stdout, stderr, and stdin

3. System file IO

4. File descriptor fd

5. Allocation rules for file descriptors

6. The principle of redirection


1. Simple use of the C library functions for file operations

The C file operation functions have already been covered in detail elsewhere; for the full treatment, please refer to the blogger's article "Detailed explanation of file operations".

Here we simply use a few file-related library functions, starting with fopen. Consider the following code:

#include <stdio.h>

int main()
{
    // open the file for writing; it is created automatically if it does not exist
    FILE *fp = fopen("log.txt", "w");
    if (fp == NULL)
    {
        perror("fopen");
        return 1; // do not continue with a NULL stream
    }

    int cnt = 5;
    const char *str = "hello Linux\n";
    while (cnt--)
    {
        fputs(str, fp); // write the string into the file we just opened
    }

    fclose(fp); // close the file
    return 0;
}

Here we open a file for writing. As we learned with C, when fopen opens a file in "w" mode, an existing file has its contents emptied, and a file that does not exist is created under the current path. But what exactly is the current path?

Now let's run this program.

We find that a log.txt file was indeed created under /home/ksy/BK/. Does that mean the current path is the path where the executable is located? Before answering, let's read the file back with a C library function for reading files:

#include <stdio.h>

int main()
{
    // open the file for reading; this fails if the file does not exist
    FILE *fp = fopen("log.txt", "r");
    if (fp == NULL)
    {
        perror("fopen");
        return 1;
    }
    // the previous program already wrote to the file, so we can read it directly
    char buffer[128];
    while (fgets(buffer, sizeof(buffer), fp))
    {
        printf("%s", buffer);
    }
    fclose(fp);
    return 0;
}

Run this program:

We find that the file is read successfully. Now let's settle whether the so-called current path really is the path where the executable is located. To make testing easier, we first delete log.txt under /home/ksy/BK/ and then use this code:

#include <stdio.h>

int main()
{
    // open the file for writing; it is created automatically if it does not exist
    FILE *fp = fopen("log.txt", "w");
    if (fp == NULL)
    {
        perror("fopen");
        return 1;
    }
    const char *str = "this is a test\n";
    int cnt = 5;
    while (cnt--)
    {
        fputs(str, fp);
    }
    fclose(fp);
    return 0;
}

We build the executable in /home/ksy/BK/ and then run it from the home directory:

To our surprise, log.txt is generated under /home/ksy. This clearly shows that the current path is not the path where the executable is located. Rather, it is the directory we are in when the executable becomes a process, that is, the process's working directory, and the file is created there.

Let's briefly demonstrate two more C library functions, fwrite and fread.

Here is a short explanation of how they are used:

fwrite: the first parameter points to the data you want to write, the second is the size in bytes of each item, the third is the number of items to write, and the fourth is the stream to write to. The return value is the number of items actually written. Let's demonstrate with a simple piece of code:

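#include <stdio.h>
#include <string.h>

int main()
{
    // open the file for writing; it is created automatically if it does not exist
    FILE *fp = fopen("log.txt", "w");
    if (fp == NULL)
    {
        perror("fopen");
        return 1;
    }
    const char *str = "hello Linux\n"; // must be a pointer, not a single char
    int cnt = 5;
    while (cnt--)
    {
        fwrite(str, strlen(str), 1, fp); // one item of strlen(str) bytes
    }
    fclose(fp);
    return 0;
}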

Running this program writes the five lines into log.txt.

Now let's explain fread:

The first parameter of fread is the buffer that receives the data, the second is the size in bytes of each item, the third is the maximum number of items to read, and the fourth is the stream to read from. The return value is the number of items actually read.

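#include <stdio.h>

int main()
{
    FILE *fp = fopen("log.txt", "r"); // open the file for reading
    if (fp == NULL)
    {
        perror("fopen");
        return 1;
    }
    char buffer[128];
    size_t n;
    // read one-byte items, leaving room for the terminating '\0'
    while ((n = fread(buffer, 1, sizeof(buffer) - 1, fp)) > 0)
    {
        buffer[n] = '\0'; // fread does not null-terminate the data
        printf("%s", buffer);
    }
    fclose(fp);
    return 0;
}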


2. stdout, stderr, and stdin

We often hear that everything under Linux is a file. In other words, anything under Linux can be treated as a file, and that includes the keyboard and the display. We see data on the display because it was written to the "display file", and the computer receives the characters we type because it reads them from the "keyboard file".

When a C program runs (note that files are opened while the program is running), three streams are opened by default: stdout (the standard output stream), stdin (the standard input stream), and stderr (the standard error stream). Their corresponding devices are the display, the keyboard, and the display. Let's check this in the man pages:

The man pages show that all three are of type FILE*, that is, file pointers. When a C program runs, the system opens these three input/output streams, after which functions such as scanf and printf can operate on the keyboard and the display. In other words, stdin, stdout, and stderr are the same kind of thing as the file pointer we get back when we open a file. So if we call fputs and pass stdout as the second parameter, will fputs print the data directly on the display? Let's verify with a piece of code:

#include <stdio.h>
#include <string.h>

int main()
{
    const char *str = "hello ksy\n";
    fputs(str, stdout); // pass stdout as the target stream
    return 0;
}

Let's run this program:

We find that the string was printed to the display. Of course, C is not the only language with standard input, standard output, and standard error streams: C++ has cin, cout, and cerr, and other languages have similar concepts.

3. System file IO

Underneath, the operating system provides a set of system calls for file IO, covering operations such as open, write, read, close, and seek. Each language wraps these system calls into its own library of file functions, so users do not need to know the underlying calls, which lowers the learning cost.

Introduction to the system call interfaces open, write, read, and close:

1. open

Purpose: open a file

The function prototypes are as follows:

int open(const char *pathname, int flags);
int open(const char *pathname, int flags, mode_t mode);

The first parameter of open: pathname

The first parameter names the target file to open or create. Note that:

1. If pathname is given as a path, a file that needs to be created is created under that path.

2. If only a file name is given, the file is created under the current path (the meaning of "current path" was discussed above).

The second parameter of open: flags

The second parameter specifies how the file is opened. Commonly used options include O_RDONLY (read only), O_WRONLY (write only), O_RDWR (read and write), O_CREAT (create the file if it does not exist), O_TRUNC (empty an existing file), and O_APPEND (append on every write).

When opening a file, several options can be combined with |. For example, to open a file for writing only and create it if it does not exist, pass O_WRONLY|O_CREAT. So what exactly is flags? It is an integer, and an int has 32 bits. Each bit serves as one option, and open checks whether a given bit is 1 to decide whether that option was passed. That would mean an option such as O_WRONLY corresponds to an integer with exactly one of its 32 bits set. Is that true? Let's open /usr/include/asm-generic/fcntl.h with vim and take a look:

We find that what these macros have in common is that each has exactly one bit set in its binary representation. (O_RDONLY is the exception: its value is all zeros, which makes it the default option.) Inside open, each bit is tested to decide which behavior to carry out.

The third parameter of open: mode

The third parameter sets the permissions of a newly created file. Files in Linux have permissions, so when a file is opened for writing and has to be created because it does not exist, its permissions must be specified at creation time. For details on permissions, please refer to the blogger's related articles. (Note that when no file is being created, the third parameter can be omitted.)

The return value of open is the file descriptor of the opened file; if the open fails, -1 is returned. Let's demonstrate with a piece of code:

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

int main()
{
    int fd1 = open("./log1.txt", O_WRONLY | O_CREAT, 0644);
    int fd2 = open("./log2.txt", O_WRONLY | O_CREAT, 0644);
    int fd3 = open("./log3.txt", O_WRONLY | O_CREAT, 0644);
    int fd4 = open("./log4.txt", O_WRONLY | O_CREAT, 0644);
    int fd5 = open("./log5.txt", O_WRONLY | O_CREAT, 0644);
    int fd6 = open("./log6.txt", O_WRONLY | O_CREAT, 0644);
    printf("%d\n", fd1);
    printf("%d\n", fd2);
    printf("%d\n", fd3);
    printf("%d\n", fd4);
    printf("%d\n", fd5);
    printf("%d\n", fd6);
    close(fd1);
    close(fd2);
    close(fd3);
    close(fd4);
    close(fd5);
    close(fd6);
    return 0;
}

We run this program:

We find that the file descriptors start at 3 and increase consecutively. If we open a nonexistent file read-only, without asking for it to be created, the open fails and returns -1.

A file descriptor is essentially an index into an array of pointers. Each element of the array points to a structure that stores information about an open file, so the information for an open file can be found through its fd (file descriptor). In Linux, three files are open by default: standard input (0), standard output (1), and standard error (2). This is why the descriptors of the files we open start at 3.

2. close

The close system call closes a file. Its prototype is:

int close(int fd);

To close a file, simply pass in its file descriptor. close returns 0 on success and -1 on failure.

3. write

The write system call writes data to a file. Its prototype is as follows:

ssize_t write(int fd, const void *buf, size_t count);

The first parameter is the file descriptor of the target file. The second parameter points to the data you want to write. The third parameter is the number of bytes you want to write. Return value: the number of bytes actually written, or -1 on failure.

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

int main()
{
    int fd = open("./log.txt", O_WRONLY | O_CREAT, 0644);
    if (fd < 0)
    {
        perror("open");
        return 1;
    }
    const char *str = "hello world\n";
    int cnt = 5;
    while (cnt--)
    {
        write(fd, str, strlen(str)); // write the bytes of str, without the '\0'
    }
    close(fd);
    return 0;
}

4. read

The read system call reads data from a file. Its prototype is as follows:

ssize_t read(int fd, void *buf, size_t count);

The first parameter is the file descriptor of the file, the second is the buffer that receives the data, and the third is the number of bytes to read. The return value is the number of bytes actually read; it is 0 at end of file and -1 on failure.

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

int main()
{
    int fd = open("./log.txt", O_RDONLY);
    if (fd < 0)
    {
        perror("open");
        return 1;
    }
    char ch;
    while (1)
    {
        ssize_t ret = read(fd, &ch, 1); // read one byte at a time
        if (ret <= 0)
        {
            break; // 0 means end of file, -1 means error
        }
        write(1, &ch, 1); // fd 1 is standard output
    }
    close(fd);
    return 0;
}

4. File descriptor fd

Files are opened by processes, and a process can open multiple files. The system also runs a large number of processes, which means that a large number of files may be open in the system at any moment. When a file is opened, its attributes have to be loaded into memory, and the operating system, being the software that does the managing, has to manage this data. How? Describe first, then organize. The operating system creates a struct file for every open file and links these structures into a doubly linked list, so that managing open files becomes a matter of inserting, deleting, searching, and modifying nodes in that list.

How, then, does a process know which files it has opened? To tell which process opened a file, a relationship must also be established between processes and files. As we learned when studying processes, running a program loads its code and data into memory and creates the associated data structures (task_struct, mm_struct, page table), and the page table establishes the mapping between virtual and physical addresses.

In fact, task_struct contains a pointer to a structure called files_struct. That structure holds an array, fd_array, and the index into this array is what we call the fd. When the process opens log.txt, the file is first loaded from disk into memory to form the corresponding struct file, which is linked into the doubly linked list of files. The address of that struct file is then stored in fd_array at index 3, so that the pointer at index 3 points to it, and finally the file descriptor is returned to the calling process.

So a file descriptor is all we need to reach the information of an open file and operate on it. We verified earlier that file descriptors start at 3 by default, which means that 0, 1, and 2 are already open. 0 is the standard input stream, whose hardware device is the keyboard; 1 is the standard output stream, whose device is the display; 2 is the standard error stream, whose device is also the display. When a process is created, the OS builds a struct file for each of the keyboard, display, and display, links the three into the doubly linked list of files, and stores their addresses in fd_array at indexes 0, 1, and 2. That is how the standard input, standard output, and standard error streams come to be open by default.

5. Allocation rules for file descriptors

Earlier we opened 6 files in a row and found that the file descriptors started at 3 and increased consecutively. But do they really always start at 3? Let's look at a piece of code:

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

int main()
{
    close(0); // close standard input so that fd 0 becomes free
    int fd1 = open("./log1.txt", O_WRONLY | O_CREAT, 0644);
    int fd2 = open("./log2.txt", O_WRONLY | O_CREAT, 0644);
    int fd3 = open("./log3.txt", O_WRONLY | O_CREAT, 0644);
    int fd4 = open("./log4.txt", O_WRONLY | O_CREAT, 0644);
    printf("%d\n", fd1);
    printf("%d\n", fd2);
    printf("%d\n", fd3);
    printf("%d\n", fd4);
    close(fd1);
    close(fd2);
    close(fd3);
    close(fd4);
    return 0;
}

Now we run the program:

We find that the first fd is 0 and the rest continue from 3. Now let's also close 2 and see what happens.

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

int main()
{
    close(0); // free fd 0
    close(2); // free fd 2
    int fd1 = open("./log1.txt", O_WRONLY | O_CREAT, 0644);
    int fd2 = open("./log2.txt", O_WRONLY | O_CREAT, 0644);
    int fd3 = open("./log3.txt", O_WRONLY | O_CREAT, 0644);
    int fd4 = open("./log4.txt", O_WRONLY | O_CREAT, 0644);
    printf("%d\n", fd1);
    printf("%d\n", fd2);
    printf("%d\n", fd3);
    printf("%d\n", fd4);
    close(fd1);
    close(fd2);
    close(fd3);
    close(fd4);
    return 0;
}

Result:

We find that 0 and 2 are now used as well. The allocation rule is therefore: a new file descriptor is the smallest index in fd_array that is not currently in use.

6. The principle of redirection

With this foundation we can look more deeply at the redirection we learned earlier and understand how it works. First, let's look at output redirection.

1. Output redirection

Output redirection, as we learned earlier, sends data that would normally be shown on the display to another file instead. How does it work?

For example, if we want data destined for the "display file" to go to log1.txt instead, we can close file descriptor 1 (that is, close the "display file") before opening log1.txt. When log1.txt is then opened, it is assigned file descriptor 1.

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

int main()
{
    close(1); // close the "display file" so that fd 1 becomes free
    int fd = open("./log1.txt", O_WRONLY | O_CREAT, 0644); // assigned fd 1
    if (fd < 0)
    {
        return 1;
    }
    printf("hello ksy\n");
    printf("hello ksy\n");
    printf("hello ksy\n");
    printf("hello ksy\n");
    fflush(stdout); // flush the C library buffer before fd 1 is closed
    close(fd);
    return 0;
}

Result:

We find that the data was indeed written to log1.txt. Two points explain this:

1. printf writes to stdout by default. stdout is a FILE* pointer to a FILE structure, and that structure wraps an integer, the file descriptor. The descriptor stored in the FILE structure that stdout points to is 1, so printf sends its data to whichever file has file descriptor 1.

2. Data output through the C library is not written to the operating system immediately. It is first held in the C library's buffer and is flushed to the operating system only when a flush condition is met. For this reason the buffer should be flushed (for example with fflush(stdout)) before descriptor 1 is closed; otherwise the buffered data has nowhere to go.

2. Append redirection

The difference between append redirection and output redirection is that append redirection does not overwrite the existing data.

As for the principle, the only change from output redirection is one additional option, O_APPEND.

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

int main()
{
    close(1);
    // the only difference from output redirection is the extra O_APPEND option
    int fd = open("./log1.txt", O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0)
    {
        return 1;
    }
    printf("hello ksy\n");
    printf("hello ksy\n");
    printf("hello ksy\n");
    printf("hello ksy\n");
    fflush(stdout); // flush the C library buffer before fd 1 is closed
    close(fd);
    return 0;
}

3. Input redirection

Input redirection means that data we would normally read from the keyboard is read from another file instead.

For example, scanf reads from standard input. To make it read from log1.txt instead, we call close(0) before scanf reads any data. This closes the "keyboard file", so when log1.txt is opened it is assigned file descriptor 0.

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

int main()
{
    close(0); // close the "keyboard file" so that fd 0 becomes free
    int fd = open("./log1.txt", O_RDONLY); // assigned fd 0
    if (fd < 0)
    {
        perror("open");
        return 1;
    }
    char buffer[128];
    // %127s leaves room for the terminating '\0'
    while (scanf("%127s", buffer) == 1)
    {
        printf("%s\n", buffer);
    }
    close(fd);
    return 0;
}

Result:

The principle is the same as for stdout, so we won't repeat it here.

Now consider a question: standard output and standard error both correspond to the display, so what is the difference between them?

Let's verify with a piece of code:

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

int main()
{
    fprintf(stdout, "hello stdout\n"); // goes to fd 1
    fprintf(stderr, "hello stderr\n"); // goes to fd 2
    return 0;
}

When this program is run with its output redirected to log1.txt, only the text printed to stdout ends up in log1.txt. Redirection of this kind redirects the standard output stream, file descriptor 1, and leaves the standard error stream, file descriptor 2, untouched. That is the difference between the two.

The dup2 system call

So far we have implemented redirection only by first closing the target file descriptor with close. Can we redirect without closing it? All that is needed is to copy an element of the fd_array array. For example, if we copy the contents of fd_array[3] into fd_array[1], then, because stdout in C sends data to the file whose descriptor is 1, output is redirected to log.txt. Linux provides a system call for exactly this:

int dup2(int oldfd, int newfd);

Function: dup2 copies the contents of fd_array[oldfd] into fd_array[newfd], closing newfd first if it was open. Return value: newfd on success, -1 on failure.

When using it, note that:

  1. If oldfd is not a valid file descriptor, the dup2 call fails, and the file referred to by newfd is not closed.
  2. If oldfd is a valid file descriptor and newfd has the same value as oldfd, dup2 does nothing and returns newfd.

Let's reproduce the earlier output redirection using dup2:

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <fcntl.h>

int main()
{
    int fd = open("./log.txt", O_WRONLY | O_CREAT, 0644);
    if (fd < 0)
    {
        perror("open");
        return 1;
    }
    dup2(fd, 1); // fd 1 now refers to log.txt
    printf("hello world\n");
    printf("hello world\n");
    return 0; // stdout is flushed into log.txt when the process exits
}


FILE in the C language

Because library functions wrap the system call interface, and all file access ultimately goes through the file descriptor fd, the FILE structure in the C library must contain a file descriptor. We can open /usr/include/stdio.h with vim to look at FILE.

The full definition is as follows:


struct _IO_FILE {
 int _flags; /* High-order word is _IO_MAGIC; rest is flags. */
#define _IO_file_flags _flags
 // buffer-related fields
 /* The following pointers correspond to the C++ streambuf protocol. */
 /* Note: Tk uses the _IO_read_ptr and _IO_read_end fields directly. */
 char* _IO_read_ptr; /* Current read pointer */
 char* _IO_read_end; /* End of get area. */
 char* _IO_read_base; /* Start of putback+get area. */
 char* _IO_write_base; /* Start of put area. */

 char* _IO_write_ptr; /* Current put pointer. */
 char* _IO_write_end; /* End of put area. */
 char* _IO_buf_base; /* Start of reserve area. */
 char* _IO_buf_end; /* End of reserve area. */
 /* The following fields are used to support backing up and undo. */
 char *_IO_save_base; /* Pointer to start of non-current get area. */
 char *_IO_backup_base; /* Pointer to first valid character of backup area */
 char *_IO_save_end; /* Pointer to end of non-current get area. */
 struct _IO_marker *_markers;
 struct _IO_FILE *_chain;
 int _fileno; // the wrapped file descriptor
#if 0
 int _blksize;
#else
 int _flags2;
#endif
 _IO_off_t _old_offset; /* This used to be _offset but it's too small. */
#define __HAVE_COLUMN /* temporary */
 /* 1+column number of pbase(); 0 is unknown. */
 unsigned short _cur_column;
 signed char _vtable_offset;
 char _shortbuf[1];
 /* char* _save_gptr; char* _save_egptr; */
 _IO_lock_t *_lock;
#ifdef _IO_USE_OLD_IO_FILE
};

From the source of FILE we can see that the structure wraps fd, in the member _fileno. We can also see buffer-related members. These belong to the C library's buffer, which is flushed according to one of the following strategies:

1. Unbuffered: data is not buffered at all.

2. Line buffered: the buffer is flushed when \n is encountered. This strategy is used for data printed to the display.

3. Fully buffered: the buffer is flushed when it is full or when the process exits. This strategy is used for files.

Redirection therefore changes the flushing strategy of the buffer. With output redirection, for example, output that was line buffered when going to the display becomes fully buffered when going to a file. Let's look at an example:

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <fcntl.h>
#include <string.h>

int main()
{
    close(1);
    int fd = open("./log.txt", O_WRONLY | O_CREAT, 0644);
    if (fd < 0)
    {
        perror("open");
        return -2;
    }
    const char *str1 = "hello write\n";
    const char *str2 = "hello printf\n";
    const char *str3 = "hello fwrite\n";
    write(fd, str1, strlen(str1));         // system call: goes straight to the OS
    printf("%s", str2);                    // library call: goes into the C buffer
    fwrite(str3, strlen(str3), 1, stdout); // library call: goes into the C buffer
    fork();
    fflush(stdout); // flush: parent and child each flush their own copy
    close(fd);
    return 0;
}

Why does the text from the system call write appear only once, while the text from printf and fwrite appears twice? Because write hands its data directly to the OS, whereas printf and fwrite first write into the buffer provided by the C library, which is not flushed to the OS immediately. After fork(), the parent and child share their data copy-on-write, so the data sitting in the parent's buffer is also present in the child's copy, and the parent and child each flush their own. That is why the output of the library functions appears twice.


Origin blog.csdn.net/qq_56999918/article/details/124221285