C++ Learning Series 1: Compiler||Structure||Variables||Types||Constants

Compilers

Computers understand only one language, and that language consists of sets of instructions made of ones and zeros.

This computer language is appropriately called machine language.

A single instruction to a computer could look like this:

A particular computer’s machine language program that allows a user to input two numbers, add them together, and display the total could include these machine code instructions:

As you can imagine, programming a computer directly in machine language using only ones and zeros is very tedious and error-prone. To make programming easier, high-level languages have been developed. High-level programs also make it easier for programmers to inspect and understand each other’s programs.

This is a portion of code written in C++ that accomplishes the exact same purpose:
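```
#include <iostream>
using namespace std;

int main()
{
    int a, b, sum;

    cin >> a;            // read the first number
    cin >> b;            // read the second number

    sum = a + b;         // add the two numbers
    cout << sum << endl; // display the total
}
```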
Because a computer can only understand machine language, and humans wish to write in high-level languages, high-level languages have to be rewritten (translated) into machine language at some point. This is done by special programs called compilers, interpreters, or assemblers that are built into the various programming applications.

C++ is designed to be a compiled language, meaning that it is generally translated into machine language that can be understood directly by the system, making the generated program highly efficient. For that, a set of tools is needed, known as the development toolchain, whose core is a compiler and its linker.
Console programs

Console programs are programs that use text to communicate with the user and the environment, such as printing text to the screen or reading input from a keyboard.

An IDE (Integrated Development Environment) generally integrates several development tools, including a text editor and tools to compile programs directly from it.
Structure of a program

The best way to learn a programming language is by writing programs.

A C++ statement is an expression that can actually produce some effect.
```
std::cout << "Hello World!";
```
std::cout identifies the standard character output device (usually, this is the computer screen).

<< is the insertion operator, which indicates that what follows is inserted into std::cout.

"Hello World!" is the content inserted into the standard output.

; marks the end of the statement. All C++ statements must end with a semicolon character.

One of the most common syntax errors in C++ is forgetting to end a statement with a semicolon.

In C++, the separation between statements is specified with an ending semicolon (;), with the separation into different lines not mattering at all for this purpose.

The program has been structured in different lines and properly indented, in order to make it easier to understand for the humans reading it. But C++ does not have strict rules on indentation or on how to split instructions across lines.
```
// line comment
/* block comment*/
```
namespace

cout may be used instead of std::cout in some C++ code. Both name the same object: the first uses its unqualified name (cout), while the second qualifies it directly within the namespace std (as std::cout).

cout is part of the standard library, and all the elements in the standard C++ library are declared within what is called a namespace: the namespace std.

In order to refer to the elements in the std namespace, a program shall either qualify each and every use of elements of the library (as we have done by prefixing cout with std::), or introduce visibility of its components. The most typical way to introduce visibility of these components is by means of a using directive:

using namespace std;

The above directive allows all elements in the std namespace to be accessed in an unqualified manner (without the std:: prefix).

This allows unqualified uses of cout, as in:
```
#include <iostream>
using namespace std;

int main()
{
    cout << "Hello World!";
}
```
Both ways of accessing the elements of the std namespace (explicit qualification and using directives) are valid in C++ and produce the exact same behavior.

For simplicity, and to improve readability, the latter approach with a using directive is more often used, although explicit qualification is the only way to guarantee that name collisions never happen.
Variables and types
Identifiers

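A valid identifier is a sequence of one or more letters, digits, or underscore characters (_), and it always begins with a letter or an underscore. Some names that begin with an underscore (for example, an underscore followed by an uppercase letter) or that contain a double underscore are reserved for the compiler and standard library, so they are best avoided.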

Spaces, punctuation marks, and symbols cannot be part of an identifier.

C++ uses a number of keywords to identify operations and data descriptions; therefore, identifiers created by a programmer cannot match these keywords. The standard reserved keywords that cannot be used for programmer-created identifiers are:

alignas, alignof, and, and_eq, asm, auto, bitand, bitor, bool, break, case, catch, char, char16_t, char32_t, class, compl, const, constexpr, const_cast, continue, decltype, default, delete, do, double, dynamic_cast, else, enum, explicit, export, extern, false, float, for, friend, goto, if, inline, int, long, mutable, namespace, new, noexcept, not, not_eq, nullptr, operator, or, or_eq, private, protected, public, register, reinterpret_cast, return, short, signed, sizeof, static, static_assert, static_cast, struct, switch, template, this, thread_local, throw, true, try, typedef, typeid, typename, union, unsigned, using, virtual, void, volatile, wchar_t, while, xor, xor_eq

Specific compilers may also have additional reserved keywords.

The C++ language is a “case-sensitive” language. That means that an identifier written in capital letters is not equivalent to another one with the same name written in lowercase letters.
Fundamental data types

The values of variables are stored somewhere in an unspecified location in the computer memory as zeros and ones. Our program does not need to know the exact location where a variable is stored; it can simply refer to it by its name. What the program needs to be aware of is the kind of data stored in the variable.

It’s not the same to store a simple integer as it is to store a letter or a large floating-point number; even though they are all represented using zeros and ones, they are not interpreted in the same way, and in many cases, they don’t occupy the same amount of memory.

Fundamental data types are basic types implemented directly by the language that represent the basic storage units supported natively by most systems. They can mainly be classified into:
- Character types : They can represent a single character, such as ‘A’ or ‘$’. The most basic type is char, which is a one-byte character.
- Numerical integer types : They can store a whole number value, such as 7 or 1024.
- Floating-point types : They can represent real values, such as 3.14 or 0.01, with different levels of precision, depending on which of the three floating-point types is used.
- Boolean type : The boolean type, known in C++ as bool, can only represent one of two states : true or false.
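Here is the complete list of fundamental types in C++:

Group	Type names	Notes on size / precision
Character types	char	Exactly one byte in size. At least 8 bits.
	char16_t	Not smaller than char. At least 16 bits.
	char32_t	Not smaller than char16_t. At least 32 bits.
	wchar_t	Can represent the largest supported character set.
Integer types (signed)	signed char	Same size as char. At least 8 bits.
	signed short int	Not smaller than char. At least 16 bits.
	signed int	Not smaller than short. At least 16 bits.
	signed long int	Not smaller than int. At least 32 bits.
	signed long long int	Not smaller than long. At least 64 bits.
Integer types (unsigned)	unsigned char	Same size as their signed counterparts.
	unsigned short int
	unsigned int
	unsigned long int
	unsigned long long int
Floating-point types	float
	double	Precision not less than float.
	long double	Precision not less than double.
Boolean type	bool
Void type	void	No storage.
Null pointer	decltype(nullptr)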

Note that in the table above, other than char (which has a size of exactly one byte), none of the fundamental types has a standard size specified (only a minimum size).

A 16-bit unsigned integer would be able to represent 65536 distinct values in the range 0 to 65535, while its signed counterpart would be able to represent values in the range -32768 to 32767.

The types described above (characters, integers, floating-point, and boolean) are collectively known as arithmetic types. But two additional fundamental types exist: void, which identifies the lack of type; and the type of nullptr (std::nullptr_t), which is a special type of pointer.

C++ supports a wide variety of types based on the fundamental types discussed above; these other types are known as compound data types, and are one of the main strengths of the C++ language.
Declaration of Variables

C++ is a strongly-typed language, and requires every variable to be declared with its type before its first use.

This tells the compiler the size to reserve in memory for the variable and how to interpret its value. The syntax to declare a new variable in C++ is straightforward: we simply write the type followed by the variable name (its identifier), for example:
```
int a;
float mynumber;
int b, c, d; // several variables of the same type: separate their identifiers with commas
```
Initialization of variable

A variable can be given a specific value at the moment it is declared; this is called the initialization of the variable. There are three ways to initialize variables:
1. C-like initialization (inherited from the C language)
  
  type identifier = initial_value;
2. constructor initialization (introduced by the C++ language)
  
  type identifier (initial_value);
3. uniform initialization
  
  type identifier {initial_value};
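All three ways of initializing variables are valid in C++ and, for fundamental types, have the same effect, with one notable difference: uniform initialization rejects narrowing conversions, so int x {3.5}; is a compile error, while int x = 3.5; silently truncates the value to 3.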
Type deduction : auto and decltype

When a new variable is initialized, the compiler can figure out the type of the variable automatically from the initializer. For this, it suffices to use auto as the type specifier for the variable:
```
int foo = 0;
auto bar = foo; // the same as : int bar = foo
```
Variables that are not initialized can also make use of type deduction with the decltype specifier:
```
int foo = 0;
decltype(foo) bar; // the same as : int bar;
```
auto and decltype are powerful features added to the language in C++11.
Introduction to strings

One of the major strengths of the C++ language is its rich set of compound types, of which the fundamental types are mere building blocks.

An example of compound type is the string class. Variables of this type are able to store sequences of characters, such as words or sentences.

A first difference with fundamental data types is that in order to declare and use objects (variables) of this type, the program needs to include the header where the type is defined within the standard library (header <string>):
```
#include <iostream>
#include <string>
using namespace std;

int main()
{
    string mystring;
    mystring = "This is the initial string content";
    cout << mystring << endl;
    return 0;
}
```
Inserting the endl manipulator ends the line (printing a newline character and flushing the stream).

Constants

Constants are expressions with a fixed value.

Literals

Literals are the most obvious kind of constants. They are used to express particular values within the source code of a program.

a = 5;

The 5 in this piece of code is a literal constant.

Literal constants can be classified into:

integer

Integer literals can be written in decimal base, in octal (preceded by a 0), or in hexadecimal (preceded by the characters 0x).

These literal constants have a type, just like variables. Certain suffixes may be appended to an integer literal:

Suffix	Type modifier
u or U	unsigned
l or L	long
ll or LL	long long

Unsigned may be combined with either of the other two, in any order, to form unsigned long or unsigned long long.

75 // int
75u // unsigned int 
75l // long
75ul // unsigned long
75lu // unsigned long

floating-point

Suffix	Type
f or F	float
l or L	long double

characters, strings

'z' // single-character literals using single quotes ''
"Hello" // a string (which generally consists of more than one character) using double quotes ""

Both single-character and string literals require quote marks surrounding them to distinguish them from possible variable identifiers or reserved keywords.

Single character escape codes:

Escape code	Description
`\n`	newline
`\r`	carriage return
`\t`	tab
`\v`	vertical tab
`\b`	backspace
`\f`	form feed (page feed)
`\a`	alert (beep)
`\'`	single quote (')
`\"`	double quote (")
`\?`	question mark (?)
`\\`	backslash (\)

Internally, computers represent characters as numerical codes: most typically, they use some extension of the ASCII character encoding system.

Adjacent string literals separated only by whitespace (spaces, tabs, or newlines) are concatenated into a single string literal:

"this forms " "a single "   "string "
"of characters"
// equal to
"this forms a single string of characters"
// Spaces within the quotes are part of the literal, while those outside them are not.
x = "string expressed in \
two lines";
// equal to
x = "string expressed in two lines";

In C++, a backslash (\) at the end of a line is considered a line-continuation character that merges both that line and the next into a single line.

All the character literals and string literals described above are made of characters of type char.

A different character type can be specified by using one of the following prefixes:

Prefix	Character Type
u	char16_t
U	char32_t
L	wchar_t

For string literals, apart from the above u, U, L, two additional prefixes exist:

Prefix	Description
u8	The string literal is encoded in the executable using UTF-8
R	The string literal is a raw string. Can be combined with any of the other prefixes

Boolean, pointers, and user-defined literals

Three keyword literals exist in C++: true, false, and nullptr.
- true and false are the two possible values for variables of type bool;
- nullptr is the null pointer value.

Typed constant expressions

Sometimes, it is just convenient to give a name to a constant value:
```
const double pi = 3.14;
const char tab = '\t';
```
Preprocessor definitions

Another mechanism to name constant values is the use of preprocessor definitions. They have the following form:

# define identifier replacement

After this directive, any occurrence of identifier in the code is interpreted as replacement, where replacement is any sequence of characters.

This replacement is performed by the preprocessor, and happens before the program is compiled, thus causing a sort of blind replacement: the validity of the types or syntax involved is not checked in any way.

The #define lines are preprocessor directives, and as such are single-line instructions that, unlike C++ statements, do not require semicolons (;) at the end; the directive extends automatically until the end of the line.