Pandas can read tabular data from web pages with read_html(), which returns a list of DataFrames, one per table found.
In HTML, the <table> tag can define a table:
The <thead> tag is used to define the header
<tbody> is used to define the body of the table
<tr> in <tbody> defines a row
<td> in <tbody> defines a data cell (<th> defines a header cell)
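As a minimal sketch of how these tags fit together, the snippet below builds a small HTML table in memory and parses it with read_html(). The table contents and the variable names (html, tables, df) are made up for illustration, the data cells use <td>, and the code assumes an HTML parser such as lxml is installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical table: one header row in <thead>, two data rows in <tbody>.
html = """
<table>
  <thead>
    <tr><th>name</th><th>score</th></tr>
  </thead>
  <tbody>
    <tr><td>Ann</td><td>90</td></tr>
    <tr><td>Bob</td><td>85</td></tr>
  </tbody>
</table>
"""

# read_html() returns a list with one DataFrame per <table> it finds.
# Wrapping the markup in StringIO passes it as a file-like object.
tables = pd.read_html(StringIO(html))
df = tables[0]
print(df)
```

The <thead> row becomes the column labels, and each <tr> in <tbody> becomes one row of the DataFrame.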
Parameter settings in read_html():
1. io
io can be a URL, the path of a local HTML document, or a file-like object
2. match
match is a string or regular expression. Only tables containing text that matches it are returned; if no table matches, a ValueError is raised. The default, '.+', matches any non-empty string
3. flavor
flavor specifies the parser used for the web page source. The default, None, tries lxml first and falls back to bs4 with html5lib if that fails
4. header
header specifies one or more rows of the table to use as the column labels. It can be a single integer or a list of integers; the default is None
5. index_col
index_col specifies a column of the table to use as the row labels (the index); the default is None
6. encoding
encoding specifies the encoding used to decode the web page; it generally does not need to be specified
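The parameters above can be combined as in the following sketch. The two tables and the names html, tables and prices are hypothetical, and the example again assumes a parser such as lxml is available. The tables have no <thead>, so header=0 is what turns the first row into column labels.

```python
from io import StringIO

import pandas as pd

# Two hypothetical tables without <thead>; only the second mentions "fruit".
html = """
<table>
  <tr><td>city</td><td>population</td></tr>
  <tr><td>Oslo</td><td>700000</td></tr>
</table>
<table>
  <tr><td>fruit</td><td>price</td></tr>
  <tr><td>apple</td><td>3</td></tr>
  <tr><td>pear</td><td>4</td></tr>
</table>
"""

tables = pd.read_html(
    StringIO(html),  # io: a file-like object here; a URL or file path also works
    match="fruit",   # keep only tables containing text that matches this pattern
    header=0,        # use the first row as the column labels
    index_col=0,     # use the first column as the row labels
)
prices = tables[0]
print(prices)
```

Because of match, the city table is filtered out and the returned list holds a single DataFrame, indexed by the fruit column.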