[python] Getting tabular data with pandas

pandas can read tables from web pages, and from local HTML files, with read_html(). It returns a list of DataFrames, one per table found.

In HTML, a table is defined with the <table> tag, and its parts are marked up as follows:

The <thead> tag defines the table header.

The <tbody> tag defines the table body.

A <tr> inside <tbody> defines a row.

A <td> inside a <tr> defines a data cell; <th> defines a header cell and usually appears inside <thead>.
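To see how these tags map onto a DataFrame, here is a minimal sketch that parses a small hand-written table. The markup and the names `html`, `tables` and `df` are made up for illustration, and the sketch assumes lxml (or BeautifulSoup4 plus html5lib) is installed, since read_html needs one of them.

```python
from io import StringIO

import pandas as pd

# Illustrative markup: <thead> holds the header row of <th> cells,
# <tbody> holds the data rows (<tr>), each made of <td> cells.
html = """
<table>
  <thead>
    <tr><th>fruit</th><th>price</th></tr>
  </thead>
  <tbody>
    <tr><td>apple</td><td>3</td></tr>
    <tr><td>pear</td><td>5</td></tr>
  </tbody>
</table>
"""

# read_html returns a list of DataFrames, one per <table> it finds.
# Wrapping the markup in StringIO avoids the deprecation warning that
# newer pandas versions emit when a literal HTML string is passed.
tables = pd.read_html(StringIO(html))
df = tables[0]
print(df)
```

The <th> cells in <thead> become the column labels, and each <tr> in <tbody> becomes one row of the DataFrame.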

Main parameters of read_html():

1、io

io can be the URL of a web page, the path to a local HTML file, or a file-like object.
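As a rough sketch of the different kinds of io input, the snippet below reads the same illustrative table once from a temporary local file and once from an in-memory buffer. The file name `sample.html` and the variable names are hypothetical; the URL case is only shown as a comment because it needs network access and a real page that contains a <table>.

```python
import os
import tempfile
from io import StringIO

import pandas as pd

# Illustrative table without <thead>; a top row made only of <th>
# cells is still treated as the header.
html = "<table><tr><th>a</th><th>b</th></tr><tr><td>1</td><td>2</td></tr></table>"

# 1) io as a path to a local HTML file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.html")
    with open(path, "w", encoding="utf-8") as f:
        f.write(html)
    from_file = pd.read_html(path)[0]

# 2) io as a file-like object (an in-memory text buffer).
from_buffer = pd.read_html(StringIO(html))[0]

# 3) io as a URL works the same way, for example
#    pd.read_html("https://example.com/page.html"),
#    but that requires network access, so it is not run here.
print(from_file)
```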

2、match

match is a string or regular expression (default ".+"). Only tables whose text matches it are returned; if no table matches, read_html raises an error instead of returning an empty list.
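A minimal sketch of match, assuming a page with two illustrative tables (the markup and variable names are made up). A plain string keeps only the table whose text contains it, and a regular expression can select several tables at once.

```python
from io import StringIO

import pandas as pd

# Two illustrative tables on one "page".
html = """
<table>
  <tr><th>fruit</th><th>price</th></tr>
  <tr><td>apple</td><td>3</td></tr>
</table>
<table>
  <tr><th>vegetable</th><th>weight</th></tr>
  <tr><td>carrot</td><td>120</td></tr>
</table>
"""

# Only the table whose text contains "vegetable" is returned.
veg = pd.read_html(StringIO(html), match="vegetable")
print(len(veg))

# A regular expression: matches text found in both tables.
both = pd.read_html(StringIO(html), match=r"price|weight")
print(len(both))
```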

3、flavor

flavor specifies the parsing engine: "lxml", or "bs4"/"html5lib" (BeautifulSoup4 with html5lib). The default, None, tries lxml first and falls back to bs4 + html5lib if lxml fails.
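The sketch below sets flavor explicitly instead of relying on the default. It picks "lxml" when that package is importable and otherwise "bs4", which assumes beautifulsoup4 and html5lib are installed. This selection logic is only an illustration, not something read_html requires.

```python
from importlib.util import find_spec
from io import StringIO

import pandas as pd

html = "<table><tr><th>x</th></tr><tr><td>1</td></tr></table>"

# lxml is fast but strict; bs4 + html5lib is slower but more lenient
# with badly formed markup.
if find_spec("lxml") is not None:
    flavor = "lxml"
else:
    flavor = "bs4"  # assumption: beautifulsoup4 and html5lib are installed

df = pd.read_html(StringIO(html), flavor=flavor)[0]
print(flavor, df.shape)
```

flavor also accepts a list such as ["lxml", "bs4"], in which case the parsers are tried in that order.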

4、header

header specifies which row (or rows) of the table to use as the column labels. It defaults to None and accepts a single integer or a list of integers; a list produces MultiIndex columns.
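A minimal sketch of header on an illustrative table that has no <thead> and no <th> cells, so pandas cannot infer the header by itself. header=0 takes the first row as the column labels, while header=[0, 1] combines the first two rows into two-level column labels. The markup and names are made up.

```python
from io import StringIO

import pandas as pd

# Illustrative table: every row is a plain <td> row.
html = """
<table>
  <tr><td>Q1</td><td>Q2</td></tr>
  <tr><td>sales</td><td>cost</td></tr>
  <tr><td>10</td><td>7</td></tr>
</table>
"""

# header=0: the first row becomes the column labels.
one = pd.read_html(StringIO(html), header=0)[0]
print(one.columns.tolist())

# header=[0, 1]: the first two rows become a two-level MultiIndex.
two = pd.read_html(StringIO(html), header=[0, 1])[0]
print(two.columns.tolist())
```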

5、index_col

index_col specifies which column (or columns) of the table to use as the row labels (the index). The default is None.
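A minimal sketch of index_col, using an illustrative table: with index_col=0 the first column becomes the index, so rows can be looked up by label with .loc. The data and names are made up.

```python
from io import StringIO

import pandas as pd

html = """
<table>
  <thead><tr><th>fruit</th><th>price</th><th>stock</th></tr></thead>
  <tbody>
    <tr><td>apple</td><td>3</td><td>40</td></tr>
    <tr><td>pear</td><td>5</td><td>25</td></tr>
  </tbody>
</table>
"""

# index_col=0: the "fruit" column becomes the row labels.
df = pd.read_html(StringIO(html), index_col=0)[0]

# Look up a row by its label instead of its position.
print(df.loc["pear", "price"])
```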

6、encoding

encoding specifies the character encoding used to decode the web page. It usually does not need to be set.
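One case where encoding does matter is a local file saved in something other than UTF-8. The sketch below, assuming a hypothetical file `latin1.html` written in ISO-8859-1, passes the matching encoding so that the non-ASCII character is decoded correctly. A page saved in GBK would be read the same way with encoding="gbk".

```python
import os
import tempfile

import pandas as pd

html = "<table><tr><th>city</th></tr><tr><td>Zürich</td></tr></table>"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "latin1.html")
    # Save the page as ISO-8859-1 (Latin-1) bytes instead of UTF-8.
    with open(path, "w", encoding="iso-8859-1") as f:
        f.write(html)
    # Tell read_html how to decode those bytes so "ü" survives.
    df = pd.read_html(path, encoding="iso-8859-1")[0]

print(df.loc[0, "city"])
```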


Origin blog.csdn.net/weixin_39407597/article/details/126680560