Quantitative Data – Download stock data for free with BaoStock

Introduction

BaoStock is a free, open-source securities data platform that requires no registration. Data is fetched through a Python API, and query results can be converted to pandas DataFrames, which makes further data processing convenient.

  • Advantages: Free, no registration required, includes A-share stocks, index data, and financial data

  • Disadvantages: coverage is still incomplete; there is no data for Hong Kong stocks, futures, foreign exchange, or funds.

Data coverage:

  • Stock data: daily, weekly, and monthly K-line data, time range: 1990-12-19 to present; 5-, 15-, 30-, and 60-minute K-line data, time range: 1999-07-26 to present

  • Index data: daily, weekly, and monthly K-line data for composite indexes, size indexes, primary and secondary industry indexes, strategy indexes, growth indexes, value indexes, theme indexes, fund indexes, and bond indexes; time range: 2006-01-01 to present

  • Quarterly financial data: balance-sheet (assets and liabilities) data of some listed companies, plus cash-flow, profitability, and DuPont analysis indicators of listed companies; time range: 2007 to present

  • Quarterly company reports: performance forecasts of listed companies, time range: 2003 to present; performance express reports of listed companies, time range: 2006 to present
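When a request reaches back further than the ranges above, it can help to clamp the start date before querying. The sketch below is a minimal illustration, assuming the stock K-line start dates listed above still hold; `clamp_start_date` and `EARLIEST` are hypothetical names, not part of BaoStock, and index data (from 2006-01-01) would need its own table.

```python
from datetime import date

# Earliest stock K-line dates as listed in the data coverage above.
# Assumption: copied from the article; coverage may have changed since.
EARLIEST = {
    "d": date(1990, 12, 19),   # daily K-line
    "w": date(1990, 12, 19),   # weekly K-line
    "m": date(1990, 12, 19),   # monthly K-line
    "5": date(1999, 7, 26),    # minute K-lines
    "15": date(1999, 7, 26),
    "30": date(1999, 7, 26),
    "60": date(1999, 7, 26),
}


def clamp_start_date(start: str, frequency: str = "d") -> str:
    """Move a YYYY-MM-DD start date forward to the earliest available date."""
    earliest = EARLIEST[frequency.lower()]
    requested = date.fromisoformat(start)
    return max(requested, earliest).isoformat()


print(clamp_start_date("1980-01-01"))       # 1990-12-19
print(clamp_start_date("1995-03-01", "5"))  # 1999-07-26
print(clamp_start_date("2022-05-23"))       # 2022-05-23
```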

Daily data update schedule:

  • At 17:30 on each trading day, that day's daily K-line data is loaded into the database;

  • At 20:30 on each trading day, that day's minute K-line data is loaded into the database;

  • At 01:30 on the following calendar day, the previous trading day's "other financial report data" is loaded into the database.
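The schedule above can be turned into a quick check before downloading today's data. This is a sketch under stated assumptions: `todays_data_ready` is a hypothetical helper, `now` is taken to be a naive datetime in Beijing time on a trading day, and weekends and holidays are not handled.

```python
from datetime import datetime, time

# Load times from the update schedule (assumed Beijing time).
DAILY_READY = time(17, 30)
MINUTE_READY = time(20, 30)


def todays_data_ready(now: datetime) -> dict:
    """Report which of today's K-line data sets should already be loaded.

    Assumes `now` falls on a trading day; no trading calendar is consulted.
    """
    return {
        "daily": now.time() >= DAILY_READY,
        "minute": now.time() >= MINUTE_READY,
    }


print(todays_data_ready(datetime(2022, 5, 27, 18, 0)))
# {'daily': True, 'minute': False}
```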

Install

pip install baostock -i https://pypi.tuna.tsinghua.edu.cn/simple/ --trusted-host pypi.tuna.tsinghua.edu.cn

After installation, run from the command line:

python -c "import baostock as bs; bs.login()"

If the output shows "login success!", the installation succeeded.

Description of commonly used functions

The three most commonly used functions are login(), query_all_stock(), and query_history_k_data_plus(). The API also provides quarterly financial data, company report information, macroeconomic data, and more; see the official website for details.

  1. login() function

Logs in to the system, establishing a connection to the server; no registration is needed. Note that if the session is idle for some time after login, API requests time out, and you need to log in again before you can continue downloading data.

import baostock as bs
bs.login()

When you no longer need the connection, call logout() to disconnect from the server:

bs.logout()
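Because a session can time out and should be closed when you are done, it can be convenient to pair login() and logout() in a context manager. A minimal sketch, assuming login() returns a result with `error_code` ("0" on success) and `error_msg`, the attribute names BaoStock's sample code prints; `baostock_session` is a hypothetical name, and the `client` parameter exists only so the helper can be tried with a stand-in object.

```python
from contextlib import contextmanager


@contextmanager
def baostock_session(client=None):
    """Log in, yield the client, and always log out afterwards (hypothetical helper)."""
    if client is None:
        import baostock as client  # imported lazily so stand-ins need no install
    result = client.login()
    # Assumption: error_code is the string "0" on success.
    if result.error_code != "0":
        raise RuntimeError(f"login failed: {result.error_msg}")
    try:
        yield client
    finally:
        client.logout()  # runs even if the body raises


# Usage (requires network access):
# with baostock_session() as bs:
#     stock_df = bs.query_all_stock("2022-05-27").get_data()
```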

  2. query_all_stock() function

Gets the list of all securities (stocks and indexes) on a given trading date. Pass the date through the day parameter; if it is omitted, the current day's data is returned. The return value is a BaoStock-specific result object; call its get_data() method to obtain a pandas DataFrame.

In [10]: date = "2022-05-27"
    ...: stock_df = bs.query_all_stock(date).get_data()
    ...: stock_df
Out[10]:
           code tradeStatus   code_name
0     bj.430047           1
1     bj.430090           1
2     bj.430198           1
3     bj.430418           1
4     bj.430489           1
...         ...         ...         ...
5315  sz.399994           1  中证信息安全主题指数
5316  sz.399995           1    中证基建工程指数
5317  sz.399996           1    中证智能家居指数
5318  sz.399997           1      中证白酒指数
5319  sz.399998           1      中证煤炭指数

[5320 rows x 3 columns]

On a non-trading day, get_data() returns an empty DataFrame. For example, 2022-05-28 was a Saturday with no trading, so len(bs.query_all_stock('2022-05-28').get_data()) returns 0:

In [11]: len(bs.query_all_stock('2022-05-27').get_data())
Out[11]: 5320

In [12]: len(bs.query_all_stock('2022-05-28').get_data())
Out[12]: 0
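Since a non-trading day yields an empty list, the same call can be used to step back to the most recent trading day. A sketch, relying only on the empty-DataFrame behaviour shown above; `latest_trading_day` and its `max_back` cap are hypothetical, and the query function is passed in so the logic can be tried without a network connection.

```python
from datetime import date, timedelta


def latest_trading_day(query_all_stock, start: str, max_back: int = 15) -> str:
    """Step back from `start` until query_all_stock returns a non-empty list.

    `query_all_stock` is e.g. bs.query_all_stock; its result must offer get_data().
    `max_back` is an arbitrary cap so a bad date cannot loop forever.
    """
    day = date.fromisoformat(start)
    for _ in range(max_back + 1):
        data = query_all_stock(day.isoformat()).get_data()
        if len(data) > 0:  # non-empty list means a trading day
            return day.isoformat()
        day -= timedelta(days=1)
    raise LookupError(f"no trading day within {max_back} days before {start}")


# Usage after bs.login():
# latest_trading_day(bs.query_all_stock, "2022-05-29")
```

Passing the query function in keeps the stepping logic separate from the network call.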

Use tolist() to turn the code column into a Python list:

stock_list = stock_df['code'].tolist()
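With the code list in hand, data for many securities is typically fetched in a loop. Given the timeout behaviour noted under login(), the sketch below logs in again when a query fails and retries. It assumes each query result exposes `error_code` ("0" on success) and get_data(); `download_k_data` and the `retries` count are hypothetical, and the daily, forward-adjusted settings mirror the sample in the next section.

```python
def download_k_data(client, codes, fields, start_date, end_date, retries=1):
    """Fetch daily forward-adjusted K-lines for each code (hypothetical helper).

    `client` is the logged-in baostock module. After a failed query the helper
    logs in again (the session may have timed out) and retries up to `retries`
    times; codes that still fail are skipped.
    """
    frames = {}
    for code in codes:
        for attempt in range(retries + 1):
            rs = client.query_history_k_data_plus(
                code, fields, start_date=start_date, end_date=end_date,
                frequency="d", adjustflag="2")
            if rs.error_code == "0":
                frames[code] = rs.get_data()
                break
            if attempt < retries:
                client.login()  # re-establish the session before retrying
    return frames


# Usage after bs.login():
# frames = download_k_data(bs, stock_list[:10], "date,close,volume",
#                          "2022-05-23", "2022-05-27")
```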

  3. query_history_k_data_plus() function

Gets historical A-share trading data. Depending on the parameters, it returns daily, weekly, or monthly K-lines, or 5-, 15-, 30-, or 60-minute K-lines. Unadjusted, forward-adjusted, and backward-adjusted prices are all available, which makes the data suitable for moving-average calculations, stock selection, and analysis. As with query_all_stock(), the return value is a BaoStock-specific result object; call get_data() to obtain a pandas DataFrame.

Parameter description is as follows:

  • code: the security code, prefixed with its exchange (for example sz.399994 in the sample below)

  • fields: a comma-separated string of indicators; each indicator becomes a column of the returned DataFrame. For example, pctChg is the percentage price change, peTTM the trailing-twelve-month (TTM) price-to-earnings ratio, and psTTM the TTM price-to-sales ratio.

  • frequency: the data frequency, case-insensitive; d = daily K-line (default), w = weekly, m = monthly, 5/15/30/60 = 5-, 15-, 30-, or 60-minute K-line. Indexes have no minute-level data. A week's weekly K-line is available only from the last trading day of that week, and a month's monthly K-line only from the last trading day of that month.

  • adjustflag: the price adjustment type, unadjusted by default; 1 = backward-adjusted, 2 = forward-adjusted, 3 = unadjusted.
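The parameter rules above (case-insensitive frequency, no minute data for indexes, numeric adjustflag codes) can be encoded in a small helper that builds keyword arguments. A sketch; `k_data_kwargs`, its readable `adjust` names, and the `is_index` flag are hypothetical conveniences, not BaoStock parameters.

```python
# Codes taken from the parameter description above.
FREQUENCIES = {"d", "w", "m", "5", "15", "30", "60"}
MINUTE_FREQUENCIES = {"5", "15", "30", "60"}
# Hypothetical readable names mapped to the adjustflag codes.
ADJUSTFLAGS = {"backward": "1", "forward": "2", "none": "3"}


def k_data_kwargs(frequency="d", adjust="none", is_index=False):
    """Build frequency/adjustflag keyword arguments (hypothetical helper)."""
    freq = str(frequency).lower()  # frequency is case-insensitive
    if freq not in FREQUENCIES:
        raise ValueError(f"unsupported frequency: {frequency!r}")
    if is_index and freq in MINUTE_FREQUENCIES:
        raise ValueError("indexes have no minute K-line data")
    if adjust not in ADJUSTFLAGS:
        raise ValueError(f"unsupported adjust type: {adjust!r}")
    return {"frequency": freq, "adjustflag": ADJUSTFLAGS[adjust]}


print(k_data_kwargs("D", "forward"))  # {'frequency': 'd', 'adjustflag': '2'}
```

It would be passed as bs.query_history_k_data_plus(code, data_fields, start_date=start_date, end_date=end_date, **k_data_kwargs("d", "forward")).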

The sample code is as follows:

# Assumes `import baostock as bs` and bs.login() from the login() section
code = "sz.399994"
data_fields = "date,open,high,low,close,preclose,volume,amount,adjustflag,turn,tradestatus,pctChg,peTTM,pbMRQ,psTTM,pcfNcfTTM,isST"
start_date = "2022-05-21"
end_date = "2022-05-28"
adjustflag = "2"  # forward-adjusted

# Daily K-lines for the date range; get_data() converts the result to a DataFrame
kdata_df = bs.query_history_k_data_plus(code,
                                        data_fields,
                                        start_date=start_date,
                                        end_date=end_date,
                                        frequency="d",
                                        adjustflag=adjustflag).get_data()

In [14]: kdata_df
Out[14]:
         date       open       high        low      close   preclose  ...     pctChg     peTTM     pbMRQ     psTTM pcfNcfTTM isST
0  2022-05-23  1333.9767  1339.3266  1323.2326  1339.0444  1323.8745  ...   1.145871  0.000000  0.000000  0.000000  0.000000    0
1  2022-05-24  1338.0567  1340.5022  1270.4979  1270.4979  1339.0444  ...  -5.119061  0.000000  0.000000  0.000000  0.000000    0
2  2022-05-25  1273.4179  1290.6949  1273.4179  1288.6189  1270.4979  ...   1.426291  0.000000  0.000000  0.000000  0.000000    0
3  2022-05-26  1287.7710  1306.6983  1264.3422  1299.1283  1288.6189  ...   0.815555  0.000000  0.000000  0.000000  0.000000    0
4  2022-05-27  1312.1094  1320.1226  1287.6731  1297.4858  1299.1283  ...  -0.126431  0.000000  0.000000  0.000000  0.000000    0
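The values in kdata_df are strings (the suspension filter below compares volume with '0'), so numeric work such as the moving averages mentioned earlier needs a conversion first. A minimal sketch using pandas; `to_numeric_frame`, the column list, and the window size are assumptions chosen to match the fields in the sample.

```python
import pandas as pd

# Numeric fields from the sample data_fields string (assumed list).
NUMERIC_COLUMNS = ["open", "high", "low", "close", "preclose", "volume",
                   "amount", "turn", "pctChg", "peTTM", "pbMRQ", "psTTM",
                   "pcfNcfTTM"]


def to_numeric_frame(df: pd.DataFrame, window: int = 5) -> pd.DataFrame:
    """Convert K-line string columns to numbers and add a moving average of close."""
    out = df.copy()
    cols = [c for c in NUMERIC_COLUMNS if c in out.columns]
    if cols:
        # Empty strings (e.g. on suspended days) become NaN instead of raising.
        out[cols] = out[cols].apply(pd.to_numeric, errors="coerce")
    if "date" in out.columns:
        out["date"] = pd.to_datetime(out["date"])
    # Simple moving average; the first window-1 rows are NaN.
    out[f"ma{window}"] = out["close"].rolling(window).mean()
    return out


# Usage: kdata_df = to_numeric_frame(kdata_df, window=3)
```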

In backtesting you usually don't want suspended trading days to distort the results, so you can drop those rows:

# Only filter when rows were returned; values come back as strings, hence '0' and ''
if kdata_df.shape[0]:
    kdata_df = kdata_df[(kdata_df['volume'] != '0') & (kdata_df['volume'] != '')]
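An alternative is to filter on the tradestatus column, which the sample fields also request. A sketch, assuming tradestatus is the string "1" for normal trading and "0" for suspension; `drop_suspended` is a hypothetical helper.

```python
import pandas as pd


def drop_suspended(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only rows whose tradestatus is "1" (assumed: normal trading)."""
    # Nothing to filter if the query returned no rows or no tradestatus field.
    if df.empty or "tradestatus" not in df.columns:
        return df
    return df[df["tradestatus"] == "1"].reset_index(drop=True)


# Usage: kdata_df = drop_suspended(kdata_df)
```

Unlike the volume check, this relies on the exchange's own trading-status flag rather than inferring suspension from zero volume.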

Discussion


WeChat public account: Zhuge Talk



Source: blog.csdn.net/richardzhutalk/article/details/125027067