Docker (19): Shrinking a Python Docker image and reducing its dependencies

1. About the Python image


I ran into a problem while developing with Python:
the image is still fairly large after the build, so I looked for files that could be removed.
A smaller image is more convenient to publish and deploy.

2. Building the Dockerfile


Building pandas on Python 3 ran into problems,
so the image is still built on Python 2:

FROM python:2-slim-jessie

# Configure the apt sources, pip and the timezone
RUN echo "deb http://mirrors.aliyun.com/debian/ jessie main non-free contrib" > /etc/apt/sources.list && \
echo "deb http://mirrors.aliyun.com/debian/ jessie-proposed-updates main non-free contrib" >> /etc/apt/sources.list && \
echo "deb-src http://mirrors.aliyun.com/debian/ jessie main non-free contrib" >> /etc/apt/sources.list && \
echo "deb-src http://mirrors.aliyun.com/debian/ jessie-proposed-updates main non-free contrib" >> /etc/apt/sources.list && \
echo "[global]" > /etc/pip.conf  && \
echo "trusted-host=mirrors.aliyun.com" >> /etc/pip.conf  && \
echo "index-url=http://mirrors.aliyun.com/pypi/simple" >> /etc/pip.conf && \
ln -sf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime && \
echo "Asia/Shanghai" > /etc/timezone

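# Install the other libraries
# The package has been renamed: libmariadbclient-dev-compat - MariaDB database development files (libmysqlclient compatibility)
# libmysqld-dev
# Once installation is finished the dev libraries are no longer needed; remove them to save space.
# pip install pyspark seems to have problems
# https://blog.csdn.net/i2cbus/article/details/41625337 (parameter tuning)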
RUN apt-get update && apt-get install -y python-dev libmysqlclient-dev gcc cron vim supervisor git && \
    echo "#define WITH_DOC_STRINGS 1" >> /usr/include/python2.7/pyconfig.h && \
    pip install --upgrade pip && \
    pip install mysqlclient  && \
    pip install sqlalchemy && \
    pip install requests && \
    pip install numpy pandas jupyter && \
    apt-get install -y libmysqlclient18  && \
    rm -rf /root/.cache && apt-get autoclean && \
    apt-get --purge autoremove -y python-dev libmysqlclient-dev gcc && \
    find /usr/lib/python2.7 -name '*.pyc' -delete && \
    find /usr/local/lib/python2.7 -name '*.pyc' -delete && \
    rm -rf /tmp/* /var/lib/apt/* /var/cache/* /var/log/*


# 1. Work around the pandas data insertion problem by patching the sqlalchemy database driver directly: statement.replace("INSERT INTO","INSERT IGNORE INTO")
# debian /usr/local/lib/python2.7/site-packages/sqlalchemy
# ubuntu /usr/local/lib/python2.7/dist-packages/sqlalchemy/dialects/mysql/mysqldb.py
# This adds the IGNORE keyword.
# 2. Work around the torndb problem under Python 2:
# http://blog.csdn.net/littlethunder/article/details/8917378
RUN sed -i -e 's/executemany(statement/executemany(statement.replace\("INSERT INTO","INSERT IGNORE INTO")/g' \
        /usr/local/lib/python2.7/site-packages/sqlalchemy/dialects/mysql/mysqldb.py

# Set the locale to UTF-8
ENV LANG=en_US.UTF-8
ENV LC_CTYPE=en_US.UTF-8
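# Note: LC_ALL takes precedence over LANG and LC_CTYPE, so this setting overrides the two UTF-8 values above.
ENV LC_ALL=C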
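
The sed patch in the Dockerfile wraps the statement passed to executemany so that every "INSERT INTO" becomes "INSERT IGNORE INTO". Before baking it into an image, the substitution can be tried on a throwaway line. This is only a sketch: the function name patch_insert_ignore and the sample line are made up for illustration and are not taken from the real mysqldb.py.

```shell
#!/bin/sh
# Sketch: apply the INSERT -> INSERT IGNORE substitution to text on stdin
# instead of editing the installed mysqldb.py in place.
patch_insert_ignore() {
    sed -e 's/executemany(statement/executemany(statement.replace("INSERT INTO","INSERT IGNORE INTO")/g'
}

# Made-up sample line, only to show what the substitution produces.
sample='rval = cursor.executemany(statement, parameters)'
printf '%s\n' "$sample" | patch_insert_ignore
# prints: rval = cursor.executemany(statement.replace("INSERT INTO","INSERT IGNORE INTO"), parameters)
```

Because the patch applies to every executemany call of this driver, all bulk inserts that go through it will skip rows that hit a duplicate key instead of raising an error.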

The Debian sources now point to the aliyun mirror, and pip is configured to use the aliyun index as well,
so installation is very fast.
python-dev, libmysqlclient-dev and gcc are needed only during installation. Once installation is complete
they are removed again, which makes the image much smaller.
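
Installing and then purging only pays off because both steps happen inside one RUN instruction: a file deleted in a later layer still takes up space in the earlier layer. A minimal sketch of the pattern as a shell function follows. The name build_with_temp_deps and the APT_GET override are hypothetical, introduced here so the flow can be dry-run with echo; they are not part of the original Dockerfile.

```shell
#!/bin/sh
# Sketch: install build-only packages, run a build command, purge them again.
# APT_GET is a hypothetical override; set APT_GET=echo to print the apt
# commands instead of running them.
APT_GET="${APT_GET:-apt-get}"
BUILD_DEPS="python-dev libmysqlclient-dev gcc"

build_with_temp_deps() {
    # "$@" is the build command, for example: pip install mysqlclient
    # $BUILD_DEPS is left unquoted on purpose so it splits into package names.
    $APT_GET update &&
    $APT_GET install -y $BUILD_DEPS &&
    "$@" &&
    $APT_GET --purge autoremove -y $BUILD_DEPS
}
```

Inside the image this would be called as build_with_temp_deps pip install mysqlclient. Runtime libraries such as libmysqlclient18 still have to be installed explicitly before the purge, as the Dockerfile does, so that autoremove does not take them away together with the dev packages.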

find /usr/lib/python2.7 -name '*.pyc' -delete && \
find /usr/local/lib/python2.7 -name '*.pyc' -delete && \

Removing the compiled .pyc files makes the image about 80 MB smaller.
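
To check how many .pyc files are actually removed, the two find commands can be wrapped in a small helper. This is a sketch under assumptions: remove_pyc is a hypothetical name, and the added -type f filter is not in the original commands.

```shell
#!/bin/sh
# Sketch: count the .pyc files under a directory, then delete them.
remove_pyc() {
    dir="$1"
    # tr strips the padding that some wc implementations add around the number
    count=$(find "$dir" -type f -name '*.pyc' | wc -l | tr -d ' ')
    find "$dir" -type f -name '*.pyc' -delete
    echo "removed $count .pyc files from $dir"
}
```

In the image it would be called once for /usr/lib/python2.7 and once for /usr/local/lib/python2.7. Python compiles modules again when they are imported, so the first import is slightly slower; setting PYTHONDONTWRITEBYTECODE=1 keeps the .pyc files from being written back.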

rm -rf /root/.cache && apt-get autoclean
rm -rf /tmp/* /var/lib/apt/* /var/cache/* /var/log/*

These commands delete the pip cache, the apt caches, logs and other temporary files.
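
To see which of these directories is worth clearing, its size can be printed before it is emptied. A minimal sketch, assuming a hypothetical helper named clear_dir that is not part of the original Dockerfile:

```shell
#!/bin/sh
# Sketch: report a directory's size in KB, then empty it.
clear_dir() {
    dir="$1"
    [ -d "$dir" ] || return 0             # nothing to do if it does not exist
    size_kb=$(du -sk "$dir" | cut -f1)    # size before clearing, in KB
    rm -rf "${dir:?}"/*                   # same glob as the Dockerfile: dotfiles stay
    echo "$dir: cleared, was ${size_kb} KB"
}
```

In the Dockerfile it would be called for /tmp, /var/cache and /var/log inside the same RUN instruction as the installs, for example clear_dir /var/cache.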

After this series of deletions the image size came down to 484 MB, which took quite some effort.
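
The resulting size, and the amount each layer contributes to it, can be checked with the docker CLI. A small sketch follows; show_image_size is a hypothetical helper and the image name passed to it is a placeholder for your own tag.

```shell
#!/bin/sh
# Sketch: print the total size of an image and the size added by each layer.
show_image_size() {
    image="$1"
    # Total size as reported by the local image list
    docker images --format '{{.Repository}}:{{.Tag}}  {{.Size}}' "$image"
    # One line per layer: its size and the instruction that created it
    docker history --format '{{.Size}}  {{.CreatedBy}}' "$image"
}
```

Calling show_image_size with the tag used at build time makes it easy to spot a RUN instruction that still adds more than expected.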

3. Summary


The slimmed-down image is much more convenient for others to use, and it downloads faster.
It becomes smaller still once it is compressed. Next, I will rework the image of the stock system
so that it focuses only on stock data capture and data analysis, and write another demo using golang.

The original link of this article is:
https://blog.csdn.net/freewebsys/article/details/79961371
