A brief introduction to installing and using XLearning

Recently I have been working with a colleague who takes algorithm work very seriously, and I have a lot to learn from him. He recommended XLearning, an open-source tool from Qihoo 360 that can submit TensorFlow jobs to YARN, and it seems to solve a lot of problems. I am a lazy person: my colleague did most of the work, and I just sat back and waited to verify the results. I am recording his results here:

Reference URL:

https://github.com/Qihoo360/XLearning/blob/master/README_CN.md

Step 1: Find a machine with Internet access and prepare the build environment

Note: the JDK version and Hadoop version of the build environment need to match those of the production environment.

After unpacking the downloaded XLearning archive, edit pom.xml in the XLearning-master folder:

    <repositories>
        <repository>
            <id>cloudera-releases</id>
            <url>https://repository.cloudera.com/artifactory/cloudera-repos</url>
            <releases>
                <enabled>true</enabled>
            </releases>
            <snapshots>
                <enabled>false</enabled>
            </snapshots>
        </repository>
    </repositories>
    <properties>
        <hadoop2.version>2.6.0-cdh5.5.4</hadoop2.version>
        <junit.version>4.11</junit.version>
        <jdk.version>1.7</jdk.version>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <xlearning.jar.basename>xlearning-${project.version}-hadoop${hadoop2.version}</xlearning.jar.basename>
        <xlearning.jar>${xlearning.jar.basename}.jar</xlearning.jar>
    </properties>

Because we are building against a CDH release, you need to add the Cloudera repository, change the Hadoop version, and specify the JDK version. Make sure the JDK on the build machine is the same as the one specified in pom.xml and the same as the one in the production environment.
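A quick way to check the JDK part of this is to compare what java -version reports with the value of jdk.version in pom.xml. The following is only a sketch: the function names parse_java_version and check_jdk are my own, and it assumes the usual version "1.7.0_80" style of output.

```shell
# Extract "major.minor" (for example 1.7) from `java -version`-style text on stdin.
parse_java_version() {
    sed -n 's/.*version "\([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1
}

# Compare the local JDK with the <jdk.version> value used in pom.xml.
# `java -version` writes to stderr, hence the 2>&1.
check_jdk() {
    expected="$1"
    actual="$(java -version 2>&1 | parse_java_version)"
    if [ "$actual" = "$expected" ]; then
        echo "JDK $actual matches pom.xml"
    else
        echo "JDK mismatch: local=$actual pom.xml=$expected" >&2
        return 1
    fi
}
```

With the pom.xml above you would call check_jdk 1.7 on the build machine and again on the production node. The Hadoop side can be checked by eye with hadoop version, which prints the build string (for example the 2.6.0-cdh5.5.4 part).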

Then run the following in the source root directory:

mvn package

After the build completes, the release package xlearning-1.0-dist.tar.gz is generated in the target directory under the source root.

Step 2:

Unpack this tarball on the client machine where you want to run programs.
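As a rough sketch of the unpacking step: the helper below (unpack_xlearning is a name I made up) extracts the tarball so that its contents land directly in the target folder. It assumes the archive contains a single top-level directory, which I have not verified for every XLearning version, so list the archive with tar -tzf first.

```shell
# Unpack a dist tarball into "$dest", dropping the archive's top-level
# folder so that bin/, conf/ and so on sit directly under "$dest".
# Assumption: the archive has exactly one top-level directory.
unpack_xlearning() {
    tarball="$1"
    dest="$2"
    mkdir -p "$dest" || return 1
    tar -xzf "$tarball" -C "$dest" --strip-components=1
}

# Example (paths follow the ones used in this post):
#   tar -tzf xlearning-1.0-dist.tar.gz | head
#   unpack_xlearning xlearning-1.0-dist.tar.gz /home/test/xlearning
```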

Set the environment variables:

export XLEARNING_HOME=/home/test/xlearning

export JAVA_HOME=/home/test/jdk
export HADOOP_CONF_DIR=/home/test/hadoop/etc/hadoop
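Since a wrong path here only shows up later as a confusing submit error, it can help to check the three variables right away. This is a small sketch of my own (check_xlearning_env is not part of XLearning) that only tests that each variable is set and points at a directory.

```shell
# Report which of the required variables are unset or not a directory.
# Returns 0 only if all three look usable.
check_xlearning_env() {
    status=0
    for name in XLEARNING_HOME JAVA_HOME HADOOP_CONF_DIR; do
        # Indirect lookup of the variable whose name is in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name is not set" >&2
            status=1
        elif [ ! -d "$value" ]; then
            echo "$name=$value is not a directory" >&2
            status=1
        fi
    done
    return $status
}
```

Run check_xlearning_env after the export lines. It prints nothing when everything is in place.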

If you need to start the history service, edit xlearning-site.xml and replace 0.0.0.0 with the address the service should start on.

For example:

    <property>
        <name>xlearning.history.webapp.https.address</name>
        <value>192.168.3.56:19885</value>
    </property>

Then create the directories specified by the following properties on the HDFS cluster:

    <property>
        <name>xlearning.tf.board.history.dir</name>
        <value>/tmp/XLearning/eventLog</value>
    </property>

    <property>
        <name>xlearning.history.log.dir</name>
        <value>/tmp/XLearning/history</value>
    </property>

    <property>
        <name>xlearning.staging.dir</name>
        <value>/tmp/XLearning/staging</value>
    </property>
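Creating these three HDFS directories can be done in one loop with hdfs dfs -mkdir -p. The wrapper below is just a sketch: create_xlearning_dirs and the HDFS variable are names I introduced so the commands can be printed first as a dry run, and you may still need to adjust ownership or permissions for your cluster.

```shell
# Create the HDFS directories named in xlearning-site.xml.
# HDFS defaults to the real client. Set HDFS="echo hdfs" for a dry run
# that only prints the commands.
create_xlearning_dirs() {
    for dir in /tmp/XLearning/eventLog /tmp/XLearning/history /tmp/XLearning/staging; do
        # ${HDFS:-hdfs} is unquoted on purpose so "echo hdfs" splits into two words.
        ${HDFS:-hdfs} dfs -mkdir -p "$dir" || return 1
    done
}

# Dry run:   HDFS="echo hdfs"; create_xlearning_dirs
# Real run:  unset HDFS; create_xlearning_dirs
```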

Then edit the xlearning-env.sh file: uncomment the relevant lines in it and add:

export JAVA_HOME=/home/test/jdk
export HADOOP_CONF_DIR=/home/test/hadoop/etc/hadoop

Also modify yarn-site.xml (note: you do not need to change it across the whole cluster, only on the current node):

<property>
    <name>yarn.application.classpath</name>
    <value>$HADOOP_CLIENT_CONF_DIR,$HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,$HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,$HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*,$HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*</value>
</property>

Then start the service:

sbin/start-history-server.sh

Step 3:

Test:

You can test the installation with the bundled examples. The TensorFlow example script is at:

xlearning/examples/tensorflow/run.sh

Remember to change the queue given by --queue default to your own queue before submitting.
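If you would rather not edit run.sh by hand, the queue can be swapped with sed. A sketch under my own assumptions: set_queue is a made-up helper, my_queue is a placeholder, and it expects a plain queue name without slashes or ampersands.

```shell
# Replace the value after --queue in a submit script, in place.
# A temp file is used instead of `sed -i`, whose syntax differs between
# GNU and BSD sed. `cat >` keeps the script's executable bit.
set_queue() {
    script="$1"
    queue="$2"
    tmp="$(mktemp)" || return 1
    sed "s/--queue  *[^ ]*/--queue $queue/" "$script" > "$tmp" && cat "$tmp" > "$script"
    status=$?
    rm -f "$tmp"
    return $status
}

# Example (my_queue is a placeholder for your real YARN queue):
#   set_queue xlearning/examples/tensorflow/run.sh my_queue
```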

If you run into problems related to xlearning.tf.board.enable at run time, check whether the relevant packages are missing from the local Python environment. You can work around this by setting xlearning.tf.board.enable=false, or by pointing the job at a Python environment that has the relevant packages installed.

For the latter, add the following options when submitting:

--user-path /opt/anaconda3/bin/  \

--launch-cmd "/opt/anaconda3/bin/python demo.py --data_path=./data --save_path=./model --log_dir=./eventLog --training_epochs=10" \
