kubeflow-9: Building lightweight Python components

Component and construction method of kubeflow-3-pipeline

1 Lightweight python components

Lightweight Python components do not require you to build a new container image for every code change. They are intended for fast iteration in a notebook environment.
Building a lightweight python component
To build a component, define a stand-alone Python function and then call kfp.components.func_to_container_op(func) to convert it into a component that can be used in a pipeline.
The function must meet several requirements:

(1) The function should be stand-alone. It should not use any code declared outside of the function definition. Any imports should be added inside the main function. Any helper functions should also be defined inside the main function.

(2) The function can only import packages that are available in the base image. If you need a package that is not available, you can try to find a container image that already includes it. (As a workaround, you can use the subprocess module to run pip install for the required package. The my_divmod function below shows an example.)

(3) If the function operates on numbers, the parameters need to have type hints. Supported types are [int, float, bool]. Everything else is passed as a string.

(4) To build a component with multiple output values, use the typing.NamedTuple type hint syntax:

NamedTuple('MyFunctionOutputs', [('output_name_1', type), ('output_name_2', float)])
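As a rough illustration of these requirements, the sketch below is a plain Python function that keeps its imports and helper inside the body, type-hints its numeric parameter, accepts a non-numeric value as a JSON string, and returns two named outputs. The names summarize, values_json, scale and SummarizeOutputs are made up for this example and do not come from the Kubeflow documentation. Nothing here calls kfp, so it can be run locally before converting the function.

```python
from typing import NamedTuple


def summarize(values_json: str, scale: float) -> NamedTuple(
        'SummarizeOutputs', [('total', float), ('count', int)]):
    """Scale a JSON-encoded list of numbers and report its sum and length."""
    # Requirement 1: imports live inside the function body.
    import json
    from collections import namedtuple

    # Requirement 1: helper functions are nested inside the main function.
    def scaled(numbers, factor):
        return [n * factor for n in numbers]

    # Requirement 3: a list is not one of the supported numeric types, so it
    # is passed as a string and decoded here (hypothetical convention).
    numbers = scaled(json.loads(values_json), scale)

    # Requirement 4: return a named tuple whose fields match the type hint.
    outputs = namedtuple('SummarizeOutputs', ['total', 'count'])
    return outputs(float(sum(numbers)), len(numbers))


result = summarize('[1, 2, 3]', 2.0)
print(result)  # SummarizeOutputs(total=12.0, count=3)
```

Once a function like this behaves as expected locally, it could be passed to kfp.components.func_to_container_op in the same way as the examples in the following sections.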

2 Simple application

import kfp

# (1) Define the Python function
def add(a: float, b: float) -> float:
    '''Calculates sum of two arguments'''
    return a + b

# (2) Convert the function to a pipeline operation
add_op = kfp.components.func_to_container_op(add)

# (3) Define the pipeline
@kfp.dsl.pipeline(
    name='Calculation pipeline',
    description='A toy pipeline that performs arithmetic calculations.'
)
def cal_pipeline(a=7):
    # Passing a pipeline parameter and a constant value as operation arguments
    add_task = add_op(a, 4)  # Returns a dsl.ContainerOp class instance.

# (4) Submit the run
if __name__ == '__main__':
    kfp.Client().create_run_from_pipeline_func(cal_pipeline, arguments={})

3 Complex applications

A slightly more advanced function that demonstrates how to use imports and helper functions and how to produce multiple outputs.

import kfp
from typing import NamedTuple
# (1) Define the advanced function
# Demonstrates imports, helper functions and multiple outputs
def my_divmod(dividend: float, divisor: float) -> NamedTuple('MyDivmodOutput', [('quotient', float), ('remainder', float), ('mlpipeline_metrics', 'Metrics')]):
    '''Divides two numbers and calculates the quotient and remainder'''
    # (1-1) Pip installs inside a component function.
    # NOTE: Installs should be placed right at the beginning to avoid upgrading a package
    # after it has already been imported and cached by python
    import sys, subprocess
    subprocess.run([sys.executable, '-m', 'pip', 'install', 'numpy'])
    
    #(1-2)Imports inside a component function:
    import numpy as np

    #(1-3)This function demonstrates how to use nested functions inside a component function:
    def divmod_helper(dividend, divisor):
        return np.divmod(dividend, divisor)

    (quotient, remainder) = divmod_helper(dividend, divisor)

    import json

    # Exports two sample metrics:
    metrics = {
      'metrics': [{
          'name': 'quotient',
          'numberValue':  float(quotient),
        },{
          'name': 'remainder',
          'numberValue':  float(remainder),
        }]}

    from collections import namedtuple
    divmod_output = namedtuple('MyDivmodOutput', ['quotient', 'remainder', 'mlpipeline_metrics'])
    return divmod_output(quotient, remainder, json.dumps(metrics))

#(2)Convert the function to a pipeline operation
#You can specify an alternative base container image (the image needs to have Python 3.5+ installed).

divmod_op = kfp.components.func_to_container_op(my_divmod, base_image='tensorflow/tensorflow:1.13.2-py3')

#(3)Define the pipeline
#Pipeline function has to be decorated with the @dsl.pipeline decorator
@kfp.dsl.pipeline(
   name='Calculation pipeline',
   description='A toy pipeline that performs arithmetic calculations.'
)
def calc_pipeline(b='7', c='17'):
    # Passing pipeline parameters as operation arguments
    divmod_task = divmod_op(b, c)

# (4) Submit the run
if __name__ == '__main__':
    # Specify pipeline argument values
    arguments = {'b': '7', 'c': '8'}
    #Submit a pipeline run
    kfp.Client().create_run_from_pipeline_func(calc_pipeline, arguments=arguments)

(1) Test running the Python function directly
my_divmod(100, 7)
Output:
MyDivmodOutput(quotient=14, remainder=2, mlpipeline_metrics='{"metrics": [{"name": "quotient", "numberValue": 14.0}, {"name": "remainder", "numberValue": 2.0}]}')
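The mlpipeline_metrics value in the output above is a JSON string, not a dictionary. When checking a function locally, it can be convenient to decode that string and look at the metric values. The following is a minimal sketch of that check; the helper name metrics_to_dict is hypothetical, and the string is copied from the sample output.

```python
import json

# Metrics string in the same shape as the mlpipeline_metrics output above.
metrics_json = (
    '{"metrics": [{"name": "quotient", "numberValue": 14.0}, '
    '{"name": "remainder", "numberValue": 2.0}]}'
)


def metrics_to_dict(serialized: str) -> dict:
    """Map each metric name to its numeric value (hypothetical helper)."""
    entries = json.loads(serialized)['metrics']
    return {entry['name']: entry['numberValue'] for entry in entries}


values = metrics_to_dict(metrics_json)
print(values)  # {'quotient': 14.0, 'remainder': 2.0}
```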
(2) How to use component outputs
For an operation with a single return value, the output reference can be accessed using task.output or task.outputs['output_name']

For an operation with multiple return values, the output references can be accessed using task.outputs['output_name']
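The two access patterns can be sketched by wiring one task's outputs into the next. This example assumes the kfp 1.x SDK, where func_to_container_op is available. The names split_number, SplitOutputs, build_pipeline and wiring_pipeline are invented for illustration. The kfp import sits inside build_pipeline so that the two component functions can still be tested as plain Python without the SDK installed.

```python
from typing import NamedTuple


def add(a: float, b: float) -> float:
    """Return the sum of two numbers (single output)."""
    return a + b


def split_number(value: float) -> NamedTuple(
        'SplitOutputs', [('whole', float), ('fraction', float)]):
    """Split a number into whole and fractional parts (multiple outputs)."""
    import math
    from collections import namedtuple
    fraction, whole = math.modf(value)
    outputs = namedtuple('SplitOutputs', ['whole', 'fraction'])
    return outputs(whole, fraction)


def build_pipeline():
    """Build the pipeline function; requires the kfp 1.x SDK (assumption)."""
    import kfp

    add_op = kfp.components.func_to_container_op(add)
    split_op = kfp.components.func_to_container_op(split_number)

    @kfp.dsl.pipeline(
        name='Output wiring sketch',
        description='Passes task outputs to downstream tasks.'
    )
    def wiring_pipeline(a=7.5):
        add_task = add_op(a, 4)
        # Single return value: use task.output
        split_task = split_op(add_task.output)
        # Multiple return values: use task.outputs['output_name']
        add_op(split_task.outputs['whole'], split_task.outputs['fraction'])

    return wiring_pipeline


# Local check of the plain functions, no cluster needed.
print(add(7.5, 4))
print(split_number(11.5))
```

The function returned by build_pipeline() could then be submitted with kfp.Client().create_run_from_pipeline_func, as in the earlier sections.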

Anything the function prints appears in the component log.


Origin blog.csdn.net/qq_20466211/article/details/114138717