Kubernetes Source Code Reading Notes: Scheduler (Part 1)

The Scheduler is a key component on the cluster's Master node. Its job is to place Pods onto suitable nodes in the Kubernetes cluster based on each Pod's resource requests, affinity rules, and other criteria.

I. The Entry Function

The entry function is structured the same way as the Controller Manager's: it likewise uses the cobra package to register a kube-scheduler command on the command line.

cmd/kube-scheduler/scheduler.go

func main() {
	rand.Seed(time.Now().UnixNano())

	command := app.NewSchedulerCommand()

	// TODO: once we switch everything over to Cobra commands, we can go back to calling
	// utilflag.InitFlags() (by removing its pflag.Parse() call). For now, we have to set the
	// normalize func and add the go flag set by hand.
	pflag.CommandLine.SetNormalizeFunc(utilflag.WordSepNormalizeFunc)
	// utilflag.InitFlags()
	logs.InitLogs()
	defer logs.FlushLogs()

	if err := command.Execute(); err != nil {
		fmt.Fprintf(os.Stderr, "%v\n", err)
		os.Exit(1)
	}
}

The core piece here is still the NewSchedulerCommand method. It lives in app/server.go and is structured almost identically to the Controller Manager's version, so I won't paste it here. The key part is again the Run field of the cobra.Command struct, which invokes the runCommand method.
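For reference, here is a minimal, self-contained sketch of that cobra pattern (heavily simplified; the real NewSchedulerCommand also wires in an Options struct and its flag sets):

package main

import (
	"fmt"
	"os"

	"github.com/spf13/cobra"
)

// newSchedulerCommand mirrors the shape of app.NewSchedulerCommand: the
// constructor returns a *cobra.Command whose Run field hands control to a
// runCommand-style function.
func newSchedulerCommand() *cobra.Command {
	return &cobra.Command{
		Use: "kube-scheduler",
		Run: func(cmd *cobra.Command, args []string) {
			if err := runCommand(cmd, args); err != nil {
				fmt.Fprintf(os.Stderr, "%v\n", err)
				os.Exit(1)
			}
		},
	}
}

// runCommand stands in for the real configuration-and-run logic.
func runCommand(cmd *cobra.Command, args []string) error {
	fmt.Println("configuring and running the scheduler...")
	return nil
}

func main() {
	if err := newSchedulerCommand().Execute(); err != nil {
		os.Exit(1)
	}
}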

runCommand assembles the Config for the Scheduler and ends by returning the result of the Run method, which is what actually starts the Scheduler.

One line in the middle of runCommand deserves attention:

func runCommand(cmd *cobra.Command, args []string, opts *options.Options) error {

    ...

    algorithmprovider.ApplyFeatureGates()

    ...
}

This line calls the ApplyFeatureGates method, which adjusts the default set of Pod scheduling algorithms (predicates and priorities) according to the enabled feature gates. The detailed algorithm implementations live in pkg/scheduler/algorithm and pkg/scheduler/algorithmprovider; I won't expand on them here.
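To make "applying feature gates" concrete, the sketch below shows the general pattern: a default provider registers a set of predicates, and a gate can swap individual predicates in or out before the provider is used. Every name in it is an illustrative stand-in, not the real kube-scheduler registry API:

package main

import "fmt"

// Illustrative stand-ins for the registries in algorithmprovider; these
// names are hypothetical.
var defaultPredicates = map[string]bool{
	"PodFitsResources":   true,
	"CheckNodeCondition": true,
}

var featureGates = map[string]bool{
	"TaintNodesByCondition": true,
}

// applyFeatureGates mimics the idea: when a gate changes how the cluster
// expresses node problems (as taints), the now-redundant predicate is
// removed and a taint-aware one is registered instead.
func applyFeatureGates() {
	if featureGates["TaintNodesByCondition"] {
		delete(defaultPredicates, "CheckNodeCondition")
		defaultPredicates["PodToleratesNodeTaints"] = true
	}
}

func main() {
	applyFeatureGates()
	fmt.Println(defaultPredicates)
}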

II. Run

Let's take a look at the Run method:

func Run(cc schedulerserverconfig.CompletedConfig, stopCh <-chan struct{}) error {
	// Create the scheduler.
	sched, err := scheduler.New(cc.Client,
		cc.InformerFactory.Core().V1().Nodes(),
		cc.PodInformer,
		cc.InformerFactory.Core().V1().PersistentVolumes(),
		cc.InformerFactory.Core().V1().PersistentVolumeClaims(),
		cc.InformerFactory.Core().V1().ReplicationControllers(),
		cc.InformerFactory.Apps().V1().ReplicaSets(),
		cc.InformerFactory.Apps().V1().StatefulSets(),
		cc.InformerFactory.Core().V1().Services(),
		cc.InformerFactory.Policy().V1beta1().PodDisruptionBudgets(),
		cc.InformerFactory.Storage().V1().StorageClasses(),
		cc.Recorder,
		cc.ComponentConfig.AlgorithmSource,
		stopCh,
		scheduler.WithName(cc.ComponentConfig.SchedulerName),
		scheduler.WithHardPodAffinitySymmetricWeight(cc.ComponentConfig.HardPodAffinitySymmetricWeight),
		scheduler.WithPreemptionDisabled(cc.ComponentConfig.DisablePreemption),
		scheduler.WithPercentageOfNodesToScore(cc.ComponentConfig.PercentageOfNodesToScore),
		scheduler.WithBindTimeoutSeconds(*cc.ComponentConfig.BindTimeoutSeconds))
	if err != nil {
		return err
	}

	// Prepare the event broadcaster.
	if cc.Broadcaster != nil && cc.EventClient != nil {
		cc.Broadcaster.StartLogging(klog.V(6).Infof)
		cc.Broadcaster.StartRecordingToSink(&v1core.EventSinkImpl{Interface: cc.EventClient.Events("")})
	}

	// Setup healthz checks.
	var checks []healthz.HealthzChecker
	if cc.ComponentConfig.LeaderElection.LeaderElect {
		checks = append(checks, cc.LeaderElection.WatchDog)
	}

	// Start up the healthz server.
	if cc.InsecureServing != nil {
		separateMetrics := cc.InsecureMetricsServing != nil
		handler := buildHandlerChain(newHealthzHandler(&cc.ComponentConfig, separateMetrics, checks...), nil, nil)
		if err := cc.InsecureServing.Serve(handler, 0, stopCh); err != nil {
			return fmt.Errorf("failed to start healthz server: %v", err)
		}
	}
	if cc.InsecureMetricsServing != nil {
		handler := buildHandlerChain(newMetricsHandler(&cc.ComponentConfig), nil, nil)
		if err := cc.InsecureMetricsServing.Serve(handler, 0, stopCh); err != nil {
			return fmt.Errorf("failed to start metrics server: %v", err)
		}
	}
	if cc.SecureServing != nil {
		handler := buildHandlerChain(newHealthzHandler(&cc.ComponentConfig, false, checks...), cc.Authentication.Authenticator, cc.Authorization.Authorizer)
		if err := cc.SecureServing.Serve(handler, 0, stopCh); err != nil {
			// fail early for secure handlers, removing the old error loop from above
			return fmt.Errorf("failed to start healthz server: %v", err)
		}
	}

	// Start all informers.
	go cc.PodInformer.Informer().Run(stopCh)
	cc.InformerFactory.Start(stopCh)

	// Wait for all caches to sync before scheduling.
	cc.InformerFactory.WaitForCacheSync(stopCh)
	controller.WaitForCacheSync("scheduler", stopCh, cc.PodInformer.Informer().HasSynced)

	// Prepare a reusable runCommand function.
	run := func(ctx context.Context) {
		sched.Run()
		<-ctx.Done()
	}

	ctx, cancel := context.WithCancel(context.TODO()) // TODO once Run() accepts a context, it should be used here
	defer cancel()

	go func() {
		select {
		case <-stopCh:
			cancel()
		case <-ctx.Done():
		}
	}()

	// If leader election is enabled, runCommand via LeaderElector until done and exit.
	if cc.LeaderElection != nil {
		cc.LeaderElection.Callbacks = leaderelection.LeaderCallbacks{
			OnStartedLeading: run,
			OnStoppedLeading: func() {
				utilruntime.HandleError(fmt.Errorf("lost master"))
			},
		}
		leaderElector, err := leaderelection.NewLeaderElector(*cc.LeaderElection)
		if err != nil {
			return fmt.Errorf("couldn't create leader elector: %v", err)
		}

		leaderElector.Run(ctx)

		return fmt.Errorf("lost lease")
	}

	// Leader election is disabled, so runCommand inline until done.
	run(ctx)
	return fmt.Errorf("finished without leader elect")
}

The Run method covers quite a lot, so let's go through it step by step:

(1) Create the Scheduler.

The first few lines of Run call the New method to create a Scheduler object. New lives in pkg/scheduler/scheduler.go:

pkg/scheduler/scheduler.go

func New(client clientset.Interface,
	nodeInformer coreinformers.NodeInformer,
	podInformer coreinformers.PodInformer,
	pvInformer coreinformers.PersistentVolumeInformer,
	pvcInformer coreinformers.PersistentVolumeClaimInformer,
	replicationControllerInformer coreinformers.ReplicationControllerInformer,
	replicaSetInformer appsinformers.ReplicaSetInformer,
	statefulSetInformer appsinformers.StatefulSetInformer,
	serviceInformer coreinformers.ServiceInformer,
	pdbInformer policyinformers.PodDisruptionBudgetInformer,
	storageClassInformer storageinformers.StorageClassInformer,
	recorder record.EventRecorder,
	schedulerAlgorithmSource kubeschedulerconfig.SchedulerAlgorithmSource,
	stopCh <-chan struct{},
	opts ...func(o *schedulerOptions)) (*Scheduler, error) {

	options := defaultSchedulerOptions
	for _, opt := range opts {
		opt(&options)
	}

	// Set up the configurator which can create schedulers from configs.
	configurator := factory.NewConfigFactory(&factory.ConfigFactoryArgs{
		SchedulerName:                  options.schedulerName,
		Client:                         client,
		NodeInformer:                   nodeInformer,
		PodInformer:                    podInformer,
		PvInformer:                     pvInformer,
		PvcInformer:                    pvcInformer,
		ReplicationControllerInformer:  replicationControllerInformer,
		ReplicaSetInformer:             replicaSetInformer,
		StatefulSetInformer:            statefulSetInformer,
		ServiceInformer:                serviceInformer,
		PdbInformer:                    pdbInformer,
		StorageClassInformer:           storageClassInformer,
		HardPodAffinitySymmetricWeight: options.hardPodAffinitySymmetricWeight,
		DisablePreemption:              options.disablePreemption,
		PercentageOfNodesToScore:       options.percentageOfNodesToScore,
		BindTimeoutSeconds:             options.bindTimeoutSeconds,
	})

	var config *factory.Config
	source := schedulerAlgorithmSource
	switch {
	case source.Provider != nil:
		// Create the config from a named algorithm provider.
		sc, err := configurator.CreateFromProvider(*source.Provider)
		if err != nil {
			return nil, fmt.Errorf("couldn't create scheduler using provider %q: %v", *source.Provider, err)
		}
		config = sc
	case source.Policy != nil:
		// Create the config from a user specified policy source.
		policy := &schedulerapi.Policy{}
		switch {
		case source.Policy.File != nil:
			if err := initPolicyFromFile(source.Policy.File.Path, policy); err != nil {
				return nil, err
			}
		case source.Policy.ConfigMap != nil:
			if err := initPolicyFromConfigMap(client, source.Policy.ConfigMap, policy); err != nil {
				return nil, err
			}
		}
		sc, err := configurator.CreateFromConfig(*policy)
		if err != nil {
			return nil, fmt.Errorf("couldn't create scheduler from policy: %v", err)
		}
		config = sc
	default:
		return nil, fmt.Errorf("unsupported algorithm source: %v", source)
	}

	// Additional tweaks to the config produced by the configurator.
	config.Recorder = recorder
	config.DisablePreemption = options.disablePreemption
	config.StopEverything = stopCh

	// Create the scheduler.
	sched := NewFromConfig(config)
	return sched, nil
}

New is long, but its logic is fairly clear. In essence, it instantiates a Config from the informers, algorithm source, and other arguments passed in, then calls NewFromConfig to build and return a Scheduler instance from that Config. Notice that the Scheduler also consumes a large number of informers, including the nodeInformer and podInformer, because it must track resource changes promptly in order to adjust its scheduling decisions.
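The trailing opts ...func(o *schedulerOptions) parameter is the functional-options pattern; it is what scheduler.WithName, scheduler.WithPreemptionDisabled, and the other With* helpers seen in Run plug into. Here is a minimal self-contained sketch of the pattern, with simplified stand-in names:

package main

import "fmt"

// Simplified stand-ins for the scheduler's option machinery.
type schedulerOptions struct {
	schedulerName     string
	disablePreemption bool
}

// Each With* helper returns a closure that mutates one field.
func WithName(name string) func(*schedulerOptions) {
	return func(o *schedulerOptions) { o.schedulerName = name }
}

func WithPreemptionDisabled(disabled bool) func(*schedulerOptions) {
	return func(o *schedulerOptions) { o.disablePreemption = disabled }
}

// newScheduler starts from defaults and applies each option in order,
// just as New does with its opts ...func(o *schedulerOptions) parameter.
func newScheduler(opts ...func(*schedulerOptions)) *schedulerOptions {
	options := schedulerOptions{schedulerName: "default-scheduler"}
	for _, opt := range opts {
		opt(&options)
	}
	return &options
}

func main() {
	o := newScheduler(WithName("my-scheduler"), WithPreemptionDisabled(true))
	fmt.Printf("%+v\n", o)
}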

The NewConfigFactory method that builds the config lives in pkg/scheduler/factory/factory.go. Stepping into it, you can see that it registers callback functions for a whole series of informers.

The most important are the two callbacks on the PodInformer: they route already-scheduled Pods into a cache and not-yet-scheduled Pods into a queue, and they define the add, update, and delete handlers for each kind.

You can also see that the ConfigFactory maintains a podQueue, which holds the Pods waiting to be scheduled.
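Condensed, the PodInformer registration follows the pattern below (a simplified excerpt: the real filter functions also exclude terminated Pods, and the handlers are methods on the ConfigFactory that feed the scheduler cache and podQueue):

podInformer.Informer().AddEventHandler(
	cache.FilteringResourceEventHandler{
		// Already-scheduled pods (NodeName set) are kept in the scheduler cache.
		FilterFunc: func(obj interface{}) bool {
			pod, ok := obj.(*v1.Pod)
			return ok && pod.Spec.NodeName != ""
		},
		Handler: cache.ResourceEventHandlerFuncs{
			AddFunc:    c.addPodToCache,
			UpdateFunc: c.updatePodInCache,
			DeleteFunc: c.deletePodFromCache,
		},
	},
)
podInformer.Informer().AddEventHandler(
	cache.FilteringResourceEventHandler{
		// Not-yet-scheduled pods (NodeName empty) go into podQueue.
		FilterFunc: func(obj interface{}) bool {
			pod, ok := obj.(*v1.Pod)
			return ok && pod.Spec.NodeName == ""
		},
		Handler: cache.ResourceEventHandlerFuncs{
			AddFunc:    c.addPodToSchedulingQueue,
			UpdateFunc: c.updatePodInSchedulingQueue,
			DeleteFunc: c.deletePodFromSchedulingQueue,
		},
	},
)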

(2) Start the event broadcaster and health checks.

A few lines in the middle set up event broadcasting and the health-check endpoints for the Scheduler. This mirrors the Controller Manager, so I won't dwell on it.
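Still, for reference, the broadcasting calls in Run boil down to the standard client-go pattern; a sketch, assuming a configured clientset named client:

broadcaster := record.NewBroadcaster()
// Mirror events into the component log at verbosity 6.
broadcaster.StartLogging(klog.V(6).Infof)
// Persist events to the API server via the events client.
broadcaster.StartRecordingToSink(&v1core.EventSinkImpl{
	Interface: client.CoreV1().Events(""),
})
// The recorder handed to scheduler.New is built from this broadcaster.
recorder := broadcaster.NewRecorder(scheme.Scheme, v1.EventSource{Component: "kube-scheduler"})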

(3) Start the informers.

It's worth noting that the Scheduler starts the PodInformer on its own goroutine, separate from all the other informers, because scheduling Pods is the Scheduler's core concern.

(4) Run the Scheduler.

This is the heart of the whole method: calling the Scheduler's Run method sets the Scheduler in motion.

Stepping into Run, we find the method is remarkably concise. It does just two things:

pkg/scheduler/scheduler.go

func (sched *Scheduler) Run() {
	if !sched.config.WaitForCacheSync() {
		return
	}

	go wait.Until(sched.scheduleOne, 0, sched.config.StopEverything)
}

The first is waiting for the caches to sync; the second is calling scheduleOne in a loop via wait.Until to perform the actual Pod scheduling. We'll pick up the details in the next post.
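The loop itself comes from wait.Until in k8s.io/apimachinery/pkg/util/wait: with a period of 0, it re-invokes the function as soon as it returns, until the stop channel is closed. A small self-contained example of that behavior:

package main

import (
	"fmt"
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
)

func main() {
	stopCh := make(chan struct{})

	// Close the stop channel after three seconds to end the loop.
	go func() {
		time.Sleep(3 * time.Second)
		close(stopCh)
	}()

	// Like sched.Run's use with scheduleOne: period 0 means the function
	// is re-invoked as soon as it returns, until stopCh is closed.
	wait.Until(func() {
		fmt.Println("scheduling one pod...")
		time.Sleep(500 * time.Millisecond) // stand-in for scheduleOne's work
	}, 0, stopCh)
}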

III. Summary

To sum up the Scheduler's logic: it registers a kube-scheduler command via cobra and runs it. When the command runs, it first applies the configured scheduling algorithms, then builds a Scheduler instance on top of the ConfigFactory, starts the relevant informers, and begins scheduling.


Reposted from www.cnblogs.com/00986014w/p/10305425.html