Grafana fails to start with the error: Grafana-server Init Failed: Could not find config defaults, make sure homepath command

I. Problem description

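A project's monitoring server runs Prometheus + Grafana. Because of poor up-front capacity planning, the server was given too little disk space: a single day of data filled the root partition and took both Prometheus and Grafana down. After space was freed, Grafana still would not start and reported the following errors: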

Jun 09 11:40:49 2-bc-hb-56-centos7 systemd[1]: grafana-server.service: main process exited, code=exited, status=2/INVALIDARGUMENT
Jun 09 11:40:49 2-bc-hb-56-centos7 systemd[1]: Failed to start Grafana instance.
-- Subject: Unit grafana-server.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit grafana-server.service has failed.
-- 
-- The result is failed.
Jun 09 11:40:49 2-bc-hb-56-centos7 systemd[1]: Unit grafana-server.service entered failed state.
Jun 09 11:40:49 2-bc-hb-56-centos7 systemd[1]: grafana-server.service failed.
Jun 09 11:40:49 2-bc-hb-56-centos7 systemd[1]: grafana-server.service holdoff time over, scheduling restart.
Jun 09 11:40:49 2-bc-hb-56-centos7 systemd[1]: start request repeated too quickly for grafana-server.service
Jun 09 11:40:49 2-bc-hb-56-centos7 systemd[1]: Failed to start Grafana instance.
-- Subject: Unit grafana-server.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit grafana-server.service has failed.
-- 
-- The result is failed.
Jun 09 11:40:49 2-bc-hb-56-centos7 systemd[1]: Unit grafana-server.service entered failed state.
Jun 09 11:40:49 2-bc-hb-56-centos7 systemd[1]: grafana-server.service failed.

On site, Grafana is started with systemctl start grafana-server.service.
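
Because the outage began with a full root partition, it can help to watch disk usage before it reaches 100%. The following is a minimal sketch and not part of the original troubleshooting: the function name fs_usage_percent and the 90% threshold are assumptions chosen for illustration.

```shell
#!/usr/bin/env bash
# Print the usage percentage (integer, without the % sign) of the
# filesystem that holds the given path.
fs_usage_percent() {
  # df -P prints one line per filesystem; column 5 is "Capacity", e.g. "87%".
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Warn when the root filesystem crosses a threshold (90 is an example value).
threshold=90
used=$(fs_usage_percent /)
if [ "$used" -ge "$threshold" ]; then
  echo "WARNING: / is ${used}% full"
else
  echo "OK: / is ${used}% full"
fi
```

Run from cron or a monitoring agent, a check like this would flag the problem before Prometheus and Grafana run out of space.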

II. Analysis and troubleshooting

1. The journalctl -xe output shows no obvious error message, but it does contain the following program-level exception (a stack trace):

Jun 09 11:40:49 2-bc-hb-56-centos7 grafana-server[28554]: /drone/src/pkg/services/sqlstore/sqlstore.go:135 +0x6f
Jun 09 11:40:49 2-bc-hb-56-centos7 grafana-server[28554]: github.com/grafana/grafana/pkg/services/sqlstore.ProvideService(0xc0005da000, 0x18, {0x3bb1
Jun 09 11:40:49 2-bc-hb-56-centos7 grafana-server[28554]: /drone/src/pkg/services/sqlstore/sqlstore.go:67 +0xdc
Jun 09 11:40:49 2-bc-hb-56-centos7 grafana-server[28554]: github.com/grafana/grafana/pkg/server.Initialize({{0x7fff108b8cc3, 0x18}, {0x0, 0x0}, {0xc0
Jun 09 11:40:49 2-bc-hb-56-centos7 grafana-server[28554]: /drone/src/pkg/server/wire_gen.go:190 +0x1f8
Jun 09 11:40:49 2-bc-hb-56-centos7 grafana-server[28554]: github.com/grafana/grafana/pkg/cmd/grafana-server/commands.executeServer({0x7fff108b8cc3, 0
Jun 09 11:40:49 2-bc-hb-56-centos7 grafana-server[28554]: /drone/src/pkg/cmd/grafana-server/commands/cli.go:170 +0x625
Jun 09 11:40:49 2-bc-hb-56-centos7 grafana-server[28554]: github.com/grafana/grafana/pkg/cmd/grafana-server/commands.RunServer({{0x3b22c68, 0x5}, {0x
Jun 09 11:40:49 2-bc-hb-56-centos7 grafana-server[28554]: /drone/src/pkg/cmd/grafana-server/commands/cli.go:107 +0x785
Jun 09 11:40:49 2-bc-hb-56-centos7 grafana-server[28554]: main.main()
Jun 09 11:40:49 2-bc-hb-56-centos7 grafana-server[28554]: /drone/src/pkg/cmd/grafana-server/main.go:16 +0xc5
Jun 09 11:40:49 2-bc-hb-56-centos7 systemd[1]: grafana-server.service: main process exited, code=exited, status=2/INVALIDARGUMENT
Jun 09 11:40:49 2-bc-hb-56-centos7 systemd[1]: Failed to start Grafana instance.
-- Subject: Unit grafana-server.service has failed

Judging from this trace, a program error appears to have triggered an OS-level exception (a memory read/write fault).
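
Go stack traces are easier to read without the syslog prefix that journalctl puts on every line. Here is a small sketch, assuming the prefix format seen in the journal output above; strip_journal_prefix is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Read journal lines on stdin, keep only those emitted by grafana-server,
# and strip the "<date> <host> grafana-server[pid]: " prefix from them.
strip_journal_prefix() {
  sed -n -E 's/^.* grafana-server\[[0-9]+\]: //p'
}

# Example usage (requires systemd, so it is left commented out):
#   journalctl -u grafana-server --no-pager | strip_journal_prefix
```

journalctl's own -o cat output mode gives a similar message-only view when filtering by unit.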

2. Check the Grafana application log, /var/log/grafana/grafana.log: no application-level errors there either; almost everything is info-level.

3. Start the server directly from the command line:

/usr/sbin/grafana-server --config=${CONF_FILE} --pidfile=${PID_FILE_DIR}/grafana-server.pid --packaging=rpm cfg:default.paths.logs=${LOG_DIR} cfg:default.paths.data=${DATA_DIR} cfg:default.paths.plugins=${PLUGINS_DIR} cfg:default.paths.provisioning=${PROVISIONING_CFG_DIR}

This fails with:

Grafana server is running with elevated privileges. This is not recommended
Grafana-server Init Failed: Could not find config defaults, make sure homepath command line parameter is set or working directory is homepath

The following simplified command, however, starts normally:

/usr/sbin/grafana-server --config=/etc/grafana/grafana.ini --homepath=/usr/share/grafana
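
The error text says the config defaults are looked up under the homepath or the working directory, and the appendix notes that the defaults live in conf/defaults.ini. A quick way to test whether a directory qualifies is sketched here with a hypothetical helper, is_grafana_homepath.

```shell
#!/usr/bin/env bash
# Return 0 if DIR looks like a Grafana home path, i.e. it contains
# conf/defaults.ini.
is_grafana_homepath() {
  [ -f "$1/conf/defaults.ini" ]
}

# Compare the current working directory with the packaged home path.
for dir in "$PWD" /usr/share/grafana; do
  if is_grafana_homepath "$dir"; then
    echo "ok:      $dir/conf/defaults.ini found"
  else
    echo "missing: $dir/conf/defaults.ini"
  fi
done
```

This is presumably why the init script in the appendix changes into $GRAFANA_HOME before launching the daemon, whereas a manual command run from another directory without --homepath fails.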

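4. Check the service unit file, /usr/lib/systemd/system/grafana-server.service. The program has to run as the grafana user, which has no login rights on the system; that is why starting it as root above prints the warning about running with elevated privileges. As for the message that Grafana cannot find its config defaults, a review of the configuration found nothing abnormal and every path was correct.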

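The settings in /etc/sysconfig/grafana-server are explained below:

GRAFANA_USER=grafana #system user
GRAFANA_GROUP=grafana #system group
GRAFANA_HOME=/usr/share/grafana #home directory, default location of static assets; back it up before upgrading
LOG_DIR=/var/log/grafana #log directory
DATA_DIR=/var/lib/grafana #default data directory; back it up before upgrading
MAX_OPEN_FILES=10000 #maximum number of open files
CONF_DIR=/etc/grafana #configuration directory; back it up before upgrading
CONF_FILE=/etc/grafana/grafana.ini #main configuration file
RESTART_ON_UPGRADE=true #restart the service on upgrade
PLUGINS_DIR=/var/lib/grafana/plugins #plugin directory
PROVISIONING_CFG_DIR=/etc/grafana/provisioning #provision datasources and dashboards from config files instead of through the Grafana UI
#Only used on systemd systems
PID_FILE_DIR=/var/run/grafana #directory for the PID file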


/etc/sysconfig/grafana-server settings (with modified paths):

GRAFANA_USER=grafana
GRAFANA_GROUP=grafana
GRAFANA_HOME=/usr/share/grafana
#modified logs and data directories
LOG_DIR=/opt/grafana/logs
DATA_DIR=/opt/grafana/data
MAX_OPEN_FILES=10000
#modified configuration paths
CONF_DIR=/opt/grafana/conf
CONF_FILE=/opt/grafana/conf/grafana.ini

RESTART_ON_UPGRADE=true
#modified plugins directory
PLUGINS_DIR=/opt/grafana/plugins
PROVISIONING_CFG_DIR=/etc/grafana/provisioning
#Only used on systemd systems
#modified PID file directory
PID_FILE_DIR=/opt/grafana
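
To see exactly which command line a given environment file produces, the variables can be expanded without starting the server. This is a sketch, assuming the file is plain KEY=value shell syntax as listed above; print_grafana_cmd is a hypothetical helper, and its option list mirrors the manual command from step 3.

```shell
#!/usr/bin/env bash
# Source an environment file in a subshell and print the resulting
# grafana-server command line without executing it.
print_grafana_cmd() {
  (
    # Load the variables; the subshell keeps them out of the caller's environment.
    . "$1" || exit 1
    echo "/usr/sbin/grafana-server" \
      "--config=${CONF_FILE}" \
      "--pidfile=${PID_FILE_DIR}/grafana-server.pid" \
      "--packaging=rpm" \
      "cfg:default.paths.logs=${LOG_DIR}" \
      "cfg:default.paths.data=${DATA_DIR}" \
      "cfg:default.paths.plugins=${PLUGINS_DIR}" \
      "cfg:default.paths.provisioning=${PROVISIONING_CFG_DIR}"
  )
}

# Example usage:
#   print_grafana_cmd /etc/sysconfig/grafana-server
```

Comparing the printed command with the paths that actually exist on disk makes a mistyped or missing directory easy to spot.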

5. Check the ownership of the Grafana files: some belonged to root. Changing them to grafana did not fix the problem.
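
The ownership check in this step can be scripted with find. The following is a minimal sketch; not_owned_by is a hypothetical helper name, and the directories in the usage comment are the package defaults mentioned earlier.

```shell
#!/usr/bin/env bash
# List everything under the given directories that is NOT owned by USER.
# Returns 2 if USER does not exist.
not_owned_by() {
  local user=$1
  shift
  if ! id -u "$user" >/dev/null 2>&1; then
    echo "no such user: $user" >&2
    return 2
  fi
  find "$@" ! -user "$user" -print
}

# Example usage:
#   not_owned_by grafana /var/lib/grafana /var/log/grafana
```

An empty result means every file already belongs to the user, in which case ownership can be ruled out as the cause, as it was here.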

6. Substitute the variables in the startup command by hand (the values come from the environment file /etc/sysconfig/grafana-server):

/usr/sbin/grafana-server \
  --config=/etc/grafana/grafana.ini \
  --pidfile=/var/run/grafana/grafana-server.pid \
  --packaging=rpm \
  cfg:default.paths.logs=/var/log/grafana \
  cfg:default.paths.data=/var/lib/grafana \
  cfg:default.paths.plugins=/var/lib/grafana/plugins \
  cfg:default.paths.provisioning=/etc/grafana/provisioning

It still fails with the same error:

Grafana server is running with elevated privileges. This is not recommended
Grafana-server Init Failed: Could not find config defaults, make sure homepath command line parameter is set or working directory is homepath

Add the homepath parameter and start again:

/usr/sbin/grafana-server \
  --config=/etc/grafana/grafana.ini \
  --pidfile=/var/run/grafana/grafana-server.pid \
  --homepath=/usr/share/grafana \
  --packaging=rpm \
  cfg:default.paths.logs=/var/log/grafana \
  cfg:default.paths.data=/var/lib/grafana \
  cfg:default.paths.plugins=/var/lib/grafana/plugins \
  cfg:default.paths.provisioning=/etc/grafana/provisioning

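The error above is gone, but the server now panics instead (the screenshot of the panic is not preserved). Appending force_migration=true to the configuration file and running the following did not help: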

/usr/sbin/grafana-server --config=/etc/grafana/grafana.ini --pidfile=/var/run/grafana/grafana-server.pid --homepath=/usr/share/grafana --packaging=rpm cfg:default.paths.logs=/var/log/grafana

Reset the admin password:

grafana-cli admin reset-admin-password "123456" --homepath "/usr/share/grafana"

The password can also be changed over the web API:

curl -X PUT -H "Content-Type: application/json" -d '{
  "oldPassword": "admin",
  "newPassword": "newpass",
  "confirmNew": "newpass"
}' http://admin:admin@<your_grafana_host>:3000/api/user/password

The SQLite database file that Grafana uses, /var/lib/grafana/grafana.db, already has mode 640, so nothing needs to change there.

7. Experience suggests that inconsistent Grafana data files can also prevent startup, so check them:

du -sh /usr/share/grafana/data/grafana.db  /var/lib/grafana/grafana.db 
944K	/usr/share/grafana/data/grafana.db
2.5M	/var/lib/grafana/grafana.db

# sync the files
rsync -av /var/lib/grafana/grafana.db /usr/share/grafana/data/grafana.db 
sending incremental file list
grafana.db

sent 2,560,734 bytes  received 35 bytes  5,121,538.00 bytes/sec
total size is 2,560,000  speedup is 1.00

# check again
du -sh /usr/share/grafana/data/grafana.db  /var/lib/grafana/grafana.db 
2.5M	/usr/share/grafana/data/grafana.db
2.5M	/var/lib/grafana/grafana.db
# restart to verify
systemctl start grafana-server  # still fails
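
Equal sizes reported by du do not prove that two files have the same content. A byte-for-byte comparison is more reliable; the sketch below uses cmp, with same_content as a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Return 0 when the two files are byte-for-byte identical.
same_content() {
  cmp -s "$1" "$2"
}

# Compare the two database copies from this step, if both exist.
a=/usr/share/grafana/data/grafana.db
b=/var/lib/grafana/grafana.db
if [ -f "$a" ] && [ -f "$b" ]; then
  if same_content "$a" "$b"; then
    echo "identical"
  else
    echo "different"
  fi
fi
```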

Point the data directory (paths.data) at /usr/share/grafana and try starting:

/usr/sbin/grafana-server --config=/etc/grafana/grafana.ini --pidfile=/var/run/grafana/grafana-server.pid --homepath=/usr/share/grafana --packaging=rpm cfg:default.paths.logs=/var/log/grafana cfg:default.paths.data=/usr/share/grafana

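The server starts, but logging in fails (screenshot not preserved). Run it again, this time with the plugins and provisioning paths as well: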

/usr/sbin/grafana-server --config=/etc/grafana/grafana.ini --pidfile=/var/run/grafana/grafana-server.pid --homepath=/usr/share/grafana --packaging=rpm cfg:default.paths.logs=/var/log/grafana cfg:default.paths.data=/usr/share/grafana cfg:default.paths.plugins=/var/lib/grafana/plugins cfg:default.paths.provisioning=/etc/grafana/provisioning

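To sum up, the error comes down to the cfg:default.paths.data setting (here /usr/share/grafana). A comparison shows that /var/lib/grafana/ lacks the default data directory, which is suspected to be why the files could not be read:

mkdir /var/lib/grafana/data
chown -R grafana:grafana /var/lib/grafana/data
# debug from the command line: starts normally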
/usr/sbin/grafana-server --config=/etc/grafana/grafana.ini --pidfile=/var/run/grafana/grafana-server.pid --homepath=/usr/share/grafana --packaging=rpm cfg:default.paths.logs=/var/log/grafana cfg:default.paths.data=/var/lib/grafana cfg:default.paths.plugins=/var/lib/grafana/plugins cfg:default.paths.provisioning=/etc/grafana/provisioning

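The login failure is fixed by correcting the permissions on the data directory. Note also that /etc/grafana/grafana.ini takes precedence over the configuration files in the default directory. After the original grafana.db is copied into the new data directory, Grafana again fails to start. From this we can conclude that a damaged or otherwise abnormal grafana.db was what kept Grafana from starting.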
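
A suspected grafana.db corruption can be checked more directly. The sketch below first looks for the SQLite file header and then, if the sqlite3 command-line tool happens to be installed, runs SQLite's PRAGMA integrity_check. The helper names has_sqlite_header and check_grafana_db are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Return 0 if FILE starts with the SQLite magic string "SQLite format 3".
has_sqlite_header() {
  [ "$(head -c 15 "$1" 2>/dev/null)" = "SQLite format 3" ]
}

# Check the header, then run SQLite's own integrity check when possible.
check_grafana_db() {
  local db=$1
  if ! has_sqlite_header "$db"; then
    echo "$db: not a SQLite 3 file"
    return 1
  fi
  if command -v sqlite3 >/dev/null 2>&1; then
    sqlite3 "$db" 'PRAGMA integrity_check;'
  else
    echo "$db: header ok (sqlite3 CLI not installed, integrity_check skipped)"
  fi
}

# Example usage:
#   check_grafana_db /var/lib/grafana/grafana.db
```

A disk that fills up while SQLite is writing is a plausible way for the file to end up damaged, so a check like this is worth running before copying the database around.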

Delete all files under /var/lib/grafana/, set the data directory in /etc/grafana/grafana.ini to /var/lib/grafana/, and restart Grafana. After resetting the admin password, log in and reconfigure the data sources and dashboards.
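
Deleting everything under /var/lib/grafana/ also discards whatever is stored in grafana.db, so it may be worth keeping a copy first, disk space permitting. The following is a minimal sketch; backup_dir and the .bak.<timestamp> naming are assumptions.

```shell
#!/usr/bin/env bash
# Copy a directory to a timestamped sibling, preserving ownership and
# modes, and print the path of the copy.
backup_dir() {
  local src=${1%/}   # drop a trailing slash, if any
  local dest
  dest="${src}.bak.$(date +%Y%m%d%H%M%S)"
  cp -a "$src" "$dest" && echo "$dest"
}

# Example usage, before wiping the data directory:
#   backup_dir /var/lib/grafana
```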

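III. Appendix

1) How Grafana loads its configuration:

Grafana reads its configuration from three places: the defaults in $WORKING_DIR/conf/defaults.ini, then the user's settings in $WORKING_DIR/conf/custom.ini, whose path can be replaced by passing --config on the command line. For deb or rpm installs the configuration file is /etc/grafana/grafana.ini, which is the file the init.d startup script passes through --config.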
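
The load order described above can be written down as a small function. This is only an illustration of the rule in the paragraph, not Grafana's own code; config_chain is a hypothetical name.

```shell
#!/usr/bin/env bash
# Print, in load order, the configuration files that apply for a given
# home path and an optional --config value.
config_chain() {
  local home=$1 config=${2:-}
  echo "$home/conf/defaults.ini"
  if [ -n "$config" ]; then
    echo "$config"                 # --config replaces custom.ini
  else
    echo "$home/conf/custom.ini"
  fi
}

# The rpm layout used throughout this post:
config_chain /usr/share/grafana /etc/grafana/grafana.ini
```

Settings in the later file override those in defaults.ini, which is consistent with the earlier observation that /etc/grafana/grafana.ini takes precedence.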

Example script:

### END INIT INFO

#  tested on
#  1. New lsb that define start-stop-daemon
#  3. Centos with initscripts package installed

PATH=/bin:/usr/bin:/sbin:/usr/sbin
NAME=grafana-server
DESC="Grafana Server"

GRAFANA_USER=grafana
GRAFANA_GROUP=grafana
GRAFANA_HOME=/usr/share/grafana
DATA_DIR=/var/lib/grafana
PLUGINS_DIR=/var/lib/grafana/plugins
LOG_DIR=/var/log/grafana
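# CONF_DIR must be defined before it is used below (value as in /etc/sysconfig/grafana-server)
CONF_DIR=/etc/grafana
CONF_FILE=$CONF_DIR/grafana.ini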
PROVISIONING_CFG_DIR=$CONF_DIR/provisioning
MAX_OPEN_FILES=10000
PID_FILE=/var/run/$NAME.pid
DAEMON=/usr/sbin/$NAME

if [ ! -x $DAEMON ]; then
  echo "Program not installed or not executable"
  exit 5
fi

#
# init.d / servicectl compatibility (openSUSE)
#
if [ -f /etc/rc.status ]; then
    . /etc/rc.status
    rc_reset
fi

#
# Source function library.
#
if [ -f /etc/rc.d/init.d/functions ]; then
    . /etc/rc.d/init.d/functions
fi

# overwrite settings from default file
[ -e /etc/sysconfig/$NAME ] && . /etc/sysconfig/$NAME


function isRunning() {
  status -p $PID_FILE $NAME > /dev/null 2>&1
}

function checkUser() {
  if [ `id -u` -ne 0 ]; then
    echo "You need root privileges to run this script"
    exit 4
  fi
}

case "$1" in
  start)
    checkUser
    isRunning
    if [ $? -eq 0 ]; then
      echo "Already running."
      exit 0
    fi

    # Prepare environment
    mkdir -p "$LOG_DIR" "$DATA_DIR" && chown "$GRAFANA_USER":"$GRAFANA_GROUP" "$LOG_DIR" "$DATA_DIR"
    touch "$PID_FILE" && chown "$GRAFANA_USER":"$GRAFANA_GROUP" "$PID_FILE"

    if [ -n "$MAX_OPEN_FILES" ]; then
      ulimit -n $MAX_OPEN_FILES
    fi

    # Start Daemon
    cd $GRAFANA_HOME
    action $"Starting $DESC: ..." su -s /bin/sh -c "nohup ${DAEMON} ${DAEMON_OPTS} >> /dev/null 3>&1 &" $GRAFANA_USER 2> /dev/null
    return=$?
    if [ $return -eq 0 ]
    then
      sleep 1
      # check if pid file has been written to
      if ! [[ -s $PID_FILE ]]; then
        echo "FAILED"
        exit 1
      fi
      i=0
      timeout=10
      # Wait for the process to be properly started before exiting
      until { cat "$PID_FILE" | xargs kill -0; } >/dev/null 2>&1
      do
        sleep 1
        i=$(($i + 1))
        if [ $i -gt $timeout ]; then
          echo "FAILED"
          exit 1
        fi
      done
    fi

    exit $return
    ;;
  stop)
    checkUser
    echo -n "Stopping $DESC: ..."

    if [ -f "$PID_FILE" ]; then
      killproc -p $PID_FILE -d 20 $NAME
      if [ $? -eq 1 ]; then
        echo  "$DESC is not running but pid file exists, cleaning up"
      elif [ $? -eq 3 ]; then
        PID="`cat $PID_FILE`"
        echo  "Failed to stop $DESC (pid $PID)"
        exit 1
      fi
      rm -f "$PID_FILE"
      echo  ""
      exit 0
    else
      echo  "(not running)"
    fi
    exit 0
    ;;
  status)
    status -p $PID_FILE $NAME
    exit $?
    ;;
  restart|force-reload)
    if [ -f "$PID_FILE" ]; then
      $0 stop
      sleep 1
    fi
    $0 start
    ;;
  *)
    echo "Usage: $0 {start|stop|restart|force-reload|status}"
    exit 3
    ;;
esac

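Configuration file walkthrough:

app_mode:    ;application mode, default is production
 
[paths]
data: path where Grafana stores the sqlite3 database, temporary files, and sessions
logs: path where Grafana stores its logs
 
[server]
http_addr: IP address to listen on, default is 0.0.0.0
http_port: port to listen on, default is 3000
protocol: http or https, default is http
domain: part of root_url; the public domain name used to reach Grafana from a browser, default is localhost
enforce_domain: redirect to the correct domain if the Host header does not match domain, default is false
root_url: the full URL used to reach Grafana from the web, default is %(protocol)s://%(domain)s:%(http_port)s/
router_logging: whether to log web requests, default is false
cert_file: required when using https
cert_key: required when using https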
 
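[database]
Grafana needs a database to store users and dashboards; it uses sqlite3 by default, but other databases can be used instead
type: mysql, postgres, or sqlite3; default is sqlite3
path: sqlite3 only; location of the sqlite3 database file
host: mysql and postgres only; default is 127.0.0.1:3306
name: name of the Grafana database, default is grafana
user: database user
password: password of the database user
ssl_mode: postgres only
 
 
[security]
admin_user: the default Grafana admin user, default is admin
admin_password: the default Grafana admin password, default is admin
login_remember_days: number of days to stay logged in
secret_key: key used to sign the stay-logged-in state
disable_gravatar: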
 
 
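[users]
allow_sign_up: whether users may sign up (register) on their own; when false, self-registration is disabled and only an admin can create users; default is true
allow_org_create: when false, users cannot create new organizations; default is true
auto_assign_org: when true, new users are automatically added to the organization with id 1; when false, a new organization is created for each new user
auto_assign_org_role: role assigned to new users, default is Viewer; may also be Admin or Editor
 
 
[auth.anonymous]
enabled: set to true to allow anonymous access, default is false
org_name: organization to use for anonymous users
org_role: role for anonymous users, default is Viewer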
 
 
[auth.github]
GitHub login settings, self-explanatory
enabled = false
allow_sign_up = false
client_id = some_id
client_secret = some_secret
scopes = user:email
auth_url = https://github.com/login/oauth/authorize
token_url = https://github.com/login/oauth/access_token
api_url = https://api.github.com/user
team_ids =
allowed_domains =
allowed_organizations =
 
 
[auth.google]
Google login settings, likewise
enabled = false
allow_sign_up = false
client_id = some_client_id
client_secret = some_client_secret
scopes = https://www.googleapis.com/auth/userinfo.profile https://www.googleapis.com/auth/userinfo.email
auth_url = https://accounts.google.com/o/oauth2/auth
token_url = https://accounts.google.com/o/oauth2/token
api_url = https://www.googleapis.com/oauth2/v1/userinfo
allowed_domains =
 
 
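[auth.basic]
enabled: when true, HTTP basic authentication is enabled for the HTTP API
 
 
[auth.ldap]
enabled: set to true to enable LDAP authentication, default is false
config_file: when LDAP is enabled, the LDAP configuration file, /etc/grafana/ldap.toml
 
 
[auth.proxy]
Lets an HTTP reverse proxy handle authentication
enabled: default is false
header_name: default is X-WEBAUTH-USER
header_property: default is username
auto_sign_up: default is true; automatically registers users that do not yet exist in the Grafana DB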
 
[analytics]
reporting_enabled: when true, anonymous usage statistics are sent to stats.grafana.org, mainly to track running instances, versions, dashboards, and error counts; default is true
google_analytics_ua_id: to use Google Analytics, enter your GA ID here
 
 
[dashboards.json]
If you have a system that generates dashboards as JSON automatically, this feature is worth trying
enabled: default is false
path: full path of the directory that holds your JSON dashboards, default is /var/lib/grafana/dashboards
 
 
[session]
provider: default is file; may also be memory, mysql, or postgres
provider_config: depends on the provider; for file it is a path such as data/xxxx; for mysql it is user:password@tcp(127.0.0.1:3306)/database_name; for postgres it is user=a password=b host=localhost port=5432 dbname=c sslmode=disable
cookie_name: name of the Grafana cookie
cookie_secure: when true, Grafana requires https; default is false
session_life_time: session lifetime, default is 86400 seconds (24 hours)

#The following appear in the configuration file but not in the official documentation
[smtp]
enabled = false
host = localhost:25
user =
password =
cert_file =
key_file =
skip_verify = false
from_address = admin@grafana.localhost
 
[emails]
welcome_email_on_sign_up = false
templates_pattern = emails/*.html
 
 
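[log]
mode: console or file; default is console, file; several modes can be set, separated by commas
buffer_len: length of the channel buffer, default is 10000
level: one of "Trace", "Debug", "Info", "Warn", "Error", "Critical"; default is info
 
[log.console]
level: log level
 
[log.file]
level: log level
log_rotate: whether automatic log rotation is enabled
max_lines: maximum number of lines per log file, default is 1000000
max_size_shift: maximum size of a single log file as a bit shift, default is 28, i.e. 1 << 28 = 256MB
daily_rotate: whether to rotate logs daily, default is true
max_days: log retention, default is 7; logs are deleted after 7 days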

Note: every setting in the configuration file above can be overridden with an environment variable, using the following syntax:

GF_<SectionName>_<KeyName>
eg:
export GF_AUTH_GOOGLE_CLIENT_SECRET=newS3cretKey
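
The naming rule can be sketched as a small function: upper-case the section and key, and replace dots and dashes with underscores, as the example above does for auth.google. The helper name gf_env_name is hypothetical.

```shell
#!/usr/bin/env bash
# Build the GF_<SectionName>_<KeyName> environment variable name for a
# given config section and key.
gf_env_name() {
  local section=$1 key=$2
  printf 'GF_%s_%s\n' "$section" "$key" \
    | tr '[:lower:]' '[:upper:]' \
    | tr '.-' '__'
}

gf_env_name auth.google client_secret   # prints GF_AUTH_GOOGLE_CLIENT_SECRET
gf_env_name security admin_password     # prints GF_SECURITY_ADMIN_PASSWORD
```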


Reposted from blog.csdn.net/ximenjianxue/article/details/125200854