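Chapter 1 Log Analysis System Requirements

1.1 Operations pain points

1. Operations staff have to check all kinds of logs constantly.
2. Logs are only examined after a failure has already occurred (a timing problem).
3. With many nodes, logs are scattered, so collecting them becomes a problem.
4. Runtime logs, error logs and the like have no standard directory layout, which makes collection difficult.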

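1.2 Environment pain points

1. Developers cannot log in to production servers to view detailed logs.
2. Every system has its own logs, so log data is scattered and hard to find.
3. Log volume is large, queries are slow, and the data is not real-time enough.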

1.3 Solving the pain points

1. Collection (Logstash)
2. Storage (Elasticsearch, Redis, Kafka)
3. Search + statistics + display (Kibana)
4. Alerting and data analysis (Zabbix)

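Chapter 2 Introduction to the ELK Stack

For logs, the most common requirements are collection, storage, querying and display. The open-source community happens to have a project for each: Logstash (collection), Elasticsearch (storage + search) and Kibana (display). The combination of the three is called the ELK Stack; in other words, the ELK Stack is the Elasticsearch, Logstash and Kibana technology stack used together. A typical architecture is shown in the figure below: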

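Chapter 3 ELK Stack Environment

1. node1 and node2: the Elasticsearch cluster.
2. node3: the collection target, with Nginx, Java, TCP, syslog and other logs.
3. node4: Logstash writes logs to Redis, which reduces the programs' dependence on Elasticsearch and at the same time decouples the components and lets the architecture scale out.
4. Every host that logs are collected from needs Logstash deployed.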

Host          IP               Services                                   OS
linux-node1   192.168.21.133   redis elasticsearch logstash kibana        CentOS release 6.5 (Final)
linux-node2   192.168.21.134   redis elasticsearch logstash kibana nginx  CentOS release 6.5 (Final)

Chapter 4 ELK Stack Deployment

4.1 Java

[root@ELK-server data]# java -version
java version "1.8.0_112"
Java(TM) SE Runtime Environment (build 1.8.0_112-b15)
Java HotSpot(TM) 64-Bit Server VM (build 25.112-b15, mixed mode)

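4.2 Download and extract

mkdir -p /usr/local/elasticsearch
tar xf elasticsearch-5.0.2.tar.gz -C /usr/local/elasticsearch --strip-components=1

The -C target must already exist, and --strip-components=1 drops the archive's top-level elasticsearch-5.0.2/ directory so that config/ and bin/ sit directly under /usr/local/elasticsearch, as the later steps assume.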

4.3 Configuration

cd /usr/local/elasticsearch/config

[root@ELK-server config]# grep '^[a-z]' elasticsearch.yml
cluster.name: es-log
node.name: log-1
path.data: /usr/local/elasticsearch/data
path.logs: /var/log/elasticsearch/elasticsearch.log
bootstrap.memory_lock: true
network.host: 192.168.21.133
http.port: 9200
discovery.zen.ping.unicast.hosts: ["192.168.21.133", "192.168.21.134"]
discovery.zen.minimum_master_nodes: 1

4.4 Other system settings

[root@ELK-server config]# echo "vm.max_map_count = 262144" >> /etc/sysctl.conf
[root@ELK-server config]# sysctl -p
[root@ELK-server config]# tail -2 /etc/security/limits.conf
* soft nofile 65536
* hard nofile 65536
[root@ELK-server config]# vim /etc/security/limits.d/90-nproc.conf
*          soft    nproc    2048
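The settings above can be checked before Elasticsearch is started. The following is a minimal sketch, assuming a Linux host with sysctl available; the function names meets_minimum, report and check_host are hypothetical, and the thresholds 262144 and 65536 are the minimums quoted in the bootstrap error messages listed in section 4.5.

```shell
# Hypothetical helper: succeed when the current value is at least the required minimum.
# "unlimited" (a possible ulimit answer) always passes.
meets_minimum() {
    [ "$1" = "unlimited" ] && return 0
    [ "$1" -ge "$2" ] 2>/dev/null
}

# Print one line per setting: OK or TOO LOW.
# Arguments: NAME CURRENT REQUIRED
report() {
    if meets_minimum "$2" "$3"; then
        echo "OK       $1=$2 (minimum $3)"
    else
        echo "TOO LOW  $1=$2 (minimum $3)"
    fi
}

# Read the live values from this host and compare them with the minimums.
check_host() {
    report vm.max_map_count "$(sysctl -n vm.max_map_count)" 262144
    report nofile "$(ulimit -n)" 65536
}
```

Calling check_host as the user that will run Elasticsearch matters, because the nofile limit is per user and a root shell may report a different value.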

4.5 Error summary

1. can not run elasticsearch as root

Switch to a non-root user.

2. main ERROR Could not register mbeans java.security.AccessControlException: access denied ("javax.management.MBeanTrustPermission" "register")

Change the owner of the elasticsearch directory to the current user:

sudo chown -R noroot:noroot elasticsearch

 

3. max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]

sudo vi /etc/sysctl.conf

Add the following setting:

vm.max_map_count=655360

Then run:

sudo sysctl -p

 

4. max file descriptors [4096] for elasticsearch process is too low, increase to at least [65536]

sudo vi /etc/security/limits.conf

Add the following:

* soft nofile 65536
* hard nofile 131072
* soft nproc 2048
* hard nproc 4096

sudo vi /etc/pam.d/common-session

Add: session required pam_limits.so

sudo vi /etc/pam.d/common-session-noninteractive

Add: session required pam_limits.so

The two common-session files are the Debian/Ubuntu locations; on CentOS, pam_limits.so is normally already referenced from /etc/pam.d/system-auth.

 

... non-loopback or non-link-local address, enforcing bootstrap checks
ERROR: bootstrap checks failed
memory locking requested for elasticsearch process but memory is not locked

[root@ELK-server config]# vim elasticsearch.yml
#bootstrap.memory_lock: true    # commented out

 

[root@ELK-server elasticsearch]# ./bin/elasticsearch
Can't start up: not enough memory
[root@ELK-server elasticsearch]# java -version
java version "1.5.0"
...

Fix: switch to Java 8, so that java -version reports:

[root@ELK-server elasticsearch]# java -version
java version "1.8.0_112"
...

4.6 Running Elasticsearch

[root@ELK-server elasticsearch]# ./bin/elasticsearch
...

[root@ELK-server config]# lsof -i:9200
COMMAND  PID USER   FD  TYPE DEVICE SIZE/OFF NODE NAME
java    3342  elk 109u  IPv6 132688      0t0 TCP ELK-server:9200 (LISTEN)
[root@ELK-server config]# curl -I '192.168.21.133:9200'
HTTP/1.1 200 OK
content-type: application/json; charset=UTF-8
content-length: 318
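Elasticsearch takes some time after launch before port 9200 answers, so a script that runs the curl check above immediately may fail. One possible approach is a small retry loop; this is a sketch only, the function name retry is hypothetical, and it uses plain global variables to stay portable across sh implementations.

```shell
# Hypothetical helper: run a command until it succeeds or the attempts run out.
# Usage: retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry() {
    attempts="$1"
    delay="$2"
    shift 2
    i=1
    while :; do
        # Success on any attempt ends the loop.
        "$@" && return 0
        # Give up once the last allowed attempt has failed.
        [ "$i" -ge "$attempts" ] && return 1
        i=$((i + 1))
        sleep "$delay"
    done
}
```

For the node in this document, a call could look like retry 30 2 curl -sf -o /dev/null http://192.168.21.133:9200, which polls every 2 seconds for up to 30 attempts and returns non-zero if the node never responds.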

4.7 Testing the deployment

[root@ELK-server config]# curl -XGET '192.168.21.133:9200'
{
  "name" : "log-1",
  "cluster_name" : "es-log",
  "cluster_uuid" : "HXIBVdzHTJqi5lexARIgGw",
  "version" : {
    "number" : "5.0.2",
    "build_hash" : "f6b4951",
    "build_date" : "2016-11-24T10:07:18.101Z",
    "build_snapshot" : false,
    "lucene_version" : "6.2.1"
  },
  "tagline" : "You Know, for Search"
}

4.8 Configuration explained

cd /usr/local/elasticsearch/config

[root@ELK-server config]# grep '^[a-z]' elasticsearch.yml
cluster.name: es-log    # cluster name
node.name: log-1    # node name, must be unique within the cluster
path.data: /usr/local/elasticsearch/data    # data directory
path.logs: /var/log/elasticsearch/elasticsearch.log    # log directory
bootstrap.memory_lock: true    # lock memory so that swap is not used
network.host: 192.168.21.133    # address the node binds to and is reached on
http.port: 9200    # HTTP port
discovery.zen.ping.unicast.hosts: ["192.168.21.133", "192.168.21.134"]    # unicast host list used for node discovery
discovery.zen.minimum_master_nodes: 1    # set to at least 2 in production
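The comment on discovery.zen.minimum_master_nodes follows the usual quorum rule for Zen discovery: (number of master-eligible nodes / 2) + 1, with integer division. A small sketch of that arithmetic, where the function name quorum is hypothetical:

```shell
# Hypothetical helper: quorum of master-eligible nodes, (N / 2) + 1 with integer division.
quorum() {
    echo $(( $1 / 2 + 1 ))
}

# Show the value for a few cluster sizes.
for n in 1 2 3 5; do
    echo "master-eligible nodes: $n -> minimum_master_nodes: $(quorum "$n")"
done
```

For the two-node cluster in this document the result is 2, which matches the production advice in the comment; with the value left at 1, a network split can let each node elect itself master.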

 

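4.9 Elasticsearch plugins

/usr/local/elasticsearch/bin/plugin -i elasticsearch/marvel/latest

Cluster management plugin:

/usr/local/elasticsearch/bin/plugin install mobz/elasticsearch-head

Note that bin/plugin is the script name used by Elasticsearch 1.x and 2.x; in 5.x it was renamed to bin/elasticsearch-plugin, and site plugins such as head are no longer supported there.

To add a new node, only node.name needs to change:

[root@jenkins elasticsearch]# grep node.name config/elasticsearch.yml
node.name: "linux-node2"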

 

Accessing the head cluster management plugin: http://ES_IP:9200/_plugin/head/


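Cluster health colors explained:

Yellow: all primary shards are running normally, but one or more replica shards are not allocated.
Green: all primary and replica shards are allocated and every node is healthy.
Red: at least one primary shard is unavailable, so data is missing; this is serious.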

Monitoring the status with Zabbix:

curl -XGET 'http://192.168.21.134:9200/_cluster/health?pretty' 2>/dev/null | awk -F'"' 'NR==3{print $4}'

 

[root@jenkins elasticsearch]# curl -XGET 'http://192.168.21.134:9200/_cluster/health'
{"cluster_name":"zhangyiling","status":"green","timed_out":false,"number_of_nodes":2,"number_of_data_nodes":2,"active_primary_shards":10,"active_shards":20,"relocating_shards":0,"initializing_shards":0,"unassigned_shards":0,"delayed_unassigned_shards":0,"number_of_pending_tasks":0,"number_of_in_flight_fetch":0}

[root@jenkins elasticsearch]# curl -XGET 'http://192.168.21.134:9200/_cluster/health?pretty'
{
  "cluster_name" : "zhangyiling",
  "status" : "green",                 # the value to monitor
  "timed_out" : false,
  "number_of_nodes" : 2,              # number of nodes
  "number_of_data_nodes" : 2,         # number of data nodes
  "active_primary_shards" : 10,       # active primary shards
  "active_shards" : 20,               # all active shards
  "relocating_shards" : 0,            # shards being relocated
  "initializing_shards" : 0,          # shards being initialized
  "unassigned_shards" : 0,            # unassigned shards
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0
}
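Zabbix triggers are easier to write against a number than against the strings green, yellow and red. The sketch below turns the status field shown above into a numeric code; the function names and the 0/1/2/3 mapping are assumptions chosen for illustration, not anything defined by Elasticsearch or Zabbix.

```shell
# Hypothetical mapping from cluster status to a number a trigger can compare.
status_to_code() {
    case "$1" in
        green)  echo 0 ;;
        yellow) echo 1 ;;
        red)    echo 2 ;;
        *)      echo 3 ;;   # unknown status or no response
    esac
}

# Read a health document (compact or pretty-printed) on stdin and print its status value.
extract_status() {
    sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p' | head -n 1
}

# Query one node (host or IP as the first argument) and print the numeric code.
es_health_code() {
    status_to_code "$(curl -s "http://$1:9200/_cluster/health" | extract_status)"
}
```

Running es_health_code 192.168.21.134 prints a single number, and an unreachable node yields 3 rather than an empty value. A script like this could be exposed through a Zabbix agent UserParameter entry, with the item key and script path chosen to suit your setup.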

 

4.10 Elasticsearch service management

https://github.com/elastic/elasticsearch-servicewrapper

cp -r service/ /usr/local/elasticsearch/bin/

 

[root@ELK-server bin]# /usr/local/elasticsearch/bin/service/elasticsearch
Usage: /usr/local/elasticsearch/bin/service/elasticsearch [ console | start | stop | restart | condrestart | status | install | remove | dump ]

Commands:
  console      Launch in the current console.
  start        Start in the background as a daemon process.
  stop         Stop if running as a daemon or in another console.
  restart      Stop if running and then start.
  condrestart  Restart only if already running.
  status       Query the current status.
  install      Install to start automatically when system boots.
  remove       Uninstall.
  dump         Request a Java thread dump if running.

 

[root@ELK-server bin]# /usr/local/elasticsearch/bin/service/elasticsearch install
Detected RHEL or Fedora:
Installing the Elasticsearch daemon..

4.11 Official documentation

https://www.elastic.co/guide/en/elasticsearch/guide/current/index.html

https://www.elastic.co/guide/en/elasticsearch/guide/current/administration.html

4.12 Rolling upgrade