
Your logs are your data: logstash + elasticsearch

by Andrey Redko on February 25th, 2013
Filed in: Enterprise Java
Tags: Elasticsearch, Logging, Logstash

The topic of today's post stands a bit apart from day-to-day coding and development, but nonetheless covers a very important subject: our application log files. Our apps generate an enormous amount of logs which, if done right, are extremely handy for troubleshooting problems.
It's not a big deal if you have a single application up and running, but nowadays apps, particularly webapps, run on hundreds of servers. At such a scale, figuring out where the problem is becomes a challenge. Wouldn't it be nice to have some kind of view
that aggregates all the logs from all our running applications into a single dashboard, so we could see the whole picture constructed from the pieces? Please welcome:
Logstash, the log aggregation framework.

Although it's not the only solution available, I found Logstash very easy to use and extremely simple to integrate. To start with, we don't even need to do anything on the application side;
Logstash can do all the work for us. Let me introduce the sample project: a standalone Java application with some multithreading activity going on. Logging to a file is configured using the great
Logback library (behind the SLF4J API). The POM file looks pretty simple:

<project xmlns="http://maven.apache.org/POM/4.0.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">

    <modelVersion>4.0.0</modelVersion>

    <groupId>com.example</groupId>
    <artifactId>logstash</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <packaging>jar</packaging>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <logback.version>1.0.6</logback.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>ch.qos.logback</groupId>
            <artifactId>logback-classic</artifactId>
            <version>${logback.version}</version>
        </dependency>

        <dependency>
            <groupId>ch.qos.logback</groupId>
            <artifactId>logback-core</artifactId>
            <version>${logback.version}</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.0</version>
                <configuration>
                    <source>1.7</source>
                    <target>1.7</target>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>

And there is only one Java class, called Starter, which uses the Executors services to do some work concurrently. Naturally, each thread does some logging, and from time to time an exception is thrown.

package com.example.logstash;

import java.util.ArrayList;
import java.util.Collection;
import java.util.Random;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class Starter {
    private final static Logger log = LoggerFactory.getLogger( Starter.class );

    public static void main( String[] args ) {
        final ExecutorService executor = Executors.newCachedThreadPool();
        final Collection< Future< Void > > futures = new ArrayList< Future< Void > >();
        final Random random = new Random();

        for( int i = 0; i < 10; ++i ) {
            futures.add(
                executor.submit(
                    new Callable< Void >() {
                        public Void call() throws Exception {
                            int sleep = Math.abs( random.nextInt( 10000 ) % 10000 );
                            log.warn( "Sleeping for " + sleep + "ms" );
                            Thread.sleep( sleep );
                            return null;
                        }
                    }
                )
            );
        }

        for( final Future< Void > future: futures ) {
            try {
                Void result = future.get( 3, TimeUnit.SECONDS );
                log.info( "Result " + result );
            } catch( InterruptedException | ExecutionException | TimeoutException ex ) {
                log.error( ex.getMessage(), ex );
            }
        }
    }
}

The idea is to demonstrate not only simple one-line logging events but also the famous Java stack traces. As every thread sleeps for a random time interval, a TimeoutException is thrown whenever the result of the computation is requested from the underlying
future object and takes more than 3 seconds to arrive. The last part is the Logback configuration (logback.xml):

<configuration scan="true" scanPeriod="5 seconds">
    <appender name="FILE" class="ch.qos.logback.core.FileAppender">
        <file>/tmp/application.log</file>
        <append>true</append>
        <encoder>
            <pattern>[%level] %d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] %logger{36} - %msg%n</pattern>
        </encoder>
    </appender>

    <root level="INFO">
        <appender-ref ref="FILE" />
    </root>
</configuration>

And we are good to go! Please note that the file path /tmp/application.log corresponds to
c:\tmp\application.log on Windows. Running our application fills the log file with something like this:

[WARN] 2013-02-19 19:26:03.175 [pool-2-thread-1] com.example.logstash.Starter - Sleeping for 2506ms
[WARN] 2013-02-19 19:26:03.175 [pool-2-thread-4] com.example.logstash.Starter - Sleeping for 9147ms
[WARN] 2013-02-19 19:26:03.175 [pool-2-thread-9] com.example.logstash.Starter - Sleeping for 3124ms
[WARN] 2013-02-19 19:26:03.175 [pool-2-thread-3] com.example.logstash.Starter - Sleeping for 6239ms
[WARN] 2013-02-19 19:26:03.175 [pool-2-thread-5] com.example.logstash.Starter - Sleeping for 4534ms
[WARN] 2013-02-19 19:26:03.175 [pool-2-thread-10] com.example.logstash.Starter - Sleeping for 1167ms
[WARN] 2013-02-19 19:26:03.175 [pool-2-thread-7] com.example.logstash.Starter - Sleeping for 7228ms
[WARN] 2013-02-19 19:26:03.175 [pool-2-thread-6] com.example.logstash.Starter - Sleeping for 1587ms
[WARN] 2013-02-19 19:26:03.175 [pool-2-thread-8] com.example.logstash.Starter - Sleeping for 9457ms
[WARN] 2013-02-19 19:26:03.176 [pool-2-thread-2] com.example.logstash.Starter - Sleeping for 1584ms
[INFO] 2013-02-19 19:26:05.687 [main] com.example.logstash.Starter - Result null
[INFO] 2013-02-19 19:26:05.687 [main] com.example.logstash.Starter - Result null
[ERROR] 2013-02-19 19:26:08.695 [main] com.example.logstash.Starter - null
java.util.concurrent.TimeoutException: null
 at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:258) ~[na:1.7.0_13]
 at java.util.concurrent.FutureTask.get(FutureTask.java:119) ~[na:1.7.0_13]
 at com.example.logstash.Starter.main(Starter.java:43) ~[classes/:na]
[ERROR] 2013-02-19 19:26:11.696 [main] com.example.logstash.Starter - null
java.util.concurrent.TimeoutException: null
 at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:258) ~[na:1.7.0_13]
 at java.util.concurrent.FutureTask.get(FutureTask.java:119) ~[na:1.7.0_13]
 at com.example.logstash.Starter.main(Starter.java:43) ~[classes/:na]
[INFO] 2013-02-19 19:26:11.696 [main] com.example.logstash.Starter - Result null
[INFO] 2013-02-19 19:26:11.696 [main] com.example.logstash.Starter - Result null
[INFO] 2013-02-19 19:26:11.697 [main] com.example.logstash.Starter - Result null
[INFO] 2013-02-19 19:26:12.639 [main] com.example.logstash.Starter - Result null
[INFO] 2013-02-19 19:26:12.639 [main] com.example.logstash.Starter - Result null
[INFO] 2013-02-19 19:26:12.639 [main] com.example.logstash.Starter - Result null
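Before wiring in Logstash, it is worth noticing how structured these lines already are. As a quick illustration (my own sketch, not anything Logstash does internally), a plain regular expression mirroring the Logback pattern above is enough to split each line into fields:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogLineParser {
    // Mirrors the Logback pattern: [%level] %d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] %logger{36} - %msg
    static final Pattern LINE = Pattern.compile(
        "^\\[(\\w+)\\] (\\S+ \\S+) \\[([^\\]]+)\\] (\\S+) - (.*)$" );

    // Splits one log line into { level, timestamp, thread, logger, message }, or returns null
    // for continuation lines such as stack trace frames.
    public static String[] parse( final String line ) {
        final Matcher m = LINE.matcher( line );
        if( !m.matches() ) {
            return null;
        }
        return new String[] { m.group( 1 ), m.group( 2 ), m.group( 3 ), m.group( 4 ), m.group( 5 ) };
    }

    public static void main( String[] args ) {
        final String[] fields = parse(
            "[WARN] 2013-02-19 19:26:03.175 [pool-2-thread-1] com.example.logstash.Starter - Sleeping for 2506ms" );
        System.out.println( "level=" + fields[ 0 ] + ", thread=" + fields[ 2 ] + ", message=" + fields[ 4 ] );
    }
}
```

This is exactly the kind of field extraction Logstash can do for us on the server side, without touching the application.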

Now let's see what Logstash can do for us. From the download section, we get a single JAR file:
logstash-1.1.9-monolithic.jar. That's all we need for now. Unfortunately, because
of this bug on Windows we have to expand logstash-1.1.9-monolithic.jar somewhere, e.g. into a
logstash-1.1.9-monolithic folder. Logstash has just three concepts: inputs, filters and
outputs. These are very well explained in the documentation. In our case, the input is the application's log file, c:\tmp\application.log. But what would be the output?
ElasticSearch seems to be an excellent candidate for that: let's have our logs indexed and searchable at any time. Let's download and run it:

elasticsearch.bat -Des.index.store.type=memory -Des.network.host=localhost

Now we are ready to integrate Logstash, which should tail our log file and feed it directly to ElasticSearch. The following configuration does exactly that (logstash.conf):

input {
    file {
        add_field => [ "host", "my-dev-host" ]
        path => "c:\tmp\application.log"
        type => "app"
        format => "plain"
    }
}

output {
    elasticsearch_http {
        host => "localhost"
        port => 9200
        type => "app"
        flush_size => 10
    }
}

filter {
    multiline {
        type => "app"
        pattern => "^[^\[]"
        what => "previous"
    }
}

It might not look very clear at first glance, so let me explain what is what. The input is
c:\tmp\application.log, which is a plain text file (format => "plain"). The
type => "app" setting serves as a simple marker, so that different types of inputs can be routed to outputs through filters with the same type. The
add_field => [ "host", "my-dev-host" ] setting lets us inject additional arbitrary data into the incoming stream, e.g. the hostname.
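To get a feel for what actually ends up in ElasticSearch, here is roughly what a single shipped event looks like as a JSON document. The field names follow the @-prefixed event schema of Logstash 1.1.x; the exact values and the full set of fields shown here are illustrative:

```json
{
  "@timestamp": "2013-02-19T19:26:03.175Z",
  "@message": "[WARN] 2013-02-19 19:26:03.175 [pool-2-thread-1] com.example.logstash.Starter - Sleeping for 2506ms",
  "@type": "app",
  "@fields": { "host": "my-dev-host" }
}
```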

The output is pretty clear: ElasticSearch over HTTP, port 9200 (the default settings). The filter needs a bit of magic, all because of Java stack traces: the
multiline filter glues a stack trace to the log statement it belongs to, so it is stored as a single (large) multiline event. Let's run
Logstash:

java -cp logstash-1.1.9-monolithic logstash.runner agent -f logstash.conf
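To make the multiline filter less magical, here is a plain-Java sketch of the idea (my own illustration, not Logstash's implementation): any line matching the pattern ^[^\[], i.e. any line that does not start with '[', is appended to the previous event.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MultilineGluer {
    // Glues continuation lines (those not starting with '[') onto the previous event,
    // mimicking the multiline filter with pattern => "^[^\[]" and what => "previous".
    public static List< String > glue( final List< String > lines ) {
        final List< String > events = new ArrayList< String >();
        for( final String line: lines ) {
            if( !events.isEmpty() && !line.startsWith( "[" ) ) {
                events.set( events.size() - 1, events.get( events.size() - 1 ) + "\n" + line );
            } else {
                events.add( line );
            }
        }
        return events;
    }

    public static void main( String[] args ) {
        final List< String > lines = Arrays.asList(
            "[ERROR] 2013-02-19 19:26:08.695 [main] com.example.logstash.Starter - null",
            "java.util.concurrent.TimeoutException: null",
            " at java.util.concurrent.FutureTask.get(FutureTask.java:119)",
            "[INFO] 2013-02-19 19:26:11.696 [main] com.example.logstash.Starter - Result null" );
        for( final String event: glue( lines ) ) {
            System.out.println( "--- event ---" );
            System.out.println( event );
        }
    }
}
```

Running it shows the four input lines collapsing into two events: the stack trace travels together with its ERROR statement.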

Great! Now whenever we run our application, Logstash will watch the log file, filter it properly and send it directly to
ElasticSearch. Cool, but how can we search, or at least see what kind of data we have? Though
ElasticSearch has an awesome REST API, we can use another excellent project, Kibana, a web UI front-end for ElasticSearch. Installation is very straightforward and seamless. After a few necessary steps, we have
Kibana up and running:

ruby kibana.rb

By default, Kibana serves its web UI on port 5601. Let's point our browser to
http://localhost:5601/ and we should see something like this:

All our log statements, complemented by the hostname, are right there. Exceptions (with stack traces) are coupled with the related log statement. Log levels, timestamps, everything is shown. Full-text search is available out of the box, thanks to
ElasticSearch.

It's all awesome, but our application is very simple. Would this approach work across a multi-server / multi-application deployment? I am pretty sure it would work just fine.
Logstash's integration with Redis, ZeroMQ, RabbitMQ and others makes it possible to capture logs from tens of different sources and consolidate them in one place. Thanks a lot,
Logstash guys!

Reference: Your logs are your data: logstash + elasticsearch from our JCG partner Andrey Redko at the Andriy Redko {devmind} blog.
