Merge remote-tracking branch 'tiffanyfj/metrics'

Jerome Petazzoni
2016-10-01 08:06:43 -07:00
12 changed files with 1447 additions and 0 deletions

index.html

@@ -201,6 +201,13 @@ grep '^# ' index.html | grep -v '<br' | tr '#' '-'
---
## Chapter 5: metrics
- Setting up Snap to collect and publish metric data
- Using InfluxDB and Grafana for storage and display
---
# Pre-requirements
- Computer with network connection and SSH client
@@ -3462,6 +3469,520 @@ docker node update <node-name> --availability <active|pause|drain>
class: title
# Metrics
---
## Which metrics will we collect?
- Node metrics (e.g. CPU, RAM, disk space)
- Container metrics (e.g. memory used, processes, network traffic in and out)
---
## Tools
We will use three open source Go projects to collect, publish, store, and visualize our metrics:
- Intel Snap: a telemetry framework that collects, processes, and publishes metric data
- InfluxDB: a time series database
- Grafana: a dashboard to visualize the metrics as graphs
---
## Snap
- [github.com/intelsdi-x/snap](https://github.com/intelsdi-x/snap)
- Can collect, process, and publish metric data
- Doesn't store metrics
- Works as a daemon
- Offloads collecting, processing, and publishing to plugins
- You have to configure it to use the plugins and collect the metrics you want
- Docs: https://github.com/intelsdi-x/snap/blob/master/docs/
---
## InfluxDB
- Since Snap doesn't have a database, we need one to store our metrics
- InfluxDB is a database designed specifically for time series data
---
## Grafana
- Since neither Snap nor InfluxDB can show graphs, we're using Grafana for that
---
## Getting and setting up Snap
- This will get Snap on all nodes
.exercise[
```bash
docker service create --restart-condition=none --mode global \
--mount type=bind,source=/usr/local/bin,target=/usr/local/bin \
--mount type=bind,source=/opt,target=/opt centos sh -c '
SNAPVER=v0.16.1-beta
RELEASEURL=https://github.com/intelsdi-x/snap/releases/download/$SNAPVER
curl -sSL $RELEASEURL/snap-$SNAPVER-linux-amd64.tar.gz | tar -C /opt -zxf-
curl -sSL $RELEASEURL/snap-plugins-$SNAPVER-linux-amd64.tar.gz | tar -C /opt -zxf-
ln -s snap-$SNAPVER /opt/snap
for BIN in snapd snapctl; do ln -s /opt/snap/bin/$BIN /usr/local/bin/$BIN; done'
```
]
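- To check that the installation worked, you can verify the binaries on a node (a quick sanity check; paths as mounted above):
```bash
ls /opt/snap/bin          # should list snapd and snapctl
which snapd snapctl       # should resolve to /usr/local/bin
```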
---
## `snapd` - the Snap daemon
- An application made up of a REST API, a control module, and a scheduler module
.exercise[
- Start `snapd` with plugin trust disabled and log level set to debug
```bash
snapd -t 0 -l 1
```
]
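- Since `snapd` exposes a REST API, you can also check that it's up from another terminal (a sketch, assuming the default API port 8181):
```bash
curl -s http://localhost:8181/v1/plugins
```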
- More resources:
https://github.com/intelsdi-x/snap/blob/master/docs/SNAPD.md
https://github.com/intelsdi-x/snap/blob/master/docs/SNAPD_CONFIGURATION.md
---
## `snapctl` - loading plugins
- First, open a new terminal window (`snapd` is running in the foreground in the current one)
.exercise[
- Load the psutil collector plugin
```bash
snapctl plugin load /opt/snap/plugin/snap-plugin-collector-psutil
```
- Load the file publisher plugin
```bash
snapctl plugin load /opt/snap/plugin/snap-plugin-publisher-mock-file
```
]
---
## `snapctl` - see what you loaded and can collect
.exercise[
- See your loaded plugins
```bash
snapctl plugin list
```
- See the metrics you can collect
```bash
snapctl metric list
```
]
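- With more plugins loaded, the metric list gets long; you can filter it with the usual tools, e.g.:
```bash
snapctl metric list | grep psutil
```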
---
## `snapctl` - tasks
- To start collecting/processing/publishing metric data, you need to create a task
- For this workshop we will be using just the task manifest
- Tasks can be written in JSON or YAML; the metrics you want to collect are listed in the task manifest
- Some plugins, such as the Docker collector, allow wildcards (denoted by a star) in metric names; see the example below, taken from `snap/docker-influxdb.json`
- More resources:
https://github.com/intelsdi-x/snap/blob/master/docs/TASKS.md
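- For example, the bundled `docker-influxdb.json` manifest uses wildcards so that a single entry collects the same metric for every container, whatever its ID:
```bash
grep 'intel/linux/docker' ~/orchestration-workshop/snap/docker-influxdb.json
#   "/intel/linux/docker/*/cpu_stats/cpu_usage/total_usage": {},
#   "/intel/linux/docker/*/memory_stats/usage/usage": {}
```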
---
## `snapctl` - task manifest
```yaml
---
version: 1
schedule:
  type: "simple"    # collect on a set interval
  interval: "1s"    # of every 1s
max-failures: 10
workflow:
  collect:          # first collect
    metrics:        # metrics to collect
      /intel/psutil/load/load1: {}
    config:         # there is no configuration
    publish:        # after collecting, publish
      -
        plugin_name: "file"                  # use the file publisher
        config:
          file: "/tmp/snap-psutil-file.log"  # write to this file
```
---
## `snapctl` - starting a task
.exercise[
- Using the task manifest in the snap directory, start a task to collect metrics from psutil and publish them to a file.
```bash
cd ~/orchestration-workshop/snap
snapctl task create -t psutil-file.yml
```
]
The output should look like the following:
```
Using task manifest to create task
Task created
ID: 240435e8-a250-4782-80d0-6fff541facba
Name: Task-240435e8-a250-4782-80d0-6fff541facba
State: Running
```
---
## `snapctl` - see the tasks
.exercise[
- List the tasks to check that your task is running
```bash
snapctl task list
```
]
The output should look like the following:
```
ID NAME STATE HIT MISS FAIL CREATED LAST FAILURE
24043...acba Task-24043...acba Running 4 0 0 2:34PM 8-13-2016
```
---
## Check the output file
.exercise[
```bash
tail -f /tmp/snap-psutil-file.log
```
]
To exit, hit `^C`
---
## `snapctl` - watch metrics
- `snapctl task watch` streams the metrics you are collecting to STDOUT
.exercise[
```bash
snapctl task watch <ID>
```
]
To exit, hit `^C`
---
## `snapctl` - stop the task
.exercise[
- Using the task ID, stop the task
```bash
snapctl task stop <ID>
```
]
---
## Stopping snap
- Hit `^C` in the terminal window where `snapd` is running; Snap will stop, all plugins will be unloaded, and all tasks will be stopped
---
## Snap Tribe Mode
- Tribe is Snap's clustering mechanism
- Nodes can join agreements; nodes in the same agreement share the same loaded plugins and running tasks
- We will use it to load the Docker collector and InfluxDB publisher on all nodes and run our task
- If we didn't use Tribe, we would have to go to every node, load the plugins manually, and start the task there
- More resources:
https://github.com/intelsdi-x/snap/blob/master/docs/TRIBE.md
---
## Start `snapd` with Tribe Mode enabled
- On your first node, start `snapd` in Tribe mode
.exercise[
```bash
snapd --tribe -t 0 -l 1
```
]
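- You can check that this node now shows up as a tribe member (a sketch; `member list` is one of snapctl's Tribe subcommands):
```bash
snapctl member list
```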
---
## Create first Tribe agreement
.exercise[
```bash
snapctl agreement create docker-influxdb
```
]
The output should look like the following:
```
Name Number of Members plugins tasks
docker-influxdb 0 0 0
```
---
## Join the running `snapd` to the agreement
.exercise[
```bash
snapctl agreement join docker-influxdb $HOSTNAME
```
]
The output should look like the following:
```
Name Number of Members plugins tasks
docker-influxdb 1 0 0
```
---
## Start a container on every node
- The Docker collector plugin requires at least one container to be running; to ensure that, create a global service from node 1 (all nodes must be part of the Swarm)
- If there is a specific container you'd rather use, feel free to do so
.exercise[
```bash
docker service create --mode global alpine ping 8.8.8.8
```
]
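- You can check that the service is indeed global and running everywhere:
```bash
docker service ls    # the ping service should be listed, with one task per node
```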
---
## Start InfluxDB and Grafana containers
- On node 1, start InfluxDB and Grafana containers with `docker-compose` (run from the `~/orchestration-workshop/snap` directory)
.exercise[
```bash
cd influxdb-grafana
docker-compose up
```
]
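- Note that `docker-compose up` stays attached and streams container logs; to keep your terminal, you could start it in detached mode instead:
```bash
docker-compose up -d
```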
---
## Set up InfluxDB
- Go to `http://<NODE1_IP>:8083`
- Create a new database called `snap` with the query `CREATE DATABASE "snap"`
- Switch to the `snap` database in the top right corner
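- Alternatively, the same database can be created from the command line through the HTTP API (this is what the bundled `run.sh` script does):
```bash
curl -G "http://<NODE1_IP>:8086/query?u=admin&p=admin" \
     --data-urlencode "q=CREATE DATABASE snap"
```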
---
## Load Docker collector and InfluxDB publisher
.exercise[
- Load Docker collector
```bash
snapctl plugin load /opt/snap/plugin/snap-plugin-collector-docker
```
- Load InfluxDB publisher
```bash
snapctl plugin load /opt/snap/plugin/snap-plugin-publisher-influxdb
```
]
---
## Start task
.exercise[
- Using a task manifest file, create a task using the Docker collector to gather container metrics and send them to the InfluxDB publisher plugin
- Replace `HOST_IP` in `docker-influxdb.json` with the address of node 1 (see the `sed` sketch after this exercise)
```bash
snapctl task create -t docker-influxdb.json
```
]
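- The substitution can be done with `sed` before running `task create`, e.g. (assuming the shell variable `NODE1_IP` holds the address of node 1):
```bash
sed -i "s/HOST_IP/$NODE1_IP/" docker-influxdb.json
```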
---
## Restarting a task
- This is only necessary if the task becomes disabled (for instance, after hitting its `max-failures` limit)
.exercise[
- Enable the task
```bash
snapctl task enable <ID>
```
- Start the task
```bash
snapctl task start <ID>
```
]
---
## See metrics in InfluxDB
- To see which metrics are being collected (these should match `snapctl metric list`), use the `SHOW MEASUREMENTS` query
- To see the data points for one of these metrics, use a query like the following, with a metric name between the quotes:
```
SELECT * FROM "intel/linux/docker/025fd8c5dc0c/cpu_stats/cpu_usage/total_usage"
```
---
## Set up Grafana
- Go to `http://<NODE1_IP>:3000`
- If it asks for a username and password, they're both `admin`
- Click the Grafana logo -> Data Sources -> Add data source
---
## Add Grafana data source
- Change the Type to InfluxDB
- Name: influxdb
- Check the default box
- Url: `http://<NODE1_IP>:8086`
- Access: direct
- Database: snap
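- The data source can also be created through Grafana's HTTP API instead of the UI (a sketch, assuming the default admin/admin credentials):
```bash
curl --user admin:admin -X POST \
     -H 'Content-Type: application/json' \
     --data '{"name":"influxdb","type":"influxdb","url":"http://<NODE1_IP>:8086","access":"direct","database":"snap"}' \
     "http://<NODE1_IP>:3000/api/datasources"
```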
---
## Create graphs in Grafana
- Click the Grafana logo -> Dashboards -> new
- Click on a green bar on the left -> add panel -> graph
- Click anywhere on the new line that says SELECT, then click select measurement and pick one of the metrics to display
- You can add the source (this is the hostname of each node) and filter by that if you want
- Click on "Last 6 hours" in the top right corner, change the time range to the last 5 minutes, and set the refresh rate to 5s
---
## Add more nodes to the Tribe
- This will load the plugins from node 1 on the other nodes and start the same task
.exercise[
- Start snapd in tribe mode on all nodes
```bash
for N in 2 3 4 5; do ssh -f node$N snapd --tribe -t 0 -l 1 --log-path /tmp \
--tribe-node-name node$N --tribe-seed node1:6000; done
```
- Join the agreement
```bash
for N in 2 3 4 5; do ssh node$N snapctl agreement join docker-influxdb node$N; \
done
```
]
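- You can confirm that all five nodes joined by checking the agreement again from node 1:
```bash
snapctl agreement list    # "Number of Members" should now be 5
snapctl member list       # should list node1 through node5
```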
---
## InfluxDB and Grafana updates
- If you run `SHOW MEASUREMENTS` in InfluxDB again, you should now see metrics from the other nodes, and you can add them to your Grafana dashboard
---
class: title
# Thanks! <br/> Questions?

snap/README.md

@@ -0,0 +1,5 @@
# Snap
## InfluxDB - Grafana
All files for running Snap with InfluxDB and Grafana are from https://github.com/intelsdi-x/snap/tree/master/examples/influxdb-grafana.
Some were modified for use with this workshop.

snap/influxdb-grafana/docker-compose.yml

@@ -0,0 +1,16 @@
grafana:
  build: ./grafana/
  ports:
    - "80:80"
    - "3000:3000"
  links:
    - influxdb
influxdb:
  build: ./influxdb/0.9/
  ports:
    - "8086:8086"
    - "8083:8083"
  expose:
    - "8090"
    - "8099"

snap/docker-influxdb.json

@@ -0,0 +1,29 @@
{
  "version": 1,
  "schedule": {
    "type": "simple",
    "interval": "1s"
  },
  "max-failures": 10,
  "workflow": {
    "collect": {
      "metrics": {
        "/intel/linux/docker/*/cpu_stats/cpu_usage/total_usage": {},
        "/intel/linux/docker/*/memory_stats/usage/usage": {}
      },
      "process": null,
      "publish": [
        {
          "plugin_name": "influx",
          "config": {
            "host": "HOST_IP",
            "port": 8086,
            "database": "snap",
            "user": "admin",
            "password": "admin"
          }
        }
      ]
    }
  }
}

snap/influxdb-grafana/grafana/Dockerfile

@@ -0,0 +1,20 @@
FROM debian:wheezy
RUN apt-get update && apt-get -y install libfontconfig wget adduser openssl ca-certificates
RUN wget http://grafanarel.s3.amazonaws.com/builds/grafana_latest_amd64.deb
RUN dpkg -i grafana_latest_amd64.deb
RUN apt-get install -y curl netcat
EXPOSE 3000
VOLUME ["/var/lib/grafana"]
VOLUME ["/var/log/grafana"]
VOLUME ["/etc/grafana"]
WORKDIR /usr/share/grafana
ENTRYPOINT ["/usr/sbin/grafana-server", "--config", "/etc/grafana/grafana.ini"]

snap/influxdb-grafana/grafana/dashboard.json

@@ -0,0 +1,65 @@
{
  "id": 2,
  "title": "snap dashboard",
  "tags": [],
  "style": "dark",
  "timezone": "browser",
  "editable": true,
  "hideControls": false,
  "sharedCrosshair": false,
  "rows": [
    {
      "collapse": false,
      "editable": true,
      "height": "250px",
      "panels": [],
      "title": "Row"
    },
    {
      "title": "New row",
      "height": "250px",
      "editable": true,
      "collapse": false,
      "panels": []
    }
  ],
  "time": {
    "from": "now-6h",
    "to": "now"
  },
  "timepicker": {
    "refresh_intervals": [
      "5s",
      "10s",
      "30s",
      "1m",
      "5m",
      "15m",
      "30m",
      "1h",
      "2h",
      "1d"
    ],
    "time_options": [
      "5m",
      "15m",
      "1h",
      "6h",
      "12h",
      "24h",
      "2d",
      "7d",
      "30d"
    ]
  },
  "templating": {
    "list": []
  },
  "annotations": {
    "list": []
  },
  "schemaVersion": 13,
  "version": 3,
  "links": [],
  "gnetId": null
}

snap/influxdb-grafana/influxdb/0.9/Dockerfile

@@ -0,0 +1,34 @@
FROM tutum/curl:trusty
MAINTAINER Feng Honglin <hfeng@tutum.co>
# Install InfluxDB
ENV INFLUXDB_VERSION 0.9.4.1
RUN curl -s -o /tmp/influxdb_latest_amd64.deb https://s3.amazonaws.com/influxdb/influxdb_${INFLUXDB_VERSION}_amd64.deb && \
dpkg -i /tmp/influxdb_latest_amd64.deb && \
rm /tmp/influxdb_latest_amd64.deb && \
rm -rf /var/lib/apt/lists/*
ADD types.db /usr/share/collectd/types.db
ADD config.toml /config/config.toml
ADD run.sh /run.sh
RUN chmod +x /*.sh
ENV PRE_CREATE_DB **None**
ENV SSL_SUPPORT **False**
ENV SSL_CERT **None**
# Admin server WebUI
EXPOSE 8083
# HTTP API
EXPOSE 8086
# Raft port (for clustering, don't expose publicly!)
#EXPOSE 8090
# Protobuf port (for clustering, don't expose publicly!)
#EXPOSE 8099
VOLUME ["/data"]
CMD ["/run.sh"]

snap/influxdb-grafana/influxdb/0.9/config.toml

@@ -0,0 +1,235 @@
### Welcome to the InfluxDB configuration file.
# Once every 24 hours InfluxDB will report anonymous data to m.influxdb.com
# The data includes raft id (random 8 bytes), os, arch, version, and metadata.
# We don't track ip addresses of servers reporting. This is only used
# to track the number of instances running and the versions, which
# is very helpful for us.
# Change this option to true to disable reporting.
reporting-disabled = false
###
### [meta]
###
### Controls the parameters for the Raft consensus group that stores metadata
### about the InfluxDB cluster.
###
[meta]
dir = "/data/meta"
hostname = "localhost"
bind-address = ":8088"
retention-autocreate = true
election-timeout = "1s"
heartbeat-timeout = "1s"
leader-lease-timeout = "500ms"
commit-timeout = "50ms"
###
### [data]
###
### Controls where the actual shard data for InfluxDB lives and how it is
### flushed from the WAL. "dir" may need to be changed to a suitable place
### for your system, but the WAL settings are an advanced configuration. The
### defaults should work for most systems.
###
[data]
dir = "/data/db"
# The following WAL settings are for the b1 storage engine used in 0.9.2. They won't
# apply to any new shards created after upgrading to a version > 0.9.3.
max-wal-size = 104857600 # Maximum size the WAL can reach before a flush. Defaults to 100MB.
wal-flush-interval = "10m0s" # Maximum time data can sit in WAL before a flush.
wal-partition-flush-delay = "2s" # The delay time between each WAL partition being flushed.
# These are the WAL settings for the storage engine >= 0.9.3
wal-dir = "/data/wal"
wal-enable-logging = true
# When a series in the WAL in-memory cache reaches this size in bytes it is marked as ready to
# flush to the index
# wal-ready-series-size = 25600
# Flush and compact a partition once this ratio of series are over the ready size
# wal-compaction-threshold = 0.6
# Force a flush and compaction if any series in a partition gets above this size in bytes
# wal-max-series-size = 2097152
# Force a flush of all series and full compaction if there have been no writes in this
# amount of time. This is useful for ensuring that shards that are cold for writes don't
# keep a bunch of data cached in memory and in the WAL.
# wal-flush-cold-interval = "10m"
# Force a partition to flush its largest series if it reaches this approximate size in
# bytes. Remember there are 5 partitions so you'll need at least 5x this amount of memory.
# The more memory you have, the bigger this can be.
# wal-partition-size-threshold = 20971520
###
### [cluster]
###
### Controls non-Raft cluster behavior, which generally includes how data is
### shared across shards.
###
[cluster]
write-timeout = "5s" # The time within which a write operation must complete on the cluster.
shard-writer-timeout = "5s" # The time within which a shard must respond to write.
###
### [retention]
###
### Controls the enforcement of retention policies for evicting old data.
###
[retention]
enabled = true
check-interval = "10m0s"
###
### [admin]
###
### Controls the availability of the built-in, web-based admin interface. If HTTPS is
### enabled for the admin interface, HTTPS must also be enabled on the [http] service.
###
[admin]
enabled = true
bind-address = ":8083"
https-enabled = false
https-certificate = "/etc/ssl/influxdb.pem"
###
### [http]
###
### Controls how the HTTP endpoints are configured. These are the primary
### mechanism for getting data into and out of InfluxDB.
###
[http]
enabled = true
bind-address = ":8086"
auth-enabled = false
log-enabled = true
write-tracing = false
pprof-enabled = false
https-enabled = false
https-certificate = "/etc/ssl/influxdb.pem"
###
### [[graphite]]
###
### Controls one or many listeners for Graphite data.
###
[[graphite]]
enabled = false
bind-address = ":2003"
protocol = "tcp"
consistency-level = "one"
separator = "."
database = "graphitedb"
# These next lines control how batching works. You should have this enabled
# otherwise you could get dropped metrics or poor performance. Batching
# will buffer points in memory if you have many coming in.
# batch-size = 1000 # will flush if this many points get buffered
# batch-timeout = "1s" # will flush at least this often even if we haven't hit buffer limit
batch-size = 1000
batch-timeout = "1s"
templates = [
# filter + template
#"*.app env.service.resource.measurement",
# filter + template + extra tag
#"stats.* .host.measurement* region=us-west,agent=sensu",
# default template. Ignore the first graphite component "servers"
"instance.profile.measurement*"
]
###
### [collectd]
###
### Controls the listener for collectd data.
###
[collectd]
enabled = false
# bind-address = ":25826"
# database = "collectd"
# retention-policy = ""
# typesdb = "/usr/share/collectd/types.db"
# These next lines control how batching works. You should have this enabled
# otherwise you could get dropped metrics or poor performance. Batching
# will buffer points in memory if you have many coming in.
# batch-size = 1000 # will flush if this many points get buffered
# batch-timeout = "1s" # will flush at least this often even if we haven't hit buffer limit
###
### [opentsdb]
###
### Controls the listener for OpenTSDB data.
###
[opentsdb]
enabled = false
# bind-address = ":4242"
# database = "opentsdb"
# retention-policy = ""
# consistency-level = "one"
###
### [[udp]]
###
### Controls the listeners for InfluxDB line protocol data via UDP.
###
[[udp]]
enabled = false
bind-address = ":4444"
database = "udpdb"
# These next lines control how batching works. You should have this enabled
# otherwise you could get dropped metrics or poor performance. Batching
# will buffer points in memory if you have many coming in.
# batch-size = 1000 # will flush if this many points get buffered
# batch-timeout = "1s" # will flush at least this often even if we haven't hit buffer limit
###
### [monitoring]
###
### Send anonymous usage statistics to m.influxdb.com?
###
[monitoring]
enabled = false
write-interval = "24h"
###
### [continuous_queries]
###
### Controls how continuous queries are run within InfluxDB.
###
[continuous_queries]
log-enabled = true
enabled = true
recompute-previous-n = 2
recompute-no-older-than = "10m0s"
compute-runs-per-interval = 10
compute-no-more-than = "2m0s"
###
### [hinted-handoff]
###
### Controls the hinted handoff feature, which allows nodes to temporarily
### store queued data when one node of a cluster is down for a short period
### of time.
###
[hinted-handoff]
enabled = true
dir = "/data/hh"
max-size = 1073741824
max-age = "168h"
retry-rate-limit = 0
retry-interval = "1s"

snap/influxdb-grafana/influxdb/0.9/run.sh

@@ -0,0 +1,156 @@
#!/bin/bash
#http://www.apache.org/licenses/LICENSE-2.0.txt
#
#
#Copyright 2015 Intel Corporation
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
set -m
CONFIG_FILE="/config/config.toml"
INFLUX_HOST="localhost"
INFLUX_API_PORT="8086"
API_URL="http://${INFLUX_HOST}:${INFLUX_API_PORT}"
# Dynamically change the value of 'max-open-shards' to what 'ulimit -n' returns
sed -i "s/^max-open-shards.*/max-open-shards = $(ulimit -n)/" ${CONFIG_FILE}
# Configure InfluxDB Cluster
if [ -n "${FORCE_HOSTNAME}" ]; then
if [ "${FORCE_HOSTNAME}" == "auto" ]; then
#set hostname with IPv4 eth0
HOSTIPNAME=$(ip a show dev eth0 | grep inet | grep eth0 | sed -e 's/^.*inet.//g' -e 's/\/.*$//g')
/usr/bin/perl -p -i -e "s/^# hostname.*$/hostname = \"${HOSTIPNAME}\"/g" ${CONFIG_FILE}
else
/usr/bin/perl -p -i -e "s/^# hostname.*$/hostname = \"${FORCE_HOSTNAME}\"/g" ${CONFIG_FILE}
fi
fi
# NOTE: 'seed-servers.' is nowhere to be found in config.toml, this cannot work anymore! NEED FOR REVIEW!
# if [ -n "${SEEDS}" ]; then
# SEEDS=$(eval SEEDS=$SEEDS ; echo $SEEDS | grep '^\".*\"$' || echo "\""$SEEDS"\"" | sed -e 's/, */", "/g')
# /usr/bin/perl -p -i -e "s/^# seed-servers.*$/seed-servers = [${SEEDS}]/g" ${CONFIG_FILE}
# fi
if [ -n "${REPLI_FACTOR}" ]; then
/usr/bin/perl -p -i -e "s/replication-factor = 1/replication-factor = ${REPLI_FACTOR}/g" ${CONFIG_FILE}
fi
if [ "${PRE_CREATE_DB}" == "**None**" ]; then
unset PRE_CREATE_DB
fi
# NOTE: It seems this is not used anymore...
#
# if [ "${SSL_CERT}" == "**None**" ]; then
# unset SSL_CERT
# fi
#
# if [ "${SSL_SUPPORT}" == "**False**" ]; then
# unset SSL_SUPPORT
# fi
# Add Graphite support
if [ -n "${GRAPHITE_DB}" ]; then
echo "GRAPHITE_DB: ${GRAPHITE_DB}"
sed -i -r -e "/^\[\[graphite\]\]/, /^$/ { s/false/true/; s/\"graphitedb\"/\"${GRAPHITE_DB}\"/g; }" ${CONFIG_FILE}
fi
if [ -n "${GRAPHITE_BINDING}" ]; then
echo "GRAPHITE_BINDING: ${GRAPHITE_BINDING}"
sed -i -r -e "/^\[\[graphite\]\]/, /^$/ { s/\:2003/${GRAPHITE_BINDING}/; }" ${CONFIG_FILE}
fi
if [ -n "${GRAPHITE_PROTOCOL}" ]; then
echo "GRAPHITE_PROTOCOL: ${GRAPHITE_PROTOCOL}"
sed -i -r -e "/^\[\[graphite\]\]/, /^$/ { s/tcp/${GRAPHITE_PROTOCOL}/; }" ${CONFIG_FILE}
fi
if [ -n "${GRAPHITE_TEMPLATE}" ]; then
echo "GRAPHITE_TEMPLATE: ${GRAPHITE_TEMPLATE}"
sed -i -r -e "/^\[\[graphite\]\]/, /^$/ { s/instance\.profile\.measurement\*/${GRAPHITE_TEMPLATE}/; }" ${CONFIG_FILE}
fi
# Add Collectd support
if [ -n "${COLLECTD_DB}" ]; then
echo "COLLECTD_DB: ${COLLECTD_DB}"
sed -i -r -e "/^\[collectd\]/, /^$/ { s/false/true/; s/( *)# *(.*)\"collectd\"/\1\2\"${COLLECTD_DB}\"/g;}" ${CONFIG_FILE}
fi
if [ -n "${COLLECTD_BINDING}" ]; then
echo "COLLECTD_BINDING: ${COLLECTD_BINDING}"
sed -i -r -e "/^\[collectd\]/, /^$/ { s/( *)# *(.*)\":25826\"/\1\2\"${COLLECTD_BINDING}\"/g;}" ${CONFIG_FILE}
fi
if [ -n "${COLLECTD_RETENTION_POLICY}" ]; then
echo "COLLECTD_RETENTION_POLICY: ${COLLECTD_RETENTION_POLICY}"
sed -i -r -e "/^\[collectd\]/, /^$/ { s/( *)# *(retention-policy.*)\"\"/\1\2\"${COLLECTD_RETENTION_POLICY}\"/g;}" ${CONFIG_FILE}
fi
# Add UDP support
if [ -n "${UDP_DB}" ]; then
sed -i -r -e "/^\[\[udp\]\]/, /^$/ { s/false/true/; s/#//g; s/\"udpdb\"/\"${UDP_DB}\"/g; }" ${CONFIG_FILE}
fi
if [ -n "${UDP_PORT}" ]; then
sed -i -r -e "/^\[\[udp\]\]/, /^$/ { s/4444/${UDP_PORT}/; }" ${CONFIG_FILE}
fi
echo "influxdb configuration: "
cat ${CONFIG_FILE}
echo "=> Starting InfluxDB ..."
exec /opt/influxdb/influxd -config=${CONFIG_FILE} &
# Pre create database on the initiation of the container
if [ -n "${PRE_CREATE_DB}" ]; then
echo "=> About to create the following database: ${PRE_CREATE_DB}"
if [ -f "/data/.pre_db_created" ]; then
echo "=> Database had been created before, skipping ..."
else
arr=$(echo ${PRE_CREATE_DB} | tr ";" "\n")
#wait for the startup of influxdb
RET=1
while [[ RET -ne 0 ]]; do
echo "=> Waiting for confirmation of InfluxDB service startup ..."
sleep 3
curl -k ${API_URL}/ping 2> /dev/null
RET=$?
done
echo ""
PASS=${INFLUXDB_INIT_PWD:-root}
if [ -n "${ADMIN_USER}" ]; then
echo "=> Creating admin user"
/opt/influxdb/influx -host=${INFLUX_HOST} -port=${INFLUX_API_PORT} -execute="CREATE USER ${ADMIN_USER} WITH PASSWORD '${PASS}' WITH ALL PRIVILEGES"
for x in $arr
do
echo "=> Creating database: ${x}"
/opt/influxdb/influx -host=${INFLUX_HOST} -port=${INFLUX_API_PORT} -username=${ADMIN_USER} -password="${PASS}" -execute="create database ${x}"
/opt/influxdb/influx -host=${INFLUX_HOST} -port=${INFLUX_API_PORT} -username=${ADMIN_USER} -password="${PASS}" -execute="grant all PRIVILEGES on ${x} to ${ADMIN_USER}"
done
echo ""
else
for x in $arr
do
echo "=> Creating database: ${x}"
/opt/influxdb/influx -host=${INFLUX_HOST} -port=${INFLUX_API_PORT} -execute="create database \"${x}\""
done
fi
touch "/data/.pre_db_created"
fi
else
echo "=> No database need to be pre-created"
fi
fg

snap/influxdb-grafana/influxdb/0.9/types.db

@@ -0,0 +1,241 @@
absolute value:ABSOLUTE:0:U
apache_bytes value:DERIVE:0:U
apache_connections value:GAUGE:0:65535
apache_idle_workers value:GAUGE:0:65535
apache_requests value:DERIVE:0:U
apache_scoreboard value:GAUGE:0:65535
ath_nodes value:GAUGE:0:65535
ath_stat value:DERIVE:0:U
backends value:GAUGE:0:65535
bitrate value:GAUGE:0:4294967295
blocked_clients value:GAUGE:0:U
bytes value:GAUGE:0:U
cache_eviction value:DERIVE:0:U
cache_operation value:DERIVE:0:U
cache_ratio value:GAUGE:0:100
cache_result value:DERIVE:0:U
cache_size value:GAUGE:0:U
ceph_bytes value:GAUGE:U:U
ceph_latency value:GAUGE:U:U
ceph_rate value:DERIVE:0:U
changes_since_last_save value:GAUGE:0:U
charge value:GAUGE:0:U
compression_ratio value:GAUGE:0:2
compression uncompressed:DERIVE:0:U, compressed:DERIVE:0:U
connections value:DERIVE:0:U
conntrack value:GAUGE:0:4294967295
contextswitch value:DERIVE:0:U
count value:GAUGE:0:U
counter value:COUNTER:U:U
cpufreq value:GAUGE:0:U
cpu value:DERIVE:0:U
current_connections value:GAUGE:0:U
current_sessions value:GAUGE:0:U
current value:GAUGE:U:U
delay value:GAUGE:-1000000:1000000
derive value:DERIVE:0:U
df_complex value:GAUGE:0:U
df_inodes value:GAUGE:0:U
df used:GAUGE:0:1125899906842623, free:GAUGE:0:1125899906842623
disk_latency read:GAUGE:0:U, write:GAUGE:0:U
disk_merged read:DERIVE:0:U, write:DERIVE:0:U
disk_octets read:DERIVE:0:U, write:DERIVE:0:U
disk_ops_complex value:DERIVE:0:U
disk_ops read:DERIVE:0:U, write:DERIVE:0:U
disk_time read:DERIVE:0:U, write:DERIVE:0:U
disk_io_time io_time:DERIVE:0:U, weighted_io_time:DERIVE:0:U
dns_answer value:DERIVE:0:U
dns_notify value:DERIVE:0:U
dns_octets queries:DERIVE:0:U, responses:DERIVE:0:U
dns_opcode value:DERIVE:0:U
dns_qtype_cached value:GAUGE:0:4294967295
dns_qtype value:DERIVE:0:U
dns_query value:DERIVE:0:U
dns_question value:DERIVE:0:U
dns_rcode value:DERIVE:0:U
dns_reject value:DERIVE:0:U
dns_request value:DERIVE:0:U
dns_resolver value:DERIVE:0:U
dns_response value:DERIVE:0:U
dns_transfer value:DERIVE:0:U
dns_update value:DERIVE:0:U
dns_zops value:DERIVE:0:U
drbd_resource value:DERIVE:0:U
duration seconds:GAUGE:0:U
email_check value:GAUGE:0:U
email_count value:GAUGE:0:U
email_size value:GAUGE:0:U
entropy value:GAUGE:0:4294967295
expired_keys value:GAUGE:0:U
fanspeed value:GAUGE:0:U
file_handles value:GAUGE:0:U
file_size value:GAUGE:0:U
files value:GAUGE:0:U
flow value:GAUGE:0:U
fork_rate value:DERIVE:0:U
frequency_offset value:GAUGE:-1000000:1000000
frequency value:GAUGE:0:U
fscache_stat value:DERIVE:0:U
gauge value:GAUGE:U:U
hash_collisions value:DERIVE:0:U
http_request_methods value:DERIVE:0:U
http_requests value:DERIVE:0:U
http_response_codes value:DERIVE:0:U
humidity value:GAUGE:0:100
if_collisions value:DERIVE:0:U
if_dropped rx:DERIVE:0:U, tx:DERIVE:0:U
if_errors rx:DERIVE:0:U, tx:DERIVE:0:U
if_multicast value:DERIVE:0:U
if_octets rx:DERIVE:0:U, tx:DERIVE:0:U
if_packets rx:DERIVE:0:U, tx:DERIVE:0:U
if_rx_errors value:DERIVE:0:U
if_rx_octets value:DERIVE:0:U
if_tx_errors value:DERIVE:0:U
if_tx_octets value:DERIVE:0:U
invocations value:DERIVE:0:U
io_octets rx:DERIVE:0:U, tx:DERIVE:0:U
io_packets rx:DERIVE:0:U, tx:DERIVE:0:U
ipt_bytes value:DERIVE:0:U
ipt_packets value:DERIVE:0:U
irq value:DERIVE:0:U
latency value:GAUGE:0:U
links value:GAUGE:0:U
load shortterm:GAUGE:0:5000, midterm:GAUGE:0:5000, longterm:GAUGE:0:5000
md_disks value:GAUGE:0:U
memcached_command value:DERIVE:0:U
memcached_connections value:GAUGE:0:U
memcached_items value:GAUGE:0:U
memcached_octets rx:DERIVE:0:U, tx:DERIVE:0:U
memcached_ops value:DERIVE:0:U
memory value:GAUGE:0:281474976710656
memory_lua value:GAUGE:0:281474976710656
multimeter value:GAUGE:U:U
mutex_operations value:DERIVE:0:U
mysql_commands value:DERIVE:0:U
mysql_handler value:DERIVE:0:U
mysql_locks value:DERIVE:0:U
mysql_log_position value:DERIVE:0:U
mysql_octets rx:DERIVE:0:U, tx:DERIVE:0:U
mysql_bpool_pages value:GAUGE:0:U
mysql_bpool_bytes value:GAUGE:0:U
mysql_bpool_counters value:DERIVE:0:U
mysql_innodb_data value:DERIVE:0:U
mysql_innodb_dblwr value:DERIVE:0:U
mysql_innodb_log value:DERIVE:0:U
mysql_innodb_pages value:DERIVE:0:U
mysql_innodb_row_lock value:DERIVE:0:U
mysql_innodb_rows value:DERIVE:0:U
mysql_select value:DERIVE:0:U
mysql_sort value:DERIVE:0:U
nfs_procedure value:DERIVE:0:U
nginx_connections value:GAUGE:0:U
nginx_requests value:DERIVE:0:U
node_octets rx:DERIVE:0:U, tx:DERIVE:0:U
node_rssi value:GAUGE:0:255
node_stat value:DERIVE:0:U
node_tx_rate value:GAUGE:0:127
objects value:GAUGE:0:U
operations value:DERIVE:0:U
packets value:DERIVE:0:U
pending_operations value:GAUGE:0:U
percent value:GAUGE:0:100.1
percent_bytes value:GAUGE:0:100.1
percent_inodes value:GAUGE:0:100.1
pf_counters value:DERIVE:0:U
pf_limits value:DERIVE:0:U
pf_source value:DERIVE:0:U
pf_states value:GAUGE:0:U
pf_state value:DERIVE:0:U
pg_blks value:DERIVE:0:U
pg_db_size value:GAUGE:0:U
pg_n_tup_c value:DERIVE:0:U
pg_n_tup_g value:GAUGE:0:U
pg_numbackends value:GAUGE:0:U
pg_scan value:DERIVE:0:U
pg_xact value:DERIVE:0:U
ping_droprate value:GAUGE:0:100
ping_stddev value:GAUGE:0:65535
ping value:GAUGE:0:65535
players value:GAUGE:0:1000000
power value:GAUGE:0:U
pressure value:GAUGE:0:U
protocol_counter value:DERIVE:0:U
ps_code value:GAUGE:0:9223372036854775807
ps_count processes:GAUGE:0:1000000, threads:GAUGE:0:1000000
ps_cputime user:DERIVE:0:U, syst:DERIVE:0:U
ps_data value:GAUGE:0:9223372036854775807
ps_disk_octets read:DERIVE:0:U, write:DERIVE:0:U
ps_disk_ops read:DERIVE:0:U, write:DERIVE:0:U
ps_pagefaults minflt:DERIVE:0:U, majflt:DERIVE:0:U
ps_rss value:GAUGE:0:9223372036854775807
ps_stacksize value:GAUGE:0:9223372036854775807
ps_state value:GAUGE:0:65535
ps_vm value:GAUGE:0:9223372036854775807
pubsub value:GAUGE:0:U
queue_length value:GAUGE:0:U
records value:GAUGE:0:U
requests value:GAUGE:0:U
response_time value:GAUGE:0:U
response_code value:GAUGE:0:U
route_etx value:GAUGE:0:U
route_metric value:GAUGE:0:U
routes value:GAUGE:0:U
segments value:GAUGE:0:65535
serial_octets rx:DERIVE:0:U, tx:DERIVE:0:U
signal_noise value:GAUGE:U:0
signal_power value:GAUGE:U:0
signal_quality value:GAUGE:0:U
smart_poweron value:GAUGE:0:U
smart_powercycles value:GAUGE:0:U
smart_badsectors value:GAUGE:0:U
smart_temperature value:GAUGE:-300:300
smart_attribute current:GAUGE:0:255, worst:GAUGE:0:255, threshold:GAUGE:0:255, pretty:GAUGE:0:U
snr value:GAUGE:0:U
spam_check value:GAUGE:0:U
spam_score value:GAUGE:U:U
spl value:GAUGE:U:U
swap_io value:DERIVE:0:U
swap value:GAUGE:0:1099511627776
tcp_connections value:GAUGE:0:4294967295
temperature value:GAUGE:U:U
threads value:GAUGE:0:U
time_dispersion value:GAUGE:-1000000:1000000
timeleft value:GAUGE:0:U
time_offset value:GAUGE:-1000000:1000000
total_bytes value:DERIVE:0:U
total_connections value:DERIVE:0:U
total_objects value:DERIVE:0:U
total_operations value:DERIVE:0:U
total_requests value:DERIVE:0:U
total_sessions value:DERIVE:0:U
total_threads value:DERIVE:0:U
total_time_in_ms value:DERIVE:0:U
total_values value:DERIVE:0:U
uptime value:GAUGE:0:4294967295
users value:GAUGE:0:65535
vcl value:GAUGE:0:65535
vcpu value:GAUGE:0:U
virt_cpu_total value:DERIVE:0:U
virt_vcpu value:DERIVE:0:U
vmpage_action value:DERIVE:0:U
vmpage_faults minflt:DERIVE:0:U, majflt:DERIVE:0:U
vmpage_io in:DERIVE:0:U, out:DERIVE:0:U
vmpage_number value:GAUGE:0:4294967295
volatile_changes value:GAUGE:0:U
voltage_threshold value:GAUGE:U:U, threshold:GAUGE:U:U
voltage value:GAUGE:U:U
vs_memory value:GAUGE:0:9223372036854775807
vs_processes value:GAUGE:0:65535
vs_threads value:GAUGE:0:65535
#
# Legacy types
# (required for the v5 upgrade target)
#
arc_counts demand_data:COUNTER:0:U, demand_metadata:COUNTER:0:U, prefetch_data:COUNTER:0:U, prefetch_metadata:COUNTER:0:U
arc_l2_bytes read:COUNTER:0:U, write:COUNTER:0:U
arc_l2_size value:GAUGE:0:U
arc_ratio value:GAUGE:0:U
arc_size current:GAUGE:0:U, target:GAUGE:0:U, minlimit:GAUGE:0:U, maxlimit:GAUGE:0:U
mysql_qcache hits:COUNTER:0:U, inserts:COUNTER:0:U, not_cached:COUNTER:0:U, lowmem_prunes:COUNTER:0:U, queries_in_cache:GAUGE:0:U
mysql_threads running:GAUGE:0:U, connected:GAUGE:0:U, cached:GAUGE:0:U, created:COUNTER:0:U

snap/influxdb-grafana/run.sh

@@ -0,0 +1,104 @@
#!/bin/bash
#http://www.apache.org/licenses/LICENSE-2.0.txt
#
#
#Copyright 2015 Intel Corporation
#
#Licensed under the Apache License, Version 2.0 (the "License");
#you may not use this file except in compliance with the License.
#You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
#Unless required by applicable law or agreed to in writing, software
#distributed under the License is distributed on an "AS IS" BASIS,
#WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#See the License for the specific language governing permissions and
#limitations under the License.
# add some color to the output
red=`tput setaf 1`
green=`tput setaf 2`
reset=`tput sgr0`
die () {
    echo >&2 "${red} $@ ${reset}"
    exit 1
}
# verify deps and the env
type docker-compose >/dev/null 2>&1 || die "Error: docker-compose is required"
type docker >/dev/null 2>&1 || die "Error: docker is required"
type netcat >/dev/null 2>&1 || die "Error: netcat is required"
#start containers
docker-compose up -d
echo -n "waiting for influxdb and grafana to start"
host_ip=$(curl canihazip.com/s)
echo -n "host ip: ${host_ip}"
# wait for influxdb to start up
while ! curl --silent -G "http://${host_ip}:8086/query?u=admin&p=admin" --data-urlencode "q=SHOW DATABASES" 2>&1 > /dev/null ; do
    sleep 1
    echo -n "."
done
echo ""
# create snap database in influxdb
curl -G "http://${host_ip}:8086/ping"
echo -n ">>deleting snap influx db (if it exists) => "
curl -G "http://${host_ip}:8086/query?u=admin&p=admin" --data-urlencode "q=DROP DATABASE snap"
echo ""
echo -n "creating snap influx db => "
curl -G "http://${host_ip}:8086/query?u=admin&p=admin" --data-urlencode "q=CREATE DATABASE snap"
echo ""
# create influxdb datasource in grafana
echo -n "${green}adding influxdb datasource to grafana => ${reset}"
COOKIEJAR=$(mktemp)
curl -H 'Content-Type: application/json;charset=UTF-8' \
--data-binary '{"user":"admin","email":"","password":"admin"}' \
--cookie-jar "$COOKIEJAR" \
"http://${host_ip}:3000/login"
curl --cookie "$COOKIEJAR" \
-X POST \
--silent \
-H 'Content-Type: application/json;charset=UTF-8' \
--data-binary "{\"name\":\"influx\",\"type\":\"influxdb\",\"url\":\"http://${host_ip}:8086\",\"access\":\"proxy\",\"database\":\"snap\",\"user\":\"admin\",\"password\":\"admin\"}" \
"http://${host_ip}:3000/api/datasources"
echo ""
dashboard=$(cat grafana/dashboard.json)
curl --cookie "$COOKIEJAR" \
-X POST \
--silent \
-H 'Content-Type: application/json;charset=UTF-8' \
--data "$dashboard" \
"http://${host_ip}:3000/api/dashboards/db"
echo ""
echo "${green}loading snap-plugin-collector-docker${reset}"
(snapctl plugin load /opt/snap/plugin/snap-plugin-collector-docker) || die "Error: failed to load docker plugin"
echo "${green}loading snap-plugin-publisher-influxdb${reset}"
(snapctl plugin load /opt/snap/plugin/snap-plugin-publisher-influxdb) || die "Error: failed to load influxdb plugin"
echo -n "${green}adding task${reset}"
TMPDIR=${TMPDIR:="/tmp"}
TASK="${TMPDIR}/snap-task-$$.json"
echo "$TASK"
cat docker-influxdb.json | sed s/HOST_IP/${host_ip}/ > $TASK
snapctl task create -t $TASK
echo ""${green}
echo "Influxdb UI => http://${host_ip}:8083"
echo "Grafana Dashboard => http://${host_ip}:3000/dashboard/db/snap-dashboard"
echo ""
echo "Press enter to start viewing the snap.log${reset}"
read
tail -f /tmp/snap.out

snap/psutil-file.yml

@@ -0,0 +1,21 @@
---
version: 1
schedule:
  type: "simple"
  interval: "1s"
max-failures: 10
workflow:
  collect:
    metrics:
      /intel/psutil/load/load1: {}
      /intel/psutil/load/load15: {}
      /intel/psutil/load/load5: {}
      /intel/psutil/vm/available: {}
      /intel/psutil/vm/free: {}
      /intel/psutil/vm/used: {}
    config:
    publish:
      -
        plugin_name: "file"
        config:
          file: "/tmp/snap-psutil-file.log"