Fylamynt Product Docs

Incident Response Automation

Incident Response Featured Workflow


Last updated 3 years ago


Category:

Incident Response Automation

Workflow name:

Restart Linux service on memory utilization alert

Description:

This example workflow shows how Fylamynt can automate the remediation of an alert received from your existing performance monitoring or logging tools.

The use case is Linux-based servers that are monitored for high memory utilization caused, for example, by an application with a memory leak. When an alert is received for a specific server, the remediation is to SSH to that Linux server, which sits in an isolated network or behind a firewall, and restart the service. This might sound straightforward, but what if the incident occurs at 2 am? And how does the SRE get access to the isolated environment to authenticate and execute commands on the Linux server in question?

Integrations:

Workflow review:

Our example workflow is triggered by a PagerDuty alert. The trigger retrieves the alert body, in JSON format, from the PagerDuty service.

The alert body output from PagerDuty is then used as the input for the JSONPATH action node, which uses a path expression to extract only the hostname of the Linux server with the memory utilization alert.
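The hostname extraction can be illustrated with a minimal sketch. The alert body layout, the `$.event.data.host` path, and the `extract` helper below are all assumptions made for illustration; the real PagerDuty payload and the path expression configured in the JSONPATH node may differ.

```python
import json
from typing import Any

# Hypothetical alert body; the real PagerDuty payload layout may differ.
ALERT_BODY = json.loads("""
{
  "event": {
    "data": {
      "id": "PABC123",
      "title": "High memory utilization",
      "host": "app-server-01"
    }
  }
}
""")


def extract(document: Any, path: str) -> Any:
    """Resolve a simple JSONPath-style expression such as '$.event.data.host'.

    Only '$', dotted keys, and numeric list indexes (e.g. '$.items.0')
    are supported; this is not a full JSONPath implementation.
    """
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    current = document
    for part in path[1:].split("."):
        if not part:
            continue  # skip the empty piece produced by the leading '.'
        if isinstance(current, list):
            current = current[int(part)]
        else:
            current = current[part]
    return current


hostname = extract(ALERT_BODY, "$.event.data.host")
print(hostname)
```

A missing key raises `KeyError`, which is usually preferable to passing an empty hostname on to the SSH step.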

An approval request to execute the SSH command on the affected server is then sent to a Slack channel.
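As a rough illustration of what such an approval request might contain, the sketch below builds a Slack Block Kit message with Approve and Deny buttons. The function name, the `action_id` values, and the wording are hypothetical; Fylamynt's Slack approval node may format its message differently, and the payload is only built here, not sent.

```python
import json


def build_approval_message(hostname: str, command: str) -> dict:
    """Build a Block Kit payload asking a channel to approve a command."""
    return {
        # Plain-text fallback used in notifications.
        "text": f"Approval needed to run `{command}` on {hostname}",
        "blocks": [
            {
                "type": "section",
                "text": {
                    "type": "mrkdwn",
                    "text": f"*Approval requested*\nRun `{command}` on `{hostname}`?",
                },
            },
            {
                "type": "actions",
                "elements": [
                    {
                        "type": "button",
                        "text": {"type": "plain_text", "text": "Approve"},
                        "style": "primary",
                        "action_id": "approve_restart",  # hypothetical identifier
                        "value": hostname,
                    },
                    {
                        "type": "button",
                        "text": {"type": "plain_text", "text": "Deny"},
                        "style": "danger",
                        "action_id": "deny_restart",  # hypothetical identifier
                        "value": hostname,
                    },
                ],
            },
        ],
    }


approval = build_approval_message("app-server-01", "sudo systemctl restart myapp")
print(json.dumps(approval, indent=2))
```

Carrying the hostname in each button's `value` lets whatever handles the click know which server the decision applies to.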

The Teleport SSH Execute Action node takes the output of the JSONPATH node, which contains the matched hostname, as the SSH Target Host, and executes the command provided to restart the service.
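Outside of Fylamynt, the equivalent manual step would be a `tsh ssh` call through Teleport. The sketch below only builds the argument list for that call. The login name `ubuntu`, the service name `myapp`, and the use of `systemctl` are assumptions, since the source does not state which command the node runs.

```python
import re
import shlex

# Conservative hostname check so an alert field cannot smuggle shell syntax.
HOSTNAME_RE = re.compile(r"^[A-Za-z0-9]([A-Za-z0-9.-]{0,251}[A-Za-z0-9])?$")


def build_restart_command(hostname: str, service: str, login: str = "ubuntu") -> list:
    """Return the argv for restarting a service on a host via Teleport's tsh."""
    if not HOSTNAME_RE.match(hostname):
        raise ValueError(f"refusing suspicious hostname: {hostname!r}")
    # The remote command is passed as a single argument; the service name is quoted.
    remote = f"sudo systemctl restart {shlex.quote(service)}"
    return ["tsh", "ssh", f"{login}@{hostname}", remote]


argv = build_restart_command("app-server-01", "myapp")
print(argv)
```

The list could be passed to `subprocess.run(argv, check=True)` on a machine with an active Teleport session. Validating the hostname matters because it originates from an external alert payload.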

The workflow then converts the JSON output to a string and sends a message to a specified Slack channel, with the hostname as a variable, notifying the team that the service on the host was successfully restarted.
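The JSON-to-string conversion and the message templating can be sketched as follows. The helper name, the assumption that the JSONPATH result may arrive as a one-element list, and the message wording are illustrative only.

```python
import json
from typing import Any

TEMPLATE = "Service on host {hostname} was successfully restarted."


def to_message_text(jsonpath_result: Any) -> str:
    """Turn a JSONPath result into plain text suitable for a chat message."""
    # Assumption: a match may be wrapped in a one-element list.
    if isinstance(jsonpath_result, list):
        if len(jsonpath_result) != 1:
            raise ValueError("expected exactly one matched hostname")
        jsonpath_result = jsonpath_result[0]
    if isinstance(jsonpath_result, str):
        return jsonpath_result  # already a string, avoid adding JSON quotes
    return json.dumps(jsonpath_result)


message = TEMPLATE.format(hostname=to_message_text(["app-server-01"]))
print(message)
```

Returning strings unchanged avoids the extra quotation marks that `json.dumps` would otherwise add around the hostname.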

The alert body output from PagerDuty is used again as the input for the JSONPATH action node which extracts the PagerDuty Incident ID.

Lastly, the workflow uses the retrieved Incident ID to automatically resolve the PagerDuty incident that triggered this workflow.
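For reference, resolving an incident by ID against the PagerDuty REST API v2 might look like the sketch below. It is based on my understanding of the public API (a `PUT` to `/incidents/{id}` with a `From` header), not on how Fylamynt's node is implemented. The incident ID, token, and email are placeholders, and the request is only constructed, not sent.

```python
import json
import urllib.request


def build_resolve_request(incident_id: str, api_token: str, from_email: str) -> urllib.request.Request:
    """Build (but do not send) a request that marks an incident as resolved."""
    body = json.dumps(
        {"incident": {"type": "incident_reference", "status": "resolved"}}
    ).encode("utf-8")
    return urllib.request.Request(
        url=f"https://api.pagerduty.com/incidents/{incident_id}",
        data=body,
        method="PUT",
        headers={
            "Authorization": f"Token token={api_token}",
            "Accept": "application/vnd.pagerduty+json;version=2",
            "Content-Type": "application/json",
            # Email of the PagerDuty user on whose behalf the change is made.
            "From": from_email,
        },
    )


# Placeholder values; a real call would use the ID extracted from the alert body.
request = build_resolve_request("PABC123", "EXAMPLE_TOKEN", "sre@example.com")
print(request.get_method(), request.full_url)
```

Sending it would be a matter of calling `urllib.request.urlopen(request)` with a valid token. Building the request separately keeps the step easy to inspect and test.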

Let's look at the different tools and integrations required to make this happen. First off, we are using New Relic to monitor the EC2 instance for memory utilization; however, for this particular workflow, New Relic can very easily be replaced by other APM tools like Datadog or Sumo Logic, for which we have integrations available as well. The Policy in New Relic has a notification channel configured that sends the incident to a PagerDuty service. For the PagerDuty service, we configure a webhook integration to Fylamynt, which allows us to monitor incident generation on the service and trigger the workflow automatically. Our Teleport integration is then used to authenticate and execute the SSH command on the specific Linux-based EC2 instance. Lastly, we require a Slack integration that provides the ability to send messages and approval notifications to your Slack team.