Incident Response Checklist

Working checklist for the SANS PICERL incident response lifecycle (preparation, identification, containment, eradication, recovery, lessons learned), which covers the same ground as the four-phase lifecycle in NIST SP 800-61 Rev. 2.


A working checklist based on the SANS PICERL lifecycle (NIST SP 800-61 Rev. 2 groups the same activities into four phases). Use it in real time during an incident, or as a tabletop guide.

1. Prepare (before anything happens)

  • On-call rotation documented; primary + backup names known
  • Comms tree: who to call (legal, exec, PR, insurance, MSSP, IR retainer)
  • Out-of-band comms channel ready (Signal group, separate Slack workspace) — assume primary is compromised
  • Decision-tree authority: who can authorize containment that takes services offline?
  • Logs centralized and at least 90 days deep
  • Recent backup test: when was the last successful restore drill?
  • IR retainer or hotline number on file
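
Two of the preparation items above are date arithmetic and are easy to check automatically. The following is a minimal sketch, assuming you can obtain the date of the oldest centralized log entry and the date of the last successful restore drill from your own tooling; the function names are hypothetical and not part of any product.

```python
from datetime import date, timedelta

MIN_LOG_RETENTION_DAYS = 90  # threshold taken from the checklist item above


def log_retention_ok(oldest_log_day: date, today: date,
                     min_days: int = MIN_LOG_RETENTION_DAYS) -> bool:
    """Return True if centralized logs reach back at least min_days."""
    return (today - oldest_log_day) >= timedelta(days=min_days)


def days_since_restore_drill(last_drill: date, today: date) -> int:
    """Age of the last successful restore drill, in days."""
    return (today - last_drill).days
```

How old a restore drill may be before it counts as stale is a policy decision the checklist leaves open, so the sketch only reports the age.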

2. Identify

  • Source of detection logged (alert, user report, third party, threat-intel feed)
  • Initial timestamp recorded (UTC)
  • Affected systems enumerated: hostname, IP, owner, criticality
  • Indicator of Compromise (IOC) noted: file hash, IP, domain, behavior
  • Severity assigned (Low / Medium / High / Critical); re-evaluate hourly
  • Incident ticket opened; ticket ID broadcast to responders
  • Initial scoping question: data exfil, destructive, persistence, or recon?
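
The identification fields above can be captured in one structured record so that nothing is left in chat scrollback. This is a sketch only: the class and field names are assumptions chosen to mirror the bullets, not a schema from any ticketing system.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class Severity(Enum):
    LOW = "Low"
    MEDIUM = "Medium"
    HIGH = "High"
    CRITICAL = "Critical"


@dataclass
class AffectedSystem:
    hostname: str
    ip: str
    owner: str
    criticality: str


@dataclass
class Incident:
    ticket_id: str
    detection_source: str  # alert, user report, third party, threat-intel feed
    severity: Severity
    detected_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))
    systems: list[AffectedSystem] = field(default_factory=list)
    iocs: list[str] = field(default_factory=list)  # hashes, IPs, domains

    def __post_init__(self) -> None:
        # Reject naive timestamps so every entry is unambiguous,
        # then normalize whatever zone was supplied to UTC.
        if self.detected_at.tzinfo is None:
            raise ValueError("detected_at must be timezone-aware")
        self.detected_at = self.detected_at.astimezone(timezone.utc)
```

Refusing naive timestamps at construction time is a design choice: it surfaces timezone mistakes while the incident is being opened rather than during timeline reconstruction.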

3. Contain

Short-term:

  • Isolate affected hosts (network quarantine, not power-off — preserve volatile evidence)
  • Disable compromised credentials (rotate, revoke MFA tokens, kill active sessions)
  • Block known IOCs at perimeter (firewall, DNS, EDR)
  • Capture a memory image, then a disk image, of at least one affected host before any reboot (memory first, since it is the most volatile)
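
Once an image has been captured, recording its cryptographic hash lets you show later that the evidence was not altered. The checklist does not prescribe a hash algorithm; the sketch below assumes SHA-256 and uses only the Python standard library. The function name is hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so multi-gigabyte images need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Record the resulting hex digest in the incident ticket alongside who captured the image and when.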

Long-term:

  • Patch / reconfigure to prevent re-entry on the same vector
  • Increase logging and monitoring on adjacent systems
  • Push EDR / IDS rule updates derived from observed IOCs

4. Eradicate

  • Remove malware (re-imaging is safer than cleaning in place; clean only if you are certain every malicious artifact has been found)
  • Reset all credentials with potential exposure
  • Rotate any leaked secrets (API keys, certificates, service accounts)
  • Verify persistence mechanisms removed: scheduled tasks, services, registry run keys, web shells, SSH keys, cron jobs
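
For line-oriented persistence locations such as SSH authorized_keys files and crontabs, one way to verify removal is to compare the current contents against a known-good baseline. The sketch below assumes such a trusted baseline exists, which the checklist does not state, and the function names are hypothetical. It does not cover registry run keys, services, or web shells.

```python
def parse_entries(text: str) -> set[str]:
    """Return non-empty, non-comment lines, stripped of surrounding whitespace."""
    return {
        line.strip()
        for line in text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    }


def unexpected_entries(current_text: str, baseline_text: str) -> list[str]:
    """Entries present now but absent from the known-good baseline, sorted."""
    return sorted(parse_entries(current_text) - parse_entries(baseline_text))
```

For example, with a baseline holding one team key and a current file holding that key plus an unknown one, `unexpected_entries` returns a list containing only the unknown key line.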

5. Recover

  • Restore from a verified-clean backup (confirm its timestamp predates the earliest known compromise, not just the detection time)
  • Phased return to production (canary first, then full)
  • Heightened monitoring for at least 30 days
  • Confirm normal business function with affected business owners
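
The timestamp check on backups can be sketched as picking the most recent backup taken strictly before the earliest known compromise. The function name and inputs are assumptions for illustration. A pre-incident timestamp is necessary but not sufficient: the chosen backup still needs to be verified clean before it is restored.

```python
from datetime import datetime
from typing import Optional


def latest_clean_candidate(backup_times: list[datetime],
                           earliest_compromise: datetime) -> Optional[datetime]:
    """Most recent backup strictly older than the earliest known compromise.

    Returns None when every backup was taken at or after the compromise.
    All datetimes are assumed to be timezone-aware and comparable.
    """
    candidates = [t for t in backup_times if t < earliest_compromise]
    return max(candidates, default=None)
```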

6. Lessons Learned (within 2 weeks)

  • Post-incident review meeting scheduled
  • Timeline document: what we knew when, what we did when
  • Root cause: technical AND organizational
  • Detection gap: would we catch this faster next time? What instrumentation is missing?
  • Action items logged in a tracker — owner + date for each
  • Controls / playbook updates committed
  • Tabletop scenario drafted from the incident for next training cycle
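
The timeline document is easier to review when "what we knew" and "what we did" entries are merged into a single chronological list in UTC. The following is a minimal sketch; the tuple layout and the `knew` / `did` labels are assumptions made for illustration.

```python
from datetime import datetime, timezone


def render_timeline(events: list[tuple[datetime, str, str]]) -> list[str]:
    """Sort (timestamp, kind, text) events and format one line per event.

    Timestamps are assumed to be timezone-aware; kind is "knew" or "did".
    """
    lines = []
    for when, kind, text in sorted(events, key=lambda event: event[0]):
        stamp = when.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
        lines.append(f"{stamp}  [{kind}]  {text}")
    return lines
```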

Severity Quick Reference

Level    | Examples                                                    | Response Time
Critical | Data exfil confirmed; ransomware spreading; production down | Immediate, all hands
High     | Single-host compromise; phishing with credential theft      | < 1 hour
Medium   | Suspicious activity, no confirmed compromise                | < 4 hours
Low      | Failed exploit attempts; benign anomaly                     | Next business day
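
The response times in the table can be encoded as a lookup so that a ticket shows a concrete deadline. This sketch treats "< 1 hour" and "< 4 hours" as windows measured from detection, which is an interpretation rather than something the table states. "Next business day" depends on a business calendar, so it is returned as None instead of being guessed.

```python
from datetime import datetime, timedelta
from typing import Optional

# Windows taken from the Severity Quick Reference; None means
# "next business day", which needs calendar logic not shown here.
RESPONSE_WINDOW: dict[str, Optional[timedelta]] = {
    "Critical": timedelta(0),
    "High": timedelta(hours=1),
    "Medium": timedelta(hours=4),
    "Low": None,
}


def response_deadline(severity: str, detected_at: datetime) -> Optional[datetime]:
    """Latest time a response should begin, or None for next-business-day."""
    window = RESPONSE_WINDOW[severity]  # KeyError on an unknown level
    return None if window is None else detected_at + window
```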

Communication Templates

Internal first message:

We are investigating a potential security incident affecting [systems]. The incident ticket is [ID]. Please [specific instruction — e.g., do not log into X]. Updates every 30 minutes.
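
Filling the template programmatically reduces the chance of sending a message with a bracketed placeholder still in it. The sketch below uses `string.Template` from the Python standard library; the placeholder names (`systems`, `ticket_id`, `instruction`) are assumptions standing in for the bracketed fields above.

```python
from string import Template

INTERNAL_FIRST_MESSAGE = Template(
    "We are investigating a potential security incident affecting $systems. "
    "The incident ticket is $ticket_id. Please $instruction. "
    "Updates every 30 minutes."
)


def render_first_message(systems: str, ticket_id: str, instruction: str) -> str:
    """Fill every placeholder in the internal first message."""
    # substitute(), unlike safe_substitute(), raises KeyError if the
    # template contains a placeholder that was not supplied.
    return INTERNAL_FIRST_MESSAGE.substitute(
        systems=systems, ticket_id=ticket_id, instruction=instruction)
```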

Holding statement (when facts are not yet confirmed):

We are aware of [event]. We are currently investigating. We will share verified details as we have them. Avoid speculating internally or externally until we confirm.