Threat Hunting: Hypothesis Chaining

Hypothesis chaining is a method that lets threat hunters narrow down search results during a hunt by appending sub-hypotheses to, or branching off from, their original hypothesis.

This technique helps when a hunt returns more results than a threat hunter can review by hand: instead of stalling, the hunter keeps the investigation moving by refining the result set.

For example, a threat hunter hypothesizes that threat actors may be using DGAs (domain generation algorithms) for phishing, malware distribution, or command and control (C2). They build a hunt in DNS or Secure Web Gateway logs, looking for users interacting with domains under the .xyz TLD. However, the initial query run to validate the hypothesis returns tens of thousands of entries. Even after grouping and counting by domain, over 1,000 distinct domains remain in the results.
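The grouping and counting step in this example can be sketched in Python. This is a minimal illustration rather than a real log query: the record layout (dicts with a "domain" key) and the function name count_xyz_domains are assumptions made for the sketch, not part of any particular DNS or Secure Web Gateway product.

```python
from collections import Counter

def count_xyz_domains(records):
    """Count transactions per domain for domains under the .xyz TLD.

    `records` is assumed to be an iterable of dicts with a "domain" key,
    standing in for parsed DNS or Secure Web Gateway log entries.
    """
    counts = Counter()
    for record in records:
        # Normalize case and drop a trailing root dot before matching.
        domain = record["domain"].lower().rstrip(".")
        if domain.endswith(".xyz"):
            counts[domain] += 1
    return counts

# Hypothetical sample data, for illustration only.
sample = [
    {"domain": "a1b2c3.xyz"},
    {"domain": "a1b2c3.xyz"},
    {"domain": "example.com"},
    {"domain": "Q9Z8.XYZ."},
]
counts = count_xyz_domains(sample)
print(len(counts), "distinct .xyz domains")
```

In a real hunt the same grouping would normally be done in the log platform's own query language; the point here is only that the output is a per-domain transaction count, which is what the sub-hypotheses operate on.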

This is where hypothesis chaining becomes useful. The threat hunter can append one or more sub-hypotheses to further narrow the results.

  • Original hypothesis: threat actors may be using DGA-generated domains for phishing, malware distribution, or C2.
    • Sub-hypothesis 1: the malicious domain may be used for phishing; therefore, we filter the hunt down to domains that have more than 20 transactions in the original hypothesis result set.
    • Sub-hypothesis 2: the malicious domain may be used for C2, which we hypothesize would average one heartbeat per hour (regardless of jitter). One transaction per hour is 24 per day, or 168 per week; allowing plus or minus 50 gives a range of 118-218. We therefore narrow the original results further to domains that were interacted with 118-218 times in the last 7 days.
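The chain above can be sketched as successive filters over per-domain transaction counts. This is an illustrative sketch under stated assumptions: the function names, the domain_counts mapping, and the sample values are hypothetical, and sub-hypothesis 1 is read here as "keep domains with more than 20 transactions", which is one interpretation of the filter described.

```python
HOURS_PER_WEEK = 24 * 7   # 168 expected hourly heartbeats in 7 days
TOLERANCE = 50            # plus or minus 50, as in sub-hypothesis 2

def phishing_candidates(domain_counts, min_transactions=20):
    """Sub-hypothesis 1 (assumed reading): keep domains with more than
    `min_transactions` transactions."""
    return {d: n for d, n in domain_counts.items() if n > min_transactions}

def c2_candidates(domain_counts, expected=HOURS_PER_WEEK, tolerance=TOLERANCE):
    """Sub-hypothesis 2: keep domains whose 7-day count falls within
    expected +/- tolerance (118 to 218 with the defaults)."""
    low, high = expected - tolerance, expected + tolerance
    return {d: n for d, n in domain_counts.items() if low <= n <= high}

# Hypothetical 7-day transaction counts per .xyz domain.
domain_counts = {
    "rare-visit.xyz": 3,
    "busy-site.xyz": 4200,
    "hourly-beacon.xyz": 171,
    "newsletter.xyz": 45,
}

# Chain the sub-hypotheses: each step narrows the previous result set.
step1 = phishing_candidates(domain_counts)
step2 = c2_candidates(step1)
print(sorted(step1))
print(sorted(step2))
```

Each sub-hypothesis is a small, separately testable filter, so a hunter can apply them in sequence as a chain or run any one of them against the original result set as a branch.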
