Project Blue Book
~12,618 cases. The classification reveals the investigators, not the phenomena.
Decision Chain: Blue Book
Ingest declassified Project Blue Book case records. ~12,618 cases with conclusion tags (Identified, Insufficient Info, Unknown).
~12,618 cases ingested. Categorical data with date-level timestamps only.
Attempt a self-exciting (Hawkes) point-process fit. Blue Book data is categorical with coarse, date-level timestamps, but a Hawkes process needs fine-grained event times to model how each event temporarily raises the rate of the events that follow it.
Hawkes process UNRELIABLE for this dataset. Timestamps too coarse, data too categorical.
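A minimal sketch of why date-level timestamps defeat a Hawkes fit. It assumes the common exponential kernel; mu, alpha and beta are illustrative parameters, not fitted values, and the day indices are invented. The intensity at an event counts only strictly earlier events, so reports sharing a date cannot excite one another, and any within-day ordering a fit relied on would be invented rather than recorded.

```python
import math
from collections import Counter


def hawkes_intensity(t, history, mu, alpha, beta):
    """Conditional intensity of a univariate Hawkes process with an
    exponential kernel: mu + sum over events t_i < t of
    alpha * beta * exp(-beta * (t - t_i))."""
    return mu + sum(
        alpha * beta * math.exp(-beta * (t - t_i)) for t_i in history if t_i < t
    )


def tied_fraction(day_stamps):
    """Share of events whose date is shared with at least one other event."""
    counts = Counter(day_stamps)
    tied = sum(c for c in counts.values() if c > 1)
    return tied / len(day_stamps)


# Hypothetical day indices (days since an arbitrary start), not Blue Book data.
days = [0, 0, 0, 1, 3, 3]
print(tied_fraction(days))  # 5 of 6 events share a date with another event

# Three same-day reports: none is strictly earlier, so none excites another.
print(hawkes_intensity(0, [0, 0, 0], mu=0.5, alpha=0.8, beta=2.0))  # 0.5

# The same three reports with made-up intra-day times do excite each other.
print(hawkes_intensity(0.7, [0.1, 0.4, 0.7], mu=0.5, alpha=0.8, beta=2.0))
```

When most reports share a date with another report, the excitation term depends on an ordering the records simply do not contain.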
Instead of temporal self-excitation, analyze the distribution of conclusion tags. How were cases classified and does the classification reveal anything about the investigation process?
"Unknown" classifications cluster in periods of reduced staffing and budget. Classification reflects investigation capacity, not phenomenon characteristics.
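The tag-distribution step can be sketched as a per-period tabulation. This assumes each case record has already been reduced to a (period, tag) pair; the period labels "A" and "B" and the records are hypothetical placeholders, not figures from the archive.

```python
from collections import Counter, defaultdict

TAGS = ("Identified", "Insufficient Info", "Unknown")


def tag_shares_by_period(records):
    """records: iterable of (period, tag) pairs.
    Returns {period: {tag: share of that period's cases}}."""
    counts = defaultdict(Counter)
    for period, tag in records:
        counts[period][tag] += 1
    shares = {}
    for period, tag_counts in counts.items():
        total = sum(tag_counts.values())
        # Counter returns 0 for tags never seen in this period.
        shares[period] = {tag: tag_counts[tag] / total for tag in TAGS}
    return shares


# Hypothetical records for two placeholder periods.
records = [
    ("A", "Identified"), ("A", "Unknown"), ("A", "Unknown"), ("A", "Insufficient Info"),
    ("B", "Identified"), ("B", "Identified"), ("B", "Identified"), ("B", "Insufficient Info"),
]
shares = tag_shares_by_period(records)
print(shares["A"]["Unknown"])  # 0.5
print(shares["B"]["Unknown"])  # 0.0
```

Lining these per-period shares up against a staffing or budget timeline is how a clustering of "Unknown" in lean periods would become visible.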
Compare case classification rates across different project periods, staffing levels, and investigator assignments.
Cases classified "Unknown" correlate more strongly with investigation workload than with case characteristics.
The classification tells you about the investigators, not the phenomena
Project Blue Book is the wrong dataset for Hawkes process analysis, because the data is categorical and the timestamps are too coarse. But conclusion-tag analysis reveals something important: the rate of "Unknown" classifications correlates with project staffing and budget, not with the nature of the cases themselves. When the project was understaffed, more cases were labeled "Unknown" or "Insufficient Information." The classification system reflects investigation capacity and bias, not the genuine distribution of anomalies.