Maybe some other business KPIs, but the emphasis is on building "graphs that the operational team should be looking at on ~hourly basis to prioritize fixes and spot systemic regressions."
Ideally it would integrate with Pagerduty, in that it'd be easy to express "page me if if X is above Y."
My preference is that I have my own cron job that captures the metrics from our queues/databases and pushes them to the tool via a ~REST API but I could be convinced of a different method.
So far the only thing I can think of is Datadog, specifically https://www.datadoghq.com/solutions/real-time-business-intelligence/
Anything else I should be looking at? Or any other tips?