At the moment we use PostgreSQL which serves us pretty well. We have some caching tables setup and with our processing scripts we have the data in customers dashboards within 2 minutes. We move our raw visits in tables with visits aggregated per hour and per day.
This is all working fine. But we want to start targeting bigger clients now and would love to be fine then as well.
A few things that are important to us when selecting a database:
- it is very popular (we want to solve issues fast)
- it is free to use (the license)
- it is easy to maintain (very important)
- it can run on my own servers
One that really appeals to us is Apache Cassandra with Apache Spark.
But we're not sure if we should optimize our PostgreSQL workflow. I wouldn't mind investing 1 or 2 weeks of my time into setting up a new database, but it should also save time in the future. The common queries we will have are (see our demo dashboard to see what we show [1]):
- get (unique) page views from a website grouped by per 24 hours
- get list of most visited pages and show those with the percentages of referrals aggregated by day (/contact: twitter.com 20%, organic 80%)
- get conversions between events of sessions/list of events
There are probably many tools out there to help, but I wouldn't use a sledgehammer to hammer a nail.
[1] https://simpleanalytics.com/simpleanalytics.com