Show HN: Exploring Apple Health with ClickHouse, Vega-Altair, Llama 3, Quarto


2 years ago

Hey everyone,

A few days ago I wrote a tiny Python script that turns an Apple Health export.xml file (~1 GB in my case, ~10 years of data) into a very simple 5-column .parquet file (~40 MB):

  - type (e.g. "DistanceCycling")
  - value (e.g. "12.100")

  and 3 datetime timestamps:

  - start
  - end
  - created

https://github.com/atlaslib/atlas
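A minimal sketch of the row layout described above. It assumes that each `<Record>` element in export.xml carries `type`, `value`, `startDate`, `endDate` and `creationDate` attributes, with timestamps formatted like `2019-05-12 08:30:00 +0200`. The names `HealthRow`, `parse_apple_date` and `row_from_attrs` are hypothetical and do not come from the atlas repo.

```python
from dataclasses import dataclass
from datetime import datetime

# Assumed timestamp layout in export.xml, e.g. "2019-05-12 08:30:00 +0200"
APPLE_DATE_FORMAT = "%Y-%m-%d %H:%M:%S %z"


@dataclass
class HealthRow:
    """One observation in the (hypothetical) 5-column layout."""
    type: str          # raw identifier; "DistanceCycling" once the prefix is stripped
    value: str         # kept as a string because some values are categorical
    start: datetime
    end: datetime
    created: datetime


def parse_apple_date(text: str) -> datetime:
    """Parse an export.xml timestamp into a timezone-aware datetime."""
    return datetime.strptime(text, APPLE_DATE_FORMAT)


def row_from_attrs(attrs: dict) -> HealthRow:
    """Build one row from the attribute dict of a single <Record> element."""
    return HealthRow(
        type=attrs["type"],
        value=attrs.get("value", ""),
        start=parse_apple_date(attrs["startDate"]),
        end=parse_apple_date(attrs["endDate"]),
        created=parse_apple_date(attrs["creationDate"]),
    )
```

Keeping `value` as text means every record fits one schema. The cost is that numeric analysis needs a cast later on.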

Why?

The Apple Health export is huge (!), the export.xml schema is wild and a bit irregular, and most of its identifiers carry a long prefix (e.g. `HKQuantityTypeIdentifier`).
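Both problems can be handled together, shown here as a hedged sketch that is not necessarily what the script does. Because of the file's size, it streams `<Record>` elements with `xml.etree.ElementTree.iterparse` instead of loading the whole tree, and it strips the identifier prefixes on the fly. The prefix list is an assumption and may not be exhaustive. `strip_prefix` and `iter_records` are hypothetical names.

```python
import io
import xml.etree.ElementTree as ET
from typing import Iterator, Tuple

# Assumed HealthKit prefixes; extend this tuple if other ones show up.
PREFIXES = ("HKQuantityTypeIdentifier", "HKCategoryTypeIdentifier")


def strip_prefix(identifier: str) -> str:
    """HKQuantityTypeIdentifierStepCount -> StepCount; unknown names pass through."""
    for prefix in PREFIXES:
        if identifier.startswith(prefix):
            return identifier[len(prefix):]
    return identifier


def iter_records(source) -> Iterator[Tuple[str, str, str, str, str]]:
    """Yield (type, value, start, end, created) strings for each <Record>."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "Record":
            yield (
                strip_prefix(elem.get("type", "")),
                elem.get("value", ""),
                elem.get("startDate", ""),
                elem.get("endDate", ""),
                elem.get("creationDate", ""),
            )
            # Drop the record's attributes and children so the in-memory tree
            # stays small (emptied shells remain attached to the root).
            elem.clear()


if __name__ == "__main__":
    sample = b"""<HealthData>
  <Record type="HKQuantityTypeIdentifierStepCount" value="42"
          startDate="2024-04-01 08:00:00 +0200" endDate="2024-04-01 08:05:00 +0200"
          creationDate="2024-04-01 08:06:00 +0200"/>
</HealthData>"""
    for row in iter_records(io.BytesIO(sample)):
        print(row)
```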

Even though the observations span a wide range of types (e.g. DistanceWalkingRunning, DistanceCycling, DistanceSwimming, StepCount, …), I put all of them into one simple table to make them easier to explore.
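With one long table, a question like "totals per type" becomes a single group-by rather than a join across many per-type tables. Here is a plain-Python illustration of that idea. `totals_by_type` is a hypothetical helper, and skipping non-numeric values is an assumption about how categorical rows should be treated.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def totals_by_type(rows: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Sum numeric values per observation type over (type, value) pairs.

    Non-numeric values (e.g. sleep categories) are skipped entirely, so a
    purely categorical type never appears in the result.
    """
    totals: Dict[str, float] = defaultdict(float)
    for obs_type, value in rows:
        try:
            number = float(value)   # parse first so no key is created on failure
        except ValueError:
            continue
        totals[obs_type] += number
    return dict(totals)
```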

[(!) One tiny disadvantage of this is that I had to make the "value" column a String, even though most observations are floating-point numbers. Some observations (e.g. sleep phases) have categorical values (enums?), and I chose not to map them to numbers for now.

Any ideas on how to handle this are very welcome.]
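One possible answer to that question is sketched below. It is not what atlas currently does. The idea is to keep the raw string and also derive two sparse columns: a nullable float for numeric observations and a nullable string for categorical ones. The column idea and the names `split_value` / `split_values` are hypothetical.

```python
from typing import List, Optional, Tuple


def split_value(raw: str) -> Tuple[Optional[float], Optional[str]]:
    """Return (number, category); exactly one of the two is not None."""
    try:
        return float(raw), None
    except ValueError:
        return None, raw


def split_values(raw_values: List[str]) -> Tuple[List[Optional[float]], List[Optional[str]]]:
    """Turn one string column into a numeric column and a categorical column."""
    numbers: List[Optional[float]] = []
    categories: List[Optional[str]] = []
    for raw in raw_values:
        number, category = split_value(raw)
        numbers.append(number)
        categories.append(category)
    return numbers, categories
```

If the column stays a String, ClickHouse's `toFloat64OrNull(value)` gives roughly the same numeric view at query time. It returns NULL for the categorical rows.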

Here are a few example charts I generated using ClickHouse (chDB) and Vega-Altair in a Quarto notebook:

https://github.com/atlaslib/atlas#explore

Here is the notebook with example code:

https://github.com/atlaslib/atlas/blob/main/examples/apple-h...
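The notebook isn't reproduced here. As a hedged sketch of the kind of query such a chart might sit on, the helper below builds a ClickHouse statement that sums one observation type per day straight from the Parquet file. It assumes the 5-column layout, a timestamp-typed `start` column, and a hypothetical file name. `daily_total_sql` is an illustrative name.

```python
def _escape(literal: str) -> str:
    """Escape text for use inside a single-quoted ClickHouse string literal."""
    return literal.replace("\\", "\\\\").replace("'", "\\'")


def daily_total_sql(parquet_path: str, obs_type: str) -> str:
    """Build a query returning (day, total) for one observation type.

    toFloat64OrNull turns categorical strings into NULL, which sum() ignores.
    """
    return (
        "SELECT toDate(start) AS day, sum(toFloat64OrNull(value)) AS total "
        f"FROM file('{_escape(parquet_path)}', Parquet) "
        f"WHERE type = '{_escape(obs_type)}' "
        "GROUP BY day ORDER BY day"
    )
```

Assumed usage, assuming chDB and Vega-Altair are installed: pass the string to `chdb.query(sql, "DataFrame")` to get a pandas DataFrame. Then hand that DataFrame to `alt.Chart(df).mark_bar().encode(x="day:T", y="total:Q")`.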

Another example of how neat it is to have this data easily accessible:

Asking questions in natural language (e.g. "When was my last swimming workout?" or "What is my total walking distance?") and having Llama 3 translate them to SQL works:

https://x.com/__tosh/status/1784714636610187488/video/1
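The post doesn't show how the Llama 3 step is wired up. The sketch below shows one common text-to-SQL pattern: put the table schema into the prompt, then check the model's answer before running it. The `health` table name, the prompt wording, and `build_prompt` / `is_read_only` are all hypothetical, and the model call itself is left out.

```python
# Hypothetical schema text handed to the model; mirrors the 5-column layout.
SCHEMA = """CREATE TABLE health (
    type String,      -- e.g. 'DistanceCycling', 'StepCount'
    value String,     -- numbers stored as text, some categorical values
    start DateTime,
    end DateTime,
    created DateTime
)"""


def build_prompt(question: str) -> str:
    """Assemble a text-to-SQL prompt containing the schema and the question."""
    return (
        "Translate the question into one ClickHouse SQL query.\n"
        f"Schema:\n{SCHEMA}\n"
        "Use toFloat64OrNull(value) before doing arithmetic on value.\n"
        "Answer with SQL only.\n"
        f"Question: {question}\nSQL:"
    )


def is_read_only(sql: str) -> bool:
    """Very rough guard: accept a single statement starting with SELECT or WITH.

    Semicolons inside string literals are rejected too; that is a deliberate
    over-approximation for a sketch.
    """
    stripped = sql.strip().rstrip(";").strip()
    if not stripped or ";" in stripped:
        return False
    return stripped.split(None, 1)[0].upper() in ("SELECT", "WITH")
```

Keeping the guard separate from the prompt means a model that ignores the instructions still can't run a `DROP` or a multi-statement string.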

Repo on GitHub:

https://github.com/atlaslib/atlas

Have a great weekend!