A few days ago I wrote a tiny Python script that turns an Apple Health export.xml file (~1 GB in my case, ~10 years of data) into a very simple 5-column .parquet file (~40 MB):
- type (e.g. "DistanceCycling")
- value (e.g. "12.100")
and 3 datetime timestamps:
- start
- end
- created
https://github.com/atlaslib/atlas
Why?
The Apple Health export is huge (!), the export.xml schema is wild and a bit irregular, and most of its identifiers carry a prefix (e.g. `HKQuantityTypeIdentifier`).
Even though the observations cover a wide range of types (e.g. DistanceWalkingRunning, DistanceCycling, DistanceSwimming, StepCount, …), I put all of them into 1 simple table to make them easier to explore.
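To give a rough idea of the flattening described above, here is a minimal sketch using only the standard library. It is not the code from the repo: the names `iter_records`, `strip_prefix`, `PREFIXES` and `SAMPLE` are made up for illustration, and it assumes that each `Record` element carries `type`, `value`, `startDate`, `endDate` and `creationDate` attributes with dates like "2024-04-20 08:15:00 +0200". Writing the rows to .parquet is left out.

```python
import io
import xml.etree.ElementTree as ET
from datetime import datetime

# Assumption: these two prefixes cover most record types; others may exist.
PREFIXES = ("HKQuantityTypeIdentifier", "HKCategoryTypeIdentifier")
DATE_FORMAT = "%Y-%m-%d %H:%M:%S %z"  # e.g. "2024-04-20 08:15:00 +0200"


def strip_prefix(type_name):
    """Drop a known prefix so the type reads like 'StepCount'."""
    for prefix in PREFIXES:
        if type_name.startswith(prefix):
            return type_name[len(prefix):]
    return type_name


def iter_records(source):
    """Yield (type, value, start, end, created) tuples from an export.xml stream."""
    # iterparse streams the file instead of loading ~1 GB into memory at once.
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "Record":
            yield (
                strip_prefix(elem.get("type", "")),
                elem.get("value"),  # kept as a string, like the "value" column
                datetime.strptime(elem.get("startDate"), DATE_FORMAT),
                datetime.strptime(elem.get("endDate"), DATE_FORMAT),
                datetime.strptime(elem.get("creationDate"), DATE_FORMAT),
            )
        elem.clear()  # release the element's contents once it has been handled


# Tiny made-up sample standing in for a real export.xml.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<HealthData>
  <Record type="HKQuantityTypeIdentifierStepCount" unit="count" value="42"
          creationDate="2024-04-20 08:20:00 +0200"
          startDate="2024-04-20 08:10:00 +0200"
          endDate="2024-04-20 08:15:00 +0200"/>
  <Record type="HKCategoryTypeIdentifierSleepAnalysis"
          value="HKCategoryValueSleepAnalysisAsleepCore"
          creationDate="2024-04-21 07:05:00 +0200"
          startDate="2024-04-20 23:30:00 +0200"
          endDate="2024-04-21 07:00:00 +0200"/>
</HealthData>"""

rows = list(iter_records(io.BytesIO(SAMPLE)))
for row in rows:
    print(row)
```

For a real export you would pass an open file (e.g. `open("export.xml", "rb")`) instead of the in-memory sample.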
[(!) One tiny disadvantage of this is that I had to make the "value" column a String, even though most observations are floating point numbers. Some observations (e.g. sleep phases) have categorical values that are enums (?), and I chose not to map them to numbers for now.
Any ideas on what to do about this are very welcome.]
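One possible direction, sketched here only as an idea and not something the script does: keep the string column but derive two nullable columns from it, one numeric and one categorical. The helper name `split_value` is hypothetical.

```python
def split_value(value):
    """Return (number, category); at most one of the two is set."""
    if value is None or value == "":
        return (None, None)
    try:
        return (float(value), None)  # e.g. "12.100" becomes 12.1
    except ValueError:
        return (None, value)  # e.g. a sleep phase stays a string


examples = ["12.100", "HKCategoryValueSleepAnalysisInBed", None]
for raw in examples:
    print(raw, "->", split_value(raw))
```

With that split, numeric aggregations can use the number column directly, while category values stay readable and could be mapped to numbers later if needed.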
Here are a few example charts I generated using ClickHouse (chDB) and Vega-Altair in a Quarto notebook:
https://github.com/atlaslib/atlas#explore
Here is the notebook with example code:
https://github.com/atlaslib/atlas/blob/main/examples/apple-h...
Another example of how neat it is to have this data easily accessible:
Asking questions in natural language (e.g. "when was my last swimming workout?" or "what is my total walking distance?") and having Llama 3 translate them to SQL works:
https://x.com/__tosh/status/1784714636610187488/video/1
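To illustrate why the single flat table makes this easy, here is a sketch of the kind of SQL those two questions could map to. This is not the model's actual output. It uses an in-memory SQLite database from the standard library as a stand-in for ClickHouse, the table name `health` and the sample rows are made up, and it assumes DistanceSwimming and DistanceWalkingRunning are the closest matching types.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# "start" and "end" are quoted because END is an SQL keyword.
con.execute(
    'CREATE TABLE health (type TEXT, value TEXT, "start" TEXT, "end" TEXT, created TEXT)'
)
con.executemany(
    "INSERT INTO health VALUES (?, ?, ?, ?, ?)",
    [
        ("DistanceSwimming", "0.5", "2024-04-01 07:00:00", "2024-04-01 07:30:00", "2024-04-01 07:31:00"),
        ("DistanceSwimming", "0.8", "2024-04-15 07:00:00", "2024-04-15 07:40:00", "2024-04-15 07:41:00"),
        ("DistanceWalkingRunning", "1.5", "2024-04-16 12:00:00", "2024-04-16 12:20:00", "2024-04-16 12:21:00"),
        ("DistanceWalkingRunning", "2.25", "2024-04-17 12:00:00", "2024-04-17 12:30:00", "2024-04-17 12:31:00"),
    ],
)

# "When was my last swimming workout?"
last_swim = con.execute(
    'SELECT MAX("end") FROM health WHERE type = ?', ("DistanceSwimming",)
).fetchone()[0]

# "What is my total walking distance?" The value column is a string, so cast it first.
total_walk = con.execute(
    "SELECT SUM(CAST(value AS REAL)) FROM health WHERE type = ?",
    ("DistanceWalkingRunning",),
).fetchone()[0]

print(last_swim, total_walk)
```

Because every observation lives in the same table with the same five columns, both questions reduce to a filter on `type` plus one aggregate.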
Repo on GitHub:
https://github.com/atlaslib/atlas
Have a great weekend!