The methodology is simple. I gathered all links from https://news.ycombinator.com/front ("past" on the navigation bar) for each day from 2020-01-01 to 2020-07-09. These are the top stories of each day. Collecting them was trivial and yielded 17566 links (raw data [0][1][2]). There are <100 duplicates, which I kept. Among the links are 1112 plain HTTP ones, or ~6.3% of the 17566.
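For illustration, the plain-HTTP count and its share could be computed along these lines. This is a minimal sketch, not the code actually used: the function name http_share and the sample list are hypothetical, and it assumes the links are already loaded as a list of URL strings.

```python
def http_share(links):
    """Return (number of plain-HTTP links, their fraction of all links)."""
    # Assumption: a link counts as plain HTTP if its scheme is exactly "http".
    plain = [u for u in links if u.lower().startswith("http://")]
    return len(plain), len(plain) / len(links)


# Hypothetical sample input; the real data set is the raw data at [0][1][2].
sample = [
    "https://example.com/a",
    "http://example.org/b",
    "https://example.net/c",
    "HTTP://example.edu/d",
]
count, share = http_share(sample)
print(count, f"{share:.1%}")
```

With the figures reported above, 1112 / 17566 rounds to 6.3%.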
Next I analyzed how many of the 1112 plain HTTP links are available over HTTPS. Methodology:
1. Check if the HTTP version redirects to the HTTPS version; if so, done, otherwise record the HTTP response;
2. Replace http:// with https:// and see if the HTTPS URL works; if so, record the HTTPS response;
3. Compare the HTTP and HTTPS responses. If they're identical, done. If not, compare the lengths of the responses; if they differ by <=1%, record the HTTPS response as almost identical to the HTTP one, and assume the HTTPS version works (the page may use absolute URLs that spell out the protocol instead of relative or protocol-relative URLs, so the HTTPS response may be subtly different while having the exact same rendered output).
The analysis script is available at [3].
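The three steps above can be sketched roughly as follows. This is not the script at [3], only a hedged illustration: the names to_https, fetch, and classify are hypothetical, the timeout is arbitrary, and the 1% difference is assumed to be measured relative to the longer of the two responses, which the description above does not specify.

```python
from urllib.parse import urlsplit
from urllib.request import urlopen


def to_https(url):
    """Step 2: swap the scheme of a URL to https."""
    return urlsplit(url)._replace(scheme="https").geturl()


def fetch(url, timeout=10):
    """Fetch a URL, following redirects; return (final URL, body bytes)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.geturl(), resp.read()


def classify(http_body, https_body, tolerance=0.01):
    """Step 3: compare the two response bodies."""
    if http_body == https_body:
        return "identical"
    # Assumption: the length difference is taken relative to the longer body.
    longer = max(len(http_body), len(https_body))
    if abs(len(http_body) - len(https_body)) / longer <= tolerance:
        return "almost identical"
    return "different"


def check(url):
    """Run all three steps for one plain HTTP URL (performs network requests)."""
    final_url, http_body = fetch(url)
    if final_url.lower().startswith("https://"):
        return "redirects to HTTPS"  # step 1
    _, https_body = fetch(to_https(url))  # step 2; raises if HTTPS fails
    return classify(http_body, https_body)  # step 3
```

Calling check("http://example.com/") would run the whole sequence for one link; a real run would also need to catch connection and certificate errors from the HTTPS request and record them as failures.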
---
To be continued in a comment since I'm hitting the 2000 char limit: https://news.ycombinator.com/item?id=23802522