Show HN: Did YC hackathon yesterday, sharing idea: Using CV with health data

3 points

7 years ago

I wanted to make something that could help doctors, medical groups, and health insurance companies scan their documents (images, files), detect their members' Protected Health Information (PHI), and scrub it out on device. This would help them avoid accidentally leaking protected member data, and spare them from scrubbing it out manually before a document is safe to post on the network. Leaks of this kind of data can lead to costly penalties under HIPAA.

I used computer vision to scan an image for text, classified whether each piece of text was one of the nearly 20 forms of protected health data using a combination of ML and heuristics, then scrubbed it out on device by drawing a box over the text. It's not 100% accurate, but it showed good results (about 90%, as far as I can tell). I also added the ability to tap on the image to manually scrub an area, as well as to remove any incorrect classifications.
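
To make the detect-classify-scrub flow concrete, here is a minimal sketch in Python. It is not the hackathon code. It assumes the text-detection step has already produced text snippets with bounding boxes, it stands in for the ML + heuristics classifier with a few illustrative regex heuristics, and it treats the image as a plain grid of grayscale pixels. The names `TextBox`, `PHI_PATTERNS`, `classify`, `redact`, and `scrub` are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class TextBox:
    """One piece of detected text and its bounding box (hypothetical OCR output)."""
    text: str
    x: int
    y: int
    w: int
    h: int


# Illustrative heuristics for a few PHI categories. These are assumptions for
# the sketch, not the patterns (or the ML model) used in the actual project.
PHI_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "date": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
}


def classify(text):
    """Return the first PHI category whose pattern matches, or None."""
    for label, pattern in PHI_PATTERNS.items():
        if pattern.search(text):
            return label
    return None


def redact(image, box):
    """Draw a filled black box over the region, clamped to the image bounds."""
    for row in range(max(0, box.y), min(len(image), box.y + box.h)):
        for col in range(max(0, box.x), min(len(image[row]), box.x + box.w)):
            image[row][col] = 0


def scrub(image, boxes):
    """Redact every box classified as PHI and return (label, box) findings."""
    findings = []
    for box in boxes:
        label = classify(box.text)
        if label is not None:
            redact(image, box)
            findings.append((label, box))
    return findings


# Usage: a 10x20 all-white "image" with two detected text regions.
image = [[255] * 20 for _ in range(10)]
boxes = [
    TextBox("SSN: 123-45-6789", x=2, y=1, w=5, h=2),
    TextBox("Follow up in two weeks", x=0, y=5, w=8, h=2),
]
findings = scrub(image, boxes)
```

In this sketch, the manual tap-to-scrub feature would amount to calling `redact` with a user-drawn box, and undoing an incorrect classification would mean dropping the entry from `findings` and restoring the pixels from an untouched copy of the image.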

https://imgur.com/a/BFVlaxL

Next up, I'd like to improve the accuracy of the model and then build a desktop client that does the same kind of detection on different documents. One piece of feedback that came up during my pitch was whether this could be used to identify other kinds of information (such as financial data), and I'd like to experiment with generalizing the model to different domains.

Figured I'd post for more feedback - what a fun event, and it was so great to talk to aspiring founders, YC alumni, and partners!