- Restructure a hierarchy of files
- Find every file in a hierarchy that matches arbitrary criteria
- Run various commands on a hierarchy of input files (based on the file's properties) and create a hierarchy of output files
- Turn any reasonably structured file (e.g., a log file) into a more usable CSV (comma-separated values) file that can be imported into a spreadsheet
- Pull the lines and fields that match arbitrary criteria out of a CSV file
- Create simple (plaintext) reports based on the data in a set of files
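
To give a rough idea of what the log-to-CSV and CSV-filtering tasks above can look like, here is a minimal Python sketch. The log layout (date, time, upper-case level, free-text message), the field names, and the helper names `log_to_csv` and `select` are all assumptions made up for this illustration; a real job would start from the actual file format.

```python
import csv
import io
import re

# Hypothetical log layout: "<YYYY-MM-DD> <HH:MM:SS> <LEVEL> <free-text message>"
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2}) ([A-Z]+) (.*)$")
FIELDS = ["date", "time", "level", "message"]


def log_to_csv(lines, out):
    """Write a header plus one CSV row per parseable log line; return the row count."""
    writer = csv.writer(out)
    writer.writerow(FIELDS)
    count = 0
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match is None:
            continue  # skip lines that do not fit the assumed layout
        writer.writerow(match.groups())
        count += 1
    return count


def select(csv_text, wanted_fields, keep):
    """Return the wanted fields of every CSV row for which keep(row) is true."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [[row[name] for name in wanted_fields] for row in reader if keep(row)]


# Made-up sample input, including one line that does not match the layout.
sample_log = [
    "2024-01-15 10:32:01 INFO service started\n",
    "not a log line\n",
    "2024-01-15 10:32:07 ERROR disk full, retrying\n",
]

buffer = io.StringIO()
rows_written = log_to_csv(sample_log, buffer)
errors = select(
    buffer.getvalue(),
    ["time", "message"],
    lambda row: row["level"] == "ERROR",
)
print(rows_written, errors)
```

The `csv` module takes care of quoting, so a message that itself contains a comma still ends up in a single field when the file is opened in a spreadsheet.
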
Just send me an email with the description of the task. I'll take on any reasonable task. I can receive the files via SCP, FTP, email (depending on file size), or HTTP. If the data set is huge and you'd like me to do so, I can even log in and work on the files directly.
I work in forensic computing, and I've often worked on massive data sets. I've turned gigabytes of tcpdump files into useful reports. I've reduced hundred-gigabyte IIS log files to a short list of unauthorized requests. I've extracted the emails from an mbox file, pulled the attachments out of each email, and built a browsable view of the emails. I created a simple system to convert all of my personal FLAC files into much smaller MP3s, and then randomly pulled out a set of those MP3s (based on my rating of each song) to put onto my cell phone's microSD card.
One of the parts I enjoy most about my current job is turning massive, unwieldy data into something useful. I truly love using and creating simple command-line utilities that accomplish great things when they are put together. Let me know if I can help you.