I'm going through our patron database and cleaning up old data that we no longer need in response to the new data protection laws being passed in Massachusetts. Ideally, if I wanted to ensure that we stored no social securities, I'd simply write the following SQL statement:
UPDATE patron SET ssan = '';
I don't have write access to our database, so I'm not sure that is possible. Instead I perform the following steps:
The only options I ever consider are -m (monitor the process every time this many records are processed) and -r (only process this many records). I want the whole database and I want to be notified every 1000 records.
Pptrnextr -m 1000
Each line in the patron SIF represents a single patron. Sed is a non-interactive text editor that processes files line by line. We can use it and regular expressions to find the ssn and remove it. The SSN is at offset 269. It's max length is 11 characters so we replace it with 11 blanks. We will redirect the output of the command into a file. Assuming the original patron SIF is called sif.pxtr and we want it to be saved as sif.pxtr.modified:
sed -nr "s/^(.{268})(.{11})/\1 /p" sif.pxtr > sif.pxtr.modified
This is just our usual patron update process. We'll use our sif.pxtr.modified file and identify patrons using the institution_id.
Pptrnupdt -p sif.pxtr.modified -i I -o I -m 1000
It's a pretty hellish process. If you know better, please let me know.