Scanning for Sensitive Data - Cornell Spider

After thorough testing of available "Sensitive Data Scanners", the OIT found that Cornell's Spider was the best available free tool.  However, individual departments are free (presuming compliance with their own internal policies) to use whatever tool they wish.

Spider is not officially supported by OIT, but we will help users, to a reasonable extent, with any technical issues they encounter.

Important:

  • Use of Spider is at the user's own risk. Users are strongly encouraged to read the following documentation carefully and, if necessary, enlist the support of their local IT staff. Your department may have internal procedures and policies regarding the use of scanning tools.
  • Spider executes a search of designated data locations and returns the results that it believes might contain SSNs.
  • If Spider reports no sensitive files, that does not guarantee the machine is free of SSNs. They may still be present in a format or file that Spider cannot read or access.
  • Technology can only take us so far. Users should know whether their systems house Restricted Information and be prepared to take appropriate action should Restricted Information be found.
  • Verify each file that Spider finds. Be aware that Spider produces many false alarms. You must verify each file to ensure the data it contains is truly an SSN before deciding what to do with the file. (Files flagged by Spider that don't contain SSNs are called “False Positives”.)
  • Promptly remove the Spider scan database or log file. Spider logs are often a direct road map to your files containing sensitive information. Whenever possible, remove the log files after you have protected or removed the files that contain Restricted Information.
  • Spider reads and analyzes each file on your computer. Depending on the number and size of the files on your system, Spider may take a significant amount of time to complete. We recommend you allow 1 to 3 hours for the scan to run.
  • Spider can consume a significant amount of computer resources while running. As a result, your computer may be slow or unstable during the scan. The program may appear to freeze or lock up; be patient and the scan will finish. For this reason you may wish to launch the scan at the end of your shift or when you can leave it unattended.
  • You can specify the file directory that Spider will scan. This is a good way to narrow your search and break the process into manageable chunks of time.
  • Protect files you need to keep that contain Restricted Information.
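
To illustrate why a scan of a chosen directory both flags false positives and can miss real data, here is a minimal Python sketch of pattern-based SSN detection. It is not Spider's actual implementation. The pattern, the function names (looks_like_ssn, find_candidates, scan_directory) and the plain-text reading strategy are all assumptions made for illustration.

```python
import os
import re

# Assumed pattern: 3 + 2 + 4 digits, optionally separated by hyphens or
# spaces, and not embedded in a longer run of digits.
SSN_LIKE = re.compile(r"(?<!\d)(\d{3})[- ]?(\d{2})[- ]?(\d{4})(?!\d)")


def looks_like_ssn(area, group, serial):
    """Reject number ranges that are never issued as SSNs."""
    if area in ("000", "666") or area.startswith("9"):
        return False
    return group != "00" and serial != "0000"


def find_candidates(text):
    """Return every substring of text that could plausibly be an SSN."""
    return [m.group(0) for m in SSN_LIKE.finditer(text)
            if looks_like_ssn(*m.groups())]


def scan_directory(root):
    """Walk root and map each file path to the SSN-like strings found in it."""
    hits = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "r", errors="ignore") as handle:
                    found = find_candidates(handle.read())
            except OSError:
                # Unreadable file: it is skipped, not proven clean.
                continue
            if found:
                hits[path] = found
    return hits
```

Any nine-digit number that passes the range checks, such as a part number or an account ID, is reported just like a real SSN, which is why each flagged file must be verified by hand. Files that cannot be opened, or that store digits in a compressed or encrypted form, are silently skipped, which is why a clean result is not a guarantee. Note also that the returned mapping lists exactly where the sensitive files are, so any saved copy of it deserves the same prompt removal as a Spider log.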

The instructions below come in two formats: thorough step-by-step instructions and an abbreviated quick guide. They were written specifically for Windows Vista, although the basic principles are the same regardless of operating system or software version.

The instructions contain direct links to download Spider. The official Spider page is hosted on the Cornell site, where users may also obtain the Linux and Mac clients.

Attachments:

  • Sensitive Data Scanner Instructions.Spider.Windows.Vista_.pdf (1.47 MB)
  • Sensitive Data Scanner Instructions.Spider.Windows.Vista_.Short Version.pdf (64.29 KB)