In a recent post, Hu Yoshida references an IDC presentation discussing the growth rates of structured versus unstructured data. It seems that we can expect unstructured data to grow at a rate of some 63.7% annually. I wonder what percentage of this data actually represents useful information?
Personally, I know I’m guilty of data untidiness. I have a business file server onto which I heap more data on a regular basis. Some of it is easy to structure; Excel and Word documents usually get named with something meaningful. Other stuff is harder to classify. I download and evaluate a lot of software and end up with dozens (if not hundreds) of executables, MSI and zip files, most of which are cryptically named by their providers.
Now the (personal) answer is to be more organised. Every time I download something, I could store it in a new structured folder. However, life isn’t that simple. I’m on the move a lot and may download something at an Internet cafe or elsewhere where I’m disconnected from my main server. Whilst I use offline folders and synch a lot of data, I don’t want to synch my entire server filesystem. The alternative is to create a local image of my server folders and copy data over on a regular basis. Trouble is, that’s just too tedious, and when I have oodles of storage space, why should I bother spending my time on it? There will of course come a time when I have to act. I will need to upgrade to bigger or more drives and I will have (more) issues with backup.
How much of the unstructured data growth out there occurs for the same reasons? I think most of it. I can’t believe we are creating genuinely useful content at a rate of 63.7% per year. I think we’re creating a lot of garbage that people are too scared to delete and can’t filter adequately using existing tools.
OK, there are things out there to smooth over the cracks and partially address the issues. We “archive”, “dedupe” and “tier”, but essentially we don’t *delete*. I think if many more organisations operated a strict Delete Policy, removing certain types of data once they had gone unaccessed for a fixed period, then we would all go a long way towards cutting the 63.7% down to a more manageable figure.
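As a rough illustration of what the first step of such a policy might look like, here is a minimal Python sketch that lists files not accessed within a given number of days. The function name `find_stale_files` and the `max_idle_days` parameter are my own inventions for this example, and it deliberately only reports candidates rather than deleting anything. It also assumes the filesystem records last-access times reliably, which is often not the case (many systems are configured to update them lazily or not at all), so treat the results as a starting point for review.

```python
import time
from pathlib import Path


def find_stale_files(root, max_idle_days, now=None):
    """Return a sorted list of files under root whose last-access time
    is more than max_idle_days old. Nothing is deleted (report only).

    Hypothetical helper for illustration; relies on st_atime, which is
    only meaningful if the filesystem actually tracks access times.
    """
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * 86400  # 86400 seconds per day
    stale = []
    for path in Path(root).rglob("*"):
        # Skip directories and symlinks; only regular files are candidates.
        if path.is_symlink() or not path.is_file():
            continue
        if path.stat().st_atime < cutoff:
            stale.append(path)
    return sorted(stale)
```

Calling something like `find_stale_files("downloads", max_idle_days=365)` would give a list of candidates to review by file type before anything is actually removed.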
Note to self: spend 1 hour a week tidying up my file systems and taking out the trash…..