Not all data is important. In organizations, input data is usually the most valuable, but what about the data we derive once it has been harnessed from the database? Should that be saved as well? How many times do we create data and templates and save them, thinking they might be useful tomorrow, when most of the time they are never used again because we develop a better approach or better insight into the challenge? It is very important to understand what we are looking for, but it should be accepted that there is rarely an expert on hand, and there is a high chance that basic "trial and error" is how any solution will be reached. Teams fight over the best alternative to implement, sometimes for the sake of their department and sometimes for the creativity they can bring, but most of the time it just ends up as a waste of time and opportunity.
Data, once harnessed, should either be looked after or trashed, or there may be a better mechanism to minimize its storage requirements: saving just the methods, procedures, and queries needed to produce it again. Some would feel lazy, or point to the time required to re-run a query, and argue for keeping the data saved as a reference in case we cannot bring more sense to the ongoing work. This happens in many instances inside any organization, and the resulting junk data occupies a lot of unnecessary space. Big data is getting bigger and bigger, but we should check whether it is really relevant. If not, there must be a defined process for declaring what counts as relevant. That process needs thorough brainstorming from different parts of the department, and its output should flow equally to all the relevant areas of the organization.
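The idea of saving the query rather than its result set can be sketched in a few lines. This is a minimal illustration using SQLite; the table and query names (`sales`, `saved_queries`, `sales_by_region`) are hypothetical, chosen only for the example.

```python
import sqlite3

# Build a tiny example database (hypothetical schema for illustration).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales (region TEXT, amount REAL)")
db.executemany("INSERT INTO sales VALUES (?, ?)",
               [("east", 100.0), ("west", 250.0), ("east", 75.0)])

# Instead of exporting and storing the aggregated rows,
# persist the query text itself under a name.
db.execute("CREATE TABLE saved_queries (name TEXT PRIMARY KEY, sql TEXT)")
db.execute("INSERT INTO saved_queries VALUES (?, ?)",
           ("sales_by_region",
            "SELECT region, SUM(amount) FROM sales GROUP BY region"))

# Later, look the saved query up and re-run it against the live data.
(sql,) = db.execute("SELECT sql FROM saved_queries WHERE name = ?",
                    ("sales_by_region",)).fetchone()
print(sorted(db.execute(sql).fetchall()))  # [('east', 175.0), ('west', 250.0)]
```

The stored query is a few bytes, while the exported result set grows with the data; the trade-off is exactly the re-run time mentioned above.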
Using hyperlinks to point to the resulting data after a query can be one way to minimize the space the queried data requires. This can save storage space to a major extent, easing the space crunch while keeping the freedom to run multiple queries and to save or export the results however one wants. However, it raises the question of where to keep these links, probably in the tool where the query was run before the export or other activities took place. There is also the issue that keeping multiple backups for double or triple security will hinder its effectiveness. Multiple backups are a safe choice in extreme cases, but if not administered correctly they lead to duplicated data and version mismatches. Backing up only the incremental changes over the data already stored will further reduce storage requirements over the course of the activity. We need to look at how backup technology works so that the same mechanisms can serve systems built for data harnessing.
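The incremental idea above can be sketched simply: on each run, hash every source file and copy only the files whose content changed since the previous run. This is an assumption-laden illustration, not a real backup tool; the manifest file name (`.manifest.json`) and layout are invented for the example.

```python
import hashlib, json, pathlib, shutil, tempfile

def incremental_backup(src: pathlib.Path, dst: pathlib.Path) -> list:
    """Copy only files that are new or changed since the last run."""
    dst.mkdir(parents=True, exist_ok=True)
    manifest = dst / ".manifest.json"  # hypothetical state file
    old = json.loads(manifest.read_text()) if manifest.exists() else {}
    new, copied = {}, []
    for f in sorted(src.rglob("*")):
        if not f.is_file():
            continue
        rel = str(f.relative_to(src))
        digest = hashlib.sha256(f.read_bytes()).hexdigest()
        new[rel] = digest
        if old.get(rel) != digest:  # new or modified since last run
            target = dst / rel
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(f, target)
            copied.append(rel)
    manifest.write_text(json.dumps(new))
    return copied

# First run copies everything; a second run with no changes copies nothing.
src = pathlib.Path(tempfile.mkdtemp())
dst = pathlib.Path(tempfile.mkdtemp())
(src / "report.txt").write_text("q1 numbers")
print(incremental_backup(src, dst))  # ['report.txt']
print(incremental_backup(src, dst))  # []
```

Real backup systems layer retention, compression, and restore logic on top, but the core saving comes from exactly this "only what changed" check.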
Duplicate data is also created while providing inputs, in the process of feeding values into the system. A system that efficiently checks for duplication by itself and eliminates it could revolutionize the storage space used, so that only relevant data gets into the system. With proper control over what data comes in, applied through a well-designed filter before it is fed, big data can be minimized and controlled. Once we control and minimize unstructured data at this initial stage, it will probably need less attention later when it is dumped for a relevant reason.
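One simple form such a filter could take is a hash-based check at ingest time: hash each incoming record and drop any record whose hash has already been seen, so duplicates never reach storage. A minimal sketch, assuming records arrive as plain strings:

```python
import hashlib

class DedupFilter:
    """Drop records that have been seen before (hypothetical ingest filter)."""

    def __init__(self):
        self._seen = set()  # hashes of every record accepted so far

    def accept(self, record: str) -> bool:
        """Return True the first time a record is seen, False for duplicates."""
        digest = hashlib.sha256(record.encode("utf-8")).hexdigest()
        if digest in self._seen:
            return False
        self._seen.add(digest)
        return True

# Duplicates in the input stream never reach storage.
incoming = ["alice,NY", "bob,LA", "alice,NY", "carol,TX", "bob,LA"]
f = DedupFilter()
stored = [r for r in incoming if f.accept(r)]
print(stored)  # ['alice,NY', 'bob,LA', 'carol,TX']
```

Storing hashes instead of full records keeps the filter's own memory footprint small; at larger scale the same idea is usually implemented with a probabilistic structure such as a Bloom filter.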