Big data is made up of many small data sets. It shares every attribute of small data, but its sheer size makes those attributes a bigger concern. Big data is now measured in exabytes and zettabytes, and it keeps growing. Companies have shifted their thinking: they see huge potential in the data even though current technology cannot yet analyse all of it.
Storing day-to-day data from social networks, medical records or any other source of personal information is not entirely ethical, though. These data attract attention precisely because they are personal. Data posted online are often subjective, for example, and medical reports and patient treatments are highly confidential. This raises the question of who actually owns the data: the hospital, because its physician prescribed the treatment, or the patient? The problem arises when there are no strict rules for data management, or when rules exist but the data flow is too complex and uncertain to control.
There are many ways in which data can be leaked and misused. In the WikiLeaks case, for example, a low-level employee copied confidential US information onto a CD disguised as a Lady Gaga album, and a great deal of it was made public. A lack of regulation and a lack of care about data ethics, possibly driven by the data-gathering race among governments and private organizations, are major contributors to today’s data ethics problems. Twitter data going back to 2006 have been gathered by the Library of Congress for future analysis, which also amounts to keeping track of us. Such data could be used tactically in times of conflict, such as the crisis in Syria or the unrest in Egypt.
The uncertainty in data usage comes from the fact that there is no single formula for analysing the data. The methods themselves are uncertain and can produce almost any kind of output, which may be highly useful or, if not administered properly, quite the opposite. Data stewardship is one of the bottlenecks in big data, and it can be automated to some extent through artificial intelligence; the algorithms, and with them the analysis, are expected to improve over time. Once gathered, personal information can be used for marketing, psychological research or any other kind of study or investigation. As for regulating data mining, it remains unclear when gathered data should expire and under what conditions anonymity is maintained.
Davis and Patterson’s four-question framework offers an important first step toward common ground for discussing these issues:
- Identity: Is offline existence identical to online existence? “Some think obviously yes, others no, but we want to be explicit and engage the questions in a collaborative fashion.”
- Privacy: Who should control access to data? Davis points out that three data points can identify 87 percent of Americans: gender, birth date and zip code. “That means in any particular set of data, if I have one of three, I can correlate that data set with another, and I can identify you.”
- Ownership: Who owns data, can we transfer the rights of it, and what are the obligations of people who generate and use that data? Davis points out that the World Economic Forum describes data as a new economic asset class that can be traded, sold and basically treated as a currency.
- Reputation: What is important about reputation, says Davis, is the realization that the number of digital conversations and interactions that take place, and that we can participate in, fragments our ability to manage reputation. “Understanding the implications of that are going to be very important.”
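The privacy point above, that a few ordinary attributes can be correlated across data sets to identify a person, can be made concrete with a minimal sketch. Everything in it is an assumption made for illustration: the field names, the helper functions `unique_fraction` and `link`, and the records themselves are invented, and this is not a method described by Davis and Patterson.

```python
from collections import Counter

# Hypothetical column names for the three data points mentioned above.
QUASI_IDENTIFIERS = ("gender", "birth_date", "zip_code")


def key_of(record, keys=QUASI_IDENTIFIERS):
    """Return the tuple of quasi-identifier values for one record."""
    return tuple(record[k] for k in keys)


def unique_fraction(records, keys=QUASI_IDENTIFIERS):
    """Fraction of records whose quasi-identifier combination occurs exactly once."""
    if not records:
        return 0.0
    counts = Counter(key_of(r, keys) for r in records)
    unique = sum(1 for r in records if counts[key_of(r, keys)] == 1)
    return unique / len(records)


def link(anonymous, named, keys=QUASI_IDENTIFIERS):
    """Attach a name to each anonymous record that matches exactly one named record."""
    by_key = {}
    for person in named:
        by_key.setdefault(key_of(person, keys), []).append(person["name"])
    linked = []
    for row in anonymous:
        names = by_key.get(key_of(row, keys), [])
        if len(names) == 1:  # an unambiguous match re-identifies the row
            linked.append({**row, "name": names[0]})
    return linked


# Made-up records, for illustration only.
medical = [
    {"gender": "F", "birth_date": "1980-02-14", "zip_code": "02139", "diagnosis": "asthma"},
    {"gender": "M", "birth_date": "1975-07-01", "zip_code": "02139", "diagnosis": "diabetes"},
    {"gender": "M", "birth_date": "1975-07-01", "zip_code": "02139", "diagnosis": "flu"},
]
voters = [
    {"name": "Alice Example", "gender": "F", "birth_date": "1980-02-14", "zip_code": "02139"},
    {"name": "Bob Example", "gender": "M", "birth_date": "1990-01-01", "zip_code": "02139"},
]

reidentified = link(medical, voters)
```

In this toy data, one "anonymous" medical record shares a unique combination of the three attributes with a named voter record, so the diagnosis can be tied to a name even though the medical data never contained one. The two records that share the same combination stay ambiguous, which is the intuition behind grouping or coarsening such attributes before release.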
Big data serves many purposes and is a hot topic right now, but it is still somewhere in the middle of its development. Many expect that analysis can be optimized and that sound results can be drawn from the data gathered. Simply accumulating data is not enough: given the expense of keeping ever-changing, ever-growing data, the costs must be weighed against the benefits, and technological progress should be matched by better analysis. Some speculate that Moore’s law will eventually stall, but computing is not the bottleneck. Current systems and servers are capable enough; the tools are not yet smart enough, and access to data is unequal because of the influence of those in authority. There are concerns, for example, that the internet is not safe, that WikiLeaks succumbed to pressure, and that Twitter is being monitored. The intriguing question here is how aware people are of the data they share, intentionally or not.
Big data comes not only from social networking sites but also from machines and instruments: Formula 1 cars, CERN (which generates 40 TB/sec), windmills, and hospitals that record pulse rates and other second-by-second patient information. Combined, these sources can uncover relationships. Nate Silver has also used big data to predict the US presidential election, and stock markets and their trends depend on it.
How big data is used, and the ethical issues it raises, depend on the application and the context. When the FBI or CIA uses big data against a criminal in ways that bend the ethical rules, for example, that may be considered acceptable, yet there is little or no study of casinos and the trends followed there. Likewise, little information about banks is available online. Such data are kept within the organizations because of confidentiality concerns. There, data stewards, data analysts and data ethics scientists belong to the same firm, but they still need to be integrated. Their collective insight into big data and how to harness it for good would yield productive results. It would also help in dealing with ethics and in eliminating bad data; sometimes good data has to be extracted from bad data, which may require all the teams to work together.
The same holds across organizations. Intel, IBM and Congress, for example, could collaborate on certain health-related issues: Intel would provide the processors, IBM the software and Congress the raw data.
Elsewhere, there is speculation that knowledge societies have reason to be concerned about artificial intelligence: Brynjolfsson and McAfee claim that new technologies will reduce jobs. It can be inferred from their studies that an emphasis on big data and artificial intelligence will help erode management insight.
Mobile devices can generate plenty of data about location, model, activated services and so on. They have also been welcomed as a way to improve surveys. In remote areas they can serve as input devices for a census, replacing paper forms that are a burden to key into a computer. Mobile devices are becoming an effective substitute for the conventional census process. Suppose someone travels to a remote village in Africa to find out how many children have been vaccinated: a mobile phone with the right app makes the job easy, and the same procedure can be used for a population census in such places. That is only one aspect of mobile devices. Accessing Facebook, Twitter and Google+ through mobile devices has made these services more popular and generates huge amounts of data, including a lot of personal information. People’s emotions, feelings and behaviours can be used for research purposes. Managers and marketers can use them to predict consumers’ behaviour and purchase intentions, giving their companies a competitive advantage.
Claims of objectivity and accuracy can lead in any direction. Errors can creep in when the program runs: if the raw data are not fed in properly, or if many assumptions are made during analysis, the resulting uncertainty can be large and can easily trigger a butterfly effect. Inaccuracy is another concern on the periphery of the big data process. Poor-quality or bad data, often the result of poor data stewardship, is one example. A system is said to be only as strong as its weakest link, so bad data can easily undermine claims of objectivity and put the entire system at risk. Big data is also quite expensive to adopt. Unless there is a specific need, organizations are not interested in adopting it and prefer simply to accumulate data so as not to miss opportunities. Sometimes government regulation forces organizations to implement it, as Basel II does for banking firms.
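To illustrate how a single badly entered value can distort a result, here is a minimal sketch of a stewardship-style check applied before analysis. The pulse-rate readings, the plausible range and the function name `split_readings` are all assumptions chosen for illustration; real validation rules would depend on the data and the domain.

```python
from statistics import mean

# Assumed plausible range for a pulse reading in beats per minute (illustrative only).
PULSE_RANGE = (30, 220)


def split_readings(readings, valid_range=PULSE_RANGE):
    """Separate readings into (clean, rejected) using simple type and range checks."""
    low, high = valid_range
    clean, rejected = [], []
    for value in readings:
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if is_number and low <= value <= high:
            clean.append(value)
        else:
            rejected.append(value)
    return clean, rejected


# Made-up data: one missing value and one data-entry error.
raw = [72, 68, None, 75, 7000, 70]
clean, rejected = split_readings(raw)

naive_mean = mean(v for v in raw if v is not None)  # the entry error dominates
clean_mean = mean(clean)                            # computed after validation
```

With these invented numbers the unchecked average is far outside any plausible pulse rate, while the average over validated readings is not. The rejected values are kept rather than silently dropped, so that someone responsible for the data can review them.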