Technology is an integral part of our lives and its use is only set to increase; it affects nearly everything we encounter on a daily basis. Developments in technology do not just happen: it is data, specifically Big Data, that has extended the reach of technology development.
Data is used throughout the world to drive processes, development and localization; nothing happens without data, and in many cases free software or apps exist solely to retrieve that data. Technology knows what is around you, what your preferences are and how you live your life, and it is companies such as Google that thrive on that data. Yes, it makes life easier for us on the whole, but the by-product is that technology companies know more about us than most of us would like.
Have you ever searched for a product on Google, only to see an advert for it later in your social networking site’s timeline? That is just one example of big data working at its finest.
Hadoop is one such offering, often delivered as Big Data as a Service (BDaaS).
What is Hadoop?
Hadoop is a software framework developed by the Apache Software Foundation. It was created as open-source software for the distributed storage and processing of large data sets, and it was built on the fundamental assumption that hardware failures are commonplace and should be handled automatically by the system.
It makes the whole process of storing and sorting data more efficient, less time-consuming and less processing power dependent.
Due to the very nature of the product, storing sensitive data can be an issue; here are our tips to help minimize those issues.
Plan | What is the nature of the data?
Rolling out Hadoop across your network and hoping for the best simply will not work; apart from anything else, you could find yourself in breach of data compliance legislation relevant to your industry, your country or even just your company.
You need to understand and identify what the data contains, whether any of it is of a sensitive nature and where that data will sit within the system. It needs to be safely sorted and contained within sub-sections or groups, giving you better control of access.
Basics | Security
Basic security measures should be a given on any system such as this. Having a single login for the entire system should be avoided at all costs; it is a sure-fire way of introducing risk.
Perhaps the best way of securing (and monitoring) access is to split the data and its users into relevant groups, with each group member having separate login credentials. On the face of it this sounds unnecessarily complicated, but it gives added protection and makes long-term monitoring easier. As we stated in the planning stage, with the data subdivided into groups, no single person has access to the whole system; this applies not only to employees, but should also mean that no one outside the network can access all the information either.
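The group-based idea can be sketched in a few lines. The following is a minimal illustration only, not Hadoop's actual access-control API; the group names, users and the `can_access` helper are all hypothetical:

```python
# Illustrative sketch of group-based access control (hypothetical data).
# Map each data group to the set of users allowed to access it.
DATA_GROUPS = {
    "finance": {"alice", "bob"},
    "hr": {"carol"},
    "marketing": {"dave", "erin"},
}

def can_access(user: str, group: str) -> bool:
    """Return True only if the user belongs to the requested data group."""
    return user in DATA_GROUPS.get(group, set())

# No single user appears in every group, so no one account
# can reach the whole system.
print(can_access("alice", "finance"))  # True
print(can_access("alice", "hr"))       # False
```

In a real deployment the same principle is expressed through the cluster's own permissions and group mappings rather than application code, but the invariant is identical: no one credential spans every data group.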
Choice | Remediation
Depending on the type of data that is being used, needed or perhaps even just stored, there are decisions to be made about its sensitivity. Data remediation is the process of securely removing or transforming data; in most cases that means Personally Identifiable Information (PII), which is of a very sensitive nature (for obvious reasons).
Generally, the two simplest ways of doing this are encryption and masking, although of course there are numerous other ways depending on the level of remediation needed.
When introducing the Hadoop system, you should identify which technique works best for you and for the data, ensuring that the system will fit your needs.
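To make the masking idea concrete, here is a minimal Python sketch. The `mask_email` and `pseudonymize` helpers are illustrative inventions; a real deployment would use a proper encryption library with key management, not a bare hash:

```python
import hashlib

def mask_email(email: str) -> str:
    """Mask an email address, keeping only the first character and the domain."""
    local, _, domain = email.partition("@")
    return local[:1] + "***@" + domain

def pseudonymize(value: str, salt: str = "s3cret") -> str:
    """One-way salted hash: a simple stand-in for real encryption,
    useful when the original value never needs to be recovered."""
    return hashlib.sha256((salt + value).encode()).hexdigest()[:12]

print(mask_email("john.doe@example.com"))  # j***@example.com
```

The design choice between the two mirrors the article's point: masking keeps data partially readable for analytics, while encryption (or here, one-way hashing) removes the value entirely from anyone without the key.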
Access | Controlling users
Although we have already mentioned the importance of access control, with particular regard to individual access credentials, thought should also be given to how that access, and the control of that access, fits within the organization's existing processes, even down to the human resources element. An SME, for example, may only need a simple organizational hierarchy to manage access control, while a multi-national with dedicated departments can take access control to a further level of granularity.
Regulate | Infrastructure & Policy
Any system is only as good as its infrastructure and policy relating to that system.
A strict calendar of events should be created and adhered to: regular and meaningful training, additions (where needed) to the company’s employee handbook, and building a better understanding of data protection procedures and of what the loss of that data could mean to the company. If employees do not realize what the cost could be, they may not be overly worried about the loss or breach of any data.
Housekeeping | Ongoing monitoring
Although regular monitoring of the system may seem low on the priority list, it is perhaps one of the key areas. Once the system is live, you should monitor it regularly; how else will you know that there have been issues?
That directly relates to the next point; if you do not know that there have been issues, how will you fix or contain those problems?
As we can see, no single point stands above the others; when using a system such as Hadoop, you should be willing to look at, and invest in, a number of different strategies for the safe storage of sensitive data.
Of course, there are steps that should be taken before introducing the system, such as planning, but unless you are willing to act on the findings from that planning or from ongoing monitoring, you are fundamentally putting at risk any personal or sensitive data the system may contain.