The Sensitive Data Intelligence Adoption Process

Now that we understand why organizations need Sensitive Data Intelligence (SDI) and what benefits it offers, let’s walk through the adoption process step by step.

Step 1: Build a Catalog of All Shadow and Managed Data Assets

The first step in Sensitive Data Intelligence is to gather and build a catalog of all cloud-native and non-native data assets. These assets may reside in SaaS applications, in structured and unstructured IaaS data stores across multiple cloud providers, or on-premises.

The following steps help organizations build their comprehensive data asset catalog:

Discover all current data assets, including the following:
- Shadow assets, such as databases and file servers running on generic compute instances, which often go unaccounted for when migrated to cloud environments.
- Advanced metadata of shadow assets, such as instance properties, version, vendor information, and open ports.
- Cloud-native data assets, such as cloud storage buckets, data warehouses, data lakes, and databases deployed across multi-cloud environments.
- Advanced metadata of native assets, such as vendor information, encryption status, port information, location, owner, and size.

Create visibility into data assets via native integration (APIs and other native integration mechanisms) that automatically extracts all assets and their metadata into a single catalog. The SDI solution connects with configuration management databases (CMDBs) and cloud providers (e.g., AWS, Azure, GCP) to collect all data assets into a single repository.

Import and map various asset properties and attributes from CMDBs to the data catalog. The CMDBs are populated and enriched by synchronizing asset metadata.
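The discovery and import steps above can be sketched as a merge of asset records from several sources into one catalog. This is a minimal sketch under stated assumptions, not the SDI product's implementation: the `Asset` fields, the feed names, and the `build_catalog` helper are hypothetical, and real connectors would call cloud provider and CMDB APIs instead of reading in-memory lists.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    """One data asset in the catalog (hypothetical schema)."""
    asset_id: str
    kind: str                      # e.g. "bucket", "database", "file_server"
    managed: bool                  # False for shadow assets on generic compute
    metadata: dict = field(default_factory=dict)
    sources: set = field(default_factory=set)


def build_catalog(*feeds):
    """Merge (source_name, records) feeds into one catalog keyed by asset_id.

    Later feeds enrich earlier ones: new metadata keys are added and
    existing keys are overwritten by the later feed.
    """
    catalog = {}
    for source_name, records in feeds:
        for rec in records:
            asset = catalog.get(rec["asset_id"])
            if asset is None:
                asset = Asset(rec["asset_id"], rec["kind"], rec.get("managed", True))
                catalog[asset.asset_id] = asset
            asset.metadata.update(rec.get("metadata", {}))
            asset.sources.add(source_name)
    return catalog


# Illustrative records only; every value here is made up.
cloud_feed = ("cloud_api", [
    {"asset_id": "bucket-1", "kind": "bucket",
     "metadata": {"encryption": "enabled", "location": "us-east-1"}},
    {"asset_id": "vm-db-7", "kind": "database", "managed": False,
     "metadata": {"open_ports": [5432], "vendor": "PostgreSQL"}},
])
cmdb_feed = ("cmdb", [
    {"asset_id": "bucket-1", "kind": "bucket",
     "metadata": {"owner": "data-platform"}},
])

catalog = build_catalog(cloud_feed, cmdb_feed)
shadow_assets = [a.asset_id for a in catalog.values() if not a.managed]
```

Keying the catalog on a single asset identifier is the design choice that lets CMDB attributes such as the owner enrich an asset first seen through a cloud provider, while the `managed` flag keeps shadow assets easy to list.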

Benefits of data asset discovery and building the data catalog
1	Out-of-the-box discovery of hundreds of sensitive data attributes.
2	A central repository of data assets. Without a central repository of data assets, organizations have no way to know where personal and sensitive data is stored and how it is protected.
3	The CMDBs are automatically and periodically scanned to ensure that data assets are kept up-to-date.
4	An enriched asset catalog helps organizations comply with cybersecurity frameworks such as CIS, NIST, and many others that require organizations to maintain an up-to-date inventory of their data assets.

Step 2: Enrich sensitive data catalogs with privacy, security and governance metadata

Every data asset has various metadata that can be classified into business, technical, and security categories. Sensitive Data Intelligence provides native connectors and REST-based APIs to scan and extract the business, technical, and security metadata associated with each asset. With this metadata, organizations can determine how their Personally Identifiable Information (PII), Protected Health Information (PHI), and similar sensitive data are protected and governed.

Business metadata: Business metadata attributes include data asset owners, data privacy officers (DPOs), asset locations, IP addresses, and so on. With these insights, DPOs can deliver privacy assessments, assess security measures, and get assistance with other business tasks. Moreover, business metadata adds business context to the data and can help map the relationships between objects in the catalog (such as the relations between databases, datasets, and columns).
Technical metadata: Technical metadata in the context of privacy and security includes insights such as retention policies, i.e., the number of days the organization should retain a particular data attribute to comply with data retention and disposal policies. Organizations can use several other tags to describe the purpose of data processing or purpose limitation scenarios.
Security metadata: Security metadata provides insights into the security posture of the data asset and how the data is protected, and it defines sensitivity labels such as public, general, confidential, and highly confidential. Depending on the security metadata, the organization can enable security controls such as encryption, masking, tokenization, and anonymization. Data access policies ensure that data is only accessible to authorized personnel.
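To make the three metadata categories concrete, the sketch below groups them on one record and derives the security controls an asset still lacks from its sensitivity label. The field names, the `CONTROLS_BY_LABEL` mapping, and both helper functions are assumptions made for illustration; the source does not prescribe which controls go with which label.

```python
from dataclasses import dataclass, field

# Hypothetical policy: controls required for each sensitivity label.
CONTROLS_BY_LABEL = {
    "public": [],
    "general": ["access_policy"],
    "confidential": ["access_policy", "encryption"],
    "highly confidential": ["access_policy", "encryption", "masking"],
}


@dataclass
class AssetMetadata:
    # Business metadata
    owner: str
    location: str
    # Technical metadata
    retention_days: int
    purpose: str
    # Security metadata
    sensitivity_label: str
    controls_enabled: set = field(default_factory=set)


def required_controls(meta):
    """Controls the label calls for; unknown labels get the strictest set."""
    strictest = CONTROLS_BY_LABEL["highly confidential"]
    return CONTROLS_BY_LABEL.get(meta.sensitivity_label, strictest)


def missing_controls(meta):
    """Required controls that are not yet enabled on the asset."""
    return [c for c in required_controls(meta) if c not in meta.controls_enabled]


# Illustrative asset; values are made up.
hr_db = AssetMetadata(
    owner="hr-team", location="eu-west-1",
    retention_days=365, purpose="payroll",
    sensitivity_label="confidential",
    controls_enabled={"access_policy"},
)
gaps = missing_controls(hr_db)
```

Treating an unknown label as the strictest one is a deliberately conservative choice: a mislabeled asset ends up over-protected instead of exposed.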

Benefits of enriching sensitive data catalogs with privacy, security and governance metadata
1	It provides more context and understanding to datasets.
2	It provides more insights into how a particular dataset should be handled, shared, and protected depending on its metadata category.
3	Retention labels within metadata manage the retention and disposition of data: they control how long the organization should keep a particular dataset and how it should handle the dataset after the retention period expires.
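The retention label described above can be turned into a simple disposal check. This is a minimal sketch that assumes the label is stored as a number of days counted from the dataset's creation date; the function names are hypothetical, and real retention policies may count from a different event, such as the end of a contract.

```python
from datetime import date, timedelta


def retention_expiry(created_on, retention_days):
    """Date on which the retention period ends."""
    return created_on + timedelta(days=retention_days)


def is_due_for_disposal(created_on, retention_days, today):
    """True once the retention period has fully elapsed."""
    return today >= retention_expiry(created_on, retention_days)


# Illustrative dataset created on 1 January 2023 with a 365-day label.
expiry = retention_expiry(date(2023, 1, 1), 365)
due_before = is_due_for_disposal(date(2023, 1, 1), 365, today=date(2023, 12, 31))
due_after = is_due_for_disposal(date(2023, 1, 1), 365, today=date(2024, 1, 1))
```

Passing `today` explicitly, instead of reading the clock inside the function, keeps the check deterministic and easy to test.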

Step 3: Discover sensitive and personal data across any structured and unstructured assets

Once all assets and their metadata have been cataloged, the next step is to enrich these assets with insights about the sensitive data stored in them. Sensitive data is a specific set of personal data that requires additional protection compared to other data types. Since sensitive data needs to be protected and managed separately from other kinds of personal data, it is paramount for organizations to detect and identify all sensitive data stored in their data assets.

Let’s look into some of the types of personal and sensitive data.

Types of personal and sensitive data
Personal data	Any information relating to an identified or identifiable person. For example, name, email, phone number, social security number, driver’s license number, passport number, postal address, location data, or IP address.
Healthcare	Health-related data that can be used to identify an individual, such as medical record numbers, insurance numbers, and medical images.
Financial	Any financial information associated with users, such as credit card numbers, account numbers, and PINs.
Educational	Educational and academic records, such as degrees, courses, disciplinary records, and dates of attendance.

Since personal and sensitive data is distributed across hundreds of data assets, the process of finding specific data attributes can be highly complex and time-consuming. Sensitive Data Intelligence helps organizations find specific data attributes within minutes across all structured and unstructured data stores. It also allows organizations to detect unique attributes that have specific requirements under global privacy laws.

This step involves detecting sensitive data in structured and unstructured data stores using built-in or custom data attributes via a comprehensive detection engine. It has the following components:

Unstructured data catalog: Sensitive Data Intelligence detects sensitive files in unstructured data stores and categorizes them into coarse-level and fine-level document categories such as academia, legal, financial, human resources, and more. Document types range from research papers, medical consent forms, insurance forms, tax forms, and financial statements to custom types that are proprietary to a specific organization. They can contain sensitive information such as social security numbers, credit card numbers, driver’s license numbers, and more. SDI leverages various purpose-built AI/ML techniques to achieve highly accurate and fine-grained document classifications.
Structured data catalog: Sensitive Data Intelligence leverages various AI/ML techniques that fuse numerous signals to provide highly accurate column classifications across structured data stores. This enables organizations to visualize all the sensitive data discovered in any of their structured data stores. It involves searching and finding data elements across all structured data systems and within specific databases, tables, and columns using powerful filters. These techniques apply automatically to custom data types and to CSV, Avro, and other structured files.
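The detection idea can be illustrated with a much simpler stand-in than the AI/ML techniques the text describes. The sketch below uses plain regular expressions plus a Luhn checksum to flag a few attributes in free text and to classify a column from sampled values. It is an assumption-laden toy: the patterns, the attribute names, and the 0.8 threshold are invented for illustration and would produce false positives and misses on real data.

```python
import re
from collections import Counter

# Illustrative patterns only; production detectors need far more care.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")   # 13 to 19 digits


def luhn_valid(candidate):
    """Luhn checksum, used to discard digit runs that are not card numbers."""
    digits = [int(ch) for ch in candidate if ch.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:                       # every second digit from the right
            d = d * 2 - 9 if d * 2 > 9 else d * 2
        total += d
    return total % 10 == 0


def detect_attributes(text):
    """Return the set of attribute names found in a piece of text."""
    found = set()
    if SSN_RE.search(text):
        found.add("ssn")
    if EMAIL_RE.search(text):
        found.add("email")
    if any(luhn_valid(m.group()) for m in CARD_RE.finditer(text)):
        found.add("credit_card")
    return found


def classify_column(values, threshold=0.8):
    """Label a column when enough non-empty sampled values share one attribute."""
    sample = [v for v in values if v]
    counts = Counter(attr for v in sample for attr in detect_attributes(v))
    if not counts:
        return None
    attr, hits = counts.most_common(1)[0]
    return attr if hits / len(sample) >= threshold else None


# Made-up sample values.
doc_hits = detect_attributes("SSN 123-45-6789, card 4111 1111 1111 1111")
column_label = classify_column(["123-45-6789", "987-65-4321", "", "078-05-1120"])
```

The same `detect_attributes` function serves both unstructured text and structured columns, which mirrors the text's split into two catalogs fed by one detection engine; the checksum step shows one way a second signal can cut false positives from a pattern match alone.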

Benefits of discovering sensitive and personal data across structured and unstructured assets
1	Highly accurate and fine-grained document classifications in unstructured systems.
2	Highly accurate column classifications in structured systems.
3	The organization can run multiple data discovery scans in parallel based on business requirements.
4	Granular customization, data sampling, and targeting for high-speed scans.

Step 4: Enrich the sensitive data catalog with automated classification and tagging

Once sensitive data has been discovered in structured and unstructured data stores, the next step in Sensitive Data Intelligence is to enrich the sensitive data catalog with automated classification and tagging. Sensitive Data Intelligence leverages machine learning technologies and natural language processing to deliver highly accurate auto-classification of datasets and data labeling. An extensible policy framework is used to automatically apply sensitivity labels and metadata to files and documents for various use cases.

The following steps help you achieve this:

Identify hotspots of sensitive data using sensitivity filters and labels. Sensitivity labels such as confidential, private, and public indicate to organizations how particular datasets should be managed, shared, and protected. The organization can apply sensitivity labels according to its data classification policies.
In addition to identifying hotspots in structured and unstructured data stores, set up a bi-directional integration between the organization’s systems and third-party data catalogs. This allows both sides to synchronize privacy and security metadata. Flexible synchronization policies let organizations choose which information is synchronized for specific data stores.
Add further metadata such as “purpose of processing” to files and documents to indicate the business reasons for holding the data and to ensure it is used only for its intended purposes. The Sensitive Data Intelligence tool then automatically enriches the sensitive data catalog with this purpose metadata.
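A labeling policy of the kind described above can be sketched as a mapping from detected attributes to a minimum sensitivity label, with the strictest label winning. Everything here is hypothetical: the `POLICY` table, the label ranking, the tag keys, and the helper names are assumptions for illustration and are not the product's policy framework.

```python
# Labels ordered from least to most sensitive (assumed ranking).
LABEL_RANK = ["public", "general", "confidential", "highly confidential"]

# Hypothetical policy: each detected attribute implies a minimum label.
POLICY = {
    "email": "general",
    "phone": "general",
    "postal_address": "confidential",
    "ssn": "highly confidential",
    "credit_card": "highly confidential",
}


def label_for(attributes, default="public"):
    """Pick the strictest label implied by the detected attributes.

    Attributes missing from the policy are treated as "confidential",
    a conservative assumption.
    """
    labels = [POLICY.get(a, "confidential") for a in attributes]
    labels.append(default)
    return max(labels, key=LABEL_RANK.index)


def tag_file(path, attributes, purpose=None):
    """Build the tag record that would be attached to a file or document."""
    tags = {
        "path": path,
        "sensitivity": label_for(attributes),
        "attributes": sorted(attributes),
    }
    if purpose:
        tags["purpose_of_processing"] = purpose
    return tags


def hotspots(tagged_files, min_label="confidential"):
    """Paths whose label is at or above the given sensitivity."""
    cutoff = LABEL_RANK.index(min_label)
    return [t["path"] for t in tagged_files
            if LABEL_RANK.index(t["sensitivity"]) >= cutoff]


# Made-up files and detections.
tagged = [
    tag_file("hr/payroll.csv", {"ssn", "email"}, purpose="payroll"),
    tag_file("web/press-release.txt", set()),
    tag_file("crm/contacts.csv", {"email"}),
]
hot = hotspots(tagged)
```

Because the tags are plain dictionaries, the same records could be pushed to or pulled from a third-party catalog, which is the kind of two-way synchronization the text mentions.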

Benefits of enriching the sensitive data catalog with automated classification and tagging
1	Based on the functional categorization of data, an organization can determine which privacy and security regulations apply to those datasets.
2	A data classification policy can be maintained that will help organizations demonstrate how particular data should be managed, shared, and protected.
3	Tags and labels enable organizations to find and report on data held for various purposes required by privacy regulations.

Step 5: Discover and Centralize Sensitive Asset & Data Posture

Once your asset catalogs have been enriched, the next step is to manage your security posture across your multi-cloud data assets, various SaaS applications, and on-premises environments to ensure your data environment is secure.

Sensitive asset and data posture management helps organizations gain visibility into their data assets and monitor their configuration while ensuring adequate security settings. Organizations can scope configuration settings based on the sensitivity of the data the assets hold. For example, public access must be disabled for data stores containing confidential information, whereas a store holding the organization’s website materials should remain publicly accessible. Applying security settings selectively based on the data’s sensitivity also lowers cost and management overhead. For example, enabling CloudTrail or server access logs on all data is unnecessary and expensive; the organization may only need them on regulated data for compliance audits.

Detecting security misconfigurations: The most crucial function of sensitive asset and data posture management is detecting security misconfigurations of data stores. Security misconfigurations refer to inaccurately configured or unprotected data stores, for example, personal data stored without encryption, publicly accessible sensitive data, or disabled audit logging.
View security alerts: Another function of sensitive asset and data posture management is the ability to view security alerts. Organizations can view centralized security alerts by severity, such as very high, high, and medium, which helps them prioritize posture remediation. They can search and filter alerts to aid their security efforts and suppress alerts to reduce noise.
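The two functions above, detecting misconfigurations and viewing prioritized alerts, can be sketched as a set of sensitivity-scoped rules. This is a minimal sketch: the `DataStore` fields, the rule set, and the severity assigned to each rule are assumptions chosen to match the examples in the text, not the product's actual checks.

```python
from dataclasses import dataclass

SEVERITY_ORDER = ["very high", "high", "medium"]


@dataclass
class DataStore:
    name: str
    sensitivity: str        # e.g. "public", "confidential"
    encrypted: bool
    public_access: bool
    audit_logging: bool


def find_misconfigurations(store):
    """Return (severity, message) pairs; rules apply only to sensitive stores."""
    if store.sensitivity == "public":
        return []           # e.g. website materials may stay publicly readable
    alerts = []
    if store.public_access:
        alerts.append(("very high", "sensitive data is publicly accessible"))
    if not store.encrypted:
        alerts.append(("high", "sensitive data stored without encryption"))
    if not store.audit_logging:
        alerts.append(("medium", "audit logging disabled"))
    return alerts


def prioritized_alerts(stores, suppressed=()):
    """Centralized alert list, most severe first, minus suppressed alerts."""
    alerts = [
        (severity, store.name, message)
        for store in stores
        for severity, message in find_misconfigurations(store)
        if (store.name, message) not in suppressed
    ]
    return sorted(alerts, key=lambda a: SEVERITY_ORDER.index(a[0]))


# Made-up data stores.
stores = [
    DataStore("customer-db", "confidential", encrypted=False,
              public_access=False, audit_logging=True),
    DataStore("exports-bucket", "confidential", encrypted=True,
              public_access=True, audit_logging=False),
    DataStore("website-assets", "public", encrypted=False,
              public_access=True, audit_logging=False),
]
alerts = prioritized_alerts(stores)
```

Scoping the rules by sensitivity is what keeps the public website bucket out of the alert list, and the `suppressed` argument models the noise reduction the text describes.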

Dynamic enterprise environments require continuous data discovery scans to ensure regular security posture monitoring and compliance. Sensitive Data Intelligence provides the ability to monitor sensitive asset and data posture continuously. It also enables auto-remediation to resolve security risks quickly.

As a result of these processes, an organization can automate security and privacy controls. Once an organization has gained visibility into its data security posture, it can discover gaps in its security controls and orchestrate appropriate security controls to fill the gaps.

Benefits of Sensitive Asset and Data Posture
1	It provides visibility into the security posture of data and enables data protection.
2	Organizations can choose appropriate mitigation measures and security controls depending on the nature of data to be protected and its sensitivity.

Step 6: Visualize and configure data risk

With a glut of data, it becomes challenging for organizations to determine which data poses the most significant security and privacy risks. However, for continuous data security and compliance purposes, organizations need to understand the inherent risk of the data. Without a clear understanding of data risks, an organization may misallocate budgets and resources for risk mitigation activities and security controls.

This step provides an executive summary of an organization’s data risks in the form of a data risk graph. It provides a single numerical figure depicting the overall data risk.

This step has the following processes:

Risk visualization: The data risk graph provides a numeric, risk-centric view of sensitive and personal data in an organization’s environment with a clear breakdown of the various risk contributors. Organizations can review data risk at an aggregate global level or at a granular level for each data store, location, personal data attribute, or data subject’s residence. In addition, they can track changes in global data risk over time and identify high-risk activities.
Identify high-risk assets: The data risk graph highlights and ranks data risk by data asset, location, owner, and personal data type so that organizations can prioritize and target security budgets and resources toward high-risk areas. Organizations can also record historical data risk scores to track how risk scores improve or deteriorate over time.
Configure data risks: Organizations can customize and configure data risk scores using simple knobs that indicate the sensitivity of various factors. They can also set sensitivity levels based on data types, data locations, data subjects’ residencies, and data concentrations.
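The source does not give the formula behind the risk score, so the sketch below assumes a simple one: a weighted count of sensitive attributes per data store, scaled by a location factor, then summed into a single global figure and ranked. The weight tables stand in for the configurable knobs, and all names and numbers are hypothetical.

```python
# Hypothetical knobs: how much each factor contributes to risk.
DEFAULT_WEIGHTS = {
    "data_type": {"email": 1.0, "ssn": 5.0, "credit_card": 4.0},
    "location": {"us": 1.0, "eu": 1.5},
}


def store_risk(store, weights=DEFAULT_WEIGHTS):
    """Weighted attribute count for one store, scaled by its location factor."""
    type_weights = weights["data_type"]
    base = sum(type_weights.get(attr, 1.0) * count
               for attr, count in store["attribute_counts"].items())
    return base * weights["location"].get(store["location"], 1.0)


def risk_report(stores, weights=DEFAULT_WEIGHTS):
    """Single global score plus stores ranked from highest to lowest risk."""
    scores = {s["name"]: store_risk(s, weights) for s in stores}
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return {"total": sum(scores.values()), "ranked": ranked}


# Made-up inventory of discovered attributes per store.
stores = [
    {"name": "crm-db", "location": "eu",
     "attribute_counts": {"email": 100, "ssn": 10}},
    {"name": "logs-bucket", "location": "us",
     "attribute_counts": {"email": 40}},
]
report = risk_report(stores)

# Turning one knob: treat SSNs as twice as risky as the default.
stricter = {
    "data_type": {**DEFAULT_WEIGHTS["data_type"], "ssn": 10.0},
    "location": DEFAULT_WEIGHTS["location"],
}
stricter_crm_score = store_risk(stores[0], stricter)
```

Storing each day's `report["total"]` would give the historical series the text refers to, so a sudden jump after a large ingest of sensitive data becomes visible as a spike.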

Benefits of risk visualization, identification of high-risk assets & configuration of data risks
1	Organizations can take immediate containment and mitigation actions even before a security incident has taken place. For example, if a large amount of sensitive data appears in the data system, a spike in the risk score can alert organizations to take proactive actions.
2	Tracking global data risk scores can help organizations uncover high-risk activities and see whether risk scores have decreased over time.

Step 7: Build a relationship map between data and their owners

This step is the final stage of Sensitive Data Intelligence and is paramount in ensuring compliance with global privacy laws. It involves building a People-Data-Graph (PDG) to map personal data to its correct owners, i.e., customers, users, employees, and other individuals. The People-Data-Graph is a graph that links each individual to their personal data across all connected systems. It offers an easy-to-use conversational interface.

This step has the following processes:

The People-Data-Graph automatically links personal data with its owners in all structured and unstructured data systems. The PDG can identify documents or files that contain an individual’s personal information in complex unstructured data systems.
The People-Data-Graph gathers granular details about individuals and links them with their identities, assets, and all other personal data attributes. Using this information, organizations can fulfill data subject requests (DSRs), identify cross-border data transfers, and pinpoint impacted users in case of a data breach.
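The linking described above can be sketched as an index from a person identifier to every record that mentions them. This is a deliberately simplified model: it assumes each record carries an email address and that an exact, case-insensitive email match identifies a person, whereas real identity resolution across systems is considerably harder. The record fields and function names are hypothetical.

```python
from collections import defaultdict


def build_people_data_graph(records):
    """Map each person (keyed by normalized email) to their records."""
    graph = defaultdict(list)
    for rec in records:
        person_key = rec["email"].strip().lower()
        graph[person_key].append(rec)
    return graph


def dsr_access_report(graph, email):
    """Summarize where one individual's personal data lives."""
    recs = graph.get(email.strip().lower(), [])
    return {
        "systems": sorted({r["system"] for r in recs}),
        "attributes": sorted({a for r in recs for a in r["attributes"]}),
        "regions": sorted({r["region"] for r in recs}),
    }


def impacted_individuals(graph, breached_system):
    """People with at least one record in the breached system."""
    return sorted(person for person, recs in graph.items()
                  if any(r["system"] == breached_system for r in recs))


# Made-up records discovered in two systems.
records = [
    {"system": "crm", "region": "eu", "email": "Jane.Doe@example.com",
     "attributes": ["name", "email"]},
    {"system": "billing", "region": "us", "email": "jane.doe@example.com",
     "attributes": ["email", "credit_card"]},
    {"system": "crm", "region": "eu", "email": "sam@example.com",
     "attributes": ["name", "email"]},
]
graph = build_people_data_graph(records)
report = dsr_access_report(graph, "jane.doe@example.com")
breach_list = impacted_individuals(graph, "billing")
```

In this model, a report that lists more than one region is a hint that an individual's data is held across borders, and the same index answers both DSR lookups and breach-impact queries.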

Benefits of building a relationship map between data and their owners
1	This step enables organizations to fulfill DSRs within days instead of weeks or months and increases customers’ confidence in their privacy practices.
2	This step enables organizations to create relevant DSR and data breach reports in a secure portal, reducing personal data sprawl.