Office 365 Custom DLP: How to Create Custom Sensitive Information Types

Yes, this is an interesting topic for me because it involves programming! I will keep it as simple as learning the alphabet while showing you how to create your very own DLP sensitive information type. DLP templates come in the form of XML files.

*Note: You may need to spend some time on this. Practice makes perfect.

The important elements that you must include in your XML are:

  1. Rule
  2. Entity
  3. Pattern
  4. Identity Match (IdMatch) / Format (Regular Expression)

This is how they nest:

Rule -> Entity -> Pattern -> Identity Match -> Format (Regular Expression)

OK, now that you know the important elements, note which of them can appear more than once: "Pattern" and "Identity Match". In this layout, one rule holds one Entity; that Entity can have multiple unique patterns, and each pattern has its own unique ID match.

Below is a sample of my code, showing how it looks in XML:

*Note: You have to change the GUIDs marked "Need to change guid" in the comments. There are 4 GUIDs in the sample, but 2 of them (the entity GUID and the resource GUID) must be the same, so you need 3 new GUIDs. To get a new GUID, simply open PowerShell and run the command "[guid]::NewGuid()".

<?xml version="1.0" encoding="UTF-8"?>
<RulePackage xmlns="http://schemas.microsoft.com/office/2011/mce">
  <!-- Need to change guid: rule package guid, from [guid]::NewGuid() -->
  <RulePack id="872155dc-1234-4e3e-a10d-x">
    <Version build="0" major="1" minor="0" revision="0"/>
    <!-- Need to change guid: publisher guid -->
    <Publisher id="6907d14a-1234-4023-87cd-x"/>
    <Details defaultLangCode="en-us">
      <LocalizedDetails langcode="en-us">
        <PublisherName>Company Group</PublisherName>
        <Name>ID Custom Rule Pack</Name>
        <Description>This rule package contains the custom ID entity.</Description>
      </LocalizedDetails>
    </Details>
  </RulePack>

  <!-- Rules: the rule section that holds the entity, its patterns, and the regular expressions -->
  <Rules>
    <!-- ID -->
    <!-- Entity. Need to change guid: entity guid -->
    <Entity id="b660289d-189e-1234-9e0a-x" patternsProximity="300" recommendedConfidence="70">
      <!-- Pattern -->
      <Pattern confidenceLevel="80">
        <!-- Identity match: refers to a Regex element by its id -->
        <IdMatch idRef="Regex_id1"/>
      </Pattern>
      <Pattern confidenceLevel="80">
        <IdMatch idRef="Regex_id2"/>
      </Pattern>
    </Entity>

    <!-- Regular expressions -->
    <!-- Format: AB-C-DE-FGH -->
    <Regex id="Regex_id1">(\d{2})[-](\d{1})[-](\d{2})[-](\d{3})</Regex>
    <!-- Format: ABCDEFGMANNN -->
    <Regex id="Regex_id2">(\d{7})[mM][a-zA-Z](\d{3})</Regex>

    <!-- Resource guid must be the same as the entity guid -->
    <LocalizedStrings>
      <Resource idRef="b660289d-189e-1234-9e0a-x">
        <Name default="true" langcode="en-us">ID</Name>
        <Description default="true" langcode="en-us">A custom classification for detecting IDs.</Description>
      </Resource>
    </LocalizedStrings>
  </Rules>
</RulePackage>



The XML above contains 2 patterns, each set with a confidence level of 80. This means that when DLP scans your mail, SharePoint, or OneDrive content and finds text that matches a pattern, it reports the match with 80% confidence, which can then trigger the rule. Each pattern contains an identity match that refers to a regular expression by name, here "Regex_id1" and "Regex_id2". After that comes the format for each of those identities, defined in the Regex element with the same id. As you can see above, I have stated each format in the comment before it.

*Note: The code above doesn't limit you. You can play around with what you wish to include, such as keywords, false-positive handling, and so on. You can learn more about tweaking the code by reading the reference documentation. You can also use any online regex tester to try out the regular expressions in your code.
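To make the "keywords" idea more concrete, here is a minimal sketch of how a keyword list could be added as supporting evidence next to a regex. The id "Keyword_id" and the terms inside it are made up for illustration and are not part of my rule pack above; the Pattern goes inside your Entity, and the Keyword element sits alongside the Regex elements.

```xml
<!-- Sketch only: "Keyword_id" and its terms are hypothetical examples. -->

<!-- Inside the Entity: this pattern needs the regex AND a nearby keyword. -->
<!-- "Nearby" is governed by the Entity's patternsProximity attribute. -->
<Pattern confidenceLevel="85">
  <IdMatch idRef="Regex_id1"/>
  <Match idRef="Keyword_id"/>
</Pattern>

<!-- Alongside the Regex elements: the keyword list referenced above. -->
<Keyword id="Keyword_id">
  <Group matchStyle="word">
    <Term>ID number</Term>
    <Term>identification</Term>
  </Group>
</Keyword>
```

A common way to use this is to keep a regex-only pattern at a lower confidence level and give the regex-plus-keyword pattern a higher one, since a keyword close to the number makes a false positive less likely.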



Author: sabrinaksy

Just a little girl who loves what she does best.
