Power BI Governance – Approve/Reject all dataset creation in your tenant

This is a post about Power BI governance that will not use any Power BI capabilities. We are going outside the box here to keep your environment governed.

If you can complete 3 simple steps that I will show below, then you can get an email like the one below every time someone adds an unauthorized dataset into your Power BI environment.

This blog post was inspired by some great work by Kent Weare and our Flow team here: https://flow.microsoft.com/en-us/blog/automate-flow-governance/ It was just close enough to what I needed that it worked out. If I skim over a few details in this post, refer back to that one and it might help.

Scenarios

Power BI has introduced more governance controls recently with the GA of Workspaces V2. I encourage everyone to use the out of the box features that now exist, especially the contributor role discussed here.

In my scenario, the problem happens to be that V1 workspaces don’t have the same governance capabilities, and any Power BI Pro user can go create a Team or O365 group and automatically become an administrator of the corresponding Power BI workspace (V1) that gets created. I want to monitor for any datasets getting created in V1 workspaces and be able to delete them even though I am NOT an admin in those workspaces. This is NOT possible without the intervention I will outline below. Of course, I want to be able to contact these users and coach them about using new V2 workspaces.

Another scenario this enables is monitoring for “rogue” gateway datasets. It would need a few modifications to the workflow below, but it would also be a popular scenario. I have talked to some organizations that have found dozens of gateways being used that they didn’t know about.

Three simple steps

I am lying, as these steps aren’t so simple. They require skills that aren’t necessarily what a BI administrator may have. The good news, however, is that if you can get a little help from an O365 global administrator and maybe a developer friend, you can knock this out in less than a day. I have done most of the hard work on this, and collectively we built this solution in probably two developer days.

The three steps described below are as follows

  1. Register an app in Azure Active Directory to be able to use the Office 365 Management API, the Graph API, and the Power BI API. You will need a global administrator to grant a set of permissions to this app before steps 2 and 3.
  2. In Microsoft Flow (or in Azure Logic Apps) import the provided “Flow” that contains all of the logic required for this.
  3. Use Postman (or the REST API tool of your choice) to call the O365 Management API to set the Flow created in step 2 as a “webhook” for O365 audit log events (which of course include “PowerBI-CreateDataset”).

Once the above is completed you will be amazed at how you now have the control that you didn’t think was possible. You no longer have to constantly audit for bad actors.

Let’s get started!

Azure Active Directory (O365 permissions) Setup

You may need a trained professional for this, as I won’t be listing out all the necessary steps here. What makes it even harder is that we recently changed the user interface for this, so following some older docs may be confusing.

Go to the Azure Portal and from Azure Active Directory go to “App registrations” and click on “New registration”. Give it a unique name; the redirect url can just be localhost.

After it is registered, open it up and record/copy the following properties on the “Overview” dialog:

Application (client) ID
Directory (tenant) ID

You will need both of these GUIDs later.

Go to the “certificates & secrets” dialog and create a “new client secret”. Also record/copy this value. It will be referred to as “App Secret” later in this post and we will need it.

On the “API permissions” dialog you will need to add permissions from 3 services: Office 365 Management API, Graph API, Power BI Service. These should be listed as groupings that you can select specific permissions out of. The screen shot below shows the permissions out of each group you are going to need. Pay attention to the “Type” indicator of Application or Delegated.

Until this point, you might be surprised that you have been able to do all of this without a global administrator. Well, none of this will work if you don’t do a very important task of clicking the “Grant Consent” button at the bottom of this dialog page. This is going to either not exist or be grayed out if you are not an administrator.

Note: Go back to the overview dialog for this app and simply take note of the “Endpoints” button at the top. These are the different API authorization endpoints for the different services. I have provided what you need in the source code for the Flow and Postman, but it is nice to know how to get back to verify these later if something isn’t working for you.

Go back to Azure Active Directory and in the left nav go to “Users”

We need to find a Power BI administrator account that we don’t mind recording a password value for, to be used in our Flow app. Often this is going to be a service account of some sort that gets used for API purposes ONLY and has different restrictions on password rotation/etc. This account should have “Power BI Admin” privileges.

In my example I will be using the same account as the Power BI Admin service account and the Power BI admin email notification, but you don’t want this in real life. Get the “Object ID” from the Profile dialog and record/copy it for later use.

That’s it. Step #1 complete.

Import package to Microsoft Flow and Configure

In the source code repository on GitHub I have provided both a Zip file and a JSON file that can be used. I haven’t fully tested the JSON file, as that is supposed to allow you to import directly into Logic Apps. Microsoft Flow, after all, is built on top of Logic Apps, so if you prefer that route, go for it. For this blog post we are going to use the zip file (aka the package).

Two connections that I use in this flow that you should be aware of are “Office 365” and “Flow Approvals”.

When you import the package, you are going to need to use existing connections for these or create them in the “Data” section.
Note: I have no idea about Flow licensing or if your organization is setup for it. It could be possible that doing Flow Approvals requires a certain SKU or something. Literally no idea. If the below import steps fail simply because of the Approvals connector, let me know via Twitter @angryanalytics and I can create a version without approvals and just notifications. 

The import package dialog looks like this. You can use the “Action” button to make some modifications upon import (not sure why anyone wouldn’t like the name “angry governance”)

The related resources “select during import” link is where you can define new connectors if you don’t already have them in the “Data” section of your Flow app.

Once imported hover over the Flow item and you will see an “Edit” icon appear, click that.

The below screen shot shows only about 90% of the flow, because it was too big to zoom out enough to capture it all. You can see that it is quite complex.

I will attempt to explain each step of this Flow below. This turned into a dissertation so if you don’t care and just want to move on to making it actionable, skip below.

  1. The Flow is triggered by an O365 webhook (which we will configure later). Any time an audit event is created in O365 (which includes Power BI events) this Flow is triggered.
  2. Variables need to be set (some of the stuff I had you record earlier will need to be filled in here… I will explain below). These variables drive communication with the APIs we need that all use the permissions setup in Step #1.
  3. Call the Get Log Event API from O365 Management APIs and pass it an authorization token and the contentUri from the trigger event (this is effectively a link to lookup what happened).
  4. In parsing the response of the Log Event, there are going to be multiple records returned. It enters a for-each loop, and if the Operation equals “CreateDataset” we do further inspection of the event; otherwise we do nothing.
    1. Note that you could modify this to look for any events listed here. The “CreateGateway” event comes to mind as a good one to notify a BI administrator on. Maybe good for the next blog post.
  5. For my scenario, if we do find a CreateDataset event, I want to check whether the group the dataset was created in is a V1 or V2 workspace. This is done in two parts.
    1. Call the Graph API to “GET” the Group (in Power BI we call them Workspaces but in O365/Graph they are Groups).
    2. If the workspace that the dataset was created in is a V2 workspace, I don’t care about it, as I already have governance controls there like the Contributor role and an admin setting that allows me to restrict users from creating workspaces. V2 workspaces do not show up in O365/Graph, so if I don’t get a status code of 200 (which means the call was successful) I assume it is a V2 workspace and can stop processing.
      Note: What I also found out late in testing is that datasets created in “My Workspace” will also throw a non-200 status code, so I am not auditing for datasets there either.
  6. If this is a V1 workspace, I am now going to use the Graph API to add my Power BI Admin service account to that workspace. I have to do this because if I choose to reject this dataset as an administrator, the Power BI API WILL NOT delete the dataset unless the user is an admin of that workspace. Adding this account also gives the Power BI administrator the option to log in with that service account and do a manual inspection before approving/rejecting.
  7. I now check for a status code of 204, which indicates the user assignment was added/created successfully. If it wasn’t successful, that could actually mean that my Power BI Admin service account was already an admin of that workspace (might be rare, but could happen). If this happens it throws a 400. I am not doing further inspection on a 400 error at this time, so be aware that something else could trigger it and could mess up your Flow. If a 400 is thrown, I set a variable via a “Compose” element to indicate the user is already a member/admin of the group; I use this later. If I didn’t get a 204 and I didn’t get a 400, I assume some other error occurred and I send an email to the Power BI admin that the Flow has failures.
  8. Now we send a polite email warning to the user who created the unauthorized dataset in a V1 workspace and let them know it will be reviewed by an admin.
  9. We start an approval process that sends an email to the Power BI admin email address that was set in the Variables section in Step #2 with all the important information about the dataset.
    Note: For some reason the ReportName and ReportId are not working and therefore the link to the item is broken. Need to fix this.
  10. If the admin clicks the “Approve” button on the email, an approval email is sent to the user who created the dataset, if not, proceed to step #11
  11. Now we have to call the Power BI API to delete the dataset automatically (see the sketch after this list). We send a rejection email to the user so they know it was deleted.
    Note: I don’t believe that V1 workspaces allow app-only (service principal) authorization, so this is why we had to set a Power BI Admin service account id and password in step #2. It would be cleaner to not have a user identity at all and just use the app permissions.
  12. Check the variable we set in step #7 to see if the Power BI Admin service account needs to be deleted from the group, and if so we use the Graph API to do that, so that there is no residual effect on that O365 group/Team from the Power BI admin inspection process once it is completed.
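For reference and troubleshooting, below is a minimal PowerShell sketch of the two REST calls the Flow is effectively making in steps #6 and #11. The token and id variables are placeholders; the Flow builds the real values from the variables you configure in the next section.

#Placeholders: $graphToken and $pbiToken are OAuth access tokens for the Graph and Power BI APIs,
#and $groupId, $adminObjectId, $datasetId are the ids the Flow works with
$body = @{ "@odata.id" = "https://graph.microsoft.com/v1.0/directoryObjects/$adminObjectId" } | ConvertTo-Json

#Step 6 equivalent: add the Power BI Admin service account to the V1 workspace's underlying O365 group
Invoke-RestMethod -Method Post -Uri "https://graph.microsoft.com/v1.0/groups/$groupId/members/`$ref" `
    -Headers @{ Authorization = "Bearer $graphToken" } -ContentType "application/json" -Body $body

#Step 11 equivalent: delete the rejected dataset from the workspace
Invoke-RestMethod -Method Delete -Uri "https://api.powerbi.com/v1.0/myorg/groups/$groupId/datasets/$datasetId" `
    -Headers @{ Authorization = "Bearer $pbiToken" }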

In the workflow, the only section that requires configuration is the “Set Variables” scope in the second action. Click on it to expand the “Scope”.

Other than the last variable for the email address to be used for the approvals, all of these are used for the API calls. They are required for authorization to do some pretty intrusive things like adding administrators to O365 groups and deleting Power BI datasets. This is why we needed an O365 global administrator to approve our app permissions earlier as this shouldn’t be taken lightly.

Let’s go through each of these and click on them to set the values:

Client Key – (aka App Secret) You recorded this in the first section. Often these will have special characters, so we need to URL-encode it. Use https://www.url-encode-decode.com/ to encode your client key and paste the encoded value in here (or see the PowerShell snippet after this list).
Client ID – paste exactly as copied (shouldn’t contain special characters)
Tenant ID – paste exactly as copied
Power BI Admin ID (for Graph) – the Object ID for the admin service account exactly as copied
Power BI Admin email-username (for Power BI API) – the email address of the admin service account discussed in the first section
Power BI Admin Password (for Power BI API) – this is the password of the service account discussed in the first section
Power BI Admin Email (for Flow Approvals) – This is an email address of a REAL PERSON who is checking emails on this account, someone who can make the decision to approve/reject datasets. Can it be a distribution list or security group? I don’t know for sure, try it out 🙂
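If you would rather not paste a secret into a website, here is a minimal PowerShell alternative for the encoding (the secret value below is a placeholder):

#Percent-encode the App Secret locally, then paste the output into the Client Key variable
$clientSecret = "<<your app secret>>"
[uri]::EscapeDataString($clientSecret)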

Once you have updated these variables, you have completed the configuration of your Flow (or logic app). The last thing you need to do is go to the first activity in your Flow (the trigger) and copy the HTTP POST URL. This is the “webhook” you need for Step #3.

At the top of the page, click “Save”.

That’s it. Step #2 complete.

Create the Webhook

This is by far the easiest and shortest step of the 3… however, it can be intimidating if you aren’t familiar with REST APIs and tools like Postman. This is where a developer friend can make very short work of this task.

Every time an event happens in Power BI, we want our Flow to be called to inspect it. We can do this by setting up a webhook. A webhook is simply an automated endpoint that gets called every time something occurs.

One thing to note about Power BI events: It could take as long as 30 minutes after the event occurs for the event to show up in the audit logs. The webhook fires immediately after that, but when you go to test this, realize that it is pretty consistently 20-30 minutes before your Flow will get executed.

In the source code that I provided, I included a Postman collection that can be used to test a lot of the API calls that are used in the Flow. These are provided for troubleshooting purposes.

Import the collection from my GitHub repository. You can either download it and import it, or simply “Import from Link”.

In that collection is an O365 Management API folder that has an “Auth” and then a “Create Webhook” API that need to be executed to hook up your Flow with the URL we recorded at the end of Step #2.

Run the “Auth” request first. When you go to run it you will notice that I have used variables for the url and the body (x-www-form-urlencoded).

Create an environment with these variables or simply replace these values with those recorded in Step #1. For the redirectUri just use http://localhost or the custom value you used when setting up your app registration in step #1.

Send the request (should be a POST) and copy the value returned in the “access_token”. This value is how you authorize subsequent calls to the O365 Management API. It is good for one hour and holds information about the permissions your app has (those api permissions you setup in step #1).

Now open the “Create Webhook” request. In the “Authorization” tab, replace the “Token” value with the access_token you just recorded. In the “Body”, replace the {{flowurl}} variable with the Flow trigger url you recorded at the end of Step #2, and replace the {{activityfeedauthid}} with some arbitrary name. Of course I used “angrygovernance”.
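If you would rather skip Postman, the same two calls can be made from PowerShell. This is a minimal sketch based on the documented O365 Management Activity API endpoints (Power BI audit events come through the Audit.General content type); the four values at the top are placeholders for what you recorded in Steps #1 and #2.

#Placeholders: fill in the values recorded earlier
$tenantId     = "<<Directory (tenant) ID>>"
$clientId     = "<<Application (client) ID>>"
$clientSecret = "<<App Secret>>"
$flowUrl      = "<<HTTP POST URL from the Flow trigger>>"

#"Auth": get an app-only token for the Office 365 Management API
$tokenResponse = Invoke-RestMethod -Method Post `
    -Uri "https://login.microsoftonline.com/$tenantId/oauth2/token" `
    -Body @{
        grant_type    = "client_credentials"
        client_id     = $clientId
        client_secret = $clientSecret
        resource      = "https://manage.office.com"
    }

#"Create Webhook": start a subscription that pushes Audit.General notifications to the Flow trigger url
$body = @{ webhook = @{ address = $flowUrl; authId = "angrygovernance"; expiration = "" } } | ConvertTo-Json
Invoke-RestMethod -Method Post `
    -Uri "https://manage.office.com/api/v1.0/$tenantId/activity/feed/subscriptions/start?contentType=Audit.General" `
    -Headers @{ Authorization = "Bearer $($tokenResponse.access_token)" } `
    -ContentType "application/json" -Body $body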

If you get a success response, then your webhook should be created and you are now ready to test.

Step #3 complete!

Testing

Upload a Power BI report to a V1 workspace in Power BI, then go to the Flow you imported and saved. After about 30 minutes you should see a Flow run that is still running (it has manual approval steps to be taken, so it will run for months until approved/rejected).

You will see other “succeeded” runs of your Flow that are not important. These are other events occurring in O365, such as you saving your Flow… We have a condition looking for only “PowerBI-CreateDataset” events before kicking off the approval process.

You should see emails generated to the administrator email id you specified and also back to the user that created the dataset. If you reject the dataset, it should automatically get deleted (along with the report) and the Power BI Admin user should get removed from the V1 workspace if it was not already a member of it before the Flow started.

Conclusion

I know that your scenarios may vary from the one I have provided above, but with this Flow as a basis, you should be able to make some adjustments and create really kick ass governance Flows that give you more control in Power BI than you ever had before.


Dealing with Budget vs Actuals Closed Dates

This post is an extremely practical one targeted at business analysts. Most of you have a scenario that includes a general ledger and monthly budgets that have corresponding actuals. Often there are actuals in the system that you don’t want to include because they haven’t “closed” yet.

If you are somewhat new to Power BI, you are probably handling this by manually setting filters in your Power BI report and adjusting them each month. When I see report authors try to automate this, I typically see a single measure including a FILTER expression to account for ONLY closed actuals. The problem with this approach is that you end up writing this FILTER expression many times over if there are many measures that need to observe closed vs unclosed actuals. This approach also doesn’t allow for adjusting the axis on bar charts displaying your budget and actuals values.

In this blog post I will show an alternative option that is pretty easy and can be applied to any measure that you include in your model. The PBIX file referenced below can be found on my GitHub.

In the report below, notice that my main “% Variance to Budget” measure on the left side is including unwanted actuals from 2016, a few months in 2017, and 2018. I am using the same “% Variance to Budget” measure on the right side, but it only includes months that have a budget defined (2017 only) as well as ONLY months where actuals have closed (January through October, excluding November and December).

The two visuals on the right are utilizing a calculated date table based on our “closed actual dates” instead of the full date dimension. Let’s review the simple data model being utilized in this example.

I have a budget table that includes a status indicating whether that month has been “closed” or not.

Date Budget Status
01/01/2017 300 Closed
02/01/2017 500 Closed
03/01/2017 500 Closed
04/01/2017 400 Closed
05/01/2017 400 Closed
06/01/2017 500 Closed
07/01/2017 500 Closed
08/01/2017 400 Closed
09/01/2017 400 Closed
10/01/2017 400 Closed
11/01/2017 400 Open
12/01/2017 500 Open

 

Notice that it does not include any dates from 2016 or 2018 but also shows the status of November and December as “Open”.

Now in my actuals table, you can see that the dates include 2016, November and December of 2017, and 2018.

Date Actuals
01/01/2016 800
03/01/2016 100
09/12/2016 4000
11/22/2016 250
01/04/2017 100
01/28/2017 300
02/01/2017 500
03/12/2017 200
04/05/2017 400
04/22/2017 100
05/02/2017 300
05/28/2017 100
06/29/2017 500
07/01/2017 100
07/04/2017 800
07/23/2017 200
08/09/2017 400
09/21/2017 100
09/23/2017 500
10/11/2017 300
10/20/2017 100
10/31/2017 250
11/12/2017 100
11/21/2017 200
12/01/2017 500
12/22/2017 2000
12/30/2017 100
01/02/2018 200
02/02/2018 1000

I have pulled both of these tables into my data model and have also included a Date dimension table that I typically use. However, I have also created a “calculated table” from the “New Table” icon on the modeling tab that is going to reflect our “closed actuals dates”.

The formula for this calculated table is below:

ClosedDates = CALENDAR(
    MIN(Budget[Date]),
    CALCULATE(EOMONTH(MAX(Budget[Date]),0),Budget[Status]="Closed"))

 

The CALENDAR() DAX function expects two dates that it will use as the beginning of the date range and the end of the date range that will be produced. All dates between these two dates will be produced as a table.

The “Start Date” is pretty easy, as we will use the minimum date from the budget table, Budget[Date]. The “End Date” however is a bit more complex. A CALCULATE statement is used so that we can apply a filter to the table to ensure the Budget[Status] is “Closed”. This filter is the second parameter to the CALCULATE function. The first parameter is doing two things:

  • First it is getting the MAX Budget Date which in our budget table after the filter for status is applied would be 10/1/2017. The problem is that our budget is monthly while our actuals are daily. The monthly budget dates are being stored as the first date of the month.
  • So, the second function being applied is EOMONTH, which stands for “End of Month”. This will give the last date in the month based on a date value in its first parameter and an offset (how many months from the date that is given) in its second; in our case we want to use the same month, so the offset is zero (giving 10/31/2017 as the end of our range).

 

Now we have a calculated table that includes ONLY the dates we want to use when comparing actuals vs budget. In the data model, let’s connect this to the Date dimension table, as that is the table that would likely hold the relationship to the Actuals and Budget tables.

You can see from my model that I added two additional calculated columns to the ClosedDates table for Month and Year. Those formulas are below:

Month = Month(ClosedDates[Date])

Year = Year(ClosedDates[Date])

 

To test this, I used a percentage difference formula that I created from a “Quick Measure” by clicking the “…” next to the table you want to create the measure in.

I named mine “% variance to budget” which automatically produced the following DAX expression:

% Variance to Budget = 
VAR __BASELINE_VALUE = SUM('Budget'[Budget])
VAR __VALUE_TO_COMPARE = SUM('Actuals'[Actuals])
RETURN
 IF(
  NOT ISBLANK(__VALUE_TO_COMPARE),
  DIVIDE(__VALUE_TO_COMPARE - __BASELINE_VALUE, __BASELINE_VALUE)
 )

 

Now that the measure is complete, add a card visual displaying it.

Initially you will get a number that uses all actuals as the comparison value to the budget. But by dragging the ClosedDates[Date] field into the FILTERS pane as a filter on the visual, or even at the page level (if everything on this page is comparing actuals against budget), and setting it to “is not blank”, it will ONLY show actuals for dates that are closed and have a budget defined.

Now your measure is only showing variance based on the dates that exist in the ClosedDates calculated table.

Because we created a calculated table, it can also be utilized in a bar chart showing our “actuals” and “budget” fields without having to wrap them in measures or manually filter out the time frames that don’t have budget associated or closed actuals.

 

Conclusion

Not the flashiest blog post, but I hope it can help solve a real practical problem for many business analysts trying to compare budgets vs actuals or even forecasts.

Power BI Audit Log Analytics Solution

As Power BI adoption in your organization grows, it becomes more and more important to be able to track the activity in the environment.

When you start to think about deploying a Power BI Audit Log solution that is repeatable there are a few challenges that you will face.

  • Going to the O365 Audit Logs portal each time you want to extract log events is a manual process and doesn’t scale
  • When automating this process through API or PowerShell, there is a limit to how much data you can pull, therefore examples that are currently available also don’t scale very well
  • The AuditData field is in JSON format by default, and although Power BI can parse JSON beautifully, doing this over several thousand record entries may result in data load errors

Based on these factors, I have put together a PowerShell script that can be scheduled on a nightly basis and can iterate through MORE than 5000 records so that no data is lost. Also, the screenshot below is of an initial template that you can use to start analyzing the audit logs for your organization.

TL;DR


  • The required files can be found on my GitHub
  • Update the PowerShell script with a UserID and Password that has O365 audit log privileges
  • Use Task Scheduler to schedule the PowerShell script to run each night at midnight (run as admin).
  • At the end of the script, specify the directory you would like the script to generate CSV files in
  • In the PBIX file, it was challenging to get a parameter to work for the file location that the CSVs are in, so in the Query Editor the script for the AuditLog table needs to be manually modified to include your file path.
  • Enjoy

Quick look at the PowerShell

First, there is a PowerShell script.

Set-ExecutionPolicy RemoteSigned

#This is better for scheduled jobs
$User = "<<enter o365 admin user email here>>"
$PWord = ConvertTo-SecureString -String "<<enter password here>>" -AsPlainText -Force
$UserCredential = New-Object -TypeName "System.Management.Automation.PSCredential" -ArgumentList $User, $PWord

#This will prompt the user for credential
#$UserCredential = Get-Credential


$Session = New-PSSession -ConfigurationName Microsoft.Exchange -ConnectionUri https://outlook.office365.com/powershell-liveid/ -Credential $UserCredential -Authentication Basic -AllowRedirection
Import-PSSession $Session



$startDate=(get-date).AddDays(-1)
$endDate=(get-date)
$scriptStart=(get-date)

$sessionName = (get-date -Format 'u')+'pbiauditlog'
# Reset user audit accumulator
$aggregateResults = @()
$i = 0 # Loop counter
Do { 
	$currentResults = Search-UnifiedAuditLog -StartDate $startDate -EndDate $enddate `
								-SessionId $sessionName -SessionCommand ReturnLargeSet -ResultSize 1000 -RecordType PowerBI
	if ($currentResults.Count -gt 0) {
		Write-Host ("  Finished {3} search #{1}, {2} records: {0} min" -f [math]::Round((New-TimeSpan -Start $scriptStart).TotalMinutes,4), $i, $currentResults.Count, $user.UserPrincipalName )
		# Accumulate the data
		$aggregateResults += $currentResults
		# No need to do another query if the # recs returned <1k - should save around 5-10 sec per user
		if ($currentResults.Count -lt 1000) {
			$currentResults = @()
		} else {
			$i++
		}
	}
} Until ($currentResults.Count -eq 0) # --- End of Session Search Loop --- #

$data=@()
foreach ($auditlogitem in $aggregateResults) {
    $datum = New-Object -TypeName PSObject
    $d=convertfrom-json $auditlogitem.AuditData
    $datum | Add-Member -MemberType NoteProperty -Name Id -Value $d.Id
    $datum | Add-Member -MemberType NoteProperty -Name CreationTime -Value $auditlogitem.CreationDate
    $datum | Add-Member -MemberType NoteProperty -Name CreationTimeUTC -Value $d.CreationTime
    $datum | Add-Member -MemberType NoteProperty -Name RecordType -Value $d.RecordType
    $datum | Add-Member -MemberType NoteProperty -Name Operation -Value $d.Operation
    $datum | Add-Member -MemberType NoteProperty -Name OrganizationId -Value $d.OrganizationId
    $datum | Add-Member -MemberType NoteProperty -Name UserType -Value $d.UserType
    $datum | Add-Member -MemberType NoteProperty -Name UserKey -Value $d.UserKey
    $datum | Add-Member -MemberType NoteProperty -Name Workload -Value $d.Workload
    $datum | Add-Member -MemberType NoteProperty -Name UserId -Value $d.UserId
    $datum | Add-Member -MemberType NoteProperty -Name ClientIP -Value $d.ClientIP
    $datum | Add-Member -MemberType NoteProperty -Name UserAgent -Value $d.UserAgent
    $datum | Add-Member -MemberType NoteProperty -Name Activity -Value $d.Activity
    $datum | Add-Member -MemberType NoteProperty -Name ItemName -Value $d.ItemName
    $datum | Add-Member -MemberType NoteProperty -Name WorkSpaceName -Value $d.WorkSpaceName
    $datum | Add-Member -MemberType NoteProperty -Name DashboardName -Value $d.DashboardName
    $datum | Add-Member -MemberType NoteProperty -Name DatasetName -Value $d.DatasetName
    $datum | Add-Member -MemberType NoteProperty -Name ReportName -Value $d.ReportName
    $datum | Add-Member -MemberType NoteProperty -Name WorkspaceId -Value $d.WorkspaceId
    $datum | Add-Member -MemberType NoteProperty -Name ObjectId -Value $d.ObjectId
    $datum | Add-Member -MemberType NoteProperty -Name DashboardId -Value $d.DashboardId
    $datum | Add-Member -MemberType NoteProperty -Name DatasetId -Value $d.DatasetId
    $datum | Add-Member -MemberType NoteProperty -Name ReportId -Value $d.ReportId
    $datum | Add-Member -MemberType NoteProperty -Name OrgAppPermission -Value $d.OrgAppPermission

    #option to include the below JSON column however for large amounts of data it may be difficult for PBI to parse
    #$datum | Add-Member -MemberType NoteProperty -Name Datasets -Value (ConvertTo-Json $d.Datasets)

    #below is a poorly constructed PowerShell statement to grab one of the entries and place in the DatasetName if any exist
    foreach ($dataset in $d.datasets) {
        $datum.DatasetName = $dataset.DatasetName
        $datum.DatasetId = $dataset.DatasetId
    }
    $data+=$datum
}

$datestring = $startDate.ToString("yyyyMMdd")
$fileName = ("c:\PBIAuditLogs\" + $datestring + ".csv")
Write-Host (" writing to file {0}" -f $fileName)
$data | Export-csv $fileName

Remove-PSSession -Id $Session.Id

 

  • Notice that you need to enter O365 audit log privileged credentials at the top so that this can be run automatically. If you have a more clever way to pass these credentials in so they are not exposed in the file, by all means do that (one option is sketched after this list)
  • The Do/Until loop handles if there are more than 5000 records in the result set which would easily be the case for a large Power BI community.
  • The foreach loop extracts the AuditData column’s JSON and creates an individual record for each entry. This makes the Query Editor in Power BI less complex and makes it easier to retrieve several hundred thousand records without import errors
  • Finally, we create a CSV for the data with the date of the file entries (yesterday’s info if this is run at midnight every day). This dumps each file in c:\PBIAuditLogs. You can obviously change this file location to wherever you want to store your CSV extracts
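One option for keeping the credential out of the script is Export-Clixml, which encrypts the password for the current Windows user on the current machine. This is a minimal sketch using a hypothetical file path; run the setup once as the same account that will run the scheduled task.

#One-time setup: prompts for the O365 credential and saves it encrypted for this user/machine
Get-Credential | Export-Clixml -Path "C:\Scripts\pbicred.xml"

#Then in the audit log script, replace the plain-text $User/$PWord lines with:
$UserCredential = Import-Clixml -Path "C:\Scripts\pbicred.xml"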

You can use Task Scheduler to run the above PowerShell script every night at midnight.
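If you prefer to script the schedule too, here is a minimal sketch using the built-in ScheduledTasks cmdlets; the script path is a placeholder for wherever you saved the file.

#Placeholder path: point this at wherever you saved the audit log script
$action = New-ScheduledTaskAction -Execute "powershell.exe" -Argument "-ExecutionPolicy Bypass -File C:\Scripts\PBIAuditLog.ps1"
$trigger = New-ScheduledTaskTrigger -Daily -At "12:00AM"

#Registers the task to run elevated every night at midnight
Register-ScheduledTask -TaskName "PBI Audit Log Extract" -Action $action -Trigger $trigger -RunLevel Highest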

The PBIX file

In the Power BI file, we are connecting to the content of the entire folder shown above. I went ahead and included the PBIX file WITH the sample data so you could get an idea of what your data may look like.

This is where I have to admit that I tried to use a parameter for this but ran into some Query Editor challenges with how Power BI creates a Sample File transform to import multiple files from a single folder. If you can see what I did wrong here I would love your feedback, but for now you can ignore the file directory parameter in the Query Editor; go to the “Advanced Editor” on the “AuditLog” query and modify the file location to be the location you are dumping files to from the PowerShell script.

Change the below file location as needed.

Once you have made this change, you should be able to “Close and Apply” and your data will now be populated in this basic audit log analytics view.

Using the file

I created a couple of basic pages to get this blog post shipped and so you can start taking advantage of the solution, but it is nowhere near as complete as you can eventually make it. I have a simple overview page that was screenshotted above. It can help you determine the number of active users, reports, dashboards, and datasets being used for any time period your audit log data covers.

The second page is a user activity view I created from a calculated table. It helps you determine which users in the system may be inactive so you can re-assign Power BI licenses, and it also shows detailed activity for an individual user that you can select from the slicer.

Other things you can mine from this data:

  • Who has signed up for Free Power BI trials
  • What “Apps” are being created
  • What embed tokens are being generated and used
  • Many other possibilities

The PowerShell script and the PBIX file are located on my GitHub here


Data Driven Subscriptions in Power BI

What was that? Did I just say you could create a data driven subscription in Power BI? Like old school SSRS reporting, where I can determine that these 5 sales people get an email with the information only relevant to them, while these other 5 people over here get information only relevant to them?

Yep, that is exactly what I said…

Really this is pretty basic stuff, and although you will see that this technique is a bit limited in options (for instance, we can ONLY send the hyperlink to the report and not a PDF or attachment), it works nonetheless.

Microsoft Flow

I have chosen to do this via Microsoft Flow, but this technique could be achieved several ways. You can use the scheduler of your choice. Flow is a business friendly workflow engine and if you are already an Office 365 shop, it is likely you can use it as part of your current licensing. In this article we are going to use Flow to iterate through a subscription table stored in SQL Server twice a day and send emails with a hyperlink to the properly filtered report.

To learn more about Flow and how to evaluate it for FREE, visit https://preview.flow.microsoft.com/en-us/

The Power BI Report

In this blog post we will use the AdventureWorksDW database and key off of the “DimSalesTerritory” table to determine which sales country should be filtered in the subscription notification. The report has two pages showing sales information. The PBIX file is available on my GitHub repository if you don’t want to create your own.

When setting up the report, add DimSalesTerritory.SalesTerritoryCountry to the Report Level Filters. Adding to the Report Level will ensure that EVERY page of the report adheres to the filtered value that comes from the subscription (described below).


Enabling Data Driven Subscriptions

To achieve this we will use url query string parameters as described here when putting the hyperlink in the email notification.

To create a custom hyperlink for each email address (to make it truly data driven), I am using a SQL Server table to store the Sales Territory each employee should see in the report. Below is the DDL for the “subscriptions” table that I am adding to the AdventureWorksDW database. However, this could easily be a SharePoint list or some other data store that can be accessed from Microsoft Flow to drive the email subscriptions

 

CREATE TABLE [dbo].[subscriptions](
[id] [int] IDENTITY(1,1) NOT NULL,
[email] [varchar](50) NOT NULL,
[groupid] [varchar](50) NOT NULL,
[reportid] [varchar](50) NOT NULL,
[filtertable] [varchar](50) NULL,
[filtercolumn] [varchar](50) NULL,
[filteroperator] [varchar](12) NULL,
[filtervalue] [varchar](50) NULL,
[modifieddt] [datetime] NULL,
[modifieduser] [varbinary](50) NULL
)
GO

 

URL Formats in Power BI

The table above will contain the email address of the recipient as well as critical sections of the Power BI URL: the group id and the report id.

There are 3 different formats of the base url; for this example, you will have to determine how your report is made available to users in Power BI.

If the report is shared directly in your personal workspace, the format is below
https://app.powerbi.com/groups/me/reports/{reportid}/ReportSection

Notice that in this scenario, the group is identified as “me”. This indicates the report is available in your personal workspace.

If the report is shared with you from an App Workspace (formerly known as Group Workspace), the format is below
https://app.powerbi.com/groups/{groupid}/reports/{reportid}/ReportSection

As described in my previous blog post Power BI Content Workflow – with Apps , the best practice however is to distribute content via an “App” and therefore the format should render as below:
https://app.powerbi.com/groups/me/apps/{groupid}/reports/{reportid}/ReportSection

In this scenario, we are using the groupid value from the SQL Server data source as the App and you can see that the group is still specified as “me”

Query String

In the guidance referenced above for including a query string, all of the above formats would have, for instance, “?filter=DimSalesTerritory/SalesTerritoryCountry eq ‘United States’” appended to set the report filter to only show sales related to the United States.
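For example, using the App format above with the placeholder ids, the final data driven url would look like this:
https://app.powerbi.com/groups/me/apps/{groupid}/reports/{reportid}/ReportSection?filter=DimSalesTerritory/SalesTerritoryCountry eq 'United States'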

SQL Server Entries

In this example, there are two entries in my subscriptions table, one for steve@angryanalytics.com that will be sent a link via email for ‘United States’ sales and another for bob@angryanalytics.com that will be sent a link for ‘France’ sales.

Setting up Microsoft Flow

In Microsoft Flow, there are two types of activities: Actions and Triggers. We will create a “Schedule” trigger in a blank flow

From the Schedule (Recurrence) trigger, select the frequency to be Daily and the proper time zone from the advanced options. Then you can select the hours at which you want the subscription to run.

To add an additional action, use the New Step button

After that select the “SQL Server – Get Rows” action and find your “subscription” table. Note: You will need to connect to your database first if you have never connected before via Microsoft Flow.

 

Magic

And this is where the magic happens if you have never used flow before… add the “Send an email” action. Flow detects the previous action and sees that there is a “Get Rows” that has been performed. It will first create a loop (Apply to each) for every row in the table and then it will show the columns available from the selected rows so that they can be used in the “Send an email” action.

The email address can be used for the “To” location. I created a static subject for “Your Power BI report is ready!”. Click “Show advanced options” to expand the selection and make sure to set “Is HTML” to “Yes”.

Finally, in the “Body” use an html tag <a href=”…”>Click Me!</a> or something more elegant of course. In the href attribute, string together the base url (in my case, “https://app.powerbi.com/groups/me/apps/”) and append the rest of the variable data from the subscriptions table to make a dynamic, data driven link.

Now Save your flow and you can click “Run Now” to test it (this will run it now regardless of the trigger schedule you created).

Result

As expected I receive an email for both steve@angryanalytics.com and bob@angryanalytics.com

Because we used a report level filter when we set up the Power BI report, both pages of the report will auto filter to the expected sales territory.

 

Now, every day at the times selected for the schedule 100s of people can get the filtered report link that would show them the most up to date information after data refreshes have occurred.

Other Actions

In this example, I used a simple email; however, you could have selected Twilio for text messaging notifications or something like SendGrid to create highly polished email content that looks much better than a simple url link. More complex workflows could also be performed by creating conditional logic or more complex filters.

Conclusion

As said, this is pretty basic stuff… but very powerful.


Working with Scatter Plots in Power BI

I really like some of the advancements that have been made in Power BI scatter plots over the last few months. I wanted to point out some capabilities you may not be using that maybe you should be.

Data Sampling Improvements

In the September 2017 release, you can now be confident that all of your outliers are being shown. No one can visually look at a plot and interpret several thousand data points at once, but you can interpret which of those points may be outliers. I decided to test this out myself between a Python scatter plot of 50k data points and Power BI.

In the test, I used a randomly generated normal distribution of 50k data points to ensure I had some outliers.

import numpy as np
import matplotlib.pyplot as plt

#Create a random dataset that has a normal distribution and then sort it (in this case, 50000 data points)
x = np.random.normal(50,25,50000)
x = np.sort(x)

#Create another dataset to put on the y axis of the scatter plot
y = np.random.normal(50,25,50000)

#plot the dataset
plt.scatter(x,y,alpha=0.3)
plt.show()

(You can see the Python notebook on my GitHub here).

Here it is in Python:

Here it is in Power BI (September desktop release)

Notice that all the outliers have been preserved. Note that in previous releases, the Power BI rendering of this would have been shown as below.

This is a great improvement. To learn more about this update, check out the official blog post on high density sampling: https://powerbi.microsoft.com/en-us/documentation/powerbi-desktop-high-density-scatter-charts/

Working with Outliers (Grouping)

Now that we know the dense sampling is preserving our outliers, we can perform some analysis on them. Power BI makes it easy to CTRL+click on multiple outliers and then right-click and add a new Group.

This will create a new field in your fields list for this group of outliers and will automatically include a group for “Other” (the other 49,993 data points that weren’t selected). Note that I renamed my field to “High Performers”.

As this is a randomly generated dataset, there are no dimensions here that may be interesting to compare, but consider that we can always come back to this grouping for further analysis, such as the bar chart below:

Clustering

You can also use “…” in the upper right of the scatter chart to automatically detect clusters. Our example again is a bit uninteresting due to it being a random normal distribution but gives you an idea of how you can cluster data that is more meaningful.

Symmetry Shading and Ratio Lines

These gems were released in the August 2017 desktop release and really help visualize the skew of your data.

Both of these can be turned on from the analytics tab.

Instead of using our sample dataset above I will use the dataset from my last blog post on Scorecards and Heatmaps.

In the below plot I took the SalesAmount field and plotted on the y axis against the SalesAmountQuota field on the x axis. From Symmetry shading we can observe that none of our sales people are meeting their quota. From the ratio line we can see the few individuals that have a positive variance to the ratio while most are flat or below the ratio.

You can read more about these two features in the August Desktop blog post: https://powerbi.microsoft.com/en-us/blog/power-bi-desktop-august-2017-feature-summary/#symmetryShading

Conclusion

These are just a few of the recently released features that I think have made the native scatter chart in Power BI a very useful visual. I have posted the PBIX file for the normal distribution data on my GitHub if you would like to download: https://github.com/realAngryAnalytics/angryanalyticsblog/tree/master/20171002-scatterplots


Performance Scorecards and Heatmaps

The new Matrix visual in Power BI is impressive and with a few tricks you can make great interactive scorecards and heat maps. At first glance, you may assume this to be easy, but you will see in the below content you have to make some deliberate decisions with formatting and also create several measures to get the job done.

I have the finalized PBIX file on my GitHub repository so use that for reference as we walk through building these visuals.

I am using the AdventureWorksDW dataset for this. I have imported the FactResellerSales table and selected the related tables. The main tables used will be FactResellerSales, DimEmployee, DimSalesTerritory, FactSalesQuota

I have created 5 base measures to be used for our scorecard. Note that I have manufactured the Customer Complaints value by using RAND. The required measures are below:

//add this as a calculated column in FactResellerSales table
Customer Complaint Indicator = IF(ROUND(RANDBETWEEN(0,1000),0)=999,1,0)

//add these measures to the FactResellerSales table
Sum of Sales Amount = sum(FactResellerSales[SalesAmount])

Total Orders = DISTINCTCOUNT(FactResellerSales[SalesOrderNumber])

Discount Percentage = IFERROR(sum(FactResellerSales[DiscountAmount]) / sum(FactResellerSales[SalesAmount]),blank())

Sum of Customer Complaints = sum(FactResellerSales[Customer Complaint Indicator])

Quota Variance = ([Sum of Sales Amount] - sum(FactSalesQuota[SalesAmountQuota]) ) / sum(FactSalesQuota[SalesAmountQuota])

Also, one more change to make things more readable: in this data set, rename EmergencyContactName in the DimEmployee table to Full Name. I noticed that this was the concatenated value of First Name and Last Name… or you could create a calculated column if you wish.

To create a scorecard as I have shown above, we need to use the “Selected Measure” trick where we will create a Measure table and use a DAX expression to populate the appropriate measure value depending on which value is selected in the table.

You can use the “Enter Data” button in the ribbon to simply enter the 5 values (and optionally an order) as shown below
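For reference, my Metrics table has a single Value column with the five metric names used in the SWITCH statement below, plus an optional Order column for sorting (the order numbers are just my choice):

Value Order
Sales Amount 1
Total Orders 2
Discount Percentage 3
Customer Complaints 4
Quota Variance 5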

Now with the Metrics table and the measures you have added above, you can create the below measure to switch between them based on a slicer or filter that has been applied

Selected Measure = IF(
    ISFILTERED(Metrics[Value]),
    SWITCH(
        FIRSTNONBLANK(Metrics[Value],1=1),
        "Sales Amount",sum(FactResellerSales[SalesAmount]),
        "Total Orders",[Total Orders],
        "Discount Percentage",[Discount Percentage],
        "Customer Complaints",sum(FactResellerSales[Customer Complaint Indicator]),
        "Quota Variance",[Quota Variance]),
    BLANK())

This snippet shows a SWITCH statement used against the FIRSTNONBLANK value only if the Metrics table ISFILTERED

You can try this out by creating a simple bar chart visual using the Selected Measure as the value and DimEmployee[Full Name] (that we renamed earlier) as the X axis. Also add Metrics[Value] as a slicer.

This simple trick allows us to get a row for each measure on our scorecard later.

Lots of Measures

The next part is tedious. Each of our 5 measures above actually requires 5 additional measures to get the attributes found in the scorecard. This means 30 total measures to show 5 rows in our scorecard.

The opportunity here however is if you are not a DAX expert, this may be a good primer for you to understand how DAX really makes almost anything possible in PowerBI. I will show the formulas for “Sales Amount” and a few nuances for other metrics after that. For all the details, refer to my GitHub repository with the PBIX file.

For our scorecard, we want to be able to show a percentile and a quartile value. This makes it easy at a glance to see how an individual (or sales region) is doing in respect to their peers. To get either of these values, we must first calculate the RANK of the sales person.

We will use RANKX to calculate the RANK of the sales person for [Sum of Sales Amount]

Sales Amount Rank =
    RANKX(
        ALL(DimEmployee),
        [Sum of Sales Amount])

Note the use of the ALL function over DimEmployee. Because filter context is applied in a table or matrix visual, when a row is shown for employee “Amy Alberts”, the relationships in the data model mean the only rows from the FactResellerSales table being evaluated for a measure are the ones for Amy Alberts’ EmployeeKey. We need to use ALL to ensure that filter context is cleared when evaluating the RANKX expression. See the below table as an illustration.

There is one more problem with our DAX however… DimEmployee has over 200 employees and not all of them are in Sales. The above DAX will calculate a rank for ALL employees instead of JUST the sales employees. One method to reduce this is to simply set this measure to BLANK() if the [Sum of Sales Amount] measure is blank. The final DAX used is below. (I used a variable for readability, but it is not necessary)

Sales Amount Rank =
var rnk = RANKX(
        ALL(DimEmployee),
        [Sum of Sales Amount])
RETURN IF(ISBLANK([Sum of Sales Amount]),BLANK(),rnk)

NOTE: simply using sum(FactResellerSales[SalesAmount]) will not work here, this is why we created a measure for it

With a Rank value, now we can calculate the “percentile” the sales person is in. This is effectively “normalizing” the ranks.

Sales Amount Percentile =
var a = countrows(
            FILTER(ALL(DimEmployee),DimEmployee[DepartmentName]="Sales"))
var b = [Sales Amount Rank]
return IF(b=0,BLANK(),IFERROR(ROUND((a-b)/a,2),blank()))

To determine the denominator, we are using a FILTER to only retrieve the employees that are in the “Sales” department.

Now with the percentile, we can determine the quartile the sales person is in.

Sales Amount Quartile = 
IF(ISBLANK([Sales Amount Percentile]),BLANK(),
    IF([Sales Amount Percentile]>=.75,1,
        IF(FactResellerSales[Sales Amount Percentile]>=.5,2,
            IF([Sales Amount Percentile]>=.25,3,
                4))))

I have used the ISBLANK() function to ensure I do not get any unwanted values showing up in Quartile 4. (Note: I had an Excel savvy client tell me I could use MOD() for this but I couldn’t get it to work, so this may be more verbose than it has to be)

Lastly, let’s get the average Sales Amount for the company as a comparison for the scorecard.

Sales Amount Avg = CALCULATE(
              AVERAGEX(DimEmployee,[Sum of Sales Amount]),
              FILTER(ALL(DimEmployee),DimEmployee[DepartmentName]="Sales"))

CALCULATE applies the second argument’s FILTERed table to the first argument’s expression. AVERAGEX will get the average value of [Sum of Sales Amount] by the DimEmployee record

Now apply the same logic to the other 4 metrics we want to put on our scorecard. A few things to note are explained below.

When using RANK, some of the metrics such as Discount Percentage (as we don’t want to give big discounts to make sales) and Customer Complaints need to be ranked in ascending order instead of the default of descending order. So in those instances, we need to add an additional parameter to the RANKX function.

Customer Complaint Rank =
var rnk = RANKX(
              FILTER(ALL(DimEmployee),
                  NOT(ISBLANK([Sum of Customer Complaints]))),
              [Sum of Customer Complaints],,ASC)
RETURN IF(ISBLANK([Sum of Sales Amount]),BLANK(),rnk)

We have added “ASC” as the 4th argument to RANKX for “Ascending” order. Notice the skipped 3rd argument (the alternate value expression). The optional 5th argument is what tells the rank function if you want DENSE ranking; we want the default (Skip), which means if 3 values are tied for the 10th ranking, the next rank will show 13 instead of 11.

Also, in the above measure, we have added a FILTER to the ALL(DimEmployee). See the below screenshot to see the RANK without this filter:

With the FILTER expression, we are further reducing the table in the first argument. Here we are checking [Sum of Customer Complaints] with ISBLANK(), then applying NOT to that. So, if [Sum of Customer Complaints] is not blank, we want to include those rows in the ranking. This is necessary when doing an ASCENDING rank because employees in the employee table that have no sales records would still get ranked. This is also true in DESCENDING order, but there it doesn’t matter: those rankings fall below the sales people, and because we apply the ISBLANK([Sum of Sales Amount]) check after the rank has occurred, they are filtered out anyway.

Just as was done with the Selected Measure before, each of these additional measures that may be displayed such as “Quartile” and “Percentile” need to have the ISFILTERED condition applied to the Metrics table:

Selected Percentile = IF(
    ISFILTERED(Metrics[Value]),
        SWITCH(
            FIRSTNONBLANK(Metrics[Value],1=1),
            "Sales Amount",[Sales Amount Percentile],
            "Total Orders",[Total Orders Percentile],
            "Discount Percentage",[Discount Percentage Percentile],
            "Customer Complaints",[Customer Complaints Percentile],
            "Quota Variance",[Quota Variance Percentile],
            BLANK()))

Repeat this with Quartile and Avg measures.

Formatting the Scorecard

Create a new matrix visual

Now select the Metrics[Value] as Rows and your Selected Measure measures as values.

On the screenshot above you will notice that I added an additional measure, Selected Measure Target. This follows the same pattern as shown above with the SWITCH statement and uses the sum of FactSalesQuota[SalesAmountQuota] as well as the 3 static values for Customer Complaints Target, Discount Percent Target, and Quota Variance Target. I left Total Orders as BLANK(). The final measures are shown below.

Customer Complaints Target = 0

Discount Percent Target = .01

Quota Variance Target = 0

Selected Measure Target = IF(
    ISFILTERED(Metrics[Value]),
        SWITCH(
            FIRSTNONBLANK(Metrics[Value],1=1),
            "Sales Amount",sum(FactSalesQuota[SalesAmountQuota]),
            "Total Orders",BLANK(),
            "Discount Percentage",[Discount Percent Target],
            "Customer Complaints",[Customer Complaints Target],
            "Quota Variance",[Quota Variance Target],
            BLANK()))

Apply a matrix style from the formatting menu in the visuals pane. I chose Bold header.

I also increase the “Text Size” property in the Column Headers, Row Headers, and Values all to 11.

With the July 2017 release of Power BI Desktop, you can now simply right click any of the values in the selected fields and rename them. This allows us to rename our row and column headers.

Now we have this

Right click on the Percentile in the selected fields pane and select conditional formatting -> data bars

In the below dialog, notice that I have changed the Minimum/Maximum to Number values 0 and 1. This is to keep the data bars from being “relative” to what is shown and will show the full span from 0-100%. Also have changed the Positive bar color to a gray so that the bar is not distracting, but rather accents the percentile.

Now right click the Quartile and select conditional formatting -> color scales

See the selections in the above screen shot.

Format blank values – I select “Don’t Format” as when a value is blank, don’t give the impression that it is actually good or bad

Minimum and Maximum – I change these values from Lowest and Highest value to a Number. This is because you want every representation of quartile 1,2,3,4 to be the exact same color. If you leave this as default, sometimes 2 will be completely red and sometimes it will be completely green

Diverging – This gives us a center color that helps make the 2nd quartile look light green and 3rd quartile be orange.

The hex values I use for these colors are 107C10, FFFE00, A80000 respectively

Now the scorecard is complete. Add a slicer for DimEmployee[Full Name] and you can see how each of your sales people is doing.

An insight taken from the above screenshot is that “Linda Mitchell” is one of our best sales people, but it may come at the expense of customer complaints and giving deeper discounts than others.

Creating the Heatmap Visual

Now we can take the skills acquired from the scorecard to create a heatmap.

Simply use a matrix visual again and set the rows to DimEmployee[Full Name] and select each individual measure you want to display as a value. Try to put all the quartiles together so there is a good heatmap effect. Include additional measures that may be valuable to sort by. Remember, the matrix visual has automatic sorting by column so this may help with analysis. I have selected the below values.

After re-applying the Quartile conditional formatting logic from above, the heatmap should look something like this.

If you have a bunch of additional employees showing, add the DimEmployee[DepartmentName] as a visual or page level filter and that will reduce your rows.

Using a Region Hierarchy

If you are still here (most of you have already thought TL;DR by now)… we can real quick add a calculated column in the DimEmployee table for “Sales Region” so that we can create a heatmap that rolls up to the sales region.

Sales Region = LOOKUPVALUE(DimSalesTerritory[SalesTerritoryRegion],
                   DimSalesTerritory[SalesTerritoryKey],
                   DimEmployee[SalesTerritoryKey])

Simply add DimEmployee[Sales Region] to your Rows above DimEmployee[Full Name]. Now you have a heatmap that can roll up to the sales region level. See below.

On the scorecard you now can see comparisons of your regions as well.

Conclusion

There are many variations of scorecards and heatmaps. I have found the above use of the new Matrix visual to be very effective for both.

Power BI Routing Visual with Two Lines of R

Objective

Although the out of box Bing and ESRI maps in Power BI can visualize most business requirements, trying to visualize routes or shipping lanes for the transportation industry can be challenging. Fortunately in Power BI we can expand our visualization palette by using either custom visuals or R visuals.

In this blog post we will look at publicly available Flight Data and determine routes that have the highest likelihood of cancellation. This could easily be translated to shipping or transportation scenarios.

You can get the final solution from my GitHub repository, or you can download the “Airports” data set from http://openflights.org/data.html and the “Flights” data set from Enigma.IO.

NOTE: Enigma.IO is a repository for public datasets. It is free but requires you to create a login. I enjoy working with the enigma-us.gov.dot.rita.trans-stats.on-time-performance.2009 dataset as it is rather large at 2.4 GB.

Although the above visual only requires two lines of R code to be written, there are two hurdles to get over first: Ensuring R and the required libraries are installed on your desktop, and doing the data prep in the query editor to create the R data frame in the format that is expected.

Installing R and ggmap

There is already well-documented guidance on installing R for use with Power BI on the Power BI documentation site. Once this installation is complete, we need to get ggmap and other supporting R libraries installed as well.

I prefer going to the RGui command line (just search for “RGui”) and running the following command:

install.packages("ggmap")

Doing this in the R console will automatically download any dependent packages as well. If you ran this line directly in a Power BI visual, it would not install the other required packages and you would get an error when you run the solution.

Data Prep

In Power BI, let’s first bring in the airports data CSV file we downloaded from http://openflights.org/data.html. The important columns in this data set are the 3-letter airport code and the latitude and longitude of the airport. You can include the other fields for more detail as I am showing below; however, they are not necessary to achieve the R visual above.

Next, import the flight data that was downloaded from Enigma.IO for 2009. This data is extremely wide and a lot of interesting data points exist, but we can simply remove a large portion of the columns that we will not be using. Scroll to the right, Shift+click, and right-click to Remove Columns that start with “div”. Alternatively, you can use the “Choose Columns” dialog to un-select them.

To reduce the number of rows we will work with, filter on the flightdate column to only retrieve the last 3 months.
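
If you prefer working in the Advanced Editor, the import and the two clean-up steps above can also be expressed in M. This is only a rough sketch under assumed names (the file path, step names, and the exact “div” column prefix come from your own import), not the code Power BI generates for you:

let
    // Import the 2009 on-time performance CSV (the file path is illustrative)
    Source = Csv.Document(File.Contents("C:\Data\on-time-performance-2009.csv"), [Delimiter = ",", Encoding = 65001]),
    PromotedHeaders = Table.PromoteHeaders(Source, [PromoteAllScalars = true]),
    // Keep every column except those whose name starts with "div"
    RemovedDivColumns = Table.SelectColumns(
        PromotedHeaders,
        List.Select(Table.ColumnNames(PromotedHeaders), each not Text.StartsWith(_, "div", Comparer.OrdinalIgnoreCase))
    ),
    // Type the flight date and keep only the last three months of 2009
    TypedDate = Table.TransformColumnTypes(RemovedDivColumns, {{"flightdate", type date}}),
    LastThreeMonths = Table.SelectRows(TypedDate, each [flightdate] >= #date(2009, 10, 1))
in
    LastThreeMonths

The Remove Columns button and the date filter in the query editor produce equivalent steps for you automatically; the sketch just shows what they amount to.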

Shaping the Data

We now need to shape the data for the R visual. To only require 2 lines of R, the data has to be in the following format:

index  Direction    Latitude  Longitude
1      Origin       41.97     -87.9
1      Destination  33.63     -84.42
2      Origin       41.73     -71.43
2      Destination  33.63     -84.42
3      Origin       35.21     -80.94
3      Destination  33.63     -84.42

We will include other columns, but this alternating-row format (first the route’s origin, then its destination, each with its latitude and longitude) is required. Having an Index column that keeps the origin and destination rows ordered correctly will also keep Power BI from making adjustments that you don’t want.

The Double Merge

Latitude and Longitude are not included in the flight data, so we need to do a MERGE from the airport data set that we have.

NOTE: Finding lat/long values for zip codes, airport codes, city names, etc. is generally the hardest part of using ggmap, which is why a second reference data set is usually required to get this data.

Perform a “Merge Queries” against the “origin” column of the flights data and the “Code” column (the 3-letter airport code from the airports data set).

Rename the newly created column “Origin” and then click the “Expand” button to select ONLY the Latitude and Longitude columns to be merged.

Now repeat the above Merge steps a second time, but instead of “origin” use the “dest” column from the flights data and merge against the same 3-letter code in the airports data. Call the new column “Dest” before expanding the Latitude and Longitude.
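
For those who like to see the M behind the dialogs, here is a minimal sketch of the double merge, assuming the two queries are named “Flights” and “Airports” and that the airports query exposes “Code”, “Latitude”, and “Longitude” columns as described above:

let
    // Join the airports reference data onto the origin airport code
    MergedOrigin = Table.NestedJoin(Flights, {"origin"}, Airports, {"Code"}, "Origin", JoinKind.LeftOuter),
    ExpandedOrigin = Table.ExpandTableColumn(MergedOrigin, "Origin", {"Latitude", "Longitude"}, {"Origin.Latitude", "Origin.Longitude"}),
    // Repeat the merge against the destination airport code
    MergedDest = Table.NestedJoin(ExpandedOrigin, {"dest"}, Airports, {"Code"}, "Dest", JoinKind.LeftOuter),
    ExpandedDest = Table.ExpandTableColumn(MergedDest, "Dest", {"Latitude", "Longitude"}, {"Dest.Latitude", "Dest.Longitude"})
in
    ExpandedDest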

Once finished your dataset should look like this:

We have successfully acquired the latitude and longitude that ggmap needs to plot the path of the line. However, we need the lat/long values for origin and destination to be on separate rows as shown above. This is where we get to use my favorite M/Power Query function: “Unpivot”.

Using Unpivot

To ensure later that the order of our rows does not get out of sync when creating the data frame for the R visual, add an index column via the “Index Column” button on the “Add Column” tab. I start at 1.

We need two columns to unpivot on in order to create two rows for each single row currently in the data set. The simplest way to achieve this is by adding two custom columns via the “Custom Column” button on the “Add Column” tab. Just fill in the expression with the string “Origin” for one column and “Destination” for the other, as shown below.

The data should now look like this:

Select both of the new custom columns (mine are called Direction 1 and Direction 2), right-click, and select “Unpivot Columns”.

Now each row has been duplicated so that we can get the Origin and Destination latitude and longitude on separate rows.

The newly created “Attribute” column should be removed, and in my example I have renamed “Value” to “Direction”.
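
Expressed in M, the index, custom column, and unpivot steps above might look roughly like this (step names are my own; the UI buttons generate equivalent steps for you):

let
    // Continue from the merged Flights query above (the name is an assumption)
    AddedIndex = Table.AddIndexColumn(Flights, "Index", 1, 1),
    // Two constant columns that become the alternating Origin/Destination rows
    AddedDirection1 = Table.AddColumn(AddedIndex, "Direction 1", each "Origin", type text),
    AddedDirection2 = Table.AddColumn(AddedDirection1, "Direction 2", each "Destination", type text),
    // Unpivot the two columns so that every flight becomes two rows
    Unpivoted = Table.Unpivot(AddedDirection2, {"Direction 1", "Direction 2"}, "Attribute", "Value"),
    RemovedAttribute = Table.RemoveColumns(Unpivoted, {"Attribute"}),
    RenamedDirection = Table.RenameColumns(RemovedAttribute, {{"Value", "Direction"}})
in
    RenamedDirection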

NOTE: In this example I am using unpivot to turn my single row into two rows. A more meaningful use of unpivot: if you have revenue data by month with each month represented as a column (Jan, Feb, Mar, etc.), you could select all the month columns and “Unpivot”, and you would now have a row for each month as the attribute and the sales amount as the value.

Conditional Columns

Once the unpivot has been completed, add two conditional columns for “Latitude” and “Longitude” via the “Conditional Column” button in the “Add Column” tab to get the Origin and Destination values into a single column for each. Use the “Direction” column as the condition: when it equals “Origin”, select the “Origin.Latitude” column; otherwise, select the “Dest.Latitude” column.

See below example for Latitude:

Be sure to change the type of the two circled buttons from Value to Column.

Repeat the above for Longitude.

Change the type of these new columns to decimal numbers.

Now remove the 4 columns of “Origin.Latitude”, “Origin.Longitude”, “Dest.Latitude”, “Dest.Longitude”.
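
A rough M sketch of the conditional columns, the type change, and the column removal (again with assumed step names rather than the generated code):

let
    // Continue from the unpivoted Flights query above (names are assumptions)
    AddedLatitude = Table.AddColumn(Flights, "Latitude",
        each if [Direction] = "Origin" then [Origin.Latitude] else [Dest.Latitude]),
    AddedLongitude = Table.AddColumn(AddedLatitude, "Longitude",
        each if [Direction] = "Origin" then [Origin.Longitude] else [Dest.Longitude]),
    // Type the new columns as decimal numbers
    ChangedTypes = Table.TransformColumnTypes(AddedLongitude, {{"Latitude", type number}, {"Longitude", type number}}),
    // The merged Origin/Dest latitude and longitude columns are no longer needed
    RemovedOriginals = Table.RemoveColumns(ChangedTypes, {"Origin.Latitude", "Origin.Longitude", "Dest.Latitude", "Dest.Longitude"})
in
    RemovedOriginals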

The last 4 columns of the Flights data set should now look like this:

Data prep is complete. We can now close and Apply.

The data load will take a long time if you used the 2.4 GB file from Enigma.IO… Time for coffee 🙂

Creating the Viz

As we work with this data set, remember that we now have two rows representing each flight. So if you want to get any counts or summarizations, always divide by 2 in your measures.

Number of Flights = COUNTROWS(Flights) / 2
Number of Delayed Flights = sum(Flights[depdel15]) / 2
Number of Cancelled Flights = sum(Flights[cancelled]) / 2

With these 3 measures, we can start working on visualizations. First, simply add a few card visuals and a matrix by origin/destination.

We have 1.6 million flights in our data set. Creating an R visual to represent all of these routes will not be very productive and will probably exhaust your memory anyway. Let’s set up a filter by origincityname and only plot the routes of the cancelled flights.

For the above visual, we first should add origincityname as a slicer and select “Atlanta, GA”. Then add a slicer for cancelled = 1.

To create the R visual, select the “R” visualizations icon

Pull in the following values from the “Flights” data

This automatically generates some R code in the R script editor for the Visual

NOTE: This is really important, as we have eliminated potentially hundreds of lines of R code to “prep” the data frame into the shape the ggmap function expects. This was all done via the Query Editor transformations we made above.

Now we simply add the 2 lines of R code underneath the generated R script section

# This function simply imports the required library (ggmap) 
# to be used in the Power BI R visual
library(ggmap)

# qmap is a wrapper for ggmap and get_map. 
# The first parameter is the locale that you want to be mapped. second is zoom level. 
# Then geom_path is used to give the details of what data is to be plotted.
qmap("united states", zoom = 4) + geom_path(aes(x = Longitude, y = Latitude), size = .1, data = dataset, colour="red", lineend = "round")

NOTE: The comments make it longer than two lines, but they help describe what is happening.

The geom_path function is part of the ggplot2 library and is further explained here: http://docs.ggplot2.org/current/geom_path.html

Hopefully from this example you can see that the R code is fairly minimal in filling the requirement of routing visualization.

Measures vs Columns for this R Visual

One limitation that currently exists in Power BI Desktop is that the measures we defined earlier do not really provide value to the visualization, because we need to include the “Index” column to keep the dataset ordered as ggmap expects in order to plot the route from the alternating rows of data.

Because the Index column is required to keep the sort order, the filter context applied to the DAX measure “Number of Cancelled Flights” means it will always equal 1. This does not allow us to do much “Business Intelligence” on the data set.

EARLIER function

To this day, I am still not fully sure why they call this function EARLIER, but to introduce some actual “Business Intelligence” into this R visual we need to create a column with the total number of cancelled flights for a given route. This column’s value is repeated over and over, so be careful how you utilize it. However, it gives us a great way to retrieve ONLY the data that we want.

For the “Flights” data set, add the following column to create a unique value for each Route that exists between airports:

Unique Route ID = CONCATENATE(Flights[originairportid],Flights[destairportid])

Once that value is added, the EARLIER function can be applied to get the total number of cancellations:

Total Cancelled for Route =
    SUMX(
        FILTER('Flights', EARLIER('Flights'[Unique Route ID]) = 'Flights'[Unique Route ID]),
        'Flights'[cancelled]
    ) / 2

The above value is repeated in every row, so don’t use it as something to be summarized. Use it as a page-level filter or a slicer to only retrieve data that meets a requirement (for example: only show routes that have more than 25 cancellations).

Make sure your slicers are cleared and your new plot should look something like this:

Now the solution is starting to become usable to gain some insights into the data. You can download my finished solution from the github repository and see that I have duplicated the “Airports” data set and have one for Origin and one for Destination that I can use as a slicer to more quickly find routes that have frequent cancellations from/to each city or state.

Conclusion

This is just an example of the many ways Power BI can be extended to help solve business problems through visualizations. One note to make is that ggmap is not yet supported by PowerBI.com. This specific R visual is only available in the desktop but many other R visuals are supported in the service as well.

And for my next article, we will see if I am brave enough to post my real opinions on the internet about Big Data and data warehousing architectures. I have lots to “rant” about but we will see if I actually post anything 🙂

Dynamic RLS via Hierarchy in Power BI

I recently was working with a client and we were trying to implement row level security for a sales territory hierarchy. I am posting this blog because it wasn’t quite as intuitive as I would have thought.

hierarchy

For this blog post, I will work with the very simple AdventureWorksDW2012 (download here) DimSalesTerritory table. This sales territory dimension only has 11 rows but imagine if the hierarchy breakdown had 10k or more. I will explain a couple of iterations I took on the problem.

02-DimSalesTerritory

This blog post will be more “terse” than usual. So some of you may be very happy to not have to endure my hyperbole on the topic at hand 🙂

One mention I will make is that if you are trying to implement Dynamic Security, the “Securing the Tabular BI Semantic Model” white paper provides an in-depth look at the patterns being shown below.

GitHub Repo

If you want the two PBIX files I use below to more easily follow this solution, please find them on my GitHub repo

https://github.com/realAngryAnalytics/angryanalyticsblog/tree/master/DynamicRLSwithHierarchy

Simple RLS Example

So, the issue we were trying to solve was the fact that we had a table that showed the employee relationship to their respective Sales Territory. This is a quite simple problem to solve if there is a 1 to 1 relationship between employee and Sales Territory. If an employee can only access Sales Territories they are associated with, then the “security” table would look like this:
03-Security

For this table I simply used a calculated table derived from the AdventureWorksDW DimEmployee table. I purposely excluded all the records that had a Sales Territory of N/A. See the DAX for this table below.

Security Table = SUMMARIZECOLUMNS('DimEmployee'[SalesTerritoryKey], 
                 DimEmployee[EmployeeKey], DimEmployee[EmailAddress], 
                 FILTER(DimEmployee,DimEmployee[SalesTerritoryKey]<>11))

A quick look at the data model

04-datamodel1

Now, simply go to your “Manage Roles” option on the modeling tab of Power BI Desktop:

05-manageroles_nav

…and for the DimSalesTerritory table enter the following DAX in the Table Filter DAX Expression

DimSalesTerritory[SalesTerritoryKey] = CALCULATE(
          VALUES('Security Table'[SalesTerritoryKey]),
          SUMMARIZE(
            'Security Table',
            'Security Table'[EmployeeKey]
          ),'Security Table'[EmailAddress]=USERPRINCIPALNAME()
         )

Name the above role as “Basic Security”.

The above pattern is easy to pick out in the white paper mentioned above. The USERPRINCIPALNAME() expression ensures that DAX picks up the email address of the individual logged into powerbi.com

When we select “View as Role”, check the “Other user” box to use a user other than who is logged into the machine (because your email address is likely not in the AdventureWorksDW sample database). Enter “david8@adventure-works.com”

06-view_as_role_dialog

When back on DimSalesTerritory in the table view, we can now see that the only territory allowed in this view is the Northwest United States

07-DimSalesTerritory_filtered

We can also review some simple visuals including a geography hierarchy bar chart, total orders over time line chart and a card. (This is all using the FactResellerSales data)

08-visuals_filtered_01

When we remove the Role filter, we can see all of the geographies re-appear

09-Visuals_unfiltered01

Employee Hierarchy RLS example

The above is a pretty straightforward dynamic row level security example.

Now let’s assume that all the employees in the United States territories report to a manager; let’s call him Bob. Bob should be able to see all the records for the United States. This includes the Northwest (1), Northeast (2), Central (3), Southwest (4), and Southeast (5) territories.

If we do a simple right click “Copy Table” on the Security Table to Excel, we can add the necessary rows for Bob to have the access he needs to see all rows for the United States.

10-excel-security

What is the problem with this approach?

For our simple AdventureWorks database, this is fine because of the small number of records. If we consider a real world retail store example, each sales territory may be broken down into 30 districts, and each district may have 20 stores. We may have 20k employees as well. So our Hierarchy would look like this:

Country -> Region -> District -> Store

6 Countries -> 10 Regions -> 300 Districts -> 6000 stores

The CEO of the company would have to have 6000 entries in the security table, a region manager would have to have 600 entries, and so on. This table starts to get extremely bloated. In my real-life client example we estimated that the security table would have between 10 and 50 million records in it.

A better approach

Ideally, the CEO of the company should only need 6 records, one for each country at the highest level of the hierarchy. In our example above, Bob, who should have access to everything in the United States, would only need 1 record for the United States. This greatly collapses our bloated security table: even if the company has 20k employees and you include some multiple-entry managers (someone who manages 2 districts, or 2 or 3 stores), you would only have around 25k records.

A security table would now look like this:

11-excel-security-2

We have added a column for CountryKey (100 represents the United States) and we only have one record for Bob instead of 5.

Applying this Solution

keepcalm

A few things need to be done to our PBIX file to implement this. First, we need an additional dimension table for each additional hierarchy level added to our security table above. I have replaced the calculated Security Table with an RLS_tables.xlsx spreadsheet that I have included in the GitHub resources above; it also contains the new Country table on a second sheet.

13-excel-country

Note: We would need a sheet/dimension for District and Store if we were to implement a larger hierarchy as I discussed above

To make this new method work, the FACT table (in this case FactResellerSales) needs to include a column for each level in the hierarchy that may be used to filter. Therefore, CountryKey has to be added to the FactResellerSales table. This is best done in your underlying database, but to keep this entire solution in Power BI Desktop, I have added a cross-reference table to the RLS_tables.xlsx spreadsheet that links SalesTerritoryKey and CountryKey.

14-excel-xCountrySalesTerritory

Pay attention to the new data model. Note that the relationship from DimSalesTerritory to the Security table has been removed. This change will be discussed a little later.

15-datamodel2

To get the CountryKey to exist in the FactResellerSales table, add the following calculated column using LOOKUPVALUE() function

CountryKey = LOOKUPVALUE(xCountrySalesTerritory[CountryKey],xCountrySalesTerritory[SalesTerritoryKey],[SalesTerritoryKey])

With CountryKey in the FACT table, now create a single-direction relationship from the Country table to FactResellerSales via the newly added column. You would repeat this process for additional dimensions such as District and Store in a real-world scenario.

16-datamodel-CountryKey

Handling the Dynamic RLS

For this most important part, I have to give credit to Kasper De Jonge (@Kjonge) for helping me. Removing the relationship between the Security table and DimSalesTerritory was key. We now have two paths to securing the FACT table, and trying to navigate active/inactive relationships got too messy for me to resolve. This is where using LOOKUPVALUE() and MULTIPLE active ROLES at the same time worked.

In Manage Roles, create a new role named “Country Security” and use the following DAX.

Country[CountryKey] = LOOKUPVALUE(
      'Security'[CountryKey],
      'Security'[EmailAddress],USERPRINCIPALNAME(),
      'Security'[CountryKey],Country[CountryKey]
     )

 

The LOOKUPVALUE() function searches the email addresses in the security table for the currently logged-in person and finds which CountryKey they have access to.

Create another Role named Sales Territory Security and use the following DAX.

DimSalesTerritory[SalesTerritoryKey] = LOOKUPVALUE(
           'Security'[SalesTerritoryKey],
           'Security'[EmailAddress],USERPRINCIPALNAME(),
           'Security'[SalesTerritoryKey],DimSalesTerritory[SalesTerritoryKey]
         )

 

Quite similarly, this finds the SalesTerritoryKey that the person has access to.

Now when we view the report as Bob, we are seeing all of the data for the United States with a single CountryKey in the security table instead of having to have a row for every territory.

17-viewasrole-02

18-visuals-filtered-final01

What makes this especially powerful is that having multiple roles (one for Country and one for DimSalesTerritory) creates an “OR” condition between them. I am going to modify the Security table in my Excel spreadsheet to also give Bob the ability to see data for the United Kingdom sales territory.

19-excel-security-option2

Now Bob sees his Country information plus an additional sales territory.

20-visuals-filtered-final02

Finishing Up

To round this out, we probably want to hide the Security table from being viewed in the report and also add a FALSE() DAX expression on the Security table in the two roles that we created, so that if someone were to do an “Analyze in Excel” there would be no exposure to the table (see the sketch after the screenshots below).

21-hidetables

22-manageroles-end
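
For reference, the table filter DAX expression to add on the Security table in both roles is just a filter that never returns rows; a minimal sketch:

-- Table filter DAX expression for the 'Security' table in both the
-- "Country Security" and "Sales Territory Security" roles
FALSE()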

 

Conclusion

That is it! We have just implemented a hierarchical security structure with dynamic RLS in Power BI. I didn’t feel that using a hierarchy in dynamic security was well covered in other posts, so I hope this helps.

On a side note, if anyone has a good tutorial on how to better format WordPress sites (I feel like I am SHOUTING with this font) and restrict the numerous spam comments I get, please tweet me at @angryanalytics 🙂

 

 

Map Visual – How to deal with large datasets

 

The Power BI bubble map is very useful when plotting geographic points rather than shapes or areas. I’d say that in 90% of cases the out-of-the-box functionality is fine for representing your data. Occasionally, however, you will run into scenarios where you must plot more than a few thousand data points; you will exceed the upper bound and get the little information icon shown below.

20160801-004

001

What is frustrating about this when it comes to maps is that it does not take a representative sample of the data and instead will just “cut off” the data points. Other visualizations will show an information icon with this message instead:

002

There are 3 techniques that I have used to deal with this issue.

  • Creating a representative sample via DAX
  • Using R script with ggmap
  • Using a location hierarchy

All of these have pros and cons.

Getting a Representative Sample via DAX

The best way to illustrate the native map behavior and how we solve for it is by using an embedded Power BI report. I am going to use some anonymized data from a logistics company that represents cell phone signal strength across one of their routes. The route in this example starts in the Chicago, IL area (United States). See the first tab in the report below.

https://app.powerbi.com/view?r=eyJrIjoiZDE2YzUzNmEtMmVlYy00ZTBhLTk0YjUtNmZlNjQzYjkwNTY3IiwidCI6ImQ3MDVhYmIxLTU5OWMtNDA3MC04ZTIwLTY2MGMzMmZiNWYxZCIsImMiOjZ9

Note: Use the “full screen” button in the lower right to expand

In the 4 visuals above, notice that the 1st and 3rd only plot a tiny portion of the latitude/longitude. This is confirmed if you hover over the highest point: the latitude is much less than the max latitude shown in the multi-line card visual in the upper right. You can also just “zoom out” on these and see that no points are plotted anywhere near Chicago, which is where this route starts.

I included the 2nd visual because I found it interesting that when I used a value in the SIZE field that may be greater than 1, it seemed to plot more evenly across the entire span of the latitude; however, I still would not consider this “representative”.

The 4th visual is simply a scatter plot of the latitude/longitude. Note that when hovering over the “I” icon in the upper left it states that it is a representative sample. This is what we want to achieve in the bubble map, which does not do this natively. In this scenario, where we have a continuous latitude/longitude plot, this is a good solution.

 

Now navigate to the second page of the report. Here you will find the map visual always giving the representative sample regardless of how many dates you select in the slicer. I have solved the real business problem here by adding the cellphone signal strength as the legend across the route.

Compare the min and max latitudes in the card visual with hovering over the highest and lowest plotted point on the map. You should notice them to be consistent and the points in between are filled nicely.

“If there is an easier way to do this than what I am about to show, please share with me.”

I worked through several iterations before arriving at the conclusion that using RAND() (or RANDBETWEEN(), actually) was the magic I needed. This is the random number generator expression, for those of you that don’t know. And for those of you that do know, you are probably questioning my knowledge a little bit at this point 🙂 but this was truly the only way I got this to work without having to make too many assumptions about the data.

Let’s get into the DAX.

It is important to know how you plan to slice the data in your map visual. In my case, I want to be able to select a date or range of dates to review the cellphone signal strength and how it may vary over time. We can simply get the count of rows in the table with a formula like this:

Total Signals (WRONG) = countrows('Carrier Signal Strength')

I have appended “WRONG” to the measure name because we run into an issue of CONTEXT on the map and we have to do a context transition (as described by Alberto Ferrari here). In the correct formula below, we need to include ALL rows of the table, except for the rows we are filtering out with the date slicer.

Total Signals (RIGHT) = CALCULATE(countrows('Carrier Signal Strength'),
	ALLEXCEPT('Carrier Signal Strength','Carrier Signal Strength'[Signal Date]))

Notice on the second page of my embedded report above, I have listed a couple of the measures in my card visual that are the wrong ones. Apply a date filter and then select a single point in the map. Notice how “Total Signals (WRONG)” changes to 1. This is because you applied filter context to the measure when you selected that single point on the map. We have to use this measure to get a representative sample, but it has to ignore the map’s filter context, which is why in the RIGHT measure above we have done the context transition to get ALL of the rows except for those being filtered by the date.

Now we need to apply a sample rate to the selected rows. The sample rate will be dynamic depending on how many rows are selected. We start to exceed the bounds of the map visual at around 3000 points, so I play it safe below and use 2500 as my denominator. This is because the later calculation will not exactly match this and we may end up with more than 2500 rows.

Sample Rate (RIGHT) = CEILING(DIVIDE([Total Signals (RIGHT)],2500),1)

The CEILING function just rounds the decimal result up to the nearest integer, as I have specified “1” as the second argument.

On page 2 of the report above, you can see how the sample rate changes as the total signals increase with multiple dates selected in the slicer.

As the next building block in the representative sample formula, we will pull out the random number generator to give a random number between 1 and the dynamic sample rate that was calculated above.

Rand (RIGHT) = RANDBETWEEN(1,[Sample Rate (RIGHT)])

We will use the number generated from this measure and compare it to our Sample Rate. When they are equal we will plot the point.

stopjpg

Before you call me crazy and question my judgement… yes, I know that RAND or RANDBETWEEN does not guarantee me a consistent sampling of data. If my sample rate is 100, it may take 150 or even 200 tries before my measure above equals the sample rate. But it may also only take 5 tries. I look at the use of RAND() as a “better than the alternative” approach, as it gives me the opportunity to get a representative sample of my data versus getting a cut-off and unusable data set.

In a prior attempt, I used the MOD function and some other DAX acrobatics to try to ensure a consistent representation of the data, but in each attempt I was defeated by the filter context issue of the lat/long on the map. This is where I would humbly welcome feedback from any DAX ninjas if there is a better way.

The reason that using the RANDBETWEEN() function works is that it re-evaluates for every single point on the map trying to be plotted.

Below is the measure for the representative sample. Use this in the SIZE field.

Representative Sample (RIGHT) = 
if([Rand (RIGHT)]=[Sample Rate (RIGHT)],
	[Total Signals (RIGHT)],
	BLANK())

As it is possible I could have multiple rows being counted for the same lat/long position, I add some additional DAX to ensure I only have a size of 1 or BLANK():

Representative Sample (RIGHT) = 
var plot = if([Total Signals (RIGHT)]>=1,1,BLANK())
RETURN if([Rand (RIGHT)]=[Sample Rate (RIGHT)],plot,BLANK())

The result below is a nice smooth representative sample without too much DAX needing to be written.

20160801-003

Pros for using this approach:

  • Easy… low difficulty level
  • Native Bing map means it is zoom-able… in my use case, it is important to be able to zoom in on the areas that have bad signal coverage
  • Great for big datasets… I believe all the DAX used above is available when doing direct query

Cons

  • Using RAND does not ensure a consistent sampling
  • May not be well suited for geographically dispersed data (see the hierarchical approach below)

Using R script with ggmap

Another way to approach the sampling problem is with an R visual. We can create an R Visual that uses ggmap to plot the data points for our route.

I drag in the LATITUDE, LONGITUDE, Max_Signal_Rate, and the Total Signals measure (to avoid pre-sampling the data, reset this measure to just use the COUNTROWS function)

Total Signals = countrows('Carrier Signal Strength')

Here is the R script that then needs to be put in the R Script Editor

#Pull in the ggmap library
library(ggmap)

#Specify the google map area to use with a zoom factor. 
#Unfortunately this will create a static map that is not zoom-able
#I have selected "Paris, il" as it appears to be near the center of my route. 
#You can test the text to use for the location attribute by going to https://www.google.com/maps
Illinois<- get_map(location = 'paris, il', zoom = 7)

#Use a sample if your dataset starts to get too large. 
#The R visual in Power BI can accept more than 3000 data points but there still appears to be an upper bound 
#I have not found what that limit is yet
datasample <- dataset[sample(nrow(dataset), 5000), ]

#Create a theme variable to remove the X and Y axes as well as the legend from the map plot
theme <- theme(axis.title.y = element_blank(), axis.title.x = element_blank(), legend.title= element_blank(), 
legend.key = element_blank(), legend.position = "none")

#Perform the plot with ggmap by using our map of Illinois and adding the geom_point plot behavior
#    as well as the theme we just defined
#Replace "data = dataset" with "data = datasample" if you determine a sample needs to be used
ggmap(Illinois, extent = "device") + geom_point(aes(x = LONGITUDE, y = LATITUDE, color=Max_Strength_Rate),
    alpha = 0.1, size = 2, data = dataset) + theme

To use the above code, you will need to be sure you have installed R on your machine and that you have also installed the ggmap and ggplot2 packages.

The above code will produce a plot shown below and can still be filtered with a date slicer.

20160801-005

As I am still learning R, I didn’t take the time to assign a specific coloring scheme for the plotted points, so the above just took the default.

There are a couple of obvious CONS to this approach

  1. You have to know R. It is not nearly as easy as just using native Power BI visuals and a little bit of DAX
  2. The map is not zoom-able. It is completely static, which keeps me from solving my business problem of zooming in on the spots where signal strength is low. For my use case this does not work; however, in another use case the zoom may not be an issue.

A couple of PROS for this approach

  1. I can plot more data points with ggmap. It appears there is still a data limit, but it is larger than the out-of-the-box map. I have not yet determined what that limit is; I believe it has more to do with Power BI R visuals than with the R ggmap functionality.
  2. With the ability to “code your own visual”, there is much more control and more opportunity to create a custom experience. Instead of using the geom_point function we could use geom_tile for a more “blocky” heat-map-style plot as shown in this tutorial, or even geom_path, which would allow more of a point-A-to-point-B type of map if you didn’t need to plot information along the way as shown in my example.

Using a Location Hierarchy

This to me seems like an obvious approach when dealing with location data that has a natural hierarchy such as Country->State/Province or Country->City.

So instead of having something like this

20160801-006

by using a hierarchy in the “Location” field you can right click and “drill down” on a parent geography to see the details

20160801-007

20160801-008

This is a very clean way to keep your data points under the threshold of around 3000 without having to write any DAX or R. The biggest problem, however, is that it is fully dependent on the location hierarchy providing enough of a split to keep the lowest level of detail from going over the threshold.

 

These are 3 different ways of dealing with too much data trying to be plotted on your maps. Again, for the representative sample pattern I showed, I am interested in any thoughts on better approaches. Leave them in the comments or ping me on Twitter at @angryanalytics.

90 Day Power BI Implementation Plan

Power BI Turns One Year Old Today!
Before I start with my normal ramblings, let’s take a quick moment to say “Happy Birthday” to Power BI… It officially became generally available on July 24th, 2015. After one year, I can say from experience the momentum has been huge and this product has not disappointed. Below is a little happy birthday tribute video.

Power BI Implementation Plan

Every organization should have an implementation plan when deciding to move from departmental use of Power BI to a full enterprise deployment. I have put together a 90 day implementation outline below that emphasizes quickly getting a pilot set of users on Power BI so that you can obtain real feedback early, before exposing everyone in your organization to issues that could have been avoided by learning from a smaller deployment.

I have not included all important details below but this is meant to be an outline that can be tweaked to meet your organization’s needs. Again, I would like to reference the Power BI Governance Whitepaper here as it helps articulate at a detailed level things that are not included below.

In the below plan I have listed the use of an “enterprise data mart” as optional. I am a strong advocate for early adoption of SSAS Tabular models, and I believe that from day 1 they should be considered for the data you already know should be approached as an IT-managed solution rather than a business-managed (or self-service) solution. However, there are other approaches, and depending on your underlying data strategy, these data marts might not be necessary.

 

First 30 days

  • Identify group of users for Power BI Pilot
  • Hold initial security meeting with security stakeholders to identify any foreseen issues – Previous blog post on content to consider
  • Initial short term policies are defined for High Business Impact (HBI)/Low Business Impact (LBI) data and Power BI usage
  • Training material is verified and initial pilot group is trained on Power BI at end of 30 day period
  • Active Directory Groups structure considered for future Row Level Security (RLS) requirements
  • Optional: Initial Tabular model for enterprise data is defined and shared with Pilot user group as first data mart external to the self service capabilities
  • High Level Design started to support long term BI strategy
    • This would be the underlying data strategy to consider various data storage options such as Hadoop, a large data warehouse, or simply relying on targeted data marts (such as SSAS Tabular models) to be used once self-service models mature into enterprise wide models

30-60 days

  • Power BI Pilot for the targeted users after training has taken place
    • This would be mostly self-service with some ability to use the initial enterprise data mart if one was implemented
  • Security requirements defined for long term Power BI implementation
    • Any gaps are addressed with additional policies or procedures
  • Detail design started to support long term BI strategy
  • Initial feedback from Power BI pilot group is collected and considered for Power BI implementation plan
  • Power BI implementation plan defined to support rest of organization
  • Define success criteria for the overall enterprise roll out.
    • This may be number of active users, number of reports, number of content packs

60-90 days

  • Optional: Refined enterprise data mart completed with Row Level Security considerations
  • Active Directory structure to support the organizational rollout of Power BI completed
  • Training begins for rest of organization on Power BI
  • Implementation of long-term data strategy underway, with a phased approach for replacing “self-service” models with enterprise models as needed
  • Optional: High-level requirements for Power BI Embedded defined if external user access is needed (such as for customers or partners)

Beyond first 90 days, things to consider

  • Majority of “knowledge workers” within the organization are taking advantage of Power BI to help run the business
  • Optional: Power BI Embedded development/pilot underway
  • Power BI continuous learning efforts through user groups / lunch and learns are established and underway
  • Evaluate (and continuously evaluate and update) the success criteria defined to see if they are being met