How to Use Dataflows to Enhance your Power BI Datasets

I’m a big fan of Power BI dataflows! In this article, I’ll show you why, and you will be able to reproduce the steps I’ve taken if you have sufficient Project Online permissions (Portfolio Viewer permissions at a minimum).

The Need

If you work with Project Online, you have most likely considered reporting on your project content at some point. In Project Online, data can be extracted using OData feeds. The downside of this connection is that it is slow, and it can be frustrating and difficult to maintain alongside a dataset.

Enter Dataflows

As described in this Microsoft article, dataflows are designed for “reusable transformation logic.” In other words, they help you do something you would otherwise do manually over and over again for multiple reports. If you are strict on access policies, dataflows also provide a valid “single source of truth” by limiting access to the source material and promoting the use of the dataflows as that source.

If you are already exploring premium features and Azure Data Lake, it’s nice to know that dataflows can be exposed in the data lake and that you will get additional computational power when using the premium feature (described in detail here: “Enhanced compute engine”).

My first experience with dataflows was a very pleasant one. I needed to work on a time-phased report using timesheet time-phased data, task assignment information, and resource capacity information. If you’ve worked with OData on a large enough source before, you know that this would take forever to load into Power BI.

Figure 1: Loading thousands of rows and waiting a long time due to OData connections

When I moved my most basic data retrieval processes to dataflows, the file was smaller and it loaded much, much faster.

How to Configure Dataflows for Project Online

For the sake of illustration, we will extract the following tables and run them as dataflows on the Power BI service.

  • TaskTimephasedDataSet
  • Tasks
  • Projects

The TaskTimephasedDataSet will be the largest, because it contains a single row for each task for each day. OData can be accessed through the URL format below:

https://<your environment>.sharepoint.com/sites/pwa/_api/ProjectData

This creates a connection to all the data tables, which can then be accessed for reporting.
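To make the URL format concrete, below is a minimal Power Query (M) sketch of what connecting to the feed and navigating to one of its tables could look like. The environment placeholder and the Projects table used here are only illustrative; the dataflow wizard described later generates an equivalent query for you.

let
    // Placeholder URL: replace <your environment> with your own tenant name
    Source = OData.Feed("https://<your environment>.sharepoint.com/sites/pwa/_api/ProjectData"),
    // Navigate to one of the data tables, Projects in this example
    Projects = Source{[Name = "Projects"]}[Data]
in
    Projects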

Dataflows have some limitations on complex queries, but they are well suited for the initial data load and the alterations I will show you now.

Using Dataflows in Power BI Datasets

To create a dataflow, we start from the PowerBI.com homepage. Ideally, you would create a specific workspace to monitor and administer your dataflows. As you will see, performing the following actions with a service account makes perfect sense.

Within the workspace, you will find a tab called Datasets + dataflows, and this is where we will store our flows.

Figure 2: The workspace tab, Datasets + dataflows

When we press the New button on top, we start the wizard, which walks us through adding a new dataflow.

Figure 3: There are four available options to start a dataflow. We will choose “Define new tables.”

The four options available tell us that there are more ways to build dataflows than just defining new tables. The other three build upon previously created sources and can be useful when extending those datasets, for instance to reduce duplication (linked tables).

We will choose “Define new tables” in this example.

From this point onward, we are presented with a menu structure that is quite similar to the Power Query interface in Power BI Desktop, and with good reason: this is Power Query, running in the browser.

Figure 4: Find the OData source on the second row from the bottom.

After selecting the OData source, we will need to authenticate. Make sure the URL is correct and that you access the source with an organizational account; you will be prompted to sign in. Click Next (bottom right of the page) to move to the next step.

Figure 5: Connect to the OData data source with dataflows.

Continuing on, we are presented with all the tables in the OData feed. Here, we will select the three mentioned above. When you select a table, you will see a sample of the data appear in the middle section. This can be useful for inspecting the data before the actual data load action is performed.

Figure 6: Gather the tables that you need for the report.

The next step is to “Transform data,” which is obviously the most interesting part of the dataflow. Here, we will limit the table content and make some small transformations. The page will look very familiar, again, to anyone who is used to transforming data in Power BI Desktop.

Figure 7: Transform data is very similar to Power BI Desktop.

For the Projects table, I will keep the columns ProjectId, EnterpriseProjectTypeName, ProjectName, and ProjectOwnerName. I will filter out the “Timesheet Administrative Work Items” project.
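As a rough sketch (assuming the column and project names listed above), the resulting query for the Projects table could look something like this in M:

let
    Source = OData.Feed("https://<your environment>.sharepoint.com/sites/pwa/_api/ProjectData"),
    Projects = Source{[Name = "Projects"]}[Data],
    // Keep only the columns needed for reporting
    KeptColumns = Table.SelectColumns(Projects, {"ProjectId", "EnterpriseProjectTypeName", "ProjectName", "ProjectOwnerName"}),
    // Remove the administrative time project
    Filtered = Table.SelectRows(KeptColumns, each [ProjectName] <> "Timesheet Administrative Work Items")
in
    Filtered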

For the Tasks table, I will keep the columns ProjectId, TaskId, TaskName, TaskIsSummary, TaskIsProjectSummary, and TaskIsMilestone. I will filter out any milestones and summaries.

Figure 8: Filtering keeps only “real” tasks.
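A comparable sketch for the Tasks table, again assuming the columns mentioned above, might look like this:

let
    Source = OData.Feed("https://<your environment>.sharepoint.com/sites/pwa/_api/ProjectData"),
    Tasks = Source{[Name = "Tasks"]}[Data],
    // Keep only the columns needed for reporting
    KeptColumns = Table.SelectColumns(Tasks, {"ProjectId", "TaskId", "TaskName", "TaskIsSummary", "TaskIsProjectSummary", "TaskIsMilestone"}),
    // Keep only "real" tasks: no milestones and no (project) summaries
    Filtered = Table.SelectRows(KeptColumns, each [TaskIsMilestone] = false and [TaskIsSummary] = false)
in
    Filtered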

Now that I have two dimension tables, I can work on my fact table (in case these terms are new to you, here is a very nice video explaining the differences). Basically, the dimension tables are your “slice and dice” tables, whereas the fact table defines “what is being sliced and diced.” In our example, the thing that is being sliced is work hours. We can filter the data based on task names and/or project names, owners, types, etc.

The TaskTimephasedDataSet will contain the columns ProjectId, TaskId, TimeByDay, TaskActualWork, TaskWork, and TaskIsProjectSummary. We will keep only the rows that aren’t project summaries and that contain more than 0 TaskWork. As a last step, we will change the TimeByDay column to contain only the date, not the date and time.

Figure 9: Transforming the data in dataflows is easy.
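For completeness, here is a hedged M sketch of the fact table query, covering the column selection, the row filters, and the date conversion described above:

let
    Source = OData.Feed("https://<your environment>.sharepoint.com/sites/pwa/_api/ProjectData"),
    Timephased = Source{[Name = "TaskTimephasedDataSet"]}[Data],
    // Keep only the columns needed for reporting
    KeptColumns = Table.SelectColumns(Timephased, {"ProjectId", "TaskId", "TimeByDay", "TaskActualWork", "TaskWork", "TaskIsProjectSummary"}),
    // Keep only rows that are not project summaries and that contain work
    Filtered = Table.SelectRows(KeptColumns, each [TaskIsProjectSummary] = false and [TaskWork] > 0),
    // TimeByDay arrives as a datetime; keep only the date part
    DateOnly = Table.TransformColumns(Filtered, {{"TimeByDay", DateTime.Date, type date}})
in
    DateOnly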

After these changes, I’m done with my dataflow. I’ll save and close the Power Query wizard.

Figure 10: Provide a name and useful description to your new dataflow.

After saving the dataflow, you will get a reminder about the refresh options. Keep your data up to date by setting a refresh frequency. For Project Online, I believe the data should be refreshed daily, or even twice a day.

Figure 11: Remember to keep the data up to date.

In our data table list, we have four actions now available. These are “Edit table,” “Apply ML Model,” “Properties,” and “Incremental refresh.” Be aware that both Apply ML Model and Incremental refresh might require additional resources and licenses.

Figure 12: Available actions in dataflow tables

With these steps, we’ve created a “Staging dataflow,” and for our purposes, this will be our source for further transformations in Power BI Desktop.

The next steps will be done in Power BI Desktop, and our first action will be to load our dataflow tables.

Figure 13: Utilize Get data > Power BI dataflows to grab our tables.

Select all the tables that were just created.

Figure 14: You might have more tables available.
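Behind the scenes, Power BI Desktop generates navigation code along these lines for the dataflows connector. This is only a hedged sketch: the GUID placeholders and the Projects entity are illustrative, and the generated navigation keys may differ slightly in your version.

let
    Source = PowerBI.Dataflows(null),
    // Navigate to the workspace and dataflow (the GUIDs here are placeholders)
    Workspace = Source{[workspaceId = "<workspace guid>"]}[Data],
    Dataflow = Workspace{[dataflowId = "<dataflow guid>"]}[Data],
    // Pick one of the dataflow tables, Projects in this example
    Projects = Dataflow{[entity = "Projects", version = ""]}[Data]
in
    Projects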

Once the data is available in the Power BI file, be sure to build the correct relationships between the tables.

Figure 15: Note the relations between the dimensional and fact tables.

After the relations have been set, we have our model complete and can start reporting on it.

Figure 16: A simple table containing EPT, ProjectName, time, and work values

The data loads very fast and is refreshed daily on my Power BI service.

Adding Some Additional DAX Intelligence

Because dataflows can’t do everything, I wanted to add some slightly more complex DAX measures to the report.

From this lovely old article, we will borrow some valuable DAX measures that we can use to build a nice burn-down chart. I will create a separate table to hold the following DAX measures. When creating these measures, make sure to format them correctly right away.

Figure 17: Add the measures as decimal numbers, with a thousands separator and two decimals.

Total planned work for the period

Total Planned :=
SUMX (
    CALCULATETABLE (
        TaskTimephasedDataSet,
        ALL ( TaskTimephasedDataSet[TimeByDay] )
    ),
    TaskTimephasedDataSet[TaskWork]
)

Actual work done to date

ActualToDate :=
SUMX (
    FILTER (
        CALCULATETABLE (
            TaskTimephasedDataSet,
            ALL ( TaskTimephasedDataSet[TimeByDay] )
        ),
        TaskTimephasedDataSet[TimeByDay] <= MAX ( TaskTimephasedDataSet[TimeByDay] )
    ),
    TaskTimephasedDataSet[TaskActualWork]
)

Planned work to date

PlannedToDate :=
SUMX (
    FILTER (
        CALCULATETABLE (
            TaskTimephasedDataSet,
            ALL ( TaskTimephasedDataSet[TimeByDay] )
        ),
        TaskTimephasedDataSet[TimeByDay] <= MAX ( TaskTimephasedDataSet[TimeByDay] )
    ),
    TaskTimephasedDataSet[TaskWork]
)

Planned work

Planned Work :=
[Total Planned] - [PlannedToDate]

Actual remaining work

Actual Remaining Work :=
[Total Planned] - [ActualToDate]

After adding these DAX formulae, our data should look like this:

Figure 18: Added DAX measures to the Power BI file

And with this additional information, we can create nice visuals like this one:

Figure 19: A line chart containing actual, remaining, and planned work across time

Conclusion

Dataflows can make your reporting work in Project Online a lot faster by taking away the long waiting time for data to load. This is especially true for the time-phased datasets, as I’ve illustrated in this article.

Once you have the dataflow tables in place, you can expand on them using your DAX skills or even add additional tables from other sources in Power BI Desktop.

There are some advanced options as well, such as creating even faster table load times with incremental refresh. You could also consider using machine learning. These actions, however, require additional resources, skills, and licenses.

Written by Erik van Hurck
Erik van Hurck is a Senior PPM consultant for Projectum, a western European Microsoft Partner with offices in Denmark and The Netherlands. On top of that Erik is a Microsoft MVP. As such, Erik assists enterprise customers to adopt the new Power Platform cloud solutions for Project and Portfolio Management. Beyond writing for MPUG, Erik also has a personal blog (www.theprojectcornerblog.com).