Merge Tables Power Query



This article was edited on 2nd Nov 2019: JoinKind.Local has been removed to avoid problems with merges on tables with primary keys.

Related: "Improving the performance when merging two tables with the Power Query M language" (seddryck, Power BI, March 22, 2018). It follows up on a previous post that explained how to parse a semi-structured flat file with the help of a range join.

To combine multiple tables in Power BI, select the Edit Queries option on the Power BI Home tab. This opens a new window called Power Query Editor. To expand the tables in the column called Content:

  1. Select the double-headed arrow in the column header.
  2. Select the Expand radio button.
  3. Clear the 'Use original column name as prefix' check box.
  4. Select OK.

The data from the different sheets is then consolidated into a single table.

In this article you'll learn how to speed up the aggregation of joined/merged tables by orders of magnitude (I recorded up to 30 times faster execution times). This method works for merges where both tables have multiple rows for each key. If one of your tables has a primary key, the method Chris Webb describes works just as well: see Chris Webb's article on how to improve performance on aggregations after joins using primary keys.

You can follow along with the different methods in this file: PerformanceAggregationsAfterMerges1_Upload.zip

Background

When you join a table to another table in Power Query, the UI gives you the option to either expand the columns (default) or aggregate the contents of the joined tables. Aggregating is useful if multiple rows are returned for each row of the table that has been joined to (the left table).

Performance of native aggregation after join is very slow

But this method is extremely slow. Compared to simply expanding all values to new rows (which took around 5 seconds), the aggregation took around 50 seconds. The automatically generated code uses the Table.AggregateTableColumn function (see Query1_NativeAggregate):

Table.AggregateTableColumn(#"Merged Queries", "Expanded Custom", {{"Animal", each Text.Combine(_, ", "), "CombinedValues"}})

My first attempt to speed things up was not to expand the column that contains the merged table at all, but to add a column with a manual aggregation function instead (see Query2_AddManualAggregation):

Table.AddColumn(#"Merged Queries", "CombinedValues", each Text.Combine([Expanded Custom][Animal], ", "))

This improved speed by 50-60%, but it was still far slower than expanding all rows.
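As a minimal, self-contained sketch of this manual aggregation: the sample tables and the names Left, Right, Key and Matches are made up for illustration and are not taken from the workbook.

```powerquery
let
    // Hypothetical sample data: both tables have several rows per key
    Left = #table(
        type table [Key = text, Item = text],
        {{"A", "x"}, {"A", "y"}, {"B", "z"}}
    ),
    Right = #table(
        type table [Key = text, Animal = text],
        {{"A", "Cat"}, {"A", "Dog"}, {"B", "Fox"}}
    ),
    // Merge: each left row gets a nested table holding its matching right rows
    Merged = Table.NestedJoin(Left, {"Key"}, Right, {"Key"}, "Matches", JoinKind.LeftOuter),
    // Aggregate the nested Animal column directly, without expanding it
    Combined = Table.AddColumn(
        Merged,
        "CombinedValues",
        each Text.Combine([Matches][Animal], ", "),
        type text
    ),
    // The nested table is no longer needed once the values are combined
    Result = Table.RemoveColumns(Combined, {"Matches"})
in
    Result
```

Pasting this into a blank query in the Advanced Editor should be enough to try it; timings on such a tiny table say nothing about performance, so it only shows the shape of the step.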

The unexpected solution

What turned out to be by far the fastest was to expand the columns to all the new rows and then group them back (see Query3_ReGroupIntegrated):

Table.Group(#"Expanded Expanded Custom", {"Column1", "Custom"}, {{"CombinedValues", each Text.Combine(_[Animal], ", ")}})

To my surprise, this was even faster than skipping this step (around 2 seconds instead of 5). The aggregation shortened the table from 676k rows to 26k rows, and loading a shorter table should of course take less time. But I expected the computation of the aggregation itself to take a fair amount of time as well. In the end, it cost less time than was gained by shortening the table.
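The expand-then-regroup pattern can be sketched end to end as follows. The column names Column1, Custom and Animal follow the snippet above; the sample rows and the names Left, Right, Key and Matches are assumptions for illustration only.

```powerquery
let
    // Hypothetical sample data with multiple rows per key on both sides
    Left = #table(
        type table [Column1 = text, Custom = text],
        {{"A", "x"}, {"A", "y"}, {"B", "z"}}
    ),
    Right = #table(
        type table [Key = text, Animal = text],
        {{"A", "Cat"}, {"A", "Dog"}, {"B", "Fox"}}
    ),
    Merged = Table.NestedJoin(Left, {"Column1"}, Right, {"Key"}, "Matches", JoinKind.LeftOuter),
    // Step 1: expand the nested table to new rows
    Expanded = Table.ExpandTableColumn(Merged, "Matches", {"Animal"}),
    // Step 2: group back on all left-table columns and combine the values per group
    Regrouped = Table.Group(
        Expanded,
        {"Column1", "Custom"},
        {{"CombinedValues", each Text.Combine(_[Animal], ", "), type text}}
    )
in
    Regrouped
```

Grouping on every column of the left table is what brings the row count back to that of the original left table, assuming its rows are unique across those columns.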

But the surprises didn't stop there. As I know many beginners aren't comfortable with editing existing code, I tried a different method (see Query4_ReGroupAddColumn): I kept the native 'All rows' operation and added the same column as in Query2_AddManualAggregation. It was just as fast as, or even slightly faster than, the already fast Query3_ReGroupIntegrated.
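A rough sketch of this variant, assuming the same made-up sample data as before (the step and column names Matches and AllRows are hypothetical; the UI generates its own names):

```powerquery
let
    // Hypothetical sample data with multiple rows per key on both sides
    Left = #table(
        type table [Column1 = text, Custom = text],
        {{"A", "x"}, {"A", "y"}, {"B", "z"}}
    ),
    Right = #table(
        type table [Key = text, Animal = text],
        {{"A", "Cat"}, {"A", "Dog"}, {"B", "Fox"}}
    ),
    Merged = Table.NestedJoin(Left, {"Column1"}, Right, {"Key"}, "Matches", JoinKind.LeftOuter),
    Expanded = Table.ExpandTableColumn(Merged, "Matches", {"Animal"}),
    // Group with the 'All rows' operation: each group keeps its rows as a nested table
    Grouped = Table.Group(Expanded, {"Column1", "Custom"}, {{"AllRows", each _, type table}}),
    // Add the manual aggregation as a separate column, as in Query2_AddManualAggregation
    Added = Table.AddColumn(
        Grouped,
        "CombinedValues",
        each Text.Combine([AllRows][Animal], ", "),
        type text
    ),
    Result = Table.RemoveColumns(Added, {"AllRows"})
in
    Result
```

The appeal of this form is that both the grouping and the added column can be produced from the UI, so the generated Table.Group step does not have to be edited by hand.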

So just by adding 2 steps (expanding the merged column and re-grouping), I sped up the query significantly. Another mystery that the M engine holds for me…

Please share your thoughts about and experiences with this method in the comments below. Maybe MS will change the code behind the 'default-Aggregate'-function, if there is enough evidence that the alternative method proves to be superior and stable in many other cases as well.

Thanks and stay queryious 😉

Merge queries overview

A merge queries operation joins two existing tables together based on matching values from one or multiple columns. You can choose to use different types of joins, depending on the output you want.

Merging queries

You can find the Merge queries command on the Home tab, in the Combine group. From the drop-down menu, you'll see two options:

  • Merge queries: Displays the Merge dialog box, with the selected query as the left table of the merge operation.
  • Merge queries as new: Displays the Merge dialog box without any preselected tables for the merge operation.

Identify tables for merging

The merge operation requires two tables:

  • Left table for merge: The first selection, from top to bottom of your screen.
  • Right table for merge: The second selection, from top to bottom of your screen.

Note

The position—left or right—of the tables becomes very important when you select the correct join kind to use.

Select column pairs

After you've selected both the left and right tables, you can select the columns that drive the join between the tables. In the example below, there are two tables:

  • Sales: The CountryID field is a key or an identifier from the Countries table.
  • Countries: This table contains the CountryID and the name of the country.

Figure: Merge dialog box with the Left table for merge set to Sales and the CountryID column selected, and the Right table for merge set to Countries and the CountryID column selected.

The goal is to join these tables by using the CountryID column from both tables, so you select the CountryID column from each table. After you make the selections, a message appears with an estimated number of matches at the bottom of the dialog box.
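In M, a merge like this one is typically expressed with Table.NestedJoin. The following is only a sketch: the table and column names Sales, Countries and CountryID come from the example, while the Units and Country columns and all sample rows are invented.

```powerquery
let
    // Hypothetical sample rows; only the table names and CountryID come from the example
    Sales = #table(
        type table [CountryID = Int64.Type, Units = Int64.Type],
        {{1, 40}, {1, 25}, {3, 30}, {4, 15}}
    ),
    Countries = #table(
        type table [CountryID = Int64.Type, Country = text],
        {{1, "USA"}, {2, "Canada"}, {3, "Panama"}}
    ),
    // Left table first, then its key columns; right table next, then its key columns.
    // "Countries" is the name of the new column that holds the nested right-table rows.
    Merged = Table.NestedJoin(
        Sales, {"CountryID"},
        Countries, {"CountryID"},
        "Countries",
        JoinKind.LeftOuter
    )
in
    Merged
```

The argument order mirrors the dialog box: the first table is the left table for the merge and the second is the right table.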

Note

Although this example shows the same column header for both tables, this isn't a requirement for the merge operation. Column headers don't need to match between tables. However, it's important to note that the columns must be of the same data type, otherwise the merge operation might not yield correct results.
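One way to make sure the key columns share a data type is to convert one of them before merging. The sketch below assumes a hypothetical situation in which CountryID arrives as text in Sales but as a whole number in Countries; the sample rows are invented.

```powerquery
let
    // Hypothetical mismatch: text keys on the left, whole-number keys on the right
    Sales = #table(
        type table [CountryID = text, Units = Int64.Type],
        {{"1", 40}, {"3", 30}}
    ),
    Countries = #table(
        type table [CountryID = Int64.Type, Country = text],
        {{1, "USA"}, {3, "Panama"}}
    ),
    // Convert the left key to a whole number so both key columns have the same type
    SalesTyped = Table.TransformColumnTypes(Sales, {{"CountryID", Int64.Type}}),
    Merged = Table.NestedJoin(
        SalesTyped, {"CountryID"},
        Countries, {"CountryID"},
        "Countries",
        JoinKind.LeftOuter
    )
in
    Merged
```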

You can also select multiple columns to perform the join by holding Ctrl as you select the columns. When you do so, the order in which the columns were selected is displayed in small numbers next to the column headings, starting with 1.

For this example, you have the Sales and Countries tables. Each of the tables has CountryID and StateID columns, which you need to pair for the join between both tables.

First select the CountryID column in the Sales table, hold Ctrl, and then select the StateID column. (This will show the small numbers in the column headings.) Next, perform the same selections in the Countries table.

Figure: Merge dialog box with the Left table for merge set to Sales, with the CountryID and StateID columns selected, and the Right table for merge set to Countries, with the CountryID and StateID columns selected. The Join kind is set to Left outer.
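A multi-column merge can be sketched in M by passing a list of key columns for each table. The column names CountryID and StateID come from the example; the remaining columns and the sample rows are assumptions.

```powerquery
let
    // Hypothetical sample rows keyed by the CountryID and StateID pair
    Sales = #table(
        type table [CountryID = Int64.Type, StateID = Int64.Type, Units = Int64.Type],
        {{1, 1, 40}, {1, 2, 25}, {2, 1, 30}}
    ),
    Countries = #table(
        type table [CountryID = Int64.Type, StateID = Int64.Type, Country = text, State = text],
        {{1, 1, "USA", "California"}, {1, 2, "USA", "Texas"}, {2, 1, "Canada", "Ontario"}}
    ),
    // Key columns are paired by position: first with first, second with second
    Merged = Table.NestedJoin(
        Sales, {"CountryID", "StateID"},
        Countries, {"CountryID", "StateID"},
        "Countries",
        JoinKind.LeftOuter
    )
in
    Merged
```

The order of the two lists plays the role of the small numbers in the dialog box, so both lists need to name the paired columns in the same sequence.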


Expand or aggregate the new merged table column

After selecting OK in the Merge dialog box, the base table of your query will have all the columns from your left table. Also, a new column will be added with the same name as your right table. This column holds the values corresponding to the right table on a row-by-row basis.

From here, you can choose to expand or aggregate the fields from this new table column, which will be the fields from your right table.
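Both choices have a counterpart in M. The sketch below shows an expand with Table.ExpandTableColumn and, as an alternative step, an aggregation with Table.AggregateTableColumn; the sample rows and the new column names are invented for illustration.

```powerquery
let
    // Hypothetical sample rows
    Sales = #table(
        type table [CountryID = Int64.Type, Units = Int64.Type],
        {{1, 40}, {3, 30}}
    ),
    Countries = #table(
        type table [CountryID = Int64.Type, Country = text],
        {{1, "USA"}, {3, "Panama"}}
    ),
    Merged = Table.NestedJoin(
        Sales, {"CountryID"},
        Countries, {"CountryID"},
        "Countries",
        JoinKind.LeftOuter
    ),
    // Expand: pull the Country field out of the nested table, with the table name as prefix
    Expanded = Table.ExpandTableColumn(Merged, "Countries", {"Country"}, {"Countries.Country"}),
    // Aggregate (alternative): reduce each nested table to a single value, here a count
    Aggregated = Table.AggregateTableColumn(
        Merged,
        "Countries",
        {{"Country", List.Count, "Count of Country"}}
    )
in
    // Return Aggregated instead to see the other option
    Expanded
```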

Figure: Table showing the merged Countries column on the right, with all rows containing a Table. The expand icon on the right of the Countries column header has been selected, and the expand menu is open. The expand menu has the Select all, CountryID, StateID, Country, and State selections selected. The Use original column name as prefix is also selected.

Note

Currently, the Power Query Online experience only provides the expand operation in its interface. The option to aggregate will be added later this year.

Join kinds

A join kind specifies how a merge operation will be performed. The following table describes the available join kinds in Power Query.

Join kind      Description
Left outer     All rows from the left table, matching rows from the right table
Right outer    All rows from the right table, matching rows from the left table
Full outer     All rows from both tables
Inner          Only matching rows from both tables
Left anti      Only rows from the left table that have no match in the right table
Right anti     Only rows from the right table that have no match in the left table
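In M, these join kinds correspond to the values JoinKind.LeftOuter, JoinKind.RightOuter, JoinKind.FullOuter, JoinKind.Inner, JoinKind.LeftAnti and JoinKind.RightAnti. As a small sketch with invented sample rows, a left anti join can be used to find sales whose CountryID has no entry in Countries:

```powerquery
let
    // Hypothetical sample rows: CountryID 4 exists in Sales but not in Countries
    Sales = #table(
        type table [CountryID = Int64.Type, Units = Int64.Type],
        {{1, 40}, {3, 30}, {4, 15}}
    ),
    Countries = #table(
        type table [CountryID = Int64.Type, Country = text],
        {{1, "USA"}, {2, "Canada"}, {3, "Panama"}}
    ),
    // Left anti: keep only the left rows that have no matching row on the right
    Unmatched = Table.NestedJoin(
        Sales, {"CountryID"},
        Countries, {"CountryID"},
        "Countries",
        JoinKind.LeftAnti
    ),
    // The nested column only holds empty tables here, so it can be dropped
    Result = Table.RemoveColumns(Unmatched, {"Countries"})
in
    Result
```

Swapping the last argument of Table.NestedJoin for another JoinKind value is enough to try the other rows of the table.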

Fuzzy matching

You use fuzzy merge to apply fuzzy matching algorithms when comparing columns, to try to find matches across the tables you're merging. You can enable this feature by selecting the Use fuzzy matching to perform the merge check box in the Merge dialog box. Expand Fuzzy matching options to view all available configurations.

Note

Fuzzy matching is only supported for merge operations over text columns.
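In M, a fuzzy merge can be written with Table.FuzzyNestedJoin. This is a sketch under assumptions: the tables, the column names and the option values (IgnoreCase, Threshold = 0.8) are chosen only for illustration, and which rows match depends on the similarity threshold.

```powerquery
let
    // Hypothetical text columns with small spelling and casing differences
    Left = #table(
        type table [Fruit = text],
        {{"Apple"}, {"Bananna"}, {"cherry"}}
    ),
    Right = #table(
        type table [Fruit = text, Price = number],
        {{"apple", 1.2}, {"Banana", 0.5}, {"Cherry", 3.0}}
    ),
    // Same shape as Table.NestedJoin, plus an options record for the fuzzy matching
    Merged = Table.FuzzyNestedJoin(
        Left, {"Fruit"},
        Right, {"Fruit"},
        "Matches",
        JoinKind.LeftOuter,
        [IgnoreCase = true, Threshold = 0.8]
    )
in
    Merged
```

A lower Threshold accepts weaker matches and a higher one demands closer matches, so it is worth inspecting the nested Matches tables before expanding them.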




