Fixing the missing values issue with Datama “Fill in null value” function

Have you ever faced challenges in properly attributing metrics across different dimensions when analyzing KPIs?

Consider this scenario: you have two separate tables. The first breaks down transaction data by country and user gender. The second details product sales and revenue, segmented by product name and country.

Now, here’s the challenge—how can you reliably assess the impact of user gender on sales revenue for different product names, when gender isn’t directly linked to products in the data ?

Figure 1 Source

Or take another common situation in e-commerce performance analysis. Suppose you have reliable aggregated data on sessions and revenue coming from your back-end systems (see Figure 2). It’s trustworthy and complete—but not very flexible.

What if you want to analyze how this actual revenue evolves specifically on mobile devices? Unlike what you get from your favorite Web analytics tool API, like Google Analytics (see Figure 3), which often provides mobile revenue metrics that don’t quite align with the back-end figures, you’re now stuck with a mismatch. So how can you bridge that gap to get meaningful, device-specific insights from your most accurate data source?

Figure 2 Source

Figure 3 Source

At the heart of all these examples lies a common issue: reattributing metrics to the dimensions you wish you had, but that don’t actually exist in the current data structure. The key step in tackling this challenge is what we call “Fill in Null Values”.

In this article, we’ll walk through how to solve this type of problem using Datama, leveraging one of the newest features in the “Append” block: the Fill in Null Values function. This feature allows you to intelligently redistribute an aggregate or misaligned value across the desired dimensions—even when those dimensions aren’t fully populated in your dataset—by filling null values across all relevant rows in the target column.

Conceptually, this function works by using the distribution of other available metrics already broken down by the dimension you’re targeting. Take our first example: while Transactions aren’t split by Product Name, we do have Quantity and Revenue split that way. So, we can reattribute Transactions across Product Name using the proportional distribution of either Quantity or Revenue—whichever makes more sense for the analysis. This ensures that the distribution remains consistent with the patterns already present in your data.

In the next sections, we’ll demonstrate how this works in practice using the two use cases we introduced earlier.

Use case 1:

Let’s now focus on a specific case: analyzing the evolution of key performance indicators—namely, the number of purchases, the average number of products per purchase, and the average price per sold product—for Product A between Q3 and Q4.

To do this, we start by setting up a market equation in the Datama Compare solution. This equation will serve as the foundation for breaking down the change in overall performance into its contributing factors, helping us isolate what’s driving the movement of each KPI over time. (read more about Datama Compare):

Figure 4

To make the equation work, we need to slice performance by Client gender and Product name for all metrics. Here is what to do for this data preparation.

Step 1 Put two tables into one

We begin by adding the two unrelated tables shown on the Figure 1 above together within the Datama Prep block. In this block, we utilize the Append, designed to merge datasets that do not have an existing relationship. This function allows us to combine the tables into one cohesive dataset, aligning them vertically by stacking rows from both sources.

Figure 5

Then we have a table that only merged all rows but hasn’t reattributed all metrics into all dimensions:

Figure 6 Source

Step 2 Fill in null values

The magic thing is that, by activating the function fill in null value at Append step in Datama, we can reattribute the transactions into each individual row of Product name using Quantity as attribution key, and distribute Quantity and Revenue by Client Gender based on Transactions proportion.

Figure 7

Step 3 table reattributed

By clicking on ‘Apply’, you will immediately see the reattributed table as below. Here, the distribution of each Country and Product name is based on Quantity, and the splitting of Client gender for each country is based on Transactions. Don’t be surprised with the decimal number for Quantity and Transactions , which seems to be out of logic, but it’s correctly weighted. If you sum all these numbers, it will return the same total as before(see Figure 6).

So we now have a reattributed table with all available metrics for each row of dimensions.

Figure 8 Source

Step 4 Datama Compare analysis

Then comes the easy analysis in Datama compare to see how the changes of KPIs (number of purchases, average number of products per purchase and average price per sold product) impacted the sales revenue between Q3 and Q4. Thanks to the flexibility of Datama’s filter feature, we can easily apply filters across any available dimension. In this case, we simply filter by Product A, ensuring our analysis focuses solely on this specific category throughout the comparison.

Figure 9

From Q3 to Q4, we observe an overall increase in sales revenue for Product A. This growth is primarily driven by contributions from the USA and France. However, both the average number of items sold per purchase and the average basket size have actually declined during this period.

Use case 2:

In the second use case, we’re working with reliable Sessions and Revenue data sourced from the back end—both of which are aggregated and trustworthy. However, we want to break down performance using typical GA dimensions like Device or Browser.

For example, let’s say we want to analyze the evolution of actual revenue generated on Mobile only—not just the figures reported directly by GA, which can often include mismatches—but the version corrected to align with back-end data. This allows for a more accurate and granular view of performance across devices, based on verified data.

Figure 10 Source

To achieve this, we need to distribute Sessions from IT(back-end records) based on GA distribution, and similarly Revenue IT(back-end records) based on Revenue GA distribution.

Step by step demonstration is as follows:

We create a market equation for Datama compare:

Figure 11

Step 1 Put two tables into one

Add two sources and append them together and we got the table merged.

Figure 12

Figure 13 Source

Step 2 Fill in null values

Match the metrics respectively so that aggregated sessions and revenue are distributed in each row.

Figure 14

Step 3 table reattributed

You will immediately get the reattributed table:

Figure 15 Source

Step 4 Datama Compare analysis

Last but not least, with just one click, Datama allows you to instantly surface all changes across GA dimensions between two comparable periods. Whether you’re looking at Device, Browser, Country, or any other dimension, you get a clear and structured view of what’s driving the shifts—making deep-dive analysis faster and far more intuitive.

Figure 16

Now, we can see that the increase in actual revenue is primarily driven by an improved conversion rate, particularly from traffic coming through the Chrome browser. While the consent rate had a negative impact on revenue, its effect wasn’t as significant as the positive contribution from the improved conversion rate.

Conclusion

Fill in null value is especially powerful in cases where we have incomplete data, like missing values or unfilled metrics, allowing us to give every row a proportional value, ensuring consistency and accuracy in our analysis. This function works by distributing values based on the share of available metrics, ensuring that the reattribution maintains a similar trend to the existing data. This approach prevents data distortion or misleading analysis, as it preserves the natural proportions of the dataset while filling in missing values. By leveraging this method, we can achieve a more balanced and accurate representation of the data without artificially inflating or skewing the results.