Nothing better than a “light” topic as warm-up to start the new year : OBIEE Time Series functions (inspired by a thread on OTN forum of few days ago). Actually, it was a “light” topic and started growing while writing …
The thread on OTN I had in mind when writing this post was a question about how to use time series functions without having a date column. The poster had a case with only months and weeks but no daily data, so missing a real date (day) column in the database.
Spoiler Alert
OBIEE doesn’t use or need any time/date function for Time Series, you can use these functions on any kind of data as long as you define chronological keys to be used to sort data.
Time Series functions: a “must have” for an analytical platform
I still didn’t see a single OBIEE model without implementing time series functions soon or late, using the embedded functions or modelling them directly in the RPD or producing the data in the data source by ETL.
Any business need to compare data at a given point in time with the same point in time a month or a year before, or to sum together all the current months of the year to see how close to a yearly forecasted value the reality is. It is also quite common to business rules not using a fixed January 1 to December 31 calendar: the business of an organization doesn’t finish at the end of the year to restart from scratch the day after with the new year, so looking at data by adopting period rolling measures it’s a real life need.
All these are what OBIEE calls Time Series functions: AGO, TODATE and PERIODROLLING.
How does Time Series functions work in OBIEE?
To explain how OBIEE calculate Time Series functions let’s take a really simple example: 2 tables, one is a calendar dimension and the other is a fact table with amount sold and quantity sold, the 2 tables are linked with FK_CALENDAR from the fact table linking to PK_CALENDAR int the dimension.
The example is: month by month, I want to see amounts sold and previous month amounts sold next to it.
The result is a simple table with 3 columns: Month, Amount sold and AGO(Amount sold, “Month”, 1). This last column has a formula in the analysis directly using the AGO function: AGO(“Fact Sales”.”Amount sold”, “Dim Calendar”.”Hierarchy Calendar”.”Month”, 1) :
- “Fact Sales”.”Amount sold” : is the measure I want to retrieve in the past
- “Dim Calendar”.”Hierarchy Calendar”.”Month” : is the logical level name of the time dimension in the RPD (not the month column, but the logical level name of the hierarchy, the UI in the OBIEE front-end doesn’t allow you to select this value in the formula editor window, so you must take note of the name before)
- 1 : is the number of steps to move backward (positive number) or forward (negative number)
How does the physical query generated by OBIEE looks like? How does it retrieve the “AGO” value? What kind of time/date function does it use to move in the past?
Let’s look at the physical SQL :
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 |
WITH SAWITH0 AS (select DENSE_RANK() OVER ( ORDER BY T891.MONTH_CHRON_KEY) as c1, T891.PK_CALENDAR as c2, T891.MONTH_YYYY_MM as c3 from SALES_DIM_CALENDAR T891), SAWITH1 AS (select D1.c1 + 1 as c1, D1.c2 as c2, D1.c3 as c3 from SAWITH0 D1), SAWITH2 AS (select distinct DENSE_RANK() OVER ( ORDER BY T891.MONTH_CHRON_KEY) as c1, T891.MONTH_YYYY_MM as c2 from SALES_DIM_CALENDAR T891), SAWITH3 AS (select sum(T921.AMOUNT_SOLD) as c1, D3.c2 as c2 from SALES_FACT_SALES T921, SAWITH1 D4, SAWITH2 D3 where ( T921.FK_CALENDAR = D4.c2 and D3.c1 = D4.c1 ) group by D3.c2, D4.c3), SAWITH4 AS (select sum(T921.AMOUNT_SOLD) as c1, T891.MONTH_YYYY_MM as c2 from SALES_DIM_CALENDAR T891, SALES_FACT_SALES T921 where ( T891.PK_CALENDAR = T921.FK_CALENDAR ) group by T891.MONTH_YYYY_MM) select D1.c1 as c1, D1.c2 as c2, D1.c3 as c3, D1.c4 as c4 from ( select D1.c1 as c1, D1.c2 as c2, D1.c3 as c3, D1.c4 as c4 from (select 0 as c1, coalesce( D1.c2, D2.c2) as c2, D2.c1 as c3, D1.c1 as c4, ROW_NUMBER() OVER (PARTITION BY coalesce( D1.c2, D2.c2) ORDER BY coalesce( D1.c2, D2.c2) ASC) as c5 from SAWITH3 D1 full outer join SAWITH4 D2 On D1.c2 = D2.c2 ) D1 where ( D1.c5 = 1 ) order by c2 ) D1 where rownum <= 65001 |
For a simple analysis OBIEE generates a query with 5 WITH blocks and finally the main query, not the shortest code …
Where are the time/date functions and calculations?
OBIEE doesn’t use any time/date calculation
Do not try to find them, you will not find any! OBIEE doesn’t use any time/date calculation. Time Series functions in OBIEE are produced by performing operations moving a given number of steps forward or backward from a relative position in the dataset at a given level (logical hierarchy level). It’s a bit like windowing functions in Oracle Database working on a “window” of your dataset.
Looking more in details in the code you will see that SAWITH0 and SAWITH2 use a ranking function (my data source is an Oracle database, so DENSE_RANK) to generating a ranking column. In my example the ranking is performed based on the “MONTH_CHRON_KEY” column: what’s that? Why this column? It’s the chronological key I set in the RPD.
The Chronological key: the driver of Time Series functions
In my example I explicitly created a column in my dataset to highlight what is used by OBIEE when generating the queries, my column as the same format of the “Month” column, so values like YYYY-MM, “2013-01”, “2013-02” etc.
As this column when sorted return the correct order for periods the ranking function assign numbers starting with the first period and increment the value for each row: January 2013 is 1, February 2013 is 2 etc. The physical key is included in SAWITH0 as it’s the column needed to make the join with the fact table later.
The next step OBIEE does in SAWITH1 is to apply the AGO function to align the previous month fact (returned by AGO) with the relative month position. It performs this operation by doing a simple +1 on the ranking column, so January 2013 is now at position 2, February 2013 position 3 etc.
SAWITH2 will then perform a ranking again but this time only retrieving the column I want to display on the screen, not taking into consideration the physical keys. This part of the query is to make sure to retrieve the right “label” for the AGO value (remember: in AGO 1 month the value of January 2013 must be associated with the label “2013-02” because it represents the AGO value for February).
SAWITH3 is performing the real query to get the AGO value, so it queries the fact table, it uses the new ranking created in SAWITH1 and the “label” from SAWITH2 to return the AGO values related to “new” months labels.
SAWITH4 is the normal query you would have when not using Time Series functions, so it’s the current value for a month.
The main query is finally putting these 2 datasets, SAWITH3 and SAWITH4, one next to each other to return the AGO value on the same row as the current value.
As you see there are many steps, and the very first is sorting the time dimension values and assigning to each one a unique contiguous (no gaps) number based on the chronological key of the logical level selected by the Time Series function. The logic will then move around based on that ranking by adding or subtracting positions (a simple mathematical addition or subtraction).
The impact of wrong or poor chronological key
To highlight the fact everything is done based on the chronological key defined in the RPD let’s change my model and set as Chronological another column of the following format: 2013-1, 2013-2, …, 2013-10, etc. It’s really similar to the previous but the month doesn’t have the leading zero.
Interesting result, the AGO function seems to jump around, from January to October, then it’s correct till December and then back to February and again correct till September.
Why does it happen? Because of a poor chronological key…
The ranking on this column is calculated by sorting the strings, and in a string sort 2013-10 comes after 2013-1 and before 2013-2, so October, November and December comes after January. After December the ranking goes back to February.
OBIEE doesn’t care at all if it makes sense from a calendar point of view or not, the tool isn’t aware of that because it’s not a problem. A chronological key has been set, OBIEE uses it without looking at the business meaning of that column.
Reductio ad absurdum: Time Series functions on Country dimension
Because OBIEE translate these functions into backward or forward movements based on the relative position you can also use Time Series functions on something not all time/calendar related.
What about calculating a “one country ago” measure? Not sure why you would like to compare a country with the one just before but … technically you can do it !
Let’s be honest: I’m not saying there is any logic into a “one country ago” formula, it’s just to demonstrate OBIEE doesn’t care about what kind of data you feed it with for Time Series functions.
The only possible context to do Time Series on a not calendar dimension I can imagine is if you do your forecasting/budgeting iterations with Excel or APEX to feed a database table. You then have an analysis to compare deltas between different versions of your forecasting/budgeting iterations. I saw something similar in Planning / Essbase, where the version and scenario dimension are often compared.
As you can see OBIEE doesn’t care at all, it applies what the RPD says: I modelled a Country Dimension as time dimension, I set chronological keys on countries, subregions and regions, I use the AGO function at the country level: I have my results!
Curious about the physical SQL? Exactly the same as before when using months, simply replaced references to the calendar dimension by country dimension.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 |
WITH SAWITH0 AS (select DENSE_RANK() OVER ( ORDER BY T902.COUNTRY_NAME) as c1, T902.PK_COUNTRY as c2, T902.COUNTRY_NAME as c3 from SALES_DIM_COUNTRY T902), SAWITH1 AS (select D1.c1 + 1 as c1, D1.c2 as c2, D1.c3 as c3 from SAWITH0 D1), SAWITH2 AS (select distinct DENSE_RANK() OVER ( ORDER BY T902.COUNTRY_NAME) as c1, T902.COUNTRY_NAME as c2 from SALES_DIM_COUNTRY T902), SAWITH3 AS (select sum(T921.AMOUNT_SOLD) as c1, D3.c2 as c2 from SALES_FACT_SALES T921, SAWITH1 D4, SAWITH2 D3 where ( T921.FK_COUNTRY = D4.c2 and D3.c1 = D4.c1 ) group by D3.c2, D4.c3), SAWITH4 AS (select sum(T921.AMOUNT_SOLD) as c1, T902.COUNTRY_NAME as c2 from SALES_DIM_COUNTRY T902, SALES_FACT_SALES T921 where ( T902.PK_COUNTRY = T921.FK_COUNTRY ) group by T902.COUNTRY_NAME) select D1.c1 as c1, D1.c2 as c2, D1.c3 as c3, D1.c4 as c4 from ( select D1.c1 as c1, D1.c2 as c2, D1.c3 as c3, D1.c4 as c4 from (select 0 as c1, coalesce( D1.c2, D2.c2) as c2, D2.c1 as c3, D1.c1 as c4, ROW_NUMBER() OVER (PARTITION BY coalesce( D1.c2, D2.c2) ORDER BY coalesce( D1.c2, D2.c2) ASC) as c5 from SAWITH3 D1 full outer join SAWITH4 D2 On D1.c2 = D2.c2 ) D1 where ( D1.c5 = 1 ) order by c2 ) D1 where rownum <= 65001 |
Just a detail: do not try to flag many dimensions as time in the RPD, the server doesn’t like it too much (the consistency check didn’t complain, I was surprised, but the check-in didn’t like it at all !). You end up with a kind of inconsistent result and some strange metadata in your RPD. The best way to clean up is probably to delete the hierarchies completely and rebuild them.
To resume, and some advice
- OBIEE Time Series functions do not need any real date (physical data type column)
- The “key element” of a correct time dimension is the Chronological key
- The Chronological key must be sortable, the sort result defines the behaviour of Time Series functions
- Always set a chronological key for each level of your time dimension hierarchy to avoid unpredictable results
- If your business requires the logic of Time Series functions on something else than a calendar dimension you can do it
- Do not set more than one hierarchy as “time” in the RPD, not accepted by the system
Dear Mr Gianni;
Many Thanks for this useful article, I depend on it to prepare my OBIEE,
on the other hand I have a problem, when I use FORECAST function, it gives the forecast for the year that have data, and sum the others in the next year in one cell, for example I made the forecast for 6 month starting from 2019-09, it sum the data of 2020 in one cell after 2019-12!,
Many Thanks,
Hi,
FORECAST is not a real time series function in OBIEE. It calls a R function in the background, they “just” placed the function in the time series folder in the list of functions. Difficult to figure out what’s going on based on such a short message. Does your time dimensions have content for the future months?
The best would be to open a thread in https://community.oracle.com with full details. It will be a lot easier to understand your case and provide hints.
Dear Gianni,
Thanks for your great post.
In our test environment, we have data for Jan-11, Jan-12, Jan-13, Jan-14, Jan-31. I am using AGO function to show today and last date data. I created a time dimension as in your article, The report showed correct data for Jan-12, Jan-13, Jan-14. However, It doesn’t show any row for Jan-11. I expect it show data for Jan-11 with empty AGO value as in your first picture in this article.
Can you give some advises?
Thanks in advance.
Chuong
Hi,
AGO doesn’t remove the line if if doesn’t have a previous value to return, as you say the AGO value will simply be missing for the first row.
Look at the physical queries generated to find out if you don’t have a filter somewhere making you lose the first date. There is really nothing magic behind AGO, so in the physical query you will probably find the reason for your Jan-11 to not be returned at all.
Thanks a lot, Gianni.
It’s my fault when not drag out the date in the OBIEE Analysis.
Such a nice explanation Gianni. Appreciate it.
It would be great if you explain the other two timeseries functions(periodrolling() and todate()) in the same manner.
Thanks again.
How about comparing to previous business day?
AGO can do it, as long as you have a time dimension containing your meaning of business day.
As explained in the post the tool doesn’t understand what a “business day” is, it will just move around your dataset, whatever it is. If your dataset has the business days defined and configured correctly, the tool will be able to move around that and find you the previous business day.