Channel: bicortex

How To Populate Date Dimension Table

There are already tens, if not hundreds, of posts on the Internet that deal with populating a ‘DATE’ dimension table, and I thought long and hard about whether it was worth my time to go over this ground again. However, how this routine is customized is very individual, and there is no ‘one size fits all’ doctrine on how to structure the ETL or query to get the best results. Rather than echo what has already been said, I thought it would be worthwhile to give a quick rundown of what I have grown accustomed to when creating or updating the ‘DATE’ dimension, the linchpin of any dimensional data mart model. The process can be customized further to accommodate changes or additions driven by business needs.

Nearly any dimension table can be created using an SSIS package, but in this case a package just adds maintenance overhead and complexity to the whole solution, especially since, once populated, Dim_Date does not require further updates or ongoing maintenance. Therefore, this post demonstrates only how to do it via SQL. Many people also use an Excel spreadsheet (a good sample can be downloaded from the Kimball Group’s website) and simply import it into a table via the ‘Import Data’ database task or a job/package. It is a great solution; however, in spite of Excel’s powerful pool of formulae, more complex date derivatives cannot be generated easily. Besides, once most of the Dim_Date table structure is set in concrete, the SQL approach is an easy, quick and powerful method which can be reused (partially or fully) in other projects.

Firstly, we need to create an empty table (Dim_Date) to populate with ‘datetime’ values and their derivatives. It is assigned a primary key (Date_Key), a surrogate key which will become a foreign key in a fact table. We also create a unique index on the Calendar_Date column to enable faster lookups and set the first day of the week for the session. The code for all these steps is as follows:

IF OBJECT_ID('Dim_Date') IS NOT NULL DROP TABLE Dim_Date

GO

 CREATE TABLE [dbo].[Dim_Date](
 [Date_Key] [int] NOT NULL,
 [Date_Time] [datetime] NOT NULL,
 [Calendar_Date] [date] NOT NULL,
 [Calendar_Year] [smallint] NOT NULL,
 [Calendar_Quarter] [char](5) NOT NULL,
 [Calendar_Quarter_Number] [smallint] NOT NULL,
 [Calendar_Year_Quarter_Name] [varchar](65) NOT NULL,
 [Calendar_Month] [int] NOT NULL,              --yyyymm, e.g. 201001 (too large for smallint)
 [Calendar_Month_Number] [smallint] NOT NULL,
 [Calendar_Month_Name] [varchar](30) NOT NULL,
 [Calendar_Year_Month_Name] [varchar](35) NOT NULL,
 [Calendar_Month_Start] [int] NOT NULL,        --yyyymmdd, e.g. 20100101
 [Calendar_Month_End] [int] NOT NULL,          --yyyymmdd, e.g. 20100131
 [Calendar_Week] [int] NOT NULL,               --Year followed by week number, e.g. 201053
 [Week_Day] [int] NOT NULL,
 [Week_Day_Name] [varchar](30) NOT NULL,
 [Week_Day_Type] [varchar](7) NOT NULL,
 [Days_In_Calendar_Year] [smallint] NOT NULL,
 [Days_In_Calendar_Month] [smallint] NOT NULL,
 [Days_In_Calendar_Week] [smallint] NOT NULL,
 [Weeks_In_Calendar_Month] [smallint] NOT NULL,
 [Weeks_In_Calendar_Year] [smallint] NOT NULL,
 [Public_Holiday_Flag] [char] (1) NULL,        --Varies depending on the location e.g. country, region
 [Financial_Year] [smallint] NOT NULL,         --Based on Australian standard financial year start and end dates
 [Financial_Quarter] [smallint] NOT NULL,      --Based on Australian standard financial year start and end dates

CONSTRAINT [PK_DimDate] PRIMARY KEY CLUSTERED
(
[Date_Key] ASC
)
) ON [PRIMARY]
GO
CREATE UNIQUE INDEX Idx_DimDate ON Dim_Date(Calendar_Date)
GO
SET DATEFIRST 1                                --If first day of week is a Monday. If first day of week = Saturday then set to 6
GO
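Several of the derived columns (Week_Day, Calendar_Week and the week counts) depend on the session's first-day-of-week setting, so it may be worth confirming it before running the population query. A minimal check, assuming the SET DATEFIRST 1 statement above was issued in the same session, and using the fact that 4 January 2010 fell on a Monday:

```sql
-- Show the current first-day-of-week setting (1 = Monday).
SELECT @@DATEFIRST AS First_Day_Of_Week;

-- 4 January 2010 was a Monday, so under DATEFIRST 1 this returns 1.
SELECT DATEPART(dw, CAST('20100104' AS datetime)) AS Week_Day_Number;
```

SET DATEFIRST applies to the current session only, so a new connection falls back to the default for the login's language.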

Next, once the table has been created, we can populate it with a query whose content, as mentioned earlier, varies from one organization to the next: the financial year may be based around different calendar dates depending on the country’s standard, public holidays vary not only from country to country but also from state to state, and so on. The table’s start and end dates are also variable, depending on the business’s individual needs, i.e. how far back in history or how far into the future you would like to go. Let’s populate the table, starting with the following query.


 WITH DateCTE AS
(
  SELECT CAST('20100101' as datetime) AS Date_Value      --yyyymmdd literals are not affected by language or DATEFORMAT settings
  UNION ALL
  SELECT DATEADD(dd, 1, Date_Value)
  FROM DateCTE
  WHERE DATEADD(dd, 1, Date_Value) <= '20191231'
)

INSERT INTO Dim_Date
SELECT CAST(CONVERT(CHAR(8),CAST(Date_Value as DATETIME),112) as INT) as Date_Key,
Date_Value as Date_Time,
Date_Value as Calendar_Date,
YEAR(Date_Value) AS Calendar_Year,
CAST(YEAR(Date_Value) AS CHAR(4)) + CAST(DATEPART(Quarter ,Date_Value) AS CHAR(1)) AS Calendar_Quarter,
DATEPART(Quarter ,Date_Value) as Calendar_Quarter_Number,
DATENAME(YEAR,Date_Value) + ' Qtr' + DATENAME(QUARTER,Date_Value) AS Calendar_Year_Quarter_Name,
CAST(CONVERT(CHAR(6),CAST(Date_Value AS DATETIME),112) AS INT) AS Calendar_Month,
DATEPART(m,Date_Value) AS Calendar_Month_Number,
DATENAME(MONTH,Date_Value) AS Calendar_Month_Name,
CAST(YEAR(Date_Value) AS CHAR(4)) + ' ' + DATENAME(MONTH,Date_Value) AS Calendar_Year_Month_Name,
CONVERT(CHAR(6),CAST(Date_Value AS DATETIME),112) + '01' AS Calendar_Month_Start,
CAST(CONVERT(CHAR(8),CAST(CONVERT(VARCHAR(12), dateadd(day,-1 * day(dateadd(month,1,Date_Value)),dateadd(month,1,Date_Value)),113) AS DATETIME),112) AS INT) AS Calendar_Month_End,
CAST(YEAR(Date_Value) AS CHAR(4)) + CAST(DATEPART(wk ,Date_Value) AS VARCHAR(2)) AS Calendar_Week,
DATEPART(dw, Date_Value) AS Week_Day,
DATENAME(dw, Date_Value) AS Week_Day_Name,
CASE WHEN DATENAME(dw, Date_Value) IN ('Saturday','Sunday') THEN 'WeekEnd' ELSE 'WeekDay' END AS Week_Day_Type,  --Day_Type_Code function from previous post may also be used here
COUNT(DATEPART(wk ,Date_Value)) OVER (PARTITION BY YEAR(Date_Value)) AS Days_In_Calendar_Year,
COUNT(*) OVER (PARTITION BY CAST(CONVERT(CHAR(6),CAST(Date_Value AS DATETIME),112) AS INT)) AS Days_In_Calendar_Month,
COUNT(*) OVER (PARTITION BY CAST(YEAR(Date_Value) AS CHAR(4)) + CAST(DATEPART(wk ,Date_Value) AS VARCHAR(2)) ) AS Days_In_Calendar_Week ,
(SELECT DISTINCT COUNT(DATEPART(wk,b.Date_Value)) OVER (PARTITION BY DATEPART(m,b.Date_Value)) AS Weeks_In_Calendar_Month
FROM DateCTE b  WHERE
DATEPART(m,a.Date_Value) = DATEPART(m,b.Date_Value)
AND YEAR(a.Date_Value) = YEAR(b.Date_Value)
AND DATEPART(dw, b.Date_Value) = 1             --Counts occurrences of the first day of the week (Mondays under SET DATEFIRST 1) in the month
) AS Weeks_In_Calendar_Month,
(SELECT DISTINCT COUNT(DATEPART(wk ,b.Date_Value)) OVER (PARTITION BY YEAR(b.Date_Value)) AS Weeks_In_Calendar_Year
FROM DateCTE b
WHERE YEAR(a.Date_Value) = YEAR(b.Date_Value)
AND DATEPART(dw, b.Date_Value) = 1             --Counts occurrences of the first day of the week in the year
) AS Weeks_In_Calendar_Year,
NULL,
CASE WHEN DATEPART(mm, Date_Value) > 6         --Australian financial year runs from 1 July to 30 June
THEN DATEPART(yy, Date_Value) + 1
ELSE DATEPART(yy, Date_Value) END AS Financial_Year,
CASE WHEN DATEPART(mm, Date_Value) BETWEEN 7 AND 9 THEN 1
WHEN DATEPART(mm, Date_Value) BETWEEN 10 AND 12 THEN 2
WHEN DATEPART(mm, Date_Value) BETWEEN 1 AND 3 THEN 3
WHEN DATEPART(mm, Date_Value) BETWEEN 4 AND 6 THEN 4 END AS Financial_Quarter
FROM DateCTE a
ORDER BY Date_Key
OPTION (MAXRECURSION 0)
GO
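Once the insert has finished, a quick sanity check can confirm that the table covers the intended range. The sketch below assumes the 2010 to 2019 range used above, which spans 3,652 days (ten years including the leap days in 2012 and 2016), and spot-checks a single leap-day row:

```sql
-- Row count should equal the number of days between the first and last date.
SELECT COUNT(*)                                                  AS Row_Count,
       MIN(Calendar_Date)                                        AS First_Date,
       MAX(Calendar_Date)                                        AS Last_Date,
       DATEDIFF(day, MIN(Calendar_Date), MAX(Calendar_Date)) + 1 AS Expected_Row_Count
FROM Dim_Date;

-- Spot-check a leap day: 29 February 2012 was a Wednesday in a 29-day month,
-- and falls in financial year 2012, financial quarter 3.
SELECT Date_Key, Calendar_Date, Week_Day_Name, Days_In_Calendar_Month,
       Financial_Year, Financial_Quarter
FROM Dim_Date
WHERE Date_Key = 20120229;
```

If Row_Count and Expected_Row_Count differ, the range contains gaps or the insert was interrupted.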

The above query is pretty straightforward: create a CTE (Common Table Expression), populate it recursively with a range of dates, use the MAXRECURSION query hint to lift the default limit of 100 recursion levels, and finally create all date derivatives, i.e. the attributes of the Dim_Date dimension that the business may need for reporting and analytics. It is essential to ensure that the most common or requested date derivatives are calculated during this process rather than during report query building or cube/fact table population. This creates a central repository for date data and avoids wasting server resources during further ETL processes, cube population, query execution etc. In most cases the required attributes can be established early on, i.e. during the business requirements gathering phase or by looking at sample legacy reports. Also, as this information is fairly rigid and not susceptible to ongoing changes (unlike other dimensions, which require Slowly Changing Dimension functionality to cater for future alterations), future updates and maintenance should be fairly easy.
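To see the recursion and the MAXRECURSION hint in isolation, the sketch below generates a list of dates from two variables. @Start_Date and @End_Date are hypothetical names introduced here for illustration and are not part of the script above. Without the hint, SQL Server stops a recursive CTE after the default limit of 100 recursion levels and raises an error; a value of 0 removes the limit.

```sql
DECLARE @Start_Date datetime = '20100101';   -- hypothetical parameter
DECLARE @End_Date   datetime = '20101231';   -- hypothetical parameter

WITH DateCTE AS
(
  SELECT @Start_Date AS Date_Value           -- anchor member: the first date
  UNION ALL
  SELECT DATEADD(dd, 1, Date_Value)          -- recursive member: the next day
  FROM DateCTE
  WHERE DATEADD(dd, 1, Date_Value) <= @End_Date
)
SELECT COUNT(*) AS Days_Generated            -- one row per day in the range
FROM DateCTE
OPTION (MAXRECURSION 0);                     -- lift the default limit of 100 levels
```

Driving the range from variables like this is one way to reuse the same query for a different history or forecast window.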

Notice that the Public_Holiday_Flag attribute in the Dim_Date table accepts NULL values and that we insert NULLs into it during the population process. This column is populated separately because its values cannot be computed or derived and need to be inserted manually. The field is not applicable to all environments, but wherever KPIs are calculated from the number of days to or from an event and public holidays must be excluded, this flag is necessary.

Unfortunately, this process requires manual updates to a separate lookup table where public holidays are stored, although I have seen automated attempts to pull this information from an online feed instead. As much as I am an “automation junkie”, I tend to do it manually as it only takes two minutes to gather the date values for the whole year. To populate the public holiday flag I’ve created a control table which I update once a year based on the yearly public holidays calendar issued by the state government. As not all public holidays in Australia are observed in every state, this may require some modifications based on your business requirements, e.g. you may need a separate set of public holiday dates for each state/province.

CREATE TABLE [dbo].[Public_Holidays](
	[Public_Holiday_Date] [date] NOT NULL
) ON [PRIMARY]
GO
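Loading the control table is then a plain insert. The rows below are illustrative only (New Year's Day and Australia Day 2010); the real list should come from the official calendar for your state or country.

```sql
-- Illustrative rows only: New Year's Day and Australia Day 2010.
INSERT INTO Public_Holidays (Public_Holiday_Date)
VALUES ('20100101'),
       ('20100126');
GO
```

If holidays differ by state, one option is to add a state code column to this table and carry it through to the flag logic; that variant is not shown here.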

All that is left to finish the Dim_Date table population is to issue UPDATE statements that set the flag according to the dates in Public_Holidays, as per below:

UPDATE a
SET a.Public_Holiday_Flag = 'Y'
FROM Dim_Date a INNER JOIN Public_Holidays b
ON a.Calendar_Date = b.Public_Holiday_Date
GO
UPDATE Dim_Date
SET Public_Holiday_Flag =  'N'
WHERE Public_Holiday_Flag IS NULL
GO
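With the flag in place, the kind of KPI calculation mentioned earlier becomes a simple filter on the dimension. A minimal sketch, assuming the column values produced by the scripts above, that counts working days in January 2010:

```sql
-- Working days = weekdays that are not flagged as public holidays.
SELECT COUNT(*) AS Working_Days
FROM Dim_Date
WHERE Calendar_Date BETWEEN '20100101' AND '20100131'
  AND Week_Day_Type = 'WeekDay'
  AND Public_Holiday_Flag = 'N';
```

The result depends entirely on which dates have been loaded into Public_Holidays.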

This is just one of the ways to create a ‘DATE’ dimension table for a data mart dimensional model. Some of the attributes we calculated and pushed into the table may be unnecessary for your project, and some that you need may not be there at all. This is why the ‘DATE’ dimension table, although common to all data marts, is an enterprise-specific object, and which attributes you use to populate it is driven by your business requirements. If a different date derivative or date format is required, there is no reason why it should not be implemented during this process as opposed to during query structuring for a given output (this really is a separate topic for a separate post). Remember, dimension tables are supposed to be wide, as opposed to fact tables, which are typically narrow and long. Having a few extra columns will not hurt a great deal, and the CPU cycles we save by ensuring all required attributes are pre-computed can go a long way during an intensive ETL or report execution process.
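As an illustration of adding one more pre-computed attribute, the sketch below appends a day-of-year column to the existing table. Day_Of_Year is a hypothetical column name chosen for this example; it is not part of the table definition above.

```sql
-- Hypothetical extra attribute: ordinal day within the calendar year (1 to 366).
ALTER TABLE Dim_Date ADD Day_Of_Year smallint NULL;
GO

-- Populate it in a separate batch so the new column is visible to the UPDATE.
UPDATE Dim_Date
SET Day_Of_Year = DATEPART(dy, Calendar_Date);
GO
```

Because the population script inserts by column position, a column added this way would also need to be added to the CREATE TABLE and INSERT statements if the table is ever rebuilt from scratch.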

