65.9K
CodeProject is changing. Read more.
Home

Spilt DateTime Ranges in SQL Server

Jul 1, 2023

CPOL

2 min read

viewsIcon

9564

downloadIcon

21

Split date time ranges into days or possibly other ranges using CTE in SQL Server

Background

I had a case where I had to convert a date range into a set of rows. For example, I had a record saying a product discount from 1st June till 10th June. This single row needs to be transformed into 10 rows, one for each day of sales. How could I do this in a fast and scalable manner, since I had thousands and thousands of records which might result in millions? That's not all.

  • There could be different types of products
  • Overlapping date ranges, leading to more discounts (sum)

Split Ranges to Day Wise Range

Here, we are going to convert the date range to possible day records.

Data

DECLARE @tblDateRange TABLE (
    Id INT IDENTITY(1, 1),
    TypeId INT,
    FromDateTime DATETIME,
    ToDateTime DATETIME
)
INSERT INTO @tblDateRange (TypeId, FromDateTime, ToDateTime)
VALUES
(1, '2023-01-01 10:00:00.000', '2023-01-01 10:00:00.000'),   --start and end is same
(2, '2023-02-02 00:00:00.000', '2023-02-04 23:59:59.000'),   --start and end is start 
                                                             --and end of a day
(3, '2023-03-05 10:00:00.000', '2023-03-06 23:59:59.000'),   --start in middle of 
                                                             --another date
(4, '2023-04-07 00:00:00.000', '2023-04-08 21:00:00.000'),   --end in middle of 
                                                             --another date
(5, '2023-05-09 11:00:00.000', '2023-05-11 11:00:00.000'),   --start and end in 
                                                             --middle of another date
(6, '2023-06-01 10:00:00.000', '2023-06-01 22:00:00.000');   --start and end in 
                                                             --middle of same date

Split Query

In this split process, we are using recursion to populate new rows inside a CTE.

WITH 
DateRanges(Id, LevelNo, [Date], DateWiseStartDateTime, DateWiseEndDateTime)
AS
(
    SELECT 
        p.Id,
        1,
        CAST(p.FromDateTime AS DATE),
        p.FromDateTime,
        IIF(DATEADD(SECOND, -1, DATEADD(DAY, DATEDIFF(DAY, 0, p.FromDateTime) + _
        1, 0)) < p.ToDateTime, DATEADD(SECOND, -1, _
        DATEADD(DAY, DATEDIFF(DAY, 0, p.FromDateTime) + 1, 0)),  p.ToDateTime)    
    FROM @tblDateRange p
    UNION ALL
    SELECT 
        c.Id,
        p.LevelNo + 1,
        CAST(DATEADD(DAY, DATEDIFF(DAY, 0, _
             DATEADD(DAY, 1, p.DateWiseStartDateTime)), 0) AS DATE),
        DATEADD(DAY, DATEDIFF(DAY, 0, DATEADD(DAY, 1, p.DateWiseStartDateTime)), 0),
        IIF(c.ToDateTime < DATEADD(DAY, 1, p.DateWiseEndDateTime), _
            c.ToDateTime, DATEADD(DAY, 1, p.DateWiseEndDateTime))
    FROM @tblDateRange c
    JOIN DateRanges p ON c.Id = p.Id
    WHERE DATEADD(DAY, DATEDIFF(DAY, 0, _
          DATEADD(DAY, 1, p.DateWiseStartDateTime)), 0) < _
                  c.ToDateTime    --date wise start datetime < range to datetime
)
SELECT 
    *
FROM DateRanges
ORDER BY Id, LevelNo
OPTION (MAXRECURSION 30000) -- default 100, max 32767, no limit 0

By default, it is going to support 100 recursive calls, and we don't need to use OPTION (MAXRECURSION 30000). If it requires more than that, we need to use that OPTION clause where the max value can be 32767. Use 0 for no limit.

Join Split Rows to Actual Rows

Let's join the split rows with actual data. Here is the new select query in the CTE:

SELECT 
    d.*,
    r.LevelNo,
    r.Date,
    r.DateWiseStartDateTime,
    r.DateWiseEndDateTime
FROM @tblDateRange d
JOIN DateRanges r ON d.Id = r.Id
ORDER BY d.Id, r.LevelNo
OPTION (MAXRECURSION 30000) -- default 100, max 32767, no limit 0

Limitations

Imagine we have ranges for 10 or 20 years. For each year, it's going to generate about 356 rows each year. The end query will be a bit slow. Another way is splitting them into small ranges based on overlapping and nonoverlapping ranges.

Split Ranges to Overlapping/Related Ranges

Here, we are going to convert the date range to all possible small ranges. Here is an idea:

1                                                                     12
|-------|-----------------------------|---------|-----------|--------|
        2                              6
        |-----------------------------|
                                                8            10
                                                |-----------|
Input
Range       Part
1-12        1
2-6         1
8-10        1


Expected Result
NewRange    SUM of Parts
1-2            1
3-6            2
7-8            1
9-10           2
11-12          1

As we can see, here overlapping and nonoverlapping ranges are available.

Data

DECLARE @tableDataRanges TABLE(
    Id INT IDENTITY(1, 1),
    TypeId INT,
    FromDateTime DATETIME,
    ToDateTime   DATETIME,
    Points    INTEGER
);
INSERT INTO @tableDataRanges (TypeId, FromDateTime, ToDateTime, Points)
VALUES      
(1, '2023-01-13', '2023-01-20 23:59:59', 20),            --no overlapping, no split
(1, '2023-02-10', '2023-02-20 23:59:59', 10),            --15-20 overlapping, 
                                                         --should be splited in two
(1, '2023-02-15', '2023-02-25 23:59:59', 10),            --15-20 overlapping, 
                                                         --should be splited in two

(2, '2023-02-10 12:00:00', '2023-02-20 23:00:00', 20),   --15-20 overlapping, 
                                                         --should be splited in two
(2, '2023-02-15 06:00:00', '2023-02-25 23:59:59', 20),   --15-20 overlapping, 
                                                         --should be splited in two

(3, '2023-02-10', '2023-02-20 23:59:59', 30),            --has overlapping, but no split
(3, '2023-02-05', '2023-02-25 23:59:59', 20);            --10-20 overlapping, 
                                                         --should be splited in three

Split Query

WITH 
PosibleStartDateTimes
AS 
(
    SELECT FromDateTime AS StartDateTime, TypeId, 1 AS Cover
    FROM   @tableDataRanges
    UNION ALL
    SELECT DATEADD(SECOND, 1, ToDateTime) AS StartDateTime, TypeId, -1 AS Cover
    FROM  @tableDataRanges
)
,AggregatedStartDateTimes
AS 
(
    SELECT StartDateTime, TypeId, SUM(Cover) AS Cover
    FROM PosibleStartDateTimes
    GROUP BY StartDateTime, TypeId
)
,StartDateTimeToRanges
AS 
(
    SELECT 
        StartDateTime AS FromDateTime,
        TypeId,
        LEAD(StartDateTime) OVER (PARTITION BY TypeId ORDER BY StartDateTime) _
                                  AS ToDateTime,
        SUM(Cover) OVER (PARTITION BY TypeId ORDER BY StartDateTime) AS NumberOfParts
    FROM AggregatedStartDateTimes
)
,PossibleRanges
AS
(
    SELECT 
        FromDateTime,
        DATEADD(SECOND, -1, ToDateTime) AS ToDateTime,
        TypeId,
        NumberOfParts
    FROM StartDateTimeToRanges
    WHERE NumberOfParts > 0
)
SELECT 
    *    
FROM PossibleRanges
--WHERE FromDateTime < ToDateTime --optional
ORDER BY TypeId, FromDateTime;

We have different TypeId data in the table. That's why in the OVER clause, we are using PARTITION BY TypeId partitioning by row type.

Join Split Rows to Actual Rows

Let's join the split range rows with actual data. Here is the new select query in the CTE:

SELECT 
    d.*,
    r.NumberOfParts,
    r.FromDateTime AS RangeFromDateTime,
    r.ToDateTime AS RangeToDateTime
FROM @tableDataRanges d
JOIN PossibleRanges r ON d.TypeId = r.TypeId _
     AND (d.FromDateTime <= r.FromDateTime AND d.ToDateTime >= r.ToDateTime)
ORDER BY d.Id, r.FromDateTime;

Others

If there is no type difference, simply we can do a query without using PARTITION BY:

DECLARE @tableRanges TABLE(
    Id INT IDENTITY(1, 1),
    FromDateTime DATETIME,
    ToDateTime   DATETIME,
    Points    INTEGER
);
INSERT INTO @tableRanges (FromDateTime, ToDateTime, Points)
VALUES      
('2018-01-01', '2018-01-31 23:59:59', 80),
('2018-01-07', '2018-01-10 23:59:59', 10),
('2018-01-07', '2018-01-31 23:59:59', 10),
('2018-01-11', '2018-01-31 23:59:59', 5),
('2018-01-25', '2018-01-27 23:59:59', 5),
('2018-02-02', '2018-02-23 23:59:59', 100);

WITH 
PosibleStartDateTimes
AS 
(
    SELECT FromDateTime AS StartDateTime, Points, 1 AS Cover
    FROM   @tableRanges
    UNION ALL
    SELECT DATEADD(SECOND, 1, ToDateTime) AS StartDateTime, -1 * Points, -1 AS Cover
    FROM  @tableRanges
)
,AggregatedStartDateTimes
AS 
(
    SELECT StartDateTime, SUM(Points) AS Points, SUM(Cover) AS Cover
    FROM PosibleStartDateTimes
    GROUP BY StartDateTime
)
,StartDateTimeToRanges
AS 
(
    SELECT 
        StartDateTime AS FromDateTime,
        LEAD(StartDateTime) OVER (ORDER BY StartDateTime) AS ToDateTime,
        SUM(Points) OVER (ORDER BY StartDateTime) AS Points,
        SUM(Cover) OVER (ORDER BY StartDateTime) AS NumberOfParts
    FROM AggregatedStartDateTimes
)
,PossibleRanges
AS
(
    SELECT 
        FromDateTime,
        DATEADD(SECOND, -1, ToDateTime) AS ToDateTime,
        Points,
        NumberOfParts
    FROM StartDateTimeToRanges
    WHERE NumberOfParts > 0
)
SELECT 
    *    
FROM PossibleRanges
--WHERE FromDateTime < ToDateTime --optional
ORDER BY FromDateTime;

References

Conclusion

Nonrecursive PARTITION BY query was the fastest of them all

Code/Environment

Find the same code .sql file as the zip, if things not working as expected let me know. Tested code in Microsoft SQL Server 2019 (RTM) - 15.0.2000.5 (X64).

History

  • 1st July, 2023: Initial version