My Collections: May 2012

Wednesday, May 30, 2012

Calculating Running Totals - SQLTeam.com

Roger writes "Is there a way to retrive a field value from the previously read row in order to use it to calculate a field in the current row . . ."

For example:
day sales cumu_total 1 120 120 2 60 180 3 125 305 4 40 345

In order to calculte the cumulative total I need to know the previous cumulative total (cumulative total could be any other calculation). I did this in MySQL by using variables to temporarily hold values between rows but SQL server doesn't assinging variables in retrieval statements. Is it possible to do this in a Select statement?"

The answer to the questions is "yes", you can solve this problem with a single SELECT statement. Instead of just showing the solution that came to mind, though, I want to take a look at three ways to solve the problem.

Creating the Test Data

I used the code shown below to create the base table and a few thousand rows of test data. It is important to add enough data so that the efficiency of the three solutions can be accurately measured. In other words, with only a few rows of data they all seem efficient.

CREATE TABLE Sales (DayCount smallint, Sales money)
CREATE CLUSTERED INDEX ndx_DayCount ON Sales(DayCount)
go
INSERT Sales VALUES (1,120)
INSERT Sales VALUES (2,60)
INSERT Sales VALUES (3,125)
INSERT Sales VALUES (4,40)

DECLARE @DayCount smallint, @Sales money
SET @DayCount = 5
SET @Sales = 10

WHILE @DayCount < 5000
 BEGIN
  INSERT Sales VALUES (@DayCount,@Sales)
  SET @DayCount = @DayCount + 1
  SET @Sales = @Sales + 15
 END

The Three Solutions

The three different solutions I tested are shown below. The execution time with and without a clustered index on DayCount is shown at the top of each batch.

Solution 1: Temp Table/Cursor

(NoIndex = 2 secs, Index = 2 secs)

CREATE TABLE #Sales (DayCount smallint, Sales money, RunningTotal money)

DECLARE @DayCount smallint,
        @Sales money,
        @RunningTotal money

SET @RunningTotal = 0

DECLARE rt_cursor CURSOR
FOR
SELECT DayCount, Sales
FROM Sales

OPEN rt_cursor

FETCH NEXT FROM rt_cursor INTO @DayCount,@Sales

WHILE @@FETCH_STATUS = 0
 BEGIN
  SET @RunningTotal = @RunningTotal + @Sales
  INSERT #Sales VALUES (@DayCount,@Sales,@RunningTotal)
  FETCH NEXT FROM rt_cursor INTO @DayCount,@Sales
 END

CLOSE rt_cursor
DEALLOCATE rt_cursor

SELECT * FROM #Sales ORDER BY DayCount

DROP TABLE #Sales

Solution 2: The "Celko" Solution

(NoIndex = 25 secs, Index = 20 secs)

SELECT DayCount,
       Sales,
       Sales+COALESCE((SELECT SUM(Sales) 
                      FROM Sales b 
                      WHERE b.DayCount < a.DayCount),0)
                         AS RunningTotal
FROM Sales a
ORDER BY DayCount

Solution 3: The "Guru's Guide" Solution

(NoIndex = 38 secs, Index = 17 secs)

SELECT a.DayCount,
       a.Sales,
       SUM(b.Sales)
FROM Sales a
CROSS JOIN Sales b
WHERE (b.DayCount <= a.DayCount) AS RunningTotal
GROUP BY a.DayCount,a.Sales
ORDER BY a.DayCount,a.Sales

Solution 1 uses a temporary table and a cursor and executes extremely fast. I must admit that my original intent was to show this solution as the way *not* to solve the problem. In general, I tell developers to avoid cursors at all costs. In this particular case, however, the cursor approach is much quicker than the single SELECT solutions.

Solution 2 is the one that came to mind when I originally read the question. I attribute the solution to Joe Celko, because it can be found in his SQL for Smarties book and I'm sure I learned about it in one of his posts in the SQL Server newsgroups.

The tricky part about this solution is the use of a correlated subquery and the COALESCE function. A correlated subquery is one that executes for each row in the outer query. The inner query sums the Sales for rows with a DayCount value less than the current row.

The COALESCE function returns the first non-NULL value in its list of values. The first row processed returns a NULL for SUM(Sales) because there is no DayCount less than it's value. When this happens a 0 is returned and added to the Sales value.

Solution 3 is labeled the "Guru's Guide" because it can be found in Ken Henderson's enormously popular "The Guru's Guide to Transact-SQL." I would have never come up with this solution, but I guess that's what makes Ken the Guru and me a Guru wannabee.

This solution uses a CROSS JOIN and table aliases to join the Sales table with itself. This causes each row in the left table (Sales a) to be joined with each row in the right table (Sales b) where the DayCount in b in less than the DayCount in a. The SUM(b.Sales) and the GROUP BY a.DayCount, a.Sales then allow the running total for each row to be calculated. If you are having trouble figuring out how this works draw a layout of two instances of Sales side-by-side, and then draw lines from a row in the left table to the rows in the right table that meet the WHERE condition.

When I first ran the tests I did not have a clustered index on DayCount. Upon realizing DayCount is a prime candidate for a clustered index, I added it and re-ran the tests. The execution time for Solution 3 was significantly reduced after the index was created because of the number of rows joined by the CROSS JOIN.

And as an FYI, the server I used to tests these solutions is a Compaq ML370 with dual 600's and 1 GB RAM--your results may vary.

More than 3 requests, I'll translate this to Chinese.
超过3个请求，我就会把这篇文章翻译成中文。

Thursday, May 17, 2012

Repeat until loop : Repeat until « Procedure Function « SQL / MySQL

Repeat until loop

  

mysql>

mysql> delimiter //

mysql> create procedure test_repeat (IN in_count INT)

    -> BEGIN

    ->     declare count INT default 0;

    ->

    ->     increment: repeat

    ->         set count = count + 1;

    ->         select count;

    ->         until count > 10

    ->     end repeat increment;

    ->

    -> END

    -> //

Query OK, 0 rows affected (0.00 sec)



mysql>

mysql> delimiter ;

mysql>

mysql> call test_repeat(10);

+-------+

| count |

+-------+

|     1 |

+-------+

1 row in set (0.00 sec)



+-------+

| count |

+-------+

|     2 |

+-------+

1 row in set (0.00 sec)



+-------+

| count |

+-------+

|     3 |

+-------+

1 row in set (0.01 sec)



+-------+

| count |

+-------+

|     4 |

+-------+

1 row in set (0.01 sec)



+-------+

| count |

+-------+

|     5 |

+-------+

1 row in set (0.01 sec)



+-------+

| count |

+-------+

|     6 |

+-------+

1 row in set (0.01 sec)



+-------+

| count |

+-------+

|     7 |

+-------+

1 row in set (0.01 sec)



+-------+

| count |

+-------+

|     8 |

+-------+

1 row in set (0.01 sec)



+-------+

| count |

+-------+

|     9 |

+-------+

1 row in set (0.01 sec)



+-------+

| count |

+-------+

|    10 |

+-------+

1 row in set (0.01 sec)



+-------+

| count |

+-------+

|    11 |

+-------+

1 row in set (0.01 sec)



Query OK, 0 rows affected (0.01 sec)



mysql>

mysql> drop procedure test_repeat;

Query OK, 0 rows affected (0.00 sec)

More than 3 requests, I'll translate this to Chinese.
超过3个请求，我就会把这篇文章翻译成中文。

Monday, May 07, 2012

Learn it the soft way: OneToMany relationship in Doctrine2 and Symfony2 for dummies

It may seem obvious to a lot, but ...

THE CLASSES

In a forum, there are Authors. These authors can post messages. Every Message has one and only one Author, and an Author can write many Messages : this is typically a OneToMany relationship.
Here are the equivalent classes :

/**
 * @ORM\Entity
 * @ORM\Table(name="Author")
 * @ORM\Entity(repositoryClass="Acme\DemoBundle\AuthorRepository")
 */
class Author
{
  
    /**
     * @ORM\Id
     * @ORM\Column(type="integer")
     * @ORM\GeneratedValue(strategy="AUTO")
     * @var int
     */
    protected $id;
  
    /**
     * @ORM\OneToMany(targetEntity="Message", mappedBy="author", cascade={"persist", "remove"})
     */
    protected $messages;
     
    /**
     * Add message
     *
     * @param Acme\DemoBundle\Entity\Message $message
     */
    public function addMessage(Acme\DemoBundle\Entity\Message $message)
    {
        $this->mesages[] = $message;
        $message->setAuthor($this);
        return $this;
    }
     
    /**
     * Get messages
     *
     * @return Doctrine\Common\Collections\Collection 
     */
    public function getMessages()
    {
        return $this->messages;
    }
 
     
    public function __construct()
    {
        $this->messages = new \Doctrine\Common\Collections\ArrayCollection();
    }
}
 
 
/**
 * @ORM\Entity
 * @ORM\Table(name="Message")
 * @ORM\Entity(repositoryClass="Acme\DemoBundle\Repository\MessageRepository")
 */
class Message
{
    /**
     * @ORM\Id
     * @ORM\Column(type="integer")
     * @ORM\GeneratedValue(strategy="AUTO")
     * @var int
     */
    protected $id;
     
    /**
     * @ORM\ManyToOne(targetEntity="Author", inversedBy="messages", cascade={"persist", "update"})
     * @ORM\JoinColumn(name="author_id", referencedColumnName="id", onDelete="cascade", onUpdate="cascade")
     * @var Acme\DemoBundle\Entity\Author
     */
    protected $author;
     
}

THE ISSUE

Now, let's say you want to transfer messages from author 2 to author 1 for some reason. Here is how to do it step by step :

1

2

3

4

5

6

// Get the entities
$Author1 = $authorRepository->find($idAuthor1);
$Author2 = $authorRepository->find($idAuthor2);
 
// Retrieve messages
$messagesOfAuthor2 = $Author2->getMessages();

Here is a visual description of what has happened :

State 0 :

We transfer each of the messages :

// Move message
foreach($messagesOfAuthor2 as $message)
{
    $Author1->addMessage($message);
    // Leads to state 1
    $Author2->getMessages()->removeElement($message);
    // Leads to state 2
}

State 1 :

State 2 :

Note the deletion of 1 link in the removeElement function, and the addition of 2 links in the addMessage function.
We can now remove the Author 2 and flush the EntityManager :

1

2

3

$entityManager->remove($Author2);
 
$entityManager->flush();

Learn it the soft way: OneToMany relationship in Doctrine2 and Symfony2 for dummies

More than 3 requests, I'll translate this to Chinese.
超过3个请求，我就会把这篇文章翻译成中文。