Saturday, May 7, 2011

Aggregation Of Strings In T-SQL

Suppose you have a scenario where you need to group by some column and, at the same time, aggregate "strings" (concatenate them, in our case)... how can we do that?

Well, there are numerous methods; Anith has described them in detail in his article at http://www.simple-talk.com/sql/t-sql-programming/concatenating-row-values-in-transact-sql/.

However, the one I liked uses FOR XML PATH.

Let's see the demonstration -

1. Create a table with two columns, A (for grouping) and B (for concatenation of strings), and insert some dummy values -

DECLARE @table TABLE(A INT, B VARCHAR(10))
INSERT INTO @table VALUES (1,'A'),(1,'B'),(1,'C'),(2,'X'),(2,'Y'),(2,'Z')
 
2. Now run the following query, which achieves the desired result -
 
SELECT T1.A,
       STUFF((SELECT ',' + B
              FROM @table T2
              WHERE T2.A = T1.A
              FOR XML PATH('')), 1, 1, '') AS B
FROM @table T1
GROUP BY A;
 
and the desired output is -
 
A    B
1    A,B,C
2    X,Y,Z
 
FOR XML PATH('') generates XML from the query; with an empty element name and no column alias, the row values come back as one concatenated string, and STUFF removes the leading comma.
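To see why STUFF is needed, run the inner query on its own for one group (group 1 here); it returns all the values as a single string with a leading comma, whose first character STUFF then removes -

SELECT ',' + B FROM @table T2 WHERE T2.A = 1 FOR XML PATH('')
-- returns: ,A,B,C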
 
This solution is tricky but elegant!

Tuesday, March 22, 2011

Dynamic Redirection of Rows to Designated "Folders\Files"

Today, we will see how we can redirect rows from a database table to a designated folder and file dynamically. Let me explain the scenario first -

We have a table, say Test, having the following instance:

[screen shot: sample rows of the Test table]

Now, if you notice, we have S1, S2, S3, S4 and S5 as 5 distinct 'SubCategory' values. We have one root folder, say D:\RootFolder; inside this root folder 5 folders (S1..S5) would be created dynamically, and in each folder a text file would be created for that day, into which rows would be copied based on the subcategory. So in folder S1 a file S1_20110321.txt would be created and all the records pertaining to 'S1' would be copied there. Similarly, all the records pertaining to subcategory Si would be redirected to folder Si and copied to the file Si_<today's date>.txt.

Let's see how we can implement it. First of all we would require three variables: objSubCat, strQuery and strSubCat. objSubCat is an Object variable to store the record set fetched from the DB, strQuery stores the dynamically built DB query, and strSubCat stores the current subcategory.

We also need to make sure that the value of strQuery changes dynamically in order to fetch the correct recordset; hence, we'd be using an expression on the strQuery variable to achieve the dynamic behavior.
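The expression on strQuery would be along the following lines (a sketch - the table name dbo.Test and the SubCategory column are assumptions based on the scenario above) -

"SELECT * FROM dbo.Test WHERE SubCategory = '" + @[User::strSubCat] + "'"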

The below screen shot shows these variables and their default values -

[screen shot: the variables objSubCat, strQuery and strSubCat with their default values]

Now, we would create an Execute SQL task to store all the distinct SubCategory values in the objSubCat variable. Then, based on the objSubCat variable, a ForEach Loop would process the data accordingly. The following screen shot shows how our package would look -

[screen shot: package control flow]

Now, I'll show you how to configure each task in the control flow to achieve the given scenario.

1. Execute SQL Task: This task is used to store all the distinct SubCategory values in the object variable objSubCat. The following two screen shots show how I have configured the General and Result Set pages of the Execute SQL task.
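In essence, the General page holds a simple DISTINCT query with ResultSet set to Full result set, and the Result Set page maps result name 0 to the objSubCat variable. A sketch of the SQLStatement (dbo.Test and SubCategory are the names assumed from the scenario above) -

SELECT DISTINCT SubCategory FROM dbo.Test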

(a) General:

[screen shot: Execute SQL Task - General page]

(b) Result Set:

[screen shot: Execute SQL Task - Result Set page]

Once this task is executed at run time, it will store the distinct subcategories in the objSubCat variable, which is then used for looping over the data set.

2. FOREACH Loop Task: The Foreach Loop will iterate based on the objSubCat variable. Since we can't use object variables in expressions, we need to assign each value from objSubCat to some other 'non-object' variable, and here we use strSubCat. The following configuration is used for this task.

(a) Collection:

[screen shot: Foreach Loop - Collection page]

(b) Variable Mappings:

[screen shot: Foreach Loop - Variable Mappings page]

3. File System Task: This task creates the folder dynamically at run time. The following configuration is used to achieve it.

[screen shot: File System Task - General page]

The following screen shot shows the Source Connection settings -

[screen shot: File System Task source connection set via an expression]

The connection path is built dynamically using the following expression -
"D:\\RootFolder\\" +  @[User::strSubCat]    

4. Data Flow Task: This is the main task; it uses all the configuration we have done and redirects records accordingly. The following components are used -

[screen shot: data flow with an OLE DB Source and a Flat File Destination]

The OLE DB Source is used to fetch the records of the Test table from the DB dynamically, based on the iteration state; the query is generated dynamically through the strQuery variable. Please refer to the below screen shot for the details -

[screen shot: OLE DB Source reading its SQL command from the strQuery variable]

Once this is done, we just need to configure the Flat File destination. The setting is simple: we configure the Flat File Connection Manager's ConnectionString dynamically. The following expression is used to achieve this -

"D:\\RootFolder\\" +  @[User::strSubCat] +"\\"+@[User::strSubCat] +"_"+ (DT_STR, 14, 1252)  datepart ("yyyy",getdate()) + (DT_STR, 14, 1252)  datepart ("mm",getdate()) + (DT_STR, 14, 1252)  datepart ("dd",getdate()) + ".txt"

[screen shot: Flat File Connection Manager with the ConnectionString expression]

It's done. Now just run the package and check that folders like 'S1', 'S2' etc. are created inside D:\RootFolder and that they contain text files with the correct records.

Hope you find it interesting and please let me know your feedback; any better approach would be highly appreciated.

Sunday, March 20, 2011

"ContributionToParent" Calculation

Today, we will see a simple but important calculation which is often used in data analysis: the "percentage contribution to parent" calculation. Let's take an example where we have the state Maharashtra with 3 cities, Mumbai, Pune and Nagpur, and we need to analyze the sales data. Say the total sale of the state is $1000 for some random year, whereas the total sales in Mumbai, Pune and Nagpur are $500, $300 and $200 respectively. The contribution of Mumbai to Maharashtra would then be (500*100/1000), i.e. 50%, and similarly Pune and Nagpur would contribute 30% and 20% respectively.
Now we will see how we can simulate this scenario in SSAS 2008.
Let’s create a simple cube having following schema. (Assuming reader can create it very quickly without much guidance or can create their own test cube, concept would be the same!)

[screen shot: schema of the test cube]

Now, we will create a calculated member, say ContributionToParent; please see the below screen shot for configuring it -

[screen shot: calculated member definition]

The following MDX expression is used to evaluate the calculated member -

([Measures].[Sales]) / ([Measures].[Sales],[DimGeo].[DimGeo].CurrentMember.Parent)
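One caveat: for the topmost member of the hierarchy, CurrentMember.Parent does not exist, so the expression above fails at that level. A guarded variant (a sketch using the same measure and dimension names) could be -

IIF(
    [DimGeo].[DimGeo].CurrentMember.Parent IS NULL,
    NULL,
    [Measures].[Sales] /
    ([Measures].[Sales], [DimGeo].[DimGeo].CurrentMember.Parent)
)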
You can try this on your own test cube, verify the data manually, and then check from the cube browser; the results will be the same!

Friday, March 18, 2011

Contextual or Correlated Charting

Sometimes we need to create charts which are contextual in nature, i.e. their data is based on a value selected in some other report item, like a column field. E.g., in the following screen shot, if you click on any Category field, say "C3", the data for the pie chart changes accordingly.

[screen shot: a table of categories with a pie chart that changes based on the selected category]

Designing such a report is quite simple. You just need to create two different datasets, one for the table and the other for the chart. The dataset for the chart is parameterized, and the parameter value is passed from the field you select in the table (the Category, like C1, C2 etc., in our case).


Let's come to the implementation part. Follow the below steps -

Step 1: Create a dataset for the table report item using the below query:

SELECT 'C1' AS Category, 4 AS Pass, 6 AS Fail
UNION ALL
SELECT 'C2' AS Category, 5 AS Pass, 5 AS Fail
UNION ALL
SELECT 'C3' AS Category, 6 AS Pass, 4 AS Fail
UNION ALL
SELECT 'C4' AS Category, 7 AS Pass, 3 AS Fail

Step 2: Create another, parameterized dataset using the below query:

SELECT Category, Pass, Fail
FROM (SELECT 'C1' AS Category, 4 AS Pass, 6 AS Fail
UNION ALL
SELECT 'C2' AS Category, 5 AS Pass, 5 AS Fail
UNION ALL
SELECT 'C3' AS Category, 6 AS Pass, 4 AS Fail
UNION ALL
SELECT 'C4' AS Category, 7 AS Pass, 3 AS Fail) AS X
WHERE (Category = @CategoryParam)

Step 3: Configure the table using the dataset created in Step 1 and configure the chart using the dataset created in Step 2.

Step 4: Now, the important part: right-click the Category text box and configure the Text Box properties as per the below screen shot -

[screen shot: Text Box Properties - Action page]

You need to configure the Action and set the parameter value so that the data for the pie chart is selected accordingly.

You are done; you can select any of the categories and the data of the pie chart will change accordingly. Isn't this simple? Please let me know if you have a better way to do this.

Sunday, February 13, 2011

How to count total rows in a table without count(*)?

SELECT rows FROM sys.partitions WHERE object_id = object_id('test');
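Note that sys.partitions holds one row per partition per index, so a table with several indexes or partitions returns multiple rows for the query above. A more robust sketch (same table name assumed) sums the row counts of just the heap or clustered index -

SELECT SUM(p.rows) AS TotalRows
FROM sys.partitions AS p
WHERE p.object_id = OBJECT_ID('test')
AND p.index_id IN (0, 1) -- 0 = heap, 1 = clustered index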

Thursday, December 30, 2010

AS Top 10 Best Practices

Here you go with a nice article on top 10 Analysis Services best practices -

http://technet.microsoft.com/en-us/library/cc966527.aspx

Wednesday, December 22, 2010

BISM a new paradigm in BI space

I was not surprised, rather depressed, until yesterday when I read something about BISM, i.e. the Business Intelligence Semantic Model. It was not BISM itself which astonished me, but BISM superseding MDX/UDM.


Denali, the new product by Microsoft, is the successor to SQL Server 2008 R2. There are huge changes in this much-awaited release, but BISM is the one drawing attention from most of us. It looks like a different paradigm, with widely used jargon like DAX, VertiPaq, PowerPivot etc.


It'd be too early to comment on whether MDX/UDM will be dead soon or not, but what I feel is that they will be amongst us for at least half a decade, if not a whole one! What is important at this point of time is: as an MSBI developer, what should our approach be? I guess peeking at new concepts like DAX, PowerPivot, VertiPaq, BISM etc. wouldn't be harmful. Adopting new technology quickly and at the right time is the right approach, and we should do it now!


A detailed discussion on this can be read in the below blog –
http://cwebbbi.wordpress.com/2010/11/11/pass-summit-day-2/

Sunday, December 19, 2010

Reusing SSRS Template

In an environment where we need to create a lot of reports and all the reports share some common characteristics, it's a good idea to have one base report as a template and reuse it for authoring other reports, for faster development and, most importantly, consistency. Here are the easy steps -

1. Create any base template (RDL file) based on your business need.

2. Copy the RDL file (base template) and paste it in the below location:
C:\Program Files\Microsoft Visual Studio 9.0\Common7\IDE\PrivateAssemblies\ProjectItems\ReportProject

3. Right-click the Reports folder and click Add -> New Item. You can see the newly added report template there; refer to the below screen shot:

[screen shot: Add New Item dialog showing the new report template]

So simple and yet useful...:)

Tuesday, November 2, 2010

Passing parameters to SSIS package from command prompt

TestParamPackage.dtsx is a simple package having two string variables defined in it. A script task is added in the control flow just to use these variables.

The two variables are FirstName & LastName. Following code is added to the script task -

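// Note: FirstName and LastName must be listed in the Script Task's
// ReadOnlyVariables property; otherwise Dts.Variables["FirstName"] fails at run time.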
MessageBox.Show(Dts.Variables["FirstName"].Value.ToString() + Dts.Variables["LastName"].Value.ToString());

Let's assume the package is at the location - "c:\TestParamPackage.dtsx"

Now run the below command in the command prompt -

C:\>dtexec /f "c:\TestParamPackage.dtsx" /set \Package.Variables[User::FirstName].Value;"Santosh"

A message box will pop up showing the text "Santosh".

Now run the below command -

C:\>dtexec /f "c:\TestParamPackage.dtsx" /set \Package.Variables[User::FirstName].Value;"Santosh" /set \Package.Variables[User::LastName].Value;"Joshi"

If you notice, I have passed two parameters from the command prompt. In the message box, "SantoshJoshi" will now appear.

N.B. Don't put any space after "Value;", else it will throw an error.

Sunday, October 31, 2010

SSAS Pundits

You can see list of SSAS pundits in the below link -
http://dwbi1.wordpress.com/whos-who-in-analysis-services/

Saturday, September 11, 2010

Join Algorithm - Nested Loop Join

Logical Joins (Cross, Inner & Outer joins) are internally implemented using one of the following physical joins -

1. Nested Loop Join
2. Merge Join
3. Hash Join

The optimizer decides at run time which algorithm to use for a logical join, depending upon the type of join and the nature of the inputs, like the number of rows in each input, whether the records are sorted on the join column, etc.

Here I'll discuss Nested Loop Join -

Suppose there are two tables, or rather inputs, T1 & T2, with sizes N1 and N2 where N1 > N2, which we need to join.

The optimizer takes one record from T1 and scans all the records of T2 one by one, returning those that match on the join column. So for every row of T1 there is one full scan of T2 (N2 rows), and the total work is N1*N2 comparisons.

As you can see, for large values of N1 and N2 the cost of this algorithm (N1*N2) grows drastically; e.g., with N1 = 10,000 and N2 = 1,000 that is already 10,000,000 comparisons. Hence this join is not recommended for large inputs and is rather used for smaller input sizes.

Unlike other join algorithms, which require at least one equi-join predicate, the nested loop join can be used with both kinds of predicates - equi and non-equi joins. However, a nested loop join cannot be used with right outer joins.

You can force the optimizer to use a particular join algorithm, provided the logical join supports it. E.g., if you run the below query,

SELECT * FROM Sales.SalesOrderDetail AS SOD INNER JOIN
Production.Product AS P ON SOD.ProductID=P.ProductID


the optimizer might use a Hash join in the query execution plan, but if you want to force a Nested Loop Join on the optimizer, you need to use the below syntax -

SELECT * FROM Sales.SalesOrderDetail AS SOD INNER LOOP JOIN
Production.Product AS P ON SOD.ProductID=P.ProductID


Also, the following hint cannot be used in the query, because a nested loop join cannot be used with right outer joins -

SELECT * FROM Sales.SalesOrderDetail AS SOD RIGHT OUTER LOOP JOIN
Production.Product AS P ON SOD.ProductID=P.ProductID


and you will get the error message: "Msg 8622, Level 16, State 1, Line 1
Query processor could not produce a query plan because of the hints defined in this query. Resubmit the query without specifying any hints and without using SET FORCEPLAN."

2. Merge Join: Details could be found here.

3. Hash Join: Details could be found here.

Wednesday, September 8, 2010

Understanding Impersonation in SSAS 2008

It took some time for me to understand impersonation in SSAS 2008; I hope the following links will help you understand it in a better way:

http://social.msdn.microsoft.com/Forums/en-US/sqlanalysisservices/thread/80799351-a1d1-4c08-8b31-bb95d1053fd3/#0e002801-236c-481b-8e1a-5d40015e2020

http://social.msdn.microsoft.com/Forums/en-US/sqlanalysisservices/thread/62e56047-442e-4d6e-9aa3-a3ae6a52e206

Any further questions would be greatly appreciated.

SSAS Interview Questions.

Following could be some basic SSAS & MDX questions which can be asked in any interview process -

1. What is UDM? Explain.
2. Explain Star & Snowflake schemas.
3. In which situations should we use Snowflake schemas?
4. What are the pros & cons of star and snowflake schemas?
5. Explain many-to-many relationships.
6. Explain factless fact tables.
7. Explain attribute relationships, their types and usage.
8. What is a degenerate dimension?
9. What is an audit dimension?
10. What is a junk dimension?
11. What are SCD dimensions?
12. Explain the Dimension Usage tab.
13. Explain Named Sets.
14. Explain Calculated Members.
15. Explain bitmap indexes.
16. Explain the types of processing modes.
17. Explain the cube processing task in SSIS.
18. How will you create dynamic partitions?
19. Explain Translations and KPIs.
20. How will you design your aggregations?
21. What is proactive caching?
22. What is usage-based optimization?
23. Explain MOLAP, ROLAP and HOLAP, and the scenarios where each is helpful.
24. Explain cold & warm caching.
25. Explain how you would tune a given MDX query.

Tuesday, September 7, 2010

Microsoft MSBI Interview Questions!

I was lucky to get a call from one of my dream companies, Microsoft. Following are a few questions which were asked in the first round of telephonic interview -

1. What are the default databases created when SQL Server is installed?
2. What are deterministic & non-deterministic functions in T-SQL? Give some examples.
3. There is a table with exactly 100 records. How will you split them into 4 parts, each with 25 records?
4. How many isolation levels are there in SQL Server transactions?
5. What is Fill Factor? Explain.
6. Questions were asked around taking backups and backup-related files.
7. What is Scope in MDX? Explain with an example. The question was asked based on some scenario.
8. What is a subcube?
9. What are the basic steps for tuning an MDX query?
10. What are the different ways to migrate SSAS 2008 cubes?
11. In which scenarios should you use a Snowflake schema?
12. Explain many-to-many relationships.
13. What is an audit dimension?
14. What is a degenerate dimension?
15. Give some examples of fully blocked transformations.
16. What is a Checkpoint?
17. You have a package that loads 3 million rows, and you have used checkpoints. The package fails after 2.5 million rows are inserted. You fix the problem and restart the package. What will happen?
18. Explain attribute relationships.
19. What are the different types of attribute relationships?
20. Will incremental processing work with flexible relationships?
21. What are the different methods of doing iteration in T-SQL?
22. How do you eliminate cursors in T-SQL? Explain with examples.
23. How will you deploy an SSIS package from the command line?
24. What is indirect configuration in an SSIS package?
25. What are semi-additive measures?
26. What is the range operator in MDX?
27. How can you see the current date's data in a cube using MDX?
28. Explain Translations. How will you configure them so that users belonging to a particular region see translated captions?
29. What steps will you follow to migrate a script task from a 32-bit to a 64-bit environment?
30. Explain lock escalation.
31. Explain proactive caching.
32. What are materialized views?
Based on the above list you can see that the questions asked were quite basic and fundamental. If you get a call from them, try to buy some time and refresh your basics beforehand. The interviewer was cool and listened to me properly; they were just checking the basics.

Good luck and share your experience, if you have any, with Microsoft.

Saturday, April 3, 2010

How to create a new table schema from the existing table schema in SQL Server?

Sometimes we need to replicate the existing schema of a table, to test something or while doing a demo. The following method could be used:
Method 1: This is really easy and simple.
SELECT * INTO NewTable FROM ExistingTable WHERE 1>2
The condition 1>2 will never be true, so no data is transferred. If you want to replicate the table with its data as well, don't provide any condition.
You can also select some columns from the existing table and create a new table based on them. E.g.
SELECT Column1, Column2 INTO NewTable FROM ExistingTable WHERE 1>2; however, it's quite obvious that these columns should be present in the existing table.
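A variant of the same trick (not covered in the original post) uses TOP 0 to copy the schema without any rows; NewTable2 here is just a hypothetical target name -

SELECT TOP 0 * INTO NewTable2 FROM ExistingTable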

Sunday, December 27, 2009

My First Blog..

I have never been a writer..never been an avid reader either..far from such things since my college days..

Over the last few years, I have read the blogs of many people and found useful information, technical, non-technical or otherwise..and then realized that if everyone were like me, I'd have been deprived of such knowledge, experience and wisdom..so I finally decided to create my own blog, where I can put my thoughts and the interesting things I encounter..

So here is my blog, where I'll put my thoughts and experiences, mainly dedicated to MSBI. Please feel free to comment and give your valuable inputs.