Advanced Analytics – Technology and Tools
1Module 5: Advanced Analytics – Technology and Tools
1Module 5: Advanced Analytics – Technology and Tools
Copyright © 2014 EMC Corporation. All rights reserved.
Copyright © 2014 EMC Corporation. All Rights Reserved.
Advanced Analytics – Technology and Tools
During this lesson the following topics are covered:
• SQL Essentials • SET Operations
• Online analytical processing (OLAP) features
• GROUPING SETS, ROLLUP,CUBE
• GROUPING, GROUP_ID functions
• Text processing, Pattern matching
In-database Analytics SQL essentials
2Module 5: Advanced Analytics – Technology and Tools
These topics are covered in this lesson.
2Module 5: Advanced Analytics – Technology and Tools
Copyright © 2014 EMC Corporation. All rights reserved.
Copyright © 2014 EMC Corporation. All Rights Reserved.
Set Operations
Greenplum supports the following set operations as part of a SELECT statement:
• INTERSECT – Returns rows that appear in all answer sets
• EXCEPT – Returns rows from the first answer set and excludes those from the second
• UNION ALL – Returns a combination of rows from multiple SELECT statements with repeating rows
• UNION – Returns a combination of rows from multiple SELECT statements with no repeating rows
3Module 5: Advanced Analytics – Technology and Tools
Set Operations
Set operators:
• Manipulate the results sets of two or more queries by combining the results of individual queries into a single results set.
• Do not perform row level filtering.
Set operations supported by Greenplum are:
• INTERSECT which returns rows that appear in all answer sets generated by individual SELECT statements.
• EXCEPT returns all rows from the first SELECT except for those which also selected by the second SELECT. This operation is the same as the MINUS operation.
• UNION ALL combines all the results of two or more SELECT statements. There may be repeating rows.
• UNION combines the results of two or more SELECT statements. There will be no repeating rows.
3Module 5: Advanced Analytics – Technology and Tools
Copyright © 2014 EMC Corporation. All rights reserved.
Copyright © 2014 EMC Corporation. All Rights Reserved.
Set Operations – INTERSECT
INTERSECT:
• Returns only the rows that appear in both SQL queries
• Removes duplicate rows
Table A
Table B
Intersect
SELECT t.transid,
c.custname
FROM facts.transaction t
JOIN dimensions.customer c
ON c.customerid = t.customerid
INTERSECT
SELECT t1.transid,
c1.custname
FROM facts.transaction t1
JOIN dimensions.customer c1
ON c1.customerid = t1.customerid
WHERE t1.transdate BETWEEN
‘2008-01-01’ AND ‘2008-01-21’
4Module 5: Advanced Analytics – Technology and Tools
Set Operations – INTERSECT
A set operation takes the results of two queries and returns only the results that appear in both result sets. Duplicate rows are removed from the final set returned.
4Module 5: Advanced Analytics – Technology and Tools
Copyright © 2014 EMC Corporation. All rights reserved.
Copyright © 2014 EMC Corporation. All Rights Reserved.
Set Operations – EXCEPT
EXCEPT:
• Returns all rows from the first SELECT statement
• Omits all rows that appear in the second SELECT statement
SELECT t.transid,
c.custname
FROM facts.transaction t
JOIN dimensions.customer c
ON c.customerid = t.customerid
EXCEPT
SELECT t1.transid,
c1.custname
FROM facts.transaction t1
JOIN dimensions.customer c1
ON c1.customerid = t1.customerid
WHERE t1.transdate BETWEEN
‘2008-01-01’ AND ‘2008-01-21’
Table A
Results of first query
Table B
Results of second query
Table A minus Table B
5Module 5: Advanced Analytics – Technology and Tools
Set Operations – EXCEPT
The EXCEPT set operation takes the distinct rows of the first query and returns all of the rows that do not appear in the result set of the second query.