Sunday, November 6, 2011

Distributed Queries (access and query data from multiple sources Ex;Sql server, Oracle,Excle, textfiles)

The heading of the topic looks small, but its a huge topic.I cannot cover whole topic in one blog.
But I will try to make this topic is very simple.We will start from basic, when you want to retrieve data
from any table you simply execute, "select * from database_name.schema_name.table name".When you execute this query it automatically
pull data from server, to where you connected from SSMS(sql server management studio).
Same way, If you want to query data from other data source, ( it may be sql sqlserver, oracle, excel, mysql, text file) from your local sql engine, we call it as distributed queries.
Distributed queries access data from multiple heterogeneous data sources. These data sources can be stored on either the same or different computers.
Microsoft SQL Server supports distributed queries by using OLE DB.


OLE DB providers expose data in tabular objects called rowsets. SQL Server allows rowsets from OLE DB providers to be referenced in
Transact-SQL statements as if they were SQL Server tables.

If you see above statement, its saying you can query on any source( even textfile or excel file) using T-sql satements.
The whole intenstion of this post is also to show, how can we achieve that.

With these ditributed queries, you can even insert,update,delete data in othere sources using T-sql statements.

Sql server supports referencing heterogeneous OLE DB data sources in Transact-SQL statements by using: the linked server name or the ad hoc computer name.

If you are using frequent distributed qieries then,use linked server, otherwise use Ad-hoc queries.

By default, SQL Server does not allow ad hoc distributed queries using OPENQUERY, OPENROWSET and OPENDATASOURCE. When this option is set to 1, SQL Server allows ad hoc access.
When this option is not set or is set to 0, SQL Server does not allow ad hoc access.

Ad hoc distributed queries use the OPENQUERY,OPENROWSET and OPENDATASOURCE functions to connect to remote data sources that use OLE DB. OPENROWSET and OPENDATASOURCE should be used only to reference OLE DB data sources that are accessed infrequently. For any data sources that will be accessed more than several times, define a linked server.
Enable AD Hoc Distributed Queries on Instance:::
SP_CONFIGURE 'SHOW ADVANCED OPTIONS', 1;
RECONFIGURE;
SP_CONFIGURE 'AD HOC DISTRIBUTED QUERIES', 1;
RECONFIGURE;
GO

WAYS TO ACCESS DATA THROUGH DISTRIBUTED QUERIES::

1) USING LINKED SERVER:::

TO USE LINKEDSERVER IN A QUERY, WE NEED TO CONFIGURE LINKED SERVER FIRST.
NOTE::PLEASE CHECK PROVIDERS AND INSTALL REQUIRED PROVIDES(IF NEEDED), BEFORE YOU CREATE LINKED SERVER

EX:(ORACLE VERSION8 OR LATER)

USE MASTER
EXEC MASTER.DBO.SP_ADDLINKEDSERVER @SERVER = N'ORACLE_REMOTE', @SRVPRODUCT=N'ORACLE', @PROVIDER=N'ORAOLEDB.ORACLE', @DATASRC=N'INSTANCE_NAME'
EXEC MASTER.DBO.SP_ADDLINKEDSRVLOGIN @RMTSRVNAME=N'ORACLE_REMOTE',@USESELF=N'FALSE',@LOCALLOGIN=NULL,@RMTUSER=N'XX',@RMTPASSWORD='XXXX'

EX:(ORACLE)

USE MASTER
EXEC MASTER.DBO.SP_ADDLINKEDSERVER @SERVER = N'ORACLE_REMOTE', @SRVPRODUCT=N'ORACLE', @PROVIDER=N'MSDAORA', @DATASRC=N'INSTANCE_NAME'
EXEC MASTER.DBO.SP_ADDLINKEDSRVLOGIN @RMTSRVNAME=N'ORACLE_REMOTE',@USESELF=N'FALSE',@LOCALLOGIN=NULL,@RMTUSER=N'XX',@RMTPASSWORD='XXXX'


EX:(SQLSERVER)

USE MATEREXEC MASTER.DBO.SP_ADDLINKEDSERVER @SERVER=N'SQL_REMOTE', @SRVPRODUCT=N'', @PROVIDER=N'SQLOLEDB', @DATASRC=N'SQL_INSTANCE_NAME';
EXEC MASTER.DBO.SP_ADDLINKEDSRVLOGIN @RMTSRVNAME=N'TEST',@USESELF=N'FALSE',@LOCALLOGIN=NULL,@RMTUSER=N'XXXX',@RMTPASSWORD='XXXXX'


EX:(EXCEL FILES)

EXEC MASTER.DBO.SP_ADDLINKEDSERVER @SERVER=N'EXCEL_TEST', @SRVPRODUCT=N'JET 4.0', @PROVIDER=N'MICROSOFT.JET.OLEDB.4.0', @DATASRC=N'D:\TEST.XLS',@PROVSTR='EXCEL 5.0'
SELECT * FROM EXCEL_TEST...[ONE$]


EX:(TEXT_FILES)

EXEC MASTER.DBO.SP_ADDLINKEDSERVER @SERVER=N'TEXT_TEST', @SRVPRODUCT=N'JET 4.0', @PROVIDER=N'MICROSOFT.JET.OLEDB.4.0', @DATASRC=N'D:\',@PROVSTR='TEXT'
SELECT * FROM TEXT_TEST...[FILENAME#TXT]


NOTE: WHEN THE TEXT DRIVER IS USED, THE FORMAT OF THE TEXT FILE IS DETERMINED BY USING A SCHEMA INFORMATION FILE. THE SCHEMA INFORMATION FILE IS ALWAYS NAMED SCHEMA.INI AND ALWAYS KEPT IN THE SAME DIRECTORY AS THE TEXT DATA SOURCE. THE SCHEMA INFORMATION FILE PROVIDES THE IISAM WITH INFORMATION ABOUT THE GENERAL FORMAT OF THE FILE, THE COLUMN NAME AND DATA TYPE INFORMATION, AND SEVERAL OTHER DATA CHARACTERISTICS. A SCHEMA.INI FILE IS ALWAYS REQUIRED FOR ACCESSING FIXED-LENGTH DATA.(EX: TAB DELIMITED OR ^ OR ! OR )
IF THE FILE IS COMMA(,) SEPARATED FILE, THEN NO NEED TO USE SCHEMA.INI FILE.

HERE ARE THE SAMPLE SCHAM.INI FILES

EX:--(^) DELIMITED
[FILENAME.TXT]
COLNAMEHEADER=TRUE
FORMAT=DELIMITED(^)
MAXSCANROWS=0
CHARACTERSET=ANSI
COL1=ID CHAR WIDTH 50
COL2=NAME LONGCHAR

--TAB DELIMITED
[FILENAME.TXT]
COLNAMEHEADER=TRUE
FORMAT=TABDELIMITED
MAXSCANROWS=0
CHARACTERSET=ANSI
COL1=ID CHAR WIDTH 50
COL2=NAME LONGCHAR

ONCE YOU CREATE LINKED SERVER, YOU CAN ACCESS THESE DATA FROM
MULTIPLE WAYS
.

A)OPENQUERY::( IF YOU WANT TO ACCESS DATA FROM TABLE FROM OTHER SQL SERVER)

SELECT * FROM OPENQUERY(LINKED_SERVERNAME,'SELECT * FROM REMOTE_DBNAME.DBO.TABLE_NAME')


B)FOUR PART NAME

USING OPENQUERY, YOU CANNOT USE PARAMETERS,TO EXECUTE STORED PROCS USE FOUR PART NAME.
WHEN YO SELECT DATA WITH FOUR PART NAME,YOU SHOUL INCLUDE DATABASENAME,SCHEMANAME,TABLENAME

SELECT * FROM SQL_REMOTE.REMOTE_DBNAME.DBO.TABLE_NAME (SELECT * FROM SQL_REMOTE_DB.REMOTE_DBNAME..TABLE_NAME, THIS IS NOT VALID)


EXECUTE STORED PROC USING PARAMETER

EXEC SQL_REMOTE.REMOTE_DBNAME.DBO.STOREDPROC1 @PARAMETER=1


2) AD-HOC METHOD:::

THIS METHOD IS AN ALTERNATIVE TO ACCESSING TABLES IN AL LINKED SERVER AND IS A ONE-TIME, AD HOC METHOD OF CONNECTING AND ACCESSING REMOTE DATA
NY USING OLE DB.


A) OPENROWSET:::

THE OPENROWSET FUNCTION CAN BE REFERENCED IN THE FROM CLAUSE OF A QUERY AS IF IT WERE A TABLE NAME.

EX::
SELECT A.* FROM OPENROWSET('SQLNCLI10','SERVER=SERVER_NAME;TRUSTED_CONNECTION=YES;', 'SELECT * FROM DB_NAME.DBO.TABLE_NAME') AS A
SELECT A.* FROM OPENROWSET('SQLNCLI10','SERVER=USCTAP3823;UID=SEDEV;PWD=SEDEV','SELECT * FROM DB_NAME.DBO.TABLE_NAME') AS A
SELECT * FROM OPENROWSET('MICROSOFT.JET.OLEDB.4.0','EXCEL 5.0;DATABASE=D:\TEST.XLS',[ONE$]) AS A

B) OPENDATASOURCE:::

PROVIDES AD-HOC CONNECTION INFORMATION AS PART OF A FOUR PART OBJECT NAME WITHOUT USING A LINKED SERVER NAME.

EX::

SELECT * FROM OPENDATASOURCE('SQLNCLI','SERVER=SERVER_NAME;UID=XXX;PWD=XXXX').DATABASE_NAME.SCHEMA_NAME.TABLE_NAME
SELECT * FROM OPENDATASOURCE('MICROSOFT.JET.OLEDB.4.0','DATA SOURCE=C:\TEST.XLS;EXTENDED PROPERTIES=EXCEL 5.0')...[ONE$]

No comments:

Post a Comment