Monday, November 7, 2011

Creating Linked Server To MYSQL from Ms SQL Server and importing tables from MYSQL

One day I am seriously working on one project with Ms SQL server, Suddenly my manager jumped into my cabin and he asked me to import all the tables from MYSQL to SQL server.I never worked on MYSQL, and don't want to install any MYSQL client tools and export data into text files and load into SQL server.I thought of making easy and automate this process.
I started looking for available OLE DB provider for MYSQL, but I didn't find that.So I found an alternative way to connection to MYSQL from MS SQL server.

A provider is needed for any data source that is to be exposed through OLE DB. Microsoft releases providers for several common data sources, including SQL Server, Oracle, and Jet. In addition, Microsoft ships a special provider called the “OLE DB Provider for ODBC Drivers”. This was the very first OLE DB provider ever released, and is also known as “MSDASQL”, due to the fact that it is contained in “MSDASQL.DLL”.
MSDASQL allows OLE DB consumer applications to utilize ODBC drivers to connect to a data source. The provider converts incoming OLE DB calls into ODBC calls, and passes them on to the specified ODBC driver. It then retrieves results from the ODBC driver and formats those into OLE DB specific structures which the consumer then has access to.

To achieve this first you need to install "OLEDB provider for ODBC driver" and "MYSQL ODBC connector".
Download paths for above two components::
OLEDB provider for ODBC driver:::http://www.microsoft.com/download/en/details.aspx?displaylang=en&id=20065
MYSQL ODBC connector::http://www.microsoft.com/download/en/details.aspx?displaylang=en&id=20065
Once you install these two components, please check they are installed or not.
For OLEDB provider for ODBC driver, GO SSMS( Sql server Management studio), expand Server objects,
then expand Linked servers, then Providers, you can see "MSDASQL"


For MYSQL ODBC connector::
goto start--->RUN--->type ODBCAD32 it will open below window.
then go to drivers tab, there you can see installed driver.

Once you have provider and ODBC connector, create Linked server between MSSQL to MYSQL.



EXEC master.dbo.sp_addlinkedserver

@server = N'MYSQL',

@srvproduct=N'MySQL',

@provider=N'MSDASQL',

@provstr=N'DRIVER={MySQL ODBC 5.1 Driver}; SERVER= server name;

DATABASE=db_name; USER=XXX; PASSWORD= XXX_XXX; OPTION=3'

Once you have this successfully configured, Use below SP to import data from MYSQL server.

USE DB_Name
GO
/****** Object: StoredProcedure [dbo].[spGenerating_gene_leveldata_from_reporter_leveldata] Script Date: 01/31/2011 15:44:11 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
IF OBJECT_ID('[dbo].[sp_import_data_from_MYSQL]','P') is not null
drop procedure [dbo].[sp_import_data_from_MYSQL]
GO
CREATE PROCEDURE [dbo].[sp_import_data_from_MYSQL]
/*
purpose: This stored proc is used import data from MYSQL to MS SQL server
This stored proc produce some chucks, which we execute in different windows to increase the performance.
*/
AS
BEGIN
BEGIN try
SET NOCOUNT ON
declare @procName varchar(100), @dbName varchar(256), @date varchar(20), @ntab varchar(200), @nln varchar(4),@user varchar(56),@rows int
DECLARE @ErrMsg nvarchar(4000), @ErrSev varchar(3), @ErrSt varchar(4), @ErrProc nvarchar(126),@ErrLn nvarchar(10), @ErrNr nvarchar(10), @ErrLoc nvarchar(300)
select @procName=DB_NAME()+'.'+ss.name+'.'+so.name from sys.objects so, sys.schemas ss
where so.object_id = @@PROCID and so.schema_id=ss.schema_id
select @date = cast(getDate() as varchar(20)),@nln=''+char(10)+' ', @dbName=DB_NAME(),@ntab=@procName+'> ', @user=suser_sname(),@rows=0
DECLARE @tableName varchar(256),@databaseName varchar(256),@table_id int,@SQL Nvarchar(1024),@ParmDefinition nvarchar(500),@SQl_TXT varchar(MAX)
SELECT @databaseName='exome',@table_id=1
raiserror('Entering stored procedure "%s" at "%s"',0,1,@procName,@date) with nowait
IF (Cursor_Status('global', 'c_tables') >= 0 or Cursor_Status('local', 'c_tables') >= 0)
begin close c_tables; deallocate c_tables end
declare c_tables cursor for select Tables_in_exome FROM openquery(MYSQL, 'SHOW TABLES IN exome')
open c_tables
fetch next from c_tables into @tableName
while @@FETCH_STATUS=0
BEGIN
      DECLARE @cnt int=0
raiserror('%d.) Loading data from table%s',0,1,@table_id,@tableName) with nowait
SELECT @SQL='
SELECT @cntOUT=cnt FROM openquery(MYSQL, ''SELECT COUNT(*) as cnt FROM '+@databaseName+'.'+@tableName+''')'
SET @ParmDefinition = N'@cntOUT int OUTPUT'
EXECUTE sp_executesql @SQL, @ParmDefinition, @cntOUT=@cnt OUTPUT;
raiserror('Number of rows in %s are %d ',0,1,@tableName,@cnt) with nowait
if @cnt>10000
BEGIN
raiserror('loading in batches (each batch it will load 10000 records)',0,1)
DECLARE @start int, @end int
SELECT @start=0,@end=10000
WHILE((@cnt+10000)>@start)
BEGIN
if @start=0
BEGIN
SELECT @SQl_TXT='SELECT * INTO '+@dbName+'.'+@databaseName+'.'+@tableName+' FROM openquery(MYSQL, ''SELECT * FROM '+@databaseName+'.'+@tableName+' LIMIT '+convert(varchar(20),@start)+' ,'+convert(varchar(20),@end)+''')
GO'

print @SQl_TXT
exec(@SQl_TXT)

END ELSE BEGIN

SELECT @SQl_TXT='INSERT INTO '+@dbName+'.'+@databaseName+'.'+@tableName+' SELECT * FROM openquery(MYSQL, ''SELECT * FROM '+@databaseName+'.'+@tableName+' LIMIT '+convert(varchar(20),@start)+' ,'+convert(varchar(20),@end)+''')
GO'
print @SQl_TXT
exec(@SQl_TXT)

END

select @start=@start+10000
--select @end=@end+10000

END

END ELSE BEGIN
SELECT @SQl_TXT='SELECT * INTO '+@dbName+'.'+@databaseName+'.'+@tableName+' FROM openquery(MYSQL, ''SELECT * FROM '+@databaseName+'.'+@tableName+''')
GO'
print @SQl_TXT
exec(@SQl_TXT)

END

raiserror('%s%s',0,1,@nln,@nln) with nowait
SET @table_id=@table_id+1    

fetch next from c_tables into @tableName

END
CLOSE c_tables
DEALLOCATE c_tables

END TRY
BEGIN CATCH
select @ErrMsg =ERROR_MESSAGE(), @ErrSev =ERROR_SEVERITY(), @ErrSt =ERROR_STATE(),@ErrProc =ERROR_PROCEDURE(), @ErrLn = ERROR_LINE(), @ErrNr = ERROR_NUMBER()
raiserror('Failed to execute procname %s, %s Error nr %s, state %s, sev %s; line %s, %s with msg: %s',16,-1,@procName, @nln,@ErrNr,@ErrSt,@ErrSev,@ErrLn,@nln,@ErrMsg)
END CATCH
END
GO

Using Distributed queries, You can select,insert,update delete data from other server(oracle, Mysql, sqlserver)

In previous post, we saw how to create linked server and how to select data with these linked server.To day I will try to show some more interesting things we can do this distributed queries.
When I select data with distributed queries, I thought why don't we insert,delete, update data in linked server. SQL server providing Pass-through queries to achieve this functionality.

---TESTING WITH MY ORACLE INSTANCE
---SO I CREATED ONE TESTING TABLE IN MY ORACLE INSTANCE NAME CALLED "TEST".
--FIRST CREATE LINKED SERVER WITH ALL THE CREDENTIALS

USE MASTER
EXEC MASTER.DBO.SP_ADDLINKEDSERVER @SERVER = N'ORACLE_REMOTE', @SRVPRODUCT=N'ORACLE', @PROVIDER=N'ORAOLEDB.ORACLE', @DATASRC= 'INSTANCE NAME'
EXEC MASTER.DBO.SP_ADDLINKEDSRVLOGIN @RMTSRVNAME=N'ORACLE_REMOTE',@USESELF=N'FALSE',@LOCALLOGIN=NULL,@RMTUSER=N'XX',@RMTPASSWORD='XX'


---EXECUTING SELECT QUERY
SELECT * FROM OPENQUERY(ORACLE_REMOTE,'SELECT ID,NAME FROM SHEMA_NAME.TEST')

-----EXECUTING INSERT PASS-THROUGH QUERY
INSERT OPENQUERY(ORACLE_REMOTE,'SELECT ID,NAME FROM SHEMA_NAME.TEST')
VALUES(1,'BHASKER')

-----EXECUTING UPDATE PASS-THROUGH QUERY
UPDATE OPENQUERY(ORACLE_REMOTE,'SELECT NAME FROM SHEMA_NAME.TEST WHERE ID=1')
SET NAME='BHASKER_NEW'

-----EXECUTING DELTE PASS-THROUGH QUERY
DELETE OPENQUERY(ORACLE_REMOTE,'SELECT NAME FROM SHEMA_NAME.TEST WHERE ID=1')

---TESTING WITH MY SQL SERVER INSTANCE
USE MASTER
EXEC MASTER.DBO.SP_ADDLINKEDSERVER @SERVER = N'SQL_REMOTE', @SRVPRODUCT=N'SQL', @PROVIDER=N'SQLOLEDB', @DATASRC=N'SQLINSTANCE'
EXEC MASTER.DBO.SP_ADDLINKEDSRVLOGIN @RMTSRVNAME=N'SQL_REMOTE',@USESELF=N'FALSE',@LOCALLOGIN=NULL,@RMTUSER=N'XX',@RMTPASSWORD='XXX'

---EXECUTING SELECT QUERY
SELECT * FROM OPENQUERY(SQL_REMOTE,'SELECT ID,NAME FROM DB_NAME.DBO.TEST')


-----EXECUTING INSERT PASS-THROUGH QUERY
INSERT OPENQUERY(SQL_REMOTE,'SELECT ID,NAME FROM DB_NAME.DBO.TEST')
VALUES(1,'BHASKER')

-----EXECUTING UPDATE PASS-THROUGH QUERY
UPDATE OPENQUERY(SQL_REMOTE,'SELECT NAME FROM DB_NAME.DBO.TEST WHERE ID=1')
SET NAME='BHASKER_NEW'


-----EXECUTING DELTE PASS-THROUGH QUERY
DELETE OPENQUERY(SQL_REMOTE,'SELECT NAME FROM TGI_DBSNP.DBO.TEST WHERE ID=1')
---SET NAME='BHASKER_NEW'

Sunday, November 6, 2011

Distributed Queries (access and query data from multiple sources Ex;Sql server, Oracle,Excle, textfiles)

The heading of the topic looks small, but its a huge topic.I cannot cover whole topic in one blog.
But I will try to make this topic is very simple.We will start from basic, when you want to retrieve data
from any table you simply execute, "select * from database_name.schema_name.table name".When you execute this query it automatically
pull data from server, to where you connected from SSMS(sql server management studio).
Same way, If you want to query data from other data source, ( it may be sql sqlserver, oracle, excel, mysql, text file) from your local sql engine, we call it as distributed queries.
Distributed queries access data from multiple heterogeneous data sources. These data sources can be stored on either the same or different computers.
Microsoft SQL Server supports distributed queries by using OLE DB.


OLE DB providers expose data in tabular objects called rowsets. SQL Server allows rowsets from OLE DB providers to be referenced in
Transact-SQL statements as if they were SQL Server tables.

If you see above statement, its saying you can query on any source( even textfile or excel file) using T-sql satements.
The whole intenstion of this post is also to show, how can we achieve that.

With these ditributed queries, you can even insert,update,delete data in othere sources using T-sql statements.

Sql server supports referencing heterogeneous OLE DB data sources in Transact-SQL statements by using: the linked server name or the ad hoc computer name.

If you are using frequent distributed qieries then,use linked server, otherwise use Ad-hoc queries.

By default, SQL Server does not allow ad hoc distributed queries using OPENQUERY, OPENROWSET and OPENDATASOURCE. When this option is set to 1, SQL Server allows ad hoc access.
When this option is not set or is set to 0, SQL Server does not allow ad hoc access.

Ad hoc distributed queries use the OPENQUERY,OPENROWSET and OPENDATASOURCE functions to connect to remote data sources that use OLE DB. OPENROWSET and OPENDATASOURCE should be used only to reference OLE DB data sources that are accessed infrequently. For any data sources that will be accessed more than several times, define a linked server.
Enable AD Hoc Distributed Queries on Instance:::
SP_CONFIGURE 'SHOW ADVANCED OPTIONS', 1;
RECONFIGURE;
SP_CONFIGURE 'AD HOC DISTRIBUTED QUERIES', 1;
RECONFIGURE;
GO

WAYS TO ACCESS DATA THROUGH DISTRIBUTED QUERIES::

1) USING LINKED SERVER:::

TO USE LINKEDSERVER IN A QUERY, WE NEED TO CONFIGURE LINKED SERVER FIRST.
NOTE::PLEASE CHECK PROVIDERS AND INSTALL REQUIRED PROVIDES(IF NEEDED), BEFORE YOU CREATE LINKED SERVER

EX:(ORACLE VERSION8 OR LATER)

USE MASTER
EXEC MASTER.DBO.SP_ADDLINKEDSERVER @SERVER = N'ORACLE_REMOTE', @SRVPRODUCT=N'ORACLE', @PROVIDER=N'ORAOLEDB.ORACLE', @DATASRC=N'INSTANCE_NAME'
EXEC MASTER.DBO.SP_ADDLINKEDSRVLOGIN @RMTSRVNAME=N'ORACLE_REMOTE',@USESELF=N'FALSE',@LOCALLOGIN=NULL,@RMTUSER=N'XX',@RMTPASSWORD='XXXX'

EX:(ORACLE)

USE MASTER
EXEC MASTER.DBO.SP_ADDLINKEDSERVER @SERVER = N'ORACLE_REMOTE', @SRVPRODUCT=N'ORACLE', @PROVIDER=N'MSDAORA', @DATASRC=N'INSTANCE_NAME'
EXEC MASTER.DBO.SP_ADDLINKEDSRVLOGIN @RMTSRVNAME=N'ORACLE_REMOTE',@USESELF=N'FALSE',@LOCALLOGIN=NULL,@RMTUSER=N'XX',@RMTPASSWORD='XXXX'


EX:(SQLSERVER)

USE MATEREXEC MASTER.DBO.SP_ADDLINKEDSERVER @SERVER=N'SQL_REMOTE', @SRVPRODUCT=N'', @PROVIDER=N'SQLOLEDB', @DATASRC=N'SQL_INSTANCE_NAME';
EXEC MASTER.DBO.SP_ADDLINKEDSRVLOGIN @RMTSRVNAME=N'TEST',@USESELF=N'FALSE',@LOCALLOGIN=NULL,@RMTUSER=N'XXXX',@RMTPASSWORD='XXXXX'


EX:(EXCEL FILES)

EXEC MASTER.DBO.SP_ADDLINKEDSERVER @SERVER=N'EXCEL_TEST', @SRVPRODUCT=N'JET 4.0', @PROVIDER=N'MICROSOFT.JET.OLEDB.4.0', @DATASRC=N'D:\TEST.XLS',@PROVSTR='EXCEL 5.0'
SELECT * FROM EXCEL_TEST...[ONE$]


EX:(TEXT_FILES)

EXEC MASTER.DBO.SP_ADDLINKEDSERVER @SERVER=N'TEXT_TEST', @SRVPRODUCT=N'JET 4.0', @PROVIDER=N'MICROSOFT.JET.OLEDB.4.0', @DATASRC=N'D:\',@PROVSTR='TEXT'
SELECT * FROM TEXT_TEST...[FILENAME#TXT]


NOTE: WHEN THE TEXT DRIVER IS USED, THE FORMAT OF THE TEXT FILE IS DETERMINED BY USING A SCHEMA INFORMATION FILE. THE SCHEMA INFORMATION FILE IS ALWAYS NAMED SCHEMA.INI AND ALWAYS KEPT IN THE SAME DIRECTORY AS THE TEXT DATA SOURCE. THE SCHEMA INFORMATION FILE PROVIDES THE IISAM WITH INFORMATION ABOUT THE GENERAL FORMAT OF THE FILE, THE COLUMN NAME AND DATA TYPE INFORMATION, AND SEVERAL OTHER DATA CHARACTERISTICS. A SCHEMA.INI FILE IS ALWAYS REQUIRED FOR ACCESSING FIXED-LENGTH DATA.(EX: TAB DELIMITED OR ^ OR ! OR )
IF THE FILE IS COMMA(,) SEPARATED FILE, THEN NO NEED TO USE SCHEMA.INI FILE.

HERE ARE THE SAMPLE SCHAM.INI FILES

EX:--(^) DELIMITED
[FILENAME.TXT]
COLNAMEHEADER=TRUE
FORMAT=DELIMITED(^)
MAXSCANROWS=0
CHARACTERSET=ANSI
COL1=ID CHAR WIDTH 50
COL2=NAME LONGCHAR

--TAB DELIMITED
[FILENAME.TXT]
COLNAMEHEADER=TRUE
FORMAT=TABDELIMITED
MAXSCANROWS=0
CHARACTERSET=ANSI
COL1=ID CHAR WIDTH 50
COL2=NAME LONGCHAR

ONCE YOU CREATE LINKED SERVER, YOU CAN ACCESS THESE DATA FROM
MULTIPLE WAYS
.

A)OPENQUERY::( IF YOU WANT TO ACCESS DATA FROM TABLE FROM OTHER SQL SERVER)

SELECT * FROM OPENQUERY(LINKED_SERVERNAME,'SELECT * FROM REMOTE_DBNAME.DBO.TABLE_NAME')


B)FOUR PART NAME

USING OPENQUERY, YOU CANNOT USE PARAMETERS,TO EXECUTE STORED PROCS USE FOUR PART NAME.
WHEN YO SELECT DATA WITH FOUR PART NAME,YOU SHOUL INCLUDE DATABASENAME,SCHEMANAME,TABLENAME

SELECT * FROM SQL_REMOTE.REMOTE_DBNAME.DBO.TABLE_NAME (SELECT * FROM SQL_REMOTE_DB.REMOTE_DBNAME..TABLE_NAME, THIS IS NOT VALID)


EXECUTE STORED PROC USING PARAMETER

EXEC SQL_REMOTE.REMOTE_DBNAME.DBO.STOREDPROC1 @PARAMETER=1


2) AD-HOC METHOD:::

THIS METHOD IS AN ALTERNATIVE TO ACCESSING TABLES IN AL LINKED SERVER AND IS A ONE-TIME, AD HOC METHOD OF CONNECTING AND ACCESSING REMOTE DATA
NY USING OLE DB.


A) OPENROWSET:::

THE OPENROWSET FUNCTION CAN BE REFERENCED IN THE FROM CLAUSE OF A QUERY AS IF IT WERE A TABLE NAME.

EX::
SELECT A.* FROM OPENROWSET('SQLNCLI10','SERVER=SERVER_NAME;TRUSTED_CONNECTION=YES;', 'SELECT * FROM DB_NAME.DBO.TABLE_NAME') AS A
SELECT A.* FROM OPENROWSET('SQLNCLI10','SERVER=USCTAP3823;UID=SEDEV;PWD=SEDEV','SELECT * FROM DB_NAME.DBO.TABLE_NAME') AS A
SELECT * FROM OPENROWSET('MICROSOFT.JET.OLEDB.4.0','EXCEL 5.0;DATABASE=D:\TEST.XLS',[ONE$]) AS A

B) OPENDATASOURCE:::

PROVIDES AD-HOC CONNECTION INFORMATION AS PART OF A FOUR PART OBJECT NAME WITHOUT USING A LINKED SERVER NAME.

EX::

SELECT * FROM OPENDATASOURCE('SQLNCLI','SERVER=SERVER_NAME;UID=XXX;PWD=XXXX').DATABASE_NAME.SCHEMA_NAME.TABLE_NAME
SELECT * FROM OPENDATASOURCE('MICROSOFT.JET.OLEDB.4.0','DATA SOURCE=C:\TEST.XLS;EXTENDED PROPERTIES=EXCEL 5.0')...[ONE$]