Connecting two data sources together


Dan Giannone

Bit of a conceptual question about how best to use unixODBC. I am working on a Linux machine (2.6.32-504.8.1.el6.x86_64 SMP) and have two data source drivers installed: Netezza and Hive. Currently I query Netezza to generate a text file of the results on my local Linux machine, then load that data into Hive as a separate command (not using isql).

I was wondering whether it is possible to skip the intermediate step and load data directly from Netezza into Hive. Is this doable, and if so, what would it entail? I'm aware that technologies already exist to do this, but I am a big fan of unixODBC and wanted to see if I could make it work. Here is what I am working with currently:

#Netezza commands - Documentation

>>Create external table test1 (column1 type, column2 type, …) USING (DATAOBJECT ("/path_to_file/file.txt") escapechar '\' nullValue 'NULL' delim '|' REMOTESOURCE 'ODBC' delimiter 180);

>>Insert into table test1 select column1, column2, … from schema.table;

#Hive commands - Documentation

>>load data local inpath '/path_to_file/file.txt' into table hive_test;

So, in general, I am asking whether it is possible to connect two data sources together in some way. If anyone with specific Netezza/Hive knowledge can provide insight, even better.
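For reference, the existing two-step workflow above could at least be driven from a single script. This is only a sketch: the DSN name and credentials are placeholders for whatever odbc.ini defines, and it assumes isql's batch mode (`-b`, SQL read from stdin) and `hive -e` for running a single HiveQL statement.

```python
import subprocess

def build_commands(dsn, user, password, hive_sql):
    """Build the two commands of the two-step workflow:
    1) run the Netezza unload through isql in batch mode (-b),
    2) load the resulting file into Hive with `hive -e`."""
    extract = ["isql", "-b", dsn, user, password]  # SQL is piped in on stdin
    load = ["hive", "-e", hive_sql]                # -e: execute one HiveQL string
    return extract, load

def run_workflow(dsn, user, password, unload_sql, hive_sql):
    """Run both steps, stopping if the Netezza unload fails."""
    extract, load = build_commands(dsn, user, password, hive_sql)
    subprocess.run(extract, input=unload_sql, text=True, check=True)
    subprocess.run(load, check=True)
```

Here `run_workflow` would be called with the external-table INSERT statement and the Hive LOAD statement shown above.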

 

Thanks,

 

Dan

 



_______________________________________________
unixODBC-support mailing list
[hidden email]
http://mailman.unixodbc.org/mailman/listinfo/unixodbc-support
Re: Connecting two data sources together

Nick Gorham-2
On 05/10/16 13:52, Dan Giannone wrote:



Well, it would be simple in Perl using DBD::ODBC, or in C: I would open two connections, bind a set of columns for the result set using SQLBindCol, and bind the same memory as input parameters to an INSERT statement, then fetch from one and execute the other. You could then get clever and fetch arrays of results at a time and insert the same array.
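A sketch of that fetch-and-insert loop, in Python DB-API terms rather than C (with pyodbc the same cursor calls apply, though whether the Hive ODBC driver accepts parameterized INSERTs depends on the driver and Hive version). Two in-memory SQLite connections stand in for the Netezza and Hive connections:

```python
import sqlite3

def copy_table(src_cur, dst_cur, select_sql, insert_sql, batch=1000):
    """Fetch rows from one connection and insert them into another in
    batches -- the same fetch/execute pattern described above, using
    the Python DB-API instead of SQLBindCol."""
    src_cur.execute(select_sql)
    while True:
        rows = src_cur.fetchmany(batch)
        if not rows:
            break
        dst_cur.executemany(insert_sql, rows)

# Demo: two in-memory SQLite databases standing in for the two DSNs.
src = sqlite3.connect(":memory:")
dst = sqlite3.connect(":memory:")
src.execute("CREATE TABLE t (a INTEGER, b TEXT)")
src.executemany("INSERT INTO t VALUES (?, ?)", [(1, "x"), (2, "y")])
dst.execute("CREATE TABLE t (a INTEGER, b TEXT)")

copy_table(src.cursor(), dst.cursor(),
           "SELECT a, b FROM t", "INSERT INTO t VALUES (?, ?)")
dst.commit()
```

Note that row-at-a-time INSERTs into Hive are slow compared with a bulk load, so this pattern suits modest volumes; the array-binding variant Nick mentions amortizes the round trips.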

--
Nick

Re: Connecting two data sources together

Daniel Kasak
I do this kind of thing with lots of different source/target databases, using Unix pipes. For example:

1) mkfifo /path/to/pipe
2) Netezza writes to pipe via external table
3) Hive reads from pipe

You have to make sure the *reading* process doesn't try to seek in the source (i.e. the pipe), because pipes don't support seeking. You also have to be careful about the order of operations (i.e. starting the reader and the writer). Add error handling, and clean up the pipe afterwards.

Otherwise, it's a relatively straightforward and extremely high-performance way of getting your data from one DB to another.
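A minimal, self-contained sketch of the same hand-off: a writer thread stands in for the Netezza external-table unload and the main thread for the Hive load (in the real setup, both ends would be the two database commands pointed at the pipe path):

```python
import os
import tempfile
import threading

# 1) Create the named pipe (POSIX only).
pipe = os.path.join(tempfile.mkdtemp(), "nz_to_hive.pipe")
os.mkfifo(pipe)

def writer():
    # 2) "Netezza" writes delimited rows into the pipe; open() blocks
    # until a reader opens the other end.
    with open(pipe, "w") as f:
        f.write("1|alice\n2|bob\n")

t = threading.Thread(target=writer)
t.start()  # start the writer first, per the order-of-operations caveat

# 3) "Hive" reads the rows -- sequentially only; a pipe cannot seek.
with open(pipe) as f:
    rows = [line.rstrip("\n").split("|") for line in f]

t.join()
os.unlink(pipe)  # 4) clean up the pipe afterwards
```

Nothing ever hits disk here: the reader sees each row as the writer produces it, which is where the performance win comes from.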

If you're interested, I've been working on an ETL framework that does precisely this:
http://smart-associates.biz/products/smartframeworks/smartfwnetezza.html
It has very good support for Netezza (it was originally written for Netezza, but has since been expanded to support almost all databases). Email me if you're interested and we can discuss a demo or trial license. Alternatively, I'm happy to help you further with this if needed.

Dan

On Wed, Oct 5, 2016 at 11:58 PM, Nick Gorham <[hidden email]> wrote:
