
By far, the most popular way for PDI users to load data into LucidDB is to use the PDI Streaming Loader.

In some ways, we have to admit, we released this piece of software too soon. In fact, until PDI 4.2 GA and LucidDB 0.9.4 GA it’s pretty problematic unless you run through the process of patching LucidDB outlined on this page: Known Issues.

Releasing early and often comes with some risk, and many have felt the pain of the issues that have been discovered with the streaming loader. In some ways, we’ve built an unnatural approach to loading for PDI: PDI wants to PUSH data into a database. LucidDB wants to PULL data from remote sources, with its integrated ELT and DML-based approach (with connectors to databases, Salesforce, etc). Our streaming loader “fakes” a pull data source and allows PDI to “push” into it. There are multiple threads involved, and when exceptions happen users have received cruddy error messages such as “Broken Pipe” that are unhelpful at best and frustrating at worst. Most of these contortions will have sorted themselves out by the time PDI 4.2 GA and LucidDB 0.9.4 GA are released, and the streaming loader should be working A-OK.
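To make the “pull” idiom concrete, here’s a minimal sketch of what DML-based loading looks like on the LucidDB side, assuming the SYS_JDBC foreign data wrapper; the server name, driver, connection URL, and table names are illustrative, not taken from this post:

```sql
-- Register a remote JDBC database as a foreign server
-- (connection details are hypothetical placeholders).
CREATE SERVER mysql_src
FOREIGN DATA WRAPPER SYS_JDBC
OPTIONS (
    DRIVER_CLASS 'com.mysql.jdbc.Driver',
    URL 'jdbc:mysql://example-host/sales',
    USER_NAME 'etl',
    PASSWORD 'secret'
);

-- LucidDB then pulls the rows itself with ordinary DML.
INSERT INTO warehouse."fact_orders"
SELECT * FROM mysql_src."sales"."orders";
```

The streaming loader has to dress a PDI push stream up as one of these pull sources, which is where the extra threads and the cryptic errors come in.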

Some users would just as soon avoid the patch instructions above, and they have posed the question: in a general sense, if not the streaming loader, how would I load data into LucidDB? Again, LucidDB likes to “pull” data from remote sources.

Here’s a nice, easy, quick (30k rows/s on my MacBook) method to load a million rows using PDI and LucidDB:
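The shape of the method, as a hedged sketch rather than the exact original steps: use a PDI transformation to write the million rows to a CSV file, then let LucidDB pull that file in through its flat-file foreign data wrapper. The SQL below assumes the SYS_FILE_WRAPPER wrapper and hypothetical directory, file, and table names.

```sql
-- Point a foreign server at the directory where the PDI
-- transformation wrote its CSV output (path is hypothetical).
CREATE SERVER csv_staging
FOREIGN DATA WRAPPER SYS_FILE_WRAPPER
OPTIONS (
    DIRECTORY '/tmp/staging/',
    FILE_EXTENSION 'csv',
    FIELD_DELIMITER ',',
    WITH_HEADER 'yes'
);

-- LucidDB pulls the file with ordinary DML; "million_rows" maps to
-- /tmp/staging/million_rows.csv (the BCP schema assumes a matching
-- million_rows.bcp control file describing the columns).
INSERT INTO warehouse."big_table"
SELECT * FROM csv_staging.BCP."million_rows";
```

Because LucidDB is the one doing the pull here, this path sidesteps the streaming loader entirely, along with its patching requirements.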
