
The Death of Traditional Integration

Recently I hosted a SnapLogic webinar featuring the company’s co-founder and CEO, Gaurav Dhillon, and industry analyst, author and practitioner David Linthicum, titled The Death of Traditional Data Integration. The webinar was very well attended and the discussion was quite lively. I’ve posted sections of the transcript on the SnapLogic blog, you can also listen to the podcast on the company’s iTunes channel, and the slides are now on Slideshare. I’ve embedded the YouTube video below. Enjoy!


Build Your Own Integration – Don’t Be Dumb!


Recently Hollis Tibbetts has been writing extensively about all aspects of data integration (with a heavy dose of cloud) on ebizQ. His most recent post pulls no punches: Building Integration Yourself – Possibly the Dumbest Idea You’ve Had in a Long Time.

The article wraps up with a series of questions to consider before you jump into the “tarpit” of hand-coding your data integration. I suggest you pose these questions to anyone in your IT organization who tells you they’re “just going to write some scripts” or “simply develop Web Services” when it comes to cloud integration:

1)  In the SaaS world, APIs are updated on average 4-12 times a year. What is the impact of that on your custom code? What if a document format (e.g., an EDI document) changes? Will you even know in advance of these changes, or will the change happen and suddenly your system stops working and you have a crisis on your hands?

2)  Are you prepared to handle latency and unavailability issues, timeouts, etc.?

3)  Have you budgeted for building a sufficiently robust logging system for errors, as well as for when data ends up somewhere it shouldn’t and you need to undo the situation?

4)  Can you guarantee that data won’t get lost when something “bad” happens?

5)  How will you monitor what’s going on in the system?

6)  Coding transformations and business logic in Java, C#, C++ or any other programming language is very time consuming. Transformations and business logic change a LOT. How will you support that? Most integration products support simple or standard scripting languages, drag and drop, reusable objects, etc.

7)  How do you plan to implement mapping, especially between something like a Web Service and a relational database (where one can be hierarchical in nature and the other a collection of tables)? What happens when one of those things changes? Have you thought about transactionality and serializability? Do you need to support that? How will you do it?

8)  Many applications require the use of proprietary SDKs for integration. Are you trained in those? Prepared to support changes in the SDKs?

9)  What levels of performance are required? How do you plan to meet those? What happens if that changes – is scalability built into your solution?

10)  If more sources or targets for integration are added, will your system support that, or did you build something that is a throwaway?

11)  Does your home-built system support concurrent development?
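Question 2 alone hides a surprising amount of work. As an illustration only, here is a minimal, hypothetical Python sketch of the retry-with-backoff and error-logging plumbing a hand-coded integration would need to reinvent for every endpoint it calls (the `flaky` endpoint is a stand-in for any SaaS API):

```python
import logging
import random
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("integration")

def call_with_retries(request, max_attempts=5, base_delay=0.5):
    """Retry a flaky API call with exponential backoff and jitter.

    Hand-coded integrations need this for every endpoint they touch;
    integration products ship it as configuration.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return request()
        except (TimeoutError, ConnectionError) as exc:
            if attempt == max_attempts:
                log.error("giving up after %d attempts: %s", attempt, exc)
                raise
            delay = base_delay * 2 ** (attempt - 1) + random.uniform(0, 0.1)
            log.warning("attempt %d failed (%s); retrying in %.2fs",
                        attempt, exc, delay)
            time.sleep(delay)

# Simulated flaky endpoint: times out twice, then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("upstream timed out")
    return {"status": "ok"}

result = call_with_retries(flaky, base_delay=0.01)
```

And this still says nothing about monitoring, undo, or data-loss guarantees (questions 3 through 5).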

Great questions, Hollis, and a great article. I hope everyone considering hand-coding their data integration reads and shares it.

Cloud Integration: Batch vs. Real Time

In the last few days there have been diametrically opposite messages coming from two vendors who claim to deliver best-in-class data integration solutions for Salesforce.com (and other SaaS application vendors). Doug Henschen wrote about both cloud data integration product launches.

  1. The first vendor focused on data migration to the cloud. They acknowledged that data migration “often turns out to be more complex than expected” and highlighted the importance of performance to data integration (while throwing a few unsubstantiated haymakers at the competition, I might add).
  2. The second vendor focused on real-time application integration, making the claim that their product, “offers two-way, system-to-system, event-driven integration that is real-time rather than batch oriented.”

Hmmm. So which is more important? Real time or batch data integration? And is it really an either-or proposition? Are we really now dragging the old ETL vs. EAI debate to the cloud?

To me it gets back to a topic I’ve written about before: What to Look for in a Cloud Data Integration Solution.

If a data integration vendor claims to be all about one or the other it should be a red flag. The bottom line is to understand your requirements, your data volumes (both today and long term), your integration complexity (both today and long term), your resources (on-premise or SaaS?), and the skill set of your users (SaaS administrators and IT roles). And when it comes to a partner, be sure to dig into customer references and read the reviews on the AppExchange. This will reveal quite a bit about overall customer adoption and success.
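One reason the either-or framing falls apart: the transformation logic is usually the same whichever mode delivers it. A hypothetical Python sketch (the field names and `load` callback are illustrative, not any vendor’s API) shows one mapping rule serving both a batch/ETL job and a real-time event handler:

```python
from typing import Callable, Iterable

def transform(record: dict) -> dict:
    """Shared mapping logic: one rule serves both integration styles."""
    return {"account_id": record["Id"], "name": record["Name"].strip().title()}

def batch_sync(records: Iterable[dict],
               load: Callable[[list], None],
               chunk_size: int = 2) -> int:
    """Batch/ETL style: transform and load records in bulk chunks."""
    chunk, total = [], 0
    for rec in records:
        chunk.append(transform(rec))
        if len(chunk) >= chunk_size:
            load(chunk)
            total += len(chunk)
            chunk = []
    if chunk:
        load(chunk)
        total += len(chunk)
    return total

def on_event(record: dict, load: Callable[[list], None]) -> None:
    """Real-time/EAI style: the same transform, one event at a time."""
    load([transform(record)])

loaded: list = []
n = batch_sync([{"Id": "1", "Name": " acme "},
                {"Id": "2", "Name": "globex"},
                {"Id": "3", "Name": "initech"}], loaded.extend)
on_event({"Id": "4", "Name": "hooli"}, loaded.extend)
```

The requirements question is really about volumes and latency, not about which of two “camps” you join.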


Staging Data: The Right Approach for Cloud Integration?

Last week I wrote a post about the differences between direct and staged integration of data from SaaS/cloud applications. I received a very thorough comment on the Informatica Perspectives blog about the benefits of staging data that I think is worth summarizing here:

The main reasons I opt for the staging are:
  • It enables better business control before the data is pushed from one system to the other. E.g., in SFDC you can have a prospect that you want to become a customer in SAP; you may need to control the data (match existing customers) or enrich it before you push it to SAP. The staging area becomes a firewall so that no corrupted data is propagated into your information systems.
  • It enables tracking and reconciliation of a business process. The staging area can also be used as a logging area: each time data is manipulated, it is logged, enabling the audit of any action and visibility into any reconciliation process.
  • It enables the addition of new sources or targets with reuse, instead of building a spaghetti plate of point-to-point direct interfaces. It follows the SOA paradigm.
  • It breaks the dependencies between the two systems, enabling asynchronous synchronization, or synchronous synchronization with different data set sizes (single message or bulk). And if one system or the other is down or in maintenance for a period of time, it does not affect the synchronization process, because the data can be replayed (or compensated) from the staging area.
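The firewall-and-replay points above can be sketched in a few lines. This is an illustrative Python mock, assuming an in-memory SQLite table standing in for a durable staging store and a plain list standing in for the SAP target; the `email` validation rule is hypothetical, not any vendor’s implementation:

```python
import json
import sqlite3

# In-memory staging area; a real deployment would use a durable store.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE staging (id INTEGER PRIMARY KEY,"
           " payload TEXT, status TEXT)")

def stage(record: dict) -> None:
    """Land raw source data in the staging area before any target sees it."""
    db.execute("INSERT INTO staging (payload, status) VALUES (?, 'pending')",
               (json.dumps(record),))

def is_valid(record: dict) -> bool:
    """Business control: the 'firewall' that blocks corrupt data."""
    return bool(record.get("email"))

def push_pending(target: list) -> None:
    """Deliver validated rows; invalid rows are held back, and anything
    still staged can be replayed later without touching the source."""
    rows = db.execute(
        "SELECT id, payload FROM staging WHERE status = 'pending'").fetchall()
    for row_id, payload in rows:
        record = json.loads(payload)
        if is_valid(record):
            target.append(record)
            status = "delivered"
        else:
            status = "rejected"
        db.execute("UPDATE staging SET status = ? WHERE id = ?",
                   (status, row_id))

sap = []
stage({"name": "Acme", "email": "ops@acme.example"})
stage({"name": "Corrupt Co"})   # missing email: held at the firewall
push_pending(sap)
```

Because delivery status lives in the staging table rather than in either endpoint, a target outage just leaves rows pending until the next `push_pending` run.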

You can read the full comment here. Any other experiences or best practices to share? Do you agree / disagree? And what if you don’t have the IT resources for this type of architecture? Is direct for SMB and short-term tactical requirements only?
