I work in BI. Almost exclusively on the Microsoft SQL stack.
I work for a software development company and am trying to deploy a BI solution onto the back end of their SQL-based product. This is driving me slightly insane, as we use an agile development approach and BI solutions aren’t really all that agile.
My biggest headache is that we have five or six different environments on the go. With software it’s usual to undertake continuous integration, unit testing and other automated build processes. But with BI this isn’t nearly as easy.
For one thing, several different elements make up a “solution/deployment”. One is the database that you are using as your data mart or data warehouse. Then there are the ETL scripts/packages that pull the data from your source system into that data mart/warehouse. On top of those sit the Analysis Services cubes and, finally, a set of defined reports. That’s without thinking about the various support scripts, documentation and utilities used in and around the system. Don’t get me started on developers and documentation, or more accurately the complete lack thereof.
Keeping all of these separate elements in sync isn’t easy. Just keeping them aligned in version control isn’t exactly straightforward, let alone running several different environments on different versions as the data mart/warehouse is updated to reflect changes in the source system and reports. I’m going to own up that I’m still in a proving phase with the solution I’m deploying, so a certain amount of development is taking place as well.
So deploying various changes to various environments isn’t easy. More so as it’s other people doing the deployments.
Just thinking about the database that makes up the data mart/warehouse presents a number of challenges in terms of supporting versioning and updating. For one thing, you’ve usually spent quite a while loading data into your environment, so changes often have to carry the existing data forward rather than starting again from scratch. You also tend never to throw any data away, so even reloading from scratch would mean losing some data: time has passed between the last load and this one, and the source system may have “lost” records in the meantime.
So changes are usually upgrade scripts rather than entire builds from scratch. The upgrade scripts include schema changes, but also often include data manipulations and transformations. Supporting and maintaining these in the available version control environments isn’t particularly easy.
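One way to keep those upgrade scripts manageable is to give each one a sequence number and record the currently deployed version in the data mart itself, so a deployment only applies the scripts that haven't run yet. As a minimal sketch of that idea (the `upgrade_<nnnn>_<description>.sql` naming convention and the notion of a stored current version are my assumptions here, not something from this post):

```python
import re

# Assumed (hypothetical) naming convention for upgrade scripts:
# upgrade_<version>_<description>.sql, e.g. upgrade_0002_fix_fact_keys.sql
SCRIPT_PATTERN = re.compile(r"upgrade_(\d+)_.*\.sql$")

def scripts_to_apply(script_names, current_version):
    """Given the file names in a scripts folder and the version the
    target data mart is currently at (e.g. read from a version table),
    return the upgrade scripts still to be run, in ascending order."""
    pending = []
    for name in script_names:
        match = SCRIPT_PATTERN.match(name)
        if match and int(match.group(1)) > current_version:
            pending.append((int(match.group(1)), name))
    # Sort by version number so data migrations run in the right order
    return [name for _, name in sorted(pending)]
```

For example, with the target environment at version 1, `scripts_to_apply(["upgrade_0001_init.sql", "upgrade_0003_add_customer_dim.sql", "upgrade_0002_fix_fact_keys.sql"], 1)` returns the 0002 then 0003 scripts, and anything not matching the convention is ignored. Actually executing each script against SQL Server (via sqlcmd or similar) is left out of the sketch.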
I want to share some of the scripts and tricks that I’ve found useful as I’ve tried to automate the build/deployment process, and also some of the methods I’m using to try and create something akin to the continuous integration normally applied to agile software development. So I have a number of posts planned that will detail some of my adventures.
I can’t believe I’m the first person to suffer from this problem. Other BI bods must have crossed these bridges many times, yet I see very little about it in the blogosphere, and I keep up with most of the prominent SQL bloggers in BI and other areas. So I’m hoping that I might find a few people who’ve done this before when I’m at SQLBits who might set me straight on all the mistakes I’ve been making.
Anyway, this post has turned into something of a rant. Anyone who knows me won’t be remotely surprised by this. Hopefully I’ll actually have more meaningful content next time.