Discover the insights hiding in your data
February 22, 2013
Tim and Simon will be at the 2013 APAC Business Intelligence & Information Management Summit in Sydney on 25th and 26th February. Drop by the IBM demo booth to have a chat and see demos of IBM's Big Data solutions and Smarter Analytics. The Big Data demo uses a live Hadoop cluster located in the Silicon Valley Development Lab in California. It centres on a fictional credit card company and shows how Big Data can be used to approve credit card transactions in near real time, reduce fraud and target promotions by analysing customer spending patterns. The analytics and fraud detection are all performed in BigSheets, a spreadsheet-like interface to Big Data that allows business users and data scientists to do data discovery and analysis. We'll also look at how Text Analytics plays a crucial role in our Big Data platform, opening up new areas for analytics to drill into and discover new insights – whether that is the now-familiar social media analytics, used so strategically during Obama's re-election campaign, or predicting the winners at the Oscars. These applications are only a glimpse of the possibilities, and provide an initial platform from which to springboard into some of the tougher and more challenging analytical questions.
February 19, 2013
I think David Williams, CEO of Merkle, put it very nicely in his recent Forbes article when he wrote: 'If "Big Data" simply meant lots of data, we would call it "Lots of Data".' We believe he was quite correct. "Big Data" represents a lot more than just "lots of data"; it is the combined challenge of the four V's, defined by IBM as Volume, Velocity, Variety and Veracity. An organisation's Big Data challenges could be any unique combination of these V's, and with traditional technologies they can become near impossible to deal with in an effective and responsive time frame.
Even dealing with a single V, such as Volume, is no simple task. So imagine dealing with all four challenges at the same time, and in such a way that an organisation is able to expose and explore new opportunities, learn new routes to market, better understand its business and its customers, or simply find more efficient and cost-effective methods of operation. Being able to do this, and do it in just a few hours, is the game-changer that is the Big Data opportunity.
Once an organisation has decided it wants to take advantage of the Big Data opportunity, its first real decision is the platform: whether to go with a vendor-specific distribution of Hadoop, or with the open source releases. One would imagine that the open source route has the lowest start-up cost and would therefore be the preferred choice – but does it?
The main vendors in the Big Data arena all provide free versions of their products to download and try out. Of course, the vendors don't give away their crown jewels: the free versions usually have limited functionality and are only licensed for a limited cluster size. But what the vendors do guarantee is that all the different components of Big Data (which include Hadoop, Pig, Hive, HBase, ZooKeeper and Oozie, to name but a few) will install and work together out of the box.
One should not underestimate the effort and time (and therefore cost) it takes to install and integrate all these components from open source. So if you are thinking about starting a Big Data project, why not download one of the free vendor-specific versions and get the benefit of the many thousands of hours of testing that have already gone into them?
Once you have a better feel for your project and for Big Data, you can make a more informed decision about whether to stick with that vendor, switch to a different one, or go open source.
Download IBM's trial version of its Big Data platform – InfoSphere BigInsights Basic Edition – free of charge.
July 16, 2012
We’ve all heard the horror stories of sensitive customer data being opened to public view – and the massive PR fallout for the organisations responsible. It’s the IT industry equivalent of celebrity gossip. And in water cooler conversations everywhere, blame is often levelled directly at cloud. But is this really warranted?
I’d suggest that these spectacular public breaches are actually the result of deeper security issues, and that blaming cloud is at best unhelpful and at worst a dangerous oversimplification that leaves businesses open to future breaches.
Cloud may seem like an entirely new security challenge, but for organisations with a robust underlying security profile, it’s really just the next progression. Cloud opens new doors and brings with it new risks. But new risks come along every day. What matters is ensuring you have the plans and policies in place to cope with them.
In my experience, as soon as you start talking to clients about cloud, one thing they say is, “There’s no way in the world I’m going to put any data, and especially any customer data, out in a public cloud.”
But when you start asking questions – which business partners handle your e-commerce? Where is your email sitting? Do you have any external payment systems? – it suddenly becomes clear that they already have services out there in a public cloud, and that security isn't something they've given enough thought to.
By highlighting aspects of your security that you perhaps haven’t considered, cloud deployment offers an opportunity to improve your security profile. There are very effective ways to address cloud security; existing products have evolved, and new processes and policies have been created to cater to cloud requirements. The best way to take advantage of the cloud opportunity and actually boost security is to start with your private cloud architecture and security profile. This will then drive the security of how you consume public cloud services into your private cloud infrastructure.
It's time we stopped fear-mongering and started taking a holistic approach to security. This will not only help protect your organisation from serious breaches, it will also open it up to the countless benefits of cloud technology.
June 29, 2012
Analytics in the cloud makes great sense. When used together, this business application and deployment method can deliver big gains to your business. Here are four reasons why.
1. You get the right resources for the job.
Analytics is all about gaining intelligence, whether that’s to improve business revenue, research, processes or whatever. Cloud is able to dynamically and intelligently bring resources together and deliver those resources when and where they are needed. This makes it a perfect fit for the intensive workload requirements of analytics.
2. It frees you up to focus on your business.
Currently, analytics is a very technical undertaking which often requires specialist consultation. By delivering standardised tools and processes, cloud-based analytics takes away that layer of complexity. So instead of spending time dealing with the technology side of things, people within a business can focus on the analytics themselves and finding ways to use that information to benefit the business.
3. Costs come down, quality goes up.
Public cloud models make it possible to buy analytics as a service, which simplifies purchasing. Having standardised analytics packages available in a competitive market also helps drive innovation and keep costs low. On the flipside, private cloud analytics can provide higher-quality data, as they can be tailored to company processes and cultures. Hybrid clouds bring the benefits of private and public clouds together.
4. Organisational silos are broken down.
For organisations that carry out their own analytics, cloud can bring huge benefits. What can happen within these organisations is that analytics tools work for one department only, with little visibility between groups. Cloud-based analytics creates virtual pools of resources, helping to maximise the analytics investment by sharing capacity, software, hardware and skills. This connects data points across the organisation that were previously unrelated, bringing deeper insight and more intelligence.
June 13, 2012
Managing your business data may feel like an uphill battle when it’s coming at you from all angles. It’s not just the rapid increase of data that’s the issue, but the various forms of unstructured data that keep creeping their way onto the business analytics radar.
If you’re working with systems that were around when Mark Zuckerberg was still at school, you’re probably nodding your head in agreement. Many organisations report that new insights can take a long time to identify, data analysis costs are high and infrastructures are not very flexible or agile.
But at the risk of sounding like an over-enthusiastic personal trainer, the data explosion is not an obstacle; it's a huge opportunity.
How huge? A recent report by Nucleus Research found that, on average, every $1 invested in business analytics yields a return of $10.66. That's right: over ten times the initial investment.
It makes sense to leverage as much of your data as possible to gain the greatest insight. Therefore, it’s worth putting a bit of thought into your data analytics strategy, specifically in the area of data warehousing.
If you’re competing with multiple data sources, why compete with multiple vendors? As simple as it sounds, sticking with the one vendor for your end-to-end analytics strategy is a clever move.
Keeping it in the family (so to speak) means you can minimise integration points and refine the analytics process, thereby mitigating the risk of malfunctions or system miscommunications that can lead to inaccurate or delayed data analysis.
- Fewer integration points = less risk and lower costs
- Data can be analysed in a shorter timeframe
- External data can easily be incorporated into the system
- Sophisticated analytics tools are optimised within the analytical infrastructure
In the one repository
Why not take it one step further and consolidate your data warehousing hardware and analytics software in the one unit?
Why? This solution can accelerate your analytics models; in some cases it has reduced reporting time from nine months down to six weeks. Catalina Marketing saw the advantages first-hand when it went from building dozens of models per year to hundreds, resulting in a substantial uplift in its response rates.
Netezza is the only analytical appliance specialist that currently offers a ‘data warehouse in a box’ solution, which is surprising considering the brilliance behind this idea. However, we’re sure to see rapid development in this area from competitors.
- Time taken to run complex analytics is reduced
- Use of more immediate, in-database data
- Faster model build
- More accurate models and refined predictions
- Rapid ROI
- Cost-effective and easy to manage
It’s hard to refute the benefits, so it makes a great deal of sense to invest in a robust business analytics and data warehousing strategy. By creating a simplified end-to-end data analytics process, you can take advantage of faster and more accurate models, reduced risk of errors or inaccuracies and an increased ROI.
So basically, the less complicated your data analytics infrastructure is, the greater your business insights.
May 29, 2012
We’ll start from the very beginning. It’s a very good place to start…
Big data is all about Velocity, Variety and Volume, and the greatest of these is Variety. At least it causes the greatest misunderstanding.
Variety, in this context, refers to the wide variety of data sources and formats that may contain insights to help organizations make better decisions. Everything from our existing database records of customer purchases, to their tweets about us, to web logs indicating their trail through our website, to audio files of their conversations with our reps in the call centre. And that's just a retailer's view of its customers. You can add in video from CCTV, readings from smart grids and other networks, and instrumentation of all kinds on all kinds of appliances (the first big data app I ever conceived was the smart fridge that tracked my beer consumption and re-ordered automatically).
Some people talk about structured and un-structured data. By structured they mean relational and by un-structured they mean everything else. Those people often have a vested interest in un-structured data analytics technology so it’s in their interest to define their product’s scope as widely as possible and to keep the data warehouse in its little relational box.
But there are countless examples of organizations using relational data warehouses to successfully process what, by that definition, is un-structured data. The one I always trot out here is Call Data Records (CDRs).
These are little packets of data, created on mobile telephone networks for every call. I think there are at least four per call, and they contain data like caller ID, callee, start time, end time, source network (e.g. Sprint), destination network (e.g. T-Mobile), etcetera. The network service providers use CDRs to bill us for our phone usage and to bill each other for the calls they complete on each other's behalf. These billing apps – and particularly the reconciliation apps that let a provider check that its partners are not billing it inaccurately – are demanding workloads (imagine the bill from T-Mobile to Sprint, with about a gazillion line items on it, each to be checked off against a line item on one of their millions of customers' bills). It's a simple app if you can handle that volume of data, and IBM has lots of customers who justified the cost of their IBM Netezza boxes on just that app. But that data isn't relational. It comes straight off the network. It must be un-structured, so it can't be a relational data warehouse app! But it is structured. You know exactly what each byte of a call data record is. It's totally structured – just not relational.
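To make that concrete, here's a minimal sketch, assuming a purely invented fixed-width CDR layout (real formats vary by network and switch vendor). Every byte has a known meaning, so the record parses deterministically into named, typed columns – structured through and through, just not relational until you load it:

```python
# A minimal sketch: parsing a hypothetical fixed-width CDR.
# The field names and widths below are invented for illustration;
# real CDR layouts vary by network and switch vendor.
from datetime import datetime

# (field name, start offset, length) -- every byte has a known meaning
CDR_LAYOUT = [
    ("caller",       0, 10),
    ("callee",      10, 10),
    ("start_time",  20, 14),  # YYYYMMDDhhmmss
    ("end_time",    34, 14),
    ("src_network", 48,  8),  # e.g. Sprint
    ("dst_network", 56,  8),  # e.g. T-Mobile
]

def parse_cdr(record: str) -> dict:
    """Slice one raw CDR string into named, typed columns."""
    row = {name: record[start:start + length].strip()
           for name, start, length in CDR_LAYOUT}
    for key in ("start_time", "end_time"):
        row[key] = datetime.strptime(row[key], "%Y%m%d%H%M%S")
    return row

raw = ("0400111222" "0400333444"
       "20120529093015" "20120529093412"
       "SPRINT  " "TMOBILE ")
print(parse_cdr(raw))
```

Once sliced up like that, each record is ready to drop straight into a warehouse table, which is all 'structured' ever needed to mean.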
So here endeth lesson one:
‘The only data you process in the data warehouse is the relational data from operational CRM & ERP systems. Everything else goes in Hadoop’ is way too simplistic to be a useful guide when you’re building a big data strategy.
Another example might be sentiment analysis of tweets. Now, I've heard relational database experts say that you can analyze tweets in a relational database, even though they are text plus tags – definitely semi-structured. You can easily enough model a relational schema to hold tweets (a minimal sketch follows, for anyone who'd rather skip the exercise), but I tend to the view that you don't need to if you have your friendly Hadoop cluster to hand.
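Here's one way that exercise might come out – the columns are illustrative only, since a real tweet payload carries many more fields:

```python
# One possible answer to the exercise: a minimal relational schema for tweets.
# Column choices are illustrative; a real tweet payload has many more fields.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tweets (
        tweet_id   INTEGER PRIMARY KEY,
        user_id    INTEGER NOT NULL,
        created_at TEXT    NOT NULL,
        body       TEXT    NOT NULL   -- the free text: where the sentiment lives
    );
    CREATE TABLE hashtags (           -- the tags normalise out into a child table
        tweet_id   INTEGER REFERENCES tweets(tweet_id),
        tag        TEXT    NOT NULL
    );
""")
conn.execute("INSERT INTO tweets VALUES (1, 42, '2012-05-29T09:30:00', 'big data is #awesome')")
conn.execute("INSERT INTO hashtags VALUES (1, 'awesome')")
print(conn.execute("SELECT tag FROM hashtags WHERE tweet_id = 1").fetchall())
```

The tags normalise neatly, but the body column is just a blob of text the schema can hold without understanding – which is why the text analytics still has to happen somewhere. And this brings into play the next factor: economics.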
The cost of storing a terabyte of data on a Hadoop cluster is a lot less than the cost of a terabyte in a corporate data warehouse. That's because, in 2012, Hadoop is likely to be sitting on cheap commodity hardware in some R&D corner of the enterprise, without the concerns (and extra expense) of governance, data cleansing and the host of other technologies and processes that are essential for managing proven high-value assets. Right now it's a terabyte (or maybe a hundred terabytes) of unexplored data of uncertain value that might yield insight gold – though how, you don't yet know.
So why not start there? If and when you want to move your cool new Hadoop tweet-analytics app into production, you'll have more costs to bear to production-harden your R&D Hadoop install, but that's a question for when the app has proven business value that will justify the cost.
So that’s why I split it out into Unstructured, Semi-Structured and Relational.
Relational speaks for itself – typically this is the standard fare for data warehouses – extracted from ERP and other operational systems. We already know what the data means and what its structure is.
Un-structured is at the other end of the spectrum. It might be in any form: text, audio, video. We definitely don’t know from looking at the data what it means – unless we apply human understanding to it.
Semi-structured is everything in between. Web logs in the form of XML documents, call data records from networks, statuses from components in a smart grid, GPS readings that locate a smart phone. None of these is relational data, but equally the internal structure of all of them is precisely known. So structured or unstructured? That’s not the point; the point is what you want to do with them.
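As one last illustration of that middle ground, consider a web-server access log. The line below follows the common Apache log shape (the sample values are invented): it isn't relational, but because its internal structure is precisely known, a single pattern recovers every field:

```python
# A sketch of 'semi-structured': an Apache-style access-log line.
# Not relational, but its internal structure is precisely known.
import re

LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<bytes>\d+|-)'
)

line = '192.168.1.9 - - [29/May/2012:09:30:15 +1000] "GET /products HTTP/1.1" 200 5123'
print(LOG_PATTERN.match(line).groupdict())
# {'host': '192.168.1.9', 'timestamp': '29/May/2012:09:30:15 +1000',
#  'method': 'GET', 'path': '/products', 'status': '200', 'bytes': '5123'}
```

Whether records like these then land in warehouse tables or stay in a Hadoop cluster is precisely the 'what do you want to do with them' question, not a structure question.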
In this post I've tried to show that there's no cut-and-dried way of deciding how to analyze data just from its source. But I've only skimmed the surface and raised more questions than I've answered, so I'm planning a series of posts to look at all the other factors you need to consider in building big data analytics apps.
By Guest Blogger Dai Clegg, Director of EMEA Marketing
I’ve been working with relational databases since the 1980s. For most of that time with Oracle, in the trenches as a consultant, in the ivory tower as a methodologist, in a suit as a marketer and in the skunk works as a product developer. I have recently escaped the Death Star and joined IBM Netezza, mostly to rediscover the joy of a small company with great technology, focused on what the customers really want. Oh and the freedom to write a blog like this.
May 18, 2012
It's only a few days away! My flight's booked, the hotel says I can stay again this year, so I'll be off to the Master Data Management and Data Governance Summit! The dog will have to walk itself.
There are so many thought leaders presenting that I'm excited (Big Kev style).
IBM's Worldwide Executive Banking Architect, Information Management, Doug Thompson, will be presenting case studies from Bank of America and Panasonic. He also promises to enlighten us on the joys of "Accelerating Master Data Management Through Information Integration & Governance".
This is going to be an interesting one because, like you, I would love to be able to accelerate our MDM projects. And I believe integration is key (which works out perfectly, since I work at a company called EC Integrators). Governance is the part that slows us down, and it often becomes the bump that derails the juggernaut of an MDM initiative: agreeing a level of rigour for data and data quality at the start of the implementation tends to confuse both business users and IT.
I am really looking forward to hearing about IBM's successes in this area, and I hope I'll get to learn something new.
Tell us how you’ve accelerated an MDM implementation, and if governance played a role in the process.
May 16, 2012
We're all going to the Master Data Management and Data Governance Summit next week – well, maybe not all of us, but my team is. The Summit promises to be bigger and better than last year, which was a buzzing event! That's a change from most conferences, which often turn out to be dour and dry.
This year holds many delights for the lucky few who get to follow the quest of Master Data Management and Data Governance. The expectations are now set, and the kick-off keynote will give us a big bite to chew on. Aaron Zornes will be addressing us on "Maximising Business Outcomes: Harnessing the Power of 'Master Data Governance' to unify MDM & BPM".
This is huge! I cannot wait to hear the latest thoughts on how the deep thinkers of our industry have dispelled the dogma surrounding ownership of process and technology. From his abstract, I think Aaron is heading in the right direction, because this is a common problem for all organisations; so common in fact that it is probably one of the most relevant topics today. This looks like a corker and a must for all you MDM and DG specialists and enthusiasts out there.
I am also interested to hear how individual organisations are overcoming this problem. I have seen many organisations struggle to keep their business processes up to date with changes in technology, the desires of the organisation and the silos that inevitably exist.
If you have come across this problem, share with us below how you overcame it.
By Guest Blogger Terence Walsh
I am Principal Consultant at EC Integrators and MD at CVP Solutions. My background is in quality and systems, focused on engineering and IT (data systems and technology in particular). I have been working towards the goal of enlightened engineering and IT management since I was given my first chance to run an engineering company in 1999.
May 10, 2012
The Business Analytics Forum has finished up for 2012. After spending three days discussing smart analytics with people from a range of industries and organisations, some key issues really stood out for me.
Firstly, the good. It's no surprise that the proof of concept (POC) is the new norm. Almost every person I spoke with was keen to explore a POC with the latest version of Cognos (Cognos 10). People see it as a no-brainer: a POC offers a short-term, cost-effective way to demonstrate business benefits to senior management. And if executives can see the benefits, they can see the value in investing capital.
At the other end of the spectrum, speed of access to data is still a major problem for businesses. To drive maximum efficiency, users need timely access to the right information to do their jobs. The trouble is that outdated legacy systems just aren't able to deliver on this. As Gartner has shown, mobile applications are certainly helping to drive this trend.
And finally, the last one standing. That would be me, at the BA Forum gala dinner, rocking out on the dance floor to Jon Stevens and the Black Sorrows. A great finale to a great event.
May 2, 2012
Today's data takes many unstructured forms, with email, audio and video files, images, web pages, social media, Word documents and spreadsheets being used by organisations every day. Sound familiar? You're not alone. In fact, a recent study revealed that around 80% of businesses are currently dealing with unstructured data.
Capturing all this data is a great start, but it shouldn't end there. The issue for most organisations is that they are well practised at collecting the data, but not so advanced at finding relationships between its various forms. Analytics has the potential to account for 20% of your revenue, so it's important to amalgamate your unstructured data to create the bigger picture. But how can we do this?
Cleansing and analysing your data
You need to start by looking at the way you store and manage your multiple data formats. Many organisations aren't cleansing and analysing their data properly, which leads to inefficiencies and inaccuracies in the analytics process. This in turn has the potential to steer you in the wrong direction when making serious business decisions.
Without the right management solution for your data, you're going to have a hard time extracting value from it. Data warehousing lets you sort and store your data correctly, and take full advantage of the sophisticated analytics software offerings out there.
If you're not 100% confident building your own data analytics roadmap – and most of us aren't – it's a good idea to speak with an analytics consultant. They'll be able to build a tailored analytics strategy that covers the entire analytics journey, from data warehousing all the way through to identifying new insights.