FAQs
1. What is Data Quality?
2. What is Data Mining?
Data Quality - Key to Successful Business Integration
Over the last several years leading industry analysts and data
quality experts have opined that the success or failure of a CRM,
data warehouse, e-business or ERP implementation hinges in large
part on the quality of an organization's customer information. As
more and more organizations turn to these IT-driven initiatives
to increase revenue growth, productivity and customer satisfaction,
an organization's approach to overcoming "data quality"
problems is critical to achieving significant results.
With the implementation of enterprise applications such as ERP, CRM
and Internet banking systems, many organizations are now looking for
effective data quality solutions to handle their large-scale data
quality issues.
The good news is that many of the problems associated with poor
data quality are avoidable, preventable and correctable if an enterprise-wide
data quality (EWDQ) solution is well designed and executed. EWDQ
solutions produce predictable and measurable return on investment
(ROI) by eliminating anomalies, mistakes and duplication in customer
information. Moreover, they provide a single clean source of data
from which to establish a unified view of customers across the organization.
Products like Trillium Software provide businesses with a software
solution that cleanses and standardizes global customer information
in e-business, CRM and Internet applications.
Understand the real costs and causes of
poor data quality
Poor data quality has an insidious and systematic impact,
signified by the enduring technology acronym, GIGO (garbage in,
garbage out). A proactive and enterprise level approach to instituting
best practices in data quality can prevent some of the most glaring
symptoms and corresponding business problems associated with ongoing
data corruption, duplication, omission, inconsistencies and other
flaws in customer information. According to experts, data quality
issues account for a share of data warehouse failures. Data quality issues
are also a large component of the high failure rate among CRM projects.
EWDQ solutions make it easy for organizations to profile the quality
of their data and facilitate rapid implementation practices to correct
existing data defects. More and more, organizations are looking
to EWDQ solutions to help them "dig out" from under the
high cost of poor data quality.
Sources of Data Corruption
Every customer touch point is a potential source of data
corruption:
- Online customers intentionally enter non-quality data to protect
their privacy
- Call center operators enter abbreviated data to save time
- Third party data contains inconsistencies, inaccuracies and
errors
- Customer contacts and customers make typing errors when entering
data into front office systems
- Data from diverse source systems conforms to disparate operational
standards and formats
Bad data entering enterprise-wide systems at any point reduces
the data quality of the entire system. A data quality solution is
the only way to prevent bad data from debasing business-critical
processes.
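As an illustration of how abbreviated or inconsistently typed entries can be caught, here is a minimal sketch in Python. It assumes a tiny, hypothetical abbreviation table and hypothetical record fields (`id`, `name`, `address`); it is not how any particular data quality product works, and real customer matching needs far richer rules.

```python
import re

# Hypothetical abbreviation table (an assumption for this sketch); a real
# deployment would use a much larger, locale-aware dictionary.
ABBREVIATIONS = {"st": "street", "rd": "road", "ave": "avenue", "apt": "apartment"}

def standardize(value: str) -> str:
    """Lowercase, drop punctuation, collapse whitespace, expand abbreviations."""
    tokens = re.sub(r"[^\w\s]", " ", value.lower()).split()
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)

def find_duplicates(records):
    """Group record ids whose standardized (name, address) keys collide."""
    groups = {}
    for rec in records:
        key = (standardize(rec["name"]), standardize(rec["address"]))
        groups.setdefault(key, []).append(rec["id"])
    return [ids for ids in groups.values() if len(ids) > 1]

# Made-up sample records, as they might arrive from different touch points.
records = [
    {"id": 1, "name": "Jane Doe", "address": "12 Main St."},
    {"id": 2, "name": "JANE DOE", "address": "12 main street"},
    {"id": 3, "name": "John Smith", "address": "4 Oak Rd"},
]
duplicates = find_duplicates(records)
print(duplicates)
```

Exact matching on a standardized key only catches duplicates that differ in case, punctuation or known abbreviations. Typing errors would need approximate matching, which this sketch does not attempt.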
Employ a proven methodology
Most successful data quality initiatives employ a methodical,
proven and staged approach to establishing enterprise-wide data
quality. From analyzing the enterprise data quality strategy, to data
acquisition and inspection, to global language and cultural considerations,
to determining technical requirements, each step in your plan must
ultimately help produce useful business information for end users.
Given that most successful CRM, ERP and e-business implementations
also include an online component, your methodology should include
the capability to take your data quality solution from batch-mode/high
performance processing to a versatile, flexible transactional environment
via the Web.
| Problems | Symptoms |
| Low Stock Accuracy | Reprocessing of Orders |
| Invoicing inaccuracy | Regular and duplicate clean-up orders |
| Financial penalties | Incorrect orders |
| Audit compliance challenges | Duplication of information |
| Financial project inaccuracy | Incorrect customer sales analysis |
Some processes to consider in Data Quality Initiatives are:
- Establishing a data quality strategy
- Committing to a data quality team
- Providing for data quality profiling, inspection and analysis
- Standardizing and reengineering data
- Data integration, relationship matching and linking
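The profiling and inspection step in the list above can be pictured with a small Python sketch. It assumes rows are held as a list of dictionaries and reports only a few basic metrics (missing values, distinct values, repeated values) for one column. The function name and sample data are hypothetical, and commercial profiling tools measure much more than this.

```python
from collections import Counter

def profile_column(rows, column):
    """Return simple quality metrics for one column of a list of dict rows."""
    values = [row.get(column) for row in rows]
    # Treat absent keys, None and blank strings as missing.
    present = [str(v).strip() for v in values
               if v is not None and str(v).strip() != ""]
    counts = Counter(present)
    return {
        "rows": len(values),
        "missing": len(values) - len(present),
        "distinct": len(counts),
        "repeated_values": sorted(v for v, n in counts.items() if n > 1),
    }

# Made-up customer rows for illustration.
customers = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": ""},
    {"id": 3, "email": "a@example.com"},
    {"id": 4, "email": "b@example.com"},
    {"id": 5},
]
report = profile_column(customers, "email")
print(report)
```

A report like this gives the data quality team a starting point: the missing count points at omissions, and the repeated values point at candidate duplicates to inspect.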
Sample ROI of EWDQ initiative
- Faster customer data processing and clean customer master data
- More accurate analytics: accurate "what if" analyses
- Improved yield and profitability management
- Reduced telemarketing costs
- Increased sales force efficiency
- Improved customer retention
- Reduced postal penalties/higher-mailing success rate
- Improved promotion and marketing campaign response rates
- Higher cross- and up-sell volumes
By saving and deploying consistent business rules for data quality
across all organizational channels, the EWDQ initiative ensures
that clean, accurate and relevant data and a unified customer view
constantly support efficient business processes. A single, platform
independent data source enables enterprise-wide communication and
efficiency. In today's dynamic business environments, it ensures
that the unified customer view persists through growth, mergers
& acquisitions and IT system evolution.
The Data Quality software should provide:
1. Platform independence, with C and Java interfaces.
2. Support for web data cleansing over TCP/IP.
3. Callable components.
4. Real-time XML data processing.
5. Double byte support.
6. Multinational language support and localized geocoding for many
countries, including those in Asia/Pacific.
7. Real-time connectors for enterprise applications.
What is data mining?
Data mining is the art and science of understanding and characterizing
data using computationally intensive analytic techniques. This description
is fine as far as it goes, but it is not much help as it stands.
Data mining is used to analyze massive amounts of collected data,
such as corporate databases, and also to analyze data streams -
the continually generated new data that pours into companies every
day from their ongoing operations.
Data mining is used for three crucial tasks:
- To recognize and provide early warning of situations that require
management intervention
- To estimate the most likely outcome, and the confidence in that
outcome, of several available alternatives, thereby providing management
with business intelligence to make more effective and informed
decisions
- To provide the basis for automated, rational response where
a corporation delegates tactical operational decisions to automated
processes.
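The first task, early warning on a data stream, can be sketched in a few lines of Python. This is a deliberately simple illustration: it flags a value when it lies more than a chosen number of standard deviations from the mean of the most recent values. The window size, the threshold and the sample figures are all assumptions made for the example, not recommendations.

```python
from collections import deque
from statistics import mean, stdev

def early_warnings(stream, window=5, threshold=3.0):
    """Yield (index, value) for values far from the recent rolling mean."""
    recent = deque(maxlen=window)
    for index, value in enumerate(stream):
        if len(recent) == window:
            mu, sigma = mean(recent), stdev(recent)
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                yield index, value
        # The deque drops its oldest value automatically once it is full.
        recent.append(value)

# Made-up daily order counts with one unusual day.
daily_orders = [100, 102, 98, 101, 99, 100, 180, 101, 99]
alerts = list(early_warnings(daily_orders))
print(alerts)
```

The flagged value is only a prompt for management attention. Whether it reflects a real situation or a data quality defect still has to be checked.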
Is data mining just statistical analysis?
Data mining does indeed share many features with statistical
analysis, but this does not make them the same.
Statistical analysis starts when someone has an idea, a hunch,
and wants to find out if the data supports the hunch, and if so,
how much justification, or support, there is for it. Data mining
starts when someone has a problem and data about the area of the
problem, and wants to know what insights, or hunches, the data has
to suggest about the problem area.
The difference is that a STATISTICIAN has to devise a possible
solution first, and then check it against the data to see if it
is valid. With statistics, a single hunch is checked. There is no
indication if other possibilities exist, only a justification of
the idea investigated: it is up to the user to come up with some
other hunch to check.
A DATA MINER brings the problem to the data and asks what possible
solutions the data suggests. Neither approach provides automated
solutions to problems. With data mining, all of the possible hunches
that the data could possibly support are discovered. However, it's
up to the miner to validate these hunches since many of them may
well turn out to be of little or no value, or not well enough justified
to be of practical use.
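The contrast can be made concrete with a small Python sketch. It narrows "hunches" to one very limited kind, linear correlation between pairs of variables, which is an assumption made purely for illustration. Real mining tools search far richer kinds of relationship. The variable names and values are made up.

```python
from itertools import combinations
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Made-up data for illustration only.
data = {
    "ad_spend":   [1, 2, 3, 4, 5, 6],
    "visits":     [2, 4, 6, 8, 10, 12],
    "returns":    [5, 3, 6, 2, 7, 4],
    "complaints": [6, 5, 4, 3, 2, 1],
}

# Statistical approach: check the single hunch the analyst brought.
hunch = pearson(data["ad_spend"], data["returns"])

# Mining approach: let the data propose candidate relationships,
# ranked by strength, for the miner to validate afterwards.
candidates = sorted(
    ((a, b, pearson(data[a], data[b])) for a, b in combinations(data, 2)),
    key=lambda item: abs(item[2]),
    reverse=True,
)

print(f"hunch ad_spend vs returns: {hunch:.3f}")
for a, b, r in candidates:
    print(f"{a} vs {b}: {r:.3f}")
```

Checking the one hunch says nothing about the other pairs. The ranked list surfaces them, but a strong correlation in six rows of data is exactly the kind of suggestion the miner still has to validate before acting on it.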
I already get summary reports that "drill down" through
the data. Isn't that data mining?
Summaries and "drill down" reports, and other forms of
OLAP (on-line analytical processing) are incredibly useful and an
enormously valuable contribution to any corporate effort to understand
what the data has to say. These tools and techniques do require
that the user make queries about known situations. The summary reports
themselves only report on issues that are preconceived to very likely
be of interest. Thus, all of these techniques require that the user
bring the hunches (which are actually proposed solutions) rather
than the problems. Data mining brings the problem to the data rather
than the user looking for justification for a proposed solution.
With OLAP (and statistical analysis), you try to find what you look
for. With DATA MINING, you look for what you can find.
Do you need a data warehouse or data mart
in order to mine data?
No. It is a short answer, but unambiguous. Data mining is very
often used with data warehouses and data marts, mainly because the
enormous corporate efforts to collect massive amounts of data often
don't pay off, or aren't used as fully as they might be, when the data
is only used to find what was already known. Exploring data marts
and warehouses with data mining, especially when specific areas
of enquiry are defined for the search, can be extremely rewarding.
However, marts and warehouses represent stored and sometimes aged
data. Mining is just as applicable to the current streams of data
as it is to collections of data. Data to mine is needed for sure,
but it certainly does not have to be warehoused. In fact, warehousing
is often (even usually!) detrimental to its use for mining.
If data is in a warehouse, I thought it
was already prepared. Why does it need to be prepared again?
First, preparing data for a warehouse (or mart, or even for a database)
attempts essentially to make the data consistent and to conform
to business rules so that it works conveniently in the dimensions
of the warehouse (or other aggregative structures of mart or database).
However, data mining is interested in what relationships were in
the original data, not in discovering the business rules asserted
to make the data conform to the warehouse standards. Indeed, asserting
a structure often removes, or distorts beyond recovery, the original
relationships that would have been useful for mining - had they
still been present or detectable. (This does not mean that warehoused
data cannot be mined - it only means that some relationships have
been added and others removed as the data is structured).
Second, and this is true whether the data is "raw" or
from a carefully prepared warehouse, mining tools have very different
needs from a warehouse. As a single example of the needs of mining
tools that requires data preparation, mining tools use algorithms
that either require all values to be presented as numeric, or all
values to be presented as categories. This requires that all categorical
values be translated into appropriate numeric values, or all numerical
values be appropriately recoded into categories. Making these transformations
is not straightforward, although there are techniques and automated
tools that make such transformations easy in practice. The way the
transformations are made can very dramatically affect the quality
of the model. Few, if any, mining tools have default methods that
achieve good results. Principled data preparation not only improves
the quality of models, but also in some cases makes useful models
possible from warehoused data where they were not before.
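The two directions of transformation mentioned above can be sketched in Python. The example uses indicator ("one-hot") columns to turn categories into numbers, and fixed cut points to turn numbers into categories. Both are common choices, picked here only for illustration. The cut points, labels and sample values are assumptions, and since the way a transformation is made affects model quality, a different encoding may suit a given tool or data set better.

```python
def one_hot(values):
    """Translate categorical values into numeric indicator columns."""
    categories = sorted(set(values))
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return categories, rows

def bin_numeric(values, edges, labels):
    """Recode numbers into categories; a value equal to an edge goes in the upper bin."""
    if len(labels) != len(edges) + 1:
        raise ValueError("need exactly one more label than edges")
    coded = []
    for value in values:
        # Count how many cut points the value has reached or passed.
        index = sum(1 for edge in edges if value >= edge)
        coded.append(labels[index])
    return coded

# Made-up sample values.
regions = ["north", "south", "north", "east"]
categories, encoded = one_hot(regions)

ages = [17, 35, 64, 65, 80]
age_bands = bin_numeric(ages, edges=[18, 65], labels=["minor", "adult", "senior"])

print(categories, encoded)
print(age_bands)
```

Indicator columns avoid implying an order among categories, which a simple numbering such as north = 1, south = 2 would do. Fixed cut points are easy to explain but can hide relationships that fall inside a bin.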
Do I need to buy a high-priced piece of
software in order to mine data?
No, although you can pay almost as much as you like! Many tools
and tool suites can be purchased for a few hundred to a few thousand
dollars. Remember that data mining tools should not be viewed as
a cost. The return usually far exceeds the investment in any well-designed
data-mining project.
If you have a lot of data, can you define
a problem by mining the data?
Imagine that someone gave you a set of winning lottery numbers, but did not
tell you what they were - you couldn't use them at all. In the same way,
if you get useful data, but no inkling of the problem domain - not
even the variable names - the data would be unusable.
The PROBLEM, not the data, always comes first. However, if you
understand the data well enough AND you understand a business domain
that the data addresses, THEN you may be lucky enough to discover
a useful problem that the data addresses. However, the data alone
can tell you nothing.