TDNett Knowledge

FAQs

1. What is Data Quality?

2. What is Data Mining?



Data Quality - Key to Successful Business Integration

Over the last several years leading industry analysts and data quality experts have opined that the success or failure of a CRM, data warehouse, e-business or ERP implementation hinges in large part on the quality of an organization's customer information. As more and more organizations turn to these IT-driven initiatives to increase revenue growth, productivity and customer satisfaction, an organization's approach to overcoming "data quality" problems is critical to achieving significant results.

With the implementation of enterprise applications such as ERP, CRM and Internet banking systems, many organizations are now looking for effective data quality solutions to handle their large-scale data quality issues.

The good news is that many of the problems associated with poor data quality are avoidable, preventable and correctable if an enterprise-wide data quality (EWDQ) solution is well designed and executed. EWDQ solutions produce predictable and measurable return on investment (ROI) by eliminating anomalies, mistakes and duplication in customer information. Moreover, they provide a single clean source of data from which to establish a unified view of customers across the organization. Products like Trillium Software provide businesses with a software solution that cleanses and standardizes global customer information in e-business, CRM and Internet applications.

Understand the real costs and causes of poor data quality
Poor data quality has an insidious and systematic impact, summed up by the enduring technology acronym GIGO (garbage in, garbage out). A proactive, enterprise-level approach to instituting best practices in data quality can prevent some of the most glaring symptoms and corresponding business problems associated with ongoing data corruption, duplication, omission, inconsistencies and other flaws in customer information. According to experts, data quality issues contribute to data warehouse failures. They are also a large component of the high failure rate among CRM projects. EWDQ solutions make it easy for organizations to profile the quality of their data and facilitate rapid implementation practices to correct existing data defects. More and more, organizations are looking to EWDQ solutions to help them "dig out" from under the high cost of poor data quality.

Sources of Data Corruption
Every customer touch point is a potential source of data corruption:

  • Online customers intentionally enter non-quality data to protect their privacy
  • Call center operators enter abbreviated data to save time
  • Third party data contains inconsistencies, inaccuracies and errors
  • Customer contacts and customers introduce typing errors into front-office systems
  • Data from diverse source systems conforms to disparate operational standards and formats

Bad data entering enterprise-wide systems at any point reduces the data quality of the entire system. A data quality solution is the only way to prevent bad data from debasing business-critical processes.

Employ a proven methodology
Most successful data quality initiatives employ a methodical, proven and staged approach to establishing enterprise-wide data quality. From analyzing enterprise data quality strategy to data acquisition and inspections to global language and cultural consideration to technical requirement determination, each step in your plan must ultimately help produce useful business information for end users. Given that most successful CRM, ERP and e-business implementations also include an online component, your methodology should include the capability to take your data quality solution from batch-mode/high performance processing to a versatile, flexible transactional environment via the Web.

Problems                     | Symptoms
Low stock accuracy           | Reprocessing of orders
Invoicing inaccuracy         | Regular and duplicate clean-up orders
Financial penalties          | Incorrect orders
Audit compliance challenges  | Duplication of information
Financial project inaccuracy | Incorrect customer sales analysis

Some processes to consider in data quality initiatives are:

  • Establishing a data quality strategy
  • Defining your initiative
  • Committing to a data quality team
  • Providing for data quality profiling, inspection and analysis
  • Standardizing and reengineering data
  • Data integration, relationship matching and linking
  • Reporting
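The profiling, standardizing and matching steps in the list above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not how any particular EWDQ product works: the function names, the tiny abbreviation map and the sample customer records are all hypothetical, and real solutions apply far richer rules and fuzzy matching.

```python
import re
from collections import defaultdict

# Hypothetical abbreviation map (an assumption for this sketch);
# real standardization rules are much more extensive.
ABBREVIATIONS = {"st": "street", "rd": "road", "ave": "avenue"}


def profile(records, fields):
    """Profiling: count missing (empty or absent) values per field."""
    missing = {field: 0 for field in fields}
    for record in records:
        for field in fields:
            if not str(record.get(field) or "").strip():
                missing[field] += 1
    return missing


def standardize(value):
    """Standardizing: lowercase, drop punctuation, collapse spaces, expand abbreviations."""
    words = re.sub(r"[^a-z0-9 ]", " ", value.lower()).split()
    return " ".join(ABBREVIATIONS.get(word, word) for word in words)


def find_duplicates(records, fields):
    """Matching: group record indexes whose standardized key fields are identical."""
    groups = defaultdict(list)
    for index, record in enumerate(records):
        key = tuple(standardize(record.get(field) or "") for field in fields)
        groups[key].append(index)
    return [indexes for indexes in groups.values() if len(indexes) > 1]


# Hypothetical sample data: two spellings of one customer, one incomplete record.
customers = [
    {"name": "John Smith", "address": "12 Main St."},
    {"name": "JOHN  SMITH", "address": "12 main street"},
    {"name": "Mary Jones", "address": ""},
]

print(profile(customers, ["name", "address"]))
print(find_duplicates(customers, ["name", "address"]))
```

Here the first two records are reported as one duplicate group because they standardize to the same name and address, and the profile flags the one missing address. Exact matching on standardized keys is the simplest possible choice; it will miss typing errors, which is where more tolerant matching techniques come in.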

Sample ROI of EWDQ initiative

  • Faster customer data processing and clean customer master data
  • More accurate analytics: accurate "what if" analyses
  • Improved yield and profitability management
  • Reduced telemarketing costs
  • Increased sales force efficiency
  • Improved customer retention
  • Reduced postal penalties/higher-mailing success rate
  • Improved promotion and marketing campaign response rates
  • Higher cross- and up-sell volumes

By saving and deploying consistent business rules for data quality across all organizational channels, the EWDQ initiative ensures that clean, accurate and relevant data and a unified customer view constantly support efficient business processes. A single, platform-independent data source enables enterprise-wide communication and efficiency. In today's dynamic business environments, it ensures that the unified customer view persists through growth, mergers and acquisitions, and IT system evolution.

The data quality software should:

1. Be platform independent, with C and Java interfaces.

2. Support web data cleansing over TCP/IP.

3. Provide callable components.

4. Process XML data in real time.

5. Support double-byte character sets.

6. Offer multinational language support and localized geocoding for many countries, including those in Asia/Pacific.

7. Provide real-time connectors for enterprise applications.




What is data mining?

Data mining is the art and science of understanding and characterizing data using computationally intensive analytic techniques. This description is fine as far as it goes, but it is not much help as it stands. In practice, data mining is used to analyze massive amounts of collected data, such as corporate databases, and also to analyze data streams - the continually generated new data that pours into companies every day from their ongoing operations.

Data mining is used for three crucial tasks:

  • To recognize and provide early warning of situations that require management intervention
  • To estimate the most likely outcome, and the confidence in that outcome, of several available alternatives, thereby providing management with the business intelligence to make more effective and informed decisions
  • To provide the basis for automated, rational response where a corporation delegates tactical operational decisions to automated processes

Is data mining just statistical analysis?

Data mining does indeed share many features with statistical analysis, but this does not make them the same.

Statistical analysis starts when someone has an idea, a hunch, and wants to find out whether the data supports the hunch and, if so, how much justification or support there is for it. Data mining starts when someone has a problem and data about the area of the problem, and wants to know what insights, or hunches, the data has to suggest about the problem area.

The difference is that a STATISTICIAN has to devise a possible solution first, and then check it against the data to see if it is valid. With statistics, a single hunch is checked. There is no indication if other possibilities exist, only a justification of the idea investigated: it is up to the user to come up with some other hunch to check.

A DATA MINER brings the problem to the data and asks what possible solutions the data suggests. With data mining, the hunches that the data could support are discovered. However, it's up to the miner to validate these hunches, since many of them may well turn out to be of little or no value, or not well enough justified to be of practical use. Neither approach provides automated solutions to problems.

I already get summary reports that "drill down" through the data. Isn't that data mining?

Summaries and "drill down" reports, and other forms of OLAP (on-line analytical processing) are incredibly useful and an enormously valuable contribution to any corporate effort to understand what the data has to say. These tools and techniques do require that the user make queries about known situations. The summary reports themselves only report on issues that are preconceived to very likely be of interest. Thus, all of these techniques require that the user bring the hunches (which are actually proposed solutions) rather than the problems. Data mining brings the problem to the data rather than the user looking for justification for a proposed solution. With OLAP (and statistical analysis), you try to find what you look for. With DATA MINING, you look for what you can find.
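The contrast can be made concrete with a toy sketch. In the first function the user chooses which dimension to summarize, as in a drill-down report; in the second, every attribute is scanned and ranked without a prior hunch. Everything here is an assumption made for illustration: the order records, the field names and the simple "deviation from the overall rate" score are hypothetical, and real mining tools use far more sophisticated techniques.

```python
from collections import defaultdict

# Hypothetical order records (an assumption for this sketch).
orders = [
    {"region": "north", "channel": "web", "returned": True},
    {"region": "north", "channel": "store", "returned": False},
    {"region": "south", "channel": "web", "returned": True},
    {"region": "south", "channel": "store", "returned": False},
    {"region": "north", "channel": "web", "returned": True},
    {"region": "south", "channel": "store", "returned": False},
]


def rate_by(rows, dimension, target):
    """OLAP-style summary: rate of `target` per value of ONE dimension chosen by the user."""
    counts = defaultdict(lambda: [0, 0])  # value -> [target hits, row count]
    for row in rows:
        counts[row[dimension]][0] += 1 if row[target] else 0
        counts[row[dimension]][1] += 1
    return {value: hits / total for value, (hits, total) in counts.items()}


def scan_all(rows, target):
    """Mining-style scan: rank every (attribute, value) by how far its rate
    deviates from the overall rate. No dimension is chosen in advance."""
    overall = sum(1 for row in rows if row[target]) / len(rows)
    findings = []
    for dimension in rows[0]:
        if dimension == target:
            continue
        for value, rate in rate_by(rows, dimension, target).items():
            findings.append((abs(rate - overall), dimension, value, rate))
    return sorted(findings, reverse=True)


# The user has a hunch about regions and checks it.
print(rate_by(orders, "region", "returned"))

# The scan surfaces candidates the user did not ask about.
for deviation, dimension, value, rate in scan_all(orders, "returned"):
    print(dimension, value, round(rate, 2))
```

In this sample, a user who only drilled down by region would see a modest difference, while the scan ranks the sales channel first. As with any mined result, such a candidate still has to be validated before it is acted upon.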

Do you need a data warehouse or data mart in order to mine data?

No. It is a short answer, but unambiguous. Data mining is very often used with data warehouses and data marts, mainly because the enormous corporate effort to collect massive amounts of data often doesn't pay off, or the data isn't used as fully as it might be, when it is only used to find what was already known. Exploring data marts and warehouses with data mining, especially when specific areas of enquiry are defined for the search, can be extremely rewarding. However, marts and warehouses represent stored and sometimes aged data. Mining is just as applicable to the current streams of data as it is to collections of data. Data to mine is needed for sure, but it certainly does not have to be warehoused. In fact, warehousing is often (even usually!) detrimental to the data's use for mining.

If data is in a warehouse, I thought it was already prepared. Why does it need to be prepared again?

First, preparing data for a warehouse (or mart, or even for a database) essentially attempts to make the data consistent and to conform to business rules so that it works conveniently in the dimensions of the warehouse (or other aggregative structures of mart or database). However, data mining is interested in what relationships were in the original data, not in discovering the business rules asserted to make the data conform to the warehouse standards. Indeed, asserting a structure often removes, or distorts beyond recovery, the original relationships that would have been useful for mining - had they still been present or detectable. (This does not mean that warehoused data cannot be mined - it only means that some relationships have been added and others removed as the data is structured.)

Second, and this is true whether the data is "raw" or from a carefully prepared warehouse, mining tools have very different needs from a warehouse. As a single example of the needs of mining tools that requires data preparation, mining tools use algorithms that either require all values to be presented as numeric, or all values to be presented as categories. This requires that all categorical values be translated into appropriate numeric values, or all numerical values be appropriately recoded into categories. Making these transformations is not straightforward, although there are techniques and automated tools that make such transformations easy in practice. The way the transformations are made can very dramatically affect the quality of the model. Few, if any, mining tools have default methods that achieve good results. Principled data preparation not only improves the quality of models, but also in some cases makes useful models possible from warehoused data where they were not before.
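The two kinds of recoding described above can be sketched as follows. This is a minimal illustration, not a recommended preparation method: one-hot indicator columns and fixed-edge binning are just two common choices swapped in here as examples, and the function names, bin edges and sample values are hypothetical.

```python
from bisect import bisect_right


def one_hot(values):
    """Recode categorical values as 0/1 indicator columns, one per category.

    Returns the encoded rows and the category order used for the columns.
    """
    categories = sorted(set(values))
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return rows, categories


def bin_numeric(values, edges, labels):
    """Recode numbers as categories using ascending bin edges.

    A value equal to an edge falls into the bin above that edge.
    """
    if len(labels) != len(edges) + 1:
        raise ValueError("need exactly one more label than edges")
    return [labels[bisect_right(edges, value)] for value in values]


# Hypothetical sample values (assumptions for this sketch).
regions = ["north", "south", "north", "east"]
incomes = [12000, 45000, 90000]

encoded, columns = one_hot(regions)
print(columns)   # column order of the indicator variables
print(encoded)
print(bin_numeric(incomes, [30000, 60000], ["low", "medium", "high"]))
```

Even in this small sketch the choices matter: different bin edges would put the same income into a different category, which is one way the transformation can change what a model is able to find.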

Do I need to buy a high-priced piece of software in order to mine data?

No, although you can pay almost as much as you like! Many tools and tool suites can be purchased for a few hundred to a few thousand dollars. Remember that data mining tools should not be viewed as a cost. The return usually far exceeds the investment in any well-designed data-mining project.

If you have a lot of data, can you define a problem by mining the data?

Imagine being given a set of winning lottery numbers without being told what they were: you couldn't use them at all. In the same way, if you are given useful data but no inkling of the problem domain - not even the variable names - the data is unusable.

The PROBLEM, not the data, always comes first. However, if you understand the data well enough AND you understand a business domain that the data addresses, THEN you may be lucky enough to discover a useful problem that the data addresses. However, the data alone can tell you nothing.




©2009 TDNett Knowledge. All rights reserved.