Research Paper: Can Process Automation Be Utilised To Improve Data Quality?

My prior research papers (Data Quality Metrics; The Costs To The Organisation Of Poor Data Quality; What Causes Data Quality Issues To Arise?; Data Quality Issues In Marketing Databases) have established that data quality encompasses more than finding and fixing missing or inaccurate data. It also encompasses delivering comprehensive, consistent, relevant, fit-for-purpose, and timely data to its consumers regardless of its application, use, or origin. In this discussion with sixteen Database Management experts, I looked to establish whether improvements in data quality are possible using automated processes.

Question 1: Do you believe that automated or manual loading and/or cleaning can best remedy data quality issues?

Difficulties in measuring the benefits of automation can prevent investment in systems. Benefits attributed to automation used to be easily quantifiable. However, more recently automation has been used to develop business insight to support managerial decisions. The benefits here have been much less quantifiable using simple cost benefit analysis methods. Today’s methods of justification have focused too strongly on quantifiable benefits and have not necessarily justified expenditure on automation. While it is understandable that organisations have needed to be able to justify investments in financial terms, these methods need to evolve to assess benefits at both a quantitative and qualitative level, which will provide a more credible measure of the value of investments. I have attempted to establish whether benefits in quality can be leveraged from the application of automated solutions in projects where large-scale marketing efforts have been co-ordinated from often heterogeneous sources.

31.25% of respondents suggested that automation should be employed to detect errors (26.32% of responses). Respondent 13 asserts that automation can be employed to “alert system users to manually inspect certain records if they aren’t clean. The goal is not 100% data quality, but a level of quality that you can comfortably use the system. Companies with the right resources try to address these issues in the source systems.” Respondent 16 asserts automation “doesn’t tend to solve the problems but… can be tremendously powerful for raising flags for things you should be looking at.”
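The flag-raising role that Respondents 13 and 16 describe can be illustrated with a minimal Python sketch. The field names (email, postcode), the two rules and the function name flag_for_review are hypothetical and are not drawn from the interviews. The sketch only routes suspect records to a person; it does not attempt to correct them.

```python
import re

# Hypothetical rules: each maps a field name to a check that returns True
# when the value looks clean. Real rules would come from the business.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

RULES = {
    "email": lambda v: bool(v) and EMAIL_PATTERN.match(v) is not None,
    "postcode": lambda v: bool(v) and len(v.strip()) >= 3,
}


def flag_for_review(records):
    """Return (record, failed_fields) pairs for records that break any rule.

    Nothing is corrected here; flagged records are left for manual inspection.
    """
    flagged = []
    for record in records:
        failed = [field for field, is_clean in RULES.items()
                  if not is_clean(record.get(field))]
        if failed:
            flagged.append((record, failed))
    return flagged


# Illustrative records only, not data from the study.
records = [
    {"id": 1, "email": "a.smith@example.com", "postcode": "SW1A 1AA"},
    {"id": 2, "email": "not-an-email", "postcode": "M1 1AE"},
    {"id": 3, "email": "b.jones@example.com", "postcode": ""},
]

for record, failed in flag_for_review(records):
    print(record["id"], failed)
```

Here records 2 and 3 are flagged (for email and postcode respectively) and record 1 passes untouched, which mirrors the idea of accepting a usable level of quality rather than aiming for 100%.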

A quarter of respondents believed that neither automated nor manual solutions alone can remedy the data quality problems in most marketing databases, and that a hybrid solution is required to some degree (21.05% of responses to the question). Respondent 12 suggested that “the data that people use now is often so sophisticated, so complex and interrelated, that the tools for cleaning are just not sophisticated enough, and so relying on automated cleaning is certainly helpful but it’s not an ultimate solution.”

25% of respondents believed that automated solutions can be very effective in remedying the data quality problems in marketing databases, but only when placed in the framework of an enterprise-wide, ongoing and iterative approach to data quality (21.05% of responses). Respondent 10 asserted that automation is “one important component, but not the silver bullet; ultimately process is the most important piece to get right.”

18.75% of respondents asserted that neither automated nor manual solutions for data quality problems were ideal, as errors should be identified and eliminated or remedied as close to source as possible by employing techniques to identify errors at point of entry (15.79% of responses).

12.5% of respondents went on to say that automated solutions for understanding, transforming and repurposing complex and unpredictable product information relied upon in an organisation or supply chain can be hugely beneficial and worth the upfront cost (10.53% of responses). Respondent 15 claimed that “to operate more effectively and trust data and ship it around throughout your supply chain, the goal is automation.”

One respondent (5.26% of the total responses) claimed that automation may be the best solution to manipulate values such as date and time values from disparate systems into a consistent format and leverage the potential for reporting. Respondent 1 suggests the use of “mass concatenation and conversion to common format once you’ve imported data into a single view. Text manipulation can be a painstaking job, so better to try and find automated ways of doing repeatable tasks such as this.” This should be utilised in other areas too, such as text string manipulation.
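As an illustration of the conversion to a common format that Respondent 1 describes, the following Python sketch tries a list of known date formats and emits ISO 8601. The list KNOWN_FORMATS, the function name to_iso and the sample values are assumptions made for the example; it also assumes day-first dates and English month names, which would need checking against the real source systems.

```python
from datetime import datetime

# Hypothetical list of formats seen in the source systems. Order matters
# for ambiguous values such as 04/03/2011, so day-first is assumed here.
KNOWN_FORMATS = [
    "%Y-%m-%d %H:%M:%S",
    "%Y-%m-%d",
    "%d/%m/%Y %H:%M",
    "%d/%m/%Y",
    "%d %B %Y",  # month name; relies on an English (or default C) locale
]


def to_iso(value):
    """Return the value as an ISO 8601 string, or None if no format matches."""
    text = value.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt).isoformat()
        except ValueError:
            continue
    return None  # left for manual inspection rather than guessed


# Illustrative inputs only.
raw_values = ["2011-03-04 09:30:00", "04/03/2011", "sometime in March"]
normalised = [to_iso(v) for v in raw_values]
print(normalised)
```

Values that match no known format come back as None rather than being guessed, so they can be passed to a person, in keeping with the hybrid approach several respondents favoured.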

Question 2: How possible do you think it is to achieve fully automated processes to improve data quality?

Through this question I was attempting to delineate the extent to which automation can be deployed to improve data quality. Automating processes to improve data quality would present a challenge for most organisations, not least because they may not be fully aware of the data quality levels within their enterprise (something I have attempted to establish earlier in the interview). Now that we have determined the business impact of poor quality data, I am seeking to ascertain whether the corrective or preventative work can be entirely automated (and if it is desirable to do so).

37.5% of respondents (in 35.29% of responses) suggested that the process of eliminating data quality problems can never be fully automated, asserting that at some point a user interface is required for any system (such as checking success of loading and screening for errors). Respondent 7 advised that “it is possible to implement batch processing on poor quality data, but bear in mind that it isn’t possible to produce as good quality data from automated bulk cleaning processes after data entry than properly sort out validated entry. Also, it costs more to clean data after than to collect data well at source. Companies often prefer to spend more when they can see the poor quality data which needs cleaning than to spend money on prevention of inadequate data getting into the system, but this has poor yields and is a myopic approach.”

A quarter of respondents suggested that fully automated processes to improve data quality were possible to achieve, but required a great resource commitment from any enterprise wishing to implement them (23.53% of responses). Respondent 5 suggests, “the biggest obstacles to automation are time and cost. With enough money and commitment thrown at an automation project, it is possible to achieve automation.” Respondent 15 suggests that automating data quality improvements (such as “automatic detection of missing or suspect values, automated consolidation of data content”) is a prerequisite for successfully automating business processes.

18.75% of respondents asserted that a balance of manual and automated solutions is the correct approach (17.65% of responses). An enthusiasm for some automation on the part of these respondents was tempered by an awareness of its apparent limitations. Respondent 4 assesses that “current technologies don’t really lend themselves to automating such things so most often the data or information is being manually shaped at each step along the way.” Respondent 7 proposes that “it is certainly possible to correct and standardise data as it comes in from disparate sources, that’s probably the realistic extent of automation for improving quality of data.”

12.5% of respondents advise further that automation isn’t the panacea for data quality problems (11.76% of responses). Respondent 6 suggests that “the major barriers to improving data quality are… a lack of understanding of the importance of quality by organisations (whether managers or data producers), lack of rigorous standards and data documentation, turnover of, or lack of, personnel trained in data management and collection of data. I am not sure these issues can fully be solved by automated processes. By correcting them in an automated way you can entrench the behaviour that created the problems.” Respondent 5 advises that “in order to achieve data quality, good internal procedures must exist so that staff can be trained and supported in their work. Careful monitoring and error correction can support good quality data, but it is more effective and efficient for data to be entered correctly first time.”

Finally, 12.5% of respondents maintain that it is impossible to use automation to improve poor quality data in marketing databases (11.76% of total responses). Due partly to poor user inputs, “there will always have to be manual intervention for exceptions,” as Respondent 10 argues.

Question 3: Are there any major impediments to being able to improve data quality using automated processes?

Impediments to automation are often cost, lack of equipment, and lack of time. I looked to establish how profoundly these issues have impacted decisions to not automate data quality improvement processes, and, beyond that, delineate the technical barriers to automation particular to this area.

18.75% of respondents suggested that due to exceptions and the intricacy of some inputs and visual checks, systems will continue to need significant human interface (17.65% of total responses to the question).

18.75% of respondents suggested a significant impediment to improving data quality using automated processes would be the lack of data standards and data documentation that exists in most enterprises (17.65% of responses).

18.75% of respondents expanded on this, suggesting a universal lack of data standards and naming conventions (or the issue of different or competing global standards) would seriously hinder automation in this area (17.65% of responses). Respondent 15 emphasised the “three to four thousand data elements that financial institutions track [that] all have important functions in terms of business processing,” and cited the example of ‘closing price’ – “the closing price of a stock or the closing price of a bond – and that’s used for valuation, it’s used for trading, it’s used for position keeping, it’s used for a lot of different things. There is not one thing called closing price, there’s probably ten things called closing price, the official close, the last trading price, it could be the last quoted price, it could be an average price, it could be a valuated price, all of those things in essence are used for the function of closing price. They’re not the same thing! If you tag it correctly and everything is identified with precision, then you can compare it, then you can automate more processes.”
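Respondent 15's point about tagging can be sketched in Python as follows. The class names, the enum members and the prices are hypothetical; only the list of closing price variants comes from the quotation. The idea shown is that an automated comparison is only attempted between values carrying the same tag.

```python
from dataclasses import dataclass
from enum import Enum


class CloseType(Enum):
    # Variants named by Respondent 15; the identifiers are hypothetical.
    OFFICIAL_CLOSE = "official close"
    LAST_TRADE = "last trading price"
    LAST_QUOTE = "last quoted price"
    AVERAGE = "average price"
    VALUATED = "valuated price"


@dataclass(frozen=True)
class ClosingPrice:
    instrument: str
    close_type: CloseType
    value: float


def comparable(a, b):
    """Two prices are compared only if they describe the same thing."""
    return a.instrument == b.instrument and a.close_type == b.close_type


def differences(source_a, source_b, tolerance=0.01):
    """Return like-for-like price pairs that disagree by more than tolerance."""
    return [(a, b) for a in source_a for b in source_b
            if comparable(a, b) and abs(a.value - b.value) > tolerance]


# Illustrative feeds only; instrument and values are made up.
feed_a = [
    ClosingPrice("XYZ", CloseType.OFFICIAL_CLOSE, 101.50),
    ClosingPrice("XYZ", CloseType.LAST_TRADE, 101.20),
]
feed_b = [
    ClosingPrice("XYZ", CloseType.OFFICIAL_CLOSE, 101.50),
    ClosingPrice("XYZ", CloseType.LAST_TRADE, 101.45),
]

mismatches = differences(feed_a, feed_b)
for a, b in mismatches:
    print(a.instrument, a.close_type.value, a.value, b.value)
```

Without the tag, the official close in one feed could be compared against the last trading price in the other and reported as an error even though both values are correct for their own purpose.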

12.5% of respondents highlighted the lack of understanding of the importance of quality by potential sponsors and internal stakeholders in many organisations as a significant impediment (11.76% of responses).

Respondent 1 emphasised the cost of undertaking automation and suggested that there was insufficient funding available for data quality improvement in many organisations as data quality isn’t a priority. The same respondent also suggested that a lack of trained personnel in the collection of data and the maintenance of databases might hinder automation programmes in this area.

Respondent 3 was sceptical about the possibility for effecting long-term improvements to data quality using this method, as “users will continue to bypass controls to enter junk, no matter what.”

Respondent 4 suggested that a significant barrier to automation is that few people know how to fully leverage the potential of technology for this purpose.

Respondent 7 said automation would be difficult in this sphere as the “tools used to validate data, such as addressing information, global postal codes, are continually updated and in a state of flux,” meaning there are few validation processes that can be relied on over the long-term.

Respondent 8 suggested that because of the sheer amount of information that is captured in a modern organisation, it will be difficult to know where to focus efforts.

Respondent 9 highlighted the problem of disparate systems, and the difficulties involved in integrating data from such divergent sources, as the major barrier to using automated processes to improve data quality.

Question 4: What change do you think automated data loading and cleaning solutions have on an organisation?

High quality data supports smoother operations and enables effective decision-making. Conversely, poor quality data causes organisational inefficiency and capital losses (Redman, 1996). Traditional data management primarily focuses on functionality and technical efficiency (storage, retrieval, delivery, and presentation). However, with steadily increasing investments in data management (Wixom & Watson, 2001), there is a growing concern about its economic aspects, namely, its contribution to business value and its effect on costs. Successes in automating data quality improvement processes should therefore have an effect in terms of cost savings, revenue generation, and/or profitability. I hoped to outline these and any other advantages brought to an organisation in this sphere and, to some degree, quantify them.

43.75% of respondents professed that a significant benefit to the enterprise would be improved management information (16.67% of responses) as automated data loading and cleaning solutions “make it easier for the organisation to generate and build reports” (Respondent 2). As a result, revenues can be improved through effective “analytics forecasting and a proper engagement model with customers” (Respondent 14).

A positive result of automation is that it can save time and money, if done correctly, bringing productivity gains to the enterprise, as stated by 37.5% of respondents (in 14.29% of total responses). Respondent 8 declares that “automation ultimately brings fast, accurate and repeatable production,” resulting in “small amounts of required manual intervention” (Respondent 9) and ultimately “significant time and resource savings for an organisation” (Respondent 9).

31.25% of respondents asserted that customer satisfaction could be significantly improved as a result of automation, if it is implemented correctly (11.9% of responses). One example of where the customer experience could be enhanced would be through using automated processes to create a single customer view within a database used by a call centre.

25% of respondents argued that a programme of automation is beneficial to an organisation as it entrenches a beneficial focus on data quality and institutionalises important processes (9.52% of responses).

18.75% of respondents advocated automation as a solution to the issue of information overload (7.14% of responses). Automated processes can increase the amount of organisational knowledge captured. As Respondent 5 pointed out, electronic records are more easily stored, archived and located than the paper files kept with manual systems.

18.75% of respondents declared that successful automation would allow an organisation to devote more time to understanding and reaching customers and prospects (7.14% of total responses). Automation leads to a change of roles – staff can find themselves freed from making large numbers of simple decisions to spending their time analysing overall patterns of decisions.

Similarly, a benefit of automation could be significantly reduced labour costs for the organisation. This was proposed by 12.5% of respondents (4.76% of responses).

12.5% of respondents highlighted an immediate correctional impact that automation could have on data quality as a potential positive effect (4.76% of responses).

Also related to the benefits automation could offer in creating a single view of the customers, 12.5% of respondents proposed that automation would result in less wasted marketing spend (4.76% of responses).

12.5% of respondents suggested that an organisation that aims to automate the absorption, normalisation and integration of incoming data so that it can be flexible enough to be used whenever and wherever an organisation needs it would reap significant rewards (4.76% of responses). Respondent 4 suggested that if an organisation “can automate solutions for understanding, transforming and repurposing the complex and unpredictable product information relied upon in an organisation or supply chain, that can be hugely beneficial, and worth the upfront cost.”

Other positive impacts of automation were considered to be the elimination of human error and repetition (Respondent 5), the facility to easily flag rule violations and statistical deviations (Respondent 9) and less stress for employees (Respondent 2). Respondent 8 believes that if an automation programme is successfully implemented in data quality it can help in rolling automation out to other business areas.
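The flagging of statistical deviations mentioned by Respondent 9 could take many forms. As one minimal sketch, the Python below flags values lying more than a chosen number of standard deviations from the mean. The function name flag_deviations, the threshold of 2.0 and the sample order values are assumptions for illustration, not a method described by the respondent.

```python
from statistics import mean, stdev


def flag_deviations(values, threshold=2.0):
    """Return (index, value) pairs more than `threshold` sample standard
    deviations away from the mean. The threshold is an assumed default."""
    if len(values) < 2:
        return []  # a spread cannot be estimated from fewer than two values
    centre = mean(values)
    spread = stdev(values)
    if spread == 0:
        return []  # all values identical, nothing deviates
    return [(i, v) for i, v in enumerate(values)
            if abs(v - centre) / spread > threshold]


# Illustrative order values; the ninth looks like a keying error.
order_values = [120, 135, 128, 131, 125, 129, 133, 127, 2400, 130]

suspects = flag_deviations(order_values)
print(suspects)
```

A simple test like this is sensitive to the outliers it is trying to find, so in practice a more robust measure might be preferred; the flagged values would still need a person to decide whether they are errors or genuine exceptions.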

However, Respondent 3 suggested that automation would actually lead to a significant deterioration in data quality and Respondent 6 thought that automation would be an “expensive way of eradicating issues that needn’t crop up in the first place with some forethought and training.” These can be seen as possible negative impacts of automation.

Respondent 9 held that an organisation’s main goal should be to “entrench change… to ensure data quality is preserved in the long term. This involves incentivising people who are responsible for data production to provide clean, quality data for systems that business intelligence relies on.”

Respondent 4 outlined that there are “two methods of automating things like this – top-down and bottom-up – and the kind of change and its success will probably depend to some extent on how the automation process is steered and handled. The top-down method is quick, with the decision to change made by those with the best overall view of the organisational environment and resources. The implementation will succeed to the extent that users follow the behaviours prescribed for them. The bottom-up method utilises the knowledge of those employees who are doing the work, and the intricacies of the work. The employees have a detailed knowledge of their tasks, but have to be given the overview of organisational aims in order to participate effectively in the change process, in implementing automation. This method takes more time but because it is more democratic produces greater commitment to the change, therefore needing less management control and less effort to foster buy-in.”

References

Redman, T. (1996). Data Quality For The Information Age. Boston, Massachusetts: Artech House.
Wixom, B. H., & Watson, H. J. (2001). An Empirical Investigation Of The Factors Affecting Data Warehousing Success. MIS Quarterly, 25(1), 17-41.
