
Friday, March 1, 2019

Data Preprocessing Essay

3 Data Preprocessing

Today's real-world databases are highly susceptible to noisy, missing, and inconsistent data due to their typically huge size (often several gigabytes or more) and their likely origin from multiple, heterogeneous sources. Low-quality data will lead to low-quality mining results. How can the data be preprocessed in order to help improve the quality of the data and, consequently, of the mining results? How can the data be preprocessed so as to improve the efficiency and ease of the mining process?

There are several data preprocessing techniques. Data cleaning can be applied to remove noise and correct inconsistencies in data. Data integration merges data from multiple sources into a coherent data store such as a data warehouse. Data reduction can reduce data size by, for instance, aggregating, eliminating redundant features, or clustering. Data transformations (e.g., normalization) may be applied, where data are scaled to fall within a smaller range like 0.0 to 1.0. This can improve the accuracy and efficiency of mining algorithms involving distance measurements. These techniques are not mutually exclusive; they may work together. For example, data cleaning can involve transformations to correct wrong data, such as by transforming all entries for a date field to a common format.

In Chapter 2, we learned about the different attribute types and how to use basic statistical descriptions to study data characteristics. These can help identify erroneous values and outliers, which will be useful in the data cleaning and integration steps. Data preprocessing techniques, when applied before mining, can substantially improve the overall quality of the patterns mined and/or the time required for the actual mining.
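
As a rough illustration of the normalization step mentioned above, here is a minimal sketch of min-max scaling, one common way to map values into a range such as 0.0 to 1.0. The function name `min_max_normalize`, its parameters, and the sample income figures are all hypothetical and chosen only for this example; the text does not prescribe a particular normalization method.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so the smallest maps to new_min
    and the largest maps to new_max (min-max normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no spread to rescale;
        # map everything to the lower bound to avoid division by zero.
        return [new_min for _ in values]
    return [
        (v - lo) / (hi - lo) * (new_max - new_min) + new_min
        for v in values
    ]


# Hypothetical attribute values, for illustration only.
incomes = [12000, 54000, 73600, 98000]
scaled = min_max_normalize(incomes)
print(scaled)
```

After scaling, an attribute with a large numeric range no longer dominates distance computations simply because of its units, which is why this kind of transformation tends to help distance-based mining algorithms.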
