MARC details
000 - LEADER |
fixed length control field |
16938cam a2200709 i 4500 |
001 - CONTROL NUMBER |
control field |
891186025 |
003 - CONTROL NUMBER IDENTIFIER |
control field |
OCoLC |
005 - DATE AND TIME OF LATEST TRANSACTION |
control field |
20250129142507.0 |
006 - FIXED-LENGTH DATA ELEMENTS--ADDITIONAL MATERIAL CHARACTERISTICS |
fixed length control field |
m o d |
007 - PHYSICAL DESCRIPTION FIXED FIELD--GENERAL INFORMATION |
fixed length control field |
cr ||||||||||| |
008 - FIXED-LENGTH DATA ELEMENTS--GENERAL INFORMATION |
fixed length control field |
140922s2015 enk ob 001 0 eng |
020 ## - INTERNATIONAL STANDARD BOOK NUMBER |
International Standard Book Number |
1118950801 |
Qualifying information |
(electronic bk.) |
|
International Standard Book Number |
1118950844 |
Qualifying information |
(electronic bk.) |
|
International Standard Book Number |
9781118950807 |
Qualifying information |
(electronic bk.) |
|
International Standard Book Number |
9781118950845 |
Qualifying information |
(electronic bk.) |
|
Canceled/invalid ISBN |
111833258X |
Qualifying information |
(hardback) |
|
Canceled/invalid ISBN |
111895095X |
|
Canceled/invalid ISBN |
9781118332580 |
Qualifying information |
(hardback) |
|
Canceled/invalid ISBN |
9781118950951 |
035 ## - SYSTEM CONTROL NUMBER |
System control number |
(OCoLC)891186025 |
037 ## - SOURCE OF ACQUISITION |
Stock number |
0C3F661A-3397-4AD2-8301-24CBBD5AAE9F |
Source of stock number/acquisition |
OverDrive, Inc. |
Note |
http://www.overdrive.com |
040 ## - CATALOGING SOURCE |
Original cataloging agency |
DLC |
Language of cataloging |
eng |
Description conventions |
rda |
-- |
pn |
Transcribing agency |
DLC |
Modifying agency |
N$T |
-- |
YDXCP |
-- |
E7B |
-- |
OSU |
-- |
DG1 |
-- |
OCLCF |
-- |
COO |
-- |
OCLCQ |
-- |
RRP |
-- |
TEFOD |
-- |
OCLCQ |
042 ## - AUTHENTICATION CODE |
Authentication code |
pcc |
050 00 - LIBRARY OF CONGRESS CALL NUMBER |
Classification number |
QA76.9.D343 |
072 #7 - SUBJECT CATEGORY CODE |
Subject category code |
COM |
Subject category code subdivision |
000000 |
Source |
bisacsh |
100 1# - MAIN ENTRY--PERSONAL NAME |
Personal name |
Cichosz, Paweł, |
Authority record control number or standard number |
http://id.loc.gov/authorities/names/n2014057642 |
Relator term |
author |
245 10 - TITLE STATEMENT |
Title |
Data mining algorithms : |
Remainder of title |
explained using R / |
Statement of responsibility, etc. |
Pawel Cichosz |
264 #1 - PRODUCTION, PUBLICATION, DISTRIBUTION, MANUFACTURE, AND COPYRIGHT NOTICE |
Place of production, publication, distribution, manufacture |
Chichester, West Sussex ; |
-- |
Malden, MA : |
Name of producer, publisher, distributor, manufacturer |
John Wiley & Sons Inc., |
Date of production, publication, distribution, manufacture, or copyright notice |
2015 |
300 ## - PHYSICAL DESCRIPTION |
Extent |
1 online resource (xxxi, 683 pages) |
336 ## - CONTENT TYPE |
Content type term |
text |
Content type code |
txt |
Source |
rdacontent |
337 ## - MEDIA TYPE |
Media type term |
computer |
Media type code |
c |
Source |
rdamedia |
338 ## - CARRIER TYPE |
Carrier type term |
online resource |
Carrier type code |
cr |
Source |
rdacarrier |
504 ## - BIBLIOGRAPHY, ETC. NOTE |
Bibliography, etc. note |
Includes bibliographical references and index. |
505 0# - FORMATTED CONTENTS NOTE |
Formatted contents note |
Machine generated contents note: pt. I Preliminaries -- 1. Tasks -- 1.1. Introduction -- 1.1.1. Knowledge -- 1.1.2. Inference -- 1.2. Inductive learning tasks -- 1.2.1. Domain -- 1.2.2. Instances -- 1.2.3. Attributes -- 1.2.4. Target attribute -- 1.2.5. Input attributes -- 1.2.6. Training set -- 1.2.7. Model -- 1.2.8. Performance -- 1.2.9. Generalization -- 1.2.10. Overfitting -- 1.2.11. Algorithms -- 1.2.12. Inductive learning as search -- 1.3. Classification -- 1.3.1. Concept -- 1.3.2. Training set -- 1.3.3. Model -- 1.3.4. Performance -- 1.3.5. Generalization -- 1.3.6. Overfitting -- 1.3.7. Algorithms -- 1.4. Regression -- 1.4.1. Target function -- 1.4.2. Training set -- 1.4.3. Model -- 1.4.4. Performance -- 1.4.5. Generalization -- 1.4.6. Overfitting -- 1.4.7. Algorithms -- 1.5. Clustering -- 1.5.1. Motivation -- 1.5.2. Training set -- 1.5.3. Model -- 1.5.4. Crisp vs. soft clustering -- 1.5.5. Hierarchical clustering -- 1.5.6. Performance -- 1.5.7. Generalization -- 1.5.8. Algorithms |
|
Formatted contents note |
1.5.9. Descriptive vs. predictive clustering -- 1.6. Practical issues -- 1.6.1. Incomplete data -- 1.6.2. Noisy data -- 1.7. Conclusion -- 1.8. Further readings -- References -- 2. Basic statistics -- 2.1. Introduction -- 2.2. Notational conventions -- 2.3. Basic statistics as modeling -- 2.4. Distribution description -- 2.4.1. Continuous attributes -- 2.4.2. Discrete attributes -- 2.4.3. Confidence intervals -- 2.4.4. m-Estimation -- 2.5. Relationship detection -- 2.5.1. Significance tests -- 2.5.2. Continuous attributes -- 2.5.3. Discrete attributes -- 2.5.4. Mixed attributes -- 2.5.5. Relationship detection caveats -- 2.6. Visualization -- 2.6.1. Boxplot -- 2.6.2. Histogram -- 2.6.3. Barplot -- 2.7. Conclusion -- 2.8. Further readings -- References -- pt. II Classification -- 3. Decision trees -- 3.1. Introduction -- 3.2. Decision tree model -- 3.2.1. Nodes and branches -- 3.2.2. Leaves -- 3.2.3. Split types -- 3.3. Growing -- 3.3.1. Algorithm outline |
|
Formatted contents note |
10.4. Conclusion -- 10.5. Further readings -- References -- pt. IV Clustering -- 11. (Dis)similarity measures -- 11.1. Introduction -- 11.2. Measuring dissimilarity and similarity -- 11.3. Difference-based dissimilarity -- 11.3.1. Euclidean distance -- 11.3.2. Minkowski distance -- 11.3.3. Manhattan distance -- 11.3.4. Canberra distance -- 11.3.5. Chebyshev distance -- 11.3.6. Hamming distance -- 11.3.7. Gower's coefficient -- 11.3.8. Attribute weighting -- 11.3.9. Attribute transformation -- 11.4. Correlation-based similarity -- 11.4.1. Discrete attributes -- 11.4.2. Pearson's correlation similarity -- 11.4.3. Spearman's correlation similarity -- 11.4.4. Cosine similarity -- 11.5. Missing attribute values -- 11.6. Conclusion -- 11.7. Further readings -- References -- 12. k-Centers clustering -- 12.1. Introduction -- 12.1.1. Basic principle -- 12.1.2. (Dis)similarity measures -- 12.2. Algorithm scheme -- 12.2.1. Initialization -- 12.2.2. Stop criteria -- 12.2.3. Cluster formation |
|
Formatted contents note |
12.2.4. Implicit cluster modeling -- 12.2.5. Instantiations -- 12.3. k-Means -- 12.3.1. Center adjustment -- 12.3.2. Minimizing dissimilarity to centers -- 12.4. Beyond means -- 12.4.1. k-Medians -- 12.4.2. k-Medoids -- 12.5. Beyond (fixed) k -- 12.5.1. Multiple runs -- 12.5.2. Adaptive k-centers -- 12.6. Explicit cluster modeling -- 12.7. Conclusion -- 12.8. Further readings -- References -- 13. Hierarchical clustering -- 13.1. Introduction -- 13.1.1. Basic approaches -- 13.1.2. (Dis)similarity measures -- 13.2. Cluster hierarchies -- 13.2.1. Motivation -- 13.2.2. Model representation -- 13.3. Agglomerative clustering -- 13.3.1. Algorithm scheme -- 13.3.2. Cluster linkage -- 13.4. Divisive clustering -- 13.4.1. Algorithm scheme -- 13.4.2. Wrapping a flat clustering algorithm -- 13.4.3. Stop criteria -- 13.5. Hierarchical clustering visualization -- 13.6. Hierarchical clustering prediction -- 13.6.1. Cutting cluster hierarchies -- 13.6.2. Cluster membership assignment |
|
Formatted contents note |
13.7. Conclusion -- 13.8. Further readings -- References -- 14. Clustering model evaluation -- 14.1. Introduction -- 14.1.1. Dataset performance -- 14.1.2. Training performance -- 14.1.3. True performance -- 14.2. Per-cluster quality measures -- 14.2.1. Diameter -- 14.2.2. Separation -- 14.2.3. Isolation -- 14.2.4. Silhouette width -- 14.2.5. Davies-Bouldin Index -- 14.3. Overall quality measures -- 14.3.1. Dunn Index -- 14.3.2. Average Davies-Bouldin Index -- 14.3.3. C Index -- 14.3.4. Average silhouette width -- 14.3.5. Loglikelihood -- 14.4. External quality measures -- 14.4.1. Misclassification error -- 14.4.2. Rand Index -- 14.4.3. General relationship detection measures -- 14.5. Using quality measures -- 14.6. Conclusion -- 14.7. Further readings -- References -- pt. V Getting Better Models -- 15. Model ensembles -- 15.1. Introduction -- 15.2. Model committees -- 15.3. Base models -- 15.3.1. Different training sets -- 15.3.2. Different algorithms |
|
Formatted contents note |
15.3.3. Different parameter setups -- 15.3.4. Algorithm randomization -- 15.3.5. Base model diversity -- 15.4. Model aggregation -- 15.4.1. Voting/Averaging -- 15.4.2. Probability averaging -- 15.4.3. Weighted voting/averaging -- 15.4.4. Using as attributes -- 15.5. Specific ensemble modeling algorithms -- 15.5.1. Bagging -- 15.5.2. Stacking -- 15.5.3. Boosting -- 15.5.4. Random forest -- 15.5.5. Random Naive Bayes -- 15.6. Quality of ensemble predictions -- 15.7. Conclusion -- 15.8. Further readings -- References -- 16. Kernel methods -- 16.1. Introduction -- 16.2. Support vector machines -- 16.2.1. Classification margin -- 16.2.2. Maximum-margin hyperplane -- 16.2.3. Primal form -- 16.2.4. Dual form -- 16.2.5. Soft margin -- 16.3. Support vector regression -- 16.3.1. Regression tube -- 16.3.2. Primal form -- 16.3.3. Dual form -- 16.4. Kernel trick -- 16.5. Kernel functions -- 16.5.1. Linear kernel -- 16.5.2. Polynomial kernel -- 16.5.3. Radial kernel -- 16.5.4. Sigmoid kernel |
|
Formatted contents note |
16.6. Kernel prediction -- 16.7. Kernel-based algorithms -- 16.7.1. Kernel-based SVM -- 16.7.2. Kernel-based SVR -- 16.8. Conclusion -- 16.9. Further readings -- References -- 17. Attribute transformation -- 17.1. Introduction -- 17.2. Attribute transformation task -- 17.2.1. Target task -- 17.2.2. Target attribute -- 17.2.3. Transformed attribute -- 17.2.4. Training set -- 17.2.5. Modeling transformations -- 17.2.6. Nonmodeling transformations -- 17.3. Simple transformations -- 17.3.1. Standardization -- 17.3.2. Normalization -- 17.3.3. Aggregation -- 17.3.4. Imputation -- 17.3.5. Binary encoding -- 17.4. Multiclass encoding -- 17.4.1. Encoding and decoding functions -- 17.4.2. 1-of-k encoding -- 17.4.3. Error-correcting encoding -- 17.4.4. Effects of multiclass encoding -- 17.5. Conclusion -- 17.6. Further readings -- References -- 18. Discretization -- 18.1. Introduction -- 18.2. Discretization task -- 18.2.1. Motivation -- 18.2.2. Task definition |
|
Formatted contents note |
18.2.3. Discretization as modeling -- 18.2.4. Discretization quality -- 18.3. Unsupervised discretization -- 18.3.1. Equal-width intervals -- 18.3.2. Equal-frequency intervals -- 18.3.3. Nonmodeling discretization -- 18.4. Supervised discretization -- 18.4.1. Pure-class discretization -- 18.4.2. Bottom-up discretization -- 18.4.3. Top-down discretization -- 18.5. Effects of discretization -- 18.6. Conclusion -- 18.7. Further readings -- References -- 19. Attribute selection -- 19.1. Introduction -- 19.2. Attribute selection task -- 19.2.1. Motivation -- 19.2.2. Task definition -- 19.2.3. Algorithms -- 19.3. Attribute subset search -- 19.3.1. Search task -- 19.3.2. Initial state -- 19.3.3. Search operators -- 19.3.4. State selection -- 19.3.5. Stop criteria -- 19.4. Attribute selection filters -- 19.4.1. Simple statistical filters -- 19.4.2. Correlation-based filters -- 19.4.3. Consistency-based filters -- 19.4.4. RELIEF -- 19.4.5. Random forest -- 19.4.6. Cutoff criteria |
|
Formatted contents note |
19.4.7. Filter-driven search -- 19.5. Attribute selection wrappers -- 19.5.1. Subset evaluation -- 19.5.2. Wrapper attribute selection -- 19.6. Effects of attribute selection -- 19.7. Conclusion -- 19.8. Further readings -- References -- 20. Case studies -- 20.1. Introduction -- 20.1.1. Datasets -- 20.1.2. Packages -- 20.1.3. Auxiliary functions -- 20.2. Census income -- 20.2.1. Data loading and preprocessing -- 20.2.2. Default model -- 20.2.3. Incorporating misclassification costs -- 20.2.4. Pruning -- 20.2.5. Attribute selection -- 20.2.6. Final models -- 20.3. Communities and crime -- 20.3.1. Data loading -- 20.3.2. Data quality -- 20.3.3. Regression trees -- 20.3.4. Linear models -- 20.3.5. Attribute selection -- 20.3.6. Piecewise-linear models -- 20.4. Cover type -- 20.4.1. Data loading and preprocessing -- 20.4.2. Class imbalance -- 20.4.3. Decision trees -- 20.4.4. Class rebalancing -- 20.4.5. Multiclass encoding -- 20.4.6. Final classification models -- 20.4.7. Clustering |
|
Formatted contents note |
20.5. Conclusion -- 20.6. Further readings -- References -- Closing -- A. Notation -- A.1. Attribute values -- A.2. Data subsets -- A.3. Probabilities -- B. R packages -- B.1. CRAN packages -- B.2. DMR packages -- B.3. Installing packages -- References -- C. Datasets |
|
Formatted contents note |
3.3.2. Class distribution calculation -- 3.3.3. Class label assignment -- 3.3.4. Stop criteria -- 3.3.5. Split selection -- 3.3.6. Split application -- 3.3.7. Complete process -- 3.4. Pruning -- 3.4.1. Pruning operators -- 3.4.2. Pruning criterion -- 3.4.3. Pruning control strategy -- 3.4.4. Conversion to rule sets -- 3.5. Prediction -- 3.5.1. Class label prediction -- 3.5.2. Class probability prediction -- 3.6. Weighted instances -- 3.7. Missing value handling -- 3.7.1. Fractional instances -- 3.7.2. Surrogate splits -- 3.8. Conclusion -- 3.9. Further readings -- References -- 4. Naive Bayes classifier -- 4.1. Introduction -- 4.2. Bayes rule -- 4.3. Classification by Bayesian inference -- 4.3.1. Conditional class probability -- 4.3.2. Prior class probability -- 4.3.3. Independence assumption -- 4.3.4. Conditional attribute value probabilities -- 4.3.5. Model construction -- 4.3.6. Prediction -- 4.4. Practical issues -- 4.4.1. Zero and small probabilities |
|
Formatted contents note |
4.4.2. Linear classification -- 4.4.3. Continuous attributes -- 4.4.4. Missing attribute values -- 4.4.5. Reducing naivety -- 4.5. Conclusion -- 4.6. Further readings -- References -- 5. Linear classification -- 5.1. Introduction -- 5.2. Linear representation -- 5.2.1. Inner representation function -- 5.2.2. Outer representation function -- 5.2.3. Threshold representation -- 5.2.4. Logit representation -- 5.3. Parameter estimation -- 5.3.1. Delta rule -- 5.3.2. Gradient descent -- 5.3.3. Distance to decision boundary -- 5.3.4. Least squares -- 5.4. Discrete attributes -- 5.5. Conclusion -- 5.6. Further readings -- References -- 6. Misclassification costs -- 6.1. Introduction -- 6.2. Cost representation -- 6.2.1. Cost matrix -- 6.2.2. Per-class cost vector -- 6.2.3. Instance-specific costs -- 6.3. Incorporating misclassification costs -- 6.3.1. Instance weighting -- 6.3.2. Instance resampling -- 6.3.3. Minimum-cost rule -- 6.3.4. Instance relabeling |
|
Formatted contents note |
6.4. Effects of cost incorporation -- 6.5. Experimental procedure -- 6.6. Conclusion -- 6.7. Further readings -- References -- 7. Classification model evaluation -- 7.1. Introduction -- 7.1.1. Dataset performance -- 7.1.2. Training performance -- 7.1.3. True performance -- 7.2. Performance measures -- 7.2.1. Misclassification error -- 7.2.2. Weighted misclassification error -- 7.2.3. Mean misclassification cost -- 7.2.4. Confusion matrix -- 7.2.5. ROC analysis -- 7.2.6. Probabilistic performance measures -- 7.3. Evaluation procedures -- 7.3.1. Model evaluation vs. modeling procedure evaluation -- 7.3.2. Evaluation caveats -- 7.3.3. Hold-out -- 7.3.4. Cross-validation -- 7.3.5. Leave-one-out -- 7.3.6. Bootstrapping -- 7.3.7. Choosing the right procedure -- 7.3.8. Evaluation procedures for temporal data -- 7.4. Conclusion -- 7.5. Further readings -- References -- pt. III Regression -- 8. Linear regression -- 8.1. Introduction -- 8.2. Linear representation |
|
Formatted contents note |
8.2.1. Parametric representation -- 8.2.2. Linear representation function -- 8.2.3. Nonlinear representation functions -- 8.3. Parameter estimation -- 8.3.1. Mean square error minimization -- 8.3.2. Delta rule -- 8.3.3. Gradient descent -- 8.3.4. Least squares -- 8.4. Discrete attributes -- 8.5. Advantages of linear models -- 8.6. Beyond linearity -- 8.6.1. Generalized linear representation -- 8.6.2. Enhanced representation -- 8.6.3. Polynomial regression -- 8.6.4. Piecewise-linear regression -- 8.7. Conclusion -- 8.8. Further readings -- References -- 9. Regression trees -- 9.1. Introduction -- 9.2. Regression tree model -- 9.2.1. Nodes and branches -- 9.2.2. Leaves -- 9.2.3. Split types -- 9.2.4. Piecewise-constant regression -- 9.3. Growing -- 9.3.1. Algorithm outline -- 9.3.2. Target function summary statistics -- 9.3.3. Target value assignment -- 9.3.4. Stop criteria -- 9.3.5. Split selection -- 9.3.6. Split application -- 9.3.7. Complete process -- 9.4. Pruning |
|
Formatted contents note |
9.4.1. Pruning operators -- 9.4.2. Pruning criterion -- 9.4.3. Pruning control strategy -- 9.5. Prediction -- 9.6. Weighted instances -- 9.7. Missing value handling -- 9.7.1. Fractional instances -- 9.7.2. Surrogate splits -- 9.8. Piecewise linear regression -- 9.8.1. Growing -- 9.8.2. Pruning -- 9.8.3. Prediction -- 9.9. Conclusion -- 9.10. Further readings -- References -- 10. Regression model evaluation -- 10.1. Introduction -- 10.1.1. Dataset performance -- 10.1.2. Training performance -- 10.1.3. True performance -- 10.2. Performance measures -- 10.2.1. Residuals -- 10.2.2. Mean absolute error -- 10.2.3. Mean square error -- 10.2.4. Root mean square error -- 10.2.5. Relative absolute error -- 10.2.6. Coefficient of determination -- 10.2.7. Correlation -- 10.2.8. Weighted performance measures -- 10.2.9. Loss functions -- 10.3. Evaluation procedures -- 10.3.1. Hold-out -- 10.3.2. Cross-validation -- 10.3.3. Leave-one-out -- 10.3.4. Bootstrapping -- 10.3.5. Choosing the right procedure |
506 ## - RESTRICTIONS ON ACCESS NOTE |
Terms governing access |
Available to OhioLINK libraries |
520 ## - SUMMARY, ETC. |
Summary, etc. |
"This book narrows down the scope of data mining by adopting a heavily modeling-oriented perspective"-- |
Assigning source |
Provided by publisher |
650 #0 - SUBJECT ADDED ENTRY--TOPICAL TERM |
Topical term or geographic name entry element |
Computer algorithms. |
Authority record control number or standard number |
http://id.loc.gov/authorities/subjects/sh91000149 |
9 (RLIN) |
534 |
|
Topical term or geographic name entry element |
Data mining. |
Authority record control number or standard number |
http://id.loc.gov/authorities/subjects/sh97002073 |
9 (RLIN) |
6146 |
|
Topical term or geographic name entry element |
R (Computer program language) |
Authority record control number or standard number |
http://id.loc.gov/authorities/subjects/sh2002004407 |
9 (RLIN) |
25812 |
655 #4 - INDEX TERM--GENRE/FORM |
Genre/form data or focus term |
Electronic books |
9 (RLIN) |
2032 |
710 2# - ADDED ENTRY--CORPORATE NAME |
Corporate name or jurisdiction name as entry element |
Ohio Library and Information Network. |
Authority record control number or standard number |
http://id.loc.gov/authorities/names/no95058981 |
776 08 - ADDITIONAL PHYSICAL FORM ENTRY |
Relationship information |
Print version: |
Main entry heading |
Cichosz, Pawel. |
Title |
Data mining algorithms. |
Place, publisher, and date of publication |
Chichester, West Sussex, United Kingdom : Wiley, 2015 |
International Standard Book Number |
9781118332580 |
Record control number |
(DLC) 2014036992 |
-- |
(OCoLC)890971737 |
856 40 - ELECTRONIC LOCATION AND ACCESS |
Materials specified |
OhioLINK |
Public note |
Connect to resource |
Uniform Resource Identifier |
https://rave.ohiolink.edu/ebooks/ebc2/9781118950951 |
|
Materials specified |
Wiley Online Library |
Public note |
Connect to resource (off-campus) |
Uniform Resource Identifier |
https://go.ohiolink.edu/goto?url=https://onlinelibrary.wiley.com/doi/book/10.1002/9781118950951 |
|
Materials specified |
Wiley Online Library |
Public note |
Connect to resource |
Uniform Resource Identifier |
https://onlinelibrary.wiley.com/doi/book/10.1002/9781118950951 |
|
Materials specified |
O'Reilly |
Public note |
Connect to resource |
Uniform Resource Identifier |
https://learning.oreilly.com/library/view/~/9781118950807/?ar |