Data mining algorithms : (Record no. 200463775)

MARC details
000 -LEADER
fixed length control field 16938cam a2200709 i 4500
001 - CONTROL NUMBER
control field 891186025
003 - CONTROL NUMBER IDENTIFIER
control field OCoLC
005 - DATE AND TIME OF LATEST TRANSACTION
control field 20250129142507.0
006 - FIXED-LENGTH DATA ELEMENTS--ADDITIONAL MATERIAL CHARACTERISTICS
fixed length control field m o d
007 - PHYSICAL DESCRIPTION FIXED FIELD--GENERAL INFORMATION
fixed length control field cr |||||||||||
008 - FIXED-LENGTH DATA ELEMENTS--GENERAL INFORMATION
fixed length control field 140922s2015 enk ob 001 0 eng
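The 000 and 008 values above are fixed-position fields. As a minimal illustrative sketch (not part of this record, and not a full MARC parser), the 24-character leader can be decoded by string slicing according to the MARC 21 layout:

```python
# Illustrative sketch only: decoding the MARC 21 leader shown in field 000
# by its fixed character positions.

LEADER = "16938cam a2200709 i 4500"  # copied verbatim from field 000 above

RECORD_STATUS = {"c": "corrected or revised", "n": "new"}
RECORD_TYPE = {"a": "language material"}
BIB_LEVEL = {"m": "monograph/item", "s": "serial"}

def decode_leader(leader: str) -> dict:
    """Slice the MARC leader into its positionally defined elements."""
    assert len(leader) == 24, "a MARC 21 leader is always 24 characters"
    return {
        "record_length": int(leader[0:5]),           # 16938 octets
        "record_status": RECORD_STATUS.get(leader[5], leader[5]),
        "type_of_record": RECORD_TYPE.get(leader[6], leader[6]),
        "bibliographic_level": BIB_LEVEL.get(leader[7], leader[7]),
        "character_coding": "UCS/Unicode" if leader[9] == "a" else "MARC-8",
        "base_address_of_data": int(leader[12:17]),  # where the data fields begin
    }

for name, value in decode_leader(LEADER).items():
    print(f"{name}: {value}")
```

The 008 field reads the same way on its raw 40-character value (the display above collapses its internal blanks, so positional slicing would need the raw field): date entered 140922, a single ("s") publication date of 2015, place of publication "enk" (England), form of item "o" (online), contents "b" (bibliographies), index present ("1"), and language "eng".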
020 ## - INTERNATIONAL STANDARD BOOK NUMBER
International Standard Book Number 1118950801
Qualifying information (electronic bk.)
International Standard Book Number 1118950844
Qualifying information (electronic bk.)
International Standard Book Number 9781118950807
Qualifying information (electronic bk.)
International Standard Book Number 9781118950845
Qualifying information (electronic bk.)
Canceled/invalid ISBN 111833258X
Qualifying information (hardback)
Canceled/invalid ISBN 111895095X
Canceled/invalid ISBN 9781118332580
Qualifying information (hardback)
Canceled/invalid ISBN 9781118950951
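The 020 fields above pair each electronic ISBN-10 with its ISBN-13 form, alongside the canceled print and e-book numbers. A small illustrative sketch (not part of the record), using the ISBNs copied from these fields, confirms the check digits and the 10-to-13 pairing:

```python
# Verify the check digits of the ISBNs recorded in the 020 fields above,
# and show that each ISBN-10 converts to its listed ISBN-13 form.

def isbn10_check(isbn: str) -> bool:
    """Validate an ISBN-10: weighted sum (weights 10..1) mod 11 is 0; 'X' = 10."""
    digits = [10 if c == "X" else int(c) for c in isbn]
    return sum(w * d for w, d in zip(range(10, 0, -1), digits)) % 11 == 0

def isbn13_check(isbn: str) -> bool:
    """Validate an ISBN-13: alternating 1/3 weights, total mod 10 is 0."""
    return sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(isbn)) % 10 == 0

def isbn10_to_13(isbn10: str) -> str:
    """Prefix with 978, drop the old check digit, recompute the new one."""
    body = "978" + isbn10[:9]
    total = sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(body))
    return body + str((10 - total % 10) % 10)

# ISBN pairs copied from this record (electronic and canceled variants)
pairs = [("1118950801", "9781118950807"),
         ("1118950844", "9781118950845"),
         ("111833258X", "9781118332580"),
         ("111895095X", "9781118950951")]

for ten, thirteen in pairs:
    assert isbn10_check(ten) and isbn13_check(thirteen)
    assert isbn10_to_13(ten) == thirteen
print("all 020 ISBN pairs are internally consistent")
```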
035 ## - SYSTEM CONTROL NUMBER
System control number (OCoLC)891186025
037 ## - SOURCE OF ACQUISITION
Stock number 0C3F661A-3397-4AD2-8301-24CBBD5AAE9F
Source of stock number/acquisition OverDrive, Inc.
Note http://www.overdrive.com
040 ## - CATALOGING SOURCE
Original cataloging agency DLC
Language of cataloging eng
Description conventions rda
-- pn
Transcribing agency DLC
Modifying agency N$T
-- YDXCP
-- E7B
-- OSU
-- DG1
-- OCLCF
-- COO
-- OCLCQ
-- RRP
-- TEFOD
-- OCLCQ
042 ## - AUTHENTICATION CODE
Authentication code pcc
050 00 - LIBRARY OF CONGRESS CALL NUMBER
Classification number QA76.9.D343
072 #7 - SUBJECT CATEGORY CODE
Subject category code COM
Subject category code subdivision 000000
Source bisacsh
100 1# - MAIN ENTRY--PERSONAL NAME
Personal name Cichosz, Paweł,
Authority record control number or standard number http://id.loc.gov/authorities/names/n2014057642
Relator term author
245 10 - TITLE STATEMENT
Title Data mining algorithms :
Remainder of title explained using R /
Statement of responsibility, etc. Pawel Cichosz
264 #1 - PRODUCTION, PUBLICATION, DISTRIBUTION, MANUFACTURE, AND COPYRIGHT NOTICE
Place of production, publication, distribution, manufacture Chichester, West Sussex ;
-- Malden, MA :
Name of producer, publisher, distributor, manufacturer John Wiley & Sons Inc.,
Date of production, publication, distribution, manufacture, or copyright notice 2015
300 ## - PHYSICAL DESCRIPTION
Extent 1 online resource (xxxi, 683 pages)
336 ## - CONTENT TYPE
Content type term text
Content type code txt
Source rdacontent
337 ## - MEDIA TYPE
Media type term computer
Media type code c
Source rdamedia
338 ## - CARRIER TYPE
Carrier type term online resource
Carrier type code cr
Source rdacarrier
504 ## - BIBLIOGRAPHY, ETC. NOTE
Bibliography, etc. note Includes bibliographical references and index
505 0# - FORMATTED CONTENTS NOTE
Formatted contents note Machine generated contents note: pt. I Preliminaries -- 1. Tasks -- 1.1. Introduction -- 1.1.1. Knowledge -- 1.1.2. Inference -- 1.2. Inductive learning tasks -- 1.2.1. Domain -- 1.2.2. Instances -- 1.2.3. Attributes -- 1.2.4. Target attribute -- 1.2.5. Input attributes -- 1.2.6. Training set -- 1.2.7. Model -- 1.2.8. Performance -- 1.2.9. Generalization -- 1.2.10. Overfitting -- 1.2.11. Algorithms -- 1.2.12. Inductive learning as search -- 1.3. Classification -- 1.3.1. Concept -- 1.3.2. Training set -- 1.3.3. Model -- 1.3.4. Performance -- 1.3.5. Generalization -- 1.3.6. Overfitting -- 1.3.7. Algorithms -- 1.4. Regression -- 1.4.1. Target function -- 1.4.2. Training set -- 1.4.3. Model -- 1.4.4. Performance -- 1.4.5. Generalization -- 1.4.6. Overfitting -- 1.4.7. Algorithms -- 1.5. Clustering -- 1.5.1. Motivation -- 1.5.2. Training set -- 1.5.3. Model -- 1.5.4. Crisp vs. soft clustering -- 1.5.5. Hierarchical clustering -- 1.5.6. Performance -- 1.5.7. Generalization -- 1.5.8. Algorithms
Formatted contents note 1.5.9. Descriptive vs. predictive clustering -- 1.6. Practical issues -- 1.6.1. Incomplete data -- 1.6.2. Noisy data -- 1.7. Conclusion -- 1.8. Further readings -- References -- 2. Basic statistics -- 2.1. Introduction -- 2.2. Notational conventions -- 2.3. Basic statistics as modeling -- 2.4. Distribution description -- 2.4.1. Continuous attributes -- 2.4.2. Discrete attributes -- 2.4.3. Confidence intervals -- 2.4.4. m-Estimation -- 2.5. Relationship detection -- 2.5.1. Significance tests -- 2.5.2. Continuous attributes -- 2.5.3. Discrete attributes -- 2.5.4. Mixed attributes -- 2.5.5. Relationship detection caveats -- 2.6. Visualization -- 2.6.1. Boxplot -- 2.6.2. Histogram -- 2.6.3. Barplot -- 2.7. Conclusion -- 2.8. Further readings -- References -- pt. II Classification -- 3. Decision trees -- 3.1. Introduction -- 3.2. Decision tree model -- 3.2.1. Nodes and branches -- 3.2.2. Leaves -- 3.2.3. Split types -- 3.3. Growing -- 3.3.1. Algorithm outline
Formatted contents note 3.3.2. Class distribution calculation -- 3.3.3. Class label assignment -- 3.3.4. Stop criteria -- 3.3.5. Split selection -- 3.3.6. Split application -- 3.3.7. Complete process -- 3.4. Pruning -- 3.4.1. Pruning operators -- 3.4.2. Pruning criterion -- 3.4.3. Pruning control strategy -- 3.4.4. Conversion to rule sets -- 3.5. Prediction -- 3.5.1. Class label prediction -- 3.5.2. Class probability prediction -- 3.6. Weighted instances -- 3.7. Missing value handling -- 3.7.1. Fractional instances -- 3.7.2. Surrogate splits -- 3.8. Conclusion -- 3.9. Further readings -- References -- 4. Naive Bayes classifier -- 4.1. Introduction -- 4.2. Bayes rule -- 4.3. Classification by Bayesian inference -- 4.3.1. Conditional class probability -- 4.3.2. Prior class probability -- 4.3.3. Independence assumption -- 4.3.4. Conditional attribute value probabilities -- 4.3.5. Model construction -- 4.3.6. Prediction -- 4.4. Practical issues -- 4.4.1. Zero and small probabilities
Formatted contents note 4.4.2. Linear classification -- 4.4.3. Continuous attributes -- 4.4.4. Missing attribute values -- 4.4.5. Reducing naivety -- 4.5. Conclusion -- 4.6. Further readings -- References -- 5. Linear classification -- 5.1. Introduction -- 5.2. Linear representation -- 5.2.1. Inner representation function -- 5.2.2. Outer representation function -- 5.2.3. Threshold representation -- 5.2.4. Logit representation -- 5.3. Parameter estimation -- 5.3.1. Delta rule -- 5.3.2. Gradient descent -- 5.3.3. Distance to decision boundary -- 5.3.4. Least squares -- 5.4. Discrete attributes -- 5.5. Conclusion -- 5.6. Further readings -- References -- 6. Misclassification costs -- 6.1. Introduction -- 6.2. Cost representation -- 6.2.1. Cost matrix -- 6.2.2. Per-class cost vector -- 6.2.3. Instance-specific costs -- 6.3. Incorporating misclassification costs -- 6.3.1. Instance weighting -- 6.3.2. Instance resampling -- 6.3.3. Minimum-cost rule -- 6.3.4. Instance relabeling
Formatted contents note 6.4. Effects of cost incorporation -- 6.5. Experimental procedure -- 6.6. Conclusion -- 6.7. Further readings -- References -- 7. Classification model evaluation -- 7.1. Introduction -- 7.1.1. Dataset performance -- 7.1.2. Training performance -- 7.1.3. True performance -- 7.2. Performance measures -- 7.2.1. Misclassification error -- 7.2.2. Weighted misclassification error -- 7.2.3. Mean misclassification cost -- 7.2.4. Confusion matrix -- 7.2.5. ROC analysis -- 7.2.6. Probabilistic performance measures -- 7.3. Evaluation procedures -- 7.3.1. Model evaluation vs. modeling procedure evaluation -- 7.3.2. Evaluation caveats -- 7.3.3. Hold-out -- 7.3.4. Cross-validation -- 7.3.5. Leave-one-out -- 7.3.6. Bootstrapping -- 7.3.7. Choosing the right procedure -- 7.3.8. Evaluation procedures for temporal data -- 7.4. Conclusion -- 7.5. Further readings -- References -- pt. III Regression -- 8. Linear regression -- 8.1. Introduction -- 8.2. Linear representation
Formatted contents note 8.2.1. Parametric representation -- 8.2.2. Linear representation function -- 8.2.3. Nonlinear representation functions -- 8.3. Parameter estimation -- 8.3.1. Mean square error minimization -- 8.3.2. Delta rule -- 8.3.3. Gradient descent -- 8.3.4. Least squares -- 8.4. Discrete attributes -- 8.5. Advantages of linear models -- 8.6. Beyond linearity -- 8.6.1. Generalized linear representation -- 8.6.2. Enhanced representation -- 8.6.3. Polynomial regression -- 8.6.4. Piecewise-linear regression -- 8.7. Conclusion -- 8.8. Further readings -- References -- 9. Regression trees -- 9.1. Introduction -- 9.2. Regression tree model -- 9.2.1. Nodes and branches -- 9.2.2. Leaves -- 9.2.3. Split types -- 9.2.4. Piecewise-constant regression -- 9.3. Growing -- 9.3.1. Algorithm outline -- 9.3.2. Target function summary statistics -- 9.3.3. Target value assignment -- 9.3.4. Stop criteria -- 9.3.5. Split selection -- 9.3.6. Split application -- 9.3.7. Complete process -- 9.4. Pruning
Formatted contents note 9.4.1. Pruning operators -- 9.4.2. Pruning criterion -- 9.4.3. Pruning control strategy -- 9.5. Prediction -- 9.6. Weighted instances -- 9.7. Missing value handling -- 9.7.1. Fractional instances -- 9.7.2. Surrogate splits -- 9.8. Piecewise linear regression -- 9.8.1. Growing -- 9.8.2. Pruning -- 9.8.3. Prediction -- 9.9. Conclusion -- 9.10. Further readings -- References -- 10. Regression model evaluation -- 10.1. Introduction -- 10.1.1. Dataset performance -- 10.1.2. Training performance -- 10.1.3. True performance -- 10.2. Performance measures -- 10.2.1. Residuals -- 10.2.2. Mean absolute error -- 10.2.3. Mean square error -- 10.2.4. Root mean square error -- 10.2.5. Relative absolute error -- 10.2.6. Coefficient of determination -- 10.2.7. Correlation -- 10.2.8. Weighted performance measures -- 10.2.9. Loss functions -- 10.3. Evaluation procedures -- 10.3.1. Hold-out -- 10.3.2. Cross-validation -- 10.3.3. Leave-one-out -- 10.3.4. Bootstrapping -- 10.3.5. Choosing the right procedure
Formatted contents note 10.4. Conclusion -- 10.5. Further readings -- References -- pt. IV Clustering -- 11. (Dis)similarity measures -- 11.1. Introduction -- 11.2. Measuring dissimilarity and similarity -- 11.3. Difference-based dissimilarity -- 11.3.1. Euclidean distance -- 11.3.2. Minkowski distance -- 11.3.3. Manhattan distance -- 11.3.4. Canberra distance -- 11.3.5. Chebyshev distance -- 11.3.6. Hamming distance -- 11.3.7. Gower's coefficient -- 11.3.8. Attribute weighting -- 11.3.9. Attribute transformation -- 11.4. Correlation-based similarity -- 11.4.1. Discrete attributes -- 11.4.2. Pearson's correlation similarity -- 11.4.3. Spearman's correlation similarity -- 11.4.4. Cosine similarity -- 11.5. Missing attribute values -- 11.6. Conclusion -- 11.7. Further readings -- References -- 12. k-Centers clustering -- 12.1. Introduction -- 12.1.1. Basic principle -- 12.1.2. (Dis)similarity measures -- 12.2. Algorithm scheme -- 12.2.1. Initialization -- 12.2.2. Stop criteria -- 12.2.3. Cluster formation
Formatted contents note 12.2.4. Implicit cluster modeling -- 12.2.5. Instantiations -- 12.3. k-Means -- 12.3.1. Center adjustment -- 12.3.2. Minimizing dissimilarity to centers -- 12.4. Beyond means -- 12.4.1. k-Medians -- 12.4.2. k-Medoids -- 12.5. Beyond (fixed) k -- 12.5.1. Multiple runs -- 12.5.2. Adaptive k-centers -- 12.6. Explicit cluster modeling -- 12.7. Conclusion -- 12.8. Further readings -- References -- 13. Hierarchical clustering -- 13.1. Introduction -- 13.1.1. Basic approaches -- 13.1.2. (Dis)similarity measures -- 13.2. Cluster hierarchies -- 13.2.1. Motivation -- 13.2.2. Model representation -- 13.3. Agglomerative clustering -- 13.3.1. Algorithm scheme -- 13.3.2. Cluster linkage -- 13.4. Divisive clustering -- 13.4.1. Algorithm scheme -- 13.4.2. Wrapping a flat clustering algorithm -- 13.4.3. Stop criteria -- 13.5. Hierarchical clustering visualization -- 13.6. Hierarchical clustering prediction -- 13.6.1. Cutting cluster hierarchies -- 13.6.2. Cluster membership assignment
Formatted contents note 13.7. Conclusion -- 13.8. Further readings -- References -- 14. Clustering model evaluation -- 14.1. Introduction -- 14.1.1. Dataset performance -- 14.1.2. Training performance -- 14.1.3. True performance -- 14.2. Per-cluster quality measures -- 14.2.1. Diameter -- 14.2.2. Separation -- 14.2.3. Isolation -- 14.2.4. Silhouette width -- 14.2.5. Davies-Bouldin Index -- 14.3. Overall quality measures -- 14.3.1. Dunn Index -- 14.3.2. Average Davies-Bouldin Index -- 14.3.3. C Index -- 14.3.4. Average silhouette width -- 14.3.5. Loglikelihood -- 14.4. External quality measures -- 14.4.1. Misclassification error -- 14.4.2. Rand Index -- 14.4.3. General relationship detection measures -- 14.5. Using quality measures -- 14.6. Conclusion -- 14.7. Further readings -- References -- pt. V Getting Better Models -- 15. Model ensembles -- 15.1. Introduction -- 15.2. Model committees -- 15.3. Base models -- 15.3.1. Different training sets -- 15.3.2. Different algorithms
Formatted contents note 15.3.3. Different parameter setups -- 15.3.4. Algorithm randomization -- 15.3.5. Base model diversity -- 15.4. Model aggregation -- 15.4.1. Voting/Averaging -- 15.4.2. Probability averaging -- 15.4.3. Weighted voting/averaging -- 15.4.4. Using as attributes -- 15.5. Specific ensemble modeling algorithms -- 15.5.1. Bagging -- 15.5.2. Stacking -- 15.5.3. Boosting -- 15.5.4. Random forest -- 15.5.5. Random Naive Bayes -- 15.6. Quality of ensemble predictions -- 15.7. Conclusion -- 15.8. Further readings -- References -- 16. Kernel methods -- 16.1. Introduction -- 16.2. Support vector machines -- 16.2.1. Classification margin -- 16.2.2. Maximum-margin hyperplane -- 16.2.3. Primal form -- 16.2.4. Dual form -- 16.2.5. Soft margin -- 16.3. Support vector regression -- 16.3.1. Regression tube -- 16.3.2. Primal form -- 16.3.3. Dual form -- 16.4. Kernel trick -- 16.5. Kernel functions -- 16.5.1. Linear kernel -- 16.5.2. Polynomial kernel -- 16.5.3. Radial kernel -- 16.5.4. Sigmoid kernel
Formatted contents note 16.6. Kernel prediction -- 16.7. Kernel-based algorithms -- 16.7.1. Kernel-based SVM -- 16.7.2. Kernel-based SVR -- 16.8. Conclusion -- 16.9. Further readings -- References -- 17. Attribute transformation -- 17.1. Introduction -- 17.2. Attribute transformation task -- 17.2.1. Target task -- 17.2.2. Target attribute -- 17.2.3. Transformed attribute -- 17.2.4. Training set -- 17.2.5. Modeling transformations -- 17.2.6. Nonmodeling transformations -- 17.3. Simple transformations -- 17.3.1. Standardization -- 17.3.2. Normalization -- 17.3.3. Aggregation -- 17.3.4. Imputation -- 17.3.5. Binary encoding -- 17.4. Multiclass encoding -- 17.4.1. Encoding and decoding functions -- 17.4.2. 1-of-k encoding -- 17.4.3. Error-correcting encoding -- 17.4.4. Effects of multiclass encoding -- 17.5. Conclusion -- 17.6. Further readings -- References -- 18. Discretization -- 18.1. Introduction -- 18.2. Discretization task -- 18.2.1. Motivation -- 18.2.2. Task definition
Formatted contents note 18.2.3. Discretization as modeling -- 18.2.4. Discretization quality -- 18.3. Unsupervised discretization -- 18.3.1. Equal-width intervals -- 18.3.2. Equal-frequency intervals -- 18.3.3. Nonmodeling discretization -- 18.4. Supervised discretization -- 18.4.1. Pure-class discretization -- 18.4.2. Bottom-up discretization -- 18.4.3. Top-down discretization -- 18.5. Effects of discretization -- 18.6. Conclusion -- 18.7. Further readings -- References -- 19. Attribute selection -- 19.1. Introduction -- 19.2. Attribute selection task -- 19.2.1. Motivation -- 19.2.2. Task definition -- 19.2.3. Algorithms -- 19.3. Attribute subset search -- 19.3.1. Search task -- 19.3.2. Initial state -- 19.3.3. Search operators -- 19.3.4. State selection -- 19.3.5. Stop criteria -- 19.4. Attribute selection filters -- 19.4.1. Simple statistical filters -- 19.4.2. Correlation-based filters -- 19.4.3. Consistency-based filters -- 19.4.4. RELIEF -- 19.4.5. Random forest -- 19.4.6. Cutoff criteria
Formatted contents note 19.4.7. Filter-driven search -- 19.5. Attribute selection wrappers -- 19.5.1. Subset evaluation -- 19.5.2. Wrapper attribute selection -- 19.6. Effects of attribute selection -- 19.7. Conclusion -- 19.8. Further readings -- References -- 20. Case studies -- 20.1. Introduction -- 20.1.1. Datasets -- 20.1.2. Packages -- 20.1.3. Auxiliary functions -- 20.2. Census income -- 20.2.1. Data loading and preprocessing -- 20.2.2. Default model -- 20.2.3. Incorporating misclassification costs -- 20.2.4. Pruning -- 20.2.5. Attribute selection -- 20.2.6. Final models -- 20.3. Communities and crime -- 20.3.1. Data loading -- 20.3.2. Data quality -- 20.3.3. Regression trees -- 20.3.4. Linear models -- 20.3.5. Attribute selection -- 20.3.6. Piecewise-linear models -- 20.4. Cover type -- 20.4.1. Data loading and preprocessing -- 20.4.2. Class imbalance -- 20.4.3. Decision trees -- 20.4.4. Class rebalancing -- 20.4.5. Multiclass encoding -- 20.4.6. Final classification models -- 20.4.7. Clustering
Formatted contents note 20.5. Conclusion -- 20.6. Further readings -- References -- Closing -- A. Notation -- A.1. Attribute values -- A.2. Data subsets -- A.3. Probabilities -- B. R packages -- B.1. CRAN packages -- B.2. DMR packages -- B.3. Installing packages -- References -- C. Datasets
506 ## - RESTRICTIONS ON ACCESS NOTE
Terms governing access Available to OhioLINK libraries
520 ## - SUMMARY, ETC.
Summary, etc. "This book narrows down the scope of data mining by adopting a heavily modeling-oriented perspective"--
Assigning source Provided by publisher
650 #0 - SUBJECT ADDED ENTRY--TOPICAL TERM
Topical term or geographic name entry element Computer algorithms.
Authority record control number or standard number http://id.loc.gov/authorities/subjects/sh91000149
9 (RLIN) 534
Topical term or geographic name entry element Data mining.
Authority record control number or standard number http://id.loc.gov/authorities/subjects/sh97002073
9 (RLIN) 6146
Topical term or geographic name entry element R (Computer program language)
Authority record control number or standard number http://id.loc.gov/authorities/subjects/sh2002004407
9 (RLIN) 25812
655 #4 - INDEX TERM--GENRE/FORM
Genre/form data or focus term Electronic books
9 (RLIN) 2032
710 2# - ADDED ENTRY--CORPORATE NAME
Corporate name or jurisdiction name as entry element Ohio Library and Information Network.
Authority record control number or standard number http://id.loc.gov/authorities/names/no95058981
776 08 - ADDITIONAL PHYSICAL FORM ENTRY
Relationship information Print version:
Main entry heading Cichosz, Pawel.
Title Data mining algorithms.
Place, publisher, and date of publication Chichester, West Sussex, United Kingdom : Wiley, 2015
International Standard Book Number 9781118332580
Record control number (DLC) 2014036992
-- (OCoLC)890971737
856 40 - ELECTRONIC LOCATION AND ACCESS
Materials specified OhioLINK
Public note Connect to resource
Uniform Resource Identifier <a href="https://rave.ohiolink.edu/ebooks/ebc2/9781118950951">https://rave.ohiolink.edu/ebooks/ebc2/9781118950951</a>
Materials specified Wiley Online Library
Public note Connect to resource (off-campus)
Uniform Resource Identifier https://go.ohiolink.edu/goto?url=https://onlinelibrary.wiley.com/doi/book/10.1002/9781118950951
Materials specified Wiley Online Library
Public note Connect to resource
Uniform Resource Identifier https://onlinelibrary.wiley.com/doi/book/10.1002/9781118950951
Materials specified O'Reilly
Public note Connect to resource
Uniform Resource Identifier https://learning.oreilly.com/library/view/~/9781118950807/?ar
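As the 856 fields above show, the Wiley links embed the book's DOI, 10.1002/9781118950951, whose suffix is the canceled e-ISBN-13 from the 020 field, and the off-campus OhioLINK link simply wraps the same URL in a goto?url= redirect. A minimal sketch of that pattern (illustrative only, not a documented API):

```python
# Illustrative sketch only: the URL pattern visible in the 856 fields above.
# The DOI suffix reuses the canceled e-ISBN-13; the off-campus link wraps
# the Wiley URL unencoded, exactly as it appears in the record.

E_ISBN13 = "9781118950951"  # canceled ISBN from field 020

wiley_url = f"https://onlinelibrary.wiley.com/doi/book/10.1002/{E_ISBN13}"
offcampus_url = "https://go.ohiolink.edu/goto?url=" + wiley_url

print(wiley_url)      # matches the direct Wiley 856 link
print(offcampus_url)  # matches the off-campus 856 link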
Holdings
Source of classification or shelving scheme Library of Congress Classification
Not for loan Geçerli değil-e-Kitap / Not applicable-e-Book
Collection code E-Kitap Koleksiyonu (E-Book Collection)
Home library Merkez Kütüphane (Central Library)
Current library Merkez Kütüphane (Central Library)
Date acquired 30/12/2024
Source of acquisition Satın Alma / Purchase
Cost, normal purchase price 0.00
Inventory number GİT
Barcode EBK03662
Date last seen 30/12/2024
Cost, replacement price 0.00
Date shelved 30/12/2024
Koha item type E-Book