
Büyük veri ve akan verinin mahremiyet korumalı anonimleştirilmesi / (Record no. 200438569)

MARC details
000 -LEADER
fixed length control field	07332nam a2200421 i 4500
003 - CONTROL NUMBER IDENTIFIER
control field	TR-AnTOB
005 - DATE AND TIME OF LATEST TRANSACTION
control field	20230908000948.0
007 - PHYSICAL DESCRIPTION FIXED FIELD--GENERAL INFORMATION
fixed length control field	ta
008 - FIXED-LENGTH DATA ELEMENTS--GENERAL INFORMATION
fixed length control field	171111s2018 xxu e mmmm 00\| 0 eng d
035 ## - SYSTEM CONTROL NUMBER
System control number	(TR-AnTOB)200438569
040 ## - CATALOGING SOURCE
Original cataloging agency	TR-AnTOB
Language of cataloging	eng
Description conventions	rda
Transcribing agency	TR-AnTOB
041 0# - LANGUAGE CODE
Language code of text/sound track or separate title	Turkish
099 ## - LOCAL FREE-TEXT CALL NUMBER (OCLC)
Classification number	TEZ TOBB FBE BİL Ph.D’20 SOP
100 1# - MAIN ENTRY--PERSONAL NAME
Personal name	Sopaoğlu, Uğur
Relator term	author
9 (RLIN)	128311
245 10 - TITLE STATEMENT
Title	Büyük veri ve akan verinin mahremiyet korumalı anonimleştirilmesi /
Statement of responsibility, etc.	Uğur Sopaoğlu ; thesis advisor Osman Abul.
246 11 - VARYING FORM OF TITLE
Title proper/short title	Privacy preserving anonymization of big data and data streams
264 #1 - PRODUCTION, PUBLICATION, DISTRIBUTION, MANUFACTURE, AND COPYRIGHT NOTICE
Place of production, publication, distribution, manufacture	Ankara :
Name of producer, publisher, distributor, manufacturer	TOBB ETÜ Fen Bilimleri Enstitüsü,
Date of production, publication, distribution, manufacture, or copyright notice	2020.
300 ## - PHYSICAL DESCRIPTION
Extent	xv, 106 pages :
Other physical details	illustrations ;
Dimensions	29 cm
336 ## - CONTENT TYPE
Source	rdacontent
Content type code	txt
Content type term	text
337 ## - MEDIA TYPE
Source	rdamedia
Media type code	n
Media type term	unmediated
338 ## - CARRIER TYPE
Source	rdacarrier
Carrier type code	nc
Carrier type term	volume
502 ## - DISSERTATION NOTE
Dissertation note	Thesis (Doctoral Thesis)--TOBB ETÜ Fen Bilimleri Enstitüsü, April 2020
520 ## - SUMMARY, ETC.
Summary, etc.	Geleneksel veri anonimleştirme yöntemleri yalnız statik veri kümeleri için geliştirilmiş olup ölçeklenebilirlik hep ikinci planda kalmıştır. Büyük veri ve akan veri ihtiyaçlarının son yıllarda çeşitlenerek artması ile ölçeklenebilirlik ve verinin dinamikliği unsurları öne çıkmaya başlamıştır. Literatürde büyük veri ve akan veri mahremiyetinin sağlanmasına yönelik bu doğrultuda çalışmalar önerilmiş olsa da problemin çeşitli unsurları nedeniyle daha etkin ve daha kapsamlı veri anonimleştirme yöntemlerine ihtiyaç duyulmaktadır. Bu tez kapsamında büyük veri ve akan veri mahremiyetinin sağlanması için daha etkin ve daha kapsamlı anonimleştirme yöntemleri üzerinde çalışılmıştır. Apache Spark büyük veri işleme alanında günümüzün en gelişmiş teknoloji ve platformları arasında yer almaktadır. Tezde, büyük veri anonimleştirmeyi de büyük veri işlemenin özel bir durumu olarak ele alıp yarı-tanımlayıcı özniteliklerin alan hiyerarşisi üzerinde yukarıdan-aşağıya özelleşme arama tekniğini kullanan dağıtık bir büyük veri k-anonimleştirme yöntemi önerilmiştir. Arama kriteri olarak bilgi kazancı – mahremiyet kaybı metriği kullanılmıştır. Yöntemin verimliliği ve ölçeklenebilirliği büyütülmüş gerçek veri kümeleri üzerinde gösterilmiştir. Literatürde akan veriyi k-anonimleştirmeye yönelik geliştirilen çözümler, problemi yarı-tanımlayıcı özniteliklerin bilgi kaybı metriğini minimize etmeye çalışan tek amaçlı optimizasyon problemi olarak formüle eden dar kapsamlı çözümlerdir. Tez kapsamında tespiti yapılan ihtiyaçlara yönelik olarak daha kapsamlı çözümler önerilmiş ve gerçek veri kümeleri üzerinde etkinlikleri geniş kapsamlı deneysel çalışmalarla gösterilmiştir. İlk olarak, akan veri için bilgi kaybı ile ortalama gecikme süresini beraber minimize etmeye yönelik çok amaçlı bir optimizasyon çatısı önerilmiştir. 
Böylelikle, akan veri için veri kullanışlılığı, bilgi kaybı metriği ile ölçülen veri kalitesi ve ortalama gecikme süresi metriği ile ölçülen veri güncelliğinin bir fonksiyonu olarak ele alınmıştır. Önerilen yöntemde bu iki bileşen kullanıcı tarafından ağırlıklandırılabilmektedir. İlave olarak, probleme özgü yeni bir bilgi kaybı metriği tanıtılmıştır. İkinci olarak, veri alıcısının akan anonim veri üzerinde yapacağı analiz işleminden haberdar bir k-anonimleştirme çatısı önerilmiştir. Birçok veri alıcısının anonim veri üzerinde sınıflandırma veri madenciliği görevi çalıştırdığı bilinmektedir. Bu yüzden, bu çalışmada bilgi kaybını minimize etmenin yanında sınıflama doğruluğunu maksimize etmek de bir diğer amaçtır. Hatta akan veride, yarı-tanımlayıcı öznitelikler ve sınıflama hedef özniteliğine ilave olarak hassas öznitelikler olması durumunda bunların hassasiyetinin de en üst düzeyde korunması gerekir. Önerilen yöntem, ağırlıkları kullanıcı tarafından belirlenebilen, bu üç amaçlı optimizasyon problemini çözmektedir

Summary, etc.	Traditional data anonymization methods were developed only for static datasets, and scalability was usually treated as a secondary concern. As the needs around big data and streaming data have grown and diversified in recent years, scalability and the dynamic nature of data have come to the foreground. Although the literature proposes solutions for big data and streaming data privacy, the many facets of the problem call for more effective and more comprehensive anonymization methods. This thesis studies such methods for both big data and streaming data. Apache Spark is among the most advanced technologies and platforms for big data processing. Treating big data anonymization as a special case of big data processing, the thesis proposes a distributed big data k-anonymization method that uses a top-down specialization search over the domain hierarchies of the quasi-identifier attributes. The information gain - privacy loss metric is used as the search criterion. The effectiveness and scalability of the method are demonstrated on enlarged real datasets. Existing solutions for k-anonymization of data streams are narrow in scope: they formulate the problem as a single-objective optimization that tries to minimize an information loss metric on the quasi-identifier attributes. The thesis proposes more comprehensive solutions for the needs it identifies and shows their effectiveness on real datasets through extensive experimental evaluation. First, a multi-objective optimization framework is proposed that minimizes information loss and average delay together for streaming data.
Thus, data utility for streaming data is treated as a function of data quality, measured by the information loss metric, and data freshness, measured by the average delay metric. The user can weight these two components. In addition, a new problem-specific information loss metric is introduced. Second, a k-anonymization framework is proposed that is aware of the analysis the data recipient will run on the anonymized stream. Many data recipients are known to run classification tasks on anonymized data. Therefore, in addition to minimizing information loss, this study also aims to maximize classification accuracy. Moreover, when the stream contains sensitive attributes in addition to the quasi-identifier and classification target attributes, the protection of those sensitive attributes should also be maximized. The proposed method solves this three-objective optimization problem, with weights that can be set by the user.
650 #7 - SUBJECT ADDED ENTRY--TOPICAL TERM
Topical term or geographic name entry element	Tezler, Akademik
9 (RLIN)	32546
653 ## - INDEX TERM--UNCONTROLLED
Uncontrolled term	Veri mahremiyeti

Uncontrolled term	Büyük veri

Uncontrolled term	Akan veri

Uncontrolled term	Anonimleştirme

Uncontrolled term	Data privacy

Uncontrolled term	Big data

Uncontrolled term	Data streams

Uncontrolled term	Anonymization
700 1# - ADDED ENTRY--PERSONAL NAME
Personal name	Abul, Osman
9 (RLIN)	128312
Relator term	advisor
710 ## - ADDED ENTRY--CORPORATE NAME
Corporate name or jurisdiction name as entry element	TOBB Ekonomi ve Teknoloji Üniversitesi.
Subordinate unit	Fen Bilimleri Enstitüsü
9 (RLIN)	77078
942 ## - ADDED ENTRY ELEMENTS (KOHA)
Koha item type	Thesis
Source of classification or shelving scheme	Other/Generic Classification Scheme

Holdings
Withdrawn status	Lost status	Source of classification or shelving scheme	Not for loan	Collection code	Home library	Current library	Shelving location	Date acquired	Source of acquisition	Total Checkouts	Full call number	Barcode	Date last seen	Copy number	Date shelved	Koha item type
		Other/Generic Classification Scheme	Ödünç Verilemez-Tez / Not For Loan-Thesis	Tezler	Merkez Kütüphane	Merkez Kütüphane	Tez Koleksiyonu / Thesis Collection	01/09/2020	Bağış / Donation		TEZ TOBB FBE BİL Ph.D’20 SOP	TZ01117	01/09/2020	1	01/10/2020	Thesis