MultiAIGCD : (Record no. 200466331)

MARC details
000 -LEADER
fixed length control field	06422nam a2200397 i 4500
001 - CONTROL NUMBER
control field	200466331
003 - CONTROL NUMBER IDENTIFIER
control field	TR-AnTOB
005 - DATE AND TIME OF LATEST TRANSACTION
control field	20251124114714.0
007 - PHYSICAL DESCRIPTION FIXED FIELD--GENERAL INFORMATION
fixed length control field	ta
008 - FIXED-LENGTH DATA ELEMENTS--GENERAL INFORMATION
fixed length control field	171111s2025 xxu e mmmm 00| 0 eng d
035 ## - SYSTEM CONTROL NUMBER
System control number	(TR-AnTOB)200466331
040 ## - CATALOGING SOURCE
Original cataloging agency	TR-AnTOB
Language of cataloging	eng
Description conventions	rda
Transcribing agency	TR-AnTOB
041 0# - LANGUAGE CODE
Language code of text/sound track or separate title	Turkish
099 ## - LOCAL FREE-TEXT CALL NUMBER (OCLC)
Classification number	TEZ TOBB FBE BİL YL’25 DEM
100 1# - MAIN ENTRY--PERSONAL NAME
Personal name	Demirok, Gökçe Başak
Relator term	author
9 (RLIN)	150872
245 10 - TITLE STATEMENT
Title	MultiAIGCD :
Remainder of title	Çoklu model, dil, istem ve senaryolarda yapay zeka tarafından oluşturulan kodların tespiti için yeni bir veri kümesi /
Statement of responsibility, etc.	Gökçe Başak Demirok; thesis advisor Ahmet Murat Özbayoğlu.
246 13 - VARYING FORM OF TITLE
Title proper/short title	MultiAIGCD : Çoklu model, dil, istem ve senaryolarda yapay zeka tarafından oluşturulan kodların tespiti için yeni bir veri kümesi
264 #1 - PRODUCTION, PUBLICATION, DISTRIBUTION, MANUFACTURE, AND COPYRIGHT NOTICE
Place of production, publication, distribution, manufacture	Ankara :
Name of producer, publisher, distributor, manufacturer	TOBB ETÜ Fen Bilimleri Enstitüsü,
Date of production, publication, distribution, manufacture, or copyright notice	2025.
300 ## - PHYSICAL DESCRIPTION
Extent	xviii, 46 pages :
Other physical details	illustrations ;
Dimensions	29 cm
336 ## - CONTENT TYPE
Content type term	text
Content type code	txt
Source	rdacontent
337 ## - MEDIA TYPE
Media type term	unmediated
Media type code	n
Source	rdamedia
338 ## - CARRIER TYPE
Carrier type term	volume
Carrier type code	nc
Source	rdacarrier
502 ## - DISSERTATION NOTE
Dissertation note	Thesis (Master's)--TOBB ETÜ Fen Bilimleri Enstitüsü, August 2025
520 ## - SUMMARY, ETC.
Summary, etc.	Son yıllarda büyük dil modellerinin (BDM - LLM: Large Language Models) hızlı bir şekilde gelişmesiyle birlikte, bu modellerin yazılım geliştirme süreçlerinde kod üretimindeki rolü de dikkate değer ölçüde artmıştır. Bu ilerleme, yazılım üretimini daha hızlı ve erişilebilir kılarken; özellikle eğitim, işe alım ve değerlendirme süreçlerinde ciddi etik ve güvenilirlik sorunlarını da beraberinde getirmiştir. Öğrencilerin ödevlerde yapay zeka destekli araçlarla kod üretmesi veya adayların mülakat süreçlerinde bu tür araçlardan yararlanması, akademik dürüstlük ve adil değerlendirme ilkelerini tehdit etmektedir. Bu bağlamda, yapay zeka tarafından üretilmiş kodları güvenilir şekilde tespit edebilen sistemlerin geliştirilmesi, yalnızca teknik değil aynı zamanda sosyal bir zorunluluk haline gelmiştir. Bu çalışmada, Python, Java ve Go dillerinde üretilmiş yapay zeka kaynaklı kodların tespiti için oluşturulan MultiAIGCD veri kümesi tanıtılmaktadır. Veri kümesi, CodeNet veri setindeki problem tanımlarından ve insan yazımı kodlardan yararlanılarak oluşturulmuştur. Bu problemler üzerinden, altı farklı BDM kullanılarak üç farklı istem (prompt) türüyle çok sayıda yapay kod örneği üretilmiştir. Kod üretimi sürecinde üç temel senaryo dikkate alınmıştır: (i) problem tanımından sıfırdan kod üretimi, (ii) insan yazımı kodlardaki çalışma zamanı (runtime) hatalarının düzeltilmesi, (iii) insan yazımı kodlardaki hatalı çıktıyla sonuçlanan kodların düzeltilerek doğru çıktılar üretmesinin sağlanması. Bu sistemli üretim süreci sonucunda MultiAIGCD toplamda 121,271 adet yapay zeka tarafından oluşturulmuş ve 32,148 adet insan tarafından yazılmış kod parçacığı içeren büyük ölçekli ve dengeli bir veri kümesine dönüşmüştür. Ayrıca çalışmamızda, alandaki güncel yapay zeka kod tespiti sistemlerinden üç tanesi bu veri kümesi üzerinde değerlendirilmiş ve modellerin farklı test senaryolarındaki başarıları analiz edilmiştir. 
Değerlendirme sürecinde çapraz model (cross-model) ve çapraz dil (cross-language) gibi gerçekçi ve zorlu senaryolar özel olarak ele alınmıştır. Sunmuş olduğumuz bu veri kümesi ve beraberindeki açık kaynak kodlar, yapay zeka tarafından üretilen kodların tespiti alanındaki araştırmaları desteklemek amacıyla kamuoyuyla paylaşılmaktadır. Bu sayede, hem akademik hem de endüstriyel düzeyde daha güvenilir, adil ve şeffaf değerlendirme sistemlerinin geliştirilmesine katkı sağlanması hedeflenmektedir.

Summary, etc.	With the rapid development of large language models (LLMs) in recent years, their role in code generation within software development has grown significantly. While this progress has made software production faster and more accessible, it has also raised serious ethical and reliability concerns, particularly in education, recruitment, and evaluation. Students generating code with artificial intelligence (AI)-powered tools for assignments, or candidates using such tools during interviews, threaten academic integrity and the principle of fair evaluation. In this context, developing systems that can reliably detect AI-generated code has become not only a technical but also a social imperative. This study introduces MultiAIGCD, a comprehensive dataset for detecting AI-generated code in Python, Java, and Go. The dataset was built from problem definitions and human-written code in the CodeNet dataset. For these problems, a large number of AI-generated code samples were produced using six different LLMs and three different prompt types. Three scenarios were considered during code generation: (i) generating code from scratch from the problem definition, (ii) fixing runtime errors in human-written code, and (iii) correcting human-written code that produces incorrect output so that it produces correct output. This systematic generation process yielded a large-scale, balanced dataset containing 121,271 AI-generated and 32,148 human-written code snippets. In addition, three current AI-generated code detection systems were evaluated on this dataset, and their performance was analyzed across different test scenarios. The evaluation specifically addressed realistic and challenging settings such as cross-model and cross-language detection.
The dataset and the accompanying open-source code are shared publicly to support research on AI-generated code detection. The aim is to contribute to the development of more reliable, fair, and transparent evaluation systems at both the academic and industrial levels.
653 ## - INDEX TERM--UNCONTROLLED
Uncontrolled term	Büyük dil modelleri (BDM)

Uncontrolled term	Kod yazarı tespiti

Uncontrolled term	Makine öğrenimi (MÖ)

Uncontrolled term	Large language models (LLM)

Uncontrolled term	Code author detection

Uncontrolled term	Machine learning (ML)
700 1# - ADDED ENTRY--PERSONAL NAME
Personal name	Özbayoğlu, A. Murat
9 (RLIN)	125250
710 ## - ADDED ENTRY--CORPORATE NAME
Corporate name or jurisdiction name as entry element	TOBB Ekonomi ve Teknoloji Üniversitesi.
Subordinate unit	Fen Bilimleri Enstitüsü
9 (RLIN)	77078
942 ## - ADDED ENTRY ELEMENTS (KOHA)
Koha item type	Thesis
Source of classification or shelving scheme	Other/Generic Classification Scheme

Holdings
Withdrawn status	Lost status	Source of classification or shelving scheme	Not for loan	Collection code	Home library	Current library	Shelving location	Date acquired	Source of acquisition	Total Checkouts	Full call number	Barcode	Date last seen	Copy number	Date shelved	Koha item type
		Other/Generic Classification Scheme	Not For Loan-Thesis	Theses	Merkez Kütüphane	Merkez Kütüphane	Thesis Collection	24/11/2025	Donation		TEZ TOBB FBE BİL YL’25 DEM	TZ01860	24/11/2025	1	24/11/2025	Thesis