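MultiAIGCD : Çoklu model, dil, istem ve senaryolarda yapay zeka tarafından oluşturulan kodların tespiti için yeni bir veri kümesi [MultiAIGCD: a new dataset for detecting AI-generated code across multiple models, languages, prompts, and scenarios] / Gökçe Başak Demirok ; thesis advisor Ahmet Murat Özbayoğlu.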

By:

Demirok, Gökçe Başak [author]

Contributor(s):

Material type: Text

Language: Turkish

Publisher: Ankara : TOBB ETÜ Fen Bilimleri Enstitüsü, 2025

Description: xviii, 46 pages : illustrations ; 29 cm

Content type:

text

Media type:

unmediated

Carrier type:

volume

Other title:

MultiAIGCD : Çoklu model, dil, istem ve senaryolarda yapay zeka tarafından oluşturulan kodların tespiti için yeni bir veri kümesi [Other title]

Subject(s):

Dissertation note: Thesis (Master's)--TOBB ETÜ Fen Bilimleri Enstitüsü, August 2025

Summary: With the rapid development of large language models (LLMs) in recent years, their role in code generation within software development has increased significantly. While this progress has made software production faster and more accessible, it has also raised serious ethical and reliability issues, particularly in education, recruitment, and evaluation processes. Students generating code with artificial intelligence (AI)-powered tools for assignments, or candidates using such tools during interviews, threaten academic integrity and the principle of fair evaluation. In this context, developing systems that can reliably detect AI-generated code has become not only a technical but also a social imperative. This study introduces MultiAIGCD, a dataset created for detecting AI-generated code in Python, Java, and Go. The dataset was built from the problem definitions and human-written code in the CodeNet dataset. Based on these problems, a large number of AI-generated code samples were produced using six different LLMs and three different prompt types. Three basic scenarios were considered during code generation: (i) generating code from scratch based on the problem definition, (ii) correcting human-written code that has a runtime error, and (iii) correcting human-written code that produces incorrect output so that it produces correct output. As a result of this systematic generation process, MultiAIGCD is a large-scale and balanced dataset containing a total of 121,271 AI-generated and 32,148 human-written code snippets. In addition, three current AI-generated code detection systems were evaluated on this dataset, and the models' performance in different test scenarios was analyzed. The evaluation specifically addressed realistic and challenging settings such as cross-model and cross-language detection. The dataset and the accompanying open-source code are shared publicly to support research on AI-generated code detection. The aim is to contribute to the development of more reliable, fair, and transparent evaluation systems at both the academic and industrial levels.

Holdings
Item type	Current library	Home library	Collection	Call number	Copy number	Status	Date due	Barcode
Thesis	Central Library Thesis Collection	Central Library	Theses	TEZ TOBB FBE BİL YL’25 DEM	1	Not For Loan-Thesis		TZ01860
