Jest ve mimiklerden yapay sinir ağları ile duygu sınıflandırma / Büşra Karatay; thesis advisor Fatma Betül Atalay Satoğlu.

By:

Karatay, Büşra [author]

Contributor(s):

Material type: Text

Language: Turkish

Publisher: Ankara : TOBB ETÜ Fen Bilimleri Enstitüsü, 2022

Description: xvi, 47 pages : illustrations ; 29 cm

Content type:

text

Media type:

unmediated

Carrier type:

volume

Other title:

Emotion classification with artificial neural networks from facial expressions and gestures

Subject(s):

Dissertation note: Thesis (Master's Thesis)--TOBB ETÜ Fen Bilimleri Enstitüsü, April 2022

Summary: Classifying emotion correctly from data sources such as text, images, video, and speech has been an inspiring field for researchers from various disciplines. Automatic emotion detection from videos and photographs is a challenging topic that is studied with both supervised and unsupervised machine learning methods. This thesis presents a method for emotion classification from videos that combines several preprocessing steps with a new deep learning architecture. Face and body position information extracted from the videos with the OpenPose tool was converted into pose descriptors; LSTM and Transformer models were then trained on this data and their performance was compared. Next, a CNN block was added in front of the LSTM and Transformer models, so that the output of the CNN block, fed with the pose descriptors, served as their input. To improve model accuracy, Video Generation, Keyframe Selection, and Gaussian Mixture Center approaches were added as preprocessing steps, and the experiments were repeated for combinations of these approaches. After extensive experiments, the results were compared and the effects of the proposed two-layer classifier structure and of the preprocessing steps were observed. The results were also compared with other recent, high-accuracy methods that use the same dataset. Experiments on two widely used datasets, FABO and CK+, showed that on FABO the CNN-Transformer structure with video generation outperformed the other models, with an accuracy of 99%. On both datasets, many versions of the proposed model exceeded 90% accuracy.
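The summary names Keyframe Selection as one of the preprocessing steps but this record does not say how the thesis chooses the frames. The following is only a minimal sketch of one plausible criterion, not the author's method: it keeps the first frame and then the frames whose pose descriptor changed most relative to the preceding frame. The function names `frame_motion` and `select_keyframes`, the Euclidean distance measure, and the toy descriptors are all assumptions made for illustration.

```python
import math


def frame_motion(prev, curr):
    """Euclidean distance between two flattened pose descriptors."""
    return math.sqrt(sum((c - p) ** 2 for p, c in zip(prev, curr)))


def select_keyframes(descriptors, k):
    """Return the indices of at most k frames, in temporal order.

    Hypothetical criterion (an assumption, not taken from the thesis):
    keep the first frame, then the k - 1 frames with the largest change
    in pose relative to the frame just before them.
    """
    if k <= 0 or not descriptors:
        return []
    if k >= len(descriptors):
        return list(range(len(descriptors)))
    # Motion score of frame i (i >= 1) is its distance from frame i - 1.
    scores = [
        (frame_motion(descriptors[i - 1], descriptors[i]), i)
        for i in range(1, len(descriptors))
    ]
    # Highest motion first; ties are broken in favour of the earlier frame.
    scores.sort(key=lambda s: (-s[0], s[1]))
    chosen = {0} | {i for _, i in scores[:k - 1]}
    return sorted(chosen)


# Toy sequence: 6 frames, each descriptor is two (x, y) keypoints flattened.
frames = [
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],  # no movement
    [0.0, 0.5, 1.0, 1.5],  # moderate movement
    [0.0, 0.5, 1.0, 1.5],  # no movement
    [2.0, 0.5, 3.0, 1.5],  # large movement
    [2.0, 0.6, 3.0, 1.6],  # small movement
]
print(select_keyframes(frames, 3))  # → [0, 2, 4]
```

In a pipeline like the one described, the selected indices would be used to subsample each video's descriptor sequence before it reaches the CNN block; other criteria, such as clustering frames, would fit the same interface.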


Holdings
Item type	Current library	Home library	Collection	Call number	Copy number	Status	Date due	Barcode
Thesis	Merkez Kütüphane Tez Koleksiyonu / Thesis Collection	Merkez Kütüphane	Theses	TEZ TOBB FBE BİL YL’22 KAR	1	Not For Loan-Thesis		TZ01375

