Use of Generative Adversarial Networks (GANs) in Educational Technology Research

  1. Anabel Bethencourt Aguilar 1
  2. Dagoberto Castellanos Nieves 1
  3. Juan José Sosa Alonso 1
  4. Manuel Area Moreira 1
  1. 1 Universidad de La Laguna
    San Cristóbal de La Laguna, Spain

    ROR https://ror.org/01r9z8p25

Journal:
NAER: Journal of New Approaches in Educational Research

ISSN: 2254-7339

Year of publication: 2023

Volume: 12

Issue: 1

Pages: 153-170

Type: Article

DOI: 10.7821/NAER.2023.1.1231

Abstract

In Artificial Intelligence, Generative Adversarial Networks (GANs) make it possible to generate artificial data that reproduce the characteristics of real datasets. This work aims to verify the equivalence of synthetic data with real data and to assess the possibilities of GANs in educational research. The methodology begins with a survey that collects university teachers' self-perceptions of their digital competence and their technological pedagogical content knowledge (TPACK model). From the resulting original dataset, twenty-nine synthetic samples of increasing size (N) are generated using the COPULA-GAN procedure. Finally, a two-stage cluster analysis is applied to verify whether the synthetic samples are interchangeable with the original, and descriptive statistics of the distributions are extracted to check the similarity of the qualitative results. The 150 tests carried out yielded qualitatively very similar cluster structures, with a clear tendency to identify three types of teaching profiles according to the level of technological pedagogical content knowledge. It is concluded that the use of synthetic samples is an interesting way of improving data quality, both for security and anonymization and for increasing sample sizes.
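
The generation step described in the abstract can be illustrated with a much simpler stand-in. The sketch below is not the COPULA-GAN procedure used in the study: it is a plain Gaussian copula sampler, written with the Python standard library only, that draws synthetic pairs whose marginals and rank dependence resemble those of an original sample. The function names, the two variables (`digital`, `tpack`) and the toy Likert-style data are all assumptions made for illustration and are not taken from the study's dataset.

```python
import random
from statistics import NormalDist, mean


def normal_scores(values):
    """Map each value to a standard-normal score via its average rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of tied values starting at position i.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank shared by the ties
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    nd = NormalDist()
    return [nd.inv_cdf(r / (n + 1)) for r in ranks]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def empirical_quantile(sorted_values, u):
    """Return the observed value at cumulative probability u in [0, 1]."""
    idx = min(int(u * len(sorted_values)), len(sorted_values) - 1)
    return sorted_values[idx]


def gaussian_copula_sample(xs, ys, n_synthetic, seed=0):
    """Draw synthetic (x, y) pairs from a Gaussian copula fitted to xs, ys."""
    rng = random.Random(seed)
    rho = pearson(normal_scores(xs), normal_scores(ys))
    residual_sd = max(0.0, 1 - rho ** 2) ** 0.5
    nd = NormalDist()
    sorted_x, sorted_y = sorted(xs), sorted(ys)
    pairs = []
    for _ in range(n_synthetic):
        z1 = rng.gauss(0, 1)
        z2 = rho * z1 + residual_sd * rng.gauss(0, 1)  # correlated with z1
        pairs.append((empirical_quantile(sorted_x, nd.cdf(z1)),
                      empirical_quantile(sorted_y, nd.cdf(z2))))
    return pairs


# Toy "original" sample: two invented 1-5 Likert-style scores per respondent.
toy_rng = random.Random(42)
digital = [toy_rng.randint(1, 5) for _ in range(200)]
tpack = [min(5, max(1, d + toy_rng.choice([-1, 0, 0, 1]))) for d in digital]

# Synthetic sample with a larger N than the original.
synthetic = gaussian_copula_sample(digital, tpack, n_synthetic=1000, seed=1)
syn_digital = [d for d, _ in synthetic]
syn_tpack = [t for _, t in synthetic]

print("original correlation: ", round(pearson(digital, tpack), 2))
print("synthetic correlation:", round(pearson(syn_digital, syn_tpack), 2))
```

Comparing the two printed correlations gives a rough idea of how well the dependence structure survives the resampling. A check closer to the one described in the abstract would run the same cluster analysis on both samples and compare the resulting profiles.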
