CREATE SOLUTIONS FOR VERSIONING AND MANAGING DATASETS USED IN AI AND ML.
DOI:
https://doi.org/10.36676/jrps.v12.i2.1546Keywords:
Version Control, Dataset Management, Scalability, Data Integrit, Reproducibility, Real-time Performance, IntegrationAbstract
It is also essential to correctly version and manage datasets to make them easily recognizable, traceable, and sharable throughout the various stages of AI & ML model development. Notably, there are many solutions to dataset versioning and management, with the best one touching on existing machine learning pipelines, highlighted by tools like DVC and MLflow, in this paper. To achieve this, the study provides simulation reports on using these tools in the current dynamic data environments, including healthcare, finance, and e-commerce, requiring robust version control mechanisms to counter quickly evolving data. Potential issues such as scale, data accuracy, and compatibility with present system adoptions are discerned with suggested solutions such as cloud-based management, checks and balances on data integrity, and ease of integration. The use of visuals shows how data lineage visualization helps in understanding the data flow for better implementation of measures and how different versioning tools compare in performance. The conclusions drawn from the study pertain to the fact that the implementation of structured data versioning strategies contributes to the enhancement of model quality and efficiency in addition to enhancing interaction between data scientists and engineers. This research finds that proper methods of developing and applying data versioning and data management practices are critical for effectively implementing AI and ML models in complex ecosystems that make decisions based on the most contemporary data. Future work will investigate the applicability of these tools as the number of data points to process increases, as well as the variability of those data points.
References
Amershi, S., Begel, A., Bird, C., DeLine, R., Gall, H., Kamar, E., ... & Zimmermann, T. (2019, May). Software engineering for machine learning: A case study. In 2019 IEEE/ACM 41st International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP) (pp. 291-300). IEEE. https://www.microsoft.com/en-us/research/uploads/prod/2019/03/amershi-icse-2019_Software_Engineering_for_Machine_Learning.pdf DOI: https://doi.org/10.1109/ICSE-SEIP.2019.00042
Vasa, Y. (2021). Develop Explainable AI (XAI) Solutions For Data Engineers. NVEO - Natural Volatiles & Essential Oils, 8(3), 425–432. https://doi.org/https://doi.org/10.53555/nveo.v8i3.5769 DOI: https://doi.org/10.53555/nveo.v8i3.5769
Singirikonda, P., Jaini, S., & Vasa, Y. (2021). Develop Solutions To Detect And Mitigate Data Quality Issues In ML Models. NVEO - Natural Volatiles & Essential Oils, 8(4), 16968–16973. https://doi.org/https://doi.org/10.53555/nveo.v8i4.5771 DOI: https://doi.org/10.53555/nveo.v8i4.5771
Vasa, Y., Jaini, S., & Singirikonda, P. (2021). Design Scalable Data Pipelines For Ai Applications. NVEO - Natural Volatiles & Essential Oils, 8(1), 215–221. https://doi.org/https://doi.org/10.53555/nveo.v8i1.5772 DOI: https://doi.org/10.53555/nveo.v8i1.5772
Katikireddi, P. M., Singirikonda, P., & Vasa, Y. (2021). Revolutionizing DEVOPS with Quantum Computing: Accelerating CI/CD pipelines through Advanced Computational Techniques. Innovative Research Thoughts, 7(2), 97–103. https://doi.org/10.36676/irt.v7.i2.1482 DOI: https://doi.org/10.36676/irt.v7.i2.1482
Jangampeta, S., Mallreddy, S. R., & Padamati, J. R. (2021). Data Security: Safeguarding the Digital Lifeline in an Era of Growing Threats. International Journal for Innovative Engineering and Management Research, 10(4), 630-632.
Sukender Reddy Mallreddy(2020).Cloud Data Security: Identifying Challenges and Implementing Solutions.JournalforEducators,TeachersandTrainers,Vol.11(1).96 -102.
Nunnaguppala, L. S. C. , Sayyaparaju, K. K., & Padamati, J. R.. (2021). "Securing The Cloud: Automating Threat Detection with SIEM, Artificial Intelligence & Machine Learning", International Journal For Advanced Research In Science & Technology, Vol 11 No 3, 385-392
Padamati, J., Nunnaguppala, L., & Sayyaparaju, K. . (2021). "Evolving Beyond Patching: A Framework for Continuous Vulnerability Management", Journal for Educators, Teachers and Trainers, 12(2), 185-193.
Downloads
Published
How to Cite
Issue
Section
License
Copyright (c) 2024 International Journal for Research Publication and Seminar
This work is licensed under a Creative Commons Attribution 4.0 International License.
Re-users must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use. This license allows for redistribution, commercial and non-commercial, as long as the original work is properly credited.