CREATE SOLUTIONS FOR VERSIONING AND MANAGING DATASETS USED IN AI AND ML.

Authors

  • Sukender Reddy Mallreddy, Independent Researcher

DOI:

https://doi.org/10.36676/jrps.v12.i2.1546

Keywords:

Version Control, Dataset Management, Scalability, Data Integrity, Reproducibility, Real-time Performance, Integration

Abstract

Correctly versioning and managing datasets is essential to keeping them recognizable, traceable, and shareable throughout the stages of AI and ML model development. This paper reviews solutions for dataset versioning and management, the most effective being those that integrate with existing machine learning pipelines, as exemplified by tools such as DVC and MLflow. The study provides simulation reports on using these tools in dynamic data environments, including healthcare, finance, and e-commerce, which require robust version control mechanisms to keep pace with rapidly evolving data. It identifies potential issues with scalability, data accuracy, and compatibility with existing systems, and suggests solutions such as cloud-based management, data integrity checks, and easier integration. Visuals show how data lineage visualization clarifies data flow and supports better implementation, and how different versioning tools compare in performance. The study concludes that structured data versioning strategies improve model quality and efficiency and strengthen collaboration between data scientists and engineers. Sound data versioning and data management practices are therefore critical for effectively deploying AI and ML models in complex ecosystems that make decisions based on the most current data. Future work will investigate how well these tools hold up as the volume and variability of the data increase.
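As a rough illustration of the data integrity checks and version identification mentioned in the abstract, the following Python sketch fingerprints a dataset directory by hashing its contents. It is not taken from the paper and does not use the DVC or MLflow APIs; it relies only on the Python standard library, and the helper names (`file_digest`, `dataset_version`, `changed_files`, `demo`) and the sample file names are hypothetical, chosen for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of one file, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def dataset_version(root: Path) -> tuple[str, dict[str, str]]:
    """Hypothetical helper: fingerprint every file under a dataset directory.

    Returns (version_id, manifest), where manifest maps each relative
    file path to its content digest and version_id hashes the manifest.
    """
    manifest = {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }
    h = hashlib.sha256()
    for name, digest in sorted(manifest.items()):
        h.update(f"{name}\0{digest}\n".encode("utf-8"))
    return h.hexdigest(), manifest


def changed_files(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """List paths that were added, removed, or modified between two manifests."""
    return sorted(n for n in old.keys() | new.keys() if old.get(n) != new.get(n))


def demo() -> tuple[bool, bool, list[str]]:
    """Build a tiny throwaway dataset, edit one file, and compare versions."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "train.csv").write_text("id,label\n1,0\n")
        (root / "test.csv").write_text("id,label\n2,1\n")
        v1, m1 = dataset_version(root)
        v1_again, _ = dataset_version(root)
        # Simulate evolving data: one label changes in the training file.
        (root / "train.csv").write_text("id,label\n1,1\n")
        v2, m2 = dataset_version(root)
        return v1 == v1_again, v1 != v2, changed_files(m1, m2)


if __name__ == "__main__":
    stable, differs, changed = demo()
    print(stable, differs, changed)
```

In this sketch, unchanged data always yields the same version identifier, so a model run can record the identifier it was trained on, and comparing two manifests shows which files changed between versions. Dedicated tools such as DVC build remote storage and pipeline tracking on top of similar content hashing.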

Published

30-06-2021

How to Cite

Sukender Reddy Mallreddy. (2021). CREATE SOLUTIONS FOR VERSIONING AND MANAGING DATASETS USED IN AI AND ML. International Journal for Research Publication and Seminar, 12(2), 551–562. https://doi.org/10.36676/jrps.v12.i2.1546