FYP2_WEEK 2 (DATA PROCESSING AND CLEANING)

In the second week of project development, I processed and cleaned the data collected in the previous week. Processing was needed so that the data could be used to train the model later on. It involved formatting, organizing, and transforming the data into a structure that matches the model's input requirements.


Alongside processing, I cleaned the dataset to prepare it for analysis. I examined the data for inconsistencies, inaccuracies, and abnormalities that could affect the model's performance and reliability, and corrected the issues I found to preserve the data's integrity and quality.
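The log does not record the exact cleaning rules or the format of the dataset, so the following is only a minimal sketch of the kind of checks described above. It assumes the data is a list of records (dictionaries); the function name clean_records, the field names "text" and "label", and the three rules (trim whitespace, drop records with missing required fields, drop exact duplicates) are illustrative assumptions, not the actual procedure used in the project.

```python
def clean_records(records, required_fields):
    """Return a cleaned copy of `records` (a list of dicts).

    Hypothetical rules, chosen for illustration only:
      1. strip surrounding whitespace from string values,
      2. drop records whose required fields are missing, None, or empty,
      3. drop exact duplicates, keeping the first occurrence.
    Values are assumed to be hashable (strings, numbers, None).
    """
    seen = set()
    cleaned = []
    for record in records:
        # Rule 1: normalize string values so " hello " and "hello" match.
        normalized = {
            key: value.strip() if isinstance(value, str) else value
            for key, value in record.items()
        }
        # Rule 2: skip records with a missing or empty required field.
        if any(normalized.get(field) in (None, "") for field in required_fields):
            continue
        # Rule 3: skip records already seen (order-independent key).
        fingerprint = tuple(sorted(normalized.items()))
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        cleaned.append(normalized)
    return cleaned


# Made-up sample records, not taken from the project dataset.
raw = [
    {"text": " hello ", "label": "a"},
    {"text": "hello", "label": "a"},    # duplicate once whitespace is stripped
    {"text": "", "label": "b"},         # empty required field
    {"text": "world", "label": None},   # missing label
    {"text": "world", "label": "b"},
]
cleaned = clean_records(raw, required_fields=["text", "label"])
print(len(raw), "records before cleaning,", len(cleaned), "after")
```

Normalizing before checking for duplicates matters here: without it, records that differ only in stray whitespace would both survive the cleaning step.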


After processing and cleaning, I split the data into two subsets. The first, approximately 80% of the total, was used for model training, so that the model could learn the patterns, associations, and characteristics present in the data. The remaining 20% was set aside for testing. Evaluating the model on this unseen data shows how well it generalizes beyond the training set and guides further refinement of its predictive capabilities.
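The 80/20 split described above can be sketched as follows. This is a minimal illustration using only the Python standard library; the function name split_train_test, the fixed seed, and the shuffle-before-split step are assumptions made for the example, since the log does not state which tool or method performed the split.

```python
import random


def split_train_test(records, train_fraction=0.8, seed=42):
    """Shuffle a copy of `records` and split it into (train, test) lists.

    `train_fraction` and `seed` are illustrative defaults; the fixed seed
    only makes the split reproducible between runs.
    """
    shuffled = list(records)               # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)  # avoid any ordering bias in the data
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Placeholder data: 100 integers standing in for cleaned records.
samples = list(range(100))
train_set, test_set = split_train_test(samples)
print(len(train_set), "training samples,", len(test_set), "test samples")
```

Shuffling before cutting keeps both subsets representative when the original data is stored in some order (for example, grouped by class). The test subset should not be used again during training, otherwise it no longer counts as unseen data.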










