Data Mining

You should use the Weka data mining package, which is installed in all the School's laboratories. Weka is also available for download from: . Supporting material on how to use Weka is available on Moodle.

The primary aims of this individual data mining assignment are to give you the opportunity to:

The report should contain the following:

Describe the task you were given, the data you received and the requirements of the finished system. Define any terminology that you will use in the report (for example, model, variable, task, etc.).

List the variables that you found in the file provided by the company (available on Moodle week 15). For each one, say whether it is nominal or numeric, continuous or discrete, and whether or not it is of use in building the solution. Explain your decisions.

Describe what you did with the data prior to the modelling process. Show histograms of the data before and after any pre-processing that you carried out. If you corrected any mistyped entries in the data, report what you changed.

You must use two different techniques and build models with both: pick a suitable tree-building algorithm and one other suitable algorithm of your choice. Justify your selection. Describe the different methods you used and the results that you got. Give a brief technical description of the techniques and the way the models are represented. Include one diagram showing the structure of each type of model that you build. Describe what parameters may be changed and what effect this has.

If you varied the parameters of a model, show how this impacted on the results. Describe how you split the data for training and testing purposes. Be methodical and record each result. This stage is a little like scientific research: you are carrying out experiments in your search for the best solution.
Once you have a solution, show how you verified its robustness.

For the two different techniques, report on their comparative ability to predict a defaulted loan, and also on how easy it would be for the insurance company to understand the model and the reasons behind each prediction it makes.

Analyse and describe the level of accuracy the model achieves and the errors your model makes. Show a confusion matrix for each model. Are there any areas of the data where it performs worse than in others? Show a lift curve or an ROC curve for the decision as to whether or not a loan will be repaid.

Summarise the results of your experiments and what you have learnt.

Each individual will submit one hard paper copy of their report (25%) to the Coursework

You do not need to submit the models that you built, just the report. There is no word limit on the report; just write what you need to provide the required information clearly and concisely. You can assume that the client has a good technical understanding of data mining and statistics, so do not shy away from technical terms in your report. Where you use them, however, explain what they mean in plain language too.

You may be required to make a live demonstration of your work to the assessors of this coursework, should it be deemed necessary.
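To illustrate the evaluation outputs the report asks for, the following is a minimal Python sketch of how a confusion matrix, accuracy and the points of an ROC curve can be computed from a model's predictions. Weka reports these figures in its own classifier output, so the sketch is only meant to show what they contain. The function names, the 0.5 threshold and the eight example loans are all assumptions made up for illustration; they are not taken from the assignment data.

```python
from typing import List, Tuple


def confusion_matrix(actual: List[int], predicted: List[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn), treating 1 (loan defaulted) as the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    return tp, fp, fn, tn


def roc_points(actual: List[int], scores: List[float]) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) pairs, one per distinct
    score used as a threshold, starting from the point (0, 0)."""
    positives = sum(actual)
    negatives = len(actual) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        predicted = [1 if s >= threshold else 0 for s in scores]
        tp, fp, _, _ = confusion_matrix(actual, predicted)
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve, using the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


# Hypothetical example: 1 = defaulted, 0 = repaid; scores are the model's
# predicted probability of default for eight invented loans.
actual = [1, 1, 0, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]

# Classify a loan as a default when its score is at least 0.5 (assumed threshold).
predicted = [1 if s >= 0.5 else 0 for s in scores]
tp, fp, fn, tn = confusion_matrix(actual, predicted)
accuracy = (tp + tn) / len(actual)
auc = area_under_curve(roc_points(actual, scores))

print("confusion matrix (tp, fp, fn, tn):", (tp, fp, fn, tn))
print("accuracy:", accuracy)
print("area under ROC curve:", auc)
```

Plotting the pairs returned by roc_points, with the false positive rate on the horizontal axis and the true positive rate on the vertical axis, gives the ROC curve. The fp and fn counts separate the two kinds of error: repaid loans predicted as defaults, and defaults the model missed.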