Fill in Order Details

  • Submit paper details for free using our simple order form

Make Payment Securely

  • Add funds to your account. There are no upfront payments. The writer will only be paid once you have approved your paper

Writing Process

  • The best qualified expert writer is assigned to work on your order
  • Your paper is written to standard and delivered as per your instructions

Download your paper

  • Download the completed paper from your online account or your email
  • You can request a plagiarism and quality report along with your paper

data mining assignment 6

Please answer the questions below. They are picked from the textbook.

Chapter 2:

18.

This exercise compares and contrasts some similarity and distance measures.

(a)

For binary data, the L1 distance corresponds to the Hamming distance; that is, the number of bits that are different between two binary vectors. The Jaccard similarity is a measure of the similarity between two binary vectors. Compute the Hamming distance and the Jaccard similarity between the following two binary vectors.

x = 0101010001

y = 0100011000

(b)

Which approach, Jaccard or Hamming distance, is more similar to the Simple Matching Coefficient, and which approach is more similar to the cosine measure? Explain.

(c)

Suppose that you are comparing how similar two organisms of different species are in terms of the number of genes they share. Describe which measure, Hamming or Jaccard, you think would be more appropriate for comparing the genetic makeup of two organisms. Explain. (Assume that each animal is represented as a binary vector, where each attribute is 1 if a particular gene is present in the organism and 0 otherwise.)

(d)

If you wanted to compare the genetic makeup of two organisms of the same species, e.g., two human beings, would you use the Hamming distance, the Jaccard coefficient, or a different measure of similarity or distance? Explain.

Chapter5:

20.

Consider the task of building a classifier from random data, where the attribute values are generated randomly irrespective of the class labels. Assume the data set contains records from two classes, “+” and “−.” Half of the data set is used for training while the remaining half is used for testing.

(a)

Suppose there are an equal number of positive and negative records in the data and the decision tree classifier predicts every test record to be positive. What is the expected error rate of the classifier on the test data?

(b)

Repeat the previous analysis assuming that the classifier predicts each test record to be positive class with probability 0.8 and negative class with probability 0.2.

(c)

Suppose two-thirds of the data belong to the positive class and the remaining one-third belong to the negative class. What is the expected error of a classifier that predicts every test record to be positive?

(d)

Repeat the previous analysis assuming that the classifier predicts each test record to be positive class with probability 2/3 and negative class with probability 1/3.

Chapter 6:

17.

Suppose we have market basket data consisting of 100 transactions and 20 items. If the support for item a is 25%, the support for item b is 90% and the support for itemset {a, b} is 20%. Let the support and confidence thresholds be 10% and 60%, respectively.

(a)

Compute the confidence of the association rule {a} -> {b}. Is the rule interesting according to the confidence measure?

(b)

Compute the interest measure for the association pattern {a, b}. Describe the nature of the relationship between item a and item b in terms of the interest measure.

(c)

What conclusions can you draw from the results of parts (a) and (b)?

(d) NOT NEEDED FOR THE TEST

Chapter 7:

5.

For the data set with the attributes given below, describe how you would convert it into a binary transaction data set appropriate for association analysis. Specifically, indicate for each attribute in the original data set.

(a) How many binary attributes it would correspond to in the transaction data set,

(b) How the values of the original attribute would be mapped to values of the binary attributes, and

(c) If there is any hierarchical structure in the data values of an attribute that could be useful for grouping the data into fewer binary attributes.

The following is a list of attributes for the data set along with their possible values. Assume that all attributes are collected on a per-student basis:

• Year : Freshman, Sophomore, Junior, Senior, Graduate: Masters, Graduate: PhD, Professional

• Zip code : zip code for the home address of a U.S. student, zip code for the local address of a non-U.S. student

• College : Agriculture, Architecture, Continuing Education, Education, Liberal Arts, Engineering, Natural Sciences, Business, Law, Medical, Dentistry, Pharmacy, Nursing, Veterinary Medicine

• On Campus : 1 if the student lives on campus, 0 otherwise

• Each of the following is a separate attribute that has a value of 1 if the person speaks the language and a value of 0, otherwise.

– Arabic

– Bengali

– Chinese Mandarin

– English

– Portuguese

– Russian

– Spanish

Chapter 8:

1.

Consider a data set consisting of 2^(20) data vectors, where each vector has 32 components and each component is a 4-byte value. Suppose that vector quantization is used for compression and that 2^(16) prototype vectors are used. How many bytes of storage does that data set take before and after compression and what is the compression ratio?

9.

Give an example of a data set consisting of three natural clusters, for which (almost always) K-means would likely find the correct clusters, but bisecting K-means would not.

WHAT OUR CURRENT CUSTOMERS SAY

  • Google
  • Sitejabber
  • Trustpilot
Zahraa S
Zahraa S
Absolutely spot on. I have had the best experience with Elite Academic Research and all my work have scored highly. Thank you for your professionalism and using expert writers with vast and outstanding knowledge in their fields. I highly recommend any day and time.
Stuart L
Stuart L
Thanks for keeping me sane for getting everything out of the way, I’ve been stuck working more than full time and balancing the rest but I’m glad you’ve been ensuring my school work is taken care of. I'll recommend Elite Academic Research to anyone who seeks quality academic help, thank you so much!
Mindi D
Mindi D
Brilliant writers and awesome support team. You can tell by the depth of research and the quality of work delivered that the writers care deeply about delivering that perfect grade.
Samuel Y
Samuel Y
I really appreciate the work all your amazing writers do to ensure that my papers are always delivered on time and always of the highest quality. I was at a crossroads last semester and I almost dropped out of school because of the many issues that were bombarding but I am glad a friend referred me to you guys. You came up big for me and continue to do so. I just wish I knew about your services earlier.
Cindy L
Cindy L
You can't fault the paper quality and speed of delivery. I have been using these guys for the past 3 years and I not even once have they ever failed me. They deliver properly researched papers way ahead of time. Each time I think I have had the best their professional writers surprise me with even better quality work. Elite Academic Research is a true Gem among essay writing companies.
Got an A and plagiarism percent was less than 10%! Thanks!

ORDER NOW


Consider Your Assignments Done

“All my friends and I are getting help from eliteacademicresearch. It’s every college student’s best kept secret!”

Jermaine Byrant
BSN

“I was apprehensive at first. But I must say it was a great experience and well worth the price. I got an A!”

Nicole Johnson
Finance & Economics

Our Top Experts

See Why Our Clients Hire Us Again And Again!


OVER

10.3k
Reviews

RATING
4.89/5
Average

YEARS
13
Mastery

Success Guarantee

When you order form the best, some of your greatest problems as a student are solved!

Reliable

Professional

Affordable

Quick

Using this writing service is legal and is not prohibited by any law, university or college policies. Services of Elite Academic Research are provided for research and study purposes only with the intent to help students improve their writing and academic experience. We do not condone or encourage cheating, academic dishonesty, or any form of plagiarism. Our original, plagiarism-free, zero-AI expert samples should only be used as references. It is your responsibility to cite any outside sources appropriately. This service will be useful for students looking for quick, reliable, and efficient online class-help on a variety of topics.