In information theory, the cross-entropy between two probability distributions p and q over the same underlying set of events measures the average number of bits needed to identify an event drawn from the set if the coding scheme used for the set is optimized for an estimated probability distribution q, rather than the true distribution p.
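For discrete distributions this quantity is H(p, q) = -Σ_x p(x) log2 q(x). Using the base-2 logarithm gives the result in bits. The sketch below computes it directly. It assumes both distributions are given as equal-length Python lists of probabilities indexed by the same events, and the function name cross_entropy_bits is hypothetical, chosen only for illustration.

```python
import math

def cross_entropy_bits(p, q):
    """Cross-entropy H(p, q) in bits (hypothetical helper).

    Assumes p and q are equal-length sequences of probabilities,
    where index i refers to the same event in both.
    """
    if len(p) != len(q):
        raise ValueError("p and q must cover the same set of events")
    total = 0.0
    for p_x, q_x in zip(p, q):
        if p_x == 0:
            continue  # events that never occur under p contribute nothing
        if q_x == 0:
            # an event that occurs under p but has zero probability under q
            # would need an infinitely long codeword
            return math.inf
        total -= p_x * math.log2(q_x)  # expected codeword length under q's code
    return total

p = [0.5, 0.25, 0.25]   # true distribution
q = [0.25, 0.25, 0.5]   # estimated distribution used to build the code
print(cross_entropy_bits(p, p))  # 1.5
print(cross_entropy_bits(p, q))  # 1.75
```

When q equals p, the cross-entropy reduces to the entropy of p (1.5 bits here). The extra 0.25 bits when coding with q is the penalty for using the wrong distribution, which equals the Kullback–Leibler divergence D_KL(p || q).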