Studies in empirical aesthetics vary widely in the kinds of stimuli they use, yet they try to make general claims about the effects of stimulus factors (e.g., complexity) on aesthetic attributes (e.g., beauty, pleasure, interest). Here, we investigated how stable these effects are when different stimulus categories are intermingled rather than presented in homogeneous blocks, as is more common. We selected five main stimulus categories that have frequently been used in past research, each with a more abstract and a more figurative/recognizable subset: fractals, geometric patterns, natural images, art photography, and paintings. Each of the resulting ten stimulus categories consisted of twenty images, evenly distributed across five levels of objectively computed complexity. The total stimulus set of 200 images was presented either in 10 homogeneous blocks (one block per stimulus category) or in 10 heterogeneous blocks (with all stimulus categories randomly intermingled), to two groups of more than 200 participants each. All participants rated how beautiful they found the images on a 7-point Likert scale. Different subgroups of participants also rated the images on two additional 7-point Likert scales, either pleasure and interest, or order and complexity. Results indicated that the ratings varied with the preselected levels of complexity in different ways for the ten stimulus categories and the five scales. More interestingly, some ratings differed between heterogeneous and homogeneous blocks, being higher in some cases and lower in others. In general, complex interactions were obtained, indicating that many effects are not only stimulus-dependent but also context-dependent, in the sense of being affected by the other stimuli included in the set to be judged. This study suggests that researchers should be cautious in drawing general conclusions about the effects of stimulus variables on aesthetic ratings from experiments with rather limited and homogeneous sets of stimuli.