The fourth meeting of our convergence team centered on challenges and opportunities related to model construction for social media research. This meeting involved researchers from seven institutions and nine fields. As could be expected, then, perspectives were varied. Indeed, some of our differences were quite fundamental. In this blog post, we briefly discuss some of the discussion points we found most exciting and that fundamentally changed the way we, as the main organizers of this particular meeting, think about modeling.
What is a model? Where does modeling start and end?
If we are to start discussing where scholars disagreed when it comes to modeling, we ought to start from the most fundamental of distinctions: What is model construction? Where does it start and end? Our discussions highlighted that computer scientists think of modeling more broadly than social scientists do. The critical distinction was between the social scientists’ tendency to separate ‘measurement’ from ‘modeling,’ and the computer scientists’ more holistic view of modeling as including the mathematical transformations needed to produce the variables used in ‘models.’ The difference may be a product of computer scientists placing heavy emphasis on developing and optimizing computational models for different descriptive and predictive tasks. Social scientists, of course, have a strong tradition of thinking very carefully about sampling and measurement; the subtle distinction is that in computer science these steps are intertwined with model building itself.
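To make that intertwining concrete, here is a minimal, hypothetical sketch (the data, labels, and feature choices are ours for illustration, not anything discussed at the meeting). The transformation of raw posts into variables and the classifier that consumes them are specified and fit as a single object, which is roughly where a computer scientist would draw the boundary of ‘the model.’

```python
# Hypothetical sketch: "measurement" (turning raw posts into variables) and the
# "model" (a classifier) are fit together as one pipeline. Data are toy examples.
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

posts = ["great policy announcement", "terrible decision by the council"]
labels = [1, 0]  # toy stance labels, purely illustrative

model = Pipeline([
    ("measurement", TfidfVectorizer()),   # produces the variables
    ("model", LogisticRegression()),      # the part social scientists call the 'model'
])
model.fit(posts, labels)                  # both steps are fit as one object
print(model.predict(["great decision"]))
```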
Are there advantages to thinking about modeling using the broader framework that is more common in computer science? We suspect there are. A more general claim, and one that does not require social scientists to entirely rethink what ‘models’ are, is that the construction of models and the calculation of variables are fundamentally intertwined.
Ground truth? Does it even exist?
Ground truth: a term heavily used in computer science. It refers to information that is known to be true (as opposed to, say, inferred information). A common way to evaluate models is to use “ground truth” data to determine how well a model approximates or recovers that truth. The self-reflections from the computer scientists in the crowd revealed an inclination to label datasets as ground truth without much inspection or questioning (see our white papers on sampling and measurement for more on that). Why are the two fields so different in their comfort with labeling something as ground truth? Two things come to this computer scientist’s mind. First, computer scientists are used to modeling and building systems for automated tasks. Such systems generally have inputs and outputs with a clear set of features and clear, fully accurate measurements. This contrasts sharply with how social scientists gather and interpret data on human attitudes. So the idea that we can build datasets that can be labeled as ground truth is more commonly accepted in CS. A second reason may stem from a concept central to computer science training: abstraction. Abstraction is how we build and study complex systems: each developer working on a particular layer only needs to think about the inputs, outputs, and processes at that layer, and a safe assumption is that the preceding process has produced exactly the right input for the current one. Perhaps this comfort remained in place as computer scientists moved on to using their techniques to study social science problems. However, as ideas of ground truth move to social science questions, the quality and reliability of that ground truth must be assessed much more carefully.
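As a hedged illustration of that last point, the sketch below compares model output against a hypothetical set of ground-truth labels, and then checks how well two (equally hypothetical) annotators agree with each other, since inter-annotator agreement is one common way to gauge how reliable the “truth” actually is.

```python
# Hypothetical sketch: evaluating predictions against labeled "ground truth,"
# then probing the reliability of that truth itself. All labels are invented.
from sklearn.metrics import accuracy_score, cohen_kappa_score

ground_truth = [1, 0, 1, 1, 0, 1]   # e.g., annotator-assigned labels
predictions  = [1, 0, 0, 1, 0, 1]   # model output

print(accuracy_score(ground_truth, predictions))    # agreement with the labels

# How "true" is the truth? Agreement between two annotators is one signal.
annotator_a = [1, 0, 1, 1, 0, 1]
annotator_b = [1, 1, 1, 0, 0, 1]
print(cohen_kappa_score(annotator_a, annotator_b))  # inter-annotator reliability
```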
Inductive vs. Deductive thinking
Disciplinary differences in the degree to which inductive vs. deductive thinking is valued were readily apparent. The inductive approach starts with observations and aims to generalize from them to develop theories; computer scientists rely on it more heavily. The deductive approach starts with a compelling theory and tests it against data; this is more of a norm in social science research.
Interestingly, there was acknowledgment on both sides that their discipline leans perhaps too heavily on one side. What explains this behavior? Part of it is grounded in genuine scholarly priorities and training. Some of it, however, is simply due to the conferences and journals we publish in and the style of research they expect. While disciplinary norms can help shape research for the better, our discussions revealed that some of them serve as artificial limitations that would be better removed.
What is the purpose?
Going into the meeting, our organizing team neatly grouped discussions into descriptive, predictive, explanatory, and prescriptive modeling categories. Or at least we thought we *neatly* organized them. This led to some discussions we expected and some we did not. As expected, we saw a clear divide between the social and computer sciences in the degree to which they use and value models for predictive versus explanatory purposes. This nicely paralleled the discussions on inductive vs. deductive thinking. Here, too, we observed interest among social scientists in identifying cases where modeling for the purpose of prediction can be useful, while computer scientists, on the other hand, pointed out an over-reliance on predictive approaches.
We also discussed the importance of correctly identifying the purpose of modeling, and how a model originally built for one purpose can end up being used for another. Take a regression model predicting one’s income from numerous demographic and environmental variables. The researchers building this model, or the policy makers they inform, can try to interpret it to *prescribe* a treatment. This can have unintended consequences: while the researchers who built the model may be clear about its strengths and weaknesses, policy makers or other researchers may be less well informed. It is therefore crucial for us to be explicit about the goals of our research as well as the potential uses of the models we build. This includes potential adverse uses of models, with important implications for ethics and fairness.
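For concreteness, here is a hypothetical sketch of the income example with simulated data (the variables and coefficients are invented). The fitted coefficients summarize associations that may be fine for description or prediction, but reading them as prescriptions assumes a causal interpretation the model was never built to support.

```python
# Hypothetical sketch of the income example on simulated data. The coefficients
# describe associations; treating them as prescriptions ("raise X to raise
# income") goes beyond what this model was built to answer.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
years_education = rng.normal(14, 2, n)   # illustrative demographic variable
urban = rng.integers(0, 2, n)            # illustrative environmental variable
income = 20 + 2.5 * years_education + 5 * urban + rng.normal(0, 5, n)

X = sm.add_constant(np.column_stack([years_education, urban]))
fit = sm.OLS(income, X).fit()
print(fit.params)   # descriptive/predictive associations, not prescriptions
```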
Where do we go from here?
These represent some (but not nearly all) of our favorite moments from the model construction convergence meeting. So where do we go from here? More conversations that cross disciplinary boundaries are needed to identify other challenges and find ways to tackle them.
How about next steps for our convergence meetings? Our next and final methodology meeting was on analysis and visualization. Stay tuned for a blog post about that meeting as well.