We propose a hybrid methodology that will be the starting point for debate throughout Phase 1 of the grant. The figure below shows the five principal components of the Iterative Method for Social Media Research (IMSMR) in a bi-directional cycle. While the sequence typically begins with Study Design and/or Data Acquisition & Sampling and ends with Analysis & Visualization, the process is iterative throughout and may include a full methodological cycle at each step. The Responsible Data Use component is set apart in Figure 1 because the more organic data generation process of social media means it must be considered at every stage.
Developing the complex methodology we have proposed cannot be done by a single discipline or in a single meeting. It requires in-depth, extended discussions to understand the issues specific to social media data from different perspectives and to develop strategies for incorporating these data into different types of robust designs. Thus, we plan to initiate these discussions by hosting five meetings during this grant. Each meeting will focus roughly on one of the components of the proposed methodology. For each meeting, we will invite experts from academia and industry who have used social media data in their research or have written papers about specific challenges associated with the methodological component being discussed. All of our team members will also participate in the meetings because they are pioneering the use of social media data in their disciplines. Each meeting will be a two-day event: the first day will focus on identifying and discussing the different challenges and issues, and the second day on identifying reasonable ways to address them. We believe this approach will help engage thought leaders and develop community advisers who are interested in or already using social media data in their research.
Analyzing organic data means social scientists need to draw data from existing sources with no control over the data generation process. Posing a research question that could be answered with social media data, however, requires an understanding of how these data were generated and whether their design supports investigating the specific research question. Our white paper on study designs for social science research using social media is available here.
Data Acquisition & Sampling
This component focuses on data collection, sample creation, and data storage, drawing features from the sampling stage in social science and from the data selection stage of the KDD process. Our white paper on data acquisition and sampling for social science research using social media is available here.
Measurement & Feature Engineering
This stage aligns with the measurement stage of the social science process and the transformation and data mining stage of the KDD process; however, it will likely also involve revisiting the preprocessing decisions made during the earlier phases. For example, certain elements of the social media text (e.g., emojis) might need to be removed if they hamper the data mining process with regard to validity or reliability. Our white paper on measurement concerns for social science research using social media is available here.
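As an illustration of the kind of preprocessing decision described above, the following is a minimal sketch of emoji removal in Python. It is not part of the proposed methodology: the function name `strip_emojis` and the chosen Unicode ranges are assumptions made for this example, and the ranges are an approximation rather than a complete definition of what counts as an emoji.

```python
import re

# Assumed, non-exhaustive set of Unicode ranges covering many common emojis.
EMOJI_PATTERN = re.compile(
    "["
    "\U0001F300-\U0001FAFF"  # pictographs, emoticons, transport, supplemental symbols
    "\U0001F1E6-\U0001F1FF"  # regional indicator symbols (used for flags)
    "\u2600-\u27BF"          # miscellaneous symbols and dingbats
    "\uFE0F\u200D"           # variation selector and zero-width joiner
    "]+"
)


def strip_emojis(text: str) -> str:
    """Replace runs of emoji characters with a space, then collapse whitespace."""
    without_emoji = EMOJI_PATTERN.sub(" ", text)
    return " ".join(without_emoji.split())


if __name__ == "__main__":
    post = "Great day 😀 at the park 🌳"
    print(strip_emojis(post))
```

Because this stage may be revisited, one reasonable design choice is to keep the original text alongside the cleaned version, so that the removal decision can be reversed if emojis turn out to carry information relevant to the measure.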
This component generally maps to the data mining and machine learning stage of the KDD process and to the social science Analysis step.
Analysis & Visualization
After constructing our model, we need to analyze and interpret the results. This component maps to interpretation and evaluation in the KDD process and Analysis in the social science process.
The Responsible Research component is set apart in the methodology because the more organic data generation process of social media means it must be considered at every stage.