Data Matters: Data Science Short Courses
Sponsored by the National Consortium for Data Science (NCDS), the Renaissance Computing Institute (RENCI), and the Odum Institute for Research in Social Science, the "Data Matters: Data Science Summer Workshop Series" is a week-long series of classes for researchers, data analysts, and other individuals who wish to increase their skills in data studies and integrate data science methods into their research designs and skill sets. Scholars, analysts, and researchers from all disciplines and industries are welcome. Both one- and two-day courses will be offered; participants are welcome to register for one, two, or three classes. Classes will run from 10 a.m. to 4:45 p.m.
Registration: Please check back in early to mid-February at http://datamatters.org.
June 20-24, 2016
- Introduction to Data Science (Tom Carsey)
- Introduction to Information Visualization (Angela Zoss)
- Introduction to Data Science Using R (Chris Bail)
- Data Curation: Managing Data throughout the Research Lifecycle (Jon Crabtree, Thu-Mai Christian, Sophia Lafferty-Hess)
- Writing Questions for Surveys (Nora Cate Schaeffer)
- Introduction to Survey Sampling (Sharon Lohr)
- Conceptual Diagrams in Information Visualization (Eric Monson)
- Programming in R (Chris Bail)
- Open(ing) Data: Considerations in Data Sharing and Reuse (Jon Crabtree, Thu-Mai Christian, Sophia Lafferty-Hess)
- Creating Surveys in Qualtrics (Teresa Edwards)
- Introduction to Big Data and Machine Learning for Survey Researchers (Trent Buskirk)
- Introduction to Data Mining and Machine Learning (Ashok Krishnamurthy)
- Health Informatics in the Age of Interoperability (Mark Braunstein)
- Collecting, Classifying, and Analyzing Textual Data (Chris Bail)
- Simulation Strategies in Data Science: System Dynamics and Agent-based Modeling (Todd BenDor)
- Conducting and Analyzing Cognitive Interviews: A Hands-on Approach (Gordon Willis)
- Analysis with Complex Sample Survey Data (Brady West)
If you have questions, please contact Paul_Mihas@unc.edu.
Introduction to Data Science (Tom Carsey)
This course provides an introduction to data science, focusing on data about people. It will cover basic building blocks, key concepts, strengths and limitations, and the ethical issues that emerge in data science. Numerous examples will be discussed and sample code and data will be explored.
Why Take This Course?
Data science combines tools from information science, computer science, and statistics to collect, manage, analyze, and understand digital data. Modern data science pays particular attention to data regarding the social and economic attitudes and behaviors of people.
What Will Participants Learn?
This course will help equip participants from various disciplines and industries with a general understanding of data science terms, approaches, and strategies for effectively using data science.
Introduction to Information Visualization (Angela Zoss)
This course will help beginners get started preparing and designing information visualizations – a true “zero to sixty” course. Participants will learn how to clean and structure data; see how freely available software can be used to create charts, maps, and graphs; and follow basic design suggestions to fine-tune the final presentation of visualizations for publication or reporting.
Why Take This Course?
Visualization is a growing area of interest for researchers in all disciplines. Visualizations can illuminate important trends in a data analysis project or help an audience engage emotionally with a research area. Many tools are available to produce visualizations, however, and it is not always clear which tool is best or how to structure data to work with the tool. This course will walk participants through a wide variety of data sources and chart types to help even visualization beginners feel comfortable embarking on a new visualization project.
What Will Participants Learn?
The course will be organized in four major sections: basic charts; static and web-based maps; network diagrams and hierarchical visualizations; graphic design for information visualization.
The instructor will demonstrate several tools. These will likely include Excel, Tableau, QGIS, CartoDB, RAW, and Gephi (though the course may adjust slightly to take advantage of any sudden changes in available technology). This is not a hands-on course, but participants are welcome to download any of these packages on their laptops and follow along with the instructor’s examples.
Prerequisites and Requirements
This course will assume a basic understanding of spreadsheets as a way of storing and processing data. No programming will be necessary, though we may cover tools that work with HTML (especially SVG) in advanced examples. Bringing a laptop is not required, but participants are welcome to do so.
Introduction to Data Science Using R (Chris Bail)
This course provides a basic introduction to the R software environment for the purpose of data science. The course covers importing and exporting data, manipulating data or recoding variables, and visualization and statistical analysis.
Why Take This Course?
R has recently become the preferred computing and statistical analysis software for academic analysis because it offers unparalleled breadth of tools for virtually any model of interest to social scientists, particularly those interested in so-called “big data.” Unfortunately, R also has a steep learning curve because it is maintained by academics who have few career incentives to make it user friendly. Courses such as this one are therefore indispensable for obtaining a basic working knowledge of the language and learning how to navigate the complex web of information about R that is currently available online.
What Will Participants Learn?
This course is divided into four sections. The first section provides an overview of how to install R on your computer, import files, and interface with other software such as STATA, SPSS, and SAS. The second section of the course covers data cleaning and coding, which can be somewhat complicated in R because it uses a variety of data formats that are not used within other languages. The third and fourth sections cover basic descriptive analysis, including cross-tabs, histograms, and scatterplots, and basic linear regression models.
Prerequisites and Requirements
This course assumes no knowledge of computer programming, but basic familiarity with another statistical analysis package such as STATA, SPSS, or SAS will make the course easier to follow.
Note: In order to participate in the hands-on sections of the course, participants must bring their own laptop computer with enough space to install R and RStudio.
Data Curation: Managing Data throughout the Research Lifecycle (Jon Crabtree, Thu-Mai Christian, and Sophia Lafferty-Hess)
This course will provide an introduction to data management best practices as well as demonstrations of digital curation tools including the Dataverse Network™ open source virtual archive platform.
Why Take This Course?
Today, a growing number of funding agencies and journals require researchers to share, archive, and plan for the management of their data. In 2013, an Office of Science and Technology Policy memo highlighted the importance of providing open access to datasets and scholarly publications as a method of promoting innovation, accountability, transparency, and efficiency. As researchers and information professionals respond to these new requirements, data curation knowledge is necessary for the effective management, long-term preservation, and reuse of data.
What Will Participants Learn?
Participants will learn about: the diversity of data and their management needs across the research data lifecycle; the impetus and importance of preserving and sharing data; the processes required for preserving and sharing data; digital repository activities and assessment; the role of advocacy and communication when discussing data management best practices.
Writing Questions for Surveys (Nora Cate Schaeffer)
The course focuses on the structure and wording of individual survey questions, whether for interviewer-administered or self-administered instruments. There are opportunities to apply the guidelines and principles during in-class exercises.
Why Take This Course?
This course will be of use to researchers who will be writing or reviewing survey questions or survey instruments as well as to those who analyze survey data. This course gives practical guidance to those who have written survey questions but who are not familiar with research on question design, those who are just beginning to design survey instruments, and those who use survey data but do not themselves design survey instruments.
What Will Participants Learn?
The course topics include a structural analysis of parts of a survey question and an introduction to cognitive interviewing as a method for testing survey questions. The largest portion of the class is devoted to guidelines for diagnosing problems in survey questions and writing new survey questions. These guidelines summarize and apply research that underlies the key decisions in writing survey questions.
Prerequisites and Requirements
There are no requirements or prerequisites. Those who attend might find it useful to download these two papers in advance:
Schaeffer NC, Presser S. 2003. “The Science of Asking Questions.” Annual Review of Sociology 29: 65–88. http://arjournals.annualreviews.org/eprint/rU4UOoizjrXROhijkRIS/full/10.1146/annurev.soc.29.110702.110112
Schaeffer NC, Dykema J. 2011. “Questions for Surveys: Current Trends and Future Directions.” Public Opinion Quarterly 75(5): 909–961. http://poq.oxfordjournals.org/content/75/5/909.full.pdf+html
Introduction to Survey Sampling (Sharon Lohr)
This course will introduce participants to concepts of survey sample design, weighting, and variance estimation. Starting with the basics of simple random sampling, we move on to the building blocks of stratification and clustering, including proper calculation and use of weights and variance estimates. To understand large complex samples, we will study the documentation of surveys such as the National Crime Victimization Survey (NCVS) or the National Health and Nutrition Examination Survey (NHANES). The concepts will be presented through numerous examples and hands-on activities.
Why Take This Course?
Sample surveys are used to obtain information about populations of interest in many areas, including education, public health, sociology, ecology, political science, and economics. This course gives participants from these and related disciplines the knowledge needed to understand why features such as stratification, clustering, and differential sampling rates are used and how they affect estimates. We will read the online documentation and discuss the survey design, weighting, and nonresponse adjustments for the NCVS, NHANES, and other surveys as part of the course.
What Will Participants Learn?
Participants will learn how to evaluate the quality of a survey and become conversant with the terminology used in survey sampling. They will learn about the various types of sample designs and survey estimators available and when and why they should be used. We will discuss common mistakes people make when designing or analyzing survey samples and how to prevent them. Topics include:
- key terms of sampling: population, sampling frame, probability sampling, primary sampling unit, secondary sampling unit, sampling weight, probability proportional to size sampling, design effect
- building blocks of survey design: stratification and clustering
- use of sampling weights for estimating population means, totals, and regression relationships
- methods for estimating variances
- nonresponse and remedies
Prerequisites
Participants should have experience analyzing data using multiple linear regression and should be familiar with basic concepts from probability and statistics such as independence, bias, and variance. Examples using sample code and output from SAS® software will be discussed in the course, but participants do not need prior experience with SAS®, and the concepts can be applied to other statistical software packages. Participants should bring a wifi-capable laptop containing a spreadsheet program (for example, Microsoft Excel) to the course.
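As a rough illustration of how sampling weights enter estimates of population totals and means, here is a minimal sketch. It is written in Python rather than the SAS® used in the course, and the strata sizes, sample values, and function names are all hypothetical, chosen only to make the arithmetic easy to follow.

```python
# Minimal sketch: weighted estimates from a stratified sample.
# All numbers and names below are hypothetical illustrations.

def sampling_weight(stratum_population, stratum_sample_size):
    """Each sampled unit represents N_h / n_h units of its stratum."""
    return stratum_population / stratum_sample_size

def weighted_total(values, weights):
    """Estimate the population total as the weighted sum of sampled values."""
    return sum(w * y for y, w in zip(values, weights))

def weighted_mean(values, weights):
    """Estimate the population mean as weighted total / sum of weights."""
    return weighted_total(values, weights) / sum(weights)

# Stratum A: 1000 units in the population, 2 sampled.
# Stratum B: 200 units in the population, 2 sampled.
weight_a = sampling_weight(1000, 2)
weight_b = sampling_weight(200, 2)

values = [10.0, 12.0, 30.0, 34.0]           # two observations per stratum
weights = [weight_a, weight_a, weight_b, weight_b]

est_total = weighted_total(values, weights)
est_mean = weighted_mean(values, weights)
unweighted_mean = sum(values) / len(values)

print(est_total, est_mean, unweighted_mean)
```

Because stratum B is sampled at a higher rate than stratum A, the unweighted mean gives its large values too much influence; the weights restore each stratum's share of the population.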
Conceptual Diagrams in Information Visualization: Graphic Design for Effective Communication (Eric Monson)
Well-designed diagrams in information visualization aren’t just pretty; they convey information effectively by working in concert with human perception. This course will equip you with the tools you need to make clear and impactful conceptual diagrams using Adobe Illustrator.
Why Take This Course?
Words are essential for thinking and reasoning, but listening and reading are serial processes that require your audience to retain information in working memory while putting the pieces together. Information graphics, on the other hand, can be consumed quickly using the parallel nature of our visual systems, decreasing the cognitive load on the viewer. The problem is that effective graphic design isn’t intuitive – it takes some training that not many of us have had. The good news is that with a bit of guidance, we can quickly make large improvements in what we produce and recognize how to improve what we’ve created in the past.
What Will Participants Learn?
In this course you will learn a few core principles of good graphic design, along with common visual metaphors for conveying your ideas. We will also practice the process of diagram creation, from rough brainstorming sketches to final digital artwork. You will learn the basics of using Adobe Illustrator, the professional standard in vector graphics software, which many people avoid because of its steep learning curve. You will see that it is quite easy to combine simple shapes to create interesting and clear diagrams.
Prerequisites
There are no prerequisites. If you want to practice the Adobe Illustrator techniques in class, you will need to bring a laptop with the free trial version of the software installed. Please go to http://www.adobe.com/products/illustrator.html to sign up for an Adobe ID and to download and install the software. Note: Since the free trial period is only 30 days, you’ll want to wait until fewer than 30 days before the course date to install the package.
Programming in R (Chris Bail)
This class provides students with an introduction to basic programming techniques in R, a language with stronger object-oriented programming facilities than most statistical computing languages. The R language is widely used among statisticians and data miners for developing statistical software and for data analysis. R's popularity has increased substantially in recent years.
Why Take This Course?
This class will be useful to those who wish to restructure or clean unstructured data, collect new data in an automated fashion, or improve the speed of data analysis.
What Will Participants Learn?
Students will learn basic programming techniques such as functions, “for” loops, if/else statements, vectorized functions, and parallel computing techniques.
Prerequisites
Basic familiarity with R syntax and objects (e.g., matrices, lists, and data frames).
Open(ing) Data: Considerations in Data Sharing and Reuse (Jon Crabtree, Thu-Mai Christian, and Sophia Lafferty-Hess)
This workshop will examine the opportunities and challenges of open access to data resources and several of the open-source mechanisms available to share research data.
Why Take This Course?
The benefits of making data open and accessible have been widely discussed within the academic and public policy communities. Sharing research data enables others to verify and build upon published results, supports transparency and accountability of research findings, increases the return on public investments in research, encourages new scientific innovations, and supports collaboration within and across disciplines. However, there are also some challenges related to opening up data to the broader community, which this workshop will examine.
What Will Participants Learn?
Specifically, participants will learn about 1) the open data access movement, 2) data security considerations, 3) protection of the confidentiality of research participants, 4) the process of anonymizing datasets, 5) embargoes and rights of first use, 6) access restrictions, 7) data ownership, 8) data citation, and 9) other ethical questions related to data sharing and reuse.
Creating Surveys in Qualtrics (Teresa Edwards)
Surveys are one of the many tools data scientists can employ to address complex and evolving programs of inquiry. Qualtrics is an easy, intuitive, yet powerful survey tool with capabilities for simple to very complex online surveys. The course will use a hands-on format to teach basic and intermediate skills in survey building and distribution. In addition, we will demonstrate many of Qualtrics' advanced capabilities.
Participants should bring their own laptop with wireless capability; free wireless network access will be provided. Participants who have no exposure to Qualtrics are asked to review the 20-minute “Basic Building” video tutorial at http://www.qualtrics.com/university/researchsuite prior to the course date.
Please note: This course will not discuss question wording or the construction of survey content. For treatment of this topic, please register for the June 20-21 Data Matters short course Writing Questions for Surveys.
What Will Participants Learn?
Hands-on survey creation: Question types; Skip logic and display logic; Advanced survey flow/routing; Tailoring question wording based on previous responses or preloaded data; Content validation; Recoding values; Customizing colors and screen layout; Creating survey invitation and reminder messages; Creating and uploading participant panels; Distributing surveys through Qualtrics mailer or anonymous link.
Prerequisites
Bring your own laptop computer with a recent version of an internet browser (Chrome, Internet Explorer, or Firefox) installed. UNC students, staff, or faculty should have or create a Qualtrics account at http://qualtrics.unc.edu prior to the course. Participants without UNC affiliation should create a trial account at www.qualtrics.com.
Introduction to Big Data and Machine Learning for Survey Researchers and Social Scientists (Trent Buskirk)
Data science, machine learning and big data are all the rage in many areas where decisions are required or insights need to be made. In this course we explore how big data concepts, processes and methods can be used within the context of social science and survey research. We also provide a technical overview of common machine learning algorithms coupled with examples that are specifically motivated by social science and survey research applications.
Why Take This Course?
Big data and machine learning can be valuable assets to survey research and other social science methods. Applications of passive data collection and machine learning in social science have begun to emerge in many contexts and for many purposes. Survey researchers have long used auxiliary data sources to append person-specific information to sampling frames or survey responses. These days the auxiliary data often come from big data sources. In other contexts, administrative data and other big data sources are being harvested as alternatives to traditional surveys, in part due to cost considerations and in part due to time sensitivities. So in this new era where data are bigger and machines learn along with humans, what does the future of social science look like, and how can these methods help us derive better insights, improve our surveys, and refine our designs? While big data can certainly provide insights into social and survey-related areas, it is neither a panacea nor a replacement for traditional methodologies per se, and much work is needed to translate the volume of data into usable information. This course will explore the many roles that big data and machine learning may play in the social science arena, with particular focus on survey research methods.
What Will Participants Learn?
This course will offer participants: an overview of key big data terminology and concepts; an introduction to common data generating processes; a discussion of some primary issues with linking big data with survey data; issues of coverage and measurement errors within the big data context; a discussion of information extraction and signal detection in the context of big data; a discussion of the similarities and differences in model building for inference versus prediction; an overview of general concepts from machine learning as they apply to processing big data; a discussion of the potential pitfalls for inference from big data; an introduction to a set of key machine learning algorithms (e.g., cluster analysis, classification trees, random forests, conditional forests) to process big data using R with example code provided.
Prerequisites/Who Should Attend
The course is aimed at both producers and users of social science and survey data. The course is aimed equally at researchers from academia, government, and the voluntary and private sectors and is appropriate for researchers new to this topic. While we illustrate big data in the context of survey research concepts such as responsive/tailored survey designs, measurement error, nonresponse bias, and data linkage, it is not required that the participants be fully conversant in these concepts. Familiarity with model building and model selection as well as the R program is not required but is suggested. While this course is not intended to teach participants machine learning via R, we will explore four common machine learning algorithms and provide R code and output to illustrate these methods within the context of the R language.
Introduction to Data Mining and Machine Learning (Ashok Krishnamurthy)
This course will introduce participants to a selection of the techniques used in data mining and machine learning in a hands-on, application-oriented way. Topics covered will include data exploration, decision trees, clustering, association rules, regression, and pattern classification. The computing exercises will be based on the statistical programming language R. At the end of the two days, you will be able to explore a data set, determine which analysis method is appropriate for the data, and use R packages to obtain results.
Why Take This Course?
The ready availability of digital data from numerous sources is a tremendous opportunity for businesses and scientists to obtain new insights and confirm hypotheses. Data mining provides the theoretical basis, algorithms, and computational methods to manage, analyze, and get information from the data. In the world of big data and data science, data mining is a fundamental tool for data insights.
What Will Participants Learn?
The course will be organized in the following major sections: data exploration; association rules; decision trees; clustering; regression; classification. Each section will have an associated computer exercise. We will make extensive use of R and R packages in the computer exercises.
Prerequisites
This course will assume a basic understanding of statistics and calculus at the undergraduate level. Some experience with R or SAS would be helpful.
Health Informatics in the Age of Interoperability (Mark Braunstein)
Not that many years ago most health records were on paper and locked away in filing cabinets. While this may have served the purposes of the provider who created the records, it made them virtually useless for coordinating care among providers and as a source of data for measuring provider performance and gaining new medical knowledge. As a result of a massive federal investment, electronic health records (EHRs) are now virtually universal in hospitals and are used by the substantial majority of physicians. However, with adoption having been largely achieved, the attention has now turned to interoperability -- how to make the hundreds of systems in use talk to each other and provide facile access to the data they contain for innovative uses by both physicians and their patients. While this challenge is long standing, it has only become solvable in practical terms as the health informatics industry has begun to adopt the same web technologies that have successfully provided similar capabilities to other industries. The first day of this course will explore this story – from the need for electronic records, to the federal programs that finally led to their adoption, to the shortcomings of current EHRs, to the latest web-services-based interoperability technology called FHIR (Fast Healthcare Interoperability Resources), to the exciting things that can be and are being done with health data once it is digital and can be aggregated and analyzed. The second day will provide students with hands-on exposure to the new FHIR standard and real-world apps that have been developed using it, but in a way that does not assume programming skills.
Why Take This Course?
Anyone in the healthcare industry or in affiliated fields either is or will be impacted by the dramatic changes coming about because of the introduction of digital technologies to the industry. This short course will provide students with the background to appreciate what’s happening and why it’s important along with a significant (but non-technical) introduction to the latest standards for sharing and using digital health data to transform care.
What Will Participants Learn?
Day 1: Module 1: Health informatics, broadly speaking, is the application of information technology to care delivery. The field is arguably now at a "tipping point" because of the relatively recent widespread adoption of electronic record and other digital systems for use by both providers and patients. Because of that, we are now, at least in theory, able to aggregate data from millions of patient encounters in order to analyze it to gain new knowledge and to obtain feedback on the quality and efficiency of the care those patients received. Of course, it's not that simple, and in this module we will explore some of the unique structural issues of U.S. healthcare and what the federal government has been doing to bridge those issues in order to create what the Institute of Medicine calls a "learning health system" -- a continuously improving system based on data from the applications of information technology just described.
Module 2: To usefully aggregate and analyze data from thousands of electronic record systems and millions of patient encounters these systems need to be interoperable -- they need to be able to meaningfully share data. It is generally agreed that accomplishing this requires standards. The degree to which they are critical is somewhat less clear now that computers are powerful enough to do sophisticated natural language processing. However, it is likely that, for many years into the future, useful health data will be standardized to some degree. In this module we'll explore the most commonly used health data standards, and we will briefly discuss how standardized data (along with free text) is packaged into standard electronic clinical documents and into messages to link together diverse systems in hospitals and beyond. We'll focus on Fast Healthcare Interoperability Resources (FHIR), the new, rapidly evolving standard that offers to accelerate interoperability and even create a universal health app platform that could potentially help solve some of the challenges with current electronic record systems we'll discuss in Module 3.
Module 3: With the increased use of electronic health record and other digital health systems and tools, it is increasingly clear that much work remains to be done to make those systems easier and more efficient to use. These systems must also ensure the privacy of patients and the security of the data they contain. Finally, they should provide researchers and other secondary users with the highest possible data quality consistent with the practical needs of the providers entering that data. In this module we will explore all these issues as they occur in real-world systems. We'll interview the developers of one of the most innovative new electronic health record systems to gain some insights into the future directions these systems may take.
Module 4: Despite the many challenges we've discussed, a great deal is already being done to aggregate and analyze health data from actual patient care for purposes such as improved diagnosis and treatment. In this module we'll look at examples of that as well as some of the other exciting future opportunities in big health data and analytics. We'll also see how analytic-based tools and systems can help overcome some of the challenges we introduced earlier, including protecting patient privacy and security, helping to improve the quality of the data that is collected, and making massive amounts of clinical data useful in daily clinical practice.
Day 2: Module 1: The morning session consists of hands-on exercises and activities (that don’t require programming) using Georgia Tech’s state-of-the-art FHIR server and some other publicly available FHIR resources. At Georgia Tech we have synthetic patient records that mirror the U.S. chronic disease population. This facilitates realistic scenarios using FHIR. We’ll use other publicly available sites to see what is being done using FHIR to help deliver better care to patients.
Module 2: In the afternoon, the students will form teams and create a PowerPoint presentation of a basic design for a FHIR app that they would like to see developed to solve a healthcare problem or challenge they feel is important. Each team will present its ideas and defend them in response to questions posed by the class and the instructor.
Collecting, Classifying, and Analyzing Big Data (Chris Bail)
This course explains how to collect, classify, and analyze text-based data from the internet or other digital sources using R. The course will cover screen-scraping, interfacing with Application Programming Interfaces (APIs), and basic natural language processing such as topic models, and it will explain how these data can be incorporated into traditional social science models.
Why Take This Course?
Big data has become one of the most significant buzzwords in academic circles over the past few years, yet the study of how to use text as data crosses so many different academic disciplines, programming languages, and styles of communication that those who wish to enter this nascent field are quickly overwhelmed. This course will provide students with a panoramic perspective of the field and the programming skills necessary to navigate the rapidly growing wealth of information online about this subject.
What Will Participants Learn?
This course is divided into four sections. The first section will cover basic techniques for collecting text-based data from the internet such as screen scraping and writing code to extract data from application programming interfaces. The second section will explain how to clean and code text-based data using a variety of pre-processing techniques such as stemming. The third section will explain how to apply topic models and other natural language processing tools to sample data. The fourth and final section will discuss best practices for incorporating variables produced via these methods into conventional social science models such as regression or social network analysis.
Prerequisites and Requirements
This course assumes a basic working knowledge of the R language. Students with no knowledge of R might consider pairing this course with the “Introduction to Data Science Using R” course that is also being offered early in the week.
Note: In order to participate in the hands-on sections of the course, participants must bring a laptop computer with enough space to install R and R Studio.
Simulation Strategies in Data Science: System Dynamics and Agent-based Modeling
Todd BenDor
This course offers a step-by-step, interactive approach to conceptualizing, creating, and implementing simulation models. These analytical tools can be used in addition to traditional triangulation strategies to operationalize quantitative and qualitative variables (or a combination of both) into a simulation. This two-day course will introduce two computer simulation approaches: systems thinking and system dynamics modeling (day 1), and agent-based modeling (day 2). The goal of this course is to enhance knowledge and skills in understanding and analyzing the complex feedback dynamics in social, economic, and environmental problems.
Why Take This Course?
With an emphasis on aggregate behavior, system dynamics modeling can be useful in understanding the non-intuitive behavior of systems. Using basic concepts such as accumulation, rates of change, and feedback loops, systems thinking (qualitative) and system dynamics modeling (quantitative) can help researchers better address complex questions. Conversely, with a particular emphasis on individual behavior, agent-based modeling techniques can harness large-scale datasets to represent individual behavior and the social, economic, or environmental system structure that emerges. Agent-based modeling provides a sophisticated way to translate research goals into a dynamic model in simulation form. For both modeling approaches, we will emphasize the application and interpretation of modeling concepts and output rather than mathematical theory.
What Will Participants Learn?
On day 1, we will spend substantial time understanding how policy interventions affect the behavior and structure of systems. Students will develop a better understanding of feedback and its non-intuitive effects within social and physical systems, as well as an understanding of how to quantify causal relationships in dynamic, complex systems. The course will introduce system dynamics modeling through the STELLA and Vensim modeling platforms. On day 2, we will introduce the emerging analytical method of agent-based modeling, focusing first on when and why to use agent-based modeling, followed by a tutorial with the NetLogo simulation software.
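To make the ideas of accumulation, rates of change, and feedback concrete, here is a minimal sketch of a single-stock system dynamics model written in plain Python rather than in STELLA or Vensim. The logistic growth structure, the parameter values, and the function name `simulate_stock` are assumptions chosen for illustration, not material from the course.

```python
def simulate_stock(initial, growth_rate, capacity, dt, steps):
    """Integrate one stock with Euler's method.

    The net flow combines a reinforcing loop (growth proportional to the
    stock) with a balancing loop (growth slows as the stock nears capacity).
    """
    stock = initial
    history = [stock]
    for _ in range(steps):
        net_flow = growth_rate * stock * (1 - stock / capacity)  # rate of change
        stock += net_flow * dt                                   # accumulation
        history.append(stock)
    return history


# Hypothetical parameters: a population of 10 growing toward a limit of 1000.
trajectory = simulate_stock(
    initial=10.0, growth_rate=0.5, capacity=1000.0, dt=0.25, steps=200
)
print(f"start={trajectory[0]:.1f}, end={trajectory[-1]:.1f}")
```

The stock grows slowly at first, accelerates, and then levels off near the capacity, the S-shaped behavior that results from the two feedback loops acting together.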
Prerequisites and Requirements
This course assumes basic computer literacy and a basic understanding of algebra. Familiarity with basic computer programming concepts will be useful for the agent-based modeling part of the course, as we will be stepping through the creation of basic models.
Note: In order to participate in the hands-on sections of the course, participants must bring a laptop computer.
Conducting and Analyzing Cognitive Interviews: A Hands-On Approach
Gordon Willis
The short course will provide a solid grounding in the design and implementation of cognitive testing of survey questionnaires, and in the analysis of the data produced in cognitive interviews. There will be coverage of a range of verbal probing techniques, with practice exercises included.
Why Take This Course?
Cognitive testing is a widely used approach to pretest and evaluate survey questions, but there are few venues for learning how to conduct cognitive interviews. The course will emphasize the development and implementation of verbal probing techniques for both pretesting and evaluating survey questions, focusing on flexible, yet unbiased approaches to probing, based on Willis’s Cognitive Interviewing and Questionnaire Design: A Tool for Improving Survey Questions (2005). Participants will receive hands-on practice and feedback. We will also discuss analysis of cognitive interview results, a commonly neglected area of cognitive testing, guided by Willis’s Analysis of the Cognitive Interview in Questionnaire Design (2015). Finally, Dr. Willis will discuss novel developments in the field, such as web-based probing, and cognitive testing with multicultural populations.
What Will Participants Learn?
Participants will learn how to design, conduct, and analyze cognitive interviews. Procedures to be addressed include reviewing the draft questionnaire to identify potential problems and issues, formulating cognitive probing questions to address identified concerns, using probing to detect unanticipated problems, follow-up probing, and avoiding pitfalls when conducting the interview. Regarding analysis of cognitive interview data, participants will learn about: (a) methods for producing data, coding interview observations, and summarizing the results of cognitive interviews; (b) techniques for combining results across interviewers and testing labs; (c) five major analysis strategies applicable to testing results; and (d) the interpretation and communication of findings. Finally, there will be a discussion of software that facilitates analysis, a framework for the transparent and comprehensive development of testing reports, and the inclusion of reports within an online database of existing testing reports.
Prerequisites
Basic knowledge of questionnaire design is expected; no specific training or credentials are required.
Analysis of Complex Sample Survey Data
Brady West
In order to extract maximum information at minimum cost, sample designs are typically more complex than simple random samples. Stratified cluster sample designs are common. But how do you analyze the survey data collected from a complex sample? In particular, how do you determine margins of error and make inferences that take into account the complex sample design features? This one-day short course will discuss methods for the analysis of complex sample survey data, including estimation of descriptive parameters, methods for variance estimation, and linear and logistic regression modeling. This short course is intended for anyone analyzing survey data collected from complex samples and assumes a background in applied statistical analysis. The course is largely based on selected chapters from the book Applied Survey Data Analysis by Steve Heeringa, Brady West, and Pat Berglund (Chapman & Hall / CRC Press, 2010). The course will be lecture-based, but participants may bring their own laptop computers with software for the analysis of survey data installed to follow the examples.
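As a small illustration of why the sample design matters for margins of error, the sketch below estimates a population mean and its standard error from a stratified simple random sample, the simplest of the complex designs mentioned above. It uses the standard stratified estimator, with stratum weights W_h = N_h / N and a finite population correction. The stratum sizes, the sample values, and the function name are made up for this example, and clustered designs need further adjustments that are not shown here.

```python
import math
from statistics import mean, variance


def stratified_mean_and_se(strata):
    """Estimate a population mean and its standard error.

    strata: list of (population_size, sample_values) pairs, one per stratum,
    where each sample is a simple random sample drawn within its stratum.
    """
    total_population = sum(pop_size for pop_size, _ in strata)
    estimate = 0.0
    var_estimate = 0.0
    for pop_size, sample in strata:
        n_h = len(sample)
        weight = pop_size / total_population    # W_h = N_h / N
        sampling_fraction = n_h / pop_size      # f_h = n_h / N_h
        estimate += weight * mean(sample)
        # Stratum contribution: W_h^2 * (1 - f_h) * s_h^2 / n_h
        var_estimate += weight**2 * (1 - sampling_fraction) * variance(sample) / n_h
    return estimate, math.sqrt(var_estimate)


# Hypothetical data: two strata with 600 and 400 population units.
sample_strata = [
    (600, [10, 12, 14, 16]),
    (400, [20, 22, 24, 26]),
]
estimate, std_error = stratified_mean_and_se(sample_strata)
print(f"mean = {estimate:.2f}, SE = {std_error:.3f}, "
      f"approx. 95% margin of error = {1.96 * std_error:.3f}")
```

Pooling all eight values and treating them as one simple random sample would give a different standard error, because the spread between the two strata would be counted as sampling variability.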