Big Data, Digital Research Methods, and Circumnavigating Archival Challenges

Since 2022, severely curtailed access to archives has made the study of the history of the former Soviet Union more challenging. My own scholarly work on German prisoners of war in the Soviet Union during and after the Second World War illustrates how digital research methods can overcome issues with archival sources. New ways of investigating sources are becoming accessible. My research has focused on two central questions. Why did the Soviet Union hold German POWs until 1956, eleven years after the war’s end and seven years after the other Allied powers had released theirs? And why does the trope of “Siberian captivity” dominate German memory, despite evidence that most of the German POWs were kept in the European part of Russia? 

Neither question could be fully answered without employing digital methods, though the tools themselves were not the solution to the problem. It was technology, used critically and responsibly, that allowed traditional historical questions to be asked on a different scale. Digital humanities – computer-assisted research, teaching and publishing – cannot replace traditional historical inquiry or sources. Digital humanities tools and methods help scholars to evaluate sources in new ways, ways that often assimilate information that no one person could process. The computer can examine these “big data” collections from a distance, providing new perspectives. 

In my own research, I had a theory: wartime and postwar conditions compelled Soviet officials to incarcerate German POWs for seven years longer than the other Allied victor nations. The realities of twenty-seven million deaths and the loss of a quarter of the nation’s total physical assets led Soviet leaders to use able-bodied Germans for postwar reconstruction. The POW camp system could be superimposed onto a highly developed GULAG system of forced labor. 

German officers held as prisoners of war having lunch at the Krasnogorsk Special Operational Transit Camp No. 27 (1944).

The utility of any archive lies in the sources that are retained and in the sources that an archival regime is willing to declassify. Although the Soviet Union was a centrally planned economy, I could find no declassified comprehensive records about the labor contribution of German POWs. I did find scattered sources, which provided snapshots of POWs’ impact upon specific industries or industrial sites. I even found orders from Joseph Stalin, General Secretary of the Communist Party of the Soviet Union, dictating the assignments of POW contingents to particular cities for reconstruction or to coal fields for resource extraction. Fragmentary sources like these were informative, but they did not prove my thesis about the correlation between incarceration and postwar reconstruction. 

Digital methods, and geographic information system (GIS) mapping in particular, allowed me to take a dataset and produce a series of maps. This provided an alternative means to support my thesis. I digitized a print encyclopedia of German POW labor camps in the Soviet Union from 1941 to 1956. I scanned the book and then used optical character recognition (OCR) to turn the page images into machine-readable text. The data tables from the book images became rows and columns of text in a spreadsheet. Suddenly, my work on POWs had become a big data project. I had over 4,300 camp sites to locate and map. I then worked with a programmer to query Google Maps for the latitude and longitude coordinates of the cities and villages associated with the camps in the encyclopedia. Multiple rounds of refining the query process yielded approximate locations for almost 99% of the camps listed in the encyclopedia. I was then able to do the type of research at which computers and big data projects excel, which is known as distant reading. Instead of a close reading of individual documents, the computer could place these 4,300 locations on the map. 
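
The geocoding step described above can be sketched in a few lines of Python. This is only an illustration of the general idea of retrying a lookup with progressively looser queries, not the code used in the project. The column names (`camp_id`, `place`, `region`), the sample rows, the approximate coordinates, and the stand-in `toy_geocoder` are all assumptions made for the example; in practice the `geocode` function would wrap whichever mapping service is available.

```python
from typing import Callable, Optional

Coordinates = tuple[float, float]
Geocoder = Callable[[str], Optional[Coordinates]]


def candidate_queries(place: str, region: str) -> list[str]:
    """Return search strings for one camp, from most to least specific."""
    return [f"{place}, {region}", place]


def locate_camps(rows: list[dict], geocode: Geocoder):
    """Try each candidate query in turn; collect hits and unresolved camps."""
    located: dict[str, Coordinates] = {}
    unresolved: list[str] = []
    for row in rows:
        for query in candidate_queries(row["place"], row["region"]):
            coords = geocode(query)
            if coords is not None:
                located[row["camp_id"]] = coords
                break
        else:
            # No query variant produced a result; flag for manual checking.
            unresolved.append(row["camp_id"])
    return located, unresolved


# Stand-in for a real geocoding service (hypothetical, approximate values).
TOY_LOOKUP = {
    "Krasnogorsk, Moscow Oblast": (55.83, 37.33),
    "Karaganda": (49.80, 73.10),
}


def toy_geocoder(query: str) -> Optional[Coordinates]:
    return TOY_LOOKUP.get(query)


# Hypothetical spreadsheet rows of the kind OCR might produce.
rows = [
    {"camp_id": "camp-1", "place": "Krasnogorsk", "region": "Moscow Oblast"},
    {"camp_id": "camp-2", "place": "Karaganda", "region": "Karaganda Oblast"},
    {"camp_id": "camp-3", "place": "Unreadable OCR text", "region": "?"},
]

located, unresolved = locate_camps(rows, toy_geocoder)
print(located)
print(unresolved)
```

Keeping the unresolved camps in their own list makes it easy to see which entries still need a corrected spelling or a manual lookup in the next round of refinement.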

The visualization and analysis of this massive dataset helped me to illustrate and substantiate the important role German POWs played in postwar reconstruction. The argument rests on the wide distribution of camps across the ruined western territories of the USSR and on the clustering of camps in major industrial regions, such as the coal basins and factory cities of the Ural mountain range. Full results of this study can be found in my book, From Incarceration to Repatriation: German Prisoners of War in the Soviet Union.

Map of German Prisoner of War Camps in the USSR 1941-1956 and Railway Lines in Russia

This digital research brought me to my second major research topic: questioning the popular trope among returnees and German families that wartime and postwar captivity in the USSR was predominantly in Siberia. In fact, my maps demonstrated that only a minority of camps were in Siberia. This led to an offshoot research project on the origin of the myth, which began from the hypothesis that the trope may have come from returnee memoirs. I was able to access, scan, and digitize roughly thirty-five memoirs published between the 1940s and 2010s in several countries, including West Germany, East Germany, reunified Germany, and the United States. I produced a dataset of captivity locations from the memoirs, finding once again that Siberian captivity was the exception rather than the norm. 
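
One simple way to build a dataset of this kind is to search each digitized memoir for place names from a gazetteer and tally the regions they belong to. The sketch below is a minimal illustration under stated assumptions, not the method used for the study: the gazetteer entries, the region labels, and the three memoir snippets are invented for the example.

```python
import re
from collections import Counter

# Hypothetical gazetteer: German spellings of place names mapped to a region.
GAZETTEER = {
    "Stalingrad": "European USSR",
    "Moskau": "European USSR",
    "Swerdlowsk": "Urals",
    "Nowosibirsk": "Siberia",
}

# Invented snippets standing in for the OCR text of full memoirs.
MEMOIRS = {
    "memoir_a": "Wir kamen in ein Lager bei Stalingrad und dann nach Moskau.",
    "memoir_b": "Das Lager in Swerdlowsk war kalt. Swerdlowsk vergesse ich nie.",
    "memoir_c": "Erst in Nowosibirsk sah ich wieder eine Stadt.",
}


def places_in_memoir(text: str, gazetteer: dict[str, str]) -> set[str]:
    """Return the gazetteer places mentioned at least once in the text."""
    found = set()
    for place in gazetteer:
        pattern = rf"\b{re.escape(place)}\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.add(place)
    return found


def tally_regions(memoirs: dict[str, str], gazetteer: dict[str, str]) -> Counter:
    """Count each place once per memoir, grouped by region."""
    tally: Counter = Counter()
    for text in memoirs.values():
        for place in places_in_memoir(text, gazetteer):
            tally[gazetteer[place]] += 1
    return tally


region_counts = tally_regions(MEMOIRS, GAZETTEER)
print(region_counts)
```

Counting a place only once per memoir keeps a single author who repeats a name from dominating the totals. A mention is not proof of a stay, so the hits still need to be read in context before they go into a map.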

Computer-driven research into memoir literature gave me the names of interwar memoirs and novels based on the experience of Germans captured by the Russian Empire during World War I. Siberian captivity at that time left a lasting impression on the young men who would be drafted into the German army during World War II. Additionally, the computer-assisted research turned up a wartime Nazi propaganda slogan: that a loss to the Soviet Union in the war would result in Siberian captivity. I could potentially have come across these details by reading the memoirs word for word, but my digitization and automated searching process made this much faster. A series of anecdotal conversations with Germans related to detainees in the USSR, plus the results of my mapping work on the Soviet economics and motivations behind the incarceration of German POWs, inspired a separate journal article. Again, the right digital methods, text analysis and mapping, gave enhanced answers to a traditional research question. 
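
Automated searching of this sort is often done as a keyword-in-context search, which pulls out every occurrence of a term together with the surrounding words so that the passages can be read closely afterwards. The following is a minimal sketch of that technique, assuming the memoirs are available as plain text; the function name, the `width` parameter, and the sample sentences are invented for illustration.

```python
import re


def keyword_in_context(text: str, keyword: str, width: int = 40) -> list[str]:
    """Return each match of `keyword` with `width` characters on either side."""
    hits = []
    for match in re.finditer(re.escape(keyword), text, flags=re.IGNORECASE):
        start = max(match.start() - width, 0)
        end = min(match.end() + width, len(text))
        hits.append(text[start:end].strip())
    return hits


# Invented sample text, not a quotation from any memoir.
sample = (
    "Man sagte uns, wer verliert, kommt nach Sibirien. "
    "Die sibirische Kaelte kannten wir nur aus Buechern."
)

# Searching for the stem "sibiri" catches both "Sibirien" and "sibirische".
hits = keyword_in_context(sample, "sibiri")
for hit in hits:
    print(hit)
```

Searching on a word stem rather than a full word is a deliberate choice here, since German inflects nouns and adjectives and OCR text rarely rewards exact matching.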

Places of Captivity Named in German Returnee Memoirs

Scholars curious about employing digital methods for their own research should not start with the technology. They should start with traditional research questions for the sources at hand. From there, determine which digital methods are the most appropriate to dive into the sources in new ways. While archival access may currently be limited, there are plentiful published volumes of primary sources, statistical collections, encyclopedias, economic geography atlases, diaries, and memoirs related to the history of the former Soviet Union. Some data has been digitized and is easily accessible online. The Peripheral Histories? blog curated a list of online collections of primary sources, ranging from independent source collections or datasets to national archives across the former Soviet Union. Digital datasets from print materials can also be created.

The Carnegie Mellon Digital Humanities Literacy Guidebook offers a great starting point for learning about different digital tools and methods. Not every library specializes in digital methods, though some do. Many librarians are knowledgeable about some of these methods or know people within the university who can assist with them. Members of the Association for Slavic, East European, and Eurasian Studies (ASEEES) can also connect with SlavicDH, the working group for digital humanities at ASEEES, which hosts a drop-in networking and help session at each annual convention. Programming Historian also offers excellent online tutorials in four languages. 

The world of digital humanities is vibrant, friendly, welcoming, and interdisciplinary. Although technology cannot replace the human element of research, it can be of enormous help. More scholars studying the history of the former Soviet Union should make use of technology to enhance the old-fashioned research methods. Given the restrictions on travel and access to archives, from war and from budget cuts, the future of the field will require the broader adoption of new technologies. When correctly applied, these big data digital research projects have extraordinary potential.

Susan Grunewald is an Assistant Professor of European History at Southern New Hampshire University, and an award-winning teacher, researcher, and author. Her work combines digital humanities methods, including GIS mapping and text analysis, with archival research to examine Soviet and German World War II and Cold War history. She earned her MA and PhD from Carnegie Mellon University and her BA in Russian and East European Studies (with a minor in German) from Lafayette College. A Fulbright alumna in Ulyanovsk, Russia, she previously served as the Digital History Postdoctoral Fellow at the University of Pittsburgh World History Center (2019–2022) and an Assistant Professor of 20th Century European History at Louisiana State University. She is available to consult on European and Soviet history, digital humanities methods (including GIS mapping and text analysis), archival research strategies, and the design of innovative, research-driven educational content.

s.grunewald@snhu.edu
