bookmaker.py is a helper for optimizing PDFs of books for the production of small self-printed, self-bound physical books. Towards this goal it offers various PDF manipulation options that may also be used independently and for other purposes.
Concatenate two PDFs A.pdf and B.pdf to COMBINED.pdf:
bookmaker.py --input_file A.pdf --input_file B.pdf --output_file COMBINED.pdf

Produce OUTPUT.pdf containing pages 3 through 7 (inclusive) of INPUT.pdf:
bookmaker.py -i INPUT.pdf --page_range 3-7 -o OUTPUT.pdf

Produce COMBINED.pdf from A.pdf's first 7 pages, B.pdf's pages except its first two, and all pages of C.pdf:
bookmaker.py -i A.pdf -p start-7 -i B.pdf -p 3-end -i C.pdf -o COMBINED.pdf

Crop each page 5cm from the left, 10cm from the bottom, 2cm from the right, and 0cm from the top:
bookmaker.py -i INPUT.pdf -o OUTPUT.pdf --crops "5,10,2,0"

Include all pages from INPUT.pdf, but crop pages 10-20 by 5cm each from bottom and top:
bookmaker.py -i INPUT.pdf -c "10-20:0,5,0,5" -o OUTPUT.pdf

Same crops for pages 10-20, but also crop all pages 30 and later by 3cm each from left and right:
bookmaker.py -i INPUT.pdf -o OUTPUT.pdf -c "10-20:0,5,0,5" -c "30-end:3,0,3,0"

Rotate pages 3, 5, and 7 by 90°; rotate page 7 once more by 90° (i.e. 180° in total):
bookmaker.py -i INPUT.pdf -o OUTPUT.pdf --rotate 3 -r 5 -r 7 -r 7

Crop 5cm from the left and 1cm from the right on odd pages, and swap these two crops on even pages:
bookmaker.py -i INPUT.pdf -o OUTPUT.pdf -c "5,0,1,0" -s

Quarter each OUTPUT.pdf page to carry 4 pages from INPUT.pdf, and draw stencils into the inner margins for cuts to carry binding strings:
bookmaker.py -i INPUT.pdf -o OUTPUT.pdf --nup4

Same --nup4, but define a printable-region margin of 1.3cm to limit the space for the INPUT.pdf pages in the OUTPUT.pdf page quarters:
bookmaker.py -i INPUT.pdf -o OUTPUT.pdf -n --print_margin 1.3

Same --nup4, but draw lines marking printable-region margins, page quarters, and spine margins:
bookmaker.py -i INPUT.pdf -o OUTPUT.pdf -n --analyze
For arguments like -p, page numbers are assumed to start with 1 (not 0, which is treated as an invalid page number value).
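The 1-based, inclusive convention can be sketched as below. This is only an illustration under stated assumptions, not the script's own code: the function name `to_index_bounds` and its error messages are hypothetical, and it raises plain `ValueError` where the script uses its own exception type.

```python
def to_index_bounds(range_string, n_pages):
    """Turn a 1-based, inclusive range such as '3-7', 'start-7' or '3-end'
    into 0-based, end-exclusive bounds suitable for range() or slicing."""
    start, end = range_string.split("-")
    first = 0 if start in ("", "start") else int(start) - 1
    last = n_pages if end in ("", "end") else int(end)
    if first < 0:
        # page number 0 is invalid because counting starts at 1
        raise ValueError(f"page numbers start at 1: {range_string}")
    if last > n_pages or first >= last:
        raise ValueError(f"range does not fit a {n_pages}-page document: {range_string}")
    return first, last


print(to_index_bounds("3-7", 10))      # (2, 7), i.e. pages 3, 4, 5, 6, 7
print(to_index_bounds("start-7", 10))  # (0, 7)
print(to_index_bounds("3-end", 10))    # (2, 10)
```

Subtracting 1 only from the start value is what makes the end page inclusive: `range(2, 7)` visits the indexes of pages 3 through 7.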
The target page shape so far is assumed to be A4 in portrait orientation; bookmaker.py normalizes all pages to this format before applying crops, and removes any /Rotate commands from the source PDF (which would otherwise produce landscape orientations).
For --nup4, the -c cropping instructions do not so much erase content outside the cropped area as zoom into the page, so that the cropped area fills as much as possible of the available per-page area between the printable-area margins and the borders to the other quartered pages. If the zoomed cropped area does not fit neatly into its per-page area, additional page content is preserved.
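How a single zoom factor can follow from four crop values is sketched below. The function name `crop_zoom` is hypothetical and the sketch works in cm rather than PDF points; it follows the rule of computing one zoom per axis and, when zooming in, keeping the smaller of the two so that nothing inside the cropped area is lost.

```python
A4_WIDTH_CM = 21.0
A4_HEIGHT_CM = 29.7


def crop_zoom(left, bottom, right, top):
    """Return one uniform zoom factor for crops given in cm."""
    zoom_horizontal = A4_WIDTH_CM / (A4_WIDTH_CM - left - right)
    zoom_vertical = A4_HEIGHT_CM / (A4_HEIGHT_CM - bottom - top)
    if (zoom_horizontal > 1 and zoom_vertical < 1) or (zoom_horizontal < 1 and zoom_vertical > 1):
        raise ValueError("crops would create opposing zoom directions")
    if zoom_horizontal + zoom_vertical > 2:
        # zooming in: the smaller factor keeps the whole cropped area visible
        return min(zoom_horizontal, zoom_vertical)
    # zooming out (negative crops) or no crop at all
    return max(zoom_horizontal, zoom_vertical)


print(crop_zoom(5, 10, 2, 0))  # 1.5
print(crop_zoom(0, 5, 0, 5))   # 1.0
```

In the second call only top and bottom are cropped, so the horizontal axis limits the zoom to 1.0 and the page content above and below the cropped area stays visible.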
The --nup4 quartering puts pages into a specific order optimized for no-tumble duplex print-outs that can easily be folded and cut into pages of a small A6 book. Each unit of 8 pages from the source PDF is mapped thus onto two subsequent pages (i.e. front and back of a printed A4 paper):

To facilitate this layout, --nup4 also pads the input PDF pages to a total number that is a multiple of 8, by adding empty pages if necessary.
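The padding and reordering can be sketched as follows. The names `FRONT_AND_BACK_ORDER` and `nup4_order` are hypothetical; the per-pack order itself is the one the script's comments give (front: upper left, upper right, lower left, lower right, then the same four positions on the back).

```python
# 0-based positions within each pack of 8 source pages, listed as
# front upper left, front upper right, front lower left, front lower right,
# back upper left, back upper right, back lower left, back lower right
FRONT_AND_BACK_ORDER = [3, 0, 7, 4, 1, 2, 5, 6]


def nup4_order(n_pages):
    """Return (padded page count, list of 0-based source page indexes in
    output order). Indexes >= n_pages stand for blank padding pages."""
    padded = n_pages + (-n_pages) % 8
    order = []
    for pack_start in range(0, padded, 8):
        order += [pack_start + offset for offset in FRONT_AND_BACK_ORDER]
    return padded, order


print(nup4_order(8))   # (8, [3, 0, 7, 4, 1, 2, 5, 6])
print(nup4_order(10))  # padded to 16; the second pack starts at index 8
```

Read with 1-based page numbers, the first sheet's front thus carries pages 4 and 1 above pages 8 and 5, and its back carries pages 2 and 3 above pages 6 and 7.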
(To turn the above double-sided example page into a tiny 8-page book: Cut the paper in two along its horizontal middle line. Fold the two halves along their vertical middle lines, with pages 3-2 and 7-6 on the folds' insides. This creates two 4-page books of pages 1-4 and pages 5-8. Fold them both closed and (counter-intuitively) put the book of pages 5-8 on top of the other one (creating a temporary page order of 5,6,7,8,1,2,3,4). A binding cut stencil should be visible on the top left of this stack – cut it out (with all pages folded together) to add the same inner-margin upper cut to each page. Turn your 8-page stack around to find the mirror image of the aforementioned stencil on the bottom of the stack's back, and cut that out too. Each page now has binding cuts on the top and bottom of its inner margins. Swap the order of both books (back to the final page order of 1,2,3,4,5,6,7,8), and you now have an 8-page book that can be "bound" in its binding cuts through a rubber band or the like. Repeat with the next 8-page double-page, et cetera. (Actually, with just 8 pages, the paper may curl under the pressure of a rubber band – but go up to 32 pages or so, and the result will become quite stable.))
import argparse
import io
import os
import sys


def handled_error_exit(msg):
    print(f"ERROR: {msg}")
    sys.exit(1)  # non-zero exit status is an assumption


try:
    import pypdf
except ImportError:
    handled_error_exit("Can't run at all without pypdf installed.")


POINTS_PER_CM = 10 * 72 / 25.4
A4_WIDTH = 21 * POINTS_PER_CM
A4_HEIGHT = 29.7 * POINTS_PER_CM
A4 = (A4_WIDTH, A4_HEIGHT)
CUT_DEPTH = 1.95 * POINTS_PER_CM
CUT_WIDTH = 1.05 * POINTS_PER_CM
MIDDLE_POINT_DEPTH = 0.4 * POINTS_PER_CM
SPINE_LIMIT = 1 * POINTS_PER_CM


class HandledException(Exception):
    pass


def validate_page_range(p_string, err_msg_prefix):
    prefix = f"{err_msg_prefix}: page range string"
    if '-' not in p_string:
        raise HandledException(f"{prefix} lacks '-': {p_string}")
    tokens = p_string.split("-")
    if len(tokens) > 2:
        raise HandledException(f"{prefix} has too many '-': {p_string}")
    for i, token in enumerate(tokens):
        if token == "":
            continue
        if i == 0 and token == "start":
            continue
        if i == 1 and token == "end":
            continue
        try:
            int(token)
        except ValueError:
            raise HandledException(f"{prefix} carries value neither integer, nor 'start', nor 'end': {p_string}")
        if int(token) < 1:
            raise HandledException(f"{prefix} carries page number <1: {p_string}")
    # only compare start and end if both are given as numbers
    start = -1
    end = -1
    try:
        start = int(tokens[0])
        end = int(tokens[1])
    except ValueError:
        pass
    if start > 0 and end > 0 and start > end:
        raise HandledException(f"{prefix} has higher start than end value: {p_string}")


def split_crops_string(c_string):
    initial_split = c_string.split(':')
    if len(initial_split) > 1:
        page_range = initial_split[0]
        crops = initial_split[1]
    else:
        page_range = None
        crops = initial_split[0]
    return page_range, crops


def parse_page_range(range_string, pages):
    # returns 0-based start index and end-exclusive end index
    start_page = 0
    end_page = len(pages)
    if range_string:
        start, end = range_string.split('-')
        if not (len(start) == 0 or start == "start"):
            start_page = int(start) - 1
        if not (len(end) == 0 or end == "end"):
            end_page = int(end)
    return start_page, end_page


def main():
    parser = argparse.ArgumentParser(description=__doc__, epilog=help_epilogue, formatter_class=argparse.RawDescriptionHelpFormatter)
    parser.add_argument("-i", "--input_file", action="append", required=True, help="input PDF file")
    parser.add_argument("-o", "--output_file", required=True, help="output PDF file")
    parser.add_argument("-p", "--page_range", action="append", help="page range, e.g., '2-9' or '3-end' or 'start-14'")
    parser.add_argument("-c", "--crops", action="append", help="cm crops left, bottom, right, top – e.g., '10,10,10,10'; prefix with ':'-delimited page range to limit effect")
    parser.add_argument("-r", "--rotate_page", type=int, action="append", help="rotate page of given number by 90° (usable multiple times on same page!)")
    parser.add_argument("-s", "--symmetry", action="store_true", help="alternate horizontal crops between odd and even pages")
    parser.add_argument("-n", "--nup4", action='store_true', help="puts 4 input pages onto 1 output page, adds binding cut stencil")
    parser.add_argument("-a", "--analyze", action="store_true", help="in --nup4, print lines identifying spine, page borders")
    parser.add_argument("-m", "--print_margin", type=float, default=0.43, help="print margin for --nup4 in cm (default 0.43)")
    args = parser.parse_args()

    # some basic input validation
    for filename in args.input_file:
        if not os.path.isfile(filename):
            raise HandledException(f"-i: {filename} is not a file")
        try:
            with open(filename, 'rb') as file:
                pypdf.PdfReader(file)
        except pypdf.errors.PdfStreamError:
            raise HandledException(f"-i: cannot interpret {filename} as PDF file")
    if args.page_range:
        for p_string in args.page_range:
            validate_page_range(p_string, "-p")
        if len(args.page_range) > len(args.input_file):
            raise HandledException("-p: more --page_range arguments than --input_file arguments")
    if args.crops:
        for c_string in args.crops:
            initial_split = c_string.split(':')
            if len(initial_split) > 2:
                raise HandledException(f"-c: cropping string has multiple ':': {c_string}")
            page_range, crops = split_crops_string(c_string)
            crops = crops.split(",")
            if page_range:
                validate_page_range(page_range, "-c")
            if len(crops) != 4:
                raise HandledException(f"-c: cropping does not contain exactly three ',': {c_string}")
            for crop in crops:
                try:
                    float(crop)
                except ValueError:
                    raise HandledException(f"-c: non-number crop in: {c_string}")
    if args.rotate_page:
        for r in args.rotate_page:
            try:
                int(r)
            except ValueError:
                raise HandledException(f"-r: non-integer value: {r}")
            if r < 1:
                raise HandledException(f"-r: value must not be <1: {r}")
    try:
        float(args.print_margin)
    except ValueError:
        raise HandledException(f"-m: non-float value: {args.print_margin}")
    if args.nup4:
        try:
            import reportlab.pdfgen.canvas
        except ImportError:
            raise HandledException("-n: need reportlab library installed for --nup4")

    # select pages from input files
    pages_to_add = []
    opened_files = []
    new_page_num = 0
    for i, input_file in enumerate(args.input_file):
        file = open(input_file, 'rb')
        opened_files += [file]
        reader = pypdf.PdfReader(file)
        range_string = None
        if args.page_range and len(args.page_range) > i:
            range_string = args.page_range[i]
        start_page, end_page = parse_page_range(range_string, reader.pages)
        if end_page > len(reader.pages):  # no need to test start_page because start_page > end_page is checked above
            raise HandledException(f"-p: page range goes beyond pages of input file: {range_string}")
        for old_page_num in range(start_page, end_page):
            new_page_num += 1
            page = reader.pages[old_page_num]
            pages_to_add += [page]
            print(f"-i, -p: read in {input_file} page number {old_page_num+1} as new page {new_page_num}")

    # we can do some more input validations now that we know how many pages output should have
    if args.crops:
        for c_string in args.crops:
            page_range, _ = split_crops_string(c_string)
            if page_range:
                start, end = parse_page_range(page_range, pages_to_add)
                if end > len(pages_to_add):
                    raise HandledException(f"-c: page range goes beyond number of pages we're building: {page_range}")
    if args.rotate_page:
        for r in args.rotate_page:
            if r > len(pages_to_add):
                raise HandledException(f"-r: page number beyond number of pages we're building: {r}")

    # rotate page canvas (as opposed to using PDF's /Rotate command)
    if args.rotate_page:
        for rotate_page in args.rotate_page:
            page = pages_to_add[rotate_page - 1]
            # rotate around the page center: move center to origin, rotate, move back
            page.add_transformation(pypdf.Transformation().translate(tx=-A4_WIDTH/2, ty=-A4_HEIGHT/2))
            page.add_transformation(pypdf.Transformation().rotate(-90))
            page.add_transformation(pypdf.Transformation().translate(tx=A4_WIDTH/2, ty=A4_HEIGHT/2))
            print(f"-r: rotating (by 90°) page {rotate_page}")

    # if necessary, pad pages to multiple of 8
    if args.nup4:
        mod_to_8 = len(pages_to_add) % 8
        if mod_to_8 > 0:
            print(f"-n: number of input pages {len(pages_to_add)} not multiple of 8, padding to that")
            for _ in range(8 - mod_to_8):
                new_page = pypdf.PageObject.create_blank_page(width=A4_WIDTH, height=A4_HEIGHT)
                pages_to_add += [new_page]

    # normalize all pages to portrait A4
    for page in pages_to_add:
        if "/Rotate" in page:
            page.rotate(360 - page["/Rotate"])
        page.mediabox.left = 0
        page.mediabox.bottom = 0
        page.mediabox.top = A4_HEIGHT
        page.mediabox.right = A4_WIDTH
        page.cropbox = page.mediabox

    # determine page crops, zooms, crop symmetry
    crops_at_page = [(0,0,0,0)]*len(pages_to_add)
    zoom_at_page = [1]*len(pages_to_add)
    if args.crops:
        for c_string in args.crops:
            page_range, crops = split_crops_string(c_string)
            start_page, end_page = parse_page_range(page_range, pages_to_add)
            crop_left_cm, crop_bottom_cm, crop_right_cm, crop_top_cm = [float(x) for x in crops.split(',')]
            crop_left = crop_left_cm * POINTS_PER_CM
            crop_bottom = crop_bottom_cm * POINTS_PER_CM
            crop_right = crop_right_cm * POINTS_PER_CM
            crop_top = crop_top_cm * POINTS_PER_CM
            prefix = "-c, -s" if args.symmetry else "-c"
            suffix = " (but alternating left and right crop between even and odd pages)" if args.symmetry else ""
            print(f"{prefix}: to pages {start_page + 1} to {end_page} applying crops: left {crop_left_cm}cm, bottom {crop_bottom_cm}cm, right {crop_right_cm}cm, top {crop_top_cm}cm{suffix}")
            cropped_width = A4_WIDTH - crop_left - crop_right
            cropped_height = A4_HEIGHT - crop_bottom - crop_top

            zoom_horizontal = A4_WIDTH / (A4_WIDTH - crop_left - crop_right)
            zoom_vertical = A4_HEIGHT / (A4_HEIGHT - crop_bottom - crop_top)
            if (zoom_horizontal > 1 and zoom_vertical < 1) or (zoom_horizontal < 1 and zoom_vertical > 1):
                raise HandledException("-c: crops would create opposing zoom directions")
            elif zoom_horizontal + zoom_vertical > 2:
                zoom = min(zoom_horizontal, zoom_vertical)
            else:
                zoom = max(zoom_horizontal, zoom_vertical)
            for page_num in range(start_page, end_page):
                # page_num is 0-based, so odd indexes are even-numbered pages
                if args.symmetry and page_num % 2:
                    crops_at_page[page_num] = (crop_right, crop_bottom, crop_left, crop_top)
                else:
                    crops_at_page[page_num] = (crop_left, crop_bottom, crop_right, crop_top)
                zoom_at_page[page_num] = zoom

    writer = pypdf.PdfWriter()
    if not args.nup4:
        print("building 1-input-page-per-output-page book")
        odd_page = True
        for i, page in enumerate(pages_to_add):
            crop_left, crop_bottom, crop_right, crop_top = crops_at_page[i]
            zoom = zoom_at_page[i]
            page.add_transformation(pypdf.Transformation().translate(tx=-crop_left, ty=-crop_bottom))
            page.add_transformation(pypdf.Transformation().scale(zoom, zoom))
            cropped_width = A4_WIDTH - crop_left - crop_right
            cropped_height = A4_HEIGHT - crop_bottom - crop_top
            page.mediabox.right = cropped_width * zoom
            page.mediabox.top = cropped_height * zoom
            writer.add_page(page)
            odd_page = not odd_page
            print(f"built page number {i+1} (of {len(pages_to_add)})")

    if args.nup4:
        print("-n: building 4-input-pages-per-output-page book")
        print(f"-m: applying printable-area margin of {args.print_margin}cm")
        if args.analyze:
            print("-a: drawing page borders, spine limits")
        n_pages_per_axis = 2
        printable_margin = args.print_margin * POINTS_PER_CM
        printable_scale = (A4_WIDTH - 2*printable_margin)/A4_WIDTH
        half_width = A4_WIDTH / n_pages_per_axis
        half_height = A4_HEIGHT / n_pages_per_axis
        section_scale_factor = 1 / n_pages_per_axis
        spine_part_of_page = (SPINE_LIMIT / half_width) / printable_scale
        bonus_shrink_factor = 1 - spine_part_of_page

        # regroup pages into packs of 8 (grouping logic is an assumption; the order per pack is from the source)
        new_page_order = []
        new_i_order = []
        eight_pack = []
        n_eights = 0
        for page in pages_to_add:
            eight_pack += [page]
            if len(eight_pack) == 8:
                new_i_order += [8 * n_eights + 3,
                                8 * n_eights + 0,
                                8 * n_eights + 7,
                                8 * n_eights + 4,
                                8 * n_eights + 1,
                                8 * n_eights + 2,
                                8 * n_eights + 5,
                                8 * n_eights + 6]
                n_eights += 1
                new_page_order += [eight_pack[3]]  # page front, upper left
                new_page_order += [eight_pack[0]]  # page front, upper right
                new_page_order += [eight_pack[7]]  # page front, lower left
                new_page_order += [eight_pack[4]]  # page front, lower right
                new_page_order += [eight_pack[1]]  # page back, upper left
                new_page_order += [eight_pack[2]]  # page back, upper right
                new_page_order += [eight_pack[5]]  # page back, lower left
                new_page_order += [eight_pack[6]]  # page back, lower right
                eight_pack = []

        i = 0  # position within current output page: 0 upper left, 1 upper right, 2 lower left, 3 lower right
        page_count = 0
        front_page = True
        for j, page in enumerate(new_page_order):
            if i == 0:
                new_page = pypdf.PageObject.create_blank_page(width=A4_WIDTH, height=A4_HEIGHT)

            # in-section transformations: align pages on top, left-hand pages to left, right-hand to right
            new_i = new_i_order[j]
            crop_left, crop_bottom, crop_right, crop_top = crops_at_page[new_i]
            zoom = zoom_at_page[new_i]
            page.add_transformation(pypdf.Transformation().translate(ty=(A4_HEIGHT / zoom - (A4_HEIGHT - crop_top))))
            if i == 0 or i == 2:
                page.add_transformation(pypdf.Transformation().translate(tx=-crop_left))
            elif i == 1 or i == 3:
                page.add_transformation(pypdf.Transformation().translate(tx=(A4_WIDTH / zoom - (A4_WIDTH - crop_right))))
            page.add_transformation(pypdf.Transformation().scale(zoom * bonus_shrink_factor, zoom * bonus_shrink_factor))
            if i == 2 or i == 3:  # assumed: only the lower row is shifted down by the margin
                page.add_transformation(pypdf.Transformation().translate(ty=-2*printable_margin/printable_scale))

            # outer section transformations
            page.add_transformation(pypdf.Transformation().translate(ty=(1-bonus_shrink_factor)*A4_HEIGHT))
            if i == 0 or i == 1:
                y_section = A4_HEIGHT
                page.mediabox.bottom = half_height
                page.mediabox.top = A4_HEIGHT
            else:
                y_section = 0
                page.mediabox.bottom = 0
                page.mediabox.top = half_height
            if i == 0 or i == 2:
                x_section = 0
                page.mediabox.left = 0
                page.mediabox.right = half_width
            else:
                page.add_transformation(pypdf.Transformation().translate(tx=(1-bonus_shrink_factor)*A4_WIDTH))
                x_section = A4_WIDTH
                page.mediabox.left = half_width
                page.mediabox.right = A4_WIDTH
            page.add_transformation(pypdf.Transformation().translate(tx=x_section, ty=y_section))
            page.add_transformation(pypdf.Transformation().scale(section_scale_factor, section_scale_factor))
            new_page.merge_page(page)
            page_count += 1
            print(f"merged page number {page_count} (of {len(pages_to_add)})")
            i += 1
            if i > 3:
                # all four quarters are filled: finish this output page
                if args.analyze:
                    # draw borders of the page quarters
                    packet = io.BytesIO()
                    c = reportlab.pdfgen.canvas.Canvas(packet, pagesize=A4)
                    c.line(0, A4_HEIGHT, A4_WIDTH, A4_HEIGHT)
                    c.line(0, half_height, A4_WIDTH, half_height)
                    c.line(0, 0, A4_WIDTH, 0)
                    c.line(0, A4_HEIGHT, 0, 0)
                    c.line(half_width, A4_HEIGHT, half_width, 0)
                    c.line(A4_WIDTH, A4_HEIGHT, A4_WIDTH, 0)
                    c.save()
                    new_pdf = pypdf.PdfReader(packet)
                    new_page.merge_page(new_pdf.pages[0])
                printable_offset_x = printable_margin
                printable_offset_y = printable_margin * A4_HEIGHT / A4_WIDTH
                new_page.add_transformation(pypdf.Transformation().scale(printable_scale, printable_scale))
                new_page.add_transformation(pypdf.Transformation().translate(tx=printable_offset_x, ty=printable_offset_y))
                x_left_SPINE_LIMIT = half_width * bonus_shrink_factor
                x_right_SPINE_LIMIT = A4_WIDTH - x_left_SPINE_LIMIT
                if args.analyze or front_page:
                    packet = io.BytesIO()
                    c = reportlab.pdfgen.canvas.Canvas(packet, pagesize=A4)
                if args.analyze:
                    # spine limits
                    c.line(x_left_SPINE_LIMIT, A4_HEIGHT, x_left_SPINE_LIMIT, 0)
                    c.line(x_right_SPINE_LIMIT, A4_HEIGHT, x_right_SPINE_LIMIT, 0)
                if front_page:
                    # binding cut stencil, upper left
                    start_up_left_left_x = x_left_SPINE_LIMIT - 0.5 * CUT_WIDTH
                    start_up_left_right_x = x_left_SPINE_LIMIT + 0.5 * CUT_WIDTH
                    middle_point_up_left_y = half_height + MIDDLE_POINT_DEPTH
                    end_point_up_left_y = half_height + CUT_DEPTH
                    c.line(start_up_left_right_x, half_height, x_left_SPINE_LIMIT, end_point_up_left_y)
                    c.line(x_left_SPINE_LIMIT, end_point_up_left_y, x_left_SPINE_LIMIT, middle_point_up_left_y)
                    c.line(x_left_SPINE_LIMIT, middle_point_up_left_y, start_up_left_left_x, half_height)

                    # binding cut stencil, lower right
                    start_down_right_left_x = x_right_SPINE_LIMIT - 0.5 * CUT_WIDTH
                    start_down_right_right_x = x_right_SPINE_LIMIT + 0.5 * CUT_WIDTH
                    middle_point_down_right_y = half_height - MIDDLE_POINT_DEPTH
                    end_point_down_right_y = half_height - CUT_DEPTH
                    c.line(start_down_right_left_x, half_height, x_right_SPINE_LIMIT, end_point_down_right_y)
                    c.line(x_right_SPINE_LIMIT, end_point_down_right_y, x_right_SPINE_LIMIT, middle_point_down_right_y)
                    c.line(x_right_SPINE_LIMIT, middle_point_down_right_y, start_down_right_right_x, half_height)
                if args.analyze or front_page:
                    c.save()
                    new_pdf = pypdf.PdfReader(packet)
                    new_page.merge_page(new_pdf.pages[0])
                writer.add_page(new_page)
                i = 0
                front_page = not front_page

    for file in opened_files:
        file.close()
    with open(args.output_file, 'wb') as output_file:
        writer.write(output_file)


if __name__ == "__main__":
    try:
        main()
    except HandledException as e:
        handled_error_exit(e)