bookmaker.py is a helper for optimizing PDFs of books for the production of small self-printed, self-bound physical books. Towards this goal it offers various PDF manipulation options that may also be used independently and for other purposes.
Concatenate two PDFs A.pdf and B.pdf to COMBINED.pdf:
bookmaker.py --input_file A.pdf --input_file B.pdf --output_file COMBINED.pdf

Produce OUTPUT.pdf containing all pages of (inclusive) page number range 3-7 from INPUT.pdf:
bookmaker.py -i INPUT.pdf --page_range 3-7 -o OUTPUT.pdf

Produce COMBINED.pdf from A.pdf's first 7 pages, B.pdf's pages except its first two, and all pages of C.pdf:
bookmaker.py -i A.pdf -p start-7 -i B.pdf -p 3-end -i C.pdf -o COMBINED.pdf

Crop each page 5cm from the left, 10cm from the bottom, 2cm from the right, and 0cm from the top:
bookmaker.py -i INPUT.pdf -o OUTPUT.pdf --crops "5,10,2,0"

Include all pages from INPUT.pdf, but crop pages 10-20 by 5cm each from bottom and top:
bookmaker.py -i INPUT.pdf -c "10-20:0,5,0,5" -o OUTPUT.pdf

Same crops for pages 10-20, but also crop all pages 30 and later by 3cm each from left and right:
bookmaker.py -i INPUT.pdf -o OUTPUT.pdf -c "10-20:0,5,0,5" -c "30-end:3,0,3,0"

Rotate pages 3, 5, and 7 by 90°; rotate page 7 by another 90° (i.e. 180° in total):
bookmaker.py -i INPUT.pdf -o OUTPUT.pdf --rotate_page 3 -r 5 -r 7 -r 7

Declare a 5cm crop from the left and a 1cm crop from the right, but swap the two sides between even and odd pages:
bookmaker.py -i INPUT.pdf -o OUTPUT.pdf -c "5,0,1,0" -s

Quarter each OUTPUT.pdf page to carry 4 pages from INPUT.pdf, and draw stencils into the inner margins for cuts to carry binding strings:
bookmaker.py -i INPUT.pdf -o OUTPUT.pdf --nup4

Same --nup4, but define a printable-region margin of 1.3cm to limit the space for the INPUT.pdf pages in OUTPUT.pdf page quarters:
bookmaker.py -i INPUT.pdf -o OUTPUT.pdf -n --print_margin 1.3

Same --nup4, but draw lines marking printable-region margins, page quarters, spine margins:
bookmaker.py -i INPUT.pdf -o OUTPUT.pdf -n --analyze

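The -c strings used in the examples combine an optional page range with four centimeter values in the order left, bottom, right, top. The following is only a rough, self-contained sketch of how such a string could be taken apart and converted to PDF points; the names `CropSpec`, `parse_crop_spec` and `mirrored` are illustrative assumptions and not part of bookmaker.py.

```python
from typing import NamedTuple, Optional

POINTS_PER_CM = 10 * 72 / 25.4  # PDF points per centimeter


class CropSpec(NamedTuple):
    """Hypothetical container for one parsed -c argument."""
    page_range: Optional[str]  # e.g. "10-20", or None for "all pages"
    left: float                # crop widths in PDF points
    bottom: float
    right: float
    top: float


def parse_crop_spec(c_string: str) -> CropSpec:
    """Split '[RANGE:]left,bottom,right,top' (in cm) into a CropSpec (in points)."""
    range_part, separator, crops_part = c_string.rpartition(":")
    page_range = range_part if separator else None
    values = crops_part.split(",")
    if len(values) != 4:
        raise ValueError(f"expected four comma-separated crops: {c_string}")
    left, bottom, right, top = (float(v) * POINTS_PER_CM for v in values)
    return CropSpec(page_range, left, bottom, right, top)


def mirrored(spec: CropSpec) -> CropSpec:
    """Swap the left and right crops, as -s does for every second page."""
    return spec._replace(left=spec.right, right=spec.left)
```

For instance, `parse_crop_spec("10-20:0,5,0,5")` yields the range string "10-20" with 5cm bottom and top crops expressed in points, while `parse_crop_spec("5,10,2,0")` carries no range and so would apply to every page.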
For arguments like -p, page numbers are assumed to start with 1 (not 0, which is treated as an invalid page number value).
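To illustrate the 1-based, inclusive convention, here is a minimal sketch that turns a range string into 0-based page indices. The function name `range_to_indices` is an assumption made for this example; it is a simplified stand-in, not the script's own parsing code.

```python
def range_to_indices(range_string: str, page_count: int) -> range:
    """Turn a 1-based inclusive range such as '3-7', 'start-7' or '3-end'
    into a range of 0-based page indices."""
    # Unpacking raises ValueError unless there is exactly one '-'.
    start_token, end_token = range_string.split("-")
    first = 1 if start_token in ("", "start") else int(start_token)
    last = page_count if end_token in ("", "end") else int(end_token)
    if first < 1 or last > page_count or first > last:
        raise ValueError(f"invalid page range for {page_count} pages: {range_string}")
    # Page number N lives at index N-1; the end of a Python range is exclusive,
    # so the inclusive last page number can be used as-is.
    return range(first - 1, last)
```

With a 10-page input, "3-7" selects indices 2 to 6, and "0-3" is rejected because page number 0 does not exist.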
The target page shape so far is assumed to be A4 in portrait orientation; bookmaker.py normalizes all pages to this format before applying crops, and removes any source PDF /Rotate commands (such as those used to produce landscape orientation).

For --nup4, the -c cropping instructions do not so much erase content outside the cropped area as zoom into the page, so that the cropped area fills as much as possible of the available per-page area between the printable-area margins and the borders to the other quartered pages. If the zoomed cropped area does not fit neatly into its per-page area, additional page content around it is preserved.
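A small sketch may make the zooming concrete. It assumes non-negative crops and an A4 page, and picks one uniform factor so that the whole cropped area still fits; the function name `zoom_for_crop` is hypothetical and the code is a simplified illustration rather than the script's exact logic.

```python
POINTS_PER_CM = 10 * 72 / 25.4
A4_WIDTH = 21 * POINTS_PER_CM
A4_HEIGHT = 29.7 * POINTS_PER_CM


def zoom_for_crop(left_cm, bottom_cm, right_cm, top_cm):
    """Return the uniform zoom factor that enlarges the cropped area as far
    as possible while keeping all of it inside an A4-shaped slot.
    Assumption: all four crops are >= 0."""
    cropped_width = A4_WIDTH - (left_cm + right_cm) * POINTS_PER_CM
    cropped_height = A4_HEIGHT - (bottom_cm + top_cm) * POINTS_PER_CM
    if cropped_width <= 0 or cropped_height <= 0:
        raise ValueError("crops leave no page area")
    zoom_horizontal = A4_WIDTH / cropped_width
    zoom_vertical = A4_HEIGHT / cropped_height
    # The smaller factor is the largest zoom at which the cropped area
    # still fits in both directions.
    return min(zoom_horizontal, zoom_vertical)
```

A crop of "0,5,0,5" removes height only, so the horizontal factor stays at 1 and nothing can be enlarged; this is the case where content outside the cropped area remains visible. Cropping 10% from every side (2.1cm left and right, 2.97cm bottom and top) keeps the A4 aspect ratio and gives a factor of about 1.25.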
The --nup4 quartering puts pages into a specific order optimized for no-tumble duplex print-outs that can easily be folded and cut into pages of a small A6 book. Each unit of 8 pages from the source PDF is mapped thus onto two consecutive pages (i.e. front and back of a printed A4 paper):

 (front)      (back)
+-------+   +-------+
| 4 | 1 |   | 2 | 3 |
|-------|   |-------|
| 8 | 5 |   | 6 | 7 |
+-------+   +-------+

To facilitate this layout, --nup4 also pads the input PDF pages to a total number that is a multiple of 8, by adding empty pages if necessary.
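The reordering and padding can be sketched as follows. The order tuple is the `PAGE_ORDER_FOR_NUP4` constant from the source; the function name `nup4_page_numbers` and the use of `None` for padding pages are assumptions of this example, and the quarter order (top-left, top-right, bottom-left, bottom-right) is inferred from how the script positions each quarter.

```python
PAGE_ORDER_FOR_NUP4 = (3, 0, 7, 4, 1, 2, 5, 6)  # as defined in the script


def nup4_page_numbers(page_count):
    """Return, per output page, the four 1-based input page numbers placed
    top-left, top-right, bottom-left, bottom-right; None marks a padding page."""
    padded = page_count + (-page_count) % 8  # round up to a multiple of 8
    numbers = [n + 1 if n < page_count else None for n in range(padded)]
    output_pages = []
    for base in range(0, padded, 8):
        reordered = [numbers[base + n] for n in PAGE_ORDER_FOR_NUP4]
        output_pages.append(tuple(reordered[:4]))  # front of the sheet
        output_pages.append(tuple(reordered[4:]))  # back of the sheet
    return output_pages
```

For an 8-page input this gives `(4, 1, 8, 5)` for the front and `(2, 3, 6, 7)` for the back. A 10-page input is padded to 16, so the second sheet carries pages 9 and 10 plus six empty quarters.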
(To turn the above double-sided example page into a tiny 8-page book: Cut the paper in two along its horizontal middle line. Fold the two halves along their vertical middle lines, with pages 3-2 and 7-6 on the folds' insides. This creates two 4-page books of pages 1-4 and pages 5-8. Fold them both closed and (counter-intuitively) put the book of pages 5-8 on top of the other one (creating a temporary page order of 5,6,7,8,1,2,3,4). A binding cut stencil should be visible on the top left of this stack; cut it out (with all pages folded together) to add the same inner-margin upper cut to each page. Turn your 8-page stack around to find the mirror image of the aforementioned stencil at the bottom of the stack's back, and cut that out too. Each page now has binding cuts at the top and bottom of its inner margin. Swap the order of both books (back to the final page order of 1,2,3,4,5,6,7,8), and you now have an 8-page book that can be "bound" through its binding cuts with a rubber band or the like. Repeat with the next 8-page double-sided sheet, et cetera. Actually, with just 8 pages, the paper may curl under the pressure of a rubber band, but go up to 32 pages or so, and the result will become quite stable.)
import argparse
import io
import os
import sys
from collections import namedtuple

def handled_error_exit(msg):
    print(f"ERROR: {msg}")
    sys.exit(1)

try:
    import pypdf
except ImportError:
    handled_error_exit("Can't run at all without pypdf installed.")

# some general paper geometry constants
POINTS_PER_CM = 10 * 72 / 25.4
A4_WIDTH = 21 * POINTS_PER_CM
A4_HEIGHT = 29.7 * POINTS_PER_CM
A4 = (A4_WIDTH, A4_HEIGHT)

# constants specifically for --nup4
A4_HALF_WIDTH = A4_WIDTH / 2
A4_HALF_HEIGHT = A4_HEIGHT / 2
CUT_DEPTH = 1.95 * POINTS_PER_CM
CUT_WIDTH = 1.05 * POINTS_PER_CM
MIDDLE_POINT_DEPTH = 0.4 * POINTS_PER_CM
SPINE_LIMIT = 1 * POINTS_PER_CM
QUARTER_SCALE_FACTOR = 0.5
PAGE_ORDER_FOR_NUP4 = (3, 0, 7, 4, 1, 2, 5, 6)

PageCrop = namedtuple("PageCrop", ["left", "bottom", "right", "top"], defaults=[0, 0, 0, 0])

class HandledException(Exception):
    pass

def parse_args():  # def line absent from this listing; the function name is assumed
    parser = argparse.ArgumentParser(description=__doc__, epilog=help_epilogue, formatter_class=argparse.RawDescriptionHelpFormatter)
    parser.add_argument("-i", "--input_file", action="append", required=True, help="input PDF file")
    parser.add_argument("-o", "--output_file", required=True, help="output PDF file")
    parser.add_argument("-p", "--page_range", action="append", help="page range, e.g., '2-9' or '3-end' or 'start-14'")
    parser.add_argument("-c", "--crops", action="append", help="cm crops left, bottom, right, top – e.g., '10,10,10,10'; prefix with ':'-delimited page range to limit effect")
    parser.add_argument("-r", "--rotate_page", type=int, action="append", help="rotate page of given number by 90° (usable multiple times on same page!)")
    parser.add_argument("-s", "--symmetry", action="store_true", help="alternate horizontal crops between odd and even pages")
    parser.add_argument("-n", "--nup4", action="store_true", help="puts 4 input pages onto 1 output page, adds binding cut stencil")
    parser.add_argument("-a", "--analyze", action="store_true", help="in --nup4, print lines identifying spine, page borders")
    parser.add_argument("-m", "--print_margin", type=float, default=0.43, help="print margin for --nup4 in cm (default 0.43)")
    return parser.parse_args()

def validate_inputs_first_pass(args):
    for filename in args.input_file:
        if not os.path.isfile(filename):
            raise HandledException(f"-i: {filename} is not a file")
        try:
            with open(filename, 'rb') as file:
                pypdf.PdfReader(file)
        except pypdf.errors.PdfStreamError:
            raise HandledException(f"-i: cannot interpret {filename} as PDF file")
    if args.page_range:
        for p_string in args.page_range:
            validate_page_range(p_string, "-p")
        if len(args.page_range) > len(args.input_file):
            raise HandledException("-p: more --page_range arguments than --input_file arguments")
    if args.crops:
        for c_string in args.crops:
            initial_split = c_string.split(':')
            if len(initial_split) > 2:
                raise HandledException(f"-c: cropping string has multiple ':': {c_string}")
            page_range, crops = split_crops_string(c_string)
            crops = crops.split(",")
            if page_range:
                validate_page_range(page_range, "-c")
            if len(crops) != 4:
                raise HandledException(f"-c: cropping does not contain exactly three ',': {c_string}")
            for crop in crops:
                try:
                    float(crop)
                except ValueError:
                    raise HandledException(f"-c: non-number crop in: {c_string}")
    if args.rotate_page:
        for r in args.rotate_page:
            try:
                int(r)
            except ValueError:
                raise HandledException(f"-r: non-integer value: {r}")
            if r < 1:
                raise HandledException(f"-r: value must not be <1: {r}")
    try:
        float(args.print_margin)
    except ValueError:
        raise HandledException(f"-m: non-float value: {args.print_margin}")

def validate_page_range(p_string, err_msg_prefix):
    prefix = f"{err_msg_prefix}: page range string"
    if '-' not in p_string:
        raise HandledException(f"{prefix} lacks '-': {p_string}")
    tokens = p_string.split("-")
    if len(tokens) > 2:
        raise HandledException(f"{prefix} has too many '-': {p_string}")
    for i, token in enumerate(tokens):
        if token == "":
            continue
        if i == 0 and token == "start":
            continue
        if i == 1 and token == "end":
            continue
        try:
            int(token)
        except ValueError:
            raise HandledException(f"{prefix} carries value neither integer, nor 'start', nor 'end': {p_string}")
        if int(token) < 1:
            raise HandledException(f"{prefix} carries page number <1: {p_string}")
    start = -1
    end = -1
    try:
        start = int(tokens[0])
        end = int(tokens[1])
    except ValueError:
        pass
    if start > 0 and end > 0 and start > end:
        raise HandledException(f"{prefix} has higher start than end value: {p_string}")

def split_crops_string(c_string):
    initial_split = c_string.split(':')
    if len(initial_split) > 1:
        page_range = initial_split[0]
        crops = initial_split[1]
    else:
        page_range = None
        crops = initial_split[0]
    return page_range, crops

def parse_page_range(range_string, pages):
    start_page = 0
    end_page = len(pages)
    if range_string:
        start, end = range_string.split('-')
        if not (len(start) == 0 or start == "start"):
            start_page = int(start) - 1
        if not (len(end) == 0 or end == "end"):
            end_page = int(end)
    return start_page, end_page

def read_inputs_to_pagelist(args_input_file, args_page_range):
    pages_to_add = []
    opened_files = []
    new_page_num = 0
    for i, input_file in enumerate(args_input_file):
        file = open(input_file, 'rb')
        opened_files += [file]
        reader = pypdf.PdfReader(file)
        range_string = None
        if args_page_range and len(args_page_range) > i:
            range_string = args_page_range[i]
        start_page, end_page = parse_page_range(range_string, reader.pages)
        if end_page > len(reader.pages):  # no need to test start_page: start_page > end_page is already rejected during validation
            raise HandledException(f"-p: page range goes beyond pages of input file: {range_string}")
        for old_page_num in range(start_page, end_page):
            new_page_num += 1
            page = reader.pages[old_page_num]
            pages_to_add += [page]
            print(f"-i, -p: read in {input_file} page number {old_page_num+1} as new page {new_page_num}")
    return pages_to_add, opened_files

def validate_inputs_second_pass(args, pages_to_add):
    if args.crops:
        for c_string in args.crops:
            page_range, _ = split_crops_string(c_string)
            if page_range:
                start, end = parse_page_range(page_range, pages_to_add)
                if end > len(pages_to_add):
                    raise HandledException(f"-c: page range goes beyond number of pages we're building: {page_range}")
    if args.rotate_page:
        for r in args.rotate_page:
            if r > len(pages_to_add):
                raise HandledException(f"-r: page number beyond number of pages we're building: {r}")

def rotate_pages(args_rotate_page, pages_to_add):
    if args_rotate_page:
        for rotate_page in args_rotate_page:
            page = pages_to_add[rotate_page - 1]
            # rotate around the page center: move center to origin, rotate, move back
            page.add_transformation(pypdf.Transformation().translate(tx=-A4_WIDTH/2, ty=-A4_HEIGHT/2))
            page.add_transformation(pypdf.Transformation().rotate(-90))
            page.add_transformation(pypdf.Transformation().translate(tx=A4_WIDTH/2, ty=A4_HEIGHT/2))
            print(f"-r: rotating (by 90°) page {rotate_page}")

def pad_pages_to_multiple_of_8(pages_to_add):
    mod_to_8 = len(pages_to_add) % 8
    if mod_to_8 > 0:
        old_len = len(pages_to_add)
        for _ in range(8 - mod_to_8):
            new_page = pypdf.PageObject.create_blank_page(width=A4_WIDTH, height=A4_HEIGHT)
            pages_to_add += [new_page]
        print(f"-n: number of input pages {old_len} not required multiple of 8, padded to {len(pages_to_add)}")

def normalize_pages_to_A4(pages_to_add):
    for page in pages_to_add:
        if "/Rotate" in page:  # TODO: preserve rotation, but in canvas?
            page.rotate(360 - page["/Rotate"])
        page.mediabox.left = 0
        page.mediabox.bottom = 0
        page.mediabox.top = A4_HEIGHT
        page.mediabox.right = A4_WIDTH
        page.cropbox = page.mediabox

def collect_per_page_crops_and_zooms(args_crops, args_symmetry, pages_to_add):
    crop_at_page = [PageCrop()] * len(pages_to_add)
    zoom_at_page = [1] * len(pages_to_add)
    if args_crops:
        for c_string in args_crops:
            page_range, crops = split_crops_string(c_string)
            start_page, end_page = parse_page_range(page_range, pages_to_add)
            prefix = "-c, -s" if args_symmetry else "-c"
            suffix = " (but alternating left and right crop between even and odd pages)" if args_symmetry else ""
            page_crop_cm = PageCrop(*crops.split(','))
            page_crop = PageCrop(*[float(x) * POINTS_PER_CM for x in page_crop_cm])
            crop_listing = ", ".join([f"{key} {val}cm" for key, val in page_crop_cm._asdict().items()])
            print(f"{prefix}: to pages {start_page + 1} to {end_page} applying crop: {crop_listing}{suffix}")
            cropped_width = A4_WIDTH - page_crop.left - page_crop.right
            cropped_height = A4_HEIGHT - page_crop.bottom - page_crop.top
            zoom_horizontal = A4_WIDTH / cropped_width
            zoom_vertical = A4_HEIGHT / cropped_height
            if (zoom_horizontal > 1 and zoom_vertical < 1) or (zoom_horizontal < 1 and zoom_vertical > 1):
                raise HandledException("-c: crops would create opposing zoom directions")
            elif zoom_horizontal + zoom_vertical > 2:
                zoom = min(zoom_horizontal, zoom_vertical)
            else:
                zoom = max(zoom_horizontal, zoom_vertical)
            for page_num in range(start_page, end_page):
                if args_symmetry and page_num % 2:
                    crop_at_page[page_num] = PageCrop(left=page_crop.right, right=page_crop.left, bottom=page_crop.bottom, top=page_crop.top)
                else:
                    crop_at_page[page_num] = page_crop
                zoom_at_page[page_num] = zoom
    return crop_at_page, zoom_at_page

def build_single_pages_output(writer, pages_to_add, crop_at_page, zoom_at_page):
    print("building 1-input-page-per-output-page book")
    for i, page in enumerate(pages_to_add):
        zoom = zoom_at_page[i]
        # shift the cropped area to the page origin, then zoom it
        page.add_transformation(pypdf.Transformation().translate(tx=-crop_at_page[i].left, ty=-crop_at_page[i].bottom))
        page.add_transformation(pypdf.Transformation().scale(zoom, zoom))
        cropped_width = A4_WIDTH - crop_at_page[i].left - crop_at_page[i].right
        cropped_height = A4_HEIGHT - crop_at_page[i].bottom - crop_at_page[i].top
        page.mediabox.right = cropped_width * zoom
        page.mediabox.top = cropped_height * zoom
        writer.add_page(page)
        print(f"built page number {i+1} (of {len(pages_to_add)})")

def build_nup4_output(writer, pages_to_add, crop_at_page, zoom_at_page, args_print_margin, args_analyze, canvas_class):
    print("-n: building 4-input-pages-per-output-page book")
    print(f"-m: applying printable-area margin of {args_print_margin}cm")
    if args_analyze:
        print("-a: drawing page borders, spine limits")
    printable_margin = args_print_margin * POINTS_PER_CM
    printable_scale = (A4_WIDTH - 2 * printable_margin)/A4_WIDTH
    spine_part_of_page = (SPINE_LIMIT / A4_HALF_WIDTH) / printable_scale
    bonus_shrink_factor = 1 - spine_part_of_page
    pages_to_add, new_i_order = resort_pages_for_nup4(pages_to_add)
    nup4_position = 0
    page_count = 0
    is_front_page = True
    for i, page in enumerate(pages_to_add):
        if nup4_position == 0:
            new_page = pypdf.PageObject.create_blank_page(width=A4_WIDTH, height=A4_HEIGHT)
        corrected_i = new_i_order[i]
        nup4_inner_page_transform(page, crop_at_page[corrected_i], zoom_at_page[corrected_i], bonus_shrink_factor, printable_margin, printable_scale, nup4_position)
        nup4_outer_page_transform(page, bonus_shrink_factor, nup4_position)
        new_page.merge_page(page)
        page_count += 1
        print(f"merged page number {page_count} (of {len(pages_to_add)})")
        nup4_position += 1
        if nup4_position > 3:
            ornate_nup4(writer, args_analyze, is_front_page, new_page, printable_margin, printable_scale, bonus_shrink_factor, canvas_class)
            writer.add_page(new_page)
            nup4_position = 0
            is_front_page = not is_front_page

def resort_pages_for_nup4(pages_to_add):
    new_page_order = []
    new_i_order = []
    eight_pack = []
    i = 0
    n_eights = 0
    for page in pages_to_add:
        if i == 0:
            eight_pack = []
        eight_pack += [page]
        i += 1
        if i == 8:
            i = 0
            for n in PAGE_ORDER_FOR_NUP4:
                new_i_order += [8 * n_eights + n]
                new_page_order += [eight_pack[n]]
            n_eights += 1
    return new_page_order, new_i_order

def nup4_inner_page_transform(page, crop, zoom, bonus_shrink_factor, printable_margin, printable_scale, nup4_position):
    page.add_transformation(pypdf.Transformation().translate(ty=(A4_HEIGHT / zoom - (A4_HEIGHT - crop.top))))
    if nup4_position == 0 or nup4_position == 2:
        page.add_transformation(pypdf.Transformation().translate(tx=-crop.left))
    elif nup4_position == 1 or nup4_position == 3:
        page.add_transformation(pypdf.Transformation().translate(tx=(A4_WIDTH / zoom - (A4_WIDTH - crop.right))))
    page.add_transformation(pypdf.Transformation().scale(zoom * bonus_shrink_factor, zoom * bonus_shrink_factor))
    if nup4_position == 2 or nup4_position == 3:
        page.add_transformation(pypdf.Transformation().translate(ty=-2*printable_margin/printable_scale))

def nup4_outer_page_transform(page, bonus_shrink_factor, nup4_position):
    page.add_transformation(pypdf.Transformation().translate(ty=(1-bonus_shrink_factor)*A4_HEIGHT))
    if nup4_position == 0 or nup4_position == 1:  # top row
        y_section = A4_HEIGHT
        page.mediabox.bottom = A4_HALF_HEIGHT
        page.mediabox.top = A4_HEIGHT
    if nup4_position == 2 or nup4_position == 3:  # bottom row
        y_section = 0
        page.mediabox.bottom = 0
        page.mediabox.top = A4_HALF_HEIGHT
    if nup4_position == 0 or nup4_position == 2:  # left column
        x_section = 0
        page.mediabox.left = 0
        page.mediabox.right = A4_HALF_WIDTH
    if nup4_position == 1 or nup4_position == 3:  # right column
        page.add_transformation(pypdf.Transformation().translate(tx=(1-bonus_shrink_factor)*A4_WIDTH))
        x_section = A4_WIDTH
        page.mediabox.left = A4_HALF_WIDTH
        page.mediabox.right = A4_WIDTH
    page.add_transformation(pypdf.Transformation().translate(tx=x_section, ty=y_section))
    page.add_transformation(pypdf.Transformation().scale(QUARTER_SCALE_FACTOR, QUARTER_SCALE_FACTOR))

def ornate_nup4(writer, args_analyze, is_front_page, new_page, printable_margin, printable_scale, bonus_shrink_factor, canvas_class):
    if args_analyze:
        # draw borders of the sheet and of its quarters
        packet = io.BytesIO()
        c = canvas_class(packet, pagesize=A4)
        c.line(0, A4_HEIGHT, A4_WIDTH, A4_HEIGHT)
        c.line(0, A4_HALF_HEIGHT, A4_WIDTH, A4_HALF_HEIGHT)
        c.line(0, 0, A4_WIDTH, 0)
        c.line(0, A4_HEIGHT, 0, 0)
        c.line(A4_HALF_WIDTH, A4_HEIGHT, A4_HALF_WIDTH, 0)
        c.line(A4_WIDTH, A4_HEIGHT, A4_WIDTH, 0)
        c.save()
        new_pdf = pypdf.PdfReader(packet)
        new_page.merge_page(new_pdf.pages[0])
    printable_offset_x = printable_margin
    printable_offset_y = printable_margin * A4_HEIGHT / A4_WIDTH
    new_page.add_transformation(pypdf.Transformation().scale(printable_scale, printable_scale))
    new_page.add_transformation(pypdf.Transformation().translate(tx=printable_offset_x, ty=printable_offset_y))
    x_left_spine_limit = A4_HALF_WIDTH * bonus_shrink_factor
    x_right_spine_limit = A4_WIDTH - x_left_spine_limit
    if args_analyze or is_front_page:
        packet = io.BytesIO()
        c = canvas_class(packet, pagesize=A4)
    if args_analyze:
        # mark spine limits
        c.line(x_left_spine_limit, A4_HEIGHT, x_left_spine_limit, 0)
        c.line(x_right_spine_limit, A4_HEIGHT, x_right_spine_limit, 0)
    if is_front_page:
        # binding cut stencils go on front pages only
        draw_cut(c, x_left_spine_limit, 1)
        draw_cut(c, x_right_spine_limit, -1)
    if args_analyze or is_front_page:
        c.save()
        new_pdf = pypdf.PdfReader(packet)
        new_page.merge_page(new_pdf.pages[0])

def draw_cut(canvas, x_spine_limit, direction):
    outer_start_x = x_spine_limit - 0.5 * CUT_WIDTH * direction
    inner_start_x = x_spine_limit + 0.5 * CUT_WIDTH * direction
    middle_point_y = A4_HALF_HEIGHT + MIDDLE_POINT_DEPTH * direction
    end_point_y = A4_HALF_HEIGHT + CUT_DEPTH * direction
    canvas.line(inner_start_x, A4_HALF_HEIGHT, x_spine_limit, end_point_y)
    canvas.line(x_spine_limit, end_point_y, x_spine_limit, middle_point_y)
    canvas.line(x_spine_limit, middle_point_y, outer_start_x, A4_HALF_HEIGHT)

def main():  # def line absent from this listing; the function name is assumed
    args = parse_args()  # name of the argument-parsing helper is assumed
    validate_inputs_first_pass(args)
    if args.nup4:
        try:
            from reportlab.pdfgen.canvas import Canvas
        except ImportError:
            raise HandledException("-n: need reportlab.pdfgen.canvas installed for --nup4")
    pages_to_add, opened_files = read_inputs_to_pagelist(args.input_file, args.page_range)
    validate_inputs_second_pass(args, pages_to_add)
    rotate_pages(args.rotate_page, pages_to_add)
    if args.nup4:
        pad_pages_to_multiple_of_8(pages_to_add)
    normalize_pages_to_A4(pages_to_add)
    crop_at_page, zoom_at_page = collect_per_page_crops_and_zooms(args.crops, args.symmetry, pages_to_add)
    writer = pypdf.PdfWriter()
    if args.nup4:
        build_nup4_output(writer, pages_to_add, crop_at_page, zoom_at_page, args.print_margin, args.analyze, Canvas)
    else:
        build_single_pages_output(writer, pages_to_add, crop_at_page, zoom_at_page)
    for file in opened_files:
        file.close()
    with open(args.output_file, 'wb') as output_file:
        writer.write(output_file)

if __name__ == "__main__":
    try:
        main()
    except HandledException as e:
        handled_error_exit(e)